ENCYCLOPEDIA OF ARTIFICIAL INTELLIGENCE
VOLUME 1

EDITORIAL BOARD

Saul Amarel, Rutgers, The State University of New Jersey
Nicholas Findler, Arizona State University
John McDermott, Carnegie-Mellon University
Jack Minker, University of Maryland
Donald E. Walker, Bell Communications Research
David L. Waltz, Thinking Machines Corporation

Barbara Chernow, Developmental Editor
About the Editor

Editor-in-Chief Stuart C. Shapiro began his teaching career at Indiana University in 1972 after earning a BS at MIT in 1966 and a PhD in computer science at the University of Wisconsin in 1971. He moved to SUNY at Buffalo in 1978, where he is currently full professor and chairman of the Department of Computer Science. He is a member of the Association for Computing Machinery, the Association for Computational Linguistics, the Institute of Electrical and Electronics Engineers, the Society for the Study of Artificial Intelligence, and the Society for the Interdisciplinary Study of Mind. His research interests include artificial intelligence, knowledge representation, inference, and natural-language understanding.
ENCYCLOPEDIA OF ARTIFICIAL INTELLIGENCE
VOLUME 1

Stuart C. Shapiro, Editor-in-Chief
David Eckroth, Managing Editor
George A. Vallasi, Chernow Editorial Services, Developmental Editor

A Wiley-Interscience Publication
John Wiley & Sons
New York / Chichester / Brisbane / Toronto / Singapore
Copyright © 1987 by John Wiley & Sons, Inc.

All rights reserved. Published simultaneously in Canada.

Reproduction or translation of any part of this work beyond that permitted by Sections 107 or 108 of the 1976 United States Copyright Act without the permission of the copyright owner is unlawful. Requests for permission or further information should be addressed to the Permissions Department, John Wiley & Sons, Inc.

Library of Congress Cataloging in Publication Data:
Encyclopedia of artificial intelligence.
"A Wiley-Interscience publication."
1. Artificial intelligence-Dictionaries. I. Shapiro, Stuart Charles. II. Eckroth, David.
Q335.E53 1987  006.3'03'21  86-26739
ISBN 0-471-80748-6 (set)
ISBN 0-471-62974-X (Vol. 1)

Printed in the United States of America
10 9 8 7 6 5 4 3
EDITORIAL STAFF

Editor-in-Chief: Stuart C. Shapiro
Managing Editor: David Eckroth
Editorial Manager: Carole Schwager
Editorial Supervisor: Robert Golden
Production Manager: Jenet McIver
Production Supervisor: Rose Ann Campise
Production Aide: Jean Spranger
Indexer: Diana Witt
CONTRIBUTORS

Marc Abrams, University of Maryland, College Park, MD, COMPUTER SYSTEMS
Phillip L. Ackerman, University of Minnesota, Minneapolis, MN, INTELLIGENCE
Sanjaya Addanki, IBM Corporation, Yorktown Heights, NY, CONNECTIONISM
Gul Agha, Massachusetts Institute of Technology, Cambridge, MA, ACTOR FORMALISMS
Ashok K. Agrawala, University of Maryland, College Park, MD, COMPUTER SYSTEMS
Philip E. Agre, Massachusetts Institute of Technology, Cambridge, MA, CONTROL STRUCTURES
Narendra Ahuja, University of Illinois, Urbana, IL, DOT-PATTERN ANALYSIS; TEXTURE ANALYSIS
Janice S. Aikins, AION Corporation, Palo Alto, CA, AGENDA-BASED SYSTEMS
Selim G. Akl, SRI International, Menlo Park, CA, CHECKERS-PLAYING PROGRAMS
James F. Allen, University of Rochester, Rochester, NY, SPEECH ACTS
Jonathan Allen, Massachusetts Institute of Technology, Cambridge, MA, SPEECH RECOGNITION; SPEECH SYNTHESIS
Peter K. Allen, University of Pennsylvania, Philadelphia, PA, MULTISENSOR INTEGRATION
Sergio J. Alvarado, University of California, Los Angeles, CA, SCRIPTS
Saul Amarel, Rutgers University, New Brunswick, NJ, PROBLEM SOLVING
Charles Ames, 48-B Yale Avenue, Eggertsville, NY, MUSIC, AI IN
Robert A. Amsler, Bell Communications Research, Morristown, NJ, LITERATURE OF AI
Dana Angluin, Yale University, New Haven, CT, INDUCTIVE INFERENCE
Kulbir S. Arora, State University of New York, Buffalo, NY, BELLE; BORIS; CADUCEUS; EPISTLE; EURISKO; FOL; FRL; MERLIN; MS MALAPROP; NOAH; PANDEMONIUM; PARRY; PHRAN AND PHRED; ROSIE; SNIFFER
Ruzena Bajcsy, University of Pennsylvania, Philadelphia, PA, MULTISENSOR INTEGRATION
Bruce W. Ballard, AT&T Bell Laboratories, Murray Hill, NJ, COMPUTATIONAL LINGUISTICS
Ranan B. Banerji, St. Joseph's University, Philadelphia, PA, GAME PLAYING; MINIMAX PROCEDURE
Stephen T. Barnard, SRI International, Menlo Park, CA, STEREO VISION
Harry G. Barrow, Schlumberger Palo Alto Research, Palo Alto, CA, MATCHING
David R. Barstow, Schlumberger-Doll, Ridgefield, CT, PROGRAMMING ASSISTANTS
Madeleine Bates, Bolt, Beranek & Newman, Cambridge, MA, NATURAL-LANGUAGE INTERFACES
Antal K. Bejczy, Jet Propulsion Laboratory, Pasadena, CA, TELEOPERATORS
Robert C. Berwick, Massachusetts Institute of Technology, Cambridge, MA, GRAMMAR, TRANSFORMATIONAL
Alan W. Biermann, Duke University, Durham, NC, AUTOMATIC PROGRAMMING
Thomas O. Binford, Stanford University, Stanford, CA, GENERALIZED CYLINDER REPRESENTATION
Roberto Bisiani, Carnegie-Mellon University, Pittsburgh, PA, BEAM SEARCH
Piero P. Bonissone, General Electric, Schenectady, NY, REASONING, PLAUSIBLE
E. J. Briscoe, University of Lancaster, Lancaster, UK, SPEECH UNDERSTANDING
Christopher M. Brown, University of Rochester, Rochester, NY, HOUGH TRANSFORM
Bertram Bruce, Bolt Beranek & Newman, Cambridge, MA, DISCOURSE UNDERSTANDING; GRAMMAR, CASE
Maurice Bruynooghe, Katholieke Universiteit Leuven, Heverlee, Belgium, BACKTRACKING; COROUTINES
Richard R. Burton, Xerox Palo Alto Research Center, Palo Alto, CA, GRAMMAR, SEMANTIC
Jaime G. Carbonell, Carnegie-Mellon University, Pittsburgh, PA, LEARNING, MACHINE; NATURAL-LANGUAGE UNDERSTANDING
John Case, State University of New York, Buffalo, NY, RECURSION; TURING MACHINE
Raymond Cipra, Purdue University, West Lafayette, IN, MANIPULATORS
Helder Coelho, Laboratorio Nacional de Engenharia Civil, Lisboa, Portugal, GRAMMAR, DEFINITE-CLAUSE
Harold Cohen, University of California, La Jolla, CA, ARTS, AI IN
Daniel D. Corkill, University of Massachusetts, Amherst, MA, DISTRIBUTED PROBLEM SOLVING
Mace Creeger, LISP Machines Company, Los Angeles, CA, LISP MACHINES
James L. Crowley, Carnegie-Mellon University, Pittsburgh, PA, PATH PLANNING AND OBSTACLE AVOIDANCE
Richard E. Cullingford, Georgia Institute of Technology, Atlanta, GA, SCRIPTS
George Curet, LISP Machines Company, Cambridge, MA, LISP MACHINES
G. R. Dattatreya, University of Maryland, College Park, MD, PATTERN RECOGNITION
Ernest Davis, New York University, New York, NY, REASONING, COMMONSENSE
Larry S. Davis, University of Maryland, College Park, MD, FEATURE EXTRACTION
Laura Davis, Naval Research Laboratory, Washington, DC, MILITARY, APPLICATIONS IN
Martin Davis, New York University, New York, NY, CHURCH'S THESIS
Johan de Kleer, Xerox Palo Alto Research Center, Palo Alto, CA, BACKTRACKING, DEPENDENCY-DIRECTED; PHYSICS, QUALITATIVE
Jim Dray, Office of Technology Assessment, United States Congress, Washington, DC, SOCIAL ISSUES OF AI
Gavan Duffy, University of Texas, Austin, TX, HERMENEUTICS
Michael G. Dyer, University of California, Los Angeles, CA, SCRIPTS
George W. Ernst, Case-Western Reserve University, Cleveland, OH, MEANS-ENDS ANALYSIS
Carl R. Feynman, Thinking Machines Corporation, Cambridge, MA, CONNECTION MACHINE
Martin A. Fischler, SRI International, Menlo Park, CA, STEREO VISION
Jude Franklin, Planning Research Corporation, McLean, VA, MILITARY, APPLICATIONS IN
Peter W. Frey, Northwestern University, Evanston, IL, HORIZON EFFECT
Richard P. Gabriel, Lucid, Inc., Menlo Park, CA, LISP
Anne v.d.L. Gardner, 286 Selby Lane, Atherton, CA, LAW APPLICATIONS
Scott R. Garrigan, Lehigh University, Bethlehem, PA, ROBOTS, ANTHROPOMORPHIC
Gerald Gazdar, University of Sussex, Brighton, UK, GRAMMAR, GENERALIZED PHRASE STRUCTURE
James Geller, State University of New York, Buffalo, NY, ADVICE TAKER; ELIZA; EPAM; HACKER; INTELLECT; LOGO; MICROPLANNER; OPS-5; SCHOLAR; SHRDLU; SIMULA; SMALLTALK; SNOBOL-4; SNePS; STUDENT
Maria L. Gini, University of Minnesota, Minneapolis, MN, PATTERN MATCHING; PROBLEM REDUCTION
Richard D. Greenblatt, LISP Machines Company, Cambridge, MA, LISP MACHINES
David D. Grossman, IBM Corporation, Yorktown Heights, NY, AUTOMATION, INDUSTRIAL
Harrison Hall, University of Delaware, Newark, DE, PHENOMENOLOGY
Shoshana L. Hardt, State University of New York, Buffalo, NY, CONCEPTUAL DEPENDENCY; PHYSICS, NAIVE
Scott Y. Harmon, Naval Ocean Systems Center, San Diego, CA, AUTONOMOUS VEHICLES; ROBOTS, MOBILE
John R. Hayes, Carnegie-Mellon University, Pittsburgh, PA, CREATIVITY
Philip J. Hayes, Carnegie-Mellon University, Pittsburgh, PA, NATURAL-LANGUAGE UNDERSTANDING
Barbara Hayes-Roth, Stanford University, Palo Alto, CA, BLACKBOARD SYSTEMS
Frederick Hayes-Roth, Teknowledge, Inc., Palo Alto, CA, EXPERT SYSTEMS; RULE-BASED SYSTEMS
Austin Henderson, Xerox Palo Alto Research Center, Palo Alto, CA, OFFICE AUTOMATION
Lawrence J. Henschen, Northwestern University, Evanston, IL, INFERENCE; REASONING; THEOREM PROVING
Carl Hewitt, Massachusetts Institute of Technology, Cambridge, MA, ACTOR FORMALISMS
Ellen C. Hildreth, Massachusetts Institute of Technology, Cambridge, MA, EDGE DETECTION; OPTICAL FLOW
Jane C. Hill, Smith College, Northampton, MA, LANGUAGE ACQUISITION
Donald Hindle, AT&T Bell Laboratories, Murray Hill, NJ, DEEP STRUCTURE
Geoffrey Hinton, Carnegie-Mellon University, Pittsburgh, PA, BOLTZMANN MACHINE
Graeme Hirst, University of Toronto, Toronto, Ontario, SEMANTICS
C. J. Hogger, University of London, London, UK, LOGIC PROGRAMMING
Bruce A. Hohne, Rohm and Haas Company, Spring House, PA, CHEMISTRY, AI IN
James Hollenberg, New England Medical Center, Boston, MA, DECISION THEORY
Keith J. Holyoak, University of California, Los Angeles, CA, COGNITIVE PSYCHOLOGY
Thomas S. Huang, University of Illinois, Urbana, IL, MOTION ANALYSIS
Jonathan J. Hull, State University of New York, Buffalo, NY, CHARACTER RECOGNITION
Roger Hurwitz, Massachusetts Institute of Technology, Cambridge, MA, HERMENEUTICS
Robert J. K. Jacob, Naval Research Laboratory, Washington, DC, HUMAN-COMPUTER INTERACTION
Mark A. Jones, AT&T Bell Laboratories, Murray Hill, NJ, COMPUTATIONAL LINGUISTICS
Aravind K. Joshi, University of Pennsylvania, Philadelphia, PA, GRAMMAR, PHRASE-STRUCTURE
Takeo Kanade, Carnegie-Mellon University, Pittsburgh, PA, COLOR VISION
Laveen N. Kanal, University of Maryland, College Park, MD, PATTERN RECOGNITION
G. P. Kearsley, Park Row Software, La Jolla, CA, COMPUTER-AIDED INSTRUCTION, INTELLIGENT
Robert P. Keough, Rochester Institute of Technology, Rochester, NY, REPRESENTATION, WIRE-FRAME
Samuel J. Keyser, Massachusetts Institute of Technology, Cambridge, MA, PHONEMES
David E. Kieras, University of Michigan, Ann Arbor, MI, COGNITIVE MODELING
Daniel E. Koditschek, Yale University, New Haven, CT, ROBOT-CONTROL SYSTEMS
Richard E. Korf, University of California, Los Angeles, CA, SEARCH; HEURISTICS
Kimmo Koskenniemi, University of Helsinki, Helsinki, Finland, MORPHOLOGY
Robert A. Kowalski, University of London, London, UK, LOGIC PROGRAMMING
Bryan M. Kramer, University of Toronto, Toronto, Ontario, REPRESENTATION, KNOWLEDGE
Benjamin Kuipers, University of Texas, Austin, TX, REASONING, CAUSAL
Casimir A. Kulikowski, Rutgers University, New Brunswick, NJ, DOMAIN KNOWLEDGE
Vipin Kumar, University of Texas, Austin, TX, SEARCH, BRANCH-AND-BOUND; SEARCH, DEPTH-FIRST
Richard Laing, 290 East 37th Ave., Eugene, OR, SELF-REPLICATION
Pat Langley, University of California, Irvine, CA, LEARNING, MACHINE
Michael Lebowitz, Columbia University, New York, NY, MEMORY ORGANIZATION PACKETS
Wendy G. Lehnert, University of Massachusetts, Amherst, MA, EMOTION MODELING; STORY ANALYSIS
Larry J. Leifer, Stanford University, Stanford, CA, PROSTHESES
Alan M. Lesgold, University of Pittsburgh, Pittsburgh, PA, EDUCATION APPLICATIONS
Victor R. Lesser, University of Massachusetts, Amherst, MA, DISTRIBUTED PROBLEM SOLVING
Henry Lieberman, Massachusetts Institute of Technology, Cambridge, MA, LANGUAGES, OBJECT-ORIENTED
G. Jack Lipovski, University of Texas, Austin, TX, ASSOCIATIVE MEMORY
Donald W. Loveland, Duke University, Durham, NC, COMPLETENESS
Alan K. Mackworth, University of British Columbia, Vancouver, British Columbia, CONSTRAINT SATISFACTION
Anthony S. Maida, The Pennsylvania State University, University Park, PA, FRAME THEORY
John C. Mallery, Massachusetts Institute of Technology, Cambridge, MA, HERMENEUTICS
Tony A. Marsland, University of Alberta, Edmonton, Alberta, COMPUTER CHESS METHODS
Joao P. Martins, Instituto Superior Tecnico, Lisboa, Portugal, BELIEF REVISION
James L. McClelland, Carnegie-Mellon University, Pittsburgh, PA, DEMONS
Drew V. McDermott, Yale University, New Haven, CT, REASONING, SPATIAL; REASONING, TEMPORAL
David D. McDonald, University of Massachusetts, Amherst, MA, NATURAL-LANGUAGE GENERATION
Michel A. Melkanoff, University of California, Los Angeles, CA, COMPUTER-AIDED DESIGN
M. Eugene Merchant, Metcut Research Associates Inc., Cincinnati, OH, COMPUTER-INTEGRATED MANUFACTURING
Ryszard S. Michalski, University of Illinois, Urbana, IL, CLUSTERING; CONCEPT LEARNING
James H. Moor, Dartmouth College, Hanover, NH, TURING TEST
Paul Morawski, Mitre Corp., McLean, VA, MILITARY, APPLICATIONS IN
Ernesto Morgado, State University of New York, Buffalo, NY, METAKNOWLEDGE, -RULES, AND -REASONING
Margaret C. Moser, Bolt, Beranek & Newman, Cambridge, MA, GRAMMAR, CASE
John Mylopoulos, University of Toronto, Toronto, Ontario, REPRESENTATION, KNOWLEDGE
Roger N. Nagel, Lehigh University, Bethlehem, PA, ROBOTS, ANTHROPOMORPHIC
Frederick J. Newmeyer, University of Washington, Seattle, WA, LINGUISTIC COMPETENCE AND PERFORMANCE
David Nitzan, SRI International, Menlo Park, CA, ROBOTICS
Jane T. Nutter, Virginia Tech, Blacksburg, VA, EPISTEMOLOGY; REASONING, DEFAULT
Kenneth J. Overton, General Electric Company, Schenectady, NY, SENSORS
Seymour Papert, Massachusetts Institute of Technology, Cambridge, MA, COMPUTERS IN EDUCATION, CONCEPTUAL ISSUES; PERCEPTRON
Rohit Parikh, City University of New York, New York, NY, MODAL LOGIC
Stephen G. Pauker, New England Medical Center, Boston, MA, DECISION THEORY
Judea Pearl, University of California, Los Angeles, CA, AND/OR GRAPHS; BAYESIAN DECISION METHODS; BRANCHING FACTOR; GAME TREES
Donald Perlis, University of Maryland, College Park, MD, CIRCUMSCRIPTION; REASONING, NONMONOTONIC
Stanley R. Petrick, IBM Corporation, Yorktown Heights, NY, PARSING
Thomas H. Pierce, Rohm and Haas Company, Spring House, PA, CHEMISTRY, AI IN
Ira Pohl, University of California, Santa Cruz, CA, SEARCH, BIDIRECTIONAL
Livia Polanyi, Bolt Beranek & Newman, Cambridge, MA, DISCOURSE UNDERSTANDING
Keith E. Price, University of Southern California, Los Angeles, CA, REGION-BASED SEGMENTATION
Zenon W. Pylyshyn, The University of Western Ontario, London, Ontario, COGNITIVE SCIENCE
William J. Rapaport, State University of New York, Buffalo, NY, BELIEF SYSTEMS; LOGIC; LOGIC, PREDICATE; LOGIC, PROPOSITIONAL
Bertram Raphael, Hewlett-Packard, Palo Alto, CA, A* ALGORITHM
Glenn J. Rennels, Stanford University, Stanford, CA, MEDICAL ADVICE SYSTEMS
Elaine A. Rich, Microelectronics and Computer Technology Corporation (MCC), Austin, TX, ARTIFICIAL INTELLIGENCE
Christopher K. Riesbeck, Yale University, New Haven, CT, PARSING, EXPECTATION-DRIVEN
Jay Rosenberg, State University of New York, Buffalo, NY, BASEBALL; CHESS 4.5; KAISSA; LOOPS; MACHACK-6; POP-2; REASONING, FOCUS-OF-ATTENTION; REF-ARF; SHAKEY
Paul S. Rosenbloom, Stanford University, Palo Alto, CA, SEARCH, BEST-FIRST
Remko J. H. Scha, Bolt Beranek & Newman, Cambridge, MA, DISCOURSE UNDERSTANDING
Lenhart K. Schubert, University of Alberta, Edmonton, Alberta, MEMORY, SEMANTIC
Jacob T. Schwartz, New York University, New York, NY, LIMITS OF ARTIFICIAL INTELLIGENCE
Steven A. Shafer, Carnegie-Mellon University, Pittsburgh, PA, COLOR VISION
Stuart C. Shapiro, State University of New York, Buffalo, NY, PROCESSING, BOTTOM-UP AND TOP-DOWN
David E. Shaw, Columbia University, New York, NY, NON-VON
Beau A. Sheil, Xerox Artificial Intelligence Systems, Palo Alto, CA, PROGRAMMING ENVIRONMENTS
Yoshiaki Shirai, Electrotechnical Laboratory, Ibaraki, Japan, PROXIMITY SENSING
Yoav Shoham, Yale University, New Haven, CT, REASONING, TEMPORAL
Edward H. Shortliffe, Stanford University, Stanford, CA, MEDICAL ADVICE SYSTEMS
Randall Shumaker, Naval Research Laboratory, Washington, DC, MILITARY, APPLICATIONS IN
James R. Slagle, University of Minnesota, Minneapolis, MN, ALPHA-BETA PRUNING; PATTERN MATCHING; PROBLEM REDUCTION
Steven L. Small, University of Rochester, Rochester, NY, PARSING, WORD-EXPERT
Brian C. Smith, Xerox Palo Alto Research Center, Palo Alto, CA, SELF-REFERENCE
Carl Smith, University of Maryland, College Park, MD, INDUCTIVE INFERENCE
John F. Sowa, IBM Corporation, Thornwood, NY, SEMANTIC NETWORKS
Karen Sparck Jones, University of Cambridge, Cambridge, UK, INFORMATION RETRIEVAL
Sargur N. Srihari, State University of New York, Amherst, NY, VITERBI ALGORITHM
Robert Stepp, University of Illinois, Urbana, IL, CLUSTERING
Salvatore J. Stolfo, Columbia University, New York, NY, DADO
William R. Swartout, University of Southern California, Marina del Rey, CA, EXPLANATION
Ming Ruey Taie, State University of New York, Buffalo, NY, AM; DENDRAL; ELI; EMYCIN; GUIDON; INTERNIST; MACSYMA; MYCIN; PAM; POLITICS; PROLOG; PROSPECTOR; SAM; SOPHIE; X-CON
Jay M. Tenenbaum, Schlumberger Palo Alto Research, Palo Alto, CA, MATCHING
Harry Tennant, Texas Instruments, Inc., Dallas, TX, ELLIPSIS; MENU-BASED NATURAL LANGUAGE
Demetri Terzopoulos, Schlumberger Palo Alto Research Center, Palo Alto, CA, VISUAL DEPTH MAP
David S. Touretzky, Carnegie-Mellon University, Pittsburgh, PA, INHERITANCE HIERARCHY
John K. Tsotsos, University of Toronto, Toronto, Ontario, IMAGE UNDERSTANDING
Mihran Tuceryan, University of Illinois, Urbana, IL, DOT-PATTERN ANALYSIS
Robert Van Gulick, Syracuse University, Syracuse, NY, PHILOSOPHICAL QUESTIONS
Raf Venken, Belgian Institute of Management, Everberg, Belgium, BACKTRACKING; COROUTINES
Steven A. Vere, Jet Propulsion Laboratory, Pasadena, CA, PLANNING
R. Veroff, University of New Mexico, Albuquerque, NM, RESOLUTION, BINARY
Heinz von Foerster, 1 Eden West Road, Pescadero, CA, CYBERNETICS
Deborah Walters, State University of New York, Buffalo, NY, REPRESENTATION, ANALOGUE
David L. Waltz, Thinking Machines Corporation, Cambridge, MA, WALTZ FILTERING
Mitchell Wand, Northeastern University, Boston, MA, LAMBDA CALCULUS
Michael J. Watkins, Rice University, Houston, TX, EPISODIC MEMORY
Bonnie L. Webber, University of Pennsylvania, Philadelphia, PA, QUESTION ANSWERING
Yorick Wilks, New Mexico State University, Las Cruces, NM, MACHINE TRANSLATION; PRIMITIVES
Andrew Witkin, Schlumberger Palo Alto Research Center, Palo Alto, CA, SCALE SPACE METHODS
Robert J. Woodham, University of British Columbia, Vancouver, British Columbia, SHAPE ANALYSIS
William A. Woods, Applied Expert Systems, Inc. and Harvard University, Cambridge, MA, GRAMMAR, AUGMENTED TRANSITION NETWORK; SEMANTICS, PROCEDURAL
Lawrence Wos, Argonne National Laboratory, Argonne, IL, RESOLUTION, BINARY
A. Hanyong Yuhan, State University of New York, Buffalo, NY, CONNIVER; FRUMP; GUS; HARPY; HEARSAY-II; KL-ONE; KRL; LIFER; LUNAR; PLANES; PLANNER; SAINT; SIR; SLIP; STRIPS
Steven W. Zucker, McGill University, Montreal, Quebec, VISION, EARLY
REVIEWERS

J. K. Aggarwal, University of Texas, Austin, TX
James F. Albus, National Bureau of Standards, Washington, DC
James Allen, University of Rochester, Rochester, NY
Jonathan Allen, Massachusetts Institute of Technology, Cambridge, MA
Saul Amarel, Rutgers University, New Brunswick, NJ
D. E. Appelt, SRI International, Menlo Park, CA
Michael Arbib, University of California, San Diego, CA
Norman Badler, University of Pennsylvania, Philadelphia, PA
Ruzena Bajcsy, University of Pennsylvania, Philadelphia, PA
Robert Balzer, University of Southern California, Marina del Rey, CA
Amit Bandyopadhyay, University of Rochester, Rochester, NY
Ranan B. Banerji, St. Joseph's University, Philadelphia, PA
Madeleine Bates, Bolt, Beranek and Newman Laboratories, Cambridge, MA
Gerardo Beni, University of California, Santa Barbara, CA
Jared Bernstein, SRI International, Menlo Park, CA
Donald Berwick, Harvard Community Health Plan, Cambridge, MA
Robert Berwick, Massachusetts Institute of Technology, Cambridge, MA
Alan Biermann, Duke University, Durham, NC
Woody Bledsoe, Microelectronics and Computer Technology Corporation (MCC), Austin, TX
Ned Block, Massachusetts Institute of Technology, Cambridge, MA
Daniel Bobrow, Xerox Palo Alto Research Center, Palo Alto, CA
Margaret A. Boden, University of Sussex, Brighton, UK
Lois Bogges, Mississippi State University, Mississippi State, MS
Michael Brady, Massachusetts Institute of Technology, Cambridge, MA
Rodney Brooks, Massachusetts Institute of Technology, Cambridge, MA
Chris Brown, University of Rochester, Rochester, NY
John S. Brown, Xerox Palo Alto Research Center, Palo Alto, CA
Bertram Bruce, Bolt, Beranek & Newman, Cambridge, MA
Maurice Bruynooghe, Katholieke Universiteit Leuven, Heverlee, Belgium
Bruce Buchanan, Stanford University, Stanford, CA
Arthur Burks, University of Michigan, Ann Arbor, MI
David Burr, Bell Laboratories, Holmdel, NJ
Jaime Carbonell, Carnegie-Mellon University, Pittsburgh, PA
Eugene Charniak, Brown University, Providence, RI
Kenneth W. Church, AT&T Bell Laboratories, Murray Hill, NJ
K. L. Clark, Queen Mary College, London, UK
J. C. Colson, IBM Corporation, Austin, TX
Lawrence Davis, University of Maryland, College Park, MD
Martin Davis, New York University, New York, NY
Johan de Kleer, Xerox Palo Alto Research Center, Palo Alto, CA
Daniel Dennett, Tufts University, Medford, MA
J. Detriche, Commissariat a l'Energie Atomique, Gif sur Yvette, France
John Doyle, Carnegie-Mellon University, Pittsburgh, PA
Hubert Dreyfus, University of California, Berkeley, CA
Richard Duda, Syntelligence, Menlo Park, CA
Michael Dyer, University of California, Los Angeles, CA
Alberto Elses, Carnegie-Mellon University, Pittsburgh, PA
E. Allen Emerson, University of Texas, Austin, TX
George W. Ernst, Case Western Reserve University, Cleveland, OH
Richard Fateman, University of California, Berkeley, CA
Jerry Feldman, University of Rochester, Rochester, NY
Nicholas Findler, Arizona State University, Tempe, AZ
Harvey Fineberg, Harvard School of Public Health, Boston, MA
Fernando Flores, Logonet, Berkeley, CA
Mark Fox, Carnegie-Mellon University, Pittsburgh, PA
Eugene C. Freuder, University of New Hampshire, Durham, NH
Peter W. Frey, Northwestern University, Evanston, IL
Joyce Friedman, 221 Mt. Auburn St., Cambridge, MA
Gerald Gazdar, University of Sussex, Brighton, UK
Michael Georgeff, SRI International, Menlo Park, CA
Adele Goldberg, Xerox Palo Alto Research Center, Palo Alto, CA
Evon C. Greanias, IBM Corporation, Gaithersburg, MD
Richard Greenblatt, Lisp Machine Inc., Cambridge, MA
W. E. L. Grimson, Massachusetts Institute of Technology, Cambridge, MA
David Grossman, IBM Corporation, Yorktown Heights, NY
Chris Halvorsen, Xerox Palo Alto Research Center, Palo Alto, CA
A. R. Hanson, University of Massachusetts, Amherst, MA
Robert Haralick, Machine Vision International, Ann Arbor, MI
Scott Harmon, Naval Ocean Research Center, San Diego, CA
Robert M. Harnish, University of Arizona, Tucson, AZ
Peter Hart, Syntelligence, Menlo Park, CA
John Haugeland, University of Pittsburgh, Pittsburgh, PA
Barbara Hayes-Roth, Stanford University, Stanford, CA
Frederick Hayes-Roth, Teknowledge Inc., Palo Alto, CA
Chris Haynes, Maharishi International University, Fairfield, IA
Gary Hendrix, Symantec, Cupertino, CA
Carl Hewitt, Massachusetts Institute of Technology, Cambridge, MA
Ellen C. Hildreth, Massachusetts Institute of Technology, Cambridge, MA
W. Daniel Hillis, Thinking Machines Corporation, Cambridge, MA
Geoffrey Hinton, Carnegie-Mellon University, Pittsburgh, PA
Graeme Hirst, University of Toronto, Toronto, Ontario
J. R. Hobbs, Ablex Publishing, Norwood, NJ
Keith Holyoak, University of California, Los Angeles, CA
Berthold Horn, Massachusetts Institute of Technology, Cambridge, MA
Robert A. Hummel, New York University, New York, NY
David Israel, SRI International, Menlo Park, CA
Ray Jackendoff, Brandeis University, Waltham, MA
Aravind Joshi, University of Pennsylvania, Philadelphia, PA
Takeo Kanade, Carnegie-Mellon University, Pittsburgh, PA
Laveen Kanal, University of Maryland, College Park, MD
Robert Kling, University of California, Irvine, CA
Janet Kolodner, Georgia Institute of Technology, Atlanta, GA
William Kornfeld, Quintus Corporation, Palo Alto, CA
Kimmo Koskenniemi, University of Helsinki, Helsinki, Finland
Robert Kowalski, University of London, London, UK
Benjamin Kuipers, University of Texas, Austin, TX
Vipin Kumar, University of Texas, Austin, TX
Michael Lebowitz, Columbia University, New York, NY
Wendy Lehnert, University of Massachusetts, Amherst, MA
Douglas B. Lenat, Microelectronics and Computer Technology Corporation (MCC), Austin, TX
B. Lesigne, Commissariat a l'Energie Atomique, Gif sur Yvette, France
Victor Lesser, University of Massachusetts, Amherst, MA
Diane Litman, AT&T Bell Laboratories, Murray Hill, NJ
Ray Liuzzi, Griffiths Air Force Base, Rome, NY
Donald Loveland, Duke University, Durham, NC
Tomas Lozano-Perez, Massachusetts Institute of Technology, Cambridge, MA
John Luh, Clemson University, Clemson, SC
Alan Mackworth, University of British Columbia, Vancouver, British Columbia
Anthony S. Maida, The Pennsylvania State University, University Park, PA
John Mallery, Massachusetts Institute of Technology, Cambridge, MA
David McAllister, Massachusetts Institute of Technology, Cambridge, MA
James D. McCawley, University of Chicago, Chicago, IL
James L. McClelland, Carnegie-Mellon University, Pittsburgh, PA
Drew McDermott, Yale University, New Haven, CT
Eugene Merchant, Metcut Research Associates, Inc., Cincinnati, OH
Ryszard S. Michalski, University of Illinois, Urbana, IL
Jack Minker, University of Maryland, College Park, MD
Marvin Minsky, Massachusetts Institute of Technology, Cambridge, MA
Hans Moravec, Carnegie-Mellon University, Pittsburgh, PA
Roger Nagel, Lehigh University, Lehigh, PA
Dana Nau, University of Maryland, College Park, MD
Frederick J. Newmeyer, University of Washington, Seattle, WA
Donald Norman, University of California, La Jolla, CA
Jane T. Nutter, Tulane University, New Orleans, LA
Greg Oden, University of Wisconsin, Madison, WI
A. L. Pai, Arizona State University, Tempe, AZ
Steven Pauker, New England Medical Center, Boston, MA
Judea Pearl, University of California, Los Angeles, CA
L. M. Pereira, Universidade Nova de Lisboa, Lisbon, Portugal
Donald Perlis, University of Maryland, College Park, MD
Ray Perrault, SRI International, Menlo Park, CA
Stanley Petrick, IBM Corporation, Yorktown Heights, NY
Gerry Pocock, University of Massachusetts, Amherst, MA
Tomaso Poggio, Massachusetts Institute of Technology, Cambridge, MA
Ira Pohl, University of California, Santa Cruz, CA
Zenon Pylyshyn, University of Western Ontario, London, Ontario
William J. Rapaport, State University of New York, Buffalo, NY
Bertram Raphael, Hewlett-Packard, Palo Alto, CA
Charles Reiger, Vidar Systems Corporation, Herndon, VA
Ray Reiter, University of British Columbia, Vancouver, British Columbia
Elaine Rich, Microelectronics and Computer Technology Corporation (MCC), Austin, TX
Charles J. Rieger, 1002 Broadmoor Circle, Silver Spring, MD
Christopher K. Riesbeck, Yale University, New Haven, CT
Curtis Roads, Massachusetts Institute of Technology Press, Cambridge, MA
Paul Rosenbloom, Stanford University, Stanford, CA
Alexander I. Rudnicky, Carnegie-Mellon University, Pittsburgh, PA
Earl Sacerdoti, Teknowledge, Inc., Palo Alto, CA
Naomi Sager, New York University, New York, NY
G. Salton, Cornell University, Ithaca, NY
Eric J. Sandewall, Linkoeping University, Linkoeping, Sweden
L. K. Schubert, University of Alberta, Edmonton, Alberta
Steven Shafer, Carnegie-Mellon University, Pittsburgh, PA
Yoshiaki Shirai, Electrotechnical Laboratories, Ibaraki, Japan
Ben Shneiderman, University of Maryland, College Park, MD
Robert Simmons, University of Texas, Austin, TX
Herbert Simon, Carnegie-Mellon University, Pittsburgh, PA
James R. Slagle, University of Minnesota, Minneapolis, MN
Steve Small, University of Rochester, Rochester, NY
Brian Smith, Xerox Palo Alto Research Center, Palo Alto, CA
Carl Smith, University of Maryland, College Park, MD
Douglas R. Smith, Kestrel Institute, Palo Alto, CA
James Solberg, Purdue University, West Lafayette, IN
Thomas M. Sommer, Wang Laboratories, Inc., Lowell, MA
Norman Sondheimer, University of Southern California, Marina del Rey, CA
Frank Sonnenberg, New England Medical Center, Boston, MA
John F. Sowa, IBM Corporation, New York, NY
Guy Steele, Thinking Machines Corporation, Cambridge, MA
Marc Stefik, Xerox Palo Alto Research Center, Palo Alto, CA
Salvatore Stolfo, Columbia University, New York, NY
Marty Tenenbaum, Schlumberger Computer Aided Systems Facility, Palo Alto, CA
Harry Tennant, Texas Instruments, Inc., Austin, TX
Demetri Terzopoulos, Schlumberger Palo Alto Research Center, Palo Alto, CA
Henry Thompson, University of Edinburgh, Edinburgh, UK
David Touretzky, Carnegie-Mellon University, Pittsburgh, PA
John K. Tsotsos, University of Toronto, Toronto, Ontario
Endel Tulving, University of Toronto, Toronto, Ontario
James Vass, University of Pittsburgh, Pittsburgh, PA
Steven Vere, Jet Propulsion Laboratory, Pasadena, CA
Deborah Walters, State University of New York, Buffalo, NY
David Waltz, Thinking Machines Corporation, Cambridge, MA
Mitchell Wand, Northeastern University, Boston, MA
David Warren, Quintus Corporation, Palo Alto, CA
Donald Waterman, Rand Corporation, Santa Monica, CA
Bonnie Webber, University of Pennsylvania, Philadelphia, PA
Shalom Weiss, Rutgers University, New Brunswick, NJ
Craig Wilcox, University of Texas, Austin, TX
Yorick Wilks, New Mexico State University, Las Cruces, NM
Peter Will, Schlumberger-Doll, Ridgefield, CT
Andrew Witkin, Schlumberger Palo Alto Research Center, Palo Alto, CA
William Woods, Applied Expert Systems, Cambridge, MA
Larry Wos, Argonne National Laboratories, Argonne, IL
Steven Zucker, McGill University, Montreal, Quebec
GUEST FOREWORD

Artificial Intelligence (AI) is a domain of research, application, and instruction concerned with programming computers to perform in ways that, if observed in human beings, would be regarded as intelligent. Thus intelligence is attributed to human beings when they play chess or solve the Tower of Hanoi puzzle. A computer that can perform one of these tasks even moderately well is regarded as an example of artificial intelligence.

Research in AI began in the mid-1950s, shortly after the first digital computers emerged from their wartime security wraps. The computer was designed primarily to carry out numerical computations in an efficient way. But it was soon observed (the English logician A. M. Turing was perhaps the first to make this observation) that computers were not limited to numbers but were capable of quite general processing of all kinds of symbols or patterns, literal and diagrammatic as well as numerical. AI programs exploit these capabilities.

A digital computer is an example of a physical symbol system, a system that is capable of inputting (reading); outputting (writing); organizing (associating); storing, copying, and comparing symbols; and of branching, that is, following different courses of action depending on whether a comparison of symbols led to judging them to be the same or different. The fundamental hypothesis of AI is that these are just the capabilities it requires to exhibit "intelligence." Two corollaries follow from the hypothesis. First, since computers demonstrably have these capabilities, they are capable of being programmed to behave intelligently. Second, since people are capable of behaving intelligently, their brains are (at least) physical symbol systems.

The fundamental hypothesis of AI and its corollaries are empirical hypotheses, whose truth or falsity are to be determined by experiment and empirical test. Research aimed at testing them leads to the two main branches of AI:

1. AI in the narrow sense is a part of computer science, aimed at exploring the range of tasks over which computers can be programmed to behave intelligently. It makes no claims that computer intelligence imitates human intelligence in its processes, only that it produces intelligent responses to the task demands. AI programs in this category may, for example, use rapid arithmetic processes at a rate that people are incapable of. Thus, an AI chess program may explore a million branches of the game tree before choosing a move, while a human grandmaster seldom explores more.

2. The second branch of AI, part of the new field of cognitive science, is aimed at programs that simulate the actual processes that human beings use in their intelligent behavior. These simulation programs are intended as theories (systems of difference equations) describing and explaining human performances. They are tested by comparing the computer output, second by second when possible, with human behavior to determine whether both the result and also the actual behavior paths of computer and person are closely similar.

Early research in AI was directed mainly at studying well-structured puzzle-like tasks, where human behavior in the laboratory could be compared with the traces of the computer programs. This work produced a basic understanding of problem solving as (nonrandom) search guided by heuristics or rules of thumb. It confirmed Duncker's* early emphasis upon means-ends analysis as a central tool for solving problems.

As research expanded into domains like chess playing and medical diagnosis, two tasks that have been prominent in the literature, evidence grew that successful task performance depends on rapid access to large bodies of knowledge by a process of cue recognition (often called "intuition"). Experiments showed that the expert human in such domains is capable of recognizing 50,000 or more patterns, using recognition of familiar chunks to access information stored in long-term memory relevant to the patterns. Thus, the physician recognizes patterns corresponding to diseases and symptoms, thereby gaining access to his knowledge about the diseases, their treatment, and further diagnostic tests.

Research in the cognitive science branch of AI up to the present (1986) has placed particular emphasis on problem solving, on the organization of long-term memory (semantic memory), and on learning processes.

From the beginning, research in both branches of AI was facilitated by the invention of programming languages especially adapted to their needs. The so-called list processing languages, first developed in 1956, allowed for flexible, associative organization of memory and convenient representation of such psychological concepts as directed associations and schemas. Around 1970, production-system languages were developed, whose basic instruction format represents a sophisticated elaboration of the connection between stimuli and responses.
Program Construction Using Mechanized Assistant

More recently researchers have been examining the role that AI can play in industrial programming environments where large software systems are specified, coded, evaluated, and maintained. Here the whole life cycle of the software system is under consideration: The client and the professional systems analyst discuss informally a proposed software product. More formal specifications are then derived, performance estimates are made, and a model of the system evolves. Many times specifications are modified or redefined as analysis proceeds. The next phase is the actual construction, documentation, and testing of the product. After release into the user environment the system may be debugged and changed or improved on a regular basis over a period of years. A developing idea in some current automatic programming projects (44,45) envisions a mechanized programmer's assistant that would intelligently support all of the above activities.
AUTOMATIC PROGRAMMING

It would provide a programming environment for the user capable of receiving many kinds of information from programmers, including formal and informal specifications, possibly natural-language assertions regarding goals, motivations, and justifications, and code segments. It would assist the programmer in debugging these inputs and properly fitting them into the context of the programming project. It would be knowledge based and thus capable of fully understanding all of the above inputs. It would provide library facilities for presenting the programmer with standardized program modules or with information concerning the current project. It would be able to generate code from specifications, program segments, and other information available from the programmer and other sources. It would be able to understand program documentation within the code and to generate documentation where necessary. Finally, it would maintain historical notes related to what was done, by whom, when, and, most important, why. All of these functions are envisioned as operating strictly in a supportive role for human programmers, who are expected to carry on most high-level tasks.

Thus, the concept of the automatic programmer's assistant places the human programmer in the primary position of specifying the program and guiding progress toward successful implementation and maintenance. The task of the assistant is to maximally utilize available technologies to automate as many lower level functions as possible.
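As a toy illustration of the bookkeeping such an assistant would perform (receiving a specification, generating code from a library of standard modules, and recording who requested what and why), here is a hypothetical Python sketch. The template library, class names, and API are invented and are far simpler than the systems the article describes.

```python
# A toy "programmer's assistant": it accepts a specification drawn from
# a small library of standard templates, generates executable code, and
# keeps historical notes on what was done, by whom, and why.
# Entirely hypothetical; not the architecture of any cited system.

CLICHE_LIBRARY = {
    # specification name -> code template
    "sum_list": "def {name}(xs):\n    total = 0\n    for x in xs:\n        total += x\n    return total\n",
    "max_list": "def {name}(xs):\n    best = xs[0]\n    for x in xs[1:]:\n        if x > best:\n            best = x\n    return best\n",
}

class Assistant:
    def __init__(self):
        self.history = []          # notes: (author, function name, rationale)

    def generate(self, name, spec, author, rationale):
        """Produce code for a specification from the template library."""
        template = CLICHE_LIBRARY[spec]
        code = template.format(name=name)
        self.history.append((author, name, rationale))
        return code

assistant = Assistant()
code = assistant.generate("total", "sum_list", "alice", "needed for invoice totals")
namespace = {}
exec(code, namespace)                  # the generated code is machine executable
print(namespace["total"]([1, 2, 3]))   # prints 6
```

A real assistant would of course synthesize code rather than merely instantiate templates, but the division of labor shown here (the human supplies the specification and the rationale, the machine supplies the code text) matches the supportive role described above.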
The Programming Paradigm. This view emphasizes the decomposition of the programming task into two stages, as illustrated in Figure 13: systems analysis and programming. The first stage involves the development of formal specifications and deals primarily with what performance is required; the latter includes the decomposition of the task into an appropriate hierarchy of subparts, the selection of data structures, and the coding and documentation of the product. The former is assumed to be the appropriate domain for considerable human involvement, whereas the latter is expected to be more amenable to automation.

In order to begin implementing such an assistant, it is necessary to have appropriate languages to handle the many kinds of information that appear in this application. One approach is to introduce the concept of a wide-spectrum language that can be used at all levels of implementation from the specification of requirements to high-level coding of the actual target program. An example of such a language is V (46), which has as primitives sets, mappings, relations, predicates, enumerations, state transformation sequences, and other constructions. The V language is being implemented within the CHI project (47,48), which emphasizes the idea of self-description. That is, the CHI system is a programmer's assistant that provides an environment for using the V language in program development. The CHI system is also being written in the V language; hence, it is "self-describing." V has been designed to include capabilities for expressing program synthesis rules as well as its many other facilities.

Another approach (44,49) is based primarily on the concept of plans for programs that contain the essential data and control flow but exclude programming language details. An example of a plan appears in Figure 14, where the computation of absolute value is represented. The advantages of such plans are that because they locally contain essential information, they can be glued together arbitrarily without global repercussions. This facilitates the use of a library of small standard plans [or "cliches" (49)], which can provide the building blocks for the assembly of large plans. This approach uses code and plans as parallel representations for the program and allows the user to deal easily with either one, as illustrated in Figure 15. If the user chooses to work in the plan domain, each action in creating or modifying a given plan results in appropriate updates in the code domain. The coder module translates the current version of the plan into code. If the user wishes to work with the code, the analyzer appropriately revises the associated plan.

The use of the system could begin with the assembly and manipulation of various plans from the library to result in a large plan. Then it could be translated automatically to code. Another usage might begin with an existing code segment that needs to be modified. Its corresponding plan could be automatically created and then manipulated in appropriate ways, including possibly the addition of some library routines. Then translation back to code would yield the desired code with its revision.

The automatic programmer's assistant concept assumes that most coding functions below the formal specification stage will be automated. Once the specifications are derived, the machine will be able to select data structures for the development of efficient code, generate the code, and produce appropriate documentation.
Figure 13. Stages in program construction: an informal specification is turned by systems analysis (human labor intensive) into a formal specification, which programming (heavily automated) turns into the program product.

Figure 14. Example plan for computing absolute value showing both flow of data and flow of control.
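The plan-plus-coder idea can be made concrete with a small sketch. The dictionary below is an invented stand-in for the absolute-value plan of Figure 14 (the real plan formalisms of refs. 44 and 49 are much richer), and the coder function plays the role of the coder module described in the text; an analyzer would perform the inverse translation from code back to a plan.

```python
# A minimal rendering of the "plans" idea: the plan is stored as
# language-independent data and control flow, and a small "coder"
# translates it into program text. The plan encoding is invented
# for illustration only.

abs_plan = {
    "inputs": ["x"],
    "test": ("x", "<", "0"),   # control flow: branch on this test
    "then": "-x",              # data flow on the negative branch
    "else": "x",               # data flow on the other branch
}

def coder(name, plan):
    """Translate a plan into code text."""
    var = plan["inputs"][0]
    lhs, op, rhs = plan["test"]
    return (
        f"def {name}({var}):\n"
        f"    if {lhs} {op} {rhs}:\n"
        f"        return {plan['then']}\n"
        f"    return {plan['else']}\n"
    )

source = coder("absolute", abs_plan)
ns = {}
exec(source, ns)
print(ns["absolute"](-7), ns["absolute"](4))   # prints: 7 4
```

Because the plan carries its own data and control flow, two such plans could be composed (the output of one wired to the input of another) without inspecting the generated code, which is the "glued together arbitrarily without global repercussions" property claimed in the text.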
Figure 15. Architecture giving the user parallel access to code and associated plans through library, coder, and analyzer modules.

This level of automation has many implications. Programmers might then wish to automatically generate several versions of the target system while varying specifications or other implementation parameters. Thus, a higher degree of optimization would be possible because more experimentation could be done on different design strategies.

A second benefit made possible by this approach would be that program maintenance and improvement would be done in a new way. Instead of modifying a system by working at the programming language level, changes would be made by working at the specification or planning level. After the completion and validation of the new specification or plan, the automatic program generator would then be released to assemble the product, again repeating, where appropriate, previous design decisions but modifying decisions both at local and global levels where earlier choices are no longer acceptable.

The automatic programmer's assistant will thus be aimed at revolutionizing software development processes. With the success of this research, human programmer activities will be moved more into the software specification cycle, leaving code generation to the assistant. More efficient programs may be possible through more extensive experimentation with design alternatives. Fewer programming personnel will be needed for actual coding and documentation, and fewer errors should occur at these levels. Program maintenance and upgrading will be done by working with plans and specifications rather than with the code itself.

Conclusion

Automatic programming is the process of mechanically assembling fragmentary information about target behaviors into machine-executable code for achieving those behaviors. This section has described the four main approaches to the field followed by researchers in recent years. The field is still very much in its infancy, but already many useful discoveries have been made.
Because of its tremendous importance, it is clear that automatic programming will be a research area central to AI in the years to come. Additional readings on the subject are found in Refs. 50–54.

BIBLIOGRAPHY

1. Z. Manna and R. Waldinger, "A deductive approach to program synthesis," Trans. Progr. Lang. Syst. 2(1), 90–121 (1980).
2. C. C. Green, "Application of theorem proving to problem solving," Proc. of the First Int. Joint Conf. Artif. Intell., Washington, DC, May 1969, pp. 219–239.
3. R. J. Waldinger and R. C. T. Lee, "PROW: A step toward automatic program writing," Proc. of the First Int. Joint Conf. Artif. Intell., Washington, DC, May 1969, pp. 241–252.
4. M. Broy, "Program Construction by Transformations: A Family Tree of Sorting Programs," in A. W. Biermann and G. Guiho (eds.), Computer Program Synthesis Methodologies, D. Reidel, pp. 1–50, 1983.
5. R. M. Burstall and J. Darlington, "A transformation system for developing recursive programs," JACM 24, 44–67, 1977.
6. Z. Manna and R. Waldinger, "Synthesis: dreams → programs," IEEE Trans. Software Eng. SE-5, 294–328 (1979).
7. W. Bibel and K. M. Hornig, "LOPS: A System Based on a Strategical Approach to Program Synthesis," in A. Biermann, G. Guiho, and Y. Kodratoff (eds.), Automatic Program Construction Techniques, Macmillan, pp. 69–90, 1984.
8. A. W. Biermann, "On the inference of Turing machines from sample computations," Artif. Intell. 3, 181–198 (1972).
9. A. W. Biermann and R. Krishnaswamy, "Constructing programs from example computations," IEEE Trans. Software Eng. SE-2, 141–153 (1976).
10. D. A. Waterman, W. S. Faught, P. Klahr, S. J. Rosenschein, and R. Wesson, "Design Issues for Exemplary Programming," in A. Biermann, G. Guiho, and Y. Kodratoff (eds.), Automatic Program Construction Techniques, Macmillan, pp. 433–461, 1984.
11. A. W. Biermann, "Automatic insertion of indexing instructions in program synthesis," Int. J. Comput. Inf. Sci. 7, 65–90 (1978).
12. D. R. Smith, "The Synthesis of LISP Programs from Examples: A Survey," in A. Biermann, G. Guiho, and Y. Kodratoff (eds.), Automatic Program Construction Techniques, Macmillan, pp. 307–324, 1984.
13. A. W. Biermann, "The inference of regular LISP programs from examples," IEEE Trans. Syst. Man Cybern. SMC-8, 585–600 (1978).
14. P. D. Summers, "A methodology for LISP program construction from examples," JACM 24, 161–175 (1977).
15. E. Y. Shapiro, Algorithmic Program Debugging, MIT Press, Cambridge, MA, 1982.
16. M. Gold, "Language identification in the limit," Inf. Contr. 10, 447–474 (1967).
17. D. R. Smith, A Class of Synthesizable LISP Programs, A.M. Thesis, Duke University, 1977.
18. S. Amarel, "On the Automatic Formation of a Computer Program which Represents a Theory," in M. Yovits, G. T. Jacobi, and G. D. Goldstein (eds.), Self-Organizing Systems 1962, Spartan Books, pp. 107–175, 1962.
19. R. Solomonoff, "A formal theory of inductive inference," Inf. Contr. 7(2), 224–254 (1964).
20. A. W. Biermann and J. A. Feldman, "A Survey of Results in Grammatical Inference," in Y. H. Pao and G. W. Ernst (eds.), Context-Directed Pattern Recognition and Machine Intelligence Technologies for Information Processing, IEEE Computer Society Press, 1982, pp. 113–136.
21. D. Angluin, "On the complexity of minimum inference of regular sets," Inf. Contr. 39, 337–350 (1978).
22. L. Blum and M. Blum, "Toward a mathematical theory of inductive inference," Inf. Contr. 28, 125–155 (1975).
23. J. A. Feldman, J. Gips, J. J. Horning, and S. Reder, Grammatical Complexity and Inference, Technical Report CS-125, Computer Science Department, Stanford University, 1969.
24. T. J. Biggerstaff, C2: A Super Compiler Model of Automatic Programming, Ph.D. Dissertation, University of Washington, Seattle, 1976.
25. S. Hardy, "Synthesis of LISP functions from examples," Proc. of the Fourth Int. Joint Conf. Artif. Intell., pp. 240–245 (1975).
26. Y. Kodratoff and J.-P. Jouannaud, "Synthesizing LISP Programs Working on the List Level of Embedding," in A. Biermann, G. Guiho, and Y. Kodratoff (eds.), Automatic Program Construction Techniques, Macmillan, pp. 325–374, 1984.
27. D. Shaw, W. Swartout, and C. Green, "Inferring LISP programs from examples," Int. Joint Conf. Artif. Intell. 4, 260–267 (1975).
28. A. W. Biermann and D. R. Smith, "A production rule mechanism for generating LISP code," IEEE Trans. Syst. Man Cybern. SMC-9, 260–276 (1979).
29. C. Green, "The Design of the PSI Program Synthesis System," Proceedings of the Second International Conference on Software Engineering, San Francisco, pp. 4–19, 1976.
30. J. M. Ginsparg, Natural Language Processing in an Automatic Programming Domain, Report No. STAN-CS-78-671, Computer Science Department, Stanford University, 1978.
31. L. Steinberg, A Dialogue Moderator for Program Specification Dialogues in the PSI System, Ph.D. Thesis, Stanford University, 1980.
32. R. Gabriel, An Organization for Programs in Fluid Dynamics, Report No. STAN-CS-81-856, Computer Science Department, Stanford University, 1981.
33. J. V. Phillips, "Program inference from traces using multiple knowledge sources," Int. Joint Conf. Artif. Intell. 5, 812 (1977).
34. B. P. McCune, "The PSI program model builder: Synthesis of very high-level programs," SIGART Newsletter 64, 180–189 (1977).
35. D. R. Barstow, Knowledge-Based Program Construction, Elsevier North-Holland, Amsterdam, 1979.
36. E. Kant, "The selection of efficient implementations for a high level language," SIGART Newsletter 64, 140–146 (1977).
37. G. E. Heidorn, "English as a very high level language for simulation programming," SIGPLAN Notices 9, 91–100 (1974).
38. R. M. Balzer, N. Goldman, and D. Wile, "On the Transformational Implementation Approach to Programming," Proceedings of the Second International Conference on Software Engineering, pp. 337–344 (1976).
39. R. M. Balzer, N. Goldman, and D. Wile, "Informality in program specifications," IEEE Trans. Softwr. Eng. SE-4, 94–103 (1978).
40. W. A. Martin, M. J. Ginzberg, R. Krumland, B. Mark, M. Morgenstern, B. Niamir, and A. Sunguroff, Internal Memos, Automatic Programming Group, Massachusetts Institute of Technology, Cambridge, MA, 1974.
41. A. W. Biermann and B. W. Ballard, "Towards natural language programming," Am. J. Comput. Linguist. 6, 71–86 (1980).
42. A. W. Biermann, B. W. Ballard, and A. H. Sigmon, "An experimental study of natural language programming," Int. J. Man-Mach. Stud. 18, 71–87, 1983.
43. R. Geist, D. Kraines, and P. Fink, "Natural Language Computing in a Linear Algebra Course," Proceedings of the National Educational Computing Conference, 1982, pp. 203–209.
44. C. Rich and H. E. Shrobe, "Initial report on a LISP programmer's apprentice," IEEE Trans. Softwr. Eng. SE-4, 456–467 (1978).
45. R. Balzer, T. E. Cheatham, Jr., and C. Green, "Software technology in the 1990's: Using a new paradigm," Computer 16, 39–45 (November 1983).
46. C. Green, J. Phillips, S. Westfold, T. Pressburger, B. Kedzierski, S. Angebranndt, B. Mont-Reynaud, and S. Tappel, Research on Knowledge-Based Programming and Algorithm Design-1981, Technical Report KES.U.81.2, Kestrel Institute, Palo Alto, 1981.
47. C. Green and S. Westfold, "Knowledge-based programming self-applied," Mach. Intell. 10 (1981).
48. D. R. Smith, G. B. Kotik, and S. J. Westfold, "Research on knowledge-based software environments at Kestrel Institute," IEEE Trans. Softwr. Eng. SE-11(11), 1279–1291 (1985).
49. R. C. Waters, "The programmer's apprentice: Knowledge based program editing," IEEE Trans. Softwr. Eng. SE-8(1), 1–12 (1982).
50. A. Barr and E. A. Feigenbaum, The Handbook of Artificial Intelligence, Vol. 2, Kaufmann, Los Altos, CA, 1982.
51. A. W. Biermann, "Approaches to Automatic Programming," in M. Rubinoff and M. C. Yovits (eds.), Advances in Computers, Vol. 15, Academic Press, New York, pp. 1–69, 1976.
52. A. W. Biermann, G. Guiho, and Y. Kodratoff (eds.), Automatic Program Construction Techniques, Macmillan, 1984.
53. G. E. Heidorn, "Automatic programming through natural language dialogue: A survey," IBM J. Res. Develop., 302–313 (1976).
54. A. W. Biermann, "Formal methodologies in automatic programming: A tutorial," J. Symbol. Comput. 1, 119–142 (1985).

A. Biermann
Duke University

AUTOMATION, INDUSTRIAL

The term automation, as a combination of automatic and operation, was coined by Ford executive D. S. Harder in 1947. It connotes the use of machinery to augment or replace human endeavor. Although AI plays a very minor role in industrial automation today, within a decade it can be expected to become one of the drivers of industrial automation.

The development of industrial automation dates back several thousand years, but it accelerated during the Industrial Revolution. The steam engine provided a new technique for powering manufacturing tools, interchangeable parts gave a new methodology for designing products, and assembly lines presented a new approach to logistical control.

Up until about 1950 nearly all industrial automation systems involved fixed automation. Due to its inflexibility and high cost, such equipment could be justified only for high-volume products with unchanging designs. Since 1950 computers have facilitated the new technology of programmable automation. Even today, however, over 80% of all automation is fixed rather than programmable.

From a historical perspective, manufacturing has been labor intensive, it is now capital intensive, and it is becoming data intensive.

Objectives

Industrial automation addresses the processes by which products are designed, developed, and manufactured. The objectives are to improve efficiency, increase quality, and reduce the time to effect changes (see Computer-aided design; Computer-integrated manufacturing).

As a result of evolution, human hands are well adapted for holding branches, but they are poorly adapted for most factory tasks. A major focus of automation is, therefore, the reduction of direct labor in manufacturing.

Human minds are adept at learning new skills, but they are poor at remembering large amounts of data. In factories, therefore, such data is normally written down on paper. The volume of paper results in inefficiency, poor quality, and slow response. A secondary focus of industrial automation is therefore the elimination of paperwork. More generally, it is a restructuring of the indirect operations that support the manufacturing floor, including design, drafting, planning (qv), and control.
Social Impact of Automation

Although industrial automation has enormously increased the world's average standard of living, the social impact of automation is a controversial subject. When people are displaced by automation, it is no consolation to realize that they are a small dislocation in a globally good picture.

A historical example is the farming industry, which until the Middle Ages employed over 90% of the population. Today the figure is much smaller, even when supporting industries like farm machinery, pesticides, transportation, marketing, and so on are included. The social impact was limited by the fact that the transition occurred over a considerable number of years.

One potentially unique aspect of today's situation is the existence of world markets, which may be reaching the limits of growth. Another is the availability of powerful inexpensive computers, leading to speculation that the new automation may raise rather than lower required skill levels.

Over the next hundred years or more it is possible that industrial automation will cause the number of direct and indirect manufacturing jobs to decrease ultimately to a number near zero. Whether this outcome actually happens, whether it is desirable, and whether appropriate social policy can be formulated will remain controversial.

Taxonomy of Industrial Systems

Industrial systems range from continuous process to discrete parts, but they ordinarily involve a blend of both extremes. Within all industrial systems there are three major activities: the engineering design of the products and the manufacturing processes, the logistics operations to ensure that manufacturing operations proceed smoothly, and the manufacturing operations themselves.

Manufacturing Operations. Manufacturing operations can be recursively classified as make, test, move, or store.

Make. In make operations tools are used to fabricate parts that are then assembled into products or subproducts. Assembly is defined to be orienting and placing parts in proximity for subsequent fastening operations. Assembly tools range from simple mechanisms to multiaxis robots (see Robotics). In discrete manufacturing, make operations are dependent on the location, orientation, and shape of the workpiece. A common procedure, therefore, is to provide fixturing that assures workpiece placement. Alternatively, sensors can provide feedback to allow adaptive make operations.

Test. Tools are also used for test operations. Usually, testing is used to cull bad products. Increasingly, it is being used as a means of providing feedback to control or correct processes. In some cases test tools are components within make tools, providing sensory feedback to allow manufacturing processes to be controlled more effectively. Test tools are also used extensively for quality assurance to determine when in-process quality is outside of acceptable limits to the extent that intervention or correction is needed. Data collected from tests may be written down or automatically collected in a database. If tests uncover the existence of a defect, statistical analysis may determine the probable cause of the defect.

Move. Within plants small vehicles are frequently used to move parts. They may be human operated or they may automatically follow a desired path. In order to facilitate movement, objects are often placed on pallets. Each pallet may contain a single part, an array of parts, ordered parts in magazines, or disordered parts in tote boxes (see also Autonomous vehicles). When higher throughput is required, conveyor systems are used. When parts and materials are being moved, sensors on the conveyor can be used to detect and count the passage of objects, or coded patterns can be used to keep track of what is actually in transit and where it goes.

Store. Store operations are a means to smooth the flow of parts and material. Objects that are stored constitute either work in process or final inventory. Storage may take the form of a magazine of parts or a small buffer associated with an individual tool. At the other end of the scale, it may be an enormous stacker crane warehouse, covering more than 10,000 ft² (929 m²) to a height of 50 ft (15.2 m) and containing millions (10⁶) of items.

Engineering Design. In recent years there has been a rapid growth in the use of computer-aided design (qv) systems to acquire, manipulate, and maintain design data. Engineering design of a product plays an overwhelmingly important role in determining how that product is manufactured. Design information for a typical discrete object includes data on form, hierarchical composition, and process.

Form. Designers use graphics tools to input the geometric shape and finish of products, generally in terms of front, side, and top views, with dimensions and tolerances. Objects may then be visualized before they are built by means of drawings. Views may show wire frames, with or without hidden lines, or solid renderings with optional color and shading.

Computer-aided design systems have automated the creation of drawings. In general, however, any set of drawings is likely to have inconsistencies that require human interpretation for their resolution. To reduce the current ambiguities, computers will need to store object models that contain geometrically complete information on form, not just drawings. As the technology of object modeling progresses, computers can be used to model function, cost, and ease of manufacture.

Dimensionality makes electronic modeling generally much easier than mechanical modeling. The most elementary mechanical property, that two things cannot occupy the same space at the same time, is nonlinear and difficult to model. In the mechanical design domain structural deflection is simulated by subdividing an object into a mesh of small elements for which an appropriate differential equation can be solved iteratively. In the electronic design domain complex digital logic circuitry can be simulated from the model of the logic elements and interconnections.

When computers are used to automate electronic design, they incorporate design rules to assure that the object is buildable. Often, they can generate data automatically to control manufacturing tools that build the completed object. Generation of manufacturing instructions is referred to as CAD/CAM, meaning computer-aided design/computer-aided manufacturing. In computer-aided mechanical design, however, it is beyond the state of the art to build in many well-known design rules.
For example, for ease of assembly, parts should be symmetric or markedly asymmetric; shafts and holes should be chamfered; and parts should not interlock. Because CAD systems do not incorporate these rules, it is possible to design objects that are unnecessarily difficult or even impossible to manufacture.

The proliferation of low-cost plant floor computers will lead to an explosion of fully automated control systems that provide execution-time-adaptive behavior. Significant research is needed to determine how to exploit these execution time capabilities from computer-aided design systems.

Hierarchical Composition. Hierarchical composition is given by the bill of materials, a description of the "part-of" relationship. This information is often specified implicitly as annotation in drawings, but a more precise approach is to provide an explicit textual specification. Design for ease of manufacture generally favors objects whose bills of materials have as few parts and as few part types as possible.

Process. Routings describe process steps to be performed and associated costs at each node of the bill of materials. Routings are used for operational control, but they could also allow the simulation and analysis of logistical properties. High-technology products differ from ordinary consumer products because they depend on the design of new manufacturing processes. Even when existing processes are sufficient, the selection and sequencing of processes is an issue.

In discrete-part manufacturing there is some possibility of deriving routings automatically from object models. A heuristic method called group technology attempts to classify part shapes so that similar shapes can be used to imply similar routings. In continuous-process manufacturing the choice between alternative processes can sometimes be formulated as a linear programming problem, for which the precise mathematically optimal solution can be found.
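As a concrete illustration of process selection, the sketch below solves a tiny invented instance: two processes can each make the same part at different unit costs and limited capacities, and a demand must be met at minimum cost. This degenerate case can be solved greedily (fill the cheapest process first, which is optimal when the only constraints are independent capacities); realistic instances with shared resources would instead be handed to a linear programming solver, as the text suggests.

```python
# Process selection as cost minimization under capacity constraints.
# All data are invented. The greedy rule below is optimal only for
# this simple structure; general instances require an LP solver.

processes = [
    {"name": "casting",   "unit_cost": 2.0, "capacity": 600},
    {"name": "machining", "unit_cost": 5.0, "capacity": 1000},
]
demand = 900

def select_processes(processes, demand):
    """Assign quantities to processes, cheapest first, to meet demand."""
    plan, remaining = {}, demand
    for p in sorted(processes, key=lambda p: p["unit_cost"]):
        qty = min(p["capacity"], remaining)
        plan[p["name"]] = qty
        remaining -= qty
    if remaining > 0:
        raise ValueError("demand exceeds total capacity")
    return plan

plan = select_processes(processes, demand)
cost = sum(p["unit_cost"] * plan[p["name"]] for p in processes)
print(plan, cost)   # prints: {'casting': 600, 'machining': 300} 2700.0
```

In the full linear-programming formulation the quantities become decision variables, the capacities and the demand become constraints, and the objective is the same total-cost expression computed above.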
Logistics. Manufacturing logistics relates to the acquisition, storage, allocation, and transportation of manufacturing resources, including materials, parts, machines, and personnel. Logistics is important when manufacturing facilities are initially designed, as well as later, when existing facilities are operated. Logistical models allow the designer to evaluate trade-offs. For some variables, like quality and flexibility, the analysis may be subjective because there are no easy means to quantify the costs and benefits.

After manufacturing facilities are built, logistics is concerned with planning, tracking, and controlling the ongoing operation as well as providing methods of improving the actual performance. It encompasses administrative operations like order entry, purchasing, receiving, inventory management, shipping, and billing as well as planning operations that relate to long-range resources, final shipment, material requirements, and load balancing.

For efficient control of complex manufacturing operations, logistics is an essential function. It is possible, in principle, to automate both the logistics planning and execution, even for manufacturing operations that are otherwise unautomated. Timeliness and accuracy of data, however, are best assured when data distribution and collection are automatic.

Material requirements planning (MRP) is an algorithm that determines schedules for completing constituent parts of
a final product. For many products, the manufacturing process may need to begin years before the final product is to be shipped.

MRP is sensitive to routing times. These times, unfortunately, may be very inaccurate whenever machine setup times are long because there is no easy way to infer when setup is required and when it is not. The preferred solution is to design tools to minimize setup time.

Sometimes management attempts to protect against unforeseen contingencies by providing unnecessarily conservative timing information in the routings. In turn, MRP then computes unnecessarily early starts to the manufacturing activities, and work-in-process inventory abounds. Such logistical operations are referred to as push systems.

The alternative to push systems are pull systems, in which the start of any operation triggers the start of antecedent steps in the bill of materials. If these antecedent steps have appreciable time delays, the lack of work-in-process inventory results in immediate work stoppages. Thus, accurate routing time and production plan data are needed for smooth logistical operations, regardless of whether they are push or pull. Once the system is based on accurate planning, the distinction between push and pull becomes moot. In a well-run system each step is completed just in time to be used by the next step, and work-in-process inventory is minimized. Other names for such systems are just-in-time manufacturing and continuous-flow manufacturing.

MRP by itself fails to consider the utilization of machines and personnel. As a result, even if production plans are reasonable and routings contain correct timing data, MRP may yield unworkable solutions. To complement MRP, computer programs can compute how to balance loads at the level of machines, lines, and plants. By alternating line-balancing computations with MRP computations, reasonably good overall solutions can be found.
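The backward-scheduling step at the heart of MRP can be sketched in a few lines: given a bill of materials and per-part routing times, compute the latest start time for each part by working backward from the final product's due date. The bill of materials, routing times, and time units below are invented for illustration.

```python
# Minimal MRP backward scheduling: a part must start routing[part]
# time units before its due date, and its constituents must be
# finished by the time it starts. All data are invented.

bill_of_materials = {            # part -> constituent parts
    "engine": ["block", "piston"],
    "block": [],
    "piston": [],
}
routing_weeks = {"engine": 4, "block": 6, "piston": 2}

def latest_starts(part, due, bom, routing, schedule=None):
    """Compute the latest start week for each part, working backward."""
    if schedule is None:
        schedule = {}
    start = due - routing[part]
    # a part shared by several assemblies keeps its earliest deadline
    schedule[part] = min(start, schedule.get(part, start))
    for child in bom[part]:
        latest_starts(child, start, bom, routing, schedule)
    return schedule

print(latest_starts("engine", due=52, bom=bill_of_materials, routing=routing_weeks))
# prints: {'engine': 48, 'block': 42, 'piston': 46}
```

This also makes the text's point about conservative routing data visible: inflating any routing time pushes every start in that subtree earlier, and the material started early sits as work-in-process inventory.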
Plant floor monitoring and control systems allow the collection of data from manufacturing tools, conveyors, and personnel to detect stoppages and provide means for analyzing performance. But even with instantaneous data on machine availability, existing programs do not generally provide the rapid response time needed to manage logistics efficiently in an environment of uncertainty and rapid change.

Integration

Industrial systems that link the engineering design and manufacturing functions are referred to as vertically integrated. Those that couple logistics and manufacturing are called horizontally integrated. The acronym CIM, for computer-integrated manufacturing (qv), refers to idealized industrial systems in which all three functions cooperate smoothly. One example would be flexible machining systems, which have automatically guided vehicles delivering parts between numerically controlled machine tools in darkened unmanned factories. As new parts are designed, machining instructions, bills of materials, and routings are transferred to logistical software that controls the plant floor. Software systems manage plant floor communications and databases for design and logistics. The design database allows several designers to work concurrently, and it provides a formal process by which completed designs get released to manufacturing. Another example would be fast turnaround lines for interconnecting logic elements on gate array semiconductors. After designers specify the interconnection patterns, silicon wafers are moved automatically through lengthy sequences of lithographic and chemical operations, with each wafer taking a unique route. By definition, all manufacturing is integrated, but in only a small fraction of industrial systems is this integration highly efficient.

Drivers of Automation

Industrial automation is currently undergoing rapid growth and change throughout the world, stimulated by international competition, which motivates companies of every nation to increase their efficiency, quality, and flexibility. A major driver is the proliferation of low-cost computing hardware. A decade ago the cost of a computer needed to control a manufacturing tool might have been more than the tool itself. Today, nearly every new tool costing more than $10,000 probably contains a computer. Also, industrial automation is being advanced by software technology. Most of this technology has been in the mainstream of computer science: algorithms, languages, operating systems, databases, and data communications. The latest addition to this repertory is AI and, more specifically, expert systems (qv). Although AI has not yet had a major impact in industrial automation, it will probably become a driver within the next decade.
Role of Artificial Intelligence

Although industrial automation offers several unique and fertile areas for AI research, advances in AI are likely to continue to be motivated more by the requirements of highly unstructured environments, such as the military, the office, the home, and the laboratory.

Within industrial automation, adaptive tools in general and industrial robotics in particular have been and will continue to be a major stimulation to AI research. Among the additional current problems within industrial automation that have AI potential are real-time logistics, quality analysis, process planning, design for ease of manufacture, and determining the financial value of flexibility and quality. Most of these problems are characterized by people muddling through somehow, without understanding or a good algorithm to guide them. Since computers can communicate much more rapidly and precisely than people, AI should make it possible for computers to muddle through at least as well as people do. Expert systems, in particular, seem to thrive in situations for which there is no alternative prospect of developing the analog of Newton's laws.

Real-Time Logistics. From a logistical viewpoint, the plant floor can be modeled as a graph whose arcs represent the flow of data and material and whose nodes represent decision and manufacturing processes. At each node there is a set of somewhat ambiguous objectives and a menu of possible actions with associated probabilities of achieving each objective. At each node involuntary changes of state are created by incoming data and parts, by chance, and by the passage of time itself. The incoming data may be purely informative, or it may contain action requests that need to be prioritized. In manufacturing, the objectives and actions are sufficiently constrained that there is a reasonable prospect for AI to be practical. It appears that the problem could be mapped into one large expert system or an array of many small expert systems. Such an AI system might offer a heuristic approximation to MRP and load balancing, but with a much faster turnaround time. It might be able to cope with incomplete, inaccurate, and volatile data, making fast decisions to act, to delegate, or to deny requests. Additionally, it might automatically derive subordinate objectives from higher level ones.

Quality Analysis. Testing and customer feedback provide the basic inputs for quality analysis, which looks for meaningful patterns in voluminous data that are frequently irrelevant and obsolete. The defects being sought may be masked by purely random events, they may be intermittent, or they may depend in a nonlinear fashion on a coincidental combination of many independent systematic factors. The difficulties are compounded by bad testers and inaccurate field reports. The similarity of this problem to that of diagnosing illness in humans (see Medical advice systems) suggests that an AI expert system might be able to outperform the quality experts.

Process Planning. Procedures used to construct object models may be different from the processes used to construct the objects themselves. As a result, there may be features in the constructed model that are not identified but are nevertheless essential to process planning. For example, an object that is almost cubical with a groove machined away may have been represented by the union of three cuboids. If an AI expert system were built to do process planning, the hardest problem would be recognizing features in an object model. The types of features would include flats, grooves, holes, pockets, rounded edges, similar subparts, and so on. Such recognition would require procedures for the approximation of shape, recognition of the similarity or identity of two designs, and recognition of symmetry in a design. The problem is difficult because object models can be very large, a given feature's pattern generally does not occupy contiguous storage, and no algebra of features or canonical decomposition has been invented (see Image understanding). Additionally, the AI system would have to represent methods of inferring routings from manufacturable features and rules for choosing among alternative routings based on available processes.

Design for Ease of Manufacture. Design for ease of manufacture has all the object model feature recognition problems of process planning plus the harder problem of representing design intent and methods of hypothesizing alternative designs to meet this intent. Additionally, there would be expert design rules like "chamfer all holes and shafts," but this expert system portion is a trivial piece of the overall problem.

Financial Value of Flexibility and Quality. When automation systems are proposed, costs and benefits affect the design trade-offs and the subsequent financial analyses that determine whether the systems are justified. Perhaps AI expert systems can provide a means of estimating the value of flexibility and quality, which frequently overshadows the objective costs and benefits. For example, typical inflexible electronic assembly lines
cause work in process to spend less than 1% of the time in "value add" make and test operations. Having more part feeders on each tool would reduce the number of times a card would need to pass through, reduce the need to move and store cards, reduce the frequency with which tools must undergo setup, and vastly improve the overall line throughput. Similarly, lack of quality can result in tangible costs in terms of scrap and rework within the plant and field returns from distribution centers and customers. More insidious intangible costs are the consequential damages that customers may suffer or the loss of company reputation that can adversely affect sales for years to come.

Adaptive Tools. One area of research since the earliest days of the field of AI has been hand-eye robotics. The motivation was to create highly adaptive robotic systems that emulate the dexterity of animate motion and sensing systems (see Autonomous vehicles; Robotics). Although it is not necessary to have intelligent, dextrous, humanoid robots because factories are sufficiently constrained, there is some benefit to be gained by providing modest levels of adaptive behavior in a broad range of make and test tools. Software can substitute for hardware precision, and it can make decisions that reduce the need for operator intervention. The mainstream of current industrial robot research is aimed at making robots that are faster, more precise, cheaper, and easier to program. Of these topics, only ease of programming appears to be appropriate for AI. Two promising approaches are teaching by showing and object modeling, both of which are relatively simple for nonadaptive tools. Conversely, in adaptive teaching by showing the system must infer an adaptive strategy from one example of the desired behavior.
The use of object modeling to simulate adaptive tool programs is fairly easy if the tool reads its sensors less often than about once a second because the user can be asked to provide simulated sensory input (see Sensors). If the feedback actually occurs at a much higher rate, the model must provide an autonomous means of simulating the sensors. It is reasonable to expect comprehensive solutions by the end of this decade.

An entirely different application of object modeling is the generation of adaptive robot programs automatically from higher level task descriptions. This problem has been a major focus of AI hand-eye research over the course of the past 20 years, but the limited scope of success has mainly served to clarify the intrinsic technical difficulties.

AI researchers have also worked on robotic sensing. The emphasis has been on emulating human sensory capabilities, especially taction and vision (qv). Contact-sensing microswitches in a gripper's fingers allow a robot to do a centering grasp. Strain gauges permit a raw egg to be grasped. Contact image sensing allows part identification. Current approaches to contact image sensing include miniature contact-sensing arrays on silicon and artificial skin made from conductive polymers (see Multisensor integration). Vision includes one-dimensional sensors that detect when a light beam is interrupted, two-dimensional imaging sensors, and three-dimensional ranging devices. With a one-dimensional light sensor between a robot's fingers, the robot can calibrate itself to fiducial posts in the workplace. Imaging and ranging can be used to inspect, determine shape, measure, determine location and orientation, and identify workpieces.
Actually, researchers who restrict their attention to a subset of the five human senses are anthropomorphic chauvinists. In a factory every test tool, instrument, and transducer is a sensor. Factory sensors measure temperature, current, color, chemical composition, vibration, and hundreds of other quantities that are outside the range of direct human sensation. Similarly, there is much more to adaptive tools than just robotics.

General References

M. P. Groover, Automation, Production Systems, and Computer-Aided Manufacturing, Prentice-Hall, Englewood Cliffs, NJ, 1980.

Computerized Manufacturing Automation: Employment, Education, and the Workplace, U.S. Congress, Office of Technology Assessment, Washington, DC, Report OTA-CIT-235, April 1984.

D. F. Noble, Forces of Production: A Social History of Industrial Automation, Knopf, New York, 1984.

D. Grossman
IBM Corporation
AUTONOMOUS VEHICLES

Simply defined, an autonomous vehicle must travel from one specified location to another with no external assistance. This definition encompasses all vehicles from unmanned vehicles without data links to remotely piloted vehicles with high bandwidth data links for real-time control. So broadly defined, autonomous vehicles for simple or well-structured environments are commonplace in military applications [e.g., some missiles and torpedoes, advanced remotely piloted vehicles (RPVs)], in industry [e.g., automatic guided vehicles (AGVs)], and in space exploration (e.g., Voyager, Viking). Automatic control technology alone is sufficient to meaningfully coordinate sensor and actuator resources for nearly all of these vehicles. However, automatic control becomes inadequate for uncertain, unknown, complex, and dynamic environments, where the most interesting applications for autonomous vehicles exist.

Many autonomous vehicles have been developed for simple environments. Only a few efforts approach relatively complex environments, and only a notable subset of those is discussed here. More information about past autonomous vehicle efforts is provided in other sources (1,2) (see also Manipulators; Multisensor integration; Robotics; Robots, mobile).

SHAKY was developed in the late 1960s as a research tool for problem solving and learning research (3). SHAKY could accept incomplete task statements, represent and plan paths through space occupied by known and unknown obstacles, and collect information through visual and touch sensors. JASON was among the first mobile robots to use acoustic and infrared (ir) proximity sensors for path planning and obstacle avoidance as well as having a considerable proportion of its computation done onboard (4). The Jet Propulsion Laboratory (JPL) Rover was intended as the prototype for a mobile planetary exploration robot and was designed to deal with an unknown environment and uneven terrain populated by obstacles (5).
HILARE was the first mobile robot to actually build a map of unknown space using acoustic and visual sensors, represent map information as a graph partitioned into a hierarchy of places, construct approximate three-dimensional representations with information from two-dimensional optical vision and a laser range finder, and integrate information from a variety of sensors to make vehicle position estimates (6). The Stanford University (SU) Cart was developed to explore stereo vision navigation and guidance for a mobile robot. It could travel over completely unknown flat territory while avoiding obstacles and has been tried outdoors with man-made obstacles with limited success (7). Of all these vehicles only HILARE remains an active research effort, although the SU Cart experiments are used in other vehicles at Carnegie-Mellon University (CMU) (8). Nevertheless, participation at a recent autonomous ground vehicles workshop has indicated a rapidly growing interest in the field (9).

In spite of the diversity of possible configurations, all autonomous vehicles must perform certain common functions to be capable of autonomous mobility. For simple vehicles only vehicle control and position location functions are required. An autonomous vehicle must control its transport mechanism and internal environment to reach the goal, and it must know its location in some absolute reference frame, at least, to determine when it has reached the goal. All past implementations have employed this minimal functional set.
If the traversed environment is insufficiently known, an autonomous vehicle must perceive the environment through sensors (qv) for various purposes: if the environment contains localized obstacles, the vehicle must perceive and avoid them; if potential vehicle paths to the goal location are constrained by known or perceivable large-scale features and the time that the vehicle has to reach the goal is finite, the vehicle must plan its route using information provided by an existing map and/or by the perception system; and if the environment is unknown and the vehicle must store environmental characteristics during its transit for later use (i.e., make a map), the system must learn from its sensor perceptions. Perception, vehicle control, position location, obstacle avoidance (qv), route planning, and learning (qv) are the generic functions necessary for any level of autonomous mobility.
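The generic function set can be pictured as a sense-plan-act loop. The grid world, class names, and interfaces below are hypothetical illustrations for exposition only, not drawn from any of the vehicles surveyed; the sketch assumes the goal is reachable.

```python
from collections import deque

# Illustrative sketch of the generic autonomy functions: perception,
# position location, route planning, obstacle avoidance, vehicle
# control, and learning (map building). All names are hypothetical.

def neighbors(cell):
    x, y = cell
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def plan_route(start, goal, known_obstacles):
    """Route planning: breadth-first search around the known map."""
    frontier, parent = deque([start]), {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell != start:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for nxt in neighbors(cell):
            if nxt not in parent and nxt not in known_obstacles:
                parent[nxt] = cell
                frontier.append(nxt)

class Vehicle:
    def __init__(self, start, true_obstacles):
        self._pos, self._world = start, true_obstacles
    def locate(self):                 # position location
        return self._pos
    def perceive(self):               # perception: sense adjacent cells
        return {c for c in neighbors(self._pos) if c in self._world}
    def move(self, cell):             # vehicle control
        assert cell not in self._world, "collision"
        self._pos = cell

def navigate(vehicle, goal):
    known = set()                     # learning: accumulate a map
    while vehicle.locate() != goal:
        known |= vehicle.perceive()   # update the map from sensors
        # obstacle avoidance falls out of replanning around the map:
        path = plan_route(vehicle.locate(), goal, known)
        vehicle.move(path[0])
    return known

v = Vehicle((0, 0), true_obstacles={(1, 0), (0, 1)})
navigate(v, (2, 0))
print(v.locate())  # → (2, 0)
```

Replanning after every sensor update is the simplest way to fold obstacle avoidance into route planning; real vehicles trade this off against the rapid response times the article discusses.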
Perception

Perception subsystems in autonomous vehicles are used primarily for path detection (3-5,10), position location (10,11), and mapping (5,10). Path detection includes detection of obstacles and roadways. Perceptual position locating can be accomplished by map matching and landmark recognition. Mapping activities build and improve the vehicle's assessment of the environment.

Obstacle Detection. Obstacles can be detected and located with direct-ranging sensors (e.g., acoustic ranging sensors) or with a variety of vision techniques (e.g., simple two-dimensional vision, stereo vision, motion stereo, and optical flow). Acoustic ranging sensors can detect and locate both obstacles and free space. In one technique raw sensor returns are thresholded and clustered; then probability functions of range and azimuth are assigned to each filtered sensor reading. Maps are generated by superpositioning the sensor-reading probability distributions onto the floor plane (10).

SHAKY located free space and obstacles on a flat floor with a single camera's image. The raw image was first reduced to a line representation using a gradient operator, and then object-finding and boundary operations were applied to the connected edge map. A decision tree guided the image search for obstacles (3). HILARE uses a two-dimensional camera image together with a laser range finder to develop three-dimensional world representations. An adjacency matrix that represents each region in the image is constructed by following edges detected by nearest-neighbor analysis. The matrix is pruned using region dimensions and inclusion- and object-contrast constraints; then a computer-controlled laser range finder obtains the range information for each region in the scene (12).

As an example of stereo vision (qv), the SU Cart took nine pictures at different positions and used an interest operator on one of them to identify features for tracking. A correlator looked for those features in the remaining images. Features were stored as several different-sized windows, and the correlator used a coarse-to-fine strategy to match the features. A camera solver took the information from the correlator and computed the relative feature positions. The camera solver superpositioned the normal error curves of the feature position estimates from each image and chose the peak value as the feature position. Features that were not reacquired after several successive frames were forgotten, and new features were added to the feature list using the interest operator. Objects were modeled as clouds of features approximated by spheres. This system did not see bland objects, and the long processing time caused it to become confused by moving shadows of outdoor situations (7).

Recent work has extended the SU Cart work. This work, embodied in a system called FIDO, uses imaging and motion geometry constraints to reduce the correlator search window and to improve the accuracy of the vision. Imaging geometry constraints include near and far limits and epipolar constraints. Motion constraints use estimated vehicle motion to limit the search area and to gauge the reasonableness of a stereo match. FIDO reduces computational complexity by restricting vehicle motion to a plane (8). Experience has provided the following observations: epipolar constraints are the single most powerful constraints, more features improve vision accuracy, and geometric constraints tend to limit the search area too much (8).

Optical flow (qv) analysis can also locate the obstacles near a vehicle. One technique assumes that the scene contains visible vertical edges and that the floor is almost flat. Information from a camera tilt sensor constrains the search for the vanishing point in an image. The exact camera tilt angles are computed from the vanishing point location. Knowing the camera angles reduces the optical flow equations to just the translational components. The optical flow equations are used to track features found in the neighborhood of vertical lines using an interest operator through successive images (13).

Road Detection. Road detection is an alternative to obstacle detection if roads are available. In one technique the edges of an image are detected with a model-directed gradient operator, and the edge map is corrected using a camera model and assuming a flat world. Roads are detected by rotating the edge-filtered image 45° and applying a Hough transform (qv) to detect path edges. This technique works well when the vehicle is close to the road center and degrades near the edges (14). In another technique visual road detection is performed in two phases, bootstrap and feed forward. The bootstrap phase operates in situations when no prior scene information is known. Dominant linear features are extracted by region growing (qv) using edge-preserving smoothing filters. The resulting features are consistently labeled by geometric and rule-based reasoning modules (see Rule-based systems). The feed-forward phase uses information from previous imagery to constrain the image search to a small region of the total image. Accurate predictions significantly reduce the window size. Substantial processing savings are available if the absolute camera orientation is known.

P(H₁|e¹, . . . , eᴺ) → 1.

Example 3. Assume that the system contains two detectors having identical characteristics, given by the matrix above. Further, let the prior probabilities for the hypotheses in Example 2 be represented by the vector P(H) = (0.099, 0.009, 0.001, 0.891), and assume that detector 1 was heard to issue a high sound while detector 2 remained silent. From Eq. 22 one has

λ¹ = (0.1, 0.44, 0.4, 0)
λ² = (0.5, 0.06, 0.5, 1)
Λ = λ¹λ² = (0.05, 0.0264, 0.2, 0)

P(Hᵢ|e¹, e²) = α(4.95, 0.238, 0.2, 0) × 10⁻³ = (0.919, 0.0439, 0.0375, 0)

Thus, the chance of attempted burglary (H₂ or H₃) is 0.0439 + 0.0375 = 8.14%.

The updating of belief need not wait, of course, until all the evidence is collected but can be carried out incrementally. For example, if one first observes e¹ = high sound, the belief in H calculates to

P(Hᵢ|e¹) = α(0.0099, 0.00396, 0.0004, 0) = (0.694, 0.277, 0.028, 0)

This now serves as a prior belief with respect to the next datum, and after observing e² = no sound, it updates to

P(Hᵢ|e¹, e²) = α′λᵢ²P(Hᵢ|e¹) = α′(0.347, 0.0166, 0.014, 0) = (0.919, 0.0439, 0.0375, 0)

as before. Thus, the quiescent state of detector 2 lowers the chances of an attempted burglary from 30.5 to 8.14%.

o(H|W) = L(W|H)o(H)   (25)

Unfortunately, the task of estimating L(W|H) will not be as easy as that of estimating L(S|H) because the former requires the mental tracing of a two-step process, as shown in Figure 1. Moreover, even if L(W|H) could be obtained, one would not be able to combine it with other possible testimonies, say Mrs. Gibbons's (G), by a simple process of multiplication (Eq. 23) because those testimonies will no longer be conditionally independent with respect to H. What Mrs. Gibbons is about to say depends only on whether an alarm sound can be heard in the neighborhood, not on whether a burglary actually took place. Thus, it will be wrong to assume P(G|burglary, W) = P(G|burglary) because the joint event of a burglary together with Watson's testimony constitutes a stronger evidence for the occurrence of the alarm sound than the burglary alone. Given the level of detail used in the story, it is more reasonable to assume that the testimonies W and G and the hypothesis H are independent of each other once one knows whether the alarm sensor was actually triggered. In other words, each testimony depends directly on the alarm system (S) and is only indirectly influenced by the possible occurrence of a burglary (H) or by the other testimony (see Fig. 1). These considerations can be easily incorporated into Bayesian formalism; using Eq. 3, Eq. 19 is simply conditioned and summed on all possible states of the intermediate variable S:

P(Hᵢ|G, W) = αP(G, W|Hᵢ)P(Hᵢ) = αP(Hᵢ) Σⱼ P(G, W|Hᵢ, Sⱼ)P(Sⱼ|Hᵢ)   (26)
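Returning to Example 3, its vector arithmetic is mechanical and easy to verify; a minimal sketch using the example's own numbers:

```python
# Bayes updating with conditionally independent evidence:
# posterior ∝ prior × λ¹ × λ², for the numbers of Example 3.
prior = [0.099, 0.009, 0.001, 0.891]
lam1 = [0.1, 0.44, 0.4, 0]    # detector 1 issued a high sound
lam2 = [0.5, 0.06, 0.5, 1]    # detector 2 remained silent

def update(belief, likelihood):
    joint = [b * l for b, l in zip(belief, likelihood)]
    alpha = 1 / sum(joint)              # normalizing constant
    return [alpha * j for j in joint]

# Incremental updating: fold in one datum at a time, as in the text.
batch = update(update(prior, lam1), lam2)
print([round(p, 3) for p in batch])     # → [0.919, 0.044, 0.037, 0.0]
```

The chance of attempted burglary, the sum of the middle two components, comes out near 8.1%, and folding the evidence in incrementally gives the same posterior as multiplying both likelihood vectors at once, which is the point of the incremental-updating remark above.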
Figure 1. A diagram illustrating cascaded inference through an intermediate variable S: the burglary hypothesis influences Watson's and Gibbons's testimonies only through S.
BAYESIAN DECISION METHODS
where Sⱼ (j = 1, 2) stands for the two possible activation states of the alarm system, namely, S₁ = alarm triggered and S₂ = alarm not triggered. Moreover, the conditional independence of G, W, and Hᵢ with respect to the mediating variable S yields

P(G, W|Hᵢ, Sⱼ) = P(G|Sⱼ)P(W|Sⱼ)   (27)

and Eq. 26 becomes

P(Hᵢ|G, W) = αP(Hᵢ) Σⱼ P(G|Sⱼ)P(W|Sⱼ)P(Sⱼ|Hᵢ)   (28)

The computation in Eq. 28 can be interpreted as a three-stage process: first, the local likelihood vectors P(G|Sⱼ) and P(W|Sⱼ) are multiplied together, componentwise, to obtain the likelihood vector Λⱼ(S) = P(e|Sⱼ), where e stands for the total evidence collected, G and W. Second, the vector P(e|Sⱼ) is multiplied by the link matrix Mᵢⱼ = P(Sⱼ|Hᵢ) to form the likelihood vector of the top hypothesis, Λᵢ(H) = P(e|Hᵢ). Finally, using the product rule of Eq. 5 (see also Eq. 19 or 24), Λᵢ(H) is multiplied by the prior P(Hᵢ) to give the overall belief in Hᵢ.

This process demonstrates the psychological and computational role of the mediating variable S. It permits one to use local chunks of information taken from diverse domains [e.g., P(Hᵢ), P(G|Sⱼ), P(W|Sⱼ), and P(Sⱼ|Hᵢ)] and fit them together to form a global, cross-domain inference P(Hᵢ|e) in stages, using simple and local vector operations. It is this role that prompted some philosophers to posit that conditional independence is not an accident of nature for which one must passively wait but rather a psychological necessity that one actively dictates, as the need develops, by, for example, coining names for new, hypothetical variables. In medical diagnosis, for instance, when some symptoms directly influence each other, the medical profession invents a name for that interaction (e.g., complication, pathological state, etc.) and treats it as a new auxiliary variable that induces conditional independence; knowing the exact state of the auxiliary variable renders the interacting symptoms independent of each other.

Virtual (Intangible) Evidence

Imagine the following development in the story of Mr. Holmes:

Example 5. When Mr. Holmes calls Mrs. Gibbons, he soon realizes that she is somewhat tipsy. Instead of answering his question directly, she goes on and on describing her latest operation and how terribly noisy and crime ridden the neighborhood has become. When he finally hangs up, all Mr. Holmes can make out of the conversation is that there probably is an 80% chance that Mrs. Gibbons did hear an alarm sound from her window.

The Holmes-Gibbons conversation is the kind of evidence that is hard to fit into any formalism. If one tries to estimate the probability P(e|alarm sound), one would get ridiculous numbers because it would entail anticipating, describing, and assigning probabilities to all possible courses Mrs. Gibbons's conversation might have taken under the circumstances. These difficulties arise whenever the task of gathering evidence is delegated to autonomous interpreters who, for various reasons, cannot explicate their interpretive process in full detail but, nevertheless, often produce informative conclusions that summarize the evidence observed. In this case Mr. Holmes's conclusion is that, on the basis of his judgmental interpretation of Gibbons's testimony (alone!), the hypothesis alarm sound should be accorded a confidence measure of 80%. The task is to integrate this probabilistic judgment into the body of hard evidence previously collected.

In Bayesian formalism the integration of virtual evidence is straightforward. Although the evidence e cannot be articulated in full detail, one interprets the probabilistic conclusion as conveying likelihood ratio information. In the story, for example, identifying e with G = Gibbons's testimony, Mr. Holmes's summary of attributing 80% credibility to the alarm sound event will be interpreted as the statement P(G|alarm sound):P(G|no alarm sound) = 4:1. More generally, if the variable upon which the tacit evidence e impinges most directly has several possible states S₁, S₂, . . . , Sⱼ, . . . , the interpreter would be instructed to estimate the relative magnitudes of the terms P(e|Sⱼ) [e.g., by eliciting estimates of the ratios P(e|Sⱼ):P(e|S₁)], and since the absolute magnitudes do not affect the calculations, one can proceed to update beliefs as if this likelihood vector originated from an ordinary, logically crisp event e. For example, assuming that Mr. Watson's phone call already contributed a likelihood ratio of 9:1 in favor of the hypothesis alarm sound, the combined weight of Watson's and Gibbons's testimonies would yield a likelihood vector Λⱼ(S) = P(W, G|Sⱼ) = (36, 1).

This vector can be integrated into the computation of Eq. 28, and using the numbers given in Example 1, one gets

Λᵢ(H) = Σⱼ Λⱼ(S)P(Sⱼ|Hᵢ) = [0.95 0.05; 0.01 0.99](36, 1)ᵀ = (34.25, 1.35)   (29)

P(Hᵢ|G, W) = αΛᵢ(H)P(Hᵢ) = α(34.25, 1.35) · (10⁻⁴, 1 − 10⁻⁴) = (0.00253, 0.99747)   (30)

Note that it is important to verify that Mr. Holmes's 80% summarization is indeed based only on Mrs. Gibbons's testimony and does not include prejudicial beliefs borrowed from previous evidence (e.g., Watson's testimony or crime rate information); otherwise one is in danger of counting the same information twice. The likelihood ratio is, indeed, unaffected by such information. Bayesian practitioners claim that people are capable of retracing the origins of their beliefs and of answering hypothetical questions such as "What if you didn't receive Watson's call?" or "Estimate the increase in belief due to Gibbons's testimony alone." An effective way of eliciting pure likelihood ratio estimates unaffected by previous information would be to first let one imagine that, prior to obtaining the evidence, one is in the standard state of total ignorance and then estimate the final degree of belief given to a proposition as a result of observing the evidence. In this example, if prior to conversing with Mrs. Gibbons Mr. Holmes had a "neutral" belief in S, that is, P(alarm) = P(no alarm) = 1/2, the postconversation estimate P(alarm|G) = 80% would indeed correspond to a likelihood ratio of 4:1 in favor of alarm.
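The three-stage computation of Eq. 28 can be checked mechanically with the numbers of Eqs. 29 and 30; the link-matrix entries below are the ones implied by that arithmetic (Example 1 itself is not reproduced in this excerpt, so treat them as recovered values):

```python
# Cascaded inference through the mediating variable S:
# Λ(H) = M · Λ(S), then posterior ∝ Λ(H) × prior.
lam_S = [36, 1]                 # combined Watson + Gibbons evidence on S
M = [[0.95, 0.05],              # P(S_j | burglary)      (recovered values)
     [0.01, 0.99]]              # P(S_j | no burglary)
prior = [1e-4, 1 - 1e-4]        # P(burglary), P(no burglary)

lam_H = [sum(m * l for m, l in zip(row, lam_S)) for row in M]
joint = [l * p for l, p in zip(lam_H, prior)]
alpha = 1 / sum(joint)
posterior = [alpha * j for j in joint]
print([round(x, 2) for x in lam_H])       # → [34.25, 1.35]
print([round(p, 5) for p in posterior])   # → [0.00253, 0.99747]
```

Only local quantities appear at each stage — the likelihood on S, the link matrix, and the prior on H — which is the modularity the surrounding text attributes to the mediating variable.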
Predicting Future Events

One of the attractive features of causal models in the Bayesian formulation is the ease they lend to the prediction of yet-unobserved events such as the possible denouements of social episodes, outcomes of a given test, prognoses of a given disease, and so on. The need to facilitate such predictive tasks may, in fact, be the very reason that human beings have adopted causal schema for encoding experiential knowledge.

Example 6. Immediately after his conversation with Mrs. Gibbons, as Mr. Holmes is preparing to leave his office, he recalls that his daughter is due to arrive home any minute and, if confronted by an alarm sound, would probably (0.7) phone him for instructions. Now he wonders whether he shouldn't wait a few more minutes in case she calls. To estimate the likelihood of the new target event D = daughter will call, one has to add a new causal link to the graph of Figure 1. Assuming that hearing an alarm sound is the only event that would induce the daughter to call, the new link should emanate from the variable S and be quantified by the following P(D|S) matrix:
            D (will call)   -D (will not call)
S = on          0.7               0.3
S = off          0                 1
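The quantities of this running example can be checked numerically. The sketch below (plain Python; the link matrix P(S|H), the burglary prior, the combined likelihood vector (36, 1), and the P(D|S) matrix just given are all taken from the surrounding text) reproduces the prior of Eq. 34, the posterior of Eq. 35, and the prediction of Eq. 36:

```python
# Numeric check of Eqs. 33-36 for the Holmes example (values from the text).
# P(S|H): rows H = (burglary, no burglary), columns S = (alarm on, alarm off).
P_S_given_H = [[0.95, 0.05],
               [0.01, 0.99]]
P_H = [1e-4, 1 - 1e-4]   # prior probability of burglary
lam = [36.0, 1.0]        # likelihood vector lambda(S) for the combined evidence e

# Eq. 34: prior P(S) by matrix multiplication.
P_S = [sum(P_H[i] * P_S_given_H[i][j] for i in range(2)) for j in range(2)]

# Eqs. 33 and 35: P(S|e) = alpha * lambda(S) * P(S), with alpha normalizing.
unnorm = [lam[j] * P_S[j] for j in range(2)]
alpha = 1.0 / sum(unnorm)
P_S_given_e = [alpha * u for u in unnorm]

# Eq. 36: P(D|e) = sum_j P(D|S_j) P(S_j|e), with P(D|on) = 0.7, P(D|off) = 0.
P_D_given_S = [0.7, 0.0]
P_D = sum(P_D_given_S[j] * P_S_given_e[j] for j in range(2))

print(P_S)          # close to (0.0101, 0.9899) of Eq. 34
print(P_S_given_e)  # close to (0.2686, 0.7314) of Eq. 35, up to rounding
print(P_D)          # close to 0.188 of Eq. 36
```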
Accordingly, P(D|all evidence) is given by

P(D|e) = Σ_j P(D|S_j) P(S_j|e)

which means that the lengthy episodes with Dr. Watson and Mrs. Gibbons impart their influence on D only via the belief they induced on S, P(S_j|e). It is instructive to see now how P(S_j|e) can be obtained from the previous calculation of P(H_i|e). A natural temptation would be to use the updated belief P(H_i|e) and the link matrix P(S_j|H_i) and, through rote, write the conditioning equation

P(S_j|e) = Σ_i P(S_j|H_i) P(H_i|e)   (32)

This equation, also known as Jeffrey's rule of updating (1), is, however, only valid in a very special set of circumstances. It will be wrong in the example because the changes in the belief of H actually originated from the corresponding changes in S; reflecting these back to S would amount to counting the same evidence twice. Formally, this objection is reflected by the inequality P(S_j|H_i) ≠ P(S_j|H_i, e), stating that the evidence obtained affects not only the belief in H and S but also the strength of the causal link between H and S. On the surface, this realization may seem detrimental to the usefulness of Bayesian methods in handling a large number of facts; having to calculate all links' parameters each time a new piece of evidence arrives would be an insurmountable computational burden. Fortunately, there is a simple way of updating beliefs that circumvents this difficulty and uses only the original link matrices (2). The calculation of P(S_j|e), for instance, can be performed as follows. Treating S as an intermediate hypothesis, Eq. 5 dictates

P(S_j|e) = α P(e|S_j) P(S_j)   (33)

The term P(e|S_j) is the likelihood vector λ_j(S), which was calculated earlier to be (36, 1), and the prior P(S_j) is given by the matrix multiplication

P(S_j) = Σ_i P(S_j|H_i) P(H_i) = (10^-4, 1 - 10^-4) (0.95  0.05; 0.01  0.99) = (0.0101, 0.9899)   (34)

Thus, together, one has

P(S_j|e) = α (36, 1)(0.0101, 0.9899) = (0.2686, 0.7314)   (35)

which gives the event s1 = alarm-sound-on a credibility of 26.86% and predicts that the event D = daughter-will-call will occur with the probability

P(D|e) = (0.2686, 0.7314) · (0.7, 0) = 0.188   (36)

Multiple Causes and "Explaining Away"

Tree structures like the one used in the preceding section require that only one variable be considered a cause of any other variable. This structure simplifies computations, but its representational power is rather limited because it forces one to group together all causal factors sharing a common consequence into a single node. By contrast, when people associate a given observation with multiple potential causes, they weigh one causal factor against another as independent variables, each pointing to a specialized area of knowledge. As an illustration, consider the following situation:

Example 7. As he is pondering this question, Mr. Holmes remembers having read in the instruction manual of his alarm system that the device is sensitive to earthquakes and can be triggered (0.2) by one accidentally. He realizes that if an earthquake had occurred, it would surely (0.9) be on the news. So, he turns on his radio and waits around for either an announcement or a call from his daughter.

Mr. Holmes perceives two episodes that may be potential causes for the alarm sound, an attempted burglary and an earthquake. Even though burglaries can safely be assumed independent of earthquakes, still a positive radio announcement would reduce the likelihood of a burglary, as it "explains away" the alarm sound. Moreover, the two causal events are perceived as individual variables (see Fig. 2); general knowledge about earthquakes rarely intersects knowledge about burglaries.

This interaction among multiple causes is a prevailing pattern of human reasoning. When a physician discovers evidence in favor of one disease, it reduces the credibility of other diseases, although the patient may as well be suffering from two or more disorders simultaneously. A suspect who provides an alternative explanation for being present at the scene of the crime appears less likely to be guilty even though the explanation furnished does not preclude his committing the crime.

[Figure 2 here. The original diagram shows a causal network over the nodes (Burglary, no burglary), (Earthquake, no earthquake), (Alarm, no alarm), (Report, no report), and (Will call, will not), together with the evidence Watson's call = true and Gibbons's testimony.]

Figure 2. A diagram representing the causal dependencies among the variables in Examples 1-7.

To model this "sideways" interaction a matrix M should be assessed giving the distribution of the consequence variable as a function of every possible combination of the causal variables. In the example one should specify M = P(S|E, H), where E stands for the variable E = {earthquake, no earthquake}. Although this matrix is identical in form to the one described in Example 2, Eq. 18, where the two causal variables were combined into one compound variable {H1, H2, H3, H4}, treating E and H as two separate entities has an advantage in that it allows one to relate each of them to a separate set of evidence without consulting the other. For example, the relation between E and R (the radio announcement) can be quantified by the probabilities P(R|E) without having to consider the irrelevant event of burglary, as would be required by compounding the pair (E, R) into one variable. Moreover, having received a confirmation of R, the beliefs of E and H can be updated in two separate steps, mediated by updating S, closely resembling the process used by people. An updating scheme for networks with multiple-parent nodes is described in Refs. 3 and 4.

If the number of causal factors k is large, estimating M may be troublesome because, in principle, it requires a table of size 2^k. In practice, however, people conceptualize causal relationships by creating hierarchies of small clusters of variables, and moreover, the interactions among the factors in each cluster are normally perceived to fall into one of a few prestored, prototypical structures, each requiring about k parameters. Common examples of such prototypical structures are: noisy OR gates (i.e., any one of the factors is likely to trigger the effect), noisy AND gates, and various enabling mechanisms (i.e., factors identified as having no influence of their own except enabling other influences to become effective).
Bayesian Networks

In the preceding discussion diagrams such as Figures 1 and 2 were used not merely for mnemonic or illustrative purposes. They in fact convey important conceptual information, far more meaningful than the numerical estimates of the probabilities involved. The formal properties of such diagrams, called Bayesian networks (4), are discussed below.

Bayesian networks are directed acyclic graphs in which the nodes represent propositions (or variables), the arcs signify the existence of direct causal influences between the linked propositions, and the strengths of these influences are quantified by conditional probabilities (Fig. 3). Thus, if the graph contains the variables x1, . . . , xn, and S_i is the set of parents of variable x_i, a complete and consistent quantification can be attained by specifying, for each node x_i, a subjective assessment P'(x_i|S_i) of the likelihood that x_i will attain a specific value given the possible states of S_i. The product of all these assessments,

P(x1, . . . , xn) = Π_i P'(x_i|S_i)

constitutes a joint-probability model that supports the assessed quantities. That is, if the conditional probabilities P(x_i|S_i) dictated by P(x1, . . . , xn) are computed, the original assessments are recovered. Thus, for example, the distribution corresponding to the graph of Figure 3 can be written by inspection:

P(x1, x2, x3, x4, x5, x6) = P(x6|x5) P(x5|x2, x3) P(x4|x1, x2) P(x3|x1) P(x2|x1) P(x1)

An important feature of a Bayesian network is that it provides a clear graphical representation for many independence relationships embedded in the underlying probabilistic model. The criterion for detecting these independencies is based on graph separation: namely, if all paths between x_i and x_j are "blocked" by a subset S of variables, then x_i is independent of x_j given the values of the variables in S. Thus, each variable x_i is independent of both its grandparents and its nondescendant siblings, given the values of the variables in its parent set S_i. For this blocking criterion to hold in general, one must provide a special interpretation of separation for nodes that share common children: the pathway along arrows meeting head to head at node x_p is blocked as long as neither x_p nor any of its descendants is in S. In Figure 3, for example, x2 and x3 are independent given S1 = {x1} or S2 = {x1, x4} because the two paths between x2 and x3 are blocked by both sets. However, x2 and x3 may not be independent given S3 = {x1, x6} because x6, as a descendant of x5, unblocks the head-to-head connection at x5, thus opening a pathway between x2 and x3.

Belief Propagation in Bayesian Networks

Once a Bayesian network is constructed, it can be used to represent the generic causal knowledge of a given domain and can be consulted to reason about the interpretation of specific input data.
Figure 3. A typical Bayesian network with six variables.

The interpretation process involves instantiating a set of variables corresponding to the input data and calculating its impact on the probabilities of a set of variables designated as hypotheses. In principle, this process can be executed by an external interpreter that may have access to all parts of the network, may use its own computational facilities, and may schedule its computational steps so as to take full advantage of the network topology with respect to the incoming data. However, the use of such an interpreter seems foreign to the reasoning process normally exhibited by humans. One's limited short-term memory and narrow focus of attention, combined with the resistance to shifting rapidly between alternative lines of reasoning, seem to suggest that one's reasoning process is fairly local, progressing incrementally along prescribed pathways. Moreover, the speed and ease with which one performs some of the low-level interpretive functions, such as recognizing scenes, comprehending text, and even understanding stories, strongly suggest that these processes involve a significant amount of parallelism and that most of the processing is done at the knowledge level itself, not external to it.

A paradigm for modeling such an active knowledge base would be to view a Bayesian network not merely as a passive, parsimonious code for storing factual knowledge but also as a computational architecture for reasoning about that knowledge. That means that the links in the network should be treated as the only pathways and activation centers that direct and propel the flow of data in the process of querying and updating beliefs. Accordingly, one can imagine that each node in the network is designated a separate processor that both maintains the parameters of belief for the host variable and manages the communication lines to and from the set of neighboring, logically related variables. The communication lines are assumed to be open at all times, that is, each processor may at any time interrogate the belief parameters associated with its neighbors and compare them to its own parameters. If the compared quantities satisfy some local constraints, no activity takes place. However, if any of these constraints is violated, the responsible node is activated to revise its violating parameter and set it straight. This, of course, will activate similar revisions at the neighboring processors and will set up a multidirectional propagation process, which will continue until equilibrium is reached.

The fact that evidential reasoning involves both top-down (predictive) and bottom-up (diagnostic) inferences (see Processing, bottom up and top down) has caused apprehensions that, once the propagation process is allowed to run its course unsupervised, pathological cases of instability, deadlock, and circular reasoning will develop (5).
Indeed, if a stronger belief in a given hypothesis means a greater expectation for the occurrence of its various manifestations and if, in turn, a greater certainty in the occurrence of these manifestations adds further credence to the hypothesis, how can one avoid infinite updating loops when the processors responsible for these propositions begin to communicate with one another? It can be shown that the Bayesian network formalism is supportive of self-activated, multidirectional propagation of evidence that converges rapidly to a globally consistent equilibrium (4). This is made possible by characterizing the belief in each proposition as a vector of parameters similar to the likelihood vector of Eq. 20, with each component representing the degree of support that the host proposition obtains from one of its neighbors. Maintaining such a breakdown record of the origins of belief facilitates a clear distinction between beliefs based on ignorance and those based on firm but conflicting evidence. It is also postulated as the mechanism that permits people to trace back evidence and assumptions for the purpose of either generating explanations or modifying the model.

As a computational architecture, singly connected Bayesian networks exhibit the following characteristics: New information diffuses through the network in a single pass; that is, equilibrium is reached in time proportional to the diameter of the network. The primitive processors are simple and repetitive, and they require no working memory except that used in matrix multiplication. The local computations and the final belief distribution are entirely independent of the control mechanism that activates the individual operations; they can be activated by either data-driven or goal-driven (e.g., requests for evidence) control strategies, by a clock, or at random.

Thus, this architecture lends itself naturally to hardware implementation capable of real-time interpretation of rapidly changing data. It also provides a reasonable model of neural nets involved in cognitive tasks such as visual recognition, reading comprehension, and associative retrieval, where unsupervised parallelism is an uncontested mechanism.

Rational Decisions and Quality Guarantees

Bayesian methods, unlike many alternative formalisms of uncertainty, provide coherent prescriptions for choosing actions and meaningful guarantees of the quality of these choices. The prescription is based on the realization that normative knowledge (that is, judgments about values, preferences, and desirability) represents a valuable abstraction of actual human experience and that, like its factual-knowledge counterpart, it can be encoded and manipulated to produce useful recommendations. Although judgments about the occurrence of events are quantified by probabilities, the desirability of action consequences is quantified by utilities (also called payoffs or values) (6).

Choosing an action amounts to selecting a set of variables in a Bayesian network and fixing their values unambiguously. Such a choice normally alters the probability distribution of another set of variables, judged to be consequences of the decision variables. If to each configuration of the consequence set C a utility measure u(C) is assigned that represents its degree of desirability, the overall expected utility associated with action a is given by
U(a) = Σ_C u(C) P(C|a, e)   (37)
where P(C|a, e) is the probability distribution of the consequence set C conditioned upon selecting action a given the evidence e. Bayesian methodologies regard the expected utility U(a) as a figure of merit of action a and treat it, therefore, as a prescription for choosing among alternatives. Thus, if one has the option of choosing either action a1 or a2, one can calculate both U(a1) and U(a2) and select the action that yields the highest value. Moreover, since the value of U(a) depends on the evidence e observed up to the time of decision, the outcome of the maximum-expected-utility criterion will be an evidence-dependent plan (or decision rule) of the form: If e1 is observed, choose a1; if e2 is observed, choose a2, and so on (see Decision theory). The same criterion can also be used to rate the usefulness of various information sources and to decide which piece of evidence should be acquired first. The merit of querying variable x can be decided prior to actually observing its value, by the following consideration. If one queries x and finds the value v_x, the utility of action a will be U(a|v_x); one is able, at this point, to choose the best action among all pending alternatives and attain the value
U(v_x) = max_a U(a|v_x)   (39)
However, since one is not sure of the actual outcome of querying x, one must average U(v_x) over all possible values of v_x, weighed by their appropriate probabilities. Thus, the utility of querying x calculates to

U_x = Σ_{v_x} P(x = v_x|e) U(v_x)   (40)
8. J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, 2nd ed., Princeton University Press, Princeton, NJ, 1947.

General References

Bayesian Methodology
where e is the evidence available so far. This criterion can be used to schedule many control functions in knowledge-based systems. For example, it can be used to decide what to ask the user next, what test to perform next, or which rule to invoke next. The expert system PROSPECTOR (7) employed a scheduling procedure (called J*) based on similar considerations (see Rule-based systems). If the consequence set is well defined and not too large, this information-rating criterion can also be computed distributedly, concurrent with the propagation of evidence. Each variable x in the network stores an updated value of U_x, and as more evidence arrives, each variable updates its U_x parameter in accordance with those stored at its neighbors. At query time, attention will be focused on the observable node with the highest U_x value.

It is important to mention that the maximum-expected-utility rule was not chosen as a prescription for decisions for sheer mathematical convenience. Rather, it is founded on pervasive patterns of psychological attitudes toward risk, choice, preferences, and likelihoods. These attitudes are captured by what came to be known as the axioms of utility theory (8). Unlike the case of repetitive long series of decisions (e.g., gambling), where the expected-value criterion is advocated on the basis of a long-run accumulation of payoffs, the expected-utility criterion is applicable to single-decision situations. The summation operation in Eq. 37 originates not with additive accumulation of payoffs but, rather, with the additive axiom of probability theory (Eq. 3). In summary, the justification of decisions made by Bayesian methods can be communicated in intuitively meaningful terms, and the assumptions leading to these decisions can be traced back with ease and clarity.
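Equations 37-40 translate directly into code. The sketch below is illustrative only: the action names, utilities, and distributions are hypothetical, but the selection rule is Eq. 37 and the query rating follows Eqs. 39-40.

```python
def expected_utility(u, P_C_given_a_e):
    # Eq. 37: U(a) = sum_C u(C) P(C|a, e)
    return sum(u[c] * p for c, p in P_C_given_a_e.items())

# Hypothetical example: consequence set C = {good, bad}, two actions.
u = {"good": 100.0, "bad": -20.0}
P = {"a1": {"good": 0.3, "bad": 0.7},   # P(C | a1, e)
     "a2": {"good": 0.1, "bad": 0.9}}   # P(C | a2, e)

best = max(P, key=lambda a: expected_utility(u, P[a]))  # max-EU action

# Eqs. 39-40: utility of querying x, averaging the best attainable
# utility over the possible answers v, weighed by P(x = v | e).
P_x = {"v1": 0.6, "v2": 0.4}
P_after = {  # hypothetical posteriors P(C | a, v, e)
    "v1": {"a1": {"good": 0.8, "bad": 0.2}, "a2": {"good": 0.2, "bad": 0.8}},
    "v2": {"a1": {"good": 0.1, "bad": 0.9}, "a2": {"good": 0.5, "bad": 0.5}},
}
U_x = sum(P_x[v] * max(expected_utility(u, P_after[v][a]) for a in P_after[v])
          for v in P_x)
```

Comparing U_x with the expected utility of acting immediately gives the net value of asking about x first, which is the quantity a scheduler such as PROSPECTOR's ranks questions by.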
R. O. Duda, P. E. Hart, P. Barnett, J. Gaschnig, K. Konolige, R. Reboh, and J. Slocum, Development of the PROSPECTOR Consultant System for Mineral Exploration, Final Report for SRI Projects 5821 and 6915, Artificial Intelligence Center, SRI International, 1978.

M. Ben-Bassat, R. W. Carlson, V. K. Puri, E. Lipnick, L. D. Portigal, and M. H. Weil, "Pattern-based interactive diagnosis of multiple disorders: The MEDAS system," IEEE Trans. Patt. Anal. Mach. Intell. PAMI-2, 148-160 (1980).

J. Kim, CONVINCE: A CONVersational INference Consolidation Engine, Ph.D. Dissertation, University of California, Los Angeles, 1983.

D. J. Spiegelhalter and R. P. Knill-Jones, "Statistical and knowledge-based approaches to clinical decision-support systems, with an application to gastroenterology," J. R. Stat. Soc. A(147), 35-77 (1984).

G. F. Cooper, NESTOR: A Computer-Based Medical Diagnostic Aid that Integrates Causal and Probabilistic Knowledge, Report No. STAN-CS-84-1031, Stanford University, November 1984.

Quasi-Bayesian Methods

E. H. Shortliffe, Computer-Based Medical Consultation: MYCIN, Elsevier, New York, 1976.

C. Kulikowski and S. Weiss, Representation of Expert Knowledge for Consultation: The CASNET and EXPERT Projects, in P. Szolovits (ed.), Artificial Intelligence in Medicine, Westview Press, Boulder, CO, pp. 21-55, 1982.

R. A. Miller, H. E. Pople, and J. P. Myers, "INTERNIST-1, an experimental computer-based diagnostic consultant for general internal medicine," N. Engl. J. Med. 307(8), 468-470 (1982).

J. R. Quinlan, INFERNO: A Cautious Approach to Uncertain Inference, Rand Note N-1898-RC, September 1982.

J. Pearl
UCLA
BIBLIOGRAPHY

1. R. Jeffrey, The Logic of Decision, McGraw-Hill, New York, Chapter 11, 1965.
2. J. Pearl, Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach, Proceedings of the Second AAAI Conference on Artificial Intelligence, Pittsburgh, PA, pp. 133-136, 1982.
3. J. Kim and J. Pearl, A Computational Model for Combined Causal and Diagnostic Reasoning in Inference Systems, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, pp. 190-193, 1983.
4. J. Pearl, Fusion, Propagation and Structuring in Belief Networks, Technical Report CSD-850022, Cognitive Systems Laboratory, UCLA, June 1985; Artif. Intell. 29(3), 241-288 (Sept. 1986).
5. J. Lowrance, Dependency-Graph Models of Evidential Support, COINS Technical Report 82-26, University of Massachusetts at Amherst, 1982.
6. H. Raiffa, Decision Analysis: Introductory Lectures on Choices under Uncertainty, Addison-Wesley, Reading, MA, 1968.
7. R. O. Duda, P. E. Hart, and N. J. Nilsson, "Subjective Bayesian methods for rule-based inference systems," Proc. 1976 Natl. Comput. Conf. (AFIPS Conference Proceedings), 45, 1075-1082 (1976).
This work was supported in part by the National ScienceFoundation, Grant #DSR 83-13875.
BEAM SEARCH
Beam search is a heuristic search technique in which a number of nearly optimal alternatives (the beam) are examined in parallel. Beam search is a heuristic technique because heuristic rules are used to discard nonpromising alternatives in order to keep the size of the beam as small as possible. Some of the successful applications of beam search include speech recognition (1), job shop scheduling (2), vision (3), and learning (4).

Beam search can easily be explained by using a search space described by a directed graph in which each node is a state and each arc represents the application of an operator that transforms a state into a successor state. A solution is a path from an initial state to a goal state. A few operators are necessary: an operator (NEXT) to expand a state, that is, generating all the successor nodes of a given node; an operator (SCORE) to evaluate a state, that is, generating the likelihood that a node belongs to the optimal solution; an operator (PRUNE) to select the alternatives that are most promising, that is, choosing the best nodes; and an operator (FOUND) to check if the goal has been reached. The operation implemented by PRUNE is often called forward pruning.

Beam search also requires two data structures: one that contains the set of states that are being extended (called CURRENT.STATES) and one that contains the set of new states that is being created (called CANDIDATE.STATES). At each iteration of the algorithm a new set of states is generated and becomes the current set of states for the next iteration. Given these operators and data structures, beam search can be expressed by this simple program:

Start:
    CURRENT.STATES := initial.state
    while (not FOUND(CURRENT.STATES)) do
        CANDIDATE.STATES := NEXT(CURRENT.STATES)
        SCORE(CANDIDATE.STATES)
        CURRENT.STATES := PRUNE(CANDIDATE.STATES)
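The loop above can be turned into runnable code. The following Python sketch is one concrete instantiation (assumed, not from the article): PRUNE keeps the best beam_width candidates, whereas a Harpy-style PRUNE would instead discard everything below a threshold tied to the best score.

```python
def beam_search(initial, next_states, score, is_goal, beam_width=3):
    """Generic beam search over the operators described in the text.
    next_states(s) -> successors (NEXT); score(s) -> higher is better
    (SCORE); keeping the beam_width best candidates plays the role of
    PRUNE; is_goal(s) implements FOUND. Returns a goal state or None."""
    current = [initial]
    while current:
        if any(is_goal(s) for s in current):                       # FOUND
            return next(s for s in current if is_goal(s))
        candidates = [t for s in current for t in next_states(s)]  # NEXT
        candidates.sort(key=score, reverse=True)                   # SCORE
        current = candidates[:beam_width]                          # PRUNE
    return None

# Toy usage: reach 10 by repeatedly adding 1, 2, or 3, scoring states
# by closeness to the goal.
goal = beam_search(0,
                   next_states=lambda s: [s + d for d in (1, 2, 3) if s + d <= 10],
                   score=lambda s: -abs(10 - s),
                   is_goal=lambda s: s == 10,
                   beam_width=2)
```

With beam_width = 1 the procedure degenerates into greedy search; with an unbounded width it becomes the breadth-first search mentioned in the discussion of permissive pruning.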
The algorithm is started by providing an initial state (e.g., the initial node of the graph to be searched). Then the NEXT and SCORE operators are applied to generate all the possible new states and give them a score. When all the new states have been generated, the PRUNE operator is applied to the set of new states, and the unpromising alternatives are discarded. The algorithm iterates until the goal has been reached.

For example, beam search is used in the Harpy speech recognition system (1) to search a graph that embodies the syntactic and vocabulary constraints of the language as a sequence of basic speech sounds (allophones). This graph is such that any path from the initial state to the final state represents a pronunciation of a legal sentence. Given an unknown utterance, Harpy segments the signal and computes the likelihood that each segment represents an allophone. The sequence of labeled segments is then compared against each of the alternative paths in the graph that represent acceptable allophone sequences in the language. The operator NEXT extracts from the graph all the nodes that can follow the nodes in CURRENT.STATES. The operator SCORE compares the allophone in each node with a segment of speech and returns a value that indicates how well they match. The PRUNE operator computes a threshold score as a function of the best score and then discards all the nodes that have a score that is worse than the threshold. Therefore, in the Harpy system the pruning is anchored to the best path, and all the nodes that are close enough to the best node to have a chance of being on the best path are kept. The FOUND operator simply triggers when all the input speech data have been evaluated. At this point, if the search was successful, the set CANDIDATE.STATES contains the last node in the network, and the correct utterance can be retrieved by tracing the best path backward (a simple lookup operation if the pointers for each path in the beam are kept until the end of the search). Note that the best node at each segment during the search is not necessarily on the globally best path discovered at the end of the search. Thus, local errors, for example, errors due to errorful acoustic data, are recovered by delaying commitment to a particular path until the end.

As one can see from the Harpy system example, the NEXT and SCORE operators depend on the problem being searched and do not directly influence the performance of the search. The PRUNE operator instead influences the performance both in terms of how expensive the search is and in terms of the ability of the algorithm to reach the goal. In general, a "permissive" PRUNE will reach the goal most of the time at the expense of examining many unpromising paths (in the extreme case, beam search simply becomes a breadth-first search). On the contrary, a very "strict" PRUNE will limit the amount of computation but will increase the risk of pruning the path that leads to the goal. Therefore, one would like to use the strictest PRUNE that does not prevent the algorithm from finding the optimal solution. How well (if at all) such a compromise can be reached is a function of the domain being searched and of the quality of the scoring function. For example, in a speech system, if the SCORE operator generates high scores for only a few allophones (including the correct one) and low scores for the other allophones, the algorithm will tolerate a very narrow beam without losing accuracy. In general, the pruning function is no substitute for the quality of the scores, since poor and confused scores will generate sets of states for which the score does not truly reflect the likelihood that a state is on the correct path. Finally, it should be noted that although beam search is a very cost-effective search method, because it only examines some of the alternatives, it does not guarantee that the optimal solution is found.

One of the reasons that beam search is attractive is that it reduces computation by reducing the number of states that have to be examined. The amount of saving depends on the specific search domain; experiments with speech recognition programs showed an improvement of a few orders of magnitude over an exhaustive search. Nevertheless, the large size of some search spaces requires even higher performance. To this end, the design of parallel beam search algorithms has been investigated. Although it would appear that parallelism could be readily exploited by performing the NEXT and SCORE operators in parallel, it has been found (5) that beam search needs to be partitioned into such small components that their synchronization, using the primitives available on general-purpose multiprocessors, results in too much overhead. This problem can be solved by designing special architectures for beam search. For example, the Harpy machine (6), a five-processor architecture using small microprocessors, was able to execute the beam search for a speech recognition application in real time and twice as fast as a large mainframe. Another example, described in Ref. 7, is a custom VLSI architecture that can execute beam search three orders of magnitude faster than a million-instruction-per-second general-purpose processor.

BIBLIOGRAPHY

1. B. T. Lowerre and R. D. Reddy, The Harpy Speech Understanding System, in W. A. Lea (ed.), Trends in Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, 1980, pp. 340-360.
2. M. S. Fox, Constraint-Directed Search: A Case Study of Job-Shop Scheduling, Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, Computer Science Department, December 1983.
3. S. Rubin, The ARGOS Image Understanding System, Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, Computer Science Department, November 1978.
4. T. G. Dietterich and R. S. Michalski, "Inductive learning of structural descriptions: Evaluation criteria and comparative review of selected methods," Artif. Intell. 16, 257-294 (November 1981).
5. P. N. Oleinick, The Implementation and Evaluation of Parallel Algorithms on C.mmp, Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, Computer Science Department, 1978.
6. R. Bisiani, H. Mauersberg, and R. Reddy, "Task-Oriented Architectures," Proceedings of the IEEE, 885-896, July 1983.
7. T. Anantharaman and R. Bisiani, "Hardware Accelerators for Speech Recognition Algorithms," in Proceedings of the 13th International Symposium on Computer Architecture, IEEE 14(2), 216-223 (June 1986).

R. Bisiani
Carnegie-Mellon University
BELIEF REVISION

The ability to reason about and adapt to a changing environment is an important aspect of intelligent behavior. Most computer programs constructed by researchers in AI maintain a model of their environment (external and/or internal environment) that is updated to reflect the perceived changes in the environment. One reason for model updating is the detection of contradictory information about the environment. The conventional approach to handling contradictions consists of changing the most recent decision made [chronological backtracking (qv)]. An alternative solution [dependency-directed backtracking (qv)] consists of changing not the last choice made, but an assumption that provoked the unexpected condition. This second approach generated a great deal of research in one area of AI, which became loosely called belief revision.

Belief revision is an area of AI research concerned with the issues of revising sets of beliefs when new information is found to contradict old information. Research topics in belief revision include the study of representation of beliefs, in particular how to represent the notion of belief dependency; the development of methods for selecting the subset of beliefs responsible for contradictions; and the development of techniques to remove some subset of beliefs from the original set of beliefs. The research on belief revision is related to the research on nonmonotonic logic, which aims at capturing parts of the logic of belief revision systems (see Reasoning, nonmonotonic).

The field of belief revision is usually recognized to have been initiated by J. Doyle, who, based on the work of Stallman and Sussman (1), developed an early domain-independent belief-revision system (2,3), although a system which performs belief revision was developed at approximately the same time by P. London (5). Following Doyle, several researchers pursued this topic, most of them building on the system of Doyle.
Some of the important systems developed for belief revision are TMS (2,3), RUP (6,7), MBR (8,9), and ATMS (10,11). In the last few years some commercial systems that perform belief revision have become available, for example, DUCK (from Smart Systems Technology), ART (12) (from Inference Corporation), and LOOPS (from Xerox).

Roots of the Problem in AI

Belief-revision systems are AI programs that deal with contradictions. They work with a knowledge base containing propositions about the state of the environment, performing reasoning from the propositions in the knowledge base and "filtering" the propositions in the knowledge base so that only
part of the knowledge base is perceived: the set of propositions under consideration. This set is usually called the set of believed propositions. When the belief-revision system switches from one of these sets to another, we say that it changes its beliefs. Typically, belief-revision systems explore alternatives, make choices, explore the consequences of the choices, and compare the results obtained when using different choices. If during this process a contradiction is detected, the belief-revision system revises the knowledge base, "erasing" some propositions so that it gets rid of the contradiction.

Belief-revision systems have their roots both in the problems raised during search and in the frame problem of McCarthy and Hayes (13). The frame problem (13-15) is the problem of deciding which conditions change and which conditions do not change when a system undergoes some modification. The basis of the problem is that although it is possible to specify the ways in which a system's environment might change in terms of the effects of actions, it still remains to specify some way of deciding what stays unchanged in the face of those actions. Early systems approaching these problems [e.g., STRIPS (16) and PLANNER (17,18)] basically worked in the same way: for each of the allowed actions there was a list of conditions deleted by the action and a list of conditions added by the action. When an action was executed, the conditions associated with these lists would be added to and deleted from the knowledge base.
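The add/delete-list scheme just described can be sketched in a few lines. This is an illustrative toy, not the original STRIPS code; the action representation and blocks-world names are invented for the example.

```python
# Minimal sketch of STRIPS-style add/delete lists (illustrative data
# layout; the real STRIPS worked over first-order formulas).

def apply_action(kb, action):
    """Apply an action to a knowledge base of ground propositions."""
    if not action["preconditions"] <= kb:
        raise ValueError("preconditions not satisfied")
    # Conditions on the delete list leave the KB; those on the add list enter.
    return (kb - action["delete_list"]) | action["add_list"]

# A toy blocks-world action: move block A from the table onto B.
stack_a_on_b = {
    "preconditions": {"clear(A)", "clear(B)", "on(A, table)"},
    "delete_list": {"clear(B)", "on(A, table)"},
    "add_list": {"on(A, B)"},
}

kb = {"clear(A)", "clear(B)", "on(A, table)", "on(B, table)"}
kb = apply_action(kb, stack_a_on_b)
print(sorted(kb))  # ['clear(A)', 'on(A, B)', 'on(B, table)']
```

Note that nothing in this scheme records why "on(A, B)" holds; a proposition derived from a deleted condition would silently survive, which is exactly the drawback discussed next.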
As regards revision of the model of the environment, this approach presents two problems: the conditions to be added and deleted have to be carefully tailored as a set to avoid unintended infinite loops of adding and deleting information in the knowledge base; and if a proposition depends on another one that is deleted by some action, the former may be kept in the knowledge base if it is not part of the set of propositions explicitly deleted by the action.

An alternative approach, context-layered knowledge bases, divides the knowledge base into smaller knowledge bases so that the consequences of the effect of an action can be grouped with a reference back to the causing action. Such an approach was taken by Fikes (19), who stores situations of a model in a tree, the context tree, in which each node represents a situation. The root of the context tree represents the initial situation. Since most of the information in a given situation is the same as the information in the previous situation, as a matter of space efficiency only the differences between the new situation and the old one are actually stored in the node of the context tree representing the new situation. Actions have the effect of creating a new situation in the context tree or returning to some previous situation. Fikes's approach presents the following drawbacks: the propositions about a given situation of the model are scattered along a path in the context tree, and there is no record of the sequence of actions performed. Similar approaches were taken in Refs. 20-23.

A new research direction was created by Stallman and Sussman, who designed a system called EL, in which dependencies of propositions are permanently recorded (1). EL maintains a complete record (trace) of its reasoning, using it both to decide which alternative choices to make when something goes wrong and to explain its line of reasoning.
Along with each derived proposition, EL stores the set of all propositions directly used in its derivation and the rule of inference used to derive it, the dependency record of the proposition.
EL solves electric circuit problems. While searching for the values of the circuit parameters, EL may have to "guess" the operating range of some devices. Later, if an inconsistency is found, EL knows that somewhere along its way it guessed a wrong state for some device. The novelty of EL's approach to backtracking is that the assumption that is changed during backtracking does not necessarily correspond to the last choice made but rather to the assumption that provoked the inconsistency [dependency-directed backtracking (qv)]. When an inconsistency is detected, EL searches through the chain of dependency records of the inconsistent propositions until it finds all the assumptions upon which the inconsistent propositions depend. This set of assumptions is recorded as leading to a contradiction and is never tried again. Then heuristics are used to select one of them to rule out. Stallman and Sussman's work (1) had two major influences on AI: it opened a new perspective on the handling of alternatives (dependency-directed backtracking), and it triggered the research on belief-revision systems.
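The walk through the chain of dependency records can be sketched as follows. This is a hypothetical data layout, not EL's actual representation; the circuit proposition names are invented for illustration.

```python
# Sketch of dependency-directed culprit identification: walk the
# dependency records back from a contradiction to the assumptions
# (guesses) it rests on.

def underlying_assumptions(prop, antecedents, assumptions):
    """Collect the assumptions a proposition ultimately depends on."""
    if prop in assumptions:
        return {prop}
    found = set()
    for parent in antecedents.get(prop, ()):
        found |= underlying_assumptions(parent, antecedents, assumptions)
    return found

# antecedents[p] = propositions directly used to derive p
antecedents = {
    "i_total=3A": ("state(D1)=on", "ohms_law"),
    "contradiction": ("i_total=3A", "measured(i_total)=1A"),
}
assumptions = {"state(D1)=on"}   # a guessed operating range
nogoods = []                     # assumption sets known to lead to contradiction

culprits = underlying_assumptions("contradiction", antecedents, assumptions)
nogoods.append(culprits)         # this combination is never tried again
print(culprits)                  # {'state(D1)=on'}
```

Unlike chronological backtracking, the choice retracted is drawn from `culprits`, however long ago it was made; heuristics then pick which member to rule out.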
Explicit Concern about Revising Beliefs

Building upon Stallman and Sussman's work, Doyle (2,3) designed the truth-maintenance system (TMS), the first domain-independent belief-revision system. TMS maintains a knowledge base of propositions, each of which is explicitly marked as believed or disbelieved. TMS may be told that some propositions are contradictory, in which case it automatically revises its beliefs so that no inconsistent propositions are simultaneously believed.

TMS is based on the definition of two kinds of objects: propositions and justifications. Justifications represent the reasons that TMS believes or disbelieves a certain proposition. Attached to each proposition in the knowledge base there is one (or more) justification(s) that supports TMS's belief or disbelief in the proposition. Although Doyle points out the usefulness of four kinds of justifications (4), he mainly implemented one of them, the SL (support list) justifications. This type of justification contains two lists of propositions, the inlist and the outlist. The proposition supported by an SL justification is believed if and only if every proposition in its inlist is believed and every proposition in its outlist is disbelieved. Whenever a proposition is derived, it is justified by an SL justification containing all the propositions directly used in its derivation and the rule of inference used to derive it.

Based on the SL justifications, there are two distinguished types of propositions in TMS: premises are propositions whose current SL justification has an empty inlist and an empty outlist (premises are always believed), and assumptions are propositions whose current SL justification has a nonempty outlist. Assumptions are propositions whose belief depends on the disbelief in other propositions.

TMS may be asked to add a new proposition to the knowledge base or to change (add or retract) a justification for a proposition. In either case TMS tries to find disbelieved propositions that will become believed by such an addition or retraction and believed propositions that will become disbelieved by it.

In addition, TMS may be told that two believed propositions are contradictory. In this case the dependency-directed backtracking mechanism is invoked, which searches through the inlists of the propositions in the knowledge base, starting with the SL justifications of the contradictory propositions, until it finds all the assumptions considered by the contradictory propositions. One of those assumptions is selected as the culprit for the contradiction and is disbelieved. To disbelieve this assumption, TMS believes one of the propositions referenced in the outlist of the assumption and justifies this proposition with an SL justification whose inlist contains the proposition representing the contradiction.

After selecting the culprit for the contradiction, it is necessary to disbelieve all the propositions depending on it. This is done by following the chain of dependency records and disbelieving each proposition that has no SL justification other than the one that includes the selected culprit in its inlist. This "disbelieving process" is not as simple as it may seem, owing to the possibility of circular proofs. Suppose, following an example from Ref. 25, that the knowledge base contains the following propositions:

(∀x)[Man(x) → Person(x)]
(∀x)[Person(x) → Human(x)]
(∀x)[Human(x) → Person(x)]

Adding Man(Fred) to the knowledge base will cause the derivation of Person(Fred), which in turn will cause the derivation of Human(Fred). The addition of Human(Fred) causes Person(Fred) to be rederived. Figure 1 represents the dependencies among the propositions in the knowledge base. In this figure two directed arcs (labeled PR, for premises) pointing to a circle mean that the two propositions at the ends of the arcs were combined to produce the proposition pointed to by the arc leaving that circle (labeled C, for conclusion): the inlist of the SL justification of a proposition pointed to by a conclusion arc contains the propositions at the ends of the premise arcs leading to that proposition. If there exists a path of arcs from proposition A to proposition B, it means that B depends on A.

Figure 1. Knowledge base dependencies: PR = premise; C = conclusion.

In Figure 1 Human(Fred) depends on
Person(Fred), which in turn depends on Human(Fred). This is called a circular proof. Suppose now that Man(Fred) is disbelieved. The dependency arcs leaving Man(Fred) lead to Person(Fred). However, Person(Fred) has another justification, and one is faced with the problem of whether to disbelieve Person(Fred) since, although one of its justifications is no longer valid, Person(Fred) may still be believed owing to the other justification. Handling circular proofs raises several problems. A discussion of the possible solutions to those problems can be found in Refs. 3 and 24.

Doyle's research triggered the development of several belief-revision systems (6,26-29). These systems share two characteristics: they are mainly concerned with implementation issues, paying no special attention to the logic underlying the system, and each proposition is justified by the propositions that directly originated it.
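The SL-justification semantics described above (a proposition is believed iff its inlist is believed and its outlist disbelieved) can be sketched as follows. This is a simplified illustration with invented predicate names, not Doyle's implementation, and it deliberately omits the hard part, relabeling in the presence of circular support.

```python
# Sketch of SL (support-list) justification semantics in TMS.

def supported(just, believed):
    """An SL justification holds iff its inlist is believed and its
    outlist is disbelieved."""
    return (all(p in believed for p in just["in"])
            and all(p not in believed for p in just["out"]))

def classify(justs):
    """Premise: empty inlist and outlist. Assumption: nonempty outlist."""
    kinds = {}
    for prop, j in justs.items():
        if not j["in"] and not j["out"]:
            kinds[prop] = "premise"       # always believed
        elif j["out"]:
            kinds[prop] = "assumption"    # belief rests on a disbelief
        else:
            kinds[prop] = "derived"
    return kinds

justs = {
    "bird(Tweety)": {"in": [], "out": []},
    "flies(Tweety)": {"in": ["bird(Tweety)"], "out": ["penguin(Tweety)"]},
}
believed = {"bird(Tweety)"}
print(classify(justs))
print(supported(justs["flies(Tweety)"], believed))  # True
```

If penguin(Tweety) later becomes believed, the justification for flies(Tweety) no longer holds, which is precisely the kind of status change TMS must propagate through the dependency records.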
Concerns for Foundations

The early 1980s saw the development of new research directions in belief-revision systems, characterized by an explicit concern about the foundations of the systems independent of their implementations (8,9,30,31) and by the use of a new type of justification (8-11,32). One such system, the MBR (multiple belief reasoner) system of Martins (8,9), is described here.

There are two distinct aspects to consider concerning MBR: the logic underlying the system and the way the propositions in the knowledge base (generated according to the rules of inference of the logic) are interpreted by MBR. Any logic underlying belief-revision systems has to keep track of and propagate propositional dependencies. The concern for this problem is shared, although for different reasons, with the relevance logicians, whose main goal is to avoid the paradoxes of implication. Relevance logicians developed logics that keep track of and propagate propositional dependencies. The logic underlying MBR, the SWM system, was influenced by the relevance logic of Shapiro and Wand (33) and by the FR system of Anderson and Belnap (34). The SWM system associates each proposition with one (or more) triple(s), its support, which justifies the existence of the proposition. Each triple contains the following information:

1. The origin tag (OT) tells how the proposition was obtained. Propositions can be hypotheses, normally derived propositions, or specially derived propositions (propositions whose derivation sidesteps some of the relevance logic assumptions). This latter case is not discussed here; see Ref. 8 for further details.
2. The origin set (OS) contains all the hypotheses that were really used in the derivation of the proposition.
3. The restriction set (RS) contains every set known to be inconsistent with the proposition's origin set. A set is known to be inconsistent with another if their union is inconsistent and a contradiction was in fact derived from that union.
If the same proposition is derived in multiple ways, its support contains multiple triples. The OT and the OS reflect the way the proposition was derived. The RS, on the other hand, reflects the current knowledge about how the hypotheses underlying that proposition relate to the other propositions. Once
a proposition is derived, its OT and OS remain constant, whereas its RS may change as contradictions are uncovered. The rules of inference of SWM use the RSs to prevent the derivation of propositions whose OSs would be known to be inconsistent.

MBR is a belief-revision system that works with a knowledge base containing propositions generated according to the rules of inference of SWM. In this knowledge base each proposition is associated with a support (in SWM's sense). MBR relies on the notions of context and belief space. A context is any set of hypotheses. A context determines a belief space, the set consisting of every proposition whose OS is a subset of the context that defines that belief space. At any moment there is one active context, the current context, and the knowledge base retrieval operations are defined so that they only retrieve the propositions in the belief space defined by the current context.

Figure 2 shows MBR's knowledge base originated by the example of the last section. In this figure a circle pointed to by an arc labeled DO (derivation origin) represents the support of the proposition at the end of the arc. Note that Person(Fred) has two supports. The arcs labeled OS leaving the support point to the hypotheses from which the proposition was derived. Since each proposition is directly connected with the hypotheses that underlie it, there are no circular proofs.

When a contradiction is detected, the origin sets of the contradictory propositions are inspected and their union becomes a set known to be inconsistent. Every proposition in the knowledge base whose origin set is not disjoint from this newly discovered inconsistent set has its restriction set updated in order to reflect the current knowledge about inconsistent sets in the knowledge base.
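The context and belief-space machinery just described can be sketched as follows. The data layout and proposition names are invented for illustration, following the OT/OS/RS description in the text; this is not Martins's actual implementation.

```python
# Sketch of MBR-style supports and belief spaces.
# Each proposition carries a list of (origin-tag, origin-set) pairs;
# origin sets contain only hypotheses.
supports = {
    "AllMenArePersons": [("hyp", frozenset({"AllMenArePersons"}))],
    "Man(Fred)":        [("hyp", frozenset({"Man(Fred)"}))],
    "Person(Fred)":     [("der", frozenset({"AllMenArePersons", "Man(Fred)"}))],
    "NotPerson(Fred)":  [("hyp", frozenset({"NotPerson(Fred)"}))],
}

def belief_space(context):
    """Propositions whose origin set is a subset of the current context."""
    return {p for p, sups in supports.items()
            if any(os <= context for _, os in sups)}

current = frozenset({"AllMenArePersons", "Man(Fred)"})
print(sorted(belief_space(current)))  # NotPerson(Fred) is outside this space

# On a contradiction between Person(Fred) and NotPerson(Fred), the union
# of their origin sets becomes a known-inconsistent set, which is added
# to the restriction set of every proposition whose origin set overlaps it.
nogood = frozenset({"AllMenArePersons", "Man(Fred)", "NotPerson(Fred)"})
restriction = {p: {nogood} for p, sups in supports.items()
               if any(os & nogood for _, os in sups)}
```

Changing beliefs is then just a matter of binding `current` to a different context: no marking or unmarking of individual propositions is needed, which is the point of comparison with TMS made below.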
In MBR's implementation there is a considerable amount of sharing between knowledge base structures, namely origin sets and restriction sets, which is possible since SWM's formalism guarantees that two propositions with the same OS have the same RS as well.

Justification-Based versus Assumption-Based Systems

Any belief-revision system must keep a record of where each proposition in the knowledge base came from. These records are inspected while searching for the culprit of a contradiction. There are two ways to record the origin of propositions, corresponding to justification-based and to assumption-based systems (32). In justification-based systems each proposition contains information about the propositions that directly originated it. This approach was used in Refs. 2, 3, 6, 26-29, and 31. In assumption-based systems each proposition contains information about the hypotheses (nonderived propositions) that originated it. This approach was taken in Refs. 8-11 and 32.

Assumption-based systems present several advantages over justification-based systems. These advantages are summarized by a comparison of the two systems discussed in this entry, TMS and MBR. An excellent comparison of the two approaches can be found in Ref. 32. The advantages of assumption-based systems over justification-based systems are as follows:

1. Changing sets of beliefs. In TMS changing one set of beliefs into another can only be accomplished upon detection of a contradiction, in which case the dependency-directed backtracking goes through the entire knowledge base, marking
Figure 2. Knowledge base dependencies: DO = derivation origin; OS = origin set.
and unmarking propositions. In MBR changing sets of beliefs is done by changing the current context. Afterward the knowledge base retrieval operations will only consider the propositions in the new belief space. There is no marking or unmarking of any kind.

2. Comparing sets of beliefs. In TMS it is impossible to examine two sets of beliefs simultaneously. This may be important when one must weigh the outcomes of several possible solutions. In MBR several sets of beliefs may coexist; thus, it is simple to compare two solutions.

3. Backtracking. TMS relies on the dependency-directed backtracking mechanism, which follows the dependency records, identifying all the assumptions leading to a given contradiction. In MBR there is no backtracking of any kind. Upon detection of a contradiction, all the assumptions underlying that contradiction are directly identifiable (they are the union of the origin sets of the contradictory propositions).

4. Finding faulty assumptions. In MBR, upon detection of a contradiction, the hypotheses underlying it are immediately identified, making it easy to compare the sets of hypotheses underlying contradictions.

However, using only assumptions as support disables the explanation of the reasoning sequence followed by the program. The system of Refs. 10, 11, and 32 uses both assumptions and justifications, offering the advantages of both approaches.

Applications

The capability of determining the source of information, coupled with the possibility of changing beliefs, is an essential feature of any intelligent system. In general, any system that has to choose among alternatives can use (and benefit from) the techniques developed by belief-revision systems. However, there are some areas in which the techniques discussed in this entry are of paramount importance, some of which are listed below.

1. Reasoning based on partial information, default assumptions, and potentially inconsistent data. This kind of reasoning is likely to generate contradictions. Thus, it is of primary importance that the system be able to determine the causes of contradictions, remove them, and after doing so, be able to find every proposition in the knowledge base depending on the selected culprit (see Reasoning, default).

2. Learning. A potential source of learning (qv) consists of analyzing mistakes so that the same mistake is not made twice. This calls for belief revision and assignment of credit to the source of the mistake.

3. Replanning from failures. In any planning (qv) system there should be a component that analyzes sources of problems and prevents the generation of a plan that leads to trouble. Again, belief-revision techniques can be used to detect the source of the problems and prevent the generation of ill-formed plans.

4. Reasoning about the beliefs of other agents. Any program that reasons about the beliefs of other agents (see Belief systems) should maintain a clear-cut distinction between its beliefs and the beliefs of the others. Belief-revision techniques contribute to this application in their concern with the changing of belief spaces. The program must be able to change belief spaces, must know which belief space is being considered, and must fail to consider the information from the other(s) belief space(s).

5. Systems for natural-language understanding (qv) (in which one needs to consider several competing interpretations of a sentence) and vision (qv) (in which one needs to revise hypotheses about the contents of images).
6. Qualitative reasoning (qv), a kind of reasoning that requires making choices among alternatives (see, for example, Ref. 35).

7. Systems that select between design alternatives, which may have to change choices made.

8. Diagnoses (see Medical Advice Systems).

It should be kept in mind, however, that belief revision is only applicable in cases where credit for the consequences of choices is assignable.

References to other work in the domain of belief revision (both in AI and in other disciplines) can be found in Ref. 36, which presents an extensive reference list. References 32 and 37 present an excellent discussion of belief-revision techniques and problems. References 3 and 8 give overviews of the field and discuss in detail the two systems presented here, TMS and MBR, respectively.
BIBLIOGRAPHY

1. R. M. Stallman and G. J. Sussman, "Forward reasoning and dependency-directed backtracking in a system for computer-aided circuit analysis," Artif. Intell. 9, 135-196 (1977).
2. J. Doyle, Truth Maintenance Systems for Problem Solving, Technical Report AI-TR-419, MIT AI Laboratory, Cambridge, MA, 1978.
3. J. Doyle, "A truth maintenance system," Artif. Intell. 12, 231-272 (1979).
4. Reference 3, pp. 239-244.
5. P. London, Dependency Networks as Representation for Modelling in General Problem Solvers, Technical Report 698, Department of Computer Science, University of Maryland, College Park, MD, 1978.
6. D. McAllester, An Outlook on Truth Maintenance, AI Memo 551, MIT AI Laboratory, Cambridge, MA, 1980.
7. D. McAllester, "A Widely Used Truth Maintenance System," unpublished, MIT, Cambridge, MA, 1985.
8. J. Martins, Reasoning in Multiple Belief Spaces, Technical Report 203, Department of Computer Science, State University of New York at Buffalo, Buffalo, NY, 1983.
9. J. Martins and S. C. Shapiro, "Reasoning in Multiple Belief Spaces," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 1983, pp. 370-373.
10. J. de Kleer, "An Assumption-Based TMS," Artificial Intelligence 28 (1986).
11. J. de Kleer, "Problem Solving with the ATMS," Artificial Intelligence 28 (1986).
12. B. D. Clayton, "ART Programming Primer," Inference Corporation, April 1985.
13. J. McCarthy and P. Hayes, Some Philosophical Problems from the Standpoint of Artificial Intelligence, in B. Meltzer and D. Michie (eds.), Machine Intelligence, Vol. 4, Edinburgh University Press, Edinburgh, U.K., pp. 463-502, 1969.
14. P. J. Hayes, The Frame Problem and Related Problems in Artificial Intelligence, in A. Elithorn and D. Jones (eds.), Artificial and Human Thinking, Jossey-Bass, San Francisco, CA, pp. 45-59, 1973.
15. B. Raphael, The Frame Problem in Problem Solving Systems, in N. Findler and B. Meltzer (eds.), Artificial Intelligence and Heuristic Programming, American Elsevier, New York, pp. 159-169, 1971.
16. R.
Fikes and N. Nilsson, "STRIPS: A new approach to the application of theorem proving to problem solving," Artif. Intell. 2, 189-208 (1971).
17. C. Hewitt, Description and Theoretical Analysis of PLANNER: A Language for Proving Theorems and Manipulating Models in a Robot, Technical Report TR-258, MIT, Cambridge, MA, 1972.
18. G. Sussman, T. Winograd, and E. Charniak, MICRO-PLANNER Reference Manual, Technical Report Memo 203, MIT, Cambridge, MA, 1971.
19. R. Fikes, Deductive Retrieval Mechanisms for State Description Models, Proceedings of the Fourth IJCAI, Tbilisi, Georgia, pp. 99-106, 1975.
20. S. Fahlman, "A planning system for robot construction tasks," Artif. Intell. 5, 1-49 (1974).
21. P. J. Hayes, A Representation for Robot Plans, Proceedings of the Fourth IJCAI, Tbilisi, Georgia, pp. 181-188, 1975.
22. D. McDermott and G. Sussman, The CONNIVER Reference Manual, Technical Report Memo 259, MIT, Cambridge, MA, 1972.
23. J. Rulifson, J. Derksen, and R. Waldinger, QA4: A Procedural Calculus for Intuitive Reasoning, Technical Report Note 73, SRI International, Menlo Park, CA, 1972.
24. E. Charniak, C. Riesbeck, and D. McDermott, Artificial Intelligence Programming, Lawrence Erlbaum Associates, Hillsdale, NJ, 1980.
25. Reference 24, p. 197.
26. J. Goodwin, An Improved Algorithm for Non-Monotonic Dependency Net Update, Technical Report LITH-MAT-R-82-23, Department of Computer and Information Science, Linkoping University, Linkoping, Sweden, 1982.
27. D. McDermott, Contexts and Data Dependencies: A Synthesis, Department of Computer Science, Yale University, New Haven, CT, 1982.
28. H. Shrobe, Dependency-Directed Reasoning in the Analysis of Programs which Modify Complex Data Structures, Proceedings of the Sixth IJCAI, Tokyo, Japan, pp. 829-835, 1979.
29. A. Thompson, Network Truth Maintenance for Deduction and Modeling, Proceedings of the Sixth IJCAI, Tokyo, Japan, pp. 877-879, 1979.
30. J. Doyle, Some Theories of Reasoned Assumptions, Carnegie-Mellon University, Pittsburgh, PA, 1982.
31. J.
Goodwin, WATSON: A Dependency Directed Inference System, Proceedings of the Non-monotonic Reasoning Workshop, AAAI, Menlo Park, CA, pp. 103-114, 1984.
32. J. de Kleer, Choices without Backtracking, Proceedings of the Fourth AAAI, Austin, TX, 1984.
33. S. C. Shapiro and M. Wand, The Relevance of Relevance, Technical Report 46, Computer Science Department, Indiana University, Bloomington, IN, 1976.
34. A. Anderson and N. Belnap, Entailment: The Logic of Relevance and Necessity, Vol. 1, Princeton University Press, Princeton, NJ, 1975.
35. B. C. Williams, "Qualitative Analysis of MOS Circuits," Technical Report TR-567, MIT AI Laboratory, Cambridge, MA, 1983.
36. J. Doyle and P. London, "A selected descriptor-indexed bibliography to the literature on belief revision," SIGART Newslett. 71, 7-23 (1980).
37. J. de Kleer and J. Doyle, "Dependencies and Assumptions," in A. Barr and E. Feigenbaum (eds.), The Handbook of Artificial Intelligence, Vol. 2, William Kaufmann, Los Altos, CA, 1982, pp. 72-76.
J. Martins
Instituto Superior Tecnico, Lisbon
BELIEF SYSTEMS
A belief system may be understood as a set of beliefs together with a set of implicit or explicit procedures for acquiring new beliefs. The computational study of belief systems has focused on building computer systems for representing or expressing beliefs or knowledge and for reasoning (qv) with or about beliefs or knowledge. Such a system is often expressed in terms of a formal theory of the syntax and semantics of belief and knowledge sentences.

Reasons for Studying Such Systems. There are several distinct, yet overlapping, motivations for studying such systems. As McCarthy and Hayes, two of the earliest contributors to this field, have explained (1),

A computer program capable of acting intelligently in the world must have a general representation of the world. . . . [This] requires commitments about what knowledge is and how it is obtained. . . . This requires formalizing concepts of causality, ability, and knowledge.

Thus, one motivation is as a problem in knowledge representation (see Representation, knowledge). In the present context this might less confusingly be referred to as "information representation" since not only knowledge but also beliefs are represented. A second motivation is as a component of computational studies of action. Subcategories of the latter include planning systems (e.g., Ref 2), systems for planning speech acts (e.g., Ref 3), and systems for planning with multiple agents (e.g., Ref 4). These systems frequently involve representing and reasoning about other notions as well (such as can, wants, etc.).

A third motivation is the construction of AI systems that can interact with human users, other interacting AI systems, or even itself (e.g., Refs 5 and 6). Among the subcategories here are the study of user models for determining appropriate output (e.g., Refs 7 and 8) and the prediction of others' behavior and expectations on the basis of their beliefs (e.g., Ref 9). A fourth motivation is directly related to such interaction: the study of AI systems that can converse in natural language (e.g., Ref 10), either with users or with a "knowledge base" (e.g., Ref 11). A fifth motivation is the study of reasoning: how a particular individual reasons (Ref 12) or how reasoning can be carried out with incomplete knowledge (e.g., Ref 13) or in the face of resource limitations (e.g., Ref 14). Finally, there is the ever-present motivation of modeling a mind (e.g., Refs 15 and 16) or providing computational theories of human reasoning about beliefs (e.g., Refs 17 and 18).

Types of Theories. There are four overlapping types of theories identifiable by research topics or by research methodologies. One is belief revision (qv), which is concerned with the problem of revising a system's database in light of new, possibly conflicting information; such theories are dealt with in another entry. The other types of theory can be usefully categorized [by augmenting the scheme of McCarthy and Hayes (1)] as (a) epistemological theories, concerned primarily with representational issues [e.g., McCarthy (9)]; (b) formal heuristic theories, concerned primarily with the logic of belief and knowledge, that is, with reasoning in terms of a formal representation [e.g., Moore (2)]; and (c) psychological heuristic theories, also concerned with reasoning but using techniques that make some explicit claim to psychological adequacy; such theories typically are not concerned with representational issues per se [e.g., Colby and Smith (19) and Wilks and Bien (20)].

Philosophical Background

Much of the data, problems, and theories underlying AI research on formal belief systems has come from philosophy, in particular, epistemology, philosophy of language, and logic (especially modal and intensional logics).

Philosophical Issues. There are several philosophical issues, logical, semantic, and ontological, that have been faced by AI researchers working on belief systems.

1. The problem of the relationship between knowledge and belief. This problem, dating back to Plato's Theaetetus, is usually resolved by explicating knowledge as justified true belief (see Ref 21 for the standard critique of this view and Ref 22 for a discussion in the context of AI).

2. The problem of the nature of the objects of belief, knowledge, and other intentional (i.e., cognitive) attitudes: Are such objects extensional (e.g., sentences, physical objects in the external world) or intensional (i.e., nonextensional; e.g., propositions, concepts, mental entities)?

3. Problems of referential opacity: the failure of substitutability of co-referential terms and phrases in intentional contexts. This can best be illustrated as a problem in deduction: From

Susan believes that the Morning Star is a planet

and

The Morning Star is a planet if and only if the Evening Star is a planet,

it does not logically follow that

Susan believes that the Evening Star is a planet.

Nor from

Ruth believes that Venus is a planet

and

Venus = the Evening Star

does it logically follow that

Ruth believes that the Evening Star is a planet.

4. The problem of quantifying in (i.e., into intentional contexts): From

Carol believes that the unicorn in my garden is white,

it does not logically follow that

There is a unicorn in my garden such that Carol believes that it is white.
5. Problems of logical form (or semantic interpretation, or "knowledge representation" in the sense of AI): How should the following kinds of sentences be understood, and what are their relationships with simpler cases of belief and knowledge?

Margot knows whether Ben's phone number is the same as Ariana's.

Mike knows who Sally is.

Jan believes that Stu believes that he is a philosopher.

Frank and Harriet mutually believe that the movie at Loew's starts at 9 p.m.

6. The problem of the distinction between de re and de dicto beliefs: When a person's belief is a cause of the person's actions, one is not only interested in what the person believes but also in how the person believes it. That is, one is not only interested in a third-person characterization of the agent's beliefs but also in the agent's own characterization of those beliefs. Suppose that Ralph sees the person whom he knows to be the janitor stealing some government documents, and suppose, unknown to Ralph, that the janitor has just won the lottery. Then Ralph believes de dicto that the janitor is a spy, and he believes de re that the lottery winner is a spy. That is, if asked, Ralph would assent to the proposition "The janitor is a spy"; but he merely believes of the man known to the hearer as the lottery winner that he is a spy: Ralph would not assent to "The lottery winner is a spy." Traditionally, a belief de dicto is viewed as a referentially opaque context, whereas a belief de re is referentially transparent. Thus, the inference

Ralph believes [de dicto] that the janitor is a spy.
The janitor = the lottery winner.
Ralph believes [de dicto] that the lottery winner is a spy.

is invalid. Moreover, its conclusion not only presents false information but also represents a loss of information, namely, the propositional "content" of Ralph's belief. On the other hand, the inference

Ralph believes [de re] of the janitor that he is a spy.
The janitor = the lottery winner.
Ralph believes [de re] of the lottery winner that he is a spy.

is valid. But the conclusion conveys just as little information about Ralph's actual belief de dicto as does the first premise. An AI system that is capable of explaining or recommending behavior must be able to distinguish between these two kinds of belief reports by having two distinct means of representing them.

Epistemic Logic. Of central importance from the point of view of AI have been the logics of belief and knowledge proposed by Hintikka (23). The propositional fragment of Hin-

(A4) ⊢(Kap → p).
(A5) ⊢(Kap → KaKap).
(A6) ⊢([Kap ∧ Ka(p → q)] → Kaq).

Roughly, (A3) says that a knows all theorems, (A4) says that what is known must be true (recall that knowledge is generally considered to be justified true belief), (A5) says that what is known is known to be known, and (A6) says that what is known to follow logically from what is known is itself known. A (propositional) logic of belief (a propositional doxastic logic) can be obtained by using operators Ba and deleting (A4); other epistemic and doxastic logics can be obtained by taking similar variants of other modal logics.

Possible-worlds semantics for epistemic and doxastic logics can be provided as in ordinary modal logics by interpreting the accessibility relation between possible worlds as a relation of epistemic or doxastic alternativeness. Thus, for example, Kap is true in possible world w if and only if p is true in possible world w' for all w' that are epistemic alternatives to w. Intuitively, a knows that p if and only if p is compatible with everything that a knows [see Hintikka (23,24) for details]. Various restrictions on the alternativeness (or accessibility) relation yield correspondingly different systems. Thus, S4 can be characterized semantically by requiring the relation to be reflexive and transitive. If symmetry is allowed, the semantics characterizes the stronger system S5 = S4 + ⊢(¬Kap → Ka¬Kap). (Roughly, what is unknown is known to be unknown.)

Note that none of these systems is psychologically plausible. For example, no one knows or believes all tautologies or all logical consequences of one's knowledge or beliefs, as suggested by (A6). Nor is it clear how to interpret (A5) (is the consequent to be read as "a knows that a knows that p" or as "a knows that he (or she) knows that p"?), nor whether it is plausible. Indeed, some philosophers feel that there are no axioms that characterize a psychologically plausible theory of belief. There is a large philosophical literature discussing these issues [e.g., Ref 25, the special issues of Nous 1 (1967) and Synthese 21 (1970)]. Other formalizations of epistemic logics that are of relevance to AI are to be found in Sato (26) and McCarthy et al. (27). Further discussion of the philosophical issues may be found in Ref 28, The Encyclopedia of Philosophy (29), and through The Philosopher's Index. Interesting recent work on the semantics of belief sentences dealing with computational and linguistic issues may be found in Refs 30-33.
known.) Note that none of these systems is psychologically plausible. For example, no one knows or believes aII tautologies or all logical consequencesof one's knowledge or beliefs as suginfalse presents gested by (A6). Nor is it clear how to interpret (A5)-is the is invalid. Moreover, its conclusion not only namelY, information' of loss a consequentto be read as"aknows that o knows that p" or as"a, formation but it also represents Ralph's of "content" propositional the knows that he (or she) knows that p,,?-rtot whether it is of the information aboui hand, plausible. Indeed, some philosophers feel that there are no other the belief. On axioms that charac tertze u pry.hologicalty plausible theory of spy. a is janitor he that the of rel believes Ld,e Ralph belief. There is a large philosophical literature discussing (1967), these issues [e.g.,Ref i5, thu special issuesof Noas 1 epistemic a spy' and synthdse 2r (1970)1.other formalizations of Ralph believes lde ref ofthe lottery winner that he is (26) logics that are of relevance to AI are to be found in Sato philosophijust information little the of as and McCarthy et al. Qn Further discussion is valid. But the conclusion conveys the first premise. cal issuu, *uy be found in Ref 28, The Encyclopediaof Philosoabout Ralph's actual belief d,ed,ictoas does recommending phy (2g), an&through The Philosopher'sIndex.Interesting reAn AI system that is capable of explaining or with two kinds these between aistinguish to able be must cent work on the semantics of betief sentencesdealing behavior Refs in representing found of be means may linguistic and computational issues of belief reports by having two distinct 30-33. them. point of EpistemicLogic. of central importance from the proknowledge and view of AI have been the logics of belief Hinof fragment propositional posed by Hintikka (23). 
The logic) can tikka's logic of knowledge (propositional epistemic modal logic s4 the of variant notational a as be axiomatized family (seeModal logic), replacing the necessityoperator by a a individual each for Ko, of proposition-forming operators are axioms The p"). that knows tK"p is to be read"a (A1) If P is a tautologY, then FP' (A2) If rP and '(P - g), then Fg' (A3) If vp, then vKoP'
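The alternativeness semantics just described is easy to prototype. The following sketch is illustrative only (the worlds, valuation, and single agent are invented): it evaluates Ka p at a world by quantifying over the accessible worlds and checks the S4 conditions of reflexivity and transitivity.

```python
# Illustrative sketch: possible-worlds evaluation of Ka p, with the
# S4 requirement that the accessibility relation be reflexive and transitive.
worlds = {"w1", "w2", "w3"}
access = {("w1", "w1"), ("w2", "w2"), ("w3", "w3"),   # reflexive loops
          ("w1", "w2"), ("w2", "w3"), ("w1", "w3")}   # transitive chain
val = {"w1": {"p"}, "w2": {"p"}, "w3": {"p", "q"}}    # atoms true at each world

def K(atom, w):
    """Ka(atom) is true at w iff atom holds at every epistemic alternative of w."""
    return all(atom in val[w2] for (w1, w2) in access if w1 == w)

def is_s4(ws, rel):
    """Check that the relation is reflexive and transitive."""
    reflexive = all((w, w) in rel for w in ws)
    transitive = all((a, c) in rel
                     for (a, b) in rel for (b2, c) in rel if b == b2)
    return reflexive and transitive

assert is_s4(worlds, access)
assert K("p", "w1")       # p holds at w1, w2, w3, all accessible from w1
assert not K("q", "w1")   # q fails at w1 itself, so it is not known there
```

Because the relation is reflexive, axiom (A4) holds automatically in such a model: whatever is "known" at w must be true at w.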
Survey of Theories and Systems

In this section the major published writings on belief systems are surveyed following the three-part categorization of types of theories and by lines within the types. The reader is reminded that the categorization is highly arbitrary and that virtually all of the research falls into more than one category.

Epistemological Theories

Early Work. One of the earliest works on AI belief systems, by McCarthy and Hayes (1), begins by considering a system of interacting automata whose states at a given time are determined by their states at previous times and by incoming signals from the external world (including other automata). A person p is considered to be a subautomaton of such a system. Belief is represented by a predicate B, where Bp(s, w) is true if p is to be regarded as believing proposition w when in state s. Four sufficient conditions for a "reasonable" theory of belief are given:

1. p's beliefs are consistent and correct.
2. New beliefs can arise from reasoning on the basis of other beliefs.
3. New beliefs can arise from observations.
4. If p believes that it ought to do something, then it does it.
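Conditions 2 and 3 admit a minimal computational reading. The sketch below is illustrative only (the propositions and the single rule are invented): beliefs are a set of atoms, new beliefs are derived from old ones by rules (condition 2), and observations add beliefs directly (condition 3).

```python
# Illustrative sketch of conditions 2 and 3 above, for a toy
# subautomaton whose beliefs are atomic propositions.
beliefs = {"raining"}
rules = {("raining",): "streets_wet"}     # condition 2: reasoning rule

def reason(beliefs, rules):
    """Add every conclusion whose premises are all believed."""
    derived = {concl for prems, concl in rules.items()
               if all(p in beliefs for p in prems)}
    return beliefs | derived

def observe(beliefs, percept):
    """Condition 3: a new belief arising from observation."""
    return beliefs | {percept}

beliefs = reason(beliefs, rules)
beliefs = observe(beliefs, "dark_outside")
assert beliefs == {"raining", "streets_wet", "dark_outside"}
```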
However, criterion 1 is psychologically implausible and seems better to characterize knowledge; criterion 4 is similarly too strong. Knowledge is represented by a version of Hintikka's system (23): The alternativeness relation, shrug(p, s1, s2), is true if and only if: if p is in fact in situation s2, then for all he knows he might be in situation s1. (A "situation" is a complete, actual or hypothetical state of the universe.) Kpq is true (presumably at s) if and only if ∀t[shrug(p, t, s) → q(t)], where q(t) is a "fluent" (a Boolean-valued function of situations) that "translates" q, and where shrug is reflexive and transitive. Although this paper is significant for its introduction of philosophical concepts into AI, it discusses only a minimal representation of knowledge and belief.

A more detailed representation is offered by McCarthy (5,9), in which individual concepts, that is, intensional entities somewhat like Fregean senses, are admitted as entities on a par with extensional objects, to allow for first-order expression of modal notions without problems of referential opacity. Notationally, capitalized terms stand for concepts, lowercase terms for objects. Thus, know(p, X) is a Boolean-valued (extensional) function of a person p (an extensional entity) and a concept X (an intensional entity), meaning "p knows the value of X," defined as true Know(P, X), where true is a Boolean-valued function of propositions and where Know(P, X) is a proposition-valued (i.e., concept-valued) function of a person concept P and a concept X. Nested knowledge is handled by Know rather than know; thus, "John knows whether Mary knows the value of X" is Know(John, Know(Mary, X)). Hintikka-style knowledge ("knowledge-that") is represented by a function K(P, Q), defined as (Q And Know(P, Q)); thus, "John knows that Mary knows the value of X" is K(John, Know(Mary, X)). A denotation function maps intensional concepts to extensional objects, and a denotation relation, denotes, is introduced for concepts that lack corresponding objects. An existence predicate can be defined in terms of the latter: true Exists X if and only if ∃x[denotes(X, x)]. Belief is not treated in nearly as much detail. Functions Believe and believe are introduced, though so are functions believespy and notbelievespy [to handle a celebrated puzzle of referential opacity concerning spies; see Linsky (28)], yet no axioms are provided to relate them to each other or to the ordinary belief functions. [A similar theory in the philosophical literature was described in Rapaport (34).]

Creary (17) extended McCarthy's theory to handle concepts of concepts. According to Creary, McCarthy's notation cannot represent three distinct readings of

Pat believes that Mike wants to meet John's wife

(generated by the de re/de dicto distinction) because it does not allow for the full hierarchy of Fregean senses (35). The three readings are:

believes(pat, Wants{Mike, Meet${Mike$, Wife$ Jim$}})
believes(pat, Exist P$. Wants{Mike, Meet${Mike$, P$}} And Conceptof{P$, Wife Jim})
∃P$, P. believes(pat, Wants{Mike, Meet${Mike$, P$}}) ∧ conceptof(P$, P) ∧ conceptof(P, wife jim)

Here, if mike is the name of a person whose concept is Mike, then Mike is the name of that concept and its concept is Mike$, etc. It is not clear, however, that such a hierarchy is needed at all (cf. Ref. 37) nor whether McCarthy's notation is indeed incapable of representing the ambiguity. Creary does, however, discuss reasoning about propositional attitudes of other agents by "simulating" them using "contexts": temporary databases consisting of the agent's beliefs plus common beliefs and used only for reasoning, not for representation [thus escaping certain objections to "database approaches" raised by Moore (see Ref. 2)]. Creary's system was subjected to criticism and refinement by Barnden (37).

Belief Spaces. The problems of nested beliefs and of the de re/de dicto distinction suggest that databases containing representations of beliefs should be partitioned into units (often called "contexts," "spaces," or "views") for each believer. One of the earliest discussions of these issues in a computational framework was by Moore (36), who developed a LISP-like language, D-SCRIPT, that evaluates objects of belief in different environments (see also Ref. 29). Another early use of such units was Hendrix's (38) partitioning of semantic networks into "spaces" and "vistas": The former can be used to represent the propositions that a given agent believes; the latter are unions of such spaces. Similarly, Schneider (39) introduced "contexts" to represent different views of a knowledge base, and Covington and Schubert (40) used "subnets" to represent an individual's conception of the world. Filman et al. (41) treat a context as a theory of some domain, such as an agent's beliefs, with the ability to reason with the agent's beliefs in the context and about them by treating the context as an object in a metacontext.

Fully Intensional Theories. The notions of intensional entities and belief spaces come together in the work of Shapiro and his associates. Maida and Shapiro (16) go a step beyond the approach of McCarthy by dropping extensional entities altogether. Their representational scheme uses a fully intensional semantic network in which all nodes represent distinct concepts, all represented concepts are represented by distinct nodes, and arcs represent binary relations between nodes but cannot be quantified over (they are "nonconceptual"). The entire network is considered to model the belief system of an intelligent agent: nondominated propositional nodes represent the agent's beliefs, and "base" nodes represent individual concepts. [Similar philosophical theories are those of Meinong (42) and Castañeda (43); see Rapaport (44).] Two versions of 'know' are treated (both via agent-verb-object case frames): know1 for "knows that" and know2 for "knows by acquaintance." There are corresponding versions of 'believe' (though it is not clear what believe2 is); the fundamental principle connecting knowledge and belief is that the system believes1 that an agent knows1 that p only if the system believes1 both that the agent believes1 that p and that the agent believes1 that p
for the right reasons. Unlike other belief systems, their system can handle questions, as queries about truth values (which are represented by nodes). Thus, whereas most systems represent "John knows whether p" as "John knows that p or John knows that ¬p," Maida and Shapiro (16) consider these to be merely logically equivalent but not intensionally identical; instead, they represent it as "John knows2 the truth value of p." Among the consequences of the fully intensional approach are (1) the ability to represent nested beliefs without a type hierarchy [see Maida (18)]; (2) the need for a mechanism of coreferentiality (actually, their "a EQUIV b" represents that the system believes that a and b are coreferential); (3) the dynamic introduction of new nodes, through user interaction, in the order they are needed (which sometimes requires node merging by means of EQUIV arcs); and (4) the treatment of all transitive verbs as referentially opaque unless there is an explicit rule to the contrary.

Rapaport and Shapiro (45) [see also Rapaport (46)] make essential use of the notion of a "belief space" to represent the distinctions between de re and de dicto beliefs. In dynamically constructing the system's belief space, they follow the principle that if there is no prior knowledge of coreferentiality of concepts in the belief spaces of agents whose beliefs are being modeled by the system, then those concepts must be represented separately. This has the effect of reintroducing a kind of hierarchy [see the discussion of Creary (17), above], but there is a mechanism for "merging" such entities later as new information warrants. Thus, the conjunctive de dicto proposition "John believes that Mary is rich and Mary believes that Lucy is rich" requires four individuals: the system's John, the system's John's Mary, the system's Mary, and the system's Mary's Lucy. But the de re proposition "John believes of Mary that she is not rich" only requires two: the system's John and the system's Mary. This technique is used to represent quasi-indicators: Virtually all other systems fail to distinguish between "John believes that he* is rich" and "John believes that John is rich" [although Moore (47) briefly discusses this]; the starred, quasi-indexical occurrence of "he" is the system's way of depicting John's use of 'I' in John's statement, "I am rich." This is represented as a de dicto proposition requiring two individuals: the system's John and the system's John's representation of himself (which is distinct from the system's John's John).

Other Theories. Among other theories that may be classified as epistemological (though some have considerable overlap with formal heuristic theories) are the important early work of Konolige (48), a series of papers by Kobsa and his colleagues (49-52), Xiwen and Weide (53), and Soulhi (54).

Konolige. Konolige (48) is concerned with the other side of the coin of knowledge: ignorance. In order to prove ignorance based on knowledge limitations ["circumscriptive ignorance"; see McCarthy (55)], he uses a representation scheme based on a logic called KI4, an extension of the work of Sato (26). KI4 has two families of modal operators: knowledge operators, [S], for each agent S, and (what might be called "context") operators, [α], for each proposition α; and it has an agent 0 ("fool"), where [0]α means "α is common knowledge." The axioms and rules of KI4 include analogs of (A1)-(A6) (system K4), plus:

(A7) ⊢[0]α → [0][S]α
(A8) If α ⊢K4 β, then ⊢KI4 [α]β
(A9) If not-(α ⊢K4 β), then ⊢KI4 ¬[α]β

Roughly, (A7) says that if α is common knowledge, then it is common knowledge that S knows it; (A8) says that if β follows from α in K4, then β is true in the context of α in KI4; and (A9) says that if β does not follow from α in K4, then it is not true in the context of α in KI4. The context operator may be explained as follows: If α = [S]φ, then [α] identifies S's theory whose axiom is φ. Thus, "all S knows about p is that q1 or q2" can be represented as [α][S]p, where α = [S]q1 ∨ [S]q2.

Kobsa and Trost. Kobsa and Trost (51) use the KL-ONE knowledge representation system, augmented by their version of partitions: "contexts," collections of "nexus" nodes linked to "concept" nodes, representing that the agent modeled by the context containing the nexus nodes believes propositions about the concepts. There is a system context and separate contexts for each agent whose beliefs are modeled, with explicit (coreferential-like) links between isomorphic structures in the different contexts (instead of structure sharing or pattern matching). Of particular interest is their use of "embedded" (i.e., nested) beliefs to represent recursive beliefs (the special case of nesting where a lower level context models a higher level one, as in the system's beliefs about John's beliefs about the system's beliefs) and mutual beliefs (by linking the context for one agent embedded in the context for another with the embedding context).

Formal Heuristic Theories

Moore. One of the most influential of the formal theories (both epistemological and heuristic) has been that of Moore (2,47,56). His was the first AI theory to offer both a representational scheme and a logic and to show how they can interact with other notions to reason about action. For his representation, Moore uses a first-order axiomatization of the possible-worlds semantics of Hintikka's S4 [rather than the modal axiomatic version; it should be noted that Moore (2) erroneously added the S5 rule]. Specifically, he introduces a predicate T(w, p) to represent that the object-language formula p is true in possible world w, and the predicate K(A, w1, w2) to represent that w2 is possible according to what A knows in w1. "A knows that p" is then represented by Know(A, p), which satisfies the axiom: T(w1, Know(a1, p1)) ≡ ∀w2(K(a1, w1, w2) → T(w2, p1)). Since Moore is concerned with using knowledge to reason about actions, he formulates a logic of actions, where complex actions are built out of sequences, conditionals (defined in terms of Know), and loops, and a logic for "can," understood as "knowing how to do." The criticisms one can offer of Moore's work are two-sided: (1) its psychological inadequacy (primarily due to his reliance on Hintikka's system), though, of course, this is shared by most other formal theories; and (2) its similarity to much work that had been going on in philosophy during the 1960s and 1970s, though here it must be noted that one advantage of (some) AI theories over (some) philosophical theories is the former's attention to detail, which can often indicate crucial gaps in the latter. (Moore's critique of the database approach is discussed below.)

Konolige. Konolige and Nilsson (6) consider, from a formal point of view, a planning system involving cooperating agents. Each agent is represented by a first-order language, a "simulation structure" (a partial model of the language), a set of facts (expressed in the language and including descriptions of other agents), a "goal structure" (consisting of goals and plans), a deduction system, and a planning system. An agent uses a formal metalanguage to describe the languages of other agents and can use its representation of other agents (or itself, but not quasi-indexically) to reason by simulation about their plans and facts in order to take them into account when making its own plans. Belief, rather than knowledge, is taken as the appropriate cognitive attitude, to allow for the possibility of error [not allowed by axiom (A4), above], and "agent A0 believes that agent A1 believes that agent A0 is holding object B" is represented by FACT(A1, 'HOLDING(A0, B)') (true) appearing in A0's FACT-list. Although an analog of axiom (A5) is taken as an axiom here, the analog of (A6) is not, since (1) their system allows different agents to have different deduction systems and (2) the deductive capabilities of the agents are considered to be limited.

This theory was made more rigorous in Konolige (14) [see also Ref. 57]. Here, a planning system with multiple agents has a "belief subsystem" consisting of (1) a list of "base" sentences (about a situation) expressed in a formal language with a modal belief operator and a Tarski-like truth value semantics; (2) a set of deduction processes (or deduction rules) that are sound, effectively computable, have "bounded" input, and are, therefore, monotonic; and (3) a control strategy (for applying the rules to sentences). Belief derivation is "total"; that is, all queries are answered in a bounded amount of time. The system is deductively consistent (i.e., a sentence and its negation are not simultaneously believed), but it is not logically consistent (i.e., there might not be a possible world in which all beliefs are true). Thus, some measure of psychological plausibility is obtained. A system can be deductively though not logically consistent if there are resource limitations on deductions; that is, the deductive processes might be incomplete because of either weak rules or a control strategy that does not perform all deductions. Konolige uses the former (though his sample of a weak rule, modus ponens weakened by conjoining a "derivation depth" to each sentence, seems to require a nonstandard conjunction in order to prevent ordinary modus ponens from being derivable). The system satisfies two properties: closure (sentences derived in the system are closed under the deduction rules; i.e., all deductions are made) and recursion (the belief operator [S] is interpreted as another belief system). Thus, [S]α means that α is derivable in S's belief system. A "view" [similar to Hendrix's "vista" (38)] is a belief system as "perceived through a chain of agents"; for example, v = John, Sue is John's perception of Sue's beliefs. To bound the recursive reasoning processes, the more deeply nested a system is, the weaker are its rules. Konolige presents a Gentzen-style propositional doxastic logic B consisting of the axioms and rules of propositional logic; a set of rules for each view v; and, for each v, (1) a rule (essentially modus ponens) that implements closure, (2) a rule that formalizes agent i's deductive system in view v (roughly, the rule is that if a sentence δ from some set of sentences Δ can be inferred using the rules of the view v,i from a set of sentences Γ that are believed by Si, then [Si]Δ can be inferred using the rules of v from [Si]Γ), and (3) a rule that says that anything can be derived from logically inconsistent beliefs. B is stronger than might be desired, since, if the v rules are complete and recursion is unbounded, B is equivalent to S5 without (A4). Konolige points out, however, that it can be weakened to S4 without (A4).

Levesque. A very different approach was taken by Levesque in a series of papers (11,58,59) on knowledge bases. The problem he confronts is that of treating a knowledge base that is incomplete (i.e., that lacks some information needed to answer queries) as an abstract data type. However, his use of epistemic logic is not as a representation device within the knowledge base but as a query language. He defines a first-order language L that has its singular terms partitioned by means of a relation u into equivalence classes of coreferential terms; the classes are referred to by numerical "parameters" (for the knowledge base to be able to answer wh-questions). L has a truth value semantics based on a set s of "primitive" sentences, and L is said to describe a "world structure" (s, u). Levesque argues that although L may be sufficient to query the knowledge base about the world, it is not sufficient to query it about itself. For this, L is extended to a language KL, containing a knowledge operator K and satisfying two principles: (1) "every logical consequence of what is known is also known," but not everything is known (i.e., the knowledge base is an incomplete picture of a possible world); and (2) a pure sentence (i.e., one that is about only the knowledge base) "is true exactly when it is known" (i.e., the knowledge base is an accurate picture of itself). The operator K satisfies slightly modified axioms for L (which are like those for a typical first-order logic), plus:

If ⊢L α, then ⊢KL Kα.
⊢KL ((Kα ∧ K(α → β)) → Kβ).
⊢KL (∀xKα → K∀xα).
If α is pure, then ⊢KL (α ≡ Kα).

The first of these says, roughly, that if α is provable in L, then "α is known" is provable in KL; the second is similar to (A6); the third says, roughly, that if everything is such that α is known to hold of it, then it is known that everything is such that α holds of it; and the fourth says, roughly, that the K operator is redundant in pure sentences. Semantically, if k is a set of world structures (i.e., those compatible with the knowledge base), then Kα is true on (s, u, k) if and only if α is true on all (s', u') in k. It should be observed that K is more like a belief operator, since Kα → α is not a theorem, whereas KKα → Kα is. Two operations on an abstract data type KB can then be defined roughly as follows: (I) ASK: KB × KL → {yes, no, unknown}, where ASK = yes if Kα is true in KB, ASK = no if K¬α is true in KB, and ASK is unknown otherwise. (II) TELL: KB × KL → KB, where TELL = the intersection of KB with the set of all world structures on which the query is true. Although the query language is epistemic, Levesque proves a representation theorem stating that the knowledge in KB is representable using L [essentially by trading in Kα for ⊢(k → α), where k may be thought of as the conjunction of sentences in KB].

In Ref. 59, principle 1 is weakened, for several psychologically interesting reasons: (a) it ignores resource limitations; (b) it requires belief of all valid sentences; (c) it ignores differences between logically equivalent, yet distinct, sentences; and (d) it requires belief of all sentences if inconsistent ones are believed. To achieve an interpretation sensitive to these, two belief operators are used: Bα for "α is explicitly (or actively) believed" and Lα for "α is implicit in what is believed." To distinguish (A) situations in which only α and α → β are believed from (B) those in which they are believed together with β, without being forced to distinguish (C) situations in which only α ∨ β is believed from (D) those in which only β ∨ α is believed, Levesque uses "partial possible worlds" in which not all sentences get truth values. A formal logic is defined in which L is logically "omniscient" (much like Levesque's earlier K), but B is not. More precisely: (i) Bα → Lα is valid, but its converse is not; (ii) B is not closed under logically equivalent sentences; (iii) B need not apply to all valid sentences or to both of two inconsistent beliefs; and (iv) B is such that Bα → Bβ is a theorem if and only if α entails β, where "entails" comes from relevance logic (see Ref. 60), a logic of great philosophical interest.
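Levesque's ASK and TELL operations admit a compact prototype if one represents a knowledge base, as in his semantics, by the set of world structures compatible with what it has been told. The sketch below is a deliberate simplification (worlds are just sets of true atoms, and formulas are Python predicates), not Levesque's actual formalism:

```python
# Illustrative sketch of Levesque-style ASK/TELL: a KB is the set of
# worlds compatible with what has been told; K(atom) is truth in all of them.
from itertools import chain, combinations

atoms = ["p", "q"]

def powerset(xs):
    return [frozenset(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

KB = set(powerset(atoms))                 # initially, every world is possible

def ask(kb, atom):
    if all(atom in w for w in kb):        # K(atom) holds in the KB
        return "yes"
    if all(atom not in w for w in kb):    # K(not atom) holds in the KB
        return "no"
    return "unknown"                      # the KB is incomplete on this point

def tell(kb, sentence):
    """Keep only the world structures on which the told sentence is true."""
    return {w for w in kb if sentence(w)}

assert ask(KB, "p") == "unknown"
KB = tell(KB, lambda w: "p" in w or "q" in w)   # told: p or q
assert ask(KB, "p") == "unknown"                # a disjunction does not settle p
KB = tell(KB, lambda w: "q" not in w)           # told: not q
assert ask(KB, "p") == "yes"                    # now p must hold
assert ask(KB, "q") == "no"
```

Note how TELLing the disjunction p ∨ q leaves ASK(p) unknown: exactly the incompleteness that the abstract-data-type view of a knowledge base is meant to capture.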
Other formal theories include those of Nilsson (61) and of Halpern and his colleagues (13,62-65). Nilsson (61) attempts a formalization of "knowledge" (actually, belief) without a K operator, using what appears to be a notational variant of an accessibility relation defined, however, not between possible worlds but between possible sets of answers to a question. Halpern and McAllester (13) add knowledge operators to a logic for reasoning about likelihood, and Halpern and his colleagues have extended these and other logics (62,63,65).

Psychological Heuristic Theories. This category of research, which attempts to be more psychologically realistic than either of the preceding two, may be further subdivided along a spectrum ranging from the more formal to the more psychological.

More Formal than Psychological. There are two major, and related, topics investigated under this heading: speech act theory and mutual belief.

Speech Act Theory. Speech act theory, developed by the philosophers Austin, Grice, and Searle, considers the basic unit of linguistic communication to be the rule-governed production of a token of a sentence (or word) in the performance of a speech act (such as the illocutionary act of making a statement). On Grice's analysis, a speaker S means something by his or her utterance U addressed to hearer H if and only if S intends the utterance of U to produce a certain effect in H by means of H's recognition of this intention (see Ref. 66 for further details and references). For example, for S's statement to H that he (S) is tired, there must be two preconditions: that S believe that he (S) is tired and that S intend that H believe that he (S) is tired; and there should be the effect that H believe that S is tired.

Cohen and Perrault. Cohen and Perrault (66) attempt to provide "a theory that formally models the possible intentions" of speakers by treating the intentions underlying speech acts as plans involving "the communication of beliefs." "Plans" are composed of "action" operators, which contain preconditions, bodies, and effects and are evaluated relative to a world model (including the planner's beliefs). When the action operator is a speech act, it takes beliefs and goals and returns plans for the speech act. Their criteria of adequacy for a theory of belief are that it must (1) distinguish agent AGT1's beliefs from AGT1's beliefs about AGT2's beliefs and (2) be able to represent (a) that AGT2 knows whether P, without AGT1's having to know which of P and ¬P AGT2 believes, and (b) that AGT2 knows what the x such that R(x) is, without AGT1's knowing what AGT2 thinks the x such that R(x) is. Their logic takes BELIEVE as a relation (though they call it an operator) between an agent and a proposition, satisfying the following axioms (for each agent a):

(B1) If P is an axiom of first-order logic, then ⊢aBELIEVE(P).
(B2) ⊢aBELIEVE(P) → aBELIEVE(aBELIEVE(P)).
(B3) ⊢aBELIEVE(P) ∨ aBELIEVE(Q) → aBELIEVE(P ∨ Q).
(B4) ⊢aBELIEVE(P & Q) → aBELIEVE(P) & aBELIEVE(Q).
(B5) ⊢aBELIEVE(P) → ¬aBELIEVE(¬P).
(B6) ⊢aBELIEVE(P → Q) → (aBELIEVE(P) → aBELIEVE(Q)).
(B7) ⊢∃x[aBELIEVE(P(x))] → aBELIEVE(∃xP(x)).
(B8) ⊢All agents believe that all agents believe (B1)-(B7).

They admit that this is too strong to be psychologically plausible. Agents' wants are also represented but not axiomatized. Their methodology is as follows: There are planning rules; for example, if an agent wants to know whether P and does not know whether P is true, then the agent can plan a speech act on the basis of his or her knowledge of planning and his or her beliefs about the other agent's actions. And there are inference rules for recognizing such plans; corresponding to the planning rule above, if S believes that A has a goal of knowing whether P is true, then S may believe that A has a goal of achieving P or S may believe that A has a goal of achieving ¬P.

Cohen and Levesque. Cohen and Levesque (67) claim that illocutionary act definitions can be derived from statements describing the recognition of shared plans and that this requires a definition of mutual beliefs. They offer perhaps the most honest, if not most psychologically plausible, representation of belief: (BEL x p) is true if and only if p follows from what x believes. (KNOW x p) is defined as (AND p (BEL x p)), and (KNOWIF x p) is defined as (OR (KNOW x p) (KNOW x (NOT p))). The latter is used to define an if-then-else rule, along the lines of Moore (2). Mutual belief (discussed in more detail below) is characterized by two axioms:

If ⊢p, then ⊢(MB x y p).
⊢(MB x y p) ≡ (BEL x (AND p (MB y x p))).

Plans are characterized in terms of conditions pi and qi and actions ai such that the agent believes that pi implies that the result of x's doing ai is qi and that pi implies that x's making qi-1 true thereby makes qi true (for i = 1, . . . , n); various illocutionary operators are then characterized using notions such as these.

This research program was continued by Allen and Perrault (68) in order to model "helpful" linguistic behavior, that is, appropriate responses by a hearer (much in the manner of user modeling; see below). They offer a simple example (stated in the first person) to illustrate that their approach provides more generality than prespecified sequences of speech acts. Their belief axioms include a schema of the form (though in different notation)
(BA(P → Q) ∧ BAP) → BAQ, although their commentary suggests that such schemata are really of the form BS(BA(P → Q) ∧ BAP) → BSBAQ. Knowledge is defined as true belief: KAP = (P ∧ BAP), interpreted as BSKAP if and only if BS(S and A agree that P). Knowing-whether and knowing-who are defined as follows:

KNOWIFAP = (P ∧ BAP) ∨ (¬P ∧ BA¬P).
KNOWREFAP = ∃y[y = the x such that D(x) ∧ BA(y = the x such that D(x))].

There are also numerous rules relating these forms of belief and knowledge to wants and actions.

Other theories include those of Allen, Sidner, and Israel. Allen (69) continued this line of research, embedding it in a theory of action and time; here BELIEVES(A, p, Tb, Tp) is taken to mean that A believes during time interval Tb that p holds during time interval Tp. Sidner and Israel (70) and Sidner (71) attack similar problems, treating the "intended meaning" of utterance U by speaker S for hearer H as a set of pairs of propositional attitudes (beliefs, wants, intentions, etc.) and propositional "contents" that are such that S wants H to hold the attitude toward the content by means of U.

Mutual Belief. The problems of mutual belief and mutual knowledge, notions generally accepted to be essential to research programs such as these, are most clearly stated by Clark and Marshall (72). They raise a paradox of mutual knowledge: For a successful definite reference by speaker S to hearer H that term t refers to referent R, a doubly infinite sequence of conditions must be satisfied: KS(t is R), KSKH(t is R), KSKHKS(t is R), . . . , and KH(t is R), KHKS(t is R), . . . . But each condition takes a finite amount of time to check, yet successful reference does not require an infinite time.
Their solution is to replace the infinite sequences by mutual knowledge defined in terms of "copresence": S and H mutually know that t is R if and only if there is a state of affairs G such that S and H have reason to believe that G holds, G indicates to them that they have such reason, and G indicates to them that t is R. Typically, G will be either (1) community membership (i.e., shared world knowledge), for example, when t is a proper name; (2) physical copresence (i.e., a shared environment), for example, where t is an indexical; or (3) linguistic copresence (i.e., a shared discourse), for example, where t is anaphoric (see Ref. 73 for a critique).

Mutual knowledge has been further investigated by Appelt (4,74) and Nadathur and Joshi (75). Appelt's planning system is an intellectual descendant of the work of Allen, Cohen, Perrault, and Moore. It reasons about A's and B's mutual knowledge by reasoning about the knowledge of a (virtual) agent, the "kernel," whose knowledge is characterized by the union of sets of possible worlds that are consistent with A's and B's knowledge. Nadathur and Joshi replace Clark and Marshall's (72) requirement of mutual knowledge for successful reference by a weaker criterion: if S knows or believes that H knows or believes that t is R, and if there is no reason to doubt that this is mutual knowledge, then S conjectures that it is mutual knowledge. This is made precise by using Konolige's logic (14) to formulate a sufficient condition for S's using t to refer to R.

Other Theories. Other formal work has been done by Taylor and Whitehill (76) on deception and by Airenti et al. (77) on the interaction of belief with conceptual and episodic knowledge.
More Psychological than Formal

Wilks and Bien. The various logics of nested beliefs in general and of mutual beliefs in particular each face the threat of infinite nestings or combinatorial explosions of nestings. Wilks and Bien (10,20) have attempted to deal with this threat by using what might be called psychological heuristics. Their work is based on Bien's (78) approach of treating natural-language utterances as programs to be run in "multiple environments" (one of the earliest forms of belief spaces): a global environment would represent a person P, and local environments would represent P's models of his or her interlocutors. The choice of which environment within which to evaluate a speaker's utterance U depends on P's attitude toward the discourse: if P believes the speaker, then U would be evaluated in P's environment, else in P's environments for the speaker and hearer. Wilks and Bien use this technique to provide an algorithm for constructing nested beliefs, given the psychological reality of processing limitations. They offer two general strategies for creating environments: (1) "Presentation" strategies determine how deeply nested an environment should be to represent information about someone. The "minimal" presentation strategy, for simple cases, constructs a level only for the subject of the information but none for the speaker; the "standard" presentation strategy constructs levels for both speaker and subject; and "reflexive" presentation strategies construct more complex nestings. (2) "Insertional" strategies determine where to store the speaker's information about the subject; for example, the "scatter gun" insertion strategy would be to store it in all relevant environments. A local environment is represented as a list of statements indexed by their behavior and nested within a relatively global environment: A{B} represents A's beliefs about B; A{B{C}} represents A's beliefs about B's beliefs about C. Suppose a USER informs the SYSTEM about person A.
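The nested-environment machinery that Wilks and Bien describe, a temporary SYSTEM{USER{A}} built by pushing one environment down into another, with uncontradicted beliefs later flowing back, can be given a minimal computational reading. This is a hypothetical Python sketch: the set-of-sentences representation and the function names are invented for illustration and follow the prose description, not their actual implementation.

```python
# Illustrative sketch of nested belief environments in the style of
# Wilks and Bien.  An environment is just a set of sentence strings.

def negate(p):
    """Toy negation on sentence strings."""
    return p[4:] if p.startswith("not ") else "not " + p

def push_down(outer, inner):
    """Build a temporary nested environment: the outer environment's
    beliefs are assumed to hold in the nested view unless the inner
    environment explicitly contradicts them (a "contradiction" heuristic)."""
    nested = set(inner)
    for p in outer:
        if negate(p) not in inner:
            nested.add(p)
    return nested

def percolate(nested, target):
    """Uncontradicted beliefs in the temporary environment remain in the
    target environment afterward -- learning with no memory of the source."""
    for p in nested:
        if negate(p) not in target:
            target.add(p)
    return target

system_about_A = {"A is a doctor"}
system_about_user_about_A = {"A is rich", "not A is a doctor"}

# SYSTEM{USER{A}}: the USER's explicit view overrides the default.
nested = push_down(system_about_A, system_about_user_about_A)
assert "not A is a doctor" in nested and "A is a doctor" not in nested

# Percolation: "A is rich" flows back into SYSTEM{A}; the contradicted
# belief about being a doctor does not displace the existing one.
system_about_A = percolate(nested, system_about_A)
assert "A is rich" in system_about_A
```

The final assertion illustrates the worry raised in the text: after percolation the SYSTEM holds "A is rich" with no record that it came from the USER.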
To interpret the USER's utterance, a nested environment within which to run it is constructed, only temporarily, as follows: SYSTEM{A} and SYSTEM{USER} are constructed, and the former is "pushed down into" the latter to produce SYSTEM{USER{A}}. Pushing down is done according to several heuristics: (1) "Contradiction" heuristics: The SYSTEM's beliefs about the USER's beliefs about A are assumed to be the SYSTEM's beliefs about A unless there is explicit evidence to the contrary. (2) Pragmatic inference rules change some of the SYSTEM's beliefs about A into the SYSTEM's beliefs about A's beliefs about A. (3) "Relevance" heuristics: Those of the SYSTEM's beliefs about the USER's beliefs that explicitly mention or describe A become part of the SYSTEM's beliefs about A. (4) "Percolation" heuristics: Beliefs in SYSTEM{USER{A}} that are not contradicted remain in SYSTEM{A} when the temporary nested environment is no longer needed for evaluation purposes. Thus, percolation seems to be a form of learning by means of trustworthiness, though there is no memory of the source of the new beliefs in SYSTEM{A} after percolation has occurred; that is, the SYSTEM changes its beliefs about A by merely contemplating its beliefs about the USER's beliefs. Other difficulties concern "self-embedded" beliefs: In SYSTEM{SYSTEM}, there are no beliefs that the SYSTEM has about the SYSTEM that are not its own beliefs, but surely a SYSTEM might believe things that it does not believe that it believes; and there are potential problems about quasi-indicators when SYSTEM{A} is pushed down into itself to produce SYSTEM{A{A}}.

Colby. Although the work of Wilks and Bien has a certain
formality to it, they are not especially concerned with the explicit logic of a belief operator, an accessibility relation, or a formal logic. The lack of concern with such issues may be taken to be the mark of the more psychological approaches. The pioneers of this approach were Colby and Abelson and their co-workers.

Colby and Smith (19) constructed an "artificial belief system," ABS1. ABS1 had three modes of operation: During "talktime" a user would input sentences, questions, or rules; these would be entered on lists for that user (perhaps like a belief space; but see below). If the input were a question, ABS1 would either search the user's statement list for an answer (taking the most recent if there were more than one answer), or deduce an answer from the statement list by the rules, or else generate an answer from other users' lists. During "questiontime" ABS1 would search the user's statement list for similarities and ask the user questions about possible rules; the user's replies would enable ABS1 to formulate new rules. ABS1 would also ask the user's help in categorizing concepts. During "thinktime" ABS1 would infer new facts (assigned to a "self" list) and compute "credibility" weightings for the facts, rules, and user.

It should be noted that beliefs in this system are merely statements on a user's list, which makes this approach seem very much like the database approach criticized by Moore (2). Moore's objections are as follows: (1) If the system does not know which of two propositions p or q a user believes, then it must set up two databases for the user, one containing p and one containing q, leading to combinatorial explosion. (2) The system cannot represent that the user does not believe that p, since neither of the two database alternatives (omitting p or listing ¬p) is an adequate representation. Although these are serious problems, Colby and Smith's ABS1 seems not to have them.
First, ABS1 only reasons about explicit beliefs; thus, it would never have to represent the problematic cases. Of course, a more psychologically adequate system would have to. Second, ABS1 does not appear to reason about the fact that a user believes a statement but only about the statement and ABS1's source for its believing the statement. In Colby (79) a belief is characterized as an individual's judgment of acceptance, rejection, or suspended judgment toward a conceptual structure consisting of concepts (representations of objects in space and time, together with their properties) and their interrelations. A statement to the effect that A believes that p is treated dispositionally (if not actually behavioristically) as equivalent to a series of conditionals asserting what A would say under certain circumstances. More precisely, "U Believes C, T" if and only if experimenter E takes the linguistic reaction (i.e., judgment of credibility) of language user U to an assertion conceptualized as C as an indicator of U's belief in C during time T. Thus, what is represented are the objects of a user's beliefs, not the fact that they are believed. Various psychologically interesting types of belief systems (here understood as sets of interacting beliefs), neurotic, paranoid, and so on, can then be investigated by "simulating" them. The most famous such system is Colby's PARRY (80,81), which has been the focus of much controversy [see Colby (82) and Weizenbaum's (83) critique].

Abelson. A similar research program has been conducted by Abelson and co-workers (12,15). Underlying their work is a theory of "implicational molecules," that is, sets of sentences that "psychologically" (i.e., pragmatically) imply each other;
for example, a "purposive-action" molecule might consist of the sentence forms "person A does action X," "X causes outcome Y," and "A wants Y." The key to their use in a belief system is what Abelson and Reich consider a Gestalt-like tendency for a person who has such a molecule to infer any one of its members from the others. Thus, a computer simulation of a particular type of belief system can be constructed by identifying appropriate molecules, letting the system's beliefs be sentences connected in those molecules (together with other structures, such as Schank's "scripts") and then having the system understand or explicate input sentences in terms of its belief system. A model of a right-wing politician was constructed in this manner [see also the discussions of Colby's as well as Abelson's work in Boden (84)].

User Models. An extended, database type of belief system is exemplified by user models such as those investigated by Rich (7,8). Here, instead of the system being a model of a mind, the system must construct a model of the user's mind, yet many of the techniques are similar in both cases. A user model consists of properties of the user ("facts") ranked in terms of importance and by degree of certainty (or confidence) together with their justifications. The facts come from explicit user input, from inferences based on these and on "stereotypes" (so that only minimal explicit user input is needed), and from the user's behavior (so that the model is not merely the user's self-model). The user model is built dynamically during interaction with the user.
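A user model of the kind Rich describes can be sketched roughly as follows. This is hypothetical Python; the field names, the rating scheme, and the sample stereotype are invented for illustration and are not from Rich's systems:

```python
# Rough sketch of a stereotype-based user model in the spirit of Rich's
# work: facts carry a certainty and a justification, and a triggered
# stereotype supplies low-certainty defaults that stronger information
# (explicit input, observed behavior) can override.

from dataclasses import dataclass

@dataclass
class Fact:
    value: object
    certainty: float      # degree of confidence in this fact
    justification: str    # why the model holds it

# A stereotype supplies defaults once triggered (contents invented).
SCIENTIST = {"reads_technical_material": Fact(True, 0.6, "stereotype: scientist")}

def add_fact(model, name, fact):
    """Keep whichever fact about `name` is held with greater certainty."""
    if name not in model or fact.certainty >= model[name].certainty:
        model[name] = fact

model = {}
add_fact(model, "occupation", Fact("physicist", 0.9, "explicit user input"))

# Trigger the stereotype: defaults enter only where nothing stronger exists.
for name, fact in SCIENTIST.items():
    add_fact(model, name, fact)

# A later behavior-based inference overrides the stereotype default.
add_fact(model, "reads_technical_material",
         Fact(False, 0.8, "inferred from user's behavior"))

assert model["reads_technical_material"].value is False
```

The final override illustrates the point in the text that the model is built dynamically and is not merely the stereotype or the user's self-description.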
Discussion and Conclusions

If there is any criticism to be leveled at the wide variety of current research, it is that the formal systems have not been sufficiently informed by psychology (and, hence, behave more like logicians than like ordinary people), and the psychological theories have not been flexible enough to handle some of the logical subtleties (which ordinary people, perhaps with some instruction, are certainly capable of). What is needed is a robust system whose input-output performance (if not the intervening algorithms) is psychologically plausible but whose underlying logic is competent, if needed, to handle the important (if often ignored) formal subtleties.

In spite of radically differing approaches and terminology, it seems clear that AI research into belief systems shares common issues and goals. This can be brought out by discussing Abelson's (85) characterization of a belief system. For Abelson, a "system" is a "network of interrelated concepts and propositions" and rules, with procedures for accessing and manipulating them. Such a system is a "belief system" if:

1. The system's elements are not consensual. This can be taken, perhaps, either as a rejection of Bp → p or as Wilks and Bien's heuristics. By contrast, a "knowledge system" would be consensual. Abelson urges that 1 be exploited by AI belief systems even though it makes them nongeneralizable.

2. The system is concerned with existence questions about certain conceptual objects. The need to have a logic of the intensional objects of belief may be seen as a version of 2, even though 1 and 2 make it difficult to deal with beliefs that are held in common.
3. The system includes representations of "alternative worlds." This desideratum may be taken as covering the notions of possible worlds and of nested and mutual beliefs.

4. The system relies on evaluative and affective components.

5. The system includes episodic material. A "knowledge system" would rely more on general knowledge and principles. Clearly, though, a full system would need both.

6. The system's boundaries are vague.

7. The system's elements are held with different degrees of certitude.

Although these criteria are psychologically oriented, many of them are also applicable to formal approaches. In particular, 1-3 and 7 are relevant to logical issues; 4-7 are relevant to psychological issues. Indeed, except for the choice of underlying logic, most of the systems discussed here seem compatible, their differences arising from differences in aim and focus. For instance, Abelson and Reich's implicational molecules could be among the rules in Konolige's system. Note that the rules do not have to be "logical" if they do not need to be consistent; moreover, as mentioned earlier, there might not be any (psychologically plausible) logic of belief. As a consequence, a psychologically plausible belief system, whether "formal" or not, must be able to deal with incompatible beliefs. This could be done by a belief revision mechanism or by representational or reasoning techniques that prevent the system from becoming "aware" of its inconsistencies (with, of course, occasional exceptions, as in real life). It is, thus, the general schemes for representation and reasoning that seem most important and upon which, as a foundation, specific psychological heuristics may be built. In this way, too, it may be possible to overcome the computational complexity that is inevitably introduced when the underlying inference package is made to be as powerful as envisaged by, say, Konolige or when the underlying representational scheme is made to be as complete as proposed by, say, Rapaport and Shapiro.
A psychologically adequate "shell" that would be efficient at handling ordinary situations could be built on top of a logically adequate "core" that was capable of overriding the shell if necessary for correct interpretation. The trade-offs between psychological and logical adequacy that have been made in most current systems can, in principle, be overcome. (They have, after all, been overcome in those humans who study the logic of belief yet have not been hindered from interacting in ordinary conversational situations.) Whether it is more feasible to make a formally adequate system psychologically adequate or to "teach" a psychologically adequate system to be logically subtle remains an interesting research issue.
BIBLIOGRAPHY

1. J. McCarthy and P. J. Hayes, "Some philosophical problems from the standpoint of artificial intelligence," in B. Meltzer and D. Michie (eds.), Machine Intelligence, Vol. 4, Edinburgh University Press, Edinburgh, pp. 463-502, 1969; reprinted in B. L. Webber and N. J. Nilsson (eds.), Readings in Artificial Intelligence, Tioga, Palo Alto, CA, pp. 431-450, 1981.

2. R. C. Moore, "Reasoning about knowledge and action," Proc. of the Fifth IJCAI, Cambridge, MA, 223-227 (1977); reprinted in B. L. Webber and N. J. Nilsson (eds.), Readings in Artificial Intelligence, Tioga, Palo Alto, CA, pp. 473-478, 1981.

3. P. R. Cohen and C. R. Perrault, "Elements of a plan-based theory of speech acts," Cognitive Science 3, 177-212 (1979); reprinted in B. L. Webber and N. J. Nilsson (eds.), Readings in Artificial Intelligence, Tioga, Palo Alto, CA, pp. 428-445, 1981.

4. D. E. Appelt, "A planner for reasoning about knowledge and action," Proc. of the First AAAI, Stanford, CA, 131-133 (1980).

5. J. McCarthy, "Epistemological problems of artificial intelligence," Proc. of the Fifth IJCAI, Cambridge, MA, 1038-1044 (1977).

6. K. Konolige and N. J. Nilsson, "Multiple-agent planning systems," Proc. of the First AAAI, Stanford, CA, 138-144, 1980.

7. E. Rich, "Building and exploiting user models," Proc. of the Sixth IJCAI, Tokyo, Japan, 720-722, 1979.

8. E. Rich, "User modeling via stereotypes," Cognitive Science 3, 329-354 (1979).
9. J. McCarthy, "First-order theories of individual concepts and propositions," in J. E. Hayes, D. Michie, and L. I. Mikulich (eds.), Machine Intelligence, Vol. 9, Ellis Horwood, Chichester, pp. 129-147, 1979.

10. Y. Wilks and J. Bien, "Speech acts and multiple environments," Proc. of the Sixth IJCAI, Tokyo, Japan, 968-970, 1979.

11. H. J. Levesque, "Foundations of a functional approach to knowledge representation," Artif. Intell. 23, 155-212 (1984).

12. R. P. Abelson and C. M. Reich, "Implicational molecules: A method for extracting meaning from input sentences," Proc. of the First IJCAI, Washington, D.C., 641-642, 1969.

13. J. Y. Halpern and D. A. McAllester, Likelihood, Probability, and Knowledge, IBM Research Report RJ 4313 (47141), 1984; shorter version in Proc. of the Fourth AAAI, 137-141, 1984.

14. K. Konolige, "A deductive model of belief," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 377-381, 1983.

15. R. P. Abelson, "The structure of belief systems," in R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, W. H. Freeman, San Francisco, CA, pp. 287-339, 1973.

16. A. S. Maida and S. C. Shapiro, "Intensional concepts in propositional semantic networks," Cognitive Science 6, 291-330 (1982).

17. L. G. Creary, "Propositional attitudes: Fregean representation and simulative reasoning," Proc. of the Sixth IJCAI, Tokyo, Japan, 176-181, 1979.

18. A. S. Maida, "Knowing intensional individuals, and reasoning about knowing intensional individuals," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 382-384, 1983.

19. K. M. Colby and D. C. Smith, "Dialogues between humans and an artificial belief system," Proc. of the First IJCAI, Washington, D.C., 319-324, 1969.

20. Y. Wilks and J. Bien, "Beliefs, points of view, and multiple environments," Cognitive Science 7, 95-116 (1983).

21. E. L. Gettier, "Is justified true belief knowledge?," Analysis 23, 121-123 (1963); reprinted in A. P. Griffiths (ed.), Knowledge and Belief, Oxford University Press, Oxford, 1967.

22. J. H. Fetzer, "On defining 'knowledge'," AI Mag. 6, 19 (Spring 1985).

23. J. Hintikka, Knowledge and Belief: An Introduction to the Logic of the Two Notions, Cornell University Press, Ithaca, NY, 1962.

24. J. Hintikka, "Semantics for propositional attitudes," in J. W. Davis et al. (eds.), Philosophical Logic, D. Reidel, Dordrecht, pp. 21-45, 1969; reprinted in Ref. 28, pp. 145-167.
25. H.-N. Castañeda, Review of Ref. 23, J. Symbolic Logic 29, 132-134 (1964).

26. M. Sato, A Study of Kripke-Type Models for Some Modal Logics by Gentzen's Sequential Method, Kyoto University Research Institute for Mathematical Sciences, Kyoto, 1976.

27. J. McCarthy, M. Sato, T. Hayashi, and S. Igarashi, On the Model Theory of Knowledge, Stanford Artificial Intelligence Laboratory Memo AIM-312, Stanford University, 1978.

28. L. Linsky (ed.), Reference and Modality, corrected edition, Oxford University Press, Oxford, 1977.

29. P. Edwards (ed.), Encyclopedia of Philosophy, Macmillan and Free Press, New York, 1967.

30. B. H. Partee, "The semantics of belief-sentences," in K. J. J. Hintikka, J. M. E. Moravcsik, and P. Suppes (eds.), Approaches to Natural Language: Proceedings of the 1970 Stanford Workshop on Grammar and Semantics, D. Reidel, Dordrecht, pp. 309-330, 1973.

31. B. H. Partee, "Belief-sentences and the limits of semantics," in S. Peters and E. Saarinen (eds.), Processes, Beliefs, and Questions: Essays on Formal Semantics of Natural Language and Natural Language Processing, D. Reidel, Dordrecht, pp. 87-106, 1982.

32. J. Moravcsik, "Comments on Partee's paper," in K. J. J. Hintikka, J. M. E. Moravcsik, and P. Suppes (eds.), Approaches to Natural Language: Proceedings of the 1970 Stanford Workshop on Grammar and Semantics, D. Reidel, Dordrecht, pp. 349-369, 1973.

33. R. C. Moore and G. G. Hendrix, "Computational models of belief and the semantics of belief sentences," in S. Peters and E. Saarinen (eds.), Processes, Beliefs, and Questions: Essays on Formal Semantics of Natural Language and Natural Language Processing, D. Reidel, Dordrecht, pp. 107-127, 1982.

34. W. J. Rapaport, "Meinongian theories and a Russellian paradox," Noûs 12, 153-180 (1978); errata, 13, 125 (1979).

35. G. Frege, "On sense and reference" (1892), translated by M. Black in P. Geach and M. Black (eds.), Translations from the Philosophical Writings of Gottlob Frege, Basil Blackwell, Oxford, U.K., pp. 56-78, 1970.

36. R. C. Moore, "D-SCRIPT: A computational theory of descriptions," Proc. of the Third IJCAI, Stanford, CA, 223-229, 1973.

37. J. A. Barnden, "Intensions as such: An outline," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 280-286, 1983.

38. G. G. Hendrix, "Encoding knowledge in partitioned networks," in N. V. Findler (ed.), Associative Networks, Academic Press, New York, pp. 51-92, 1979.

39. P. F. Schneider, "Contexts in PSN," Proc. CSCSI 3, 71-78 (1980).

40. A. R. Covington and L. K. Schubert, "Organization of modally embedded propositions and of dependent concepts," Proc. CSCSI 3, 87-94 (1980).

41. R. E. Filman, J. Lamping, and F. S. Montalvo, "Meta-language and meta-reasoning," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 365-369, 1983.

42. A. Meinong, "Über Gegenstandstheorie" (1904), in R. Haller (ed.), Alexius Meinong Gesamtausgabe, Vol. 2, Akademische Druck- u. Verlagsanstalt, Graz, pp. 481-535, 1971; English translation ("The Theory of Objects") by I. Levi et al., in R. M. Chisholm (ed.), Realism and the Background of Phenomenology, Free Press, New York, pp. 76-116, 1960.

43. H.-N. Castañeda, "Thinking and the structure of the world," Philosophia 4, 3-40 (1974); originally written in 1972; reprinted in Crítica 6, 43-86 (1975).

44. W. J. Rapaport, "Meinongian semantics for propositional semantic networks," Proc. ACL 23, 43-48 (1985).

45. W. J. Rapaport and S. C. Shapiro, "Quasi-indexical reference in propositional semantic networks," Proc. COLING-84, 65-70, 1984.

46. W. J. Rapaport, "Logical foundations for belief representation," Cognitive Science 10, 371-422 (1986).

47. R. C. Moore, Reasoning about Knowledge and Action, Technical Note No. 191, SRI International, Menlo Park, CA, 1980.

48. K. Konolige, "Circumscriptive ignorance," Proc. of the Second AAAI, Pittsburgh, PA, 202-204, 1982.

49. A. Kobsa and H. Trost, "Representing belief models in semantic networks," Cybern. Sys. Res. 2, 753-757 (1984).

50. A. Kobsa, "VIE-DPM: A user model in a natural-language dialogue system," in Proc. 8th German Workshop on Artificial Intelligence, Berlin, 1984.

51. A. Kobsa, "Three steps in constructing mutual belief models from user assertions," in Proc. 6th European Conference on Artificial Intelligence, Pisa, Italy, 1984.

52. A. Kobsa, "Generating a user model from wh-questions in the VIE-LANG system," in Proc. GLDV Meeting on Trends in Linguistischer Datenverarbeitung, 1984.

53. M. Xiwen and G. Weide, "W-JS: A modal logic of knowledge," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 398-401, 1983.

54. S. Soulhi, "Representing knowledge about knowledge and mutual knowledge," Proc. COLING, 194-199, 1984.

55. J. McCarthy, "Circumscription: A form of non-monotonic reasoning," Artif. Intell. 13, 27-39 (1980).

56. R. C. Moore, "Problems in logical form," Proc. ACL 19, 117-124 (1981).

57. K. Konolige, Belief and Incompleteness, CSLI Report No. CSLI-84-4, Stanford University, 1984.

58. H. J. Levesque, "The interaction with incomplete knowledge bases: A formal treatment," Proc. of the Seventh IJCAI, Vancouver, Brit. Col., 240-245, 1981.

59. H. J. Levesque, "A logic of implicit and explicit belief," Proc. of the Fourth AAAI, Austin, TX, 198-202, 1984.

60. A. R. Anderson and N. D. Belnap, Jr., Entailment: The Logic of Relevance and Necessity, Princeton University Press, Princeton, NJ, 1975.

61. M. Nilsson, "A logical model of knowledge," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 374-376, 1983.

62. J. Y. Halpern and Y. Moses, Knowledge and Common Knowledge in a Distributed Environment, IBM Research Report RJ 4421 (47909), 1984.

63. J. Y. Halpern, Towards a Theory of Knowledge and Ignorance: Preliminary Report, IBM Research Report RJ 4448 (48136), 1984.

64. J. Y. Halpern and M. O. Rabin, A Logic to Reason about Likelihood, IBM Research Report RJ 4136 (45774), 1983.

65. R. Fagin, J. Y. Halpern, and M. Y. Vardi, A Model-Theoretic Analysis of Knowledge: Preliminary Report, IBM Research Report RJ 4373 (47631), 1984; also in Proc. 25th IEEE Symposium on Foundations of Computer Science, 1984.

66. J. R. Searle, "What is a speech act?," in M. Black (ed.), Philosophy in America, Allen and Unwin, London, pp. 221-239, 1965; reprinted in J. R. Searle (ed.), The Philosophy of Language, Oxford University Press, Oxford, pp. 39-53, 1971.

67. P. R. Cohen and H. J. Levesque, "Speech acts and the recognition of shared plans," Proc. CSCSI 3, 263-271, 1980.

68. J. F. Allen and C. R. Perrault, "Analyzing intention in utterances," Artif. Intell. 15, 143-178 (1980).

69. J. F. Allen, "Towards a general theory of action and time," Artif. Intell. 23, 123-154 (1984).

70. C. L. Sidner and D. J. Israel, "Recognizing intended meaning and speaker's plans," Proc. of the Seventh IJCAI, Vancouver, Brit. Col., 203-208, 1981.

71. C. L. Sidner, "What the speaker means: The recognition of speakers' plans in discourse," in N. Cercone (ed.), Computational Linguistics, Pergamon Press, Oxford, pp. 71-82, 1983.

72. H. H. Clark and C. R. Marshall, "Definite reference and mutual knowledge," in A. Joshi, B. Webber, and I. Sag (eds.), Elements of Discourse Understanding, Cambridge University Press, Cambridge, U.K., pp. 10-63, 1981.

73. C. R. Perrault and P. R. Cohen, "It's for your own good: A note on inaccurate reference," in A. Joshi, B. Webber, and I. Sag (eds.), Elements of Discourse Understanding, Cambridge University Press, Cambridge, U.K., pp. 217-230, 1981.

74. D. E. Appelt, "Planning natural-language utterances," Proc. AAAI, Pittsburgh, PA, 59-62, 1982.

75. G. Nadathur and A. K. Joshi, "Mutual beliefs in conversational systems: Their role in referring expressions," Proc. of the Eighth IJCAI, Karlsruhe, FRG, 603-605, 1983.

76. G. B. Taylor and S. B. Whitehill, "A belief representation for understanding deception," Proc. of the Seventh IJCAI, Vancouver, Brit. Col., 388-393, 1981.

77. G. Airenti, B. G. Bara, and M. Colombetti, "Knowledge and belief as logical levels of representation," Proc. Cogn. Sci. Soc. 4, 212-214 (1982).

78. J. S. Bien, "Towards a multiple environments model of natural language," Proc. of the Fourth IJCAI, Tbilisi, Georgia, 379-382, 1975.

79. K. M. Colby, "Simulations of belief systems," in R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, W. H. Freeman, San Francisco, CA, pp. 251-286, 1973.

80. K. M. Colby, S. Weber, and F. D. Hilf, "Artificial paranoia," Artif. Intell. 2, 1-25 (1971).

81. K. M. Colby, F. D. Hilf, S. Weber, and H. C. Kraemer, "Turing-like indistinguishability tests for the validation of a computer simulation of paranoid processes," Artif. Intell. 3, 199-221 (1972).

82. K. M. Colby, "Modeling a paranoid mind," Behav. Brain Sci. 4, 515-560 (1981).

83. J. Weizenbaum, "Automating psychotherapy," ACM Forum 17, 543 (1974); reprinted with replies, CACM 26, 28 (1983).

84. M. Boden, Artificial Intelligence and Natural Man, Basic Books, New York, 1977.

85. R. P. Abelson, "Differences between belief and knowledge systems," Cognitive Science 3, 355-366 (1979).

86. B. C. Bruce, Belief Systems and Language Understanding, BBN Report No. 2973, 1975.

87. T. D. Parsons, "Frege's hierarchies of indirect senses and the paradox of analysis," in P. A. French et al. (eds.), Midwest Studies in Philosophy 6, 3-57 (1981).

W. J. Rapaport
SUNY Buffalo
BELLE

A chess-playing system (see Computer chess methods) developed at Bell Laboratories by Joe Condon and Ken Thompson, BELLE won the World Computer Chess Championship in 1983 and was rated at the master level. The system contains specialized hardware [see P. Frey (ed.), Chess Skill in Man and Machine, 2nd ed., Springer-Verlag, New York, 1983].
BLACKBOARD SYSTEMS

Blackboard systems are domain-specific problem-solving (qv) systems that exploit the blackboard architecture and exhibit a characteristically incremental and opportunistic problem-solving style. The blackboard architecture was developed by Erman, Hayes-Roth, Lesser, and Reddy (1) for the HEARSAY-II speech-understanding system. Since then, it has been exploited in a wide range of knowledge-based systems (2-9) (see Expert systems) and psychological simulations (10-14). Four illustrative blackboard systems, HEARSAY-II, HASP, CRYSALIS, and OPM, and important architectural variations they introduce, are described below. Three blackboard system-building environments, HEARSAY-II, AGE, and BB1, are also described.

Motivating Objectives for the Blackboard Architecture

The blackboard architecture was designed to achieve several objectives that emerged in the HEARSAY-II speech-understanding project and reappear in a broad range of problem-solving domains:

1. To reduce the combinatorics of search (qv): Even with a restricted vocabulary and domain of discourse, the speech-understanding problem entailed a space of utterances too large for conventional search techniques.

2. To incorporate diverse sorts of knowledge in a single problem-solving system: The speech-understanding problem brought with it several sorts of knowledge (e.g., syntax, phonetics, word transition probabilities) but no method for integrating them in a single program.

3. To compensate for unreliability in the available knowledge: Much of the available speech-understanding knowledge was heuristic (qv).

4. To compensate for uncertainty in the available data: The acoustic signal for speech is inherently ambiguous, occurs against a noisy background, and incorporates idiosyncrasies in the speaker's articulation, diction, grammar, and conceptualization of utterances.

5.
To apply available knowledge intelligently in the absence of a known problem-solving algorithm: Much of the available speech-understanding knowledge was simultaneously applicable, supporting multiple potential inferences from each intermediate problem-solving state but providing no known algorithm to guide the inference process.

6. To support cooperative system development among multiple system builders: Approximately seven individuals cooperated to design and implement HEARSAY-II.

7. To support system experimentation, modification, and evolution: Because HEARSAY-II was an experimental research effort, all aspects of the system evolved gradually over a period of several years.

The Blackboard Architecture: Defining Features and Characteristic Behavior
K. S. Anone
SUNY at Buffalo
BIT-MAP DISPLAY. See Visual-depth map.
Defining Features. The blackboard architecture has three defining features: a global database called the blackboard, independent knowledge sources that generate solution elements on the blackboard, and a scheduler to control knowledge
source activity. These features are described directly below and illustrated with examples from HEARSAY-II. HEARSAY-II is discussed in more detail in a later section.

All solution elements generated during problem solving are recorded in a structured, global database called the blackboard. The blackboard structure organizes solution elements along two axes, solution intervals and levels of abstraction. Different solution intervals represent different regions of the solution on some problem-specific dimension, for example, different time intervals in the speech signal. Different levels of abstraction represent the solution in different amounts of detail, for example, the phrases, words, and syllables entailed in the speech signal. Solution elements at particular blackboard locations are linked to supporting elements in the same solution interval at lower levels. For example, the phrase "Are any by Feigenbaum and Feldman" in interval 1-225 in the speech signal might be supported by the word "Feigenbaum" in interval 70-150 and the syllable "Fa" in interval 70-95.

Solution elements are generated and recorded on the blackboard by independent processes called knowledge sources. Knowledge sources have a condition-action format. The condition describes situations in which the knowledge source can contribute to the problem-solving process. Ordinarily, it requires a particular configuration of solution elements on the blackboard. The action specifies the knowledge source's behavior. Ordinarily, it entails the creation or modification of solution elements on the blackboard. Only knowledge sources whose conditions are satisfied can perform their actions. For example, the knowledge source MOW's condition requires the appearance of new syllable hypotheses on the blackboard. MOW's action generates new word hypotheses encompassing sequential subsets of the syllables.

Knowledge sources may exploit both top-down and bottom-up inference methods (see Processing, bottom up and top down).
For example, MOW generates new word hypotheses bottom up by integrating syllable hypotheses. The knowledge source PREDICT generates new word hypotheses top down by extending phrase hypotheses.

Knowledge sources are independent in that they do not invoke one another and ordinarily have no knowledge of each other's expertise, behavior, or existence. They are cooperative in that they contribute solution elements to a shared problem. They influence one another only indirectly, by anonymously responding to and modifying information recorded on the blackboard.

Although implementations vary, in most blackboard systems knowledge source activity is event driven. Each change to the blackboard constitutes an event that, in the presence of specific other information on the blackboard, can trigger (satisfy the condition of) one or more knowledge sources. Each such triggering produces a unique knowledge source activation record (KSAR) representing a unique triggering of a particular knowledge source by a particular blackboard event. Because several KSARs may be triggered simultaneously and compete to execute their actions, a scheduler selects a single KSAR to execute its action on each problem-solving cycle. The scheduler may use a variety of criteria, such as the credibility of a KSAR's triggering information, the reliability of its knowledge source, or the importance of the solution element it would generate. When a KSAR is scheduled, its knowledge source action executes in the context of its triggering information, typically producing new blackboard events.
These events may trigger knowledge sources, creating new KSARs to compete for scheduling priority with previously triggered, not yet executed KSARs (see Agenda-based systems).

Characteristic Behavior. Blackboard systems construct solutions incrementally. On each problem-solving cycle a single KSAR executes, generating or modifying a small number of solution elements in particular blackboard locations. Along the way some elements are assembled into growing partial solutions; others may be abandoned. Eventually a satisfactory configuration of solution elements is assembled into a complete solution, and the problem is solved.

Blackboard systems apply knowledge opportunistically. On each problem-solving cycle the scheduler uses a set of heuristic criteria to select a KSAR to execute its action. Depending on the heuristics available to the scheduler, this may produce a more or less orderly approach to solving the problem. At one extreme the scheduler may follow a rigorous procedure, scheduling a planned sequence of KSARs that monotonically assemble compatible solution elements. At the other extreme it may apply many conflicting heuristics that are extremely sensitive to unanticipated problem-solving states, scheduling KSARs that assemble disparate, competing solution elements out of which a complete solution only gradually emerges.

The Blackboard Architecture's Approach to the Objectives

Each feature of the blackboard architecture is designed to address one or more of the seven objectives introduced above.

1. To reduce the combinatorics of search: First, the blackboard architecture integrates reasoning (qv) at multiple levels of abstraction. An application system can solve a simplified version of a problem and then use that solution to guide and limit exploration of a larger space of more detailed solutions (15,16). Second, the blackboard architecture provides independent knowledge sources and opportunistic scheduling.
As a consequence, an application system can generate and merge independent solution "islands," potentially reducing the search space dramatically (17,18).

2. To incorporate diverse sorts of knowledge in a single problem-solving system: The blackboard architecture preserves the distinctions among knowledge sources. It permits different knowledge sources to embody qualitatively different sorts of expertise, applying idiosyncratic processes to idiosyncratic representations. It permits them to operate independently, contributing solution elements when and where they can. Thus, the blackboard architecture finesses the problem of integrating different sorts of knowledge per se. Instead, it integrates the results of applying different sorts of knowledge.

3. To compensate for unreliability in the available knowledge: The blackboard architecture permits multiple knowledge sources to operate redundantly upon the same subproblem. An application system can combine the implications of several unreliable, but redundant knowledge sources to converge upon the most credible solution elements.

4. To compensate for uncertainty in the available data: The blackboard architecture permits different knowledge sources to embody top-down and bottom-up inference methods. An application system can exploit top-down knowledge sources to prune solution elements generated by bottom-up knowledge sources operating upon uncertain data. Conversely, it can exploit bottom-up knowledge sources to prune solution elements generated top down from uncertain expectations (see Processing, bottom up and top down).

5. To apply available knowledge intelligently in the absence of a known problem-solving algorithm (see Problem solving): The blackboard architecture provides an opportunistic scheduler that decides, on each problem-solving cycle, which potential action is most promising. The scheduler can integrate multiple, heuristic scheduling criteria. Its decisions depend on the available criteria and the current problem-solving situation.

6. To support cooperative system development among multiple system builders: The blackboard architecture permits functionally independent knowledge sources. Once a blackboard structure and representation of solution elements have been agreed upon, individual system builders can design and develop knowledge sources independently.

7. To support system modification and evolution: First, the blackboard architecture permits functionally independent knowledge sources, which can be added, removed, or modified individually. Second, the architecture makes a sharp distinction between domain knowledge and scheduling (see Domain knowledge). Modifications to knowledge sources need not affect the scheduler. Conversely, experimentation with different scheduling heuristics need not affect any knowledge sources.
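The event-driven control cycle described above (events trigger knowledge sources, triggerings produce KSARs, and a scheduler selects one KSAR per cycle) can be sketched in Python. This is a minimal illustration only: the class names, the priority formula, and the toy MOW-like knowledge source below are invented for this sketch, not drawn from HEARSAY-II or any other system described in this article (which were implemented in LISP).

```python
# Minimal sketch of a blackboard control loop. All names and the
# scheduling criterion (reliability x credibility) are illustrative.

class KnowledgeSource:
    def __init__(self, name, condition, action, reliability=1.0):
        self.name, self.reliability = name, reliability
        self.condition, self.action = condition, action

class KSAR:
    """Knowledge source activation record: one triggering of one KS."""
    def __init__(self, ks, event):
        self.ks, self.event = ks, event
        # A simple priority: KS reliability times event credibility.
        self.priority = ks.reliability * event.get("credibility", 1.0)

def solve(blackboard, knowledge_sources, initial_event, max_cycles=100):
    events, agenda = [initial_event], []
    for _ in range(max_cycles):
        # Each blackboard event may trigger several knowledge sources,
        # producing KSARs that compete on the agenda.
        for event in events:
            agenda.extend(KSAR(ks, event) for ks in knowledge_sources
                          if ks.condition(blackboard, event))
        events = []
        if not agenda:
            break  # no pending KSARs: processing halts
        # The scheduler selects a single KSAR per problem-solving cycle.
        ksar = max(agenda, key=lambda k: k.priority)
        agenda.remove(ksar)
        # Executing its action typically produces new blackboard events.
        events = ksar.ks.action(blackboard, ksar.event)
    return blackboard

# Toy bottom-up knowledge source in the spirit of MOW: it is triggered
# by syllable-level events and records a word-level hypothesis.
def mow_condition(bb, event):
    return event["level"] == "syllable"

def mow_action(bb, event):
    bb["word"].append(event["data"].upper())
    return []  # no new events, so the loop halts

mow = KnowledgeSource("MOW", mow_condition, mow_action, reliability=0.9)
result = solve({"word": []}, [mow],
               {"level": "syllable", "data": "fa", "credibility": 0.8})
# result["word"] == ["FA"]
```

The design point the sketch makes concrete is the indirection: knowledge sources never call one another; they interact only through events and the shared blackboard, and the scheduler alone decides execution order.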
Four Illustrative Blackboard Systems

This section describes four blackboard systems: HEARSAY-II (1), HASP (2), CRYSALIS (3), and OPM (4). These systems illustrate the range of problems attacked within the blackboard architecture and important variations on the architecture's major components.

HEARSAY-II. HEARSAY-II interprets single spoken sentences drawn from a 1000-word vocabulary that request information from a database. As discussed above, it operates on an ambiguous signal in the presence of acoustic noise complicated by idiosyncrasies in the vocabulary, syntax, pronunciation, and conceptual style of individual speakers. Given training with a speaker's voice, HEARSAY-II interprets requests with 90% accuracy within a factor of 10 of real time.

HEARSAY-II begins with a parameterized representation of the speech signal and attempts to generate a coherent semantic interpretation of it. Between these two extremes, parameter and database interface, HEARSAY-II generates hypotheses at five additional levels of abstraction: segment, syllable, word, word sequence, and phrase. The blackboard's solution intervals represent different time intervals within the speech signal (see also Parsing; Phonemes; Semantics; Speech understanding).

HEARSAY-II has 12 knowledge sources. Most knowledge sources operate bottom up, inferring hypotheses at one level of abstraction from data or hypotheses at lower levels. For example, the knowledge source MOW hypothesizes all words that encompass sequential subsets of previously generated syllable hypotheses. A few knowledge sources operate top down. For
example, PREDICT hypothesizes all words that might syntactically precede or follow a given phrase hypothesis. Finally, some knowledge sources operate within a single level of the blackboard. For example, RPOL rates the credibility of each new or modified hypothesis at every level.

In HEARSAY-II knowledge source conditions and actions are implemented as programs. Because they can be very large programs, both condition matching and action execution are scheduled. When a blackboard event occurs at a knowledge source's blackboard level of interest, it generates a "condition KSAR." When the condition KSAR is scheduled for execution, it runs the knowledge source's condition program. If the condition program concludes successfully, it generates an "action KSAR." When the action KSAR is scheduled for execution, it runs the knowledge source's action program and produces changes on the blackboard.

HEARSAY-II pursues a two-stage strategy. During phase 1 it schedules a sequence of KSARs that operate bottom up until it has generated all word-level hypotheses supported by the data. During phase 2 it opportunistically schedules competing KSARs. However, HEARSAY-II's scheduler has no explicit representation of the two-phase strategy. It applies a uniform set of control heuristics throughout the problem-solving process. The two-phase strategy is implicit in the engineering of different knowledge sources (see also Control structures).

During phase 1 three knowledge sources process the data bottom up to the word level. The knowledge source SEG is triggered by input of data at the parameter level and hypothesizes all encompassing segments. POM is triggered by the segment hypotheses and hypothesizes all encompassing syllables.
MOW is triggered by the syllable hypotheses and hypothesizes all encompassing word hypotheses. Each of these knowledge sources is triggered exactly once during phase 1, produces the single KSAR available for scheduling on its problem-solving cycle, and generates all possible hypotheses at its target level. Thus, although the scheduler knows nothing about phase 1, it has no alternative but to schedule SEG, POM, and MOW in sequence.

During phase 2 multiple knowledge sources are triggered on each problem-solving cycle, accumulating in a growing list of pending KSARs. The scheduler assigns each KSAR a priority based on its required computing resources, the credibility of its triggering events, the reliability of its knowledge source, and its potential to extend high-credibility partial solutions already on the blackboard. In general, on each problem-solving cycle the scheduler selects the single, highest priority KSAR to execute its action. However, if several pending KSARs propose to extend existing hypotheses of equal credibility, the scheduler selects all of them, effecting a breadth-first interlude in an otherwise depth-first search.

Processing halts when the system has pursued all credible partial hypotheses or when the system runs out of computing resources (time or space). In the former case the system produces the most complete and credible solution. In the latter case it may produce several equally complete and credible partial solutions.

As the first blackboard system, HEARSAY-II introduces the basic architectural features and the first specification of knowledge sources and scheduler. Regarding knowledge sources, HEARSAY-II specifies an unstructured, procedural representation for knowledge source conditions and actions. Both condition and action procedures produce KSARs for
scheduling. This specification allows individual system builders to tailor appropriate representations for different knowledge sources. It permits knowledge sources to examine all blackboard contents and perform any desired computations during both triggering and action execution. On the other hand, this specification entails computationally expensive methods for triggering and executing knowledge sources. Regarding scheduling, HEARSAY-II defines a sophisticated scheduler that incorporates multiple criteria to make purely opportunistic scheduling decisions. It exhibits the power of a global control strategy and implements it in the engineering of individual knowledge sources. These specifications allow HEARSAY-II to make intelligent scheduling decisions in the absence of a known algorithm for speech understanding. However, the combination of an opportunistic scheduler and carefully engineered knowledge sources is an unprincipled approach to scheduling.

HASP. HASP (2) interprets sonar signals from a circumscribed area of the ocean in real time. Given the locations, ranges, and descriptions of the outputs of several hydrophone arrays, it detects, identifies, localizes, groups, and characterizes the movement of each ship or other vessel in the area. Some of these vessels are friendly or neutral, and others are wary and elusive. In addition, HASP must perform its interpretation against the background noise and distortions of the ocean environment. Finally, because the ocean scene is dynamic, with many ships coming and going and changing their behavior, HASP must "solve" the interpretation problem repeatedly. Its output is a series of reports presenting "snapshots" of the changing scene. These reports also contain explanations justifying their constituent hypotheses (see also Military, applications in).

HASP begins with a line representation of the sonar signal and attempts to characterize the situation it represents. Between these two extremes, Line and Situation Board, HASP generates hypotheses at three additional levels: harmonics in the signal, sources of sound such as engines or propellers, and vessels such as submarines or aircraft carriers. Its solution intervals categorically distinguish different ocean regions.

HASP has approximately 40 knowledge sources. Most of them operate bottom up, inferring hypotheses at one level of abstraction from data or hypotheses at lower levels. For example, the knowledge source CROSS-ARRAY-RULES hypothesizes sources that encompass hypothesized harmonics. However, some knowledge sources operate top down, confirming expectations implicit in hypotheses at higher levels of abstraction. For example, the knowledge source SOURCE-INCORPORATION-RULES hypothesizes sources that are implicit in vessel hypotheses.

HASP uses a uniform condition-action syntax for all knowledge sources. Knowledge source conditions specify one or more predefined event labels representing classes of anticipated blackboard events. Actions are production systems whose rules generate, categorize, and label blackboard events. Rules categorize events as simple, clock, or expected events. Simple events add or modify hypotheses on the blackboard and can be processed by triggered knowledge sources at any time. Clock events also add or modify hypotheses, but they must be processed at particular times. Expected events describe expected blackboard modifications. Rules label events with the predefined labels used for triggering (see Rule-based systems).

HASP's scheduler iterates a hierarchical procedure that sequentially selects all currently due clock events in LIFO order, sequentially selects all confirmed expected events in LIFO order, and selects the highest priority simple event by the LIFO rule. For each selected event the scheduler executes a predetermined sequence of knowledge sources triggered by the event's label. HASP explains solution elements recorded on its blackboard by reviewing the sequence of knowledge source rules that produce them.

HASP introduces variations on both knowledge source specification and scheduling. Regarding knowledge source specification, HASP constrains the syntax of both condition and action components. The restriction of conditions to event labels provides an efficient mechanism for triggering knowledge sources. However, it requires coordination of all knowledge sources to produce and respond to a manageably small set of event labels. The production system representation used for knowledge source actions is conceptually neat. Regarding scheduling, HASP's hierarchical, event-based procedure is computationally efficient, but it severely limits flexibility in both the selection and sequencing of KSARs for execution.

CRYSALIS. CRYSALIS determines the spatial locations of a protein's constituent atoms. It uses two kinds of information, a complete description of the protein's amino acid sequence and its electron density map (EDM). An EDM is a function giving the density of the protein's electron cloud, often represented as a three-dimensional contour map. Peaks, or local maxima, in the EDM correspond to atoms or groups of atoms, with peak height providing an approximate function of their atomic number. Stripping away the low-density peaks on the EDM reveals its skeleton, a line graph structure approximating the connectivity among identifiable groups of atoms. Finally, segments of the skeleton represent meaningful components of protein structure (e.g., the backbone or side chain). Using the amino acid sequence and these features of the EDM, CRYSALIS can solve a medium-sized protein in a day. Like human protein crystallographers, it locates about 75% of the nonhydrogen atoms in the protein with an accuracy of 8 nm (see also Chemistry, AI in; Medical advice systems).

CRYSALIS uses an expanded blackboard. As discussed above, the EDM data themselves support hierarchical analysis independent of any efforts to interpret them. Accordingly, the CRYSALIS blackboard has two separate "panels," one for the EDM data and one for hypotheses. Each blackboard panel embodies different levels and solution intervals. The EDM panel has four levels: EDM points, peaks, nodes, and segments. Its solution intervals represent spatial location in the EDM. The hypothesis panel has three levels: atoms, superatoms (meaningful groups of atoms), and stereotypes (larger structures, like alpha-helices or beta-sheets). Its solution intervals represent different spatial locations in the protein. The blackboard permits links between related data and hypothesis elements as well as the conventional vertical interpanel links.

CRYSALIS's knowledge sources are structured like HASP's. They exploit predefined event labels and a production system representation for actions. However, CRYSALIS production rules are semantically more complex, referring to 250
LISP functions that define a crystallographic language for manipulating data and hypotheses.

CRYSALIS uses a knowledge-intensive scheduling procedure. The scheduler uses a domain-specific strategy in conjunction with global solution state to sequence domain-specific problem-solving tasks. It uses each task, in conjunction with local solution state, to select individual blackboard events. For each selected event it executes a predetermined sequence of knowledge sources triggered by the selected event's label.

CRYSALIS introduces variations on blackboard specification and scheduling. Regarding blackboard specification, CRYSALIS introduces different panels to distinguish reasoning about data from reasoning about interpretations of the data. (HEARSAY-II and HASP effectively finessed this problem by operating upon hand-coded data.) Regarding scheduling, CRYSALIS introduces a domain-specific scheduling procedure. By exploiting this knowledge, CRYSALIS further improves scheduling efficiency. Its knowledge-based scheduling procedure also provides a perspicuous framework for interpreting system behavior. Of course, this approach is possible only when an effective scheduling procedure is known.

OPM. OPM plans multiple-task sequences in a context of conflicting goals and constraints. Given a list of desirable tasks and a map of the region in which tasks can be performed, OPM plans which tasks to perform, how much time to allocate for each task, in what order to perform tasks, and by what routes to travel between successive tasks. The problem is complicated by differences in task priorities and time requirements, constraints on when tasks can be performed, intertask dependencies, and limitations on the time available for performing tasks.

OPM's blackboard has four levels of abstraction: outcomes (tasks) the plan should achieve, designs for the general spatial-temporal layout of the plan, procedures that sequence individual tasks, and operations that sequence task components.
Its solution intervals represent different plan execution time intervals. Two coordinated blackboard panels with parallel levels of abstraction record reasoning about data and planning heuristics. Each decision on the plan panel depends on a coordinated set of decisions on these other two panels; for example:

Heuristic: Perform the closest task in the right direction next.
Data: The closest task in the right direction is the newsstand.
Plan: Go to the newsstand next.

OPM has about 50 knowledge sources. Some operate bottom up. For example, the knowledge source NOTICE-PATTERN detects spatial configurations of tasks at the design level from individual task locations at the procedure level on the data plane. Other knowledge sources operate top down. For example, the knowledge source REFINE-DESIGN expands designs as sequences of procedures on the plan plane.

OPM uses a two-part condition structure for knowledge sources. A condition's trigger is an event-based test of knowledge source relevance. Its precondition is a state-based test of the knowledge source's current applicability. Satisfaction of a knowledge source's trigger generates a KSAR, but a KSAR
can be executed only at times when its precondition is true. Both triggers and preconditions may contain arbitrary LISP code as long as they can be evaluated true or false. As in HEARSAY-II, knowledge source actions are arbitrary programs that produce blackboard events.

OPM uses a uniform blackboard mechanism for reasoning about control. Control knowledge sources dynamically generate, modify, and execute a control plan out of modular control heuristics on the control blackboard. The control blackboard has different levels to represent control heuristics of varying scope. Its solution intervals represent different problem-solving time intervals. For example, at an intermediate point in the problem-solving process, OPM's control plan might contain this partial plan:

Solve problem P by generating an outcome-level plan and successively refining it at lower levels of abstraction. Begin by generating an outcome-level plan. Always prefer KSARs with credible triggering information and reliable actions.

OPM's scheduler has no control knowledge of its own. Instead, it adapts its scheduling behavior to whatever heuristics are recorded on the control blackboard.

OPM introduces variations in blackboard structure, knowledge source specification, and scheduling. Regarding blackboard structure, OPM distinguishes reasoning about problem data, planning (qv) heuristics, and the plan itself on separate blackboard panels. It also provides a separate blackboard panel for reasoning about scheduling. Thus, OPM introduces explicit representation of all aspects of the problem-solving process. Regarding knowledge sources, OPM introduces a two-part condition structure that combines an efficient event-based triggering mechanism with a precondition mechanism for restricting execution of triggered KSARs to appropriate contextual conditions. Finally, OPM introduces a simple scheduler that adapts to a dynamic control plan and a uniform blackboard mechanism for generating the control plan.
This enables OPM to integrate the opportunistic and strategic scheduling heuristics. Further, OPM need not commit to any particular combination of heuristics but can dynamically adapt its control plan to unanticipated problem-solving situations. The control blackboard provides a perspicuous framework in which to interpret system behavior. ThreeBlackboardSystem-Building Environments This section describesthree blackboard system-building environments: AGE, HEARSAY-III, and BB1. A11three environments provide the basic architectural components:blackboard, knowledge sources, and scheduler, which a system builder must specify with LISP expressions.In general, AGE is the most constrained of the three systems and, as a consequence, provides the strongest guidance in system design. HEARSAYIII is the least constrainedand, as a consequence, providesthe greatest freedom in system design.BB1, which was developed several years after AGE and HEARSAY-III, adoptsand elaborates upon selectedfeatures of both systems and incorporates them with new features of its own. Age. AGE permits a userr,todefine a blackboard with any number of named levels and associatedattributes. Anv solu-
78
BLACKBOARD SYSTEMS composeseach blackboard into any desired lower level panels as well as desired levels and attributes. Knowledge source conditions specify a triggering pattern and immediate code. The user must express a knowledge source's triggering pattern as a predicate on AP3 fact templates and any LISP predicates composedwith AND and OR operators (seeANDi OR graphs). Whenever one of the constituent APB fact templates is modified, the entire pattern is evaluated. If it is evaluated as true, HEARSAY-III createsa KSAR that includes the knowledge source'sname, the AP3 context in which the pattern matched, and the values of variables instantiated by the match. At the same time the knowledge source's immediate code,which may be any LISP code,is executed.It records potentially useful scheduling information in the KSAR and places the activation record at a particular level of the scheduling blackboard. Knowledge-sourceactions are arbitrary LISP programs. The default scheduler simply selects any KSAR from the scheduling blackboard and executesits action program. However, the system builder can replace it with another scheduler tailored to the application. The scheduling blackboard provides an environment for explicit control reasoning through the activities of control knowledge sources. In illustrative HEARSAY-III systems the control blackboard typically partitions pending KSARs into different priority levels. Control knowledge sourcestypically assign KSARs to particular levels, adjust KSAR priorities within a level, and generate lists of KSARs for sequential execution by the scheduler. However, HEARSAY-III doesnot place any constraints on the structure of the control blackboard or the activities of control knowledge sources.The system builder can use them in whatever manner appears useful. HEARSAY-III is the least constrained of the three blackboard environments. 
It provides only the defining features of the architecture: the blackboard, condition-action knowledge sources,and a scheduler.But it imposesalmost no restrictions at all on their specification.The knowledge source conditions and actions and the scheduler can be arbitrary programs. This guidSelect a blackboard event according to a function specified gives the system builder great freedom but very little most HEARSAY-III's system. application an designing in ance by the system builder. domain between distinction its in lies specification important sethe Retrieve the list of knowledge sourcestriggered by and control blackboards and its suggestionthat control knowllected event. edge sources should record information on the control blackExecute each triggered knowledge source'slocal production board to influence the scheduler. However, HEARSAY-III system. leaves the productive use of this specification to the system
tion element created at a given level of the blackboard assumes the associatedattributes. Although AGE does not explicitly distinguish multiple blackboard panels, it permits the system builder to distinguish panels implicitly in the behavior of specific knowledge sources. Knowledge source conditions are lists of event labels that correspondto anticipated blackboard events. When an event with one of these labels is selectedby the scheduler, as discussedbelow, the knowledge source is triggered. Knowledge sourceactions are local production systems.The left side of a rule specifiespredicates that determine its applicability. The right side instantiates a template specifying a change to the blackboard and a label for that blackboard event. AGE provides a variety of blackboard accessfunctions for use in the rules. The system builder can define parameters that determine how many times individual rules can fire, how many rules can fire on each triggering of the knowledge source, and how predicates in the left sides of rules combine to invoke their right sides. These restrictions on knowledge source specification have advantages and disadvantages. First, the use of event labels permits an efficient table-lookup method for knowledge source triggering. On the other hand, it requires that the system builder anticipate all important blackboard events and the distinctive contexts in which they may occur. Knowledge sourcesthat generate and respond to events must be coordinated to use the same labels. Second,AGE's production system representation for actions and its blackboard modification templates provides a neat, uniform syntax with detailed code hidden in referencedfunctions. They also provide a foundation for AGE's explanation (qv) capability (in which it reiterates the sequenceof fired rules that produced a particular hypothesis) and for its elaborate interface for creating and editing knowledge sources. 
On the other hand, these restrictions sometimes hinder specification of complex knowledge source actions. AGE's scheduler iterates the following procedure:
Efficiency is the primary advantage of this scheduler. How- builder. ever, it severely restricts system behavior and the system BBI . BB1 (21) supports blackboard systemsthat explicitly builder's control over system behavior. The system builder can (seePlanning) their own problem-solvsupply only the event selectionfunction. The scheduleralways and dynamically plan (see Explanation) their behavior in explain predeblhavior, ing a opliates by first choosingan event and then executing plan, and learn (seeLearning) control underlying an of the terms by triggered source termined sequence of knowledge BB1 implements the experience. from event's label. It cannot incorporate heuristics for selecting new control heuristics in Ref. 22, which defined architecture control blackboard among or ordering knowledge sources. makes a sharp distinction between domain problems and the of its potential actions should a system HEARSAY-I|I.Erman, London, and Fickas (19) developed control problem: Which cycle? problem-solving execute on each HEARSAY-III, a general-purposeblackboard architecture. It control blackboards to reand domain (20) explicit defines 88L is built upon the relational database system called AP3 control problems. The and domain for elements solution cord searching and exploits APB's capabilities for representing and domain blackboard the of the structure directed graph structures, defining and preserving context, system builder defines BB1 definesthe levels. within attributes ., ,r"*ed levels and and triggering knowledge sources with a demon mechanism. problem to be the distinguish levels whose blackboard, control and HEARSAY-il partitions its blackboard into domain problem-solving strategies, local attenscheduling blackboards.The system builder hierarchically de- solved, sequential
BTACKBOARDSYSTEMS
tional foci, general scheduling policies, to-do sets of feasible actions, and chosen actions selectedfor execution. It also defines the attributes used to specify control decisions at each level. For example, a focus decision'sgoal attribute describes desirable actions, such as "generate solution elements at the outcome level." Its criterion describes the goal's expiration condition, such as "there is a complete and satisfactory solution at the outcome level." The control blackboard's solution intervals distinguish different problem-solving time intervals in terms of problem-solving cycles. 8BL definesexplicit domain and control knowledge sources. Domain knowledge sourcesoperate primarily on the domain blackboard to solve the domain problem. They are domain specificand defined by the system builder. Control knowledge sources operate primarily on the control blackboard to solve the control problem. Some control knowledge sourcesare domain independent and provided by BB1. For example, the knowledge source implement strategy incrementally refines a stratery decision as a series of prescribed focus decisions.The system builder may define additional domain-specificcontrol knowledge sources.All knowledge sourcesare data structures that can be interpreted or modified. A knowledge source'scondition comprises a trigger and a precondition. The trigger is a set of event-basedpredicates. When all of them are true in the context of a single blackboard event, the knowledge source is triggered and generates a representative KSAR. When running an application system, BB1 generates and uses a discrimination net of trigger predicates used in the system's knowledge sources.The precondition is a set of state-basedpredicates.When all of them are true, which may occur after an arbitrary delay, the triggered KSAR is executable. If the preconditions describe transient states, the KSAR may oscillate between triggered and executablestates. 
This specification of knowledge source conditions provides an efficient event-based triggering mechanism with a state-based mechanism for restricting action execution to appropriate contexts.

A knowledge source's action is a local production system. The left sides of rules determine under what conditions they fire. The right sides instantiate blackboard modification templates. Control parameters determine how many times individual rules can fire, how many rules can fire on each triggering of the knowledge source, and how multiple left-side predicates are integrated to fire rules.

In addition to its condition and action, each knowledge source has descriptive attributes that are potentially useful in scheduling. These include the blackboard panels and levels at which its triggering events and actions occur, its computational cost, its relative importance compared to other knowledge sources, and its reliability in producing correct results.

BB1 provides a variety of functions for inspecting the blackboard, knowledge sources, and blackboard events for use in defining knowledge sources. It also provides a simple menu-driven facility for creating and editing knowledge sources.

BB1 defines a simple scheduler that adapts to foci and policies recorded on the control blackboard and schedules the execution of both domain and control knowledge sources. On each problem-solving cycle the scheduler rates executable KSARs against operative foci and policies. It applies a scheduling rule, which is also recorded on the control blackboard and modifiable by control knowledge sources, to the KSAR ratings to select one for execution.
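The rating-and-selection cycle just described can be sketched abstractly. The following Python fragment is illustrative only: the KSAR attributes, foci, and weights are invented for the example and are not BB1's actual representation.

```python
# Illustrative sketch of a BB1-style scheduling cycle: executable KSARs are
# rated against the operative foci and policies, and a modifiable scheduling
# rule selects one for execution. All names and weights here are invented.

def rate(ksar, foci):
    # A KSAR's rating is a weighted sum of the scores that the operative
    # foci and policies assign to its descriptive attributes.
    return sum(weight * focus(ksar) for focus, weight in foci)

def schedule(ksars, foci, rule=max):
    # The scheduling rule (here simply "pick the highest-rated KSAR") is
    # itself replaceable, as it would be by control knowledge sources.
    return rule(ksars, key=lambda k: rate(k, foci))

# Two pending executable KSARs, described by scheduling-relevant attributes.
ksars = [
    {"name": "ks-word", "level": "word", "cost": 2},
    {"name": "ks-outcome", "level": "outcome", "cost": 5},
]

# A focus ("generate solution elements at the outcome level") and a
# policy (prefer computationally cheap actions).
foci = [
    (lambda k: 1.0 if k["level"] == "outcome" else 0.0, 10.0),
    (lambda k: -k["cost"], 1.0),
]

chosen = schedule(ksars, foci)
print(chosen["name"])  # prints "ks-outcome": the focus outweighs the cost policy
```

The design point mirrored here is that both the rating criteria and the selection rule are data, so control knowledge sources can revise them between cycles.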
BB1 provides a graphical run time interface with capabilities for inspecting knowledge sources, blackboard contents, blackboard events, or pending KSARs; enumerating pending KSARs; recommending a KSAR for execution; explaining a recommendation; accepting a user's recommendation; executing a recommended KSAR; and running without user intervention until a specified condition occurs.

The specification of BB1's knowledge sources and control mechanism underlies its capabilities for control, explanation, and learning. BB1 provides a general blackboard mechanism for reasoning about control, incorporating any strategic or opportunistic scheduling heuristics specified by the user. Moreover, it can construct situation-specific control plans dynamically out of modular control heuristics, avoiding the need to enumerate important problem-solving contingencies or to predefine an effective control plan. BB1 explains its problem-solving actions by showing how they fit into the underlying control plan and by recursively explaining the control plan itself. BB1 learns new control heuristics when a domain expert overrides its scheduling recommendations. It identifies the critical features distinguishing the expert's preferred action from the scheduler's recommended action and generates a heuristic favoring actions with those features.

Research Issues

Two research issues dominate current studies of blackboard systems: effective scheduling and parallel computing. Effective scheduling is crucial to the speed and accuracy with which blackboard systems solve problems. Of the three defining architectural components (blackboard, knowledge sources, and scheduler), the scheduler shows the greatest variability among application systems and system-building environments. There is a general trend toward making scheduling decisions and the reasoning underlying them explicit.
In addition to improving performance, explicit control reasoning appears essential for automatic acquisition of more effective scheduling heuristics and for strategic explanation.

Blackboard systems appear to have great potential for exploiting parallel computing environments. The modularity of knowledge sources makes them ideal candidates for distribution among multiple processors. In addition, knowledge source triggering, KSAR execution, and blackboard modification could operate in parallel. There has been some exploratory work in this area (8,23,24), but the potential gains from a parallel blackboard architecture remain largely unexplored.
BIBLIOGRAPHY

1. L. D. Erman, F. Hayes-Roth, V. R. Lesser, and D. R. Reddy, "The Hearsay-II speech-understanding system: Integrating knowledge to resolve uncertainty," Comput. Surv. 12, 213-253 (1980).
2. H. P. Nii, E. A. Feigenbaum, J. J. Anton, and A. J. Rockmore, "Signal-to-symbol transformation: HASP/SIAP case study," AI Mag. 3, 23-35 (1982).
3. A. Terry, Hierarchical Control of Production Systems, Ph.D. Thesis, University of California, Irvine, 1983.
4. B. Hayes-Roth, F. Hayes-Roth, S. Rosenschein, and S. Cammarata, "Modelling Planning as an Incremental, Opportunistic Process," Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 375-383, 1979.
5. D. D. Corkill, V. R. Lesser, and E. Hudlicka, "Unifying Data-Directed and Goal-Directed Control: An Example and Experiments," Proceedings of the Second AAAI, Pittsburgh, PA, pp. 143-147, 1982.
6. A. Hanson and E. Riseman, "VISIONS: A Computer System for Interpreting Scenes," in A. Hanson and E. Riseman (eds.), Computer Vision Systems, Academic Press, New York, 1978.
7. E. Hudlicka and V. R. Lesser, Meta-Level Control Through Fault Detection and Diagnosis, Technical Report, Amherst, MA, University of Massachusetts, 1984.
8. V. R. Lesser and D. Corkill, "Functionally accurate cooperative distributed systems," IEEE Trans. Syst. Man Cybern. SMC-11, 81-96 (1981).
9. M. Nagao, T. Matsuyama, and H. Mori, "Structured Analysis of Complex Photographs," Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 610-616, 1979.
10. B. Hayes-Roth, The Blackboard Architecture: A General Framework for Problem-Solving?, Technical Report HPP-83-30, Stanford, CA, Stanford University, 1983.
11. B. Hayes-Roth and F. Hayes-Roth, "A cognitive model of planning," Cogn. Sci. 3, 275-310 (1979).
12. J. L. McClelland and D. E. Rumelhart, "An interactive activation model of context effects in letter perception: Part 1. An account of basic findings," Psychol. Rev. 88, 375-407 (1981).
13. M. Rose, The Composition Process, Ph.D. Thesis, University of California at Los Angeles, 1981.
14. D. E. Rumelhart and J. L. McClelland, "An interactive model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model," Psychol. Rev. 89, 60-94 (1982).
15. A. Newell, J. C. Shaw, and H. A. Simon, "Report on a General Problem-Solving Program," Proceedings of the International Conference on Information Processing, UNESCO House, Paris, France, 1959.
16. M. Stefik, J. Aikens, R. Balzer, J. Benoit, L. Birnbaum, F. Hayes-Roth, and E. Sacerdoti, "The organization of expert systems: A prescriptive tutorial," Artif. Intell. 18, 135-173 (1982).
17. M. Minsky, "Steps Toward Artificial Intelligence," in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, pp. 406-450, 1961.
18. M. Stefik and L. Conway, "Towards the principled engineering of knowledge," AI Mag. 3, 4-16 (1982).
19. L. D. Erman, P. E. London, and S. F. Fickas, "The Design and an Example Use of Hearsay-III," Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, BC, pp. 409-415, 1981.
20. N. M. Goldman, AP3 Reference Manual, Technical Report, Los Angeles, CA, Information Sciences Institute, 1982.
21. B. Hayes-Roth, BB1: An Architecture for Blackboard Systems that Control, Explain, and Learn about Their Own Behavior, Technical Report HPP-84-16, Stanford, CA, Stanford University, 1984.
22. B. Hayes-Roth, "A blackboard architecture for control," Artif. Intell. J. 26, 251-321 (1985).
23. R. Fennell and V. Lesser, "Parallelism in AI problem-solving: A case study of HSII," IEEE Trans. Comput. C-26, 98-111 (1977).
24. V. R. Lesser and L. R. Erman, "Distributed Interpretation: A model and experiment," IEEE Trans. Comput. C-29, 1144-1163 (1980).

B. Hayes-Roth
Stanford University

BOLTZMANN MACHINE

The Boltzmann machine (1) is a massively parallel architecture that uses simple on-off processing units and stores all its long-term knowledge in the strengths of the connections between processors. Its main difference from other connectionist architectures (2-4) (see Connectionism; Connection Machine) is that the units use a probabilistic decision rule to decide which of their two states to adopt at any moment. The network computes low-cost solutions to optimization problems by settling to thermal equilibrium with some of the units clamped into their on or off states to represent the current task. For a perceptual interpretation task the clamped units would represent the perceptual input; for a memory retrieval task they would represent a partial description of the item to be retrieved. At thermal equilibrium the units continue to change their states, but the relative probability of finding the network in any global configuration is stable and is related to the cost of that configuration by a Boltzmann distribution:

P_α/P_β = e^-(E_α - E_β)/T    (1)

where P_α is the probability of being in the αth global configuration, E_α is the cost of that configuration, and T is the temperature.

Cooperative Computation of Best Fits by Energy Minimization

Tasks like perceptual interpretation (see Vision) and content-addressable memory can be formulated as optimization problems in which there are massive numbers of plausible constraints (see Constraint propagation), and low-cost solutions typically satisfy most but not all of the constraints (5,6). The Boltzmann machine allows the constraints to be implemented directly as interactions between units. If these interactions are symmetrical, it is possible to associate an energy E with each global configuration (7,8):

E = -Σ_{i<j} w_ij s_i s_j + Σ_i θ_i s_i    (2)

where w_ij is the weight of the connection from the jth to the ith unit, s_i is the state of the ith unit (0 or 1), and θ_i is a threshold. Each unit can compute the difference in the global energy for its off and on states given the current states of all the other units. This energy gap is simply the sum of the weights on the connections coming from other on units. So to monotonically reduce the global energy, units should adopt their on state if and only if their energy gap is positive (7).

Searches for minima of an energy function can be improved by adding thermal noise to the decision rule (9). The thermal noise allows the network to escape from local minima and to pass through higher energy configurations. By giving each of the very large number of higher energy configurations a small chance of being sampled, it effectively removes energy barriers between minima. In the Boltzmann machine the probabilistic decision rule used to simulate thermal noise is

p_k = 1/(1 + e^-ΔE_k/T)    (3)

where p_k is the probability that the kth unit adopts the on state, ΔE_k is its energy gap, and T is the temperature. If each unit is sampled with finite probability and if time delays are negligible, this decision rule will cause the whole network to approach thermal equilibrium. The fastest way to approach a
low-temperature equilibrium (at which low-cost configurations are far more probable than high-cost ones) is to start with a high temperature and to gradually reduce it, a process called simulated annealing (9,10).

Representing Probabilities

In a Boltzmann machine the probability that an atomic hypothesis is correct is represented by the probability of finding the corresponding unit in the on state. This allows the machine to correctly represent the probabilities of complex hypotheses that correspond to configurations of on and off states over many units. Systems that use real numbers to directly represent the probabilities of the atomic hypotheses (see Reasoning, plausible) have great difficulty representing the higher order statistical structure correctly, and so they cannot implement Bayesian inference (see Bayesian decision methods) unless exponentially many numbers are used or very strong independence assumptions are made (11). In a Boltzmann machine the weights implicitly encode the a priori probabilities of an exponential number of configurations.

Learning
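The settling computation that both phases of the learning procedure in this section rely on follows eqs. (2) and (3) directly. The sketch below is a minimal illustration, not a published implementation: the network size, weights, and annealing schedule are invented for the example.

```python
import math
import random

def energy(s, w, theta):
    # Eq. (2): E = -sum_{i<j} w_ij s_i s_j + sum_i theta_i s_i
    n = len(s)
    e = sum(theta[i] * s[i] for i in range(n))
    e -= sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return e

def settle(s, w, theta, clamped, schedule, rng):
    # Apply the stochastic decision rule of eq. (3), p_k = 1/(1 + e^(-dE_k/T)),
    # to randomly chosen unclamped units while the temperature T is lowered
    # (simulated annealing).
    n = len(s)
    for T in schedule:
        for _ in range(10 * n):
            k = rng.randrange(n)
            if k in clamped:
                continue
            # Energy gap of unit k: weighted input from "on" units minus threshold.
            gap = sum(w[min(k, j)][max(k, j)] * s[j]
                      for j in range(n) if j != k) - theta[k]
            s[k] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-gap / T)) else 0
    return s

# Two units joined by a strong excitatory weight; unit 0 is clamped on.
w = [[0.0, 4.0], [0.0, 0.0]]  # upper-triangular symmetric weights, w[0][1] = w_01
theta = [0.0, 0.0]
s = settle([1, 0], w, theta, clamped={0},
           schedule=[2.0, 1.0, 0.5, 0.2], rng=random.Random(0))
```

At the final temperature the on-probability of unit 1 is 1/(1 + e^-20), so the network ends in the minimum-energy configuration [1, 1] with overwhelming probability; in a learning run, the co-occurrence statistics gathered at equilibrium in the clamped and unclamped phases would drive the weight changes described in this section.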
There is a simple but powerful learning (qv) algorithm (1,11) that allows a Boltzmann machine to learn weights that constitute an internal, generative model of the structure of an environment in which it is placed. The environment clamps configurations of on and off states on a "visible" subset of the units. The learning algorithm modifies the weights so as to maximize the likelihood that the same probability distribution of configurations will occur over the visible units when the machine is run without environmental input. The learning works in two phases. In the positive phase the environment clamps the visible units, the network settles to thermal equilibrium at a finite temperature, and the weights between units are increased by an amount proportional to how often the units are both on together at equilibrium. In the negative phase the visible units are unclamped, the network settles to equilibrium, and the weights are decreased by an amount proportional to how often the two units are on together (12). The result of repeatedly applying this procedure is that the network turns its nonvisible units into feature detectors that allow it to represent the structure of its environment in the weights.

Problems

Several obstacles currently prevent Boltzmann machines from being of practical use. It can take a long time to reach thermal equilibrium (10), so if the weights are hand coded, care must be taken to avoid energy barriers that are too high for annealing searches to cross. If the weights are learned, equilibrium must be reached many times to know how to change the weights, and the weights must be changed many times to construct good models, so even very simple learning tasks require many hours of CPU time.

BIBLIOGRAPHY

1. D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, "A learning algorithm for Boltzmann machines," Cogn. Sci. 9, 147-169 (1985).
2. S. E. Fahlman, G. E. Hinton, and T. J. Sejnowski, "Massively parallel architectures for A.I.: Netl, Thistle, and Boltzmann Machines," Proceedings of the Third National Conference on Artificial Intelligence, Washington, DC, pp. 109-113, 1983.
3. J. A. Feldman and D. H. Ballard, "Connectionist models and their properties," Cogn. Sci. 6, 205-254 (1982).
4. G. E. Hinton and J. A. Anderson (eds.), Parallel Models of Associative Memory, Erlbaum, Hillsdale, NJ, 1981.
5. D. H. Ballard, G. E. Hinton, and T. J. Sejnowski, "Parallel visual computation," Nature 306, 21-26 (1983).
6. D. E. Rumelhart and J. L. McClelland (eds.), Parallel Distributed Processing: Explorations in the Microstructures of Cognition. Vol. 1, Foundations, MIT Press, Cambridge, MA, 1986.
7. J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Nat. Acad. Sci. USA 79, 2554-2558 (1982).
8. R. A. Hummel and S. W. Zucker, "On the foundations of relaxation labeling processes," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-5, 267-287 (1983).
9. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science 220, 671-680 (1983).
10. S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6, 721-741 (1984).
11. G. E. Hinton and T. J. Sejnowski, "Optimal perceptual inference," Proc. IEEE Conf. Comput. Vision Pattern Recog., Washington, DC, pp. 448-453, 1983.
12. F. Crick and G. Mitchison, "The function of dream sleep," Nature 304, 111-114 (1983).

G. E. Hinton
Carnegie-Mellon University

BORIS

An understanding program written by Michael Dyer at Yale in 1982, BORIS can read and then answer questions about several complex narrative texts (see Question Answering), and it uses the approach of integrating parsing and inferencing (see M. Dyer, In-Depth Understanding, MIT Press, Cambridge, MA, 1983).

K. S. Anona
SUNY at Buffalo

BOTTOM-UP PROCESSING. See Processing, bottom-up and top-down.

BRANCHING FACTOR

A branching factor is a parameter that measures the effective complexity of a problem or a search (qv) algorithm, especially those characterized by an exponentially growing complexity. The term branching factor has evolved from the metaphor of a uniform tree where each internal node sprouts exactly b branches and the total number of nodes up to depth d is (b^(d+1) - 1)/(b - 1). Thus, if an algorithm searches such a tree and generates every node up to depth d, the complexity of that algorithm will be roughly [b/(b - 1)]b^d, with b measuring the relative increase in complexity due to each additional level of search (see Problem solving).
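The node-count formula for a uniform tree given above is easy to check numerically; a small illustrative sketch (the sample values of b and d are arbitrary):

```python
# Check of the closed form above: a uniform tree with branching factor b
# has b**k nodes at level k, so levels 0..d hold (b**(d+1) - 1)/(b - 1)
# nodes in total.

def nodes_up_to_depth(b, d):
    # Sum the level populations b**0 + b**1 + ... + b**d directly.
    return sum(b ** k for k in range(d + 1))

for b in (2, 3, 10):
    for d in (1, 4, 7):
        assert nodes_up_to_depth(b, d) == (b ** (d + 1) - 1) // (b - 1)

print(nodes_up_to_depth(2, 3))  # prints 15: the levels hold 1 + 2 + 4 + 8 nodes
```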
This growth rate measurement can be extended to algorithms whose search spaces are nonuniform trees. If d stands for the maximal depth reached by an algorithm A, and N_A stands for the number of nodes generated during the search, then the effective branching factor, B_A, can be defined by

B_A = (N_A)^(1/d)    (1)

Indeed, when applied to a uniform tree, this formula gives

B_A = [(b^(d+1) - 1)/(b - 1)]^(1/d)    (2)

which, for large d, reduces to

B_A ≈ b    (3)

In general, the complexity N_A may vary significantly from one problem instance to another and may be a complex function of d. Therefore, the definition of B_A is usually applied to the average number, I_A, of nodes generated by algorithm A and usually invokes the limit as d → ∞:

B_A = lim_{d→∞} [I_A(d)]^(1/d)    (4)

This definition extracts the basis of the dominant exponential term in the expression of I_A(d). In summary, the branching factor measures the relative increase in average complexity due to extending the search depth by one extra level or, equivalently, it measures the average number of branches explored by an algorithm from a typical node of the search space (1).

Applications

The primary usage of the branching factor has been in comparing the pruning power of various game-playing strategies (see Game playing; Game trees). Theoretical analysis of these strategies usually assumes uniform, b-ary game trees, searched to depth d, with random values assigned to nodes at the search frontier (1). Based on this model, it can be shown (2,3) that the branching factor of the alpha-beta pruning (qv) algorithm (as well as that of SCOUT and SSS*) is given by

B = ξ_b/(1 - ξ_b) ≈ b^(3/4)    (5)

where ξ_b is the unique positive root of the equation

x^b + x - 1 = 0    (6)

Moreover, this branching factor is the best achievable by any game-searching algorithm (see Alpha-beta pruning). Roughly speaking, a fraction of only B/b = b^(-1/4) of the b legal moves available from each game position is explored by alpha-beta. Alternatively, for a given search time allotment, alpha-beta pruning allows the search depth to be increased by a factor of log b/log B = 4/3 over that of an exhaustive minimax search.

Under perfect ordering of successors, alpha-beta examines a total of 2b^(d/2) - 1 game positions; thus:

B = b for exhaustive search.
B = b^(3/4) for alpha-beta with random ordering.
B = b^(1/2) for alpha-beta with perfect ordering.

It is important to mention that the branching factor only captures the asymptotic growth rate of a search strategy as the search depth increases indefinitely; it does not reflect the size of nonexponential factors in I(d) regardless of how large they are. However, an exact evaluation of the average performances of three game-searching strategies shows that the ratio I(d)/B^d is fairly small (3); it remains below 5 over wide ranges of b and d (b ≤ 20, d ≤ 20).

BIBLIOGRAPHY

1. D. E. Knuth and R. W. Moore, "An analysis of alpha-beta pruning," Artif. Intell. 6(4), 293-326 (1975).
2. J. Pearl, "The solution for the branching factor of the alpha-beta pruning algorithm and its optimality," CACM 25(8), 559-564 (1982).
3. J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley, Reading, MA, Chapters 8 and 9, 1984.

J. Pearl
UCLA
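Equations (5) and (6) can be evaluated numerically. The following sketch finds ξ_b by bisection and checks that the resulting branching factor falls between the perfect-ordering and exhaustive rates; the tolerance and the sample values of b are arbitrary choices for the illustration.

```python
# Numerical check of eqs. (5) and (6): the alpha-beta branching factor is
# B = xi_b / (1 - xi_b), where xi_b is the positive root of x**b + x - 1 = 0.

def xi(b, tol=1e-12):
    # f(x) = x**b + x - 1 is increasing on [0, 1] with f(0) = -1 and f(1) = 1,
    # so its unique positive root lies in (0, 1); locate it by bisection.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mid ** b + mid - 1.0 < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def alpha_beta_branching_factor(b):
    x = xi(b)
    return x / (1.0 - x)

for b in (2, 8, 16, 32):
    B = alpha_beta_branching_factor(b)
    # B lies strictly between the perfect-ordering rate b**0.5 and the
    # exhaustive rate b, consistent with B growing roughly like b**(3/4).
    assert b ** 0.5 < B < b
```

For b = 2, ξ_2 is the golden-ratio conjugate (√5 - 1)/2 ≈ 0.618, so B ≈ 1.618.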
CADUCEUS

An expert system for medical diagnosis (see Medical advice systems) developed by Jack Myers and Harry Pople at the University of Pittsburgh and completed in 1985. This system is an enhancement of INTERNIST (qv) in that it incorporates causal relationships in its diagnosis (see P. Szolovits (ed.), Artificial Intelligence in Medicine, Westview, Boulder, CO, 1982).

K. S. Anona
SUNY at Buffalo
CHARACTER RECOGNITION: THE READING OF TEXT BY COMPUTER

The reading of text by computer is an AI topic that has been investigated for more than 25 years. An early example is the work of Bledsoe and Browning (1). The objective of work in this area is to develop the ability to convert an image (a two-dimensional array of intensity values known as pixels) of text into a computer-interpretable form, such as ASCII code, with the same fluency and accuracy that a human could read the same material.
Currently, the most frequently used methodology for the design of reading algorithms has the three stages illustrated in Figure 1. The image preprocessing stage determines which areas of the image are text and isolates images of individual characters within the words of the text. These character images are then passed to a character recognition algorithm that identifies one or more letters that match each one. These decisions are then passed to a contextual postprocessing algorithm that resolves ambiguities or corrects errors in the character decisions.

The following sections of this entry survey the character recognition and contextual postprocessing aspects of reading algorithms. The basic strategy of each area is discussed, and several notable AI approaches to each one are presented. An analogy is developed between these methods and explanations of human reading. The large gap that exists between the performance of current algorithms and human fluency is shown. The benefits to be gained by adapting results from studies of human reading to the development of reading algorithms are speculated about, and preliminary efforts in this area are discussed.

Character Recognition

Character recognition techniques associate a symbolic identity with the image of a character. These methods can be generally classified as either template matching or feature analysis (see Matching) algorithms.

Template Matching. Template matching techniques directly compare an input character image to a stored set of prototypes. The prototype that matches most closely provides recognition. The comparison method can be as simple as a one-to-one comparison of the input and prototype images or as complex as a decision tree analysis in which only selected pixels are tested (2). Template matching is suitable for an application where a limited number of character images have to be recognized (3).
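A minimal sketch of the template-matching idea follows: prototypes are compared pixel by pixel and the closest one provides recognition. The tiny 3x3 templates and the Hamming-distance comparison are invented for illustration, not drawn from any particular system.

```python
# Template matching on 3x3 binary character images: return the stored
# prototype with the fewest disagreeing pixels. Templates are invented
# for the example.

TEMPLATES = {
    "I": "010" "010" "010",
    "L": "100" "100" "111",
    "T": "111" "010" "010",
}

def distance(a, b):
    # Number of pixel positions where the two bitmaps disagree.
    return sum(pa != pb for pa, pb in zip(a, b))

def recognize(image):
    # The prototype that matches most closely provides recognition.
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch], image))

# A "T" with one corrupted pixel is still closest to the T template.
print(recognize("111" "010" "000"))  # prints "T"
```

The sketch also shows the fragility discussed next: a template is tied to one exact shape, so style variation or heavy noise quickly defeats it.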
However, it suffers from a lack of robustness because of a sensitivity to noise in the image and an inability to adapt to differencesin character style. It is interesting from an AI perspective that template matching has been ruled out as an explanation for human performance for similar reasons (4). FeatureAnalysis. Feature analysis techniques are more frequently used for character recognition. In this approachsignificant features are extracted from a character image and compared to the feature descriptions of ideal characters. The description that matches most closely provides recognition. A comparison procedure favored by several AI researchers is basedon the size and relative placement of strokes. Strokes in this context are somewhat analogousto the strokes made by a
Scanning
Input document
person when a letter is drawn. For example, an F is composed of thtee strokes: one long vertical stroke and two short horizontal strokes. An advantage of feature analysis is its ability to adapt to new characters and its tolerance to noise in an image. Thus, the capabilities of a human reader are captured more accurately by feature analysis than by template matching. This has causedfeature analysis to be proposedas a model for human letter recognition (4). Many feature analysis techniques have been developedand applied to character recognition. Most of these are examplesof traditional pattern recognition methods and are usually suitable for application to constrained domains. SomeAI methodologies have been incorporated in such traditional techniques [e.g.,a rule-basedsystem (5)] (seeRule-basedsystems).In particular, the use of a semantic network (qv) is discussedbelow. Two additional AI approaches are also presented that have utilized analogies to the human character recognition process. The feature analysis technique developedby Krumme (6) is an example of a traditional solution to the character recognition problem that uses AI techniques. It uses a semantic network to encodeknowledge about strokes. The network is also used to direct the analysis of a character image. The Krumme network is made up of many types of directed arcs and nodes, only a small portion of which are describedhere. The subset arc s states that the node at its tail is a subset of the node at its head. The property arcp states that the node at its tail has the property at its head. Terminal nodes represent a primitive property of the image, and nonterminal nodesrepresent a settheoretic property about the image. A node with outgoing s and p arcs represents the largest subset of the set at the head of the s arc with the property at the head of the p atc. A node with more than one outgoing s arc represents the intersection of the sets at the heads of the s arcs. Descriptionof F. 
The example description of the capital F shown in Figure 2 illustrates these concepts.Node 2 represents the subset of all the input with a major vertical line on the left. Note that this includes many letters such as B, D, E, F, H, and so on. Nodes 4 and 5 represent the strokes near the top and middle of the major vertical line, and node 6 represents the conceptthat there is no other stroke near its bottom. Nodes 7 and 8 represent the concept that the horizontal line near the top of the major vertical line is on its top and to its right. Nodes 9 and 10 represent a similar conceptfor the horizontal line near the middle of the character. Finally, node 11 represents F as the intersection of the sets represented by n o d e s6 , 7 , 8 , 9 , a n d 1 0 . This is not only a description of F but also a plan to follow for its recognition. The terminal node input reads a character image and begins recognition. The major vertical line is then tested for, and if it is located, additional tests are carried out to locate the appropriately oriented strokes near the top and mid-
Segmented word images
lmage preprocessing
83
Character recognition
Character decisions
Coded word decisions
aI Contextu postprocessing
Figure 1. Methodolory of most current reading algorithms.
A S CII, EBCDIC, etc.
B4
CHARACTER RECOGNITION:THE READINGOF TEXTBY COMPUTER
mine the functional attribute at the pivot of the ambiguity (7). An intermediate skeletal level provided a description that distinguished characters from everything else as well as from characters in other families of type fonts (8). This level of description was implemented as a set of graphs, one for each character in each font family. The lowest physical level in this hierarchy is where actual character images were placed. This representational system can be used for recognition in several ways. Functional descriptionscan be used directly if O nr i g h t H o r i z o n t aI procedures are developedto detect the features they specify. p line This would be appropriate if it were known a priori that only character images (not graphics, halftones, etc.) might be preFocustop sented to a recognition system since functional descriptions can only distinguish one character from another. Otherwise, a On top/ bottom skeletal representation would be a better choice since it can 1 1 discriminate characters from everything else. This corre10 sponds more closely to the way people read letters; however, its font-specific nature loses some robustness. The main advantage of this line of research is its acknowledgementof the On right complexity of the character recognition task and the necessity Horizontal l i n e to incorporate knowledge about human character recognition p in algorithms. Knowledge Source Organization. The robustness of human Ietter recognition and its place in a more complex reading processthat involves the syntax and semanticsof an input text In p u t was acknowledged by Brady and Wielinga (9). They studied the organization of knowledge sources needed to read handprinted FORTRAN coding sheets.As part of this project they developedalgorithms for the recognition of isolated characters that could utilize syntactic and semantic information provided by a FORTRAN reasoner. 
Clearbottom A stroke-based character recognition scheme similar to of F in the Krummenetwork(adaptedfrom that of Krumme was implemented in an early version of this Figure 2. Representation system. In this method strokes and relations between strokes Ref.6). were used to specify models for characters. For example, the model for an F shown in Figure 3 specifiesthat it must contain one vertical stroke and two horizontal strokes. One of the horidle of the image. If any of these tests fail, backtracking takes zontal strokes must be above both the other strokes, and the place and the presenceof primitives from other characters is two horizontal strokes must be to the right of the vertical determined. For example, in the complete syst€ffi,if the major stroke. The junctions between strokes were also specified. vertical line cannot be located,a loop, such as occursin an O, is There must be an Z junction between the vertical stroke and tested for next. Advantages of this approach include its use of one of the horizontal strokes, and there must also be a 7 junca more flexible control structure than most traditional meth- tion between the vertical stroke and the other horizontal ods. Disadvantagesinclude its application to a limited alpha- stroke. bet of only 20 uppercaseletters. Although many casesof disRecognition was determined by the features specified in torted input were recognizedcorrectly, the robustnessof this such models. The relations and junctions between strokes aptechnique remains unclear. pear to have been particularly important features. It is interIsolated Characfers.The development of an algorithm for the recognition of isolated characters that overcomesthe constraints of traditional techniques was pursued by the MIT research group composedof M. Eden and B. Blesser, among STROKES 3 [ V HT HB ] ; Vertical, Horizontal Top, and Horizontal Bottom others. 
As part of this work they developed a character de- D IR EC T ION S hor i z ontat I H T H B I v er ti c al t V l scription schemethat was basedon human experiments.The people to use features if the was that approach this behind idea R ELAT IO N S [ ( ABO VE H T H B) (ABOVE HT V) recogntzeLettersare properly describedand used in a charac(RIGHT HT V) perform as ter recognition algorithm, the algorithm should (RIGHT HB V) I well as a human. Functional,Skeletal,and PhysicalLevels.Three levels of de- PO SIT ION S [((HH- -TMOIPD DHLTE) HB) scription were distinguished. The abstract or functional level ( V- LEF T V) ] defined the essential meaning of letters in terms of a set of ruNcrIoNS tli-iUilSl3il features or functional attributes. These were determined by a XilI]r procedure that included the presentation of ambiguous characfor F (adapted from Ref. 9). representation Figure 3. Brady's ters to human subjects and the use of their responsesto deter-
RECOCNITION:THE READINCOF TEXTBY COMPUTER CHARACTER
8s
esting that FORTRAN knowledge could be used to modify the search for these features. For example, if Figure 4 was input and a FORMAT statement was expected, an F would be sought in the first position. At some point in this process confirming evidence would be sought for the L junction at the top of the character. However, since this junction is not physically connected, additional image processing could be invoked to increase the length of the strokes to see if they could be connected. If this occurred, the junction was confirmed; otherwise the presence of an F was denied.

Figure 4. Example of handprinted FORMAT statement.

The sensitivity of this approach and its dependence on many empirically determined thresholds, such as how far to increase the length of strokes, was acknowledged. An alternative representation based on a two-dimensional version of generalized cylinders was proposed to better capture the characteristics of letters that are used by human readers. However, it was not extensively described in published accounts. The important point of this work is its acknowledgment of the complexity of reading and the necessity of using many knowledge sources to adequately recognize isolated characters. On the detrimental side, one knowledge source that apparently was not incorporated is very important to human readers. This is the dependencies between letters of an input vocabulary. This knowledge source has been extensively investigated and used in contextual postprocessing techniques.

Contextual Postprocessing

Contextual postprocessing techniques utilize a knowledge source one step above the level of individual characters to resolve ambiguities and correct errors in character recognition. These methods use information about other characters that have been recognized in a word as well as knowledge about the text in which the word occurs to carry out this task. Typically, the knowledge about the text takes the form of a dictionary (a list of words that occur in the text). For example, a character recognition algorithm may not be able to reliably distinguish between a u and a v in the second position of q.ote. A contextual postprocessing technique would determine that u is correct since it is very unlikely that qvote would be in an English-language dictionary. Thus, some of the knowledge possessed by human readers is incorporated in such methods.

Methods of contextual postprocessing differ in their manner of knowledge representation (qv). Some methods use an approximation to a dictionary that often takes the form of probabilities of letter transitions (10). Other approaches use an exact representation such as a serial representation (11), a hash table (12), or a graph structure (13).

Binary n-Grams. The method of binary n-grams is one approach that uses an approximate representation (14). In this method a set of n-dimensional binary arrays represents a dictionary. Each of the dimensions can take on one of m values, where m is the number of letters in the alphabet, and the binary data in the matrix indicates whether or not the letter combination that specifies its location occurs in the dictionary. A 1 (logical true value) indicates the occurrence of the letter combination, and a 0 (logical false value) indicates its nonoccurrence. Typically other n values (position indices) are associated with each array. These tell the positions in which the letter combinations occur within dictionary words.

This method can be used to detect as well as correct errors in the output of a character recognition algorithm. Many error types can be handled by this approach; however, only the substitution of one character for another is described here since this is the most common error in character recognition. A word is considered correct only if the intersection of all its appropriate n-gram entries is nonzero. Otherwise, it must contain an error. The position of the error is determined by intersecting the sets of position indices that returned zero in the detection phase. If there is only a single position in this intersection, it contains the error. Vectors from all the arrays that involve that position, given that the other positions are correct, are then intersected. If there is only a single letter in that intersection, it can be substituted in the error position to produce a word that is acceptable to the n-gram arrays.

An example illustrates these points. Figure 5 shows a dictionary of the three three-letter words {cat, cot, tot}. The three binary digram (n = 2) arrays that represent this dictionary are shown. If a character recognition technique outputs the string coo, detection of the error would be done by

d_{1,2}(c, o) AND d_{1,3}(c, o) AND d_{2,3}(o, o)

This would return 0 from both d_{1,3} and d_{2,3}. Since the intersection of {1, 3} and {2, 3} yields {3}, correction is done by intersecting the vectors

d_{1,3}(c, *) AND d_{2,3}(o, *)

The resulting vector has only one nonzero element, corresponding to a t. Therefore coo is corrected to cot.

This short example illustrates several of the advantages and disadvantages of this method. The computations to locate and correct errors are relatively simple and involve only binary comparisons. Hence they can be economically implemented. However, the potential storage costs are also apparent by observation of the sparseness of the arrays. This is a major weakness of this method as discussed in Ref. 15, where the binary n-gram technique is compared to an approach that uses an approximate statistical representation of a dictionary. The binary n-gram method is shown to be good for error detection; however, the statistical approach is better for error correction, especially because of the large amount of memory
dictionary: {cat, cot, tot}

Figure 5. Example dictionary and its representation by binary digram arrays.
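The digram detection and correction procedure described above can be sketched in a few lines. This is an illustrative reconstruction only: the array layout and function names are invented here and are not taken from Ref. 14; positions are 0-based rather than the article's 1-based indices.

```python
# Sketch of binary digram (n = 2) detection/correction for the
# dictionary {cat, cot, tot}; illustrative, not the code of Ref. 14.
from itertools import combinations

ALPHABET = "acot"
DICTIONARY = ["cat", "cot", "tot"]
WORD_LEN = 3

# d[(i, j)][(a, b)] == 1 iff some dictionary word has letter a in
# position i and letter b in position j (0-based positions).
d = {(i, j): {(a, b): 0 for a in ALPHABET for b in ALPHABET}
     for i, j in combinations(range(WORD_LEN), 2)}
for w in DICTIONARY:
    for i, j in combinations(range(WORD_LEN), 2):
        d[(i, j)][(w[i], w[j])] = 1

def correct(word):
    """Detect a single substitution error and correct it if possible."""
    failing = [(i, j) for (i, j) in d if d[(i, j)][(word[i], word[j])] == 0]
    if not failing:
        return word                 # all digram entries nonzero: accept
    # Intersect the position pairs that returned zero to locate the error.
    positions = set(failing[0])
    for pair in failing[1:]:
        positions &= set(pair)
    if len(positions) != 1:
        return None                 # cannot isolate a single error position
    p = positions.pop()
    # Intersect the letter vectors from every array involving position p.
    candidates = set(ALPHABET)
    for (i, j) in d:
        if p == i:
            candidates &= {a for a in ALPHABET if d[(i, j)][(a, word[j])]}
        elif p == j:
            candidates &= {b for b in ALPHABET if d[(i, j)][(word[i], b)]}
    if len(candidates) != 1:
        return None
    return word[:p] + candidates.pop() + word[p + 1:]

print(correct("coo"))  # cot
```

Run on the article's example, the two arrays involving position 3 both return zero for coo, position 3 is isolated, and the single surviving letter t yields cot.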
needed to yield good correction performance with the binary n-gram method.

DVA. The dictionary Viterbi algorithm (DVA) is a contextual postprocessing technique that uses an exact representation for a dictionary (see Viterbi algorithm). A graph of letter alternatives produced by a character recognition algorithm is first set up. An example of such a graph is shown in Figure 6. The string tof at the top of the graph is assumed to be input from a character recognition algorithm, and {a, c, o, t} is the alphabet of the source text. Each node is labeled with a letter of the alphabet and has a cost associated with it that is the probability that the letter on the node is confused with the corresponding letter of the input word. Each arc in the graph also has a cost associated with it that is the probability that the letter at its head follows the letter at its tail in the source text. A path is traced through this graph in a left-to-right manner one column at a time. The costs of all the ways of reaching a node from nodes in the previous column are computed, and only the partial path with the best cost is retained. Each time the cost of an arc is evaluated, the presence in the dictionary of the substring composed of the letters on the path from the beginning of the graph to the node at the head of the arc is determined. If it does not occur in the dictionary, this partial path is discarded from future consideration. This evaluation process is performed once for every node in the graph of alternatives. The letters on the best path from the first node to the last node are output.

Trie. The simultaneous searching of the graph of alternatives and the dictionary is done with a data structure for the dictionary known as a trie.
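A minimal sketch of such a trie, and of a column-by-column search over a graph of letter alternatives that discards any partial path leaving the trie, follows. It is illustrative only: the confusion probabilities are invented, the arc (letter-transition) costs of the full DVA are omitted for brevity, and none of the names come from Ref. 13.

```python
# Hedged sketch of trie-guided search over a graph of letter alternatives.
def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True                 # end-of-word marker
    return root

def search(columns, trie):
    """columns[i]: letter -> (invented) confusion probability for position i.
    Partial paths not present in the trie are discarded immediately."""
    paths = {"": (1.0, trie)}            # prefix -> (probability, trie node)
    for col in columns:
        nxt = {}
        for prefix, (prob, node) in paths.items():
            for letter, p in col.items():
                child = node.get(letter)
                if child is not None:    # keep only prefixes found in the trie
                    nxt[prefix + letter] = (prob * p, child)
        paths = nxt
    words = [(w, pr) for w, (pr, node) in paths.items() if "$" in node]
    return max(words, key=lambda t: t[1])[0] if words else None

trie = build_trie(["cat", "cot", "tot"])
# Alternatives for the input string "tof" (probabilities are made up):
columns = [{"c": 0.3, "t": 0.7}, {"a": 0.4, "o": 0.6}, {"c": 0.1, "t": 0.9}]
print(search(columns, trie))  # tot
```

As in the text's example, only c and t survive the first column, only ca and to the second, and tot wins on probability.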
An example trie for the dictionary {cat, cot, tot} is shown in Figure 7. If the graph of alternatives shown in the previous figure was evaluated with this trie, only the c and t nodes in the first column would be considered since these are the only two letters at the first level of the trie. At the next step only one path to each of the a and o nodes in the second column would be retained. These partial paths would most likely be ca and to. At the next step, only cat and tot would be considered because of the absence of any other paths in the trie. Most probably tot would be output because it is most like the input.

The DVA, as other techniques that use an exact representation for a dictionary, is more accurate than methods that use an approximate representation. It is shown in Ref. 13 that its performance is better than a similar technique that uses an approximation. However, methods based on exact representations incur additional processing costs. The acceptability of these costs should be determined by the application and the need for improved performance.

Figure 6. Example graph of alternatives for DVA.

Figure 7. Trie representation for {cat, cot, tot}.

Applications

Reading technology has many examples of practical applications in the marketplace. Small desk-top character readers that cost about $10,000 each and can typically recognize up to six fonts have recently been appearing in offices. The Kurzweil Corporation manufactures medium-sized character readers that cost about $35,000 each but can recognize a wide range of character fonts (16). The United States and other countries have recently installed large postal address-reading machines that cost about $500,000 each; however, they must meet more stringent performance requirements than most other character readers (17).

The performance of all these machines is controlled by many constraints. Deviations from these constraints can cause a large deterioration in performance (18). In most cases individual characters must not touch one another, and text must be clearly printed in dark ink on a lightly colored background (19). In some units the location of individual characters must fall within prespecified limits. Such constraints are present even in products manufactured by the Kurzweil Corporation. Even these machines require that characters be unsmudged and that adjacent characters not touch one another. Furthermore, although the ability to read many different fonts is claimed for the Kurzweil Data Entry Machine, this capability is achieved by requiring an operator to train the machine on new fonts. This constraint frequently causes the machine to misrecognize text printed in a font that it has not previously seen (20). The mere presence of such constraints in even the most sophisticated reading machines illustrates that the ability to read text automatically with the same fluency as a human remains an unachieved goal.
This is further evidenced by the performance of postal address-reading machines that have been the subject of much research and development and are designed to read relatively unconstrained text. These machines can correctly read over 90% of the addresses that appear on machine-printed first-class mail. However, they can only read about 34% of the addresses on mail from collection boxes. Overall, 62% of the addresses on mail processed by postal reading machines are correctly recognized (21). These percentages are based on mail samples that were readable by a human operator. This shows that even the most expensive commercial equipment is not nearly as fluent as a human reader. Obviously much work is needed if a program is to reach levels of human competence.
Conclusions

Levels of performance comparable to human capabilities are thus unachieved. Although some notable efforts exist (discussed above) for studying the human process of character recognition and applying the results of those studies to the analogous machine process, no such effort has been carried out at the word level. The potential success of this approach is clear when the basic strategy of current algorithms is compared to explanations of human performance in word recognition. The recognition of individual characters followed by postprocessing with a dictionary as an explanation for human performance was rejected in 1886 (22). Although some interesting similarities exist between the relaxation-based word recognition system of Hayes (23) and the contemporary theory of word perception proposed by McClelland and Rumelhart (24), the algorithm is different from the theory in essential places. The development of a synergism between algorithms and theories is essential if algorithms are to reach levels of human competence.

A preliminary study of word perception by human and computer was carried out by Brady (25). He used a computational simulation of human early visual processing to show that previous psychological results that were attributed to higher level processing could in fact be accounted for by visual processing. He also speculated on the importance of such an investigation to the development of an understanding of human and machine reading.

The relationship between the shape of words and their recognition by humans and computers has been investigated (26). Word shape (the pattern of ascenders, descenders, and normal-height characters in a lowercase word) is a visual cue that has been known for many years to be useful for word recognition by humans. Several alternative representations for word shape are looked into, and a representation is found that produces a small search space in a large dictionary.
This representation is based on features that can be reliably extracted from word images and does not require the segmentation of words into characters. This avoids the major pitfall of current reading algorithms and more closely reflects the way visual information is used in the early stages of word recognition by humans.

These efforts are just the beginning of what is needed to develop a fluent reading ability for computers. The background material discussed in this article points out the great amount of effort already expended in the development of reading algorithms and shows several notable approaches suitable for application to limited domains. However, the many constraints imposed on implementations of these techniques and their lack of demonstrable general-purpose performance illustrate a large gap between human and machine reading capabilities. Only if further efforts are made to apply results from studies of human reading to the design of algorithms will this gap be bridged.
BIBLIOGRAPHY

1. W. W. Bledsoe and I. Browning, "Pattern recognition and reading by machine," Proc. Eastern Joint Comput. Conf. 16, 225-232 (1959).
2. K. Y. Wong, R. G. Casey, and F. M. Wahl, "Document analysis system," IBM J. Res. Develop. 26(6), 647-656 (November 1982).
3. R. O. Sheppard, Jr., Feasibility and Implementation of an Adaptive Recognition Technique, PTR Research Report, USPS Research and Development Laboratories, January 1978.
4. R. N. Haber and L. R. Haber, "Visual components of the reading process," Visible Lang. XV(2), 147-181 (1981).
5. D. D'Amato, L. Pintsov, H. Koay, D. Stone, J. Tan, K. Tuttle, and D. Buck, "High speed pattern recognition system for alphanumeric handprinted characters," Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Las Vegas, Nevada, July 1982, pp. 165-170.
6. D. W. Krumme, Theory and Implementation of a Network Representation of Knowledge: Application to Character Recognition, Ph.D. Thesis, University of California, Berkeley, June 1979.
7. R. J. Shillman, Character Recognition Based on Phenomenological Attributes: Theory and Methods, Ph.D. Thesis, Massachusetts Institute of Technology, August 1974.
8. C. H. Cox, III, P. Coueignoux, B. Blesser, and M. Eden, "Skeletons: A link between theoretical and physical letter descriptions," Pattern Recog. 15(1), 11-22 (1982).
9. J. M. Brady and B. J. Wielinga, Reading the Writing on the Wall, in A. R. Hanson and E. Riseman (eds.), Computer Vision Systems, Academic Press, New York, pp. 283-299, 1978.
10. R. Shinghal and G. T. Toussaint, "Experiments in text recognition with the modified Viterbi algorithm," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-1(2), 184-192 (April 1979).
11. R. Shinghal and G. T. Toussaint, "A bottom-up and top-down approach to using context in text recognition," Int. J. Man-Mach. Stud. 11, 201-212 (1979).
12. W. Doster, "Contextual postprocessing system for cooperation with a multiple-choice character-recognition system," IEEE Trans. Comput. C-26(11) (November 1977).
13. J. J. Hull, S. N. Srihari, and R. Choudhari, "An integrated algorithm for text recognition: Comparison with a cascaded algorithm," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-5(4), 384-395 (July 1983).
14. A. R. Hanson, E. Riseman, and E. G. Fisher, "Context in word recognition," Pattern Recog. 8, 35-45 (1976).
15. J. J. Hull and S. N. Srihari, "Experiments in text recognition with binary n-gram and Viterbi algorithms," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-4(5), 520-530 (September 1982).
16. R. C. Kurzweil, "Artificial intelligence program at core of scanning system," Graphic Arts Monthly 56, 564-566 (July 1984).
17. J. J. Hull, G. Krishnan, P. Palumbo, and S. N. Srihari, Optical Character Recognition Techniques in Mail Sorting: A Review of Algorithms, Technical Report 214, State University of New York at Buffalo, Department of Computer Science, June 1984.
18. J. Schurmann, Reading Machines, Proceedings of the Sixth International Conference on Pattern Recognition, Munich, FRG, October 1982, pp. 1031-1044.
19. Automation: A Guide to Business Mail Preparation, Publication 25, United States Postal Service, March 1984.
20. H. Brody, "Machines that read move up a grade," High Technol. 3(2), 35-40 (February 1983).
21. USPS, Report on the Field Testing of Commercial OCR's, USPS Research and Development Laboratories, 1980.
22. J. K. Cattell, "The time it takes to see and name objects," Mind 11, 63-65 (1886).
23. K. C. Hayes, Jr., Reading Handwritten Words Using Hierarchical Relaxation, Ph.D. Thesis, TR-783, Computer Vision Laboratory, University of Maryland, College Park, Maryland, July 1979.
24. J. L. McClelland and D. E. Rumelhart, "An interactive activation model of context effects in letter perception: Part 1. An account of the basic findings," Psychol. Rev. 88(5), 375-407 (September 1981).
25. M. Brady, "Toward a computational theory of early visual processing in reading," Visible Lang. XV(2), 188-215 (Spring 1981).
26. J. J. Hull, Word Shape Analysis in a Knowledge-Based System for Reading Text, Proc. of the Second IEEE Conference on Artificial Intelligence Applications, Miami Beach, Florida, December 1985, pp. 114-119.
General References

G. Nagy, Optical Character Recognition: Theory and Practice, in P. R. Krishnaiah and L. N. Kanal (eds.), Handbook of Statistics, Vol. 2, pp. 621-649, 1982, is a survey of statistical feature analysis techniques for character recognition.

E. Reuhkala, "Recognition of strings of discrete symbols with special application to isolated word recognition," Acta Polytech. Scand. Ma 38, 1-92 (1983) contains a brief survey of methods for contextual postprocessing and an exhaustive bibliography.

S. N. Srihari, Computer Text Recognition and Error Correction, IEEE Computer Society Press, Silver Spring, MD, 1984, is a tutorial on the reading of text by computer. Twenty basic papers and an extensive bibliography are given.

C. Y. Suen, M. Berthod, and S. Mori, "Automatic recognition of handprinted characters-the state of the art," Proc. IEEE 68(4), 469-487 (April 1980) is a survey of techniques developed for the recognition of isolated handprinted characters.

I. Taylor and M. M. Taylor, The Psychology of Reading, Academic Press, Orlando, FL, 1983, is an overview of research about human reading. Contains a comprehensive bibliography.

J. R. Ullmann, Advances in Character Recognition, in K. S. Fu (ed.), Applications of Pattern Recognition, CRC Press, Boca Raton, FL, pp. 197-236, 1982, is a general overview of character recognition techniques oriented toward practical applications. A comprehensive bibliography including many U.S. and U.K. patents is given.

J. J. Hull
SUNY at Buffalo

CHECKERS-PLAYING PROGRAMS

Game-Playing Programs

Programming computers to play games is one of the earliest areas of AI research (1,2). As it did in the past, it continues today to attract workers for a number of reasons. The first and most obvious of these is that the ability to play complex games appears to be the province of the human intellect. It is therefore challenging to write programs that match or surpass the skills humans have in planning (qv), reasoning, and choosing among several options in order to reach their goal. Another motivation for this research is that the techniques developed while programming computers to play games may be used to solve other complex problems in real life, for which games serve as models. Finally, games provide researchers in AI in particular and computer science in general with a medium for testing their theories on various topics ranging from knowledge representation (qv) and the process of learning (qv) to searching algorithms (see Search) and parallel processing. The game of checkers was one of the first for which a program was written. This entry describes the early and important work of Samuel (3,4) as well as more recent efforts by Griffith (5) and Akl and Doran (6) (see also Game playing).

The Game of Checkers

Checkers is an old board game believed to have originated in ancient Egypt (7). It is played by two persons and involves no element of chance. The presence of clear rules and goals makes it a game of strategy. Also, the game is one of perfect information in the sense that at any given time both players have complete knowledge of all the previous moves and the current board situation. Finally, the outcome of a game is either a win for one of the two players and a loss for the other or a draw: Checkers is therefore a zero-sum game.

Like most other game-playing programs, all known programs for playing checkers search a game tree (qv), an example of which is shown in Figure 1. In such a tree nodes correspond to board positions and branches correspond to moves. The root node represents the board position from which the player whose turn it is to play is required to make a move. A node is at ply (or depth) k if it is at a distance of k branches from the root. A node at ply k, which has branches leaving it and entering nodes at ply k + 1, is called a nonterminal node; otherwise the node is terminal. A nonterminal node at ply k is connected by branches to its offspring at ply k + 1. Thus, the offspring of the root represent positions reached by moves from the initial board; offspring of these represent positions reached by the opponent's replies; offspring of these represent positions reached by replies to the replies, and so on. The number of branches leaving a nonterminal node is the fan-out of that node. The term branching factor (qv) is used to denote the average fan-out for a given tree over all nonterminal nodes.

Figure 1. A game tree: P, Q, and R are board positions. Number 9 is the value of the alpha-beta search of position P.

A complete game tree represents all possible plays of the game. Each path from the root to a terminal node corresponds to a complete game with the terminal nodes representing a win, loss, or draw. It has been estimated that a complete game tree of checkers contains approximately 10^40 nonterminal nodes (3). Assuming that a program is capable of generating 3 billion (3 x 10^9) such nodes per second, it would still require in the vicinity of 10^21 centuries in order to generate the whole tree. Instead, checkers-playing programs, like programs for playing most other similarly challenging games, search an incomplete tree. The depth of such a tree is limited and, in addition, it is often the case that not all paths are explored. In an incomplete tree terminal nodes are those appearing at some predefined ply k or less and do not necessarily represent positions for which the game ends. A static evaluation function is used to assign a value to each of the positions represented by terminal
nodes. The alpha-beta algorithm (a refined version of minimax (qv) analysis) is then used to back these values up the tree (see Alpha-beta pruning). When all the offspring of the root have been assigned backed-up values representing their "goodness," the program chooses the move that appears to be best (in light of this incomplete information). Once this move is made and the opponent has replied, the program generates and searches a new tree from the current position to determine its next move. Note that game trees are generated while they are searched. A so-called depth-first search (qv) is usually followed: It starts by generating a complete path from the root to a terminal node; search then resumes from the latest nonterminal node on the path whose offspring have not all been generated or eliminated by the alpha-beta algorithm. Search continues until all nodes, up to some depth k, have been either generated or eliminated.
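The backing up of values can be sketched on an explicit toy tree. This is the generic textbook formulation of alpha-beta, not Samuel's code; lists stand for nonterminal nodes and numbers for the static values of terminal nodes.

```python
# Minimal alpha-beta sketch over an explicit toy tree (illustrative only).
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if not isinstance(node, list):      # terminal: return its static value
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:           # cutoff: remaining offspring pruned
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

# A 3-ply tree; the second subtree is partly pruned during the search.
tree = [[[3, 9], [4]], [[2, 1], [7, 8]]]
print(alphabeta(tree))  # 4
```

In this run the subtree [7, 8] is never examined: once the minimizing node's bound falls below the root's alpha of 4, its remaining offspring are cut off, which is exactly the eliminated-node saving described in the text.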
Figure 2. The standard 8 x 8 checkerboard.
Samuel's Work
The best documented checkers-playing program was written by Samuel in the period 1947-1967 (3,4). The program plays at a very high level: It can win against most players and loses only to the best. In 1962 it managed to win a game against a former champion of Connecticut and drew another with the world champion in 1965. The purpose of Samuel's work was to use the game of checkers to perform experiments in machine learning. The result was one of the earliest and most successful game-playing programs that could learn from its own mistakes to improve its play.

Representation. The program was written in assembly language for the IBM 700 series computers whose word length was 36 bits. A clever technique that saved both time and space was used to represent a board position and generate all possible next moves. Consider the standard 8 x 8 checkerboard shown in Figure 2, where black squares are numbered, and recall that checker pieces can be placed exclusively on these squares. Thus, only four computer words, each with bits numbered 1-36, are needed to represent a given position. In the first word bit i is set to 1 if square i holds a black piece (man or king); otherwise, it is set to 0. In the second word bit i is set to 1 if square i holds a black king; otherwise, it is set to 0. The third and fourth words are defined similarly for the white pieces. Note that bits 9, 18, 27, and 36 correspond to squares not appearing on the board and are therefore unused in all four words.

To see how all the possible next moves can be generated quickly from a given position, assume that it is Black's turn to play. Ignoring kings and jumps for the moment, the rules of checkers specify that pieces are only allowed to move forward to a diagonally adjacent square. Hence a black piece on square i can go either to square i + 4 (by a right move) or to square i + 5 (by a left move) provided that such a numbered square appears on the board. For example, if there are four black men on squares 5, 13, 15, and 26, as shown in Figure 3, the squares reachable by these men are 10, 17, 19 and 20, and 30 and 31, respectively. By shifting the contents of the word in Figure 3 four positions to the right, one obtains a representation of the squares potentially occupied by right moves, namely, 9, 17, 19, and 30, as shown in Figure 4 (square 9, of course, is not on the board). Similarly, by shifting five positions to the right, one obtains a representation of the squares potentially occupied by left moves, namely 10, 18, 20, and 31, as shown in Figure 5 (square 18 is also not on the board). Now let EMPTY
Figure 3. Four black men on squares 5, 13, 15, and 26.

Figure 4. The word in Figure 3 is shifted four positions to the right to obtain all potential right moves.

Figure 5. The word in Figure 3 is shifted five positions to obtain all potential left moves.
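The shift-and-mask move generation of Figures 3-5, together with the AND against a word of empty squares, can be imitated in a few lines. This is a hedged sketch only (Samuel's program was 36-bit assembly): bit i of a Python integer stands for square i, and the opposing pieces chosen for the example are hypothetical.

```python
# Hedged sketch of Samuel's shift-and-mask move generation (bit i of an
# integer stands for square i; squares 9, 18, 27, 36 are off the board).
OFF_BOARD = {9, 18, 27, 36}
ON_BOARD = 0
for sq in range(1, 37):
    if sq not in OFF_BOARD:
        ON_BOARD |= 1 << sq

def word_of(squares):
    w = 0
    for sq in squares:
        w |= 1 << sq
    return w

def squares_of(word):
    return sorted(sq for sq in range(1, 37) if word >> sq & 1)

black = word_of([5, 13, 15, 26])      # the four black men of Figure 3
white = word_of([10, 19])             # hypothetical opposing pieces
EMPTY = ON_BOARD & ~(black | white)   # bit i = 1 iff square i is unoccupied

# "Shifting four (five) positions" sends square i to i + 4 (i + 5); with
# this bit numbering that is an integer left shift. ANDing with EMPTY
# keeps only moves landing on unoccupied board squares.
right_moves = (black << 4) & EMPTY
left_moves = (black << 5) & EMPTY

print(squares_of(right_moves))  # [17, 30]  (19 is occupied, 9 off-board)
print(squares_of(left_moves))   # [20, 31]  (10 is occupied, 18 off-board)
```

Both shifts and both masks are single word operations, which is the time and space saving the text attributes to this representation.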
be a word such that bit i is set to 1 if square i in the current board position is unoccupied; otherwise, if square i is occupied by either a black or a white piece, bit i is set to 0 (note that bits 9, 18, 27, and 36 are also set to 0). By taking a bit-by-bit logical AND of the word in Figure 4 with EMPTY, it is possible to obtain simultaneously all right moves available to the four black men. Similarly, a logical AND of the word in Figure 5 with EMPTY yields all left moves. Backward king moves, jumps, and multiple jumps are handled by simple modifications to this approach. In terms of storage five words are needed to represent the moves: one word for each of the jump, forward right, forward left, backward right, and backward left moves. The various rules of checkers, such as crowning, recognizing a win, loss or draw, and so on, are incorporated in the program within this representation in a straightforward way.

Search. The program uses the alpha-beta algorithm to search trees up to a maximum depth of 20 moves. Instead of holding the actual depth used while searching from a move to a constant, it is allowed to vary according to the position under consideration. Typically, the program begins by looking ahead three moves. Nodes at that level are evaluated directly if neither the last nor the next moves are jumps and no exchange offer is possible. If any of these conditions is satisfied for a given node, however, search proceeds from that node. For nodes at depth 4 search terminates if neither a jump nor an exchange are possible from that position. From ply 5 to ply 10 look-ahead is interrupted if no jump is possible. Search terminates for nodes at ply 11 or greater if one side is ahead by more than two kings.

In many situations during the search the program needs to estimate the value of a node directly without having generated or examined its offspring. A static evaluation function, discussed in more detail below, is used for that purpose. It consists of a computational procedure that assigns a numerical value to the position the node represents based on various parameters such as the number and worth of the pieces the program has, the mobility of these pieces and their potential for capturing opponent pieces, their situation on the board, and so on. The primary application of a static evaluation function is in assigning scores to terminal nodes. It can also be used to enhance the alpha-beta algorithm through ordering and pruning of moves.

Fixed Ordering. When depth-first search coupled with the alpha-beta algorithm is used to generate and search a game tree, the order in which the offspring of a node are examined is of great importance. A perfect ordering of moves is defined as one in which, for any node in the tree, the first move generated is the best for the player whose turn it is. Then for a tree of depth D and branching factor B, the total number of terminal nodes generated by the alpha-beta algorithm is approximately 2B^(D/2) instead of the full B^D. This represents a significant savings in time due to the large number of nodes eliminated by the alpha-beta algorithm and that the program therefore need not examine. Consequently, for a constant number of terminal nodes search depth can be almost doubled. Of course, there is no way of guaranteeing such ordering, and many heuristics (qv) exist that attempt to approximate it. One such heuristic used by Samuel's program is to perform a shallow look-ahead from a given node and use the static evaluation function to assign values to the resulting terminal nodes. These values are backed up by the alpha-beta algorithm to the offspring of the original node. The backed-up values are now used to order the offspring, and this order is to be respected in the search that follows. This method was called plausibility analysis by Samuel, and it ordered the available moves based on their promise.

Dynamic Ordering. Samuel also introduced a technique that allowed the program to revise the ordering of moves arrived at by the plausibility analysis. Suppose that the offspring of a node have been ordered as above; a search up to a limited depth is now started from the offspring ranked "best." At the end of this search the backed-up value is compared to that of the offspring ranked earlier as "second best." If the former is better, the search continues to a greater depth; otherwise, it is interrupted and a new limited search is started from the current best offspring. The method can be repeated as many times as needed and to any required depth.

Forward Pruning. Both fixed and dynamic ordering are simply time-saving heuristics and do not in any way affect the overall outcome of the search. Another way of reducing the number of nodes examined by the alpha-beta algorithm proceeds as follows. First, plausibility analysis is used to order all the legal moves from a node; then the best few of these are retained and the others discarded. The number of moves retained is inversely proportional to the depth at which they are generated. A variant of this method is used at a later stage when a node is chosen to begin a search. If the value assigned earlier to that node by plausibility analysis falls outside the range currently set by the alpha-beta algorithm, the node is discarded. Neither of these two forms of forward pruning is guaranteed not to discard a good move.

Learning. Two learning mechanisms were provided in the program to constantly better the quality of its play. The first was the ability to memorize moves (or rote learning); the second was a variable static evaluation function that could be improved through training (or learning by generalization) (see Learning).

Rote Learning. In rote learning the program memorized the boards and their evaluations that were encountered during the course of previous games. Assume that a good static evaluation function has already been constructed and that at some point during the game it is the program's turn to move from board position P. The program generates the game tree in Figure 1 and determines using its static evaluation function that the value of position P is 9, say. At this point the program makes the move suggested by the depth-first search coupled with the alpha-beta algorithm and stores position P together with value 9. Now suppose that the situation depicted in Figure 6 were to arise in a later game. Rather than invoking the static evaluation function to assign a value to position P, the program could use the

Figure 6.
CHECKERS-PLAYING PROGRAMS
stored value of P. This would have two advantages.First, if the time required to retrieve the value of P from storage is much smaller than that required to compute the static evaluation function, time is saved that could be used to search deeper somewhere else in the tree. Secondand more important, the value assignedto P in this manner was obtained by searching to depth 3 below P (Fig. 1) and is therefore more accuratethan the static value that would otherwise be computed. The net effect therefore is an improvement of the look-aheadability of the program. In addition to the board position and its value, the tength of the path followed in the game tree to compute this value was also stored. Subsequently, when the program had to choose between two or more moves leading to positions with equal values, it favored the position whose value had been reached by the shortest search. A senseof direction was thus acquired by the program, which was able in this way to progressquickly toward its goal (e.g.,a win in the end game). The board positions and their associatedvalues were saved in a large file that was stored on magnetic tape due to centralmemory limitations. The file was organized so as to achieve storage efficiency and fast retrieval. In order to use as little spaceas possible,all the positions were savedas though white is to move, various rotational symmetries were exploited, and least-usedpositions were deletedperiodically. Quick accessof stored values was made possibleby indexing board positions according to some important characteristics (e.g., number of pieces)and by keeping them on the tape in approximately the order in which they might occur in actual play. In order to study the effect of rote learning, the program was trained by playing against itself and against humans (including masters) and by following many book games between masters. 
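The rote-learning scheme described above amounts to a memo table keyed on board positions, storing each position's backed-up value together with the length of the search that produced it and preferring, among moves of equal value, the one reached by the shortest search. A minimal sketch in Python; the position encoding, values, and class names are invented for illustration, not Samuel's actual data structures:

```python
# Sketch of Samuel-style rote learning: a memo table mapping board
# positions to (value, search_length). Positions and values here are
# illustrative stand-ins, not a real checkers encoding.

class RoteMemory:
    def __init__(self):
        self.table = {}  # position -> (value, search_length)

    def store(self, position, value, search_length):
        """Remember a position's backed-up value and how long the
        search that produced it was."""
        self.table[position] = (value, search_length)

    def lookup(self, position):
        """Return the stored (value, search_length), or None if the
        position was never seen -- in which case the (expensive)
        static evaluation function must be invoked instead."""
        return self.table.get(position)

def choose_move(memory, successors, evaluate):
    """Pick the successor with the highest value; among equal values,
    favor the one whose value was reached by the shortest search,
    giving the program its 'sense of direction'."""
    def key(position):
        stored = memory.lookup(position)
        if stored is None:
            # Unknown position: fall back on static evaluation and
            # give it no shortest-path preference.
            return (evaluate(position), float("-inf"))
        value, search_length = stored
        return (value, -search_length)  # shorter search wins ties
    return max(successors, key=key)
```

Repeated over many games, the table grows toward the 53,000-plus positions mentioned below; the retrieval-versus-recomputation trade-off is exactly the first advantage described above.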
It was noticed that rote learning is particularly useful in improving the program's play steadily during the opening and end games but not so much during the middle game, where the number of possible moves from a given position is fairly large by comparison. The program reached a better-than-average novice level, having stored over 53,000 positions. Samuel pointed out a limitation of rote learning if it were to be used alone: A program would need to accumulate an estimated number of about 1 million positions to play at the master level. He concluded that this would be too impractical, requiring an inordinate amount of playing time, not to speak of the storage and retrieval problems. Other learning processes are therefore needed.

Learning by Generalization. Samuel experimented with two static evaluation functions: a linear polynomial and a nonlinear signature table method. As with rote learning, two training methods were used: in the first, the program played either against itself or against a human, and in the second it learned by following book moves.

The Linear Polynomial Approach. Here the static value assigned to a position is obtained from the polynomial w1p1 + w2p2 + · · · + wnpn, where the parameter pi is a numerical measure of some feature of the board and wi is a real-valued weight indicating the worth of pi. The larger the value of the polynomial, the more attractive the position is to the player moving to it. Typical parameters used are ADV (advancement), EXCH (exchange), MOB (total mobility), and THRET (threat). THRET, for example, is defined as the number of squares to which the player whose turn it is can move a piece and in so doing threaten to capture an opponent piece on a subsequent move. Assume now that pi corresponds to THRET for some i. If board position Q leads to board position R by a move as shown
in Figure 1, the value of pi is equal to the value of THRET for R minus the value of THRET for Q.

There are two decisions to be made when designing such an evaluation function: which parameters to use and what values the weights are to take. In this case the first of these decisions was made in part by Samuel himself. He initially selected a set of 38 board features. It was then left to the program to choose the 16 best of these as well as the values of the associated coefficients. To begin, the program selected arbitrarily 16 parameters, p1, p2, . . . , p16. Two versions of the program were then created, call them X and Y, each with 16 arbitrary weights, w1, w2, . . . , w16. Version X played a sequence of games against Y. During any given game X learned by generalizing on its experience and changed its coefficients correspondingly, while the coefficients for Y remained constant. At each move X computed two evaluations for the current board position: the static value given by the polynomial and a backed-up value obtained by looking ahead a few ply in the game tree. On the assumption that the second value ought to be more accurate than the first, X adjusted its coefficients in order for the static value to better match the backed-up value. Whenever X won a game, its polynomial was used by Y in the next game. If X lost a sequence of games, it would change its coefficients at random in order to move away from the current local optimum. This technique is sometimes referred to as hill climbing or local neighborhood search.

Parameter selection proceeded in conjunction with the adjustment of coefficients. Starting with the 16 arbitrarily chosen parameters, the program keeps a count of the number of times each parameter is assigned the lowest coefficient. Following each move by X, this count is incremented until, for some parameter, it exceeds 32. This parameter is then removed from the polynomial and placed at the end of a queue formed by the currently unused parameters.
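The coefficient-adjustment step above can be sketched as nudging each weight so that the static polynomial value tracks the deeper, backed-up value. The learning rate and the simple proportional correction below are illustrative stand-ins; Samuel's actual correction procedure was more elaborate:

```python
# Sketch of a linear-polynomial static evaluation w1*p1 + ... + wn*pn
# and the generning step of learning by generalization: move the
# weights so the static value better matches the backed-up value.
# The rate and numbers are invented for illustration.

def static_value(weights, params):
    """Linear polynomial: sum of w_i * p_i over the board features."""
    return sum(w * p for w, p in zip(weights, params))

def adjust_weights(weights, params, backed_up, rate=0.1):
    """Shift each coefficient in proportion to its feature so the
    static value moves toward the (presumably more accurate)
    backed-up value obtained by looking ahead a few ply."""
    error = backed_up - static_value(weights, params)
    return [w + rate * error * p for w, p in zip(weights, params)]
```

Iterated over many moves, this is the hill-climbing behavior described above; losing a sequence of games triggers the random restart of the coefficients that moves the program away from a local optimum.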
The first element in that queue is now added to the polynomial. Similar approaches were used to select parameters and adjust weights in playing against humans. Samuel observed that the set of parameters and their weights reached a stable state after several games. Considerable, though slow, improvement in the quality of the program's play was obtained by this method, particularly during the middle game, where it attained a better-than-average level.

An alternative to actual play, book learning, proved to be a more efficient approach for adjusting the coefficients. Approximately 250,000 different board positions together with the move recommended by an expert for each of them were stored on tape. The program was then asked to produce, for each position, all possible next positions and the associated values of the 16 parameters. Then the coefficient wi for every parameter pi was obtained from (L − H)/(L + H), where L is the overall number of positions for which the value of the parameter was lower than its value for the recommended position and H is the number of times it was higher.

The major drawback of the polynomial approach is its linear nature. Two techniques were used in one version of the program to overcome this weakness. One was to introduce new parameters that were logical combinations of earlier ones; the other was to divide the game into six phases each employing an entirely different polynomial.

Signature Tables. A third method, for obtaining a nonlinear function of the parameters, was used by Samuel very successfully in a later version of his
checkers-playing program. Here each of the parameters measuring a board feature is restricted to take values from a small set of integers. Typically, the parameter GUARD is 0 if both or neither of the two players have complete control of their back rows, +1 if the player whose turn it is controls his back row while the opponent does not, and −1 if the latter condition is reversed. An n-dimensional table is then created (conceptually) with one dimension per parameter. Entries in this table represent static evaluations corresponding to various combinations of parameter values. Thus, if n = 2, for example, and the two parameters are GUARD and MOB, taking values from {−1, 0, 1} and {−2, −1, 0, 1, 2}, respectively, then the signature table is as shown in Figure 7. If for the board under consideration MOB = 1 and GUARD = 0, this corresponds to cell (1, 0) in the table. Since this is a desirable situation, a relatively high value is found in (1, 0), and this signature is assigned to the position as its static evaluation.

Signature tables therefore have the potential to produce a fairly accurate estimate of the worth of a position, as they express the various dependencies among the parameters. Their major disadvantage, if implemented as described above, would be their inordinate storage space and learning time requirements for any nontrivial n. Samuel dealt with these problems as follows. First, 18 of the 24 chosen parameters were restricted to the values {−1, 0, 1}, and the remaining parameters took their values from {−2, −1, 0, 1, 2}. Next, a hierarchy of signature tables was constructed. In the first level of the hierarchy the parameters were divided into six subsets each containing one five-valued and three three-valued parameters. Six signature tables, one per subset, were constructed, with entries chosen from {−2, −1, 0, 1, 2}. For each three of these tables there is one second-level table with entries from −7 to 7. The third level consists of just one table.
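The hierarchical lookup can be sketched with dictionaries standing in for the tables. The table contents below are invented placeholders; in Samuel's program the entries were learned from book games, and the real hierarchy had six first-level tables feeding two second-level tables and one third-level table:

```python
# Sketch of a hierarchical signature-table evaluation: small
# integer-valued feature tuples index first-level tables, whose
# outputs together index the next level, up to a single top table.
# All table entries here are invented, not Samuel's learned values.

def signature_evaluate(features, level1_tables, level2, level3):
    """features: one tuple of small ints per first-level table.
    Each level's outputs form the index into the next level; the
    final entry is the static evaluation of the position."""
    outs1 = tuple(t[f] for t, f in zip(level1_tables, features))
    out2 = level2[outs1]
    return level3[(out2,)]
```

Because each table is indexed by only a few parameters, the total storage is far smaller than one n-dimensional table over all 24 parameters, which is precisely the point of the hierarchy.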
Figure 7. A signature table indexed by MOB (values −2 to 2) and GUARD (values −1 to 1).

When a position is to be evaluated, the parameters are measured and used to index the first-level tables. The program then moves up in the hierarchy, with values read from tables at one level giving access to tables at the next level. Finally, the entry obtained from the single third-level table is the static evaluation of the board position under consideration. As with the polynomial evaluation function, the game was divided into six phases with a different three-level signature table set for each phase, and the program was trained by following book moves. Two cumulative totals, A (agree) and D (differ), initially set to zero, are associated with each cell in the hierarchy. As before, the program is made to follow book games. For any given position there is a number of next positions, one of which is recommended by the book. Each of these positions corresponds to one cell in each of the three levels of the signature table hierarchy. A 1 is added to the D totals of all
such cells not representing the book move; for cells associated with the book move, however, the A count is incremented by the number of nonbook moves. Once in a while the correlation coefficient C = (A − D)/(A + D) would be computed for every cell as a measure of the goodness of the associated positions as book-recommended moves. The value obtained for C becomes the new cell entry after being adjusted to fall in the required range for every level in the hierarchy.

Book learning worked particularly well: After following 173,989 book moves the program was tested on 895 new positions. It was able to predict the best move recommended by the book 38% of the time and the second best 26% of the time. This performance was attained using only the evaluation function. When it conducted a tree search in addition, the program's ability to follow book moves was increased substantially. The signature table method was distinctly superior to the polynomial evaluation function in improving the quality of the program's play.

Samuel's work was one of the first successful contributions to machine learning and game playing (8-10). No program before his had reached a championship level of play in a nontrivial game of strategy. Few other game-playing programs today exhibit a better performance. It remains therefore one of the major achievements of AI research.

Simple Heuristics and the Phase Table Method

Additional experiments with various static evaluation functions were conducted by Griffith (5).
Using the book-learning approach, he showed that a very simple evaluator is better than the linear polynomial but not as good as signature tables in capturing checkers knowledge. This new method is based on four checkers-related heuristics: highest priority is given to moving a king and next highest to a move along the main diagonal and into the two central squares; third priority is given to all remaining moves, except those from specified squares in the first row and those leading to jumps, which are given lowest priority.

Similarly, a second static evaluation function proposed in Ref. 5 was found to be at least as good as signature tables and considerably simpler to implement. In this method the game is divided into six consecutive phases, and for each phase a table is created with 98 entries representing all legal moves in the game. When a position is to be evaluated, its goodness is determined by the value in the appropriate table corresponding to the move that leads to it.

Searching Checkers Trees in Parallel

Besides being used as an experimental ground for research on learning, the game of checkers served to test the applicability of parallel-processing ideas to AI. A parallel computer is one consisting of several processing units: Given a computational task, it is subdivided into subtasks each of which is assigned to a different processing unit. Such a computer is of particular use to a game-playing program, as the time required to search enormous trees could be significantly reduced through parallel processing. By speeding up the search, a program can examine deeper trees in a fixed amount of time and, as a consequence, improve the quality of its play. A number of experiments with parallel algorithms for searching game trees are described in Ref. 6. Two versions of a checkers-playing program are compared, each using a different parallel algorithm for tree search. The programs were tested on an experimental parallel computer. With the exception of the opening game, where all moves appear to be equally good, the results indicated that a parallel implementation of the alpha-beta algorithm was especially effective in reducing the running time as well as the total number of nodes examined and the total number of terminal nodes evaluated.
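The parallel idea can be sketched by splitting the root's subtrees among workers, each searching its subtree independently, on a toy hand-built tree. This is only an illustration of the division of labor; the parallel alpha-beta and Scout algorithms tested in Ref. 6 also share pruning bounds between processors, which this sketch omits:

```python
# Toy illustration of parallel game-tree search: each root subtree is
# evaluated by a separate worker and the results are combined at the
# root. A tree node is either a number (a terminal static value) or a
# list of child nodes. Real parallel alpha-beta (Ref. 6) additionally
# shares alpha-beta bounds between workers.
from concurrent.futures import ThreadPoolExecutor

def minimax(node, maximizing):
    """Plain sequential minimax over the toy tree encoding."""
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def parallel_root_search(root_children):
    """Evaluate each root subtree in its own worker; the player to
    move then takes the maximum over the minimizing replies."""
    with ThreadPoolExecutor() as pool:
        values = list(pool.map(lambda c: minimax(c, False), root_children))
    return max(values)
```

With several processing units doing real work per subtree, the same wall-clock time buys a deeper search, which is the quality improvement described above.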
BIBLIOGRAPHY

1. P. C. Jackson, Introduction to Artificial Intelligence, Petrocelli, New York, 1974.

2. A. Barr and E. A. Feigenbaum, The Handbook of Artificial Intelligence, Vol. 1, Kaufmann, Los Altos, CA, 1981.

3. A. L. Samuel, Some Studies in Machine Learning Using the Game of Checkers, in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, pp. 71-105, 1963.

4. A. L. Samuel, "Some studies in machine learning using the game of checkers. II-Recent progress," IBM J. Res. Develop., 11(6), 601-617 (November 1967).

5. A. K. Griffith, "A comparison and evaluation of three machine learning procedures as applied to the game of checkers," Artif. Intell., 5, 137-148 (1974).

6. S. G. Akl and R. J. Doran, A Comparison of Parallel Implementations of the Alpha-Beta and Scout Tree Search Algorithms Using the Game of Checkers, in M. A. Bramer, ed., Computer Game-Playing: Theory and Practice, Ellis Horwood, Chichester, U.K., pp. 290-303, 1983.

7. W. F. Ryan, Play Winning Checkers, Coles, Toronto, Canada, 1978.

8. B. G. Buchanan, T. M. Mitchell, R. G. Smith, and C. R. Johnson, Jr., Models of Learning Systems, in J. Belzer, A. G. Holzman, and A. Kent (eds.), Encyclopedia of Computer Science and Technology, Vol. 11, Marcel Dekker, New York, pp. 24-51, 1978.

9. P. McCorduck, Machines Who Think, Freeman, San Francisco, pp. 149-153, 1979.

10. A. L. Samuel, AI, Where It Has Been and Where It Is Going, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, August 1983, pp. 1152-1157.
SELIM G. AKL
Queen's University

CHEMISTRY, AI IN

Chemistry was one of the first disciplines, aside from computer science, to actively engage in research on AI techniques. The first chemistry AI project was the DENDRAL project at Stanford University. This project began in 1964, involved more than 50 researchers, and produced more than 100 articles and 2 books. DENDRAL began with the goal of automatic interpretation of mass spectral data. However, during its 23-year history, it has also investigated other topics, including automated learning (qv), computerized representation of chemical structures, applications of graph theory, exhaustive chemical structure generation, and proton and 13C NMR interpretation. A detailed account of the project can be found in Refs. 1 and 2.

Defining which computer systems should be classified as AI systems has remained a problem throughout AI's history. In chemistry this problem is compounded because many AI applications rely heavily on numerical algorithms, and many applications that exhibit AI-like characteristics are completely numerical rather than symbolic in nature. Examples of the latter are pattern recognition (qv) techniques and learning machines. To avoid the difficult problem of defining AI exactly, this entry is limited to work that uses expert systems (qv) [logical inference (qv)], symbolic manipulation, and natural-language interpretation techniques (see Natural-language understanding). Within these three areas AI technologies have made many contributions to the practice of chemistry.

Work in applying AI technologies to chemistry has recently expanded beyond the traditional academic environment. Although academia continues to develop new techniques, industry has begun to apply older, more developed techniques to solve its problems. Vendors are using AI techniques to enhance existing products and develop new ones. Other industrial researchers are developing proprietary systems in an attempt to gain a competitive advantage. Endeavors covered in this article can be divided into six general categories: natural and chemical language, organic synthesis planning, chemical structure elucidation, improving chemical instrumentation, symbolic algebraic manipulation, and highly specific expert systems that do not fall into the above categories, plus probable directions for future work.

Natural- and Chemical-Language Applications in Chemistry

Natural-language applications may be divided into two classes: the "language" of chemical structures, substructures, and reactions and the methods used to convert between that "language" and English sentences. Chemical structure language requires a method for representing molecules in the computer and a syntax for manipulating those representations (3,4). This language requirement was clearly defined by researchers who were storing chemical information in computer files. Wiswesser line notation (5,6) and its derivatives were developed to uniquely define chemical structures as a string of characters. Each character represents a specific fragment of a molecule, allowing the computer to "recognize" a molecule. An alternative approach uses graph theory, defining molecules as vertices and the connections between them (7). This approach creates a connection table, or matrix, whose rows and columns refer to atoms. The values stored in the matrix describe the type of connection between atoms. Syntax rules can be defined for manipulating these computer representations, allowing substructures of molecules to be defined and matched. Computer representations of molecules have led to chemical databases of molecular representations. Molecules may be graphically entered into a computer and the database searched for "substructures." The popularity of these databases has created an entire industry to fill the demand (8).

The second use of natural-language techniques is more familiar. Natural-language systems attempt to understand English sentences that contain chemical information. These systems function as user-friendly interfaces to other chemical expert systems or as intelligent interfaces to chemical databases. Understanding English-phrased questions about databases is a simpler problem than understanding problems relating to chemistry. Commercial natural-language systems can perform some of these tasks in chemical applications, but
several systems have been developed specifically for chemistry. Chemical Abstracts Service is investigating automatic keyword indexing of papers based on a computer interpretation of the text (9-11). Other work has focused on searching the chemical literature, based on an interpretation of the text (12).

Chemical Reaction Synthesis

Chemical reaction synthesis is one of the oldest applications of AI in chemistry, beginning in 1967 (13,14). These programs attempt to design a sequence of chemical reactions that would result in a "target" molecule. This early work was based on a chemical synthetic tree. The target molecule was decomposed into its potential precursors using every possible single-step chemical synthesis. Each precursor was further decomposed into earlier antecedents, thus creating a synthetic tree. This decomposition led to an impossibly large number of potential synthetic paths. The deeper into a synthetic tree one proceeded, the more the number of potential paths multiplied. Initially, the selection of the best branch at each junction in the tree required the chemist's intervention in the program. This was necessary to limit the number of pathways and was accomplished by using interactive computer graphics to display the potential paths, and the "best" precursor was selected by the chemist (15). The program Simulation and Evaluation of Chemical Synthesis (SECS) improved the interactive graphics by enabling the chemist to select the reaction path and broadened the computer knowledge base by adding stereochemical reactions and displays (3,16).

One technique, which eliminated the need for a chemist's intervention to find a synthetic route, incorporated synthetic rules and heuristic programming (17). The program uses only the molecular substructures of the "target" molecule that participate in the available synthetic methods. Heuristic rules determine the correct sequence of reactions required to protect any reactive functional groups on the target.
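The synthetic-tree decomposition described above can be sketched as a backward search from the target, where each "reaction" maps a product to a set of precursors and the recursion stops at available starting materials. The molecule names, reaction table, and availability set below are invented placeholders, not real chemistry:

```python
# Sketch of retrosynthetic tree generation: starting from a target,
# apply every known single-step synthesis in reverse to obtain
# precursors, recursing until available starting materials are
# reached. Molecules and reactions here are hypothetical.

REACTIONS = {  # product -> list of alternative precursor sets
    "target": [("A", "B"), ("C",)],
    "A": [("D", "E")],
    "C": [("E",)],
}
AVAILABLE = {"B", "D", "E"}  # assumed purchasable starting materials

def synthetic_routes(molecule):
    """Enumerate routes as nested tuples, one branch per known
    reaction. In practice this tree explodes combinatorially, so it
    must be pruned heuristically or by a chemist's interactive
    selection, as described above."""
    if molecule in AVAILABLE:
        return [molecule]
    routes = []
    for precursors in REACTIONS.get(molecule, []):
        subroutes = [synthetic_routes(p) for p in precursors]
        # For brevity, keep only the first route found per precursor.
        routes.append((molecule, tuple(s[0] for s in subroutes)))
    return routes
```

Even this three-reaction toy yields two distinct routes to the target; with realistic reaction libraries the multiplication of paths at every level is exactly the combinatorial problem the heuristic and interactive approaches address.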
A second technique to eliminate a chemist's intervention is based on the principle of minimum chemical distance (18). This technique allows the program to eliminate synthetic routes that are not likely to be useful. Future developments for this method include the potential of predicting chemical reactions that are not known today (19). Expert systems techniques have also been applied to this problem. These methods, before further evaluation, reduce chemical reactions to more general axioms. SYNLMA (20) uses theorem-proving (qv) techniques to design organic syntheses. QED (21) applies multivalue-logic predicate calculus (qv) with axioms to select a plan to choose among the possible precursors.

Chemical Structure Elucidation

Structure elucidation is a prime area for AI applications because it requires both scientific expertise and problem-solving capabilities. Information on molecular formula and structural fragments generally comes from spectral interpretation but can come from any source at the chemist's disposal. Internal consistency checks combine data from multiple sources to resolve conflicting information about the presence or absence of particular fragments. Enumeration programs connect the remaining fragments to obtain all chemically possible molecules. Those structures are ranked based on such properties as comparison of predicted and observed spectra and steric
stresses. The chemist must then devise a way to distinguish between the remaining candidate structures.

Determination of Structural Fragments. There are three approaches to identifying compounds using spectroscopic data. The oldest method, library searching, compares the unknown spectrum with a collection of known spectra. This is straightforward, but it becomes impractical as the size of the reference libraries increases. In addition, library searching cannot identify compounds not in the library, such as newly synthesized compounds. The second approach, pattern recognition, compares the unknown spectrum with "patterns" that are characteristic of classes of compounds. This solves the two problems library searching presents but requires a substantial number of spectra for each class of compounds to be recognized. AI avoids these problems by interpreting spectra using the rules a spectroscopist would use. AI techniques have an advantage over spectroscopists because the computer system does not forget or confuse information. Unfortunately, AI systems do not have all the knowledge known to the spectroscopist. AI structure elucidation systems are comparable in performance to a postgraduate spectroscopist (22). Below, the approaches taken to interpret the various types of spectral data are described.

Infrared Spectroscopy. Spectroscopists have known for some time that certain functional groups and substitution patterns have characteristic absorptions in the ir. These patterns are documented in standard Colthup charts. Early work attempted to computerize these tables to automatically interpret ir spectra (23,24). These programs must be able to deal with the following problems: Many functional groups absorb at each frequency; functional groups can cause more than one absorption; and the solvents used can shift or mask peaks. Recent work (25) focused on reducing the task of codifying the rules required to identify the functional groups of interest. This approach extracts rules automatically from the spectra of known compounds.
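Rule-based ir interpretation of the kind described can be sketched as matching observed peak positions against characteristic absorption ranges. The ranges below are rough textbook approximations for illustration only; a real system must also cope with the overlapping assignments and solvent shifts noted above:

```python
# Sketch of rule-based infrared interpretation: each rule states a
# characteristic absorption range for a functional group. The values
# are approximate textbook figures used only for illustration. A peak
# may match several rules, so the output lists all candidates.

RULES = [  # (functional group, low cm^-1, high cm^-1)
    ("O-H stretch", 3200, 3600),
    ("C-H stretch", 2850, 3000),
    ("C=O stretch", 1650, 1780),
    ("C=C stretch", 1620, 1680),
]

def interpret(peaks):
    """Map each observed peak position (cm^-1) to the functional
    groups whose characteristic range contains it."""
    return {peak: [group for group, lo, hi in RULES if lo <= peak <= hi]
            for peak in peaks}
```

The ambiguity is visible immediately: a peak near 1650 cm^-1 matches both the C=O and C=C rules, which is why the interpreted fragments must later pass the internal consistency checks described below.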
This approach extracts rules automatically from the spectraof known compounds. Mass Specfroscopy. The major work on MS interpretation was done through the DENDRAL project. The first step in interpreting ms data was to determine, basedon known masscharge ratios, the probable molecular formula of an ion. The rules determining how a molecule will fragment were also codified. Using these rules, the mass spectrum of a candidate structure could be predicted. These spectra are compared to the unknown's mass spectrum to determine the likelihood that the candidate structure was the correct one. Later work from the DENDRAL project automatically determined molecular fragmentation rules using the mass spectra of known compounds.This work was called meta-DENDRAL (26). A recent instrumental development is ms/ms which takes the ms of each peak in the original ms. Determination of the structure of each fragment from the original ms provides a unique way to preform internal consistency checking using the data from only one spectrometer (27). Nuclear Magnetic Resonance. Work in nmr has included both 1H and 13Canalysis. Proton nmr is similar to ir in that functional groups tend to absorb in certain regions of the spectrum. The ranges of possibte absorptions for each functional group makes lH nmr more suitable for eliminating functional groups determined from another sourcethan for generating a iirt of fragments (28). 13Cnmr, a more recent development,is very sensitive to the environment of the resonating carbon. This sensitivity causes every structurally unique carbon to resonate at a different frequency. The structural equivalence extends two or three bonds in all directions. Structurally
equivalent carbons in different molecules absorb at similar frequencies. 13C nmr is, therefore, very good at generating a list of molecular fragments that are present. Work has been done to determine 13C nmr interpretation rules (29); however, the general method used is a library search. The library, in this case, is composed of molecular fragments and their characteristic absorptions (30). Because the characteristic absorptions overlap, a list of possible functional groups is generated for each nmr peak.

X-Ray Powder Diffraction. Peak heights in x-ray powder diffraction are proportional to concentration, and therefore quantitative analysis is possible. Variations in relative peak heights from laboratory to laboratory and day to day preclude the use of simple least-squares fitting of reference spectra to the unknown. An expert system was developed to use the same knowledge that a mineralogist uses to solve this problem (31,32). This work was repeated using several different expert system development tools (EXPERT, UNITS, EMYCIN, OPS5) and LISP. The study concluded that all the development tools and LISP had their shortcomings. The convenience of the expert system development tools can lead to restrictions that prevent the program from completely solving the problem. LISP, on the other hand, requires a great deal of programming effort. Fortunately, the knowledge base is generally easier to translate from one system to another than it is to extract from the expert.

Internal Consistency Checks. Internal consistency checks eliminate fragments that are inconsistent with the other fragments present. This elimination process helps to reduce the combinatorial explosion of possible structures. One of the most effective ways to provide this check is to combine information from different sources. As an example, one possible source of a 13C nmr peak involves a carbonyl carbon, but there is no carbonyl absorption in the ir spectrum.
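The cross-source elimination step can be sketched as combining the fragment hypotheses of each spectroscopy and discarding any fragment another source rules out, while counting agreements as a crude confidence score. The source and fragment names below are invented placeholders:

```python
# Sketch of an internal consistency check: a fragment hypothesized
# from one spectrum is kept only if no other data source rules it
# out, and confidence rises when several sources agree. The fragment
# and source names are invented for illustration.

def consistent_fragments(hypotheses, exclusions):
    """hypotheses: {source: set of suggested fragments};
    exclusions: {source: set of fragments that source rules out}.
    Returns (kept fragments, per-fragment agreement counts)."""
    suggested = set().union(*hypotheses.values())
    ruled_out = set().union(*exclusions.values()) if exclusions else set()
    kept = suggested - ruled_out
    confidence = {f: sum(f in s for s in hypotheses.values())
                  for f in kept}
    return kept, confidence
```

In the carbonyl example above, a "carbonyl" fragment suggested by a 13C nmr peak would be discarded because the ir source excludes it, while a fragment suggested by two sources keeps a higher agreement count.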
These cross-checks can also come from different peaks in the same spectrum. Conversely, consistency checks can also increase the confidence in the presence of a fragment if more than one source of data suggests its presence.

Structure Enumeration. Once the various spectroscopic techniques have generated a list of molecular fragments, they must be connected to form possible structures for the unknown compound. The generation of possible structures must be exhaustive and nonredundant. Several problems arise during this procedure. The most serious problem is combinatorial explosion. Sometimes, there may be millions of possible structures. Enumeration programs must be able to eliminate chemically impossible structures and allow the chemist to eliminate other highly unlikely structures. Elimination is usually done at each step of a depth-first search to "prune" the search tree when possible. Another problem is that the list of fragments may not be complete and/or may contain ambiguities. The programs must handle multiple possible starting points and recognize when and where the assembled fragments begin to overlap. Additionally, the input fragments themselves may overlap one another (33). The enumeration program must also recognize the existence of stereoisomers and be able to distinguish between stereoisomers (34-36).

Ranking of Candidate Structures. There are two methods of ranking candidate structures. The first method compares the unknown's spectrum with a predicted spectrum for the candidate structure. This is the approach taken by DENDRAL by
using mass spectra. The second method, which is generally not as useful, discriminates against structures that are highly strained, such as three-membered rings. In the end, the chemist must devise physical tests to distinguish between remaining candidates. Below is a comparison of packages developed to address various portions of the structure elucidation process.

DENDRAL: Uses mass spectral data and 13C nmr data. All other constraints on structures must be deduced by the chemist. It excels in the structure generation process, handling stereoisomers and overlapping substructures (37). The candidate-testing procedure is built into the program (1,2).

CHEMICS: Uses mass spectral, 1H and 13C nmr, and ir data. It is a fully integrated package written in FORTRAN. It is limited to molecules containing only carbon, hydrogen, and oxygen. The structure generator cannot handle overlapping substructures (38-42).

CASE: Uses 13C nmr and ir data. It is written in FORTRAN and cannot handle overlapping substructures (43-46).

SEAC: Uses ir, 1H nmr, and uv data. The structure generator cannot handle overlapping substructures (47,22,41).

STREC: Uses mass spectral, nmr, ir, and uv data. The program is written in FORTRAN and cannot handle overlapping substructures (48).

B. Curry: Uses mass spectral, ir, and uv data. The package concentrates on the internal consistency check and conflict resolution (49).

PAIRS: Is strictly an ir interpretation program, its strength being the ability to easily add new rules (25,32,50-55).

EXMAT: Uses ir and mass spectral data. Its strength is the ability to design the entire analysis and use other chemometric techniques to solve parts of the problem (56).
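The first ranking method can be sketched as scoring each candidate structure by the similarity of its predicted spectrum to the observed one. The spectrum predictor and the peak-overlap similarity measure below are deliberately crude stand-ins for the much richer fragmentation models used by DENDRAL:

```python
# Sketch of candidate ranking: predict a spectrum for each candidate
# structure and rank candidates by similarity between predicted and
# observed spectra. Spectra are modeled as sets of peak positions,
# a simplification for illustration only.

def similarity(spectrum_a, spectrum_b):
    """Fraction of peaks shared between two spectra -- a crude
    stand-in for real spectrum matching."""
    union = spectrum_a | spectrum_b
    return len(spectrum_a & spectrum_b) / len(union) if union else 1.0

def rank_candidates(candidates, predict, observed):
    """Return candidates sorted best-first by predicted-versus-
    observed spectral similarity."""
    return sorted(candidates,
                  key=lambda c: similarity(predict(c), observed),
                  reverse=True)
```

The second ranking method, penalizing strained structures, would simply subtract a strain term from the same score; in either case the chemist still resolves the final ties by physical tests, as noted above.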
Improving Chemical Instrumentation

Infrared Spectroscopy. The ir interpretation program PAIRS has been incorporated into at least one vendor's Fourier transform ir spectrometer (54). This gives the spectroscopist structural information for those spectra not found in the limited spectral library available on the spectrometer. The program is also being made available through QCPE (55).

Mass Spectroscopy. One of the most complex mass spectrometers today is the triple quadrupole mass spectrometer. The spectrometer is completely computerized with more than 30 controllable parameters. Tuning the spectrometer for an optimum signal requires a high level of operator expertise. The MS signal must be maximized, and the peak shape must be evaluated. An expert system was developed using the KEE expert system development facility to automatically tune the instrument (57,32). The expert system is capable of outperforming a simplex optimization but is not quite as good as a competent operator.

Chromatography. Expert Chromatography Assistance Team (ECAT) is an expert system designed to aid chemists in developing liquid chromatography methods (32,58). Liquid chromatography design involves analyzing, optimizing, and troubleshooting a particular separation. The expert system includes general chromatographic knowledge, specific litera-
AI IN CHEMISTRY,
ture references, an experiment designer, and chromatography data analysis.

Ultracentrifugation. Ultracentrifugation is a technique for separating biological samples by their density. Most researchers consider it simply a tool and are not interested in the intricacies of the separation process. The expert system SpinPro questions the user on their research goals and recommends the optimum set of operating conditions (59). The operating parameters include rotor type, run speed, run time, gradient material, and gradient concentration. SpinPro also recommends the best set of conditions using only equipment available in the researcher's lab and details the results of those compromises. This expert system can be run on an IBM personal computer.

Process Control. The monitoring and control of chemical process systems is a new area for expert systems. Control systems are directly connected to instruments that monitor the temperatures, pressures, concentrations, and other variables of the process equipment. These measurements are used to predict the development of the chemical process and, if problems are discovered, modify the controlling instrumentation to correct the process. These control systems are specific to each process and require extensive measurements of the chemical system and knowledge of process behavior when the controlling instrumentation is varied (60,61).

Computer Algebra Applications

The fundamental theories in chemistry can be described by mathematical equations, which can be quite complex. Many chemical problems can be solved numerically using these equations (62,63). The solution of these equations has, however, been greatly simplified by the development of symbolic algebra packages (64). These packages solve complex equations analytically instead of using numerical approximations. As early as 1954 symbolic algebra techniques were applied to problems in quantum chemistry (65). However, only recently have symbolic algebra programs become popular (66).
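To make concrete what "analytically instead of using numerical approximations" means, here is a toy symbolic differentiator: expressions are trees, and the derivative is obtained by rewriting the tree with the sum and product rules, exactly, rather than by finite differences. This sketch is our own illustration, not code from MACSYMA, REDUCE, or any of the packages discussed; it handles only +, *, constants, and variables.

```python
# Symbolic (tree-rewriting) differentiation of nested-tuple expressions.
def diff(expr, var):
    """Differentiate expr with respect to var, returning a new tree."""
    if isinstance(expr, (int, float)):
        return 0
    if expr == var:
        return 1
    op, a, b = expr
    if op == "+":                      # sum rule: (a + b)' = a' + b'
        return ("+", diff(a, var), diff(b, var))
    if op == "*":                      # product rule: (ab)' = a'b + ab'
        return ("+", ("*", diff(a, var), b), ("*", a, diff(b, var)))
    raise ValueError(op)

def evaluate(expr, env):
    """Numerically evaluate a tree once the symbolic work is done."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, a, b = expr
    x, y = evaluate(a, env), evaluate(b, env)
    return x + y if op == "+" else x * y

# d/dx (x*x + 3*x) = 2x + 3, so the derivative at x = 5 is 13 -- exactly.
d = diff(("+", ("*", "x", "x"), ("*", 3, "x")), "x")
print(evaluate(d, {"x": 5}))  # 13
```

A real computer algebra system adds simplification, many more operators, and (as noted below for MACSYMA and its peers) the ability to emit the resulting expression as FORTRAN for numerical work.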
This delay was caused by the high cost of computers powerful enough to run the software and the availability of commercial packages with full user support. The applications of symbolic algebra have become so diverse that an entire symposium was devoted to the subject at the August 1984 American Chemical Society national meeting. Twenty-two papers were presented at the symposium (67). There are five commercially available packages that perform algebraic, calculus, and differential manipulations. They are MACSYMA (68,69), also available through EDUNET and ARPANET; REDUCE (70); MAPLE (71); SMP (72), written in C for speed of execution; and muMath (66,73), a microcomputer version. A useful feature of several of these packages is their ability to translate their answers into FORTRAN code, which can be used by numerical programs.

Miscellaneous

Computer-Aided Education. A chemical tutor called GEORGE has been developed (74). GEORGE is an expert system being developed to understand dimensional analysis problems dealing with basic chemistry. The program knows how to manipulate the dimensions of physical properties (such as
moles, density, concentration) and conversion factors to different units of measure. Using these, the program can solve a wide variety of problems. The unique feature of this program is that the student specifies the problem, and GEORGE explains the solution with text and diagrams.

Formulation of Agricultural Chemicals. Biologically active chemicals must be combined with various other chemicals to make a commercial product with the desired application characteristics. An expert system developed to assist in this process takes into account cost, marketing, legal, chemical, and end-use considerations in determining the "best" formulation (75). The program also uses several FORTRAN programs that calculate chemical parameters necessary to the decision-making process.

Analysis of Water Chemistry in Steam Power Plants. Corrosion because of improper water and steam chemistry is a major cause of downtime at steam power plants. This corrosion can cost a company approximately $1 million per day. An expert system has been developed that receives data from both the operator and remote chemical sensors and recommends corrective measures, if necessary, or the likely result if no action is taken (76,77).

Experimental Design. Deciding what experiments are required to answer a particular question is a pervasive problem in chemistry. Several expert systems have been developed to solve the following problems: determination of intracellular Mg2+ levels, deriving enzyme kinetic models to fit experimental data, design of experiments to determine safety and efficacy of drugs (78), determination of the number of analyses (including blanks) that must be done for environmental water analysis (79), and design of molecular genetics cloning experiments (80).

Macromolecular Structure Determination. A recently developed expert system allows the construction of molecules using heuristic rules. The system creates a three-dimensional protein based on the protein amino acid sequence (81).
This system includes heuristic rules for determining when the protein sequence turns on itself and which sequences form alpha- and beta-sheets (82). Similarly, another program, Artificial Intelligence in Model Building (AIMB), has been written for creating three-dimensional molecular models. This program can construct the three-dimensional model of a molecule from a two-dimensional drawing faster than a chemist can using mechanical models (83).

Future Applications

Computer software is dramatically increasing its penetration into the chemist's laboratory. The volume and sophistication of software is exceeding the chemist's desire and ability to keep current. However, the integration of numerical software, graphical displays, and expert systems promises to revolutionize the practice of chemistry. Expert systems will build on the vast library of existing chemical software and make these technologies available to the practicing chemist (84). Integration of these techniques will lead to "intelligent computer assistants" for every chemist. There will be structure elucidation assistants for analytical chemists, process control assistants for chemical engineers (85), experimental design assistants for organic chemists, and mathematical as-
sistants for physical chemists. However, before these assistants can be built, each different rule-base must be further developed. This collation and development of chemical relationships, expressible as heuristic rules, is underway today in both academia and industry. It is thought that the intelligent application of these rules through expert systems can reduce problems that lead to combinatorial explosions of possible solutions. Problem simplification of this type will be necessary before the huge amount of chemical information that exists today can be integrated into intelligent assistants.

BIBLIOGRAPHY

1. R. K. Lindsay, B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg, Applications of Artificial Intelligence for Organic Chemistry: The DENDRAL Project, McGraw-Hill, New York, 1980.
2. R. K. Lindsay, B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg, Applications of Artificial Intelligence for Chemical Inference: The DENDRAL Project, McGraw-Hill, New York, 1980.
3. W. T. Wipke, S. R. Heller, R. J. Feldman, and E. Hyde (eds.), Computer Representation and Manipulation of Chemical Information, Wiley, New York, pp. 147-174, 1974.
4. H. W. Whitlock, "An Organic Chemist's View of Formal Language," in T. W. Wipke and J. Howe (eds.), Computer Assisted Organic Synthesis, ACS Symposium Series 61, American Chemical Society, Washington, DC, 1977.
5. W. J. Wiswesser, A Line-Formula Chemical Notation, Thomas Y. Crowell Company, New York, 1954.
6. E. G. Smith, The Wiswesser Line-Formula Chemical Notation, McGraw-Hill, New York, 1968.
7. S. H. Bertz, W. C. Herndon, and G. Dabbagh, On the Similarity of Graphs and Molecules, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
8. J. E. Gordon and J. C. Brockwell, "Chemical Inference," J. Chem. Inf. Comput. Sci. 23, 117 (1983).
9. M. Moureau, A. Girard, and J. Delaunay, "Natural language bibliographic searches. PRETEXT program," Rev. Inst. Fr. Petrole Ann. Combust. Liq. 25(10), 1117-1143 (1970).
10. S. M. Cohen, D. L. Dayton, and R. Salvador, "Experimental algorithmic generation of articulated index entries from natural language phrases at Chemical Abstracts Service," J. Chem. Inf. Comput. Sci. 16(2), 93-99 (1976).
11. K. H. Baser, S. M. Cohen, D. L. Dayton, and P. B. Watkins, "Online indexing experiment at Chemical Abstracts Service: Algorithmic generation of articulated index entries from natural language phrases," J. Chem. Inf. Comput. Sci. 18(1), 18-25 (1978).
12. P. J. Smith, D. A. Krawczak, and S. Shute, EP-X: A Knowledge-Based System to Aid in Bibliographic Searches of the Environmental Pollution Literature, Artificial Intelligence Applications in Chemistry, American Chemical Society Meeting, Chicago, IL, Sept., 1985.
13. E. J. Corey, "General methods for the construction of complex molecules," Pure Appl. Chem. 14, 19 (1967).
14. E. J. Corey and T. W. Wipke, "Computer-assisted design of complex organic synthesis," Science 166, 178 (1969).
15. E. J. Corey, W. T. Wipke, and R. D. Cramer, III, "Computer-assisted synthetic analysis," J. Am. Chem. Soc. 94(2), 421 (1972); E. J. Corey, A. K. Long, and S. D. Rubenstein, "Computer-assisted analysis in organic synthesis," Science 228, 408 (1985).
16. T. Wipke and T. Dyott, "Simulation and evaluation of chemical synthesis," J. Am. Chem. Soc. 96(15), 4825 (1974).
17. P. E. Blower, Jr. and H. W. Whitlock, Jr., "An application of artificial intelligence to organic synthesis," J. Am. Chem. Soc. 98(6), 1499-1510 (1976).
18. C. Jochum, J. Gasteiger, and I. Ugi, "The principle of minimum chemical distance," Angew. Chem. Int. Ed. 19, 495 (1980).
19. J. Gasteiger, M. G. Hutchings, P. Low, and H. Saller, The Acquisition and Representation of Knowledge for Expert Systems in Organic Chemistry, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
20. T. Wang, I. Burnstein, S. Ehrlich, M. Evens, A. Gough, and P. Johnson, Using a Theorem Prover in the Design of Organic Synthesis, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
21. D. P. Dolata, QED: Automated Inference in Planning Organic Synthesis, Ph.D. Thesis, University of California, Santa Cruz, 1984.
22. Z. Hippe, "Problems in the application of AI in analytical chemistry," Anal. Chim. Acta 150, 11-21 (1983); T. Monmaney, "Complex window on life's most basic molecules," Smithsonian, 114 (July 1985).
23. B. Schrade et al., "Automatic reduction and evaluation of IR and Raman spectra," F. Z. (Fresenius' Zeitschrift) Anal. Chem. 303, 337-348 (1980).
24. H. B. Woodruff and M. E. Munk, "Computer-assisted interpretation of IR spectra," Anal. Chim. Acta 95, 13-23 (1977).
25. S. A. Tomellini, R. A. Hartwick, J. M. Stevenson, and H. B. Woodruff, "Automated rule generation for PAIRS," Anal. Chim. Acta 162, 227-240 (1984).
26. B. G. Buchanan, D. H. Smith, W. C. White, R. J. Gritter, E. A. Feigenbaum, J. Lederberg, and C. Djerassi, "Applications of artificial intelligence for chemical inference. 22. Automatic rule formation in mass spectrometry by means of the meta-DENDRAL program," J. Am. Chem. Soc. 98(20), 6168-6178 (1976); D. Lindsay et al., Applications of Artificial Intelligence in Organic Chemistry: The Dendral Project, McGraw-Hill, New York, 1981.
27. K. P. Cross, A.
B. Giordani, H. R. Gregg, P. A. Hoffmann, C. F. Beckner, and C. G. Enke, "Automation of structure elucidation from mass spectrometry-mass spectrometry data," Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
28. H. Egli, D. H. Smith, and C. Djerassi, "Computer assisted structural interpretation of proton NMR spectral data," Helv. Chim. Acta 65, 1898-1919 (1982).
29. T. M. Mitchell and G. M. Schwenzer, "Applications of artificial intelligence for chemical inference. XXV. A computer program for automated empirical 13C NMR rule formation," Org. Magn. Reson. 11(8), 378-384 (1978).
30. M. R. Lindley, N. A. B. Gray, D. H. Smith, and C. Djerassi, "Applications of AI for chemical inference. 40. Computerized approach to verification of 13C NMR spectral assignments," J. Org. Chem. 47, 1027-1035 (1982).
31. S. P. Ennis, Expert Systems, A User's Perspective of Some Current Tools, Proceedings of the Second National Conference on AI, Pittsburgh, PA, pp. 319-321, 1982.
32. R. E. Dessy (ed.), "Expert systems part II," Anal. Chem. 56(12), 1312A-1332A (1984).
33. R. E. Carhart, D. H. Smith, N. A. B. Gray, J. G. Nourse, and C. Djerassi, "GENOA: A computer program for structure elucidation utilizing overlapping and alternative substructures," J. Org. Chem. 46, 1708-1718 (1981).
34. J. G. Nourse, R. E. Carhart, D. H. Smith, and C. Djerassi, "Exhaustive generation of stereoisomers for structure elucidation," J. Am. Chem. Soc. 101, 1216-1228 (1979).
35. J. G. Nourse, "The configuration symmetry group and its applica-
tion to stereoisomer generation, specification, and enumeration," J. Am. Chem. Soc. 101(5), 1210 (1979).
36. J. G. Nourse, D. H. Smith, R. E. Carhart, and C. Djerassi, "Computer assisted elucidation of molecular structure with stereochemistry," J. Am. Chem. Soc. 102, 6289-6295 (1980).
37. GENOA, Molecular Design Ltd., Hayward, CA, 1982.
38. I. Fujiwara, T. Okuyama, T. Yamasaki, H. Abe, and S. Sasaki, "Computer-aided structure elucidation of organic compounds with the CHEMICS system," Anal. Chim. Acta 133, 527-533 (1981).
39. S. Sasaki, H. Abe, I. Fujiwara, and T. Yamasaki, "The application of 13C NMR in CHEMICS, the computer program system for structure elucidation," Stud. Theor. Chem. 16, 186-204 (1981).
40. S. Sasaki et al., "CHEMICS-F: A computer program system for structure elucidation of organic compounds," J. Chem. Inf. and Comp. Sci. 18(4), 211 (1978).
41. S. Sasaki, H. Abe, I. Fujiwara, T. Yamasaki, Z. Hippe, B. Debska, J. Duliban, and B. Guzowska-Swider, "Recent problems of application of artificial intelligence in computer-aided elucidation of chemical structures," Chem. Anal. (Warsaw) 27(3-4), 171-181 (1982).
42. H. Abe, T. Yamasaki, I. Fujiwara, and S. Sasaki, "Computer aided structure elucidation methods," Anal. Chim. Acta 133, 499-506 (1981).
43. M. E. Munk, C. A. Shelley, H. B. Woodruff, and M. O. Trulson, "Computer assisted structure elucidation," F. Z. Anal. Chem. 313, 473-479 (1982).
44. C. A. Shelley and M. E. Munk, "CASE, a computer model of the structure elucidation process," Anal. Chim. Acta 133, 507-516 (1981).
45. M. O. Trulson and M. E. Munk, "Table driven procedure for IR spectrum interpretation," Anal. Chem. 56, 2137-2142 (1983).
46. A. H. Lipkus and M. E. Munk, "Combinatorial problems in computer assisted structural interpretation of C-13 NMR spectra," J. Chem. Inf. Comput. Sci. 25, 34-45 (1985).
47. B. Debska, J. Duliban, B. Guzowska-Swider, and Z.
Hippe, "Computer aided structural analysis of organic compounds by an AI system," Anal. Chim. Acta 133, 303-318 (1981).
48. L. A. Gribov, M. E. Elyashberg, and V. V. Serov, "Computer system for structure recognition of polyatomic molecules by IR, NMR, UV, and MS methods," Anal. Chim. Acta 95, 75-96 (1977).
49. B. Curry and J. A. Michnowicz, An Expert System for Organic Structure Determination, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
50. G. M. Smith and H. B. Woodruff, "Development of a computer language and compiler for expressing the rules of IR spectral interpretation," J. Chem. Inf. Comput. Sci. 24, 33-39 (1984).
51. S. A. Tomellini, J. M. Stevenson, and H. B. Woodruff, "Rules for computerized interpretation of vapor phase IR spectra," Anal. Chem. 56, 67-70 (1984).
52. H. B. Woodruff and G. M. Smith, "Generating rules for PAIRS-A computerized IR spectral interpreter," Anal. Chim. Acta 133, 545-553 (1981).
53. H. B. Woodruff and G. M. Smith, "Computer program for the analysis of IR spectra," Anal. Chem. 52, 2321-2327 (1980).
54. H. B. Woodruff et al., "Automated interpretation of IR spectra with an instrument based minicomputer," Anal. Chem. 53, 2367-2369 (1981).
55. H. B. Woodruff and G. M. Smith, "Program for the analysis of IR spectra (PAIRS) (QCPE 426)," QCPE Bull. 1, 58 (1981).
56. S. A. Liebman, P. J. Duff, M. A. Schroeder, R. A. Fifer, and A. M. Harper, Concerted Organic Analysis of Materials and Expert Sys-
tem Development, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, 1985.
57. C. Wong and S. Lanning, "AI in chemical analysis," Energ. Technol. Rev., Lawrence Livermore National Laboratory, Livermore, CA, February 1984.
58. J. Karnicky, R. Bach, and S. Abbott, An Expert System for High Performance Liquid Chromatography Methods Development, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, Washington, DC, 1985.
59. P. R. Martz, M. Heffron, and O. M. Griffith, An Expert System for Optimizing Ultracentrifugation Runs, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, 1985.
60. R. L. Moore, C. G. Knickerbocker, and L. B. Hawkinson, A Real-Time Expert System for Process Control, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, 1985.
61. E. A. Scarl, J. R. Jamieson, and C. I. Delaune, "Process monitoring and fault location at the Kennedy Space Center," SIGART Newslett. 93, 38 (1985).
62. Quantum Chemistry Program Exchange (QCPE), Dept. of Chemistry, Indiana University, Bloomington, IN 47405, 812-335-4784.
63. S. A. Borman, "Scientific software," Anal. Chem. 57(9), 983A (1985).
64. J. F. Ogilvie, "Applications of computer algebra in physical chemistry," Comput. Chem. 6, 169-172 (1982).
65. S. F. Boys, B. G. Cook, C. M. Reeves, and I. Shavitt, Nature 178, 1207-1209 (1956).
66. C. S. Johnson, "Computer algebra in chemistry," J. Chem. Inf. Comput. Sci. 23, 151-157 (1983).
67. R. Pavelle (ed.), Application of Computer Algebra, Kluwer, Boston, MA, 1985.
68. MACSYMA Reference Manual, MIT Mathlab Group, Cambridge, MA, 1977.
69. MACSYMA Primer, MIT Mathlab Group, Cambridge, MA, 1982.
70. A. C. Hearn (ed.), REDUCE User's Manual, Version 3.0, Rand Publication CP78(4/83), The Rand Corp., Santa Monica, CA, 1983.
71. K. O. Geddes, G. H. Gonnet, and B. W. Char, MAPLE User's Manual, 2nd ed., University of Waterloo, Waterloo, Ontario, Canada.
72. C. A. Cole, S.
Wolfram, et al., SMP Handbook, Caltech, Pasadena, CA, 1981.
73. G. Williams, "Mu Math-79 Symbolic Math System," BYTE 11, 325-338 (1980); and D. Stoutemyer, "A Preview of the Next IBM-PC Version of Mu Math," in G. Goos and J. Hartmanis (eds.), Eurocal '85, Springer-Verlag, New York, 1985.
74. R. Cornelius, D. Cabrol, and C. Cachet, Applying the Techniques of Artificial Intelligence to Chemical Education, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
75. B. Hohne and R. Houghton, An Expert System for the Formulation of Agricultural Chemicals, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, 1985.
76. J. C. Bellows, "An artificial intelligence chemistry diagnostic system," Proc. 45th Int. Water Conf., Eng. Soc. Western PA, pp. 15-25, 1984.
77. J. C. Bellows, Chemistry Diagnostic System for Steam Power Plants, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, 1985.
78. D. Garfinkel, L. Garfinkel, V. W. Soo, and C. A. Kulikowski, Interpretation and Design of Chemically Based Experiments with Expert Systems, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, 1985.
79. H. L. Keith and J. D. Stuart, A Rule Induction Program for Quality Assurance-Quality Control and Selection of Protective Materials, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
80. P. Friedland, MOLGEN-Applications of Symbolic Computation and Artificial Intelligence to Molecular Biology, in M. Keenberg (ed.), Proceedings of the Battelle Conference on Genetic Engineering, Vol. 5, Battelle Seminar Studies Program, Seattle, WA, pp. 171-182, 1981.
81. F. E. Cohen, R. M. Abarbanel, I. D. Kuntz, and R. J. Fletterick, "Secondary structure assignment for alpha/beta proteins by a combinatorial approach," Biochem. 22, 4894 (1983).
82. I. D. Kuntz, private communication, Univ. of Cal., San Francisco, 1985.
83. W. Wipke and M. A. Hahn, Analogy and Intelligence in Model Building, Artificial Intelligence Applications in Chemistry, ACS Symposium Series 306, American Chemical Society, Washington, DC, 1985.
84. R. Banares-Alcantara, A. W. Westerberg, and M. D. Rychener, "Development of an expert system for physical property predictions," Comput. Chem. Eng. 9(2), 127-142 (1985).
85. K. Brooks, Chem. Week, 38-39 (Sept. 10, 1986).

B. Hohne and T. Pierce
Rohm and Haas Co.
CHESS 4.5

Chess 4.5 is a chess program (see Computer chess methods) that uses a method called "iterative deepening" to determine its next move. It is a brute-force method that does exhaustive search, first to the second level and then redone to the third level. It continues on with this iteration until a fixed time limit is reached. The newer version is known as Chess 4.7 (see D. Slate and L. Atkin, Chess 4.5: The Northwestern University Chess Program, in P. Frey (ed.), Chess Skill in Man and Machine, Springer-Verlag, New York, pp. 82-118, 1977).

J. Rosenberg
SUNY at Buffalo
CHURCH'S THESIS

Church's thesis is the assertion that any process that is effective or algorithmic in nature defines a mathematical function belonging to a specific well-defined class, known variously as the recursive, the λ-definable, or the Turing computable functions. These terms originated in the 1930s to designate what appeared superficially to be three quite different notions: Gödel's characterization of functions definable by means of recursive definitions of the most general kind (see Recursion), Church and Kleene's notion of functions definable using the λ-operation (subsequently incorporated by John McCarthy into the LISP programming language (see LISP)), and Turing's notion of a function computable by an abstract computing device (see Turing machine). However, it was very soon seen that the three notions define the very same class of functions. Church announced his proposal to identify the class of functions definable by means of an effective process with the class of recursive functions (1) in April 1935 at a professional meeting, a
year after he had first suggested it to his student Kleene. Quite independently, Turing developed his own equivalent version during the spring of 1935. Gödel, who had been skeptical of Church's arguments in favor of his thesis, was fully convinced by Turing's work.

Gödel had made use of a more restricted class of functions, later called primitive recursive, in his famous work on undecidability. The fact that there were functions like Ackermann's that were clearly definable by recursive means but were not primitive recursive led Gödel to attempt to characterize recursive definitions in general. Using a suggestion of Jacques Herbrand, Gödel was led to his class of general recursive functions. Gödel went so far as to suggest, in lectures at the Institute for Advanced Study in Princeton in 1934, that if this definition really included all possible recursive definitions, then all functions computable by "finite procedures" would be general recursive, but he was not yet prepared to assert that his definition was really so inclusive. Meanwhile, Church and his students developed the concept of λ-definability as part of an effort to salvage a theory of the λ-operator from an ambitious system of logic developed by Church that had been proved inconsistent by his students Kleene and Rosser. Turing developed his machines in connection with his work on Hilbert's Entscheidungsproblem, the problem of finding an algorithm for testing inferences in first-order logic (qv) for validity: Turing was able to show that no such algorithm could exist, a conclusion that Church also reached. Post, who had developed some of these ideas many years earlier, now proposed a formulation very similar to Turing's. Post's work was independent of Turing, but not of Church.
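Ackermann's function, mentioned above, is easy to state as a program even though it is not primitive recursive; the ease with which it is written down is itself a small illustration of why an adequate notion of "effectively computable" must go beyond primitive recursion. The two-argument form below is the standard later variant, not Ackermann's original three-argument function.

```python
def ackermann(m, n):
    """Total and effectively computable, but not primitive recursive."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

# Growth explodes in the first argument; keep m small.
print([ackermann(m, 2) for m in range(4)])  # [3, 4, 7, 29]
```

The function outgrows every primitive recursive function, which is precisely why it cannot be one of them.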
When these various concepts were proved equivalent to one another, it was clear that something of great importance had been discovered. (For a discussion and analysis of this history as well as the importance of the work of Kleene and of Post, see Ref. 2, which also contains references to the original literature and to other historical accounts.)

Church's Thesis and AI

Church's thesis has made it possible to prove the algorithmic unsolvability of important problems in mathematics, provided an important basic tool in mathematical logic, made available an array of models in theoretical computer science, and provided the basis for an entirely new branch of mathematics. However, quite apart from all this, Church's thesis provides a crucial philosophical foundation for the proposition that AI is possible and that digital computers provide an appropriate instrument for realizing it. Before World War II large-scale computing machines were conceived and built as engines of numerical calculation. After the pioneering work of Church, Gödel, Kleene, Post, and especially Turing, it became clear that the notion of computation includes far more than numerical calculation; indeed, it encompasses everything expressible as an effective process or (as Turing once put it) by a "rule of thumb." This insight is physically embodied in the von Neumann architecture for computers. Thus, the project of producing computer programs that successfully emulate human cognitive functions (see Cognition), which seems evidently preposterous so long as a computing machine is conceived of as merely a device for carrying out numerical calculations, comes into focus as an ultimate goal. Of course, this is precisely the goal of AI research. These same consider-
ations lead to the proposal that computer programs provide an appropriate theoretical model for cognitive functions, which is the principal paradigm held forth by workers in cognitive science (3) (qv).
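The point that computation is general symbol manipulation rather than arithmetic can itself be made in a few lines of code: below is a complete (if minimal) Turing-machine interpreter together with a machine that appends a 1 to a unary numeral. The transition-table format is our own invention for this sketch.

```python
def run_tm(rules, tape, state="q0", accept="halt", blank="_", steps=1000):
    """Run a one-tape Turing machine; rules map (state, symbol) to
    (new_state, symbol_to_write, head_move 'L' or 'R')."""
    tape = dict(enumerate(tape))       # sparse tape: position -> symbol
    head = 0
    for _ in range(steps):
        if state == accept:
            break
        symbol = tape.get(head, blank)
        state, write, move = rules[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return "".join(tape[i] for i in sorted(tape)).strip(blank)

# Unary incrementer: move right over the 1s, write a 1 on the first blank.
INCREMENT = {
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", "_"): ("halt", "1", "R"),
}

print(run_tm(INCREMENT, "111"))  # 1111
```

Nothing in the interpreter knows about numbers; the "computation" is purely mechanical rewriting of symbols, which is exactly Turing's notion of an effective process.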
and specificallyin nonmonotonicreasoning(qv). McCarthy describes circumscription as a "rule of conjecture" as to what objectshave a given property P. A useful example exploited by McCarthy is the familiar "missionaries and cannibals" puzzle: Three missionaries and three cannibals must crossa river using a boat that can hold only two persons; if the cannibals Church'sThesisand Mechanism outnumber the missionaries on either bank of the river, the The belief that AI is possible,in principle, is closelyassoci- missionaries will be eaten. How can the crossing be arranged ated with a mechanist view of the human mind. Such a view safely? Now, there are numerous features of interest in the holds that the properties of "mind" are ultimately to be under- puzzle. The one of concern here is that it is in fact a puzzle, stood on the basis of the behavior of the brain (and other rele- that is, the puzzler is expected to recognize certain implicit vant organs) as a material object obeying the laws of nature. ground rules, such as that the boat doesnot have a leak or any This is opposedto a mentalist position that mental states are other incapacity for transporting people. Moreover, there are irreducible and extramaterial. Church's thesis is related to no additional cannibals or missionaries lurking in the backthese matters in several ways. Turing's version of Church's ground, who may upset otherwise sound plans, even though it thesis (called Turing's thesis in Ref. 3) identifies effectiveness was not specifically stated that there are only three cannibals with mechanical computability. Thus, Church's thesis implies and three missionaries.It is as if there is an implicit assump(as emphasizedin Ref. 4) that mechanism is incapable of being tion that if something is not mentioned in the puzzle,then it is refuted by effective means. 
That is, a mentalist who wishes to claim that some particular human mental activity is incapable of being duplicated on a purely mechanical basis had best be very sure that this activity is not effective. On the other hand, evidence of the extensiveness of mechanical computability, for example, Turing's construction of a "universal" machine as well as the equivalence of the various precise explications of effectiveness, tends to refute a mentalist critique based on the alleged limitations of the purely mechanical. Even the main negative consequence of Church's thesis, the existence of algorithmically unsolvable problems, serves to help refute mentalism. The mentalist has traditionally ridiculed the claims of mechanists by contrasting the varied, unpredictable, and complex behavior of human beings with the rigid and simple behavior of clockwork automata. However, the fact that there are problems concerning Turing machines for which no effective solutions can exist shows that computing mechanisms as well as people can exhibit a behavioral repertoire of great complexity and unpredictability.

BIBLIOGRAPHY

1. A. Church, "An unsolvable problem of elementary number theory," Am. J. Math. 58, 345 (1936).
2. M. Davis, "Why Gödel didn't have Church's thesis," Inf. Contrl. 54, 3-24 (1982).
3. Z. W. Pylyshyn, Computation and Cognition: Toward a Foundation for Cognitive Science, MIT Press, Cambridge, MA, 1984.
4. J. C. Webb, Mechanism, Mentalism, and Metamathematics, D. Reidel, Dordrecht, The Netherlands, 1980.

General References

A. Church, The Calculi of Lambda-Conversion, Princeton University Press, Princeton, NJ, 1941.
M. Davis, The Undecidable, Raven Press, New York, 1965.

M. Davis
New York University

CIRCUMSCRIPTION

Circumscription is a technique devised by McCarthy (1) for formalizing certain notions in commonsense reasoning (qv). A typical convention in commonsense reasoning is that objects not required to be considered need not be considered, an idea sometimes referred to as a closed-world assumption. It corresponds to minimizing the number of objects having certain properties. In effect, one is considering conjectures that for certain properties P, an object x does not have P unless it is required to do so. Moreover, this sort of minimizing assumption appears to be very useful even in nonpuzzle situations. Circumscription provides one way to make this rather vague idea precise.

The Formalism of Circumscription

Circumscription involves the use of an axiom schema in a first-order language (see Logic), intended to express the idea that certain formulas (wffs) have the smallest possible extensions consistent with certain given axioms. To illustrate, if B is a belief system (qv) including world knowledge W and specific domain knowledge (qv) A[P] concerning a predicate P, then it may be desired to consider that P is to be minimized, in the sense that as few entities x as possible have property P as is consistent with A[P]. The world knowledge W together with A[P] and the circumscriptive schema are used to derive conclusions in standard first-order logic, which then may be added to B (hopefully consistently and appropriately). It is this notion of consistency with a part of the belief system itself that causes conceptual as well as computational problems in nonmonotonic reasoning, essentially problems of self-reference. McCarthy has found a very ingenious way of finessing such self-reference in the context of minimization, allowing a mechanical means of establishing the effect of consistency tests in certain cases.

As suggested above, given a predicate symbol P and a formula A[P] containing P, the minimization of P by A[P] can be thought of as saying that the P objects consist of certain ones as needed to satisfy A[P] and no more, in the sense that any tentative set of P objects (such as one given by a wff Z(x) such that A[Z] holds) already includes all P objects. Circumscription expresses this by means of a schema or set of wffs, which are denoted here by A[P]↓P, as follows:

A[P]↓P = { [A[Z] ∧ (x)(Z(x) → P(x))] → (y)(P(y) → Z(y)) | Z is a wff }

(Here A[Z] results from A[P] by replacing every occurrence of P by Z.)
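The effect of a single schema instance can be checked semantically on a toy example; the one-predicate theory and all names below are assumptions for illustration, not from the article. Taking A[P] to be simply P(a) and Z(x) to be x = a, only the smallest extension of P survives both A[P] and the schema instance:

```python
from itertools import combinations

# Toy semantic check of one instance of the circumscription schema:
# if A[Z] holds and the extension of Z is contained in that of P,
# the schema licenses concluding that P is contained in Z.
DOMAIN = ["a", "b", "c"]

def A(ext):                    # A[P] is simply P(a)
    return "a" in ext

Z = frozenset({"a"})           # candidate wff Z(x): x = a; note A[Z] holds

def schema_instance(P):
    # [A[Z] & (x)(Z(x) -> P(x))] -> (y)(P(y) -> Z(y)), for this Z
    if A(Z) and Z <= P:
        return P <= Z
    return True                # antecedent fails: instance holds vacuously

subsets = [frozenset(c) for r in range(len(DOMAIN) + 1)
           for c in combinations(DOMAIN, r)]
good = [set(P) for P in subsets if A(P) and schema_instance(P)]
print(good)                    # [{'a'}]: P has been squeezed down to {a}
```

Among the extensions of P satisfying A[P] (namely {a}, {a, b}, {a, c}, and {a, b, c}), only {a} also satisfies the schema instance, which is exactly the minimization the schema is meant to express.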
A key example, a variation on one emphasized by McCarthy (1), is the following: let A[P] be a ≠ b ∧ (P(a) ∨ P(b)). Let Z₁(x) be x = a and Z₂(x) be x = b. Then from P(a) ∨ P(b) one gets that either Z₁ or Z₂ will serve for circumscription. That is, either P(a) holds, so that A[Z₁] is true and (x)(Z₁(x) → P(x)), and hence circumscription using Z₁ for P yields (x)(P(x) → Z₁(x)); or P(b) holds, so that A[Z₂] is true and (x)(Z₂(x) → P(x)), and hence, using Z₂ for P, (x)(P(x) → Z₂(x)). Thus, either a is the only P object or b is; indeed, ¬P(a) ∨ ¬P(b) will then be provable from A[P] + A[P]↓P. In fact, it follows that there is a unique P object; this, however, should not cause concern, for the intention is to explore the consequences of conjecturing the stated minimization of P.

A Generalization. McCarthy (2) generalizes his original notion of (predicate) circumscription to allow specified predicates other than P to vary as well as P; this decisively extends the range of applicability of circumscription. In the new formulation, called formula circumscription, the schema is replaced by a single second-order formula, but comparison with predicate circumscription is easier when a schema or set A[P₁, . . . , Pₙ]↓E is retained, in the following form:

A[P₁, . . . , Pₙ]↓E = { [A[Z₁, . . . , Zₙ] ∧ (x)(E[Z₁, . . . , Zₙ] → E)] → (x)(E → E[Z₁, . . . , Zₙ]) | Z₁, . . . , Zₙ are wffs }

where E = E(P₁, . . . , Pₙ) is a formula in which P₁, . . . , Pₙ may appear, and E[Z₁, . . . , Zₙ] is obtained from E by substituting Zᵢ for each Pᵢ. Here the intuitive idea is to minimize (the extension of) the formula E by allowing variations in (the extensions of) P₁, . . . , Pₙ. The new second-order version of circumscription is called formula circumscription; the weakened version retaining a schema but allowing variable predicates is called variable circumscription.

As McCarthy has observed, it is the presence of the predicate variables P₁, . . . , Pₙ that gives variable circumscription its power, and not the fact that E may be a formula. Indeed, forming an extension by definitions of A[P], by adding a new axiom (x)(P₀x ↔ Qx), allows the new predicate symbol Q to replace P in the second-order circumscription axiom. (This trick will not work, of course, in Davis's example (5), since second-order arithmetic is as prey to undecidability problems as is first-order arithmetic.)

To illustrate variable circumscription, consider A[D, L] given by (x)(Dx ↔ ¬Lx) ∧ La ∧ Db ∧ Kc ∧ (a ≠ b ∧ a ≠ c ∧ b ≠ c), which is intended to have the interpretation that dead things (D) are those that are not living (L), and a is living, b is dead, and c is a kangaroo (K). The circumscription of D then corresponds to the notion that as few things as possible are to be considered dead. However, using mere predicate circumscription, that is, A[D]* rather than A[D, L]*, D could not be "squeezed" down by means of an appropriate Z predicate since L, being unchanged, would force D to be its unchanging complement. Thus, A[D]* would not have either ¬Dc or Lc as theorems. On the other hand, A[D, L]* does have ¬Dc, and hence Lc, as theorems. This can be seen by circumscribing with the two predicates x = b (for Z₀) and x ≠ b (for Z₁): A[Z₀, Z₁] is just (x)[Z₀x ↔ ¬Z₁x] ∧ Z₁a ∧ Z₀b ∧ Kc ∧ (a ≠ b ∧ a ≠ c ∧ b ≠ c), which is true, and also (x)(Z₀x → Dx), so that by the schema one has (x)(Dx → Z₀x). In particular, ¬Z₀c → ¬Dc, and so on.

Of course, formula circumscription can accomplish all that variable circumscription does, and even more, as is shown below. Etherington, Mercer, and Reiter (3) establish several theorems characterizing the above kind of limitation of predicate circumscription, thereby bolstering the significance of variable and formula circumscription.

The Theoretical Basis for Circumscription

Minimal Models. Aside from giving examples, it is desirable to show in precise terms in what sense the circumscriptive schema A[P]↓P does in fact minimize. For this purpose, McCarthy (1) proposed the concept of minimal model in the context of predicate circumscription. Etherington (4) has redefined minimal model in a manner appropriate to McCarthy's new (formula) version of circumscription, which is presented here in slightly modified form. Let M and N be models of A[P] = A[P₀, P₁, . . . , Pₙ] with the same domains and the same interpretations of all constant, function, and predicate symbols except possibly P₀, P₁, . . . , Pₙ. Here M is a proper P-submodel of N if the extension of P₀ in M is a proper subset of the extension of P₀ in N.

Positive Completeness Results. Nevertheless, certain partial converses do hold, which have rather broad application. First some terminology based on Doyle (8). A[P] is disjunctively P defining if it has theorems of the form

(x)(Pᵢx → Wᵢ₁x) ∨ · · · ∨ (x)(Pᵢx → Wᵢₙᵢx)

for each i = 0, . . . , n, where the Wᵢⱼ's do not involve P₀, . . . , Pₙ. Perlis and Minker (9) exploit this concept in the following partial completeness result: If A[P]* is disjunctively P defining, A[P]* ⊢ B whenever A[P]↓P ⊨ B. They also show that if A[P] has only finite models, A[P]* ⊢ B whenever A[P]↓P ⊨ B.
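The dead/living/kangaroo contrast above can be verified by brute-force model enumeration; the following sketch is an illustration written for this discussion, not code from the article. It compares minimization of D with L held fixed against minimization with L allowed to vary:

```python
from itertools import product

# Brute-force model enumeration (domain {a, b, c}) of A[D, L]:
# D is the complement of L, a is living, b is dead, c is a kangaroo.
DOMAIN = ["a", "b", "c"]

def A(D, L):
    return (all((x in D) != (x in L) for x in DOMAIN)   # (x)(Dx <-> ~Lx)
            and "a" in L and "b" in D)                  # La and Db

ms = []
for bits in product([False, True], repeat=len(DOMAIN)):
    L = frozenset(x for x, b in zip(DOMAIN, bits) if b)
    D = frozenset(DOMAIN) - L
    if A(D, L):
        ms.append((D, L))

# Predicate circumscription: only models agreeing on L may compete,
# so neither model is beaten and ~Dc is not obtained.
pred_minimal = [(D, L) for D, L in ms
                if not any(D2 < D and L2 == L for D2, L2 in ms)]

# Variable circumscription: L varies while D is minimized, leaving only
# the model in which the kangaroo c is living.
var_minimal = [(D, L) for D, L in ms if not any(D2 < D for D2, _ in ms)]

print(len(pred_minimal), sorted(var_minimal[0][1]))   # 2 ['a', 'c']
```

Both models of A[D, L] are minimal under predicate circumscription, but only the model with D = {b} and L = {a, c} survives when L is allowed to vary, matching the theorems ¬Dc and Lc claimed in the text.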
Efficiency

As with many commonsense reasoning techniques, circumscription naturally presents itself as a candidate for a reasoning mechanism that could in principle be used in an intelligent robot (see Robotics), for instance, in conjunction with a theorem prover (see Theorem proving). However, the fact that a schema or infinite set of axioms is involved presents practical difficulties, especially in the necessary choice of which instance(s) of the schema to use. That is, efficiency questions arise.

In this regard, Lifschitz (10) has shown the significance of a subclass of theories apposite to circumscription: the separable theories. Separable theories A[P] are those that are formed, using conjunctions and disjunctions, from formulas containing no positive occurrences of P₀ and formulas of the form (x)(E(x) → P₀(x)), where E is a predicate that does not contain P₀. (These appear related to the disjunctively P defining theories and may afford fruitful terrain for further investigation.) Such theories turn out to afford expression by means of a single first-order wff replacing the second-order circumscription axiom.

Applications and Related Work

McCarthy (2) gives applications of circumscription to various problems in commonsense reasoning. Paramount among these is his use of a predicate ab for abnormal aspects of entities. He shows how to represent reasoning to the effect that, for example, typically birds can fly. The idea is to minimize (as a conjectural assumption) the objects that are abnormal with respect to any given aspect, for instance, birds that are abnormal with respect to flying (such as penguins or ostriches). This allows the expression of default reasoning to be given a uniform treatment, in which the predicate ab is circumscribed provided that other predicates as desired may be considered variable. For instance, letting ab(B, F, x) stand for "x is an abnormal bird with respect to flying," then from the following axioms

Bird(x) ∧ ¬ab(B, F, x) → Flies(x)
Ostrich(x) → ab(B, F, x)
Ostrich(x) → Bird(x)
Bird(Tweety)

one can prove by formula circumscription that Tweety can fly (and consequently that Tweety is not an ostrich). Here it is sufficient to use the null predicate (e.g., x ≠ x) for both ab(B, F, x) and Ostrich(x), and x = Tweety for both Bird(x) and Flies(x).

Grosof (11) presents a translation scheme from Reiter's (12) default logic into circumscription in an effort to unify and clarify these two approaches to nonmonotonic inference. Reiter (13) shows that for certain special cases, circumscription achieves the effect of another formalism known as predicate completion (14). Papalaskaris and Bundy (15) have applied circumscriptive reasoning to issues in natural-language processing (see Natural-language understanding). In particular, they examine contextual cues that provide guidelines for appropriate predicates to circumscribe in formulating answers to questions.
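The Tweety derivation can likewise be checked by enumerating interpretations over the single individual Tweety; this simplification, and the dropping of the aspect arguments of ab, are assumptions made for illustration. ab is minimized while Ostrich and Flies are allowed to vary:

```python
from itertools import product

# Enumeration check of the Tweety default-reasoning example over the
# single individual Tweety; ab is minimized, Ostrich and Flies vary.
def satisfies(ostrich, ab, flies, bird=True):
    return ((not (bird and not ab) or flies)   # Bird & ~ab -> Flies
            and (not ostrich or ab)            # Ostrich -> ab
            and (not ostrich or bird)          # Ostrich -> Bird
            and bird)                          # Bird(Tweety)

models = [(o, a, f) for o, a, f in product([False, True], repeat=3)
          if satisfies(o, a, f)]

# ab-minimal models: no model has a strictly smaller ab "extension"
minimal = [m for m in models if not any(m2[1] < m[1] for m2 in models)]

print(minimal)   # [(False, False, True)]: Tweety flies, is not an ostrich
```

The unique ab-minimal model makes ab and Ostrich empty and Flies true of Tweety, which is exactly the effect of using the null predicate for ab and Ostrich described above.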
BIBLIOGRAPHY

1. J. McCarthy, "Circumscription - a form of non-monotonic reasoning," Artif. Intell. 13, 27-39 (1980).
2. J. McCarthy, "Applications of circumscription to formalizing common sense knowledge," Workshop on Nonmonotonic Reasoning, New Paltz, NY, sponsored by AAAI, October 17-19, 1984.
3. D. Etherington, R. Mercer, and R. Reiter, "On the adequacy of predicate circumscription for closed-world reasoning," Comput. Intell. 1, 11-15 (1985).
4. D. Etherington, personal communication, Comp. Sci. Dept., Univ. of British Columbia, Canada, 1984.
5. M. Davis, "The mathematics of non-monotonic reasoning," Artif. Intell. 13, 73-80 (1980).
6. J. Minker and D. Perlis, "Protected circumscription," Workshop on Nonmonotonic Reasoning, New Paltz, NY, October 1984.
7. D. Kueker, "Another failure of completeness for circumscription," Week on Logic and Artificial Intelligence, University of Maryland, October 22-26, 1984.
8. J. Doyle, "Circumscription and implicit definability," Workshop on Nonmonotonic Reasoning, New Paltz, NY, October 1984.
9. D. Perlis and J. Minker, "Completeness results for circumscription," Artif. Intell. 28, 29-42 (1986).
10. V. Lifschitz, "Some results on circumscription," Workshop on Nonmonotonic Reasoning, New Paltz, NY, October 1984.
11. B. Grosof, "Default reasoning on circumscription," Workshop on Nonmonotonic Reasoning, New Paltz, NY, October 1984.
12. R. Reiter, "A logic for default reasoning," Artif. Intell. 13, 81-132 (1980).
13. R. Reiter, "Circumscription implies predicate completion (sometimes)," Proc. Natl. Conf. on Artif. Intell., Pittsburgh, PA, 1982.
14. K. Clark, "Negation as failure," in H. Gallaire and J. Minker (eds.), Logic and Data Bases, Plenum, New York, 1978.
15. M. A. Papalaskaris and A. Bundy, "Topics for circumscription," Workshop on Nonmonotonic Reasoning, New Paltz, NY, October 1984.

D. Perlis
University of Maryland

CLUSTERING

Clustering is usually viewed as a process of grouping physical or abstract objects into classes of similar objects. According to this view, in order to cluster objects, one needs to define a measure of similarity between the objects and then apply it to determine classes. Classes are defined as collections of objects whose intraclass similarity is high and interclass similarity is low. Because the notion of similarity between objects is fundamental to this view, clustering methods based on it can be called similarity-based methods. Many such methods have been developed in numerical taxonomy, a field developed by social and natural scientists, and in cluster analysis, a subfield of pattern recognition (qv). Various similarity measures and clustering algorithms utilizing them are presented below (see also Concept learning; Region growing).

Another view recently developed in AI postulates that objects should be grouped together not just because they are similar according to a given measure but because as a group they represent a certain conceptual class. This view, called conceptual clustering, states that clustering depends on the goals of classification and the concepts available to the clustering system for characterizing collections of entities. For example, if the goal is to partition a configuration of points into simple visual groupings, one may partition them into those that form a T-shape, an L-shape, and so on, even though the density distributions and distances between the points may suggest different groupings. A procedure that uses only similarities (or distances) between the points and is unaware of these simple shape types clearly can only accidentally create clusterings corresponding to these concepts. To create such clusterings, these descriptive concepts must be known to the system. Another example of conceptual clustering is the grouping of visible stars into named constellations. Conceptual clustering is contrasted with the classical view in the next section and described in more detail in the section Conceptual Clustering.

Clustering is the basis for building hierarchical classification schemes. For example, by first partitioning the original set of entities and then repeatedly applying a clustering algorithm to the classes generated at the previous step, one can obtain a hierarchical classification of the entities (a divisive strategy). A classification schema is obtained by determining the general characteristics of the classes generated. Building classification schemes and using them to classify objects is a widely practiced intellectual process in science as well as in ordinary life. Understanding this process, and the mechanisms of clustering underlying it, is therefore an important domain of research in AI and other areas. This process can be viewed as a cousin of the "divide and conquer" strategy widely used in problem solving (qv). It is also related to the task of decomposing any large-scale engineering system into smaller subsystems in order to simplify its design and implementation.

The Classical View versus the Conceptual Clustering View

In the classical approach to clustering mentioned above, clusters are determined solely on the basis of a predefined measure of similarity. To define such a measure, a data analyst determines attributes that are perceived as relevant for characterizing objects under consideration. Vectors of values of these attributes for individual objects serve as descriptions of these objects. Considering attributes as dimensions of a multidimensional description space, each object description corresponds to a point in the space. The similarity between objects can thus be measured as a reciprocal function of the distance between the points in the description space.

Let VA and VB denote the attribute vectors representing objects A and B, respectively. The distance of object A to object B is defined as a numerical function of the attribute vectors of A and B and is written as d(VA, VB). For example, assuming that the vector descriptions of objects A and B are VA = (x₁(A), x₂(A), . . . , xₙ(A)) and VB = (x₁(B), x₂(B), . . . , xₙ(B)), respectively, where x₁, x₂, . . . , xₙ are selected object attributes, a simple measure of distance is:

d(VA, VB) = Σᵢ₌₁ⁿ |xᵢ(A) − xᵢ(B)|
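For concreteness, this distance (the city-block form) can be written out directly; the sample attribute vectors below are made up for illustration:

```python
# City-block (sum of absolute differences) distance between two
# attribute vectors, as in the formula above.
def d(VA, VB):
    return sum(abs(xa - xb) for xa, xb in zip(VA, VB))

VA = (3, 0, 5)
VB = (1, 2, 5)
print(d(VA, VB))   # 4 = |3-1| + |0-2| + |5-5|
```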
Because distance is a function of only the attributes of the two compared objects, similarity-based clustering can be performed relatively easily and without a need for knowledge about its purpose. The similarity-based approach has produced a number of efficient clustering algorithms, which have been useful in many classification-building applications. The classical approach suffers, however, from some significant limitations. The results of clustering are clusters plus information about numerical similarities between objects and object classes. No descriptions or explanations of the generated clusters are supplied. The problem of cluster interpretation is simply left to the data analyst. Data analysts, however, are typically interested not only in clusters but also in their explanation or characterization. To overcome this, one may follow the similarity-based clustering process with an intelligent interpretation step that tries to learn the conceptual significance of each cluster through the use of AI techniques. Such a process, however, is not easy. In fact, it may be even more difficult than generating the clusters themselves. This is because it requires inducing category descriptions from examples, which is a complex inferential task. Even if one ignores this difficulty, this process may not produce the desired results. Clusters generated solely on the basis of some predefined numerical measure of similarity may in principle lack simple conceptual explanations. One reason for this is that a similarity measure typically considers all attributes with equal importance and thus makes no distinction between those that are more relevant and those that are less relevant or irrelevant. Consequently, if there is coincidental agreement between the values of a sufficient number of irrelevant attributes, objects that are different in a conceptual sense may be classified as similar.
Even if one assigns some a priori "weights" to attributes, this will not change the situation very much, because the classical approach has no mechanisms for selecting and evaluating attributes in the process of generating clusters. Neither is there any mechanism for automatically constructing new attributes that may be more adequate for clustering than those initially provided.

Another reason for the difficulty of the postclustering interpretation is that in order to generate clusters that correspond to simple concepts, one has to take into consideration concepts useful for characterizing clusters as a whole in the process of clustering and not after clustering. The following example illustrates this point. Consider the problem of clustering the points in Figure 1.

Figure 1. How would you cluster these points?

Typically, a person looking at this figure would say that it is a letter S intersecting with a letter M. One should observe that points A and B, which are closer to each other than to any other points, are classified into conceptually different clusters. The reason seems to be that people are equipped with concepts such as letter shapes, straight lines, and so on to help them recognize certain concepts in the figure. Thus, clustering in this case is not based on local closeness of points but on global concepts characterizing collections of points together. A conceptual clustering program would solve this problem by matching the descriptions of the letter shapes (contained in its memory as background knowledge) against the given collection of points. The best match would be obtained for shapes "S" and "M."

One may add that, in general, classical techniques do not seem to be much concerned with the ways humans cluster objects. They do not take into consideration any Gestalt concepts or linguistic constructs people use in describing object collections. Observations of how people cluster objects suggest that they search for one or more attributes (out of many potential attributes) that are most relevant to the goal of clustering and on that basis cluster the objects. Objects are put into the same cluster if they score similarly on these attributes. A description of the objects in the same cluster can therefore be expressed as a single statement or a conjunction of statements, each specifying one common property (attribute value) of the objects in the cluster. The above remark does not mean, however, that individual statements could not include a disjunction of values of the same attribute (the so-called internal disjunction). For example, a cluster may be characterized as "a set of large boxes, made of cardboard, and colored either blue or yellow." Different clusters are expected to have descriptions with different values of the relevant attributes.
Conceptual clustering has been introduced as a way to overcome the above-mentioned limitations of classical methods. Its basic premise is that objects should be arranged in classes that represent simple concepts and are useful from the viewpoint of the goal of clustering. Thus, objects in the same cluster do not necessarily have to be similar in some mathematically defined sense but must as a group represent the same concept. In order to cluster objects into conceptual categories, the notion of similarity must be replaced by a more general notion of conceptual cohesiveness (1) (see also Learning, machine). The conceptual cohesiveness (CC) between two objects A and B depends on the attributes of these objects, the attributes of nearby objects, and the set of concepts available for describing object configurations. Thus, it is a function CC(VA, VB, E, C), where VA and VB are vectors of attribute values for A and B, respectively, E denotes objects in the environment of A and B, and C is the set of available concepts. Thus, the conceptual cohesiveness is a four-argument function, in contrast to a two-argument distance or similarity function.

In conceptual clustering there is a constant duality between category descriptions and cluster membership. Specifically, the result of conceptual clustering is not only a set of clusters (a classification of the initially given objects) but also a set of concepts characterizing the obtained clusters (a classification scheme).

One may say that from the viewpoint of AI, the similarity-based approach represents the so-called weak method, that is, a general method that uses little problem domain knowledge. Such a method can be called domain-general knowledge-poor.
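The four-argument character of CC(VA, VB, E, C) can be caricatured as follows; everything in this sketch (the concept sets, the all-or-nothing score, the ignored environment argument) is an assumption made for illustration, not the article's definition. Two distant objects can still cohere if some available concept covers them both:

```python
# Toy conceptual-cohesiveness sketch: cohesion is 1.0 if some concept
# in C covers both objects, else 0.0.  The environment E could be used
# to refine the choice of concept; it is ignored in this caricature.
def CC(VA, VB, E, C):
    return 1.0 if any(VA in c and VB in c for c in C) else 0.0

line = {(x, 0) for x in range(10)}           # concept: points on a line
blob = {(5, 5), (5, 6), (6, 5)}              # concept: a compact blob

# (0,0) and (9,0) are far apart yet lie on the same line concept
print(CC((0, 0), (9, 0), [], [line, blob]))  # 1.0
```

A two-argument distance between (0, 0) and (9, 0) would be large, yet the pair coheres conceptually, which is the point of the S/M example above.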
In contrast, the conceptual clustering approach, which is dependent on the background concepts and clustering goals, can be called domain-generic knowledge-modular. It requires an interchangeable module of knowledge defined for the problem at hand. A goal-dependency network (GDN) (27) may be used to indicate which attributes are relevant to which goals of classification. Various algorithms for classical methods and conceptual clustering methods are presented below.

A Classification of Clustering Problems

From the viewpoint of applications, it is useful to classify clustering problems on the basis of the dimensionality of objects to be clustered. Three classes of problems can be distinguished:

1. One-dimensional clustering (quantization of variables). For continuous variables, or discrete variables with ranges of values that are significantly larger than necessary for a given problem, one wants to reduce the number of distinct values of the variables by identifying equivalence classes of the values. Clusters of values of individual variables are then treated as single units. For example, in image processing the scanners usually distinguish between a large number of gray levels, but only a few levels may be needed for solving a given problem (see Image understanding). Rosenfeld (2) has shown that clustering methods can be used for making such a reduction. Nubuyaki (3) proposed a clustering algorithm for this purpose in which the clusters have minimal sums of squares of intracluster distances. Clustering techniques have also been used to analyze LANDSAT images (4).

2. Two-dimensional clustering (segmentation). This type of clustering occurs most often in image processing, where one searches for segments of an image in which all picture elements share some common properties. For example, they may have a similar gray level or similar texture. Coleman (5) defined region segmentation as a problem of clustering (which he calls nonsupervised learning) and used the k-means algorithm of MacQueen (6). Haralick and Shapiro (7) have used clustering to analyze object shapes.

3. Multidimensional clustering. In multidimensional clustering objects are partitioned into clusters in a description space spanned by many attributes characterizing the objects. As mentioned earlier, the basis for clustering is typically a similarity measure. Traditional clustering techniques may assume different geometric distributions of the points in the space by the use of different normalization, transformation, and statistical treatments of the attributes. The next section gives more details on the similarity-based methods. In conceptual clustering the concept of description space is also useful; here, however, the space is not fixed but may change as new attributes are generated by background knowledge heuristics. In addition, the method is equipped with a set of concepts that can be used to characterize object configurations.

Classical Methods of Clustering

The thrust of research in cluster analysis and numerical taxonomy has been toward determining various object similarity or proximity measures and developing clustering techniques utilizing them. A large number of such measures and corresponding clustering methods have been developed to date. Comprehensive surveys can be found in Sokal and Sneath (8), Cormack (9), Anderberg (10), Gower (11), and Diday and Simon (12). A summary of various distance measures is described in Ref. 13.

Clustering techniques can themselves be clustered in many interesting ways. One classification partitions the techniques on the basis of the type of control used in building the clusters. The categories of clustering techniques according to this classification are agglomerative, divisive, and direct.

Agglomerative Techniques. Agglomerative techniques are often used in numerical taxonomy. These techniques form clusters by progressive fusion, that is, by recursively joining separate entities and small groups together to form larger and larger groupings. Eventually a single universal group is formed and the process halts, leaving a record of the merges that took place. The history of merges is often displayed in the form of a dendrogram (see Fig. 2c) that shows, by the horizontal position of the merge, the between-group similarities. As the groups encompass more and more entities, the between-group similarity scores decrease.

By adopting a threshold of minimum similarity, the agglomeration process can be halted before all entities are merged into a single group. Conversely, the complete dendrogram may be "cut" apart across some similarity boundary. This yields a number of clusters, each containing those entities that were merged at a similarity score above the given threshold.

During the agglomerative clustering process it is necessary to calculate the similarities between groups of entities. There are three standard ways to compute between-group similarities (measured as the reciprocal of distances). Suppose two groups are identified as X and Y. The single-linkage methods use the minimum distance between one entity in group X and another entity in group Y. The complete-linkage methods use the maximum distance between one entity in group X and another entity in group Y. The average-linkage methods use the average of the distances between all possible pairs of entities with one taken from group X and the other from group Y.

Divisive Techniques. Divisive techniques form a classification by progressive subdivision, that is, by repeatedly breaking the initial set into smaller and smaller clusters until only single entities exist in each cluster. The result is a hierarchy of clusters. The divisive technique of Edwards and Cavalli-Sforza (14) examines all 2^(N-1) - 1 two-group partitions of N objects and selects the one that gives the minimum intracluster sum of the squared interobject distances. The computational cost of the method limits its use to cases involving the clustering of only a few objects.
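The agglomerative procedure and the three linkage rules described above can be sketched as follows; this is a toy version with made-up one-dimensional data, not code from the article. Merging halts at a distance threshold, as in the similarity-cutting discussion above:

```python
# Agglomerative clustering with single-, complete-, and average-linkage
# between-group distances; merging stops when the closest pair of
# groups is farther apart than the threshold.
def single(X, Y, d):   return min(d(x, y) for x in X for y in Y)
def complete(X, Y, d): return max(d(x, y) for x in X for y in Y)
def average(X, Y, d):
    return sum(d(x, y) for x in X for y in Y) / (len(X) * len(Y))

def agglomerate(points, d, linkage, threshold):
    groups = [[p] for p in points]
    merges = []                              # record of merges (dendrogram log)
    while len(groups) > 1:
        i, j = min(((i, j) for i in range(len(groups))
                    for j in range(i + 1, len(groups))),
                   key=lambda ij: linkage(groups[ij[0]], groups[ij[1]], d))
        if linkage(groups[i], groups[j], d) > threshold:
            break                            # halt before one universal group
        merges.append((list(groups[i]), list(groups[j])))
        merged = groups[i] + groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups, merges

dist = lambda a, b: abs(a - b)
groups, _ = agglomerate([1, 2, 10, 11, 12], dist, single, threshold=3)
print(sorted(sorted(g) for g in groups))     # [[1, 2], [10, 11, 12]]
```

Swapping `single` for `complete` or `average` changes only the between-group distance, which is exactly the design choice the three standard methods differ on.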
Direct Techniques. The direct techniques neither merge entities into clusters nor break large clusters into smaller ones. A direct technique is given the number (usually denoted k) of clusters to form and proceeds to find a partitioning of the entities into k clusters that optimizes some measure of the goodness of the clusters. Two early direct clustering techniques are k-means, developed by MacQueen (6), and the center adjustment method, developed by Meisel (15). A generalization of the k-means and center adjustment techniques, called the dynamic clustering method, has been developed by Diday (16).

1. MP (microprocessor). Type: structured. Domain: 13 values (8080A, 8502, Z80, 1802, 6502C, 6502A, 68000, 6800, 6805, 6809, 8048, Z8000, and HP, a Hewlett-Packard Co. proprietary processor).
2. RAM memory size. Type: linear. Domain: 4 values (16,000; 32,000; 48,000; and 64,000 bytes).
3. ROM memory size. Type: linear. Domain: 7 values (1000; 4000; 8000; 10,000; 11,000-16,000; 26,000; and 80,000 bytes).
4. Display type. Type: structured. Domain: 4 values (terminal, B/W TV, color TV, and built-in).
5. Keys on keyboard. Type: linear. Domain: 5 values (52; 53-56; 57-63; 64-73; and 92 keys).

Figure 2. (a) Variables used to describe microcomputers. (b) The structure of the domains of the variables "MP" and "Display type." (c) A dendrogram generated by NUMTAX with descriptions generated by Aq. (d) A conceptual clustering of microcomputers.

Another classification of clustering methods separates the monothetic techniques from the polythetic ones. A monothetic clustering algorithm divides the set of objects into clusters that differ in the value of one attribute. For example, such a technique might form one cluster in which attribute Xᵢ has the value 1 and another cluster in which attribute Xᵢ has the value 0. A polythetic clustering technique forms clusters in which the values of several attributes differ for different classes.

Traditional clustering relies on measures of similarity and
the requisite need to "fold" the attribute values together to measure object-to-object similarities. When this occurs in a multidimensional space, the question of attribute weighting comes up, and there is much controversy over what weighting scheme is best for various purposes. Weights on attributes have to be given a priori by the researcher. Problems with such an approach are that it is usually difficult to define such weights and that some attributes may be dependent on other attributes. For example, attributes B and C may be important only if attribute A has the value 1. A similarity metric uses static weights for attributes A, B, and C. The attributes B and C are then weighted too high when attribute A takes the value 0 (since they should receive zero weight in that case), and they may be weighted too low when attribute A takes the value 1.
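The k-means procedure mentioned above under direct techniques can be sketched as follows; this is a minimal one-dimensional version with made-up data and naive seeding, an illustration rather than MacQueen's original formulation. k is given, and each pass reassigns points to the nearest center and recenters:

```python
# Minimal k-means: the number of clusters k is supplied, and the
# partition is improved by alternating assignment and recentering.
def kmeans(points, k, iters=20):
    centers = points[:k]                    # naive seeding: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                    # assign each point to the
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)     # nearest current center
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

print(kmeans([1.0, 2.0, 10.0, 11.0], k=2))  # [[1.0, 2.0], [10.0, 11.0]]
```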
Figure 2c shows the dendrogram, with microcomputers (e.g., VIC 20, HP 85, Ohio Sci. I and II, Sorcerer, Horizon, Zenith H89, TRS-80 II and III, Apple II, Atari 800, and Challenger) merged at decreasing similarity scores. For the two-cluster solution (obtained by cutting the dendrogram at the dashed line marked k = 2), the cluster descriptions are [RAM = 16K..48K][Keys ≤ 63] and [RAM = 64K][Keys > 63]. Figure 2d gives the conceptual clustering, in which each class receives a conjunctive description; for example, the description of class A1 is [MP = 8080x] & [Display ≠ Built-in] & [Keys = 53..63].
Conceptual Clustering

As described above, conceptual clustering arranges objects into clusters corresponding to certain conceptual classes, for example, classes characterized by conjunctive concepts (i.e., concepts defined by a simple conjunction of properties). The basic theory and an algorithm for conceptual clustering have been developed by Michalski (17). Implementation and experimentation with the algorithm have been performed by Michalski and Stepp (1,18) and Stepp (19) and have produced the programs CLUSTER/2 and CLUSTER/S. Other programs that work differently but provide conceptual clustering features include DISCON (20), RUMMAGE (21), and GLAUBER (22).

From the viewpoint of AI, clustering is a form of learning from observation (or learning without a teacher). It is a process that generates classes (conceptually defined categories) in order to partition a given set of observations. It differs from concept learning (qv) in that the latter creates descriptions of teacher-provided classes by generalizing from examples of the classes.

Below, one method for conceptual clustering is briefly outlined. The method is based on the idea that conceptual clustering can be conducted by a series of conceptual discriminations similar to those used in learning concepts from examples. The method uses the extended predicate calculus proposed by Michalski (17). Such a language is used to describe objects, classes of objects, and general and problem-specific background knowledge. The method employs a general-purpose criterion for measuring the quality of generated candidate classifications. Finding classifications that score high on the quality criterion is the most general goal of the method. Additional problem-specific goals may be supplied by the user or inferred by the system from a general goal-dependency network. Goal dependency is important to reduce the space of hypothetical classifications the method investigates.

Creating a classification is a difficult problem because there are usually many potential solutions with no clearly correct or incorrect answers. The decision about which classification to choose can be based on some perceived set of goals, as described by Medin, Wattenmaker, and Michalski (23), a goal-oriented, statistic-based utility function, as described by Rendell (24), or some other measure of the quality of the classification. One way to measure classification quality is to define various elementary, easy-to-measure criteria specifying desirable properties of a classification and to assemble them into one
108
CLUSTERING
general criterion. Each elementary criterion measures a certain aspect of the generated classifications. Examples of elementary criteria are the relevance of descriptors used in the class descriptions to the general goal, the fit between the classification and the objects, the simplicity of the class descriptions, the number of attributes that singly discriminate among all classes, and the number of attributes necessary to classify the objects into the proposed classes.

Building a meaningful classification relies on finding good classifying attributes. The method presented below uses background knowledge in the search for such attributes. Background knowledge rules enable the system to perform a chain of inferences to derive values for new descriptors for inclusion in object descriptions. The new descriptors are tested by applying the classification quality criterion to the groupings formed by them.

Concept Formation by Repeated Discrimination. This section explains how a problem of concept formation (here, building a classification) can be solved via a sequence of controlled steps of concept acquisition (learning concepts from examples). Given a set of unclassified objects, k seed objects are selected randomly and treated as representatives of k hypothetical classes. The algorithm then generates descriptions of each seed that are maximally general, form a good match with a subset of the objects given, and do not cover any other seed. These descriptions are then used to determine the most representative object in each newly formed class (where the newly formed class is defined as the set of objects satisfying the generated class description). The k representative objects are then used as new seeds for the next iteration. The process stops either when consecutive iterations converge to some stable solution or when a specific number of iterations pass without improving the classification (from the viewpoint of the quality criterion).
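The iteration just described can be sketched roughly as follows. This is an illustration only: the real algorithm builds maximally general conjunctive descriptions that avoid covering other seeds, a step that is abstracted here into a simple attribute-mismatch count, and the sample data are invented for the sketch.

```python
import random

def mismatch(a, b):
    """Stand-in for matching an object against a seed's description:
    count the attributes on which the two objects disagree."""
    return sum(1 for attr in a if a[attr] != b[attr])

def repeated_discrimination(objects, k, max_iter=20, seed=0):
    """k seeds -> classes -> most representative member of each class
    becomes the new seed; stop when the seeds no longer change."""
    rng = random.Random(seed)
    seeds = rng.sample(objects, k)
    classes = {}
    for _ in range(max_iter):
        # Assign every object to the best-matching seed's class.
        classes = {i: [] for i in range(k)}
        for obj in objects:
            best = min(range(k), key=lambda i: mismatch(obj, seeds[i]))
            classes[best].append(obj)
        # The new seed is the most representative member: the one with
        # the smallest total mismatch to its classmates.
        new_seeds = [
            min(classes[i] or [seeds[i]],
                key=lambda m: sum(mismatch(m, o) for o in classes[i]))
            for i in range(k)
        ]
        if new_seeds == seeds:      # converged to a stable solution
            break
        seeds = new_seeds
    return classes

micros = [
    {"MP": "8080x", "Display": "Built-in", "Keys": 53},
    {"MP": "8080x", "Display": "Built-in", "Keys": 63},
    {"MP": "6502x", "Display": "Color TV", "Keys": 52},
    {"MP": "6502x", "Display": "B/W TV",   "Keys": 56},
]
classes = repeated_discrimination(micros, k=2)
```

The attribute names and the four `micros` objects are hypothetical; they are not the data set of the example below.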
This approach requires that the number of classes be specified in advance. Since the best number of classes to form is usually unknown, two techniques are used: varying the number of classes and composing the classes hierarchically. For most purposes it is desired that the classification formed be simple and easy to understand. With this in mind, the number of classes that stem from any node of the classification hierarchy can be assumed to be in some modest range, such as from 2 to 7. With this small range, it is computationally feasible to repeat the whole clustering process for every number in the range. The solution that optimizes the score on the classification quality criterion (with appropriate adjustment for the effect of the number of classes on the score) indicates the best number of classes to form at this level of the hierarchy.

The above method of repeated discrimination for performing clustering has been implemented in the program CLUSTER/2 for a subset of extended predicate calculus (see Logic, predicate) involving only attributes (zero-argument functions). Besides its relative computational simplicity, this approach has other advantages stemming from the use of quantifier-free descriptions (for both objects and classes). It should be noted that classifications normally have the property that they can unambiguously classify any object into its corresponding class. To have this property, the class descriptions must be mutually disjoint.
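The vary-the-number-of-classes technique can be sketched as follows. The round-robin "clusterer", the balance criterion, and the linear penalty on k are toy stand-ins (assumptions), not CLUSTER/2's actual procedures; the point is only the outer loop over candidate values of k.

```python
def split_round_robin(objects, k):
    """Toy stand-in for a real clustering routine (e.g., the repeated-
    discrimination method): deal objects into k groups round-robin."""
    return [objects[i::k] for i in range(k)]

def balance(groups):
    """Toy quality criterion: 0 for perfectly balanced classes,
    negative otherwise (a real criterion assembles several elementary
    criteria)."""
    sizes = [len(g) for g in groups]
    return min(sizes) - max(sizes)

def best_classification(objects, cluster_fn, quality_fn, k_range=range(2, 8)):
    """Rerun the whole clustering process for each candidate number of
    classes and keep the solution that scores best on the quality
    criterion, adjusted (here by a crude linear penalty) for k."""
    best = None
    for k in k_range:
        if k > len(objects):
            break
        classes = cluster_fn(objects, k)
        score = quality_fn(classes) - 0.1 * k   # illustrative adjustment
        if best is None or score > best[0]:
            best = (score, k, classes)
    return best

score, k, classes = best_classification(list(range(12)), split_round_robin, balance)
```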
For conjunctive descriptions involving relations on attribute-value pairs, the disjointness property is easy to test and easy to maintain. For the more complex problems that require object representations involving quantified variables, predicates on these variables, and function-value relationships over quantified variables, the test for mutual disjointness of descriptions is much more complex. To cope with this difficulty, the problem of clustering structured objects is decomposed into two steps. The first step finds an optimized characteristic description of the entire collection of objects and then uses it to generate a quantifier-free description of each object. The second step processes the quantifier-free object descriptions with the CLUSTER/2 algorithm to form optimized classifications. These two processes are combined in the program CLUSTER/S.
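For quantifier-free attribute-value descriptions, testing whether an object satisfies a conjunctive description reduces to checking each selector in turn, which is one reason the disjointness property is easy to test at this level. A minimal sketch (the attribute names and value sets are illustrative, not program data):

```python
def satisfies(obj, description):
    """A conjunctive description is a set of selectors [attr = value-set];
    an object belongs to the class iff every selector holds for it."""
    return all(obj[attr] in values for attr, values in description.items())

# A class description in the spirit of the microcomputer example:
class_a1 = {
    "MP": {"8080x"},          # [MP = 8080x]
    "Display": {"Built-in"},  # [Display = Built-in]
    "Keys": range(53, 64),    # [Keys = 53..63]
}

micro = {"MP": "8080x", "Display": "Built-in", "Keys": 60, "RAM": "16K"}
satisfies(micro, class_a1)    # True: all three selectors hold
```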
Example 1: Microcomputers. The problem is to develop a meaningful classification of popular microcomputers. Each microcomputer is described in terms of the variables shown in Figure 2a. Variables "MP" and "Display type" are structured, i.e., their value set forms a hierarchy (Fig. 2b). Two programs were applied to solve this problem: NUMTAX, which implements several techniques of numerical taxonomy, and CLUSTER/2, which implements conjunctive conceptual clustering. A representative dendrogram produced by NUMTAX is shown in Figure 2c. The dashed lines indicate where the dendrogram is cut apart to form two clusters (k = 2). Accompanying the dendrogram is a logical description of the clusters. These descriptions were produced by an inductive learning program that accepts as input a collection of groups (clusters) of objects and generates the simplest discriminant description of each group. For example, the first cluster is described as

[RAM = 16K..48K] ∨ [Keys ≤ 63]

This description suggests that the cluster is composed of two kinds of computers, one that has [RAM = 16K..48K] and the other that has [Keys ≤ 63]. The presence of disjunction raises the question of why these computers are in the same cluster. The program CLUSTER/2 was given the same data and was told to use a classification quality criterion that maximizes the fit between the clustering and the objects in the cluster and then maximizes the simplicity of category descriptions. The clustering obtained is shown in Figure 2d. The first-level clustering is done on the basis of type of microprocessor.

Example 2: Trains. Consider a problem of classifying structured objects, for example, the problem of finding a classification of the trains shown in Figure 3a. The trains are structured objects, each consisting of a sequence of cars of different shapes and sizes. The individual cars carry a variable number of items of different shapes.
Human classifications of the trains shown in Figure 3a have been investigated by Medin, Wattenmaker, and Michalski (23). The 10 trains were placed on separate index cards so they could be arranged into groups by the subjects in the experiment. The experiment was completed by 31 subjects, who formed a total of 93 classifications of the trains. The most popular classification (17 repetitions) involved the number of cars in the trains. The three classes formed were "trains containing two cars," "trains containing three cars," and "trains containing four cars." This classification is shown in Figure 3b.

[The drawings of the trains in Figure 3 are not reproduced here. Panel (b) labels three classes: Class 1, "Train contains two cars"; Class 2, "Train contains three cars"; Class 3, "Train contains four cars." Panel (c) splits the trains into "These trains are carrying toxic chemicals" and "These trains are not carrying toxic chemicals."]

Figure 3. (a) Trains to be classified. (b) The most frequent human classification of trains. (c) Conceptual clustering of trains carrying toxic chemicals.

This problem is an example of a class of problems for which the implicit classification goal is to generate classes that are conceptually simple and based on easy-to-determine visual attributes. When people are asked to build such classifications, they typically form classes with disjoint descriptions, as in the above-mentioned study by Medin. For this reason methods that produce disjoint descriptions are of prime interest. The problem of classifying trains represents a general category of classification problems in which one wants to organize and classify observations that require structural descriptions, for example, classifying physical or chemical structures, analyzing genetic sequences, building taxonomies of plants or animals, characterizing visual scenes, or splitting a sequence of temporal events into episodes with simple meanings.

One problem of concern here is to develop a general method that, when applied to a collection of structured objects such as trains, could potentially generate the conjunctive concepts occurring in human classifications or invent new concepts having similar appeal.

An extension of the trains problem illustrates the use of a goal dependency network and problem-specific background knowledge. Suppose that the knowledge base includes an inference rule that can identify trains carrying toxic chemicals and that the general goal "survive" has a subordinate goal "monitor dangerous shipments." This background knowledge can be used to help build a classification. In the illustrations of the trains a toxic chemical container is identified as a single sphere (circle) riding in an open-top car. A background-knowledge rule supplied to the program is
[contains(train, car)][car-shape(car) = opentop][cargo-shape(car) = circle][items-carried(car) = 1] ⇔ [has-toxic-chemicals(train)]

In the above rule, equivalence is used to indicate that the negation of the condition part is sufficient to assert the negative of the consequence part. After this rule is applied, all trains will have descriptions containing either the toxic chemical predicate or its negation. The characteristic description generated by the program will now contain the additional predicate "has-toxic-chemicals(train)" (or its negation). By recognizing that this predicate is important to the goal "survival" through use of a GDN, the program produced the classification shown in Figure 3c.

Concept Formation by Finding Classifying Attributes. This section describes an alternative approach for building classifications. This approach searches for one or more classifying attributes whose value sets can be split into ranges that define individual clusters. The important aspect of this approach is that the classifying attributes can be derived through a goal-directed chain of inferences from the initial attributes. The classifying attributes sought are the ones that lead to classes of objects that are best according to the classification goal and the given classification quality criterion. The "promise" of a descriptor to serve as a classifying attribute is determined by relating it to the goals or derived subgoals of the problem and by considering how many other descriptors it implies. For example, if the goal of the classification is "finding food," the attribute "edibility" might be a good classifying attribute. The second way of determining the promise of an attribute
can be illustrated by the problem of classifying birds. The question of whether "color" is a more important classifying attribute than "is-waterbird" is answered in favor of "is-waterbird" because the latter leads to more implied attributes than does the attribute "color" in a given GDN (e.g., "is-waterbird" implies can swim, has webbed feet, eats fish, and so on), as described by Medin, Wattenmaker, and Michalski (23).

There are two fundamental processes that operate alternately to generate the classification. The first process searches for the classifying attribute whose value set can be partitioned to form classes such that the produced classification scores best according to the classification quality criterion. The second process generates new descriptors by a chain of inferences using background knowledge rules. Descriptors that can be inferred are ordered by relevancy to the goals of the classification.

The search process can be performed in two ways. When the number of classes to form (k) is known in advance, the process searches for attributes having k or more different values in the descriptions of the objects to be classified. These values are called the observed values of the attribute. Attributes with fewer than k observed values are not considered. For attributes with observed value sets larger than k, the choice of the mapping of value subsets to classes depends on the resulting quality criterion score for the classification produced and the type of the value set. When the number of classes to form is not known, the above technique is performed for several different values of k. The best number of classes, k, is indicated by the classification that best satisfies the quality criterion and goals.

The generate process constructs new attributes from combinations of existing attributes.
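The search step described above can be sketched as follows; a toy balance measure stands in for the real multi-criterion quality score, and the animal data are invented for the sketch.

```python
def observed_values(objects, attr):
    """The values of attr actually occurring in the object descriptions."""
    return {obj[attr] for obj in objects}

def partition_by(objects, attr):
    """Group objects by their value of the candidate classifying attribute."""
    groups = {}
    for obj in objects:
        groups.setdefault(obj[attr], []).append(obj)
    return groups

def quality(groups):
    """Toy stand-in for the classification quality criterion: prefer
    balanced classes (a real system assembles several elementary criteria)."""
    sizes = [len(g) for g in groups.values()]
    return min(sizes) - max(sizes)          # 0 when perfectly balanced

def best_classifying_attribute(objects, attrs, k):
    """Keep attributes with at least k observed values, then pick the one
    whose induced partition scores best on the quality criterion."""
    candidates = [a for a in attrs if len(observed_values(objects, a)) >= k]
    return max(candidates, key=lambda a: quality(partition_by(objects, a)))

animals = [
    {"blood": "cold", "offspring": "egg",  "color": "green"},
    {"blood": "cold", "offspring": "egg",  "color": "green"},
    {"blood": "warm", "offspring": "live", "color": "green"},
    {"blood": "warm", "offspring": "egg",  "color": "brown"},
]
best_classifying_attribute(animals, ["blood", "offspring", "color"], k=2)  # "blood"
```

"blood" wins because it is the only attribute that splits these four objects into two equal classes; "offspring" and "color" both produce a 3-1 split.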
Various heuristics of attribute construction are used to guide the process. For example, two attributes that have linearly ordered value sets can be combined using arithmetic operators. When the attributes have numerical values (as opposed to symbolic values such as small, medium, and large), a trend analysis can be used to suggest appropriate arithmetic operators, as in the BACON system by Langley and his associates (25). Predicates can be combined by logical operators to form new attributes through background knowledge rules. For example, a rule that says an animal is a reptile if it is cold-blooded and lays eggs can be written as

[cold-blooded(o1)][offspring-birth(o1) = egg] ⇒ [animal-type(o1) = reptile]

The application of this rule to the given animal descriptions yields the new attribute "animal-type" with the specified value "reptile." Using this rule and similar ones, one might classify some animals into reptiles, mammals, and birds even though the type of each animal is not stated in the original data.

Summary

Clustering objects or abstract entities into meaningful categories is an important form of learning (qv) from observation. This entry has described a classical, "similarity-based" approach and the more recent conceptual clustering approach to this problem. The fundamental notion is conceptual cohesiveness, which groups together objects that correspond to certain concepts rather than objects that are similar according to a mathematical similarity function.
BIBLIOGRAPHY

1. R. S. Michalski and R. E. Stepp, "Learning from Observation: Conceptual Clustering," in R. S. Michalski, J. Carbonell, and T. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, pp. 331-363, 1983.
2. A. Rosenfeld, Some Recent Developments in Texture Analysis, Proceedings of the Conference on Pattern Recognition and Image Processing, Chicago, 1979.
3. O. Nubuyaki, Discriminant and Least Squares Threshold Selection, Proceedings of the Fourth International Conference on Pattern Recognition, Kyoto, Japan, p. 592, 1978.
4. P. H. Swain, "Image and Data Analysis in Remote Sensing," in R. M. Haralick and J. C. Simon (eds.), Issues in Digital Image Processing, Sijthoff and Noordhoff, Amsterdam, 1980.
5. G. B. Coleman, Scene Segmentation by Clustering, University of Southern California Image Processing Institute, Report USCIPI, 1977.
6. J. MacQueen, "Some methods for classification analysis of multivariate observations," Proc. 5th Berkeley Symp. Math. Stat. Prob., 281, 1967.
7. R. M. Haralick and L. Shapiro, Decomposition of Polygonal Shapes by Clustering, Proceedings of the IEEE Conference on Pattern Recognition and Image Processing, Troy, NY, p. 183, 1977.
8. R. R. Sokal and R. H. Sneath, Principles of Numerical Taxonomy, W. H. Freeman, San Francisco, 1963.
9. R. M. Cormack, "A review of classification," J. Roy. Stat. Soc., Series A, 134, 321 (1971).
10. M. R. Anderberg, Cluster Analysis for Applications, Academic Press, New York, 1973.
11. J. C. Gower, "A comparison of some methods of cluster analysis," Biometrics 23, 623-637 (1967).
12. E. Diday and J. C. Simon, "Clustering analysis," Communication and Cybernetics, Springer-Verlag, New York, 1976.
13. R. S. Michalski, R. E. Stepp, and E. Diday, "A Recent Advance in Data Analysis: Clustering Objects into Classes Characterized by Conjunctive Concepts," in L. N. Kanal and A. Rosenfeld (eds.), Progress in Pattern Recognition, Vol. 1, North-Holland, Amsterdam, 1981.
14. A. W. F. Edwards and L. L. Cavalli-Sforza, "A method for cluster analysis," Biometrics 21, 362-375 (1965).
15. W. Meisel, Computer Oriented Approaches to Pattern Recognition, Academic Press, New York, 1972.
16. E. Diday, "Problems of clustering and recent advances," Eleventh Congress of Statistics, Oslo, Norway, 1978.
17. R. S. Michalski, "Knowledge acquisition through conceptual clustering: A theoretical framework and an algorithm for partitioning data into conjunctive concepts," J. Pol. Anal. Inform. Sys. 4, 219-244 (1980).
18. R. S. Michalski and R. E. Stepp, "Automated construction of classifications: Conceptual clustering versus numerical taxonomy," IEEE Trans. Pattern Anal. Machine Intell. PAMI-5(4), 396-410 (July 1983).
19. R. E. Stepp, Conjunctive Conceptual Clustering: A Methodology and Experimentation, Ph.D. Thesis, Department of Computer Science, University of Illinois, Urbana, IL, 1984.
20. P. Langley and S. Sage, Conceptual Clustering as Discrimination Learning, Proceedings of the Fifth Biennial Conference of the Canadian Society for Computational Studies of Intelligence, London, Ontario, pp. 95-98, 1984.
21. D. Fisher, A Hierarchical Conceptual Clustering Algorithm, Technical Report, Department of Information and Computer Science, University of California, Irvine, 1984.
22. P. Langley, J. Zytkow, H. Simon, and G. Bradshaw, "The Search for Regularity: Four Aspects of Scientific Discovery," in R. S. Michalski, J. Carbonell, and T. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Vol. II, Morgan Kaufmann, pp. 425-469, 1986.
23. D. L. Medin, W. S. Wattenmaker, and R. S. Michalski, "Constraints in inductive learning: An experimental study comparing human and machine performance," ISG Report 86-1, UIUCDCS-F-86-952, University of Illinois, 1986.
24. L. A. Rendell, "Toward a unified approach for conceptual knowledge acquisition," AI Mag. 4, 19-27 (Winter 1983).
25. P. Langley, G. L. Bradshaw, and H. A. Simon, "Rediscovering chemistry with the BACON system," in R. S. Michalski, J. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, pp. 307-329, 1983.
26. D. Fisher and P. Langley, Approaches to Conceptual Clustering, Proceedings of the Ninth International Joint Conference on AI, Los Angeles, CA, pp. 691-697 (August 1985).
27. R. E. Stepp and R. S. Michalski, "Conceptual Clustering: Inventing Goal-Oriented Classifications of Structured Objects," in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Vol. II, Morgan Kaufmann, pp. 331-363, 1986.

R. S. Michalski and R. E. Stepp
University of Illinois
COGNITION. See Reasoning.
COGNITIVE MODELING

A cognitive simulation model is a computer simulation of mental or cognitive processes. Such a model is normally constructed by cognitive psychologists, who are members of the branch of experimental psychology that is concerned with the scientific and empirical study of human behavior, with an emphasis on understanding the internal mental mechanisms that underlie behavior (see Cognitive psychology). The purposes of cognitive modeling are to express a theory of mental mechanisms in precise and rigorous terms, to demonstrate the sufficiency of a set of theoretical concepts, and to provide an explanation for observed human behavior.

Because cognitive models use many techniques and ideas from AI, they are similar to AI programs. But the goals of cognitive modeling and AI tend to be substantially different (see Ref. 1). Briefly put, the goal of AI is to build intelligent machines, whereas the goal of cognitive modeling is to build models of human mental mechanisms. These activities are very similar, but they differ mainly in the criteria for success. Again briefly put, the quality of a piece of AI work is measured in terms of how well the machine is able to perform the task. In a cognitive modeling effort the question is not only whether the computer program is able to perform the task but also the extent to which it behaves like a human performing the same task and whether the mechanisms involved are plausible theoretical explanations for human mental processes. Notice that in AI terms these mechanisms may be inefficient or unnecessarily complex for the task. This entry touches on the contribution of cognitive modeling to AI. It is not a commonly accepted idea, but cognitive modeling work is relevant to AI in that some of the mechanisms in cognitive models are applicable to AI problems.
Purposes of Cognitive Modeling

The rationale for cognitive modeling is best seen in terms of the history of theoretical development in cognitive psychology. Except for the temporary aberration of behaviorism, the goal of experimental psychology over the last century has always been to construct an adequate theory of the mental processes that underlie behavior. An adequate theory of the human mind would explain the observed behavioral data in terms of plausible internal mechanisms. The traditional mode for describing such mechanisms has been in the form of verbal statements. As the ideas get more complex, such verbal theories become difficult to handle. Thus, there is a need to express psychological theory precisely, to demonstrate that theoretical concepts are actually sufficient to explain the behavior, and to derive testable predictions about data in a rigorous fashion.

The idea of rigorous theoretical models in experimental psychology is a fairly old one; an excellent early example is the work of Hull during the 1940s, who constructed one of the first large-scale mathematical theories of behavior. During the fifties and sixties mathematical models of psychological processes were developed. These models represented perceptual and learning situations as stochastic processes and were very successful in accounting quantitatively for many details of human behavior. See Ref. 2 for a summary of these approaches.

This combination of verbal and mathematical theory has produced what might be termed the "standard" theory of cognition, which is based on a decomposition of the human mind into major components. These consist of structures, such as short-term memory and long-term memory, and processes, such as recognition, memory storage, and memory retrieval, which process and manipulate the information stored in the structures. This theory is the basic framework for most current cognitive models.
As interest in cognitive psychology moved from simple learning (qv) and perception (see Vision, early) to complex behavior such as reasoning (qv) and reading comprehension (see Natural-language understanding), the mathematical models seemed to be inadequate because they characterized behavior in terms of a small number of continuous mathematical variables; it seemed that complex qualitative, or symbolic, systems were needed instead, especially in order to represent knowledge (see Representation, knowledge). In addition, many researchers came to feel that a psychological theory or model should describe the processes going on in the mind rather than simply providing a characterization of the statistical properties of the behavior (3). Thus, computer programs, in which these complex entities can be represented directly, became the ideal mode for expressing theory (4).

Perhaps the most important event in symbolic cognitive modeling was the adoption of semantic networks (qv) from AI. For cognitive psychologists the significance of the semantic network representation was that it provided a representation of knowledge in a form that tied into the classical concept of association very well (see Ref. 5 for a comprehensive review of this topic). Semantic networks were so appealing theoretically that AI quickly became of intense interest to cognitive psychologists, and cognitive simulation models were the best way to incorporate AI concepts into cognitive theory. Currently, there seems to be a consensus that cognitive simulation models best represent the core theoretical concepts in cognitive psychology. However, it is important to note that despite the recognized importance of cognitive simulation models and the AI concepts that underlie them, relatively few cognitive psychologists actually construct and make use of simulation models (see Refs. 6 and 7 for further discussion).

Evaluation of Cognitive Models

Theoretical Quality. Since cognitive psychology is an empirical science that is attempting to construct explanatory theory, the quality of a cognitive model depends both on its ability to mimic observed behavior and on the quality of the model as a piece of theory (6,7). Most of the extant cognitive modeling work has been done with the primary theoretical goals of demonstrating that a theory is sufficient to produce the behavior and of stating the theory rigorously. Beyond these concerns, the architectural integrity of the model is critical. Does the model make consistent use of a set of explicit theoretical mechanisms that comprise a cognitive architecture, or does it appear to contain ad hoc, arbitrary mechanisms? If the architecture has been maintained, it will be relatively clear how the model works; a theory is of little value if it cannot be understood by the scientists in the field. Thus, there is a great premium on the model having a basically simple and consistently maintained architecture.

Empirical Quality. One criterion for empirical quality is apparent realism, which is the criterion that most AI projects attempt to meet; the system must be able to produce apparently realistic behavior. That is, most natural-language-processing systems are designed so that they appear to do the correct thing with the input. It is not necessary to evaluate such systems on a systematic scientific basis because the established usage of language is adequate to characterize whether the model is reasonably correct. But, more recently, simulation models have been used to account for experimental data in great detail.
Thus, it is desirable for a model to go beyond the apparently realistic stage and to account for data in a detailed way, preferably in a predictive rather than in an after-the-fact manner. In many cases the time characteristics of the model and of human behavior are compared; some measure of processing time or effort in the model should correspond to processing time on the part of humans.

The Nature of Behavioral Data. There are some characteristics of behavioral data that are probably not obvious to those not familiar with cognitive psychology. First, contrary to intuition, and perhaps common sense, introspection (observation of one's own thought processes) is neither a reliable nor a complete source of information about mental processes (see Ref. 8 for a history of this subject). The basic problems are that such observations are highly idiosyncratic, easily distorted by subjective bias on the part of the observer, and, more importantly, most of the major mental processes, especially those of interest to AI, go on below the level of conscious awareness. The popular "think-out-loud" protocol data are not strictly introspective data, but they suffer from related problems. Thus, modern cognitive psychology is based on behavioral, rather than introspective, data.

Second, behavior is highly variable and subject to the influence of many factors. This means that it is essential that behavioral data be obtained by the use of careful experimental methods and appropriate statistical analysis of the results. To
outsiders, this meticulousness may be hard to understand, but it is very easy to collect data that are worthless and misleading because of improper attention to such considerations. Third, human behavior is strongly determined by the task that the person is trying to do, meaning that the task should be carefully characterized, and inference from data to the internal processes must be qualified by the task. Thus, the accuracy of a cognitive model is determined by how well it fits properly collected data on behavior in a suitable task, not by how well it agrees with the modeler's subjective impressions concerning mental processes.

Finally, and most important, it is normally necessary to constrain a person's behavior in order to study it conveniently. This means that much more is known about certain aspects of mental processes than others. For example, perceptual processes are perhaps the best understood because the experimenter has great control over the stimulus and can require the subject to produce very simple responses based only on observable properties of the stimulus. In more complex behavior such as problem solving (qv), the behavior of a person becomes less determined by specific features of the stimulus and more by the person's internal knowledge and processes, such as his/her representation of the task. Normally, data from more complex tasks are much less reliable statistically and much harder to interpret. Thus, perhaps the most interesting processes, such as reasoning and problem solving, are the hardest to work with in terms of both data collection and the construction and evaluation of simulation models.
Survey of Cognitive Models

This survey is limited to those modeling efforts in which modeling human behavior was a direct goal, as opposed to "pure" AI projects. Note, however, that Bower and Hilgard (2) use several AI projects directly as psychological models because these are the most complete and explicit statements available of certain theoretical mechanisms. Given below is a brief description of a variety of simulation models, grouped by the cognitive processes under investigation.

Basic Approaches. There are three basic approaches that have been used in cognitive models. In the first, basically a numeric simulation approach, representations of some sort are activated in specified ways over time, and the representations interact in terms of their activation. This is a very old concept in psychology; precursors of it can be found in James (9) and Hebb (10), and it has great appeal because of its neurological flavor. This modeling work focuses mostly on the mathematical specifics of the time course of activation and how the representations interact. The second basic approach involves the manipulation of symbolic structures that represent knowledge, essentially the same approach as current "mainstream" AI. Many of the cognitive models that are described here take this form. The third approach is a hybrid of the activation and symbolic approaches. That is, which knowledge structures are paid attention to and manipulated is determined by activation that typically spreads from one piece of knowledge to another. Quillian's (11) use of spreading activation is one of the original systems of this type.

Perception. Perceptual processes have usually been characterized as low-level processes in which activation mechanisms
C O G N I T I V EM O D E L I N G
11 3
perhaps the earliest to point out the extreme amount and complexity of human knowledge when expressedin these terms. He suggestedthat the complexity of human thought, and its idiosyncrasies between individuals, could probably be accounted for in terms of the differences in knowledge rather than differences in basic cognitive processesthat use the knowledge. This is a precursor of the current emphasis on knowledge-basedsystemsin both AI and cognitive modeling. The work by Quillian (11) and Collins and Quillian (28,29) introduced the idea of semantic networks to cognitive psychology. This knowledge representation was widely acceptedbecauseit put the classic conceptof associationinto a form adequate to represent knowledge. The Collins and Quillian work led to the idea of cognitive economy, in which inheritance relations are used to reduce the amount of stored information, and the basic mechanism of spreading activation is used to explain how knowledge can be retrieved in terms of its relevance to currently active knowledge (21,30). Learning.Some of the earliest cogniti{'e models dealt with Important early models of semantic memory (seeMemory, The first was the classicEPAM (qv) model learning processes. of Simon and Feigenbaum (17), which constructeda discrimi- semantic) were Rumelhart, Lindsey, and Norman's (31) LNR nation network in order to perform simple learning. 
The pre- and Kintsch's (.32)model, both of which were based on case sented stimulus was sorted by the net to find the response;if grammar (seeGrammar, case)representations,and Anderson the responsewas incorrect, the net would be modified to pro- and Bower's (5) influential HAM model, which used a repreduce a new path to the comectresponse.EPAM is an example sentation similar to predicate logic (seeLogic, predicate).The of how a model of a psychologicalprocesscontributed to the contrast between systems like LNR and HAM shows how difdevelopmentof an important AI technique,the discrimination ferent representation systems can be developedthat are apnet. Hintzman (18) Iater built a more elaborateversion, SAL, parently adequate to represent human knowledge but have in which additional mechanisms were added to account for a substantial notational differencesand cannot be distinguished variety of experimentally observedphenomenaof interference from each other empirically [i.e., the problem of nonidentiand forgetting in simple learning situations. This early work fiability (4,2U1.The casegrammar form of representation has on learning was not followed up for quite a few years; instead become very popular both in cognitive modeling and in AI. most simulation efforts focused on models of performance However, the Anderson and Bower HAM model was probably more influential, simply becausethey made a special effort to rather than learning. try to bring their model in line with data on human perforfocused various has on Anderson work by More recently, The LAS system (19) learned the grammar mance.Also, Anderson'swork has been more concernedwith learning processes. for a language by constructing an augmented transition net- explicitly stated architectures for cognitive processes,which work (ATN in responseto pairs of semantic representations makes the theoretical status of the models more clear (21). and input sentences (see Grammar, augmented-transitionnetwork). 
In the ACT model (20,21)a distinction is made beLanguageComprehension.Comparableto the large amount tween procedural knowledg", representedas production rules, of work in AI on natural-language processing has been the and declarative knowledge, usually represented as proposi- considerableprogresson cognitive models of how humans untions in a semantic network. The production rules examine derstand language, usually in the context of reading compreand act on the semantic network. Only semantic representa- hension. Kintsch and Van Drjk (33) developeda model for how tions that are active, as a result of spreading activation, can people acquire and recall information from text, which has trigger the production rules. Considerableattention is paid to becomeone of the most important theoretical representations the mechanismsby which new proceduralknowledge,such as of comprehensionand memory processes.The model begins a skill, is learned; new production rules are acquired and re- with a representation of the propositional content of the input fined through practice. This approach has been applied to the text and selectswhich propositionsare to be retained in the learning of geomefuyQ2) and learning programming in LISP system'slimited short-term memory as it goesfrom one sen(qr) (zil. Kieras and Bovair (24) and Kieras and Polson(25,26) tence to the next using simple heuristics that are based prihave applied a similar, but greatly simplified, analysis to ac- marily on how the propositions are connectedto each other. count for the learning of skills in interacting with equipment. According to a basic principle of human learning, propositions Thus, the representation of learning as the acquisition and that reside in short-term memory longer are more likely to be refinement of production rules appears to be a powerful and transferred to long-term memory and thus recalled better. comprehensiveapproach. 
This model can account for what is rememberedfrom a text in a variety of reading and memory situations. Memory Organization and Processes. Most simulation A model by Kieras (34) used an ATN parser in conjunction models of mernory have dealt with long-term memory, which with a semantic network knowledgg representation and is the repository of general knowledge. This concern with spreading activation-memory searchmechanism and was able knowledge representation makes long-term memory a fruitful to account in considerabledetail for the time required to read area for application of AI concepts.An early paper by Frijda sentencesin simple passagesunder different task conditions.In (27)presentsthe basic idea that knowledgecan be represented another model Kieras (35) showed how certain higher-level in terms of labeled associationsbetween concepts.Frijda is comprehensionprocessescould be representedusing produc-
are of primary importance. For example, the Mcclelland and Rumelhart (L2,13) model recognizesfour-letter words using a network of representations of letter features, letters, and words, which activate and inhibit each other. The network reachesa stable state in which the representation of the presentedword is the most activated. Interestingly, there has not been much follow-up of the classic blocks-world work in AI (14),in which perceptionis seenas a matter of matching schemas for known objects against perceptual input. Although these conceptsare central to current cognitive theory (L5,2), there has been little or no attempt to construct and evaluate simulation models of perceptual processesin this domain. Perhaps the best simulation of higher-order perception is the Simon and Gilmartin (16) model of chessexpertise.This system learns to recognize patterns of pieces on the chessboardby building a discrimination net.
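The processing cycle of the Kintsch and van Dijk model can be sketched roughly as follows; the buffer size, the shared-argument connection heuristic, and the recall rule below are deliberate simplifications for illustration, not the published model's parameters.

```python
# Rough sketch of a Kintsch & van Dijk-style processing cycle. Propositions
# are (predicate, argument, ...) tuples; two propositions are treated as
# connected if they share an argument. Buffer size and recall rule are
# simplifying assumptions, not the fitted values of the published model.

BUFFER_SIZE = 3

def process_text(sentences):
    """Return, per proposition, how many cycles it spent in short-term memory."""
    residence = {}
    buffer = []
    for props in sentences:                 # one processing cycle per sentence
        candidates = buffer + list(props)
        for p in candidates:
            residence[p] = residence.get(p, 0) + 1
        # Retain only propositions connected (by a shared argument) to the
        # newest input, keeping the most recent ones.
        new_args = {a for p in props for a in p[1:]}
        connected = [p for p in candidates if new_args & set(p[1:])]
        buffer = connected[-BUFFER_SIZE:]
    return residence

sentences = [
    [("swift", "horse"), ("run", "horse", "field")],
    [("green", "field"), ("own", "farmer", "field")],
    [("old", "farmer")],
]
residence = process_text(sentences)
# Longer buffer residence implies better recall; the proposition that links
# the topic chains ("run horse field") is held the longest here.
print(max(residence, key=residence.get))   # -> ('run', 'horse', 'field')
```

Propositions that keep overlapping with incoming material stay in the buffer across cycles, which is how the model predicts they are recalled best.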
A model by Kieras (34) used an ATN parser in conjunction with a semantic network knowledge representation and a spreading-activation memory-search mechanism and was able to account in considerable detail for the time required to read sentences in simple passages under different task conditions. In another model Kieras (35) showed how certain higher-level comprehension processes could be represented using production rules to perform inferences on a propositional representation of the text content. The model was able to recognize or extract generalizations from simple passages in a manner similar to that used by human readers.

Perhaps the single most comprehensive simulation model of comprehension is that of Thibadeau, Just, and Carpenter (36), again using a combination of production rules, propositional representation, and activation mechanisms. This model captures the highly parallel and interactive processing that apparently goes on in reading, all the way from syntactic analysis to the application of general knowledge. It was able to account for extremely detailed timing data from eye-movement recordings of humans reading technical passages.

Problem Solving and Reasoning. According to a classic paper by Newell (1), this area is the most important one for AI, but it is one of the most difficult topics in cognitive psychology, as pointed out above. The best known work in this field is the GPS model by Newell and Simon (37,38), which introduced the idea of means-ends analysis. It is very influential as a model for the methods humans use to solve problems, as well as being one of the first representatives of what is now termed "weak methods" in problem solving. Another example of early work on problem solving is that of Simon and Kotovsky (39), which was also one of the earliest cognitive simulation models. This was a model of how series completion problems, which often appear on IQ tests, could be solved by recognizing the patterns of repetition and succession. The model was able to account for which problems would be the easiest and most difficult for people. Anderson, Greeno, Kline, and Neves (40) represented many of the processes involved in solving elementary proof problems in geometry with a system involving both semantic structures and production rules. The system would acquire and apply schemas representing proof approaches. Hayes-Roth and Hayes-Roth (41) constructed an influential model of planning that represented how people would select a route in an errand-performing task. This model was based on a blackboard knowledge-source architecture.
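The means-ends analysis that GPS introduced can be sketched as a loop: find a difference between the current state and the goal, select an operator relevant to reducing that difference, and recursively make that operator applicable. The errand-like operators and state encoding below are an invented toy domain, not GPS itself.

```python
# Toy means-ends analysis in the spirit of GPS. The domain (errand-like
# operators) and the set-of-facts state encoding are invented illustrations.

OPERATORS = {
    "drive-to-shop":  {"pre": {"car-works"},  "add": {"at-shop"}},
    "repair-car":     {"pre": {"have-money"}, "add": {"car-works"}},
    "withdraw-money": {"pre": set(),          "add": {"have-money"}},
}

def relevant(difference):
    """Operators whose effects reduce the given difference."""
    return [name for name, op in OPERATORS.items() if difference in op["add"]]

def apply_op(state, name):
    return state | OPERATORS[name]["add"]

def solve(state, goals, depth=10):
    """Return a list of operator names achieving all goals, or None."""
    if depth == 0:
        return None
    plan = []
    for goal in goals:
        if goal in state:
            continue                        # no difference for this goal
        for name in relevant(goal):         # means-ends: pick a relevant operator
            subplan = solve(state, OPERATORS[name]["pre"], depth - 1)
            if subplan is None:
                continue
            for step in subplan + [name]:   # achieve preconditions, then apply
                state = apply_op(state, step)
            plan += subplan + [name]
            break
        else:
            return None                     # no operator reduces the difference
    return plan

print(solve(set(), {"at-shop"}))
# -> ['withdraw-money', 'repair-car', 'drive-to-shop']
```

Note how the recursion on operator preconditions produces the backward chain of subgoals that is the signature of means-ends analysis.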
Contribution to AI

One way in which cognitive modeling work can contribute to AI is in the development of specific concepts and techniques. Several approaches, such as discrimination nets and probably the idea of rule-based systems (qv), apparently developed as cognitive models at the same time as, if not prior to, their adoption as pure AI techniques. For example, the standard approach used in expert systems (qv) probably developed from the basic characterization of human expertise as the ability to recognize patterns, which could then be represented as a set of production rules. Tracing out the exact lines of descent of these ideas is beyond the scope of this entry, but it certainly appears historically that cognitive modeling efforts have made important contributions to AI.

One prime candidate for a new contribution is the general approach currently used in cognitive modeling. Since cognitive models are developed with a specific theoretical position in mind, they normally propose an explicit cognitive architecture. This consists of a relatively small set of basic data types and processes, out of which are constructed all the more complex knowledge representations and processes that the model uses to represent human mental mechanisms. Thus, the overall goal of cognitive modeling is to arrive at a comprehensive architecture that is adequate for cognition, rather than simply constructing a multitude of unrelated special-purpose systems. Some of the specific cognitive architectures resulting from cognitive modeling might become directly applicable, but by adopting this architectural approach, future work in AI could probably become more focused theoretically.

BIBLIOGRAPHY

1. A. Newell, Remarks on the Relationship Between Artificial Intelligence and Cognitive Psychology, in R. Banerji and M. D. Mesarovic (eds.), Theoretical Approaches to Non-Numerical Problem Solving, Springer-Verlag, New York, pp. 363-400, 1970.
2. G. H. Bower and E. R. Hilgard, Theories of Learning, 5th ed., Prentice-Hall, Englewood Cliffs, NJ, 1981.
3. L. W. Gregg and H. A. Simon, "Process models and stochastic theories of simple concept formation," J. Math. Psychol. 4, 246-276 (1967).
4. D. E. Kieras, Knowledge Representations in Cognitive Psychology, in L. Cobb and R. M. Thrall (eds.), Mathematical Frontiers of the Social and Policy Sciences, AAAS Selected Symposium 54, Westview, Boulder, CO, pp. 5-36, 1981.
5. J. R. Anderson and G. H. Bower, Human Associative Memory, Winston, Washington, DC, 1973.
6. D. E. Kieras, A Simulation Model for the Comprehension of Technical Prose, in G. H. Bower (ed.), The Psychology of Learning and Motivation, Vol. 17, Academic Press, New York, pp. 39-80, 1983.
7. D. E. Kieras, A Method for Comparing a Simulation Model to Reading Time Data, in D. Kieras and M. Just (eds.), New Methods in Reading Comprehension Research, Erlbaum, Hillsdale, NJ, pp. 299-325, 1984.
8. G. Humphrey, Thinking: An Introduction to Its Experimental Psychology, Wiley, New York, 1963.
9. W. James, The Principles of Psychology, Henry Holt & Co., New York, 1890.
10. D. O. Hebb, The Organization of Behavior, Wiley, New York, 1949.
11. M. R. Quillian, Semantic Memory, in M. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, pp. 227-270, 1968.
12. J. L. McClelland and D. E. Rumelhart, "An interactive activation model of context effects in letter perception: Part 1. An account of basic findings," Psychol. Rev. 88, 375-407 (1981).
13. D. E. Rumelhart and J. L. McClelland, "An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model," Psychol. Rev. 89, 60-94 (1982).
14. P. H. Winston (ed.), The Psychology of Computer Vision, McGraw-Hill, New York, 1975.
15. S. E. Palmer, Visual Perception and World Knowledge: Notes on a Model of Sensory-Cognitive Interaction, in D. A. Norman and D. E. Rumelhart (eds.), Explorations in Cognition, W. H. Freeman, San Francisco, pp. 279-307, 1975.
16. H. A. Simon and K. Gilmartin, "A simulation of memory for chess positions," Cog. Psychol. 5, 29-46 (1973).
17. H. A. Simon and E. A. Feigenbaum, "An information-processing theory of some effects of similarity, familiarization, and meaningfulness in verbal learning," J. Verb. Learn. Verb. Behav. 3, 385-396 (1964).
18. D. L. Hintzman, "Explorations with a discrimination net model for paired-associate learning," J. Math. Psychol. 5, 123-162 (1968).
19. J. R. Anderson, Computer Simulation of a Language-Acquisition System, in R. L. Solso (ed.), Information Processing and Cognition: The Loyola Symposium, Lawrence Erlbaum, Hillsdale, NJ, pp. 295-349, 1975.
20. J. R. Anderson, Language, Memory, and Thought, Lawrence Erlbaum, Hillsdale, NJ, 1976.
21. J. R. Anderson, The Architecture of Cognition, Harvard University Press, Cambridge, MA, 1983.
22. J. R. Anderson, Acquisition of Proof Skills in Geometry, in J. G. Carbonell, R. Michalski, and T. Mitchell (eds.), Machine Learning, an Artificial Intelligence Approach, Tioga, San Francisco, CA, pp. 191-219, 1982.
23. J. R. Anderson, R. Farrell, and R. Sauers, Learning to Plan in LISP, Technical Report #ONR-82-2, Carnegie-Mellon University, Pittsburgh, PA, 1982.
24. D. E. Kieras and S. Bovair, "The acquisition of procedures from text: A production-system analysis of transfer of training," Journal of Memory and Language 25, 507-524 (1986).
25. D. E. Kieras and P. G. Polson, "An approach to the formal analysis of user complexity," Int. J. Man-Mach. Stud. 22, 365-394 (1985).
26. P. G. Polson and D. E. Kieras, A Quantitative Model of the Learning and Performance of Text Editing Knowledge, in L. Borman and B. Curtis (eds.), Human Factors in Computing Systems Proceedings, Special Issue of SIGCHI Bulletin, San Francisco, CA, pp. 207-212, 1985.
27. N. H. Frijda, "Simulation of human long-term memory," Psychol. Bull. 77, 1-31 (1972).
28. A. M. Collins and M. R. Quillian, "Retrieval time from semantic memory," J. Verb. Learn. Verb. Behav. 8, 240-247 (1969).
29. A. M. Collins and M. R. Quillian, How to Make a Language User, in E. Tulving and W. Donaldson (eds.), Organization and Memory, Academic Press, New York, pp. 309-351, 1972.
30. A. M. Collins and E. F. Loftus, "A spreading-activation theory of semantic processing," Psychol. Rev. 82, 407-428 (1975).
31. D. E. Rumelhart, P. H. Lindsay, and D. A. Norman, A Process Model for Long-Term Memory, in E. Tulving and W. Donaldson (eds.), Organization and Memory, Academic Press, New York, pp. 197-246, 1972.
32. W. Kintsch, The Representation of Meaning in Memory, Lawrence Erlbaum, Hillsdale, NJ, 1974.
33. W. Kintsch and T. A. van Dijk, "Toward a model of discourse comprehension and production," Psychol. Rev. 85, 363-394 (1978).
34. D. E. Kieras, "Component processes in the comprehension of simple prose," J. Verb. Learn. Verb. Behav. 20, 1-23 (1981).
35. D. E. Kieras, "A model of reader strategy for abstracting main ideas from simple technical prose," Text 2, 47-82 (1982).
36. R. Thibadeau, M. A. Just, and P. A. Carpenter, "A model of the time course and content of reading," Cog. Sci. 6, 157-203 (1982).
37. A. Newell and H. A. Simon, Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ, 1972.
38. G. W. Ernst and A. Newell, GPS: A Case Study in Generality and Problem Solving, Academic Press, New York, 1969.
39. H. A. Simon and K. Kotovsky, "Human acquisition of concepts for sequential patterns," Psychol. Rev. 70, 534-546 (1963).
40. J. R. Anderson, J. G. Greeno, P. J. Kline, and D. M. Neves, Acquisition of Problem-Solving Skill, in J. R. Anderson (ed.), Cognitive Skills and Their Acquisition, Erlbaum, Hillsdale, NJ, 1981.
41. B. Hayes-Roth and F. Hayes-Roth, "A cognitive model of planning," Cog. Sci. 3, 275-310 (1979).

D. KIERAS
University of Michigan

COGNITIVE PSYCHOLOGY

The term artificial intelligence evokes a contrast with the "natural" intelligence of higher organisms, most notably human beings. Some would argue that AI, as defined by successful AI programs, is likely to prove qualitatively different from the natural variety. Another view, however, is that AI should be directed toward imitation of the cognitive capabilities of humans. The latter view suggests that AI should be closely linked to cognitive psychology, the field that investigates how people acquire knowledge, remember it, and put it to use to make decisions and solve problems.

History and Scope. Cognitive psychology, also sometimes called information-processing psychology, is currently the leading area of human experimental psychology. The origins of the field can be traced to nineteenth-century psychologists such as James (1) and the German Gestalt psychologists such as Duncker and Wertheimer (2,3). For much of the twentieth century up until about 1960, however, American psychology was dominated by behaviorist theories that eschewed any reference to unobservable mental processes. The modern revival of cognitive psychology was fostered in part by developments in other disciplines, most notably linguistics and computer science. In linguistics Chomsky's theory of generative grammar coupled with his scathing critique of behaviorist accounts of language use provided the impetus for cognitive approaches to language in the new field of psycholinguistics (4,5). In computer science the digital computer became a striking example of an information-processing system in which observable input-output relations clearly depended on complex but well-specified intervening computational steps, discrediting the behaviorist claim that only observable stimulus-response relations were respectable objects of scientific scrutiny. The use of the computer as a model for theories of human intelligence rose to prominence in a seminal book by Miller, Galanter, and Pribram (6), a work that set the stage for a book by Neisser (7) that gave the field of cognitive psychology its modern identity. The landmark computational account of human problem solving by Newell and Simon (8) established a firm link between the view that AI should strive to imitate human cognition and the view that computer simulations afford testable theoretical models of cognitive processes. (For a thorough survey of the origins of cognitive psychology see Ref. 9, as well as the historical review in Ref. 8.) In recent years work in cognitive psychology has become increasingly integrated with work in AI, neuropsychology, linguistics, and philosophy within the emerging field of cognitive science.

Human cognition is a complex and highly interactive system that does not lend itself to tidy compartmentalization; however, it is useful to divide cognitive psychology into five subareas. These are perception, attention, memory, thinking, and language, each of which is discussed below. Like any scientific discipline, the scope of cognitive psychology is delineated not only by its subject matter but also by the methods it employs. A variety of research methods are commonly used, including measurement of reaction time to perform simple tasks, patterns of eye movements, and distributions of types of errors. A common strategy has been to attempt to decompose cognitive processes into components and to estimate the temporal relations among them (12-14).

Cognitive Psychology and AI. Cognitive psychology and AI have been closely intertwined since the inception of each. AI has provided cognitive psychology with both a methodological tool and theoretical formalisms. Given the highly interactive nature of human cognition, computer simulation is often a useful tool for deriving predictions from a complex model. At the theoretical level, cognitive psychology has adapted numerous concepts that were developed in computer science in general and AI in particular [e.g., content-addressable memory (see Associative memory), semantic networks (qv), and blackboard models (qv)]. Early work in cognitive psychology yielded theoretical concepts that anticipated some that are now being explored within AI. For example, Bartlett (15) introduced the concept of a schema, a knowledge structure that actively generates expectations based on regularities abstracted from past experience. Such AI concepts as frames (qv) and scripts (qv) are variants of the schema concept (16-18). Tolman's (19) work on mental maps and the representation of expectancies was a precursor to current conceptions of mental models (20). More generally, empirical and theoretical work in cognitive psychology has yielded a clearer understanding of some general principles of human information processing that can help direct development of AI systems modeled after human cognition. In particular, as is elaborated below, human intelligence appears to be based on multiple representational codes for knowledge (e.g., visuospatial as well as linguistic), on a great deal of parallel processing of information, and on inference patterns that depend on similarity and associative links more than on strictly deductive logic (see Inference, logic).
These properties of human information processing seem to be inextricably linked to powerful learning (qv) mechanisms, ranging from elementary detection of covariations among properties of the environment to exploitation of analogies between knowledge acquired in different domains (21). These learning mechanisms allow humans to avoid the "brittleness" of typical AI expert systems (qv), which generally lack humanlike flexibility in adapting themselves to changes in their initial domain of application.

Theoretical approaches to cognition increasingly tend to link cognitive psychology and AI. This is particularly evident in the case of the two major types of formalisms in which cognitive models are currently being developed, namely, production systems (see Rule-based systems) and connectionist neural networks (see Connectionism). Systems based on production rules were first introduced into cognitive psychology as models of human problem solving (8); later developments by Anderson and others (22-25) extended versions of production systems, sometimes coupled with semantic networks, to serve as models of other cognitive processes. Recent work has begun to exploit the modularity of rule-based systems to provide accounts of learning in terms of generation of new rules. Connectionist models (26-28) represent a current resurgence of interest in modeling cognitive processes at a relatively microscopic level of analysis analogous to neural units, as in earlier psychological theories such as that of Hebb (29). Whereas production systems were first proposed as models of higher level thought processes and then pressed "downward"
to attempt to account for more elementary processes, connectionist models were first applied to basic perceptual processes and are currently being pressed "upward" to attempt to account for phenomena that seem more conceptual. Connectionist models place much greater emphasis on parallel processing than production system models tend to do. It is noteworthy, however, that production system models in psychology, unlike their AI counterparts, often assume parallel changes in the degree of activation of knowledge in memory. The phenomena of human cognition seem to impose some form of parallelism on psychological theories. A theoretical frontier in cognitive psychology is likely to center on attempts at integrating ideas derived from rule-based systems with those derived from neural modeling.

Major Areas of Research

The survey of active research areas in cognitive psychology presented below is of necessity selective and incomplete. In addition, much more could be said about the interconnections between the various areas of research. More extensive and integrative reviews can be found in recent textbooks (30,31).

Perception. The earliest stages in perception, such as extraction of information directly from the retinal image, are usually considered outside the scope of cognitive psychology (although early perception is an important topic in experimental psychology and is clearly relevant to AI). Cognitive work on perception is concerned with the construction of meaningful patterns from elementary components, with vision (qv) receiving by far the most attention. Since the classical research of Gestalt psychologists such as Wertheimer (32), a basic concern has been with the principles that govern the construction of relatively constant interpretations of perceptual inputs despite wide variations in the input itself. For example, a square is perceived as such even though it may be tilted in various directions, partially occluded, or composed of broken rather than solid lines.
An important theoretical position associated with Gibson (33) is that perception depends on the detection of invariant properties of the distal stimulus (i.e., the object in the environment), which either remain constant or change systematically as the proximal stimulus (i.e., the retinal image) undergoes a wide range of variations. The relationship between the Gibsonian position and AI work in vision is discussed in Ref. 34.

Recent research has made considerable progress in addressing the longstanding and basic issue of identifying the elementary features the human visual system detects and uses to construct visual patterns. Treisman and Gelade (35) used a selective-attention task (see below) to identify a level of visual processing in which the color, form, and location of an object appear to be represented as separate features not yet integrated into a unified representation of an object. These features are detected in parallel across the entire visual field so that time to detect a target embedded in an array is independent of the number of elements in the array if and only if the target can be consistently discriminated from all distractors by considering a single feature (a phenomenon referred to as "pop-out"). In contrast, discrimination must be based on a slower serial process when features (see Feature extraction) must be combined to identify the target. Other work has used
similar techniques to identify some of the elementary features that compose visual forms (36,37). Rock (38) provides a lucid introduction to the topic of perception.

Attention. The core issue in theories of attention concerns information reduction. Because humans are constantly faced with an immense amount of information as the result of both perception and memory retrieval and are limited in their capacity to process it, they must be selective in their analysis of inputs. The basic idea that humans can be viewed as limited-capacity information-processing systems was first proposed by Broadbent (39) and became a cornerstone of cognitive psychology. This cornerstone, however, has been the focus of controversy since it was first erected. At issue is the degree and locus of parallelism in information processing. Broadbent proposed that inputs are "filtered" early in perceptual processing and that only a selected few are processed at higher levels (e.g., at the level of meaning). Soon afterward, however, evidence accrued that people occasionally respond to the meaning of highly familiar inputs (e.g., their names) even when the inputs are unattended, suggesting that unattended inputs are attenuated rather than filtered entirely (40). These early-selection models, which emphasized limits on perceptual processing, were subsequently challenged by late-selection models (41), according to which all inputs are processed to the level of meaning, with selection occurring only among responses to the inputs. Late-selection models imply a greater degree of parallel processing than do early-selection models. The concept of "automaticity" has been invoked to explain why humans can perform some tasks in parallel whereas others demand serial processing (42,43). The general notion is that particular types of experience result in a decrease in the capacity required to perform tasks so that multiple tasks can be performed concurrently without interference.
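The parallel versus serial contrast that runs through this section is often summarized by how predicted search time grows with display size in a Treisman-style visual search task. A sketch of those predictions, with invented time constants:

```python
# Illustrative predictions for feature vs. conjunction visual search.
# The time constants (ms) are invented, not fitted to any data set.
import random

def search_time(display, target, conjunction):
    """Simulated time (ms) to find `target` among the items in `display`."""
    base = 400.0                        # fixed perceptual/response overhead
    if not conjunction:
        return base                     # single feature "pops out": flat in set size
    per_item = 50.0                     # cost of each serially inspected item
    order = list(range(len(display)))
    random.shuffle(order)               # self-terminating random serial scan
    for rank, idx in enumerate(order, start=1):
        if display[idx] == target:
            return base + per_item * rank
    return base + per_item * len(display)

random.seed(1)
target = ("red", "O")                   # item = (color, form) feature pair
for n in (4, 16):
    display = [("green", "O")] * (n - 1) + [target]
    flat = search_time(display, target, conjunction=False)
    serial = sum(search_time(display, target, conjunction=True)
                 for _ in range(1000)) / 1000
    print(n, flat, round(serial))      # conjunction time grows with display size
```

Feature search stays flat across display sizes (pop-out), while simulated conjunction search grows roughly linearly, mirroring the parallel/serial distinction described above.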
Development of automaticity is sometimes theoretically associated with a reduction in control so that the person is unable to avoid making an overlearned automatic response to an input (e.g., accessing the meaning of a familiar word). An important form of automatic responding is revealed by the tendency for processing of an input rapidly to "prime" related inputs (e.g., words of similar meaning) so that subsequent processing of related inputs is facilitated (44). Neely (45) provided evidence of rapid automatic facilitation and slower conscious inhibition of the processing of inputs. The relationship between selectivity and automaticity remains controversial. The various theoretical properties of automaticity do not always co-occur, and putative evidence for capacity-free processing beyond a stage of early perceptual selection has been challenged (46). For recent analyses of issues in attention see Refs. 47 and 48.

Memory. Research on memory is concerned with the processes by which information is stored, retained over some time interval, and subsequently retrieved. Memory is intimately related to perception and attention since memory is often the incidental by-product of attentive perceptual processing. Learning roughly corresponds to the storage phase of memory; however, except for purely rote memory (if such a thing exists), learning typically implies some degree of generalization or integration of new information with old. A story, for example, is remembered as a hierarchical structure that reflects schematic knowledge about similar episodes (49,50) (see Story analysis). Learning extends to the acquisition of knowledge more general than specific perceptual inputs, as when a child acquires a general notion of what "dog" means from experience with particular exemplars. Reviews of the extensive literature on human memory can be found in Refs. 30, 31, 51, and 52.

Early theories of memory in cognitive psychology proposed a fundamental distinction between short-term and long-term memory stores (53,54). The short-term store was viewed as a bottleneck that limited the rate at which information can be transferred into permanent long-term storage. This view has since been modified, partly owing to the influence of criticisms launched by Craik and Lockhart (55). Current theories tend to view human memory as an essentially unitary system in which the short-term store (often called "active" or "working" memory) corresponds to the portion of the system currently in a highly active state. It is widely acknowledged that incoming perceptual inputs quickly make contact with representations in long-term memory in a parallel fashion. The most important limit on the eventual retrievability of information is the time required to associate the input with other information in memory that will afford potential retrieval cues.

The nature of the stored representation of an input, referred to as a memory "trace," is currently a matter of controversy. Many theories represent the trace as a localized node or set of nodes in a semantic network. An alternative view favored by connectionist models is that memory representations are distributed, with a trace corresponding to a pattern of activity across neural units that tends to be reinstated upon re-presentation of the same or a similar input. It remains unclear whether the localized versus distributed views of the memory trace can be reconciled.
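The localized/distributed contrast can be made concrete with a toy encoding: a localized store keeps one discrete node per item, whereas a distributed store superimposes item vectors and answers a degraded cue by pattern similarity. The vectors, dimensionality, and noise level below are arbitrary illustrations, not a claim about either class of theory.

```python
# Toy contrast between localized and distributed memory traces (illustrative).
import random

random.seed(0)
DIM = 256

def random_pattern():
    return [random.choice((-1, 1)) for _ in range(DIM)]

def similarity(a, b):
    return sum(x * y for x, y in zip(a, b)) / DIM

items = {name: random_pattern() for name in ("dog", "cat", "tree")}

# Localized store: one discrete node (dictionary entry) per experience.
localized = dict(items)

# Distributed store: all traces superimposed in a single weight vector.
distributed = [sum(units) for units in zip(*items.values())]

# A degraded retrieval cue: the "dog" pattern with 25% of its units flipped.
cue = [-p if random.random() < 0.25 else p for p in items["dog"]]

# Localized retrieval selects the single best-matching node.
best = max(localized, key=lambda name: similarity(cue, localized[name]))
print(best)

# Distributed retrieval: the cue still resonates with the superimposed
# weights, although no separate "dog" node exists anywhere in the store.
print(similarity(cue, distributed))
```

The degraded cue reinstates most of the stored pattern in both stores, which is the sense in which a distributed trace "tends to be reinstated upon re-presentation of the same or a similar input."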
Theories of memory must accommodate evidence that memory retrieval sometimes resembles automatic activation of a trace by a retrieval cue and sometimes resembles a slow search process much like conscious problem solving. Another controversial issue involves evidence suggesting that memory traces can be formed in qualitatively different codes. The focus of debate has centered on mental imagery, a memory code that preserves the spatial and visual properties of perceptual inputs (see Analog representation). Shepard and his colleagues (56) demonstrated that when people are asked to judge whether two visual forms are the same despite a difference in orientation, the time to make the decision increases linearly with the difference in orientation, as if people "mentally rotated" one of the objects to place it into correspondence with the other. Kosslyn (57) proposed that images can be constructed in an inner "space" analogous to a display screen attached to a computer and that the results of spatial transformations can be "read off" of the imaginal representation. Although the psychological and philosophical implications of mental imagery are still debated (58), the existence of perceptlike memory traces is supported by a large body of converging evidence.

Human memory stores not only representations of specific experiences but also representations of categories of experience. A great deal of research in cognitive psychology, particularly that of Rosch and her colleagues (59), indicates that natural human categories tend to be organized around clear prototypical exemplars but have relatively ill-defined boundaries. Recent work on categorization has centered on the mechanisms by which categories are induced from experience with exemplars and the form in which categories are represented in memory (60,61). The localized versus distributed debate is particularly prominent in discussions of categorization, as various theories suggest that categories are represented by sets of separate traces of category exemplars, a distributed representation formed by superimposition of such traces in a network of neural units, or more localized category nodes formed by inductive mechanisms such as generalization.
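The contrast between exemplar-based and summary-based category representations can be illustrated with a toy classifier (the feature vectors and category labels below are invented for illustration; neither function is a model proposed in the works cited): one account keeps every stored exemplar trace, the other collapses them into a single prototype.

```python
# Illustrative contrast between two accounts of category representation:
# classifying a novel item by its nearest stored exemplar versus by its
# distance to a category prototype (here, simply the mean of the exemplars).
# The feature vectors are invented for illustration.

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def prototype(exemplars):
    """Summary representation: one prototype vector per category."""
    n = len(exemplars)
    return [sum(col) / n for col in zip(*exemplars)]

def classify_by_prototype(item, categories):
    return min(categories, key=lambda c: distance(item, prototype(categories[c])))

def classify_by_exemplar(item, categories):
    """The category of the single closest stored exemplar wins."""
    best_cat, best_d = None, float("inf")
    for cat, exemplars in categories.items():
        for ex in exemplars:
            d = distance(item, ex)
            if d < best_d:
                best_cat, best_d = cat, d
    return best_cat

categories = {
    "dog": [[0.9, 0.8, 0.1], [0.8, 0.9, 0.2], [0.7, 0.7, 0.1]],
    "cat": [[0.2, 0.9, 0.9], [0.3, 0.8, 0.8], [0.1, 0.7, 0.9]],
}
novel = [0.75, 0.8, 0.15]
print(classify_by_prototype(novel, categories))  # dog
print(classify_by_exemplar(novel, categories))   # dog
```

On a clear case like this the two accounts agree; they can diverge for borderline or atypical items, which is one reason the debate is hard to settle on behavioral grounds alone.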
Thinking. Thinking involves the active transformation of existing knowledge to create new knowledge that can be used to achieve a goal. The topic can be loosely divided into reasoning (qv) (drawing inferences from current knowledge or beliefs), decision making (the evaluation of alternatives and choice among them) (see Decision theory), and problem solving (qv) (methods for attempting to achieve goals). These topics are closely intertwined and reflect different emphases and experimental paradigms rather than strong conceptual distinctions.

Given the obvious power of human intellect, it is rather paradoxical that much of the work on thinking has served to reveal ways in which human reason departs from the normative standards set forth by such disciplines as statistics and logic. The research of Kahneman and Tversky and others (62-64) indicates that intuitive decision making is often based on easily used but fallible heuristics. These heuristics are closely tied to basic memory processes, such as the ease of retrieving information from memory (the availability heuristic) and the similarity of an instance to a category prototype (the representativeness heuristic). (For a theoretical analysis of similarity judgments see Ref. 65.) Similarly, work on human deductive reasoning reveals major departures from the normative standards of formal logic (66,67). Although humans may base some inferences on an abstract "natural logic" (68), everyday reasoning often seems to be based on rules induced and applied in the context of broad classes of pragmatically important tasks, such as understanding social regulations or causal relations among events (69) (see Reasoning, causal). The human "inference engine" (see Inference) appears to be very different from the kind embodied in some logic-based AI reasoning programs, and may well be "better than normative" for the range of problems humans encounter most frequently in everyday life.

Human problem solving is also closely tied to basic properties of the memory system. A major area of current research involves the transition from novice- to expert-level problem-solving skill in domains such as physics (70,71) (see Physics, naive). Expertise appears to reflect the reorganization of schemas representing categories of problems and the acquisition of specialized methods for dealing with the categories of problems encountered in the domain. The ability to generalize problem-solving methods so they can be applied to new problems and the ability to solve novel problems by analogy to known situations in other domains (72,73) distinguishes human problem solving from the performance of typical AI expert systems.

Language. The study of language, including its acquisition, production, and comprehension, has been a distinct area within cognitive psychology, with a close relationship to work in developmental psychology on language acquisition. Psycholinguistics was initially devoted to tests of Chomsky's theory of transformational grammar (qv) as a performance model (see Linguistics, competence and performance) and was heavily influenced by his nativist position regarding language acquisition (qv). Transformational grammar failed as a performance model of actual language use (74), and strongly nativist accounts of language are now regarded as suspect. Explorations of the relationship between language and other cognitive processes, such as memory and learning, have led to greater integration of psycholinguistic theories with models of other aspects of cognition, as illustrated in Refs. 24, 25, 27, 49, and 50. (For reviews of research in psycholinguistics see Refs. 75 and 76.)

Initial lexical access of word meanings (at least for familiar meanings) appears to be extremely rapid and initially quite independent of contextual constraints (77), consistent with other evidence of parallelism in basic recognition processes. At a global level, language comprehension appears to reflect parallel analyses of speech sounds (see Speech understanding) (or, in the case of reading, visual features of words (see Character recognition)), syntactic and semantic constraints (see Grammar articles; Semantics), and the pragmatic cues to meaning provided by conversational contexts, integrated to make serial decisions about the interpretation of the incoming speech stream (see Discourse understanding). Blackboard models (qv) of the sort implemented in the Hearsay system for speech recognition (78) constitute plausible descriptions of the general nature of human language comprehension.

Future Prospects

The links between the aims, methods, and theories in AI and cognitive psychology are likely to bring the two fields yet closer together over the next decade. It is increasingly the case that cognitive psychologists demand of their theories the kind of sufficiency test provided by computer simulation. To meet this standard, they will either adapt current AI concepts of cognition or build new theories of cognition and also build new AI concepts. For their part, AI researchers have reason to remain aware of advances in cognitive psychology. Human beings, despite their cognitive shortcomings, remain by far the most general and flexible of all known intelligent systems. As long as this is so, a major strategy for AI will be the construction of programs that more closely imitate natural intelligence.

BIBLIOGRAPHY

1. W. James, The Principles of Psychology, Dover, New York (originally published 1890).
2. K. Duncker, "On problem solving," Psychol. Monogr., 58(270) (1945).
3. M. Wertheimer, Productive Thinking, Harper & Row, New York, 1959.
4. N. Chomsky, Syntactic Structures, Mouton, The Hague, 1957.
5. N. Chomsky, "Review of B. F. Skinner's Verbal behavior," Language, 35, 26-58 (1959).
6. G. A. Miller, E. Galanter, and K. H. Pribram, Plans and the Structure of Behavior, Holt, Rinehart and Winston, New York, 1960.
7. U. Neisser, Cognitive Psychology, Prentice-Hall, Englewood Cliffs, NJ, 1967.
8. A. Newell and H. A. Simon, Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ, 1972.
9. J. L. Lachman, R. Lachman, and E. C. Butterfield, Cognitive Psychology and Information Processing: An Introduction, Erlbaum, Hillsdale, NJ, 1979.
10. C. R. Puff (ed.), Handbook of Research Methods in Human Memory and Cognition, Academic Press, New York, 1982.
11. K. A. Ericsson and H. A. Simon, Protocol Analysis: Verbal Reports as Data, MIT Press, Cambridge, MA, 1984.
12. S. Sternberg, "High-speed scanning in human memory," Science, 153, 652-654 (1966).
13. M. I. Posner, Chronometric Explorations of Mind, Erlbaum, Hillsdale, NJ, 1982.
14. J. L. McClelland, "On the time relations of mental processes: An examination of systems of processes in cascade," Psychol. Rev., 86, 287-330 (1979).
15. F. C. Bartlett, Remembering, Cambridge University Press, Cambridge, UK, 1932.
16. M. Minsky, A Framework for Representing Knowledge, in P. H. Winston (ed.), The Psychology of Computer Vision, McGraw-Hill, New York, 1975.
17. R. C. Schank and R. P. Abelson, Scripts, Plans, Goals, and Understanding: An Inquiry into Human Knowledge Structures, Erlbaum, Hillsdale, NJ, 1977.
18. D. E. Rumelhart, Schemata: The Building Blocks of Cognition, in R. Spiro, B. Bruce, and W. Brewer (eds.), Theoretical Issues in Reading Comprehension, Erlbaum, Hillsdale, NJ, 1980.
19. E. C. Tolman, "Cognitive maps in rats and men," Psychol. Rev., 55, 189-208 (1948).
20. D. Gentner and A. L. Stevens (eds.), Mental Models, Erlbaum, Hillsdale, NJ, 1983.
21. J. H. Holland, K. J. Holyoak, R. E. Nisbett, and P. R. Thagard, Induction: Processes of Inference, Learning, and Discovery, MIT Press, Cambridge, MA, 1986.
22. J. R. Anderson and G. H. Bower, Human Associative Memory, Winston, Washington, DC, 1973.
23. J. R. Anderson, Language, Memory, and Thought, Erlbaum, Hillsdale, NJ, 1976.
24. J. R. Anderson, The Architecture of Cognition, Harvard University Press, Cambridge, MA, 1983.
25. R. Thibadeau, M. A. Just, and P. A. Carpenter, "A model of the time course of reading," Cogn. Sci., 6, 157-203 (1982).
26. G. E. Hinton and J. A. Anderson, Parallel Models of Associative Memory, Erlbaum, Hillsdale, NJ, 1981.
27. D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, MIT Press, Cambridge, MA, 1986.
28. Cogn. Sci., 9(1), 1985; special issue devoted to "Connectionism."
29. D. O. Hebb, The Organization of Behavior, Wiley, New York, 1949.
30. A. L. Glass and K. J. Holyoak, Cognition, 2nd ed., Random House, New York, 1986.
31. J. R. Anderson, Cognitive Psychology and Its Implications, 2nd ed., Freeman, San Francisco, CA, 1985.
32. M. Wertheimer, Principles of Perceptual Organization, in D. C. Beardsley and M. Wertheimer (eds.), Readings in Perception, Van Nostrand, New York, 1958 (abridged translation of M. Wertheimer, originally published 1923).
33. J. J. Gibson, The Senses Considered as Perceptual Systems, Houghton Mifflin, Boston, MA, 1966.
34. D. J. McArthur, "Computer vision and perceptual psychology," Psychol. Bull., 92, 283-309 (1982).
35. A. M. Treisman and G. Gelade, "A feature-integration theory of attention," Cogn. Psychol., 12, 97-136 (1980).
36. B. Julesz, Figure and Ground Perception in Briefly Presented Isodipole Textures, in M. Kubovy and J. R. Pomerantz (eds.), Perceptual Organization, Erlbaum, Hillsdale, NJ, 1981.
37. J. R. Pomerantz, Perceptual Organization in Information Processing, in M. Kubovy and J. R. Pomerantz (eds.), Perceptual Organization, Erlbaum, Hillsdale, NJ, 1981.
38. I. Rock, Perception, Sci. Am. Libr., W. H. Freeman, New York, 1984.
39. D. E. Broadbent, Perception and Communication, Pergamon Press, London, 1958.
40. A. M. Treisman, "Contextual cues in selective listening," Quart. J. Exper. Psychol., 12, 242-248 (1960).
41. J. A. Deutsch and D. Deutsch, "Attention: Some theoretical considerations," Psychol. Rev., 70, 80-90 (1963).
42. R. M. Shiffrin and W. Schneider, "Controlled and automatic human information processing. II. Perceptual learning, automatic attending, and a general theory," Psychol. Rev., 84, 127-190 (1977).
43. M. I. Posner and C. R. R. Snyder, Attention and Cognitive Control, in R. Solso (ed.), Information Processing and Cognition: The Loyola Symposium, Erlbaum, Hillsdale, NJ, 1975.
44. D. E. Meyer and R. W. Schvaneveldt, "Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations," J. Exper. Psychol., 90, 227-234 (1971).
45. J. H. Neely, "Semantic priming and retrieval from lexical memory: Role of inhibitionless spreading activation and limited capacity attention," J. Exper. Psychol.: Gen., 106, 226-254 (1977).
46. P. W. Cheng, "Restructuring versus automaticity: Alternative accounts of skill acquisition," Psychol. Rev., 92, 414-423 (1985).
47. D. E. Broadbent, "Task combination and selective intake of information," Acta Psychol., 50, 253-290 (1982).
48. D. Kahneman and A. M. Treisman, Changing Views of Automaticity, in R. Parasuraman, R. Davies, and J. Beatty (eds.), Varieties of Attention, Academic Press, New York, pp. 29-61, 1984.
49. W. Kintsch and T. A. van Dijk, "Toward a model of text comprehension and production," Psychol. Rev., 85, 363-394 (1978).
50. D. E. Rumelhart, Understanding and Summarizing Brief Stories, in D. LaBerge and S. J. Samuels (eds.), Basic Processes in Reading: Perception and Comprehension, Erlbaum, Hillsdale, NJ, 1977.
51. A. D. Baddeley, The Psychology of Memory, Basic Books, New York, 1976.
52. R. G. Crowder, Principles of Learning and Memory, Erlbaum, Hillsdale, NJ, 1976.
53. N. C. Waugh and D. A. Norman, "Primary memory," Psychol. Rev., 72, 89-104 (1965).
54. R. C. Atkinson and R. M. Shiffrin, Human Memory: A Proposed System and Its Control Processes, in K. W. Spence and J. T. Spence (eds.), The Psychology of Learning and Motivation, Vol. 2, Academic Press, New York, 1968.
55. F. I. M. Craik and R. S. Lockhart, "Levels of processing: A framework for memory research," J. Verbl. Learn. Verbl. Behav., 11, 671-684 (1972).
56. R. N. Shepard and L. A. Cooper, Mental Images and Their Transformations, MIT Press, Cambridge, MA, 1982.
57. S. M. Kosslyn, Image and Mind, Harvard University Press, Cambridge, MA, 1980.
58. N. Block (ed.), Imagery, MIT Press, Cambridge, MA, 1981.
59. E. Rosch, Principles of Categorization, in E. Rosch and B. B. Lloyd (eds.), Cognition and Categorization, Erlbaum, Hillsdale, NJ, 1978.
60. E. E. Smith and D. L. Medin, Categories and Concepts, Harvard University Press, Cambridge, MA, 1981.
61. D. L. Medin and E. E. Smith, "Concepts and concept formation," Ann. Rev. Psychol., 35, 113-138 (1984).
62. D. Kahneman, P. Slovic, and A. Tversky (eds.), Judgment Under Uncertainty: Heuristics and Biases, Cambridge University Press, Cambridge, MA, 1982.
63. A. Tversky and D. Kahneman, "Extensional versus intuitive judgment: The conjunction fallacy in probability judgment," Psychol. Rev., 90, 293-315 (1983).
64. R. E. Nisbett and L. Ross, Human Inference: Strategies and Shortcomings of Social Judgment, Prentice-Hall, Englewood Cliffs, NJ, 1980.
65. A. Tversky, "Features of similarity," Psychol. Rev., 84, 327-352 (1977).
66. P. N. Johnson-Laird and P. C. Wason (eds.), Thinking, Cambridge University Press, Cambridge, MA, 1978.
67. P. N. Johnson-Laird, Mental Models, Harvard University Press, Cambridge, MA, 1983.
68. M. D. S. Braine, B. J. Reiser, and B. Rumain, Some Empirical Justification for a Theory of Natural Propositional Logic, in G. H. Bower (ed.), The Psychology of Learning and Motivation, Vol. 18, Academic Press, New York, pp. 313-371, 1984.
69. P. W. Cheng and K. J. Holyoak, "Pragmatic reasoning schemas," Cogn. Psychol., 17, 391-416 (1985).
70. M. T. H. Chi, P. J. Feltovich, and R. Glaser, "Categorization and representation of physics problems by experts and novices," Cogn. Sci., 5, 121-152 (1981).
71. J. H. Larkin, J. McDermott, D. P. Simon, and H. A. Simon, "Expert and novice performance in solving physics problems," Science, 208, 1335-1342 (1980).
72. D. Gentner and D. R. Gentner, Flowing Waters or Teeming Crowds: Mental Models of Electricity, in D. Gentner and A. L. Stevens (eds.), Mental Models, Erlbaum, Hillsdale, NJ, 1983.
73. M. L. Gick and K. J. Holyoak, "Schema induction and analogical transfer," Cogn. Psychol., 15, 1-38 (1983).
74. J. A. Fodor, T. G. Bever, and M. F. Garrett, The Psychology of Language, McGraw-Hill, New York, 1974.
75. H. H. Clark and E. V. Clark, Psychology and Language, Harcourt Brace Jovanovich, New York, 1977.
76. D. J. Foss and D. T. Hakes, Psycholinguistics, Prentice-Hall, Englewood Cliffs, NJ, 1978.
77. M. K. Tanenhaus, J. M. Leiman, and M. S. Seidenberg, "Evidence for multiple stages in the processing of ambiguous words in syntactic contexts," J. Verbl. Learn. Verbl. Behav., 18, 427-440 (1979).
78. L. D. Erman, F. Hayes-Roth, V. R. Lesser, and D. R. Reddy, "The Hearsay-II speech-understanding system: Integrating knowledge to resolve uncertainty," Comput. Surv., 12, 213-253 (1980).

K. Holyoak
UCLA
COGNITIVE SCIENCE

Relation to Other Fields

Cognitive Science is an emerging field of study whose boundaries are far from being well defined. A report prepared for the Alfred P. Sloan Foundation (a portion of which is reproduced as an appendix to Ref. 1) defines it as "the study of the principles by which intelligent entities interact with their environments" and notes that "by its very nature this study transcends disciplinary boundaries." In particular, the distinctions among cognitive psychology (qv), AI (qv), and cognitive science are extremely blurred in practice. This blurring is additionally exacerbated by the fact that research that clearly qualifies as cognitive science is being done in academic departments (as well as government and industrial research laboratories) whose titles identify them with disciplines as diverse as psychology, computer science, linguistics, anthropology, philosophy, education, mathematics, engineering, physiology, and neuroscience, among others. From an informal survey of cognitive science publications, it is shown that papers in cognitive science journals cited other papers in a very wide range of fields (1).

Cognitive Science is also extremely closely related to AI. When the editors of the journal Artificial Intelligence decided to help their readership keep up with some of the literature in closely related disciplines by publishing regular "Correspondent's Reports" on work in these fields, they selected the areas of philosophy and logic, robotics, software engineering, natural language, cognitive psychology, and vision. Of these, all but perhaps software engineering and parts of robotics would be considered core areas of cognitive science research. Indeed, it has been argued (e.g., Refs. 2-4) that AI and cognitive science may be nothing more than two paths to the same end: understanding the nature of intelligent action in whatever physical form it may occur.
The difference between them, according to this view, consists mainly in research style: AI takes the "high road" of asking how instances of intelligence can be realized (i.e., how they are possible) within the constraints of known computational mechanisms or how they might be attainable by the design of new mechanisms (i.e., new computational architectures); whereas cognitive science places greater emphasis on the question of how instances of intelligence are in fact realized within one particular architecture, the one constituted by the human mind. Because of this difference in orientation, many experimentally oriented cognitive scientists tend to place a somewhat greater premium on empirical fit, on testing processes against psychological data to determine not only whether the two are input-output equivalent but also whether they are strongly equivalent, that is, whether in both cases the behavior is produced by the same information-processing means. The notion of strong equivalence is central to much cognitive science, though it is not often discussed explicitly. According to one interpretation (6), two processes can only be strongly equivalent if they produce the same behavior using the same computational process (or algorithm) and the same symbolic representations, something that is possible only if the two systems have functionally identical computational architectures (i.e., the same primitive operations, the same resource constraints, and the same symbolic notation). Despite this difference in principle between cognitive science and AI, differences in practice are minimal. Indeed, it has even been argued (4) that a convergence of the two approaches may be inevitable inasmuch as both adhere to a notion of intelligence that is inherently anthropocentric or human relative, at least at the present time. Of course, the two fields diverge considerably in their applied side.
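Before turning to applied work, the distinction between input-output equivalence and strong equivalence can be made concrete with a deliberately artificial sketch (the example, and the use of comparison counts as a crude stand-in for chronometric data, are invented here and not drawn from Ref. 6): two procedures that return the same answer for every query yet compute it by different algorithms, and hence leave different measurable process signatures.

```python
# Illustrative sketch: two procedures that are input-output equivalent
# (same answer for every query over sorted data) but not strongly
# equivalent (different algorithms, hence different resource profiles).

def find_linear(items, target):
    """Linear scan: comparisons grow with the target's position."""
    steps = 0
    for i, x in enumerate(items):
        steps += 1
        if x == target:
            return i, steps
    return -1, steps

def find_binary(items, target):
    """Binary search over sorted input: comparisons grow logarithmically."""
    steps, lo, hi = 0, 0, len(items) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid, steps
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

data = list(range(100))
i1, s1 = find_linear(data, 73)
i2, s2 = find_binary(data, 73)
print(i1 == i2)  # True: the two are input-output equivalent on this query
print(s1, s2)    # prints 74 7: different internal process signatures
```

Behavioral data alone (the returned index) cannot distinguish the two; only process measures such as the step counts, analogous to reaction-time profiles, reveal that different algorithms are at work.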
A great deal (though by no means all) of applied cognitive science deals with such problems as designing better human-machine interfaces (see Human-computer interaction), better pedagogical methods (see Educational applications), better communications techniques, better aids for the handicapped (see Prostheses), or better methodologies for discovering such useful things as what experts know (see Knowledge acquisition) or why children fail to read or do mathematics. What identifies these as cognitive science rather than simply applied psychology investigations is the fact that they take a fundamentally computational view of the nature of the cognitive process involved; they view cognitive process as consisting of the execution of symbol manipulation procedures. Although it is clear that the fruits of such pursuits are relevant to what people in AI do, the work itself frequently requires different skills and proceeds using different methodologies than are typically (though, again, not always) found in AI laboratories.

In contrast to this approach, applied AI places heavy emphasis on finding a practical match between available computational techniques and applications crying out for solution. As in all engineering or applied technology pursuits it must find suboptimal solutions to practical problems and proceed by incremental refinement. In terms of what has been referred to as the power-generality trade-off (7), applied AI must perforce settle for the power end of the dimension. But none of this need be true, and indeed generally is not true, of basic research in either AI or cognitive science, where the overlap is great enough that many are tempted to view AI as the more theoretical and more formal end of the spectrum of cognitive science research.

This still leaves the question: "What is cognitive science?" If it is simply the attempt to understand mental activity (or, as in the earlier quote from the Sloan report, to understand how intelligent entities interact with their environments), how is it different from psychology, especially from that branch of psychology that studies thinking, perception, memory, language, and so on, that is, cognitive psychology (qv)?
Many people believe that cognitive science represents a new paradigm for understanding cognition, a paradigm that clearly owes much to developments in computer science. Yet one would like a better characterization than this, for if it is a new paradigm, it would be useful to know how it differs from other paradigms and on what assumptions it stands. One would like to know this both in the abstract (i.e., What are some distinguishing principles of cognitive science?) and in terms of concrete examples of how the new science is practiced and what it is seen as accomplishing, or at least trying to accomplish.

Many attempts at a statement of what cognitive science is have been made. One of the earliest was an unpublished report prepared by a committee (under the editorship of George Miller) for the Alfred P. Sloan Foundation, from which the earlier quote was taken. This report characterizes what is special about cognitive science and what runs through all the diverse work that falls under its scope by defining its research objective as being "to discover the representational and computational capacities of the mind and their structural and functional representation in the brain." Although extremely general, this represents a fair statement. In addition to this early statement, the journal Cognitive Science, the official organ of the Cognitive Science Society, has published a number of articles that attempt to characterize the field, beginning with its initial editorial, and include a number of papers first presented at the inaugural conference of the Cognitive Science Society in 1979 (these were published in several issues of the journal, beginning with volume 4, 1980). An attempt at a systematic argument that cognitive science is not just a marriage of convenience but a genuine field of study is contained in Ref. 6.
In what follows are provided examples of the kinds of problems that cognitive scientists are interested in pursuing and the approaches that they take, pointers to literature that gives further details of such examples, and a brief statement of why some people believe that cognitive science is not just a collection of research problems that in one way or another are concerned with reasoning but a genuine scientific domain of inquiry. The reader must be cautioned, however, that this review is not without personal bias. It is primarily an attempt to characterize cognitive science rather than to catalog some of its current research directions (which are likely to change radically in the next few years in any case). Moreover, a view of what is constitutive of cognitive science is presented which the author believes to be correct and borne out by the classical work in the field (and defended at some length in Ref. 6), yet one that nonetheless flies in the face of claims being made by some people who are legitimate researchers in cognitive science. This view concerns the symbol-processing nature of cognition (what is referred to below as the representational metapostulate). Although this is not the proper forum for a debate on such issues, the author believes that the notion of symbolic representation is so very central to cognitive science and continues to be the central theoretical assumption underlying virtually all work in the field that it is appropriate to lay it out explicitly, even in the highly sketchy form presented here.

Some Examples of Cognitive Science Research Problems

Language. The study of the human capacity for language is one of the oldest areas of research in cognitive science. It is also one that has changed dramatically in the past two decades, partly under the influence of formal linguistics and partly because of attempts to develop computer systems for understanding natural language (see Natural-language understanding).
It thus provides a prime example of cross-disciplinary cognitive science research, albeit one that continues to be steeped in controversy. In recent years this study has also encompassed work by philosophers, as researchers become more concerned with issues of semantics and pragmatics, with problems of meaning and discourse that had occupied philosophers long before these problems arose in AI. It also brought in the work of clinical neuroscience, which investigated the taxonomy of language deficits caused by trauma and disease. This work has led to computational models of language performance. At the present time a number of alternative models of syntactic analysis (see Parsing) have been published, and psycholinguistic research provides provocative evidence that parsing proceeds with only minimal input from the rest of the cognitive system, as is also the case, by the way, in most computational language understanding systems [a notable exception is the work of Schank and his colleagues (8)]. Experimental studies have also shown clearly that the lexical lookup (see Morphology) phase of grammatical analysis retrieves many homographs or homonyms of ambiguous words (9,10), thus empirically validating one computational proposal.

Vision. The idea, popular in the 1950s, that perception consists of hypothesis testing was challenged first by people working on computational vision (qv) (e.g., Refs. 11 and 12), who argued that it would be highly wasteful to not extract as much information as possible from the initial image before bringing cognitive processes to bear. Models, such as those developed by Marr and his colleagues (12), showed that a considerable amount of processing could be done in a data-driven manner
(see Processing, bottom up and top down). These ideas were then validated by psychophysical investigations as well as by findings from neuroscience (e.g., concerning the existence of separate spatial frequency channels, motion detectors (see Motion analysis), sensitivity to maxima in intensity derivatives, etc.). Some of this cross-fertilization is nicely illustrated by the papers in Ref. 13. Although this work is described in some detail elsewhere in this encyclopedia, it is in fact an excellent example of cognitive science research that falls at the more computational end of the spectrum. The relevance of both the vision and the psycholinguistics work to the understanding of mind is discussed in an insightful and provocative way in Ref. 14.

Expertise and Qualitative Reasoning. The study of expert systems (qv) (or, as it is sometimes called, knowledge engineering) both inspires and benefits from experimental investigations of how experts in such areas as physics, mathematics, electronics, medicine, or chess differ from their inexperienced counterparts. Findings concerning how experts structure their knowledge and how this structure differs from that of less experienced performers is an interesting chapter in recent cognitive science. These investigations also relate to studies in both psychology and AI of how people reason by building qualitative mental models (e.g., Refs. 15-17).

Models of Human Performance in Various Tasks. In this category one finds computational models of human performance on arithmetic (18), tasks involving interacting with text editors (19), typing and other skills (20), and reasoning with spatial problems (21). Closely related to this work is the general study of cognitive skill, its acquisition and its nature (22). Understanding cognitive skill requires distinguishing cognitive capacities from performance differences that arise from differences in knowledge or habit, a difference that parallels the distinction between functional architecture and computational procedures. The importance of this distinction to understanding the nature of cognitive processes (and of strong equivalence) is discussed in Refs. 6 and 23.

Learning. The area of learning was one of the most thoroughly investigated during the last half century of psychology, with very little progress on what people call learning in everyday life. The work was guided by preconceived ideas about the underlying mechanism (namely, association) rather than by a careful analysis of the types of learning and the types of mechanisms capable of meeting the sufficiency condition that is central to cognitive science. More recent work on language learning by cognitive scientists has shown that the acquisition of syntax from the kind of evidence generally available to the child would not be possible without severe constraints on both the structure of the languages that can be learned and severe constraints on the mechanisms that could learn such languages. In particular, it is necessary that the range of grammars that the organism could consider as possible hypotheses must be extremely limited (see Ref. 24). The same may also be true of concept acquisition (25). More recent work on learning within AI has also provided new ways to look at some forms of learning in humans (26) (see also Learning, machine).

Conclusion: Some Characteristics of Cognitive Science

Cognitive science is not the only form in which the search for an understanding of mind is proceeding. What characterizes this particular class of approaches is an allegiance to the network of ideas that might roughly be summarized as follows (1):

1. The approach is formalist in spirit: That is, it attempts to formulate its theories in terms of symbolic mechanisms of the sort that have grown out of symbolic logic (qv), although the apparatus of formal logic itself very rarely appears in cognitive science theories.

2. The "level of analysis" or the level at which the explanations or theories are cast is functional, and they are described in terms of their information flow. What this means in particular is that this approach factors out questions such as how biological material carries out the function and how biochemical and biophysical laws operate to produce the required information-processing function. This factorization is analogous to the separation of electrical engineering considerations from programming considerations in computer science. This does not mean that questions of biological realization are treated as any less important, only that they represent a distinct and to a large extent independent area of study. According to this view, neuroscience contributes an understanding of how such computational processes as are uncovered by empirical observations of human capacities are realized by biological mechanisms.

Not everyone agrees that cognition can be studied independently of its neurophysiological instantiation. There is, for example, an approach, sometimes called connectionist (see Connectionism), which attempts to build models of cognition that are guided more closely by ideas from neuroscience than by symbol-processing ideas from current computer science. Some examples of such models can be found in Ref. 27 and the special issue of Cognitive Science devoted to this approach [9(1), 1985]. Although this approach is extremely promising from the perspective of modeling the functional architecture of the mind, there is considerable doubt that it can displace rule-governed symbolic processes entirely, as some have claimed (see Ref. 6).

3. In addition to factoring apart questions of capacities from questions of biological realization, the approach is also characterized by the techniques it uses in formulating its theories and in exploring the entailments of its assumptions. The most widely used (though not universal) technique is that of computer implementation. Thus, an important methodological goal of cognitive science is to specify symbol-processing mechanisms that can actually exhibit aspects of the behavior being modeled. Adherence to such a "sufficiency" criterion makes this approach in many respects like a design discipline rather than a natural science, at least insofar as the latter typically attempts to uncover a small set of fundamental axioms or laws. Its concern with synthesis makes it, to use Simon's phrase (28), one of the "sciences of the artificial," along with AI.

4. The approach tends to emphasize a strategy sometimes referred to as top-down analysis, in which a premium is given to the task of understanding how the general cognitive skill in question is possible (consonant with the constraints of the mechanism) in contrast with the task of accounting for empirical particulars. This difference in style contrasts with the traditional approach in experimental psychology, which emphasizes the observational fit of models. The contrast is examined in Refs. 4, 29, and 30.

5. This commitment to the informational level also places the enterprise in contrast to the phenomenological approach in
COGNITIVE SCIENCE
The above general characteristics of cognitive science are also shared to various degrees by other scientific disciplines. The formalist or symbolic mechanistic character (1) is deeply entrenched in contemporary linguistics (especially generative grammar), decision theory, and even in parts of anthropology (e.g., Levi-Strauss). The functionalist perspective (2) is now quite general in psychology and philosophy of mind as well as in engineering, where it is referred to as the black-box approach. Both (1) and (2) are fundamental to computer science as well as to any science that concerns itself with notions such as the flow of information or the distribution of control. These ideas have thus affected everything from engineering to management science and even political science (e.g., as exemplified in Ref. 35). Criteria (3) and (4) are not quite so prevalent as the first two. For example, the desire to synthesize aspects of the phenomena being modeled as part of the attempt to understand them is not widespread in the social sciences outside of the areas of cognitive psychology and management science [especially the branch of the latter called industrial dynamics (36)], nor is it yet very common in biology [see, however, Marr's critique of theories in neurophysiology that fail to characterize the constructive computational aspect of biological function (37)]. Even modern linguistics, which is in many ways a prototypical cognitive science, places little emphasis on the human capacity to actually generate samples of performance (see, however, an example of the contrary trend in Refs. 38 and 39).

The Representational Metapostulate. Although, as suggested earlier, there are a number of theoretical and methodological characteristics that pervade a variety of approaches to the understanding of intelligence and human cognition, there is one overriding theme that more than any other appears to characterize the field of cognitive science. There are a number of ways of expressing this theme, for example, as the attempt to view intelligent behavior as consisting of the processing of information or as the attempt to view intelligence as the outcome of rule-governed activity (see Rule-based systems). These characterizations express the same underlying idea. Computation, information processing, and rule-governed behavior all depend on the existence of physically instantiated codes or symbols that refer to or represent things and properties extrinsic to the behaving system. In all these cases the behavior of the systems in question (be they minds, computers, or social systems) is explained not in terms of intrinsic properties of the system itself but in terms of rules and processes that operate on representations of extrinsic things. Cognition, in other words, is explained in terms of regularities holding over semantically interpreted symbolic representations, just as the behavior of a computer evaluating a mathematical function is explained in terms of its having representations of mathematical expressions (e.g., numerals) and in terms of the mathematical properties of the numbers these expressions represent. This is also analogous to explaining economic activity not in terms of the categories of natural science (e.g., speaking of the physicochemical properties of money and goods) but in terms of the conventional or symbolic value of these objects (e.g., that they are taken to represent such abstractions as legal tender). Though in both economics and cognitive science the meaning-bearing objects (or the instantiations of the symbols) are physical, it is only by referring to their symbolic or referential character that we can explain the observed regularities in the resulting behavior.

There has been some misunderstanding of the significance of the assumption that cognition is explained in terms of such regularities. For example, some people have suggested that this is no different from any other science, since all scientific theories deal with representations (e.g., mathematical symbols that designate certain objects or properties). Hence simulations involving such theories (e.g., simulations of planetary motions) are sometimes thought to be no different in principle from simulations of cognition. But the difference in the two types of simulation is in fact fundamental, because in the case of cognition the claim is that the organism being modeled, as well as the theorist, actually manipulates physical tokens of the symbols. Such a claim clearly has no parallel in physics unless the physicist is being modeled!

This representation thesis, sometimes referred to in philosophy as the "representational theory of mind" (32) and in cognitive science as the "physical-symbol system" hypothesis (40,41), is one of the foundational cornerstones of the discipline of cognitive science and is one of the features that links it in a fundamental way to AI. The philosophical and intellectual underpinnings of these two fields are now so closely linked that the distinction between them remains mostly at the pragmatic level, resting on such things as how big a role actual computer programs play and how technical the immediate applications of the research are. Some people expect that as cognitive scientists become better trained in computer science, and as AI begins to tackle the harder problem of what makes general intelligence possible, the distinction between the fields will fade. Similarly, the philosophy of mind is being influenced more and more by developments in AI and might be expected to play a more central role in clarifying the difficult conceptual issues that face both empirical and theoretical studies of intelligence.

BIBLIOGRAPHY

1. Z. Pylyshyn, Information Science: Its Roots and Relations as Viewed from the Perspective of Cognitive Science, in F. Machlup and U. Mansfield (eds.), The Study of Information: Interdisciplinary Messages, Wiley, New York, 1983, pp. 63-80.
2. A. Newell, Remarks on the Relationship between Artificial Intelligence and Cognitive Psychology, in R. Banerji and M. D. Mesarovic (eds.), Theoretical Approaches to Non-Numerical Problem Solving, Springer-Verlag, New York, 1970.
3. A. Newell, Artificial Intelligence and the Concept of Mind, in R. C. Schank and K. Colby (eds.), Computer Models of Thought and Language, Freeman, San Francisco, 1973.
4. Z. Pylyshyn, "Validating computational models: A critique of Anderson's indeterminacy of representation claim," Psychol. Rev., 86(4), 383-394 (1979).
5. Z. Pylyshyn, Complexity and the Study of Human and Machine Intelligence, in J. Haugeland (ed.), Mind Design, MIT Press, Cambridge, MA, 1980.
6. Z. Pylyshyn, Computation and Cognition: Toward a Foundation for Cognitive Science, MIT Press, Cambridge, MA, 1984.
7. G. W. Ernst and A. Newell, GPS: A Case Study in Generality and Problem Solving, Academic, New York, 1969.
8. R. C. Schank and R. P. Abelson, Scripts, Plans, Goals and Understanding, Erlbaum, Hillsdale, NJ, 1977.
9. D. Swinney, "Lexical access during sentence comprehension: (Re)consideration of context effects," J. Verb. Learn. Verb. Behav., 18, 645-660 (1979).
10. M. S. Seidenberg and M. K. Tanenhaus, Modularity and Lexical Access, in I. Gopnik and M. Gopnik (eds.), From Models to Modules: Studies in Cognitive Science, Ablex, Norwood, NJ, 1985.
11. S. Zucker, A. Rosenfeld, and L. Davis, General Purpose Models: Expectations about the Unexpected, Proc. of the Fourth IJCAI, Tbilisi, Georgia, September 3-8, 1975, The Artificial Intelligence Laboratory, Publications Department, Cambridge, MA, pp. 716-721, 1975.
12. D. Marr, Vision, W. H. Freeman, San Francisco, 1982.
13. M. Brady (ed.), Artificial Intelligence: An International Journal (Special Volume on Computer Vision), 17(1-3) (Aug. 1981), North-Holland, Amsterdam, 1981.
14. J. A. Fodor, The Modularity of Mind: An Essay on Faculty Psychology, MIT Press, a Bradford Book, Cambridge, MA, 1983.
15. J. R. Hobbs and R. C. Moore, Formal Theories of the Commonsense World, Ablex, Norwood, NJ, 1984.
16. D. Gentner and A. L. Stevens, Mental Models, Erlbaum, Hillsdale, NJ, 1982.
17. P. N. Johnson-Laird, Mental Models, Harvard University Press, Cambridge, MA, 1983.
18. J. S. Brown and K. VanLehn, "Repair theory: A generative theory of bugs in procedural skills," Cogn. Sci., 4, 379-426 (1980).
19. S. K. Card, T. P. Moran, and A. Newell, The Psychology of Human-Computer Interaction, Erlbaum, Hillsdale, NJ, 1983.
20. W. E. Cooper, Cognitive Aspects of Skilled Typewriting, Springer-Verlag, New York, 1983.
21. S. M. Kosslyn, Image and Mind, Harvard University Press, Cambridge, MA, 1980.
22. J. R. Anderson, "Acquisition of cognitive skill," Psychol. Rev., 89, 369-406 (1982).
23. Z. Pylyshyn, "The imagery debate: Analogue media versus tacit knowledge," Psychol. Rev., 88, 16-45 (1981).
24. K. Wexler and P. Culicover, Formal Principles of Language Acquisition, MIT Press, Cambridge, MA, 1980.
25. W. Demopoulos and A. Marras, Language Learning and Concept Acquisition: Foundational Issues, Ablex, Norwood, NJ, 1985.
26. R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, Machine Learning: An Artificial Intelligence Approach, Vol. 2, Tioga Press, Palo Alto, CA, 1986.
27. J. A. Anderson and G. E. Hinton, Models of Information Processing in the Brain, in G. E. Hinton and J. A. Anderson (eds.), Parallel Models of Associative Memory, Erlbaum, Hillsdale, NJ, pp. 9-48, 1981.
28. H. A. Simon, The Sciences of the Artificial, Compton Lectures, MIT Press, Cambridge, MA, 1969.
29. A. Newell, Remarks on the Relationship between Artificial Intelligence and Cognitive Psychology, in R. Banerji and M. D. Mesarovic (eds.), Theoretical Approaches to Non-Numerical Problem Solving, Springer-Verlag, New York, 1970.
30. A. Sloman, The Computer Revolution in Philosophy: Philosophy, Science, and the Models of Mind, Humanities Press, New York, 1968.
31. R. Cummins, The Nature of Psychological Explanation, MIT Press, a Bradford Book, Cambridge, MA, 1983.
32. J. Fodor, Representations, MIT Press, a Bradford Book, Cambridge, MA, 1981.
33. J. Haugeland, Mind Design, MIT Press, a Bradford Book, Cambridge, MA, 1981.
34. D. Dennett, Brainstorms, MIT Press, a Bradford Book, Cambridge, MA, 1979.
35. K. Deutsch, "The nerves of government," Gen. Syst. Yearbk., 21, 125-176 (1963).
36. J. W. Forrester, World Dynamics, Wright-Allen, Cambridge, MA, 1971.
37. D. C. Marr, "Approaches to biological information processing," Science, 190, 875-876 (1975) (book review).
38. M. Marcus, A Theory of Syntactic Recognition for Natural Language, MIT Press, Cambridge, MA, 1979.
39. R. C. Berwick and A. S. Weinberg, The Grammatical Basis of Linguistic Performance: Language Use and Acquisition, MIT Press, Cambridge, MA, 1984.
40. A. Newell, "Physical symbol systems," Cognitive Science, 4, 135-183 (1980).
41. A. Newell, "The knowledge level," Artif. Intell., 18, 87-127 (1982).

Z. W. Pylyshyn
University of Western Ontario

COLOR VISION

Color enriches one's everyday visual experience. Comparing a color image with a monochrome (black-and-white) image, the color picture seems to be alive with detail because of all the
additional information in the image. In computer vision (qv) researchers have attempted to harness this additional information. The simplest method for using color is by associating colors with objects, for example, "trees are green" and "the sky is blue." If specific object colors are not known, an image can still be chopped into meaningful pieces by finding regions of uniform color. Recent theories have been proposed to analyze color in terms of physical properties of objects, and efforts are beginning to model the processing of color information in the human visual system.

Color and Color Imaging

Color arises from the spectral properties of light. Figure 1 shows the spectrum of electromagnetic energy. Wavelengths of energy are customarily denoted by λ, and the unit of measure is the nanometer (nm). Visible light lies within a range of approximately 380-760 nm, running the gamut from blue at the low end of the visible spectrum, through green, yellow, and orange, to red at the high end. To the right of the visible spectrum is the near-infrared (near-ir) portion of the spectrum, which is also frequently used in computer vision. Visible light is normally a mixture of energy at many wavelengths and is characterized by the spectral power distribution (SPD) that tells how much energy is present at each wavelength. An SPD is usually denoted by S(λ) (1). At each pixel (point) in an image the SPD of the incident light determines the pixel value.

Monochrome imaging is simpler than color imaging. An imaging sensor (see Sensors) is sensitive to the different wavelengths of light to varying degrees, as expressed by the spectral responsivity s(λ) of the sensor (4). Typical spectral responsivities of the two principal types of sensor, vidicon tubes and silicon CCD chips, are shown in Figure 2. The output pixel value p at any point in the image is defined, for a calibrated camera, by

    p = p0 ∫ S(λ)s(λ) dλ

where p0 is a scaling factor.
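In a discrete implementation this integral becomes a sum over sampled wavelengths. A minimal sketch under that assumption (the flat SPD, the ramp responsivity, and the 10-nm sample spacing below are invented for illustration, not measured curves):

```python
# Sketch: approximate the calibrated-camera equation
#   p = p0 * integral of S(lambda) * s(lambda) d(lambda)
# by a Riemann sum over wavelength samples. The curves here are
# invented placeholders, not real sensor or illuminant data.

def pixel_value(spd, responsivity, p0=1.0, step_nm=10.0):
    """Riemann-sum approximation of p = p0 * int S(l) s(l) dl."""
    return p0 * sum(S * s for S, s in zip(spd, responsivity)) * step_nm

n = 39                                          # samples at 380, 390, ..., 760 nm
spd = [1.0] * n                                 # toy equal-energy illuminant
responsivity = [i / (n - 1) for i in range(n)]  # toy rising responsivity ramp

p = pixel_value(spd, responsivity)
```

With a finer wavelength step the sum converges to the integral; real calibration would also require linearizing the sensor's response.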
Figure 1. (a) Spectrum of electromagnetic radiation, from X rays and ultraviolet through the visible band to infrared and radio frequencies (from Ref. 2). (b) Magnification of the visible portion of the spectrum, 400-700 nm (from Ref. 3): C = cyan, G = green, Y = yellow, and O = orange.
For a vidicon, s(λ) is approximately equal to V(λ), the spectral luminous efficiency of the human eye, so a monochrome image from a vidicon looks similar to the brightness seen by a person. However, CCD sensors are much more sensitive in the near-ir region of the spectrum; for this reason, CCD cameras are frequently fitted with ir cutoff filters or filters that match the responsivity to V(λ).

Color imaging is more complex, assigning a color to each SPD instead of a single number. However, the set of all perceivable colors, as determined by color-matching experiments, is only a three-dimensional space. Humans cannot distinguish all different SPDs from each other but only those SPDs that correspond to different colors in color space. Since this is a many-to-one correspondence, there can be many SPDs that have the same color; such SPDs are said to be metameric.

The axes of color space, called primary colors, can be chosen arbitrarily. A convenient set, universally used for color measurement, is the X-Y-Z set of colors adopted by the CIE (International Commission on Illumination); each distinct point in X-Y-Z space corresponds to a unique color perception. A psychophysical color C is defined by

        [X]   [∫ S(λ)x̄(λ) dλ]
    C = [Y] = [∫ S(λ)ȳ(λ) dλ]
        [Z]   [∫ S(λ)z̄(λ) dλ]

The functions x̄(λ), ȳ(λ), and z̄(λ) are CIE tristimulus values that define the primaries X, Y, and Z (7). Since ȳ(λ) = V(λ), Y corresponds to luminance (the brightness of a color as seen by the human eye); the remaining coordinates X and Z determine chromaticity (the aspects of color independent of brightness).

Color imaging for computer vision follows the paradigm of color television, in which colors are measured using color filters. For any filter, the output pixel value p is determined by

    p = p0 ∫ S(λ)τ(λ)s(λ) dλ

where τ(λ), the transmittance of the filter, is the fraction of light the filter allows to pass through at each wavelength. To yield color values that uniquely correspond to color perceptions, three filters must be used that span the X-Y-Z space. The filters used for color television are red, green, and blue to maximize the gamut of measurable color values. For a particular sensor s(λ), the filter transmittances τr(λ), τg(λ), and τb(λ) of the red, green, and blue filters determine tristimulus values r̄(λ) = τr(λ)s(λ), ḡ(λ) = τg(λ)s(λ), and b̄(λ) = τb(λ)s(λ). For standard color television, these functions should obey (8)

    [r̄(λ)]   [ 0.582   -0.164   -0.089 ] [x̄(λ)]
    [ḡ(λ)] = [-0.301    0.611   -0.0087] [ȳ(λ)]
    [b̄(λ)]   [ 0.0128   0.0862   0.427 ] [z̄(λ)]

Color pixel values P are then determined by

        [R]   [r0 ∫ S(λ)r̄(λ) dλ]
    P = [G] = [g0 ∫ S(λ)ḡ(λ) dλ]
        [B]   [b0 ∫ S(λ)b̄(λ) dλ]

where r0, g0, and b0 are scaling factors. "White" in a color TV image corresponds to CIE Standard Illuminant C (9). A broadcast color camera contains three separate sensors, each with a color filter, and beam-splitting optics to direct the image simultaneously to all three sensors; for research in computer vision, a single monochrome camera is normally used with a filter wheel to rotate each filter in turn into position (Fig. 3). The recent development of three-color CCD sensor chips promises more compact and inexpensive color cameras for the near future.
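The three filtered integrals for a color pixel can be sketched the same way as the monochrome case; the three-sample SPD and filter-sensor products below are invented toy values, not standard curves:

```python
# Sketch: a color pixel value as three filtered integrals,
#   R = r0 * int S(l) rbar(l) dl, and likewise for G and B,
# computed as Riemann sums. All curves are invented placeholders.

def band_value(spd, curve, scale=1.0, step_nm=10.0):
    return scale * sum(S * f for S, f in zip(spd, curve)) * step_nm

def color_pixel(spd, r_bar, g_bar, b_bar):
    return (band_value(spd, r_bar),
            band_value(spd, g_bar),
            band_value(spd, b_bar))

spd   = [0.2, 0.4, 1.0]    # toy "red-heavy" spectrum (short to long wavelength)
r_bar = [0.0, 0.1, 0.9]    # toy red filter-sensor product
g_bar = [0.1, 0.8, 0.1]    # toy green filter-sensor product
b_bar = [0.9, 0.1, 0.0]    # toy blue filter-sensor product

R, G, B = color_pixel(spd, r_bar, g_bar, b_bar)   # R dominates for this spectrum
```

In a filter-wheel camera these three sums correspond to three successive exposures of the same monochrome sensor.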
Figure 2. Spectral responsivities of vidicon and CCD chips, 400-1000 nm (from Refs. 5 and 6).
Figure 3. Color camera arrangements: (A) a typical broadcast color television camera, with beam-splitting mirrors (m) directing the image through R, G, B color filters (f) onto three sensor tubes or chips (s); (B) a typical color computer vision camera, a monochrome camera with a filter wheel holding R, G, and B filters; (C) a single-chip color CCD sensor with an R, G, or B filter on each pixel.
In computer vision, unfortunately, other measurement factors are usually uncontrolled, including sensor spectral responsivity, nonlinear response to intensity, and gain within each color band. This results in a lack of correspondence between color computer images and NTSC color TV standards. The usual color filters for computer vision are Kodak Wratten filters #25 (R), #47B (G), and #58 (B) (10,11). Infrared filters [τ(λ) highest in the near ir] are also used in remote sensing. In this entry color refers to colors measured in a computer vision system using these or similar filters.

Color Spaces and Transformations

In computer vision color pixel values usually contain R, G, and B values, each measured in 6 or 8 bits. The set of image colors is thus a cube called the color space (Fig. 4). Intensity, measured by I = (R + G + B)/3, is the main diagonal of this cube from black (0, 0, 0) to white (max, max, max). Researchers in computer vision have frequently used R-G-B coordinates but have also explored transformations to other coordinate systems that have useful properties.

One such system is the CIE X-Y-Z system, a linear transform of R-G-B as defined above. This system was proposed because of its use as an international standard. However, it is psychophysical rather than psychological and hence does not capture the subjective attributes of color perceptions. These latter are incorporated into such systems as the Munsell color order system (2). The Munsell system defines three color attributes (value, hue, and chroma) that correspond roughly to the more familiar brightness, hue (color name, e.g., blue, purple), and saturation (relative amount of pure hue as opposed to gray).

In computer vision, normalized colors r-g-b are first defined by r = R/I, g = G/I, and b = B/I; saturation S and hue H are then defined (12) by

    S = 1 - min(r, g, b)

    x = cos⁻¹ {(2r - g - b) / [2√((r - g)² + (r - b)(g - b))]}

if b ≤ g, then H = x; otherwise H = 2π - x. (See Ref. 13 for a fast algorithm to compute H.) In color space I corresponds to distance along the intensity axis; on a plane of constant intensity S and H form a polar coordinate system, with S measuring distance from the center (gray) point and H measuring angle from pure red (Fig. 5). It is easily seen that the H-S-I system is analogous to the Munsell system for describing human color perceptions, but numerically the two are quite distinct.
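These formulas can be sketched directly. The guard for gray pixels, where hue is undefined, is an implementation choice not specified in the text; here such pixels are simply given H = 0:

```python
import math

# Sketch of the H-S-I conversion defined above. Hue is an angle
# measured from pure red; gray pixels (r = g = b) have undefined hue
# and are assigned H = 0 here, an implementation choice.

def rgb_to_hsi(R, G, B):
    I = (R + G + B) / 3.0
    if I == 0.0:
        return 0.0, 0.0, 0.0              # black: H and S undefined
    r, g, b = R / I, G / I, B / I         # normalized colors
    S = 1.0 - min(r, g, b)
    denom = 2.0 * math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denom == 0.0:
        return 0.0, S, I                  # gray: hue undefined
    ratio = max(-1.0, min(1.0, (2 * r - g - b) / denom))  # clamp for safety
    x = math.acos(ratio)
    H = x if b <= g else 2.0 * math.pi - x
    return H, S, I
```

Pure red maps to H = 0, pure green to 2π/3, and pure blue to 4π/3, matching the angular layout on a constant-intensity plane.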
Figure 4. The R-G-B color space.

Even the Munsell system, however, has drawbacks. Most work with color in computer vision has been based on the notion of color differences expressed as Euclidean distance in a color space, but Euclidean distance in Munsell coordinates does not correspond well to subjective perceived color difference magnitudes. Spaces with that property have been proposed, however, and are generically called uniform color spaces. The earliest, called U-V-W, has occasionally been proposed for computer vision; it has been replaced in the color science community by a newer system called CIELUV (14). In the CIELUV system L*, u*, and v* are defined for each color as follows (quantities denoted by subscript n refer to the incident illumination color):

    L* = 116(Y/Yn)^(1/3) - 16
    u* = 13L*(u - un)
    v* = 13L*(v - vn)

where

    u = 4X/(X + 15Y + 3Z)        v = 9Y/(X + 15Y + 3Z)

Color differences ΔE are then expressed by

    ΔE = [(ΔL*)² + (Δu*)² + (Δv*)²]^(1/2)

However, it seems unlikely that even the use of CIELUV coordinates will solve the fundamental problems of color image segmentation and analysis.

Other color spaces frequently used for computer vision include the NTSC television broadcast encoding system Y-I-Q, defined (15,10) by

    [Y]   [0.299   0.587   0.114] [R]
    [I] = [0.596  -0.274  -0.322] [G]
    [Q]   [0.211  -0.523   0.312] [B]

Y is the same as the CIE Y coordinate; I and Q measure chromaticity using parameters optimized for the acuity of the human eye. Other researchers have used opponent colors (black/white, blue/yellow, and red/green) or even normalized colors r-g-b themselves.

Kender has pointed out some serious problems in using certain color features (13). All features defined by a division (including H, S, r, g, b, U, V, W, etc.) contain singularities and tend to be unevenly distributed in an image because of the small-integer nature of the values used in their computation. This creates problems for algorithms based on histograms, clustering, and edges. Kender recommends a small number of bits per output pixel value for such features, randomizing (adding a random real number from 0 to 1 to each pixel color coordinate), and avoiding these features entirely near their singularities (usually when I is low). It is better to use linear transformations instead, such as Y-I-Q; such linear transformations should themselves be scaled so that the maximum value in each row of the transformation matrix is 1.0. Ohta (16) derived color features for over 100 image regions (using Ohlander's algorithm, described below) and statistically analyzed the color distributions. He found that the feature that most frequently captured the greatest information was intensity, I. Ohta proposed the use of three features I1-I2-I3 that represent a simple linear transformation from R-G-B and capture the color information he observed very well: I1 = (R + G + B)/3; I2 = R - B; I3 = (2G - R - B)/2. Ohta found these features to perform at least as well as any other set of features (X-Y-Z, R-G-B, Y-I-Q, U-V-W, I-r-g, H-S-I) in his system. However, as he noted,
Figure 5. Hue, saturation, and intensity: the color-space cube and a plane of constant intensity.
"usefulness of a color feature is greatly influenced by the structure of the color scenes to be [analyzed]."
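Both Y-I-Q and Ohta's I1-I2-I3 are simple linear transforms of R-G-B and can be sketched in a few lines. The Y-I-Q coefficients below are the standard NTSC values, assumed here because the printed matrix is only partly legible; the I1-I2-I3 expressions follow Ohta's definitions as given above:

```python
# Sketch: two linear color transforms discussed above. The Y-I-Q
# coefficients are the standard NTSC values (an assumption; the
# printed matrix is partly illegible). I1-I2-I3 are Ohta's features.

def rgb_to_yiq(R, G, B):
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    I = 0.596 * R - 0.274 * G - 0.322 * B
    Q = 0.211 * R - 0.523 * G + 0.312 * B
    return Y, I, Q

def rgb_to_ohta(R, G, B):
    I1 = (R + G + B) / 3.0        # intensity
    I2 = R - B                    # red vs. blue component
    I3 = (2 * G - R - B) / 2.0    # green vs. red-plus-blue component
    return I1, I2, I3
```

For a neutral gray input, both I and Q (and Ohta's I2 and I3) vanish, which is one reason such opponent-style features separate chromatic information from intensity.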
Color as a Statistical Quantity

Most research in color computer vision has been for image segmentation, breaking an image into pieces that have uniform properties. In this work color is usually regarded as a random variable to be analyzed statistically but without regard for the specific physical processes that give rise to color and color variation. The earliest and most obvious technique is called spectral signature analysis, in which prior knowledge about characteristic object colors is used to classify pixels. Spectral signature analysis has been used extensively in remote sensing (satellite and aerial photograph interpretation) and biomedical image analysis; it has been applied occasionally in robotics (qv) research. For example, Noguchi (17) classifies pixels in biomedical images of cells. He measures the typical colors of background, cytoplasm, and nucleus in advance; then, for each image, each pixel is individually classified into one of these categories. Whichever category has a characteristic color closest to that pixel's color is assigned as the pixel label. The distance metric used in this work is Euclidean distance in R-G-B space, that is, ΔP = [(ΔR)² + (ΔG)² + (ΔB)²]^(1/2). A similar technique is used to label image regions as specific objects using absolute color comparisons in production systems (18,19). Frequently, nonstandard color filters or features are used that optimize discriminability for the specific task at hand (20,21).

If specific object colors are not known in advance, clustering (qv) can be used instead. Haralick and many others (22) have applied this standard pattern recognition (qv) technique to color image segmentation. A histogram is first created from the color values at all pixels; it tells, for each point in color space, how many pixels exhibit that color (Fig. 6). Typically, the colors tend to form clusters in the histogram, one for each object in the image. By manual or automatic analysis of the histogram, the shape of each cluster is found. Then each pixel in the image is assigned to the cluster that is closest to the pixel color in color space. Clustering differs from spectral signature analysis in that the clusters are found by analysis of the specific image under consideration rather than by prior consideration of the expected data. In some clustering systems clusters are restricted to be rectangular boxes or ellipsoids (23-25). Features that describe texture have been used along with color to create a "feature space" with additional dimensions (26). All clustering techniques suffer from the problem that adjacent clusters frequently overlap in color space, causing incorrect pixel labeling.

In conjunction with clustering, a technique called relaxation is sometimes used to improve pixel labeling. In relaxation pixel labels are assigned by an iterative method. Each pixel has a probability of belonging to each cluster, and in each iteration step those probabilities are modified. A probability is increased or decreased according to a weighted combination of two factors: the color resemblance of the pixel to the cluster center and the probabilities that the neighboring pixels belong to that same cluster (27,28).

A refinement on clustering is region splitting, in which the image is broken into successively smaller pieces until each piece has a uniform color (and presumably represents a single object or surface).
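The nearest-characteristic-color labeling used in spectral signature analysis can be sketched as follows; the class names echo Noguchi's categories, but the characteristic colors are invented placeholders, not his measurements:

```python
# Sketch: spectral-signature pixel labeling. Each pixel receives the
# label of the characteristic color nearest to it in R-G-B space
# (Euclidean distance). The characteristic colors are invented.

def classify_pixel(pixel, classes):
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(classes, key=lambda label: dist2(pixel, classes[label]))

classes = {
    "background": (200, 200, 200),   # placeholder characteristic colors
    "cytoplasm":  (180, 120, 120),
    "nucleus":    (80, 40, 100),
}

label = classify_pixel((90, 50, 110), classes)
```

A clustering system differs only in where the table of characteristic colors comes from: here it is fixed in advance, whereas clustering derives it from the histogram of the image itself.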
(s)sky
Inputimage
(b)building
(g) grass
(B) in Histogram colorspaceGreen Yellow
c,
rJ
s- s s s BlueMagenta
Black
Red
g, b, s are pixelsin grass,building,and sky Clustersare outlined Figure 6. Clustenng in color space.
Ohlander used color to perform this splitting operation (29). His method begins by computing a variety of color features for each pixel (H, S, I, Y, I, and Q) before the actual segmentation begins. Then the splitting step is applied (the initial region is the entire image). In the splitting step a histogram is created for each color feature (R, G, B, H, S, I, Y, I, and Q) within the region. Each object tends to produce a peak in the histogram, so the clustering task becomes simply to find a single prominent peak in one of the histograms (Fig. 7). Each histogram is examined independently, and the feature whose histogram exhibits the most prominent peak is selected for use. The image is then thresholded according to the boundaries of the peak to isolate the pixels that contribute to the selected peak. This splits the original region into smaller regions. The splitting step is then recursively applied to each region, stopping when a region has all flat histograms (i.e., it has uniform color). A variation of this method isolates several peaks in each step (30). Some of the general problems associated with histogram analysis in this algorithm are discussed in Ref. 31.

Region growing (qv) is a different segmentation technique in which small regions (initially individual pixels) are merged together to form larger and larger regions according to color similarity.
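One Ohlander-style splitting step can be sketched as follows. The bin count, the 0-255 feature range, and the use of the tallest bin as the measure of peak "prominence" are all simplifying assumptions; Ohlander's actual peak-selection criteria are more elaborate:

```python
# Sketch of one histogram-splitting step: histogram each color
# feature over the region, select the feature whose histogram has
# the most prominent peak (here simply the tallest bin), and split
# the region at that peak's bin boundaries. Feature values are
# assumed to lie in 0-255.

def histogram(values, bins=8, lo=0.0, hi=256.0):
    width = (hi - lo) / bins
    h = [0] * bins
    for v in values:
        h[min(bins - 1, int((v - lo) / width))] += 1
    return h

def split_region(pixels, features, bins=8):
    """pixels: list of {feature_name: value} dicts for one region."""
    best = max(features,
               key=lambda f: max(histogram([p[f] for p in pixels], bins)))
    h = histogram([p[best] for p in pixels], bins)
    peak = h.index(max(h))
    width = 256.0 / bins
    inside, outside = [], []
    for p in pixels:
        b = min(bins - 1, int(p[best] / width))
        (inside if b == peak else outside).append(p)
    return best, inside, outside
```

Applied recursively to both returned pixel sets, with a flat-histogram stopping test, this yields the overall splitting scheme.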
This method merges the pair of adjacent regions with the greatest similarity of colors, according to a statistical measure. It may take into account both the average color and the color variance within each region. Merging continues until no two adjacent regions are sufficiently similar. Variations include simple color difference measures (32-34), the use of color features other than R-G-B (32,35,36), the use of semantic information about object positions and relations (37,12), and the assignment of a globally optimal set of pixel labels after examining all pixels (38). Nagao determined the acceptable color difference limits by finding a valley in local color difference histograms (36).

Edge detection (qv) techniques have also been examined for color images. Nevatia produced a method for color edge detection in which edges are first detected in each color feature separately (39,40). The computed edge directions at each pixel are then averaged to determine a hypothesized edge direction at that pixel. Using that constraint, each color feature is reexamined to see if there is sufficient evidence to confirm the
[Figure 7, not reproduced: (a) an input image containing sky (s), building (b), and grass (g); red, green, and blue histograms of the image, where (g), (b), and (s) indicate pixels of grass, building, and sky; (b) the image after thresholding; and (c) region outlines indicated by thick lines, yielding Region #1 and Region #2, each of which will now be split into smaller regions.]
Figure 7. Region splitting by histogram analysis.
presence of an edge in the hypothesized direction. If confirmation is found in each color feature, the edge is considered to be present. A different analysis of color at image edges is Kanade's method for color edge profile analysis (41). In this method, when the geometric arrangement of regions suggests that two regions are part of the same surface, the colors along their corresponding edges are matched to confirm or deny this hypothesis.

In summary, every major image segmentation method has been adapted for color, and some, like relaxation segmentation and region splitting, are almost always performed on color images. In the near future there will be applications of color for spectral signature analysis in robotics applications and research in the use of color for matching tasks such as stereo image analysis.
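The recursive splitting procedure summarized above can be sketched compactly. The following is a toy illustration only, not Ohlander's implementation: coarse fixed-width bins stand in for his peak detection, and the feature values are invented.

```python
def bin_of(value, bins=4, lo=0, hi=256):
    """Map a feature value to a coarse fixed-width histogram bin."""
    width = (hi - lo) / bins
    return min(int((value - lo) / width), bins - 1)

def split_region(region, features, bins=4):
    """Recursively split a region (a list of pixel ids) by histogram peaks.

    features maps a feature name (e.g., 'R', 'hue') to a list of
    per-pixel values indexed by pixel id.
    """
    best = None  # pixels under the most prominent peak found so far
    for values in features.values():
        hist = {}
        for p in region:
            hist.setdefault(bin_of(values[p], bins), []).append(p)
        if len(hist) > 1:  # histogram is not flat, so a split is possible
            peak = max(hist.values(), key=len)
            if best is None or len(peak) > len(best):
                best = peak
    if best is None:  # every feature histogram is flat: uniform region
        return [region]
    rest = [p for p in region if p not in best]
    return split_region(best, features, bins) + split_region(rest, features, bins)

# Twelve pixels drawn from three "objects" with distinct brightness:
features = {"R": [10, 12, 11, 9, 120, 118, 122, 121, 240, 238, 241, 239]}
regions = split_region(list(range(12)), features)
print(regions)  # three regions, one per object
```

Real images need smoothed histograms and genuine peak/valley detection; the fixed bins here merely make the recursion visible.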
Color as a Physical or Perceptual Quantity
Although most of the work in color computer vision has viewed color as a random variable to be used for image segmentation, there has been some progress in viewing color as a physical variable instead. In this work knowledge about how color is created is used to analyze a color picture and compute some important three-dimensional facts about the objects being viewed. The most successful such research has used heuristic rules (see Heuristics; Rule-based systems), usually embedded in production systems, to label regions as shadows, highlights, and so on, using knowledge about color behavior. Nagao uses Ir-R, the ratio of infrared to red at each pixel, to detect vegetation in aerial photographs (36). This is useful since chlorophyll typically has a high IR reflectance but low red reflectance. Unfortunately, the blue rooftops in Nagao's photographs also exhibit high values of Ir-R; thus he uses an absolute B threshold to discriminate such roofs from vegetation. He also detects shadows by comparing I to a threshold; if the intensity is low, a region is assumed to be a shadow. Several others have detected shadows by adding the requirement that there be an adjacent region (presumably an illuminated part of the same surface) with higher intensity but similar chromaticity (hue and saturation) (23,30). Similarly, Ohlander labels a region as a highlight if there is an adjacent region with lower intensity and similar chromaticity (42). Sloan, analyzing outdoor scenes, noted that distant objects appear somewhat bluish (35). All these heuristics are qualitative in nature and based on some simplifying assumptions about the images being viewed.

Some theories have also been proposed for quantitative analysis of color. A theoretical analysis of highlights and object color reflection has been presented to provide a way to remove the highlights from large portions of an image (43). Highlight color and object colors can be characterized by vectors in color space, and each pixel on a surface has a color that is a linear combination of these. The colors on a single surface thus form a parallelogram in color space, and by analyzing a histogram of such colors, the parallelogram can be found. Then, by noting each pixel's color location within this parallelogram, the relative amounts of highlight and object color can be determined at that pixel. The method is proposed for such materials as paint, plastic, and paper. Rubin has presented a method for using color to distinguish material changes from artifacts such as shadows and highlights (44). The spectral power distributions of nearby pixels may only have a crossover point (wavelength at which the sign of the difference changes) if the material being viewed at the two pixels is different. By looking for sign changes in the color components of adjacent pixels, material boundaries can be found. Both of these quantitative methods depend on idealized assumptions about reflection and imaging, but this type of research appears quite promising and will be important.

Yet another view of color is to interpret it as a perceptual variable in human vision. Still in its infancy within the computer vision community, this work involves the explicit modeling of color processing within the human visual system. Researchers are currently studying the color constancy phenomenon that allows humans to see object colors the same regardless of illumination (45) and the possible function of retinal and brain cells sensitive to specific color patterns or orientations (46,47). This research is still in early stages but is arousing great interest, and as the understanding of human vision increases, so will the sophistication of modeling of human color vision.

The reader interested in color measurement is referred to Ref. 2, a very readable discussion, and the more technical discussion of color standards in Ref. 14. Useful reference handbooks are Ref. 7, which contains many tables and formulas, and Ref. 1, which presents formal definitions of terms and units of measure. A discussion of the physiology of human color vision is found in Ref. 48; Ref. 49 is older but also contains an excellent discussion of color perception. Surveys of color in computer vision are Refs. 50-54; however, these tend to survey human color vision more deeply than computer color vision.

BIBLIOGRAPHY

1. International Commission on Illumination, International Lighting Vocabulary, 3rd ed., CIE 17 (E-1.1.), CIE, Paris, 1970.
2. D. B. Judd and G. Wyszecki, Color in Business, Science and Industry, Wiley, New York, 1975.
3. S. J. Williamson and H. Z. Cummins, Light and Color in Nature and Art, Wiley, New York, 1983.
4. F. Grum and R. J. Becherer, Optical Radiation Measurements, Vol. 1, Radiometry, Academic Press, New York, 1979.
5. Hamamatsu Corp., Vidicons, Catalog SC-5-3, Hamamatsu Corp., Middlesex, NJ, 1983.
6. Hamamatsu Corp., Silicon Photocells, Catalog SC-3-6, Hamamatsu Corp., Middlesex, NJ, 1983.
7. G. Wyszecki and W. S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae, 2nd ed., Wiley, New York, 1982.
8. J. Wentworth, Color Television Engineering, McGraw-Hill, New York, 1955.
9. R. W. G. Hunt, The Reproduction of Colour, Wiley, New York, 1967.
10. M. D. Levine, Region Analysis Using a Pyramid Data Structure, in S. Tanimoto and A. Klinger (eds.), Structured Computer Vision, Academic Press, New York, pp. 57-100, 1980.
11. T. Ito, "Color picture processing by computer," Proc. of the Fourth IJCAI, Tbilisi, Georgia, 635-642 (1975).
12. J. M. Tenenbaum, T. D. Garvey, S. Weyl, and H. C. Wolf, An Interactive Facility for Scene Analysis Research, TN 87, SRI International, Menlo Park, CA, January 1974.
13. J. R. Kender, Instabilities in Color Transformations, in PRIP-77, IEEE Computer Society, Troy, NY, pp. 266-274, June 1977.
14. F. Grum and C. J. Bartleson (eds.), Optical Radiation Measurements, Vol. 2, Colorimetry, Academic Press, New York, 1980.
15. L. E. DeMarsh, Color Reproduction in Color Television, in Proceedings of the Inter-Society Color Council 1971 Conference on Optimum Reproduction of Color, Williamsburg, VA, January 1971, pp. 69-97.
16. Y. Ohta, T. Kanade, and T. Sakai, "Color information for region segmentation," Comput. Graph. Image Proc., 13, 222-241 (1980).
17. Y. Noguchi, Y. Tonjin, and T. Sugishita, "A method for segmenting a clump of cells into cellular characteristic parts using multispectral information," IJCPR-4, Kyoto, pp. 872-874, 1978.
18. T. D. Garvey, "An experiment with a system for locating objects in multisensory images," IJCPR-3, IEEE, Coronado, CA, pp. 567-575, 1976.
19. Y. Ohta, A Region-Oriented Image-Analysis System by Computer, Ph.D. Thesis, Kyoto University, Kyoto, Japan, March 1980.
20. K. Akita and H. Kuga, "Towards understanding color ocular fundus images," Proc. of the Sixth IJCAI, Tokyo, Japan, pp. 7-12, 1979.
21. J. Engvall et al., "Development of a mathematical model to analyze color and density as discriminant features for pulmonary squamous epithelial cells," Pattern Recog., 13(1), 37-47 (1981).
22. R. M. Haralick and G. L. Kelly, "Pattern recognition with measurement space and spatial clustering for multiple images," Proc. IEEE, 57, 654-665 (April 1969).
23. M. Ali, W. N. Martin, and J. K. Aggarwal, "Color-based computer analysis of aerial photographs," Comput. Graph. Image Proc., 9, 282-293 (1979).
24. D. M. Connah and C. A. Fishbourne, The Use of Colour Information in Industrial Scene Analysis, in Proc. 1st Intl. Conf. on Robot Vision and Sensory Controls, Stratford-upon-Avon, UK, April 1981, pp. 340-347.
25. A. Sarabi and J. K. Aggarwal, "Segmentation of chromatic images," Pattern Recog., 13(6), 417-427 (1981).
26. G. B. Coleman and H. C. Andrews, "Image segmentation by clustering," Proc. IEEE, 67(5), 773-785 (May 1979).
27. A. Rosenfeld, Some Recent Results Using Relaxation-Like Processes, in L. S. Baumann (ed.), ARPA IU Workshop, May 1978, pp. 100-101.
28. P. A. Nagin, A. R. Hanson, and E. M. Riseman, "Studies in global and local histogram-guided relaxation algorithms," IEEE Trans. Pattern Anal. Machine Intell., 4(3), 263-276 (May 1982).
29. R. Ohlander, K. Price, and D. R. Reddy, "Picture segmentation using a recursive region splitting method," Comput. Graph. Image Proc., 8, 313-333 (1978).
30. B. Schachter, L. S. Davis, and A. Rosenfeld, Scene Segmentation by Cluster Detection in Color Space, TR 424, U. Maryland Computer Science Center, 1975.
31. S. A. Shafer and T. Kanade, Recursive Region Segmentation by Analysis of Histograms, in Proc. Intl. Conf. on Acoustics, Speech, and Signal Processing, IEEE, Paris, France, May 1982, pp. 1166-1171.
32. M. Yachida and S. Tsuji, "Application of color information to visual perception," Pattern Recog., 3, 307-323 (1971).
33. R. Bajcsy, "Computer identification of visual surfaces," Comput. Graph. Image Proc., 2, 118-130 (1973).
34. M. D. Levine and S. I. Shaheen, A Modular Computer Vision System for Picture Segmentation and Interpretation, Part I, in PRIP-79, IEEE Computer Society, Chicago, IL, August 1979, pp. 523-533.
35. K. Sloan, World Model Driven Recognition of Natural Scenes, Ph.D. Thesis, University of Pennsylvania Moore School of Electrical Engineering, June 1977.
36. M. Nagao, T. Matsuyama, and Y. Ikeda, "Region extraction and shape analysis in aerial photographs," Comput. Graph. Image Proc., 10, 195-223 (1979).
37. Y. Yakimovsky and J. A. Feldman, "A Semantics-Based Decision Theory Region Analyzer," in Proc. of the Third IJCAI, Stanford, CA, pp. 580-588, 1973.
38. S. Rubin, The ARGOS Image Understanding System, Ph.D. Thesis, Carnegie-Mellon University Computer Science Department, 1978.
39. R. Nevatia, A Color Edge Detector, in IJCPR-3, Coronado, CA, pp. 829-832, 1976.
40. R. Nevatia, "A color edge detector and its use in scene segmentation," IEEE Trans. Sys. Man Cybern., SMC-7(11), 820-826 (November 1977).
41. T. Kanade, "Recovery of the three-dimensional shape of an object from a single view," Artif. Intell., 17, 409-460 (1981).
42. R. Ohlander, Analysis of Natural Scenes, Ph.D. Thesis, Carnegie-Mellon University Computer Science Department, 1975.
43. S. A. Shafer, "Using Color to Separate Reflection Components," Color Research and Application, 10(4), 210-218 (Winter 1985).
44. J. M. Rubin and W. A. Richards, "Color vision and image intensities: When are changes material?," Biol. Cybern., 45, 215-226 (1982).
45. B. A. Wandell and L. T. Maloney, Computational Methods for Color Identification, presented at the 1984 Annual Meeting of the Optical Society of America, San Diego, CA.
46. J. M. Rubin and W. A. Richards, Color Vision: Representing Material Categories, AIM 764, MIT AI Lab, Cambridge, MA, 1984.
47. R. Gershon, Empirical Results With a Model of Color Vision, in CVPR-85, IEEE Computer Society, San Francisco, CA, pp. 302-305, June 1985.
48. C. J. Bartleson and F. Grum, Optical Radiation Measurements, Vol. 5, Visual Measurements, Academic Press, New York, 1984.
49. Committee on Colorimetry, The Science of Color, Optical Society of America, Washington, DC, 1963. (Originally published in 1953 by Thomas Y. Crowell.)
50. D. Taenzer, Physiology and Psychology of Color Vision-A Review, AIM 369, MIT AI Lab, Cambridge, MA, 1976.
51. A. Nazif, A Survey of Color, Boundary Information, and Texture as Features for Low-level Image Processing, TR 78-7R, McGill U. Elec. Engrg. Dept., Montreal, 1978.
52. C. M. Brown, Color Vision and Computer Vision, TR 108, U. Rochester Computer Science Dept., Rochester, NY, 1982.
53. R. Gershon, Survey on Color: Aspects of Perception and Computation, RCBV-TR 84-4, U. Toronto Computer Science Dept., Toronto, Ontario, 1984.
54. S. A. Shafer, Optical Phenomena in Computer Vision, in Proceedings CSCSI-84, Canadian Society for Computational Studies of Intelligence, London, Ontario, May 1984.

S. A. Shafer and T. Kanade
Carnegie-Mellon University

COMMON LISP. See Lisp.
COMPETENCE LINGUISTICS. See Linguistics, competence and performance.
COMPLETENESS

Completeness is a property of deductive systems, or theories, as are consistency and soundness; these terms inherit their meaning from logic (qv). Informally, a complete theory is one strong enough to allow proof of any statement that ideally one
would want to prove, a consistent theory is a theory free of formal contradiction, and a sound theory is, in some sense, a "correct" theory, that is, only true statements are provable. Reference to completeness within AI has been primarily within the automated theorem-proving (qv) subarea. Although its exact importance for AI has been controversial, it is generally agreed that completeness is less important for an AI system than soundness and consistency (and even consistency may have to be given up in larger systems), but that completeness can be an important property when basic deductive systems are considered. The term completeness has also been used in the context of knowledge representations (qv) in AI to indicate that the notation can represent every entity within the intended domain.

Proof Procedures

To be precise, a first-order theory is complete if and only if (iff) for each closed formula A of the language of the theory either A or -A is provable (1). (A first-order theory is consistent iff for each formula A at most one of A and -A is provable. A theory is sound iff its theorems are a subset of the intended set of theorems. This last definition involves interpretations of the theory; it is dependent on the notion of "intended" and so is less used in logic than the first two definitions.) Completeness is also defined for proof procedures and is the meaning usually intended in the automated theorem-proving field. A proof procedure for a logic (a theory) is complete iff it is capable of generating a proof for every valid (true) formula of the logic (theory). Implicit in this definition is the acknowledgment that normal forms for formulas may be used, and only the normal forms may be provable; indeed, one may view the logic as restricted to these normal forms, whereupon the definition is literal.
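In symbols, the definitions just given for a first-order theory T read as follows (standard logical notation, added here for reference; the turnstile denotes provability in T):

```latex
\begin{align*}
T \text{ is complete}   &\iff \text{for every closed formula } A,\ \; T \vdash A \ \text{ or } \ T \vdash \lnot A,\\
T \text{ is consistent} &\iff \text{for every formula } A,\ \; \text{not both } T \vdash A \text{ and } T \vdash \lnot A,\\
T \text{ is sound}      &\iff \text{every theorem of } T \text{ lies in the intended set of theorems.}
\end{align*}
```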
For refutation logics and their associated proof procedures, such as resolution (qv) and its various refutation procedures, the definition is the obvious variant: The refutation procedure is complete iff the procedure is capable of generating a refutation of every unsatisfiable formula (2,3). (A refutation procedure is consistent iff for no formula A are both A and -A refutable by the procedure. The refutation procedure is sound iff every formula refuted by the procedure is unsatisfiable.)

Concern with completeness entered the AI community via interest in automated theorem proving (ATP) in the late 1950s. Abraham Robinson (4) was the first logician on record as proposing that a complete proof procedure be used for ATP (in 1954), and Prawitz, Prawitz, and Voghera (5) in 1957 implemented a proof procedure for the first-order predicate calculus closely related to Beth's (14) semantic tableaux method. Several other logicians responded to the heuristic approach to proving propositional theorems taken by Newell, Shaw, and Simon (6) by showing that complete proof procedures could perform as well as heuristic procedures (actually much better at that time) and could handle a much wider scope of problems. In particular, the procedure of Gilmore (7) that used the so-called Herbrand theorem (better named the Skolem-Herbrand-Gödel theorem) was improved by Davis and Putnam (8) and then J. A. Robinson (9), who proposed the resolution procedure (see Resolution). Each dramatically improved an aspect of the previous procedure while maintaining completeness, thus being able to claim that efficiency was not gained at the expense of generality (at least as a first-order approximation). (For a detailed view of this first period in ATP
history see the opening articles in Ref. 10.) However, the difficulties encountered in proving deeper theorems using resolution techniques, in spite of a sizable repertoire of resolution "strategies," led several AI researchers to alternate methods. In particular, both Nevins (11) and Bledsoe (12) developed incomplete theorem provers that proved some theorems not previously proved by resolution provers and illuminated techniques that promised further gains (13). These provers and a general reaction to an apparent overemphasis on completeness moved the AI community to an "anticompleteness" attitude, which is gradually decreasing in intensity as researchers gain a feeling for contexts in which completeness makes sense. To oversimplify, completeness is useful when dealing with basic mechanisms for deduction since experience shows that otherwise some very simple deductions may be omitted. On the other hand, control structures (qv) are often designed with little regard for completeness because of the desire for anything that works well in reasonable domains; moreover, resource limits in any real system mean that completeness cannot be fully exploited in practice.
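As a concrete illustration of a complete refutation procedure, here is a toy propositional resolution prover. This is a sketch of the ground (propositional) case only; first-order resolution as in Robinson (9) additionally requires unification, and no search strategy is applied here.

```python
from itertools import combinations

def resolve(c1, c2):
    """All resolvents of two clauses. A clause is a frozenset of literals;
    a literal is a string, with negation marked by a leading '-'."""
    out = []
    for lit in c1:
        neg = lit[1:] if lit.startswith("-") else "-" + lit
        if neg in c2:
            out.append((c1 - {lit}) | (c2 - {neg}))
    return out

def unsatisfiable(clauses):
    """Saturate the clause set under resolution; True iff the empty clause
    is derived. For propositional clauses this procedure is complete:
    every unsatisfiable set eventually yields the empty clause."""
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:
                    return True  # empty clause: refutation found
                new.add(frozenset(r))
        if new <= clauses:
            return False  # saturated without the empty clause
        clauses |= new

# {p or q, -p, -q} is unsatisfiable; {p or q, -p} is satisfiable.
print(unsatisfiable([{"p", "q"}, {"-p"}, {"-q"}]))  # True
print(unsatisfiable([{"p", "q"}, {"-p"}]))          # False
```

Blind saturation like this is exactly what the resolution "strategies" mentioned above try to tame: completeness is easy to state, but unrestricted resolution generates clauses far faster than it finds refutations.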
BIBLIOGRAPHY

1. J. Shoenfield, Mathematical Logic, Series in Logic, Addison-Wesley, Reading, MA, 1967.
2. C. L. Chang and R. T. C. Lee, Symbolic Logic and Mechanical Theorem Proving, Academic Press, New York, 1973.
3. D. W. Loveland, Automated Theorem Proving: A Logical Basis, Fundamental Studies in Computer Science Series, North-Holland, Amsterdam, 1978.
4. A. Robinson, "Proving theorems, as done by man, machine and logician," Summaries of Talks Presented at the Summer Institute for Symbolic Logic, 1957, 2nd ed., Institute for Defense Analysis, 1960.
5. D. Prawitz, H. Prawitz, and N. Voghera, "A mechanical proof procedure and its implementation in an electronic computer," J. Assoc. Comput. Machin., 7, 102-128 (1960).
6. A. Newell, J. C. Shaw, and H. Simon, "Empirical explorations with the logic theory machine," Proc. West. Joint Comput. Conf., 218-239 (1957).
7. P. C. Gilmore, "A proof method for quantification theory: Its justification and realization," IBM J. Res. Devel., 28-35 (January 1960).
8. M. Davis and H. Putnam, "A computing procedure for quantification theory," J. Assoc. Comput. Machin., 7, 201-215 (1960).
9. J. A. Robinson, "A machine oriented logic based on the resolution principle," J. Assoc. Comput. Machin., 12, 23-41 (1965).
10. J. Siekmann and G. Wrightson (eds.), Automation of Reasoning, Vol. 1, Classical Papers of Computational Logic 1957-1966, Symbolic Computation Series, Springer-Verlag, Berlin, 1983.
11. A. J. Nevins, "A human oriented logic for automatic theorem proving," J. Assoc. Comput. Machin., 21, 606-621 (1974).
12. W. W. Bledsoe, "Splitting and reduction heuristics in automatic theorem proving," Artif. Intell., 55-77 (1971).
13. D. W. Loveland, Automated Theorem Proving: A Quarter-Century Review, in Automated Theorem Proving: After 25 Years, Vol. 29, Contemporary Mathematics Series, American Mathematical Society, Providence, 1984.
14. E. W. Beth, "A topological proof of the theorem of Löwenheim-Skolem-Gödel," Koninkl. Nederl. Akademie van Wetenschappen, Amsterdam, Proceedings, Series A, 54(5), and Indagationes Math., 13(5), 436-444 (1951).

D. W. Loveland
Duke University
COMPUTATIONAL LINGUISTICS

Research in computational linguistics (CL) is concerned with the application of a computational paradigm to the scientific study of human language and the engineering of systems that process or analyze written or spoken language. The term natural-language processing (NLP) is also frequently used, especially with regard to the engineering side of the discipline. As an historical note, the term computational linguistics included the study of formal languages and artificial computer languages (e.g., ALGOL), as well as natural languages, until the middle 1960s, but this entry concerns CL as it is presently conceived.

Theoretical issues in CL concern syntax, semantics, discourse, language generation, language acquisition, and other areas, whereas areas for applied work in CL have included automatic programming, computer-aided instruction, database interface, machine translation, office automation, speech understanding, and other areas. Historically, much CL research has been done by researchers whose language interests overlap with interests in such related disciplines as AI, cognitive science, computer science and engineering, information science, linguistics, philosophy, psychology, and the speech sciences. The middle 1970s, however, witnessed an increase in hybrid efforts, so that present efforts in CL typically draw from and contribute to work in one or more of these cognate areas.

This entry serves primarily as an overview of the primary topics in CL. It begins with a historical introduction to the field, followed by brief remarks on some of the more important theoretical problems, and concludes with pointers to the literature.
Since space has permitted only a general statement of the goals of a theory or implementation, with occasional examples of either I/O behavior or internal representation formalisms, conclusions cannot be drawn from this entry alone concerning the capabilities of the work to be described. More detailed information is available in the separate entries related to the topics considered here.
Early Work (1950-1965)

Most CL work prior to 1960 concerned machine translation, as defined below, but the advent of transformational grammar and the emergence of paradigms for information retrieval also played an important role in the formation of a CL community. Following is a discussion of the essential work in these three areas.

Machine Translation. Many of the first attempts at using computers to process natural language concerned the problem of translating text from one natural language into another. Although actual computer programs seeking to solve this task were not written until the early 1950s, the idea of mechanical translation can be traced to conversations as early as 1946 between Warren Weaver and A. D. Booth. The initial impetus came in 1949, when Weaver wrote and privately circulated a paper titled "Translation" (1). This paper, along with a detailed account of initial work in machine translation, can be found in Locke and Booth (2).

Most early work on machine translation, also known as automatic translation, mechanical translation, or simply MT, was conducted in the United States and the USSR, where the
political and military interests in natural-language translation were especially strong. There were also two British projects and some work done in Italy, Israel, and elsewhere. Typically, efforts at machine translation, which predated the important work in linguistics and computer science on syntax, grammars, and languages, were based on word-by-word translation schemes. In particular, no attempt was made to "parse" sentences (i.e., determine their syntactic structure) and, at least as significantly, no attempt was made to actually "understand" the material to be translated. A characterization of the basic approach of word-for-word processing can be found in Ref. 3.

As an example of what had been achieved by about 1960, the first sentence of a 1956 Russian article yielded the output "'razviti' electronics (allowed permitted) (considerably significantly considerable significant important) to (perfect improve) (method way) 'fiz' (measurement metering sounding dimension) (speed velocity rate ration) (light luminosity shine luminous)," where parentheses indicate uncertainty on the part of the system and where razviti and fiz were unknown and thus untranslated (fiz derives from a proper name). From this output, a human posteditor produced "Development of electronics permitted considerably to improve method Fizeau of measurement of speed of light," which may be compared with the fully human translation "The development of electronics has brought about a considerable improvement of Fizeau's method of measuring the velocity of light." This example is discussed in detail in Oettinger (3).
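The word-for-word scheme behind output like the Fizeau example reduces to glossary lookup. The sketch below is a toy reconstruction of that mechanism only; the transliterated source words and the mini-glossary are invented for illustration and are not taken from any actual MT system.

```python
# Each source word maps to its candidate translations; multiple
# candidates are emitted in parentheses, as in early MT output,
# and unknown words are left untranslated in quotes.
GLOSSARY = {
    "metod": ["method", "way"],
    "izmereniya": ["measurement", "metering"],
    "skorosti": ["speed", "velocity", "rate"],
    "sveta": ["light", "luminosity"],
}

def translate_word(word):
    candidates = GLOSSARY.get(word)
    if candidates is None:
        return "'" + word + "'"      # unknown: pass through, quoted
    if len(candidates) == 1:
        return candidates[0]
    return "(" + " ".join(candidates) + ")"  # show every choice

def translate(sentence):
    """Word-by-word translation: no parsing, no understanding."""
    return " ".join(translate_word(w) for w in sentence.split())

print(translate("metod izmereniya skorosti sveta"))
# (method way) (measurement metering) (speed velocity rate) (light luminosity)
```

The simplicity of the loop makes the limitation plain: all disambiguation, word order, and grammar are left to the human posteditor.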
Concerning the distinction between fully automated as opposed to machine-assisted human translation, even Bar-Hillel, an outspoken detractor of much MT work, observed that "word-by-word Russian-to-English translation of scientific texts, if pushed to its limits, is known to enable an English reader who knows the respective field to understand, in general, at least the gist of the original text, though of course with an effort that is considerably larger than that required for reading a regular high quality translation" (4). Nevertheless, researchers and government funding agencies continued to anticipate systems that would provide "fully automatic high-quality translation" (FAHQT). It was with respect to this more ambitious goal that the Automatic Language Processing Advisory Committee (ALPAC) was formed in April of 1964 "to advise the Department of Defense, the Central Intelligence Agency, and the National Science Foundation on research and development in the general field of mechanical translation of foreign languages."

In essence, the committee found that "there has been no machine translation of general scientific text, and none is in immediate prospect" (5). They further observed that, in some cases, "the postedited translation took slightly longer to do and was more expensive than conventional human translation" and also noted that "unedited machine output from scientific text is decipherable for the most part, but it is sometimes misleading and sometimes wrong (as is postedited output to a lesser extent)." Although the ALPAC committee had presumably intended its report to effect "useful changes in the support of research," their findings resulted in the virtual elimination of federal funding for work in MT. As a consequence, very little work was done, and few papers published, for roughly a decade. Since the middle 1970s, however, a number of projects have been spawned or reactivated.
The entry on machine translation provides technical details and also discusses more recent work in the area.
Transformational Grammar. In 1957 an event occurred that not only revolutionized the world of linguistics but left a lasting impression on philosophy, psychology, and other areas. That event was the publication of a short monograph by Noam Chomsky entitled Syntactic Structures (6) that explored the implications of automata theory for natural language. In it, Chomsky first argued that the sentences of a natural language cannot be meaningfully generated by a finite-state machine or by any context-free grammar, or at least that "any grammar that can be constructed . . . will be extremely complex, ad hoc, and 'unrevealing'" (7). He then proposed a theory of what he called transformational grammar (TG) and began to work out its details. At the most abstract level the theory of TG involves specifying a set of "kernel" sentences of a language; an assortment of "transformations," such as verb tensing and passive voice; and an ordering in which transformations are to be carried out. For example, to avoid producing a sentence such as "John are liked by the students," the passive transformation must apply to the kernel sentence "The students liked John" before the rule for subject-verb agreement. The entry on transformational grammar provides details of the theory.

With the publication of Syntactic Structures, Chomsky had argued for, if not established, the efficacy of a transformational component, but he recognized that TG would have to be "formulated properly in . . . terms that must be developed in a full-scale theory of transformations." As a suggestive first step, his appendix provided a sample grammar for a very small subset of English that included 12 content words and fairly elaborate auxiliary verb structures.
The period from 1957 to 1965 was one of intense activity by Chomsky and several students, culminating in 1965 with the publication of Aspects of the Theory of Syntax (8) and its far-reaching theory of deep structure, which relates to an internal sentence-independent representation of (the meaning of) the sentence. Although TG has had an uneven impact on CL, centered mostly around matters of syntax, its influence on early work in CL is evidenced through bibliographic references and, more substantively, by concepts and borrowed terminology that appeared in the CL literature of the 1960s. In the long term the hypothesis of TG most significant for work in CL is that an understanding of the syntax, or structure, of natural-language sentences can be arrived at on a solely grammatical basis, without considering the real-world properties (e.g., meanings) of the terms being discussed. This notion, sometimes known as the "autonomy of syntax," continues to provide a useful, if regrettable, division in categorizing current work in CL, as the debate continues as to what interactions are desirable, or necessary, between the structural (syntactic) and interpretive (semantic, pragmatic) components of a theory or implementation.
is "concernedwith the structure, analysis, organization, stor&Ee,searching, and retrieval of information" and has grown to include proceduresfor "dictionary construction and dictionary look-up, statistical and syntactic language analysis methods, information search and matching procedures,automatic information dissemination systems, and methods for user interaction with the mechanizedsystems"(9). Although little association remained between CL and IR by the middle 1960s,early work in IR did overlap that being doneby the early workers in CL. The evolution of work in IR is chronicledin Refs. 9-12. Broadeninglnterests(1960-1970) In contrast to the 1950s, during which time CL researchers concentratedprimarily on machine translation, the 1960switnessedthe application of CL techniques to databaseretrieval, problem solving, and other areas. For the most part, these early NL systems provided quite limited forms of interaction and were often based on techniques specifically tailored for a single domain of discourse. Nevertheless, the work represented interesting and important, if tenuous' first steps at seeking computational solutions to problems of human language processing.In addition, Raphael notes that these programs "contain the seeds,or at least surfaced the issues,that led to many of today's major computer science concepts:semantic net representations, data abstraction, pattern matchirg, object oriented programming, syntax-driven natural language analysis, logic programming, and so on" (13). 
One important aspectof CL implementations of the 1960s, largely without counterpart in CL work of the 1950s,was that the "processing" to be done required programs to understand their inputs to some nontrivial degree.For example, although Bobrow recognized that "we are far from writing a program that can understand all, or even a very large segment,of English" (L4), he claimed that "a computer understandsa subset of ntrglish if it acceptsinput sentenceswhich are members of this subset and answers questions based on information contained in the input" (15). This issue was not without controversy, however, as suggestedby Giuliano's complaint that an . which is used in several "arbitr ary heuristic procedure not "becomea principle does computer programmed systems" (16). Simmons QD reargument, this To its use" through spondedthat "theory often lags far behind model building and sometimes derives therefrom" and further maintained that the early systems represented "truly scientific approachesto the study of language" (18). The following discussion seeks to convey a sense of the problems addressedby NL applications in the 1960s.They are gfoupedin terms of question-answering,problem-solving,consultation, and miscellaneoussystems.
Question-Answering Systems. One of the first fully implemented data retrieval systems was BASEBALL (qv), "a computer program that answers questions posed in ordinary English about data in its store" (19). This system was designed to interact with a primitive database, stored as attribute-value pairs, that contained information about the month, day, place, teams, and scores for American League baseball games. An example input is "What teams won 10 games in July?"

Another early program was SAD SAM, designed to "parse sentences written in Basic English and make inferences about kinship relations" (20). This system comprised two modules,
COMPUTATIONAL LINGUISTICS
one for parsing (the syntactic appraiser and diagramer, SAD) and one for semantic analysis (the semantic analyzing machine, SAM). The basic operation of the semantics module involved searching a previously constructed parse tree for words denoting kinship relationships in order to construct a family tree, which was stored as a linked structure.

The SIR system had the goal of "developing a computer [program] . . . having certain cognitive abilities and exhibiting some humanlike conversational behavior" (21). The system was similar to SAD SAM in allowing a user to input new information, then ask questions about it. However, SIR emphasized relations such as set-subset, part-whole, and ownership, as suggested by the following: "Every boy is a person. A finger is part of a hand. Each person has two hands. John is a boy. Every hand has 5 fingers. How many fingers does John have?"

The DEACON system, which was designed to answer questions about "a simulated Army environment" (22), represents an important precursor of the database front ends of the 1970s. Its internal "ring"-like data structures could be dynamically updated, thus enabling users to supply new information ("The 425th will leave Ft. Lewis at 21950!") as well as ask questions ("Is the 638th scheduled to arrive at Ft. Lewis before the 425th leaves Ft. Lewis?"). In reflecting upon their experiences with DEACON, the authors noted that "perhaps the most significant new feature needed is the ability to define vocabulary terms in English, using previously defined terms" (23). This realization led directly to the REL system and its successors.

The REL system (Rapidly Extensible Language) represented the logical continuation of the work with DEACON, and its primary goals were "to facilitate the implementation and subsequent user extension and modification of highly idiosyncratic language/database packages" (24).
An example customization is

def:power coefficient:high speed memory size/add time

From a theoretical standpoint, REL was based on the notion that an English language subset could be treated as a formal language "when the subject matter which it talks about is limited to material whose interrelationships are specifiable in a limited number of precisely structured categories" (25). The first sizable application of REL was to an anthropological database at Caltech of over 100,000 items. As indicated below, work on the REL project continued well into the 1970s, until the system, now quite advanced over its early prototypes, was renamed ASK.

Another early database interface, CONVERSE, was designed as an "on-line system for describing, updating, and interrogating data bases of diverse content and structure through the use of ordinary English sentences" (26). It was intended to strike "a reasonable compromise between the difficulties of allowing completely free use of ordinary English and the restrictions inherent in existing artificial languages for data base description and querying" (26). An example input is "Which Pan Am flights that are economy class depart for O'Hare from the city of Los Angeles?" In addition to question-answering capabilities, the system included facilities for English-like data definitions and English-like means of populating the database.
Problem Solving. The STUDENT (qv) system was designed as "a computer program that could communicate with people in a natural language within some restricted problem domain" (14). It sought to solve high-school-level algebra word problems stated in what the author considered a "comfortable but restricted subset of English" by constructing an appropriate set of linear equations to be solved. As an example of STUDENT's capabilities, a sample problem is: The price of a radio is $69.70. If this price is 15% less than the marked price, find the marked price. Having been given equations such as "distance equals speed times time" and by employing a number of clever pattern-matching techniques (e.g., "years older than" is converted to "plus"), STUDENT could answer some very complex problems. Its designer suggested that the system "could be made to understand most of the algebra story problems that appear in first-year high-school textbooks" but noted that "the problems it cannot handle are those having excessive verbiage or implied information about the world not expressible in a single sentence" (27).

In building upon Bobrow's work with STUDENT, Charniak set out to develop a program to solve calculus word problems (28). This led to the implementation of the CARPS system, which was restricted to freshman-level problems related to rates, as Bobrow had in fact suggested in his paper. An example input processed by CARPS follows: Water is flowing into a conical filter at the rate of 15.0 cubic inches per second. If the radius of the base of the filter is 5.0 inches and the altitude is 10.0 inches, find the rate at which the water level is rising when the volume is 100.0 cubic inches (29). In reflecting on his work, Charniak noted that, despite its advances over STUDENT, "a very powerful calculus word problem solver will require a good deal of 'common sense' knowledge" (30). In advancing this claim, Charniak foreshadowed an interest on the part of the AI community in the 1970s in story understanding.
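The kind of phrase-to-operator rewriting attributed to STUDENT can be sketched in a few lines. This is a hypothetical illustration, not STUDENT's actual rules: the rewrite patterns and variable names below are invented for the radio example.

```python
import re

# Hypothetical STUDENT-style rewriting: map linguistic phrases to
# arithmetic operators, then read off a linear equation to solve.
# These patterns are invented for this one problem, not Bobrow's.
REWRITES = [
    (r"(\d+(?:\.\d+)?) percent less than", r"(1 - \1/100) *"),
    (r"the price of a radio", "x"),
    (r"the marked price", "y"),
    (r" is ", " = "),
]

def translate(sentence):
    s = sentence.lower().rstrip(".")
    for pattern, repl in REWRITES:
        s = re.sub(pattern, repl, s)
    return s

# "15 percent less than" becomes an arithmetic relation between x and y:
eq = translate("The price of a radio is 15 percent less than the marked price.")
print(eq)  # x = (1 - 15/100) * y

# With x = 69.70, the marked price follows directly:
print(round(69.70 / (1 - 15/100), 2))  # 82.0
```

The point of the sketch is that no deep analysis is done: the "understanding" consists entirely of string rewriting into a solvable equation, which is also why excessive verbiage defeated the approach.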
Consultation. One of the most widely publicized AI programs of the 1960s, the ELIZA (qv) system, was designed "for the study of natural language communication between man and machine" (31). In essence, ELIZA was intended "to respond roughly as would certain psychotherapists (Rogerians)" since "the psychiatric interview is one of the few examples of categorized dyadic natural language communication in which one of the participating pair is free to assume the pose of knowing almost nothing of the real world" (31). An excerpt from a "typical" conversation with ELIZA follows.

"I need some help, that much seems certain."
WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP
"Perhaps I could learn to get along with my mother."
TELL ME MORE ABOUT YOUR FAMILY
"My mother takes care of me."
WHO ELSE IN YOUR FAMILY TAKES CARE OF YOU
"My father."
YOUR FATHER (31)
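The keyword-and-pattern behavior visible in this excerpt can be sketched as follows. This is a toy illustration: the keywords, priorities, and response templates are invented, not Weizenbaum's actual script.

```python
import re

# A minimal ELIZA-style responder. The script entries are invented
# for illustration; Weizenbaum's real script was far larger.
SCRIPT = [
    # (keyword priority, pattern, response template)
    (2, r"\bI need ([a-z ]+)", "WHAT WOULD IT MEAN TO YOU IF YOU GOT {}"),
    (1, r"\bmy (mother|father|family)\b", "TELL ME MORE ABOUT YOUR FAMILY"),
    (0, r".*", "PLEASE GO ON"),
]

def respond(utterance):
    # Scan keywords in decreasing priority; apply the transformation
    # associated with the first pattern that matches.
    for _, pattern, template in sorted(SCRIPT, key=lambda r: -r[0]):
        m = re.search(pattern, utterance, re.I)
        if m:
            return template.format(*[g.upper() for g in m.groups()])
    return "PLEASE GO ON"

print(respond("I need some help, that much seems certain."))
# WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP
print(respond("Perhaps I could learn to get along with my mother."))
# TELL ME MORE ABOUT YOUR FAMILY
```

Note that nothing resembling understanding occurs: a matched fragment of the input is simply echoed back inside a canned template, which is exactly what made the program's apparent credibility so striking.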
As can be inferred after considering a dozen or so responses from the system, ELIZA sought to match its current input against one of a known set of patterns. It then selected one of
possibly several transformations associated with that pattern. Actually, patterns were associated with a keyword, and the algorithm considered various priorities in choosing among candidate matches. The idea of maintaining a "script" of data separate from the algorithms of the program itself was not without precedent, but ELIZA carried this out more fully than had previous systems. In addition to its technical contributions and the excitement it caused, the system convinced at least its designer that "the whole issue of the credibility (to humans) of machine output demands investigation" (31). This thought led Weizenbaum to his widely publicized social criticisms of AI research (32). An interesting and also famous follow-up of ELIZA, in which the program played the role of the patient rather than the analyst, is reported in Colby et al. (33).

Miscellaneous. Within the tradition of information retrieval established in the 1950s, but with greater attention to syntax and other linguistic issues, Protosynthex sought to accept natural English questions and search a large text to discover the most acceptable sentence, paragraph, or article as an answer (17). The system was applied to portions of Compton's Encyclopedia, and an example of a question posed to the system is "What animals live longer than men?" The project continued for several years and evolved into "a general purpose language processor . . . based on a psychological model of cognitive structure that is grounded in linguistic and logical theory" (34).

A few systems were designed to produce English output, as described by Simmons (17). One system, NAMER, was designed to generate natural-language sentences from line drawings displayed on a matrix (35). It produced sentences such as "the dog is beside and to the right of the boy."
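A NAMER-like mapping from object positions to a spatial-relation sentence might be sketched as follows. The coordinate scheme and phrasing rules here are invented for illustration, not the original system's.

```python
# Generate a sentence describing the spatial relation between two
# named objects placed on a grid. The grid coordinates and the
# relation vocabulary are hypothetical.
def describe(name_a, pos_a, name_b, pos_b):
    x_a, y_a = pos_a
    x_b, y_b = pos_b
    phrases = []
    if y_a == y_b:
        phrases.append("beside")
    if x_a > x_b:
        phrases.append("to the right of")
    elif x_a < x_b:
        phrases.append("to the left of")
    return f"the {name_a} is {' and '.join(phrases)} the {name_b}"

print(describe("dog", (5, 2), "boy", (1, 2)))
# the dog is beside and to the right of the boy
```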
Another system, the Picture Language Machine (36), would be given a picture and a sentence as input and, after translating both the picture and the English statement into a common intermediate logical language, would determine whether the statement about the picture was true. An example input is "all circles are black circles."

Formalism Developments (1965-1970)

In addition to system-building activities, a number of formalisms were developed during the 1960s, especially in the latter half of the decade, relating to linguistic, psychological, and other aspects of natural languages. Based on experiences with previous attempts to construct natural-language-processing systems, and upon developments in linguistics and various areas of AI, these formalisms provided more sophisticated ways of representing the results of a partial or complete analysis of inputs to an NL system. A few of the more important of these formalisms are summarized here, namely, augmented transition networks (see Grammar, augmented transition networks), case grammar (qv), conceptual dependency (qv), procedural semantics (qv), and semantic networks (qv). Further details appear in the individual entries.

By extending the expressiveness of the transition network models described by Thorne et al. (37) and Bobrow and Fraser (38), which were themselves based on the basic finite-state machine model stemming from work in formal language theory, Woods developed an augmented transition network (ATN) model for the syntactic analysis of natural-language sentences
(39). One of the primary advantages of the ATN model over its predecessors rested in its "hold-register" facility, which allowed information to be passed around in a parse tree under construction. This enabled the handling of deeply nested structures and other syntactic complexities. The hold-register facility derives, at least in spirit, from the desire to construct the "deep structure" corresponding to a sentence under analysis, a concept deriving from work in transformational grammar.

The theory of case grammar, as proposed by Fillmore (40), expands on the view that "the sentence in its basic structure consists of a verb and one or more noun phrases, each associated with the verb in a particular case relationship." For instance, Fillmore observes that understanding the sentence "The hammer broke the window" involves recognizing that the noun hammer acts differently from John in "John broke the window." Specifically, it is an instrument ("the inanimate force or object causally involved in the action") rather than an agent ("instigator of the action identified by the verb"). Fillmore's original theory included these and six additional case roles. One important aspect of case grammar theory is its distinction between "surface" roles (e.g., subject) and "deep" cases (e.g., agent or instrument). Bruce (41) provides a survey of ways in which the notion of case grammar was taken up by computationalists in the 1970s.

Having adopted a view that language-processing systems should not produce a syntactic analysis of an input divorced from its meaning, Schank proposed a conceptual dependency (CD) model of language and exhibited its operation in the context of an implemented parsing system (42). Deriving loosely from ideas to be found in Hays (43), Kay (44), and Lamb (45), CD is based on a small number of "conceptual categories," including picture producers (PPs), PP assisters (PAs), actions or abstract nouns (ACTs), and ACT assisters (AAs).
Developments in the original theory, including more sophisticated conceptual categories such as mental information transfer (MTRANS) and ingestion (INGEST), are outlined in Schank (46). In addition to its central role in the development of the MARGIE system, discussed below, CD contributed to philosophical discussions concerning the role of "primitives" in theories of meaning.

In seeking to develop "a uniform framework for performing the semantic interpretation of English sentences" (47), Woods devised a framework that he termed "procedural semantics" that acted as an intermediate representation between a language analyzer, e.g., a question-answering (qv) system, and a back-end database retrieval component. In essence, the idea behind procedural semantics is to define, given a particular database, a collection of "semantic primitives" that comprise a set of predicates, functions, and commands. This strategy was first demonstrated in the context of a hypothetical question-answering system for an airlines reservation system and was soon to be used in building the LUNAR system, as described below.

Motivated by work in linguistics and psychology and attempting to formulate "a reasonable view of how semantic information is organized within a person's memory" (48), Quillian proposed a memory model that has come to be known as a semantic network. Although precursors of semantic networks are to be found in the use of property lists by designers of early NL systems, Quillian provided a theoretical and more formal treatment. In essence, a semantic network consists of a set of "nodes," typically representing objects or concepts, and various "arcs" connecting them that are typically labeled to indicate a relation between nodes. Quillian's initial use of his network structures involved their role in making inferences and finding analogies. Semantic networks have been important not only because of the many systems that incorporate them but also in their contribution to the development in the middle 1970s of various theories of knowledge representation. The evolution of semantic network structures, together with a discussion of applications based on them, is traced by Simmons (49), Findler (50), and Sowa (51).
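Quillian-style node-and-arc structures, and the simple inheritance inference they support, can be sketched as follows. The relation names and facts here are invented for illustration; they are not Quillian's own data.

```python
# A toy semantic network: nodes are concepts, labeled arcs relate
# them. Stored as a dictionary keyed by (node, relation) pairs.
ARCS = {
    ("canary", "isa"): "bird",
    ("bird", "isa"): "animal",
    ("bird", "has-part"): "wings",
    ("canary", "color"): "yellow",
}

def inherit(node, relation):
    """Follow "isa" arcs upward until some node supplies the relation."""
    while node is not None:
        if (node, relation) in ARCS:
            return ARCS[(node, relation)]
        node = ARCS.get((node, "isa"))
    return None

# A canary has no "has-part" arc of its own, but traversal along the
# "isa" chain (canary -> bird) finds one:
print(inherit("canary", "has-part"))  # wings
print(inherit("canary", "color"))     # yellow
```

The traversal in `inherit` is the essence of what made networks attractive: facts are stored once at the most general node and recovered by inference rather than duplicated at every concept.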
A Turning Point (ca. 1970)

In the aftermath of disappointing results from work in machine translation (qv) in particular and the difficulty of constructing sophisticated natural-language-processing systems in general, two natural-language projects in the early 1970s captured a degree of attention that served to boost the confidence of AI researchers regarding the prospects for broadly based, well-grounded NL systems. These projects, which are discussed in turn, were the SHRDLU (qv) system of Winograd (52) and the LUNAR (qv) system described in Woods et al. (53) and Woods (54).
SHRDLU. Winograd's SHRDLU system provided a natural-language interface (qv) to a simulated robot arm in a domain of blocks on a table. The system could handle imperatives such as "Pick up a big red block," questions such as "What does the box contain?" and declaratives such as "The blue pyramid is mine." Since SHRDLU maintained information about its actions, it could also be asked questions such as "Why did you pick up the green pyramid?" to which the system might respond "to clean off the red cube."

The primary design principle of SHRDLU was that syntax, semantics, and reasoning about the blocks world should be combined in understanding natural-language input. The main coordinator of the system was a module (effectively a parser) consisting of a few large programs written in a special programming language called PROGRAMMAR, which was embedded in LISP. These programs corresponded to the basic structures of English (clauses, noun groups, prepositional groups, etc.) and embodied a version of the systemic grammar theory of Halliday (55). A semantics module that was similarly organized coordinated with the parser and made calls to a reasoning system programmed in MICROPLANNER (qv), a theorem-proving (qv) language. Procedural representations for most of the knowledge in the system gave SHRDLU a considerable amount of flexibility to integrate semantic and pragmatic tests, to apply heuristic procedures for anaphora resolution, etc. The success of the procedural representations sparked the procedural-declarative controversy (56), which led to the identification of important knowledge representation issues.

In the final analysis, many have agreed with Wilks (57) that SHRDLU's power seems to derive in large measure from the constraints of its small, closed domain and that the techniques would fail to scale up to larger domains.
Furthermore, the grammatical coverage of SHRDLU was spotty in the sense that "although a large number of syntactic constructions occur at least once in sample sentences appearing in the published dialog, our attempts to combine them into different sentences (involving no new words or concepts) produced few sentences that Winograd felt the system could successfully process" (58). Nevertheless, SHRDLU was an impressive demonstration system that rekindled the hope of truly "natural" language-understanding systems and touched upon many still unsolved research topics.
LUNAR. The task of LUNAR, a system deriving from the work discussed above on procedural semantics, was to provide lunar geologists with a natural-language interface to the Apollo moon rock database. The system had three main components. The first phase formed a syntactic parse using an elaborate ATN grammar and a dictionary of 3500 words. The parser created a deep-structure representation, which was then passed to a rule-driven semantic interpreter. The antecedent of a semantic rule specified a tree fragment to be matched against the deep-structure representation plus semantic conditions on the matched nodes. The right side of a semantic rule was a procedural template for the final, retrieval component. For example, the sentence

What is the average concentration of aluminum in high alkali rocks?

was translated as

(FOR THE X13 / (SEQL (AVERAGE X14 /
  (SSUNION X15 / (SEQ TYPECS) : T ;
    (DATALINE (WHQFILE X15) X15
      (NPR* X16 / (QUOTE OVERALL))
      (NPR* X17 / (QUOTE AL2O3)))) : T)) : T ;
  (PRINTOUT X13)).

The database was a flat file containing 13,000 entries. Run-time performance of the system was acceptable; the sentence above was parsed in just under 5 s. In an informal demonstration of the system at the Second Annual Lunar Science Conference held in Houston in January of 1971, 78% of the 111 requests were handled without error. After correcting minor dictionary coding errors, this rate was improved to 90%.

In discussing the coverage of the system, Woods considered the syntactic coverage to be "very competent" but noted that "if a [lunar geologist] really sat down to use the system to do some research he would quickly find himself wanting to say things which are beyond the ability of the current system" (54). In summary, the LUNAR system demonstrated that a sizable, important database problem could be handled using the techniques of ATNs and procedural semantics.

A Variety of Application Areas (1970-1984)

Following the technical advances of the 1960s and in the wake of the rather dramatic work of Winograd and Woods, the period from the early 1970s witnessed a variety of applied natural-language projects. Application areas include database interface, computer-aided instruction (qv), office automation (qv), automatic programming (qv), and the processing of scientific text. These areas are discussed in turn.

Database Interface. Typical NL database interfaces operate by translating English or other natural-language inputs into a formal database query language to be run against an existing relational or other database management system. For a number of reasons, this formed the most frequent application area
for applied NL work in the 1970s. First, the growing presence of database systems in business and industry resulted in a rapid increase in the number of potential computer users, many of whom preferred not to have to learn a "formal" computer language. Second, the idea of database query followed logically from the question-answering mode of many NL systems of the previous decade. Third, the NL system designer could, by starting with an existing set of data and by assuming an implemented back-end retrieval module, avoid the need to address low-level representation issues.

An early attempt at providing natural-language access to a relational database is the RENDEZVOUS system, which emphasized human factors and concepts deriving from the database world, with less attention to techniques developed in AI and CL. The primary design goal of the system was "to accept queries stated in any English, grammatical or not, rejecting only those that are clearly outside the domain of discourse supported by the data base at hand" (59). To accomplish this, RENDEZVOUS temporarily ignored portions of the input it could not recognize and, to compensate, engaged the user in "clarification dialog" to refine its understanding. A representative initial input to RENDEZVOUS is "I want to find certain projects. Pipes were sent to them in Feb. 1975." To help ensure reliable processing, RENDEZVOUS provided a paraphrase of its current understanding of the user's request. In situations where a phrase was ambiguous to the system, a request for clarification was generated. Although RENDEZVOUS tended to overburden the prospective user with too many seemingly pointless questions (e.g., it thought that "37" in "part 37" might be a quantity on order rather than a part number), it addressed human-factors issues that were taken up by PLANES (qv) and other systems.
A sharp contrast to the previous system in terms of linguistic sophistication is found in TQA, formerly REQUEST (60), whose syntactic processing is based on the principles of transformational grammar. This decision was made "in an attempt to deal with the complexity and diversity that are characteristic of even restricted subsets of natural language" (60). In essence, the parsing involves applying transformations in reverse to reconstruct the deep structure associated with a question to be answered. Details concerning this process, and further motivation for it, are given in Petrick (61). Initial applications included Fortune 500 data and a database on White Plains land usage. During the latter field study, a set of "operating statistics" was collected (62).

A system intermediate between the previous two in terms of linguistic sophistication is LADDER (63), whose syntactic processing was based on a "semantic grammar" designed around the object types of the domain at hand, such as ship and port, rather than linguistically motivated lexical and syntactic categories, such as noun and verb (see Semantic grammar). The system provided an ability "for naive users to input English statements at run time that extend and personalize the language accepted by the system" (64). The specific set of "tools" seeking to "facilitate the rapid construction of natural language interfaces" (64) was called LIFER (qv). The system contained facilities for the user to add synonyms and define paraphrases; mechanisms to handle ellipsis (incomplete inputs) were also provided. For example, after asking "What is the salary of Johnson?" the user could type "position and date hired" and the system would answer the question "What are the position and date hired of Johnson?" The development of
LADDER helped to reassure the CL community that "genuinely useful natural language interfaces can be created and that the creation process takes considerably less effort than might be expected" (64). Hershman et al. (65) describe an experiment in which LADDER was used for a simulated Navy search-and-rescue operation.

Other interesting and important NL database systems were constructed, but space permits only brief descriptions. The TORUS system (66) represented an early attempt at formulating an "integrated methodology" for designing NL database front ends. It employed a semantic network representation, and the prototype was developed around a database of graduate student files. As described in Thompson and Thompson (67), work continued on the REL system mentioned earlier and eventually gave rise to the desk-top systems POL and ASK (68,69). In addition to manifesting refinements and extensions over earlier work, ASK was extended to allow for French as well as English inputs. Results of an experimental study with the REL-ASK family appear in Thompson (70). EUFID (71,72) was also designed to be database independent and, like LADDER, provided a "personal synonym" facility designed to be "forgiving of spelling and grammar errors" (71).

Several systems seeking to address the issue of cooperative response were designed. For instance, the construction of CO-OP (73) was based on the belief that "NL systems must be designed to respond appropriately when they can detect a misconception on the part of the user." For example, the probable presumption of a user who asks, "How many students got a grade of F in CIS 500 in spring 1977?" is that the course was in fact given at the time in question. If this were not the case,
If this were not the case, CO-OP would so inform the user, rather than simply give the literal but misleading answer "nobody."The PLANES system (74) was based on the notion that an effective NL system "must be able to help guide and train the user to frame requests in a form that the system can understand." According to tne designers,the work derives in spirit from Codd'swork (59) with RENDEZVOUS. For instance, PLANES incorporated novel techniques of "conceptcaseframes" fot generating dialogue to flesh out an incomplete understanding of the user's request and "context registers" for handling pronouns and other anaphora. JETS, a successorto PLANES, respondedto some "interesting questions about the conceptual completeness of question-answering systems" (75) that arose during experiments with the earlier system (76). A number of high-quality European systems were developed,each manifesting someinterest in domain independence. The USL system gT was designedin Heidelberg as a domainindependent,German langu agedatabasefront-end. It incorporated a "revised version" of a parser built by Martin Kuy, and "the method of [semantic] interpretation used in the REL system . . was taken as a point of departure" (78). Additional grammars were constructed to enable USL to answer questions posedin English, Dutch, and Spanish as weII as German. A user study with the usl, system was reported by Krause (Zg). PHLIQA was built in Eindhoven to answer English questions about information stored in a CODASYL database (80). It operated on hypothetical data concerning computers installed at European establishments and was intentionally cause to include "some features that structured 'real' databases" in found also are which . and . difficulties (g1). Attention was paid to isolating the parts of the system that dependedon the chosendata base,and an effort was made
"to derive the parts dependent on a data base in a systematic way from the structure of the data base and its subject domain" (81). HAM-ANS (82) was developed in Hamburg as "a robust and cooperative system" to enable natural-language access in German to database and other software services.

The ROBOT system (83,84), rather than emphasizing linguistically complex (e.g., deeply nested) English structures, represented a database front end with concern for report-generation facilities, as suggested by the representative input "Print a report of direct commission, net loss incurred, and change in INBR for region B, sorted by net loss." The system was interesting in its use of the database as a world model in which to carry out disambiguations, but its primary importance is that it led in the late 1970s to a product for mainframe database query, INTELLECT (qv), and the concomitant founding of Artificial Intelligence Corporation.

More recent systems for database and information retrieval, which represent an important direction but for which space does not exist and for which a historical perspective is not yet possible, are mentioned below in the section on current trends. Also, there has been a small amount of attention to providing natural-language facilities for database update, as opposed to query. Examples of this work are found in Salveter (85) and Davidson and Kaplan (86).

Computer-Aided Instruction. The first attempt at incorporating AI and CL techniques into an integrated system for computer-aided instruction (CAI) was the SCHOLAR (qv) program (87). By representing the information to be learned in semantic network structures, this system was designed to be independent of the actual "lesson" at hand, which for the prototype consisted of information about South American geography. In particular, "no specific pieces of text, questions, with their predicted answers, errors, or anticipated branching form part of this data structure" (87).
An example interaction with SCHOLAR follows.

THE LANGUAGE IN BRAZIL IS FRENCH. RIGHT OR WRONG?
"Wrong"
YOU ARE CORRECT. THE LANGUAGE IN BRAZIL IS PORTUGESE.

As indicated, the system included a nontrivial English-generation component. More significant, however, was the provision for "mixed-initiative" dialogues, where the user could ask questions of the system at any time rather than supply an answer for a pending question. For example, if the system had just asked "What is Brazil?", the student might delay answering the question by asking the system to "Tell me something about Peru." In addition, the system allowed its user to switch to a question-answering mode, thus acting as a question answerer for the data at hand. Historically, however, SCHOLAR was most important in establishing the feasibility of mixed-initiative NL systems.

In continuing the work begun with the initial SCHOLAR prototype, Collins et al. (88) extended the system's basic matching mechanisms to enable it to operate in situations in which requested information is missing. The resulting system would sometimes make inferences and sometimes inform the questioner that the requested information was not available.
In the latter case, related or partial information might be given. As an example, the system would respond to the question "Is Bolivia a capital?" by answering "No, Bolivia is a country, not a capital." In formulating this response, SCHOLAR has used its knowledge that country and city are mutually exclusive object types and only a city can be a capital.

Another extension to the work of SCHOLAR is represented by the SOPHIE (qv) system, which sought to encourage "a wider range of student initiatives" by presenting "a 'reactive' environment in which the student learns by trying out his ideas rather than by instruction" (89). To explore this idea, SOPHIE was designed as an electronics-lab troubleshooting system that would "mimic the roles of a human teacher" (89). To use the system, a student would receive a schematic diagram of a power supply and be told the control settings of a simulated faulty circuit. An example interaction follows.

"What is the output?"
THE OUTPUT VOLTAGE IS 11.7 VOLTS.
"What is it in a working circuit?"
IN A WORKING CIRCUIT THE OUTPUT VOLTAGE IS 19.9 VOLTS.
"What is the voltage between R8 and Q1?"
THE VOLTAGE AT N1 IS 47.6 VOLTS.

When the student finally settled on a diagnosis of the problem and recommended a course of action, such as "replace Q6," SOPHIE would seize the initiative and begin asking questions of the student. In addition, SOPHIE could alter the parameters of its simulated circuit, thus providing "what-if" capabilities. Despite the fact that it was carefully designed for a troubleshooting application by the use of a "semantic" grammar and thus lacked the domain independence of SCHOLAR, SOPHIE contained a variety of non-NL capabilities that themselves proved to be interesting and important.

Subsequent to the efforts reported above, Weischedel et al. (90) constructed a system to aid students in learning first-year German.
The designers were interested, among other things, in enabling computers to deal with ungrammatical sentences, and in their chosen setting, it was mandatory for the system to respond meaningfully to inputs that were linguistically flawed as well as to those that were factually incorrect. An example of such a response follows (the system's question translates as "Where did Miss Moreau learn German?").

WO HAT FRAULEIN MOREAU DEUTSCH GELERNT?
"Sie hat es gelernt in der Schule."
ERROR: PAST PARTICIPLE MUST BE AT END OF CLAUSE. A CORRECT ANSWER WOULD HAVE BEEN: SIE HAT DEUTSCH IN DER SCHULE GELERNT.

In addition to detecting incorrect grammar in the context of an otherwise acceptable response, the system was able to recognize when an input was incorrect or, more subtly, correct but not fully responsive to the question. As suggested above, the tutoring program dealt with reading comprehension, and the prototype was applied to several "lessons," each consisting of a
COMPUTATIONAL LINGUISTICS
paragraph. Concerning generality, the designers pointed out that "the texts that appear in foreign language textbooks very rapidly surpass the ability of artificial intelligence systems" (90). They also observed that "there does not seem to be any way to tune the system to particular types of errors," which means that an instructor would have to construct each lesson by hand, unlike for the semantic network approach adopted for SCHOLAR (which carefully avoided storing textual information). Perhaps the most significant outcome of the project was to demonstrate ways in which "ill-formedness" can extend to morphological, semantic, and pragmatic problems as well as to syntactic ones. The ILIAD system (91) was conceived as a way of helping instruct people having a language-delaying handicap (e.g., deafness) or who are learning English as a second language. It included a powerful English generator based on the transformational grammar model.

Office Automation. The SCHED system was based on techniques and formalisms developed for the automatic programming system NLPQ described below and represented an initial study of "the feasibility of developing systems which accomplish typical office tasks by means of human-like communication with the user" (92). Although the long-range goal of SCHED was to provide an on-line system to review and update one's own desk calendar and those of fellow office workers, the implemented system was restricted to information pertaining to a single user. An example input for SCHED is

Schedule a meeting, Wed, my office, 2 to 2:30, with my manager and his manager, about 'a demo'.

to which the system would respond by stating in English its understanding of the input. Subject to user verification, the system would issue an appropriate command to a resident calendar management system.
In situations where a user input failed to supply all necessary information, SCHED was able to ask for specific information, thus providing for mixed-initiative conversations reminiscent of the previously mentioned work in CAI. The GUS system, similar in spirit to SCHED, though quite different in its methods, was "intended to engage a sympathetic and highly cooperative human in an English dialog, directed towards a specific goal within a very restricted domain of discourse" (93). In particular, GUS played the role of a travel agent able to assist a user in making a round trip from a city in California. Although its implementation was apparently less robust than SCHED's, its designers suggested that "the system is interesting because of the phenomena of natural dialog that it attempts to model and because of its principles of program organization" (93). The VIPS system, which seeks to "allow a user to display objects of interest on a computer terminal and manipulate them via typed or spoken English imperative sentences" (94), is unusual in that it incorporates a hardware voice recognizer into an NLP. Initial applications have been to the numerical domain of its predecessor (the automatic programming system NLC) and to text editing, where objects may be referenced either in English or by use of a touch-sensitive display screen. ARGOT represents a "long-term" research project seeking to "partake in an extended English dialogue on some reasonably well specified range of topics" (95). The initial task domain for ARGOT was that of a computer-center operator.
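The mixed-initiative, slot-filling style of interaction described for SCHED can be sketched as follows. This is a minimal illustration, not SCHED's actual design: the slot names, patterns, and system phrasings are all invented for the example.

```python
import re

# Slots a scheduling request must fill before a calendar command can be issued.
REQUIRED = ["day", "time", "place"]

PATTERNS = {
    "day":   r"\b(Mon|Tue|Wed|Thu|Fri|Sat|Sun)\w*\b",
    "time":  r"\b\d{1,2}(:\d{2})?\s*(to|-)\s*\d{1,2}(:\d{2})?\b",
    "place": r"\b(my|his|her) office\b",
}

QUESTIONS = {
    "day":   "ON WHAT DAY SHOULD THE MEETING BE SCHEDULED?",
    "time":  "AT WHAT TIME SHOULD THE MEETING BE SCHEDULED?",
    "place": "WHERE SHOULD THE MEETING BE HELD?",
}

def respond(request):
    """Fill whatever slots the input supplies; ask about the first missing one."""
    filled = {slot: m.group(0)
              for slot, pat in PATTERNS.items()
              if (m := re.search(pat, request, re.IGNORECASE))}
    for slot in REQUIRED:
        if slot not in filled:
            return QUESTIONS[slot]  # the system takes the initiative
    return (f"SCHEDULING A MEETING: {filled['day']}, "
            f"{filled['time']}, {filled['place']}.")

print(respond("Schedule a meeting, Wed, my office, 2 to 2:30"))
print(respond("Schedule a meeting, my office, 2 to 2:30"))
```

When a slot is missing, the program returns a clarifying question rather than failing, which is the essence of the mixed-initiative behavior described above.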
The UC system is designed as "an intelligent natural-language interface that allows naive users to communicate with the UNIX(TM) operating system in ordinary English" (96). It answers questions such as "How do I print the file fetch.l on the line printer?" Finally, some research has applied CL techniques to the analysis, as opposed to the processing, of written texts. One such system is the Writer's Workbench (97), which, upon scanning the draft of a document, flags words, phrases, and sentence types that have been felt to negatively affect the readability of the text. For example, the previous sentence could be improved by eliminating the passive voice ("have been felt") and unsplitting the infinitive ("to negatively affect"). Another system, CRITIQUE, formerly EPISTLE (qv) (98), similarly performs stylistic analyses but also diagnoses several classes of grammatical errors. For example,

Your statement of deficiencies have not been completed. (should be has)

Automatic Programming. The term automatic programming (qv) refers to the effort to relieve a computer user from the details needed to get the machine to perform a desired task. In certain ways attempts at a natural-language solution to this problem are similar to the work on problem-solving systems (STUDENT and CARPS) described earlier, although no specific techniques from these systems appear to have been employed. A survey of early NL work in automatic programming appears in Heidorn (99). One early system, described by Gelb (100), was designed to answer simple probability questions such as "What is the probability of getting two or more heads or exactly three tails when four coins are tossed once?" Another early system is NLPQ (101), which produced simulation programs in GPSS for a class of queueing problems. An example of an initial specification of a user's problem follows:

When a vehicle arrives at a station, it leaves there immediately if the length of the line at a pump in the station is not less than 2.
Seventy-five percent of the vehicles are cars and a fourth are trucks. There is just one pump. A simulation run time of 8 h is desired.

At this point the system would take control and begin asking questions to clarify and extend what it had been told. For example, it would begin by asking "How often do the vehicles arrive at the station," and the user might answer "The arrivals of vehicles are normally distributed with a mean of 8 minutes." Having acquired all requisite information, NLPQ would first produce an English statement of its understanding of the entire problem and then a complete program for the task. As reported in Green (102), the PSI system included modules for English input and output, along with many other components, in a large-scale automatic programming system. An excerpt from a session with the system follows.

"[The program] first inputs a concept. Then it repeatedly accepts an input scene from the user, classifies it based on whether or not it fits the concept, and outputs this classification to the user."
WHAT IS THE STRUCTURE OF THE CONCEPT?
"A concept is a collection of relations. Each relation is just a string."
WHAT IS THE EXIT TEST OF THE LOOP?
Due to the complexities involved in program synthesis, PSI was tested only on problems related to sorting, graph reachability, and determining prime numbers. Unlike the preceding work, the NLC system (103) chose to regard an English input as a statement in a very-high-level program. The system acted as an interpreter, providing an interactive matrix-domain programming environment with visual feedback as each English command was processed. An example input follows.

"Subtract six from the next to last positive number in the row that was multiplied by 5."

In addition to simple imperatives, NLC provided for loops ("repeat"), conditionals ("if . . ."), and procedure definitions ("define a way to . . ."). An experimental study of programming with the system is described in Biermann et al. (94), and an application of the system for college sophomore-level linear algebra instruction is discussed in Geist et al. (104).

Scientific Text Processing. Based on many years of work developing a comprehensive grammar for English (105), a group of researchers constructed a system intended "to allow the health care worker to create [a] medical report in the most natural way - in medical English, using whatever syntax is appropriate to the information" (106,107). After gathering reports in English and converting them to a textual database form, the system could be interrogated as though it were a conventional database system, again using English for inputs. Some examples of the types of inputs gathered in a clinical setting are the following:
X-rays taken 3-22-65 reveal no evidence of metastatic disease.

Chest X-ray on 8-12-69 showed no metastatic disease.

3-2-65 chest film shows clouding along left thorax and pleural thickening.

and an example question is "Did every patient have a chest X-ray in 1975?" A distinctive feature of the system is its method of creating so-called information format structures, which are similar to structured database records but capture the information initially supplied in textual form. Defining an information format for a particular application involves, first, isolating word classes by syntactic properties, e.g., the verbs reveal and show are alike in taking X-ray as a subject, although X-ray and film are alike in taking show as a verb, and, second, defining the columns of the table from the word classes so that any input sentence will have a paraphrase like that shown above (108). The system, which is unusual in containing aspects of both database and information retrieval, was subsequently adapted to the domain of navy messages, as described in Marsh and Friedman (109).

Current Trends

Domain-Independent Implementations. Several domain-independent database systems have already been mentioned, but the intensity of effort at enhancing the transportability of systems for this and other application areas should be noted. In particular, a number of projects are seeking either to allow users themselves to carry out a customization or to have the system adapt itself automatically to a user or a domain of discourse. Representative examples of this work include Haas and Hendrix (110), Hendrix and Lewis (111), Mark (112), Thompson and Thompson (68,69), Wilczynski (113), Warren and Pereira (114), Bates (115), Ginsparg (116), Grosz (117), Ballard et al. (118), and Grishman et al. (119). Also, several papers deriving from a recent workshop on transportability (120) have appeared, including Damerau (121), Hafner and Godden (122), Marsh and Friedman (109), Slocum and Justus (123), and Thompson and Thompson (124).

The Reemergence of Machine Translation. As indicated earlier, the ALPAC report of 1966 nearly eliminated U.S. government funding of projects in machine translation (MT). Naturally, this caused a marked decline in the amount of work being done in the area and in the number of papers published. Nevertheless, due in part to progress in AI and other areas of CL, a gradual resurgence of interest in MT has occurred over the past decade, and the field gives evidence of becoming well populated once again. A bibliography of about 480 publications since 1973 related to MT can be found in Slocum (125), along with summary papers on several of the major full-scale translation systems in existence. Papers from a recent conference on MT (126) are also available.

The Commercialization of NLP. As indicated above, Harris's database front-end, ROBOT, became the proprietary software of Artificial Intelligence Corporation in the late 1970s. Under the name INTELLECT, this system was for several years virtually the only natural-language product on the market. In the early 1980s, however, several well-known NL researchers, including Gary Hendrix and Roger Schank, formed or became associated with start-up ventures. In recent years products from these and other companies have been appearing for database and other applications [e.g., a developing expert system interface is discussed in Lehnert and Schwartz (127)]. More recently researchers at Carnegie-Mellon University and other academic institutions have formed companies; Texas Instruments has produced a menu-based natural-language-like interface; at least one project at BBN Laboratories is slated for commercial release; and other corporate flirtations are occurring. In addition to database query, machine translation systems are also being sold, and prospects exist for additional application areas. All of these activities, as well as an overview of ongoing research into the theoretical and applied side of CL, are reported in Johnson (128).

Theoretical Issues

Having thus far conducted a chronological review of projects within CL that relate more or less directly to specific applications, this section provides a brief overview of some of the major theoretical topics associated with CL. The discussion relates specific systems to the theoretical issues, but it primarily emphasizes theoretical techniques and formalisms that have contributed to the classification of research in CL. The topics are parsing and grammatical formalisms, semantics, discourse understanding, text generation, cognitive modeling, language acquisition, and speech understanding. Further information is available in separate articles.

Parsing and Grammatical Formalisms. Parsing (qv) issues have been of central interest in CL since its inception, when CL included the study of formal languages and programming languages, as well as natural languages. As the term is used
here, parsing refers to the process of assigning structural descriptions to an input string. Classically, parsers have used various forms of phrase structure grammars and have assigned phrase structure markers to produce derivation trees. Parsers with access to semantic and pragmatic knowledge, however, may build semantic descriptions directly without explicitly creating derivation trees.

Direction of Analysis. Parsing strategies are often classified as top-down (or goal-driven) if they begin with the start symbol and backward chain from the consequents of rules to their antecedents. Recursive-descent parsers, the PROLOG execution procedure for definite-clause grammars (DCGs) (129), and the usual execution procedure for augmented transition network (ATN) grammars (39) all use top-down approaches (see Processing, bottom-up and top-down). Bottom-up (or data-driven) techniques proceed in a forward direction from the terminal symbols (words) in the grammar toward the start symbol. Left-corner (including shift-reduce) parsers (130), word-based parsers (131) and their descendants (132), chart parsers (44,133-136), and deterministic parsers (137,138) are primarily bottom-up. Parsers can also be classified according to how they analyze the input string: from left to right, from right to left, or from arbitrary positions in the middle outward. The left-to-right ordering is simple and natural and lends itself to easy bookkeeping. It is also of theoretical interest for parsers that attempt to model aspects of cognitive processes, such as attention focusing, that are dependent on temporal ordering. Middle-out, bottom-up parsers have been used particularly in speech systems (139,140), where the parser can use its analysis in regions of greatest certainty to help in noisy or unintelligible regions, which would cause trouble for a rigid left-to-right parser.
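The top-down, left-to-right strategy can be illustrated with a minimal recursive-descent recognizer. The toy grammar and lexicon below are invented for the sketch; a real system would, of course, use a far larger grammar and build structure rather than merely recognize.

```python
# Toy grammar: nonterminals map to lists of alternative expansions.
# Symbols not in the table are terminals (words).
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"]],
    "N":   [["dog"], ["cat"]],
    "V":   [["saw"]],
}

def parse(symbol, words, i):
    """Top-down, left to right: yield every input position reachable
    by deriving `symbol` starting at words[i:]."""
    if symbol not in GRAMMAR:                 # terminal: match one word
        if i < len(words) and words[i] == symbol:
            yield i + 1
        return
    for expansion in GRAMMAR[symbol]:         # nondeterministic choice point
        frontier = [i]
        for child in expansion:
            frontier = [k for j in frontier for k in parse(child, words, j)]
        yield from frontier

def recognizes(sentence):
    words = sentence.split()
    return any(j == len(words) for j in parse("S", words, 0))

print(recognizes("the dog saw the cat"))   # True
print(recognizes("the dog the cat"))       # False
```

Because `parse` is a generator, exhausting it enumerates all alternatives, which corresponds to the backtracking behavior discussed below: failure of one expansion simply moves on to the next.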
Parsing techniques bear a close relationship to grammatical formalisms, although a particular grammar or class of grammars can sometimes be parsed in a variety of ways. ATN grammars, for example, have been parsed by both top-down/left-right and bottom-up/middle-out methods. Another technique for matching grammar and parser is to preprocess a grammar into an equivalent grammar suitable for a particular parsing method.

Search and Nondeterminism. Controlling the search effort for a parse and handling nondeterminism are major problems for parsers. Many actual parsers use a blend of top-down and bottom-up techniques. As a simple example of this, almost every recursive-descent (top-down) parser uses some kind of (bottom-up) scanner to identify the tokens in the input. Part-of-speech classifications in a lexicon are a form of bottom-up information. Another method for improving top-down parsing techniques is the use of a precomputed left-branching reachability matrix that can be used to decide whether the next input symbol can appear in the leftmost branch of a derivation tree headed by a particular nonterminal. Word expert parsing (qv) (132) uses an idiosyncratic combination of top-down expectations and bottom-up processing. The three principal methods for dealing with nondeterminism are backtracking, parallelism, and transforming the grammar so that a deterministic algorithm (perhaps using bounded look-ahead) can efficiently parse it. Backtracking parsers pursue one alternative at a choice point and return to select another alternative on failure of the first one. Forcing failure after a successful parse can cause the backtracking
parser to find additional parses. The standard execution procedure of PROLOG provides such a facility directly for logic grammars such as DCGs, which can be represented as PROLOG programs. Backtracking techniques are especially popular and natural for context-free grammar formalisms. Context-free grammars, which have a single consequent nonterminal, lend themselves to backward-chaining execution methods that work nicely in conjunction with backtracking. Parallel parsing methods keep track of multiple derivations at each point in the processing. The derivations can be developed concurrently using sequential algorithms and machines or truly in parallel if multiple processing resources are available. Interest in parallel approaches has increased as parallel hardware has become available; it has also been buoyed by the resurgence of connectionist and neural network research (141-143). Chart parsing (qv) algorithms use a particularly efficient way of recording which derivations have already been found to cover substrings of the input string. By splitting the computation at choice points, backtracking methods can also be parallelized. Another alternative, explored by Marcus (137), is factoring the rules in such a way that limited look-ahead is sufficient to resolve most of the nondeterminism. Although not all of English, for example, can be treated deterministically in this manner, the parser interestingly fails on many of the same "garden path" sentences that cause people trouble. Cases of lexical and structural ambiguity that the parser cannot resolve are left for other modules. Word-based parsing systems generally attempt to incorporate enough knowledge to determine a unique interpretation. When ambiguity cannot be resolved without look-ahead, two possibilities present themselves. One technique is to let a later constituent complete the interpretation; for example, verbs can be responsible for assigning the role of the subject noun phrase.
Another solution is to spawn demons that check for the appearance of disambiguating items. For example, sense-specific demons might check for the presence of particular particles to detect multiword verbs.

Grammatical Formalisms. Many different grammatical formalisms have been used by natural-language-processing systems. One of the earliest systems, the Harvard Syntactic Analyzer (144), recognized context-free grammars. Transformational grammar (TG) theories had a direct influence on the Mitre (145) and Petrick (146) parsers and an indirect influence on many others. The UCLA grammar combined TG with case grammar theory (147). In simplest terms, transformational grammars specify a set of (usually context-free) base phrase structure rules, a set of structure-mapping rules, and various conditions, filters, or principles that generated structures must satisfy. Since TG is stated as a generative theory, parsers must try to guess which transformations must have been applied by effectively inverting the rules. This has proved to be quite difficult in practice. One of the most comprehensive computer grammars of English has been developed at the Linguistic String Project (105). The grammar consists of a set of 180 BNF phrase structure rules, 180 restriction rules that check feature conditions, string-transformation rules, and ellipsis rules. Additional sublanguage categorizations are added to the lexicon together with domain-specific restriction rules to increase parsing efficiency. ATN grammars (39) have been very influential on computational approaches to language processing. They augment recursive transition-network grammars, which recognize context-free languages, with actions and tests that give them the recognition power of Turing machines. With suitable self-restraint, however, one can produce disciplined, well-structured grammars in many different grammar-writing styles. The LUNAR grammar (53) was a quite detailed, large grammar (see Grammar, ATN). Other augmented phrase structure formalisms include the DIAMOND grammars at SRI [e.g., DIAGRAM (148)] and the APSG (augmented phrase structure grammar) formalism used in the CRITIQUE system developed at IBM (149). The systemic grammar theory of Halliday (55) has been incorporated in many NLP systems, notably in Winograd's SHRDLU system (52) and in the large NIGEL grammar (150). More recent work in linguistics has revived interest in nontransformational theories of phrase structure grammars, particularly context-free grammars. These theories hold that notational augmentations to phrase structure grammars can express such difficult, "transformational" phenomena as movement nontransformationally and, furthermore, in most cases, that there exist equivalent context-free grammars. From a parsing perspective it is most useful if the augmentations can be processed on the fly with little overhead above that required for context-free parsing. The augmentations include metarules, complex features, and principles of feature instantiation (151). The major theoretical frameworks include generalized phrase structure grammar (GPSG) (152), tree adjoining grammar (TAG) (153), head grammar (HG) (154), lexical functional grammar (LFG) (155), and functional unification grammar (FUG) (156).

Providing for Ungrammaticality. It has been observed that "while significant progress has been made in the processing of correct text, a practical system must be able to handle many forms of ill-formedness gracefully" (157).
When the ill-formedness in question is syntactic in origin and when the expected deviations can be grouped into a manageable number of classes, it is possible to prepare for "errors" by explicitly including extra rules in the system grammar so that a predictably deviant input is in fact treated as though it were grammatical. Due to the possible ambiguities that this practice introduces, and to confront general situations where either the full range of errors cannot be predicted or the intended meaning cannot be recovered, more sophisticated mechanisms are called for. Attacks on the problem of ungrammaticality are represented by work described by Weischedel and Black (158), Hayes and Mouradian (159), Kwasny and Sondheimer (160), Jensen et al. (161), Weischedel and Sondheimer (162), Granger (163), and Fink and Biermann (164). It is also worth noting that, for some applications, grammatical errors are part of the problem being addressed rather than a regrettable accident. Examples include the German-language CAI system and the text-critiquing systems mentioned earlier. In addition, most AI work in speech understanding is fundamentally concerned with the rampant and perhaps inherent uncertainties associated with speech-recognition devices. These uncertainties actually make error conditions the rule rather than the exception.

Semantics. Semantics concerns the study of meaning. In the context of CL, this most often relates to problems of finding
and representing the meaning of natural-language expressions. The previous discussion has already touched on several approaches to semantics, including conceptual dependency, procedural semantics, and semantic networks. Some others are preference semantics and other decompositional systems, Montague semantics, and situation semantics. Details on each of these can be found in the entry on semantics. Among the more significant questions to be asked of an approach to semantics, at least insofar as its relevance to CL is concerned, are what sorts of noncompositionality the system involves and what role, if any, is played by primitives. In essence, the idea behind compositional semantics is to determine the meaning of an entire unit under analysis (phrase, sentence, text) in a systematic (ideally simple) way from the meanings of its parts. This approach has obvious advantages in terms of being tractable for incorporation into an automated scheme for language understanding. The idea behind primitives (qv) (in its strong sense) is to determine a finite set of terms that by themselves can express the meaning of any word and, by implication, the meaning of any utterance. Of the semantic schemes discussed earlier, conceptual dependency (qv) adheres to this goal, whereas procedural semantics (qv) does not. Many interesting debates on these and other issues of semantics have occurred, as discussed by Jackendoff (165). In building entire NL systems, many designers have attempted to separate syntax from semantics by performing syntactic analysis first and then converting the resulting structure (produced by the parser) to a meaning representation (see Natural-language understanding). In other cases, however, the two processes have been much more tightly integrated. Whereas problems in CL related to syntax have largely involved issues also addressed by the field of conventional (as opposed to computational) linguistics, problems of semantics have typically concerned work in philosophy.
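The compositional idea, that the meaning of a sentence is computed systematically from the meanings of its parts, can be sketched with a toy truth-conditional fragment. The denotations below are invented for illustration and are not drawn from any of the systems discussed.

```python
# Meanings of the parts: names denote individuals; a transitive verb
# denotes a set of ordered pairs. The "facts" here are illustrative only.
DENOTATION = {
    "brazil": "brazil",
    "peru": "peru",
    "spanish": "spanish",
    "borders": {("brazil", "peru"), ("peru", "brazil")},
    "speaks": {("peru", "spanish")},
}

def mean_np(word):
    """An NP's meaning is simply its denotation."""
    return DENOTATION[word]

def mean_s(subj, verb, obj):
    """Semantic rule paired with S -> NP V NP: the sentence's meaning
    (a truth value) depends only on the meanings of its three parts."""
    return (mean_np(subj), mean_np(obj)) in DENOTATION[verb]

print(mean_s("brazil", "borders", "peru"))    # True
print(mean_s("brazil", "speaks", "spanish"))  # False
```

Each syntactic rule is paired with exactly one semantic operation, so the analysis scales to larger fragments without case-by-case machinery; the noncompositional phenomena mentioned above are precisely those that resist this pairing.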
In the context of AI, important work related to natural-language semantics is to be found in the area of knowledge representation (qv).

Discourse Understanding. Discourse understanding includes natural-language-processing phenomena that span individual sentences in multisentence texts or dialogue. The work in discourse understanding acknowledges that the syntactic and semantic representations of sentences in discourse contexts relate both explicitly (e.g., by clue words such as now, but, anyway) and implicitly (e.g., by world knowledge) to the representations of other sentences in the discourse. As an example of how an amazing amount of complexity can enter into even simple interchanges, consider the following brief dialogue:

Q: Can you tell me where John is?
A: Oh, he was hungry for one of Joe's pizzas. He'll be back soon.

The petitioner's use of a yes-no question is an example of an indirect speech act. In an indirect speech act one illocutionary act is performed indirectly by way of performing another (166-168). The yes-no question is interpreted as a form of politeness instead of a more direct utterance such as "Where is John?" Grice (169) noticed that conversational participants follow cooperative principles that he subcategorized as quantity (be informative), quality (be truthful), relation (be relevant), and manner (be brief). The response in the example above meets Gricean notions of appropriateness, but it, too, is indirect in communicating both where John is and why he is there. To infer John's location from his state of hunger and desire requires a plan and goal analysis from pragmatic (extralinguistic) knowledge. The final part of the answer responds to an inferred petitioner's goal of being copresent with John by suggesting that he will be back soon. In cooperative conversations Grice noted that speakers caused listeners to make certain inferences, which he termed conversational implicatures. Hirschberg (170) has studied a class of implicatures called scalar implicatures. In the sentence "Some people left early," for example, the hearer may reasonably conclude that "not all people left early." A cooperative response occasionally requires that faulty presuppositions in the question be corrected. For a database query such as "How many juniors failed CS 200?" an answer of "none" is misleading if there were no juniors enrolled. The CO-OP system performed this type of presupposition checking (73). Another type of cooperative response involves informing the user of discontinuities (171). In a flight reservation database one might want to know of any flights leaving before noon. It might be helpful to suggest one at 12:05 P.M. or one the next or previous day if none are otherwise available. A natural idea for discourse understanding was to extend some of the concepts of grammars and schemas from sentence parsing to discourse. Conversation-related work includes the Susie Software system (172,173) and discourse ATN grammars (174). In story understanding Rumelhart (175) and Coreira (176) developed the idea of story grammars.
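The presupposition-checking style of cooperative response attributed to CO-OP can be sketched as follows. The enrollment data, names, and answer phrasings are invented for the example; this is not the CO-OP implementation.

```python
# Toy enrollment records, invented for illustration.
ENROLLED = {"cs200": {"alice", "bob"}, "cs100": {"carol", "dave", "erin"}}
JUNIORS  = {"carol", "dave"}            # no junior is enrolled in CS 200
FAILED   = {"cs200": {"bob"}, "cs100": {"carol"}}

def how_many_juniors_failed(course):
    """Answer 'How many juniors failed <course>?' cooperatively:
    correct the faulty presupposition that juniors were enrolled,
    rather than giving a misleading bare 'none'."""
    juniors_enrolled = ENROLLED[course] & JUNIORS
    if not juniors_enrolled:
        return "NO JUNIORS WERE ENROLLED IN " + course.upper() + "."
    return str(len(juniors_enrolled & FAILED[course]))

print(how_many_juniors_failed("cs200"))  # NO JUNIORS WERE ENROLLED IN CS200.
print(how_many_juniors_failed("cs100"))  # 1
```

The check costs one extra set intersection, yet it is the difference between a literally true answer and a cooperative one.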
Many of the language-understanding systems of Schank and his students use knowledge structures [such as scripts, plans, memory organization packets (MOPs), and thematic abstraction units (TAUs)] to guide discourse-understanding processes. Dialog-games (177) were an attempt to use a goal-centered theory for dialogues. Litman also has integrated work on planning and discourse (178). Focus is an important technical notion in discourse work that relates to the shifts in attention during comprehension. Focus influences many aspects of language understanding, including choice of topic, syntactic ordering, and anaphoric reference. Grosz (179) did pioneering work on global focus, i.e., how attention shifts over a set of discourse utterances. Immediate focus represents how attention shifts over two consecutive sentences. Sidner (180) used focus to disambiguate definite anaphora by tracking three things: the immediate focus of the sentence, a potential focus list created from discourse entities in the sentence, and the past immediate foci in a focus stack. The resolution of anaphora is an important problem within discourse understanding. Early techniques principally used a simple history list of discourse entities combined with a heuristic method for selecting them (often a variation of the most recently encountered entity satisfying the reference). The simple techniques are inadequate, largely because they fail to account for focus effects and because discourse referents do not have to be explicitly mentioned (e.g., the referent of he in the sentence "I got stopped yesterday for speeding, but he didn't give me a ticket"). Besides Sidner's technique described above, there are several other notable approaches (see Ref. 181 for a more detailed account). Other methods include concept activatedness (182), task-oriented dialogue techniques (179), logical representations (183), and discourse cohesion (184,185).

Text Generation. Text generation is the process of translating internal representations into surface forms. The forms of internal representations have included deep structure, semantic networks, conceptual dependency graphs, and deduction trees. The strategic component of a generation system chooses what to say: the message to be conveyed, including any propositional attitudes. The tactical component determines how to say it. The earliest systems generated sentences at random to test grammars (186,187). Later AI efforts used generation techniques as a part of paraphrase systems, which parsed input strings into meaning representations and then generated back out into surface representations. Klein (188) used dependency grammars that generated a semantic dependency tree and a standard phrase structure derivation tree. Dependency trees from multiple sentences were related by nominal coreference links. A generation grammar matched portions of the dependency trees. Simmons and Slocum (189) produced sentences from a semantic network using an ATN modified for generation. Eventually, a parser was added to fully automate the paraphrase process (49). Similarly, Heidorn (190) reported an algorithm based on an augmented phrase structure grammar for producing English noun phrases to identify nodes in a semantic network. Goldman (191) used a discrimination net for conceptual dependency graphs. The net tested the primitive action types and roles to select an appropriate surface verb. This generator was later used as a part of the MARGIE system
(192). The generation technique in SHRDLU (52) is an example of the template-based approach that has predominated in generation techniques. The program used several types of patterned responses, including completely canned phrases such as "ok," parameterized phrases such as "sorry, I don't know the word ," and more complex parameterizations that involved substitution of determiners, discourse phrases, and dictionary definitions. Small programs were responsible for formatting the descriptions of objects and events. For example, the definition for the event PUTON was

(APPEND (VBFIX (QUOTE PUT)) OBJ1 (QUOTE (ON)) OBJ2)

A heuristic pronominal substitution mechanism improved the quality of the responses, allowing for the generation of noun phrases as complex as "the large green one that supports the pyramid."

Although most generation systems of the 1970s used techniques similar to SHRDLU, two different, important generation programs appeared in 1974. Davey's PROTEUS program (193) described tic-tac-toe games. The program had a rich understanding of the tactics of the game and could provide natural summarizations at an appropriate, high level. An example is "I threatened you by taking the middle of the edge opposite that and adjacent to the one I had just taken but you blocked it and threatened me." The ERMA program (194) embodied a cognitive model of human generation that mimicked the real-time false starts and patching of utterances. The model was developed by studying transcripts of psychoanalysis sessions to determine a patient's reasoning patterns. As an example,
COMPUTATIONAL LINGUISTICS
the program generated "you know for some reason I just thought about the bill and payment" as a gentle way of beginning to argue that "you shouldn't give me a bill."

Interest in generation work has revived in the 1980s with a number of new research projects. Mann et al. (195) provide a survey of text generation projects. Some of the major projects are as follows. The transformational grammar generation system of Bates and Ingria (91), a very syntactically powerful generator, was used in a CAI application. McDonald's generator, MUMBLE (196), models spoken language and concentrates on the fluency and coverage of the tactical component. The KDS system (197) used a "fragment-and-compose" paradigm in which the knowledge structure is divided into small propositional units, which are then composed into large textual units. Mann and Mattheissen (150) used a systemic grammar (NIGEL) for the tactical component in a text generation system. In describing a system for generating stock market reports, Kukich (198) proposed a "knowledge-intensive" approach to sentence generation; similarly specialized techniques form the basis of the generator for the previously mentioned UC project (199). The KAMP system (200) views generation as a planning problem of proving what to say. In her TEXT system, McKeown (201) adapted ideas of text schemas and focus from discourse-understanding research to the task of answer generation in a natural-language database system.
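The template style exemplified by SHRDLU's PUTON definition above can be sketched in a few lines. The following Python fragment is purely illustrative (the TEMPLATES table, the role names obj1/obj2, and the describe function are inventions for this sketch, not code from SHRDLU or any system discussed here); it shows how canned words and role slots combine into a surface sentence:

```python
# Minimal sketch of template-based response generation in the style of
# SHRDLU (illustrative only; not Winograd's actual code).
# Each event type maps to a template mixing literal words and role slots.

TEMPLATES = {
    # rough analogue of (APPEND (VBFIX (QUOTE PUT)) OBJ1 (QUOTE (ON)) OBJ2)
    "PUTON": ["PUT", "<obj1>", "ON", "<obj2>"],
    "GRASP": ["GRASPED", "<obj1>"],
}

def describe(event, roles):
    """Fill an event template with noun-phrase descriptions of its roles."""
    words = []
    for item in TEMPLATES[event]:
        if item.startswith("<"):          # a role slot such as <obj1>
            words.append(roles[item.strip("<>")])
        else:                             # a literal word from the template
            words.append(item.lower())
    return "I " + " ".join(words) + "."

print(describe("PUTON", {"obj1": "the red block", "obj2": "the green cube"}))
# -> I put the red block on the green cube.
```

Real systems layered further mechanisms, such as SHRDLU's pronominal substitution, on top of such templates, which is how responses as complex as "the large green one that supports the pyramid" were produced.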
Cognitive Modeling. In the late 1960s at Stanford, Roger Schank, while working on a parser for an automated psychiatrist project with Kenneth Colby, developed a meaning representation known as conceptual dependency (CD). Having been exposed to machine translation as a graduate student, Schank was convinced that more of the underlying meaning of sentences needed to be represented. In particular, certain inferences were included in the CD graphs. The basic scheme was centered on approximately a dozen primitive action concepts. The translation of "X hit Y," for example, was approximately "X propelled some Z from X to Y which resulted in the state of Y and Z being in physical contact." The first fairly complete system, MARGIE, included a parser (conceptual analyzer), an inferencer, and a text generation system (192).

In an interesting early retrospective of the CD paradigm, Schank offered this perspective on the situation that he faced in the late 1960s (202):

Thus, my point was that Chomsky was wrong in claiming that we should not be attempting to build a point by point model of a speaker-hearer. Such a model was precisely what I felt should be tackled. Linguists viewed this as performance and thus uninteresting. I took my case to psychologists and found them equally uninterested. Psychologists interested in language were mostly psycholinguists, and psycholinguists for the most part bought the assumptions of transformational grammar (although it seemed very odd to me that given the competence/performance distinction, psychologists should be on the side of competence).

Schank's emphasis on semantic representations was supported by others [notably the work on preference semantics by Wilks (203)] but has been slow to make a large impact on practical systems. Perhaps the slow acceptance was a result of methodology [the "free form speculation approach to theory building" (202)], of general attitudes inherited from linguistic theory, of the emphasis in CD systems on the I/O behavior of programs instead of formal computational models, and of the difficulty in discovering and representing conceptual knowledge structures.

One problem that plagued the inferencer in MARGIE was how to control the potential inferences that could be made. Later CD-based systems made inferences organized from knowledge sources such as scripts (204,205), plans and goals (206), beliefs (207), episodic memory (qv) (208), and thematic abstraction units (209). Scripts provide prepackaged causal and temporal links for stereotypical situations. For less structured situations the links are created dynamically by a plan and goal analysis. Inferences are also affected by one's beliefs (e.g., conservative/liberal political beliefs) and memory of past events. Although many of the ideas of schematic inference and planning are being incorporated in recent work, identifying and integrating a wide range of semantic and pragmatic representations remains a difficult problem for AI and CL.

Language Acquisition. Computational language acquisition (qv) research subdivides in much the same way that AI research generally does. Some researchers attempt to automate the acquisition of linguistic expertise by any efficacious method; other work is explicitly aimed at cognitive modeling and tries to be faithful to the psycholinguistic data on language acquisition. Most of the language-learning systems are concerned primarily with learning syntactic rules.

New computational approaches to language acquisition have generally followed developments in linguistics or natural-language-processing techniques. The ZBIE system (210) learned foreign language rules from input pairs consisting of a semantic representation and a surface string. For example, the representation (be (on table hat)) was paired with the sentence "The table is on the hat." To the extent that the appropriate syntactic structure of a sentence bears a particular relationship to the semantic structure, the semantic representation can guide in the induction of syntactic rules. Anderson's graph deformation condition (211) is a statement of this principle. Klein's AUTOLING program (212) derived a transformational grammar in cooperation with a linguist informant. The derived grammars contained context-free phrase structure rules and transformations. Harris (213) produced a language-learning system for a simulated robot. The system performed lexicalization, the process of mapping words to concepts, and the induction of a Chomsky normal-form grammar. Berwick (214) investigated learning transformational grammar rules of the sort embodied in a Marcus parser.

Reeker (215) explicitly modeled a child's acquisition of language with a problem-solving theory. The grammar was represented by context-free syntactic rules paired with a semantic representation modeled after conceptual dependency notation. The system received as input an "adult sentence" and its meaning. A heuristic reduction process formed a reduced sentence, which was then compared against a "child sentence" produced from the meaning by the child's current grammar. If a difference in the derived sentences was obtained, the grammar was adjusted. The AMBER system (216) similarly compares input sentences to internally generated sentences to
identify discrepancies. The CHILD system (217) receives an adult sentence and a conceptual dependency representation of visual input. The model builds lexical definitions similar to those of other word-based parsers.

The psychologist John R. Anderson has made many contributions to language acquisition research. His LAS system (211) accepted sentence-scene description pairs and learned an ATN grammar that was used for both recognition and generation. The scene descriptions were encoded in the HAM associative network representation (218). Following this work, he developed a series of cognitive models and learning theories based on a hybrid architecture, called ACT (adaptive control of thought). An elaborate version of the model, ACT* (219), uses a production system to control spreading activation processes in a semantic network. Anderson has studied the learning of production rules for language generation, which is viewed as a problem-solving activity in ACT*.

Speech Understanding. The problem of understanding spoken natural language involves virtually all of the issues discussed above as well as others of its own (see Speech understanding).

BIBLIOGRAPHY

1. W. Weaver, in W. Locke and A. Booth (eds.), Machine Translation of Languages, MIT Press, Cambridge, MA, pp. 15-23, 1955.
2. W. Locke and A. Booth (eds.), Machine Translation of Languages, MIT Press, Cambridge, MA, 1955.
3. A. Oettinger, Automatic Language Translation, Harvard University Press, Cambridge, MA, 1960.
4. Y. Bar-Hillel, The Present Status of Automatic Translation of Languages, in F. Alt (ed.), Advances in Computers, Vol. 1, Academic Press, New York, pp. 102-103, 1960.
5. National Research Council, Language and Machines: Computers in Translation and Linguistics, Report by the Automated Language Processing Advisory Committee (ALPAC), National Academy of Sciences, Washington, DC, p. 19, 1966.
6. N. Chomsky, Syntactic Structures, Mouton, The Hague, 1957.
7. Reference 6, p. 34.
Further Reading

In addition to the many references already cited and the discussions and references in related articles, Feigenbaum and Feldman (220) and Minsky (221) contain descriptions of, and Simmons (17,34) discusses, early work in natural-language processing; Rustin (222) and Zampolli (223) consider the status of several question-answering systems of the early to middle 1970s; Kaplan (224) contains brief summaries of several dozen projects underway in the early 1980s; and the brief articles in Johnson and Bachenko (225) give prospects for work in several areas of CL. Tennant (173) provides a fairly broad introduction to natural-language processing and contains technical details and historical remarks, as do the articles in Barr and Feigenbaum (226) and Lehnert and Ringle (227). Grishman (228) provides a general introduction to technical problems in the field; matters of parsing and grammatical formalisms are discussed in King (229), Winograd (230), Sparck Jones and Wilks (231), and Dowty et al. (232); an interesting discussion of cognitive approaches to semantics is Jackendoff (165); Brady and Berwick (233) contains papers on discourse. Schank and Riesbeck (234) and Simmons (235) present the actual mechanisms by which specific processors have been constructed. Harris (236) has written a recent textbook on natural-language processing (see Natural-language understanding).
Many articles have appeared in conference proceedings, including the annual meeting of the ACL, the biennial International Conference on Computational Linguistics (COLING), conferences sponsored by the American Association for Artificial Intelligence (AAAI), the biennial International Joint Conference on AI (IJCAI), a Conference on Applied Natural Language Processing, and two conferences on Theoretical Issues in Natural Language Processing. A primary journal is Computational Linguistics (formerly the American Journal of Computational Linguistics), and other important journals include Artificial Intelligence, the Canadian Journal of Artificial Intelligence, and Cognitive Science.
8. N. Chomsky, Aspects of the Theory of Syntax, MIT Press, Cambridge, MA, 1965.
9. G. Salton, Automatic Information Organization and Retrieval, McGraw-Hill, New York, 1968.
10. J. Becker and R. Hayes, Information Storage and Retrieval: Tools, Elements, Theories, Wiley, New York, 1963.
11. D. Hays (ed.), Readings in Automatic Language Processing, American Elsevier, New York, 1966.
12. K. Sparck Jones and M. Kay, Linguistics and Information Science, Academic Press, London, 1973.
13. B. Raphael, Hewlett-Packard, personal communication, July 1983.
14. D. Bobrow, Natural Language Input for a Computer Problem-Solving System, in M. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, pp. 133-215, 1968.
15. Reference 14, p. 146.
16. V. Giuliano, "Comments on the article by Simmons," CACM 8(1), 69 (1965).
17. R. Simmons, "Answering English questions by computer: A survey," CACM 8(1), 53 (1965).
18. Reference 17, p. 70.
19. B. Green, A. Wolf, C. Chomsky, and K. Laughery, BASEBALL: An Automatic Question Answerer, in E. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963.
20. R. Lindsay, Inferential Memory as the Basis of Machines which Understand Natural Language, in E. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, p. 221, 1963.
21. B. Raphael, SIR, a Computer Program for Semantic Information Retrieval, in M. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, p. 33, 1968.
22. J. Craig, S. Berezner, C. Homer, and C. Longyear, DEACON: Direct English Access and Control, AFIPS 1966 Fall Joint Computer Conference, p. 366.
23. Reference 22, p. 376.
24. F. Thompson, P. Lockemann, B. Dostert, and R. Deverill, REL: A Rapidly Extensible Language System, ACM National Conference, p. 400, 1969.
25. Reference 24, p. 404.
26. C. Kellogg, A Natural Language Compiler for On-line Data Management, AFIPS 1968 Fall Joint Computer Conference, pp. 473-492.
27. Reference 14, p. 204.
28. E. Charniak, Computer Solution of Calculus Word Problems, Proceedings of the First International Joint Conference on Artificial Intelligence, Washington, DC, pp. 303-316, 1969.
29. Reference 28, p. 305.
30. Reference 28, p. 309.
31. J. Weizenbaum, "ELIZA: A computer program for the study of natural language communication between man and machine," CACM 9(1), 36-45 (1966).
32. J. Weizenbaum, Computer Power and Human Reason, W. H. Freeman, San Francisco, CA, 1976.
33. K. Colby, S. Weber, and F. Hilf, "Artificial paranoia," Artif. Intell. 2, 1-25 (1971).
34. R. Simmons, "Natural language question answering systems: 1969," CACM 13(1), 15-30 (1970).
35. R. Simmons and D. Londe, NAMER: A Pattern Recognition System for Generating Sentences about Relationships between Line Drawings, Report TM-1798, System Development Corp., Santa Monica, CA, 1964.
36. R. Kirsch, Computer Interpretation of English Text and Picture Patterns, IEEE Trans. Electron. Comput. 13, 363-376 (1964).
37. J. Thorne, P. Bratley, and H. Dewar, The Syntactic Analysis of English by Machine, in D. Michie (ed.), Machine Intelligence, Vol. 3, American Elsevier, New York, pp. 281-299, 1968.
38. D. Bobrow and B. Fraser, An Augmented State Transition Network Analysis Procedure, Proceedings of the First International Joint Conference on Artificial Intelligence, Washington, DC, pp. 557-567, 1969.
39. W. Woods, "Transition network grammars for natural language analysis," CACM 13, 591-606 (October 1970).
40. C. Fillmore, The Case for Case, in E. Bach and R. Harms (eds.), Universals in Linguistic Theory, Holt, Rinehart and Winston, New York, pp. 1-90, 1968.
41. B. Bruce, "Case systems for natural language," Artif. Intell. 6, 327-360 (1975).
42. R. Schank and L. Tesler, A Conceptual Parser for Natural Language, International Joint Conference on Artificial Intelligence, pp. 569-578, 1969.
43. D. Hays, "Dependency theory: A formalism and some observations," Language 40, 511-524 (1964).
44. M.
Kay, Experiments with a Powerful Parser, Proceedings of the Second International Conference on Computational Linguistics, Grenoble, August 1967.
45. S. Lamb, "The semantic approach to structural semantics," Am. Anthropol. (1964).
46. R. Schank, "Conceptual dependency: A theory of natural language understanding," Cog. Psychol. 3, 552-631 (1972).
47. W. Woods, Procedural Semantics for a Question-Answering System, AFIPS 1968 Fall Joint Computer Conference, pp. 457-471.
48. M. Quillian, Semantic Memory, in M. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, pp. 216-270, 1968.
49. R. Simmons, Semantic Networks: Their Computation and Use for Understanding English Sentences, in R. Schank and K. Colby (eds.), Computer Models of Thought and Language, W. H. Freeman, San Francisco, CA, pp. 63-113, 1973.
50. N. Findler (ed.), Associative Networks: Representation and Use of Knowledge in Computers, Academic Press, New York, 1979.
51. J. Sowa, Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley, Reading, MA, 1984.
52. T. Winograd, Understanding Natural Language, Academic Press, New York, 1972.
53. W. Woods, R. Kaplan, and B. Nash-Webber, The Lunar Sciences
Natural Language Information System: Final Report, Report 2378, Bolt Beranek and Newman, Cambridge, MA, 1972.
54. W. Woods, Lunar Rocks in English: Explorations in Natural Language Question Answering, in A. Zampolli (ed.), Linguistic Structures Processing, North-Holland, Amsterdam, pp. 521-569, 1977.
55. M. Halliday, "Categories of the theory of grammar," Word 17, 241-292 (1961).
56. T. Winograd, Frame Representations and the Declarative-Procedural Controversy, in D. Bobrow and A. Collins (eds.), Representation and Understanding, Academic Press, New York, pp. 185-210, 1975.
57. Y. Wilks, Natural Language Understanding Programs Within the A.I. Paradigm: A Survey and Some Comparisons, in A. Zampolli (ed.), Linguistic Structures Processing, North-Holland, Amsterdam, pp. 341-398, 1977.
58. S. Petrick, On Natural-Language Based Computer Systems, in A. Zampolli (ed.), Linguistic Structures Processing, North-Holland, Amsterdam, pp. 313-340, 1977. Also appears in IBM J. Res. Dev. 20(4), 314-325 (1976).
59. E. Codd, R. Arnold, J. Cadiou, C. Chang, and N. Roussopoulos, Seven Steps to RENDEZVOUS with the Casual User, in J. Klimbie and K. Koffeman (eds.), Data Base Management, North-Holland, pp. 179-200, 1974.
60. W. Plath, "REQUEST: A natural language question-answering system," IBM J. Res. Dev. 20(4), 326-335 (1976).
61. S. Petrick, Transformational Analysis, in R. Rustin (ed.), Natural Language Processing, Algorithmics, New York, pp. 27-41, 1973.
62. F. Damerau, "Operating statistics for the Transformational Question Answering System," Am. J. Computat. Ling. 7(1), 30-44 (1981).
63. G. Hendrix, E. Sacerdoti, D. Sagalowicz, and J. Slocum, "Developing a natural language interface to complex data," ACM Trans. Database Sys. 3(2), 105-147 (1978).
64. G. Hendrix, Human Engineering for Applied Natural Language Processing, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 183-191, 1977.
65. R. Hershman, R. Kelley, and H.
Miller, User Performance with a Natural Language Query System for Command Control, Tech. Report TR 79-7, Navy Personnel Research and Development Center, San Diego, CA, 1979.
66. J. Mylopoulos, A. Borgida, P. Cohen, N. Roussopoulos, J. Tsotsos, and H. Wong, TORUS: A Natural Language Understanding System for Data Management, Proceedings of the Fourth IJCAI, Tbilisi, Georgia, pp. 414-421, 1975.
67. F. Thompson and B. Thompson, Practical Natural Language Processing: The REL System as Prototype, in M. Rubinoff and M. Yovits (eds.), Advances in Computers, Vol. 13, Academic Press, New York, pp. 109-168, 1975.
68. F. Thompson and B. Thompson, Shifting to a Higher Gear in a Natural Language System, National Computer Conference, pp. 657-662, 1981.
69. B. Thompson and F. Thompson, Introducing ASK, a Simple Knowledgeable System, Conference on Applied Natural Language Processing, Santa Monica, CA, pp. 17-24, 1983.
70. B. Thompson, Linguistic Analysis of Natural Language Communication with Computers, Proceedings of the Eighth International Conference on Computational Linguistics, Tokyo, pp. 190-201, 1980.
71. M. Templeton, EUFID: A Friendly and Flexible Frontend for Data Management Systems, Proceedings of the Seventeenth Annual Meeting of the ACL, pp. 91-93, 1979.
72. M. Templeton and J. Burger, Problems in Natural-Language Interface to DBMS with Examples from EUFID, Conference on Applied Natural Language Processing, Santa Monica, CA, pp. 3-16, 1983.
73. S. Kaplan, Indirect Responses to Loaded Questions, Theoretical Issues in Natural Language Processing, Vol. 2, pp. 202-209, 1978.
74. D. Waltz, "An English language question answering system for a large relational database," CACM 21(7), 526-539 (1978).
75. T. Finin, B. Goodman, and H. Tennant, JETS: Achieving Completeness through Coverage and Closure, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 275-281, 1979.
76. H. Tennant, Experience with the Evaluation of Natural Language Question Answerers, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 874-876, 1979.
77. H. Lehmann, "Interpretation of natural language in an information system," IBM J. Res. Dev. 22(5), 560-571 (1978).
78. Reference 77, p. 560.
79. J. Krause, Results of User Study with the User Specialty Language System and Consequences for the Architecture of Natural Language Interfaces, Technical Report 79.04.003, IBM Heidelberg Scientific Center, 1979.
80. W. Bronnenberg, S. Landsbergen, R. Scha, W. Schoenmakers, and E. van Utteren, "PHLIQA-1, a question-answering system for data-base consultation in natural English," Philips Tech. Rev. 38, 229-239, 269-284 (1978-1979).
81. Reference 80, p. 230.
82. W. Hoeppner, T. Christaller, H. Marburger, K. Morik, B. Nebel, M. O'Leary, and W. Wahlster, "Beyond domain-independence," Proceedings of the Eighth International Joint Conference on AI, Karlsruhe, FRG, pp. 588-594, 1983.
83. L. Harris, "User-oriented data base query with the Robot natural language system," Int. J. Man-Mach. Stud. 9, 697-713 (1977).
84. L. Harris, "The ROBOT system: Natural language processing applied to data base query," ACM Natl. Conf., 165-172 (1979).
85. S.
Salveter, Natural Language Database Updates, Proceedings of the Twentieth Annual Meeting of the ACL, University of Toronto, pp. 67-73, 1982.
86. J. Davidson and S. Kaplan, "Natural language access to data bases: Interpreting update requests," Am. J. Computat. Ling. 9(2), 57-68 (1983).
87. J. Carbonell, "AI in CAI: An artificial intelligence approach to computer-assisted instruction," IEEE Trans. Man-Mach. Sys. 11, 190-202 (1970).
88. A. Collins, E. Warnock, N. Aiello, and R. Miller, Reasoning from Incomplete Knowledge, in D. Bobrow and A. Collins (eds.), Representation and Understanding, Academic Press, New York, pp. 383-415, 1975.
89. J. Brown and R. Burton, Multiple Representations of Knowledge for Tutorial Reasoning, in D. Bobrow and A. Collins (eds.), Representation and Understanding, Academic Press, New York, pp. 312-313, 1975.
90. R. Weischedel, W. Voge, and M. James, "An artificial intelligence approach to language instruction," Artif. Intell. 10, 225-240 (1978).
91. M. Bates and R. Ingria, Controlled Transformational Sentence Generation, Proceedings of the Nineteenth Annual Meeting of the ACL, Stanford University, pp. 153-158, 1981.
92. G. Heidorn, Natural Language Dialogue for Managing an On-line Calendar, Proceedings of the Annual Meeting of the ACM, Washington, DC, pp. 45-52, 1978.
93. D. Bobrow, R. Kaplan, M. Kay, D. Norman, H. Thompson, and
T. Winograd, "GUS: A frame-driven dialog system," Artif. Intell. 8(2), 155-173 (1977).
94. A. Biermann, B. Ballard, and A. Sigmon, "An experimental study of natural language programming," Int. J. Man-Mach. Stud. 18(1), 71-87 (1983).
95. J. Allen, A. Frisch, and D. Litman, ARGOT: The Rochester Dialogue System, Proceedings of the Second National Conference on Artificial Intelligence, Carnegie-Mellon University and University of Pittsburgh, Pittsburgh, PA, pp. 66-70, 1982.
96. R. Wilensky, Talking to UNIX in English: An Overview of UC, Proceedings of the Second Annual Conference on Artificial Intelligence, Pittsburgh, PA, pp. 103-105, 1982.
97. N. MacDonald, L. Frase, P. Gingrich, and S. Keenan, "The Writer's Workbench: Computer aids for text analysis," IEEE Trans. Commun. 30, 105-110 (January 1982).
98. G. Heidorn, K. Jensen, L. Miller, R. Byrd, and M. Chodorow, "The EPISTLE text-critiquing system," IBM Sys. J. 21(3), 305-326 (1982).
99. G. Heidorn, "Automatic programming through natural language dialogue: A survey," IBM J. Res. Dev. 20(4), 302-313 (1976).
100. J. P. Gelb, Experience with a Natural Language Problem-Solving System, Proceedings of the Second International Joint Conference on Artificial Intelligence, London, pp. 455-462, 1971.
101. G. Heidorn, Natural Language Inputs to a Simulation Programming System, Ph.D. Dissertation, Technical Report NPS55HD72101A, Naval Postgraduate School, Monterey, CA, 1972.
102. C. Green, A Summary of the PSI Program Synthesis System, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 380-381, 1977.
103. A. Biermann and B. Ballard, "Toward natural language computation," Am. J. Computat. Ling. 6(2), 71-86 (1980).
104. R. Geist, D. Kraines, and P. Fink, Natural Language Computation in a Linear Algebra Course, Proceedings of the National Educational Computer Conference, pp. 203-208, 1982.
105. N.
Sager, Natural Language Information Processing: A Computer Grammar of English and Its Applications, Addison-Wesley, Reading, MA, 1981.
106. L. Hirschman, R. Grishman, and N. Sager, From Text to Structured Information: Automatic Processing of Medical Reports, Proceedings of the AFIPS National Computer Conference, pp. 267-275, 1976.
107. R. Grishman and L. Hirschman, "Question answering from natural language medical data bases," Artif. Intell. 11, 25-43 (1978).
108. N. Sager, Natural Language Information Formatting: The Automatic Conversion of Texts to a Structured Data Base, in M. Yovits (ed.), Advances in Computers, Vol. 17, Academic Press, New York, pp. 89-162, 1978.
109. E. Marsh and C. Friedman, "Transporting the linguistic string project system from a medical to a Navy domain," ACM Trans. Ofc. Inform. Sys. 3(2), 121-140 (1985).
110. N. Haas and G. Hendrix, An Approach to Acquiring and Applying Knowledge, Proceedings of the First National Conference on Artificial Intelligence, Stanford University, Stanford, CA, pp. 235-239, 1980.
111. G. Hendrix and W. Lewis, Transportable Natural-Language Interfaces to Databases, Proceedings of the Nineteenth Annual Meeting of the ACL, Stanford University, pp. 159-165, 1981.
112. W. Mark, Representation and Inference in the Consul System, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, BC, pp. 375-381, 1981.
113. D. Wilczynski, Knowledge Acquisition in the Consul System, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, BC, pp. 135-140, 1981.
114. D. Warren and F. Pereira, "An efficient easily adaptable system
for interpreting natural language queries," Am. J. Computat. Ling. 8(3-4), 110-122 (1982).
115. M. Bates, Information Retrieval Using a Transportable Natural Language Interface, Proceedings of the International ACM SIGIR Conference, Bethesda, MD, pp. 81-86, 1983.
116. J. Ginsparg, A Robust Portable Natural Language Data Base Interface, Proceedings of the Conference on Applied Natural Language Processing, Santa Monica, CA, pp. 25-30, 1983.
117. B. Grosz, TEAM: A Transportable Natural Language Interface System, Proceedings of the Conference on Applied Natural Language Processing, Santa Monica, CA, pp. 39-45, 1983.
118. B. Ballard, J. Lusth, and N. Tinkham, "LDC-1: A transportable, knowledge-based natural language processor for office environments," ACM Trans. Ofc. Inf. Sys. 2(1), 1-25 (1984).
119. R. Grishman, N. Nhan, E. Marsh, and L. Hirschman, Automated Determination of Sublanguage Syntactic Usage, Proceedings of the International Conference on Computational Linguistics, Stanford, pp. 96-98, July 1984.
120. B. Ballard (ed.), "Special issue on transportable natural language processing," ACM Trans. Ofc. Inf. Sys. 3(2), 104-230 (1985).
121. F. Damerau, "Problems and some solutions in customization of natural language database front ends," ACM Trans. Ofc. Inf. Sys. 3(2), 165-184 (1985).
122. C. Hafner and K. Godden, "Portability of syntax and semantics in DATALOG," ACM Trans. Ofc. Inf. Sys. 3(2), 141-164 (1985).
123. J. Slocum and C. Justus, "Transportability to other languages," ACM Trans. Ofc. Inf. Sys. 3(2), 204-230 (1985).
124. B. Thompson and F. Thompson, "ASK is transportable in half a dozen ways," ACM Trans. Ofc. Inf. Sys. 3(2), 185-203 (1985).
125. J. Slocum (ed.), "Special issues on machine translation," Computat. Ling. 11(2-4) (1985).
126. S. Nirenburg (ed.), Machine Translation: Theoretical and Methodological Issues, Cambridge University Press, New York, 1987.
127. W. Lehnert and S. Schwartz, EXPLORER: A Natural Language Processing System for Oil Exploration, Proceedings of the Conference on Applied Natural Language Processing, Santa Monica, CA, pp. 69-72, 1983.
128. T. Johnson, Natural Language Computing: The Commercial Applications, Ovum, London, 1985.
129. F. Pereira and D. H. D. Warren, "Definite clause grammars for language analysis: A survey of the formalism and a comparison with augmented transition networks," Artif. Intell. 13, 231-278 (1980).
130. D. Chester, "A parsing algorithm that extends phrases," Am. J. Computat. Ling. 6(2), 87-96 (1980).
131. C. Riesbeck, Conceptual Analysis, in R. Schank (ed.), Conceptual Information Processing, North-Holland, Amsterdam, pp. 83-156, 1975.
132. S. Small, Parsing and Comprehending with Word Experts (A Theory and Its Realization), in W. Lehnert and M. Ringle (eds.), Strategies for Natural Language Processing, Lawrence Erlbaum, Hillsdale, NJ, 1982.
133. D. Younger, "Recognition and parsing of context-free languages in time n3," Inf. Contr. 10, 189-208 (1967).
134. J. Earley, "An efficient context-free parsing algorithm," CACM 13(2), 94-102 (February 1970).
135. R. Kaplan, A General Syntactic Processor, Algorithmics, New York, 1973.
136. W. Ruzzo, S. Graham, and M. Harrison, "An improved context-free recognizer," ACM Trans. Program. Lang. Sys. 2(3), 415-462 (July 1980).
137. M. Marcus, A Theory of Syntactic Recognition for Natural Language, MIT Press, Cambridge, MA, 1980.
138. R. Milne, "Resolving lexical ambiguity in a deterministic parser," Computat. Ling. 12(1), 1-12 (1986).
139. W. Woods, "Optimal search strategies for speech understanding control," Artif. Intell. 18, 295-326 (1982).
140. L. Erman, F. Hayes-Roth, V. Lesser, and D. Reddy, "The Hearsay-II speech understanding system," Computing Surveys 12, 213-253 (1980).
141. G. Cottrell, A Model of Lexical Access of Ambiguous Words, Proceedings of the Fourth Conference of the AAAI, Austin, TX, pp. 61-67, August 1984.
142. M. Jones and A. Driscoll, Movement in Active Production Networks, Proceedings of the Twenty-Third Annual Meeting of the Association for Computational Linguistics, pp. 161-166, July 1985.
143. D. Waltz and J. Pollack, "Massively parallel parsing," Cog. Sci. 9(1), 51-74 (1985).
144. S. Kuno, "The predictive analyzer and a path elimination technique," CACM 8, 453-462 (1965).
145. A. Zwicky, J. Friedman, B. Hall, and D. Walker, The MITRE Syntactic Analysis Procedure for Transformational Grammars, AFIPS Proceedings Fall Joint Computer Conference, Spartan, Washington, DC, pp. 317-326, 1965.
146. S. Petrick, A Recognition Procedure for Transformational Grammars, Ph.D. Dissertation, MIT, Cambridge, MA, 1965.
147. R. Stockwell, P. Schachter, and B. Partee, The Major Syntactic Structures of English, Holt, Rinehart and Winston, New York, 1973.
148. J. Robinson, "DIAGRAM: A grammar for dialogues," CACM 25(1), 27-47 (January 1982).
149. G. Heidorn, Augmented Phrase Structure Grammars, in B. Webber and R. Schank (eds.), Theoretical Issues in Natural Language Processing, Cambridge, MA, pp. 1-5, 1975.
150. W. Mann and C. Mattheissen, Nigel: A Systemic Grammar for Text Generation, in Freedle (ed.), Systemic Perspectives on Discourse: Selected Theoretical Papers of the Ninth International Systemic Workshop, Ablex, Norwood, NJ, 1985.
151. M. Kay, Functional Grammar, Proceedings of the Fifth Annual Meeting of the Berkeley Linguistic Society, pp. 142-158, 1979.
152. G. Gazdar, Phrase Structure Grammar, in P. Jacobson and G. Pullum (eds.), The Nature of Syntactic Representation, D. Reidel, Dordrecht, pp. 131-186, 1982.
153. A. Joshi, How Much Context-Sensitivity Is Required to Provide Reasonable Structural Descriptions: Tree Adjoining Grammars, in D. Dowty, L. Karttunen, and A. Zwicky (eds.), Natural Language Processing: Psycholinguistic, Computational and Theoretical Properties, Cambridge University Press, New York, 1984.
154. E. Proudian and C. Pollard, Parsing Head-Driven Phrase Structure Grammar, Proceedings of the Twenty-Third Annual Meeting of the Association for Computational Linguistics, pp. 8-12, July 1985.
155. J. Bresnan and R. Kaplan, Lexical-Functional Grammar: A Formal System for Grammatical Representation, in J. Bresnan (ed.), The Mental Representation of Grammatical Relations, MIT Press, Cambridge, MA, 1982.
156. M. Kay, Functional Unification Grammar: A Formalism for Machine Translation, Proceedings of Coling 84, Menlo Park, pp. 75-78, 1984.
157. J. Allen (ed.), "Special issue on ill-formed input," Am. J. Computat. Ling. 9(3-4), 123-196 (1983).
158. R. Weischedel and J. Black, "Responding intelligently to unparsable inputs," Am. J. Computat. Ling. 6(2), 97-109 (1980).
159. P. Hayes and G. Mouradian, "Flexible parsing," Am. J. Computat. Ling. 7(4), 232-242 (1981).
160. S. Kwasny and N. Sondheimer, "Relaxation techniques for pars-
150
COMPUTATIONALLINGUISTICS ing ill-formed input," Am. J. Computat. Ling. 7(2), 99-108 (1981).
161. K. Jensen, G. Heidorn, L. Miller, and Y. Ravin, "Parse fitting and prose fixing: Getting a hold on ill-formedness,"Am . J. Computat. Ling. 9(3-4), I47 -160 (1983). L62. R. Weischedel and N. Sondheimer, "Meta-rules as a basis for processingill-formed output," Am. J. Computat.Ling. 9(3-4), 1 6 1 - 1 7 7( 1 9 8 3 ) . 163. R. Granger, "the NOMAD system: Expectation-baseddetection and correction of errors during understanding of syntactically and semantically ill-formed texti' Am. J. Computat.Ling.9(34), 188-196 (1983). 164. P. Fink and A. Biermann, "Correction of ill-formed input using history-based expectation with applications to speech understanding,"Computat.Ling. 12(1),13-36 (1986).
185. A. Lockman, Contextual ReferenceResolution, Ph.D. Dissertation, Columbia University, May 1978. 186. V. Yngve, Random Generation of English Sentences,Proceedings of the International Conferenceon Machine Translation of Languages and Applied Language Analysis, National Physical Laboratory, Symposium No. 13, Her Majesty's Stationery Office, London,pp. 66-80, L962. 187. J. Friedman, "Directed random generation of sentences,"CACM 12(1),40-46 (1969). 188. S. Klein, "Automatic paraphrasing in essay format," Mechan. Transl. 8(3), 68-83 (1965). 189. R. Simmons and J. Slocum, "Generating English discoursefrom semantic networks," CACM 15(10),891-905 (L972).
190. G. Heidorn, Generating Noun Phrasesto Identify Noun Phrases in a Semantic Network, Proceedings of the Fifth International Joint Conferenceon Artifi,cial Intelligence, Cambridge, Mass., p. 165. R. Jackendoff, Semanticsand Cognition, MIT Press,Cambridge, r43, 1977. MA, 1983. 166. J. Searle,Indirect SpeechActs, in P. Morgan and J. Cole (eds.), 191. N. Goldman, Conceptual Generation. in R. Schank, Conceptual Information Processing, North-Holland, Amsterdam, pp. 289Syntax and Semantics,Vol .3, SpeechActs,Academic Press,New 371 1975. York, pp. 59-82, 1975. lg2. R. Schank, Conceptual Information Processing,with contribuL67. P. Cohen and C. Perrault, "Elements of a plan-basedtheory of tions from N. Goldman, C. Rieger, and C. Riesbeck,Vol. 3 of speechacts," Cog.Sci 3, I77-2L2 (L979). Studies in Computer Science,North-Holland, AmFundamental in utterances," intention Perrault, "Analyzing C. and 168. J. Allen 1975. sterdam, Artif. Intell. 15(3), L43-L78 (1980). Production, Edinburgh University Press, 169. H. Grice, Logic and Conversation, in P. Morgan and J. Cole 193. A. Davey, Discourse Edinburgh, L979. (eds.), Syntax and Semantics, Vol. 3, Speech Acts, Academic Lg4. J. Clippinger, "speaking with many tongues: Some problems in Press,New York, PP.41-58, 1975. modeling speakers of actual discourse," TINLAP-I, 68-73 170. J . Hirschb erg, "Toward a Redefinition of Yes/No Questiors," (1975). Stanford, Proc. of Tenth Int. Conf, on Computational Linguistics, 195. W. Mantr, M. Bates, B. Grosz,D.McDonald, K. McKeown' and CA, pp. 48-51, 1984. W. Swartout, "Text Generation: The state of the art and literaI7I. L. Siklossy,Question-AskingQuestion-Answering,Department ture," JACL 8,2 (1982). of Computer ScienceReport TR-71, University of Texas, Austin, McDonald, Natural Language Generation as a Computational D. 196. t977. TX, An Introduction, in M. Brady and R. Berwick (eds.), Problem: CamPress, MIT Dialog, Processing Framework A Brow1, L7Z. G. 
for Mod,etsof Discotrrse,MIT Press,Cambridgu,MA, Computational bridge, MA, June L977. pp. 209-265,1983. L7g. H. Tennant, Natural LanguageProcessing,Petrocelli, New York, W. Mann and J. Moore, "Computer generation of multiparatg1. 1981. graph English text," Am J. Computat.Ling.7, L7-29 (1981)' t74. R. Reichman, "Extended person-machineinterface," Artif. Intell. 1gg. K. Kukich, Design of a Knowledge-BasedReportGeneratot,Pro22(2), L57-218 (March 1984). ceedings of the Twentieth Annual Meeting of the ACL, CamL7S. D. Rummelhart, Notes on a Schemafor Stories,in D. Bobrow and MA, PP. 145-150, 1983bridge, A. Collins (eds.), Representationand Understanding, Academic, p. Jacobs, "PHRED: a generator for natural language inter1gg. New York, L975. faces,"Computat.Ling. 11(4),2L9-242 (1985)' 176. A. Coreira, "Computing story trees," Am. J. Computat. Ling' 200. D. App elt, Planning English Sentences,Cambridge University 6(3-4), 135-149 (1980). Press,New York, 1985. L77. J. Levin and J. Moore, "Dialog-games:Metacommunications K. McKeowr, Text Generation, Cambridge University Press, 201. structures for natural language interaction," Cog. Sci. 1(4),395New York, 1985. 420 Q977). Schank, Inference in the ConceptualDependencyParadigm: A R. Z0Z. for model recognition 1Tg. D. Litman and J. Allen, "A plan-based personal History, Yale l]niversity, Department of Computer Scisubdialoguesin conversationl' Cog.Sci. 11 (1987). Report 141, Septemberl'978' Research ence, 1Tg. B. Grosz, Focusing and Description in Natural Language Diay. Wilks, "A preferential, pattern-seeking semanticsfor natural 208. (Jnderstanding, UniCambridge logues,inElements of Discourse language inference,"Artif. Intell. 6, 53-74 (L975). versity Press,PP.84-105, 1981. 204. R. schank and R. Abelson, scripts, Plans, Goals, and under180. C. Sidner, "Focusing for interpretation of pronouns," Am. J. standing, Lawrence Erlbaum, Hillsdale, NJ 1977. -231 (1981). Computat.Ling. 7(4), 2I7 2Ob.R. 
Culingford, Script Application: Computer Understanding of 181. G. Hirst, "Discourse-oriented anaphora resolution in natural Newspaper Stories, ResearchReport 116, Yale University, Delanguageunderstanding:A review," Arn. J , Computat.Ling ' 7,2, partment of Computer Science'1978' pp. 85-98. R. Wilensky, Planning and (Jnderstanding, Addison-Wesl"y, 206. Discourse of Ig2. R. Kantor, The Management and Comprehension Readng,MA, 1983. Connection by Pronouns in English, Ph.D. Dissertation, Ohio 207. J. Carbonell, "POLITICS: Automated ideological reasonilg," State UniversitY, 1977. Cog.Sci. 2, 27-51 (1978). Garland, 188. B. Webber,A formal Approach to DiscourseAnaphora, 20g. J. Kolodner, Retrieualand Organizational Strategiesin ConcepNew York, 1978. tual Memory: A Computer Model, Lawrence Erlbaum, Hillsdale, 184. J. Hobbs, "Coherenceand coreference,"Cog. Sci. 3(1), 67-90 NJ, 1984. (1979).
DESIGN COMPUTER.AIDED 209. M. Dyer, In-Depth (Jnderstanding,MIT Press, Cambridge, MA, 1983. 2I0. L. Siklossy,"A language-learningheuristic program," Cog.Psy' chol. 2, 479-495 (1971). 2IL. J. Anderson, "Induction of augmentedtransition networks," Cog. Sci. l(2), t25-157 (April 1977). 2I2. S. Klein, Automatic Inference of Semantic Deep Structure Rules in Generative Semantic Grammars, Technical Report 180, Computer Science Department, University of Wisconsin, Madison, May L973. 2I3. L. Harris, "A system for primitive natural language acquisition," Int. J. Man-Mach. stud.9, 153-206 (L977). 214. R. Berwick, Computational Analogues of Constraints on Grammars: A Model of Syntax Acquisition, Proceedingsof the EighteenthAnnual Meeting of the ACL, Philadelphia, PA, pp. 49-54, 1980. 2L5. L. Reeker, "A problem solving theory of syntax acquisition," J. Struct.Learn.2, 1-10 (1971). 2L6. P. Langl"y, "Language acquisition through error recoveryl' Cog. Brain Theor. 5,2tL-255 (1982). 217. M. Selfridge, Inference and Learning in a Computer Model of the Development of Language Comprehensionin a Young Child, in W. Lehnert and M. Ringle (eds.), Strategies for Natural Language Processing,Lawrence Erlbaum, Hillsdale, NJ, pp. 299326, 1982. 218. J. Andersonand G. Bower,Human AssociatiueMemory,Winston and Sons,Washington, DC, 1973. 2L9. J. Anderson, The Architecture of Cognition, Harvard University PressoCambridg., MA, 1983. 220. E. Feigenbaum and J. Feldman (eds.),Computersand Thought, McGraw-Hill, New York, 1963. 22L. M. Minsky (ed.), Semantic Information Processing,MIT Press, Cambridge,MA, 1968. 222. R. Rustin (ed.), Natural Language Processing, Algorithmics, New York, 1973. 223. A. Zampolli (ed.;, Linguistic Structures Processing,North-Holland, Amsterdam, L975. 224. S. Kaplan (ed.),"Specialsectionon natural languageprocessing" SIGART Newslett. 79, 42-108 (1982). 225. C. Johnson and J. Bachenko (eds.),"Applied computationallinguistics in perspectrve,"Am. J. Computat. Ling. 
8(2), 55-84 (1982). 226. A. Barr and E. Feigenbaum (eds.), The Handbook of Artifi,ciat Intelligence, Vol. 1, William Kaufmann, Los Altos, CA, 1 9 81 227. W. Lehnert and W. Ringle (eds.),Strategiesfor Natural Language Processirg, Lawrence Earlbaum, Hillsdale, NJ, L982. 228. R. Grishman, An Introduction to Computational Lingistics, Cambridge University Press, New York, 1986. 229. M. King (ed.),Parsing Natural Langueg€, Academic Press,London,1983. 230. T. Winograd, Language as a Cognitiue Process,VoI. L, Syntax, Addison-Wesley,Reading,MA, 1983. 23L. K. Sparck Jones and Y. Wilks, Automatic Natural Language Parsing, Ellis Horwood, Chichester,UK, 1985. 232. D. Dowty, L. Karttunen, and A. Zwicky (eds.) Natural Language Parsing, Cambridge University Press, Cambridge, UK, 1985. 233. M. Brady and R. Berwick, Computational Models of Discourse, MIT Press,Cambridge,MA, 1983. 234. R. Schank and C. Riesbeck,Inside Computer Understanding, Lawrence Erlbaum, Hillside, NJ, 1,981. 235. R. Simmons, Computations from the English, Prentice-Hall, EnglewoodCliffs, NJ, 1984.
151
B. Ballard and M. Jones
AT&T Bell Laboratories
COMPUTER-AIDED DESIGN

Computer-aided design (CAD) is the process of utilizing the computer to construct drawings or models of objects or systems (1,2). CAD encompasses the whole range from drafting to design, depending on the purpose of the user. The beginning of CAD may be traced to 1964, when IBM introduced the 2250 graphic terminal together with software allowing users to draw circuits on the face of the tube through menu selection. However, the field remained dormant until the end of the sixties, when Lockheed engineers developed the CADAM system (computer-augmented design and manufacturing) (3), which allowed convenient construction of traditional two-dimensional multiple-view orthographic projections on the face of an IBM 2250 (and later an IBM 3250) console. Even then CAD remained little known and used until the late seventies, when CAD systems grew so fast that their usage in industry is now quite commonplace. CAD systems are often coupled with CAM (computer-aided manufacturing) as CAD/CAM (see Computer-integrated manufacturing). Indeed, one of the main links between CAD and CAM is the possibility of producing a numerically controlled machine program directly from the CAD model.

A CAD system requires the following hardware: a digital computer, including various peripherals such as disks, tapes, printers, and plotters, and one or more graphics terminals equipped with keyboards and light pens, joysticks, a mouse, or other such devices permitting the user to point to various parts of the screen (see Fig. 1). Although several types of graphic terminals exist, raster-scan systems are currently predominant, allowing resolution as high as 1200 x 1000 points with a full spectrum of colors. The software includes basic display programs; programs for storing and processing internal representations of various elements such as points, lines, curves, arcs, splines, notes, surfaces, and volumes; and programs for translating, rotating, scaling, and clipping pictures and removing hidden lines.
The principal application areas of CAD include mechanical (40%), electronics (35%), architecture/engineering/construction (15%), and others (10%). CAD systems are available as turnkey systems or software packages. The former include both software and hardware (general and specialized). The latter include a collection of programs designed to run on specific commercial hardware. There are currently some 280,000 CAD systems in the U.S. (4). The 1984 CAD market topped $2 billion (10^9). It is expected to reach $28.7 billion by 1994 (5).

Early CAD systems were developed as computerized extensions of classical drafting; modern systems are aimed at three-dimensional (3-D) models that may be classified as surface, wire-frame, and solid models (1,2) (see Fig. 2). Surface models are used to define double-curvature "sculptured surfaces" such as may be found in aircraft, automobiles, shoes, etc. These surfaces are described by various analytic techniques defining a surface as a quiltwork of patches (smoothly joined together), each defined as locally parameterized surfaces such as bicubics.

Figure 1. A CAD system.

The importance and complexity of this field has given rise to a new discipline called computational geometry (6). Wire-frame models are 3-D extensions of the usual engineering drawings, wherein 3-D lines and curves denote the hard edges of an object, i.e., those parts where the tangent to the object surface suffers a discontinuity. Thus, for example, a sphere has no real wire-frame representation, whereas the wire-frame model of a cube consists of 12 line segments. Although wire-frame models are relatively easy to construct and to process, they may be confusing and ambiguous. Thus solid models are taking over the 3-D area. These include, in particular, surface boundary models and CSG (constructive solid geometry) models. In the former case an object is represented by a collection of surfaces separating empty space from object space. In the latter the object is defined by a number of parameterized primitive solids (such as cuboids, spheres, cylinders, cones, etc.) combined by Boolean-like operations including union, intersection, and difference (see Fig. 3). The proliferation of various types of CAD models that normally cannot communicate has recently given rise to several standard proposals, including IGES (Initial Graphics Exchange Specification) (7) and GKS (Graphics Kernel System) (8).
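The CSG representation just described can be sketched as a small program. The following is a minimal illustration, not the data structure of any particular CAD system: a solid is modeled by a point-membership test, and Boolean-like operations combine solids into a CSG tree such as the (A U B) - C example of Figure 3.

```python
# Minimal constructive solid geometry (CSG) sketch: a solid is modeled
# by a point-membership predicate, and Boolean-like operations (union,
# intersection, difference) combine predicates into a CSG tree.
# Illustrative only; real CAD systems use boundary or exact solid
# representations rather than bare predicates.

def sphere(cx, cy, cz, r):
    """Parameterized primitive: all points within distance r of the center."""
    return lambda x, y, z: (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= r ** 2

def cuboid(x0, x1, y0, y1, z0, z1):
    """Parameterized primitive: an axis-aligned box."""
    return lambda x, y, z: x0 <= x <= x1 and y0 <= y <= y1 and z0 <= z <= z1

def union(a, b):
    return lambda x, y, z: a(x, y, z) or b(x, y, z)

def intersection(a, b):
    return lambda x, y, z: a(x, y, z) and b(x, y, z)

def difference(a, b):
    return lambda x, y, z: a(x, y, z) and not b(x, y, z)

# (A U B) - C: two overlapping boxes with a spherical hole.
A = cuboid(0, 2, 0, 2, 0, 2)
B = cuboid(1, 3, 0, 2, 0, 2)
C = sphere(1.5, 1.0, 1.0, 0.5)
solid = difference(union(A, B), C)

print(solid(0.5, 0.5, 0.5))  # True: inside A, outside the hole
print(solid(1.5, 1.0, 1.0))  # False: removed by the spherical hole
print(solid(5.0, 5.0, 5.0))  # False: outside both boxes
```

Point classification of this kind is the basic query a CSG modeler must answer; rendering and mass-property computation are built on top of it.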
Applications of AI

The obvious application of AI to CAD would be to utilize AI in order to facilitate, if not automate, the CAD process. This is, in general, a difficult undertaking because so little is known
Figure 2. (a) CADAM drawing. (b) Wire-frame model. (c) CSG model.
(A U B) - C

Figure 3. CSG models constructed from Boolean operations as shown.
about the design process, at least with regard to its creative aspects. However, there are surprisingly many situations where a designer repeatedly designs very similar objects, such as electric motors, transformers, wheels, etc. This may be described as parameterized CAD. It may be possible in such cases to develop expert systems (ES) (qv), requiring the knowledge engineer to design the ES to discover (as usual) two types of information: characteristic features of the object and design rules. Thus, for example, the characteristic features of an electric motor might include power, size, electrical wiring, etc. The design rules would then be utilized to assemble partial systems or generate new ones into a single final object. The application of ESs to the design of mechanical objects is in its infancy, and very few examples may be cited. Recently such a parameterized system has been implemented for designing multiple-spindle drill heads (9); however, it was not constructed as an ES, although it may well be considered as such. Since a large number of manufacturing concerns effectively do parameterized design, the use of ESs in this area is certain to grow in the near future. It may be noted that the use of group technology (GT) (10) for design, which consists of classifying various objects according to some GT code helping designers to locate previously designed parts, is a step in that direction.

In the area of electronic design (LSI, VLSI) the number of components to be designed and drawn is so large that special languages and systems have been constructed to resolve that problem. The designer can utilize these aids to describe the final operations required, and the system effectively designs and draws the detailed circuits and built-in components according to various built-in rules (11). This may well be considered as a further application of AI in CAD.
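As a concrete, entirely hypothetical illustration of parameterized design, the sketch below encodes characteristic features and design rules for a toy electric-motor designer. The feature names, thresholds, and rules are invented for illustration and are not taken from the drill-head system of Ref. 9 or any real ES.

```python
# Toy expert-system sketch for parameterized design: characteristic
# features of the object (here a hypothetical electric motor) are given,
# and design rules map them into a parameterized specification.
# All feature names, thresholds, and rules are illustrative only.

def rule_frame_size(features, spec):
    # Larger power ratings require a larger frame (invented threshold).
    spec["frame"] = "large" if features["power_kw"] > 10 else "small"

def rule_cooling(features, spec):
    # High continuous power forces forced-air cooling (invented rule).
    if features["power_kw"] > 10 and features["duty"] == "continuous":
        spec["cooling"] = "forced air"
    else:
        spec["cooling"] = "convection"

def rule_wiring(features, spec):
    # Lower supply voltage means higher current, so heavier wire gauge.
    spec["wiring"] = "heavy gauge" if features["voltage"] < 240 else "light gauge"

DESIGN_RULES = [rule_frame_size, rule_cooling, rule_wiring]

def design(features):
    """Apply each design rule in turn to assemble the final specification."""
    spec = {}
    for rule in DESIGN_RULES:
        rule(features, spec)
    return spec

motor = design({"power_kw": 15, "duty": "continuous", "voltage": 220})
print(motor)  # {'frame': 'large', 'cooling': 'forced air', 'wiring': 'heavy gauge'}
```

The point of the sketch is the separation the text describes: the characteristic features are data, and the design rules are the knowledge that assembles them into a finished object.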
BIBLIOGRAPHY

1. Y. Gardan and M. Lucas, Interactive Graphics in CAD, Kogan Page, London, 1984.
2. J. Encarnacao and E. G. Schlechtendahl, Computer-Aided Design, Springer-Verlag, Berlin, 1983.
3. R. E. Notestine, Graphics and Computer-Aided Design in Aerospace, Proceedings of the Forty-second National Computer Conference, New York, 1973, p. 629.
4. The 1984-1985 Directory of Computer Graphics Suppliers, Klein, Sudbury, MA; private communication, Technology and Business Communications, Inc., 1985.
5. CAD/CAM Opportunities and Strategies, Report No. 610, International Resource Development; Manufacturing Engineering 93, 33 (Nov. 1984).
6. D. F. Rogers and J. A. Adams, Mathematical Elements for Computer Graphics, McGraw-Hill, New York, 1976.
7. The latest information concerning IGES may be obtained from the IGES Coordinator, National Bureau of Standards, A353 Metrology Building, Gaithersburg, MD.
8. G. Enderle, K. Kansky, and G. Pfaff, Computer Graphics Programming: GKS - The Graphics Standard, Springer-Verlag, 1983.
9. L. Lichten, "Development of a special application computer-aided design system," Am. Machin. 129, 104 (January 1985).
10. Introduction to Group Technology in Manufacturing and Engineering, Industrial Development Division, Institute of Science and Technology, The University of Michigan, Ann Arbor, MI, 1977.
11. V. Begg, Developing Expert CAD Systems, Kogan Page, London, 1984.

M. A. Melkanoff
UCLA
COMPUTER-AIDED INSTRUCTION, INTELLIGENT

This entry provides an introduction to intelligent computer-assisted instruction (ICAI). ICAI is the application of AI principles to the development of instructional programs. The entry describes past ICAI projects, current developments, and future prospects. Major theoretical and practical issues in ICAI are also described. However, this entry provides but a cursory overview of the field. More detailed discussions of ICAI can be found in Sleeman and Brown (1), Kearsley (2), and O'Shea and Self (3).

What Is ICAI?

Carbonell (4) provides one of the first attempts to define the need for and nature of ICAI. Table 1 lists some of the major characteristics of ICAI programs as described by Carbonell. Mixed-initiative dialogue is one of the most distinguished aspects of ICAI programs; it refers to the capability for the student to ask a question and hence participate in a two-way interaction with the program. This contrasts with the typical one-way interaction (program presents question/problem, student responds) of conventional CAI. The net result of a mixed-initiative dialogue program is to produce a highly interactive instructional session much like the conversation between a good teacher and a motivated student (i.e., the Socratic method).

Table 1. Major Characteristics of ICAI Programs

Mixed-initiative dialogue
Semantic (knowledge) networks
Student models
Diagnostic error rules
Natural language

A more important characteristic of ICAI programs from a design perspective is that they are constructed as knowledge networks consisting of facts, rules, and their relationships (see Representation, knowledge; Semantic networks). This contrasts with the scriptlike structure of most conventional CAI programs, where all content is organized into screens and branching instructions that define the sequence of instruction. In conventional CAI the author predefines the possible patterns of interaction. In an ICAI program the tutoring rules that an author uses to create these patterns are defined in the knowledge base, and the program generates the instructional sequence in response to student questions or mistakes. In other words, ICAI programs contain two types of knowledge: content knowledge about the subject matter being taught and pedagogical knowledge, that is, knowledge about how to teach the subject.

This leads to two other major characteristics of ICAI programs, namely, student models and error diagnosis rules. In order to determine what instruction to present next (since it is not predefined), it is necessary to have a good idea of what the student knows and has already learned. This is achieved by identifying the aspects of the knowledge network currently understood by the student (see also Belief systems). In order to identify the student's present level of understanding, it is necessary to be able to diagnose any mistakes made by the student in terms of misconceptions, overgeneralizations, missing information, and so on. In addition to a set of general content-independent errors made while learning (see Ref. 5), each subject domain has content-specific errors that must be included in the knowledge network.

Finally, another characteristic of ICAI identified by Carbonell is natural-language interaction. Clearly, the quality of communication between a CAI program and a student is dramatically improved if the program can understand natural-language input (either typed or spoken) (see Natural-language understanding). Furthermore, many of the previous ICAI characteristics such as mixed-initiative dialogue and error diagnosis depend heavily on the semantic knowledge associated with natural language.

Despite many new developments since Carbonell's classic paper, the characteristics outlined in Table 1 remain some of the important concepts of ICAI. One significant change is that the understanding of knowledge networks has gone beyond strictly language-based (i.e., semantic) representations. In fact, the importance of natural language has diminished considerably in ICAI (and AI generally). This is because it is possible to implement intelligent systems using structured command languages or menu selection structures (see Menu-based natural language).

One major characteristic of ICAI programs that was not discussed in Ref. 4 is the ability of ICAI programs to learn (i.e., adaptive or self-modifying systems). It is clear that a system that cannot learn from its successes and mistakes cannot be considered fully "intelligent." In the case of ICAI programs this means a system capable of changing its teaching behavior based on how well students seem to learn via one strategy versus another (see Learning).

Although it is not listed in Table 1, the power of all ICAI programs is derived from their capability to draw inferences (see Inference). In fact, this single quality more than any other constitutes the intelligence of AI software.
In the case of ICAI programs, inferencing takes place when the program attempts to deduce what the student misunderstands and what tutoring rule to apply in order to remove the misunderstanding. If the program uses natural language, a lot of inferencing is required just to understand the input, that is, to disambiguate pronoun references and fill in hidden meanings. One area where a great deal of progress has been made in the past decade is the design and implementation of inferencing mechanisms.

The major trend in the ICAI field in the past five years has been away from mixed-initiative tutoring systems of the kind described by Carbonell toward diagnostic tutors or coaches. A diagnostic tutor compares the student's behavior with that of an expert for the problem domain involved. When discrepancies arise, the student is given advice about better learning or performance strategies. Diagnostic tutors are usually a more appropriate form of ICAI for games, simulations, and problem-solving situations than a mixed-initiative tutor approach.
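The central ideas above, a curriculum represented as a knowledge network and a student model tracking which parts of it are mastered, with the next instructional step generated from the network rather than read from a fixed script, can be sketched in miniature. The tiny arithmetic curriculum below is invented for illustration.

```python
# Sketch of an ICAI knowledge network with an overlay-style student
# model: the curriculum is a network of concepts with prerequisite
# links, the student model is the subset of concepts mastered so far,
# and the next topic is generated from the network rather than taken
# from a fixed script. The curriculum contents are invented.

# concept -> prerequisites that must be mastered first
KNOWLEDGE_NETWORK = {
    "counting": [],
    "single-digit addition": ["counting"],
    "carrying": ["single-digit addition"],
    "multi-digit addition": ["single-digit addition", "carrying"],
}

def next_topic(student_model):
    """Pick an unmastered concept whose prerequisites are all mastered."""
    for concept, prereqs in KNOWLEDGE_NETWORK.items():
        if concept not in student_model and all(p in student_model for p in prereqs):
            return concept
    return None  # curriculum complete

# The student model is simply the subset of the network the student knows.
student = {"counting", "single-digit addition"}
print(next_topic(student))   # carrying

student.add("carrying")
print(next_topic(student))   # multi-digit addition
```

The same network can drive diagnosis: a mistake on a problem implicates the concepts it exercises, and those concepts are withheld from the student model until remediated.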
Examples of ICAI Programs

Almost all research conducted in ICAI has been in the context of specific programs designed for a particular subject domain. Table 2 lists some of these programs. In general, each program has explored a somewhat different set of issues in cognitive science and AI methodology.

SCHOLAR was the first ICAI program, developed initially by Carbonell, Collins, and colleagues at Bolt, Beranek, and
Table 2. Examples of ICAI Programs

Program          Subject Area                   Format                  References
SCHOLAR          South American geography;      Mixed initiative        4,5
                 NLS text editor
WHY              Meteorology                    Mixed initiative        6
SOPHIE           Electronics troubleshooting    Mixed initiative        7
BIP              BASIC programming              Mixed initiative        8
SPADE            LOGO programming               Diagnostic tutor        9
FLOW             Programming principles         Coach                   10
MENO-II/Proust   Pascal programming             Diagnostic tutor        11,12
WEST             Arithmetic game                Coach                   13
WUMPUS           Adventure game                 Coach                   14
BUGGY            Elementary arithmetic          Diagnostic tutor        15
GUIDON           Medical diagnosis              Mixed initiative        16
STEAMER          Steam plant operation          Intelligent simulation  17
Newman, Inc . (4,5).Whereasthe scHoLAR systemdealt bas- problerns and then monitor the student's solution, providing ically with concepts,the WHY system (6) focused on causal diagnostic feedback as needed. One of the unique aspects of BIP was the capability to identify "gaps" in its curriculum reasoning in the domain of meteorology (specifically rainfall). A great deal of attention was paid to identifying the kinds of network that needed completion. Other attempts to develop misconceptionsexhibited by students and the corresponding programming tutors include SPADE (9), FLOW (10), MENOil (11), and Proust (L2). tutoring strategies used by the program. The WEST proglam developedby Burton and Brown (13) The SOPHIE system developedby Brown and Burton was a tutor for electronics troubleshooting (7). One of the strengths was one of the first attempts to implement a coachingstrategy. of SOPHIE was a powerful inferencing capability that allowed The idea was to implement a tutor that would unobtrusively the program to test out student ideas for troubleshooting monitor a student's progless and provide advice when approfaulty circuits and provide critical feedback or counterexam- priate. WEST was basedon an arithmetic game (How the West Was Won) originally implemented on the PLATO system.The ples when students were wrong. SOPHIE featured a naturallanguage interface that made conversational dialogues possi- major research problem in this type of ICAI program is to identify when the coach should interuupt the student and what ble (seeFig. 1). (BIP) specific advice to provide. WUMPUS (14) was another game program at Standeveloped The BASIC instructional used to study coaching. ford was the first of many attempts to develop diagnostic tuBUGGY, also developedby Brown and Burton (15),was an tors for proglamming languages. 
BIP was a curriculum to teach BASIC in which a BASIC interpreter and programming attempt to explore sophisticated diagnosis of student errors ("bugs") in the context of simple arithmetic problems. The tutor were embedded.The system would pose proglamming program consisted of diagnostic rules specific to arithmetic that allowed the program to infer the reasonsfor mistakes. By WHAT IS THE OUTPUT? applying these diagnostic rules to student responses,BUGGY The outputvoltageis 11.7 V . could deduce exactly what a student was doing wrong. AnWHAT IS IT IN A WORKINGINSTRUMENT? other interesting aspectof BUGGY was that it could be used to In a working circuit the output voltageis 19.9V. gtve teacherspractice in diagnosing student errors by generatWHAT IS THE VOLTAGEBETWEENR8 and Ql? ing problems with known bugs to be identified. The voltageat Nl is 47.6V. The GUIDON program developedby Clancy and colleagues WHAT IS THE BASE EMITTER VOLTAGEOF THE -LIMITING at Stanford (16) represents a major milestone in ICAI proTRANSISTOR? VOLTAGE : Base grams in that it was the first tutor built to work with an - Transistor already existing expert system (MYCIN). MYCIN is a medical The baseemittervoltageof Q5 is - 0.037mV. consultation system for diagnosing infectious diseases (see WHAT ABOUT FOR THE CURRENT.LIMITINGTRANSISTOR? Medical systems).GUIDON teachesthe diagnostic rules used The baseemittervoltageof Q6 is 0.349V. in MYCIN. The GUIDON program provided some interesting WHAT SHOULDIT BE? developments in the structure of mixed-initiative dialogues. In a working circuit the baseemitter voltageof Q6 is 0.593V. First, students could specify what they curuently know, what REPLACEQ6 want to know more about, what they do not understand, they I am goingto ask you somequestionsabouthow Q6 is faulted. they want to ignore using a terse keyword command and what junctions Are any shorted? language. Second,GUIDON featured a "don't interrupt" mode I DON'TKNOW. 
in which the student could only respond to questions and not Are any terminalsopen? YES. change the topic. Third, the program is able to detect partial Which terminal(s)are open? understandings by asking probing questions about students THE BASE. answers or solutions (seeFig. 2). Q6 is not faulted in that manner. Q6 is not replaced. STEAMER is an intelligent simulation for teaching steam IS IT POSSIBLE THAT Q6 HAS AN OPEN BASE? plant operation (17). It represents a synthesis of state-of-theFigure 1. Excerpt from SOPHIE (1) (student responsesin capitals). art techniques in different areas of computing, including
156
INSTRUCTION, INTELTIGENT COMPUTER.AIDED
FACTORS

The following factors will be useful to determine the type of the infection:
3a. whether the meningitis is partially treated
3b. whether a lumbar puncture has been performed on Pt538
3c. the duration of the neurological signs
3d. the CSF protein
(*)3e. whether Pt538 has a rash consistent with herpes zoster
Factors marked with an asterisk have already been stated. Based on what we have already mentioned, you should be able to conclude about: 3b.

USE 3C

The duration of the neurological signs is 7.2 h. Using other information available to you now, the duration of the neurological signs enables you to conclude that the type of the infection is bacterial (1) viral (.2) [RULE 524].

IKNOW 3B

Good. MYCIN has also made a decision. The site of CULTURE-1 is evidence that a lumbar puncture has been performed on Pt538 [RULE 112]. Back to our discussion of the type of the infection. Summary of evidence that the type of the infection is bacterial (.98):
29a. The fact that the meningitis is partially treated (.95) is evidence that the type of the infection is bacterial (.19) [RULE 148].
29b. The WBC from the CSF and the percentage of PMNs in the CSF is evidence that the type of the infection is bacterial (.9) [RULE 501].

DETAILS 29A

For which of the following factors would you like to see a review of the evidence? (Select from the following):
1. the infection that requires therapy
2. the fact that the meningitis is partially treated (.95)

Figure 2. Example sequence from GUIDON (14) (student responses in capitals).

Although STEAMER incorporates simulation and intelligent graphics and is designed for training purposes, it is not an intelligent tutor in the sense discussed in this entry. It provides an extremely powerful graphics interface that makes it possible for students to explore and learn about a complex system. A major focus of the STEAMER project is to understand how people build mental models of physical systems and use this knowledge to design a good training system.

To summarize this brief review of past ICAI efforts, it seems reasonable to conclude that ICAI programs could be developed for any subject area or training domain. These and other ICAI programs have provided a rich assortment of intricate learning models to explore and have demonstrated the kind of fine-grained interaction possible between student and machine in the context of instruction.

Design Issues

There are many issues associated with the design of an ICAI system, including the type of knowledge representation, the diagnostic techniques, the nature of the student model, the user interface, and the instructional control. More fundamentally, the learning model underlying the ICAI program dictates how the program will be constructed and how it will work.

The type of knowledge representation used in an ICAI program is an important determinant of the kind of tutoring possible. Early ICAI programs such as SCHOLAR and SOPHIE used augmented-transition networks (ATNs) for their representational scheme (see Grammar, augmented transition). More recent ICAI programs use an object-oriented or production rule formalism (see Rule-based systems) to represent knowledge. The way the knowledge is represented determines what kind of subject and student understanding the program is capable of. For example, it is important in a programming tutor to be able to represent specific concepts such as variables, iteration, subroutines, and comments as well as more general ones such as causality, syntax, and files. Different types of subject matter may require different knowledge representation structures.

A wide variety of diagnostic techniques have been used in ICAI programs, although almost all involve either forward or backward reasoning sequences (see Processing, bottom up and top down). Some diagnostic techniques have been fairly simple generate-and-test strategies. For example, in the BUGGY program each bug rule generates an answer for the problem according to its deficiency, and all of the answers are matched against the student's response to detect which mistake the student is making. Other diagnostic techniques involve goal-directed inferencing. In the Proust program the diagnostic routines try to match specific subgoals in a student's program that are generated by a plan for solving the problem.

Most student models have been of the "overlay" type; that is, the current representation of the student's knowledge is represented as a subset of the complete knowledge network. The simplest form of student model is a state vector in which each element of the vector represents the student's current knowledge of a specific concept or skill. The REGIS tutorial developed by Heines and O'Shea (18) is a good example of a state vector model. A more complex type of student model involves building a generative model of each student that is independent of the knowledge network. This kind of model accounts for the fact that the student's understanding of a subject is not merely a subset of the full knowledge domain but a new incarnation.

The user interface can range from natural-language discourse to selection of items from menus or use of keyword commands. Most early ICAI programs used natural language; however, natural language has been de-emphasized in recent years. Most current ICAI programs use menu selections for the user interface. This shift corresponds to a greater interest in the inferencing mechanisms associated with tutoring than with understanding language.

User control is an important attribute of ICAI programs. Many of the original ICAI programs featured mixed-initiative dialogue in which the student could ask a question at any time. More recent programs provide menu options that are always active and available. A major design consideration in the construction of a coach or diagnostic tutor is how to allow the student to control the advice or prescriptive feedback.

Almost all ICAI programs are based on an explicit or implicit theory of learning. In fact, most ICAI programs are designed to test these theories. Consider, for example, the geometry and LISP tutors developed by Anderson, Boyle, and Reiser (19) based on the ACT* learning theory. ACT* consists of a set of assumptions about memory that have been embodied in tutoring programs. These assumptions cover the representation of procedural knowledge via productions, the use of goal structures, and working memory limitations. In some cases the learning theory underlying an ICAI program is general in nature, whereas in other cases it is specific to the subject domain.

Even though a number of ICAI programs have been developed, there has been relatively little published analysis and discussion of the design and development process. Woolf and McDonald (20) discuss the design of Meno-Tutor. This generic tutor incorporates a hierarchical discourse management network consisting of three levels: pedagogy, strategy, and tactics. The article discusses how the different levels interact for a given tutoring sequence. Clancey (21) outlines the design issues involved in building GUIDON and focuses on the distinction between an expert system and a tutor for the same knowledge domain.

Implementation Issues

So far, this discussion of ICAI has been free of pragmatic or "real-world" considerations. There are three major practical issues that need attention before ICAI can enter the mainstream of the education and training world. These issues are the relationship between ICAI and conventional CAI, accessibility of ICAI programs, and performance factors.

ICAI and CAI. Researchers who have developed past ICAI programs have tended to be computer scientists rather than specialists in CAI. To a large extent, these researchers have not been fully cognizant of the state of the art in conventional CAI applications, tending to characterize the field as still being stuck in the drill-and-practice, or frame-oriented, tutorials of the sixties.
In fact, many current applications of CAI in education and training involve sophisticated simulations, diagnostic testing, and problem-solving sequences (see Refs. 22-24). On the other hand, CAI specialists generally have backgrounds in training or education and tend to be largely ignorant of ICAI developments. Because they often lack solid grounding in computer science, the AI techniques employed in ICAI are novel to them. Furthermore, because of the performance factors discussed below, they tend to be skeptical of the practical value of ICAI over conventional CAI programs. The problem with this communication gap between ICAI researchers and CAI specialists is that if ICAI is to be put to real use, it is likely to come from the CAI specialists who design and implement systems in actual educational or training settings.

Even though there is a continuity between ICAI and CAI programs, the underlying programming methods are different. The major difference lies in the data and control structures (qv) of ICAI versus conventional CAI. ICAI programs are implemented using symbolic or production-type languages (e.g., LISP, PROLOG) or object-oriented languages (such as Smalltalk). Conventional CAI is implemented using standard sequential control programming languages such as BASIC, Pascal, or C or in authoring languages such as Tutor, Pilot, or Planit with implicit control structures. Data structures in
ICAI programs are some form of declarative or procedural representations (i.e., knowledge networks), whereas data structures in conventional CAI are simple data statements (embedded in the control logic) or data files. Despite this difference in programming structures, however, there can be strong similarities between ICAI and conventional CAI programs. For example, in certain CAI tutorials the answer analysis is very sophisticated and closely resembles the kind of diagnostic capability of some ICAI programs. In fact, the collection of keywords, feedback messages, prompts, and branches for any particular answer constitutes the components that would form a knowledge network, student model, and tutoring rules in an ICAI program. The major difference is that the components are implicit rather than explicit in the programming.

It is not completely obvious how critical the use of AI programming languages is to the creation of ICAI programs. AI programming languages have been developed to make it easy to create programs with the kind of rule-based, context-dependent processing required in intelligent programs. However, it is likely that such routines could be implemented using popular high-level languages.

Accessibility of ICAI. This discussion of the differences in programming structures between ICAI and conventional CAI programs leads to the second major pragmatic consideration associated with the current state of ICAI, namely, accessibility. Because there are relatively few CAI specialists who are familiar with the type of languages used to create ICAI programs (actually, very few computer scientists in general), access to the tools needed to do ICAI is very limited at present. Although the conceptual knowledge of how to construct ICAI programs is in theory distinct from the implementation techniques, in practice the two are very closely intertwined. For example, most developers of ICAI programs do their own programming using an AI language.
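The contrast drawn here (explicit knowledge networks, student models, and tutoring rules versus the same information buried implicitly in control flow) can be made concrete with a toy sketch. Every name, concept, and number below is invented for illustration and taken from no actual ICAI system; it is only meant to show the three explicit structures side by side.

```python
# Toy sketch (invented example) of explicit ICAI structures:
# a knowledge network, an overlay/state-vector student model,
# and a tutoring rule that reads both.

# Knowledge network: concepts with their prerequisite concepts.
KNOWLEDGE_NETWORK = {
    "variables":   [],
    "iteration":   ["variables"],
    "subroutines": ["variables"],
}

# Overlay student model: the student's knowledge as a subset of the
# network, here a state vector of mastery scores per concept.
student = {"variables": 1.0, "iteration": 0.3, "subroutines": 0.0}

def next_topic(network, model, threshold=0.8):
    """Tutoring rule: tutor an unmastered concept whose
    prerequisites are all mastered."""
    for concept, prereqs in network.items():
        if model[concept] < threshold and all(
            model[p] >= threshold for p in prereqs
        ):
            return concept
    return None  # nothing left to teach

print(next_topic(KNOWLEDGE_NETWORK, student))  # -> iteration
```

In a conventional CAI frame the same decision would be encoded as a branch hard-wired into the lesson sequence; making the three structures explicit is what lets an ICAI program reason about them.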
Another development needed in order for ICAI programs to become more widespread in the near term is the availability of ICAI authoring tools. The existence of authoring languages and systems has made a considerable difference in the time and cost associated with developing conventional CAI programs (25). Comparable authoring tools need to be developed so that ICAI programs can be created quickly and without the considerable AI knowledge presently required. The emerging generation of software for building expert systems (qv) (e.g., M.1, ART, KEE, etc.) may be useful as a basis for such ICAI authoring tools.

Greater availability of ICAI programs is also needed. It is very difficult for designers and programmers to understand the nature of ICAI unless they are able to examine and use such programs. Access to ICAI programs is usually limited to those immediately involved in the development of a program. At present there are no ICAI programs commercially available for personal computers. Once such programs are on the market, examples of ICAI programs will be more accessible.

Performance Factors. Closely related to this issue of accessibility is the third pragmatic consideration: program performance. Like most AI programs, ICAI programs tend to be very computationally complex and require enormous amounts of memory. For this reason, they have usually been developed on high-performance machines. Even then, the response times of
ICAI programs are often very slow and unacceptable for operational environments. The recent emergence of LISP machines (qv), powerful supermicros designed primarily for AI applications, has had a significant impact on the performance problem. LISP machines typically have at least 1 Mbyte of RAM and 40 or more megabytes of disk storage. These machines are too expensive at present to be widely used in schools or training centers. However, as personal computers continue to improve in price-performance ratios, it should be possible to run ICAI programs on machines that are commonly available and relatively inexpensive.

Although the raw power of the computers used for AI applications is an important factor in determining performance, the efficiency of the programming involved is also a consideration. As discussed earlier, past ICAI programs have been developed as research tools, not as operational software. Hence, relatively little attention has been given to making ICAI programs run efficiently. If standard techniques used in data processing were applied to ICAI programs (e.g., code compression, hash tables, etc.) or the programs were rewritten in general-purpose languages that run faster, substantial improvement in performance could be achieved.

The three factors just discussed (ICAI vs. CAI, accessibility, and performance) are all current limitations that will dissipate with time. ICAI techniques will be incorporated into conventional CAI programs (and all other software) as more instructional designers and programmers learn such techniques and as suitable computers become widely available.

Future Developments

Probably the most important development needed to increase the use and application of ICAI does not come from the computer field but from the domain of cognitive science (qv). ICAI provides the capability to design highly individualized instruction that maps very closely onto the learning strategies and thinking processes of the student.
Hence, there must be a good theoretical understanding of human learning and cognition for ICAI to work. Even though the topic has been scientifically studied for hundreds of years, there are still no complete models of how people learn and think. In fact, much of the progress that has been made in cognitive science in the past two decades has been made by AI researchers. Attempts to design and implement computer programs that exhibit intelligence have forced a great deal of attention on the characteristics of human intelligence. Consequently, much of the research in ICAI has focused attention on how people learn.

There are a number of developments in other areas of computer technology that are closely related to ICAI or likely to become so. The most obvious example is computer-based speech processing (i.e., synthesis and recognition) capability (see Speech understanding). The ability to talk to the computer and have it respond opens up a new dimension of interactive instruction in terms of the amount and kind of information that can be presented and analyzed via CAI. Another technology of considerable relevance is computer graphics. Historically ICAI programs have been very text based, making little use of visual presentation modes. Yet all other instructional media (including most conventional CAI) make heavy use of visuals and graphics. The STEAMER program mentioned above is an example of how "intelligent" graphics
can extend the boundaries of ICAI. In addition, all developments in the expert systems area are relevant to ICAI in terms of new software tools or programming techniques.

There are some very significant sociological implications associated with ICAI. The tradition of didactic, classroom-based teaching has become a very strong cultural fixture in the North American educational system. ICAI presents a wholly different tradition: one-on-one tutoring that is largely inquiry driven in nature. Such a radically different tradition is not going to be easily accepted or assimilated in schools or training centers. Although hardware and software technologies can change very quickly, traditions (particularly educational ones) change very slowly. Consequently, ICAI programs will likely need new curricula designed for them.

The dream of ICAI researchers is to provide each student with a computer-based tutor that has all of the qualities of a master teacher. This includes great scope and depth of subject matter expertise, excellent knowledge of teaching techniques, powerful communication skills, and the ability to inspire and motivate students to learn. Clearly, most conventional CAI programs are a long way from this ideal. So far, ICAI programs have primarily focused on the first two qualities of a master teacher, namely, subject matter expertise and teaching techniques. This is manifested in the knowledge networks and tutoring rules, which form the basis of ICAI. Less progress has been made in providing powerful communication skills. This is due to the impoverished nature of the communication interface between people and machines. Good communicators use many modalities to pick up and convey information, including sight, speech qualities, facial and body movements, touch, and so on. If ICAI programs are going to be better communicators, they must possess some analog to listening and oration skills. The least progress has been made in the affective area.
It is true that some educational games and simulations are highly motivating, but their effect on motivation is short term rather than the long-term impact of real inspiration. Similarly, positive-feedback remarks of the "You're doing terrific!" variety have very little real effect on motivation (if any). In order for a program to truly inspire learning in a student, the program would need to exhibit enthusiasm for the subject matter and for learning. A great deal of teaching is concerned with the transmission of beliefs and values. Can these qualities be built into a computer program? The question of values and belief systems embedded in computer programs is likely to become an important issue in the coming decades as software becomes truly intelligent.

Before having to deal with such profound issues in ICAI and elsewhere, there are many more mundane developments to worry about. This entry has discussed a number of the developments needed in order for ICAI to become more prevalent. More instructional designers and CAI specialists who are familiar with ICAI methodology are needed. Authoring tools that make it much easier and faster to create ICAI programs and widespread availability of affordable personal computers powerful enough to run ICAI programs are also needed. All of these developments are likely to take place within the next five years. Thus, before the end of this decade there will probably be a dramatic improvement in the quality of computer-based instruction. However, for the reasons discussed in this entry, it will be much longer before this improvement is widely implemented in the classroom.
To summarize, a sufficient number of ICAI programs have been created to demonstrate the potential of ICAI. Practical issues that underlie the wider use of ICAI include competition with conventional CAI, accessibility, and performance. In addition, significant advances are needed in the understanding of human learning, and ICAI research will likely make and benefit from major contributions to instructional psychology. Lastly, the success of ICAI depends on the emergence of a new tradition of teaching that is different from the current didactic classroom pedagogy.
BIBLIOGRAPHY
1. D. Sleeman and J. S. Brown, Intelligent Tutoring Systems, Academic Press, New York, 1982.
2. G. Kearsley, Artificial Intelligence and Instruction: Applications and Methods, Addison-Wesley, Reading, MA, 1987.
3. T. O'Shea and J. Self, Learning and Teaching with Computers: Artificial Intelligence in Education, Prentice-Hall, Englewood Cliffs, NJ, 1983.
4. J. R. Carbonell, "AI in CAI: An artificial intelligence approach to computer aided instruction," IEEE Trans. Man-Machine Sys., 11(4), 190-202 (1970).
5. A. Collins, Processes in Acquiring Knowledge, in R. C. Anderson, R. J. Spiro, and W. Montague (eds.), Schooling and the Acquisition of Knowledge, Lawrence Erlbaum, Hillsdale, NJ, 1976.
6. A. Stevens, A. Collins, and S. E. Goldin, "Misconceptions in student's understanding," Int. J. Man-Machine Stud., 11, 145-156 (1979).
7. J. S. Brown, R. Burton, and J. deKleer, Pedagogical, Natural Language, and Knowledge Engineering Techniques in SOPHIE I, II, and III, in D. Sleeman and J. Brown (eds.), Intelligent Tutoring Systems, Academic, New York, 1982.
8. A. Barr, M. Beard, and R. C. Atkinson, "A rationale and description of a CAI program to teach the BASIC programming language," Instruc. Sci., 4, 1-31 (1975).
9. M. L. Miller, "A structural planning and debugging environment for elementary programming," Int. J. Man-Machine Stud., 11, 79-95 (1979).
10. D. Gentner, Toward an Intelligent Tutor, in H. F. O'Neil (ed.), Procedures for Instructional Systems Development, Academic Press, New York, 1979.
11. E. Soloway et al., "Meno-II: An AI based programming tutor," J. Comput. Based Instruc., 10(1 and 2), 20-34 (1983).
12. W. L. Johnson and E. Soloway, "Proust," BYTE (April 1985).
13. R. Burton and J. S. Brown, "An investigation of computer coaching for informal learning activities," Int. J. Man-Machine Stud., 11, 5-24 (1979).
14. B. Carr and I. Goldstein, Overlays: A Theory of Modeling for Computer Aided Instruction, AI Memo 406, MIT AI Lab, Cambridge, MA, 1977.
15. J. S. Brown and R. R. Burton, "Diagnostic models for procedural bugs in basic mathematical skills," Cogn. Sci., 2, 155-192 (1978).
16. W. J. Clancey, "GUIDON," J. Comput. Based Instruc., 10(1 and 2), 8-15 (1983).
17. J. D. Hollan, E. L. Hutchins, and L. Weitzman, "STEAMER: An interactive inspectable simulation-based training system," AI Mag. (Summer 1984).
18. J. M. Heines and T. O'Shea, "The design of a rule-based CAI tutorial," Int. J. Man-Machine Stud., 16, 356-371 (1984).
19. J. R. Anderson, C. F. Boyle, and B. J. Reiser, "Intelligent tutoring systems," Science, 228, 456-462 (1985).
20. B. Woolf and D. D. McDonald, "Building a computer tutor: Design issues," IEEE Comput. (September 1984).
21. W. J. Clancey, Methodology for Building an Intelligent Tutoring System, Department of Computer Science Report 81-894, Stanford University, October 1981.
22. A. Bork, Learning with Computers, Digital Press, Bedford, MA, 1981.
23. S. M. Alessi and S. R. Trollip, Computer Based Instruction: Methods and Development, Prentice-Hall, Englewood Cliffs, NJ, 1985.
24. G. Kearsley, Computer Based Training, Addison-Wesley, Reading, MA, 1982.
25. G. Kearsley, "Authoring systems in computer based education," Commun. ACM, 25(7), 429-437 (1982).

G. Kearsley
Park Row Software
COMPUTER CHESS METHODS

Historical Perspective

Of the early chess-playing machines the best known was exhibited by Baron von Kempelen of Vienna in 1769. Like its relations it was a conjurer's box and a grand hoax (1,2). In contrast, about 1890 a Spanish engineer, Torres y Quevedo, designed a true mechanical player for king-and-rook against king end games. A later version of that machine was displayed at the Paris Exhibition of 1914 and now resides in a museum at Madrid's Polytechnic University (2). Despite the success of this electromechanical device, further advances on chess automata did not come until the 1940s. During that decade there was a sudden spurt of activity as several leading engineers and mathematicians, intrigued by the power of computers and fascinated by chess, began to express their ideas on computer chess. Some, like Nemes of Budapest (3) and Zuse (4), tried a hardware approach, but their computer chess works did not find wide acceptance. Others, like noted computer scientist Turing, found success with a more philosophical tone, stressing the importance of the stored program concept (5). Today, best recognized are the 1965 translation of de Groot's 1946 doctoral dissertation (6) and the much referenced paper on algorithms for playing chess by Shannon (7). Shannon's paper was read and reread by computer chess enthusiasts and provided a basis for most early chess programs. Despite the passage of time, that paper is still worthy of study.

Landmarks in Chess Program Development. The first computer model in the 1950s was a hand simulation (5); programs for subsets of chess followed (8), and the first full working program was reported in 1958 (9). By the mid-1960s there was an international computer-computer match (10) between a program backed by John McCarthy of Stanford [developed by a group of students from MIT (11)] and one from the Institute for Theoretical and Experimental Physics (ITEP) in Moscow (12).
The ITEP group's program (under the guidance of the well-known mathematician Georgi Adelson-Velskiy) won the match, and the scientists involved went on to develop Kaissa, which became the first world computer chess champion in 1974 (13). [Descriptions of these programs can be found in various books (13,14). Interviews with some of the designers have also appeared (15).] Meanwhile there emerged from MIT
another program, MACHACK-6 (qv) (16), which boosted interest in AI. First, MACHACK was demonstrably superior not only to all previous chess programs but also to most casual chess players. Secondly, it contained more sophisticated move-ordering and position evaluation methods. Finally, the program incorporated a memory table to keep track of the values of chess positions that were seen more than once. In the late sixties, spurred by the early promise of MACHACK, several people began developing chess programs and writing proposals. Most substantial of the proposals was the 29-point plan by Good (17). By and large, experimenters did not make effective use of these works; at least nobody claimed a program based on those designs, partly because it was not clear how some of the ideas could be addressed and partly because some points were too naive. Even so, by 1970 there was enough progress that Newborn was able to convert a suggestion for a public demonstration of chess-playing computers into a competition that attracted eight participants (18). Due mainly to Newborn's careful planning and organization, this event continues today under the title "The ACM North American Computer Chess Championship." In a similar vein, under the auspices of the International Computer Chess Association, a worldwide computer chess competition has evolved. Initial sponsors were the IFIP triennial conferences in Stockholm (1974) and Toronto (1977), and later there were independent backers such as the Linz (Austria) Chamber of Commerce (1980), ACM New York (1983), and for 1986, the city of Cologne, Federal Republic of Germany.

In the first world championship for computers Kaissa won all its games, including a defeat of the eventual second-place finisher, Chaos. An exhibition match against the 1973 North American Champion, Chess 4.0, was drawn (10). Kaissa was at its peak, backed by a team of outstanding experts on tree-searching methods.
In the second championship (Toronto, 1977), Chess 4.6 finished first, with Duchess (19) and Kaissa tied for second place. Meanwhile both Chess 4.6 and Kaissa had acquired faster computers, a Cyber 176 and an IBM 370/165, respectively. The traditional exhibition match was won by Chess 4.6, indicating that in the interim it had undergone far more development and testing (20). The Third World Championship (Linz, 1980) finished in a tie between Belle and Chaos. In the playoff Belle won convincingly, providing perhaps the best evidence yet that a deeper search more than compensates for an apparent lack of knowledge. In the past this counterintuitive idea had not found ready acceptance in the AI community.

More recently, in the New York 1983 championship another new winner emerged, Cray Blitz (21). More than any other, that program drew on the power of a fast computer, here a Cray X-MP. Originally Blitz was a selective search program in the sense that it could discard some moves from every position based on a local evaluation. Often the time saved was not worth the attendant risks. The availability of a faster computer made it possible to use a purely algorithmic approach and yet retain much of the expensive chess knowledge. Although a mainframe won that event, small machines made their mark and seem to have a great future (22). For instance, Bebe with special-purpose hardware finished second, and even experimental versions of commercial products did well.

Implications. All this leads to the common question: When will a computer be the unassailed expert on chess? This issue
was discussed at length during a "Chess on Nonstandard Architectures" panel discussion at the ACM 1984 National Conference in San Francisco. It is too early to give a definitive answer, and even the experts cannot agree; their responses covered the whole range of possible answers from "in five years" (Newborn), "about the end of the century" (Scherzer and Hyatt), "eventually, it is inevitable" (Thompson), and "never, or not until the limits on human skill are known" (Marsland). Even so, there was a sense that production of an artificial Grand Master was possible and that a realistic challenge would occur during the first quarter of the twenty-first century. As added motivation, Edward Fredkin (MIT professor and well-known inventor) has created a special incentive prize for computer chess. The trustee for the Fredkin Prize is Carnegie-Mellon University, and the fund is administered by Hans Berliner. Much like the Kremer prize for man-powered flight, awards are offered in three categories. The smallest prize of $5000 has already been presented to Ken Thompson and Joe Condon, when their Belle program achieved a U.S. Master rating in 1983. The other awards of $10,000 for the first Grand Master program and $100,000 for achieving world champion status remain unclaimed. To sustain interest in this activity, each year a $1500 prize match is played between the currently best computer and a comparably rated human.

One might well ask whether such a problem is worth all this effort, but when one considers some of the emerging uses of computers in important decision-making processes, the answer must be positive. If computers cannot even solve a decision-making problem in an area of perfect knowledge (like chess), how can we be sure that computers make better decisions than humans in other complex domains, especially in domains where the rules are ill-defined or those exhibiting high levels of uncertainty?
Unlike some problems, for chess there are well-established standards against which to measure performance, not only through a rating scale (23) but also using standard tests (24) and relative performance measures (25). The ACM-sponsored competitions have provided 15 years of continuing experimental data about the effective speed of computers and their operating system support. They have also afforded a public testing ground for new algorithms and data structures for speeding the traversal of search trees. These tests have provided growing proof of the increased understanding about chess by computers and the encoding of a wealth of expert knowledge. Another potentially valuable aspect of computer chess is its usefulness in demonstrating the power of man-machine cooperation. One would hope, for instance, that a computer could be a useful adjunct to the decision-making process, providing perhaps a steadying influence and protecting against errors introduced by impulsive shortcuts of the kind people might try in a careless or angry moment. In this and other respects it is easy to understand Michie's belief that computer chess is the "Drosophila melanogaster [fruit fly] of machine intelligence" (26).

Terminology

There are several aspects of computer chess of interest to AI researchers. One area involves the description and encoding of chess knowledge in a form that enables both rapid access and logical deduction in the expert system sense. Another fundamental domain is that of search (qv). Since computer chess programs examine large trees, a depth-first search is commonly used. That is, the first branch to an immediate successor of the current node is recursively expanded until a leaf node (a node without successors) is reached. The remaining branches are then considered as the search process backs up to the root. Other expansion schemes are possible, and the domain is fruitful for testing new search algorithms. Since computer chess is well defined and absolute measures of performance exist, it is a useful test vehicle for measuring algorithm efficiency. In the simplest case the best algorithm is the one that visits fewest nodes when determining the true value of a tree. For a two-person game tree this value, which is a least upper bound on the merit for the side to move, can be found through a minimax search (see Minimax procedure). In chess this so-called minimax value is a combination of both the "MaterialBalance" (i.e., the difference in value of the pieces held by each side) and the "StrategicBalance" (e.g., a composite measure of such things as mobility, square control, pawn formation structure, and king safety). Usually MaterialBalance is dominant.

Minimax Search. For chess the nodes in a two-person game tree represent positions, and the branches correspond to moves. The aim of the search is to find a path from the root to the highest valued terminal node that can be reached under the assumption of best play by both sides. To represent a level in the tree (i.e., a play or half move) the term ply was introduced by Arthur Samuel in his major paper on machine learning (27). How that word was chosen is not clear, perhaps as a contraction of play or maybe by association with forests as in layers of plywood. In either case it was certainly appropriate, and it has been universally accepted.

A true minimax search is expensive since every leaf node in the tree must be visited.
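The minimax idea can be rendered in a few lines once the game specifics are abstracted away. The sketch below (not from the article's figures; the list-based tree is an assumption made for brevity) uses the negamax formulation, in which every leaf value is scored from the viewpoint of the side to move at that leaf, so one maximization serves both players:

```python
# Minimal minimax in negamax form over an explicit game tree.
# A position is either a number (leaf value, scored for the side to
# move at that leaf) or a list of successor positions.

def minimax(position):
    if isinstance(position, (int, float)):
        return position          # leaf node: its evaluated value
    # Value of a position = best move, negating the opponent's value.
    return max(-minimax(succ) for succ in position)

# A 2-ply tree: the root player has two moves, each met by two replies.
print(minimax([[3, 17], [2, 12]]))  # -> 3
```

Here the root player picks the first move: the opponent's best reply there still leaves the root a score of 3, whereas the second move can be held to 2.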
For a tree of uniform width W and fixed depth D there are W^D terminal nodes. Some games, like Fox and Geese (28), produce narrow trees (fewer than 10 branches per node) that can often be solved exhaustively. In contrast, chess produces bushy trees (average branching factor about 35 moves). Because of the magnitude of the game tree, it is not possible to search until a mate or stalemate position (a leaf node) is reached, so some maximum depth of search (i.e., a horizon) is specified. Even so, an exhaustive search of all chess game trees involving more than a few moves for each side is impossible. Fortunately, the work can be reduced since it can be shown that the search of some nodes is unnecessary.

Alpha-Beta Algorithm. As the search of the game tree proceeds, the value of the best terminal node found so far changes. It has been known since 1958 that pruning was possible in a minimax search (29), but according to Knuth and Moore, the ideas go back further, to McCarthy and his group at MIT. The first thorough treatment of the topic appears to be Brudno's 1963 paper (30). The alpha-beta algorithm (see Alpha-beta pruning) employs lower (alpha) and upper (beta) bounds on the expected value of the tree. These bounds may be used to prove that certain moves cannot affect the outcome of the search and hence that they can be pruned or cut off. As part of the early descriptions about how subtrees were pruned, a distinction between deep and shallow cutoffs was made. Some versions of the alpha-beta algorithm used only a single bound (alpha) and repeatedly reset the beta bound to infinity, so that deep cutoffs were not achieved. Knuth and Moore's recursive F2 algorithm (31) corrected that flaw. In Figure 1 Pascal-like pseudocode is used to present the alpha-beta algorithm, AB, in Knuth and Moore's negamax framework.
A Return statement has been introduced as the convention for exiting the function and returning the best subtree value or score. Omitted are details of the game-specific functions Make and Undo (to update the game board), Generate (to find moves), and Evaluate (to assess terminal nodes). In the pseudocode of Figure 1 the max(alpha, merit) operation represents Fishburn's "fail-soft" condition (32) and ensures that the best available value is returned (rather than an alpha-beta bound). This idea is usefully employed in some of the newer refinements to the alpha-beta algorithm.

FUNCTION AB (p : position; alpha, beta, depth : integer) : integer;
    { p is a pointer to the current node       }
    { alpha and beta are window bounds         }
    { depth is the remaining search length     }
    { the value of the subtree is returned     }
VAR merit, j, value : integer;
    posn : ARRAY [1..MAXWIDTH] OF position;
    { Note: depth must be positive }
BEGIN
    IF depth = 0 THEN                  { horizon node, maximum depth? }
        Return(Evaluate(p));
    posn := Generate(p);               { point to successor positions }
    IF empty(posn) THEN                { leaf, no moves? }
        Return(Evaluate(p));
    { find merit of best variation }
    merit := -MAXINT;
    FOR j := 1 TO sizeof(posn) DO BEGIN
        Make(posn[j]);                 { make current move }
        value := -AB (posn[j], -beta, -max(alpha, merit), depth-1);
        IF (value > merit) THEN        { note new best score }
            merit := value;
        Undo(posn[j]);                 { retract current move }
        IF (merit >= beta) THEN        { cutoff? }
            GOTO done;
    END;
done:
    Return(merit);
END;

Figure 1. Depth-limited alpha-beta function.
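The control flow of Figure 1 can also be expressed in a modern language. The following is a minimal Python transliteration of the fail-soft negamax search, assuming generic `children` and `evaluate` callbacks in place of the game-specific Generate and Evaluate (Make and Undo are unnecessary here because each child position is passed by value):

```python
NEG_INF = float("-inf")

def ab(node, alpha, beta, depth, children, evaluate):
    """Fail-soft negamax alpha-beta in the style of Figure 1.

    children(node) and evaluate(node) are assumed callbacks standing
    in for the game-specific Generate and Evaluate functions.
    """
    succ = children(node)
    if depth == 0 or not succ:       # horizon node or leaf (no moves)
        return evaluate(node)
    merit = NEG_INF                  # merit of best variation so far
    for child in succ:
        # fail-soft condition: the lower bound is max(alpha, merit)
        value = -ab(child, -beta, -max(alpha, merit), depth - 1,
                    children, evaluate)
        if value > merit:            # note new best score
            merit = value
        if merit >= beta:            # cutoff
            break
    return merit
```

Given a two-ply tree whose leaves are valued from the perspective of the side to move at the leaf, `ab` returns the negamax value of the root.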
Minimal Game Tree. If the "best" move is examined first at every node, the tree traversed by the alpha-beta algorithm is referred to as the minimal game tree. This minimal tree is of theoretical importance since its size is a measure of a lower bound on the search. For uniform trees of width W branches per node and a search depth of D ply, there are

W^⌈D/2⌉ + W^⌊D/2⌋ − 1

terminal nodes in the minimal game tree. Although others derived this result, the most direct proof was given by Knuth and Moore (31). Since a terminal node is rarely a leaf, it is often called a horizon node, with D the distance to the horizon (33).

Aspiration Search. An alpha-beta search can be carried out with the initial bounds covering a narrow range, one that spans the expected value of the tree. In chess these bounds might be (MaterialBalance − Pawn, MaterialBalance + Pawn). If the minimax value falls within this range, no additional work is necessary, and the search usually completes in measurably less time. The method was analyzed by Brudno (30), referred to by Berliner (34), and experimented with in Tech (35) but was not consistently successful. A disadvantage is that sometimes the initial bounds do not enclose the minimax value, in which case the search must be repeated with corrected bounds, as the outline of Figure 2 shows. Typically these failures occur only when material is being won or lost, in which case the increased cost of a more thorough search is warranted. Because these re-searches use a semi-infinite window, from time to time people experiment with a "sliding window" of (V, V + PieceValue) instead of (V, +MAXINT). This method is often effective but can lead to excessive re-searching when mate or large material gain/loss is in the offing. After 1974 "iterated aspiration search" came into general use, as follows:

Before each iteration starts, alpha and beta are not set to −infinity and +infinity as one might expect, but to a window only a few pawns wide, centered roughly on the final score [value] from the previous iteration (or previous move in the case of the first iteration). This setting of "high hopes" increases the number of alpha-beta cutoffs. (36)

Even so, although aspiration searching is still popular and has much to commend it, minimal window search seems to be more efficient and requires no assumptions about the choice of aspiration window (37).

{ Assume V     = estimated value of position p, and }
{        e     = expected error limit               }
{        depth = current distance to horizon        }
{        p     = position being searched            }
alpha := V - e;                      { lower bound }
beta  := V + e;                      { upper bound }
V := AB (p, alpha, beta, depth);
IF (V >= beta) THEN                  { failing high }
    V := AB (p, V, +MAXINT, depth)
ELSE IF (V <= alpha) THEN            { failing low }
    V := AB (p, -MAXINT, V, depth);
{ A successful search has now been completed }
{ V now holds the current value of the tree  }

Figure 2. Narrow-window aspiration search.

Minimal Window Search. Theoretical advances such as Scout (38) and the comparable minimal window search techniques (32,37) were the next products of research. The basic idea behind these methods is that it is cheaper to prove a subtree inferior than to determine its exact value. Even though it has been shown that for bushy trees minimal window techniques provide a significant advantage (37), for random game trees it is known that even these refinements are asymptotically equivalent to the simpler alpha-beta algorithm. Bushy trees are typical for chess, and so many contemporary chess programs use minimal window techniques through the principal variation search (PVS) algorithm. In Figure 3 a Pascal-like pseudocode is used to describe PVS in a negamax framework but with game-specific functions Make and Undo omitted for clarity. Here the original version of PVS has also been improved by using Reinefeld's depth − 2 idea (39), which ensures that re-searches are only done when the remaining depth of search is greater than 2.
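The leaf counts discussed under Minimal Game Tree are easy to check numerically. This sketch (the function names are mine) compares the minimal-tree bound of Knuth and Moore with the W^D horizon nodes of a full minimax search:

```python
import math

def minimal_tree_leaves(w, d):
    """Horizon nodes in the minimal game tree of uniform width w and
    depth d ply: w**ceil(d/2) + w**floor(d/2) - 1 (Knuth and Moore)."""
    return w ** math.ceil(d / 2) + w ** (d // 2) - 1

def full_tree_leaves(w, d):
    """Horizon nodes visited by a plain minimax search: w**d."""
    return w ** d
```

For W = 34 and D = 6 (the average width reported later for the Bratko-Kopec positions), the minimal tree has 78,607 horizon nodes against roughly 1.5 × 10^9 for the full tree; perfect move ordering reduces the search to about the square root of its exhaustive size.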
Forward Pruning. To reduce the size of the tree that should be traversed and to provide a weak form of selective search, techniques that discard some branches have been tried. For example, tapered N-best search (11,16) considers only the N best moves at each node. Here N usually decreases with increasing depth of the node from the root of the tree. As Slate and Atkin observe, "The major design problem in selective search is the possibility that the lookahead process will exclude a key move at a low level in the game tree" (36). Good examples supporting this point are found elsewhere (40). Other methods, such as marginal forward pruning (41) and the gamma algorithm (18), omit moves whose immediate value is worse than the current best of the values from nodes already searched since the expectation is that the opponent's move is only going to make things worse. Generally speaking, these forward pruning methods are not reliable and should be avoided. They have no theoretical basis, although it may be possible to develop statistically sound methods that use the probability that the remaining moves are inferior to the best found so far. One version of marginal forward pruning, referred to as razoring (42), is applied near horizon nodes. The expectation in all forward pruning is that the side to move can improve the current value, so it may be futile to continue. Unfortunately, there are cases when the assumption is untrue, for instance, in zugzwang positions. As Birmingham and Kent point out, their Master program "defines zugzwang precisely as a state in which every move available to one player creates a position having a lower value to him (in its own evaluation terms) than the present bound for the position" (42). Marginal pruning may also break down when the side to move has more than one piece en prise (e.g., is forked), and so the decision to stop the search must be applied cautiously.
Despite these disadvantages, there are sound forward pruning methods, and there is every incentive to develop more since it is one way to reduce the size of the tree traversed, perhaps to less than the minimal game tree. A good prospect is through the development of programs that can deduce which branches can be neglected by reasoning about the tree they traverse.
FUNCTION PVS (p : position; alpha, beta, depth : integer) : integer;
    { p is a pointer to the current node       }
    { alpha and beta are window bounds         }
    { depth is the remaining search length     }
    { the value of the subtree is returned     }
VAR merit, j, value : integer;
    posn : ARRAY [1..MAXWIDTH] OF position;
    { Note: depth must be positive }
BEGIN
    IF depth = 0 THEN                  { horizon node, maximum depth? }
        Return(Evaluate(p));
    posn := Generate(p);               { point to successor positions }
    IF empty(posn) THEN                { leaf, no moves? }
        Return(Evaluate(p));
    { principal variation? }
    merit := -PVS (posn[1], -beta, -alpha, depth-1);
    FOR j := 2 TO sizeof(posn) DO BEGIN
        IF (merit >= beta) THEN        { cutoff? }
            GOTO done;
        alpha := max(merit, alpha);    { fail-soft condition }
        { zero-width minimal-window search }
        value := -PVS (posn[j], -alpha-1, -alpha, depth-1);
        IF (value > merit) THEN        { re-search, if "fail-high" }
            IF (alpha < value) AND (value < beta) AND (depth > 2) THEN
                merit := -PVS (posn[j], -beta, -value, depth-1)
            ELSE merit := value;
    END;
done:
    Return(merit);
END;

Figure 3. Minimal window principal variation search.
Move Reordering Mechanisms. For efficiency (traversal of a smaller portion of the tree) the moves at each node should be ordered so that the more plausible ones are searched soonest. Various ordering schemes may be used. For example, "since the refutation of a bad move is often a capture, all captures are considered first in the tree, starting with the highest valued piece captured" (43). Special techniques are used at interior nodes for dynamically reordering moves during a search. In the simplest case, at every level in the tree a record is kept of the moves that have been assessed as being best or good enough to refute a line of play and so cause a cutoff. As Gillogly observed, "If a move is a refutation for one line, it may also refute another line, so it should be considered first if it appears in the legal move list" (43). Referred to as the killer heuristic, a typical implementation maintains only the two most frequently occurring "killers" at each level (36). Recently, a more powerful scheme for reordering moves at an interior node has been introduced. Named the history heuristic, it "maintains a history for every legal move seen in the search tree. For each move, a record of the move's ability to cause a refutation is kept, regardless of the line of play" (44). At an interior node the best move is the one that either yields the highest merit or causes a cutoff. Many implementations are possible, but a pair of tables (each of 64 x 64 entries) is enough to keep a frequency count of how often a particular move (defined as a from-to square combination) is best for each side. The available moves are reordered so that the most successful ones are tried first. An important property of this so-called history table is the sharing of information about the effectiveness of moves throughout the tree rather than only at nodes at the same search level. The idea is that if a move is frequently good enough to cause a cutoff, it will probably be effective whenever it can be played.
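The bookkeeping behind the history heuristic can be sketched as follows, assuming moves are reduced to (from, to) square pairs as in the text; the class and method names are illustrative only:

```python
from collections import defaultdict

class HistoryTable:
    """History heuristic: one 64 x 64 from-to counter per side."""

    def __init__(self):
        self.count = [defaultdict(int), defaultdict(int)]  # one table per side

    def credit(self, side, move):
        """Record that `move` (a (from, to) pair) was best or caused
        a cutoff somewhere in the tree, regardless of the line of play."""
        self.count[side][move] += 1

    def order(self, side, moves):
        """Reorder moves so the historically most successful come first."""
        return sorted(moves, key=lambda m: self.count[side][m], reverse=True)
```

Unlike the killer heuristic, which keeps a couple of moves per level, these counts are shared across all levels of the tree.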
Quiescence Search. Even the earliest papers on computer chess recognized the importance of evaluating only those positions that are "relatively quiescent" (7) or "dead" (5). These are positions that can be assessed accurately without further search. Typically they have no moves, such as checks, promotions, or complex captures, whose outcome is unpredictable. Not all the moves at horizon nodes are quiescent (i.e., lead immediately to dead positions), so some must be searched further. To limit the size of this so-called quiescence search, only dynamic moves are selected for consideration. These might be as few as the moves that are part of a single complex capture but can expand to include all capturing moves and all responses to check (43). Ideally, passed pawn moves (especially those close to promotion) and selected checks should be included (21,25), but these are often only examined in computationally simple end games. The goal is always to clarify the node so that a more accurate position evaluation is made. Despite the obvious benefits of these ideas, the realm of quiescence search is unclear because no theory for selecting and limiting the participation of moves exists. Present quiescence search methods are attractive because they are simple, but from a chess standpoint they leave much to be desired, especially when it comes to handling forking moves and mate threats. Even though the current approaches are reasonably effective, a more sophisticated method of extending the search or of identifying relevant moves to participate in the selective quiescence search is needed (45). On the other hand, Sargon managed quite well without quiescence search using direct computation to evaluate the exchange of material (46). The "horizon
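The simplest capture-only variant of quiescence search can be sketched in negamax form; `evaluate`, `captures`, and `make` are assumed callbacks (static evaluation, generation of capturing moves only, and applying a move), and a real program would also weigh checks and promotions:

```python
def quiesce(pos, alpha, beta, evaluate, captures, make):
    """Capture-only quiescence search (negamax, fail-soft sketch)."""
    stand_pat = evaluate(pos)        # value of making no dynamic move
    if stand_pat >= beta:            # already good enough: cut off
        return stand_pat
    best = stand_pat
    alpha = max(alpha, stand_pat)
    for move in captures(pos):       # only dynamic (capturing) moves
        score = -quiesce(make(pos, move), -beta, -alpha,
                         evaluate, captures, make)
        if score > best:
            best = score
        alpha = max(alpha, score)
        if alpha >= beta:            # cutoff
            break
    return best
```

The "stand pat" value lets the side to move decline all captures, which is what makes the search terminate at dead positions.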
effect" (qv) is said to occur when the delaying moves give up additional material to postpone the eventual loss. The effect is less apparent in programs with more knowledgeable quiescence searches (45), but all programs exhibit this phenomenon. There are many illustrations of the difficulty; the example in Figure 4, which is based on a study by Kaindl (45), is clear. Here a program with a simple quiescence search involving only captures would assume that any blocking move saves the queen. Even an eight-ply search (b3-b2, B x b2; c4-c3, B x c3; d5-d4, B x d4; e6-e5, B x e5) would not see the inevitable loss, "thinking" that the queen has been saved at the expense of four pawns! Thus, programs with a poor or inadequate quiescence search suffer more from the horizon effect. The best way to provide automatic extension of nonquiescent positions is still an open question, despite proposals such as bandwidth heuristic search (47).
Progressive and Iterative Deepening. The term progressive deepening was used by de Groot (6) to encompass the notion of selectively extending the main continuation of interest. This type of selective expansion is not performed by programs employing the alpha-beta algorithm, except in the sense of increasing the search depth by one for each checking move on the current continuation (path from root to horizon) or by performing a quiescence search from horizon nodes until dead positions are reached.

In the early 1970s several people tried a variety of ways to control the exponential growth of the tree search. A simple fixed depth search is inflexible, especially if it must be completed within a specified time. Gillogly, author of Tech (43), coined the term iterative deepening to distinguish a full-width search to increasing depths from the progressively more focused search described by de Groot. About the same time Slate and Atkin sought a better time control mechanism and introduced the notion of an iterated search (36) for carrying out a progressively deeper and deeper analysis. For example, an iterated series of one-ply, two-ply, three-ply, and so on, searches is carried out, with each new search first retracing the best path from the previous iteration and then extending the search by one ply. Early experimenters with this scheme were surprised to find that the iterated search often required less time than an equivalent direct search. It is not immediately obvious why iterative deepening is effective; as indeed it is not, unless the search is guided by the entries in a transposition table (or the more specialized refutation table), which holds the best moves from subtrees traversed during the previous iteration. All the early experimental evidence indicated that the overhead cost of the preliminary D - 1 iterations was often recovered through a reduced cost for the D-ply search. Later the efficiency of iterative deepening was quantified to assess various refinements, especially memory table assists (37). Today the terms progressive and iterative deepening are often used synonymously.

Transposition and Refutation Tables. The results (merit, best move, status) of the searches of nodes (subtrees) in the tree can be held in a large hash table (16,36,48). Such a table serves several purposes, but primarily it enables recognition of move transposition, leading to a subtree that has been seen before, and so eliminates the need to search. Thus, successful use of a transposition table is an example of exact forward pruning. Many programs also store their opening book, where different move orders are common, in a way that is compatible with access to the transposition table. Another important purpose of a transposition table is as an implied move reordering mechanism. By trying first the available move in the table, an expensive move generation may be avoided (48).

By far the most popular table access method is the one proposed by Zobrist (49). He observed that a chess position constitutes placement of up to 12 different piece types {K, Q, R, B, N, P, -K, . . . , -P} on to a 64-square board. Thus, a set of 12 x 64 unique integers (plus a few more for en passant and castling privileges), {Ri}, may be used to represent all the possible piece-square combinations. For best results these integers should be at least 32 bits long and be randomly independent of each other. An index of the position may be produced by doing an EXCLUSIVE OR on selected integers as follows:

Pj = Ra xor Rb xor . . . xor Rz

where Ra, . . . , Rz are the integers associated with the piece placements. Movement of a "man" from the piece square associated with Rs to the piece square associated with Rt yields a new index,

Pk = (Pj xor Rs) xor Rt
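Zobrist's scheme is short enough to state directly in Python. In this sketch the piece encoding (indices 0-11) is an assumption, and the extra random numbers for castling rights, en passant, and the side to move are omitted:

```python
import random

random.seed(42)                      # reproducible "random" integers
PIECE_TYPES = 12                     # K, Q, R, B, N, P for each side
SQUARES = 64

# One random integer per piece-square combination, as Zobrist proposed.
R = [[random.getrandbits(64) for _ in range(SQUARES)]
     for _ in range(PIECE_TYPES)]

def full_hash(board):
    """board: dict mapping square (0..63) -> piece index (0..11)."""
    h = 0
    for sq, piece in board.items():
        h ^= R[piece][sq]
    return h

def move_hash(h, piece, frm, to):
    """Incremental update for a non-capturing move: XOR out the
    from-square integer, XOR in the to-square integer."""
    return h ^ R[piece][frm] ^ R[piece][to]
```

The incremental update makes hashing nearly free during search: a quiet move costs two XOR operations (a capture would additionally XOR out the captured man).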
One advantage of hash tables is the rapid access that is possible, and for further speed and simplicity only a single probe of the table is normally made. More elaborate schemes have been tried, but often the cost of the increased complexity of managing the table swamps any benefits from improved table usage. Table 1 shows the usual fields of each entry in the hash table. Figure 5 contains sample pseudocode showing how the entries Move, Merit, Flag, and Height are used; not shown are the functions Retrieve and Store, which access and update the transposition table.
Table 1. Typical Transposition Table Entry

Lock     To ensure the table position is identical to the tree position
Move     Best move in the position, determined from a previous search
Merit    Value of subtree, computed previously
Flag     Indicates whether merit is upper bound, lower bound, or true merit
Height   Length of subtree upon which merit is based
Figure 4. Horizon effect (Black to move).
FUNCTION AB (p : position; alpha, beta, depth : integer) : integer;
VAR value, height, merit : integer;
    j, move : 1..MAXWIDTH;
    flag : (VALID, LBOUND, UBOUND);
    posn : ARRAY [1..MAXWIDTH] OF position;
BEGIN
    { retrieve merit and best move for the current position }
    Retrieve(p, height, merit, flag, move);
    { height is the effective subtree length     }
    { height < 0   position not in table         }
    { height >= 0  position in table             }
    IF (height >= depth) THEN BEGIN
        IF (flag = VALID) THEN
            Return(merit);
        IF (flag = LBOUND) THEN
            alpha := max(alpha, merit);
        IF (flag = UBOUND) THEN
            beta := min(beta, merit);
        IF (alpha >= beta) THEN
            Return(merit);
    END;
    { Note: update of the alpha or beta bound    }
    { is not valid in a selective search.        }
    { If merit in table is insufficient to end   }
    { search, try best move (from table) first,  }
    { before generating other moves.             }
    IF (depth = 0) THEN                { horizon node? }
        Return(Evaluate(p));
    IF (height >= 0) THEN BEGIN
        { first try move from table }
        merit := -AB (posn[move], -beta, -alpha, depth-1);
        IF (merit >= beta) THEN
            GOTO done;
    END
    ELSE merit := -MAXINT;
    { no cutoff, generate moves }
    posn := Generate(p);
    IF empty(posn) THEN                { leaf, mate or stalemate? }
        Return(Evaluate(p));
    FOR j := 1 TO sizeof(posn) DO
        IF j <> move THEN BEGIN
            { using fail-soft condition }
            value := -AB (posn[j], -beta, -max(alpha, merit), depth-1);
            IF (value > merit) THEN BEGIN
                merit := value;
                move := j;
            END;
            IF (merit >= beta) THEN
                GOTO done;
        END;
done:
    flag := VALID;
    IF (merit <= alpha) THEN
        flag := UBOUND;
    IF (merit >= beta) THEN
        flag := LBOUND;
    IF (height <= depth) THEN          { update hash table }
        Store(p, depth, merit, flag, move);
    Return(merit);
END;

Figure 5. Alpha-beta with transposition table.

A transposition table also identifies the preferred move sequences used to guide the next iteration of a progressive deepening search. Only the move is important in this phase since the subtree length is usually less than the remaining search depth. Transposition tables are particularly advantageous to methods like PVS since the initial minimal window search loads the table with useful lines that are used in the event of a re-search. On the other hand, for deeper searches, entries are commonly lost as the table is overwritten even though the table may contain more than a million entries (50). Under these conditions a small fixed-size transposition table may be overused (overloaded) until it is ineffective as a means of storing the continuations. To overcome this fault, a special table for holding these main continuations (the refutation lines) is also used. The table has W entries containing the D elements of each continuation. For shallow searches (D < 6) a refutation table guides a progressive deepening search just as well as a transposition table. In fact, a refutation table is the preferred choice of commercial systems or users of memory-limited processors. A small triangular workspace [(D x D)/2 entries] is needed to hold the current continuation as it is generated, and these entries in the work space can also be used as a source of killer moves (51).

Summary. The various terms and techniques described have evolved over the years. The superiority of one method over another often depends on how the elements are combined. The utility of iterative deepening, aspiration search, PVS, and transposition and refutation tables is perhaps best summarized by a revised version of an established performance graph (37) (Fig. 6). That graph was made from data gathered by a simple chess program when analyzing the 24 standard positions of the Bratko-Kopec test (24). Analysis of those positions requires the search of trees whose nodes have an average width of W = 34 branches. Thus, it is possible to use the formula for the terminal (horizon) nodes in a uniform minimal game tree as an estimate of the lower bound on the search size (see Fig. 6). For the results presented in Figure 6 the transposition table was fixed at 8000 entries so that the effects of table overloading may be seen. Figure 6 shows that: (a) iterative deepening has negligible cost and so is useful as a time control mechanism; (b) PVS is superior to aspiration search; (c) a refutation table is a space-efficient alternative to a transposition table for guiding both the next iteration and a re-search; (d) odd-ply alpha-beta searches are more efficient than even-ply ones; (e) transposition table size must increase with depth of search; and (f) transposition and/or refutation tables plus the history heuristic are an effective combination, achieving search results close to the minimal game tree for odd-ply search depths.

Strengths and Weaknesses
Anatomy of a Chess Program. A typical chess program contains the following three distinct elements: board description and move generation, tree searching/pruning, and position evaluation. Many people have based their first chess program on Frey and Atkin's instructive Pascal-based model (52). Although several good proposals exist in readily available books (14,20) and articles (53,54), the most efficient way of representing all the tables and data structures necessary to describe a chessboard is not yet known. From these tables the move list for each position can be generated. Sometimes the Generate function produces all the feasible moves at once, which has the advantage that the moves may be sorted to improve the probability of a cutoff. In small memory computers, on the other hand, the moves are produced one at a time. This saves space and perhaps time whenever an early cutoff occurs. However, since only limited sorting is possible (captures might be generated first), the searching efficiency is generally lower. In the area of searching/pruning methods, variations on the depth-limited alpha-beta algorithm remain the preferred choice. All chess programs fit the following general model. A full-width "exhaustive" search (all moves are considered) is done at the first few ply from the root node. At depths beyond this exhaustive layer some form of selective search is used. Typically, unlikely or unpromising moves are simply dropped from the move list. More sophisticated programs carry out an
Figure 6. Comparison of alpha-beta enhancements (search size versus search depth in ply).
extensive analysis to select those moves that are to be discarded at an interior node. Even so, this type of forward pruning is known to be error prone and dangerous; it is attractive because of the big reduction in tree size that ensues. Finally, the Evaluate function is invoked at the horizon nodes to assess the merits of the moves. Many of these are captures or other forcing moves that are not "dead," and so a limited quiescence search is carried out to resolve the unknown potential of the move. The evaluation process is the most important part of a chess program because it estimates the values of the subtrees that extend beyond the horizon. Although in the simplest case Evaluate simply counts the material balance, for superior play it is also necessary to measure many positional factors, such as pawn structures. These aspects are still not formalized, but adequate descriptions by computer chess practitioners are available in books (14,36).

Hardware Advances. Computer chess has consistently been in the forefront of the application of high technology. The 1970s saw the introduction of special-purpose hardware for chess, such as Cheops (55). Later networks of computers were tried; in New York (1983) Ostrich used an eight-processor Data General system (56) and Cray Blitz a dual-processor Cray X-MP (21). Some programs used special-purpose hardware [see, e.g., Belle (57,58) and Bebe, Advance 3.0, and BCP (14)], and there were several experimental commercial systems employing custom VLSI chips. This trend toward the use of custom chips will continue, as evidenced by the success of the latest master-caliber chess program Hitech from Carnegie-Mellon University, based on a new chip for generating moves (59). Although mainframes will continue to be faster for the near future, it is only a matter of time before massive parallelism is applied to computer chess. The problem is a natural demonstration piece for the power of distributed computation since it is processor intensive and the work can be partitioned in many ways. Not only can the game trees be split into similar subtrees but also parallel computation of such components as move generation, position evaluation, and quiescence search is possible. Improvements in hardware speed have been an important contributor to computer chess performance. These improvements will continue, not only through faster special-purpose processors but also by using many processing elements.

Software Advances. Many observers attributed the advances in computer chess through the 1970s to better hardware, particularly faster processors. Much evidence supports that point of view, but major improvements also stemmed from a better understanding of quiescence and the horizon effect and a better encoding of chess knowledge. The benefits of aspiration search (43), iterative deepening (36) [especially when used with a refutation table (51)], the killer heuristic (43), and transposition tables (16,36) were also appreciated, and by 1980 all were in general use. One other advance was the simple expedient of "thinking on the opponent's time" (43), which involved selecting a response for the opponent, usually the move predicted by the computer, and searching the position for the next reply. Nothing is lost by this tactic, and when a successful prediction is made, the time saved may be accumulated until it is necessary or possible to do a deeper search. Anticipating the opponent's response has been embraced by all microprocessor-based systems since it increases their effective speed.

Not all advances work out in practice. For example, in a test with Kaissa the method of analogies "reduced the search by a factor of 4 while the time for studying one position was increased by a factor of 1.5" (60). Thus, a dramatic reduction in the positions evaluated occurred, but the total execution time went up and so the method was not effective. This sophisticated technique has not been tried in other competitive chess programs. The essence of the idea is that captures in chess are often invariant with respect to several minor moves. That is, some minor moves have no influence on the outcome of a specific capture. Thus, the true results of a capture need be computed only once and stored for immediate use in the evaluation of other positions that contain this identical capture! Unfortunately, the relation (sphere of influence) between a move and those pieces involved in a capture is complex, and it can be as much work to determine this relationship as it would be to simply reevaluate the exchange. However, the method is elegant and appealing on many grounds and should be a fruitful area for further research, as a promising variant restricted to pawn moves illustrates (61).

End Game Play. During the 1970s there developed a better understanding of the power of pawns in chess and a general improvement in end game play. Even so, end games remained a weak feature of computer chess. Almost every game illustrated some deficiency through inexact play or conceptual blunders. More commonly, however, the programs were seen to wallow and move pieces aimlessly around the board. A good illustration of such difficulties is a position from a game between Duchess and Chaos (Detroit, 1979) (see Fig. 7), which was analyzed extensively in an appendix to a major reference (20).
After more than 10 hours of play the position in Figure 7 was reached, and since neither side was making progress the game was adjudicated after White's 111th move of Bc6-d5. White had just completed a sequence of 21 reversible moves with only the bishop, and black had responded correctly by simply moving the king to and fro. Duchess had only the most rudimentary plan for winning end games. Specifically, it knew about avoiding a 50-move-rule draw. Had the game continued,
then within the next 29 moves it would either play an irreversible move like Pf6-f7 or give up the pawn on f6. Another 50-move cycle would then ensue, and perhaps eventually the possibility of winning the pawn on a3 might be found. Even six years later it is doubtful that many programs could handle this situation any better. There is simply nothing much to be learned through search. What is needed here is some higher notion involving goal-seeking plans. All the time a solution must be sought that avoids a draw. This latter aspect is important since in many variations black can simply offer the sacrifice bishop takes pawn on f6 (B x f6) because if the white king recaptures with K x f6, a stalemate results.

Sometimes, however, chess programs are supreme. At Toronto in 1977, in particular, Belle demonstrated a new strategy for defending the lost ending KQ versus KR against chess masters. Although the ending still favors the side with the queen, precise play is required to win within 50 moves, as several chess masters were embarrassed to discover. In speed chess Belle also often dominates masters, as many examples in the literature show (20). Increasingly, chess programs are teaching even experts new tricks and insights.

As long ago as 1970 Strohlein built a database to find optimal solutions to several simple three- and four-piece end games (kings plus one or two pieces) (62). Using a Telefunken TR4 (48-bit word, 8-μs operations) he obtained the results summarized in Table 2. Many other early workers on end games built databases of the simplest endings. Their approach was to develop optimal sequences backward from all possible winning positions (mate or reduction to a known subproblem) (63,64). These works have recently been reviewed and put into perspective (65). The biggest contributions to chess theory, however, have been made by Belle (qv) and Ken Thompson (66). They have built databases to solve five-piece end games.
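The backward ("retrograde") enumeration behind such databases can be illustrated with a toy. The sketch below builds a small tablebase for a hypothetical subtraction game (take one or two counters from a pile; a player with no move loses), standing in for chess positions; the point is the propagation outward from terminal losses, not the game itself.

```python
def build_tablebase(positions, moves):
    """moves(p) -> successor positions for the side to move.
    Returns {position: v}: v > 0 means the mover wins in v plies,
    v <= 0 means the mover loses in |v| plies."""
    # Terminal positions (no legal moves) are lost for the mover,
    # like being checkmated: "loses in 0 plies".
    value = {p: 0 for p in positions if not moves(p)}
    changed = True
    while changed:                        # iterate to a fixpoint
        changed = False
        for p in positions:
            if p in value:
                continue
            succ = [value.get(s) for s in moves(p)]
            losing = [v for v in succ if v is not None and v <= 0]
            if losing:
                # Can hand the opponent a lost position: win, fastest first.
                value[p] = 1 + min(-v for v in losing)
                changed = True
            elif all(v is not None and v > 0 for v in succ):
                # Every move lets the opponent win: loss, delayed as long
                # as possible.
                value[p] = -(1 + max(succ))
                changed = True
    return value
```

Positions never labelled by the loop (draws) simply stay out of the table; a chess tablebase applies the same propagation to board positions and their legal-move relation.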
Specifically, KQX versus KQ (where X = Q, R, B, or N), KRX versus KR, and KBB versus KN. This last case may prompt another revision to the 50-move rule since in general KBB versus KN is won (not drawn), and less than 67 moves are needed to mate or safely capture the knight (66). Also completed is a major study of the complex KQP versus KQ ending. Again, often more than 50 maneuvers are required before a pawn can advance (66). For more complex endings involving several pawns, the most exciting new ideas are those on chunking. Based on these ideas, it is claimed that the "world's foremost expert" has been generated for endings where each side has a king and three pawns (67,68).

Memory Tables. Others have pointed out (36,50) that a hash table can also be used to store information about pawn formations. Since there are usually far more moves by pieces than by pawns, the value of the base pawn formation for a position must be recomputed several times. It is a simple matter to build a hash key based on the location of pawns alone and so
Figure 7. Lack of end game plan (White to move).
Table 2. Maximum Moves to Win Simple End Games

Pieces             Moves   Computation Time
Queen              10      6.5 min
Rook               16      9 min
Rook vs. Bishop    18      6 h 30 min
Rook vs. Knight    27      14 h 16 min
Queen vs. Rook     31      29 h 9 min
store the values of pawn formations in a hash table for immediate retrieval. Hyatt found this table to be effective (21) since otherwise 10-20% of the search time was taken up with evaluation of pawn structures. A high (98-99%) success rate was reported (21). King safety can also be handled similarly (36,50) since the king has few moves and for long periods is not under attack.

Transposition and other memory tables come into their own in end games since there are fewer pieces and more reversible moves. Search time reduction by a factor of 5 is common, and in certain types of king and pawn endings it is claimed that experiments with Cray Blitz and Belle have produced trees of more than 30 ply, representing speedups of well over 100-fold. Even in complex middle games, however, significant performance improvement is observed. Thus, use of a transposition table provides an exact form of forward pruning and as such reduces the size of the search space, in end games often to less than the minimal game tree!

The power of forward pruning is well illustrated by the following study of "Problem No. 70" (69) (Fig. 8), which was apparently first solved (52) by Chess 4.9 and then by Belle. The only complete computer analysis of this position was provided later (21). As Hyatt puts it, a solution is possible because "the search tree is quite narrow due to the locked pawns" (21). Here Cray Blitz is able to find the correct move of Ka1-b1 at the 18th iteration. The complete line of the best continuation was found at the 33rd iteration after examining four million nodes in about 65 s of Cray-1 time. This was possible because the transposition table had become loaded with the results of draws by repetition, and so the normal exponential growth of the tree was inhibited. Also, at every iteration the transposition table was loaded with losing defences corresponding to lengthy searches. Thus, the current iteration often yielded results equivalent to a much longer 2(D + 1)-ply search.
Thompson refers to this phenomenon as "seeing over the horizon" (66).
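The pawn-formation table described above can be sketched with Zobrist-style keys (49): XOR one random code per (colour, square) occupied by a pawn, and cache the expensive structure evaluation under the resulting key. Everything concrete below, the square numbering, the fixed seed, and the scoring callback, is an illustrative assumption rather than the scheme of any program cited.

```python
import random

rng = random.Random(42)                   # fixed seed, reproducible codes
# One 64-bit code per (colour, square); only pawns contribute to this key.
PAWN_CODES = {(colour, sq): rng.getrandbits(64)
              for colour in ("white", "black") for sq in range(64)}

def pawn_key(pawns):
    """pawns: iterable of (colour, square). XOR of codes gives the key;
    XOR is order-independent, so move order does not matter."""
    key = 0
    for colour, sq in pawns:
        key ^= PAWN_CODES[(colour, sq)]
    return key

pawn_table = {}                           # key -> cached structure score

def pawn_structure_score(pawns, evaluate):
    """Return the cached pawn-structure score, evaluating at most once
    per distinct formation."""
    key = pawn_key(pawns)
    if key not in pawn_table:             # miss: evaluate once, then cache
        pawn_table[key] = evaluate(pawns)
    return pawn_table[key]
```

Because XOR is its own inverse, moving a pawn updates the key incrementally with just two XORs (remove the old square's code, add the new one), which is what makes the table cheap enough to consult at every node.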
Selective Search. Many software advances came from better understanding of how the various components in evaluation and search interact. The first step was a move away from selective search by providing a clear separation between the algorithmic component, search, and the heuristic component, chess position evaluation. The essence of the selective approach is to narrow the width of search by forward pruning. Some selection processes removed implausible moves only (70), thus abbreviating the width of search in a variable manner not necessarily dependent on node level in the tree. This technique was only slightly more successful than other forms of forward pruning and required more computation. Even so, it too could not retain sacrificial moves. So the death knell of selective search was its inability to predict the future with a static evaluation function. It was particularly susceptible to the decoy sacrifice and subsequent entrapment of a piece. Interior node evaluation functions that attempted to deal with these problems became too expensive. Even so, in the eyes of some, selective methods remain as a future prospect since "selective search will always loom as a potentially faster road to high level play. That road, however, requires an intellectual breakthrough rather than a simple application of known techniques" (58). The reason for this belief is that chess game trees grow exponentially with depth of search. Ultimately, it will become impossible to obtain the necessary computing power to search deeper within normal time constraints. For this reason most chess programs already incorporate some form of selective search, often as forward pruning. These methods are quite ad hoc since they are not based on a theory of selective search. Although nearly all chess programs have some form of selective search, even if it is no more than the discarding of unlikely moves, at present only two major programs (Awit and Chaos) do not consider all moves at the root node. Despite their occasional successes, these programs can no longer compete in the race for Grand Master status.

Nevertheless, although the main advantage of a program that is exhaustive to some chosen search depth is its tactical strength, it has been shown that the selective approach can also be effective in tactical situations. In particular, Wilkins's Paradise program demonstrated superior performance in "tactically sharp middle game positions" on a standard suite of tests (71). Paradise was designed to illustrate that a selective search program can also find the best continuation when there is material to be gained, through searching but a fraction of the game tree viewed by such programs as Chess 4.4 and Tech. Furthermore, it can do so with greater success than either program or a typical A-class player (71). However, a 9:1 speed handicap was necessary to allow adequate time for the interpretation of the Maclisp program. Paradise's approach is to use an extensive static analysis to produce a small set of plausible winning plans. Once a plan is selected, "it is used until it is exhausted or until the program determines that it is not working." In addition, Paradise can "detect when a plan has been tried earlier along the line of play and avoid searching again if nothing has changed" (71). This is the essence of the method of analogies too. As Wilkins says, the "goal is to build an expert knowledge base and to reason with it to discover plans and verify them within a small tree" (71). Although Paradise is successful in this regard, part of its strength lies in its quiescence search, which is seen to be "inexpensive compared to regular search," despite the fact that this search "investigates not only captures but forks, pins, multimove mating sequences, and other threats" (71).
Figure 8. Transposition table necessity (White to move).

The efficiency of the program lies in its powerful evaluation so that usually "only one move is investigated at each node, except when a defensive move fails." Pitrat has also written extensively on the subject of finding plans that win material (72), but neither his ideas nor
those in Paradise have been incorporated into the competitive chess programs of the 1980s.

Search and Knowledge Errors. The following game was the climax of the 15th ACM NACCC, in which all the important programs of the day participated. Had Nuchess won its final match against Cray Blitz, there would have been a five-way tie between these two programs and Bebe, Chaos, and Fidelity X. Such a result almost came to pass, but suddenly Nuchess "snatched defeat from the jaws of victory," as chess computers are prone to do. Complete details about the game are not important, but the position shown in Figure 9 was reached. Here, with Rf6 x g6, Nuchess wins another pawn, but in so doing enters a forced sequence that leaves Cray Blitz with an unstoppable pawn on a7, as follows:

45. Rf6 x g6?   Rg8 x g6+
46. Kg5 x g6    Nc8 x d6
47. Pc5 x d6

Many explanations can be given for this error, but all have to do with a lack of knowledge about the value of pawns. Perhaps black's passed pawn was ignored because it was still on its home square, or perhaps Nuchess simply miscalculated and "forgot" that such pawns may initially advance two rows? Another possibility is that white became lost in some deep searches in which its own pawn promotes. Even a good quiescence search might not recognize the danger of a passed pawn, especially one so far from its destination. In either case this example illustrates the need for knowledge of a type that cannot be obtained easily through search but that humans are able to see at a glance (6). The game continued, 47. . . . Pa5, and white was neither able to prevent promotion nor advance its own pawn.

There are many opportunities for contradictory knowledge interactions in chess programs. Sometimes chess folklore provides ground rules that must be applied selectively. Such advice as "a knight on the rim is dim" is usually appropriate, but in special cases placing a knight on the edge of the board is sound, especially if it forms part of an attacking theme and is unassailable. Not enough work has been done to assess the utility of such knowledge and to measure its importance. Recently, Schaeffer completed an interesting doctoral thesis (73) that addressed this issue; a thesis that could have some impact on the way expert systems are tested and built since it demonstrates that there is a correct order to the acquisition of knowledge if the newer knowledge is to build effectively on the old.

Areas of Future Progress. Although most chess programs are now using all the available refinements and tables to reduce the game tree traversal time, only in the ending is it possible to search consistently less than the minimal game tree. Selective search and forward pruning methods are the only real hope for reducing further the magnitude of the search. Before this is possible, it is necessary for the programs to reason about the trees they see and deduce which branches can be ignored. Typically, these will be branches that create permanent weaknesses or are inconsistent with the current themes. The difficulty will be to do this without losing sight of tactical factors.

Improved performance will also come about by using faster computers and through the construction of multiprocessor systems. One early multiprocessor chess program was Ostrich (56,74). Other experimental systems followed, including Parabelle (75) and ParaPhoenix (76). None of these systems, nor the strongest multiprocessor program Cray Blitz (21), consistently achieves more than a five-fold speed-up even when eight processors are used (76). There is no apparent theoretical limit to the parallelism, but the practical restrictions are great and may require some new ideas on partitioning the work as well as more involved scheduling methods.

Another major area of research is the derivation of strategies from databases of chess end games. It is now easy to build expert system databases for the classical end games involving four or five pieces. At present these databases can only supply the optimal move in any position (although a short principal continuation can be provided by way of expert advice). What is needed now is a program to deduce from these databases optimally correct strategies for playing the end game. Here the database could either serve as a teacher of a deductive inference program or as a tester of plans and hypotheses for a general learning program. Perhaps a good test of these methods would be the production of a program that could derive strategies for the well-defined KBB versus KN end game. A solution to this problem would provide a great advance to the whole of AI.
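The root-level forward pruning practised by selective programs can be sketched very simply: score each move statically and discard the implausible ones before any deep search. The scoring values, the margin, and the cutoff below are invented for illustration; as noted above, such pruning is ad hoc and risks discarding sacrifices that a deeper search would vindicate.

```python
def prune_moves(moves, static_score, keep=4, margin=150):
    """Keep at most `keep` moves, and only those whose static score is
    within `margin` (a centipawn-style tolerance) of the best one."""
    if not moves:
        return []
    scored = sorted(moves, key=static_score, reverse=True)
    best = static_score(scored[0])
    kept = [m for m in scored if best - static_score(m) <= margin]
    return kept[:keep]
```

The returned list is also sorted best-first, so the same pass doubles as the move ordering that alpha-beta depends on; a decoy sacrifice whose static score falls outside the margin is exactly the kind of move this filter wrongly discards.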
BIBLIOGRAPHY
1. A. G. Bell, The Machine Plays Chess?, Pergamon Press, Oxford, 1978.
2. D. N. L. Levy, Chess and Computers, Batsford, London, 1976.
Figure 9. A costly miscalculation (White's move 45).
3. T. Nemes, "The Chess-Playing Machine," Acta Technica, Hungarian Academy of Sciences, Budapest, 1951, pp. 215-239.
4. K. Zuse, "Chess Programs," in The Plankalkul, Report No. 106, Gesellschaft fur Mathematik und Datenverarbeitung, Bonn, 1976, pp. 201-244 (translation of German original, 1945).
5. A. M. Turing, "Digital Computers Applied to Games," in B. V. Bowden (ed.), Faster Than Thought, Pitman, London, 1953, pp. 286-297.
6. A. D. de Groot, Thought and Choice in Chess, Mouton, The Hague, 1965.
7. C. E. Shannon, "Programming a computer for playing chess," Philos. Mag. 41, 256-275 (1950).
8. J. Kister, P. Stein, S. Ulam, W. Walden, and M. Wells, "Experiments in chess," JACM 4, 174-177 (1957).
9. A. Bernstein, M. de V. Roberts, T. Arbuckle, and M. A. Belsky, A Chess Playing Program for the IBM 704, Western Joint Computer Conference Proceedings, Los Angeles, AIEE, New York, pp. 157-159, 1958.
10. B. Mittman, A Brief History of Computer Chess Tournaments: 1970-1975, in P. Frey (ed.), Chess Skill in Man and Machine, 1st ed., Springer-Verlag, New York, pp. 1-33, 1977.
11. A. Kotok, A Chess Playing Program for the IBM 7090, B.S. Thesis, MIT, AI Project Memo 41, Computation Center, Cambridge, MA, 1962.
12. G. M. Adelson-Velskii, V. L. Arlazarov, A. R. Bitman, A. A. Zhivotovskii, and A. V. Uskov, Programming a Computer to Play Chess, Russian Math. Surveys, Vol. 25, Cleaver-Hume Press, London, pp. 221-262 (1970). (Translation of Proc. 1st Summer School Math. Prog., Vol. 2, 1969, pp. 216-252.)
13. J. E. Hayes and D. N. L. Levy, The World Computer Chess Championship, Edinburgh University Press, Edinburgh, 1976.
14. D. E. Welsh and B. Baczynskyj, Computer Chess II, W. C. Brown, Dubuque, IA, 1985.
15. H. J. van den Herik, Computerschaak, Schaakwereld en Kunstmatige Intelligentie, Ph.D. Thesis, Technische Hogeschool Delft, Academic Service, 's-Gravenhage, The Netherlands, 1983.
16. R. D. Greenblatt, D. E. Eastlake III, and S. D. Crocker, The Greenblatt Chess Program, Fall Joint Computing Conference Proceedings 31, San Francisco, ACM, New York, pp. 801-810, 1967.
17. I. J. Good, A Five-Year Plan for Automatic Chess, in E. Dale and D. Michie (eds.), Machine Intelligence, Vol. 2, Elsevier, New York, pp. 89-118, 1968.
18. M. M. Newborn, Computer Chess, Academic Press, New York, 1975.
19. T. R. Truscott, Techniques Used in Minimax Game-Playing Programs, M.S. Thesis, Duke University, Durham, NC, April 1981.
20. P. W. Frey (ed.), Chess Skill in Man and Machine, 2nd ed., Springer-Verlag, New York, 1983.
21. R. M. Hyatt, A. E. Gower, and H. L. Nelson, Cray Blitz, in D. Beal (ed.), Advances in Computer Chess, Vol.
4, Pergamon Press, Oxford, pp. 8-18, 1985.
22. D. Levy and M. Newborn, More Chess and Computers, 2nd ed., Computer Science Press, Rockville, MD, 1981.
23. A. E. Elo, The Rating of Chessplayers, Past and Present, Arco Publishing, New York, 1978.
24. D. Kopec and I. Bratko, The Bratko-Kopec Experiment: A Comparison of Human and Computer Performance in Chess, in M. Clarke (ed.), Advances in Computer Chess, Vol. 3, Pergamon Press, Oxford, pp. 57-72, 1982.
25. K. Thompson, Computer Chess Strength, in M. Clarke (ed.), Advances in Computer Chess, Vol. 3, Pergamon Press, Oxford, pp. 55-56, 1982.
26. D. Michie, "Chess with computers," Interdisc. Sci. Rev. 5(3), 215-227 (1980).
27. A. L. Samuel, "Some studies in machine learning using the game of checkers," IBM J. Res. Dev. 3, 210-229 (1959). [Also in E. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963, pp. 71-105.]
28. A. G. Bell, Games Playing with Computers, Allen & Unwin, London, 1972.
29. A. Newell, J. C. Shaw, and H. A. Simon, "Chess playing programs and the problem of complexity," IBM J. Res. Dev. 4(2), 320-335 (1958). [Also in E. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963, pp. 39-70.]
30. A. L. Brudno, "Bounds and valuations for abridging the search of estimates," Probl. Cybern. 10, 225-241 (1963). (Translation of Russian original in Problemy Kibernetiki, Vol. 10, May 1963, pp. 141-150.)
31. D. E. Knuth and R. W. Moore, "An analysis of alpha-beta pruning," Artif. Intell. 6(4), 293-326 (1975).
32. J. P. Fishburn, Analysis of Speedup in Distributed Algorithms, UMI Research Press, Ann Arbor, MI, 1984.
33. H. J. Berliner, Some Necessary Conditions for a Master Chess Program, Proceedings of the Third International Joint Conference on Artificial Intelligence, Stanford, CA, pp. 77-85, 1973.
34. H. J. Berliner, Chess as Problem Solving: The Development of a Tactics Analyzer, Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, March 1974.
35. J. J. Gillogly, Performance Analysis of the Technology Chess Program, Technical Report CMU-CS-78-189, Computer Science, Carnegie-Mellon University, Pittsburgh, PA, March 1978.
36. D. J. Slate and L. R. Atkin, CHESS 4.5 - The Northwestern University Chess Program, in P. Frey (ed.), Chess Skill in Man and Machine, 1st ed., Springer-Verlag, New York, pp. 82-118, 1977.
37. T. A. Marsland, Relative Efficiency of Alpha-Beta Implementations, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, pp. 763-766 (August 1983).
38. J. Pearl, "Asymptotic properties of minimax trees and game searching procedures," Artif. Intell. 14(2), 113-138 (1980).
39. A. Reinefeld, J. Schaeffer, and T. A. Marsland, Information Acquisition in Minimal Window Search, Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, pp. 1040-1043 (August 1985).
40. P. W. Frey, An Introduction to Computer Chess, in P. Frey (ed.), Chess Skill in Man and Machine, Springer-Verlag, New York, pp. 54-81, 1977.
41. J. R. Slagle, Artificial Intelligence: The Heuristic Programming Approach, McGraw-Hill, New York, 1971.
42. J. A. Birmingham and P. Kent, Tree-Searching and Tree-Pruning Techniques, in M.
Clarke (ed.), Advances in Computer Chess, Vol. 1, Edinburgh University Press, Edinburgh, pp. 89-107, 1977.
43. J. J. Gillogly, "The technology chess program," Artif. Intell. 3(1-4), 145-163 (1972).
44. J. Schaeffer, "The history heuristic," ICCA J. 6(3), 16-19 (1983).
45. H. Kaindl, Dynamic Control of the Quiescence Search in Computer Chess, in R. Trappl (ed.), Cybernetics and Systems Research, North-Holland, Amsterdam, pp. 973-977, 1982.
46. D. Spracklen and K. Spracklen, An Exchange Evaluator for Computer Chess, Byte, 16-28 (November 1978).
47. L. R. Harris, The Heuristic Search and the Game of Chess, Proceedings of the Fourth International Joint Conference on Artificial Intelligence, Tbilisi, Georgia, pp. 334-339, 1975.
48. T. A. Marsland and M. Campbell, "Parallel search of strongly ordered game trees," Comput. Surv. 14(4), 533-551 (1982).
49. A. L. Zobrist, A Hashing Method with Applications for Game Playing, Technical Report 88, Computer Sciences Department, University of Wisconsin, Madison, WI, April 1970.
50. H. L. Nelson, "Hash tables in Cray Blitz," ICCA J. 8(1), 3-13 (1985).
51. S. G. Akl and M. M. Newborn, The Principal Continuation and the Killer Heuristic, 1977 ACM Annual Conference Proceedings, Seattle, October 1977, ACM, New York, pp. 466-473, 1977.
52. P. W. Frey and L. R. Atkin, Creating a Chess Player, in B. L. Liffick (ed.), The BYTE Book of Pascal, 2nd ed., BYTE/McGraw-Hill, Peterborough, NH, pp. 107-155, 1979.
53. A. G. Bell, "Algorithm 50: How to program a computer to play legal chess," Comput. J. 13(2), 208-219 (1970).
54. S. M. Cracraft, "Bitmap move generation in chess," ICCA J. 7(3), 146-152 (1984).
55. J. Moussouris, J. Holloway, and R. Greenblatt, CHEOPS: A Chess-Oriented Processing System, in J. Hayes, D. Michie, and L. Michulich (eds.), Machine Intelligence, Vol. 9, Ellis Horwood, Chichester, pp. 351-360, 1979.
56. M. Newborn, A Parallel Search Chess Program, Proceedings of the ACM Annual Conference, Denver, ACM, New York, pp. 272-277, 1985.
57. J. H. Condon and K. Thompson, Belle Chess Hardware, in M. Clarke (ed.), Advances in Computer Chess, Vol. 3, Pergamon Press, Oxford, pp. 45-54, 1982.
58. J. H. Condon and K. Thompson, Belle, in P. Frey (ed.), Chess Skill in Man and Machine, 2nd ed., Springer-Verlag, New York, pp. 201-210, 1983.
59. C. Ebeling and A. Palay, The Design and Implementation of a VLSI Chess Move Generator, Eleventh Annual International Symposium on Computer Architecture, Ann Arbor, MI, June 1984, IEEE, New York, pp. 74-80, 1984.
60. G. M. Adelson-Velsky, V. L. Arlazarov, and M. V. Donskoy, Algorithms of Adaptive Search, in J. Hayes, D. Michie, and L. Michulich (eds.), Machine Intelligence, Vol. 9, Ellis Horwood, Chichester, U.K., pp. 373-384, 1979.
61. H. Horacek, "Knowledge-based move selection and evaluation to guide the search in chess pawn endings," ICCA J. 6(3), 20-37 (1983).
62. T. Strohlein, Untersuchungen uber Kombinatorische Spiele, Doctoral Thesis, Technischen Hochschule Munchen, Munich, FRG, January 1970.
63. M. A. Bramer and M. R. B. Clarke, "A model for the representation of pattern-knowledge for the endgame in chess," Int. J. Man-Machine Stud. 11, 635-649 (1979).
64. I. Bratko and D. Michie, A Representation for Pattern-Knowledge in Chess Endgames, in M. Clarke (ed.), Advances in Computer Chess, Vol. 2, Edinburgh University Press, Edinburgh, pp. 31-56, 1980.
65. H. J. van den Herik and I. S. Herschberg, "The construction of an omniscient endgame database," ICCA J. 8(2), 66-87 (1985).
66. K. Thompson, Private Communication, Bell Laboratories, Murray Hill, NJ, July 1985.
67. M. Campbell, A Chess Program that Chunks, Proceedings of the Third National Conference on Artificial Intelligence, Washington, D.C., pp. 49-53, August 1983.
68. H. Berliner and M. Campbell, "Using chunking to solve chess pawn endgames," Artif. Intell. 23(1), 97-120 (1984).
69. R. Fine, Basic Chess Endings, David McKay, New York, 1941.
70. E. W. Kozdrowicki and D. W. Cooper, "COKO III: The Cooper-Kozdrowicki chess program," Int. J. Man-Machine Stud. 6, 627-699 (1974).
71. D. Wilkins, Using Chess Knowledge to Reduce Speed, in P. Frey (ed.), Chess Skill in Man and Machine, 2nd ed., Springer-Verlag, New York, pp. 211-242, 1983.
72. J. Pitrat, "A chess combination program which uses plans," Artif. Intell. 8(3), 275-321 (1977).
73. J. Schaeffer, Experiments in Search and Knowledge, Ph.D. Thesis, University of Waterloo, Waterloo, Canada, May 1986.
74. M. Newborn, OSTRICH/P-A Parallel Search Chess Program, Technical Report SOCS 82.3, Computer Science, McGill University, Montreal, Canada, March 1982.
75. T. A. Marsland and F. Popowich, "Parallel game-tree search," IEEE Trans. Pattern Anal. Mach. Intell. 7(4), 442-452 (July 1985).
76. T. A. Marsland, M. Olafsson, and J. Schaeffer, Multiprocessor Tree-Search Experiments, in D. Beal (ed.), Advances in Computer Chess, Vol. 4, Pergamon Press, Oxford, pp. 37-51 (1985).

T. A. Marsland
University of Alberta

COMPUTER-INTEGRATED MANUFACTURING

Computer-integrated manufacturing (CIM) is, basically, the technology that embraces the full range of the unique ability possessed by the digital computer and related computer technology that greatly enhances the capabilities of the entire manufacturing process. That ability has three main elements. The first of these is the ability of the computer to provide on-line, variable-program (flexible) automation of manufacturing activities and equipment. The second is its ability to provide on-line, moment-by-moment optimization of manufacturing activities and operations. With respect to both of these elements, it should be noted that the computer has the ability to accomplish such not only with the "hard" components of manufacturing (e.g., the manufacturing machinery and equipment) but also with the "soft" components of manufacturing (the information flow, the handling of databases, etc.). However, as is becoming more and more widely recognized, the third element of the computer's unique ability is, by far, the most important and powerful of the three. This is its ability to integrate all of the various constituents of the entire manufacturing process into a system, a system that can, because of the first two elements discussed above, be flexibly automated and moment-by-moment optimized as a whole. This powerful ability of the computer to function as a systems tool therefore results, in the case of its application to manufacturing, in what is called the CIM system (1).

The CIM system is a closed-loop feedback system in which the prime inputs are product requirements (needs) and product concepts (creativity) and the prime outputs are finished products (fully assembled, inspected, and ready for use). It is comprised of a combination of software and hardware, the elements of which include product design (for production), production planning (programming), production control (feedback, supervisory, and adaptive optimizing), production equipment (including machine tools), and production processes (removal, forming, and consolidative). It is amenable to being realized by application of systems engineering and has the potential of being fully automated by means of versatile automation and of being made fully self-optimizing (adaptively optimizing); the present major resources for accomplishing this are the computer-related technologies.

The general concept of this system is shown in Figure 1. In this characterization of the system five main elements are shown, represented by the five boxes. There is nothing hard and fast about this particular characterization of the elements of the manufacturing system. The important concept to recognize is that all the types of activities, equipment, and processes represented by the terms in the boxes are, and must be, an integral part of any manufacturing system that is to be automated, optimized, and integrated by applying the computer to these tasks if the full benefits of CIM are to be realized.

The second point to note in Figure 1 is that the CIM system is a closed-loop system. In other words, data and information relative to what is happening downstream in the system must be fed back upstream constantly and in real time in order to continuously condition the operations and activities going on there. Without such feedback, on-line, real-time optimization and integrated, coordinated, flexible automation become impossible. Two of the more critical feedback loops (labeled "Cost and capabilities" and "Performance") are included in the figure to illustrate the general nature of the two types of data and information that must be fed back to provide overall flexible automation and real-time optimization. Obviously, all data and information originating within any of the elements of the system must be able to be fed either forward or back to any of the other elements of the system where it is required.

This generic concept of the CIM system provides guidance for the ongoing development and implementation of full computer-automated and computer-optimized manufacturing, collectively called CIM. It should be recognized, however, that as yet full computer-integrated manufacturing has not been realized in practice anywhere in the world. Although at this stage it has been possible to integrate some parts of the system with each other, the technology is not yet sufficiently advanced to accomplish overall closed-loop integration of the total system from conceptual design of the product to its delivery in finished, ready-to-use form. In particular, the greatest difficulty is being experienced in accomplishing closed-loop integration of the engineering design of the product with the remainder of the system.

Role of AI in CIM

The fact that full CIM has not yet been realized in practice anywhere in the world is in large part due to the fact that the CIM system is not yet an intelligent system. At this stage AI in the form of expert systems (qv) technology is beginning to be developed and experimentally applied to certain elements of the system. As yet, none of these developments and experimental applications appear to have been fully reduced to practice. However, they are exhibiting considerable promise. Some examples of these will provide a flavor of the developing possibilities.

In the field of computer-aided design (CAD) (qv) for production, Gero (2) has developed methodology for modeling both single objects and assemblies of objects by use of knowledge engineering techniques of first-order predicate logic (qv) implemented via PROLOG (see Logic programming). Even though a restricted domain has been used in his initial work, it makes evident the uniformity and power of the approach.

The field of production planning (see Planning) has seen the greatest activity so far, with most of that directed to process-and-operations planning. Darbyshire and Davies (3) have under development a hybrid expert/algorithmic process planning system for turned parts called EXCAP. Initially, it combined an AL/X-like (later replaced with a PROLOG-like) expert system with a recursive planning algorithm. This already has shown considerable promise for realization of truly generative process planning. Miladić and Kalajdžić (4) have developed the underlying theory of the utilization of expert systems technology in process planning, considering it to be the functional basis for long-term logical structuring of overall manufacturing process design. Triouleyre (5) has developed an expert-systems-type approach to process-and-operations planning for forming and welding operations. It takes into account the structure of the data concerning process, product, and processing, thus enabling decisions to be arrived at readily concerning both the choice of process and the detailed operations required. Further, the rules contained in the knowledge base are found to be an efficient aid in the design of the product to determine its ease of production. Zdeblick and Barkocy (6) are evolving an intelligent module for detailed operations planning for machining operations that, as it matures, will find its way onto the production equipment itself through the medium of intelligent control systems. There it can perform such functions as making tool selections, cut selections, and speed and feed decisions in real time just prior to actual machining of a workpiece. Preiss and Kaplansky (7), by encoding knowledge of the milling process into a computer program using principles of AI, have produced a system that automatically writes part programs to mill successfully 2½-dimension parts on a three-axis numerically controlled milling machine. Finally, Iwata and Sugimura (8) have developed, in prototype form, a
Figure 1. The CIM system. [Figure: a closed-loop diagram in which needs (product requirements) and creativity (product concepts) feed product design (CAD), production planning (including programming), production control (including feedback, supervisory, and adaptive optimizing control of cost), production equipment (including machine tools), and production processes (removal, forming, consolidative), yielding finished products (fully assembled, inspected, and ready for use), with performance (CAM) and capabilities fed back.]

COMPUTER-INTEGRATED MANUFACTURING
knowledge-based computer-aided process-planning system that determines, from the CAD model of a part, the sequence of machine tools required to produce it. The knowledge base of the system includes a set of rules describing preference relations among the machining processes.

The field of production control has considerably less activity in the application of AI, with much of it devoted either to scheduling or process control. For example, Bourne and Fox (9) have created an AI system, called ISIS, that has been used successfully to schedule jobs at the shop floor level in a factory. Ouchi, Mibuka, Kouzuki, and Taguchi (10) have developed and implemented an integrated AI system for controlling the sequencing and processing in the robotic assembly of color television sets by 11 robots (see Robotics). CAD data from a higher level computer is automatically transformed to control the robots.

The field of production equipment and production processes has also seen relatively small activity in the application of AI, with much of it being devoted to monitoring of machine performance and diagnosis of machine malfunctions.
For example, Bel, Dubois, Farreny, and Prade (11) have investigated the possibilities for application of AI to satisfying the need for efficient and flexible monitoring systems in fully automated manufacturing systems. They find that AI methodology is very well suited not only to creating effective monitoring systems capable of dealing with the imprecise terms in which triggering situations are expressed but also to detection of unpredicted events, the specification of error recovery strategies, and the planning of job input sequences. Bourne and Fox (9) have presented a rule-based architecture, called PDS and written in the Schema Representation Language, for the on-line, real-time diagnosis of malfunctions in machine operations. Diagnosis is based on information acquired from tens to hundreds of sensors, which is analyzed to gracefully account for sensor degradation over time as well as spurious readings.

The total system of manufacturing is also now being analyzed to define the role that AI can be expected to play. Merchant (12) has analyzed existing Delphi-type technological forecasts on the future of manufacturing to determine their implications for utilization of AI in manufacturing systems. He identified three main thrusts for the future. The first of these is expected to be toward the application of AI to accomplish full utilization of the product definition database generated by CAD as the primary source for automatic generation of all the information required throughout the rest of the system of manufacturing. The second main thrust is expected to be toward the application of AI, in conjunction with pattern recognition (qv) techniques, to accomplish full automation of all production activities carried on throughout the system of manufacturing. The third main thrust is expected to be toward application of AI to accomplish overall on-line adaptive optimization of advanced manufacturing systems and their components.
Hatvany (13) has been conducting research on appropriate approaches to the architecture of overall manufacturing systems that are conducive to maximum effectiveness. As a result, he has concluded that during the past 30 years the thinking about complex computer-controlled systems has been conditioned by the concepts of hierarchical structures. However, recent advances in distributed computing power and open system architectures (particularly local-area networks) have opened the way for heterarchic structures. He finds
therefore that, based on the incomplete and nonalgorithmic architecture specification that ensues from this approach, these systems will have to exercise a high degree of local intelligence to cope with unforeseen situations.

However, the greatest promise and potential impact of AI for the overall CIM system relates to the fact that the system of manufacturing (despite the best strivings of the engineering profession to arrive at fully deterministic methodologies) can never be a totally deterministic system. The system must always have interfaces with nondeterministic elements of the real world. These include human beings, who are often far from logical or free of error in their performance, and the economic, social, and political systems of the world, with all their vagaries. Further, as pointed out by Hatvany (14), the system of manufacturing, even within a given manufacturing company, involves such an overwhelming welter of variables, parameters, interactions, activities, flows of material and information, and so on that either a detailed, explicit algorithm available for each solution procedure or all the facts, mathematical relations, and models available in perfect arrangement and complete form for a deterministic (and unique) answer can never be found. What are required then, as he indicates, for realization of the full potential of CIM are intelligent manufacturing systems capable of solving, within certain limits, unprecedented, unforeseen problems on the basis even of incomplete and imprecise information. The technology of AI must advance considerably in capability to carry out the kinds of inference and even intuition that persons now use to overcome the problems arising from the nondeterministic nature of the overall manufacturing system before that potential can be significantly realized.
As AI technology advances, however, integration of that advancing capability into the CIM system can assure realization of the dramatic improvement of manufacturing productivity and quality that CIM technology can provide.
BIBLIOGRAPHY

1. M. E. Merchant, "The future of batch manufacture," Philos. Trans. Roy. Soc. Lond. A275, 357-372 (1973).
2. J. S. Gero, "Object modelling through knowledge engineering," Proc. CIRP Sem. Manufact. Syst. 14, 54-62 (1985).
3. I. Darbyshire and B. J. Davies, "EXCAP, an expert systems' approach to recursive process planning," Proc. CIRP Sem. Manufact. Syst. 14, (1985).
4. V. R. Miladić and M. Kalajdžić, "Logical structure of manufacturing process design: Fundamentals of an expert system for manufacturing process planning," Proc. CIRP Sem. Manufact. Syst. 14, (1985).
5. J. Triouleyre, "Elaboration of expert system knowledge based structure," Proc. CIRP Sem. Manufact. Syst. 14, (1985).
6. W. J. Zdeblick and B. E. Barkocy, "Manufacturing planning evolution with artificial intelligence with applications toward machining operations," Proceedings of the PROLAMAT 6th International Conference, Association Française pour la Cybernétique Économique et Technique, Paris, pp. 99-108, 1985.
7. K. Preiss and E. Kaplansky, "Automated part programming for CNC milling by artificial intelligence techniques," J. Manufact. Syst. 4, 51-63 (1985).
8. K. Iwata and N. Sugimura, "A knowledge based computer aided process planning system for machine parts," Proc. CIRP Sem. Manufact. Syst. 14, (1985).
9. D. A. Bourne and M. S. Fox, "Autonomous manufacturing: Automating the job shop," Comput. Mag. 17(9), 76-88 (1984).
10. T. Ouchi, M. Mibuka, K. Kouzuki, and K. Taguchi, "The intelligent production control system for color TV assembly process," Proc. CIRP Sem. Manufact. Syst. 14, (1985).
11. G. Bel, D. Dubois, H. Farreny, and H. Prade, "Towards the use of fuzzy rule-based systems in the monitoring of manufacturing processes," Proceedings of the PROLAMAT 6th International Conference, AFCET, Paris, pp. 109-119, 1985.
12. M. E. Merchant, "Analysis of existing technological forecasts pertinent to the utilization of artificial intelligence and pattern recognition techniques in manufacturing engineering," Proc. CIRP Sem. Manufact. Syst. 14, 11-16 (1985).
13. J. Hatvany, "Intelligence and cooperation in heterarchic manufacturing systems," Proc. CIRP Sem. Manufact. Syst. 14, 5-10 (1985).
14. J. Hatvany, "The efficient use of deficient information," Ann. CIRP 32, 423-425 (1983).

M. E. Merchant
Metcut Research Associates, Inc.
COMPUTER SYSTEMS

Computer systems is an area of computer science that addresses the integrated functioning of computer components as a single entity. These components include hardware, such as processors, memories, peripherals, and communication networks, and software, including operating systems, compilers, communication protocols, and application programs. This entry discusses computer systems designed specifically for AI applications.

Artificial intelligence programs contain knowledge, consisting of objects of some problem domain, their properties, and relations between them. Further, programs contain operations on the knowledge, for example, pattern matching, resolution in logic systems, and inheritance in semantic nets.

Most computers manufactured today are based on a general-purpose von Neumann architecture. The architecture is general purpose in the sense that it may be programmed to solve a variety of application problems, ranging from scientific to business to AI. The basic von Neumann architecture consists of two major parts, a memory and a processor. The memory contains a program and data operated on by the program. The processor constantly fetches and executes instructions from memory. Instructions generally specify an operation and one or more operands, or data. For example, if the operation is addition, the three operands needed are the memory locations of the two addends and the place to store their sum.

The general-purpose von Neumann computer can execute AI programs by mapping the knowledge in the AI program to its linear memory and simulating AI operations by arithmetic and logic operations. However, this is often costly. The price is complex systems software (compilers, interpreters, the operating system) needed to do the mapping and an execution speed penalty because the operations are simulated. There are several reasons why the von Neumann architecture is poorly suited to AI applications.
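The fetch-and-execute cycle just described can be sketched as a toy interpreter; the instruction encoding below is invented for illustration and corresponds to no real machine:

```python
# Toy von Neumann machine: a single linear memory holds both the
# program and its data; the processor repeatedly fetches and executes
# instructions. The encoding (opcode followed by operand addresses)
# is invented for illustration.

def run(memory):
    pc = 0  # program counter
    while memory[pc] != "HALT":
        if memory[pc] == "ADD":
            # Three-operand add, as in the text: the locations of the
            # two addends and the place to store their sum.
            a, b, dest = memory[pc + 1], memory[pc + 2], memory[pc + 3]
            memory[dest] = memory[a] + memory[b]
            pc += 4
    return memory

# Program occupies addresses 0-4; data occupies addresses 5-7.
mem = ["ADD", 5, 6, 7, "HALT", 2, 3, 0]
print(run(mem)[7])  # -> 5
```

Note that the program and its data share one memory, which is exactly the property Hillis's "von Neumann bottleneck" argument, discussed below, turns on.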
First, the fastest von Neumann computers are optimized for fast arithmetic on single floating-point numbers or vectors containing lists of floating-point numbers. However, this rarely is important to an AI program, which may spend most of its time manipulating complex data structures, such as lists, graphs, and sets. Further, the content of these data structures may be symbolic, not numeric, requiring rapid comparison and pattern-matching operations.

Another argument against the von Neumann architecture is the need for parallelism in AI. Parallelism is the simultaneous execution of two or more hardware operations, which is faster than performing the operations one after another. Many larger von Neumann computers use a form of parallelism in the processor called "pipelining," in which each instruction is decomposed into several smaller steps. Execution of a sequence of similar instructions may then be overlapped, much as the steps in a factory assembly line are overlapped. This form of parallelism is used in the Symbolics 3600, a LISP-based computer discussed below.

However, it is generally believed that future technological advances in single-processor von Neumann computers are unlikely to produce a computer fast enough to meet the demands of AI applications. For example, Hillis (1) observes that the memory/processor division of the von Neumann architecture was appropriate for computers manufactured using expensive vacuum tubes for the processor and slower, cheaper delay lines or storage tubes for the memory. However, today silicon is used to fabricate both memory and processor. Further, the processor occupies 2-3% of the silicon area, and the memory occupies most of the remainder. Because only one memory location is active at a time, the bulk of the silicon area is idle most of the time. Finally, Hillis argues that future technological advances that increase the density of circuitry in silicon will only increase the mismatch in processor and memory power, making the computer less efficient. This mismatch is labeled the "von Neumann bottleneck." This suggests that parallelism, using multiple processors with smaller memories, can more effectively use the same amount of silicon.
The connection machine, which is also discussed below, adopts this philosophy.

A computer system designed specifically for AI programs is referred to as an AI-based computer system. Its potential advantages over a von Neumann computer are simpler systems software and increased performance through hardware organized specifically for AI programs. Yet it is not a general-purpose computer, limiting its use largely to AI. Until the last decade there were no AI-based systems. Artificial intelligence applications were usually written in LISP and run on a von Neumann computer system. Several events have led to the design, and in a few cases the construction, of AI-based computer systems.

Artificial Intelligence Languages. Languages suited to AI developed, such as LISP and logic languages (e.g., PROLOG) (2). These languages, unlike FORTRAN, COBOL, PL/1, and Pascal, did not presuppose a von Neumann architecture and were not based on assigning values to variables in memory. Implementing these languages raises several questions. First, what computer architecture most naturally executes them? Second, how can parallelism enhance their performance?

Knowledge Representation. Knowledge representation refers to the technique by which information or the relation between objects from an application problem domain is represented so that it can be processed by a computer. Several knowledge representation paradigms in AI systems have been developed: semantic networks, first-order logic, and frames (3). Their implementation raises two questions. First, what memory architectures efficiently store these paradigms? Second,
what operations on the knowledge, such as inheritance in semantic networks and resolution in first-order logic, should the hardware support?

Very Large Scale Integration (VLSI) Technology. The emergence of VLSI technology diminished the cost of fabricating computers because one VLSI chip can accommodate an entire processor. Researchers can now experiment with architectures not based on the von Neumann model.

Given the variety of AI languages and knowledge representation paradigms and the short history of VLSI, no standard computer architectures for AI have emerged. Consequently, the bulk of this entry informs through three examples of AI-based computer systems that emerged from the three events cited above. The examples are

1. a variety of ventures in the Fifth-Generation Project (qv), based on logic languages and knowledge representation methods;
2. the connection machine, which can be configured to reflect the knowledge representation; and
3. the Symbolics 3600 LISP machine, an outgrowth of early attempts to apply VLSI to LISP.

AI Versus Conventional Programs

The key issues that arise in the design of an AI-based computer system are summarized in Table 1. This section distinguishes between conventional (e.g., business and scientific) programs and AI programs to motivate these issues. They are also used to unify the discussion of the example architectures.

A conventional program has three components: data, control, and a user interface. In contrast, many AI programs, such as expert systems, consist of three parts: data, knowledge base, and control strategy (4). An AI program also has a fourth component, the user interface. The data represent current information during program execution as well as the declarative knowledge of the problem domain, often as semantic networks, frames, or first-order logic. The knowledge base is a set of "pattern-invoked programs" or operators used to reason with the declarative knowledge.
The control strategy decides which knowledge base operator to apply when more than one is simultaneously applicable.
Table 1. Summary of Issues in AI-Based Computer Systems

Data component:
  Hardware support for knowledge representation (e.g., semantic nets, frames, and first-order logic)
Knowledge base component:
  Hardware support for operations (e.g., pattern matching, unification, resolution, property inheritance)
Control strategy component:
  Hardware support for parallelism
Human interface component:
  Meeting real-time performance needs
All components:
  Hardware support for storage management
  Hardware support for dynamic data typing
  Hardware support for generic operations
  Memory management
  Processor scheduling
  Instruction set level
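The data / knowledge base / control strategy split described above can be sketched in a few lines; the rules and the conflict-resolution policy here are invented examples, not drawn from any particular system:

```python
# Minimal sketch of an AI program's three parts: declarative data, a
# knowledge base of pattern-invoked operators, and a control strategy
# that picks among simultaneously applicable operators. The rules and
# the "first applicable" policy are invented for illustration.

data = {"temperature": 80, "pressure": "high"}

# Each operator pairs a pattern (applicability test) with an action
# that updates the declarative knowledge.
knowledge_base = [
    (lambda d: d["temperature"] > 75, lambda d: d.update(fan="on")),
    (lambda d: d["pressure"] == "high", lambda d: d.update(valve="open")),
]

def step(d):
    applicable = [op for op in knowledge_base if op[0](d)]
    if not applicable:
        return False
    # Control strategy: both operators match here; this policy simply
    # fires the first one. A parallel machine could fire them together.
    applicable[0][1](d)
    return True

step(data)
print(data["fan"])  # -> on
```

On a von Neumann machine the applicable operators must be tested and fired one after another, which is exactly the sequential order the text says an AI-based architecture could avoid.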
The data in a single conventional program usually combine many diverse data structures, such as arrays, stacks, and linked lists. However, an AI program represents its data, mostly declarative knowledge, in a single form, usually semantic networks, frames, or first-order logic. The frequent use of these three structures may justify their implementation using special hardware. This raises the issue of providing special hardware for the efficient representation, access, and modification of semantic networks, frames, and first-order logic.

Absent from the control portion of a conventional program are a variety of operations that the knowledge base must perform. For example, it may use pattern matching to determine which pattern-invoked programs or operators to apply. Additionally, each knowledge representation requires a set of operations, such as unification, resolution, and property inheritance. The issue raised by the knowledge base is which operations, to complement the arithmetic and logic operations of conventional computers, should an AI-based computer system provide in hardware?

The control strategy component in an AI system may use one of a variety of techniques, for example, state space search, propagation of constraints, or problem reduction (4). A von Neumann computer executing the control strategy of an AI program must choose a strictly sequential order in which to apply the knowledge base operators. However, an AI-based computer system can improve performance by applying multiple operators in parallel. Because the control strategies are well defined and the benefit of parallelism is great, it is reasonable to design the architecture to support the control strategy parallelism. This raises the issue of organizing a computer system to allow parallel execution of operations. Several resource management problems arise.
One is scheduling a large set of applicable operators on a smaller set of processors. A second is minimizing contention of simultaneous operators for the declarative knowledge through multiported memories and replicating the knowledge in multiple memories.

The human interface of a conventional program and an AI program may be equally complex; both may require human-oriented input and output through speech, natural language, and pictures. These require additional hardware components, for example, voice digitizing for speech synthesis and high-resolution graphics. The major issue raised is the design of the user interface to provide real-time interaction in the form of dialogues.

In addition to these issues, there are additional ones common to all four components: storage management, dynamic data typing, memory management, processor scheduling, and the instruction set level.

Storage Management. LISP and PROLOG manage storage automatically. In contrast, the programmer must manage storage in conventional languages such as PL/1 and Pascal. The issue raised is how the hardware can support automatic storage management, such as garbage collection. One solution is to use a separate processor to collect garbage in parallel with a second processor running a LISP program, as analyzed in Ref. 5.

Dynamic Data Typing. LISP and logic languages automatically keep track of data object types and simplify programming by providing generic operations, which work on any data type. In contrast, most commercial von Neumann computers
provide typed operators in their instruction repertoires. For example, there may be different operators to add floating-point numbers and to add fixed-point numbers. To execute an AI language on a commercial von Neumann computer requires simulation of generic operations by a sequence of instructions in the object code or interpreter. These select the correct instruction to use based on the current type of the operands. They must also update the type of variables when assigned new values. The issue raised is how the memory and instruction sets can be designed to support dynamic typing and generic operations. The Symbolics 3600, described below, provides generic operations in its instruction set.

Memory Management/Processor Scheduling. LISP AI programs generally have large working sets because their memory-referencing pattern is less predictable than conventional programs. For example, traversing a list scattered throughout memory requires accessing a few words from many pages. Much of the early implementation of AI programs was on PDP-10 computers, which have limited memory. Unpredictable referencing patterns reduced the effectiveness of virtual-memory management, so that these systems depended on swapping to allow multiprogramming. Since swapping large working sets is time consuming, their schedulers gave each process large time quantums to reduce memory management overhead. This philosophy contrasts with machines executing business or scientific applications. These programs tend to have working sets small enough to be kept in memory while a process waits for its turn to use the processor.

Instruction Set Level.
Either the instructions are low level, such as in a reduced-instruction-set computer (RISC), requiring simple hardware and extensive emulation in software to implement flexibly the operations of the programming language, or the instructions are high level, simplifying the software, complicating the hardware, and making the implementation rigid but permitting optimized execution.

Because system software can implement AI programs on a von Neumann architecture, special AI computer systems are not mandatory. However, an AI-based architecture permits optimization, for example, through parallelism, unattainable through software emulation. Given the tremendous processing requirements of the most complex AI applications, some implementation in hardware is needed. To support this, Shapiro (6) cites some performance statistics of existing implementations of PROLOG: 120 logical inferences per second (LIPS) for a Z80 microprocessor running Microprolog; 1000-3000 LIPS for a C interpreter on a VAX computer; 25,000 LIPS on a large IBM machine; and 30,000 LIPS, the fastest available today, for compiled code on a DEC System 2060. One LIPS requires about 100-1000 instructions per second on conventional architectures. This contrasts with the goals of the Japanese Fifth-Generation Project (described in the next section) of 10^8-10^9 LIPS.

Examples of AI-Based Computer Systems

As noted earlier, three events motivated the development of AI-based computer systems: AI languages, knowledge representation, and VLSI. This section discusses how three examples emerging from these events address the issues of Table 1. The discussion starts with a commercial product and ends with research machines: the Symbolics 3600 LISP machine, the connection machine, and the Fifth-Generation Project. Several other AI-based computer systems exist commercially or in research labs (see Table 2).

Symbolics 3600. The Symbolics 3600 is the most recent LISP machine produced by Symbolics, Inc. (7). The 3600 is an outgrowth of the MIT Laboratory LISP Machine Project started in 1974. Two machines, called CONS, in 1976, and CADR, in 1978, were developed at MIT. Symbolics, Inc. refined the CADR into a commercial product and introduced it in 1981 as the LM-2. Its successor is the 3600. In contrast to the next two examples, the 3600 uses a von Neumann architecture, with extensive hardware support for
Table 2. AI-Based Computer Systems (Name: Source)

LISP
  Symbolics 3600 (7): Symbolics, Inc., Cambridge, MA
  LAMBDA family: LISP Machines, Inc., Los Angeles, CA
  Scheme-79 (8): Massachusetts Institute of Technology (MIT)
  ALPHA (9): Fujitsu Laboratories, Ltd., Kawasaki, Japan
  EM-3 (10): Electrotechnical Laboratory, Ibaraki, Japan
Logic languages
  PROLOG processor (11-13): SRI International, Menlo Park, CA
  Personal Sequential Inference Machine (14,15): Institute for New Generation Computing Technology (ICOT), Tokyo, Japan
  Parallel Inference Machine (15): University of Tokyo
  PRISM on ZMob (16,17): University of Maryland
Production systems
  DADO (18): Columbia University
  Production system machine (19): Carnegie Mellon University (CMU)
Application structure
  Connection machine (1,20): Thinking Machines Corp., Cambridge, MA
Logic knowledge base
  DELTA (21): ICOT, Tokyo, Japan
  GRACE (22): University of Tokyo
LISP. All software for the 3600 is written in a dialect of LISP called Zetalisp. Even though the LISP code is compiled into machine code, normally no assembly language is available. This discussion of the 36-bit processor is divided into two parts: LISP features and performance features. The first describes four architecture aspects that reflect the LISP language: tagged words for run time type checking, compact list storage, generic operations, and the instruction set. The second part describes architecture aspects designed to overcome the execution time inefficiencies associated with a weakly typed language: parallelism, buffered stacks, and the stacklike architecture.

Because the data types of LISP objects cannot usually be determined at compile time, the type of an object is traditionally stored in a descriptor associated with the object and updated during execution. To save storage space, increase execution speed through reduced memory fetches, and simplify the compiled code, each word processed is associated with a tag field. The tag identifies the word as one of 34 types, such as symbols, "cons" cells, or arrays. The four memory word formats are shown in Figure 1. The first two bits of the data type (or tag) field identify the word as containing a 32-bit immediate fixed-point or floating-point number or indicate that the next four bits are further data type bits followed by an object address.

The two high-order CDR code bits are used to compactly store lists. They encode the values "normal," "next," and "nil." If the CDR code is "next," the CDR of the list is the next word in memory. This saves space for representing its address. The final non-nil element in a list has its CDR code set to "nil" to indicate that its CDR field is nil without using an extra word. Figures 2 and 3 compare this representation to the normal one. The processor hardware is designed to operate efficiently on these lists.

Figure 2. Normal list representation using pointers.

The tags also simplify another aspect of the architecture: generic operations. A generic operation works on an operand of any data type. Conventional von Neumann computers, in contrast, require a set of instructions each performing the same operation but for a variety of data types. For example, the 3600 has a single generic-add instruction that works on a variety of operands. Its execution involves simultaneously performing an integer add and checking the data type by hardware. If the operands are not integer, a trap to microcode occurs to perform an addition on less frequently used data types. The hardware also checks for overflow, which generates another trap, and tags the results with the proper data type. The advantages are compact code, because only one instruction is required, and a performance improvement from simultaneous operations.

The instruction set is another aspect that intimately reflects LISP. For example, three classes of instructions are predicates, containing eq, not, fixp, floatp, symbolp, and arrayp; list and symbol instructions, containing car, cdr, and rplaca; and array instructions, containing array-leader and store-array-leader.

Three major performance bottlenecks the 3600 addresses are due to garbage collection, run time typing, and the von Neumann bottleneck of limited memory bandwidth described above. The first two are addressed through hardware parallelism. The 3600 performs the following operations in parallel: fetching instructions, decoding instructions, executing instructions, checking the data type of operands, supporting garbage collection, and tagging the data type of the result.

The limited memory bandwidth is addressed by two means:

Reducing Memory Fetches. This is done in three ways. First, each memory word contains two 17-bit instructions. Second, a tag rather than a separate memory word stores specific data types. Third, the compact list representation previously described is used.

Storing Frequently Used Memory Words in a High-Speed Stack. Conventional computers often use a small, expensive, high-speed memory, called a cache, to store frequently referenced instructions or data. However, LISP programs often have an irregular reference pattern, making a cache less effective. The 3600 uses the recursive nature of LISP as the basis for predicting memory referencing patterns. It uses two 1024-word hardware stack buffers as a high-speed cache containing the top of the LISP control stack.

Table 3 summarizes aspects of the 3600 architecture that address the issues described in Table 1.

Connection Machine. The other two computer system examples (the connection machine and the Fifth-Generation Project) reflect the programming language used: LISP or logic languages. The connection machine, however, reflects the knowledge representation used. It provides tens of thousands to millions of processors with a programmable interconnection topology that may be configured to match the application program structure.

The architecture is suited to applications that display a natural form of parallelism. Such applications consist primarily of a large number of elements that function similarly and communicate with one another through some special interconnection topology. For example, in a machine vision application the elements are pixels, and the interconnection topology is a
Figure 1. Two formats of Symbolics 3600 memory words. [Figure: each 36-bit word carries a two-bit CDR code and a data-type tag, followed by either 32 bits of immediate data or a 28-bit pointer.]

Figure 3. Compact list representation of the Symbolics 3600, which uses sequential memory locations and two special bit patterns (CDR next and CDR nil) in the high-order two bits. [Figure: the list (A B C) stored in three consecutive words tagged CDR next, CDR next, and CDR nil.]
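The saving from CDR coding can be illustrated with a small simulation; the word layout below is simplified (the real 3600 packs the code into the two high-order tag bits of each 36-bit word):

```python
# Simulation of CDR-coded compact lists: the elements of a list occupy
# consecutive memory words tagged "next", with the final element tagged
# "nil", so no separate pointer words are needed. Simplified sketch;
# the real 3600 packs these codes into two bits of each memory word.

def store_compact(items):
    """One (cdr_code, element) word per element, in consecutive cells."""
    return [("nil" if i == len(items) - 1 else "next", item)
            for i, item in enumerate(items)]

def words_normal(items):
    """Conventional representation: a car word plus a cdr pointer word
    for every cons cell."""
    return 2 * len(items)

lst = ["A", "B", "C"]  # the list shown in Figures 2 and 3
print(len(store_compact(lst)), words_normal(lst))  # -> 3 6
```

For the three-element list of the figures, the compact form needs three words where the pointer form needs six, which is the space saving (and halved fetch count) the text attributes to this representation.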
Table 3. How the Symbolics 3600 Addresses the Issues

Data component: Hardware supports storage of LISP lists by reserving two high-order bits (see Fig. 1) for use in a compact list representation (see Figs. 2 and 3). In the representation, lists are stored in sequential memory locations to avoid the need for pointers.

Knowledge base component: No explicit support.

Control strategy component: No explicit support.

Human interface component: High-resolution terminal and a mouse-pointing device are provided.

All components:
  Storage management: Hardware provides support for garbage collection in parallel with other operations.
  Dynamic data typing: Two or six bits in memory words are reserved for a tag to denote 1 of 34 types so that the hardware can distinguish, e.g., strings from complex numbers (see Fig. 1).
  Generic operations: All instructions are generic, working on all appropriate data types. At run time the type of an instruction operand is determined. Parallelism in data type checking, instruction execution, and result tagging reduces the performance penalty.
  Memory management: Three caches are used to avoid a memory bottleneck. One is for instructions and the others are for control stacks. The virtual memory mechanism keeps the local environment of active processes in the stack caches automatically.
  Instruction set level: The instructions reflect LISP operations. For example, there are instructions for the predicates eq and not as well as the list operations car and cdr.
grid. In a VLSI simulation the elements are transistors, and the interconnection topology represents the wires connecting the transistors. The connection machine represents each application problem cell by a single-bit processor and logically connects the processor to other processors in a manner matching the application topology.

The connection machine was proposed in 1981 at the MIT AI Laboratory (20). The connection machine described in Ref. 1 and discussed here is now being manufactured by Thinking Machines, Inc. of Cambridge, Massachusetts. The prototype consists of 65,536 physical processors, each with 4096 bits of memory. These are physically connected in a "Boolean 16-cube," which is based on the fact that 16 bits are needed to address 65,536 processors. This interconnection has the property that a message sent from one processor reaches any other processor within 16 steps. Each processor is connected to the 16 other processors whose addresses in binary differ in 1 of the 16 bits.

However, neither the number of physical processors nor their physical interconnection limits the application program size the machine can handle. A program may reconfigure the machine by specifying how many virtual processors are to be mapped to each physical processor. Furthermore, although physical processors are physically connected to only 16 other processors, every processor can communicate with any other processor through routers. Each router is hardware responsible for receiving messages from a processor or another router and forwarding the message to the destination processor or to a router closer to the destination processor. Thus, a program may establish a logical interconnection among processors which matches the way the elements of the application program are connected.

The overall machine organization is shown in Figure 4. The two major parts are the connection machine computer and the front end. The front end, which is connected to a disk and terminal, is either a Symbolics 3600 or a Digital VAX computer. The front end provides an operating system and user interface. The application program is also stored in the front end. Programs are written in extensions of LISP or C, called CM LISP and C*, respectively.

Figure 4. Connection machine organization.

Table 4. How the Connection Machine Addresses the Issues

Data component: Connection machine processors are logically connected to match the natural structure of the application. The hardware offers support for a variety of data structures, including three representations of sets as well as trees, strings, arrays, matrices, and graphs.

Knowledge base component: No explicit support.

Control strategy component: Each operation normally applies to all data in the connection machine in parallel. However, each processor will conditionally execute an instruction depending on the internal state of one of its flags.

Human interface component: Not applicable, since the user accesses the connection machine through a front end, which is a solitary Symbolics 3600 or VAX computer.

All components

Storage management: Several storage allocation mechanisms are provided to allocate an idle processor. Among these is Free List allocation, which maintains a list of free processors. A much slower method is Waves, in which a processor broadcasts a request to have any unused processor send its address back. In this way a processor can find an idle processor that is physically close to it, thereby shortening the communication time.

Generic operations: All instructions operate on the entire network and thus apply to whatever data is in all processors.
Memory management: Almost no management is done. Each physical processor has 4096 bits of memory, part of which is a stack area.

Processor scheduling: Several virtual processors are simulated by each physical processor.

Instruction set level: The prototype provides flexibility in definition of the instruction set because instructions from the front end are expanded into nano instructions. Processors receiving the nano instructions implement all possible 256 Boolean operations on the three bits of data they operate on.
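The Boolean 16-cube routing property described above — any processor reachable in at most 16 steps, one per differing address bit — is easy to verify with a short sketch. The `route` function here is an illustrative greedy router, not the machine's actual routing algorithm: at each hop it flips one address bit on which source and destination still disagree.

```python
# Sketch of routing on a Boolean n-cube: neighbors are addresses that
# differ in exactly one bit, so at most `bits` hops are ever needed.
def route(src, dst, bits=16):
    """Return the hop-by-hop path from src to dst on a Boolean cube."""
    path = [src]
    for b in range(bits):
        if (src ^ dst) & (1 << b):   # addresses still differ in bit b
            src ^= 1 << b            # hop to the neighbor that fixes bit b
            path.append(src)
    return path

worst = route(0x0000, 0xFFFF)        # all 16 bits differ
len(worst) - 1                       # → 16 hops, the stated maximum
```

Each intermediate address in the path differs from its predecessor in exactly one bit, which is precisely the physical-neighbor relation of the prototype's 65,536-processor cube.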
Table 5. Fifth-Generation Research Projects

Languages

KL0 (14,15). Sequential machine language. The kernel language is based on PROLOG and contains operating system primitives, such as multiprocess control, interrupt handling, and input-output control.

KL1 (14,15). Parallel machine language. A parallel version of KL0.

PARALOG (14). Parallel logic language. Based on PROLOG.

Machines

Personal sequential inference machine (PSI) (14,15). Inference machine. This initial machine will be used by researchers while they develop future machines. It is a single-user machine supporting unification, resolution, and performance measurement in hardware, with a performance goal of 20,000-30,000 LIPS.

Parallel inference machine (PIM) (15). Inference machine. Hardware supports OR, AND, and unification parallelism. In OR parallelism several machines execute multiple statements with the same goal. AND parallelism tries to achieve multiple subgoals of a statement in parallel, presenting difficulties because a consistent choice of arguments must be made to the dependent subgoals. Unification is the process of obtaining consistent instantiations of variables in multiple subgoals. Parallelism in unification involves generating several different instantiations at the same time.

DELTA (21). Knowledge base. This connects to sequential inference machines via a local area network or shared memory. Its three main subsystems are two-level memory, with moving and solid-state disks; relational database; and control processor, controlling concurrent transactions and interfacing the database to inference machines.

GRACE (22). Knowledge base. This is a relational algebra machine. A major research problem is using hardware to perform joins efficiently.

A program executes in the following manner. First the connection machine is configured, that is, the program specifies the number of virtual processors it needs. Next it specifies the initial state of each virtual processor, consisting of two things: pointers to the processors it is connected to and whatever data the processor needs. Next the front end executes each instruction in the program. Instructions are either serial or parallel; serial instructions are performed by the front end, and parallel instructions are passed to the connection machine microcontroller. The microcontroller expands each instruction into a sequence of "nano instructions," which it broadcasts to all processors in parallel. Parallel instructions tell each processor either to compute locally or to pass information to another processor. Each processor functions by reading two single-bit operands from its 4096-bit memory and one bit of its internal flag register. Its arithmetic logic unit generates two bits; one overwrites an operand, and the other overwrites one bit of its flag.

The connection machine hardware is visible to the CM LISP programmer in two ways. First, data to be operated on in parallel in the connection machine are stored in a new data
structure, called the xector. Second, two new program annotations, denoted α and β, are used to convert LISP functions to parallel operations in CM LISP.

Overall, the connection machine has a raw computing power of a billion (10^9) instructions per second and a message routing speed of 3 billion bits per second. Table 4 summarizes how the connection machine addresses the issues of Table 1.

Fifth-Generation Project. The Japanese Fifth-Generation computer system project is developing an architecture suited to logic programming. Warren describes the background of the Fifth-Generation Project in Ref. 23 (see also Ref. 24). The Fifth-Generation Project assumes that knowledge-based systems will be the important application area of the 1990s (25), in contrast to the evolution of distributed systems containing heterogeneous processors and cooperating, distributed processes. To build a knowledge-based system, research is being conducted in three areas (14): problem solving and inference machines, knowledge base management, and intelligent user interfaces.
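Unification, which the inference machines of Table 5 are designed to support in hardware, is the operation of finding a consistent binding of variables that makes two terms identical. A minimal sketch follows; it is illustrative only (no occurs check, single-level dereferencing), with strings prefixed by "?" standing in for logic variables and tuples standing in for compound terms.

```python
# Minimal unification sketch: return a substitution making x and y equal,
# or None if they clash. Variables are "?"-prefixed strings.
def unify(x, y, subst=None):
    if subst is None:
        subst = {}
    x, y = subst.get(x, x), subst.get(y, y)   # dereference bound variables
    if x == y:
        return subst
    if isinstance(x, str) and x.startswith("?"):
        return {**subst, x: y}                # bind variable x to y
    if isinstance(y, str) and y.startswith("?"):
        return {**subst, y: x}                # bind variable y to x
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):              # unify argument by argument
            subst = unify(xi, yi, subst)
            if subst is None:
                return None
        return subst
    return None                               # clash: terms cannot be unified

unify(("f", "?X", "b"), ("f", "a", "?Y"))   # → {'?X': 'a', '?Y': 'b'}
```

The argument-by-argument loop is exactly where the PIM's "unification parallelism" aims: independent subterms can, in principle, be unified at the same time, provided the variable bindings produced are kept consistent.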
Table 6. How the Fifth-Generation Project Addresses the Issues

Data component: Designing relational database machines (i.e., DELTA, GRACE) based on logic.

Knowledge base component: Implementing inference machines (i.e., PSI, PIM) based on various logic languages. Hardware supports unification and resolution.

Control strategy component: The parallel inference machine supports AND and OR parallelism in hardware.

Human interface component: A major objective is to allow human interaction in the form of natural language, speech, and pictures.

All components

Instruction set level: Machine languages KL0 and KL1 are based on logic.

Other issues: Insufficient details are available.
The research has several goals (6). The most ambitious is to build in the 1990s an inference machine with up to 1000 processing elements to function at a rate of 10^8-10^9 LIPS. The knowledge-based machine is planned to have a storage capacity of 10^11-10^12 bytes. The intelligent-user interface goal is a 10,000-word vocabulary, 2000 grammar rules, 99% accuracy in syntactic recognition of natural languages, and a speech recognition system with 50,000 Japanese words and 95% recognition. Finally, the programming language, data description, and query language will use predicate logic.

Table 5 contains a summary of several language and architecture research activities that comprise the Fifth-Generation Project. Table 6 summarizes how these activities address the issues of Table 1.

Summary

The design of computer systems specifically for AI applications is in its infancy. Consequently, almost no theory and just a few systems exist. Their design is motivated by computational needs that are so great they can only be filled by exploiting parallelism at the hardware level (6,20). From the examples discussed here, several trends become evident.

Evolutionary versus Revolutionary Architectures. Architectures such as the Symbolics 3600 implement AI languages on a von Neumann model specifically designed for the language. The alternative is to devise a more revolutionary architecture. The fifth-generation machines and connection machine exploit parallelism in logic programs and in the application problem, respectively.

Orientation. The three examples discussed here are oriented to either LISP programs, logic programs, or parallelism in the application problem.

New Concepts. The Fifth-Generation Project tries to exploit AND, OR, and unification parallelism in hardware. The Symbolics 3600 employs generic operations, a compact list representation, and parallel garbage collection support. The connection machine allows the hardware to be configured to match the natural structure of the application through massive parallelism and programmable interconnections.

BIBLIOGRAPHY

1. W. D. Hillis, The Connection Machine, MIT Press, Cambridge, MA, 1985.
2. R. Kowalski, Logic for Problem Solving, Elsevier North Holland, New York, 1979.
3. G. McCalla and N. Cercone, "Guest editors' introduction: Approaches to knowledge representation," Computer 16(10), 12-18 (1983).
4. D. S. Nau, "Expert computer systems," Computer 16(2), 63-85 (1983).
5. T. Hickey and J. Cohen, "Performance analysis of on-the-fly garbage collection," CACM 27(11), 1143-1154 (1984).
6. E. Y. Shapiro, "The fifth generation project-a trip report," CACM 26, 637-641 (1983).
7. Symbolics 3600 Technical Summary, Symbolics, Inc., Cambridge, MA, February 1983.
8. G. L. Steele and G. J. Sussman, "Design of a LISP-based microprocessor," CACM 23(11), 628-644 (1980).
9. H. Hayashi, A. Hattori, and H. Akimoto, ALPHA: A High-Performance LISP Machine Equipped with a New Stack Structure and Garbage Collection System, Proceedings of the 10th Annual International Symposium on Comp. Arch., Stockholm, IEEE Computer Society publication, Silver Spring, MD, pp. 342-348, 1983.
10. Y. Yamaguchi, K. Toda, and T. Yuba, A Performance Evaluation of a LISP-Based Data-Driven Machine (EM3), Proceedings of the 10th Annual International Symposium on Comp. Arch., Stockholm, pp. 363-370, 1983.
11. E. Tick, An Overlapped PROLOG Processor, Technical Note 308, SRI Int., Menlo Park, CA, October 1983.
12. E. Tick and D. H. D. Warren, Towards a Pipelined Processor, Proceedings of the International Symposium on Logic Programming, IEEE Computer Society publication, Silver Spring, MD, pp. 29-41, February 1984.
13. D. H. D. Warren, An Abstract PROLOG Instruction Set, Technical Note 309, SRI Int., Menlo Park, CA, October 1983.
14. T. Moto-oka and H. Stone, "Fifth-generation computer systems: A Japanese project," Computer 17, 6-13 (1984).
15. S. Uchida, Inference Machine, Proceedings of the 10th Annual International Symposium on Comp. Arch., Stockholm, pp. 410-416, 1983.
16. S. Kasif, M. Kohli, and J. Minker, PRISM: A Parallel Inference System for Problem Solving, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, 1983, pp. 544-546.
17. C. Rieger, ZMOB: A Mob of 256 Cooperative Z80A-Based Microcomputers, Technical Report TR-825, Department of Computer Science, University of Maryland, College Park, MD, 1979.
18. S. J. Stolfo and D. E. Shaw, DADO: A Tree-Structured Machine Architecture for Production Systems, in Proceedings of the Second National Conference on Artificial Intelligence, Carnegie-Mellon University and University of Pittsburgh, Pittsburgh, PA, 1982, pp. 242-250.
19. T. Lehr, The Implementation of a Production System Machine, Technical Report, Carnegie-Mellon University, Pittsburgh, PA, May 1985.
20. W. D. Hillis, The Connection Machine, Artificial Intelligence Memo No. 646, MIT AI Laboratory, Cambridge, MA, September 1981.
21. K. Murakami et al., A Relational Data Base Machine: First Step to Knowledge Base Machine, Proceedings of the 10th Annual International Symposium on Comp. Arch., Stockholm, pp. 423-425, 1983.
22. T. Moto-oka, Overview to the Fifth Generation Computer System Project, Proceedings of the 10th Annual International Symposium on Comp. Arch., Stockholm, pp. 417-422, 1983.
23. D. H. D. Warren, A View of the Fifth Generation and Its Impact, Technical Note 265, SRI Int., Menlo Park, CA, July 1982.
24. K. Sorenson, "Fifth Generation: Slow to Rise," Infoworld, 35 (June 9, 1986).
25. P. C. Treleaven, The New Generation of Computer Architecture, Proceedings of the 10th Annual International Symposium on Comp. Arch., Stockholm, pp. 402-409, 1983.

A. Agrawala and M. Abrams
University of Maryland
COMPUTERS IN EDUCATION: CONCEPTUAL ISSUES

It is futile to publish a factual overview of computers in education in a volume that will be read for more than a few years. The field is now at a watershed. Looking two years backward or forward gives views as different as those to the east and west of the Rocky Mountains. A factual report based on data from the past would be obsolete before it got into the hands of the readers. On the other hand, an encyclopedia is not the place for speculation and prophecy. So instead of choosing between obsolescence and prophecies, what follows are several concepts that will help students of the situation follow the rapid changes of scene.

Density

Some obvious parameters do not get full recognition in the literature. One of these is the density of computers in learning environments. At the time of this writing (1985), the average density of computers in grade schools in the United States is about 1 for 60 or 70 students (estimates vary). But these machines are not evenly distributed. Roughly 10% of schools have no computers. A handful of large city school systems have more than 1 computer per 30 students, which allows a student to have an average of 1 h of computer time a week. A handful of individual schools have a density sufficient for students to average an hour a day. Only a few experimental schools have more than this: the District Three Computer School in New York, the Hennigan School in Boston, and the WICAT School in Utah are prominent examples.

Density is not merely a quantitative factor. Quadrupling the density does not mean four times as much of the same. What can be done with computers at different densities is qualitatively different. Three 10-min periods a week is a significant dose of drill and practice in number facts. It is not a significant period of time for the computer to be used as an instrument for creative writing. No one can create on such a tight schedule.
Educational Ideology

Differing visions of schools with high densities of computers raise the second conceptual issue: educational ideology. A major line of cleavage separates people who see education as ideally being a highly structured process from those who see it as controlled by the learner. This cleavage existed long before computers, but their presence brings the controversy into much sharper focus.

Instruction versus Development. One side leads to the "back to basics" movement, with its emphasis on giving instruction in number facts, spelling, and the like. As students become more sophisticated, multiplication tables give way to quadratic and differential equations, and spelling is displaced by Shakespeare, but the emphasis on instruction remains unchanged. This philosophy of education places the brunt of responsibility on the teacher: to organize the material and transmit discrete packets of knowledge to the student.

The other side shows up under names such as "child-centered education," "open education," and, in the extreme case, "free schools." The emphasis here sees the responsibility of schooling as the encouragement of the individual's overall development. The acquisition of particular factual knowledge is seen as an easy part of the process for good learners and an impossible one for poor learners. Here, the teacher's task is to help students become better learners rather than teaching them that 5 × 7 is 35, not 57. Becoming a good learner builds on such factors as self-confidence, believing in oneself, and having the opportunity to work on things that one likes. It is undermined by unpleasant learning experiences that lead to "hating math" (or spelling) in particular, to hating school in general, and to becoming disgusted with the learning process itself.

There are also many misleading "pop theories" about learning. For example, children often say that you learn best by making your mind a blank and saying "5 × 7 = 35" over and over again or by putting on the radio very loud, probably to drown out images of things the child would rather be doing.
On the other hand, trying to relate the material you are learning to your interests and to things you already know might be a very good way of learning. There is a whole body of knowledge called mnemonics that suggests this might be true. So one can either concentrate on finding out how to get number facts into the heads of children who may be poor learners, or one can concentrate on what can be done to improve the learning ability of students so one does not have to take heroic measures to get them to learn the number facts.

Structured, Metacognitive, and Developmental Perspectives. The presence of computers has given rise to at least two schools of thought about how schools can help students become better learners. The first is rooted in ideas about cognitive processes that stem directly from AI theory and research. In a nutshell, there is a growing tendency to believe that metacognitive knowledge helps one become a good learner by providing explicit knowledge about the learning process and strategies for learning.

The other approach puts a greater emphasis on a learner's self-directed activities. Here, the goal is to create an appropriate learning environment in which the learner can come to grips with the essential problems and find a personal way of dealing with them, enter into relationships with other people, both teachers and other students, and develop a more personal relationship with the knowledge being learned.

The contrast between these two schools of thought stands
out most clearly when one looks at three approaches to using the computer to improve a student's command of language. In the extreme case of the structured, instructional approach, there are many programs that pose questions about grammar, check the answer, and may also give feedback to the student. The metacognitive approach sees the main problem as the student's ability to structure (a story, for example), so there are programs that provide a framework in which the story can be mapped, drawing upon information about the "grammar" of story plots. The third and most developmental approach offers the student a good word processor. This frees the learner from the arduous (and for very young children, almost impossible) mechanics of writing text by hand. When writing is a laborious and slow process, the first draft is inevitably the final copy, and any corrections are messy ones. Being able to edit and print out clean copy provides the student, perhaps for the first time, with a product that can be looked at with pride, uncontaminated by the overall messiness of inexpert handwriting, scratched-out corrections, and smeared erasures.

Instruction Manuals, Videodisks, and Programs. The structured approach is obviously appropriate in situations that require quick learning of very specific material. Manufacturers of appliances supply books of instruction on how to use their products. These books are expected to enable people to learn specific facts quickly and reliably. Manufacturers of digital watches are not interested in promoting the general ability of their customers to the point where they could figure out for themselves how the watch works. When the "appliance" is complex (a task or a piece of software, for example), videodisks are rapidly replacing printed books as the standard for these instruction manuals. The techniques being used to program these systems are still very much in a state of infancy and flux. A number of control, or author, languages do exist.
However, it is quite clear that the art of producing such languages is not yet stable enough to warrant a detailed description of them.

One class of instructional programs deals with learning how to use computer software: a word processor or accounting system, for example. These tutorial programs typically show something on the screen and invite the learner to manipulate the keyboard in appropriate ways and observe what happens. They set problems: move the cursor down a paragraph, pick up this sentence, move it to the end, and so on. When a program is able to detect failures to carry out its instructions, it may also take appropriate action. In the simplest cases, it can insist that the learner try again; in more complex cases, it can give very elementary advice.

Much of what is included explicitly in the school curriculum can be presented in an instructional form. The curriculum lays down that children should learn the multiplication tables, know how to spell words and to punctuate, learn historical dates, and so on. Instructional programs have been designed for each of these areas, as well as many others. They will be no less effective than books, flash cards, or any other sorts of drill-and-practice technologies.

However, the computer can also be an integral part of the educational process rather than serving as a computerized instruction manual. Of course, the line is blurry. A student using a word processor as part of a creative writing class might still require instruction on how to use it and might use the
same program that was designed for teaching an office worker how to use the latest system.

Computer Culture

But computerized instruction manuals of this sort are a minor instance of how computers are used in the school. The important issue is how the computer fits into the overall structure of education. The idea of creating an open, high-density learning environment in a school raises the third issue, the concept of a computer culture.

In the Computer School in New York, the teacher was explaining to an eighth-grade class the basics of the arrangement of electrons in the structure of atoms. His explanation of how the electrons are distributed among the successive shells was done by saying "Let's write a computer program that tells the electrons how to move." By getting into the process of writing a computer program, these students found the concept of rules for distribution of the electrons very much more concrete. This instance could happen only because the teacher and the students already had experience in writing programs. It was familiar, part of their culture.

The process of teaching often means relating a new idea to experiences in a shared culture. The computer presence means a new and particularly rich source of reference points for a large number of otherwise abstract ideas. Thus, the computer can play a role as an aid to instruction even when it is not physically present. The computer in the head can often be a more effective aid to instruction than the computer on the desk.

Locus of Learning

A fourth conceptual issue is the locus of learning: where does learning take place? Certain learning happens in the home before the child comes to school, such as learning to speak at least the colloquial language. Other learning happens traditionally inside the school: reading, writing, arithmetic, and so on. The presence of computers in the homes is already influencing the distribution of such learning.
Recent articles in The New York Times (1) note that it is possible today for somebody with a home computer to acquire all the credentials, from school diplomas to college degrees, at home. Here the computer serves mainly as a communications link with a centralized source (universities, correspondence schools, data banks, and libraries) as well as among students and teachers via electronic mail.

But when computers are used in the home by small children, a far more fundamental change in the kind of learning can also take place. The computer can be a means of exploring the world around them and of learning by experimentation, just as making mud pies enables children to learn about dirt. An example of this arising from developments in computer voice synthesizers (see Speech synthesis) is the "talking box" or "word factory." It allows children to combine symbols on a screen in order to produce spoken words. In one version they put letters together and see what sound it makes. So they can experiment. If they do not like what they hear, they can change the letters and try different spellings. It may not matter how close it sounds to "real" spoken English. The children do not care. They are used to cartoon characters who speak strangely. This sounds like robot language. That is okay. What they are learning to do experimentally is develop spellings that work: not necessarily the dictionary versions, but ones that follow phonetic rules for transcribing the sounds of English into written words. In some versions of this software, they can ask the computer if this is the dictionary spelling. This is far different from drill and practice that C-A-T spells cat. Here, the computer is channeling the children's interest in alphabetic language into energy for learning. It gives the child the same power that literacy gives adults, the power to make things happen with alphabetic language.

Another program now being used in preschools also builds a bridge between something children love to do, telling stories, and mastery of alphabetic language. This program allows them to create cartoons: designing characters, making paths for them to follow, and putting the characters' words or noises on the screen. So young children can "write" part of a story even if their mastery of words is very slight. They can set up the story by making the Wolf, Little Red Riding Hood, the Grandmother, and only need to have a few actual words, like "What big eyes you have."

Transformation of Curriculum

This versatility of the computer raises another conceptual issue: whether to use it as an aid to status quo learning of what has always been learned or to use it for learning new material that had previously been inaccessible, especially to young children. Two examples illustrating this (the first of which is taken from the arts) touch on another issue as well: whether the computer belongs to mathematics and the sciences, or to everyone.

A remarkable fact about music education in our society is that the instruction is confined to learning how to reproduce music composed by other people (see Music, AI in). This is true even for the most private, tutored musical education. All other domains are quite strikingly different.
In the visual arts, for example, everyone picks up a pencil and draws at some time. In literature, everyone is expected to have some experience at writing. But in music, only a few students compose their own material. One reason for this is easy to find. In order to compose effectively, a higher degree of performance competence is needed than most students ever acquire.

Computers are now being used in a number of experimental projects to create environments in which children move easily into composition. The computer has a number of roles, one as a musical instrument. The computer will play what you describe to it, in symbols or drawing or some way that does not depend on your being able to produce the music with your fingers or lips in real time, so you can try out a musical idea. You can hear it in its purity, untrammeled by your own performance deficiencies. Second, the music can be edited. If you do not like it, you can get into it and change it. The word processor opened doors to editing, debugging, and writing more easily. The music editor can do the same.

The second example is from science. Motion has been a central theme of physics since the period of Galileo and Newton. Newton's laws of motion are crucial to an understanding of physics. Yet in schools, motion (or dynamics) is taught very late and understood very poorly. The teaching of physics begins not with dynamics, but with statics.
Why this curious reversal? Why is the less important field taught first? The reason lies in the kinds of technologies we have had for representing motion. As long as the technology was static (pencil and paper), the representation of dynamics required elaborate formalisms. First you go through algebra, then through calculus and differential equations, until finally someone can say F = ma, and the door is opened to Newton's laws, unless the student has been turned off and dropped out long before.

The computer turns this situation around because it is essentially a dynamic entity. One can begin to study motion at the earliest age by writing simple programs to control moving objects on the screen. The idea of a law of motion becomes very concrete, tangible, and accessible. Thus, one can look for a turnabout, and signs of such turnabout in what is being taught are already evident.

A Different Kind of Educational Technology

A related issue is the similarity or dissimilarity between computers and other technologies. Is the computer just another piece of educational technology? Or is it something that is transforming the world as we know it, including both schools and education?

Inevitably, the computer is compared with educational television and other technologies: with Sesame Street in the case of early childhood education; in the case of schools, with audiovisual aids, language labs, and the like. Many of these technologies are generally considered to have been only partially successful at best and to have only a limited role and value in the classroom. At worst, they have ended up collecting dust in some distant closet.

What specific capabilities does the computer have? Is it unique? Perhaps most obvious is that the computer can be used in more widely varied ways than these other technologies. But what is really fundamental is that only the computer is at least potentially under the control of the learner. Watching television is like listening to a fancy lecture.
The accompanying images are often beautiful, impressive, and more informative than one usually finds in a classroom. The lecturer on television was no doubt selected for having special knowledge and special talents, and could put more effort into preparation than the teacher standing in front of a classroom. So maybe he does a better job than the teacher standing in front of the classroom, but he is doing a job of the same kind. In certain ways it is a better job, but in others it is not. The television lecture is less matched to the particular knowledge of those particular students, and there is no opportunity at all for interaction. At best, it is a slicked-up version of the same kind of thing. The speaker can be chosen more carefully, there is more time for preparation, errors can be edited out, and so on. This does not make it a different kind of experience and probably cannot make up for the possibilities of personalization and interaction.

The computer can create a different relationship to knowledge. Children who are discovering mathematical ideas by experimenting with the computer are in a different kind of relationship to mathematics than someone listening to an explanation from even the most skilled and interactive teacher. Compared with traditional teaching, the computer goes in the opposite direction from the change these other technologies can bring. Where television has some virtues, its ultimate weakness is that it pushes in the direction of passive learning. Whereas the computer, whatever its weakness, pushes in the direction of active learning. The two are not comparable technologies; they are opposites.
Alienated Versus Syntonic Learning

Conflicting claims about the role of computers in the classroom raise another issue. Do computers focus on factual and cognitive knowledge alone, or do they also touch on feelings and relationships? Mathematics can be used as an example that will clarify this issue.

Most people come out of school with extremely negative attitudes to mathematics. Many find their mathematical learning experience unpleasant. Quite a number develop what has come to be known as mathophobia. The reasons for this are complex and not completely understood, but certainly one of them is the fact that as early as elementary school, children feel that mathematics is something imposed on them from the outside, something that serves no clear purpose. Math is felt as something alienated, something one does under pressure, something one is forced to do - not something one does with a sense of pleasure and delight, or that has relevance to one's own interests, and certainly not something one spontaneously chooses to do.

Using the computer for drill and practice will only further alienate the children's experience of mathematics. A different approach seeks to solve this problem by changing the child's relationship to mathematical knowledge.

The computer language Logo and turtle graphics allow children to create designs or animations on the screen. In order to do so, they must acquire certain mathematical ideas. One of these is numbers: they need to develop an intuitive sense of the relative sizes of various numbers. At first, children are not very good at guessing whether a particular line is 10 or 15 or 100 units long. But with experience, they become quite expert. A more technical example is understanding the notion of angle. To make the turtle face another direction, you have to say how many degrees it should turn: for example, with the command RIGHT 90 or LEFT 25. As traditionally taught, the idea of measuring angles in degrees is still very abstract for children as old as 11 and 12. By contrast, children who work with Logo and its turtle are engaged in a different relationship with mathematics. They enjoy what they are doing, and it is relevant to their immediate goal: for example, to program a video game or a cartoon or a drawing on the screen. With Logo, children as young as 7 and 8 (and maybe even preschool ages) become completely fluent in these ideas.

This different relationship to mathematics opens the possibility of a different kind of learning: syntonic learning - as contrasted with alienated learning. The word "syntonic" derives from psychoanalysis, where it refers to the feeling that certain activities are in harmony with one's goals and values and innermost self. It has always been clear to psychologists and some educators that syntonic learning is much more effective than alienated learning. Some people also find that educational ideology can create marked differences in which type of learning takes place. In any case, syntonicity is a major issue for understanding the goals of many of the uses of computers in education.

Social Interaction

The final conceptual issue is social interaction. Critics of the computer jump quickly to the conclusion that computers in the educational environment will lead to isolation of the individual. Parents easily fear that their children will "spend all day sitting in front of the computer." What is the reality behind these fears? The answer is not a simple one.

A few studies on social interactions in schools where computers are present indicate that, at least statistically, the computer presence increases the amount of interaction among the children. It allows for projects that encourage cooperation; it allows children to make something they are excited about and talk about with others; it allows a new medium of communication through computer mail or leaving messages on one another's files. However, there are individual cases of people of all ages who get lost in the computer, who do withdraw from interaction with other people and focus their energies on themselves and their computer. Which will prevail? The answer leads to a final and most important concept about the computer.

The "Effect" of the Computer. The computer is not an agent. The computer does not "cause" more social behavior or less. It does not "cause" better or worse mathematical learning. The computer is a material that enters into the learning environment - and more generally into the whole social environment - and can do so in many different ways. Noted above are ways in which the computer can be used in support of the most diametrically opposed theoretical approaches to education. It can be used to make structured education more structured. It can be used to make open education more open. In all three cases it is not the computer that has the particular effect but the ways it is used. Nowhere is this more true than in the role computers can play in human interactions. Following are some examples of measures that have been used to increase the computer's influence as a socializing, rather than an isolating, instrument.

Learning Environments that Strengthen Interaction. In the experimental school at the Learning Research Development Center in Pittsburgh, Logo teacher Leslie Thyberg runs a very open class for children in the lower grades, K through 3. For a variety of reasons, she introduced a simple rule: "Ask three before you ask me." This usually meant asking three other children - which had many results, all good. Asking each other questions created a particularly rich environment for communication and for cross-fertilization of ideas. This is the kind of climate in which a computer culture can (and does) thrive. Answering each other's questions also allowed them to function as experts and as teachers. Leslie herself found that the number of trivial questions decreased markedly. This freed her time and energies to focus on more subtle things: spotting that one child is in trouble, another is blocking, a third could be doing something more exciting, and so on.

Another example concerns Henry - a boy whose early personality development sets him up to get lost in the computer (2). Long before he met a computer, he had difficulty in social relations. He was a dreamer who lived in fantasies and preferred his own dream fantasies of space travel to playing games with other children. For him, as for many like him, the arrival of the computer was an opportunity to intensify his social withdrawal.
However, Henry was in a school that was set up to favor a very different pattern of development. Children were encouraged to act as experts and advisors to the other children when they had special knowledge. The computers were located out in the open rather than in computer labs or in classrooms where quiet was imposed. This made it much easier to "see" what other children were doing and to interact with anyone doing intriguing work. Thus, it was not the computer as such but the computer culture of the school that drew Henry into a situation where he was in demand. So this young man who had always been afraid of pursuing contacts with other children found himself being pursued.

Finally, a more subtle example is drawn from the author's work with Logo. From the outset this language was designed to encourage communication between users. Logo programs are modular so they can be borrowed and shared. Logo is also designed to make it as easy as possible to talk about how you made your program work - what the bugs were, what the difficulties were, and how you solved them. Thus, the content of actual computer work, even on what might seem like a very technical level such as designing a computer language, is a factor that can make for greater socialization or greater isolation.

In all these conceptual issues one needs to remember one thing. Any question such as "What effect will the computer have upon this or that?" is a badly posed question. It is not the computer. In each case it is not what the computer will do to one, it is what one will do with the computer.

BIBLIOGRAPHY

1. The New York Times, Section 12, Education, Sunday, April 14, 1985.
2. S. Turkle, The Second Self: Computers and the Human Spirit, Simon and Schuster, New York, pp. 129-136, 1984.

General References

C. Daiute, Writing and Computers, Addison-Wesley, Reading, MA, 1985.
R. Lawler, Computer Experience and Cognitive Development: A Child's Learning in a Computer Culture, Ellis Horwood Ltd., distributed by John Wiley & Sons, New York, 1985.
T. O'Shea, Learning and Teaching with Computers, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1983.
S. Papert, Mindstorms: Children, Computers, and Powerful Ideas, Basic Books, New York, 1980.
D. Peterson, The Intelligent Schoolhouse, Reston Publishing, Reston, VA, 1984.
S. Weir, Cultivating Minds: Another Casebook, Harper & Row, New York, 1986.

S. PAPERT
Massachusetts Institute of Technology

CONCEPT LEARNING

What Is Concept Learning?

Among the fundamental characteristics of intelligent behavior are the abilities to pursue goals and to plan future actions. To exhibit these characteristics, an intelligent system - human or machine - must be able to classify some objects, behaviors, or events as equivalent for achieving given goals and some others as differing. For example, to satisfy hunger, an animal must be able to classify some objects as edible despite the great variety of their forms and the changes they undergo in the environment. Thus, an intelligent system must be able to form concepts, that is, classes of entities united by some principle. Such a principle might be a common use or goal, the same role in a structure forming a theory about something, or just similar perceptual characteristics. In order to use the concepts, the system must also develop efficient methods for recognizing concept membership of any given entity. The question then is how concepts and concept recognition methods are learned. The study and computer modeling of processes by which an intelligent system acquires, refines, and differentiates concepts is the subject matter of concept learning.

Concept learning is a subdomain of machine learning (qv). The research in this area originated with studies of concept development in humans (e.g., Refs. 1-3). It subsequently continued in the context of both AI efforts to build machines with concept-learning capabilities and cognitive science studies to construct computational models of learning. Selected publications covering this development are listed in Refs. 4-23. At present, concept learning is one of the central research topics in machine learning, a subarea of AI concerned with the development of computational theories of learning and the building of learning machines (see Machine learning).

In research on concept learning, the term "concept" is usually viewed in a more narrow sense than outlined above, namely, as an equivalence class of entities, such that it can be comprehensibly described by no more than a small set of statements. This description must be sufficient for distinguishing this concept from other concepts. Individual entities in the class are called instances of the concept. The assumption that a concept is an equivalence class implies that its every instance is equally representative of the concept (. . . necessary and jointly sufficient conditions and thus excludes a disjunctive description). Such an idealization greatly facilitates research on concept learning, as it defines the learning task simply as the acquisition of a formal structure describing an equivalence class. It is, however, only a very rough approximation that ignores many important aspects of the human notion of a concept (24). At the conclusion of this entry the weaknesses of this definition are briefly addressed, and ideas are pointed out that attempt to capture the notion of a concept more adequately.

Within research on concept learning two major orientations can be distinguished: cognitive modeling and the engineering approach. Cognitive modeling seeks to develop computational theories of concept learning in humans or animals and computer programs embodying those methods. In contrast, the engineering approach attempts to explore and experiment with all possible learning mechanisms, irrespective of their occurrence in living organisms.
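The narrow view just described - a concept as an equivalence class comprehensibly given by a small set of statements - can be made concrete as a membership predicate over a conjunctive description. The sketch below is illustrative only; the helper name `make_concept` and the attribute names are invented for this example, not taken from the entry.

```python
# A minimal sketch of the "narrow" view of a concept: an equivalence
# class described by a small set of statements, here a conjunction of
# attribute tests. All names are illustrative.

def make_concept(description):
    """description: dict mapping attribute -> required value."""
    def is_instance(entity):
        # An entity is an instance iff it satisfies every statement.
        return all(entity.get(attr) == value
                   for attr, value in description.items())
    return is_instance

# The concept "edible (for some animal)" as a tiny conjunctive description.
edible = make_concept({"organic": True, "toxic": False})

apple = {"organic": True, "toxic": False, "color": "red"}
stone = {"organic": False, "toxic": False}

print(edible(apple))  # True: satisfies every statement in the description
print(edible(stone))  # False: fails the "organic" condition
```

Note that because the description is purely conjunctive, every instance satisfies it in exactly the same way - the idealization of equally representative instances discussed above.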
Concept Learning Can Be Classified by Type of Inference Performed
In any learning process the student applies the knowledge possessed to information obtained from a source, for example, a teacher, in order to derive new useful knowledge. This new knowledge is then stored for subsequent use. Learning a new concept can proceed in a number of ways, reflecting the type of inference the student performs on the information supplied. For example, one may learn the concept of a butterfly by being given a description of it, by generalizing examples of specific butterflies, by constructing this concept in the process of observing and analyzing different types of insects, or by yet another way. The type of inference performed by the student on the information supplied defines the strategy of concept learning and constitutes a useful criterion for classifying learning processes.

Several basic concept-learning strategies have been identified in the course of machine-learning research. These are presented below in the order of increasing complexity of inference performed by the learner. In some general sense, this order reflects the increasing difficulty for the student to learn the concept and the decreasing difficulty for the instructor to teach the concept. In any practical act of learning, more than one strategy is often simultaneously employed. It should also be noted that this classification of strategies applies not only to learning of concepts but also to any act of acquiring knowledge.

Direct Implanting of Knowledge. This is an extreme case in which the learner does not have to perform any inference on the information provided. The knowledge supplied by the source is directly accepted by the learner. This strategy, also called rote learning, includes learning by direct memorization of given concept descriptions and learning by being programmed or constructed. For example, this strategy is employed when a specific algorithm for recognizing a concept is programmed into a computer or a database of facts about the concept is built. In Samuel's CHECKERS program (5) rote learning was employed to save the results of previous game tree searches in order to deepen and speed up subsequent searches.

Learning by Instruction (or Learning by Being Told). Here the learner acquires concepts from a teacher or other organized source, such as a publication or textbook, but does not directly copy into memory the information supplied. The learning process may involve selecting the most relevant facts and/or transforming the source information to more useful forms. The system NANOKLAUS (25), which builds a hierarchical knowledge base by conversing with a user, is an example of machine learning employing this strategy.

Learning by Deduction. The learner acquires a concept by deducing it from the knowledge given and/or possessed. In other words, this strategy includes any process in which knowledge learned is a result of a truth-preserving transformation of the knowledge given, including performing computation. A very simple example of this strategy is determining that the factorial of 6 is 720 by executing an already known algorithm and saving this fact for future use. This technique is called "memo functions" (26). A form of learning by deduction is explanation-based learning, which transforms an abstract, not directly usable, concept definition to an operational definition using a concept example for guidance (27). In general, deductive learning is performing a sequence of deductions or computations on the information given and/or stored in background knowledge, and memorizing the result. More advanced deductive learning is exemplified by analytic or explanation-based learning methods (e.g., 27). These methods start with the abstract concept definition and domain knowledge, and by deduction derive an operational concept definition. A concept example is used to guide the deductive process. For instance, knowing that a cup is an open, stable, and liftable vessel, an explanation-based method can produce an "operational" description of a cup. Such a description characterizes the cup in terms of lower level, more measurable features, such as the presence of concavity, of a handle, and of a flat bottom. Current research attempts to combine such analytical learning with inductive learning in order to learn concepts when the domain knowledge is incomplete, intractable, or inconsistent.

Learning by Analogy. The learner acquires a new concept by modifying the definition of a known similar concept. That is, rather than formulating a rule for a new concept from scratch, the student adapts an existing rule by modifying it appropriately to serve the new role. For example, if one knows the concept of an orange, learning the concept of a tangerine can be accomplished easily by just noting the similarities and distinctions between the two. Another example is learning about electric circuits by drawing analogies from pipes conducting water. Learning by analogy can be viewed as inductive and deductive learning combined and for this reason is placed between the two. Through inductive inference (see below) one determines general characteristics or transformations unifying concepts being compared. Then, by deductive inference, one derives from these characteristics features expected of the concept being learned. Winston (18) describes a method for learning concepts by analogy based on matching semantic networks. Learning by analogy plays an important role in problem solving (e.g., Ref. 22).

Learning by Induction. In this strategy the learner acquires a concept by drawing inductive inferences from supplied facts or observations. Depending on what is provided and what is known to a learner, two different forms of this strategy can be distinguished: learning from examples and learning from observation and discovery.

Learning from Examples. The learner induces a concept description by generalizing from teacher- or environment-provided examples and (optionally) counterexamples of the concept. It is assumed that the concept already exists; it is known to the teacher or there is some effective procedure for testing the concept membership. The task for the learner is to determine a general concept description by analyzing individual concept examples. An example of this strategy takes place when a senior doctor examines medical records and makes interviews with patients in the presence of one or more interns, noting that "this is a patient with hepatitis"; "this is another patient with hepatitis, but notice that . . .", and so on. The latter part of this entry briefly discusses a few methods for learning from examples.

Learning by Observation and Discovery. In this strategy the learner analyzes given and/or observed entities and determines that some subsets of these entities can be grouped usefully into certain classes (i.e., concepts). Because there is no teacher who knows the concepts beforehand, this strategy is also called unsupervised learning. Once a concept is formed, it is given a name. Concepts so created can then be used as terms in subsequent learning of other concepts.

An important form of this strategy is clustering (i.e., partitioning a collection of objects into classes) and the related process of constructing classifications. Classifications are typically organized into hierarchies of concepts. Such hierarchies exhibit an important property of inheritance. If an object is recognized as a member of some class, the properties associated specifically with this class, as well as with classes at the higher level of hierarchy, are (tentatively) assigned to the given object. For example, if one learns that Freddy is an elephant, then, without seeing Freddy, one will typically assume that Freddy has four legs, a trunk, and all the distinguishing properties of elephants, vertebrates, and, generally, animals. Hierarchical classifications vary in height: Some may be tall, like the classification of living organisms, and some more flat, like the social hierarchy. The topics of clustering (in particular, conceptual clustering) and classification construction are treated in a separate entry in the encyclopedia (see Clustering).

Another form of learning by observation and discovery is descriptive generalization. This form is concerned with discovering regularities and formulating new concepts and rules characterizing collections of any entities (objects, events, processes, etc.). It produces statements such as "most people are honest," "whenever there are independent events, the normal distribution should hold," or "John is in the habit of amblin' down to the soda fountain every day about now." Examples of research on this topic are two programs by Lenat (15,23): AM, which searches for and develops new "interesting" concepts after being given a set of heuristic rules and initial concepts in elementary mathematics and set theory, and EURISKO, which formulates new heuristics. Another example is the BACON system (e.g., Ref. 28), which synthesizes mathematical expressions representing chemical or physical laws on the basis of given empirical data.

In the AI literature the term "concept learning" is frequently used in a more narrow sense than it is here, namely, to mean solely learning concepts from examples. One reason for this is historical, as this strategy was studied first, and most is known about it. It subsequently served as the springboard for studies of other strategies, but it continues to be the area most intensively investigated. Learning from examples and learning from observation and discovery (i.e., inductive learning in general) are fundamental forms of concept learning. When acquiring any abstract concept, examples are typically needed to achieve a deeper understanding of the concept; and initial learning of any concepts and natural laws is typically achieved by generalizing from our sensory observations. For these reasons the remainder of this entry concentrates on inductive learning. For coverage of other strategies the reader is advised to consult other references, in particular Ref. 29. The nature of inductive inference, which is the core of inductive learning processes, is explored in more detail.

Inductive Inference Generates Hypotheses from Facts and/or Other Hypotheses

Inductive inference is the primary vehicle for creating new knowledge and predicting future events. It is usually characterized as reasoning from specific to general, from particular to universal, or from part to whole. Such a characterization is simple but not too informative. It does not identify all the components playing a role in the inductive process, nor does it explain how this inference is possible. To understand this inference more precisely, its major components are distinguished, and the properties of its conclusions are specified.

Given:

premise statements (facts, specific observations, intermediate generalizations) that provide information about some objects, phenomena, processes, and so on;

a tentative inductive assertion, which is an a priori hypothesis held about the objects in the premise statements (in some acts of inductive inference there may not be any tentative hypothesis; if there is such a hypothesis, the inductive process may be simplified, as it may involve merely a modification of the tentative hypothesis rather than creating a new hypothesis from scratch); and

background knowledge, which contains general and domain-specific concepts for interpreting the premises and inference rules relevant to the task of inference; it includes previously learned concepts, domain constraints, causality relations, assumptions about the premise statements and candidate hypotheses, goals for inference, and methods for evaluating the candidate hypotheses from these goals' viewpoints (specifically, the preference criterion or bias).

Determine:

an inductive assertion (a hypothesis) that strongly or weakly implies the premise statements in the context of background knowledge and is most preferable among all other such hypotheses.

A hypothesis strongly implies premise statements in the context of background knowledge if, by using background knowledge (and standard rules of inference), the premise statements can be shown to be a logical consequence of the hypothesis. In other words, the assertion

Hypothesis & Background knowledge ⇒ Premise statements

is valid, that is, true under all interpretations (the symbol ⇒ denotes implication). A hypothesis that satisfies this condition is called a strong candidate hypothesis. In contrast, a weak hypothesis is one that only weakly implies premise statements, that is, these statements are a plausible, but not certain, consequence of the hypothesis. The following two-part example illustrates both types of hypotheses.

Example: Part 1.

Premise statements: Socrates was Greek. Aristotle was Greek. Plato was Greek.

Background knowledge: Socrates, Aristotle, and Plato were philosophers. They lived in antiquity. Philosophers are people. Greeks are people.

Preference criterion: Prefer the hypothesis that is short and useful for deciding the nationality of philosophers.
Candidate hypotheses (a selection):

1. Philosophers who lived in antiquity were Greek.
2. All philosophers are Greek.
3. All people are Greek.

Preferred hypothesis:

2. All philosophers are Greek. (It is shorter than 1 and more specific than 3; it allows one, unlike 1, to determine the nationality of all philosophers.)

It can be seen that the original premise statements are a logical consequence of the generated hypothesis and background knowledge. The fact that the generated hypothesis is too general is a result of the poverty of the background knowledge and/or the premise assertions.

Example: Part 2.

Suppose that the stock of facts has been enlarged with statements such as "Spencer was British" and "Hume was British" and that the background knowledge includes also the statement "Hume and Spencer were philosophers." In this case a strong candidate hypothesis would be "All philosophers were Greek except Spencer and Hume, who were British." A weak hypothesis would be "Most (or some) philosophers were Greek." Given a fact that Plato was a philosopher, the new hypothesis, in contrast to the old one, does not allow one to conclude strongly that he was Greek. It allows one only to say that it is likely (or that it is possible) that he was Greek. However, unlike the first hypothesis, it will also not conclude strongly that philosopher Russell was Greek!

This example illustrates important properties of inductive inference. One is that it may not be truth preserving, that is, its conclusions may be incorrect though the premise statements are correct. Going back to the first hypothesis, though Socrates, Aristotle, and Plato were Greek, it certainly does not follow that all philosophers were Greek. This quality of non-truth preservation contrasts inductive inference with truth-preserving deductive inference. Figure 1 illustrates the relationship between deductive and inductive inference.

Figure 1. Relation between deduction and induction.

Inductive inference that produces strong hypotheses is falsity preserving. This means that if the original premise statements are false, the derived hypothesis will be false also. For example, if it were not true that Socrates was Greek, then clearly the first hypothesis, "All philosophers were Greek," could not be true either. Hypotheses generated by inductive inference have unknown truth status. They must be tested and verified before they become rules or accepted theories (see section on hypothesis verification).

The premise statements, background knowledge, and derived hypotheses need to be expressed in some language. In human inference it is the language of the mind, a "mentalese," that at the surface level takes the form of natural language augmented with special representations of sensory stimuli, such as drawings, pictures, sounds, or gestures. In machine inference it is a formal language, such as propositional logic, predicate calculus or other logic-style formalisms, or a knowledge representation system, such as semantic networks, mathematical expressions, frames, scripts, or conceptual structures (30). Sometimes expressing the premise statements is easier in one language and expressing hypotheses is easier in another language.

In concept learning from examples (concept acquisition) the main concern is with a special case of inductive inference, called inductive generalization. Here both the premise statements and the hypothesis are either interpretable as descriptions of sets (in this case there is instance-to-class generalization) or as descriptions of components of some object or process (in the latter case there is part-to-whole generalization). In instance-to-class generalization properties known to hold for a set of objects are assigned to a larger set of objects. This form can be seen in the example above, in which a property (the nationality) assigned by premise statements to a few individuals was assigned to all individuals in some class (all philosophers).
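The strong-hypothesis condition above can be checked mechanically for the philosophers example. The sketch below is a deliberately minimal illustration, assuming facts are (individual, property) pairs and the hypothesis is a single universally quantified rule applied by one step of forward chaining; the function and variable names are invented here, not part of the entry.

```python
# Check whether Hypothesis & Background knowledge => Premise statements
# for ground facts of the form (individual, property). Illustrative only.

def entails(rule, background, premises):
    """rule: (property_if, property_then), read as
    'every individual with property_if has property_then'."""
    cond, concl = rule
    derived = set(background)
    for individual, prop in background:
        if prop == cond:
            derived.add((individual, concl))  # apply the rule once
    # Strong implication: every premise is a consequence of rule + background.
    return all(p in derived for p in premises)

background = {("Socrates", "philosopher"), ("Aristotle", "philosopher"),
              ("Plato", "philosopher")}
premises = {("Socrates", "Greek"), ("Aristotle", "Greek"), ("Plato", "Greek")}

# Hypothesis 2, "All philosophers are Greek," strongly implies the premises.
print(entails(("philosopher", "Greek"), background, premises))  # True

# A rule whose condition matches no background fact fails the test.
print(entails(("ancient", "Greek"), background, premises))  # False
```

The same check also shows why the hypothesis is only falsity preserving, not truth preserving: nothing in the test prevents the rule from over-generalizing to philosophers outside the premises.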
In part-to-whole generalization the premise statements describe parts of some object, and the goal is to hypothesize a description of the whole object. For example, the following is a part-to-whole generalization.

Premise: His hands and his legs are strong.

Background knowledge: Hands and legs are parts of a body.

Hypothesis: His whole body is strong.

An important form of part-to-whole generalization is sequence or process prediction (31,32).

Inductive inference was defined as a process of generating descriptions that imply original facts in the context of background knowledge. Such a general definition includes inductive generalization and abduction as special cases. The term "abduction" was coined by the American logician Peirce (33). In abduction, the generated descriptions are specific assertions implying the facts (in the context of background knowledge) rather than generalizations of them. For example, given a premise assertion, "these roses are purple," and background knowledge "all roses in Adam's garden are purple," an abductive assertion would be "perhaps these roses are from Adam's garden." A description that implies some facts can be viewed as an explanation of these facts. The most interesting form of an explanation is when it provides a causal, goal-oriented characterization of the facts. To derive such an explanation, background knowledge must contain, along with other inference rules, causal inference rules as well as a specification of the
CONCEPT LEARNING
goal(s) of inference. Generating causal explanations can thus be viewed as a form of inductive inference.
Inductive Inference Can Be Performed by Rules

One of the important results of research on inductive inference is the development of the concept of an inductive inference rule. An inductive inference rule performs some elementary act of inductive inference. It takes one or more assertions and generates an assertion that tautologically implies them. The concept of an inductive inference rule permits one to view inductive inference, at least conceptually, as a rule-guided process that starts with initial premises and background knowledge and ends with an inductive assertion (34). Here are a few examples of such rules:
Dropping conditions (removing a conjunctively linked condition from a statement; e.g., replacing the statement "a nation is strong if it has a strong economy and high determination" by "a nation is strong if it has high determination").

Turning constants into variables (e.g., it generalizes the statement "this apple tastes good" into "all apples taste good").

Adding options (it generalizes a statement by adding a disjunctively linked condition; e.g., it might generalize the statement "peace will be preserved if all nations have peaceful intentions" into "peace will be preserved if all nations have peaceful intentions or if nonaggressive nations are much stronger than the aggressive ones").

Climbing generalization tree (replacing a less general term by a more general term in a statement; e.g., generalizing the statement "I like oranges" into "I like citrus fruits").

A systematic presentation of inductive rules is in Ref. 34.

Instance Space versus Description Space

Earlier two forms of inductive learning have been distinguished: learning from examples and learning by observation. Learning a concept from examples is a process of constructing a representation of a designated class of entities by observing only selected members of that class and optionally nonmembers (counterexamples). Learning from observation involves creating concepts as useful classes for characterizing observations or any given facts. Both processes depend on the learner's background knowledge, in particular, on the type of description language the learner uses for characterizing examples and learned concepts.

In this context it is instructive to distinguish between an instance space and a description space. The instance space consists of all possible examples and counterexamples of concepts to be learned. Actually observed positive and negative examples constitute subsets of such an instance space. The description space is the set of all descriptions of instances or classes of instances that are possible using the description language specified by the learner's background knowledge. Learning a concept involves an interaction between the two spaces. Such an interaction may involve reformulation or transformation of initial assertions as well as experimentation and active selection of training examples (Fig. 2).

Figure 2. Interaction between instance space and description space. [Schematic: the instance space and the description space are linked by interpretation/reformulation (yielding equivalent descriptions) and by example selection/experiment planning.]

Consider a simple case where examples of a concept (positive examples) and counterexamples (negative examples) are represented by attribute vectors, that is, by lists of values of certain attributes. Considering attributes as dimensions spanning a multidimensional space, each example maps to a point in this space. Points that do not correspond to any observed example represent potential examples. Such a space is called a feature space or an event space and can be viewed as a geometric model of an instance space.

One may ask where the attributes come from. In simple methods the attributes are defined by the teacher. Such methods are called selective because the learned concept does not include any new attributes but only those defined by a teacher. In more sophisticated methods the system is provided with some initial attributes plus various rules of inference, heuristics, or procedures that a learner uses for generating new attributes. The latter methods are called constructive (34,35).

Different subsets of the instance space correspond to different concepts. Descriptions of those concepts are elements of the description space. For simplicity, assume that the description space is the set of all logical expressions involving attributes used in characterizing examples. Depending on the constraints imposed on these expressions, all (or only some) subsets of the instance space can be represented by an expression in this language. Usually, any concept corresponds to a subset of (logically equivalent) descriptions in the description space.

A concept description is consistent with regard to the examples if it covers some or all positive examples and none of the negative examples. A concept description is complete with regard to the examples if it covers all positive examples. A description of a concept that is both complete and consistent with regard to all examples is a candidate hypothesis. The requirement for completeness and consistency follows from the assumption that the hypothesis should imply the initial examples (see Ref. 34).

The set of all candidate hypotheses is called the candidate hypothesis space or the version space. The candidate hypothesis space can be partially ordered by the relation of generality, which reflects the set inclusion relation between the corresponding concepts. The most general hypothesis describes the concept that is the complement of the union of negative examples, and the most specific hypothesis describes the concept that is the union of all positive examples.

Because the candidate hypothesis space is usually quite
large, a preference criterion is used to decide which candidate hypothesis to choose. Such a criterion may favor, for example, hypotheses that are short, hypotheses that require the least effort to measure the attributes involved, or, generally, hypotheses that best reflect the goal of learning. If the concept representation language is incomplete, for example, allows one to express only conjunctive hypotheses, and a sufficient number of positive and negative examples is supplied, the resulting version space may contain only one candidate hypothesis. In such a case the preference criterion is not needed (17).

In summary, learning a concept can be described as a heuristic (qv) search (qv) through the description space for a most preferred hypothesis among all those that are consistent and complete with regard to the training examples.

Selected Methods of Inductive Learning

An important characteristic of learning methods is the way in which descriptions in the description space are generated and/or searched in relation to the examples or facts in the instance space. Three types of methods can be distinguished: data driven, model driven, and mixed. A data-driven method starts with selecting one or more examples, formulates a hypothesis explaining them, and then generalizes (and occasionally specializes) the hypothesis to explain further examples. A model-driven method starts with some very general hypotheses and then specializes (and occasionally generalizes) them to fit all the examples. Roughly speaking, data-driven methods proceed from specific to general, and model-driven methods proceed from general to specific. A mixed method has elements of both: It uses an example (or examples) to jump to one or more general hypotheses, tests the hypotheses, and then modifies them to fit other examples. Data-driven methods tend to be more efficient, and model-driven methods tend to be more tolerant of errors in data (29). Below are examples of the three types of methods.
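As a concrete illustration of the preceding discussion, the following sketch enumerates a version space — all complete and consistent conjunctive hypotheses — for a tiny, invented attribute vocabulary (this is an illustrative sketch, not taken from the original entry):

```python
from itertools import product

# Hypothetical attribute vocabulary (not from the original entry).
ATTRS = {"size": ["small", "large"], "shape": ["circle", "square"]}

def covers(hyp, example):
    return all(example[a] == v for a, v in hyp.items())

def version_space(pos, neg):
    """All conjunctive hypotheses that are complete (cover every positive
    example) and consistent (cover no negative example)."""
    space = []
    # Each attribute is either fixed to a value or left unconstrained (None).
    for choice in product(*([None] + vals for vals in ATTRS.values())):
        hyp = {a: v for a, v in zip(ATTRS, choice) if v is not None}
        if all(covers(hyp, e) for e in pos) and not any(covers(hyp, e) for e in neg):
            space.append(hyp)
    return space

pos = [{"size": "large", "shape": "circle"}]
neg = [{"size": "small", "shape": "square"}]
print(version_space(pos, neg))
# [{'shape': 'circle'}, {'size': 'large'}, {'size': 'large', 'shape': 'circle'}]
```

A preference criterion would then pick one element of this space, for example the shortest hypothesis.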
Data-Driven Methods

Winston's Block World: Learning by Incremental Generalization and Modification. Winston's program (36) is an excellent representative of a data-driven method of concept learning. It learns structural descriptions of concepts in a blocks world (e.g., the concept of an arch) from representative examples and counterexamples provided by a teacher. The program represents examples and concepts in the form of a semantic network. At each step of learning it maintains only one working hypothesis. In searching for the final hypothesis, it uses a simple form of best-first search method. The basic algorithm can be described as follows:

1. Take the first positive example of the concept and assume that it is a concept description.
2. If the next example is positive and does not satisfy the current concept description, generalize the description so that it includes the example.
3. If the next example is negative but satisfies the current description, specialize the description so that it excludes the example.
4. Repeat steps 2 and 3 until the process converges on a stable concept description.

The generalization step (step 2) applies such operators as dropping conditions, turning constants to variables, or climbing the generalization tree. When confronted with multiple choices in generalizing, the program chooses the least "drastic" change to the current concept description. For example, it will replace a less general term by a more general term rather than drop a term. The specialization step (step 3) adds more conditions and introduces exceptions or the must-not conditions to the currently held hypothesis. There are usually many ways to specialize a hypothesis so that it does not cover a given negative example (as many as there are differences between the example and the hypothesis). For that reason the program favors the near misses, that is, negative examples that differ from the hypothesis in only a few or, in the best case, in only one aspect.

Other examples of data-driven methods are the candidate elimination algorithm (17,37) for learning from examples and the method for learning from observation embodied in the BACON system (28). The latter method discovers equations characterizing empirical laws.

Model-Driven Methods

Learning by Incremental Specialization and Modification: The Meta-DENDRAL Program. This program implements a model-driven method for discovering rules characterizing the operation of a mass spectrometer (38). These so-called cleavage rules predict which bonds in a molecular structure of a chemical compound will likely break when bombarded by electrons in the mass spectrometer. To avoid undue technical details of the specific domain, the rule-learning process is presented at a level of abstraction. This process consists of two phases. First, the rule generation phase conducts a general-to-specific search of the space of possible cleavage rules (subprogram RULEGEN). Next, the rule modification phase makes the rules so obtained more precise and less redundant by performing local hill-climbing searches (subprogram RULEMOD). Training examples can be viewed as attribute vector descriptions of the environment of individual bonds in a molecule. Among the attributes are the type of atoms on both sides of the bond, the number of hydrogen and nonhydrogen atoms bound to each atom, the number of unsaturated valence electrons of the atom, and so on. With each example is associated a decision as to whether the corresponding bond will break in the mass spectrometer. An important feature of this application is a large-sized, error-laden set of input examples.

The rule generation phase starts with the most general rule, stating that every bond will break. Abstracting from the specific domain-dependent notation, such a rule can be written:

If a bond is any bond, then it will break.

The next step specializes the left side of the parent rule by making a change to atoms at a specified distance from the bond. A change may involve changing properties of an atom or adding a new atom. New rules so obtained are then tested to see if they perform better in predicting the breaks in the given set of examples. This two-step process of rule specialization and testing repeats until a local optimum of performance is achieved. The resulting rules can be characterized as:

If a bond environment has properties so and so, then it will break.

Meta-DENDRAL was an important learning system that worked well in a real-world domain with noisy data. In addition to the process of rule development, outlined above, it also performed a sophisticated transformation of the initial data (the input spectrum) to usable training instances (the bond environment descriptions). In all aspects of its operation the program relied on a large amount of domain-specific knowledge.

Another example of a model-driven method is the concept-learning program CLS (3) and its modified version, ID3 (39). The program starts by attempting to find the best one-attribute rule characterizing given examples. If this is not possible, it builds a decision tree of such rules that classifies all input examples. In such a tree nodes correspond to attributes, emanating branches to the attribute values, and leaves to classes.
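The CLS/ID3 idea can be sketched as follows (a minimal illustration with invented weather data, not the actual programs; ID3's information-gain criterion is realized here by picking the attribute whose split minimizes the weighted entropy of the class labels):

```python
from collections import Counter
from math import log2

def entropy(classes):
    n = len(classes)
    return -sum(k / n * log2(k / n) for k in Counter(classes).values())

def split_entropy(examples, attr):
    # Weighted entropy of the class labels after splitting on attr.
    n = len(examples)
    values = {e[attr] for e, _ in examples}
    return sum(len(sub) / n * entropy(sub)
               for v in values
               for sub in [[c for e, c in examples if e[attr] == v]])

def id3(examples, attrs):
    """Nodes are attributes, branches are attribute values, leaves are classes."""
    classes = [c for _, c in examples]
    if len(set(classes)) == 1:
        return classes[0]                                # pure leaf
    if not attrs:
        return Counter(classes).most_common(1)[0][0]     # majority-class leaf
    best = min(attrs, key=lambda a: split_entropy(examples, a))
    rest = [a for a in attrs if a != best]
    return (best, {v: id3([(e, c) for e, c in examples if e[best] == v], rest)
                   for v in {e[best] for e, _ in examples}})

data = [({"outlook": "sunny", "windy": "no"},  "play"),
        ({"outlook": "sunny", "windy": "yes"}, "stay"),
        ({"outlook": "rainy", "windy": "no"},  "stay"),
        ({"outlook": "rainy", "windy": "yes"}, "stay")]
tree = id3(data, ["outlook", "windy"])
# tree == ('outlook', {'sunny': ('windy', {'no': 'play', 'yes': 'stay'}),
#                     'rainy': 'stay'})
```

Each recursive call consumes one attribute, so the tree classifies every training example, exactly as the text describes.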
Mixed Methods

Learning by Rapid Generalization and Stepwise Specialization: AQ11. Inductive concept learning can be viewed as a generate-and-test process. The "generate" part creates or modifies hypotheses, and the "test" part tests how well the hypotheses fit the data. In data-driven methods the "generate" part is sophisticated and the "test" part is simple, whereas in model-driven methods the opposite holds. A mixed method, implemented in the program AQ11, attempts to more equally emphasize the "generate" and "test" parts.

AQ11 is a multipurpose learning program that formulates general rules describing various classes of examples (40). Input to the program consists of attribute value vector descriptions of examples from different classes. It also includes background knowledge about the application domain and a hypothesis preference criterion. The output can be viewed as rules relating each class to a condition, where "condition" is a disjunction of conjunctions such that it describes all entities assigned to the class. A simplified version of the algorithm, called AQ, which underlies the nonincremental learning part of the program, is as follows.

1. Select at random one positive example (called the seed).
2. Comparing the seed with the first negative example, generate all maximally general hypotheses that cover the seed and exclude the negative example.
3. Specialize the hypotheses to exclude all negative examples. This is done by considering one negative example at a time and adding, whenever necessary, additional constraints to the hypotheses. After each step of specialization the newly generated hypotheses are ranked according to how well they classify remaining examples and according to other aspects defined in the preference criterion. Only the most promising hypotheses are kept. The set of hypotheses obtained at the end of the specialization process is called a star.
4. Select from the star the best-ranked hypothesis. If this hypothesis covers all positive examples, exit (a solution has been found). Otherwise, find positive examples that remain uncovered.
5. Repeat steps 1-4 for the remainder set. Continue until all positive examples are covered. The disjunction of hypotheses selected at the end of each cycle is a consistent and complete description of all the positive examples and maximizes the preference criterion.

Thus, the program builds a disjunctive description of a concept when a conjunctive description is not possible. The individual conjuncts in such a disjunction may significantly differ as to the size of coverage of the training examples. This allows for an interesting interpretation: The conjunct that covers most of the events could be viewed as a characterization of the typical, or "ideal," members and those with light coverage as a characterization of exceptional cases.

The incremental part of the program performs operations of modifying generated descriptions to fit new examples. The background knowledge of the program contains information about the properties of the attributes used to describe examples and various domain constraints. The program has been applied to various problems in medicine, agriculture, chess, and other areas. A more advanced version of the program, INDUCE (34), is capable of learning not only attribute-based but also structure-based concept descriptions. These descriptions characterize concepts as structures of components bound by various relationships and are expressed in an extended predicate calculus. The program has the ability to utilize general and domain-specific knowledge to generate new attributes.

How Are Learned Concepts Validated?

Although inductive inference represents the basic method for acquiring knowledge about the world and is one of the most common forms of inference, it suffers from a fundamental weakness. Except for special cases, results of this inference are inherently insusceptible to complete validation. This is because an inductively acquired hypothesis may have an infinite number of consequences, but only a finite number of tests can be performed. This property of inductive inference was observed early on by the Scottish philosopher David Hume and subsequently analyzed by twentieth-century thinkers such as Popper (e.g., Ref. 41). Consequently, one typically assumes that concept descriptions learned inductively have only a tentative status. When new examples become available, these descriptions are tested on them and, if necessary, appropriately modified. A standard method for testing inductively acquired descriptions (rules) is to apply them to testing examples and compute a confusion matrix. Such a matrix records the number of correct and incorrect classifications of the testing examples by the rules.

Extended Notions of a Concept

The basic ideas and a few selected methods of concept learning have been described here. These methods were based on the notion that concepts are classes of entities describable by a logic-style description. This means that concept descriptions have sharp boundaries and all members are equal representatives of a concept. As pointed out above, this simplification,
though useful for research, misses some important aspects of the human notion of a concept. Human concepts, except for special cases occurring predominantly in science (concepts such as a triangle, a prime number, a vertebrate, etc.), are structures with flexible and/or imprecise boundaries. They allow a varying degree of match between them and observed instances and have context-dependent meaning. Flexible boundaries make it possible to "fit" the meaning of a concept to changing situations and to avoid precision when not needed or not possible. The varying degree of match reflects the varying representativeness of a concept by different instances. Instances of a concept are rarely homogeneous. Among instances of a concept, people usually distinguish a "typical instance," a "nontypical instance," or, generally, they rank instances according to their typicality. By the use of context, the meaning of almost any concept can be expanded in a multitude of directions that cannot be predicted in advance. An imaginative discussion of this property is by Hofstadter (42), who shows how a seemingly well-defined concept, such as "First Lady," can express a great variety of meanings depending on the context in which it is applied.

Despite various efforts, the issue of how to represent concepts in such a rich and context-dependent sense remains open. This issue is, of course, crucial for concept learning because to learn concepts, the learner must be able to represent them. In view of this, a brief review of basic approaches to concept representation may be useful for understanding the current research limitations and directions in concept learning. Smith and Medin (43) distinguish between three approaches: the classical view, the probabilistic view, and the exemplar view.
The classical view assumes that concepts are representable by features that are singly necessary and jointly sufficient to define a concept. This view is a special case of the one assumed in this entry, as it does not allow disjunctive concept descriptions. The probabilistic view represents concepts as weighted, additive combinations of features. Using the aforementioned notion of a feature space, this means that concepts should correspond to linearly separable subareas in such a space. Experiments indicate, however, that this may be too limiting a view (43). The exemplar view represents concepts by one or more typical exemplars rather than by generalized descriptions. The notion of typicality can be captured by a measure called family resemblance. This measure represents the sum of frequencies with which different features occur in different subsets of a superordinate concept, such as furniture, vehicle, and so on. The individual subsets are represented by typical members. Nontypical members are viewed as corruptions of the typical, differing from them in various small aspects, as children differ from their parents (e.g., Refs. 44 and 45).

Another approach uses the notion of a fuzzy set as a formal model of a concept (46). Members of such a set are characterized by a gradual numerical set membership function rather than by the in-out function seen in the classical notion of a set. This set membership function is defined by people describing the concept and thus is subjective. This approach allows one to express the varying degree of membership of entities in a concept but does not have mechanisms for expressing the context dependence of the concept meaning.
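The family-resemblance measure can be sketched as follows (the feature sets are invented for illustration; the measure itself — summing, for each member, the frequencies of its features across the category — follows the description above):

```python
from collections import Counter

def family_resemblance(members):
    """Score each member by the summed frequency of its features across
    the whole category; higher scores mean more typical members."""
    freq = Counter(f for feats in members.values() for f in feats)
    return sorted(((sum(freq[f] for f in feats), name)
                   for name, feats in members.items()), reverse=True)

furniture = {                 # invented feature sets for illustration
    "chair": {"legs", "seat", "back"},
    "stool": {"legs", "seat"},
    "lamp":  {"bulb", "stand"},
}
print(family_resemblance(furniture))
# [(5, 'chair'), (4, 'stool'), (2, 'lamp')] -- 'chair' is the most typical
```

Members whose features are widely shared within the category rank as typical; members with idiosyncratic features rank as nontypical.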
Elements of the above approaches have been unified in a more recent idea, which postulates that the concept is characterized by a well-defined description, but the use of this description is flexible (47). If an entity does not satisfy the description precisely, a consonance degree is computed that specifies the degree to which the description is satisfied. Thus, objects precisely satisfying the formal description can be considered as typical concept members and those that satisfy it approximately as less typical, with the degree of membership defined by the consonance degree. In the case of disjunctive descriptions the component (conjunction) that explains most of the examples can be viewed as representing the ideal form of a concept. Other components then represent exceptional cases. The method of computing the consonance degree can be shared by many concepts; therefore, there is no need for storing a set membership function with each concept, as in the case of fuzzy sets. The dependencies among the attributes characterizing a concept and its relationship to other concepts can be expressed in the same logic-based formalism. Thus, in such a "flexible logic" approach the total meaning of a concept is distributed between its formal description and the function evaluating the degree of consonance. The description gives the basic meaning to a concept, and the evaluation function allows for its flexibility. Major questions, then, are how to properly distribute the concept meaning between these two components and how to express context-dependent meaning.
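One plausible way to realize such a shared consonance function — the entry leaves its exact form open, so this is only an illustrative sketch with invented conditions — is to score the fraction of a conjunctive description's conditions that an instance satisfies:

```python
def consonance(description, instance):
    """Fraction of the description's conditions that the instance meets
    (an illustrative choice; the exact function is not fixed by the text)."""
    met = sum(instance.get(attr) == value for attr, value in description.items())
    return met / len(description)

bird = {"has_feathers": True, "flies": True, "lays_eggs": True}

robin   = {"has_feathers": True, "flies": True,  "lays_eggs": True}
penguin = {"has_feathers": True, "flies": False, "lays_eggs": True}

print(consonance(bird, robin))    # 1.0: satisfies the description exactly
print(consonance(bird, penguin))  # 2 of 3 conditions met: a less typical member
```

The same function serves every concept, so no per-concept membership function needs to be stored, which is the advantage over fuzzy sets noted above.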
An adequate concept representation should include not only a description that permits one to recognize the given concept among other concepts or to evaluate the typicality of its members but also a number of other components. It should specify the constraints and correlations among the defining or characteristic attributes, the relationship of the concept to other concepts, its typical and nontypical examples, the dependence of meaning on different contexts, the purpose and use of the concept, and its position and role in knowledge structures and theories in which it is embedded. Many of these components are present in the representation described in Ref. 48. Murphy and Medin (24) argue that the role a concept plays in a theory that uses it provides a basis for conceptual coherence, that is, for explaining why certain classes of entities constitute a meaningful concept and some others do not. Further progress on concept learning is predicated on progress in concept representation.
Conclusion

Concept learning has been presented as a process of constructing a concept representation on the basis of information provided by an external source, a teacher, or an environment. The type of transformation performed by the learner defines the learning strategy. The main emphasis of this entry is on inductive learning, which is divided into learning from examples and learning from observation and discovery. Principles are described that underlie inductive inference, and several methods are presented for concept learning from examples.

A number of topics in concept learning have not been covered. Among these are methods for creating new concepts, noninductive learning strategies, techniques for evaluating learned concept descriptions, and learning from noisy or incompletely defined examples. The general references include papers on these topics.
BIBLIOGRAPHY
1. C. I. Hovland, "A 'communication analysis' of concept learning," Psychol. Rev. 59(6), 461-472 (1952).
2. J. S. Bruner, J. J. Goodnow, and G. A. Austin, A Study of Thinking, Wiley, New York, 1956.
3. E. B. Hunt, J. Marin, and P. J. Stone, Experiments in Induction, Academic Press, New York, 1966.
4. A. Newell, J. C. Shaw, and H. A. Simon, A Variety of Intelligent Learning in the General Problem Solver, Rand Corporation Technical Report, Santa Monica, CA, 1959.
5. A. L. Samuel, "Some studies in machine learning using the game of checkers," IBM J. Res. Dev. 3, 210-229 (1959); reprinted in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963, pp. 71-105.
6. M. Kochen, "Experimental study of 'Hypothesis Formation' by Computer," in C. Cherry (ed.), Information Theory: 4th London Symposium, Butterworth, London and Washington, DC, 1961.
7. S. Amarel, On the Automatic Formation of a Computer Program which Represents a Theory, in M. Yovits, G. Jacobi, and G. Goldstein (eds.), Self-Organizing Systems, Spartan Books, Washington, DC, 1962, pp. 102-178.
8. R. B. Banerji, Computer Programs for the Generation of New Concepts from Old Ones, in K. Steinbuch and S. Wagner (eds.), Neuere Ergebnisse der Kybernetik, Oldenbourg-Verlag, Munich, 1964, p. 336.
9. N. Bongard, Pattern Recognition, Spartan Books, New York, 1970 (translation from a Russian original published in 1966).
10. S. Watanabe, Pattern Recognition as an Inductive Process, in Methodologies of Pattern Recognition, Academic Press, New York, 1968.
11. M. Minsky and S. Papert, Perceptrons, MIT Press, Cambridge, MA, 1969.
12. P. H. Winston, Learning Structural Descriptions from Examples, Ph.D. Thesis, Report No. TR-231, AI Laboratory, MIT, 1970; reprinted in P. H. Winston (ed.), The Psychology of Computer Vision, McGraw-Hill, New York, 1975.
13. B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg, A Heuristic Programming Study of Theory Formation in Sciences, Proceedings of the Second International Joint Conference on Artificial Intelligence, London, 1971, pp. 40-48.
14. R. S. Michalski, A Variable-Valued Logic System as Applied to Picture Description and Recognition, in F. Nake and A. Rosenfeld (eds.), Graphic Languages, North-Holland, Amsterdam, 1972, pp. 20-47.
15. D. B. Lenat, AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search, Ph.D. Dissertation, Stanford University, 1976.
16. P. Langley, BACON: A Production System that Discovers Empirical Laws, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, 1977, pp. 344-346.
17. T. M. Mitchell, Version Spaces: A Candidate Elimination Approach to Rule Learning, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, August 1977, pp. 305-310.
18. P. H. Winston, "Learning and reasoning by analogy," CACM 23(12), 689-703 (1979).
19. J. R. Anderson, A Theory of Language Acquisition Based on General Learning Principles, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, British Columbia, August 1981, pp. 97-109.
20. R. S. Michalski and R. E. Stepp, Learning from Observation: Conceptual Clustering, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 331-364.
21. R. C. Schank, Looking at Learning, Proceedings of the European Conference on Artificial Intelligence, Orsay, France, July 1982, pp. 11-18.
22. J. G. Carbonell, Learning by Analogy: Formulating and Generalizing Plans from Past Experience, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 137-161.
23. D. B. Lenat, The Role of Heuristics in Learning by Discovery: Three Case Studies, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 243-306.
24. G. L. Murphy and D. L. Medin, "The role of theories in conceptual coherence," Psychol. Rev. 92(3), 289-316 (1985).
25. N. Haas and G. G. Hendrix, Learning by Being Told: Acquiring Knowledge for Information Management, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 405-427.
26. D. Michie, "Memo functions and machine learning," Nature 218(5136), 19-22 (1968).
27. T. M. Mitchell, R. M. Keller, and S. T. Kedar-Cabelli, "Explanation-based generalization: A unifying view," Machine Learning 1(1), 47-80 (1986).
28. P. Langley and G. L. Bradshaw, Rediscovering Chemistry with the BACON System, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 307-329.
29. T. G. Dietterich and R. S. Michalski, A Comparative Review of Selected Methods for Learning from Examples, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 41-81.
30. J. F. Sowa, Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley, Reading, MA, 1984.
31. H. A. Simon and G. Lea, Problem Solving and Rule Induction: A Unified View, in L. W. Gregg (ed.), Knowledge and Cognition, Lawrence Erlbaum, Potomac, MD, 1974, pp. 105-127.
32. T. G. Dietterich and R. S. Michalski, Learning to Predict Sequences, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Vol. 2, Morgan Kaufmann, Los Altos, CA, 1986, pp. 63-106.
33. C. S. Peirce, Essays in the Philosophy of Science, The Liberal Arts Press, New York, 1957.
34. R. S. Michalski, Theory and Methodology of Inductive Learning, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 83-134.
35. L. A. Rendell, Substantial Constructive Induction: Feature Formation in Search, Proceedings of the Ninth IJCAI, Los Angeles, CA, August 1985, pp. 650-658.
36. P. H. Winston, Learning Structural Descriptions from Examples, in P. H. Winston (ed.), The Psychology of Computer Vision, McGraw-Hill, New York, 1975, Chapter 5.
37. T. M. Mitchell, P. E. Utgoff, and R. Banerji, Learning by Experimentation: Acquiring and Refining Problem-Solving Heuristics, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 163-190.
38. B. G. Buchanan and E. A. Feigenbaum, "Dendral and Meta-Dendral: Their applications dimension," Artif. Intell. 11, 5-24 (1978).
39. J. R. Quinlan, Learning Efficient Classification Procedures and Their Application to Chess End Games, in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983, pp. 463-482.
40. R. S. Michalski and J. B. Larson, Selection of Most Representative Training Examples and Incremental Generation of VL1 Hypotheses: The Underlying Methodology and a Description of Programs ESEL and AQ11, Report 867, Department of Computer Science, University of Illinois, Urbana, 1978.
41. K. R. Popper, Objective Knowledge: An Evolutionary Approach, Clarendon Press, Oxford, 1972.
42. D. R. Hofstadter, Metamagical Themas: Questing for the Essence of Mind and Pattern, Basic Books, New York, 1985, Chapter 24.
43. E. E. Smith and D. L. Medin, Categories and Concepts, Harvard University Press, Cambridge, MA, 1981.
44. L. Wittgenstein, Tractatus Logico-Philosophicus, Routledge & Kegan Paul, London, 1921.
45. E. Rosch and C. B. Mervis, "Family resemblances: Studies in the internal structure of categories," Cog. Psychol. 7(4), 573-605 (1975).
46. L. A. Zadeh, "A fuzzy-algorithmic approach to the definition of complex or imprecise concepts," Int. J. Man-Machine Stud. 8(3), 249-291 (1976).
47. R. S. Michalski and R. L. Chilausky, "Learning by being told and learning from examples: An experimental comparison of the two methods of knowledge acquisition in the context of developing an expert system for soybean disease diagnosis," Pol. Anal. Inform. Sys. 4(2), 125-161 (June 1980).
48. D. Lenat, M. Prakash, and M. Shepherd, "CYC: Using common sense knowledge to overcome brittleness and knowledge acquisition bottlenecks," AI Mag. 6(4), 65-85 (1986).

General References

T. G. Dietterich, B. London, K. Clarkson, and G. Dromey, Learning and Inductive Inference, in P. R. Cohen and E. A. Feigenbaum (eds.), The Handbook of Artificial Intelligence, Vol. 3, W. Kaufmann, Los Altos, CA, 1982, pp. 325-511.
J. McCarthy, Programs with Common Sense, Proceedings of the Symposium on the Mechanization of Thought Processes, Vol. 1, National Physical Laboratory, 1958.
N. Zagoruiko, Empirical Prediction Algorithms, in J. C. Simon (ed.), Computer Oriented Learning Processes, Noordhoff, Leiden, The Netherlands, 1976.

R. S. Michalski
University of Illinois
This work was supported in part by the NSF under grant No. DCR 84-06801, by the ONR under grant No. N00014-82-K-0186, and by DARPA under grant No. N00014-K-85-0878.
CONCEPTUAL DEPENDENCY

Conceptual dependency (CD) is a theory of natural language and of natural-language processing (see Natural-language generation; Natural-language understanding). It was developed by Schank with the motivation of enhancing one's ability to construct computer programs that can understand language well enough to summarize it, translate it into another language, and answer questions about it. At the heart of the theory lies the conjecture that language is a medium whose purpose is communication. Therefore, the central issue dealt with by the theory is the kinds of things that can be communicated, the meaning content of the communication.
What inferences are made? When are these inferences made? Where do they come from? For example, most people would agree that the sentence "John sold his old car" contains a reference to money even though the word "money" is not mentioned in the sentence. Furthermore, most people would agree that as a consequence of John's action, he no longer owns that car. Any computer program that understands this sentence must answer no to the question "Does John own the car?" and yes to the question "Did John receive money?" How could a program know that? To model language understanding on a computer, one needs a strong theory of human inference that operates on the level of conceptual manipulations. Furthermore, in order for a theory of language to have relevance in the field of AI, it must provide a representation of meaning as well as the means to map into and out of that representation (see Representation, knowledge). Conceptual dependency theory is a theory of the representation of meaning. It is a representation of everyday concepts and events in a way that reflects natural thinking and communication about those concepts and events. At the time of its development, the approach taken by Schank was not considered unusual within the AI framework. Since AI is largely an experimental field, the theory and its computer implementations were viewed as an investigation into the dynamics of natural-language understanding. However, in the field of linguistics, thoughts about the nature and purpose of language were oriented in a direction opposite to that reflected by Schank's theory, and the latter was considered radical.

Conceptual Structures

Conceptual dependency theory views understanding of natural language as a process of mapping linear strings of words into well-formed conceptual structures. A conceptual structure is defined as a network of concepts, where certain classes of concepts can be related in specific ways to other classes of concepts (see also Semantic networks).
The basic axiom of the theory is: For any two sentences that are identical in meaning, regardless of language, there should be only one representation. A corollary that derives from it is: Any information in the sentence that is implicit must be made explicit in the representation of the meaning of that sentence. The rules by which classes of objects combine may be viewed as conceptual syntax rules. It is important to note that these rules underlie the language, but they are independent of it. They are rules of thought as opposed to rules of a language. The initial framework consists of the following rules (1): The meaning of a linguistic proposition is called a conceptualization or CD form. A conceptualization can be active or stative. An active conceptualization consists of the following slots: actor; action; object; and direction, source (from), destination (to), (instrument).
A stative conceptualization consists of the following slots: object, state, and value.
Rule 1. Certain PPs Can ACT. For example, the sentence "Kevin walked" may be represented using the primitive act PTRANS (see below) as
Each CD has associated semantic constraints on the kinds of entities that can fill its slots. These semantic constraints reflect different levels of specificity. For example, some rules may be applied to any object that plays the actor role in any action. On the other hand, other rules will be very specific to a particular action and its slot values.

Conceptual Dependency Rules

The CD rules prefer combinations of concepts that go along with experience over those that violate experience. Of course, it is possible for the CD rules to be idiosyncratic, but most people share enough of them to be able to communicate. What is usually referred to as semantics (qv) in linguistics is the set of operations at the conceptual level. When the word "semantics" is used in the context of CD theory, it means the experiential laws that allow for concept combinations. The vocabulary that expresses conceptual rules makes use of the following conceptual categories of types of objects described below.

PPs: Picture Producers or Conceptual Nominals. Only physical objects are PPs. PPs may serve in various roles in the conceptualization. PPs that are animate or have animate properties (like machines) or that are natural forces (wind, gravity) may be actors. Any PP may serve in the role of an object. A PP in the role of source or destination refers to the location of that PP. Animate PPs may also serve as recipients.
Actor:      Kevin
Action:     PTRANS
Object:     Kevin
Direction:  From: unknown; To: unknown

The graphic notation links the actor Kevin to the act PTRANS with a two-headed dependency arrow (Kevin ⇔ PTRANS).
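The slot structure of an active conceptualization can be rendered concretely in modern terms. The following is a minimal sketch (Python chosen purely for illustration; the class and field names are this writer's, not part of CD theory's own notation), encoding the PTRANS example above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Conceptualization:
    """An active CD conceptualization: actor, action, object, direction."""
    actor: str
    action: str                # a primitive act, e.g. PTRANS
    obj: str
    src: Optional[str] = None  # direction: source (from)
    dst: Optional[str] = None  # direction: destination (to)

# "Kevin walked": Kevin PTRANSed himself; both endpoints are unknown.
kevin_walked = Conceptualization(actor="Kevin", action="PTRANS", obj="Kevin")
print(kevin_walked)
```

A stative conceptualization would carry object, state, and value fields instead.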
to treat them as abbreviations for (possibly infinite) sets of rules or to reexpress them in terms of rules that introduce "fictitious" phrase types whose sole purpose is to share common parts of different rules or to express iteration of repeatable constituents. One can obtain an equivalent recursive transition network from a given context-free grammar by collecting all of the rules that share a given "left side" (i.e., all of the rules for forming a given phrase type) and replacing them with a single rule whose right side is a regular expression corresponding to the union of the right sides of the original rules. One can then convert that right-side regular expression to an equivalent transition diagram by a standard mechanical algorithm (22). A result of this author (13) shows that the resulting recursive transition network can be further optimized by the elimination of left and right recursion and the application of standard state minimization techniques (originally developed for finite-state machines), whose effect when applied to a recursive transition network yields a transition network grammar with greatly reduced branching. Figure 2 illustrates this sequence. A standard theorem of formal language theory (23) proves that a language accepted by a context-free grammar can be accepted by a finite-state machine unless every context-free grammar for the language contains at least one self-embedding symbol (i.e., a phrase type that can contain a proper internal embedding of the same type of phrase, such as the middle S
1. S → IF S THEN S
2. S → S AND S
3. S → S OR S
4. S → P

Figure 2. (a) Sample context-free grammar. (b) Equivalent RTN. (c) Optimized RTN.
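The first step of the CFG-to-RTN conversion described above, collecting all rules that share a left side so their right sides can be unioned into one subnetwork, is purely mechanical. A minimal sketch follows, assuming the sample grammar of Figure 2(a) reads S → IF S THEN S | S AND S | S OR S | P (the rule format here is this writer's own, not from the article):

```python
from collections import defaultdict

def group_rules(rules):
    """Collect all context-free rules sharing a left side: the union of
    their right sides becomes the single regular expression (and hence
    the single RTN subnetwork) for that phrase type."""
    grouped = defaultdict(list)
    for lhs, rhs in rules:
        grouped[lhs].append(rhs)
    return dict(grouped)

# The sample grammar in (lhs, rhs) form:
rules = [
    ("S", ["IF", "S", "THEN", "S"]),
    ("S", ["S", "AND", "S"]),
    ("S", ["S", "OR", "S"]),
    ("S", ["P"]),
]
print(len(group_rules(rules)["S"]))  # 4 alternative right sides for S
```

The subsequent conversion of the unioned right sides to a transition diagram, and the recursion-elimination and state-minimization steps, are not shown.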
in the rule: S → if S then S). The RTN optimization results show that a given context-free grammar can be converted to an RTN, which can then be optimized until the only remaining PUSH transitions are for self-embedding constituents. Together, these results suggest that a context-free grammar can be thought of as having a finite-state part and a recursive part. The RTN optimization constructions show how to extract all of the finite-state part into transition network form, to which conventional finite-state optimization techniques can be applied. Note that when the standard state minimization transformations are applied to a recursive transition network, they do not quite produce a deterministic network as they do for finite-state grammars, although they do produce a network in which no two transitions leaving a given state will have the same label. This is not sufficient to guarantee determinism for an RTN because two transitions that push for different types of phrases may nevertheless recognize a common sequence of input symbols (i.e., the grammar may be ambiguous). Even if the grammar is not ambiguous, two different phrase types may begin with some common initial sequence, and the grammar would not be able to tell which of the two phrase types were present before examining the sequence further. However, the results of such transformations can produce grammars with very little nondeterminism that can be parsed quite efficiently. (In an ATN one can exploit techniques such as finite look-ahead conditions and merged subordinate networks to produce grammars whose nondeterminism is reduced still further.) Another result (13) shows that such reduced-branching RTNs can be used by a generalization of Earley's parsing algorithm (20) to minimize the number of state transitions that need to be considered in the course of parsing.
That is, an optimized RTN is used more efficiently by a generalization of Earley's algorithm than an unoptimized RTN or an unaltered context-free grammar. RTNs are equivalent in weak generative power (i.e., can characterize the same sets of strings) to context-free grammars or pushdown store automata. RTNs are slightly stronger than context-free grammars in terms of the tree structures they can assign (strong generative power) since they can characterize structures with unbounded branching at a single level, as in Figure 3.

Augmented Transition Networks

As mentioned above, an ATN consists of an RTN augmented with a set of registers and with arbitrary conditions and actions associated with the arcs of the grammar. ATNs were developed in order to obtain a grammar formalism with the linguistic adequacy of a transformational grammar and the efficiency of the various context-free parsing algorithms. As a sentence is parsed with an ATN grammar, the conditions and actions associated with the transitions can put pieces of the input string into registers, use the contents of registers to
build larger structures, check whether two registers are equal, and so on. It turns out that this model can construct the same kinds of structural descriptions as those of a transformational grammar and can do it in a much more economical way. The merging of common parts of alternative structures, which the network grammar provides, permits a very compact representation of quite large grammars, and this model has served as the basis for several natural-language-understanding systems. ATNs have also been used in systems for understanding continuous speech such as the Bolt, Beranek, and Newman HWIM system (24,25). For speech understanding (qv) the transition network grammar is one of the few linguistically adequate grammars for natural English that are at all amenable to coping with the combinatorial problems. A state in an ATN can be thought of dually as a concise representation of a set of alternative possible sequences of elements leading to it from the left or as a concise prediction of a set of possible sequences of elements to be found on the right. (Alternatively, it can be thought of in a right-to-left mode.) The reification of these states as concrete entities that can be used to represent partial states of knowledge and prediction during parsing is one of the major contributions of ATN grammars to the theory and practice of natural-language understanding. They are especially important in representing states of partial knowledge in the course of speech understanding. The ATN formalism suggests a way of viewing a grammar as a map with various landmarks that one encounters in the course of traversing a sentence. Viewed in this way, ATN grammars serve as a conceptual map of possible sentence structures and a framework on which to hang information about constraints that apply between separate constituents of a phrase and the output structure that the grammar should assign to a phrase.
For speech understanding this perspective is beneficial, for example, in attempting to correlate various prosodic characteristics of sentences such as intonation and rhythm with "geographical landmarks" within the structure of a sentence. Another advantage of the transition network formalism is the ease with which one can follow the arcs backward and forward in order to predict the types of constituents or words that could occur to the right or left of a given word or phrase. One of the important roles of a syntactic component in speech understanding is to predict those places where small function words such as "a," "an," and "of" should occur since such words are almost always unstressed and difficult to distinguish from accidentally similar acoustic patterns in spoken sentences. In the HWIM speech system such words are almost always found as a result of syntactic prediction and are not even looked for during lexical analysis, where more spurious matches would be found than correct ones. Other types of grammars, such as context-free grammars, can be augmented by conditions and actions associated with the grammar rules. However, such grammars lose some of the benefits of the recursive transition networks, such as merging common parts of different rules and applying optimizing transformations.

Specifying an ATN
DET  N  PP  PP  PP  . . .
Figure 3. Illustration of unbounded branching.
It is important to maintain a distinction between the underlying abstract state transition automaton that constitutes the essence of an ATN and the various surface notations that can
GRAMMAR, AUGMENTED TRANSITION NETWORK
be used to specify an ATN grammar. A variety of notations have been developed for specifying ATN grammars. This author's original ATN parser was written in LISP and used a notation in which the conditions and actions on the arcs were specified in LISP, but this is not essential. Later ATN implementations have simplified and streamlined the notations for expressing conditions and actions, and a number of other grammar formalisms can be thought of as specialized specification languages whose underlying parsing automaton is an ATN (e.g., Ref. 26). With the advent of widely available graphics interfaces, one can even visualize using the graphic presentation of an ATN transition diagram, coupled with an interactive specification of the conditions and actions on the arcs, as a specification medium. Figure 4 gives a BNF specification for one notation that can be used to specify an ATN grammar. It is similar to most ATN formalisms, except that conditions on arcs are expressed in terms of an action (VERIFY (condition)), an infix assignment operator (←) is used in place of the more customary SETR function, and functions (NE and PC) are used to refer to the next input element and the parsed constituent of a push arc, respectively (in place of the asterisk, which served both purposes in Ref. 9). In this notation an ATN specification consists of a list of state specifications each of which consists of a state name and a set of arc specifications. Arcs can be one of the five indicated types. A CAT arc accepts a word that is recorded in a dictionary as belonging to the specified syntactic (or semantic) category; a WRD arc accepts the specific word named on the arc; a PUSH arc invokes a subordinate level of the ATN to recognize a phrase beginning with the specified state; a POP arc signals the completion of a phrase and specifies an expression for the value that is to be returned as the structure for that phrase.
A JUMP arc specifies a transfer of control from one state to another without consuming any input.
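The five arc types can be captured in a small set of record structures. The sketch below is illustrative only (the class names and the toy state table are this writer's, not a standard ATN implementation), but the fields follow the description in the text:

```python
from dataclasses import dataclass

# One record type per arc kind.

@dataclass
class Cat:            # accept a word of a dictionary category
    category: str
    to_state: str

@dataclass
class Wrd:            # accept one specific word
    word: str
    to_state: str

@dataclass
class Push:           # invoke a subordinate network for a phrase
    sub_state: str
    to_state: str

@dataclass
class Pop:            # phrase complete; return a structure
    expression: str   # schematic expression computing the value

@dataclass
class Jump:           # change state without consuming input
    to_state: str

# A toy state table: from S/, accept an auxiliary, or push for an NP,
# or jump onward.
arcs = {"S/": [Cat("aux", "S/AUX"), Push("NP/", "S/NP"), Jump("S/NP")]}
print(arcs["S/"][0].category)  # aux
```

Augmentations (conditions and register actions) would be additional fields on each record; they are omitted here and discussed next.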
<state>        → (<state-name> <arc> <arc>*)
<arc>          → (CAT <category-name> <augmentation>* (TO <state-name>)) |
                 (WRD <English-word> <augmentation>* (TO <state-name>)) |
                 (PUSH <state-name> <augmentation>* (TO <state-name>)) |
                 (POP <expression> <augmentation>*) |
                 (JUMP <state-name> <augmentation>*)
<augmentation> → (VERIFY <condition>) | <action>
<action>       → <register-name> ← <expression> |
                 (SENDR <register-name> <expression>) |
                 (<defined-operator> <expression>*)
<expression>   → (NE) | (PC) | (GETR <register-name>) |
                 (BUILDQ <structure-schema> <expression>*) |
                 (<defined-operator> <expression>*)

Figure 4. BNF specification of ATN grammar notation: NE = next element, PC = parsed constituent, GETR = get contents of a register.
Augmentations on an arc indicate further conditions under which the arc may be taken and actions to be performed when the arc is taken. A (VERIFY (condition)) operation will block the transition if the condition is not satisfied. An assignment operation (←) will set a register to the value of the specified expression (this operation is known as SETR in most ATN specification languages). A SENDR action specifies an initial value to be used for a register in a subordinate invocation about to be initiated by a PUSH arc (SENDR only makes sense on a PUSH arc and is executed before the subordinate computation is begun). In addition, one can define other operators that can abbreviate complex manipulations of register contents and complex conditions under which to abort computation paths. In experimental parsing implementations, one can even send information to the parsing algorithm and/or manipulate its agendas and tables. The expressions used in register assignments and as arguments to other actions can access the next element of the input string via the function NE, access the parsed constituent on a push arc via the function PC, access the contents of registers using GETR, and build structures by substituting the values of other expressions into open positions in a specified schematic structure (e.g., using BUILDQ, a primitive form of the LISP "back quote" operation). One can also invoke defined structure-building operators that encapsulate complex register manipulations and/or access to other information outside the ATN (such as potential antecedent tables for interpreting pronouns). The parsed constituent function (PC) refers to the constituent returned by a subordinate network invocation (on a PUSH arc).

Linguistic Experimentation

ATNs have been used to explore a variety of issues in linguistic theory relating to extending the abilities of grammars to specify difficult linguistic phenomena and to parse them efficiently.
A number of experimental explorations are described in Ref. 14, including:

1. VIR (virtual) arcs and HOLD actions for dealing with "left extraposition" transformations such as those that move the relativized constituent from its logical place in the structure of a relative clause to the position of the relative pronoun at the beginning of the clause (e.g., "the man that I saw," "the man that Mary said ran away"). A HOLD action can make an entry on the stack when the extraposed constituent is found, which then enables a matching VIR arc to use the extraposed constituent from the stack at the position where the grammar would normally expect it. This stack entry will also block the acceptance of the phrase until some VIR arc has used the held constituent.

2. RESUMETAG and RESUME actions for dealing with "right extraposition" transformations that leave dangling modifiers that logically belong with constituents that have been fronted or otherwise moved to the left. For example, in "What papers has Dan Bobrow written that are about natural language?" the relative clause "that are about natural language" clearly modifies the questioned noun phrase "what papers" but is not adjacent to it. A RESUMETAG action can be executed before popping a constituent that the grammar writer knows could have been moved to the left, away from a detached right-extraposed modifier. This
enables such a constituent to be reentered by a RESUME action at any point where dangling modifiers might occur, enabling the resumed constituents to consume any modifiers that they can accept at those points.

3. Selective modifier placement for dealing with the ambiguous scoping of movable modifiers such as prepositional phrases (e.g., "I saw the man in the park with a telescope"). A special pop arc (SPOP) causes manipulation of the parser's agendas and stacks to determine all of the places where a given movable modifier might be attached. These are then evaluated to determine which is the most likely candidate given a set of semantic preference criteria. The most preferred alternative is then pursued, and any others are saved on the agenda to be pursued at a later time if necessary.

4. A metagrammatical conjunction facility for handling a wide variety of conjunction constructions, including reduced conjunctions that result in apparently conjoined sentence fragments. For example, "Give me the best methods to grow and properties of alkali iodates" involves an apparent conjunction of the fragments "best methods to grow" and "properties of." A special SYSCONJ action, invoked on special active arcs associated with the conjunctions AND and OR, triggers a complex manipulation of the agendas and parsing configurations of the ATN so that the parsing of the sentence up to the occurrence of the conjunction is temporarily suspended, and some earlier configuration is restarted to parse the string beginning after the conjunction. When the restarted configuration has completed the constituent it was working on, the suspended configuration is resumed in a special mode to complete its corresponding constituent on some tail of the constituent just completed. After this, the two coordinate constituents are conjoined and the two separate configurations merged to continue the parsing.
(This produces an analysis of the above example equivalent to "Give me the best methods to grow alkali iodates and the properties of alkali iodates" by conjoining two noun phrase constituents.) A schematic characterization of the phenomenon in question is that a string of the form "x y u and v z t" can be analyzed as equivalent to "x s t," where s is a constituent whose structure is a conjunction of the form "[y u z] and [y v z]."
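The reduced-conjunction schema can be made concrete as a toy string rewrite. The sketch below assumes the input decomposes into a prefix x, a shared head y, conjunct remainders u and v, a shared tail z, and a suffix t; the decomposition here is chosen by hand, whereas SYSCONJ itself discovers it by restarting parser configurations:

```python
def expand_conjunction(x, y, u, v, z, t):
    """Given a string decomposed as 'x y u and v z t', return the
    expanded reading 'x [y u z] and [y v z] t' in which both conjuncts
    share the head y and the tail z."""
    conjunct = lambda mid: " ".join(w for w in (y, mid, z) if w)
    s = f"{conjunct(u)} and {conjunct(v)}"
    return " ".join(w for w in (x, s, t) if w)

print(expand_conjunction(
    x="Give me the", y="", u="best methods to grow",
    v="properties of", z="alkali iodates", t=""))
```

With these (hypothetical) segment choices the result reads "Give me the best methods to grow alkali iodates and properties of alkali iodates," matching the conjoined analysis described in the text.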
Formal Properties of ATN Grammars

In the face of various implementations of ATN parsers and different formulations of the specification language for ATN grammars, it is important to remember that the essence of an ATN is an abstract formal automaton in a class with finite-state machines, pushdown store automata, and Turing machines (qv). Such automata are typically defined by specifying the structure of an instantaneous configuration of a computation and specifying a transition function that expresses the relationship between any instantaneous configuration and those that can result from it in one "step" of the computation. A nondeterministic automaton is one in which the transition function determines a set rather than a single next configuration. From this perspective an ATN can be defined as an automaton whose instantaneous configurations record the position in the input string, the name of the state that is currently active, a set of register contents, and a stack context (a list of stack entries each of which records the push arc whose actions are to be done when the parse returns to that level and a set of register contents to be used by those actions).

As pointed out above, an RTN is equivalent in generative power to a context-free grammar or pushdown store automaton. Adding augmentations to make an ATN produces an automaton that is equivalent in power to an arbitrary Turing machine if no restriction is imposed on the conditions and actions on the arcs. This is useful in the sense that one can be confident that any linguistic phenomenon that might be discovered can be characterized within the formalism but has the disadvantage that one cannot guarantee that the sentences acceptable by such a grammar would be a decidable set. However, there are simple restrictions on an ATN (14) that guarantee a decidable grammar model. If one blocks infinite looping and restricts the conditions and actions on the arcs to be totally recursive (i.e., decidable), then the resulting automaton will be totally recursive. The loop-blocking restrictions merely amount to forbidding closed loops of nonconsuming arcs (such as JUMP arcs) and forbidding arbitrary "looping" of self-embedding singleton recursion (pushing for a single constituent, which in turn pushes for a single constituent, and so on, potentially without limit). These two mechanisms are the only ones that would let an ATN parser compute for an arbitrary amount of time without consuming anything. Perrault (27) gives a restricted class of ATNs, equivalent to finite-state tree transducers, that are known to lie within the power of a context-sensitive grammar (a decidable class). Finally, although the proof has not been published, this author has shown that restricting the conditions and actions of an ATN to be primitive recursive, coupled with the loop-blocking restrictions described above, results in a parsing automaton that is itself primitive recursive (a powerful subclass of totally recursive functions). The interesting thing about this result is that almost any "sensible" ATN grammar that anyone would write automatically satisfies these restrictions, so it is reasonable to think of both ATN grammars and natural English syntax as lying in the realm of primitive recursive computation.

The ATN Perspective

One can think of ATNs as an efficient, abstract parsing automaton that can serve as a unifying underlying model for a variety of different high-level syntactic specification languages. For example, Swartout (28) has shown that Marcus's PARSIFAL (29) can be viewed as a specialized ATN, and one can think of lexical functional grammars (4) as a high-level specification language that could be parsed by an underlying ATN whose basic arc action is a kind of "unification" of sets of equations.

Moreover, the operational semantics of definite clause grammars (qv) (30) executed in PROLOG is almost identical to a standard top-down, left-to-right parser for a special class of ATN whose states correspond to the "joints" between the subgoals in a rule and whose registers are the variable bindings of the environment.

Viewed as ATNs, definite clause grammars use a powerful unification operator as a universal condition-action, whose effect is to establish bindings of registers (variables) to structures. (These structures may in turn contain variables that point to other structures.) Alternatively, one could use only one register to contain the PROLOG environment as a list of
bindings. The action associated with a final state is to return the variable bindings that were established in the embedded constituent to the higher level environment that pushed for it (invoked it as a subgoal). This requires PROLOG's ability to effectively rename variables when pushing for a constituent (invoking a subgoal) in order to keep the bindings straight, and uses an open-ended set of register names, but otherwise the mechanism is very like a direct implementation of an ATN automaton. From this point of view, a definite clause grammar can be seen as more like an augmented phrase structure grammar than a full ATN since it does not exploit the ability of its states (the "joints" between the subgoals) to support arbitrary repeatability and alternative subsequences of transitions (subgoals). Rather, such phenomena would be handled by creating new kinds of phrases. From the ATN perspective one can see a deep similarity between definite clause grammars and lexical functional grammars in the way that the equations of LFGs are used to add constraints to an environment similar to the variable bindings of DCGs. One major difference seems to be the way LFGs use access paths through the functional structure in place of some of the things DCGs would do with variables. LFGs thus appear to avoid the need to rename variables. Otherwise, both have a similar emphasis toward specifying syntactic facts in the form of constraints on attributes of phrases that are then realized by some form of unification. The above discussion is one example of the way that one can use the perspective of an abstract ATN automaton to understand a variety of different parsing formalisms and syntactic specification notations. Without such a perspective it would be difficult to see a similarity between two formalisms whose surface presentation is as dramatically different as DCGs and LFGs.
Coupled with an understanding of the formal properties of various restrictions on the conditions, actions, and transition structure of an ATN, this perspective can also shed light on the expressive power of other formalisms.
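The abstract-automaton view of an ATN, an instantaneous configuration plus a nondeterministic successor function, can be sketched as a data structure and one procedure. This is a schematic rendering, not any published parser; the arc repertoire is reduced to JUMP for brevity, standing in for the full CAT/WRD/PUSH/POP set:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class StackEntry:
    return_state: str  # a full ic would also save the pending push-arc
                       # actions and the register contents to restore

@dataclass(frozen=True)
class IC:
    """Instantaneous configuration: input position, active state name,
    register contents, and a stack context."""
    position: int
    state: str
    registers: Tuple[Tuple[str, str], ...]
    stack: Tuple[StackEntry, ...]

def successors(ic: IC, arcs) -> List[IC]:
    """Nondeterministic transition function: the set of ic's reachable
    from `ic` in one step."""
    result = []
    for kind, target in arcs.get(ic.state, []):
        if kind == "JUMP":  # transfer state, consume no input
            result.append(IC(ic.position, target, ic.registers, ic.stack))
    return result

arcs = {"S/": [("JUMP", "S/NP")]}
start = IC(0, "S/", (), ())
print([s.state for s in successors(start, arcs)])  # ['S/NP']
```

Because `successors` returns a set of configurations rather than one, any control strategy (depth first, breadth first, best first) can be layered on top of it, which is exactly the flexibility the parsers described next exploit.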
ATN Parsers

A variety of different parsing algorithms have been implemented for ATN grammars. The most straightforward is a simple top-down, depth-first, backtracking implementation of the ATN as a parsing automaton. A slightly more powerful implementation is described in Ref. 19. The main implementation technique is to create a data structure corresponding to an instantaneous configuration (ic) of an abstract ATN automaton and to implement the abstract transition function of the automaton as a procedure that computes the successor ic's of a given ic. The ic's of the LUNAR parser are extended from the formal definition above to include a "weight" expressing a degree of goodness of the parse so far (allowing grammars to specify degrees of grammaticality via actions on the arcs that adjust the weight), a hold list (for the HOLD-VIR mechanism described above), and a historical path (used for the experimental SYSCONJ features described above). By the setting of various flags, this parser is able to pursue parses according to a variety of control strategies including depth first, breadth first, best first, and a variety of combinations of depth first with priority ordering. There are also some special cases such as pursuing small identified sets of alternatives in parallel (SPLITS). This parser contains the experimental linguistic capabilities described above and a fairly powerful trace facility capable of producing a detailed record of the individual steps of an ATN analysis of a sentence.

The generalization of Earley's algorithm for RTNs, discussed above, can be extended in a natural way to a general ATN parser (though not maintaining Earley's n³ time bound results if nontrivial use is made of the registers). In general, most of the parsing algorithms for context-free grammars have analogous versions for RTNs and can be extended to handle ATNs. Other implementations of ATN parsers include three middle-out parsers for ATNs used in the context of speech-understanding systems: one by Bates (31), one by Paxton (26), and one by this author (32). These are bottom-up, data-directed parsers that can begin in the middle of a sentence and work upward and outward in either direction. The Bates parser is capable of working on several different parts of the utterance as part of a single hypothesis. The Paxton parser provided an especially clean restricted form of ATN grammar specification (although he did not characterize it as one). The Woods parser constructs an index that records for any pair of states whether they can be connected by chains of jump, push, and pop transitions, used to quickly determine whether a new word can be connected to an existing island and to guide the computation that establishes such a connection. One can also implement ATN grammars in languages such as PROLOG in a style similar to Pereira and Warren (30), where the unification and backtracking capabilities inherent in the language can be exploited to reduce (or even eliminate) the effort of writing a parsing algorithm. Finally, one can compile ATN grammars into object code that efficiently implements a combination of the parser and the grammar (33), a technique that has produced parsing programs that are roughly 10 times faster than a general ATN parsing algorithm interpreting a grammar.

Misconceptions about ATNs
ATNs are frequently seen in different ways by different people. A common misconception is the belief that ATNs are strictly top-down, Ieft-to-right parsing algorithms. Another is that an ATN is specifiedin LISP or contains LISP codeor can only be written in LISP. As the preceding discussion makes clear, many of these beliefs are incorrect. ATNs can be defined as abstract autom ata, independent of any proglamming language, and can be implemented in a variety of progTamming languages.Similarly, many different parsittg algorithms have been implemented for ATN grammars, including bottom-up and even middle-out parsing algorithms. Another common misconception is that ATNs cannot handle unordered constituents (i.e., sequencesof constituents whoserelative order is unspecified)without enumerating all of the possibleorderings. Such phenomena,in fact, are routinely handled by use of self-looping arcs' as shown in Figure 5. In FigUre 5 three arcs accept locative, time, and manner adverbial phrases in arbitrary order at the end of a verb phrase. Conditions on the arcs restrict the parse to not more than one of each kind. (This could be relaxed to permit more than one manner adverbial, for example, by removing the VERIFY condition on that arc.) A11three of these adverbials are optional. If one or more such constituents were to be oblig-
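The self-looping-arc treatment of unordered constituents is easy to simulate. The following Python sketch is this author's illustration, not LUNAR's code: the toy lexicon, the function names, and the flattening of PUSH arcs into single-word category tests are all simplifying assumptions. It shows one state with three register-guarded looping arcs and a POP arc:

```python
# A minimal ATN-style acceptor (illustrative sketch, not the LUNAR parser).
# Self-looping arcs accept LOC, TIME, and MANNER adverbials in any order;
# a register records which kinds have been seen, and a test in the style of
# (VERIFY (NOT (GETR x))) blocks a second adverbial of the same kind.

LEXICON = {  # hypothetical toy lexicon: word -> adverbial category
    "here": "LOC", "yesterday": "TIME", "quickly": "MANNER",
}

def accepts(words):
    """Depth-first backtracking search over (position, registers) configurations."""
    def step(pos, registers):
        if pos == len(words):            # POP arc: succeed at end of input
            return True
        cat = LEXICON.get(words[pos])
        # One self-looping arc per category, guarded by a register test.
        if cat is not None and not registers.get(cat):
            new_regs = dict(registers)
            new_regs[cat] = True         # SETR: remember this kind was seen
            if step(pos + 1, new_regs):  # follow the arc; backtrack on failure
                return True
        return False
    return step(0, {})

print(accepts(["quickly", "here", "yesterday"]))  # any order -> True
print(accepts(["here", "here"]))                  # repeated LOC -> False
```

Removing the register test on one arc would permit that adverbial kind to repeat, exactly as the entry describes for the manner arc.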
GRAMMAR, AUGMENTED TRANSITION NETWORK

[Figure 5. A verb-phrase-final state with self-looping arcs PUSH LOCATIVE (VERIFY (NOT (GETR LOC))), PUSH TIME (VERIFY (NOT (GETR TIME))), and PUSH MANNER (VERIFY (NOT (GETR MANNER))), each setting its register, plus a POP arc.]
noun-phrase(N, Od, Oe) → article(N, Oc, Od, Oe), noun(N, Oc).
Note that this rule is a simplified version of the fourth rule of the grammar G presented. The nonterminal for a noun phrase has three arguments. The interpretation of the last argument Ob will depend on a property Oa of an individual N because in general a noun phrase contains an article such as "a." The word "a" has the interpretation Oe, and(Oc, Od), in the context of two properties Oc and Od of an individual N. The property Oc will correspond to the rest of the noun phrase containing the word "a," and the property Od will come from the rest of the sentence. Therefore, Oe will contain an overall interpretation, and it is linked to Ob by the same variable. As Of is the property of the common noun, it is linked to Oc by the same variable. Oa has the description of the properties of N, and it will depend on the properties coming from the rest of the sentence. Therefore, Oa is linked to Od by the same variable.

Syntax Plus Semantics

sentence(S) → noun-phrase(NP, S2, O),
    verb([subject-X|L], O1),
    complements(L, O1, O2).
complements([ ], O, O) → [ ].
complements([K-N|L], O1, O3) → complements(L, O1, O2),
    case(K),
    noun-phrase(N, O2, O3).
noun-phrase(N, O2, O4) → article(N, O1, O2, O3),
noun-phrase(PN, O, O) → [PN], {proper-noun(PN)}.
article(A, O1, O2, and(O1, O2)) → [a].
case(for) → [for].
case(direct) → [ ].

This grammar handles sentences such as

Hodges writes for Penguin.

Each word is associated with a property. For example, the meaning of the verb "writes" is introduced by the relation "is-published-by(A, P)." The verb rule also contains information regarding the arguments of the relation, namely that "A" plays the role of subject in the sentence and that "P" imposes the use of the preposition "for." The meaning of the indefinite article "a" is introduced by the conjunction "and(O1, O2)" according to the definition often adopted in classical logic. A more advanced grammar than G would have more elaborated definitions for nouns, verbs, adjectives, and articles (3), such as:

noun([A-[ ] & author & type-X], pr(author(X))) → no(author, A).
no(Type, GN) → [Noun], {no1(Noun, Type, GN)}.
no1(author, author, mas-sin).
verb([(G-N)-V & type-X, dir-A-W & title-Y], pr(author(X, Y))) → ve(writes, N).
ve(Type, N) → [Verb], {ve1(Verb, Type, N)}.
ve1(writes, writes, sin).
adjective([A-[ ] & author & type-X, prep(by)-[ ] & pub & type-Y], pr(published(Y, X))) → ad(pub, A).
ad(Type, GN) → [Adj], {ad1(Adj, Type, GN)}.
ad1(published, pub, mas-sin).
article((G-sin)-D-X, O1, O2, for([X, D], and(O1, O2), cardinality(X, greater, 0))) → art-ind(G-sin).
art-ind(mas-sin) → [a]; [some].

(Note: Anonymous variables are written in PROLOG as "_".)

These definitions include syntactic and semantic checks, such as checks of gender, number, and semantic type. The meaning of the article is also different. Instead of a two-branched quantifier, it is introduced by a three-branched quantifier: the first branch for the variable X to be quantified, the second for the general property "and" of X's, and the third for a property (cardinality) to specify and constrain the domain of X's.

Morphology

proper-noun(hodges).
proper-noun(penguin).

For example, the rule

noun-phrase(PN, O, O) → [PN], {proper-noun(PN)}.

represents the clause

noun-phrase(PN, O, O, [PN|S], S) :- proper-noun(PN).

Extensions of DCGs

Extraposition grammars (XGs) extend the power of DCGs to specify context dependencies (7). XG rules may have, on their left side, more than one nonterminal symbol, and a "gap" symbol expresses a nonspecified and arbitrary string of symbols (terminals and nonterminals). For example, the XG rule

relative-marker ... complement → [that].

states that the relative pronoun "that" can be analyzed as a relative marker followed by some unknown phrases and then a complement. XGs simplify the expression of syntactic concepts and therefore allow easier treatments of semantic and logic descriptions. Arguments to nonterminals are used (as in DCGs) for agreement checks, for producing a parse tree, and to restrict the attachment possibilities of postmodifiers.

Modifier structure grammars (MSGs) make it possible to specify nonsyntactic representations in a clearer way. MSGs simplify the automatic construction of such representations while the analysis is processed (8). Tree grammars (TGs) allow a better handling of coordination of linguistic constructions. Puzzle grammars (PGs) are tools specially oriented toward linguists, where strategy rules describe assembly order and mode and are specified independently (12).

Conclusion

Logic grammars have evolved over the years into higher level tools, which allow users to concentrate on linguistic phenomena. Definite-clause grammars support the use of logic for natural-language processing, and they have paved the way for practical linguistic work based on the programming language PROLOG.
BIBLIOGRAPHY

1. A. Colmerauer, Les Grammaires de Metamorphose, G.I.A., University of Aix-Marseilles, Marseilles, France, 1975.
2. F. Pereira and D. H. D. Warren, "Definite clause grammars for language analysis, a survey of the formalism and a comparison with augmented transition networks," Artif. Intell. 13(3), 231-278 (1980).
3. H. Coelho, A Program Conversing in Portuguese Providing a Library Service, Ph.D. Thesis, University of Edinburgh, Edinburgh, U.K., and LNEC, Lisbon, Portugal, 1979.
4. V. Dahl, Un Système Déductif d'Interrogation de Banques de Données en Espagnol, Thèse de Docteur de Troisième Cycle, University of Aix-Marseille, Marseille, France, 1977.
5. V. Dahl, "Translating Spanish into logic through logic," Am. J. Computat. Ling. 7(3), 149-164 (1981).
6. D. H. D. Warren and F. Pereira, An Efficient Easily Adaptable System for Interpreting Natural Language Queries, Research Paper 155, Department of Artificial Intelligence, University of Edinburgh, Edinburgh, U.K., 1981.
7. F. Pereira, "Extraposition grammars," Am. J. Computat. Ling. 7(4), 243-256 (1981).
8. V. Dahl and M. McCord, Treating Coordination in Logic Grammars, Internal Report, Simon Fraser University, Burnaby, British Columbia, 1983.
9. V. Dahl and P. Saint-Dizier, Natural Language Understanding and Logic Programming, Elsevier Science, Amsterdam, The Netherlands, 1985.
10. J. F. Pique and P. Sabatier, An Informative Adaptable and Efficient Natural Language Consultable Database System, Proceedings of ECAI, pp. 250-254, 1982.
11. P. Sabatier, Dialogues en Français avec un Ordinateur, G.I.A., University of Aix-Marseille, 1980.
12. P. Sabatier, Les Grammaires Logiques, Actes du Colloque Traitement Automatique du Langage Naturel, University of Nantes, Nantes, France, 1984.
13. A. Colmerauer, An Interesting Natural Language Subset, G.I.A., University of Aix-Marseilles, Marseilles, France, 1977.
14. G. Frege, Begriffsschrift, a Formula Language Modelled upon that of Arithmetic for Pure Thought, in J. Van Heijenoort (ed.), From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, Harvard University Press, Cambridge, MA, pp. 1-82, 1967.
15. J. F. Pique, Interrogation en Français d'une Base de Données Relationnelle, G.I.A., University of Aix-Marseilles, Marseilles, France, 1978.

General References

V. Dahl, Un Système de Banques de Données en Logique du Premier Ordre, en Vue de sa Consultation en Langue Naturelle, G.I.A., University of Aix-Marseille, Marseille, France, 1976.
R. Kowalski, Logic for Problem Solving, Elsevier North-Holland, New York, 1979.
M. McCord, "Using slots and modifiers in logic grammars for natural language," Artif. Intell. 18(3), 327-367 (1982).
E. Oliveira, L. M. Pereira, and P. Sabatier, "An expert system for environmental resource evaluation through natural language," Proceedings of the First International Logic Programming Conference, Marseille, France, 1982.
J. Van Heijenoort (ed.), From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, Harvard University Press, Cambridge, MA, 1967.

H. Coelho
Laboratório Nacional de Engenharia Civil
GRAMMAR, GENERALIZED PHRASE STRUCTURE

Generalized phrase structure grammar (GPSG) is a framework for defining the syntax of natural languages (1). It was developed within theoretical linguistics in the early 1980s and has been widely applied within computational linguistics (qv) (2). Mathematically, GPSG, as formulated in Ref. 1, is simply a variant of context-free phrase structure grammar (CF-PSG). Historically, it falls within the family of theories that have developed out of Montague grammar. CF-PSGs attracted renewed interest within linguistics, following two decades of neglect, when it was realized that all of the original arguments that apparently demonstrated their descriptive inadequacy for natural languages were either invalid or dependent on false premises (3). Despite their supposed linguistic inadequacy, CF-PSGs had remained of interest within the computational linguistics community since such grammars are well understood mathematically and known to be computationally tractable. GPSG made this engineering interest theoretically respectable again (4).

In GPSG the implicit CF-PSG itself is not defined ostensively but rather is characterized indirectly by various techniques that have the effect of both allowing the grammar to capture linguistically significant generalizations and making the grammar several orders of magnitude more compact than a simple listing of rules would be.

Theoretical Outline

GPSG defines syntactic categories as sets of syntactic feature specifications. A feature specification is an ordered pair consisting of a feature (e.g., CASE) and a feature value. The latter may either be atomic (e.g., ACCUSATIVE) or it may be a syntactic category (i.e., features are allowed to take categories as their values). A syntactic category is then a partial function from features to their values. The internal makeup of categories is further constrained by feature co-occurrence restrictions (FCRs), which are simply Boolean conditions on combinations of feature specifications. Syntactic structures are phrase structure trees of the familiar kind whose nodes are labeled with syntactic categories as characterized above.
The well-formedness of the local substructures of a tree (and hence, recursively, of the tree as a whole) is determined by immediate dominance (ID) rules, linear precedence (LP) rules, principles of feature instantiation, and feature specification defaults (FSDs). ID rules are like ordinary CF-PSG rules except that they say nothing about the linear order of the items they introduce:

S → NP, VP

As such, they simply permit a particular mother category to dominate the given daughter categories. Some ID rules are just listed, but others are derived from these and from each other by metarules. A metarule is a clause in the definition of the grammar that enables one to define one set of rules in terms of another set, antecedently given:

VP → X, NP ⇒ VP[PAS] → X

Generalizations that would be lost if the two sets of rules (e.g., active VP rules and passive VP rules) were merely listed are captured by the metarule. LP rules state the relevant generalizations about the order of (classes of) sister constituents in the language.
GRAMMAR, PHRASE-STRUCTURE
Local constraint rules are rules of the form A → ω/C, where A is a nonterminal, ω ∈ V* (V is the alphabet and Σ is the set of terminal symbols), and C is a Boolean combination of proper analysis and domination predicates. Let G be a finite set of local constraint rules and τ(G) the set of trees analyzable by G. It is assumed that the trees in τ(G) are sentential trees; that is, the root node of a tree in τ(G) is labeled by the start symbol, S, and the terminal nodes are labeled by terminal symbols. It can be shown that the string language L(τ(G)) = {w | w is the terminal string of t and t ∈ τ(G)} is context free (7).
Example. Let V = {S, T, a, b, c, e} and let G be the finite set of local constraint rules:

1. S → e
2. S → aT
3. T → aS
4. S → bTc/(a-) ∧ DOM(T-)
5. T → bSc/(a-) ∧ DOM(S-)

In rules 1, 2, and 3 the context is null, and these rules are context free. In rule 4 (and in rule 5) the constraint requires an a on the left and requires that the node be dominated (immediately) by a T (by an S in rule 5). The language generated by G can be derived by G1:

S → e
S → aT
T → aS
S1 → bTc
S → aT1
T → aS1
T1 → bSc

In G1 there are additional nonterminals S1 and T1 that enable the context checking of the local constraints grammar, G, in the generation process. It is easy to see that under the homomorphism that removes the subscripts on the nonterminals T1 and S1, each tree generable in G1 is analyzable in G. Also, each tree analyzable in G has a homomorphic preimage in G1.

Consider once again the context-sensitive rule (10),

V → wanted/_ VP    (10)

When (10) is interpreted as a "local constraint" as described above, the lexical item "wanted" will appear under a V node only if there is a VP node to its right (in the tree in which V appears). The predicate "VP to the right of V" is defined over the tree in which the V and VP nodes appear and not on a string in which V and VP appear. Another way of saying the same thing is to say that to the right of V there is a string that has an "analysis" VP in the tree. Context-sensitive rules in a PSG for describing linguistic grammars are used in this "analyzability" sense and not as string-rewriting rules.

Terminal Symbols in a PSG. So far the terminal symbols in a PSG have been presented as unanalyzed elements. This is done for simplicity. It is necessary to regard the terminal elements as complexes of phonological, syntactic, and semantic features (4,8). [In principle, it is possible to eliminate all these feature complexes by introducing new nonterminals. However, the number of these new nonterminals will be extremely large (essentially corresponding to all possible combinations of features). Also, there will be enormous redundancy in the grammar.] Each such "complex symbol" for a terminal symbol is a set of features. For example, in (4) the terminal symbols are replaced by complex symbols to give (4').

[Form (4'): the tree of sentence (4) with each terminal node replaced by a complex symbol, i.e., a bundle of feature specifications such as -DET and +Animate.]

The possibility of associating complex symbols with intermediate nodes is not discussed in this entry. The form (4') is a "structural description" (SD) of the sentence (4):

John wanted to publish the paper.
PSGs in a Transformational Grammar (TG). TGs are also not discussed in this entry. However, it is important to note that PSGs (and phrase structure trees) play a crucial role in a TG. The basic idea of a TG is that certain structural descriptions (SDs) are described in a component of a TG, called the "base component," and the other SDs are then obtained from these base-derived SDs by certain tree-transforming rules, called transformations. The base component is a phrase structure grammar and thus defines a set of base phrase structure trees. The trees obtained by using transformation rules are also phrase structure trees. This view of TG is a more classical view and also an oversimplified view, but it is adequate for this description. Thus, for example, the phrase structure tree for (11) below, which is shown in (12), can be base generated. The phrase structure tree for (13) is then obtained by applying a transformational rule to (12), resulting in the phrase structure tree (14).
(11) John saw Mary

[Form (12): the base-generated phrase structure tree for (11).]

(13) Mary was seen by John.
[Form (14): the phrase structure tree for (13), with NP "Mary," AUX "past be en," V "see," and the by-phrase "by John."]
Revival of Phrase-Structure Grammars and Phrase-Structure Trees

Although a PSG is used in a TG, it plays a subsidiary role. Beginning around 1975, it was becoming clear that, when viewed in a certain way, PSGs had more descriptive power than one would have thought, without necessarily going beyond the CFGs. The results on local constraints are a clear example of this point of view. In the late 1970s a number of grammatical formalisms were proposed that were nontransformational in character. Some of these are amendments to PSGs without necessarily going beyond CFGs [e.g., generalized phrase structure grammars, GPSG (9,10)], others are PSGs accompanied by another level of representation to be used for filtering some structures generated by PSGs [e.g., lexical functional grammar, LFG (11)], and some others are based on tree-building systems for generating phrase structure trees without the use of rewriting rules [e.g., tree-adjoining grammars, TAG (12,13)]. Only GPSG and TAG are described here because they are directly related to phrase-structure grammars.

Generalized Phrase-Structure Grammar (GPSG). Besides the analyzability (or node admissibility) notion described above, Gazdar (10) introduced two other notions in his framework, generalized phrase structure grammar (GPSG): categories with holes, with an associated set of derived rules and linking rules, and metarules for deriving rules from one another. The categories with holes and the associated rules do not increase the weak generative power beyond that of context-free grammars. The metarules, unless constrained in some fashion, will increase the generative power because, for example, a metarule can generate an infinite set of context-free rules that can generate a strictly context-sensitive language. (The language {a^n b^n c^n | n ≥ 1} can be generated in this way.) The metarules in the actual grammars written in the GPSG framework so far are constrained enough so that they do not increase the generative power.

Gazdar introduced categories with holes and some associated rules in order to allow for the base generation of "unbounded" dependencies. Let VN be the set of "basic" nonterminal symbols. Then a set D(VN) of derived nonterminal symbols can be defined as follows:

D(VN) = {α/β | α, β ∈ VN}

For example, if S and NP are the only two nonterminal symbols, then D(VN) would consist of S/S, S/NP, NP/NP, and NP/S. The intended interpretation of a derived category (slashed category or a category with a hole) is as follows: A node labeled α/β will dominate subtrees identical to those that can be dominated by α, except that somewhere in every subtree of the α/β type there will occur a node of the form β/β dominating a resumptive pronoun, a trace, or the empty string, and every node linking α/β and β/β will be of the form γ/β. Thus, α/β labels a node of type α that dominates material containing a hole of the type β (i.e., a β extraction site in an NP-movement analysis). For example, S/NP is a sentence that has an NP missing somewhere. The derived rules allow the propagation of a hole, and the linking rules allow the introduction of a category with a hole. For example, given the rule (15),

[S NP VP]    (15)
This is the same as the rule S → NP VP but written as a node admissibility condition. Two derived rules (16) and (17) can be obtained:

[S/NP NP/NP VP]    (16)

[S/NP NP VP/NP]    (17)

An example of a linking rule is a rule (rule schema) that introduces a category with a hole, as needed for topicalization, for example,

[S α S/α]    (18)

For α = PP this becomes

[S PP S/PP]    (19)

This rule will induce a structure like (20). [Form (20): a topicalized structure in which a fronted PP is matched by a PP/PP hole lower in the tree.] The technique of categories with holes and the associated derived and linking rules allows unbounded dependencies to be accounted for in a phrase-structure representation.
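Mechanically, each derived rule is obtained by slashing the mother and exactly one daughter with the same hole category. A short Python sketch (the pair representation of rules is this sketch's own assumption):

```python
# Derive slashed (hole-propagating) rules from a base node-admissibility rule:
# from [S NP VP] and hole NP, produce (16) [S/NP NP/NP VP] and (17) [S/NP NP VP/NP].

def derived_rules(mother, daughters, hole):
    rules = []
    for i, daughter in enumerate(daughters):
        slashed = list(daughters)
        slashed[i] = f"{daughter}/{hole}"   # the daughter that carries the hole down
        rules.append((f"{mother}/{hole}", slashed))
    return rules

for mother, daughters in derived_rules("S", ["NP", "VP"], "NP"):
    print(f"[{mother} {' '.join(daughters)}]")
# [S/NP NP/NP VP]
# [S/NP NP VP/NP]
```

Because every rule of the base grammar yields only finitely many slashed variants, this construction stays within context-free power, as the text notes.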
The notion of categories with holes is not completely new. Harris (14) introduces categories such as S-NP or S-PP (like the S/NP of Gazdar) to account for moved constituents. He does not, however, seem to provide, at least not explicitly, machinery for carrying the "hole" downward. He also has rules in his framework for introducing categories with holes. Thus, in his framework, something like (6) would be accomplished by allowing for a sentence form (a center string) of the form (21) (not entirely his notation),

NP V O-NP    (21)

O = object or complement of V

This notion also appears in Kuno's context-free grammar (15). His grammar had node names with associated descriptions that reflected the missing constituent and were expanded as constituents, one of which similarly reflected the missing constituent. This was continued down to the hole. Sager (16), who has constructed a very substantial parser starting from some of these ideas and extending them significantly, has allowed for the propagation of the hole, resulting in structures very similar to those of Gazdar. She has also used the notion of categories with holes in order to carry out some coordinate structure computation. For example, Sager allows for the coordination of S/α and S/α (16). Gazdar (10) is the first, however, to incorporate the notion of categories with holes and the associated rules in a formal framework for his syntactic theory and also to exploit it in a systematic manner for explaining coordinate structure phenomena.

Tree-Adjoining Grammar (TAG). In a GPSG certain amendations were made (e.g., the introduction of slashed categories) that allowed one to construct structural descriptions that incorporate certain aspects of transformational grammars without transformational rules. Moreover, these amendations do not increase the generative power beyond that of CFG. It is possible to capture many aspects of a transformational grammar in a phrase structure tree-generating system consisting of tree-building rules rather than string-rewriting rules. The tree-adjoining grammar (TAG) is such a system.

A TAG, G = (I, A), consists of a finite set of "initial trees," I, a finite set of auxiliary trees, A, and a composition operation called "adjoining." The trees in I and A together are called "elementary trees." A tree α is an "initial tree" if it is of the form
[Form (22): an initial tree: root labeled S, frontier nodes all terminal symbols.]

That is, the root node of α is labeled S and the frontier nodes are all terminal symbols. The internal nodes are nonterminals. A tree β is an auxiliary tree if it is of the form

[Form (23): an auxiliary tree: root labeled X, frontier nodes all terminals except one node labeled X.]

That is, the root node of β is labeled X, where X is a nonterminal, and the frontier nodes are all terminals except one that is labeled X, the same label as that of the root. The node labeled X on the frontier will be called the foot node of β. The internal nodes are nonterminals. The initial and the auxiliary trees are not constrained in any manner other than as indicated above. The idea, however, is that both the initial and auxiliary trees will be minimal in some sense. An initial tree will correspond to a minimal sentential tree (i.e., without recursing on any nonterminal), and an auxiliary tree, with root and foot node labeled X, will correspond to a minimal recursive structure that must be brought into the derivation if one recurses on X.

A composition operation called adjoining (or adjunction) is now defined, which composes an auxiliary tree β with a tree γ. Let γ be a tree containing a node n bearing the label X and let β be an auxiliary tree whose root node is also labeled X. [Note that β must have, by definition, a node (and only one such) labeled X on the frontier.] Then the adjunction of β to γ at node n will be the tree γ' that results when the following complex operation is carried out: The subtree of γ dominated by n, call it t, is excised, leaving a copy of n behind; the auxiliary tree β is attached at n and its root node is identified with n; and the subtree t is attached to the foot node of β and the root node n of t is identified with the foot node of β. Form (24) illustrates this operation.

[Form (24): adjunction of auxiliary tree β into tree γ at node n.]

The intuition underlying the adjoining operation is a simple one, but the operation is distinct from a substitution operation on trees. For a TAG, G = (I, A), T(G) is the set of all trees derived in G starting from initial trees in I, and the string language L(G) is the set of all terminal strings of the trees in T(G). It can be shown that TAGs are more powerful than CFGs; that is, there are string languages that can be generated by TAGs but not by CFGs. For example, the language L = {a^n b^n c^n | n ≥ 1} can be generated by a TAG but not by any CFG, as is well known, because L is a strictly context-sensitive language. Moreover, for a context-free language it is possible to construct a TAG, G, such that G generates the same context-free language, but the set of phrase-structure trees generated by G cannot be generated by a CFG; that is, G provides structural descriptions for the strings of a context-free language that no CFG can provide. In particular, for the language L = {a^n e b^n | n ≥ 1}, a well-known context-free language, a TAG, G, can be constructed that is able to provide structural descriptions for strings in L exhibiting cross-serial dependencies between the a's and b's.

For example, let G = (I, A), where:
[Form (25): I = {α1}, A = {β1, β2}. The initial tree α1 is the single node S dominating e. The auxiliary tree β1 has root S with daughters a and T, where T dominates the foot node S and b; β2 has root T with daughters a and S, where S dominates the foot node T and b.]
Some derivations in G are shown below.

[Form (26): the derived trees γ0, γ1, and γ2, where γ0 = α1; γ1 = γ0 with β1 adjoined at S, as indicated in γ0 by the asterisk; and γ2 = γ1 with β2 adjoined at T, as indicated in γ1 by the asterisk.]
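These adjunctions can be replayed with a short program. In the following Python sketch (the list encoding is this sketch's own, and the shapes of β1 and β2 are assumed from form (25)), a label ending in * marks the foot node of an auxiliary tree:

```python
# TAG adjunction for the {a^n e b^n} example: trees are [label, child, ...];
# leaves are strings; "S*" and "T*" are foot nodes.

ALPHA1 = ["S", "e"]                        # initial tree: S dominating e
BETA1 = ["S", "a", ["T", "S*", "b"]]       # auxiliary tree with root S
BETA2 = ["T", "a", ["S", "T*", "b"]]       # auxiliary tree with root T

def substitute_foot(beta, foot, subtree):
    """Attach the excised subtree at the foot node of a copy of beta."""
    if not isinstance(beta, list):
        return subtree if beta == foot else beta
    return [beta[0]] + [substitute_foot(c, foot, subtree) for c in beta[1:]]

def adjoin(tree, beta, label):
    """Adjoin beta at the leftmost node of `tree` labeled `label`."""
    if isinstance(tree, list) and tree[0] == label:
        return substitute_foot(beta, label + "*", tree)   # excised subtree -> foot
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            new = adjoin(child, beta, label)
            if new is not child:                          # adjunction happened below
                return tree[:i] + [new] + tree[i + 1:]
    return tree

def frontier(tree):                        # terminal string of a tree
    if not isinstance(tree, list):
        return [tree]
    return [leaf for child in tree[1:] for leaf in frontier(child)]

gamma1 = adjoin(ALPHA1, BETA1, "S")
gamma2 = adjoin(gamma1, BETA2, "T")
print(" ".join(frontier(gamma2)))          # a a e b b
```

Indexing the a and b contributed by each adjunction (a1 with b1, a2 with b2) shows why the resulting string a1 a2 e b1 b2 exhibits crossing rather than nested dependencies.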
Clearly, L(G) = {a^n e b^n | n ≥ 0}. The a and b in each auxiliary tree as it enters the derivation have been coindexed to indicate that they both belong to the same auxiliary tree, that is, they have a dependency relation between them. The terminal string of γ2, as shown in (27) below, illustrates the cross-serial dependencies.

[Form (27): the terminal string a1 a2 e b1 b2 of γ2, with the dependency links a1-b1 and a2-b2 crossing each other.]

The ability to represent such cross-serial dependencies permits one to construct cross-serial dependencies in natural languages (e.g., in Dutch). In the following example, how a TAG can be used for characterizing some linguistic structures is illustrated very briefly and in a highly simplified manner. For example, let G = (I, A), where I and A contain the elementary trees shown in (28).

[Form (28): the elementary trees of G: initial trees α1 ("the man left"), α2 ("PRO to invite Jane"), and α3 ("Who PRO to invite," containing an NP[+wh] and the empty element e_i), and auxiliary trees β1 ("who met Mary"), β2 ("John persuaded Bill S'"), and β3 ("Did John persuade Bill S'").]

The lexical string under each tree is an example string that would result by appropriate lexical insertions in the tree. The detailed structure of each tree is not relevant and should not be taken as the unique structure assigned to the string. Much of the recent linguistic research can be characterized by the study of the constraints on the shape of the elementary trees, initial and auxiliary. The coindexing of nodes in β1 and α3 is for the purpose of illustrating the dependencies. The following phrase-structure trees (not the only ones) in this TAG can now be derived. By adjoining β1 to α1 at the NP node, (29) can be obtained.

[Form (29): the tree for "The man who met Mary left."]
By adjoining β2 to α2 at the root node S of α2, (30) can be obtained.

[Form (30): the tree for "John persuaded Bill PRO to invite Jane."]

By adjoining β3 to α3 at the S' node under the root node of α3, one has (31):

[Form (31): the tree for "Who did John persuade Bill PRO to invite e_i."]

Note that the dependency between NP[+wh] and e (the empty string, representing a gap or trace) was stated locally in the elementary tree α3. In the tree resulting from adjoining β3 to α3, the dependent elements have moved away from each other, and in general, adjoining will make them unbounded. This is an example to show that dependencies can be locally stated on the elementary trees; adjoining preserves them and may introduce unboundedness.

The TAG grammar illustrates how phrase structure trees can be built out of elementary trees (elementary phrase structure trees) such that the co-occurrence relations between elements that are separated in surface constituent structure can be stated locally on the elementary trees in which these elements are copresent. This property of TAGs achieves the results of transformational rules (without transformations), including the generation of phrase structure trees exhibiting cross-serial dependencies.

Pollard (17) has proposed a rewriting system, called head grammars (HGs), in which the rewriting rules not only allow concatenation of strings but also wrapping of one string around another. For example, an HG has rules of the form

RW(uhv, u'gv')

where uhv and u'gv' are two strings with h and g as designated symbols, called heads. Applying the rule results in the string

uhu'gv'v

That is, the string to the right of the head of the first string is wrapped around the second string. The head of the resultant is the head of the first string. The adjoining operation in a TAG is very similar to the wrapping operations in HGs. It has recently been shown by Vijay-Shanker, Weir, and Joshi (18) that HGs are equivalent to TAGs (assuming the head of an empty string is defined).

Summary

Phrase-structure trees provide structural descriptions for sentences. Phrase-structure trees can be generated by phrase structure grammars. Phrase structure trees can be shown to be appropriate to characterize structural descriptions for sentences, including those aspects that are usually characterized by transformational grammars, by making certain amendations to CFGs, without increasing their power, or by generating them from elementary trees by a suitable rule of composition, increasing the power only mildly beyond that of CFGs. Structural descriptions provided by phrase structure trees are used explicitly or implicitly in natural-language-processing systems (1).

BIBLIOGRAPHY
1. T. Winograd, Language as a Cognitive Process, Academic Press, New York, 1983.
2. L. Bloomfield, Language, Holt, New York, 1933.
3. R. S. Wells, "Immediate constituents," Language 23, 212-226 (1947).
4. E. Bach, Syntactic Theory, Holt, Rinehart, and Winston, New York, 1974.
5. L. S. Levy and A. K. Joshi, "Skeletal structural descriptions," Inf. Cont. 39, 192-211 (1978).
6. S. Peters and R. W. Ritchie, Context-Sensitive Immediate Constituent Analysis, Proceedings of the ACM Symposium on Theory of Computing, pp. 150-161, 1969.
7. A. K. Joshi and L. S. Levy, "Constraints in structural descriptions," SIAM J. Comput. 6, 272-284 (1977).
8. N. Chomsky, Aspects of the Theory of Syntax, The MIT Press, Cambridge, MA, pp. 131-186, 1965.
9. G. Gazdar, Phrase Structure Grammar, in P. Jacobson and G. K. Pullum (eds.), The Nature of Syntactic Representation, Reidel, Boston, MA, 1982.
10. G. Gazdar, E. Klein, G. K. Pullum, and I. A. Sag, Generalized Phrase Structure Grammar, Blackwell, Oxford, 1985.
11. R. Kaplan and J. W. Bresnan, A Formal System for Grammatical Representation, in J. W. Bresnan (ed.), The Mental Representation of Grammatical Relations, MIT Press, Cambridge, MA, pp. 173-281, 1979.
12. A. K. Joshi, L. S. Levy, and M. Takahashi, "Tree adjunct grammars," J. Comput. Sys. Sci. 10, 136-163 (1975).
13. A. K. Joshi, How Much Context-Sensitivity Is Necessary for Structural Descriptions? Tree Adjoining Grammars, in D. Dowty, L. Karttunen, and A. Zwicky (eds.), Natural Language Parsing, Cambridge University Press, Cambridge, MA, pp. 206-250, 1984.
14. Z. S. Harris, String Analysis of Language Structure, Mouton and Co., The Hague, 1962.
15. S. Kuno, The Current Grammar for the Multiple Path English Analyzer, Mathematical Linguistics and Automatic Translation, Report No. NSF-8, Computation Laboratory, Harvard University, Cambridge, MA, 1963.
16. N. Sager, Syntactic Analysis of Natural Languages, in M. Alt and M. Rubinoff (eds.), Advances in Computers, Vol. 8, Academic Press, New York, pp. 202-240, 1967.
17. C. Pollard, Head Grammars, Ph.D. Dissertation, Stanford University, Stanford, CA, 1984.
18. K. Vijay-Shanker, D. Weir, and A. K. Joshi, Adjoining, Wrapping, and Headed Strings, Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics, New York, June 1986.

A. Joshi
University of Pennsylvania
GRAMMAR, SEMANTIC

A "semantic grammar" is a grammar for language in which the categories refer to semantic as well as syntactic concepts. It was first developed in the early 1970s in the attempt to build practical natural-language interfaces, in educational environments [SOPHIE (qv) (1,2)] and to databases [LIFER (qv) (3,4) and PLANES (qv) (5)]. It has continued to be used in a variety of commercial and other applications such as ROBOT, also known as INTELLECT (qv) (6), PHRAN (qv) (7), XCALIBUR (8), and CLOUT. The distinguishing characteristic of a semantic grammar is the type of information it encodes and not the formalism used to represent it. Semantic grammars have been represented in many different formalisms, including augmented transition networks (see Grammar, augmented transition network) and augmented phrase structure grammars (see Grammar, phrase-structure). Unlike natural-language systems generally, the aim of semantic grammars is to characterize a subset of natural language well enough to support casual user interaction. As such, it is primarily a technique from the field of natural-language engineering rather than a scientific theory [though some researchers have proposed semantic grammars as a psychological theory of language understanding (7)].

To understand semantic grammars, it is helpful to understand a little about theories of natural language. The goal of a theory of language is to explain the regularities of language. Transformational grammars (see Grammar, transformational) and lexical functional grammars are two good examples of theories of language. The syntax part of the theory explains the structural regularities of a language, for example, facts about word order and inflections. The theory does this by providing rules that the words and phrases must obey. This collection of rules is referred to as a grammar.
An example of the kind of regularity that the syntactic part of a theory of language seeks to capture can be seen in the relationship between the following two sentences:

1. The boy hit the ball.
2. The ball was hit by the boy.
It is called the passive relationship and exists between an infinite number of other sentences in English as well. A good syntactic grammar will have a small number of rules that account for the passive relationship between all of these sentences. To explain these relationships, the grammar must name and relate broad, abstract concepts. For example, introducing the concept of a noun phrase (NP) as referring, roughly, to the collection of all possible phrases that name things allows a syntactic grammar to contain a rule like:

(NounPhrase1)(Verb)(NounPhrase2) : (NounPhrase2)(AuxiliaryVerb)(Verb) by (NounPhrase1)

This gives rise to categories in the grammar that characterize the roles words and phrases play in the structure of language, that is, in the syntax. In semantic grammars, the choice of categories is based on the semantics of the world and the intended application domain as well as on the regularities of language. Thus, for example, in a system that was intended to answer questions about electronic circuits (such as SOPHIE), the categories might include measurement, measurable quantity, or part as well as standard categories such as determiner and preposition. For example, the rule

(Measurement) : (Determiner)(Measurable-Quantity)(Preposition)(Part)

applies in the following phrases:

The voltage across R9.
The current through the voltage reference capacitor.
The power dissipation of the current-limiting transistor.

Figure 1 shows two parse trees of the same sentence that might be generated by typical grammars, the left one with a standard grammar, the right one with a semantic grammar.
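As an illustration, a rule of this kind can be applied by a small matcher over semantic word classes. The lexicon, code, and query form below are invented for illustration and are not SOPHIE's actual implementation; the interpret function plays the role of the rule augmentation that semantic grammars attach to each rule:

```python
# Toy sketch of applying the semantic-grammar rule
#   <Measurement> : <Determiner> <Measurable-Quantity> <Preposition> <Part>
# Word classes come from the circuit domain, not from syntax alone.

LEXICON = {
    "Determiner": {"the"},
    "Measurable-Quantity": {"voltage", "current", "power dissipation"},
    "Preposition": {"across", "through", "of"},
    "Part": {"R9", "the voltage reference capacitor"},
}
RULE = ["Determiner", "Measurable-Quantity", "Preposition", "Part"]

def match_measurement(tokens):
    """Match RULE left to right, preferring the longest lexicon entry."""
    parse, i = [], 0
    for category in RULE:
        for j in range(len(tokens), i, -1):        # longest match first
            if " ".join(tokens[i:j]) in LEXICON[category]:
                parse.append((category, " ".join(tokens[i:j])))
                i = j
                break
        else:
            return None                            # category not found
    return parse if i == len(tokens) else None

def interpret(parse):
    """Augmentation: build the measurement's interpretation directly."""
    slots = dict(parse)
    return {"measure": slots["Measurable-Quantity"],
            "at": (slots["Preposition"], slots["Part"])}

parse = match_measurement("the voltage across R9".split())
print(interpret(parse))
# {'measure': 'voltage', 'at': ('across', 'R9')}
```

Because the categories are semantic, a successful match immediately yields a domain-level interpretation rather than a purely syntactic tree.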
Advantages of Semantic Grammars

Semantic grammars provide engineering solutions to many problems that are important when building practical natural-language interfaces. These important issues are efficiency, habitability, discourse phenomena, bad inputs, and self-explanation.

Efficiency is important because the user is waiting during the time the system spends understanding the input. Semantic grammars are efficient because they allow semantic constraints to be used to reduce the number of alternative parsings that must be considered. They are also efficient because the semantic interpretation (meaning) of the expression follows directly from the grammar rules. When considering a natural-language interface, it is often useful to think of the interpretation of a statement as the command or query the user would have had to type had he or she been talking directly to the system. For example, in a database retrieval system the interpretation of the input is the query or queries in the retrieval language that answer the question (see Semantics, procedural). Typically, in a semantic grammar each rule has an augmentation associated with it that builds its interpretation from the interpretations of its constituents. For example, the interpretation of the rule (Query) :- (Question-Intro)(Measurement) is a query to the database that retrieves the measurement specified in the interpretation of (Measurement). The interpretation of (Measurement) specifies the
GRAMMAR,SEMANTIC QUERY
MEASUREMENT QUESTTON IN T R O
What
MEASURABLE QUANTITY
QUESTTON WORD
Q/PRO
is
the
voltage
across
R9
What
S t a n d a r ds t r u c t u r e o f a n E n g l i s hq u e s t i o n
is
the
voltage
across
R9
S e m a n t i cg r a m m a r s t r u c t u r eo f a n E n g l i s hq u e s t i o n
FigUre l. Examples of two parse trees of the same sentence.
quantity being measured (e.g., voltage) and where it should be measured (e.g., across R9). The interpretation of (Measurement) can be used differently in, for example, a rule like (Yes-No-Query) :- (Be-Verb)(Measurement)(Comparator), as in the question "Is the voltage across R9 low?" Having the semantic interpretation associated directly with the grammar is efficient because it avoids a separate process that does semantic interpretation.

The second important issue is habitability. It is unlikely that any natural-language interface written in the foreseeable future will understand all of natural language. What a good interface does is to provide a subset of the language in which users can express themselves naturally without straying over the language boundaries into unallowed sentences. This property is known as "habitability" (9). Although exactly what makes a system habitable is unknown, certain properties make systems more or less habitable. Habitable systems accept minor or local variations of an accepted input and allow words and concepts that are accepted in one context to be accepted in others. For example, a system that accepts "Is something wrong?" but does not accept "Is there anything wrong?" is not very habitable. Any sublanguage that does not maintain a high degree of habitability is apt to be worse than no natural-language capability because users will continually be faced with the problem of revising their input. Lack of habitability has been found to be a major source of user frustration with natural-language systems.

An important problem in designing habitable natural-language interfaces is the occurrence of discourse phenomena such as pronominal reference and ellipsis. When people interact with a system in natural language, they assume that it is intelligent and can therefore follow a dialogue. If it does not, they have trouble adapting. The following sequence of questions exemplifies these problems:

3. What is the population of Los Angeles?
4.
What about San Diego?

Input 3 contains all of the information necessary to specify a query. Input 4, however, contains only the information that is different from the previous input. Systems using semantic grammars handle sentences like 4 by recognizing the categories of the phrases that do occur in the elided input. In this case, "San Diego" might be recognized as being an instance of (City). The most recent occurrence of the same category is
located in a previous input, and the new phrase is substituted for the old one. In some systems, such as SOPHIE, PLANES, and XCALIBUR, this is done using the interpretation structure of previous inputs. In some systems, such as PHRAN, the substitution is made in the previous input string, which is then reparsed. Input 4 is an example of the discourse phenomenon called ellipsis (qv). Semantic grammars have also been used to handle classes of pronominal and anaphoric reference, as in the sentence "What is it for San Francisco?" Although the techniques used by semantic grammars work on many common cases of discourse constructs, there are many other more complex uses that they do not address (see Discourse understanding and Ref. 10 for more details).

Another ramification of the fact that the natural-language interface will not understand everything is that it must deal effectively with inputs that lie outside its grammar, that is, sentences that do not parse. The standard solution to this problem is to understand part of the sentence, either by ignoring words (sometimes called "fuzzy parsing") or by recognizing phrases that do satisfy some of the grammar. A semantic grammar has the advantage that recognized phrases are meaningful and can be used to provide feedback to the user. For example, if the user's input contains the phrase "voltage across R9," the system can display the rules that use (Measurement) to give the user an idea of what sentences the system will accept.

A related difficulty with natural-language interfaces is conveying to the user the capabilities of the system, for example, what the system can do and what concepts it knows about. Semantic grammar systems can use the information in the grammar to provide some help. For example, LIFER allows the user to ask about possible ways of completing a sentence. In the dialogue below, the user requests help in the middle of a sentence. The system responds with the possible ways that the sentence could be completed.
Since the grammar is semantic, the terms are meaningful to the user.

USER: What is the voltage (help)
SYSTEM RESPONSE: Inputs that would complete the (Measurement) rule are:
    across (part)
    between (node) and (node)
    at (node)
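The category-substitution treatment of ellipsis described above can be sketched as follows. This is a simplification invented for illustration; systems such as SOPHIE actually worked on the interpretation structures of previous inputs:

```python
# Sketch of elliptical-input handling by category substitution.
# The category table and parse representation are invented for
# illustration, not taken from any particular system.

CATEGORIES = {"Los Angeles": "City", "San Diego": "City",
              "population": "Attribute"}

def resolve_ellipsis(previous_parse, fragment):
    """Substitute the fragment for the most recent constituent of the
    same category in the previous input's parse."""
    category = CATEGORIES.get(fragment)
    resolved = list(previous_parse)
    for i in range(len(resolved) - 1, -1, -1):   # most recent first
        if resolved[i][0] == category:
            resolved[i] = (category, fragment)
            return resolved
    return None                                   # no antecedent found

# "What is the population of Los Angeles?" ... "What about San Diego?"
prev = [("Attribute", "population"), ("City", "Los Angeles")]
print(resolve_ellipsis(prev, "San Diego"))
# [('Attribute', 'population'), ('City', 'San Diego')]
```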
The NLMENU (11) system attacks this problem by constraining the user's input to a series of menu selections that only produce legal sentences. In addition to obviating the problem of unrecognized sentences, the approach also has the benefit of presenting in the menus an explicit picture of what the system can do.

Limitations of Semantic Grammars

Many limitations arise from the merger of semantic and syntactic knowledge that characterizes semantic grammars. The range of linguistic phenomena that have been covered is limited and excludes, for example, complex forms of conjunctions, comparatives, or complex clause-embedding constructs, for example, "Which ships does the admiral think the fourth fleet can spare?" (12). Moreover, although work in constructing semantic grammars is creating some generalizable principles of design, the grammar itself must be redone for each new domain. Categories that are appropriate to electronics are not applicable to the domain of census data. Even within a single domain, certain syntactic regularities, such as the passive transformation, must be encoded for each semantic class that allows active sentences. This not only increases the size of the grammar but, more importantly, results in a great deal of redundancy in the grammar, making it difficult to write or extend.

Attempts have been made to overcome this limitation by separating the syntactic knowledge. The simplest approach is to reformulate the categories in the grammar to make them more syntactic. In this case the semantic distinctions that had previously been made by having distinct categories are made in the augmentations associated with each grammar rule that produce the interpretation. Another approach is to capture the syntactic knowledge in the program that applies the grammar rather than in the grammar itself. In PHRAN, for example, aspects of adverbs and relative clauses are handled by the matching process that applies the grammar rules to the input.
In a return to the more classical breakdown of linguistic information, some systems seek to maintain the advantages of semantic grammars by closely coupling separate syntactic and semantic components (13). This points to one contribution of semantic grammars to the theory of language (as contrasted with their contributions to the production of usable systems): the identification of phenomena that succumb to simple methods.
BIBLIOGRAPHY

1. R. R. Burton, Semantic Grammar: An Engineering Technique for Constructing Natural Language Understanding Systems, BBN Report 4274, Bolt, Beranek and Newman, Cambridge, MA, 1976. Burton's Ph.D. thesis (University of California, Irvine, 1976), which introduced the term "semantic grammar" and described its use and advantages in building the SOPHIE natural-language front end. Good introduction to the issues surrounding natural-language engineering.
2. R. R. Burton and J. S. Brown, "Toward a Natural Language Capability for Computer-Assisted Instruction," in H. O'Neill (ed.), Procedures for Instructional Systems Development, Academic, New York, pp. 273-313, 1979. A more accessible paper largely based on "Semantic Grammar: An Engineering Technique for Constructing Natural Language Understanding Systems."
3. G. G. Hendrix, The LIFER Manual: A Guide to Building Practical Natural Language Interfaces, Technical Note 138, SRI Artificial Intelligence Center, Menlo Park, CA, February 1977. Complete description of the LIFER system, which includes many elegant user interface features, including the ability to change the grammar during the interaction.
4. G. G. Hendrix, E. D. Sacerdoti, D. Sagalowicz, and J. Slocum, "Developing a natural language interface to complex data," ACM Trans. Database Sys. 3(2), 105-147 (June 1978). Provides an overview of the LIFER system.
5. D. L. Waltz, "An English language question-answering system for a large data base," CACM 21, 526-539 (July 1978). Describes the PLANES system that interfaces to relational databases.
6. L. R. Harris, "User-oriented data base query with the Robot natural language query system," Int. J. Man-Mach. Stud. 9, 697-713 (1977). Describes the system ROBOT that is marketed as INTELLECT.
7. R. Wilensky, Y. Arens, and D. Chin, "Talking to UNIX in English: An Overview of UC," CACM 27(6), 574-593 (June 1984). Describes the PHRAN system, which pushes the domain dependence of semantic grammars.
8. J. G. Carbonell, Discourse Pragmatics in Task-Oriented Natural Language Interfaces, Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics, Cambridge, MA, pp. 164-168, 1983. Describes XCALIBUR, a general system for interfacing to expert systems.
9. W. C. Watt, "Habitability," Am. Document. 19, 338-351 (1968).
10. B. L. Webber, So What Can We Talk About Now? in M. Brady and R. C. Berwick (eds.), Computational Models of Discourse, MIT Press, Cambridge, MA, pp. 331-371, 1983. Describes the difficult problems of anaphoric reference that arise in natural discourse.
11. H. R. Tennant, K. M. Ross, R. M. Saenz, C. W. Thompson, and J. R. Miller, Menu-Based Natural Language Understanding, Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics, Cambridge, MA, pp. 151-158, 1983. Describes NLMENU, a menu-driven natural-language input system.
12. T. Winograd, Language as a Cognitive Process, Vol. 1, Syntax, Addison-Wesley, Menlo Park, CA, p. 381, 1983. Excellent introduction to the area of natural-language understanding.
13. R. J. Bobrow and B. L. Webber, Knowledge Representation for Syntactic/Semantic Processing, Proceedings of the 1st AAAI, Stanford, CA, pp. 316-323, 1980. Describes the RUS system that arose from attempts to extract knowledge common to semantic grammars in several domains.

R. Burton
Xerox PARC

GRAMMAR, TRANSFORMATIONAL

Transformational grammar is a theory for describing human languages based on the idea that the full range of sentences in a language can be described by variations, or transformations, on a set of basic sentences. Developed by Noam Chomsky in the early 1950s and building on the earlier work of Zellig Harris (1,2), the theory of transformational grammar is now probably the single most widely studied and used linguistic model in the United States. (Ref. 1, a revised version of Chomsky's thesis work of the early 1950s that initiated the study of transformational grammar, gives a brief review of the intellectual background at the time. Although it is difficult reading, it is still a good source on the overall framework of generative grammar, including the theory of linguistic levels of description.) Transformational grammar has also been the subject of experiments in human language processing and the basis for several computer models of language processing, database retrieval, and language acquisition. The theory has had
an enormous influence on the practice of linguistics as a scientific discipline, particularly as part of a general approach to the study of human cognition that posits the existence of mental representations that have a central role in mental processing. Many of the core proposals of the theory, those regarding the exact representation of linguistic knowledge, remain controversial in nature and have given rise to a variety of competing linguistic models (2,3-9).

Language

Transformational grammar seeks to answer three key questions about human language: What is knowledge of language? How is that knowledge put to use? And how is knowledge of language acquired? It aims to answer the first question by providing a finite representation, a grammar, for each possible human language. This use of the term "grammar" in the transformational framework is to be contrasted with its colloquial usage. Given a particular human language like English, a grammar for that language is to show how each sentence of that language is pronounced and how its sound can be paired with its meaning (or meanings, if the sentence has more than one meaning); that is, the grammar completely characterizes a set of (sound, meaning) pairs for that language. Note that this description is not meant to have any direct computational interpretation but is just meant to describe in an abstract way the representation of grammatical knowledge that a person might have, sometimes called linguistic competence (see Linguistics, competence and performance).

Transformational grammar answers the question of how language is used by claiming that the grammar enters into the mental computations speakers and hearers carry out when they produce or understand sentences. A full theory of language use would thus include some account of actual computational algorithms, memory requirements, and the like, entering into language processing; this would be an account of linguistic performance.
Finally, the theory tries to answer the question of how language is acquired by assuming that all human languages are cut from the same basic design, called universal grammar. Universal grammar is not itself the grammar for any natural language but is like a chassis that is to be fleshed out and built upon by the actual linguistic experience of a child in a particular language community. Much of the theoretical effort in transformational grammar is directed to showing how human languages vary from each other in small enough ways that a child can learn English or Chinese or German without explicit instruction.
Generative Grammar

Along with much other work in linguistics, transformational grammar notes that since the speakers of a language like English have the potential to understand and produce an infinite number of sentences, such speakers must have some way of generating an infinite number of sentences from finite means. The use of the term "generate" here does not mean generation in the sense of a speaker being able to say some particular sentence but rather in the mathematical sense of an axiom system being able to produce or derive a set of theorems. For this purpose, transformational grammar relies on the mathematical devices formulated from the 1930s onward that allow one recursively to specify infinite sets by finite means. One branch of this mathematical study, known as formal language theory, grew out of Chomsky's study of rule systems for generating languages (10,11). Transformational grammar is thus part of the so-called generative paradigm in linguistic theory and is sometimes called transformational generative grammar. Other grammatical theories for human languages may be constructed that are generative but do not include transformations as part of their grammars. Over the past 30 years several such alternative theories have been advanced, such as relational grammar (5), arc-pair grammar (6), and more recently lexical-functional grammar (3) and generalized phrase structure grammar (4).

Syntactic and Semantic Rules

As mentioned, a central idea of transformational theory is that the variety of surface forms of any particular language (its sentences) are the result of the interaction of several modular subsystems. Most versions of transformational grammar assume that two of the basic subsystems are a set of syntactic rules or constraints and a set of semantic rules. The syntactic rules (from the Greek σύνταξις, "arranged together") specify the legal arrangements of words in sentences, for example, that the English sentence "John will eat the ice cream" is legal because it consists of a subject noun phrase "John" preceding a verb phrase or predicate "will eat the ice cream." The semantic rules specify how a particular arrangement of words is to be interpreted, for example, that "Will John eat the ice cream" is a question. The syntactic rules may be further subdivided into a base grammar, a set of rules that generates a set of basic sentences (at one time called kernel sentences and later deep structures, though the terminology is no longer applicable), and a set of transformations that operate on these basic sentences to produce derived sentences or surface structures. Additional rules operate on the surface structures to yield pronounceable output sentences (1,10).

Transformations

Roughly and intuitively, the transformations are designed to account for the systematic relationship between sentences such as active-passive sentence pairs; global sentence relationships, such as the relationship between "what" and "eat" in "what will John eat," where "what" is the questioned object of "eat"; and ambiguities in sentences such as "they are flying planes," where one and the same string of words is derived from two different base sentences (one where "flying planes" is a kind of plane, with "flying" an adjective, and one where "flying" is the main verb). For instance, in one version of transformational grammar developed about 1965 (11), the sentence "John will eat the ice cream" would be generated by a set of syntactic rules, and then a transformational rule operating on this basic sentence would invert "John" and "will" to produce the derived question "Will John eat the ice cream." Another series of transformational operations could act on this sentence yet again to produce the passive sentence "Will the ice cream be eaten by John." This series of operations involves adding new elements "be" and "by," moving old elements around, and changing the form of existing elements in a sentence.
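The question-forming operation just described, which inverts "John" and "will," can be sketched as a mapping on a flat phrase marker. This is an illustration of the idea only; the representation and function name are invented and are not the 1965 formalism:

```python
# Simplified sketch of a subject-auxiliary inversion transformation.
# Structural description: a phrase marker beginning NP Aux ...;
# structural change: Aux NP ...

def question_transform(phrase_marker):
    """Invert the subject NP and the auxiliary if they lead the marker."""
    cats = [cat for cat, _ in phrase_marker[:2]]
    if cats == ["NP", "Aux"]:
        return [phrase_marker[1], phrase_marker[0]] + phrase_marker[2:]
    return phrase_marker                      # structural description fails

base = [("NP", "John"), ("Aux", "will"), ("VP", "eat the ice cream")]
print(" ".join(word for _, word in question_transform(base)))
# will John eat the ice cream
```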
Figure 1 gives the overall block diagram for this system, showing how meaning and sounds are paired.

Base grammar rules
    ↓
Output: base (deep) structures → Semantic interpretation (meaning)
    ↓
Transformational rules
    ↓
Output: surface structures
    ↓
Phonological rules
    ↓
Output: sound

Figure 1. A block diagram of the components of a transformational grammar, ca. 1965 (11).

Note that in this version of the theory the meaning of a sentence is determined by rules operating on the output of the base grammar, that is, the deep structures. The workings of this model, known as the standard theory, are described in more detail below.

Derivation Process

The process of deriving a sentence (surface structure) such as "Will the ice cream be eaten by John" has been the source of considerable confusion for computational analysis. A derivation does not give an algorithmic description or flowchart for how a derived sentence could be mechanically produced or analyzed by a machine; that is, it does not directly give a parsing procedure for natural languages. The latter part of this article gives more detail on how transformational grammars may actually be used for sentence analysis or production.

Over the course of 30 years the theory of transformational grammar has greatly altered the mechanisms used to generate the basic sentences, the definition of transformations, and the way that the final complexity of sentences is accounted for by the various subcomponents of the grammar. The general trend has been to have less and less of the final sentence form be determined by particular transformations or rules in the base grammar. Instead, relatively more of the constraints are written in the form of principles common to all languages or encoded in the dictionary entries associated with each word (12,13).

This approach is controversial (2). Other researchers in generative grammar have adopted quite different viewpoints about how best to describe the variations within and across natural languages. In general, these alternative accounts adopt means other than transformations to model the basic variation in surface sentences or assume other predicates than phrase structure relations are centrally involved in grammatical descriptions (7). In the recent theory dubbed lexical-functional grammar (LFG) (3), there are no transformations. Instead, the difference between, for example, an active sentence like "John kissed Mary" and a passive sentence like "Mary was kissed by John" is encoded in the form of different lexical entries (dictionary entries) for "kiss" and "kissed" plus a connection between those lexical entries and the grammatical relations of subject and object. In this example, among other things the lexical entry for "kiss" says that "John" is the subject, and that for "kissed" says that "Mary" is the object. There is no derivation from deep structure but simply the direct construction of a kind of surface structure plus the assignment of grammatical relations like Subj. In another current theory, generalized phrase structure grammar (4), the active-passive relationship is described by a metagrammar that, given a rule system that can produce the active sentence, derives a rule system to produce the corresponding passive form. This derived grammar contains no transformations but simply generates all surface forms directly without a derivation from a deep structure.

Variations of TG

As a concrete example of how a transformational grammar factors apart the sound-meaning relationship and how the form of transformations and base rules has changed, consider one version of transformational grammar, the so-called extended standard theory of the 1970s (2,12,13). First it is shown how this version of transformational grammar differs from the version of the mid-1960s, which was briefly sketched above. This gives a detailed example of how the components of a transformational grammar work together. Then it shows how these components have been modified in the most recent version of transformational grammar, known as government-binding theory (14).

Reviewing what was described earlier, the 1965 transformational theory had a syntactic component with two types of rules: First, it had a base grammar consisting of phrase structure rules, which represented or marked the basic categorial relationships or phrases of sentences, such as the fact that a noun phrase (NP) precedes a verb phrase (VP) in "John will eat the ice cream." This defined what is called a set of phrase markers or basic sentences. Semantic interpretation was assumed to take place via a set of rules operating on the output of the base grammar (called deep structures). Second, this theory contained transformational rules that mapped phrase markers to other phrase markers, producing surface structures as output. Phonological rules operated on the surface structures to yield sentences in their final "pronounceable" form (11).

Base Grammar. The basic phrase markers are described by a phrase-structure grammar, in the simplest case a context-free grammar (15). A simple example of a phrase structure grammar helps to clarify this notion and illustrates how a grammar can generate a language. This grammar is given in the form of context-free phrase structure rules (10,11,15):

(1) S → NP Aux VP        (2) VP → Verb NP
(3) NP → Name            (4) NP → Determiner Noun
(5) Aux → will           (6) Verb → eat
(7) Determiner → the     (8) Noun → ice cream
(9) Name → John

The first rule says that a sentence (S) is a noun phrase (NP) followed by an auxiliary verb and then a verb phrase (VP). The arrow can be read as an abbreviation for "is a" or as an instruction to generate the sequence of symbols NP Aux VP from the symbol S. That is, this rule is a command to replace the symbol S with the sequence NP Aux VP. For the purposes of the rule, the symbols NP, VP, and so on are regarded as atomic. Similarly, the second rule says that a VP consists of a verb followed by an NP, while the third and fourth rules describe NPs as either a name or a determiner followed by a noun. The last five rules are lexical rules that introduce actual words like "ice
cream" or "John." In a full grammar this representation would be in a form suitable for pronunciation, but conventionally printed versions just spell out words in their written form. Symbols like "ice cream" are called terminal elements because they do not appear on the left side of any rules; therefore, no more rules apply to them, and they terminate any further action. All other symbols, like S, NP, VP, Name, and so on, are called nonterminals. All the rules in this grammar are called context free because they allow one to freely replace whatever symbol is to the left of the arrow with whatever sequence of symbols is to the right of the arrow. Formally, context-free rules have only a single atomic symbol like S, VP, or NP to the left of the "is a" arrow.

To use this grammar for generating a base phrase marker, one applies the rules of the grammar beginning with a designated initial symbol, in this case S, until no further rules can apply. This is called a derivation because it derives a new string of symbols from the starting symbol S. If the derivation consists of only words, it generates a sentence. The set of all sentences that can be derived from S given some grammar is called the language generated by the grammar. For example, applying rule 1, one can replace S with NP Aux VP. Now apply rule 3 and replace the symbol NP with Name, producing the sequence of symbols Name Aux VP. Note how the presence of the context to the right of the symbol NP did not matter here; that is why the application of rule 3 is called context free. Continuing, now apply rule 9 and replace Name with "John," obtaining "John" Aux VP. Since "John" is a terminal element, no more rules apply to it, but rule 5 applies to Aux. One can replace it with "will" and get "John will" VP. Now rule 2 applies to VP. Replacing VP with Verb NP yields "John will" Verb NP. Passing rapidly over the remaining steps, rules 6, 4, 7, and 8 apply in turn, yielding "John will eat the ice cream," a sentence derived by this grammar.

This derivation not only says that this sentence can be generated by the given grammar, but it also specifies, by means of the rules that were applied, what the implicit structural relationships in the sentence are, some of which are of linguistic importance. For instance, by defining the subject of a sentence as the first NP immediately contained in S, it is clear by inspecting the derivation that "John" is the subject of the sentence. This information can be made explicit either by recording the sequence of rules that were applied to generate each sentence or by associating with the sentence a labeled bracketing that explicitly marks phrase boundaries by wrapping left and right brackets, labeled with the name of the corresponding nonterminal symbol, around each phrase. In addition, just to get things going, the grammar must include a special initial rule Start → [S S]:
This derivation not only says that this sentence can be generated by the given grammar but also specifies, by means of the rules that were applied, what the implicit structural relationships are in a sentence, some of which are of linguistic importance. For instance, by defining the subject of a sentence as the first NP immediately contained in S, it is clear by inspecting the derivation that "John" is the subject of the sentence. This information can be made explicit either by recording the sequence of rules that were applied to generate each sentence or by associating with the grammar a version of the rules that explicitly marks phrase boundaries by wrapping left and right brackets around each nonterminal symbol, labeled with the name of that symbol. In addition, just to get things going, the grammar must include a special initial rule, Start → [S S]. Derived with this bracket-marking grammar, the example sentence receives the labeled bracketing:
[S [NP [Name John]] [Auxiliary will] [VP [Verb eat] [NP [Determiner the] [Noun ice-cream]]]]

Conventionally, rule systems like the one just described are augmented to exclude the possibility of generating nonsentences like "The ice cream ate" or "John took." To do this, the context-free lexical rules of the original grammar are replaced with context-sensitive rules (10,15) that rewrite symbols like Verb or Noun only in certain contexts. For example, the symbol Verb should be replaced with a verb like "took" only when there is an NP object to the verb's right. The theory of transformational grammar from the time of the 1965 work by Chomsky, Aspects of the Theory of Syntax (11), has placed such context-sensitive lexical insertion rules in the dictionary rather than in the base phrase structure component. That is, instead of using a context-free rule to replace the symbol Verb with an appropriate word, the base grammar is expanded until there are just symbols like Verb, Noun, or Determiner left. Then the dictionary is consulted to see what words can be inserted given the surrounding context of the other symbols. For example, the dictionary entry for eat might look like this:

eat: Verb, Noun[+Animate] Auxiliary ___, ___ Determiner Noun[-Abstract]

This entry says that "eat" is a verb; that it can occur to the right of an animate noun (like "John") followed by an auxiliary verb; and that it can occur to the left of a determiner followed by a nonabstract noun (like "ice cream" but not like "fear"). In addition, the dictionary contains implicational statements of the form: If a word is a person's name, then it is also an animate noun. Therefore, the verb can be replaced in a sequence of symbols like Name Verb Determiner Noun with the word "eat" because the dictionary entry for this word meets all the given context conditions.
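The context check that licenses inserting "eat" can be sketched as follows. The feature sets and the function are a hypothetical simplification of the dictionary entry above, not an actual formalism from the literature:

```python
# Hypothetical encoding of the context conditions in the dictionary
# entry for "eat": an animate noun (plus auxiliary) to its left, and a
# determiner plus non-abstract noun to its right.
FEATURES = {"John": {"Animate"}, "ice cream": set(), "fear": {"Abstract"}}

def can_insert_eat(left_noun, right_noun):
    """Check the two context conditions of the entry; implicational
    statements like 'names are animate nouns' are assumed already
    applied when building the FEATURES table."""
    return ("Animate" in FEATURES[left_noun]
            and "Abstract" not in FEATURES[right_noun])

print(can_insert_eat("John", "ice cream"))  # True
print(can_insert_eat("John", "fear"))       # False
```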
Together, the dictionary, consisting of lexical insertion constraints and implicational rules, plus the base phrase structure rules generate the set of possible base phrase structures. At one time these were called deep structures, to indicate that they "underlay" the surface forms of sentences, but this terminology proved confusing; such forms are not "deeper" in the sense that they are more fundamental or their meaning is deeper. This terminology was therefore discarded (11,12).
The full bracket-marking grammar, including the special initial rule, is:

(0) Start → [S S]
(1) S → [NP NP] [Auxiliary Auxiliary] [VP VP]
(2) VP → [Verb Verb] [NP NP]
(3) NP → [Name Name]
(4) NP → [Determiner Determiner] [Noun Noun]
(5) Auxiliary → will
(6) Verb → eat
(7) Determiner → the
(8) Noun → ice-cream
(9) Name → John

If the reader follows through the derivation of "John will eat the ice cream" as before, but with the new grammar, it will be seen that the string of symbols generated is exactly the phrase marker, or labeled bracketing, of the sentence shown earlier.

Transformational Component. Referring to Figure 1, the base structures may now be fed to the transformational component, where zero or more transformations can be applied to generate additional sentences; the output of this process is a surface structure, ready to be "spelled out" and pronounced as a sentence. If no transformations apply, the surface structure is the same as the base phrase structure. This will roughly be the case for ordinary declarative sentences, such as "John will eat the ice cream." If transformations do apply, they produce new phrase markers, such as "Will John eat the ice cream." Each transformation is defined by a structural description that defines the phrase markers to which it can apply and a structural change that describes how that phrase marker will be altered. That is, a transformation must match the description of a phrase marker and produce as output a new phrase marker. Further transformations may then apply to this new phrase marker, and so on. In this sense, transformations are much like an if-then production rule system, with the domain of the rules being phrase markers.
For instance, one such transformation creates an interrogative sentence by starting with a phrase marker of the form X wh Y, where X and Y are any strings of symbols in phrase markers, and wh is any phrase with a wh word at the front of it, like "what," "who," or "what ice cream." It then moves the wh phrase to the front of the sentence. For example, given the phrase marker corresponding to the sentence "John will eat what" (12), the phrase marker portion corresponding to "John will eat" matches X, "what" matches the wh phrase portion of the transformation, and the empty string matches Y; therefore, this transformation can apply. Moving the wh phrase to the front gives "What John will eat." An additional transformation, subject-auxiliary inversion, can now apply to this new phrase marker, interchanging the NP "John" and the auxiliary phrase "will" to produce the question "What will John eat." Note that transformational rules manipulate only whole phrases, like the wh phrase above. Conventionally, structural descriptions and structural changes are written by labeling the elements in the pattern to be matched with numbers and then showing how those elements are changed (moved, inverted, or deleted) by indicating the appropriate changes on the numbers. In this format, for example, the wh phrase rule would be written as follows:

Structural description: (X, wh, Y) = (1, 2, 3)
Structural change: (2, 1, 3)

Extended Standard Theory. As described, this version of transformational grammar, the standard theory, was current from the mid-1960s to about 1970. In this theory deep structures were also the input to another component of the grammar, dealing with semantic interpretation and then, ultimately, rules of inference, belief, and so forth. Among other reasons, this position was discarded when it became clear, for example, that sentences with the same base structure could have different meanings.
As a simple example, consider the sentences "Everyone in this room speaks two languages" versus "Two languages are spoken by everyone in this room." If Bill and Sue are in the room, the first sentence is usually taken to mean that Bill speaks, for example, German and French, and Sue speaks English and French; they each speak two languages but not necessarily the same languages. The second sentence is different: It is ordinarily interpreted to mean that there are exactly two languages, the same two languages, that everybody in the room speaks. But, assuming that the second sentence is derived from the first by the passive transformation, this means that both sentences would have the same deep structure, and therefore the same meaning, unless something more than just deep structure enters into the determination of meaning. To deal with such problems, among others, the extended standard theory (EST) of the early 1970s added new representational devices and new constraints designed to simplify the statement of transformations and give a better format for semantic interpretation (12,13). First, it was proposed that when a phrase is moved by a transformation, it leaves behind an empty category of the kind moved, a trace, indicating the position from which it was moved. For example, the wh phrase question transformation applied to "John will eat what" now gives:

What John will eat [NP e]
where [NP e] denotes an empty NP or empty category (hence the "e") that is the object of "eat." The theory assumes that "what" and its trace are associated, for example, by the notation of coindexing: a subscript is assigned to "what" (say, i) and the same subscript to [NP e]. This empty NP will not be "pronounced" by the rules of phonetic interpretation, so the final spoken sentence will not reveal the empty category directly. The trace is to be understood as a kind of variable bound to "what," and semantic interpretation rules will now assign to "what" the meaning "for which thing," thus yielding the following representation: For what thing X, will John eat X. In this way the enriched surface structure (now called S-structure) will provide the format for semantic interpretation and retain the relationships, such as that between verb and object, that were transparently represented by deep structure (now called D-structure to avoid any confusion with the earlier approach). Questions regarding the interpretation of sentences such as "everyone in this room speaks two languages," which involve mainly the interpretation of the quantifier-like terms "everyone" and "two," are now determined via the operation of rules that operate on S-structure, deriving a new level of representation quite close to S-structure, but one that substitutes "for which thing X" for terms such as "what," binds traces considered as variables to their wh antecedents, interprets quantifiers, and so forth. This new representation, called LF (for logical form), completes the picture of the extended standard theory model, shown in Figure 2. Again, the diagram depicts simply the logical relationship among elements, not the temporal organization of a processing system that would use them.

Constraints. The second major shift in transformational grammar from the mid-1960s through the 1970s, pursued today with renewed effort, involved the role of constraints.
From the outset it was noted that the transformational rules for any particular grammar, say, for English, would have to be quite complex, with elaborate structural descriptions. For example, the simple rule given earlier to move a wh phrase to the front of a sentence,

(X, wh, Y) → (2, 1, 3)

will give the wrong result applied to the following example, even though the structural description of the rule matches:

I wonder a picture of   what   is on the table
          1               2          3

What I wonder a picture of ___ is on the table
since after several other transformations it eventually produces the incorrect sentence "what do I wonder a picture of is on the table." However, complicating the structural descriptions of rules leads to a large, complex rule system that becomes harder to learn, since it is difficult to explain why a child would posit a complex structural description rather than a simple one like moving a wh phrase to the front of the sentence. Starting about 1964 linguists began to formulate constraints that allowed one to retain and even simplify transformational rules like "front wh." These constraints were not
Base grammar phrase-structure rules
                |
          D-structures
                |
      Transformational rules
                |
     S-structures (with traces)
          /           \
     LF rules    Phonological rules
        |               |
Logical form (LF)   Phonetic form (PF)

Figure 2. Block diagram of extended standard theory (EST) (12,13).
part of any particular grammar, like English, but part of all human grammars. For example, the A-over-A principle, applying to all transformational rules, states that one cannot move a phrase of type A out of another phrase of type A. This prohibits the wh phrase rule from applying in the errant example above, since "what," considered an NP, cannot be moved out of the NP "a picture of." Further simplifications became possible when it was realized that many other particular transformations were just the result of such general principles operating in conjunction with just a few very simple, general transformations. As an example, in earlier work in transformational grammar there were many different rules that acted on NPs, among these a passive transformation, exemplified below:

John ate the ice cream.
The ice cream was eaten by John.

This rule could be written with the following structural description and structural change, moving the third element to the front, adding a past tense "be" form and altering the tense of the verb (details of the latter change being omitted here), and moving the subject after a "by" phrase to the end:

Structural description: (NP, V, NP) = (1, 2, 3)
Structural change: (3, be+en 2, by 1)
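Read as a string-to-string operation, the passive structural change (1, 2, 3) → (3, be+en 2, by 1) can be sketched as below; the participle table is a stand-in for the morphological details the text omits:

```python
# The classical passive transformation as a structural change:
# (NP, V, NP) analyzed as (1, 2, 3) becomes (3, be+en 2, by 1).
PARTICIPLE = {"ate": "eaten", "saw": "seen"}  # toy morphology for be+en

def passivize(np1, verb, np2):
    """Move the object to the front, insert the past-tense 'be' form,
    and move the subject into a trailing 'by' phrase."""
    return f"{np2} was {PARTICIPLE[verb]} by {np1}"

print(passivize("John", "ate", "the ice cream"))
# the ice cream was eaten by John
```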
Another transformational rule affecting NPs, called "raising," moves the NP of an embedded sentence to an empty position at the front of a sentence:

e seems John to like ice cream.
John seems e to like ice cream.

Given other, general, constraints, modern transformational theory shows that there is no passive or raising rule but just the interaction of the constraints with the single rule Move-NP. In addition, the rule moving wh phrases can be shown to have many properties in common with the Move-NP rule, so that in fact there is essentially just one transformational rule in the modern theory, namely, the rule Move-alpha, where alpha is any category (NP, wh phrase, etc.). There are no structural descriptions or structural changes in the statement of this single rule. It is therefore incorrect in the modern theory of transformational grammar to speak of a separate rule of passive or raising (13,14).
To give a simple example of some of these constraints and how they simplify the statement of rules, consider again the passive transformation. In modern terms, the structure underlying the surface passive form would be:

e seen Bill by John.

The modern theory assumes that there is a general principle requiring every NP that will be pronounced to have Case, where Case can be roughly thought of in traditional grammatical terms; for example, "him" has objective Case, "she" has nominative, and so on. This is called the case filter. Case is assigned by transitive, tensed verbs: nominative case to the subject, objective case to the object; case can also be assigned via a preposition to its object. Verbs with passive tense, like "seen," are in effect intransitive; therefore, they cannot assign case. The result is that unless "Bill" moves, it does not get case; therefore, the rule Move-alpha (where alpha = NP) applies, moving it to the empty subject position and obtaining the passive sentence form. Similarly, in

e seems John to like ice cream

"John" does not receive case because it is in a sentence without a tensed verb. Therefore, it must move to a position that does receive case, for example, the subject position. In both cases, movement to the subject position is not a property of the rule of passive or raising but is a side effect of the case filter along with the particular properties of verbs for a given language. Much current work is devoted to describing the variation from language to language in these constraints, dubbed parameters, so that the full range of human languages can be accounted for. For example, in Romance languages like Spanish and Italian, the object of a passive verb can still assign case, and so something like the surface form "was seen Bill by John" is permitted.
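The interaction of the case filter with Move-alpha can be caricatured in a few lines; the dictionary-based encoding of NPs here is hypothetical, invented for the sketch:

```python
# Caricature of the case filter driving NP movement: a pronounced NP
# without case must move to a position where case is assigned.
def case_filter_ok(nps):
    """The case filter: every NP to be pronounced must have Case."""
    return all(np["case"] is not None for np in nps)

def move_alpha(nps):
    """Move-alpha as the only repair: a caseless NP moves to the empty
    subject position, where the tensed clause assigns nominative."""
    for np in nps:
        if np["case"] is None:
            np["position"], np["case"] = "subject", "nominative"
    return nps

# Passive "seen" assigns no case, so the object of "e seen Bill by John"
# starts out caseless and the structure is filtered out:
nps = [{"phrase": "Bill", "position": "object", "case": None}]
print(case_filter_ok(nps))              # False
print(case_filter_ok(move_alpha(nps)))  # True: "Bill was seen by John"
```

Nothing in the sketch mentions "passive" or "raising" by name; movement to subject position falls out of the filter, which is the point of the modern analysis.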
Besides the case filter, in the modern theory of transformational grammar there are a variety of other constraints that interact with the rule Move-alpha to yield particular surface sentences. Among the most important of these are certain locality principles that limit the domain over which a phrase can be moved. One such principle, Subjacency, states that Move-alpha cannot move a phrase more than one sentence away from its starting position. This prohibits surface sentences like the following, where "John" is moved across two sentences ("it is certain" is one and "to like ice cream" is the other), and allows movement across just one sentence. Note that the permitted example also substitutes a dummy "it" into the subject position, a particular requirement in English but not in Romance languages like Italian or Spanish, that will not be covered here.

e seems e is certain John to like ice cream.
John seems it is certain e to like ice cream. (forbidden)
It seems John is certain e to like ice cream. (allowed)

X-Bar Theory. Besides these constraints on transformations, there have been significant restrictions placed on the system of base phrase structure rules (generating the set of D-structures). It was noted by 1970 that a (noun, verb, preposition, adjective) phrase consists of a head that is itself a noun, verb, preposition, or adjective, respectively, and its complements as determined by the head. For example, a verb such as "eat" takes optional NP complements corresponding to the thing eaten and an instrument used for eating: "John ate the ice cream with a spoon," while a verb like "know" takes either an NP or a sentence complement: "John knows Bill, John knows that Bill likes ice cream." Importantly, the same requirements show up in the noun forms of the verbs, if there is one: "knowledge that Bill likes ice cream, knowledge of Bill." This suggests that we need not write separate rules to expand verb phrases and NPs but one general template for all phrase types in a language. This subtheory of transformational grammar is called X-bar theory, after the idea that all phrases may be built on a single framework of type X, where X is filled by the values of word categories like verb, noun, or preposition to get verb, noun, and prepositional phrases. Following X-bar theory, modern transformational grammar stores the properties of words in a dictionary, along with the requirements that, in English, the complements follow the head and that all complement restrictions of a head as expressed in the dictionary be represented in the syntactic representation as well; this last constraint is called the projection principle. If this is done, then an elaborate set of phrase structure rules is not needed to generate D-structures; all that is needed is the set of dictionary entries plus the general constraints of X-bar theory and the projection principle.
Government-Binding Theory. The most recent version of transformational grammar, known as government-binding theory (12-14), incorporates all of the changes described above: X-bar theory and general principles instead of base phrase structure rules, and reduction of transformations to a single transformational rule Move-alpha and general constraints on its application (the case filter, Subjacency, etc.). In addition, a rich source of investigation in the government-binding theory centers on the rules that map from S-structure to LF, having to do with the relationship between traces and the phrases that "bind" them (in the sense of variable binding), and the constraints governing certain primitive configurational relationships, such as that between a verb and its complements, dubbed the notion of government. The resulting picture is quite different in detail from earlier work, as there are now no particular transformations; the bulk of research now focuses on discovering particular patterns of constraints or parametric variation from language to language in the way the case assignment, X-bar theory, or locality principles apply. Nonetheless, the underlying principle of the theory still stands: to describe all the possible sentences in a language by means of a factored set of representations plus mappings between these representations.

Alternative Theories. As mentioned earlier, this model is still quite controversial. The existence of representations like traces and constraints like the projection principle have been called into question, as well as the multilevel organization of the grammar as a whole. Competing approaches often assume that there is a single level of phrase structure, rather than a derivation from D- to S-structure. Other alternatives emphasize different representations that avoid the use of traces. The following two examples illustrate these alternatives.

Generalized phrase structure grammar (qv) (4) generates all possible surface structures via a single context-free grammar. For instance, there is a rule expanding declarative sentences roughly in the form S → NP-Auxiliary Verb-VP. Instead of a transformational rule of subject-auxiliary verb inversion, there is in effect a separate context-free rule expanding S as Auxiliary Verb-NP VP. The systematic correlation between inverted and noninverted forms is captured by an implicational statement, a metarule in a metagrammar, stating that if there is a rule in the noninverted form, then there is a rule in the inverted form. However, there is no notion of a derivation from a D-structure to an S-structure. Such theories are sometimes said to be monostratal because they posit just one level of syntactic structure, in contrast to a multiple-level approach like GB theory. In addition, the effect of transformational movement is accommodated by expanding the set of nonterminal names to record the link between a displaced element and the position from which it is interpreted in the predicate-argument sense. For example, in the sentence "who did John see," one can augment the sequence of phrases to record "who" as being interpreted in the position after "see" as follows:

Who [S/wh did John [VP/wh see [Wh/Wh e]]]

Here, the categories S/wh and VP/wh record the link between the position after "see" (marked with Wh/Wh) and "Who." Note that there is a phonologically empty element after "see," as there would be in the transformational analysis. The difference is that this is generated by a context-free rule rather than a transformation.

Lexical-functional grammar (qv) also avoids transformations but retains a multiple-levels approach (3). Instead of D-structure or S-structure, lexical-functional grammar proposes the representations of functional structure (F-structure) and constituent structure (C-structure). F-structure differs from D-structure in that it takes grammatical relations such as subject, object and oblique object, and object of a prepositional phrase as central primitives. C-structure, generated by context-free rules, gives a representation of phrasal and hierarchical relationships. A pairing of F-structures and C-structures associates grammatical relations like subject and object to phrasal elements. In this theory some variations in surface sentences, such as subject-auxiliary verb inversion, are generated directly by C-structure context-free rules, whereas other variations, like passive sentences, are produced by the operation of rules in the lexicon, which convert active verbs to their passive forms and alter the default association of grammatical relations with syntactic (C-structure) elements. Like generalized phrase structure grammar, lexical-functional grammar has been argued to provide a better representation for computational operations as well as a more adequate basis for describing languages that evidently do not depend on syntactic configurations to fix the association between verbs and their arguments like subject and object. This claim remains to be established.

Government-binding theory is a topic of current research and as such is undergoing continual change; for details of recent work the reader is urged to consult a survey such as that by Van Riemsdijk and Williams (13) or the journal Linguistic Inquiry.

Transformational Grammar and Computational Models

Several computational models have incorporated one or another version of transformational grammar. These may be divided into two sorts: those based on Aspects-style grammars, ca 1965, or even earlier versions of transformational grammar, and those based on post-Aspects models. In general, the more recent post-Aspects models have proved to be more adaptable to computational implementation, whereas the earlier versions did not prove very useful for direct computational analysis (16). Early approaches to using transformational grammars for sentence analysis adopted the model of Syntactic Structures or Aspects. In these models a sentence is generated by the operation of the base component followed by zero or more transformations. Sentence analysis is the reverse of this: A procedure must start with a sentence, such as "The ice cream was eaten by John," and then must try to determine how it could be derived given a particular context-free base grammar and set of transformations. If no transformations are involved, this problem reduces to that of standard context-free parsing, for which there are several known efficient algorithms. However, surface sentences that are derived by transformations, like the example just given, are not directly generated by the base but by a sequence of one or more transformations. Inverting this process may be difficult because transformations can delete some parts of a sentence and rearrange others, and certain other transformations may be optional. The problem is that transformations work only on trees, so to compute an inverse transformation, an algorithm must start with a tree. However, given an input sentence, the procedure does not have a tree but only a string of words. For example, in the sentence "The book which was written by the Boston author became a best seller," an Aspects theory might propose an optional transformation to delete "which was," yielding "the book written by the Boston author became a best seller." To analyze this sentence, a computer procedure must guess that material has been deleted and apply an inverse transformation.
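The guesswork involved in undoing the optional "which was" deletion can be sketched as follows; the participle test is deliberately crude, and nothing here is drawn from an actual implemented system:

```python
# Sketch of one inverse transformation: reinserting an optionally
# deleted "which was" before a plausible passive participle.
def undo_whiz_deletion(words):
    """Return every candidate obtained by reinserting 'which was';
    the surface string alone does not say which guess, if any, is the
    right one, which is the source of the nondeterminism."""
    candidates = []
    for i, w in enumerate(words):
        if w.endswith("en"):  # crude passive-participle guess
            candidates.append(words[:i] + ["which", "was"] + words[i:])
    return candidates

s = "the book written by the Boston author became a best seller".split()
for c in undo_whiz_deletion(s):
    print(" ".join(c))
# the book which was written by the Boston author became a best seller
```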
In general, since deleted material does not appear in the surface sentence and since the inverses of transformational rules may not be functions, this procedure is highly nondeterministic. One approach to solving the sentence analysis problem is purely enumerative. Given some transformational grammar and an input sentence, one can try to generate or synthesize all sentences one after the other. After each sentence is generated, it is checked against the input sentence to see if it matches; if it does, the sentence is recognized; if not, the procedure keeps going. This procedure, analysis by synthesis, is computationally expensive because in general one would have to generate many irrelevant candidate sentences that could not possibly be related to the actual input sentence. For instance, it makes little sense to analyze our example sentence "the ice cream was eaten" as an active sentence, but the analysis-by-synthesis procedure will blindly try to do so. In addition, the procedure was judged to be psychologically unrealistic, because it calls for the entire sentence to be read before any synthesis match is attempted. Because of these problems, analysis by synthesis was not considered a serious algorithm for transformational sentence analysis (16,17). Instead of enumerating all possibilities, the transformational parsers of the 1960s used a two-step procedure: First, analyze the sentence using a context-free grammar that generates a superset of the possible surface sentences produced by the transformational grammar. This gives a candidate set of trees to work with. The second step applies inverse transformations to these trees, checking to see if a tree that could have been generated by the base grammar is obtained.

The Petrick System. The most widely known algorithms built along these lines were developed in the mid-1960s by Zwicky and colleagues at Mitre (18) and by Petrick at MIT and then IBM (19). The Petrick system (19) was originally designed to automatically provide a parsing procedure, given a transformational grammar. A revised version of this system is part of a question-answering system earlier called REQUEST and now dubbed TQA (20,21). The original Petrick system contained a set of reverse transformations that mapped sentences to candidate deep structures. The idea was to have an automatic procedure to construct the reverse transformations given the structural descriptions and structural changes of the original transformational grammar. The deep structures were then parsed by a (context-free) base grammar component. The problem here is that the process of reversing transformations is still highly nondeterministic. For example, given the sentence "John was certain that Bill liked ice cream," such a parser would have to guess that "John" was originally moved from an embedded sentence, as it would be in "John was certain to like ice cream." To get around this difficulty, one must try to make the reverse transformations restrictive enough to cut down on the possibility of multiple reverse transformations applying to the same sentence. The current Petrick system, TQA, uses a restrictive set of reverse transformations that operate on trees rather than sentence strings, with results good enough for efficient question answering. For the most part, though, progress on transformational parsing was blocked by computational difficulties (21).

The Marcus Parser. The first modern transformational parser, based on the extended standard theory of the mid-1970s, was that of Marcus (22).
Marcus developed a computer program that would map from a surface sentence, such as "John was persuaded to leave," to its surface structure (S-structure) representation as defined by the extended standard theory, indicating phrase boundaries as well as traces. (The subscript i indicates coindexing.)

[S [NP Johni] [VP was persuaded ei [S ei [VP to leave]]]]

The Marcus parser PARSIFAL used a basically bottom-up design in conjunction with a set of production rules that took the place of reverse transformations. That is, it would wait until it had seen several words of the sentence and then build a piece of the analysis tree corresponding to the S-structure analysis of those words. Each rule, called a grammar rule, had a pattern and an action. An important element of the parser was the addition of a three-cell look-ahead buffer that could inspect upcoming words in the sentence in order to determine what action to take. The pattern was a triggering predicate that could look at part of the S-structure analysis already completed plus the words in the look-ahead buffer. The action would build a tiny piece of the output S-structure representation. For example, given the following sequence of elements in the input buffer:

was eaten

the Marcus parser could determine that a trace should be inserted after "eaten," thus undoing the effect of the Move-NP
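A PARSIFAL-style grammar rule, a pattern over the look-ahead buffer paired with an action, can be caricatured as below. This is not Marcus's actual rule notation; the function and its crude participle test are inventions of this sketch:

```python
# Caricature of a PARSIFAL-style grammar rule: a triggering pattern
# over the look-ahead buffer plus an action that builds a piece of
# S-structure, here dropping an NP trace after a passive participle.
def passive_rule(buffer, output):
    """Pattern: the buffer starts with a 'be' form followed by a
    passive participle.  Action: attach both words and insert a trace."""
    if len(buffer) >= 2 and buffer[0] == "was" and buffer[1].endswith("en"):
        output = output + [buffer[0], buffer[1], "[NP e]"]
        return buffer[2:], output
    return buffer, output  # pattern not met: rule does not fire

buf, out = passive_rule(["was", "eaten", "."], ["the ice cream"])
print(out)  # ['the ice cream', 'was', 'eaten', '[NP e]']
```

Because the decision is made by inspecting the buffer rather than by undoing string rewrites, no reverse transformation over trees is ever computed.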
rule while building the S-structure representation corresponding to this sentence. It is instructive to see how this design avoids the problems of early transformational parsers. The key problems with standard transformational parsing were: constructing candidate trees suitable for inverse transformations; correctly determining what elements had been deleted, if any; and guessing whether an optional transformation had been applied. The second problem is handled by relying on the extended standard theory representation. In this theory nothing can be deleted without leaving behind a trace, and there are other severe constraints that limit the appearance of traces (such as the locality principles and case filter described above). The first problem, constructing candidate trees, is also aided by building S-structures rather than D-structures. Since the S-structure representation looks very much like that produced by a context-free grammar, plus the addition of traces, it now becomes possible to just build phrase structure directly rather than performing string-to-string transformational inverses or first constructing a partial tree and then applying reverse transformations to it. Finally, the third problem, determining which transformational rule may have applied, is greatly alleviated by means of the look-ahead buffer. In most cases Marcus argued that this reduced the candidate choices down to just one; that is, the problem of mapping from an input sentence to an S-structure became deterministic. Those sentences where parsing remained nondeterministic included cases of obvious global ambiguity ("They are flying planes") or cases where people are not able to analyze the sentence deterministically (such as "The horse raced past the barn fell," so-called garden path sentences) (17,22). Because it was deterministic, the resulting parsing procedure was also quite efficient.
Large-scale versions of the Marcus parser are now being developed for several industrial applications, including speech recognition at Bell Laboratories. Marcus's model has also served as the basis for several more recent transformational models, some grounded more explicitly on X-bar theory, the case filter, and the like (12). Work on adapting the principles of government-binding theory to this design is currently under way.
BIBLIOGRAPHY
1. N. Chomsky, The Logical Structure of Linguistic Theory, University of Chicago Press, Chicago, IL, 1985.
2. F. Newmeyer, Linguistic Theory in America, Academic Press, New York, 1980. Provides an intellectual history of transformational grammar.
3. J. Bresnan, The Mental Representation of Grammatical Relations, MIT Press, Cambridge, MA, 1983.
4. G. Gazdar, E. Klein, G. Pullum, and I. Sag, Generalized Phrase Structure Grammar, Harvard University Press, Cambridge, MA, 1985.
5. D. Perlmutter, Studies in Relational Grammar, University of Chicago Press, Chicago, IL, 1985.
6. D. Johnson and P. Postal, Arc Pair Grammar, Princeton University Press, Princeton, NJ, 1980.
7. M. Brame, Base Generated Syntax, Noit Amrofer, Seattle, WA, 1978.
8. G. Lakoff, "On Generative Semantics," in D. Steinberg and L. A. Jakobovits (eds.), An Interdisciplinary Reader in Philosophy, Linguistics, and Psychology, Cambridge University Press, Cambridge, UK, 1971.
9. R. Hudson, Arguments for a Non-transformational Grammar, University of Chicago Press, Chicago, IL, 1971. Presents arguments against transformational approaches generally.
10. N. Chomsky, Syntactic Structures, Mouton, The Hague, 1957.
11. N. Chomsky, Aspects of the Theory of Syntax, MIT Press, Cambridge, MA, 1965. Summarizes transformational theory as of 1965.
12. A. Radford, Transformational Syntax, Cambridge University Press, Cambridge, UK, 1981. Gives a textbook introduction to the extended standard theory.
13. H. van Riemsdijk and E. Williams, Introduction to the Theory of Grammar, MIT Press, Cambridge, MA, 1986.
14. N. Chomsky, Lectures on Government and Binding, Foris Publications, Dordrecht, The Netherlands, 1982. The first full-scale treatment of the current theory of transformational grammar (ca 1985).
15. T. Winograd, Language as a Cognitive Process, Vol. 1, Addison-Wesley, Reading, MA, Chapter 4, 1983. Gives a short introduction to the aspects theory.
16. J. Fodor, T. Bever, and M. Garrett, The Psychology of Language, McGraw-Hill, New York, 1974. Gives psychological and computational studies of the relevance of transformational grammar up through the late 1960s.
17. R. Berwick and A. Weinberg, The Grammatical Basis of Linguistic Performance, MIT Press, Cambridge, MA, 1985.
18. A. Zwicky, J. Friedman, B. Hall, and D. Walker, The MITRE Syntactic Analysis Procedure for Transformational Grammars, Proc. 1965 Fall Joint Computer Conference, Thompson Books, Washington, DC, 1965.
19. S. Petrick, Transformational Analysis, in R. Rustin (ed.), Natural Language Processing, Algorithmics, New York, 1973.
20. W. J. Plath, "REQUEST: A natural language question-answering system," IBM J. Res. Dev. 20, 326-335 (1976).
21. F. Damerau, "Operating statistics for the transformational question answering system," Am. J. Computat. Ling. 7(1), 30-42 (1981).
22. M. Marcus, A Theory of Syntactic Recognition for Natural Language, MIT Press, Cambridge, MA, 1980.
23. K. Wexler and P. Culicover, Formal Principles of Language Acquisition, MIT Press, Cambridge, MA, 1980.
24. R. Berwick, The Acquisition of Syntactic Knowledge, MIT Press, Cambridge, MA, 1985. Gives a computer model of acquisition.

General References

Current research in transformational grammar may be found in the journals Linguistic Inquiry and The Linguistic Review. For opposing viewpoints consult the journals Natural Language and Linguistic Theory, Linguistics and Philosophy, and Language. Proceedings of the Linguistic Society of America Conference (LSA), Proceedings of the Meetings of the Chicago Linguistic Society (CLS), and Proceedings of the New England Linguistic Conference (NELS) are good sources for extremely recent work, both pro and con.

R. Berwick
MIT
GRAMMAR, WEB. See Grammar, phrase-structure.
GUIDON
An automated tutoring system for teaching about any domain representable by EMYCIN, GUIDON was written in 1979 by Clancey at the Stanford Heuristic Programming Project. GUIDON explores the problem of carrying on a coherent, task-oriented, mixed-initiative dialogue with a student by expert
HACKER
A program by Sussman that creates plans for solving problems in the "blocks world" (see G. J. Sussman, A Computer Model of Skill Acquisition, Elsevier, New York, 1975). HACKER's creation was guided by introspection of the human problem-solving process. HACKER is viewed as a programmer who first tries to find a solution to a given problem by looking into an "answer library." If no answer is available, the programmer tries to "write" a plan by adapting a known plan with a similar "activation pattern." A "criticizer" then looks for any bugs in the plan and tries to use "patches" to fix them. Skill acquisition is achieved by generalizing and reusing these patches. The implementation of HACKER is based on the CONNIVER (qv) language [see D. V. McDermott and G. J. Sussman, The CONNIVER Reference Manual, AI Memo 259, MIT AI Lab, Cambridge, MA (May 1972)].
J. Geller
SUNY at Buffalo
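The answer-library, criticize, and patch cycle described above can be caricatured in a few lines of Python. Everything here, including the plan representation, the single bug type, and the patch, is invented for illustration; Sussman's actual system was written in CONNIVER and is far more elaborate.

```python
# Toy sketch of HACKER's answer-library / criticize / patch cycle
# in a blocks-world setting (all names and structures are illustrative).

answer_library = {}   # goal -> debugged plan (the "answer library")
patch_library = []    # record of bugs patched, kept for reuse

def criticize(plan):
    """Criticizer: report a bug if a block is moved before being cleared."""
    for i, (op, block) in enumerate(plan):
        if op == "move" and ("clear", block) not in plan[:i]:
            return ("clear-before-move", block)
    return None

def apply_patch(plan, bug):
    """Patch the plan: insert the missing 'clear' step before the move."""
    kind, block = bug
    idx = plan.index(("move", block))
    return plan[:idx] + [("clear", block)] + plan[idx:]

def solve(goal, first_draft):
    """Try the answer library first; otherwise debug the drafted plan."""
    if goal in answer_library:
        return answer_library[goal]
    plan = first_draft                  # "written" by adapting a known plan
    while (bug := criticize(plan)) is not None:
        plan = apply_patch(plan, bug)
        patch_library.append(bug)       # remember what was patched, for reuse
    answer_library[goal] = plan         # skill acquired: reuse next time
    return plan

print(solve("A-on-B", [("move", "A")]))   # [('clear', 'A'), ('move', 'A')]
```

The loop mirrors the entry's description: library lookup, plan drafting, criticism, patching, and storing the debugged result so the skill is available on later problems.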
systems (see W. J. Clancey, Dialogue Management for Rule-Based Tutorials, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 155-161, 1979).
M. Tem
SUNY at Buffalo

HARPY
A speech-understanding (qv) system, HARPY was written by Lowerre in 1976 at Carnegie-Mellon University under the ARPA Speech-Understanding Research project. Understanding a sentence is realized as a transition of a precompiled path in a network of words, where each word is a template of all possible allophones [see B. Lowerre, The HARPY Speech Recognition System, Ph.D. Dissertation, Carnegie-Mellon University, Pittsburgh, PA, 1976, and B. Lowerre and R. Reddy, The HARPY Speech Understanding System, in W. Lea (ed.), Trends in Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, pp. 340-360, 1980].
A. Hanyong Yuhan
SUNY at Buffalo

HEARSAY-II
A speech-understanding (qv) system, HEARSAY-II was written by Lesser et al. in 1976 at Carnegie-Mellon University under the ARPA Speech-Understanding Research project. Asynchronous activation of different knowledge-source modules communicating through a blackboard allows island parsing [see V. Lesser, R. Fennell, L. Erman, and D. Reddy, "Organization of the HEARSAY-II speech understanding system," IEEE Trans. Acoust. Speech Sig. Proc. ASSP-23, 11-24, 1975, and L. Erman, F. Hayes-Roth, V. Lesser, and D. Reddy, "The HEARSAY-II speech understanding system: Integrating knowledge to resolve uncertainty," Comput. Surv. 12(2), 213-253, 1980].
A. Hanyong Yuhan
SUNY at Buffalo

HERMENEUTICS
Recent debates about the theoretical foundations of AI refer to hermeneutics, the branch of continental philosophy that treats the understanding and interpretation of texts. Applying certain hermeneutic insights, Dreyfus (1), Winograd (2), and Winograd and Flores (3) have questioned the functionalist cognitive-science paradigm that guides most contemporary AI research, particularly in natural-language processing (see Natural-language entries) and common-sense reasoning (see Reasoning, Commonsense). Dreyfus draws upon the hermeneutic philosophy of Heidegger (4) to deny the possibility of formalizing mental processes and therefore creating artificial intelligences. [In a personal communication (March 1986) Dreyfus indicated that he has recently moderated his views and now considers AI very difficult but not necessarily impossible.] Winograd and Flores reach a similar conclusion based on a hermeneutically informed technical argument. Yet, in addition to being a source of doubts, hermeneutics may illuminate problems like the nature of meaning and understanding and thereby help reconstruct the functionalist paradigm (2).
To help clarify the relevance of hermeneutics for AI research, this entry first reviews the major strains of hermeneutic thought. These positions include the naive hermeneutics of early modern Europe and Dilthey's (5) more historically conscious, nineteenth-century methodological hermeneutics, which sought to produce systematic and scientific interpretations by situating a text in the context of its production. In the twentieth century Heidegger's (4) and Gadamer's (6) philosophical hermeneutics shifted the focus from interpretation to existential understanding, which was treated more as a direct, nonmediated, authentic way of being in the world than as a way of knowing. Reacting to the relativism of this position, Apel (7) and Habermas (8) introduced critical hermeneutics, a
methodologically self-reflective and comprehensive reconstruction of the social foundations of discourse and intersubjective understanding. Finally, Ricoeur (9), in his phenomenological hermeneutics, attempted to synthesize the various hermeneutic currents with structuralism and phenomenology.
This background situates AI researchers and critics who draw from the various hermeneutic traditions. In their investigations of the affective structure of texts and in their concern with systematic rules for identifying the genres of narrative, Alker, Lehnert, and Schneider (10) in effect pursue a classical hermeneutical program tempered by phenomenological hermeneutics. Other researchers (2,11,12) draw from philosophical hermeneutics to propose strategies for developing computer systems that understand natural language. A third approach (3), aligned with philosophical hermeneutics, argues that computer understanding of natural language is exceedingly difficult and probably intractable. A fourth group (13) has developed an implementation guided in part by ideas from phenomenological hermeneutics but informed by the other variants as well.
Hermeneutic theories differ in several characteristic ways from approaches to meaning and understanding that are better known to AI researchers. Hermeneutics grounds the meaning of texts in the intentions and histories of their authors and/or in their relevance for readers. In contrast, analytic philosophy usually identifies meaning with the external referents of texts, and structuralism finds meaning in the arrangement of their words. Hermeneutics regards texts as means for transmitting experience, beliefs, and judgments from one subject or community to another. Hence the determination of specific meanings is a matter for practical judgment and commonsense reasoning, not for a priori theory and scientific proof.
This attitude reflects the origin of hermeneutics in ancient-world efforts to determine systematically the meaning, intent, and applicability of sacred and legal texts.
Hermeneutic theories and applications also share the idea of the hermeneutic circle, or the notion that understanding or definition of something employs attributes that already presuppose an understanding or a definition of that thing. Circles or spirals of understanding arise in interpreting one's own language, a foreign language, or an observed action, in confirming a theory, and in distinguishing between background knowledge and facts (14). The existence of these circularities raises questions for hermeneutics regarding the grounding and validity of understanding.
The philosophical concept of the hermeneutic circle resembles the distinctly computational notion of bootstrapping, a process that uses a lower order component (a bootstrap component) to build a higher order component that is used in turn to reconstruct and replace the lower order component. Bootstrapping has been introduced in the design of certain knowledge bases (15-17) and in AI-oriented theories of cognitive development (18-21) and should be distinguished from hierarchical layering in systems that do not include the "strange loop" of replacing the bootstrap component. The similarity of the hermeneutic circle and bootstrapping suggests the possibility of an important contribution from hermeneutics to AI architectures for natural-language processing and for commonsense reasoning.

Classical Methodological Hermeneutics

Origins. Hermeneutics as a general science of interpretation can be traced back to more domain-specific applications in
the ancient Greeks' study of literature and in ancient Biblical exegesis. The word hermeneutics was coined in the seventeenth century (22) on the basis of the Greek hermeneuein, "to interpret," which signified equally a declamation of a text, an explanation of a situation, or a translation from a foreign tongue. (Hermeneuein itself derived from the name of Hermes, the winged messenger god of ancient Greece, who both delivered and explained the messages of the other gods.) Regarding texts as organic or coherent wholes rather than as collections of disjointed parts, the Greeks expected a text to be consistent in grammar, style, and ideas. Accordingly, they codified rules of grammar and style that they used to verify and emend textual passages. By extending the logic of part and whole to a writer's or school's entire output, the Greeks were also able to attribute works with uncertain origin.
Although the Jewish Rabbis and the early Church Fathers deployed similar philological tools, their biblical exegeses were better known for the development of allegorical readings, frequently at the expense of the texts' literal meaning. Their interpretations found within the visible sign a hidden sense in agreement with the intention they beforehand ascribed to the text. Since instances of this method are found for the Vedas, Homer, the Koran, and other sacred writings, it seems a typical strategy for reconciling an enlightened or moral world-view with texts whose "outward" earthiness or banality seems beneath the dignity of the gods being celebrated (23).
The Middle Ages witnessed the proliferation of nonliteral interpretations of the Bible. Christian commentators could read Old Testament stories simultaneously as precursors of analogous episodes in the New Testament, symbolic lessons about Church institutions, and allegories about spiritual traits (24).
In each case the meaning of the signs was constrained by imputing a particular intention to the Bible, such as teaching morality, but these interpretive bases were posited by the religious tradition rather than suggested by a preliminary reading of the text. Thus, when Martin Luther argued that Christians could rediscover their faith by reading the Bible themselves, Catholic Church officials not surprisingly responded that the Bible was too obscure to read without their guidance. The Protestant exegesis, which appeared after Luther's translation of the Bible, tended to view the texts as responses to historical or social situations rather than expressions of theological principles. Assuming that the New Testament documented the Christian faith, one reader's guide proposed that contradictory statements and difficult passages in the New Testament could be clarified by comparing their possible meanings with contemporaneous Christian practices. The example suggests that interpretation might rely on empathetic understanding, the interpreter's self-projection into the author's space. Indeed, it was just such empathy that Schleiermacher and Dilthey raised to a methodological principle in their attempt to create a general hermeneutics.

Methodological Hermeneutics of Schleiermacher and Dilthey. Schleiermacher (25) proposed to join classical philology's focus on grammar and style and biblical exegesis' concern for themes, creating a general hermeneutics with principles independent of domain-specific interpretation principles. Schleiermacher compared the reader's approach to a text with the efforts by participants in a dialogue to understand each other, and he depicted the dialogue in terms of a speaker who puts together words to express his thoughts and a listener who understands this speech as part of a shared language and as
part of the speaker's thinking (26). The listener can comprehend the words and sentences because they are drawn from the language's lexicon and follow its grammatical rules, but the listener can also recognize the intentions behind the words by virtue of being in the same situation and sharing a common human nature with the speaker. Since Schleiermacher's concept of understanding includes empathy (projective introspection) as well as intuitive linguistic analysis, it is much richer than the idea in modern communication theories that understanding is merely the decoding of encoded information. Interpretation is built upon understanding and has a grammatical as well as a psychological moment. The grammatical thrust has a bootstrapping flavor: It places the text (or expression) within a particular literature (or language) and reciprocally uses the text to redefine the character of that literature. The psychological thrust is more naive and linear. In it the interpreter reconstructs and explicates the subject's motives and implicit assumptions. Thus Schleiermacher claimed that a successful interpreter could understand the author as well as, or even better than, the author understood himself because the interpretation highlights hidden motives and strategies.
Broadening Schleiermacher's hermeneutics, Dilthey (27) developed a philosophy of method for history and the human sciences that he believed could produce objective knowledge but avoid the reductionist, mechanistic, ahistorical explanatory schema of the natural sciences. Dilthey argued that texts, verbal utterances, works of art, and actions were meaningful expressions whose "mental contents" or intentions needed to be comprehended. He claimed that investigating human interactions was more like interpreting a poem or discourse than doing physics or chemistry experiments (5).
Dilthey termed the desired comprehension of events and expressions "understanding" (verstehen) and attempted to distinguish it from the explanatory knowledge (erkennen) generated by the hypothetico-deductive method of the natural sciences. Dilthey initially followed Schleiermacher in identifying understanding as empathy guaranteed by the notion of a common human nature. Although he recognized that the outlook and values of people varied over different historical periods and cultures, Dilthey argued that because historians themselves thought and acted, they could relive and understand what people in the past were trying to express and accomplish in their writings, speeches, actions, and art. Nevertheless, many of his contemporaries criticized this position because it relied on introspection and an underspecified, noncritical psychology. Stung by this criticism and influenced by the neo-Kantian idea that works of art and literature embodied the formal values of their respective periods, Dilthey revised his position. He began to emphasize that texts and actions were as much products of their times as expressions of individuals, and their meanings were consequently constrained by both an orientation to values of their period and a place in the web of their authors' plans and experiences. In this revision meanings are delineated by the author's Weltanschauung, or world-view, reflecting a historical period and social context. Understanding (verstehen), the basis for methodological hermeneutics, involves tracing a circle from text to the author's biography and immediate historical circumstances and back again. Interpretation, or the systematic application of understanding to the text, reconstructs the world in which the text was produced and places the text in that world. [See Dilthey (5) for a sampling of Dilthey's writings on history and
the human sciences and Ermarth (28) and Plantinga (29) for their discussion.] This circular process precludes an interpretation of a text from being unique and scientifically objective, like the explanation of a chemical reaction, inasmuch as knowledge of the author's or agent's world may itself critically depend on the present interpretation (14). Dilthey and his recent followers, Hirsch (30) and Betti (31), claim, however, that interpretations become more valid as they assimilate more knowledge about the author and the author's values, instead of reflecting the interpreter's own values or sense of reality. Dilthey's method in effect bootstraps from a whole (a biography, a set of works) whose themes may be repeatedly respecified through the elaboration of one of its parts (the action or work). The process eventually reaches stability because successive interpretations of the work or action serve to constrain subsequent refinements in the background model of the author. The strength and validity of such constraints depends on the currency and robustness of that model. Increases in temporal and cultural distance between the speaker and interpreter decrease the reliability of interpretation, but this neither forecloses the possibility of such a model nor denies the potential for a valid interpretation.

Philosophical Hermeneutics

Heidegger's Ontological Hermeneutics. In Being and Time (4) Heidegger undermines the notion of objectivity in Husserl's phenomenology (32) and, by extension, in methodological hermeneutics. [Schmitt (33) and Zaner (34) present concise overviews, and Ricoeur (35) provides an extensive analysis of phenomenology (qv).] Husserl argues that objective interpretation is possible using his transcendental phenomenological method, which requires bracketing the subjectivity inhering in the interpreter's life-world (Lebenswelt), the world of personal experience and desires.
Heidegger denies that this bracketing is possible. He claims instead that the understanding of a situation is directly mediated by a foreknowledge, or sensitivity to situations, that is comprised by the understander's life-world. Therefore, suspending that life-world would preclude the possibility of understanding altogether. Heidegger reaches his conclusion by contending that, as a necessary part of human "being-in-the-world" (Dasein), things are perceived according to how they are encountered and used in one's everyday routines and tasks. Perception and apprehension thus move from foreknowledge to an existential understanding, a largely unreflective and automatic grasp of a situation that triggers a response. This understanding must be incomplete because Dasein is both historical and finite. It is historical in that understanding builds from the foreknowledge accumulated from experience. It is finite due to "thrownness," the necessity of acting in situations without the time or ability to grasp the full consequences of actions or plans in advance. Only when actions fail to meet the exigencies of the situation and "breakdown" occurs do individuals stand back and assume the theoretical attitude of science, which sees things "objectively," as discrete objects separate from the self and resistant to one's will.
Heidegger brings hermeneutics from a theory of interpretation to a theory of existential understanding. He "depsychologizes" hermeneutics by dissociating it from the empathetic perception of other beings. Understanding now appears as a
no-longer-conscious component of Dasein; it is embedded within the context of specific situations and plans, with, in effect, finite computational resources. Therefore, interpretation (Auslegung) that depends on such existential understanding (Verstehen) is not the general logical method found in classical philology but refers to a conscious recognition of one's own world. Dilthey's methodological hermeneutic circle is consequently supplanted by the more fundamental ontological hermeneutic circle, which leads from existential understanding situated in a world to a self-conscious interpretive stance. This self-consciousness, however, cannot escape its limitations to achieve a transcendental understanding in the sense of Hegel (36,37), who considered rationality the ability to reflectively accept or reject (transcend) the received sociocultural tradition (38). According to this reading of Heidegger, foreknowledge is accumulated over time and constrains successive exercises of existential understanding. But self-conscious understanding cannot choose which elements in the experience-based foreknowledge are respecified in the bootstrapping process. Green (39) presents a concise overview of Heidegger's contributions to philosophy. Steiner (40) and Palmer (22) provide accessible introductions to Heidegger's thought. Murray (41) contains an informative collection of essays discussing Heidegger's thought.

Gadamer's Philosophical Hermeneutics. In his philosophical hermeneutics Gadamer (6) follows his teacher Heidegger in recognizing that the ties to one's present horizons, one's knowledge and experience, are the productive grounds of understanding. However, Gadamer argues that these limits can be transcended through exposure to others' discourse and linguistically encoded cultural traditions because their horizons convey views and values that place one's own horizons in relief.
[This position remedies what Green (39) contends is Heidegger's failure to show how the historicity of the individual relates to the history of a broader community.] He stresses the role of language in opening the subject to these other subjectivities and their horizons. In forcefully stressing the role of language in constituting traditions, Gadamer places language at the core of understanding. Gadamer's (42) position approximates the hypothesis advanced by the American linguists Sapir (43) and Whorf (44), which holds, in its strong version, that the individual's language partially determines his or her conceptual system and world-view. According to the Sapir-Whorf hypothesis, complete translation between languages is impossible, and understanding another language requires complete immersion accompanied by a change in thinking.
Consequently, understanding for Gadamer does not scientifically reconstruct a speaker's intention but instead mediates between the interpreter's immediate horizon and his emerging one. For Gadamer, understanding is bound and embedded in history because understanding deploys the knower's effective-history, or personal experience and cultural traditions, to assimilate new experiences. Thus, the initial structure of an effective-history constrains the range of possible interpretations, excluding some possibilities and calling forth others. As effective-history constitutes the prejudices brought to bear in understanding, it simultaneously and dialectically limits any self-conscious attempts to dissolve those prejudices. Gadamer thus explicitly opposes the scientific ideal of prejudiceless objectivity in interpretation. In this respect, he moves beyond Heidegger, who regarded so-called scientific objectivity as a derivative of existential understanding. Gadamer does not deny the importance of either scientific understanding or critical interpretation, a form of interpretation that introspectively questions assumptions unreflectively inherited from cultural traditions. His focus on the human context of knowledge emphasizes the need for repeated attempts at critical understanding, through which people can gain the insight needed to correct their prejudices. But if prejudices may be individually overcome, their fact is inescapable. It imposes a priori limitations on the extent to which a self-reflective methodology can eliminate distortions from scientific inquiry. The critical self-consciousness of a rational agent who introspectively questions received traditions may counter distorting consequences of effective-history, but it at best only leads to successive approximations of objectivity. Gadamer's position prompts the philologists Betti (31) and Hirsch (30) to complain that its relativism destroys all bases for validating an interpretation and so defeats the purpose of interpretation. Social theorist Habermas (45) also criticizes Gadamer's relativism.
The resulting theory of meaning differs from the methodological hermeneutics of Schleiermacher and Dilthey, which identifies the meaning of a text with its author's intentions and seeks to decipher the text by uncovering the world-view behind it. For Gadamer, understanding recreates the initial intention embodied in the text by elucidating the subject matter that the text addresses (its aboutness). The process moves the text beyond its original psychological and historical contexts and gives it a certain "ideality" of meaning, which is elaborated in a dialogue between the interpreter and the text.
The dialogue is grounded in the concern the interpreter and the author share toward a common question and a common subject matter. In confronting a viewpoint reflecting a different set of horizons, the interpreter can find his own horizons highlighted and reach critical self-consciousness. In seeking the key question, the interpreter repeatedly transcends his own horizons while pulling the text beyond its original horizons until a fusion of the two horizons occurs. The interpreter's imagination can also play a role in the dialogue with texts and carry the understanding of the subject matter beyond the finite interpretation realized in methodological hermeneutics. Nevertheless, the interpretations are constrained by the questions posed since each question calls forth frameworks within which the subject matter must be understood. The meaning of a text then is not fixed but changes over time according to how it is received and read. Thus, for Gadamer, to understand is to understand differently than the author or even one's own earlier interpretations precisely because the process involves creating new horizons by bootstrapping from the old horizons they replace. But the notion of bootstrapping in Gadamer moves beyond the one in Heidegger because Gadamer allows prejudices to come into a conscious focus that may direct their individual supersession.
Gadamer does not merely work through Heidegger's philosophical program. He also redirects philosophical hermeneutics along partly Hegelian lines by appropriating substantial parts of the Hegelian transcendental philosophy that Heidegger eschewed (46). Gadamer's concepts of the openness of language and the ability of people to transcend their interpretive horizons are based on Hegel's dialectic of the limit, in which the recognition of limits constitutes the first step in transcending them. The concept of understanding as a concrete fusing of horizons is derived ultimately from Hegel's idea that every new achievement of knowledge is a mediation, or a refocusing of the past within a new, present situation (47), which attempts to explain mind and logic on the basis of the dialectical resolution of more basic and antithetical concepts (36). As each opposition is resolved, the resulting synthesis is found to be opposed to yet another concept, and that opposition must also be dialectically resolved. This purely subjective and continual unfolding interacts with and is conditioned by experience, particularly the experience of language, which tends to mold the developing subject in conformity with the traditions encoded in linguistic utterances and in the language itself. However, Gadamer clearly resists Hegel's notion of the self-objectifying, transcendental subject. Instead, he views the logical and ontological categories with which Hegel marks the unfolding of thought as distillations of the logic inherent in language, particularly the German language, whose use as a tool for speculative philosophy Hegel brought to perfection (48). This view affirms the relativist position that thought and reason are always determined by the historical traditions of a linguistic community (49).

Critical Hermeneutics

Strategic Orientation. Heidegger's and Gadamer's critique of objectivity was particularly challenging for social theorists because empirical social science and normative social theory depend ultimately on the characterization of events and situations. At a minimum, the practical need to assess truth-claims and interpretations had to be reconciled with the critique of objectivity. Apel (50) and Habermas (8,51) sought the means for the reconciliation in conjoining methodological hermeneutics with ordinary language philosophy.
Their point of departure was the critique of ideology originated by Marx, which argues that beliefs and ideas reflect the holders' social class interests. (Although implying that an objective social reality might ultimately be described, this view also helps explain conflict in beliefs among members of the same society.) Armed with it, Apel and Habermas could conceive of a hermeneutically inspired analysis of communication in linguistic communities. Thus, just as Heidegger's ontological hermeneutic concentrates on the individual's apperception of experience, from the inside out, critical hermeneutics concentrates on individuals situated in groups, from the outside in.

Apel and Habermas argue that of the three divisions of the study of language (syntax, semantics, and pragmatics), only the first two have been adequately studied by the school of ordinary language philosophy descending from Wittgenstein (52). They believe that no account of human understanding can be believed if explained as a theory about a single, asocial, and ahistorical being. On the contrary, understanding may only be explained by reference to the social and historical setting in which understanding occurs and in the discursive or dialogical situation in which communication takes place. Truth and meaning do not await discovery but are negotiated by actors who come to consensus on issues of truth and meaning through social discourse. This perspective may be contrasted with the first principles of research programs, such as Chomsky's (53-55), which seek to explicate language use and language learning on the basis of an examination of a monological model of the competence of an ideal speaker-hearer abstracted from his social situation (7). Although studies of
syntax and semantics are surely necessary for an adequate grasp of the human linguistic faculty, they are by no means sufficient. Any adequate understanding of language, Habermas (56,57) asserts, must be grounded in the practical purposes for which speakers use language.

Universal Pragmatics. To provide such grounding, Habermas (56) proposed a universal pragmatics (see Ref. 58 for a short overview and discussion), the primary task of which is the identification and reconstruction of the necessary preconditions for the possibility of understanding in discursive communication. Turning to ordinary language philosophy, he attempts this reconstruction by linking Austin's (59) and Grice's (60) notions of felicity conditions underlying discourse to Searle's (61) theory of speech acts and to a consensus theory of truth, which holds that truth claims are resolved through reasoned discussion culminating in consensus. Habermas does not confine universal pragmatics to the analyses of language and speech. Rather, because he sees language as the medium in which all human action is explicated and justified, he intends "universal pragmatics" as the groundwork for a general theory of social action. The resulting critical hermeneutics holds that intersubjective communication is possible despite differences in the participants' preunderstandings, because the participants in effect posit as an ideal the attainment of a consensus (concerning the validity of statements). The desired consensus is free from constraints imposed on them by others and from constraints that they might impose on themselves. That is, a participant posits a situation in which all participants can freely try to convince others (or be convinced by them) and in which all have equal opportunity to take dialogue roles. Participation in dialogue thus admits the possibility of reinterpreting and changing the perceived situation.
Habermas and Apel term this idealization the ideal speech situation and consider it the participants' emancipatory interest: the situation of freedom to which they aspire. This ideal might never be attained, but even to approach it, the participants must overcome systematically distorted communication, which suppresses and conceals the speakers' interests. According to Habermas, these distortions are produced by the division of labor and disguise its correlated structure of domination. Habermas turns to a Freudian psychotherapeutic model to prescribe treatment for the pathological consequences of the systematically distorted horizons produced under these conditions. According to him, the task of the social theorist is to act as therapist, encouraging citizens (patients) to reject internalizations of distorted institutional arrangements (class domination). For Habermas, then, understanding involves compensating for these distortions, and interpretation requires an account of how they were generated.

The Habermas-Gadamer Debate. Gadamer (62) attacks Habermas's position by pointing out that the psychotherapist or social theorist is not immune from the preunderstandings of tradition and that these preunderstandings are not themselves necessarily free of distortion. Gadamer sees Habermas's effort as part of the traditional social-scientific goal of attaining "objective" knowledge of the social realm. Habermas (45) appears to believe that the social theorist, like Schleiermacher's interpreter, can understand the social actor better than the social actor understands himself. That is beyond belief for Gadamer, given his notion of ontological preunderstanding. For his part, Habermas sees Gadamer as too ready to submit to the authority of tradition and too reticent to offer any methodological considerations (apart from the exceedingly abstract notion of "interpretive horizons"), thereby giving unwitting support to positivist degradations of hermeneutics. In reply to Gadamer's claim that prejudices are inescapable, Habermas insists that a self-reflective methodology can overcome prejudices and that an objective social theory can be approached by bootstrapping from an initial understanding of society. Habermas argues that the systematic distortions in communication that bias an initial understanding of society can be analyzed and reduced using generalization from empirical knowledge of society, quasi-causal explanation (deductive verification), and historical critique. To build this comprehensive social theory, Habermas must provide a theory of knowledge grounded in

1. a general theory of communicative action;
2. a general theory of socialization to explain the acquisition of the competence that underpins communicative action;
3. a theory of social systems to show the material constraints on socialization and their reflection in cultural traditions; and
4. a theory of social evolution that allows theoretical reconstruction of the historical situations in which communicative action obtains.

But this move apparently fails to counter Gadamer's objection since the theoretical tools used to forge this theory may themselves be subject to interpretations other than Habermas's that vary across the cultural traditions of social interpreters. McCarthy (63,64) reviews the debates, discusses various problems in Habermas's position, and provides a systematic rendition of Habermas's arguments. Ricoeur's proposed resolution of this debate is discussed below.

Theory of Communicative Action. Gadamer's objections notwithstanding, Habermas has embarked on a multivolume statement of a comprehensive social theory centered on communicative action.
In the first volume Habermas (65) concentrates on the connection between the theory of universal pragmatics and the general theory of action descending from Weber (66) through Parsons (67) to Schutz (68) and Garfinkel (69). His strategy is to align the various types of communication, their inherent truth claims, and their counterparts in rational action. Cognitive communication, in which the correspondence of terms to objects and events is at issue, has its rational action counterparts in instrumental and strategic action. These types of action are oriented toward success and are validated by instrumental reason, which reflects on the efficacy of plans as means to desired ends. Habermas ties interactive communication, in which claims to moral rightness and appropriateness are thematized, to normatively regulated action, in which the norms of a community and the social roles of actors become important constraints on the perceived appropriateness of actions. Finally, Habermas links expressive communication, in which the truthfulness of communicative actions is thematized, to dramaturgical action, which focuses on the fact that actors respectively constitute a public for each other. Dramaturgical action attends to phenomena involving each actor's presentation of the self to others (70), to those aspects of the actor's subjectivity he chooses to reveal to others and to those he chooses to conceal. These revelations and concealments are, in turn, important factors that rational actors must assess when interpreting the actions of others and when planning their own.

Phenomenological Hermeneutics

Faced with the diversity of hermeneutics, and other continental philosophies including structuralism and phenomenology, Ricoeur strives for a grand synthesis in his phenomenological hermeneutics. For his interpretation of earlier hermeneuticists, see Ref. 71.

Ricoeur (72) argues that phenomenology and hermeneutics presuppose each other. The connection between hermeneutics and phenomenology traces to Heidegger, who took the term "hermeneutics" from Dilthey to distinguish his own philosophical investigation of everyday being from Husserl's transcendental phenomenology, which tried to achieve objective knowledge by suspending concern for the subject's life-world. To capture knowledge of that world, Heidegger retained Husserl's notion of eidetic phenomenology, which assumes immediate registration of phenomena in a picturelike but uninterpreted manner. Like Heidegger, Ricoeur also follows Husserl to eidetic phenomenology, but like the later Heidegger and, particularly, Gadamer, Ricoeur recognizes the ontological basis of understanding in language. For Ricoeur, then, the subject's being is not identical with immediate experiences. So, instead of attempting a direct description of Dasein like Heidegger (4) and Merleau-Ponty (73,74), Ricoeur sees the need for a hermeneutic theory of interpretation to uncover the underlying meaning constituting Dasein.
Through its emphasis on the prelinguistic, eidetic phenomenology supplies a means of distancing observation from linguistic descriptions and their implicit preconceptions. This distanciation (75) is precisely what is required for interpretation to proceed. Since the task of uncovering the underlying objectivity cannot be achieved through the suspension of subjectivity, Ricoeur concludes that Husserl's project of transcendental phenomenology can only be realized through the application of a methodological hermeneutics to an eidetic phenomenology.

Ricoeur also argues that structuralism and hermeneutics can be complementary approaches to analyses of language, meaning, and cultural symbolism, for reasons similar to those he advanced for the complementarity of eidetic phenomenology and hermeneutics. Structuralism refers to a mode of inquiry that inventories elements of a system and notes the grammar of possible combinations. It is exemplified by Saussurean linguistics and Levi-Strauss's anthropology (76). Ricoeur finds that the value of structuralist analysis lies in its ability to catalogue phenomena and describe their possible (grammatical) combinations, but its weakness lies in its inability to provide anything more insightful than behavioral descriptions of closed systems. Nevertheless, the ability to generate structural descriptions complements the hermeneutic method, which interprets these descriptions by assigning functional roles to the phenomena.

In his treatment of psychoanalysis, particularly the interpretation of dreams, Ricoeur (77) shows the complexity involved in the hermeneutic task of assigning functional roles to words and symbols. The analyst must develop an interpretive system to analyze the dream-text and uncover the hidden meanings and desires behind its symbols, particularly those that have multiple senses (polysemy). Allowing for the possibility of multiple levels of coherent meaning, hermeneutics aims at ascertaining the deep meaning that may underlie the manifest or surface meaning. Ricoeur distinguishes two approaches for getting at the deeper meaning: a demythologizing one that recovers hidden meanings from symbols without destroying them (in the manner of the theologian Bultmann) and a demystifying one that destroys the symbols by showing that they present a false reality (in the manner of Marx, Nietzsche, and Freud). The demythologizers treat the symbols as a window into a sacred reality they are trying to reach. But the demystifiers treat the same symbols as a false reality whose illusion must be exposed and dispelled so that a transformation of viewpoint may take place, as, for example, in Freud's discovery of infantile illusions in adult thinking. Thus, there are two opposing tendencies, a revolutionary and a conservative hermeneutics. Whereas the critical hermeneutics of Apel and Habermas falls within revolutionary demystification, the phenomenological hermeneutics of Ricoeur and the philosophical hermeneutics of Gadamer fall in the more conservative camp of the demythologizers.

Ricoeur (78) attempts a dialectical resolution of the Habermas-Gadamer debate by arguing that the hermeneutics of tradition and the critique of ideology require each other. He denies the alleged antinomy between the ontology of tradition, which limits possible meanings (Gadamer), and the eschatology of freedom, which seeks to transcend these constraints (Habermas). If, as Gadamer believes, understanding should be conceived as the mediation between the interpreter's immediate horizons and his emerging horizon, the interpreter must distance himself to some degree if he hopes to understand the text. That is, when confronted with a text, the interpreter must adopt a stance of critical self-understanding not unlike the stance adopted in the critique of ideology. Hermeneutics thus incorporates a critique of ideology.
Likewise, the critique of ideology incorporates tradition. The ideal of undistorted communication and the desire for emancipation do not begin with Habermas. They arise from a tradition: from the tradition of the Greek conception of "the good life," from the Exodus, and from the Resurrection. Thus, the interests voiced by Gadamer and Habermas are, in Ricoeur's view, not incompatible. One is an interest in the reinterpretation of traditions from the past, and the other is the utopian projection of a liberated humanity. Only when they are radically and artificially separated, argues Ricoeur, does each assume the character and tenor of ideology.
The Hermeneutic Arc: Ricoeur's Theory of Interpretation. Ricoeur's theory of interpretation (79) seeks a dialectical integration for Dilthey's dichotomy of explanation (erklaren) and existential understanding (verstehen). Ricoeur begins by distinguishing the fundamentally different interpretive paradigms for discourse (written text) and dialogue (hearing and speaking). Discourse differs from dialogue in being detached from the original circumstances that produced it: the intentions of the author are distant, the addressee is general rather than specific, and ostensive references are absent. In a surprising move, Ricoeur extends his theory of interpretation to action, arguing that action evinces the same characteristics that set discourse apart from dialogue. A key idea in Ricoeur's view is that once objective meaning is released from the subjective intentions of the author, multiple acceptable interpretations become possible. Thus, meaning is construed not just according to the author or agent's world-view but also according to its significance in the reader's world-view.

Ricoeur's hermeneutic arc combines two distinct hermeneutics: one that moves from existential understanding to explanation and another that moves from explanation to existential understanding. In the first hermeneutic, subjective guessing is objectively validated. Here, understanding corresponds to a process of hypothesis formation based on analogy, metaphor, and other mechanisms for "divination." Hypothesis formation must not only propose senses for terms and readings for text but also assign importance to parts and invoke hierarchical classificatory procedures. The wide range of hypothesis formation means that possible interpretations may be reached through many paths. Following Hirsch (30), explanation becomes a process of validating informed guesses. Validation proceeds through rational argument and debate based on a model of judicial procedures in legal reasoning. It is therefore distinguished from verification, which relies on logical proof. As Hirsch notes, this model may lead into a dilemma of "self-confirmability" when nonvalidatable hypotheses are proposed. Ricoeur escapes this dilemma by incorporating Popper's notion of "falsifiability" (80) into his methods for validation, which he applies to the internal coherence of an interpretation and the relative plausibility of competing interpretations.

In the second hermeneutic, which moves from explanation to understanding, Ricoeur distinguishes two stances regarding the referential function of text: a subjective approach and a structuralist alternative. The subjective approach incrementally constructs the world that lies behind the text but must rely on the world-view of the interpreter for its preunderstanding. Although the constructed world-view may gradually approximate the author's as more text is interpreted, the interpreter's subjectivity cannot be fully overcome. In contrast, Ricoeur sees the structuralist approach as suspending reference to the world behind the text and focusing on a behavioral inventory of the interconnections of parts within the text. As noted earlier, the structural interpretation brings out both a surface and a depth interpretation. The depth semantics is not what the author intended to say but what the text is about, the nonostensive reference of the text. Understanding requires an affinity between the reader and the aboutness of the text, that is, the kind of world opened up by the depth semantics of the text. Instead of imposing a fixed interpretation, the depth semantics channels thought in a certain direction. By suspending meaning and focusing on the formal algebra of the genres reflected in the text at various levels, the structural method gives rise to objectivity and captures the subjectivity of both the author and the reader.

Like the other traditions, Ricoeur's hermeneutic arc can be interpreted as a bootstrapping process. Because it grounds the bootstrapping in an eidetic phenomenology, incorporates an internal referential model of the text, and begins interpretation with a structural analysis, Ricoeur's theory of interpretation may be easier to envision in computational terms. But the central bootstrapping engine in his theory is the alternation between forming hypotheses about meanings and validating those hypotheses through argument. This view resonates strongly with computational ideas about commonsense reasoning (qv). Indeed, these ideas lead Ricoeur to identify metaphor as the main source of semantic innovation (81,82) and linguistic evolution, and therefore as a major question for hermeneutics (83). For an excellent overview and comparison
of the treatments of language and cognition found in phenomenological hermeneutics and in other nonhermeneutical traditions of philosophy, see Dallmayr (84).

Hermeneutics as Metascience

The hermeneutic tradition provides a basis for prescribing and criticizing the conduct of inquiry and the development of knowledge in the natural, social, and cognitive sciences (qv). Its representatives have figured prominently in debates concerning how valid knowledge can be acquired and whether there is a need for a separate methodology in the social sciences. Since AI is a new discipline, occupying a middle ground between the natural and social sciences, its researchers can benefit from knowledge of these debates. The choice of the appropriate methodology for inquiry in AI research remains unsettled for such areas as natural-language processing, human problem solving, belief systems, and action. On one hand, the substantial contributions to AI from logic, mathematics, engineering, and the natural sciences, like physics, seem to make their strategies for inquiry uncontested. On the other hand, when the subject matter is clearly linked to the human sciences, particularly linguistics, anthropology, and psychology, methods devised for those areas might be more appropriate.

Hermeneutics and the Social Sciences. Dilthey distinguished the natural sciences from the cultural and social sciences (Geisteswissenschaften) on the basis of their objects and the appropriate means for knowing them. The natural sciences concerned phenomena that, opaque to thought, could only be studied from the "outside" through observation of uniformities in their behavior and through the construction of causal laws to explain those uniformities. In contrast, the human sciences had objects such as texts, verbal expressions, and actions that could be investigated from the "inside" through an understanding of their authors' experiences and intentions.
An interpretive or hermeneutic methodology could more reliably and intelligibly account for these objects by reconstructing the internal cognitive processes that motivated and gave meaning to each of them. The use of hypothetico-deductive methods employed in the natural sciences could only capture the external correlations among these objects at some high level of abstraction. Dilthey's arguments were embraced in the early twentieth century by many social scientists, including the sociologist Weber (66), whose paradigmatic studies of social institutions interpreted human behavior as intentional action, structured by the agents' goals and beliefs. However, the physics model of the social sciences also persists and is currently manifested in such techniques as Skinnerean stimulus-response modeling of human behaviors and statistical content analysis, which determines the meaning of texts through frequency counts of their words.

Contemporary hermeneuticists, such as Apel (85,86), Habermas (8), and Ricoeur (9), strengthen Dilthey's distinction by noting that in the human sciences the subject of investigation and the investigator can communicate with each other. This equality suggests that an appropriate methodology will resemble discussions in which members in a community justify their actions. The tools of the natural sciences are simply incapable of representing the key concepts in such discussions, namely motivation, belief, and intention, and the complexity
of their interactions. Intentional actions are embedded in groups of varying size and are constrained by (re-)created rules and norms: sociocultural traditions. Because of the complexity of these intertwined and mutually defining webs of relationships, scientific access to them is difficult, and "uncertainty principles" abound. These involve the difficulties of isolating the object of study from its milieu and preventing changes that communication between the investigator and the subject produces in the subject. These conditions support the notion that cultural and social studies have the role of clarifying the beliefs, plans, motivations, and social roles that led cognitive agents to produce their texts and actions. The inquiry becomes a "dialogue" through which the inquirer comes to understand the tradition in which the author or agent is embedded, so that the inquirer may either accept or repair the tradition, as Gadamer demands, or even reject it, as Habermas permits. Phases of understanding may be alternated with phases of validating knowledge, as Ricoeur's hermeneutic arc suggests, or of seeking explanations of opaque behaviors, as suggested in Apel's model of psychoanalysis. In any event, hermeneutic studies are inherently interactive and produce self-understanding. In this way they extend the original mission of hermeneutics to mediate cultural traditions by correcting misreadings or distortions.

Logical positivists have nevertheless rejected the claims for a separate method for social and cultural sciences as groundless challenges to their own program of creating a unified scientific method based on an unambiguous observation language (87).
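The "statistical content analysis" technique mentioned above makes the contrast concrete: it approaches a text purely from the "outside," reducing its meaning to frequency counts of its words. A minimal sketch of such a count (the function name and the sample sentence are invented for illustration):

```python
from collections import Counter
import re

def content_analysis(text: str, top_n: int = 3):
    """Crude 'outside' method: characterize a text by the raw frequency
    of its words, ignoring the author's intentions and situation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)

# Invented sample text, purely illustrative.
sample = "tradition shapes understanding; understanding reshapes tradition"
print(content_analysis(sample))
# -> [('tradition', 2), ('understanding', 2), ('shapes', 1)]
```

From the hermeneutic standpoint criticized here, such counts can at best correlate surface features of texts; they say nothing about the motivations, beliefs, and traditions that gave the words their meaning.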
Abel (88), Hempel, and others argue that empathetic understanding and the attribution of rule following are psychological heuristics, unverifiable hunches, or intuitions based on personal experience. Although Abel concedes that they may be useful in setting up lawlike hypotheses for testing, he concludes that they are neither necessary nor sufficient to constitute a human science.

There are several rebuttals to these claims. First, methodological hermeneutics, which Dilthey initiated and Betti (31) and Hirsch (30) continue, holds that an interpretation can be "objective" and "valid," if not verifiable, provided the investigator resists temptations to make the text relevant for her own practical affairs. This strategy regards the text as an embodiment of the values of its time and suspends credibility regarding its truth and acceptability according to present standards. But knowledge of values expressed in other texts and records from the period are allowed to constrain the possible interpretations. Second, the idea of an interpretive or hermeneutic social science has received indirect support from ordinary language philosophy, an analytic philosophy that eschews the mentalism to which the logical positivists so strenuously object. The support comes from the sociologist Winch (89), who generates recommendations for a social science on the basis of the later Wittgenstein's analysis (52) that particular word use and discourse patterns ("language games") reflect and constitute activities in semi-institutionalized, functional areas of life ("life-forms"). Winch contends that the analysis of social actions (both verbal and nonverbal) has a necessarily holistic, situation-oriented, interpretive character rather than a generalizing, explanatory one: "Understanding . . . is grasping the point or meaning of what is being done or said. This is a notion far removed from the world of statistics and causal laws: it is closer to the realm of discourse and to the internal relations that link the parts of . . .
a discourse" (90). Third, philosophical hermeneutics is not concerned with verifiable accounts, and, as noted above, it denies the possibility of objective knowledge. Instead, it argues that only a person who stands in history, subject to the prejudices of his age, can hope to understand it. A valid understanding of an event, interaction, or text is one that bridges history or sociocultural differences to highlight the inquirer's situation. By this standard, Winch's recommendations are not hermeneutic because they are based on the idea of ahistorical language games. They do not recognize that interpretation includes both "translation" and "application," that is, the mediation between the disintegrating and the emerging language-games, on one hand, and the revitalization of the past and its assimilation into the present life-form, on the other hand (85).

Hermeneutics and the Natural Sciences. Kuhn's influential The Structure of Scientific Revolutions (91) developed a hermeneutics of the natural sciences by portraying them as historically embedded, linguistically mediated activities organized around paradigms that direct the conceptualization and investigation of the objects of their studies. Scientific revolutions occur when one paradigm replaces another and introduces a new set of theories, heuristics, exemplars, and terms. The notion of a paradigm-centered scientific community consequently seems analogous to Gadamer's notion of a linguistically encoded social tradition. Kuhn (92) reports that his own development toward this idea began with his distress over Aristotle's theory of motion and the eventual discovery that Aristotle meant by "motion" something other than what the word signified in Newtonian mechanics. This effort corresponds closely to a programmatic definition of hermeneutics as the study of human actions and texts with the view to discover their meaning in order to understand them, agree with them, or even amend them (87).

Debates around Kuhn's thesis have spurred often grudging concessions that data, facts, and lawlike relations are theory-dependent rather than verifiable, coherent, and independent of the scientific theories in which they are embedded (93). Noting the inescapable theory dependence of observational sentences and the incommensurabilities across paradigms, Feyerabend (94,95) reaches the radical conclusion that no methodological standards can legitimately be applied. He therefore advocates a "methodological anarchism" that proceeds from the slogan "in science, anything goes!" Feyerabend's doubts about the possibility of interparadigm communication closely resemble Gadamer's doubts regarding the accessibility of alien traditions.

Putnam (96), however, argues that Feyerabend conflates concepts with conceptualization. According to Putnam, communication across paradigms does not require that the concepts be the same across paradigms but only that members of one paradigm make their ideas intelligible to members of another paradigm. They can do so provided the fundamental mechanisms of conceptualization are the same across paradigms (language communities). According to Putnam, the mechanisms of conceptualization must be universal and a priori or empirical experience would not be possible. But making ideas intelligible across paradigms can require rederiving the concepts upon which a paradigm's theories rely as well as reconstructing the grounds for those concepts, and so on, recursively. Thus, interparadigmatic communication accordingly requires a "critique of ideology" similar to the one proposed by Apel and Habermas. Apel (85) clarifies this process of reconstructing paradigms from first principles when he notes that justifications for scientific statements ultimately rely on a common ground in ordinary language statements. This common ground, the "communicative a priori," provides procedural norms regarding the admissibility of evidence and the validity of argumentation. Thus, despite paradigmatic differences, scientific discourse can still reach a consensus, and avoid arbitrariness or dogmatism, by falling back on principled argument stated in ordinary language.

Notion of an Emancipatory Science. The hermeneutics tradition also provides the methodological starting point for Marx's critique of ideology, Freud's psychoanalysis, and other studies that seek human emancipation by dissolving obsolete, restrictive, and self-denying traditions and practices. Their initial strategy is to unmask the justifications given for these practices as distortions of the actors' true needs and the conditions of the situation. Yet, hermeneutic understanding will not reveal why the actors accept these justifications. In presenting psychoanalysis as the paradigmatic emancipatory science, Apel (50,86) emphasizes that human beings cannot fully know their own motives or the intentions in their expressions. Consequently, empathy and introspection need to be supplemented by a quasi-naturalistic turn that applies the causal analysis of natural science to the actor's behavior. Any resulting explanations can then be fed back to the actor and appropriated as self-knowledge.

As mentioned earlier, Gadamer and Habermas debated the validity of rejecting past traditions, especially in regard to the critique of Western political and social institutions. Gadamer considers this move incoherent and ungrounded since it rejects the very tradition, including the value of rational, noncoerced consensus, that the investigator must accept to begin the explication. In response, Habermas (51) and Apel (7) claim that the preference for reason and understanding (hermeneutics) is not just arbitrary or an inherited prejudice dependent on grounding in the Western cultural tradition. Instead, they assert that a communicative a priori underlies all communication and that speech (and speechlike action) entails validity claims: to be meaningful, it must be grammatical as well as appropriate and sincere. Since these validity claims imply a process for reaching agreement, the act of speaking itself commits the speakers to prefer reason.

Hermeneutics in AI

Thus far, few AI researchers have incorporated ideas from hermeneutics into their computational models of understanding and interpretation. Instead, hermeneutics has provided a fertile source of arguments for doubting the possibility of the "hard AI" project, creating true artificial intelligences that can pass the Turing test (qv), which can be thought of basically as the ability to converse in natural language just like a human. Nevertheless, as AI interest in action theory and social interaction deepens, researchers will need to glean the insights of hermeneutics and their cognitive foundations if their programs are to adequately mirror social phenomena. Efforts that fail to consider the variability of meaning according to the intentions and histories of actors as well as the perceptions of observers will not solve the difficult questions of understanding, and may not even perform very well in microworlds.
HERMENEUTICS
371
Analyzing the Affective Structure of Text. Alker, Lehnert, and Schneider (10,97) present a bottom-up model for extracting the affective structure of a text. Their "computational hermeneutics" builds from Lehnert's earlier work on "plot units" (98,99). Plot units provide an unvalidated but nevertheless interesting vocabulary for designating affective relationships and their combinations. In this research they are used to describe many emotional consequences for participants in events and actions. Working within "conceptual dependency" theory (100), Lehnert identified various combinations of plot units for use in summarizing narrative texts. These "story molecules" relate changes in actors' affects to successes and failures in the resolutions of problems involving them. In their work Alker, Lehnert, and Schneider manually reduced passages from Toynbee's retelling of events leading up to Christ's crucifixion to a large number of these molecules. The molecules were interrelated through the actors involved and by virtue of some molecules being antecedent conditions for others. After the input of these manual reductions, the central subgraph of the plot structure was computationally extracted, using a program for finding the most strategic and highly connected molecules. This central subgraph was labeled the "essential" Jesus story.

After studying this affective core, Alker, Lehnert, and Schneider concluded that the Jesus story involves an ironic victory born from adversity and conforms to a well-known genre, the romance of self-transcendence. Their method resembles classical hermeneutics in seeking to uncover the essential structure of a text based on systematic linkages between the parts and the whole and in emphasizing the use of explicit rules for objective interpretation. However, their willingness to tolerate multiple interpretations and their structuralist orientation also aligns them with phenomenological hermeneutics. Alker, Lehnert, and Schneider suggest that the Jesus story has been emotively potent because it provides a step-by-step account of affective change in self-transcendence and thus can open its readers to the experience of this process. In its present form, however, this work does not implement a bootstrapping process even though ironically the theme of self-transcendence presupposes a mechanism capable of consciously directed bootstrapping.

What Does it Mean To Understand Natural Language? Winograd (2) uses insights primarily from philosophical hermeneutics to sketch a new approach to natural-language understanding (qv). He intends to overcome the pitfalls of earlier approaches that succumbed to the phenomenological critique advanced by Dreyfus (1). Focusing on the theory of meaning, Winograd argues that previous efforts, including his own SHRDLU (qv) (101), fell into the trap of "objectivism," or the misplaced belief that the contents of a theory or model correspond directly to reality (the correspondence theory of truth). [Prior (102) provides a concise overview of the correspondence theory of truth, which holds that the structure of theoretical knowledge corresponds to reality.] Winograd adds that the deductive nature of the formalisms used by AI researchers forced them to adopt an objectivist position but that these formalisms failed to account for the informal, phenomenological knowledge or experience that an understander deploys when interpreting utterances. Hermeneuticists identify this problem as the historicity of understanding or the role of background knowledge in mediating understanding. Moreover, these deductive formalisms are subject to Heidegger's ontological critique of Husserl. Their failure to address the fundamental ontology of language typified by the conversational situation leads to an inability to account for the role of context in speaker-hearer identification of the intended meanings of utterances (2). Thus, Winograd supports the Heideggerian critique with arguments and examples drawn from ordinary language philosophy (59,61,103,104). In a vein reminiscent of Gadamer (qv), he argues that making sense of a statement requires knowing how it is intended to answer (implicit or explicit) questions posed by the conversational context. He concludes that deductive logic can account for only a small fraction of human reasoning, and therefore new advances in natural-language understanding require "a calculus of natural reasoning" (105).

Winograd proposes knowledge-representation language (KRL) (qv) (106) as a starting point for an alternative approach. KRL's reasoning based on limited computational resources captures Heidegger's thesis of the finitude of Dasein and also echoes Simon's notion of "bounded rationality" in the theory of decision making (107). For Winograd, effective reasoning strategies under limited or variable computational resources provide a "natural reasoning," which, although formally incomplete, can account for more of everyday natural-language usage than can the small fraction that fits the mold of a complete deductive logic (105). Moreover, this approach must have the ability to deal with partial or imprecise information if it is to work at all. Winograd proposes a control structure that uses matching of the current processing context to trigger actions appropriate for the situation. This view of situated action, in which situational responses are unreflective, resembles the concept of "thrownness" as developed by Heidegger. The combination of situated action as a control structure and resource-limited reasoning grounded in commonsense, stereotype-based reasoning (2) resonates with recent work on analogy (108-110), precedential reasoning (111), and metaphor (21,112). At its core KRL also incorporates a notion of bootstrapping similar to the one found in the various hermeneutic traditions, particularly in the works of Heidegger and Gadamer.

Winograd argues that spurious reification, or misplaced concreteness, has plagued earlier efforts to develop a formalism for representing natural language. Spurious reification occurs when a competence is imputed to an understander, not because the understander actually employs the specified competence in performance, but because the observer classifies performances as instances of a particular competence and then mistakenly imputes the competence to the understander. Instead of building from domain-level concepts and structures, Winograd attempts to avoid spurious reification by constructing formal representations based on ontological considerations borrowed from methodological hermeneutics (113). Since no substantial AI project has been attempted using KRL, the ideas that its designers hoped to capture remain more theoretical than practical.

In discussing hermeneutics, Winograd not only proposes a new research program for AI but also problematizes the philosophical basis of current natural-language research. Fundamental assumptions and philosophical orientations underlying research must now be explicitly analyzed and justified. In rejecting "objectivism," Winograd advocates a "subjectivist" hermeneutical position that builds from Maturana's (114) notion of the nervous system as "structure determined," plastic, and closed. According to this model, activities outside the system (stimuli) perturbate the structure of the system, and these perturbations in turn lead to "patterns of activity that are different from those that would have happened with different perturbations." Winograd's parallel notion of understanding posits a closed system in which preunderstanding evolves through acts of interpretation. As in Heidegger's hermeneutic circle, the possible horizons that can be understood are constrained by the historically determined structure of preunderstanding or set of stored schemas (2). Understanding is open to the environment only within this range. Unlike Heidegger, who recognized the importance of the environment but failed to analyze it, Winograd is led to the analysis of the environment by several influences. These include Garfinkel's (69) ethnomethodology, which emphasizes social context, Searle's focus on speech as social action, and Lakatos' (115) argument that even in mathematics the meanings of terms are contingent on outside context. Winograd (2) grounds his theory of meaning in terms of social action, and so takes a position close to critical hermeneutics, between relativism and objectivism.

Stimulated in part by Winograd (2), Bateman (11,12) examines the consequences of Heidegger's existential phenomenology and agrees with Dreyfus (1) that this philosophy denies the possibility of modeling thought and action using the specific formalizations proposed by the functionalist paradigm of cognitive science. Bateman says these formalisms are based on the "ontological assumption" of an interpreter who follows rules in acting upon a mental representation of a situation.
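The central-subgraph extraction used in the Alker, Lehnert, and Schneider study discussed earlier in this section, finding the most strategic and highly connected story molecules, can be approximated with elementary graph centrality. The sketch below is only a toy reconstruction under stated assumptions: the molecule names and links are invented, and degree-based ranking stands in for whatever connectivity measure their actual program used.

```python
from collections import defaultdict

def central_subgraph(molecules, links, keep=3):
    """Rank plot-unit molecules by connectivity and return the most
    highly connected ones together with the links among them."""
    degree = defaultdict(int)
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    # The most highly connected molecules form the extracted "core".
    core = set(sorted(molecules, key=lambda m: -degree[m])[:keep])
    core_links = [(a, b) for a, b in links if a in core and b in core]
    return core, core_links

# Toy molecules interrelated through shared actors and antecedent
# conditions (invented for illustration, not the authors' data).
molecules = ["betrayal", "trial", "crucifixion", "mourning", "resurrection"]
links = [("betrayal", "trial"), ("trial", "crucifixion"),
         ("crucifixion", "mourning"), ("crucifixion", "resurrection"),
         ("betrayal", "crucifixion")]

core, core_links = central_subgraph(molecules, links, keep=3)
```

On this toy graph the three best-connected molecules and the links among them form the extracted core, analogous to the "essential" story.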
Heidegger's notion of "being-in-the-world," which includes both situatedness and understanding as ontological modes, precludes the subject-object dichotomy in this assumption. Since one is always in a situation, and its structure and significance are determined by its relevance to one's plans and purposes, no context-free representation is possible. Bateman, however, does not dismiss the possibility of a functionalist paradigm for cognitive science. He wants instead to ground it on the later Heidegger's idea of language, which, according to Bateman, seeks to make intelligible the experience of "being-in-the-world" as it is for "anyone," that is, for a generalized subject or member of a language community. As a collective artifact, a language is considered to encode partially the history of the language community through both the admissible and inadmissible combination (association) of words and phrases. The resulting connotational structure captures a kind of collective background knowledge and imposes a priori constraints on the actions of individuals who contemplate actions in terms of the language. In Halliday's "systemic grammar" (116) there is the notion of a "social semiotic" that acknowledges that a group's culture can restrict the possible meanings of utterances through constraints on possible ways of acting in situations. Bateman considers this orientation compatible with the hermeneutic view and believes that "systemic grammar," with appropriate revisions, can provide an adequate theoretical framework for natural-language understanding. Yet despite this openness to social constraints, Bateman does not consider hermeneuticists who came after Heidegger, most notably Gadamer and Habermas.

Foundations of Understanding. In a more recent work Winograd and Flores (3) draw upon philosophical hermeneutics and Maturana's (117) work on the biology of cognition to deny the possibility of constructing intelligent computers. They argue that to the extent Heidegger and Gadamer make a persuasive case that certain features of human existence are fundamental, the quest for intelligent machinery is quixotic. These concepts include "thrownness," "blindness," and "breakdown." "Thrownness" denotes that people are thrown into the situations of everyday life and rarely have time to reflect on alternative courses of action. They cannot be impartial, detached observers of the world in which they live, but they must decide and act using heuristics they have as part of their effective-histories. Although these heuristics enable some action possibilities, the same heuristics also "blind" people to other action possibilities that might have predominated had their effective-histories been different. When faced with situations where their effective-histories fail to provide an adequate guide for action and also "blind" them to those actions that support their purposes, people experience a kind of "breakdown." In breakdown, actions become problematic, and tools that had been previously taken for granted are perceived in isolation as objects. If an expert system (qv) is designed to present a user with possible courses of action in particular situations, the concepts of "thrownness," "blindness," and "breakdown" also come into play. Although expert systems may operate successfully in well-understood, constrained domains, expert systems in complex domains may be "thrown" into situations where they cannot evaluate all possible actions, and they consequently "break down."
Systems targeted at complex domains must therefore rely on heuristic rules, but these may "blind" the program to more propitious courses of action. Winograd and Flores add that the expert-system programmer introduces his own "blindness," or preconceptions, into the program. Because of these difficulties, Winograd and Flores recommend reformulation of the goals of artificial intelligence. Instead of directing efforts toward the putatively impossible goal of creating machines that can understand, programs should be designed to serve as tools for enhancing the quality of life. This could be done by recognizing the role of such programs in the web of conversations (speech acts) that constitute social existence, by attempting to minimize the "blindness" they engender, and by anticipating the range of their potential "breakdowns."

Winograd and Flores present a reasoned critique of two specific categories of AI research. The first comprises AI approaches that incorporate rigidly fixed means of interpretation, such as much work in knowledge-based systems. The second category includes those approaches that proceed from the dualist presumption that truth, meaning, and reference are established by means of a correspondence between entities in the world and entities in the mind (the correspondence theory of truth) rather than in the everyday discourse of intelligent agents. Although they acknowledge that learning approaches might eventually be able to address the criticisms they raise, they do not expect progress in learning during the near term. Thus, their work amounts to a critique of the tractability of the "hard AI" project. As such, it constitutes a continuation of the critique of AI begun by Dreyfus (1) but differs in that it comes from within AI and is argued in more computational terms.

However, Winograd and Flores fail to demonstrate convincingly that computer understanding exceeds the range of the possible. They only demonstrate that the goal is much more difficult than many people, including many AI practitioners, may have thought. Unfortunately, Winograd and Flores unfairly characterize as "blind" those AI approaches that come closest to overcoming their objections, such as Winston's (108,110,118) approach to learning and reasoning by analogy. Winograd and Flores misconstrue Winston's approach as capable of producing results only because it operates in a microworld with primitives fixed in advance by the implementors. Although this criticism may be leveled fairly at many AI programs, Winston's program is in principle not so limited, precisely because it is not based on domain-specific primitives. Indeed, Winston's program is general enough to perform well in any domain because it processes linguistically derived data according to the data's form rather than specific content. Moreover, because it learns rules on the basis of its experience (the effective history over which it can draw analogies), Winston's program represents a first computational approximation of the basic hermeneutic notion of a preunderstanding grounded in effective-history.

Grounding Meaning in Eidetic Phenomenology. Mallery and Duffy (13) present a computational model of semantic perception, the process of mapping from a syntactic representation into a semantic representation. Some computational and noncomputational linguists (100,119,120) advocate determining equivalent meanings (paraphrases) through the reduction of different surface forms to a canonicalized semantic form comprised by some combination of semantic universals (e.g., "conceptual-dependency" primitives).
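The claim that Winston's program processes linguistically derived data according to the data's form rather than specific content can be illustrated with a minimal structural matcher. This is not Winston's representation or algorithm, only a sketch of the general idea: a precedent's entities are treated as variables, so a mapping succeeds whenever the relational form lines up, regardless of content. (Winston's own examples drew on Shakespearean plots; the relation vocabulary here is invented.)

```python
def match_by_form(precedent, situation):
    """Map the entities of a precedent onto a new situation so that
    relations line up by form (relation names), ignoring what the
    entities actually are.  Returns a binding map or None."""
    def extend(facts, bindings):
        if not facts:
            return bindings
        rel, a, b = facts[0]
        for rel2, x, y in situation:
            if rel2 != rel:
                continue  # relation names must match by form
            if bindings.get(a, x) != x or bindings.get(b, y) != y:
                continue  # entity already bound to a different one
            result = extend(facts[1:], dict(bindings, **{a: x, b: y}))
            if result is not None:
                return result
        return None
    return extend(list(precedent), {})

# Precedent: a Macbeth-style plot expressed as relations over entities.
precedent = [("persuades", "LadyMacbeth", "Macbeth"),
             ("kills", "Macbeth", "Duncan")]
# A new story with entirely different content but the same form.
situation = [("persuades", "Iago", "Othello"),
             ("kills", "Othello", "Desdemona")]

bindings = match_by_form(precedent, situation)
```

Because only the form is consulted, the same matcher would map the precedent onto any domain whose facts exhibit the same relational structure, which is the sense in which such a program is not confined to one microworld.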
Mallery and Duffy reject this view on the grounds that most meaning equivalences must be determined in accordance with the specific linguistic histories of individual language users (or at least linguistic communities based on social groups) and the intentional context of the utterance. Their alternative is lexical-interpretive semantics, an approach to natural-language semantics that constructs semantic representations from canonical grammatical relations and the original lexical items. On this view, semantic representations are canonicalized only syntactically, not semantically or pragmatically. Instead of relying on static equivalences determined in advance, lexical-interpretive semantics requires meaning equivalences to be determined at their time of use, reference time. To meet this requirement, Mallery and Duffy introduce the concept of a meaning congruence class, the set of syntactically normalized semantic representations conforming to the linguistic experience of specific language users and satisfying their utterance-specific intentions. Meaning equivalences are then given by the meaning congruence classes to which utterances belong. Lexical-interpretive semantics differs from approaches relying on semantic universals because meaning equivalences are determined dynamically at reference time for specific language users with individual histories rather than statically in advance for an idealized language user with a general but unspecific background knowledge.

The major assumption underlying lexical-interpretive semantics is that meaning equivalences arise because alternative lexical realizations (surface forms) accomplish sufficiently similar speaker goals to allow substitution. Determining meaning congruences in advance, based on static analysis, is hopelessly intractable. This follows from the need to predict in advance all potential utterance situations, intentional contexts, and combinations of language-user effective-histories. Although semantic canonicalization on the basis of a general "semantic and pragmatic competence" renders static analyses of language-user combinations tractable by fiat, it also reduces nuances so dramatically that intentional analysis and individual linguistic histories play a drastically diminished role.

Lexical-interpretive semantics is hermeneutic because it emphasizes interpretation based on the individual effective-history of language users and the specific intentional structure of communicative situations. By virtue of its emphasis on innovation in language and polysemy, lexical-interpretive semantics is perhaps most closely aligned with the phenomenological hermeneutics of Ricoeur (72). Interpretation builds from an eidetic level of representation, the syntactically normalized semantic representation. The determination of meaning congruence classes becomes an early level of a more general and open-ended hermeneutic interpretation.

Stimulated by recent debates about perception (121,122), Mallery and Duffy consider semantic perception to be a process of mapping from sense-data, in this case natural-language sentences, to a semantic representation. But instead of providing an account of perception suited to a theory of meaning based on semantic universals like Feigenbaum and Simon (122), Mallery and Duffy provide one suited to a hermeneutic theory of meaning. Mallery and Duffy have implemented this theory, up to the level of eidetic representation, in the RELATUS Natural-Language System (123).
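The idea that meaning equivalence is computed at reference time, relative to a particular language user's linguistic history, rather than compiled statically into universal primitives, can be shown with a toy model. The representation and the per-user "equivalence history" below are invented for illustration; they are not the RELATUS implementation.

```python
def eidetic_form(verb, subj, obj):
    """Syntactic canonicalization only: active and passive variants of
    the same verb-argument structure share one representation."""
    return (verb, subj, obj)

def congruent(u1, u2, user_equivalences):
    """Decide at reference time whether two eidetic forms fall into the
    same meaning congruence class *for this language user*."""
    v1, s1, o1 = u1
    v2, s2, o2 = u2
    same_verb = (v1 == v2 or (v1, v2) in user_equivalences
                 or (v2, v1) in user_equivalences)
    return same_verb and s1 == s2 and o1 == o2

# Two users with different linguistic histories.
alice = {("buy", "purchase")}   # treats these verbs as interchangeable
bob = set()                     # has no such equivalence

u1 = eidetic_form("buy", "Kim", "book")       # "Kim bought a book"
u2 = eidetic_form("purchase", "Kim", "book")  # "A book was purchased by Kim"
```

The same pair of eidetic forms falls into one meaning congruence class for Alice but not for Bob; nothing about the equivalence is fixed in advance of asking the question for a particular user.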
Although they share some of the hermeneutically oriented views and concerns articulated in Winograd (2) and Bateman (11,12), their implementation allows more concrete specification and testing of their theory, which currently focuses on earlier processing levels. For example, Mallery and Duffy (13) have proposed constraint-interpreting reference (124) as a model that conforms to lexical-interpretive semantics, just as discrimination nets are well suited to approaches relying on semantic primitives (122,125-127). They ground this choice both in the available experimental psycholinguistic evidence and in the desirable computational properties of reference based on constraint interpretation. These properties include maximizing monotonicity (minimizing backtracking) in the syntactic processing that precedes reference and optimizing subgraph isomorphism (search) as it arises in reference and in other reasoning operations, particularly commonsense reasoning grounded in analogy.

Conclusions

This entry has presented hermeneutics primarily as a philosophy of understanding rather than as a set of technologies for interpretation in specific domains. As such, the hermeneutic tradition seems able to speak to AI researchers in two distinct ways. First, hermeneutics provides some basis for arguing against the feasibility of the AI project, at least under its present dispensation. Whether represented by Dilthey's idea of empathetic understanding or Heidegger's idea of situated understanding, hermeneutics seems to have discovered a quality in the human situation that is vital for knowledge of others and oneself but has not yet been simulated mechanically. Because these doubts are generated from an ongoing intellectual tradition and because they refine some fairly common intuitions, they cannot easily be dismissed as "irrational technological pessimism." On the other hand, these doubts should stimulate attempts by AI researchers to overcome them, just as some doubts raised by Dreyfus (1) stimulated earlier research. At the very least, then, the insights of the various hermeneutical camps can be expected to receive increasing attention in the artificial intelligence community.

Second, hermeneutics can suggest constraints, orientations, and even criteria in the design of AI systems that are intended either to understand natural language or to represent knowledge of the social world. The lessons of this tradition are, however, equivocal. Dilthey, Heidegger, Gadamer, Habermas, Ricoeur, and others provide very different notions of what constitutes understanding and its grounding. Nevertheless, researchers who are aware of these debates might be more cognizant of the choices they make in their own designs. As a consequence, systems would not merely illustrate isolated and perhaps idiosyncratic theories about linguistic phenomena but would begin to support (or deny) major philosophical positions in ontology, epistemology, and philosophy of mind. But the generally precomputational nature of contemporary hermeneutics calls for specific formulations that can be tested computationally. Computational experimentation, an empirical philosophy, can then feed back into the reformulation and refinement of ideas about both hermeneutics and AI.
BIBLIOGRAPHY

1. H. Dreyfus, What Computers Can't Do: A Critique of Artificial Reason, W. H. Freeman, San Francisco, 1972. A 2nd edition with a new preface was published in 1979.
2. T. Winograd, "What does it mean to understand natural language," Cog. Sci. 4, 209-241 (1980).
3. T. Winograd and F. Flores, Understanding Computers and Cognition: A New Foundation for Design, Ablex, Norwood, NJ, 1986.
4. M. Heidegger, Being and Time, J. Macquarrie and E. Robinson (trans.), Harper & Row, New York, 1962. Originally published as Sein und Zeit, Neomarius Verlag, Tubingen, F.R.G., 1927.
5. W. Dilthey, "The Rise of Hermeneutics," T. Hall (trans.), in P. Connerton (ed.), Critical Sociology: Selected Readings, Penguin, Harmondsworth, U.K., pp. 104-116, 1976. Excerpted from W. Dilthey, "Die Entstehung der Hermeneutik," 1900, in W. Dilthey, Gesammelte Schriften, B. G. Teubner, Leipzig and Berlin, pp. 317-320, 323-331, 1923.
6. H. Gadamer, Truth and Method, Continuum, New York, 1975. Originally published as Wahrheit und Methode, Tubingen, F.R.G., 1960.
7. K. Apel, Towards a Transformation of Philosophy, G. Adey and D. Frisby (trans.), Routledge & Kegan Paul, London, 1980. Originally published in Transformation der Philosophie, Suhrkamp Verlag, Frankfurt am Main, F.R.G., 1972, 1973.
8. J. Habermas, Knowledge and Human Interests, J. J. Shapiro (trans.), Heinemann, London, 1972. Originally published in 1968.
9. P. Ricoeur, Main Trends in Philosophy, Holmes and Meier, New York, 1979. Reprinted from Main Trends in The Social and Human Sciences, Part II, UNESCO, New York, 1978; see Ref. 71.
10. H. R. Alker Jr., W. G. Lehnert, and D. K. Schneider, "Two reinterpretations of Toynbee's Jesus: Explorations in computational hermeneutics," Artif. Intell. Text Understand. Quad. Ric. Ling. 6, 49-94 (1985).
11. J. A. Bateman, Cognitive Science Meets Existential Phenomenology: Collapse or Synthesis? Working Paper No. 139, Department of Artificial Intelligence, University of Edinburgh, Edinburgh, April 1983.
12. J. A. Bateman, The Role of Language in the Maintenance of Intersubjectivity: A Computational Investigation, in G. N. Gilbert and C. Heath (eds.), Social Action and Artificial Intelligence, Gower, Brookfield, VT, pp. 40-81, 1985.
13. J. C. Mallery and G. Duffy, A Computational Model of Semantic Perception, AI Memo No. 799, Artificial Intelligence Laboratory, MIT, Cambridge, MA, May 1986.
14. W. Stegmuller, The So-called Circle of Understanding, in W. Stegmuller (ed.), Collected Papers on Epistemology, Philosophy of Science and History of Philosophy, Vol. 2, Reidel, Dordrecht, The Netherlands, 1977.
15. D. B. Lenat, AM: Discovery in Mathematics as Heuristic Search, in R. Davis and D. B. Lenat (eds.), Knowledge-Based Systems in Artificial Intelligence, McGraw-Hill, New York, pp. 1-227, 1982.
16. D. B. Lenat, "Eurisko: A program that learns new heuristics and domain concepts: The nature of heuristics III: Program design and results," Artif. Intell. 21, 61-98 (1983).
17. K. W. Haase, ARLO: The Implementation of a Language for Describing Representation Languages, AI Technical Report No. 901, Artificial Intelligence Laboratory, MIT, Cambridge, 1986.
18. J. Piaget, The Origins of Intelligence in Children, M. Cook (trans.), W. W. Norton, New York, 1952.
19. J. Piaget, Genetic Epistemology, Columbia University Press, New York, 1970.
20. G. L. Drescher, Genetic AI: Translating Piaget Into LISP, AI Memo No. 890, Artificial Intelligence Laboratory, MIT, February 1986.
21. M. Minsky, The Society of Mind, Simon & Schuster, New York, 1986.
22. R. Palmer, Hermeneutics: Interpretation Theory in Schleiermacher, Dilthey, Heidegger, and Gadamer, Northwestern University Press, Evanston, IL, 1969.
23. J. Bleicher, Contemporary Hermeneutics: Hermeneutics as Method, Philosophy, and Critique, Routledge & Kegan Paul, London, 1980.
24. B. Smalley, The Study of the Bible in The Middle Ages, 2d ed., Blackwell, Oxford, U.K., 1952.
25. F. Schleiermacher, in H. Kimmerle (ed.), Hermeneutik, Carl Winter Universitatsverlag, Heidelberg, F.R.G., 1959.
26. J. B. Thompson, Critical Hermeneutics: A Study in the Thought of Paul Ricoeur and Jurgen Habermas, Cambridge University Press, Cambridge, U.K., 1981.
27. W. Dilthey, in H. P. Rickman (ed.), Selected Writings, Cambridge University Press, Cambridge, U.K., 1976.
28. M. Ermarth, Wilhelm Dilthey: The Critique of Historical Reason, University of Chicago Press, Chicago, IL, 1978.
29. T. Plantinga, Historical Understanding in the Thought of Wilhelm Dilthey, University of Toronto Press, Toronto, 1980.
30. E. D. Hirsch Jr., Validity in Interpretation, Yale University Press, New Haven, CT, 1967.
31. E. Betti, Hermeneutics as The General Methodology of The Geisteswissenschaften, in Ref. 23, pp. 51-94. Originally published as Die Hermeneutik als allgemeine Methode der Geisteswissenschaften, Mohr, Tubingen, F.R.G., 1962.
32. E. Husserl, Ideas: General Introduction to Pure Phenomenology, W. R. B. Gibson (trans.), George Allen and Unwin, London, 1931. First published in 1913.
33. R. Schmitt, Phenomenology, in P. Edwards (ed.), The Encyclopedia of Philosophy, Vols. 5 and 6, Macmillan, New York, pp. 135-151, 1967.
34. R. M. Zaner, The Way of Phenomenology: Criticism as a Philosophical Discipline, Pegasus, New York, 1970.
35. P. Ricoeur, Husserl: An Analysis of His Phenomenology, E. G. Ballard and L. E. Embree (trans.), Northwestern University Press, Evanston, IL, 1967.
36. G. W. F. Hegel, The Philosophy of Mind, Part 3 of The Encyclopedia of the Philosophical Sciences, W. Wallace (trans.), Oxford University Press, Oxford, U.K., 1971. First published in 1830.
37. G. W. F. Hegel, The Science of Logic, Part 2 of The Encyclopedia of the Philosophical Sciences, W. Wallace (trans.), Oxford University Press, Oxford, U.K., 1975. First published in 1830.
38. P. Singer, Hegel, Oxford University Press, Oxford, U.K., 1983.
39. M. Green, Martin Heidegger, in P. Edwards (ed.), The Encyclopedia of Philosophy, Vols. 7 and 8, Macmillan, New York, pp. 457-465, 1967.
40. G. Steiner, Martin Heidegger, Penguin, New York, 1980.
41. M. Murray, Heidegger and Modern Philosophy: Critical Essays, Yale University Press, New Haven, CT, 1978.
42. H. Gadamer, Man and Language, in D. E. Linge (ed. and trans.), Philosophical Hermeneutics, University of California Press, Berkeley, pp. 59-68, 1976.
43. E. Sapir, Selected Writings of Edward Sapir, University of California Press, Berkeley, 1947.
44. B. Whorf, Language, Thought and Reality, MIT Press, Cambridge, MA, 1967.
45. J. Habermas, A Review of Gadamer's Truth and Method, in F. R. Dallmayr and T. A. McCarthy (eds.), Understanding and Social Inquiry, University of Notre Dame, Notre Dame, pp. 335-363, 1977. Originally published in Zur Logik der Sozialwissenschaften, Suhrkamp Verlag, Frankfurt am Main, 1970.
46. H. Gadamer, Hegel's Dialectic: Five Hermeneutical Studies, P. C. Smith (trans.), Yale University Press, New Haven, CT, 1976. German edition published in 1971.
47. D. E. Linge, Editor's Introduction, in Ref. 42, pp. xi-lviii.
48. H. Gadamer, Hegel and Heidegger, in Ref. 46, pp. 100-116.
49. H. Gadamer, The Idea of Hegel's Logic, in Ref. 46, pp. 75-99.
50. K. Apel, Scientistics, Hermeneutics and The Critique of Ideology: Outline of a Theory of Science from a Cognitive-Anthropological Standpoint, in Ref. 7, pp. 46-76.
51. J. Habermas, "Knowledge and human interest," Inquiry 9, 285-300 (1966).
52. L. Wittgenstein, Philosophical Investigations, 3d ed., Macmillan, New York, 1968. Earlier edition published in 1953.
53. N. Chomsky, Syntactic Structures, Mouton, The Hague, 1957.
54. N. Chomsky, Aspects of The Theory of Syntax, MIT Press, Cambridge, MA, 1965.
55. N. Chomsky, Lectures on Government and Binding, Foris, Dordrecht, 1981.
56. J. Habermas, What is Universal Pragmatics? in Communication and the Evolution of Society, T. McCarthy (trans.), Beacon Press, Boston, pp. 1-68, 1979. First published in German in 1976.
57. J. Habermas, "Some distinctions in universal pragmatics," Theor. Soc. 3, 155-167 (1976).
58. J. B. Thompson, Universal Pragmatics, in J. B. Thompson and D. Held (eds.), Habermas: Critical Debates, MIT Press, Cambridge, MA, pp. 116-133, 1982.
59. J. L. Austin, How To Do Things with Words, Harvard University Press, Cambridge, MA, 1962.
60. P. H. Grice, Logic and Conversation, in P. Cole and J. L. Morgan (eds.), Studies in Syntax, Vol. 3, Academic Press, New York, pp. 41-58, 1975.
61. J. R. Searle, Speech Acts, Cambridge University Press, Cambridge, U.K., 1970.
62. H. Gadamer, On The Scope and Function of Hermeneutical Reflection, in Ref. 42, pp. 18-43.
63. T. McCarthy, Rationality and Relativism: Habermas's "Overcoming" of Hermeneutics, in Ref. 58, pp. 57-78.
64. T. McCarthy, The Critical Theory of Jurgen Habermas, MIT Press, Cambridge, MA, 1978.
65. J. Habermas, The Theory of Communicative Action, Vol. 1, Reason and the Rationalization of Society, T. McCarthy (trans.), Beacon, Boston, 1981. German edition published in 1981.
66. M. Weber, in E. Shils and H. Finch (eds. and trans.), The Methodology of the Social Sciences, Free Press, Glencoe, IL, 1949.
67. T. Parsons, The Structure of Social Action, McGraw-Hill, New York, 1937.
68. A. Schutz, The Phenomenology of a Social World, Northwestern University Press, Evanston, IL, 1967.
69. H. Garfinkel, What is Ethnomethodology? in Ref. 45, pp. 240-261. Originally published in H. Garfinkel, Studies in Ethnomethodology, Prentice-Hall, Englewood Cliffs, NJ, 1967.
70. E. Goffman, The Presentation of Self in Everyday Life, Doubleday, New York, 1959.
71. P. Ricoeur, "The task of hermeneutics," Philos. Tod. 17 (1973), D. Pellauer (trans.). Reprinted in Ref. 41, pp. 141-160. Also reprinted in J. B. Thompson (ed. and trans.), Paul Ricoeur: Hermeneutics and the Human Sciences, Cambridge University Press, Cambridge, U.K., pp. 43-62, 1981.
72. P. Ricoeur, Phenomenology and Hermeneutics, translated and reprinted in J. B. Thompson (ed. and trans.), Paul Ricoeur: Hermeneutics and the Human Sciences, Cambridge University Press, Cambridge, England, pp. 101-128, 1981. Originally published as "Phenomenologie et Hermeneutique," Phanomenologische Forschungen, Vol. 1, E. W. Orth (ed.), Karl Alber, Freiburg, pp. 31-77, 1975.
73. M. Merleau-Ponty, Phenomenology of Perception, C. Smith (trans.), Routledge & Kegan Paul, London, 1962. Originally published as Phenomenologie de la Perception, Paris, 1945.
74. F. A. Olafson, Maurice Merleau-Ponty, in P. Edwards (ed.), The Encyclopedia of Philosophy, Vols. 5 and 6, Macmillan, New York, pp. 279-282, 1967.
75. P. Ricoeur, "The hermeneutical function of distanciation," Philos. Tod. 17, 129-143 (1973). Reprinted in Ref. 9, pp. 131-144.
76. C. Levi-Strauss, Structural Anthropology, C. Jacobson and B. G. Schoepf (trans.), Penguin, Harmondsworth, U.K., 1968.
77. P. Ricoeur, Freud and Philosophy: An Essay on Interpretation, D. Savage (trans.), Yale University Press, New Haven, CT, 1970.
78. P. Ricoeur, Hermeneutics and the Critique of Ideology, in J. B. Thompson (ed. and trans.), Paul Ricoeur: Hermeneutics and the Human Sciences, Cambridge University Press, Cambridge, U.K., pp. 63-100, 1981. Originally published as Hermeneutique et critique des ideologies, in E. Castelli (ed.), Demythisation et ideologie, Aubier Montaigne, Paris, pp. 25-64, 1973.
79. P. Ricoeur, "The model of the text: Meaningful action considered as a text," Soc. Res. 38, 529-562 (1971). Reprinted in J. B. Thompson (ed. and trans.), Paul Ricoeur: Hermeneutics and the Human Sciences, Cambridge University Press, Cambridge, U.K., 1981.
80. K. Popper, The Logic of Scientific Discovery, Basic Books, New York, 1959.
81. P. Ricoeur, "Creativity in language," Philos. Tod. 17, 97-111 (1973).
82. P. Ricoeur, The Rule of Metaphor: Multi-Disciplinary Studies of the Creation of Meaning in Language, R. Czerny (trans.), University of Toronto Press, Toronto, 1977. Originally published as La Metaphore vive, Editions du Seuil, Paris, 1975.
376
HEURISTICS
83. P. Ricoeur, Metaphor and the Main Problem of Hermeneutics, New Literary History, Vol. 6, pp. 95-110, 1974-75. Reprinted in C. E. Reagan and D. Stewart (eds.),The Philosophyof Paul Ricoeur:An Anthology of His Work, Beacon,Boston, pp. L34-I48, 1978. 84. F. R. Dallmayr, Languo,geand Politics: Why DoesLanguageMatter to Political Philosophy?" University of Notre Dame Press, Notre Dame, IL, 1984. 85. K. Apel, The Communication Community as the Transcendental Presuppositionfor the Social Sciences,in Ref. 7, pp. 136L79. 86. K. Apel, lJnderstanding and Explanation, G. Warnke (trans.), MIT Press, Cambridg", MA, 1984. Originally published as Die Brklaren-Verstehen-Kontrouerse in Tranzendental-Pragmatischer Sicht, Suhrkaffip, Frankfurt am Main, F.R.G., 1979. 87. G. Radnitzky, Continental Schoolsof Metasciences:The Metascienceof the Human SciencesBased upon the "Hermeneutic-Dialectic" School of Philosophy, Vol. 2 of ConternporarySchools of Metascience,Scandinavian University Books,Goteborg,Sweden, 1968. 88. T. Abel, The Operation Called Verstehen,in Ref. 46, pp. 8I-92. Originally published tn Am. J. Soc. 54, 2L1-218 (1948). 89. P. Winch, The Idea of a Social Scienceand its Relation to Philosophy, Routledge & Kegan Paul, London, 1958. 90. Reference89, p. 115. 91. T. S. Kuhn, The Structure of ScientificReuolutions,University of ChicagoPress,Chicago,IL, L962. 92. T. S. Kuhn, The Essential Tension: SelectedStudies in Scientific Tradition and Change,University of ChicagoPress,Chicago,IL, L977. 93. R. J. Bernstern, Beyond Objectiuism and Relatiuism: Science, Hermeneutics, and Praxis, University of Pennsylvania Press, Philadelphia, 1983. 94. P. Feyerabend,Consolationsfor the Specialist,in I. Lakatos and A. Musgrave (eds.),Criticism and the Growth of KnowledgeCambridge University Press,Cambridge,U.K., pp. L97-230, 1970. 95. P. Feyerabend,Against Method,Yerso, London, 1978. 96. H. Putnaffi, Reason, Truth and History, Cambridge University Press,Cambridge,U.K., 1981. 97. W. C. 
Lehnert, H. R. Alker Jr., and D. K. Schneider,The Heroic Jesus: The Affective Plot Structure of Toynbee's Christus Patiens, in S. K. Burton and D. D. Short (eds.),Proceedingsof the Sixth International Conferenceon Computers and the Humanities,ComputerSciencePress,Rockville,MD, pp.358-367,1983. 98. W. C. Lehnert, "Plot units and narrative summarization," Cog. Scl. 4, 293-331 (1981). 99. W. C. Lehnert, Plot Units: A Narrative Summarization Strategy, in W. C. Lehnert and M. H. Ringle (eds.),Stratgies for Natural Language Processing, Erlbaum, Hillsdale, NJ, pp. 375-4L4, L982. 100. R. C. Schank and R. Abelson, Scripts,Plans, Goals,and Understanding, Erlbauffi, Hillsdale, NJ, 1977. 101. T. Winograd, Understanding Natural Langudge,Academic,New York, 1972. Theory of Truth, in P. Edwards(ed.), L02. A. N. Prior, Correspondence The Encyclopedia of Philosophy, Vols. l-2, MacMillan, New York, pp. 223-232, 1967. 103. J. R. Searle,"A Taxonomy of Illocutionary Acts," in K. Gunderson (ed.),Language And Knowledge: Minnesota Studies In PhilosophyOf Science,11,University of Minnesota Press,Minneapolis, pp. 344-369, 1975. L04. J. R. Searle, "The intentionality of intention and action," Cog. S c r .4 , 4 7 - 7 0 ( 1 9 8 0 ) . 105. Reference2, p.2Lg.
106. D. G. Bobrow, and T. Winograd, "An overview of KRL, a knowledgerepresentationlanguage," Cog. Sci. 1,3-46 (L977). 107. H. A. Simon, "Rational decision making in business organizations," Am. Econ. Reu. 69,493-513 (1979). 108. P. H. Winstort, "Learning and reasoning by analogy," CACM 23, (December1980). 109. J. G. Carbonell, Learning by Analogy: Formulating and Generalizing Plans From Past Experience,in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.),Machine Learning: An Artificial Intelligence Approach, Tioga, PaIo Alto, CA, pp. 137-L62, 1983. 110. P. H. Winston, Artificial Intelligence,Addison-Wesley,Reading, MA, 1994. 111. H. R. Alker Jr., J. Bennett, and D. Mefford, "Generalizedprecedent logics for resolving insecurity dilemmas," Int. Interact. 7, 165-206 (1980). LL2. J. G. Carbonell, Metaphor: An InescapablePhenomenonin Natural Language Comprehension,in Ref. 99, pp. 415-434. 113. Reference2, p. 227 LL4. H. R. Maturana, Biology of Knowledge,in R. W. Reiber (ed., The Neurophysiology of Language, Plenum, New York, L977. 115. I. Lakatos, Proofs and Refutations, Cambridge University Press, Cambridge,MA, 1976. 116. M. A. K. Halliday, Language as Social Semiotic,Edward Arnold, London, 1978. LL7. H. R. Maturana, Biology of Cognition, in H. R. Maturana and F. Varela (eds.),Autopoeisis and Cognition: The Realization of the Liuing, Reidel, Dordrecht, 1980,2-62. 118. P. H. Winston, "Learning new principles from precedentsand exercis€s,"Artif. Intell. 19, 321-350 (1982). 119. J. J. Katz and J. A. Fodor, "The structure of a semantic theory:' Language 39(2), L70-210 (1963). L20. R. C. Schank, "Conceptual dependency:A theory of natural language," Cog. Psychol.3,552-63 0972). LZL. L. W. Barsalou and G. H" Bower, "Discrimination nets as psychological modelsl' Cog.Scl. 8, L-26 (1984). L22. E. A. Feigenbaumand H. A. Simon,"EPAM-like modelsof recognition and learningi' Cog. Sci. 8, 305-336 (1984). t23. G. Duffy and J. C. 
Mallery, Relatus: An Artificial Intelligence Tool for Natural Language Modeling, AI Memo No. 847, Artificial Intelligence Laboratory, MIT, Cambridge, MA, 1986. L24. J. C. Mallery, Constraint-Interpreting Reference,AI Memo No. 827, Artificial Intelligence Laboratory, MIT, Cambridge, MA, 1986. L25. E. A. Feigenbaum, An Information ProcessingTheory of Verbal Learning, RAND, Santa Monica, CA, 1959. L26. J. L. Kolodner, "Reconstructive memory: A computer model," Cog.Sci. 7, 280-328 (1983). L27. J. L. Kolodner, "Maintaining organization in a dynamic longterm memoryi' Cog. Sci. 7,243-280 (1983). J. C. MallERY and R. Hunwlrz MIT G. Duppv University of Texas at Austin
HEURISTICS

Heuristics are approximation techniques for solving AI problems. AI deals primarily with problems for which no practical exact solution algorithms are known, such as finding the shortest proof of a given theorem (see Theorem proving) or the
least costly plan for robot actions (see Planning). Heuristics provide approximate methods for solving these problems with practical computational resources but often at some cost in solution quality. Their usefulness is derived from the fact that the trade-offs among knowledge, computation time, and solution quality are generally favorable. In other words, a small amount of approximate knowledge often buys a large improvement in solution quality and/or computation time.

Candidate problems for heuristic methods generally fall into two classes: those for which no exact algorithms are known at all and those for which the known exact algorithms are computationally infeasible. As an example of the first class, consider the problem of computer vision (see Vision). The task is to take the output of a digitizing camera in the form of a two-dimensional matrix of pixel values representing color and light intensities, and transform it into a high-level symbolic description of objects and their spatial relationships. Unfortunately, there are no known algorithms for solving this problem that are guaranteed to always yield a "correct" interpretation of the scene.

Computer chess is an example of the second class of problem (see Computer chess methods). In principle, there is an exact deterministic algorithm for always making an optimal move in a chess game. It requires generating all moves and countermoves in the game until only won, lost, and drawn positions remain, and propagating the ultimate outcomes of these positions back to the current position in order to choose an optimal move (see Minimax procedure). Unfortunately, the number of positions that would have to be generated by such an algorithm could be as large as 10^120. Thus, although an exact solution to this problem is known, the computational cost of running the algorithm is prohibitive. In either case, arriving at an exact solution is either impossible or impractical.
Thus, AI programs must resort to heuristic techniques that provide approximate solutions. Their power lies in the nature of the trade-offs between domain knowledge (qv), computation, and solution quality. If the domain knowledge is fixed, increased computation results in improved solution quality. Alternatively, if the amount of computation is held constant, more accurate domain knowledge produces better solutions. Finally, for a given level of solution quality, improved domain knowledge reduces the amount of computation required. The value of more accurate domain knowledge is that it improves the trade-off between computation and solution quality.

For example, given no knowledge of chess, two algorithms suggest themselves: one is the complete minimax procedure for playing perfect chess, and the other is to make legal moves randomly. The minimax procedure produces perfect play but at a tremendous cost in computation, whereas the random algorithm is very efficient but generates very poor play. Introducing some heuristic knowledge allows some additional computation to produce large improvements in quality of play. For example, one heuristic for chess is to always make the move that maximizes one's relative piece or material advantage. Although less efficient than random play, this heuristic provides a relatively efficient means of selecting a next move that results in play that is far superior to random play but still inferior to perfect play. Returning to the vision example, heuristics such as "adjacent pixels with the same intensity values probably belong to the same object" can dramatically improve the ability of programs to interpret visual scenes, but at the risk of occasionally making mistakes.
The nature of these trade-offs among knowledge, computation, and solution quality determines the usefulness of heuristic knowledge. If it were the case that a large percentage of the knowledge and/or computation necessary for perfect performance was required for even minimal performance, heuristic techniques would not be practical. For example, if it were necessary to examine any significant fraction of the 10^120 chess boards in order to achieve even beginner skill levels, good chess programs could not be built. On the other hand, if significant performance levels can be achieved with relatively small amounts of knowledge and computation, heuristics become very cost-effective, at least until near-optimal performance levels are reached. In computer chess, for example, if quality of play is measured as the percentage of human players that can be beaten by a given program, small amounts of knowledge and computation provide large improvements in performance, at least initially. Only when Expert- or Master-level performance is achieved is a point of diminishing returns reached where additional performance increments come only with a large amount of knowledge or at great computational cost.

One of the empirical results of the last 30 years of AI research is that for many problems, the knowledge, computation, and solution-quality trade-off is initially quite favorable. Thus, a little knowledge and computation goes a long way, and heuristic programs have been spectacularly successful at achieving moderate levels of performance in a large number of domains. At the same time it becomes increasingly difficult to improve the performance of programs as they begin to approach expert levels of competence.

Heuristic Evaluation Functions

Given this general discussion of heuristics as a background, it should be noted that almost all of the analytic and experimental work on heuristics per se has occurred on a special case of heuristics, namely heuristic evaluation functions.
The only exceptions to this rule are the development of heuristic production rules for particular problem domains and the EURISKO (qv) project, which is discussed below. A heuristic evaluation function is a function that maps problem situations to numbers. These values are then used to determine which operation to perform next, typically by choosing the operation that leads to the situation with the maximum or minimum evaluation. Heuristic evaluation functions are used in two different contexts: single-agent problems and two-player games.

Single-Agent Problems. The classic AI example of a single-agent problem is the Eight Puzzle (see Fig. 1). It consists of a 3 x 3 square frame containing eight numbered square tiles and one empty position. Any tile horizontally or vertically adjacent to the empty position can be slid into that position. The task is to rearrange the tiles from a given initial configuration into a particular goal configuration by a shortest sequence of legal moves. The brute-force solution to this problem involves searching all move sequences up to the length of the optimal solution. Since the Eight Puzzle has roughly 180,000 solvable states (9!/2), this approach is feasible by computer. However, for even the slightly larger 4 x 4 Fifteen Puzzle, which has approximately 10 trillion solvable states (16!/2), this brute-force approach is computationally intractable.
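The state counts cited above follow from a parity argument: exactly half of all tile permutations are reachable from a given configuration. A quick check (an illustrative snippet, not part of the original article):

```python
import math

# Half of all tile permutations are solvable (parity argument).
eight_puzzle_states = math.factorial(9) // 2    # 9 board positions: 8 tiles + blank
fifteen_puzzle_states = math.factorial(16) // 2

print(eight_puzzle_states)    # 181440, roughly 180,000
print(fifteen_puzzle_states)  # 10461394944000, roughly 10 trillion
```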
Figure 1. Eight and Fifteen Puzzles.

The standard heuristic approach to this problem makes use of an evaluation function to guide the search. The heuristic evaluation function is interpreted as an estimate of the number of moves required to map the current state to the goal state. For example, the best known heuristic function for the Eight Puzzle is called the Manhattan Distance heuristic. It is computed by taking each tile individually, measuring its distance from its goal position in grid units, and summing these values for each tile. Note that this measure in general underestimates the number of moves since it does not take into consideration interactions between the tiles.

Given such an estimate, there are a number of different algorithms that make use of it to decide which move to consider next in order to find a solution in less time than is required by brute-force search (see Search). The simplest, often referred to as pure heuristic search or the greedy algorithm, is to always select next the move that leads to the state with the minimum heuristic estimate of the distance to the goal. As the accuracy of the heuristic improves, the amount of search required to find a solution and the cost of the resulting solution both decrease.

A slightly more complex algorithm, called A* (see A* algorithm), adds to the heuristic estimate the actual number of moves that were used to get to the current state from the initial state, and then always selects next the state for which this sum is a minimum. This amounts to selecting states in increasing order of the estimate of the total cost of a solution that passes through that state. Given an additional constraint on a heuristic function that it never overestimate the actual cost of a solution, a constraint that is satisfied by Manhattan Distance for the Eight Puzzle, it can be shown that A* finds an optimal solution. In that case a more accurate heuristic reduces the amount of search required to find the optimal solution. A number of theoretical results quantify this trade-off between heuristic accuracy and search efficiency (1).

Heuristics from Simplified Models. Where do these heuristic evaluation functions come from, and how can their discovery be automated? One answer to the first question, which suggests an approach to the second, is that heuristic evaluation functions are derived from simplified models of the original problem (1). For example, one way of describing the legal move rule for the Eight Puzzle is that a tile can be moved from position X to position Y iff position X is adjacent to position Y and position Y is empty. If either of these constraints is removed, the result is a simpler problem that is easier to solve. The idea for generating heuristics is that the exact number of moves required to solve the simpler problem may be easy to compute and can serve as an estimate of the number of moves needed to solve the original problem.

For example, if the constraint that position Y be empty is removed, the resulting problem allows any tile to move along the grid regardless of where the empty position is. The number of moves required to solve this simplified problem is exactly equal to the Manhattan Distance.

If both constraints are removed, the resulting problem allows any tile to move directly to its goal position in one move. The number of moves needed to solve this problem is exactly equal to the number of tiles that are out of place. This is an obvious heuristic estimator for the original problem that is even cheaper to compute than Manhattan Distance but is also less accurate.

Finally, if only the constraint that positions X and Y be adjacent is removed, the resulting problem allows one to move any tile into the empty position, adjacent or not. The number of moves required to solve this problem is the number of times the empty position must be swapped with another tile, which suggests another heuristic estimate for the original problem. Although it is not as obvious how to express this value in closed form, that is not necessary: a program can simply count the number of steps required to solve each simplified problem and use this count as a figure of merit for the moves in the original problem. A simplification scheme of this type was implemented for discovering heuristics in constraint-satisfaction (qv) problems (2).

Two-Person Games. Although a heuristic evaluation function for a single-agent problem is normally an estimate of the distance to the goal, the exact meaning of a heuristic function for a two-player game is not as precise. Generally speaking, it is a function from a game situation to a number that measures the strength of the position for one player relative to the other. Large positive values reflect good positions for one player, whereas large negative values indicate strong positions for the opponent. One player, called Max, always moves to the positions that maximize the heuristic evaluation function, whereas the other player, Min, moves to positions that minimize it.

For example, a simple evaluation function for the game of chess is the weighted sum of the values of Max's pieces minus the weighted sum of the values of Min's pieces. The weights reflect the different utilities of the pieces, and the classic values are Queen-9, Rook-5, Bishop-3, Knight-3, and Pawn-1. Note that the goal of chess is checkmate, and not to maximize material. Material, however, represents an approximate goal in the game, the status of which can be computed efficiently. Even if the object of the game is to maximize material, as in the game of Othello, it is not necessarily true that maximizing material in the short term is the best way to maximize it over the long run. A more accurate evaluation function for chess would include additional components such as center control, pawn structure, and mobility.

Another technique used to increase the accuracy of a heuristic evaluation function, at the cost of increased computation, is called look-ahead. The basic idea is that, instead of directly evaluating the successors of the current position and picking the best, a more accurate evaluation can be obtained by searching forward several moves, evaluating the positions at that level, and then backing up the values to the successors of the current position by the minimax algorithm. The minimax (qv) algorithm computes the value of a position where Min is to move as the minimum of the values of its successors, and the value of a position where Max is to move as the maximum of the values of its successors. For most games minimax look-ahead search improves the accuracy of the evaluation with increasing search depth. Since improved accuracy results in better-quality play, look-ahead provides a nearly continuous trade-off between computation cost and quality of play. In practice, programs search as far ahead as possible given the computational resources available and the amount of time allowed between moves.

Unifying One- and Two-Player Evaluation Functions. Although most of the literature on heuristic search in single-agent problems overlaps very little with that on two-player games, there is a consistent interpretation of heuristic evaluation functions in both domains (3). In both cases an ideal heuristic evaluation function has two properties: When applied to a goal state, it returns the outcome of the search; and the value of the function is invariant over an optimal move from any given state. The outcome of a search is the figure of merit against which success is measured, such as the cost of a solution path or the win, lose, or draw result of a game. Note that the constraints of determining the outcome and invariance over the best moves guarantee that suboptimal moves will have a different evaluation than the optimal moves. Taken together, these two properties ensure a function that is a perfect predictor of the outcome of pursuing the best path from any state in the problem space. Therefore, a heuristic-search algorithm using such a function will always make optimal moves. Furthermore, any successful evaluation function should approximate these properties to some extent.

For example, the evaluation function for the A* algorithm is f(s) = g(s) + h(s), where g(s) is the cost of the best path from the initial state to the state s and h(s) is an estimate of the cost of the best path from state s to a goal state. Typically the h term is called the heuristic in this function, but for this text the entire function f is referred to as the heuristic evaluation function.
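The Eight Puzzle heuristics and the A* evaluation f(s) = g(s) + h(s) can be sketched in Python. This is an illustrative reconstruction, not code from the article; the function names and state encoding (a 9-tuple with 0 as the empty position) are our own choices:

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 marks the empty position

def manhattan(state):
    """Sum over tiles of grid distance from current to goal position."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue                       # the empty position is not counted
        goal = tile - 1                    # index of this tile in GOAL
        total += abs(i // 3 - goal // 3) + abs(i % 3 - goal % 3)
    return total

def misplaced(state):
    """Number of tiles out of place: cheaper but less accurate."""
    return sum(1 for i, t in enumerate(state) if t != 0 and t != GOAL[i])

def neighbors(state):
    """States reachable by sliding one tile into the empty position."""
    b = state.index(0)
    r, c = divmod(b, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            s = list(state)
            n = nr * 3 + nc
            s[b], s[n] = s[n], s[b]
            yield tuple(s)

def astar(start, h=manhattan):
    """Expand states in increasing order of f(s) = g(s) + h(s)."""
    frontier = [(h(start), 0, start)]
    best_g = {start: 0}
    while frontier:
        f, g, state = heapq.heappop(frontier)
        if state == GOAL:
            return g                       # length of an optimal solution
        for nxt in neighbors(state):
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None
```

Because Manhattan Distance never overestimates, `astar` returns an optimal solution length; substituting `misplaced` for `h` still yields optimal solutions but expands more states.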
When this function is applied to a goal state, the h term is zero, the g term represents the cost of reaching the goal from the initial state, and hence f returns the cost of the path, or the outcome of the search. If h is a perfect estimator, then in moving along an optimal path to a goal state, each move increases g by the cost of the move and decreases h by the same value. Thus, the value of f remains invariant along an optimal path. If h is not a perfect estimator, f will vary somewhat depending on the amount of error in h. Thus, a good evaluation function for an algorithm such as A* will determine the outcome of the search and is relatively invariant over single moves.

Now consider a two-person game using minimax search and a heuristic evaluation function. The heuristic evaluation reflects the strength of a given board position. When applied to a state where the game is over, the function determines the outcome of the game, or which player won. This is often added as a special case to an evaluation function, typically returning positive and negative infinity for winning positions for Max and Min, respectively. When applied to a nongoal state, the function is supposed to return a value that predicts what the ultimate outcome of the game will be. To the extent that the evaluation is an accurate predictor, its value should not change as the anticipated moves are made. Thus, a good evaluation function should be invariant over the actual sequence of moves made in the game. Therefore, in both domains a good evaluation function should have the properties of determining outcome and being invariant over optimal moves.
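The backing-up of frontier evaluations by minimax, discussed above, can be sketched generically. The tiny game tree and all names here are our own illustration, not from the article:

```python
def minimax(state, depth, maximizing, evaluate, moves):
    """Back up heuristic evaluations from the search frontier: Max nodes
    take the maximum of successor values, Min nodes the minimum."""
    successors = moves(state)
    if depth == 0 or not successors:
        return evaluate(state)   # frontier (or terminal) position: static evaluation
    values = [minimax(s, depth - 1, not maximizing, evaluate, moves)
              for s in successors]
    return max(values) if maximizing else min(values)

# Toy game tree: Max to move at 'A', Min to move at 'B' and 'C'.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
static = {"D": 3, "E": 5, "F": 2, "G": 9}

value = minimax("A", 2, True,
                evaluate=lambda s: static.get(s, 0),
                moves=lambda s: tree.get(s, []))
print(value)  # 3: Max prefers 'B' (worth min(3,5)=3) over 'C' (worth min(2,9)=2)
```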
Learning Evaluation Functions. The idea that heuristic evaluation functions should remain invariant over optimal moves can also be used to automatically learn (see Learning) evaluation functions. The basic idea is to search in a space of evaluation functions for a function that has this invariance property. This is done by computing the difference between direct evaluations of positions and the values returned by look-ahead and modifying the evaluation function to reduce this difference. This idea was originally used by Samuel in a pioneering program that automatically learned a very powerful evaluation function for checkers based on a large number of different factors (4) (see Checkers-playing programs).

A refinement of Samuel's technique used linear regression to automatically learn a set of relative weights for the chess pieces in an evaluation function based just on material (3). The basic idea is that any board position gives rise to an "equation" that constrains the ideal evaluation function. The left side of the equation is the function as applied to the given position, and the right side is the backed-up value of the function resulting from look-ahead search. In an ideal evaluation function these two values would indeed be equal. By generating a large number of such "equations," one from each board position, linear regression can be used to find the set of weights that provides the best approximation to an invariant evaluation function. Iterating this entire process over successive approximations of the heuristic function produces a converging sequence of weights for the pieces.

Heuristic Rules

Although most work on heuristics has focused on numerical evaluation functions for one- or two-player games, the EURISKO (qv) project has addressed the nature of heuristics in general (5). The lessons learned from EURISKO are consistent with, but more general than, the results concerning heuristic evaluation functions.
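The regression step can be sketched in miniature. In this toy version (the data, feature choice, and names are ours, not the cited experiment's), each position is reduced to a vector of piece-count differences, the targets are backed-up look-ahead values, and the normal equations give the least-squares weights:

```python
def solve2(a, b):
    """Solve a 2x2 linear system a.w = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return ((b[0] * a[1][1] - b[1] * a[0][1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det)

def fit_weights(positions, backed_up):
    """Least-squares piece weights: choose w so the static evaluation w.x
    best matches the backed-up values y, via the normal equations X'Xw = X'y."""
    xtx = [[sum(p[i] * p[j] for p in positions) for j in range(2)]
           for i in range(2)]
    xty = [sum(p[i] * y for p, y in zip(positions, backed_up))
           for i in range(2)]
    return solve2(xtx, xty)

# Features: (queen difference, pawn difference). The backed-up values here
# were generated from the "true" weights 9 and 1, which the fit recovers.
positions = [(1, 0), (0, 1), (1, 2), (2, 1)]
backed_up = [9, 1, 11, 19]
print(fit_weights(positions, backed_up))  # (9.0, 1.0)
```

In the actual refinement the fitted weights feed back into the look-ahead evaluation, and the fit is repeated until the weights converge.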
Recall that heuristic evaluation functions derive their power from their relative invariance over single moves in the problem space. In other words, the value of a given state is roughly equal to the value of the state resulting from making the best move from the given state. This can be viewed as a form of continuity of the evaluation function over the problem space. This idea was originally suggestedin the more general context of heuristic production rules for determining what action to apply in a given situation (5). A production rule is composed of a left side that describesthe situations in which the rule is applicable and a right side that specifiesthe action of the rule (see Rule-based systems). Consider the function Appropriateness(Action, Situation), which returns somemeasure of appropriateness of taking a particular action in a particular situation. The claim is that heuristics derive their power from the fact that this function is usually continuous in both arguments. Continuity in the situation argument means that if a particular action is appropriate in a particular situation, the same action is likely to be appropriate in a similar situation. Continuity in the action argument means that if a particular action is appropriate in a particular situation, a similar action is also likely to be appropriate in the same situation. Furthermore, this appropriatenessfunction is time-invariant, which amounts to a strong form of continuity in a third variable, time. In other words, if a particular action is appro-
380
HORIZON ETFECT
priate in a particular situation, that sameaction will be appropriate in that same situation at a later time. If the notion of an action is broadenedto include an evaluation, the invariance of an evaluation function can be viewed as a special caseof this idea where the situation variable ranges over different states of the same problem. Similarly, the use of an exact evaluation from a simplified problem as a heuristic evaluation for the original problem can be viewed as another example of this general rule where the situation variable is allowed to range over similar problems. The notion of continuity of appropriateness over actions and situations was used to automatically learn heuristic production rules. In Eurisko, both the situation and action sides of a rule are describedusing a large number of relatively independent features or parameters. Given a useful heuristic, Eurisko generates new heuristics by making small modifications to the individual features or parameters in the situation or action sides of the given heuristic. The continuity property suggeststhat a large number of heuristics derived in this way will be useful as well.
Conclusions

Heuristics are approximation techniques for solving AI problems. Approximations, however, are only useful in domains with some form of continuity. Thus, the power of heuristic techniques comes from continuities in their domains of applicability. The success of heuristic techniques in AI can be taken as evidence that many domains of interest contain continuities of various kinds.

BIBLIOGRAPHY

1. J. Pearl, Heuristics, Addison-Wesley, Reading, MA, 1984.
2. R. Dechter and J. Pearl, The Anatomy of Easy Problems: A Constraint-Satisfaction Formulation, Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, CA, August 1985, pp. 1066-1072.
3. J. Christensen and R. E. Korf, A Unified Theory of Heuristic Evaluation Functions and Its Application to Learning, Proceedings of the Fifth National Conference on Artificial Intelligence, Philadelphia, PA, August 1986, pp. 148-152.
4. A. L. Samuel, Some Studies in Machine Learning Using the Game of Checkers, in E. Feigenbaum and J. Feldman (eds.), Computers and Thought, McGraw-Hill, New York, 1963, pp. 71-105.
5. D. B. Lenat, "The nature of heuristics," Artif. Intell. 19(2), 189-249 (October 1982).

R. E. Korf
UCLA

This work was supported in part by NSF Grant IST 85-15302, by an NSF Presidential Young Investigator Award, and by an IBM Faculty Development Award.

HORIZON EFFECT

Two-person zero-sum, strictly competitive games such as chess, checkers, and Othello can be played quite skillfully by a computer. The methodology most commonly used today dates back to a seminal paper by Shannon (1). A state-space representation is employed in which specific piece configurations represent discrete states, and the moves that are legal from these positions represent the permissible operators. A look-ahead game tree (qv) is developed by generating all of the positions that could be produced by every possible move sequence for the two players. Since it would take literally millions of years to examine all possible lines of play until each reached a terminal state (win, lose, or draw), existing game programs search only a few moves ahead (usually three to six) and then artificially declare the position as "terminal" and make a heuristic (qv) evaluation of whether it is good for the player on the move. The values assigned to these end points are then "backed up" to the initial position by using a minimax strategy (qv) (2). The backed-up value for each of the potential moves at the initial position determines which is the best.

Terminal Positions

Positions that are declared terminal may be, in fact, very turbulent. For example, in chess, a so-called terminal position might be one that is in the middle of a queen exchange. The heuristic evaluation calculated for such a position will be inaccurate because the queen discrepancy will be corrected on the next move. This common problem has been addressed routinely in chess by developing a quiescence function that assesses the relative material threats for each side and adjusts the evaluation function accordingly. Sometimes this is done by direct calculation and sometimes by a miniature look-ahead search from each terminal position examining only capturing moves and a subset of checking moves. This approach is usually reasonably accurate with respect to material considerations but is often blind to positional factors, which may be in a turbulent state. An example of positional turbulence is a piece en route to an important location where it will exert a commanding presence. Despite its attractive destination, its current position may appear to be weak or even dangerous. Other dynamic positional factors include a trapped piece, a pawn in a crucial lever role, and a pawn aspiring to promotion.

Current quiescence functions often misevaluate these positions. Berliner (3) provided the name horizon effect for this class of problems because the arbitrary search termination rule caused the program to act as if anything that was not detectable at evaluation time did not exist. Berliner defined two different versions of this phenomenon, a negative-horizon effect and a positive-horizon effect. The negative-horizon effect involves a form of self-delusion in which the program discovers a series of forcing moves that push an inevitable unpleasant consequence beyond the search horizon. The program manages to convince itself the impending disaster has gone away when in fact it is still lurking just beyond the search horizon. In essence, the negative-horizon effect is an unsuccessful attempt to avert an unpleasant outcome. The positive-horizon effect is a different form of self-delusion. In this effect the program attempts to accomplish a desired consequence within the search horizon even when the outcome would be much better if postponed a few moves. In Berliner's words the program "prematurely grabs at a consequence that can be imposed on an opponent later in a more effective form." Both of these effects are based on improper quiescence, and usually this has to do with the evaluation of positional factors.
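The fixed-depth minimax search with a quiescence extension described above can be sketched as follows. This is a toy illustration only: the hand-built game tree, the values, and the capture flags are invented placeholders standing in for a real board representation and move generator.

```python
# Toy sketch of fixed-depth minimax with a quiescence extension.
# A "position" here is just a dict in a hand-built game tree; in a real
# chess program these would be board states with move generators.

def evaluate(pos):
    return pos["value"]               # static heuristic score

def quiescence(pos, maximizing):
    """At the artificial horizon, do not trust a turbulent position:
    keep searching forcing (capturing) moves until things are quiet."""
    stand_pat = evaluate(pos)
    captures = [c for c in pos.get("children", []) if c.get("capture")]
    if not captures:
        return stand_pat
    vals = [quiescence(c, not maximizing) for c in captures]
    best = max if maximizing else min
    return best([stand_pat] + vals)   # the side to move may also stand pat

def minimax(pos, depth, maximizing):
    children = pos.get("children", [])
    if not children:                  # true terminal (win, lose, or draw)
        return evaluate(pos)
    if depth == 0:                    # artificial horizon reached
        return quiescence(pos, maximizing)
    vals = [minimax(c, depth - 1, not maximizing) for c in children]
    return max(vals) if maximizing else min(vals)

# Mid-queen-exchange position with Black (the minimizer) to move: the
# static score looks like White is a queen up (+9), but Black's forced
# recapture brings the real score back to 0.
exchange = {"value": 9,
            "children": [{"value": 0, "capture": True}]}
print(evaluate(exchange))            # naive static evaluation: 9
print(minimax(exchange, 0, False))   # quiescence sees the recapture: 0
```

The stand-pat term is what makes the quiescence search a search of only forcing moves: a side is never obliged to continue capturing if the quiet evaluation is already better.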
Negative-Horizon Effect

An excellent example of the negative-horizon effect occurred in a computer chess match (4) at the sixth North American computer chess championship (Minneapolis, 1975) between programs from Northwestern University and the University of Waterloo. Figure 1 depicts the game position after black's twelfth move, Ra8 to b8, attacking the advanced white pawn at b7. This position resulted from an early exchange of queens and minor pieces. At this juncture, white is destined to lose the advanced pawn, which will even up the material but leave white with a slight positional advantage (its king is castled and its rook dominates the queen file). The Northwestern program placed a high value on the passed pawn on the seventh rank. Instead of accepting the inevitable loss of the pawn, white devised a plan to "save" it by making liberal use of the negative-horizon effect. In its look-ahead search white discovered that it could advance pawns on the rook file and knight file, which would force black to retreat the bishops. The tempos used in these pawn thrusts were sufficient to push the eventual capture of the white pawn at b7 over the search horizon. White continued the actual game by playing 13. a3, forcing the black bishop at b4 to retreat. White followed with 14. h3, forcing the black bishop at g4 to retreat. White's next move continued the same theme, 15. g4, forcing the black bishop to move again and substantially weakening white's defensive position. From the computer's perspective these attacking pawn moves were effective because each one saved the pawn at b7. In reality, these moves, especially 15. g4, weakened white's position.

Positive-Horizon Effect

The positive-horizon effect can be demonstrated with the position presented in Figure 2 with white to move. In this situation white's pawn advantage provides excellent winning chances. For most programs the look-ahead search will not be
Figure 2.
sufficiently deep to "see" the pawn promotion. Therefore, the correct move choice must be based on heuristic factors such as moving the pawn closer to the eighth rank. With a typical shallow search, white is likely to push the pawn immediately, ignoring the black knight's threat to capture because white can recapture. Heuristic evaluation functions usually consider a knight to be worth as much as three pawns, and therefore, the program would assume that black would not initiate such a foolish exchange. In reality, the exchange of the knight for the pawn is good for black since it transforms a losing situation into a draw. This conclusion is based on the knowledge that white can only win by promoting the pawn, and thus the pawn in this situation is much more valuable than the knight. Programs that know about the future only in terms of their immediate look-ahead search underestimate the value of the pawn because its "moment in the sun" lies beyond their search horizon. Most chess programs would throw away the win by giving their opponent the opportunity to exchange the knight for the pawn. This positive-horizon effect differs from the negative-horizon effect in that it results from an inability to understand long-range consequences and is not influenced dramatically by moving the search horizon one or two plies deeper (see also Computer chess methods).
Figure 1.
BIBLIOGRAPHY
1. C. E. Shannon, "Programming a computer to play chess," Philos. Mag. 41, 256-275 (1950).
2. P. W. Frey, An Introduction to Computer Chess, in P. W. Frey (ed.), Chess Skill in Man and Machine, Springer-Verlag, New York, pp. 54-91, 1983.
3. H. J. Berliner, Some Necessary Conditions for a Master Chess Program, Proceedings of the Third International Joint Conference on Artificial Intelligence, Stanford, CA, pp. 77-85, 1973.
4. B. Mittman, A Brief History of the Computer Chess Tournaments: 1970-1975, in P. Frey (ed.), Chess Skill in Man and Machine, Springer-Verlag, New York, pp. 27-28, 1983.

P. W. Frey
Northwestern University

HORN CLAUSES. See Logic programming.

HOUGH TRANSFORM

The Hough transform (HT) denotes any of several parameter estimation strategies based on histogram analysis, in which histogram peaks (modes) in a transform space identify phenomena of interest in an input feature space. The name originates from a 1962 invention for locating lines in bubble chamber photographs (1). Since then the idea has become widespread and of considerable engineering importance. In computer vision (qv) it was first used to identify parameterized curves (e.g., conics) in images (2). HT has been generalized to detect nonparametric shapes of arbitrary scale and rotation (3,4). The HT process has been postulated to occur in abstract feature spaces during human perception (5) and is a widely applicable form of evidence combination.

Description

In the HT, features of phenomena (e.g., shape features) in an input space produce votes for phenomena in a parameterized transform space of causes or explanations (e.g., shape location) with which the features are compatible (see Feature extraction). Explanations garnering the most votes are those that account for the most features. For example, points in (x, y) input space may lie on (be explained by) a line described in parameter space by the two parameters m and b in the equation y = mx + b. A point in input (x, y) space presumed to lie on a line produces a locus of votes in parameter space for all lines on which it could lie. (This locus happens to be a straight line in (m, b) space.) The vote locus of a second point intersects the first (adds to it) only at the (m, b) parameters of the single (infinite) line containing both feature points. All other feature points collinear with the first two contribute votes to this (m, b), and no other points do. If the input space is ideal edge elements, that is, (x, y, orientation) triples describing image brightness discontinuities, each edge element casts a single vote for the one line passing through it at the correct orientation. After voting, peaks (modes) in the parameter space correspond to image lines through the greatest number of lined-up edge elements regardless of their sparseness or other confusing edges in the image. Multiple lines in the input do not interfere but give multimodal results in parameter space. Figure 1
Figure 1. Circle detection. An input grayscale image (a) is processed with an edge detector, yielding an orientation at each point. The edge strength, or contrast, is shown in (b). For each of several radii Ri there is an accumulator array Ai the same size as the image. Each edge element votes into each Ai for two possible centers, Ri away from the edge in both directions orthogonal to it. The accumulator for one of the larger radii is shown in (c). Peaks in the three-dimensional (x, y, R) accumulator are interpreted as circles and displayed in (d).
shows circle detection with edge element input. An HT implementation of general shape matching is formally equivalent to template matching (matched filtering). With HT, the computational effort (voting) grows with the number of matchable features in the input, not the size of the input array (6).

Practical Issues

HT is a form of mode-based parameter estimation that is complementary to mean-based (such as least-squared error) techniques. Least-squared error methods may be preferable if all the data originate from a single phenomenon, there are no "outlier" points, and data are corrupted by zero-mean noise processes. Mode-based estimation is indicated if there are several instances of the phenomenon of interest in the input or if the data are incomplete or immersed in clutter or outliers. Parametric HT finds parameters that may describe infinite objects. Line detection is a good example: Further processing is needed to find end points of line segments. Noise of several varieties can affect HT (6,7) and can be combated by standard techniques. Uncertainty in any feature parameter (e.g., edge orientation) may be accommodated either by using a set of votes spanning the uncertainty range or by smoothing the accumulator array before peak finding. Votes may be weighted according to the strength of the feature producing them. Votes are usually collected in discrete versions of parameter space implemented as arrays. Parameter spaces involving three-dimensional directions are often represented with more complex data structures, such as spheres or hyperspheres. High-resolution or high-dimensionality arrays can have large memory requirements. A solution is to implement the accumulator as a hash table. If each feature detector is prewired to its associated parameters in transform space, the "voting" happens in parallel instantaneously and can be considered as excitation in a network (8).
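The (m, b) line voting described under Description can be sketched in a few lines. The grid resolution, the half-cell tolerance, and the sample points below are illustrative choices invented for this sketch, not taken from any of the cited systems.

```python
# Minimal sketch of Hough line detection in (m, b) space: each input
# point votes for every discretized line y = m*x + b passing through it.
# The grid bounds, step sizes, and data are illustrative choices.
from collections import Counter

def hough_lines(points, m_values, b_values):
    votes = Counter()
    for (x, y) in points:
        for m in m_values:
            b = y - m * x                          # intercept for this slope
            # snap b to the nearest accumulator cell
            b_cell = min(b_values, key=lambda bv: abs(bv - b))
            if abs(b_cell - b) < 0.5:              # within half a cell
                votes[(m, b_cell)] += 1
    return votes

m_grid = [i * 0.5 for i in range(-4, 5)]           # slopes -2.0 .. 2.0
b_grid = list(range(-5, 6))                        # intercepts -5 .. 5

# Three collinear points on y = 2x + 1, plus two clutter points.
pts = [(0, 1), (1, 3), (2, 5), (0, 4), (3, 0)]
(best_m, best_b), count = hough_lines(pts, m_grid, b_grid).most_common(1)[0]
print(best_m, best_b, count)                       # -> 2.0 1 3
```

The clutter points scatter single votes across the accumulator, but only the cell for the true line collects a vote from every collinear point, so the mode survives the clutter, which is exactly the robustness property contrasted with least-squares fitting above.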
In two-dimensional shape detection the parameter space is usually (x, y, θ, s), for location, orientation, and scale. A high-dimensional parameter space may sometimes (with ingenuity) be decomposed into a sequence of lower dimensional spaces, making voting less expensive. Parameters in accumulator space must be independent if a mode is to correspond to a unique tuple of parameters. The global nature of the HT, accumulating evidence from the entire input space, can be a drawback for some applications. One remedy is to decompose the input space into a set of regions small enough to enforce the desired locality. The histogram generation and analysis needed for HT admit parallel solutions.
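The hash-table accumulator suggested above for large parameter spaces can be sketched as follows; the quantization step and the sample votes are invented for illustration. Only cells that actually receive votes consume memory, which is the point of the technique for high-dimensional spaces.

```python
# Sparse accumulator for a high-dimensional Hough parameter space,
# implemented as a hash table (dict) keyed by quantized parameter
# tuples. The quantization step and sample votes are illustrative.
from collections import defaultdict

def make_accumulator(step):
    votes = defaultdict(int)
    def cast(params, weight=1):
        cell = tuple(round(p / step) for p in params)   # quantize to a cell
        votes[cell] += weight                           # (optionally weighted)
        return cell
    return votes, cast

votes, cast = make_accumulator(step=0.1)
cast((1.02, 3.98, 0.51))      # three nearby parameter votes fall in ...
cast((0.98, 4.04, 0.49))      # ... the same cell after quantization
cast((1.00, 4.00, 0.50))
cast((7.3, 1.1, 2.2))         # an outlier lands in its own cell
peak_cell, peak_votes = max(votes.items(), key=lambda kv: kv[1])
print(peak_cell, peak_votes)  # -> (10, 40, 5) 3
```

Accumulator smoothing before peak finding, mentioned earlier, could be approximated here by also casting reduced-weight votes into neighboring cells.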
BIBLIOGRAPHY

1. P. V. C. Hough, Method and Means for Recognizing Complex Patterns, U.S. Patent 3,069,654, December 18, 1962.
2. R. O. Duda and P. E. Hart, "Use of the Hough transform to detect lines and curves in pictures," Commun. Assoc. Comput. Mach. 15, 11-15 (1972).
3. D. H. Ballard, "Generalizing the Hough transform to detect arbitrary shapes," Patt. Recog. 13(2), 111-122 (1981).
4. D. H. Ballard and C. M. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1982.
5. H. B. Barlow, "Critical limiting factors in the design of the eye and visual cortex," Proc. Roy. Soc. Lond. B212(1), 1-34 (1981).
6. C. M. Brown, "Inherent bias and noise in the Hough transform," IEEE Trans. Patt. Anal. Mach. Intell. PAMI-5, 493-505 (September 1983).
7. S. D. Shapiro and A. Iannino, "Geometric constructions for predicting Hough transform performance," IEEE Trans. Patt. Anal. Mach. Intell. PAMI-1(3), 310-317 (July 1979).
8. D. H. Ballard, G. E. Hinton, and T. J. Sejnowski, "Parallel visual computation," Nature 306(5938), 21-26 (November 3, 1983).

C. M. Brown
University of Rochester
HUMAN-COMPUTER INTERACTION

The recent history of advances in the study and techniques of human-computer interaction has been intertwined with that of AI; each has contributed to the other. At times research in AI has developed techniques to improve user-computer communication, and at other times the unique demands placed on the users and programmers of AI systems have led them to be the first to apply innovative techniques for human-computer communication. Because AI systems are often designed to perform complicated and poorly understood tasks, they need to interact with their users more intimately than other systems and in more complex, less stereotyped ways. AI programs are also among the most complicated programs written, least amenable to being specified clearly in advance, and most unpredictable. Hence their programmers have been the first to need such advances as powerful interactive debuggers, editors, programming tools, and environments, and they have developed many of them. This entry examines the reciprocal connections between the study of human-computer interaction or human factors and AI from each of the following directions:

1. specific fields of AI directly useful in constructing human-computer interfaces, such as speech recognition (some of these topics are covered in separate entries in this volume and are mentioned only briefly here);
2. by-products of AI programming that have proven useful in designing human-computer dialogues; and
3. developments in the study of the human factors of human-computer interaction that are helpful in designing user interfaces for complex AI systems.

Finally, this entry indicates how the two fields of study overlap in their concerns and how insights into cognitive psychology from both fields will help to build more intelligent, natural user interfaces in the future.

Specific AI Applications to Human-Computer Interaction

Natural Language. Among those areas of AI research useful in improving human-computer interaction, the most obvious is the study of natural language (see Natural-language understanding). Research into how natural language is understood can permit human-computer dialogues to be conducted in such a language (although this is not always an unalloyed benefit, as discussed subsequently). The study of natural-language input has its roots in early work in machine translation and, later, in query-answering systems. Systems such as ELIZA (qv), SHRDLU (qv), and BASEBALL (qv) demonstrated that computers could conduct plausible natural-language dialogues in restricted domains. But proceeding from that point to a general ability to accept a wide range of natural language has been difficult. A natural-language processing system generally contains three parts: a dictionary or lexicon of the words it accepts; a grammar, which describes the structures of the sentences it accepts; and a semantic component, which assigns interpretations to or performs actions in response to the input. Syntax is typically represented in the second component by a set of productions or an augmented transition network (see Grammar, augmented transition network). Some systems combine the latter two components into a semantic grammar, putting the semantic rules or actions directly into the syntax grammar. They use a specialized grammar designed for a particular domain of discourse and subset of the language (1). This approach provides an effective way to build systems that accept a relatively constrained subset of natural language in a particular domain, but it is difficult to expand to larger, more general areas of the language. The alternative, use of a purely syntactic grammar and leaving the semantics in a separate component, is helpful for building a system with broad coverage, but the syntactic component will often identify a wide range of possible parses of a sentence, which can only be narrowed by the semantic component. Thus, such systems tend to perform searches with considerable backtracking.
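As a deliberately tiny illustration of the semantic-grammar idea, the sketch below attaches a semantic action directly to each syntactic pattern, so that recognizing a construction and interpreting it happen in one step. The query patterns and the parts database are invented for this example and are not drawn from any system cited here.

```python
# Toy semantic grammar: each production carries its semantic action,
# so parsing and interpretation happen together. The patterns and the
# mini-database are invented for illustration.
import re

PARTS = {"p1": {"qty": 4}, "p2": {"qty": 0}}

# (pattern, action) pairs: a semantic grammar for a tiny query language.
RULES = [
    (re.compile(r"how many (\w+) do we have", re.I),
     lambda m: str(PARTS[m.group(1)]["qty"]) if m.group(1) in PARTS
               else "unknown part"),
    (re.compile(r"is (\w+) in stock", re.I),
     lambda m: "yes" if PARTS.get(m.group(1), {}).get("qty", 0) > 0
               else "no"),
]

def answer(utterance):
    for pattern, action in RULES:
        m = pattern.search(utterance)
        if m:
            return action(m)
    return "I did not understand that."   # outside the covered subset

print(answer("How many p1 do we have?"))  # -> 4
print(answer("Is p2 in stock?"))          # -> no
print(answer("Describe p1"))              # -> I did not understand that.
```

The last call illustrates the limitation discussed in the text: inputs even slightly outside the engineered subset fall through, and nothing built here transfers to another domain.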
Still other alternative approaches, such as systems driven by semantically based scripts (qv) rather than syntax (2) and menu-based natural-language (qv) systems (3), have also been used successfully. Finally, to complete a dialogue in natural language, it is necessary to generate sentences from internally stored information, and approaches that go beyond simply storing canned responses have been studied (4,5). Given the present state of the art, it is possible to construct a practical natural-language system for a specified subset of a language in some narrow, well-defined domain. Such a system requires that a considerable amount of knowledge about that domain be built into the lexicon, grammar, and semantic component and thus much effort that cannot be reused in another natural-language system. Systems that can handle a broad range of language on many topics remain a research goal.

Speech. Another important area of AI research is the processing of speech, both accepting it as input and generating it as output. Speech is an attractive input medium because people already know how to use it, they can generally speak faster than they can write or type, and it leaves their hands free for other tasks. Recognition of isolated words is a relatively well-understood problem, and commercial systems are available for this task (6). Accepting continuous speech has proven significantly less tractable, largely because normal speakers do not pause between words. It is generally not possible to divide a speech input signal into words simply through signal processing; it requires knowledge of the meaning and context of the utterance. Thus, speech understanding (qv) involves both a signal processing or pattern recognition component, which identifies words or other parts of the input signal, and a semantic component, which assigns meanings to the utterance. For systems that go beyond isolated words, there must be feedback between the two; and to function effectively, the latter
component requires considerable knowledge about the underlying language and the domain of the discourse. Thus, work in speech input is intimately connected to the study of natural language and knowledge representation. Much of the principal work on speech understanding was performed under the aegis of the ARPA Speech Understanding Research Program between 1971 and 1976. The principal projects, which included HEARSAY (qv), HARPY (qv), and HWIM, all emphasized the problems of representation and use of knowledge about the spoken language, and each used different approaches to them. More recent work has extended these ideas, but robust, production-quality continuous speech input continues to be an elusive goal (7). The area of speech generation is also important, but much of it is sufficiently well understood and widely available that it is no longer considered a topic in AI (8). Current research focuses more on reducing the cost of large vocabularies through coding techniques and on improving naturalness.

Pattern Recognition. Computer vision (qv) or pattern recognition (qv), appropriately applied, is also relevant to human-computer interaction, as it can permit computer input in the form of gestures much as people use in communicating with one another. An example of this approach, without using sophisticated pattern recognition, was demonstrated by Bolt (9). Similar AI techniques could also be used to accept rough sketches drawn by people as a form of computer input, again similar to the mode used for communication between people. Going further, such gesture input can be combined with improved displays to be an important component of a user interface that resembles a real-world environment (10).

"Intelligent" User Interfaces and Computer-Aided Instruction.
The above has examined some specific techniques or modalities of human-computer interaction derived from AI research. What can be said of a human-computer interface that begins to exhibit more generally intelligent behavior, beyond simply competence in one or more of the specific interaction media discussed? An intelligent human communication partner can: accept and compensate for many types of incorrect or incomplete inputs; realize when the conversational partner has misperceived something and provide explanations to rectify the underlying misconception; infer the underlying goals in a statement or question, even where they are at odds with those stated; follow the changing focus of a conversation; maintain and update an estimate of the partner's interests, knowledge, and understanding; and construct replies that take into account that current estimate. There is research in AI that attempts to understand and duplicate some of these processes. The bulk of it has thus far been conducted in the area of computer-aided instruction (CAI, see Intelligent CAI) in order to build "intelligent tutors." Such systems attempt to model a student's (incomplete) understanding of material and present new material or leading questions appropriate to the student's current level of knowledge and ability.
For example, SOPHIE (qv) watches a student troubleshoot electronic equipment, answers his or her questions, and criticizes his or her hypotheses. WEST and WUMPUS both observe students playing computer games and offer suggestions based on inferences about the students' skill made from watching their moves. SCHOLAR (qv) asks its student leading questions when it finds deficiencies in his knowledge. MENO-II finds bugs in student programs and identifies the underlying misconception that caused the bug. GUIDON (qv) is built upon a rule-based system. By presenting example cases, it attempts to deduce which of the rules in its knowledge base the student already knows and which he or she is ready to learn. It also manages the overall flow of the dialogue with the student, selects topics for study, selects appropriate presentation techniques, maintains context, and allows for unsolicited inputs (11,12). Some such work has extended outside traditional CAI. For example, the University of California (13) uses these techniques in an intelligent help system. It attempts to infer the user's underlying goals and intentions and provides answers that take this information into account in addition to the specific question asked. Other intelligent help systems volunteer advice when appropriate (14). This sort of research into problems such as modeling a user's information state in a dialogue, inferring user's misconceptions, and constructing appropriate replies has been concentrated in the area of CAI, but it is applicable to the design of intelligent user interfaces or intelligent dialogues in any area.
By combining many of these individual techniques, one can take the notion of an intelligent user interface and carry it somewhat further, to build a user-modeling system that can describe and reason about what its user knows and conduct a dialogue with the long-term flow and other desirable properties of dialogues between people. Such a system would maintain and use information about the user and his or her current state of attention and knowledge, the task being performed, and the tools available to perform it (15,16). For example, when the underlying application program sends information to the user, this user agent can control its presentation based on its model of what the user already knows and is seeking and remove information irrelevant to his current focus. It is important to remember that such an intelligent user interface is by no means restricted to natural language. Most research on the processes needed to conduct such dialogues has concentrated on natural language, but they apply to any human-computer dialogue conducted in any language. For example, STEAMER (17) demonstrates a dialogue in a rich graphical language using powerful and natural state-of-the-art input and output modalities. The user's side of the dialogue consists almost entirely of pointing and pressing mouse buttons and the computer's of animated pictorial analogs. A dialogue in such a language could also exhibit the intelligent user interface properties discussed here: following focus, inferring goals, correcting misconceptions. Further, knowledge-based techniques can be applied to improve the selection, construction, and layout of appropriate graphical representations for the output side of the dialogue (18).

Adaptation. An intelligent user interface would also exhibit some learning and adaptation to the user.
The simplest form such adaptation could take uses explicit input: A user enters instructions about the way he or she wants the dialogue to be conducted, and the subsequent dialogue uses this information.
This is already available in, for example, facilities for defining aliases or command procedures or using profiles. A more intricate form of adaptation uses implicit inputs: The computer obtains information about the user without actually asking him for it. This can be done in two ways: using information intrinsic to the dialogue or using external information about the user (19). Examples of the former are: using information about the user's errors, choice of commands, or use of help features to decide whether he is an expert or novice user; inferring the focus within which a command should be interpreted from the preceding sequence of user commands; and measuring and using the length and distribution of user idle periods. The other possibility is to use implicit measurements obtained from inputs outside the actual dialogue. For example, sensors might try to determine whether the user was actually sitting at his terminal (or had left the room) or what the user was looking at and, from that, the context within which his commands should be interpreted (20). Another way to classify adaptation is by time span. Changes like renaming a command are intended to be long term. Explicit inputs are generally used only for such long-term adaptation because it is too much trouble to enter them more frequently. Short-term adaptation to changes in the user's state relies on implicit inputs. A system could use the fact that he or she is typing faster, making more semantic errors, or positioning a cursor inaccurately to make short-term changes in the pace or nature of the dialogue. Short-term adaptation using implicit inputs is a potentially powerful technique for creating adaptive human-computer dialogues. Some beginning steps in this direction are demonstrated by Edmonds (21).
Other AI Developments in Human-Computer Interaction

Because of the complexity of AI programs, their programmers have been pioneers in the development and use of innovative human-computer interaction techniques, which are now used in other areas. The development of powerful interactive programming environments was spearheaded by AI programmers developing large LISP programs. They required and developed complex screen-oriented editors, break packages, tracing facilities, and data browsers for LISP programming environments (22,23). More recent interaction methods, such as overlapping display windows, icons, multiple contexts, use of mice, pop-up menus, and touch screens had their roots in AI programming. Many of these were developed by workers at Xerox PARC, both in Interlisp and Smalltalk (in parallel and with considerable interaction between the two). Many of these ideas were spawned and made practical by the availability of powerful graphics-oriented personal computers in which a considerable fraction of the computing resources in the unit was devoted to the user interface. Recent programming systems that combine and exemplify these include Interlisp (qv), the MIT LISP Machine (qv), Smalltalk (qv), and LOOPS (24). The combination and effective use of many of these techniques have been demonstrated by a variety of systems (e.g., Ref. 25) and most notably in STEAMER (17). These techniques are moving out of the AI community into all areas of human-computer interaction, including small personal computers. One continuing problem is that, although interfaces involving such techniques are often easier to learn and use than conventional ones, they are currently considerably more difficult to build since they are typically programmed in a low-level, ad hoc manner. Appropriate higher level software engineering concepts and abstractions for dealing with these new interaction techniques are needed.

Designing Human-Computer Interfaces

The study of human factors and user psychology over the past few years has paralleled that of AI. Its results are now finding application in the design of better user interfaces for AI and other complex systems. AI systems stretch the limits of what has been or can be done with a computer and thus often generate new human-computer communication problems rather than alleviating them. The methods of human factors (task analysis, understanding of interaction methods and cognitive factors, and empirical testing of alternatives with users) are thus especially applicable to designers of AI systems. Design of a human-computer interface begins with task analysis, an understanding of the user's underlying tasks and the problem domain. It is desirable that the user-computer interface be designed in terms of the user's terminology and conception of his or her job, rather than the programmer's. A good understanding of the cognitive and behavioral characteristics of people in general as well as the particular user population is thus important, as is knowledge of the nature of the user's work. The task to be performed can then be divided and portions assigned to the user or machine based on knowledge of the capabilities and limitations of each. AI often expands the capabilities of the computer side, but for all but fully autonomous systems, the user is likely to play some role in performing or guiding the task and hence will have to interact with the machine.

Styles of Human-Computer Interfaces. A style of user interface appropriate to the task should be selected. The principal categories of user interfaces currently in use are command languages, menus, natural language, and graphics or "direct manipulation" (26).
Command language user interfaces use artificial languages, much like programming languages. They are concise and unambiguous but are often more difficult for a novice to learn and remember. However, since they usually permit a user to combine their constructs in new and complex ways, they can be more powerful for advanced users. They are also most amenable to programming, that is, writing programs or scripts of user input commands.

Menu-based user interfaces explicitly present the options available to a user at each point in a dialogue. Thus, they require only that the user be able to recognize the desired entry from a list rather than recall it, placing a smaller load on long-term memory. They are highly suitable for novice users. A principal disadvantage is that they can be annoying for experienced users who already know the choices they want to make and do not need to see them listed. Well-designed menu systems, however, can provide bypasses for expert users. Menus are also difficult to apply to "shallow" languages, which have large numbers of choices at a few points, because the option display becomes too big.

Natural-language user interfaces are considered above. Their principal benefit is, of course, that the user already knows the language. However, given the state of the art, such an interface must be restricted to some subset of natural language, and the subset must be chosen carefully, both in vocabulary and range of syntactic constructs. Such systems often behave poorly when the user veers even slightly outside the subset. Since they begin by presenting the illusion that the computer really can "speak English," the systems can trap or frustrate novice users. For this reason, the techniques of human factors engineering can help. A human factors study of the task and the terms and constructs people normally use to describe it can be used to restrict the subset of natural language in an appropriate way based on empirical observation (27). Human factors study can also identify tasks for which natural-language input is good or bad. Although future research in natural language offers the hope of human-computer communication that is so natural it is "just like talking to a person," such conversation may not always be the most effective way of commanding a machine (28). It is often more verbose and less precise than computer languages. In some settings people have evolved terse, highly formatted languages, similar to computer languages, for communicating with other people. For a frequent user the effort of learning such an artificial language is outweighed by its conciseness and precision, and it is often preferable to natural language.

Finally, recent advances have led to a graphical or direct manipulation (26) style of user interface, in which objects are presented on the screen, and the user has a standard repertoire of manipulations that can be performed on them. There is no command language to remember beyond the set of manipulations, and generally any of them can be applied to any visible object. This approach to user interfaces is in its infancy. Some current examples include Visicalc, the Xerox Star, STEAMER (17), and, of course, many video games.
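The contrast between command-language and menu styles, including the expert bypass mentioned above, can be sketched in a few lines. This is purely illustrative; the operations, names, and dispatch scheme are invented, not drawn from any system described in this entry.

```python
# Hypothetical sketch contrasting a command-language and a menu-style
# interface over the same two operations.

OPERATIONS = {
    "copy": lambda src, dst: f"copied {src} to {dst}",
    "delete": lambda name: f"deleted {name}",
}

def command_interface(line):
    """Concise, unambiguous, and scriptable, but the user must recall syntax."""
    verb, *args = line.split()
    return OPERATIONS[verb](*args)

def menu_interface(choice_number, *args):
    """The user need only recognize an option in a displayed list."""
    menu = sorted(OPERATIONS)          # displayed as: 1. copy  2. delete
    verb = menu[choice_number - 1]
    return OPERATIONS[verb](*args)

def hybrid_interface(user_input, *args):
    """Expert bypass: typed command text short-circuits the menu display."""
    if user_input.split()[0] in OPERATIONS:        # expert path
        return command_interface(user_input)
    return menu_interface(int(user_input), *args)  # novice path
```

The hybrid dispatcher illustrates the design point above: a well-designed menu system can serve novices while letting experienced users skip the listing entirely.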
Although object-oriented languages like Smalltalk and powerful graphic input and output techniques make such interfaces easier to build, an important difficulty in designing them is to find suitable manipulable graphical representations or visual metaphors for the objects in the problem domain. The paper spreadsheet (Visicalc), desk and filing cabinet (Star), and engine control panel (STEAMER) were all fortunate choices. Another problem with direct manipulation interfaces is that it is often difficult to create scripts or parameterized programs in such an inherently dynamic and ephemeral language.

Various modalities of human-computer communication may also be employed as appropriate in designing a user interface. Keyboards and text displays are common, but some more modern modalities include, for output, graphics, windows, icons, active value displays, manipulable objects, speech, and other sounds. Techniques for input include keys that can be dynamically labeled, interactive spelling correction and command completion, speech, gesture, and visual line of gaze. Each must be matched to the tasks for which it is used.

Design Techniques and Guidelines. A variety of tools, techniques, and guidelines from human factors engineering can be brought to bear on the design of the user interface (29,30). One important principle is that of empirical measurement. Decisions about user interface design should be based on observations of users rather than on a designer's or programmer's notions. Careful use of empirical measurement also encourages the establishment of precise performance objectives and metrics early in the development of a system. Alternative designs can then be tested against them empirically as the work progresses (31-33).
In addition to specific tests of proposed user interfaces, some general principles have been derived from laboratory experiments. For example, a user interface should be consistent; similar rules should apply for interpreting commands when the system is in what appear to the user to be similar states. Command names, order of arguments, and the like should be as uniform as possible, and commands should generally be available in all states in which they would be plausible. The system should also be predictable; it should not seem erratic. A small difference in an input command should not result in a big difference in the effect (or time delay) of the response. Unpredictability makes the user anxious, continually afraid of making an irrevocable mistake. A general backup facility, which lets the user undo any command after it has been executed, is one way to allay this anxiety. A fully general undo facility is difficult to implement but has been demonstrated in the Interlisp programming environment. More generally, a system should exhibit causality; the user should be able to perceive that the activity of the system is caused directly by his or her actions rather than proceeding seemingly at random. The state of the system should be visible to the user at all times, perhaps by a distinctive prompt or cursor or in a reserved portion of the screen.

Systems can be easy to learn and/or easy to use, but the two are different, sometimes conflicting goals. Designs suitable for novice users may interfere with expert users; features like help facilities or command menus should be optional for experienced users. A good command language should consist of a few simple primitives (so as not to tax long-term memory) plus the ability to combine them in many ways (to create a wide variety of constructs as needed, without having to commit all of them to long-term memory). The user interface should also exploit nonsymbolic forms of memory.
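The general undo facility described above is commonly realized with a command pattern: each executed command records how to reverse itself. The following is a minimal sketch with an invented toy document model, not a description of the Interlisp mechanism.

```python
# Minimal undo sketch: every command object knows how to reverse itself,
# and the editor keeps a history of executed commands.

class Insert:
    def __init__(self, text):
        self.text = text
    def do(self, doc):
        doc.append(self.text)
    def undo(self, doc):
        doc.pop()

class Editor:
    def __init__(self):
        self.doc, self.history = [], []
    def execute(self, command):
        command.do(self.doc)
        self.history.append(command)   # every executed command becomes undoable
    def undo(self):
        if self.history:               # predictable: undo with empty history is a no-op
            self.history.pop().undo(self.doc)

ed = Editor()
ed.execute(Insert("hello"))
ed.execute(Insert("world"))
ed.undo()
print(ed.doc)   # ['hello']
```

Because every command funnels through `execute`, the undo guarantee holds uniformly, which is exactly the kind of consistency and predictability the experimental principles above call for.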
For example, an interface can attach meaning to the spatial position on a display screen (certain types of messages always appear in certain positions) or to icons, typefaces, colors, or formats.

One way to help design a user interface is to consider the dialogue at several distinct levels of abstraction and work on a design for each. This simplifies the designer's task because it allows him or her to divide it into several smaller problems. Foley and Wallace (34) divide the design of a human-computer dialogue into the semantic, syntactic, and lexical levels. The semantic level describes the functions performed by the system. This corresponds to a description of the functional requirements of the system but does not address how the user will invoke the functions. The syntactic level describes the sequences of inputs and outputs necessary to invoke the functions described. The lexical level determines how the inputs and outputs are actually formed from primitive hardware operations. With appropriate programming techniques, these aspects of the dialogue can be designed and programmed entirely separately (35).

Another approach that can help the designer and software engineer is the user interface management system (UIMS). A UIMS is a separate software component that conducts all interactions with the user; it is separate from the application program that performs the underlying task. It is analogous to a database management system in that it separates a function used by many applications and moves it to a shared subsystem. It removes the problem of programming the user interface from each individual application and permits some of the effort of designing tools for human-computer interaction to be amortized over many applications and shared by them. It
also encourages the design of consistent user interfaces to different systems since they share the user interface component. Conversely, it permits dialogue independence, where changes can be made to the dialogue design without affecting the application code (36). It is also useful to have a method for specifying user interfaces precisely so that the interface designer can describe and study a variety of possible user interfaces before building one (37,38).

AI and Human Factors: Toward "Natural" Human-Computer Interfaces

The recent histories of research in AI and human factors have been interconnected in many ways. Each has contributed techniques and ideas to the other, and each has found applications in the other. How will these two disciplines cross paths in the future? The answer is in the domain of understanding the user's cognitive processes. Much work in human factors has been devoted to understanding the mental models and processes by which users learn about, understand, and interact with computer systems. Its purpose is to build systems that are easier to learn and use because they fit these processes more closely. For example, some command languages, text editors, and programming language constructs have been improved by studying and using carefully, but not overloading, the capabilities of human short- and long-term memory in their design (39). Much of AI research, too, is devoted to understanding people's cognitive processes. The results of such study can be a better understanding of how people (specifically, computer system users) process information: perceive data, focus attention, construct knowledge, remember, make errors. The insights into cognitive psychology developed by research in both fields can be used to make human-computer interfaces more "natural," to fit their users better. The goal of such work is to produce a more intelligent and natural user interface: not specifically natural language, but a naturally flowing dialogue.
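The UIMS separation and dialogue independence described above can be sketched as an application exposing only semantic-level functions while interchangeable dialogue components handle the interaction. All class names, commands, and the toy file store are invented for illustration.

```python
# Sketch of UIMS-style separation: the application never talks to the user;
# dialogue components can be swapped without touching application code.

class Application:
    """Semantic level: what the system can do, not how it is invoked."""
    def __init__(self):
        self.files = {"notes": "draft"}
    def delete(self, name):
        return self.files.pop(name, None)

class VerboseDialogue:
    """One dialogue component (syntactic/lexical levels), aimed at novices."""
    def handle(self, app, line):
        verb, name = line.split()
        if verb == "remove":
            app.delete(name)
            return f"File '{name}' has been removed."

class TerseDialogue:
    """A different dialogue style over the identical application code."""
    def handle(self, app, line):
        verb, name = line.split()
        if verb == "rm":
            app.delete(name)
            return "ok"
```

Swapping `VerboseDialogue` for `TerseDialogue` changes the entire user-visible language while `Application` is untouched, which is the dialogue independence the text describes.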
Such a development will begin with human factors study of good user interface design using insights from cognitive psychology. Appropriate visual and other metaphors for describing, and proxies for manipulating, the objects and activities of the task at hand must then be chosen. AI techniques can permit the system to obtain, track, and understand information about its user's current conceptions, goals, and mental state well beyond current dialogue systems, where most of the context is lost from one query or command to the next. The system will use this information to help interpret users' inputs and permit them to be imprecise, vague, slightly incorrect (e.g., typographical errors), or elliptical. This approach, combined with powerful techniques such as direct manipulation or graphical interaction, can produce a highly effective form of human-computer communication.

The research in AI pertinent to human-computer interaction has attempted to discover users' mental models, to build systems that deduce users' goals and misconceptions, and to develop some forms of adaptive or personalizable user interfaces. A collection of powerful interaction modalities has also been developed. The challenge for the future is for research into cognitive psychology in both human factors and AI to combine with new interaction and programming techniques to produce a style of interface between user and computer more closely suited to the human side of the partnership.
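The context tracking described above, where the system retains the user's current focus so that elliptical inputs can still be interpreted, can be sketched minimally. The two-word grammar, the pronoun handling, and all names here are invented for illustration only.

```python
# Toy dialogue context: remember the most recently mentioned object so that
# an elliptical follow-up such as "delete it" can be resolved.

class DialogueContext:
    def __init__(self):
        self.focus = None          # the user's current object of attention

    def interpret(self, utterance):
        verb, obj = utterance.split()
        if obj == "it":
            if self.focus is None:
                return "Which object do you mean?"
            obj = self.focus       # resolve the ellipsis from context
        self.focus = obj
        return f"{verb} {obj}"

ctx = DialogueContext()
print(ctx.interpret("open report"))   # open report
print(ctx.interpret("delete it"))     # delete report
```

Even this trivial focus register carries context across commands, in contrast to the stateless query-by-query systems the text criticizes.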
BIBLIOGRAPHY

1. G. G. Hendrix, E. Sacerdoti, D. Sagalowicz, and J. Slocum, "Developing a natural language interface to complex data," ACM Trans. Database Sys. 3, 105-147 (1978).
2. R. C. Schank and R. P. Abelson, Scripts, Plans, Goals, and Understanding, Lawrence Erlbaum, Hillsdale, NJ, 1977.
3. H. Tennant, K. Ross, R. Saenz, C. Thompson, and J. Miller, Menu-Based Natural Language Understanding, Proceedings of the Association for Computational Linguistics Conference, Cambridge, MA, pp. 151-157, 1983.
4. W. C. Mann, An Overview of the PENMAN Text Generation System, Proceedings of the Third National Conference on Artificial Intelligence, Washington, DC, pp. 261-265, 1983.
5. B. Swartout, The GIST Behavior Explainer, Proceedings of the Third National Conference on Artificial Intelligence, Washington, DC, pp. 402-407, 1983.
6. J. L. Flanagan, Speech Analysis, Synthesis, and Perception, Springer Verlag, New York, 1972.
7. W. A. Lea (ed.), Trends in Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, 1980.
8. B. A. Sherwood, "The computer speaks," IEEE Spect. 16(8), 18-25 (1979).
9. R. A. Bolt, "Put-that-there: Voice and gesture at the graphics interface," Comput. Graph. 14(3), 262-270 (1980).
10. A. Lippman, "Movie-maps: An application of the optical videodisc to computer graphics," Comput. Graph. 14(3), 32-42 (1980).
11. W. J. Clancey, Dialogue Management for Rule-Based Tutorials, Proceedings of the Sixth International Joint Conference on AI, Tokyo, Japan, pp. 155-161, 1979.
12. B. Woolf and D. D. McDonald, "Building a computer tutor: Design issues," IEEE Comput. 17(9), 61-73 (1984).
13. R. Wilensky, Y. Arens, and D. Chin, "Talking to UNIX in English: An overview of UC," CACM 27, 574-593 (1984).
14. J. Shrager and T. W. Finin, An Expert System that Volunteers Advice, Proceedings of the Second National Conference on Artificial Intelligence, Pittsburgh, PA, 1982.
15. P. Hayes, E. Ball, and R. Reddy, "Breaking the man-machine communication barrier," IEEE Comput. 14(3), 19-30 (1981).
16. E. L. Rissland, "Ingredients of intelligent user interfaces," Int. J. Man-Mach. Stud. 21, 377-388 (1984).
17. J. D. Hollan, E. L. Hutchins, and L. Weitzman, "STEAMER: An interactive inspectable simulation-based training system," AI Mag. 5(2), 15-27 (1984).
18. F. Zdybel, N. R. Greenfeld, M. D. Yonke, and J. Gibbons, An Information Presentation System, Proceedings of the Seventh International Joint Conference on AI, Vancouver, BC, pp. 978-984, 1981.
19. E. Rich, "Users are individuals: Individualizing user models," Int. J. Man-Mach. Stud. 18, 199-214 (1983).
20. R. A. Bolt, Eyes at the Interface, Proceedings of the ACM SIGCHI Human Factors in Computer Systems Conference, Gaithersburg, MD, pp. 360-362, 1982.
21. E. A. Edmonds, Adaptive Man-Computer Interfaces, in M. J. Coombs and J. L. Alty (eds.), Computing Skills and the User Interface, Academic, London, pp. 389-426, 1981.
22. D. Weinreb and D. Moon, Lisp Machine Manual, MIT Artificial Intelligence Laboratory, Cambridge, MA, 1981.
23. W. Teitelman, Interlisp Reference Manual, Xerox PARC Technical Report, Palo Alto, CA, 1978.
24. M. Stefik, D. G. Bobrow, S. Mittal, and L. Conway, "Knowledge programming in LOOPS: Report on an experimental course," AI Mag. 4(3), 3-13 (1983).
25. R. G. Smith, G. M. E. Lafue, E. Schoen, and S. C. Vestal, "Declarative task description as a user-interface structuring mechanism," IEEE Comput. 17(9), 29-38 (1984).
26. B. Shneiderman, "Direct manipulation: A step beyond programming languages," IEEE Comput. 16(8), 57-69 (1983).
27. P. R. Michaelis, M. L. Miller, and J. A. Hendler, Artificial Intelligence and Human Factors Engineering: A Necessary Synergism in the Interface of the Future, in A. Badre and B. Shneiderman (eds.), Directions in Human/Computer Interaction, Ablex, Norwood, NJ, pp. 79-94, 1982.
28. D. W. Small and L. J. Weldon, "An experimental comparison of natural and structured query languages," Hum. Fact. 25, 253-263 (1983).
29. B. Shneiderman, Software Psychology: Human Factors in Computer and Information Systems, Winthrop, Cambridge, MA, 1980.
30. B. R. Gaines and M. L. G. Shaw, Dialog Engineering, in M. E. Sime and M. J. Coombs (eds.), Designing for Human-Computer Communication, Academic Press, London, pp. 23-53, 1983.
31. H. Ledgard, J. A. Whiteside, A. Singer, and W. Seymour, "The natural language of interactive systems," CACM 23, 556-563 (1980).
32. J. D. Gould, J. Conte, and T. Hovanyecz, "Composing letters with a simulated listening typewriter," CACM 26, 295-308 (1983).
33. T. K. Landauer, K. M. Galotti, and S. Hartwell, "Natural command names and initial learning: A study of text-editing terms," CACM 26, 495-503 (1983).
34. J. D. Foley and V. L. Wallace, "The art of graphic man-machine conversation," Proc. IEEE 62, 462-471 (1974).
35. R. J. K. Jacob, An Executable Specification Technique for Describing Human-Computer Interaction, in H. R. Hartson (ed.), Advances in Human-Computer Interaction, Ablex, Norwood, NJ, 1985.
36. H. R. Hartson and D. H. Johnson, "Dialogue management: New concepts in human-computer interface development," Comput. Surv. (1987) (in press).
37. P. Reisner, "Formal grammar and human factors design of an interactive graphics system," IEEE Trans. Soft. Eng. SE-7, 229-240 (1981).
38. R. J. K. Jacob, "Using formal specifications in the design of a human-computer interface," CACM 26, 259-264 (1983).
39. R. B. Allen, Cognitive Factors in Human Interaction with Computers, in A. Badre and B. Shneiderman (eds.), Directions in Human/Computer Interaction, Ablex, Norwood, NJ, 1982.

R. J. K. JACOB
Naval Research Laboratory
IMAGE ANALYSIS. See Scene analysis; Vision, early.

IMAGE UNDERSTANDING

Think about the process by which you understand what you see. Can you determine what is happening and how it is happening when you look out the window and notice that your best friend is walking toward your door? As you may guess, the process by which you arrived at this conclusion, and which caused you to go and open the door before your friend knocked, is not a simple one. Ancient philosophers worried about this problem. Biological scientists have been studying the problem in earnest since Hermann von Helmholtz (1821-1894), commonly credited as the father of modern perceptual science. Computer scientists began looking at this problem only recently in these terms, and the discipline of computer vision is a very young one. The miracle of vision is not restricted to the eye; it involves the cortex and brain stem and requires interactions with many other specific brain areas. In this sense, vision may be considered as an important aspect of AI. It is the major source of input for man's other cognitive faculties.

This entry discusses the aspects of vision that deal with the "understanding" of visual information. Understanding in this context means the transformation of visual images (the input to the retina) into descriptions of the world that can interface with other thought processes and can elicit appropriate action. The representation of these descriptions and the process of their transformation are not understood currently by the biological sciences. In AI, researchers are concerned with the discovery of computational models that behave in the same ways that humans do, and thus, representations and processes are defined using the available computational tools. This encyclopedia is a collection of such tools and their application, and this entry assumes that the reader will refer to other appropriate entries for details on specific topics only mentioned here.

Image understanding (IU) is the research area concerned with the design and experimentation of computer systems that use one or more methods for matching features with models using a control structure. Given a goal, or a reason for looking at a particular scene, these systems produce descriptions of both the images and the world scenes that the images represent. The goal of an image-understanding system (IUS) is to transform two-dimensional (2-D) spatial (and, if appropriate to the problem domain, time-varying) data into a description of the three-dimensional spatiotemporal world. In the early to mid-seventies this activity was termed "scene analysis." Other terms for this are "knowledge-based vision" or "high-level vision." Several survey papers have appeared on this topic. The interested reader is particularly referred to papers by Binford (1), Kanade (2), Matsuyama (3), and Tsotsos (4) as well as the excellent collection of papers in the book Computer Vision Systems (5) and Part IV of the book Computer Vision (6). Those readers interested in the biological side of image understanding are referred to the excellent book by Uttal, A Taxonomy of Visual Processes (7). This entry assumes that the reader has a basic familiarity with vision as provided by the overview entry (see Vision).

Integration is the key phrase when describing an IUS. Research on IUSs has experimented with ways of integrating existing techniques into systems and, in doing so, has discovered problems and solutions that would not otherwise have been uncovered. Unfortunately, there are no truly general vision systems yet, and much further research is necessary on all aspects of the problem. Integrated within a single framework, an IUS must:

Extract meaningful two-dimensional (2-D) groupings of intensity-location-time values. Images or image sequences contain a tremendous amount of information in their raw form. The process of transformation thus begins with the identification of groups of image entities, pixels. These pixels are grouped by means of similarity of intensity value, for example, over a particular spatial location. They can also be grouped on the basis of intensity discontinuity or similarity of change or constancy over time. The assumption is that groups of pixels that exhibit some similarity in their characteristics probably belong to specific objects or events. Typical groupings are edges, regions, and flow vectors.

Infer 3-D surfaces, volumes, boundaries, shadows, occlusion, depth, color, motion. Using the groupings of pixels and their characteristics, the next major transformational step is to infer larger groupings that correspond, for example, to surfaces of objects or motion events. The reason for the need for inference is that the pixels by themselves do not contain sufficient information for the unique determination of the events or objects; other constraints or knowledge must be applied. This knowledge can be of a variety of forms, ranging from knowledge of the imaging process, knowledge of the image formation process, and knowledge of physical constraints on the world to knowledge of specific objects being viewed. Typically, the most appropriate knowledge to use is an open question, but the simplest and least application-specific knowledge is preferred, and the current belief is that no application-specific knowledge is required at this stage.

Group information into unique physical entities. Surfaces can be connected to form 3-D objects, and changes in trajectories can be joined to describe motions of specific types. Again, the original pixel values do not contain sufficient information for this process, and additional knowledge must be applied. This knowledge is perhaps in the form of connectivity and continuity constraints, and in many cases these are embedded in explicit models of objects of the domain.

Transform image-centered representations into world-centered representations. To this point the descriptions created have all been in terms of a coordinate system that is "image centered" (also called "viewer centered" or "retinotopic"). A key transformation is to convert this coordinate system to one that is "world centered" (also called "object centered"), that is, the description is no longer dependent
on specific locations in images. This is a crucial step; otherwise, the stored models must be replicated for each possible location and orientation in space.

Label entities depending on system goals and world models. It almost never occurs that humans are given a picture or told to look out the window and asked to describe everything that is seen in a high and uniform degree of detail. Typically a scene is viewed for a reason. What exactly this goal is has direct impact on how the scene is described, which objects and events are described in detail, and which are not. Second, scenes are always described based on what is known about the world; they are described in terms of the domain that is being viewed. A factory scene, for example, is almost never described in terms of a hospital environment; that would not be a useful description (unless metaphoric use is the goal!). This knowledge base permits the choice of the most appropriate "labels" to associate with objects and events of the scene. Labels are typically the natural-language words or phrases that are used in the application domain. The process of finding labels and their associated models that are relevant is called "search." Models that are deemed relevant may be termed "hypotheses." Each hypothesis must be "matched" against the data extracted from the images. In the case where the data is insufficient to verify a model, "expectations" may be generated that guide further analysis of the images. Labels are necessary for communication to other components of a complete intelligent system that must use interpreted visual information. The label set forms the language of communication between the vision module and the remainder of the brain.

Infer relationships among entities. In viewing a scene, not only are individual objects and events recognized but they are also interrelated. Looking out the window, for example, one may see a tree in a lawn, a car on a driveway, a boy walking along the street, or a girl playing on a swing set.
The relationships may play an important role in assisting the labeling process as well. These relationships form a spatiotemporal context for objects and events.

Construct a consistent internal description. This really applies to all levels of the transformation process that is being described here. The output of an image-understanding system is a representation of the image contents, usually called an "interpretation." Care is required, however, in defining what an interpretation actually involves. Little attention has been given to this, and current systems employ whatever representation for an interpretation is convenient and appropriate to the problem domain. Basically, an interpretation consists of inferred facts, relationships among facts, and representations of physical form. Issues of consistency and foundations of the underlying representational formalism are important, yet they have not received much attention within the IUS community. The output of an IUS usually takes one of two forms: a graphic rendition of the objects recognized is displayed, perhaps with natural-language labels identifying various parts, or textual output describing the characteristics of the objects observed and recognized is generated. Some systems employ both methods, and the choice depends on the particular problem domain being addressed.
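The first requirement in the list above, extracting 2-D groupings by similarity of intensity value, can be illustrated with a toy flood-fill region grower. The image, the threshold, and all names are invented; real systems use far more sophisticated grouping operators.

```python
# Toy region grower: group adjacent pixels whose intensity is within a
# threshold of the region's seed pixel, labeling each region with an integer.

def group_regions(image, threshold=10):
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is not None:
                continue
            # grow a new region from this seed pixel
            stack, seed = [(r, c)], image[r][c]
            labels[r][c] = next_label
            while stack:
                y, x = stack.pop()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] is None
                            and abs(image[ny][nx] - seed) <= threshold):
                        labels[ny][nx] = next_label
                        stack.append((ny, nx))
            next_label += 1
    return labels

image = [[10, 12, 90],
         [11, 13, 95],
         [10, 11, 92]]
print(group_regions(image))   # the dark left block and the bright right column
```

The resulting label map is the kind of region grouping the entry describes; the assumption that similar neighboring pixels belong to one object is built into the threshold test.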
Two basic questions always arise when describing an IUS to the uninitiated. The first question is "Why did this field arise as distinct from so-called low-level vision or early vision?" and the second is "Is image understanding computationally the same as speech understanding?"

The answer to the first question follows. There are two main reasons for the distinction: the bottom-up approach (see Processing, bottom-up and top-down) embodied in early vision schemes is inadequate for the generation of complete symbolic descriptions of visual input, and there is a need to describe visual input using the same terminology as the problem domain. There are several basic realities that impact the design of image-understanding systems. The first is that images underconstrain the scenes that they represent. The reason is straightforward: In human vision a 3-D scene undergoes a perspective projection onto a 2-D retina in order to become an image. Thus, much information is lost, particularly depth information. The image is just a snapshot in time of the scene, and both spatial as well as temporal continuity information is lost. Further, the image created is a distorted view of the scene that it represents. The distortion is not only due to the perspective transformation, but, also, there is noise involved in the image creation process. Finally, a purely bottom-up (or data-directed) approach does not lead to unambiguous results in all cases. A data-directed scheme considers all the data and tries to follow through on every hypothesis generated. Consideration of all data and all possible models in a system of size and scope comparable to the human visual system leads to combinatorial explosion and is thus an intractable approach. Moreover, it can be nonconvergent, can only produce conclusions that are derivable directly or indirectly from the input data, and cannot focus or direct the search toward a desired solution. A general vision system must be able to represent and use a very large number of object and event models. If the input is naturally ambiguous, a purely bottom-up activation of models will lead to a much larger set of models to consider than is necessary or salient. The working hypothesis of IUSs is that domain knowledge (qv), in addition to the bottom-up processes, can assist in the disambiguation process as well as reduce the combinatorial problem. How that knowledge is to be used is a key problem.

The second question that often arises is "Is image understanding computationally the same as speech understanding?" On the surface it may seem that the techniques applicable to the speech-understanding problem are directly applicable to the image-understanding problem. A simplified view of the speech-understanding process leads to this conclusion. The differences arise if content rather than only form is considered. Speech understanding (qv) may be regarded as the recognition of phonemes (qv), the grouping of phonemes into words, the grouping of words into sequences, the parsing of word sequences into sentences, and the interpretation of the meaning of the sentences. Indeed, in a paper by Woods (8) the similarity is presented in some detail. In that paper Woods speculates on the applicability of the HWIM architecture for the image-understanding problem and concludes that it may be worth the attempt. However, a closer examination of the differences between speech and image interpretation tasks reveals that the image-understanding task is significantly different and more difficult.

The similarities between the speech and image tasks are many. Both domains exhibit inherent ambiguity in the signal, and thus signal characteristics alone are insufficient for interpretation. Reliability of interpretation can be increased by the use of redundancy provided by knowledge of vocabulary, syntax, semantics, and pragmatic considerations; and both domains seem to involve a hierarchical abstraction mechanism. The differences include the facts that: (a) speech exhibits a single spatial dimension (amplitude) with a necessary temporal dimension, whereas images display two spatial dimensions as well as the temporal dimension; (b) a speech segment has two boundary points, whereas an image segment, as a spatial region, has a large number of boundary points; (c) speech has a relatively small vocabulary that is well documented (e.g., in dictionaries) and images have much larger, undocumented vocabularies; (d) grammars have been devised for languages, but no such grammars exist for visual data; (e) although speech differs depending on the speaker, images vary much more because of viewpoint, illumination, spatial position and orientation of objects, and occlusion; (f) speech has a convenient and well-accepted abstract description, namely, letters and words, whereas images do not; and (g) the speech signal is spatially one-dimensional, and when sampled by the ear, there is no equivalent of the projection of a 3-D scene onto a 2-D retina. Thus, it seems that the image-understanding situation is radically different, particularly in combinatorial terms, and it is for this reason that very different solutions have appeared.

Representational and Control Requirements

This section attempts to summarize the experience of the IU community in the design and implementation of IUSs with a statement of components currently believed to be necessary for general vision systems. It should be clear that this is not a formal definition of an IUS in a strict sense; many of the requirements are really topics for further research. The section does not contain specific references; instead, it refers to other entries in this encyclopedia. Specific solutions and vision systems and how they deal with each of these requirements appear in a subsequent section.
Representational Requirements. Many IUSs distinguish three levels of representation: a low level, an intermediate level, and a high level. These levels do not necessarily refer to particular types of formalisms but rather simply point out that in the interpretation process, a transformation of representations into more abstract ones is required and that typically three levels of abstraction are considered. These levels can usually be characterized as follows: Low level includes image primitives such as edges, texture elements, or regions; intermediate level includes boundaries, surfaces, and volumes; and high level includes objects, scenes, or events. There is no reason why there should be only three levels, and in fact, the task of transforming representations may be made easier by considering smaller jumps between representations. It should be clear in the descriptions that follow which level or levels are being addressed.

Representation of Prototypical Concepts. A prototype provides a generalized definition of the components, attributes, and relationships that must be confirmed of a particular concept under consideration in order to be able to make the deduction that the particular concept is an instance of the prototypical concept. A prototype would be a complex structure spanning many levels of description in order to adequately capture surfaces, volumes, and other events, to construct discrete objects into more complex ones, to define spatial, temporal, and functional relationships for each object, and to assert
constraints that must be satisfied in order for a particular object in a scene to be identified.

Concept Organization. Three kinds of abstraction are commonly used, namely, feature aggregation, called "PART-OF," concept specialization, called "IS-A," and instantiation, called "INSTANCE-OF." The PART-OF hierarchy can be considered as an organization for the aggregation of concepts into more abstract ones or as an organization for the decomposition of concepts into more primitive ones, depending on which direction it is traversed. The leaves of the PART-OF hierarchy are discrete concepts and may represent image features. It should be pointed out that concept structure does not necessarily mean physical structure only, but similar mechanisms with different semantics may be used to also represent logical components of concepts. IS-A is a relationship between two concepts, one of which is a specialization of the other (or, in other words, one IS-A generalization of the other). An important property of the IS-A relationship is inheritance of properties from parent to child concept, thus eliminating the need for repetition of properties in each concept. Finally, the relationship between prototypical knowledge and observed tokens is the INSTANCE-OF relationship. These three relationships are typically used in conjunction with one another. Consideration of the semantics of these relationships is important, and such issues are discussed elsewhere (see Inheritance hierarchy).

Spatial Knowledge. This is perhaps the main type of knowledge that most vision systems employ. This includes spatial relationships (such as "above," "between," "left of"), form information (points, curves, regions, surfaces, and volumes), location in space, and continuity constraints. Much of this is described elsewhere (see Reasoning, spatial).
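The PART-OF, IS-A, and INSTANCE-OF organization described above, with property inheritance along the IS-A links, can be illustrated by a minimal sketch. The concept class and all of the example concepts (wheel, vehicle, car) are invented here for illustration; no particular system in the article uses this representation.

```python
# Minimal sketch (invented, not from any system in the article) of a
# concept node carrying IS-A, PART-OF, and local properties, where
# property lookup climbs the IS-A chain so that shared properties
# need not be repeated in each concept.

class Concept:
    def __init__(self, name, isa=None, parts=(), **props):
        self.name = name          # concept name
        self.isa = isa            # parent concept in the IS-A hierarchy
        self.parts = list(parts)  # PART-OF decomposition
        self.props = props        # locally asserted properties

    def lookup(self, prop):
        """Return a property value, inheriting from IS-A ancestors."""
        node = self
        while node is not None:
            if prop in node.props:
                return node.props[prop]
            node = node.isa
        raise KeyError(prop)

wheel = Concept("wheel", shape="circular")
vehicle = Concept("vehicle", mobile=True)
car = Concept("car", isa=vehicle, parts=[wheel])   # car IS-A vehicle
my_car = Concept("my-car", isa=car, color="red")   # observed token

print(my_car.lookup("mobile"))   # True, inherited from vehicle
```

Here an observed token is modeled as just another node below its prototype; a fuller treatment would distinguish the INSTANCE-OF link from IS-A, since instances do not themselves admit further specialization.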
Spatial constraints for grouping have appeared in the Gestalt literature in psychology and include the tendencies to group using smoothness of form, continuity of form, spatial proximity, and symmetry. The PART-OF relationship is used to aggregate simple forms into more complex ones. Properties or attributes of spatial forms are also required, namely, size, orientation, contrast, reflectance, curvature, texture, and color. Maps are common forms of spatial knowledge representation, particularly for vision systems dealing with domains such as aerial photographs or navigation tasks.

Temporal Knowledge. Information about temporal constraints and time is not only necessary for the interpretation of spatiotemporal images but can also provide a context in which spatial information can be interpreted. Time can provide another source of constraints on image objects and events. Temporal constraints for motion groupings, in the Gestalt sense, include the tendencies to group using similarity of motion, continuity or smoothness of motion, and path of least action. The basic types of temporal information include time instants; durations and time intervals; rates, such as speed or acceleration; and temporal relations such as "before," "during," or "start." Each of these has meaning only if associated with some spatial event as well. PART-OF and IS-A relationships can be used for grouping and organizing spatiotemporal concepts in much the same fashion as for purely spatial concepts. A difficulty with the inclusion of temporal information into an IUS is that an implicit claim is made of existential dependency. That is, if a relationship such as "object A appears before object B" is included in a knowledge base, and object B is observed, then according to the knowledge base, it must be
IMAGE UNDERSTANDING
true that object A must have appeared previously. This problem is further described elsewhere (see Reasoning, temporal).

The Scale Problem. It has been well understood since the early days of computer vision that spatial and spatiotemporal events in images exhibit a natural "scale." They are large or small in spatial extent and/or temporal duration, for example. This problem is different than the image resolution or coarseness problem, and there is no relationship between the two. This is dealt with in more detail in another entry (see Scale-space methods), and it is important that an IUS deal with this as well. There are implications not only for the design of the image-specific operations that extract image events (a given operator cannot be optimal for all scales and thus is limited for a particular range of events that it detects well) but also for the choice of representational and control scheme. If spatiotemporal events require representation at multiple scales, the matching and reasoning processes must also be able to deal with the multiple scales. The unification of information from multiple scales into a single representation is important.

Description by Comparison and Differentiation. Similarity measures can be used to assist in the determination of other relevant hypotheses when matching of a hypothesis fails. This is useful in the control of growth of the hypothesis space as well as for displaying a more intelligent guidance scheme than random choice of alternates. The similarity relation usually relates mutually exclusive hypotheses. The relation involves the explicit representation of possible matching failures, the context within which the match failure occurred, binding information relevant to the alternative hypothesis, as well as the alternate hypothesis. Thus, the selection of alternatives is guided by the reasons for the failure.

Inference and Control Requirements. A brief note is in order before continuing this section on the difference between inference and control, particularly since in some works they are used as synonyms. Inference refers to the process of deriving new, not explicitly represented facts from currently known facts. There are many methods available for this task, and they are discussed in detail in other entries (see Inductive inference; Inference; Reasoning entries). Control refers to the process that selects which of the many inference, search, and matching techniques should be applied at a particular stage of processing. The remainder of this section briefly discusses these issues and others in roughly the order that a designer of a typical image-understanding system would confront them.

Search and Hypothesis Activation. The basic interpretation paradigm used in IUSs, as is developed in Historical Perspective and Techniques, is "hypothesize and test." There are several aspects to this, and these are described beginning with search and hypothesis activation. A general vision system must contain a very large number of models that represent prototypical objects, events, and scenes. It is computationally prohibitive to match image features with all of them, and therefore, search schemes are employed to reduce the number of models that are considered. Only the salient models need be considered, and the determination of which are salient is termed the "indexing" problem. The catalog of search methods includes breadth first, depth first, hill climbing, best first, dynamic programming, branch and bound, A*, beam search, constraint satisfaction, relaxation labeling, information-gathering processes, and production systems. These are all described elsewhere (see A* algorithm; Beam search; Constraint satisfaction; Rule-based systems; Search, best-first; Search, branch-and-bound; Search, depth-first). A different categorization of search types, and one that is more frequently found in the IUS literature, is in terms of knowledge interactions. The following schemes are described below: model-directed search, goal-directed search, data-directed search, failure-directed search, temporally directed search, hierarchical models, heterarchical models, blackboard models, and beam search. The choice of search method employed depends on a number of factors, including the form of the representation over which the search is to be performed, the potential complexity problems, and the goals of the search process.

Saliency of a model depends on the statement of goals for the search process. The search can be guided by a number of trigger features, for example, and any models that are encountered that embody those features are selected. The selection of a model for further consideration is termed "hypothesis activation." A search process that leads to a very large set of active hypotheses is not desired since the objective of search is to reduce the space of models.

Matching and Hypothesis Testing. Once a set of active hypotheses has been determined, further consideration of each hypothesis takes place. The first task to be carried out is to match the active hypothesis to the data. It is important to note that data here do not necessarily only mean image-specific information. Matching is defined as the comparison of two representations in order to discover their similarities and differences. Usually, a matching process in vision compares representations at different levels of abstraction and thus is one of the mechanisms for transforming a given representation into a more abstract one. The result of a match is a representation of the similarities and differences between the given representations and may include an associated certainty or strength of belief in the degree of match.

The specific matching methods used depend largely on the representational formalisms that are used to code the data being compared. They can range from image-image matching, subgraph isomorphisms, or shape matching to matching only selected features with a model, such as identifying structural components. Matching processes, particularly ones that involve matching images directly, are usually very sensitive to variations in illumination, shading, viewpoint, and 3-D orientation. It is preferred, therefore, to match abstract descriptions such as image features against models in order to overcome some of these problems. However, for 3-D models it is not always the case that image features can trigger proper models for consideration. Rather, the process must also involve the determination of the projection of the model that can be matched in turn (see Matching; Pattern matching).

Generation and Use of Expectations. Expectations are beliefs that are held as to what exists in the spatiotemporal context of the scene. The concept of expectation-directed vision is a common one that appears in most systems. Expectations must bridge representations in a downward direction, going from models to image appearance. "Projection" is a term commonly used to denote the connection between the representations of the same concept but in differing domains. It is, for example, the relationship between a prototypical object and its actual appearance in an image. Thus, a mechanism is required that takes into account viewpoint and continuity, object position, lighting, observer motion, and temporal processes to create an internal
representation of an object's appearance in an image. Complete projections may not always be necessary, and in most cases it seems that expectations of important distinguishing features or structures are sufficient. The most common use of expectations is in directing image-specific processes in the extraction of image features not previously found (see also Parsing, expectation-driven).

Change and Focus of Attention. Even the best of search and hypothesis activation schemes will often lead to very large hypothesis sets. Computing resources are always limited, and thus the allocation of resources must be made to those hypotheses that are most likely to lead to progress in the interpretation task. This can be done in a number of ways, including the use of standard operating system measures for resource allocation, as were used in an augmented fashion in HEARSAY (9), ranking hypotheses by means of certainty or goodness-of-fit estimates, or by considering the potential of a hypothesis in conjunction with the expense that would be incurred in its evaluation. These best hypotheses, which are usually those that are confirmed or virtually confirmed, are also termed "islands of reliability." Not only is it important to determine a focus of attention but it is also important to determine when to abandon a current focus as unproductive. The change of focus can be determined in one of two ways: the focus could be recomputed each time it was required or it could remain fixed and only change when circumstances necessitated the change. The latter is clearly more desirable; yet mechanisms for its implementation are few. It should be pointed out that a focus of attention does not necessarily refer only to a hypothesis set but may also refer to a region on an image or a subset of some representation.

Certainty and Strength of Belief.
The use of certainty measures in computer vision arose due to two main reasons: biological visual systems employ firing rate (which may be thought of as a strength of response) as the almost exclusive means of neural communication, and computational processes currently available are quite unreliable. This strength of response may be thought of as brightness for simplicity. Lateral inhibition (one of the processes of neural communication), whereby neurons can inhibit the response of neighboring ones based mainly on magnitude of the firing rate, is a common process, if not ubiquitous. It motivated the use of relaxation labeling processes in vision. In relaxation, the strength of response is termed "certainty" and is often used as a measure of reliability of a corresponding decision process, for example, the goodness of fit of a line to the data. Since visual data are inherently noisy due to their signal nature, measures of reliability are important in the subsequent use of information derived using unreliable processes. Yet another use of certainty is in hypothesis ranking. The ranking of hypotheses is useful not only for the determination of a focus of attention but also for determining the best interpretation. Most schemes introduce some amount of domain dependence into the control structure, and this seems to lead to problems with respect to generality. An important problem is the combination of certainties or evidence from many sources.

Inference and Goal Satisfaction. Inference (qv) is the process by which a set of facts or models is used in conjunction with a set of data items to derive new facts that are not explicitly present in either. It is also called "reasoning" (qv). The many forms of reasoning include logical deduction, inheritance, default reasoning, and instantiation. These are discussed at length in other entries (see Inheritance hierarchy; Reasoning, default). However, it should be pointed out that the vision problem adds a few different wrinkles to this task that may not appear in many other reasoning processes. It is not true in general that the data set is complete or correct, and processes that can reliably draw inferences from incomplete data are required. Second, since vision is inherently noisy and as described above requires reliability measures, inference schemes should also permit reliability measures to be attached to derived conclusions. Finally, since the process of vision involves a transformation from images to a final description through many intermediate representations, a reasoning scheme must be able to cross between several representations.

Most IUSs are not explicitly driven by a goal when interpreting images. They typically have implicit goals, such as to describe the scene in terms of volumetric primitives, to describe everything in as much detail as possible, or to describe the scene in the most specific terms possible. Human vision usually does involve a goal of some kind, and the area of AI that is concerned with how to achieve goals given a problem is called "planning." Systems that can plan an attack on a problem must contain meta-knowledge, that is, knowledge about the knowledge that the system has about the problem domain (see Meta-knowledge, meta-rules, and meta-reasoning). The meta-knowledge allows the system to reason about its capabilities and limitations explicitly. Such systems have a set of operations that they can perform, and they know under which circumstances the operations can be applied as well as what the effects may be. In order to satisfy a goal, a sequence of operations must be determined that, in a stepwise fashion, will eventually lead to the goal.
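The goal-satisfaction scheme just described, finding a sequence of operations that leads from the current state to the goal, can be sketched as a small breadth-first planner. The state names and the operator table below are invented stand-ins for image-interpretation steps, not taken from any system in the article.

```python
from collections import deque

# Hypothetical sketch: each operator maps a precondition state to a
# result state; planning is a breadth-first search for an operator
# sequence transforming the start state into the goal state.

OPERATORS = {
    "extract-edges": ("image", "edges"),
    "group-regions": ("edges", "regions"),
    "match-models":  ("regions", "objects"),
}

def plan(start, goal):
    """Return a list of operator names leading from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if state == goal:
            return steps
        for name, (pre, post) in OPERATORS.items():
            if pre == state and post not in seen:
                seen.add(post)
                queue.append((post, steps + [name]))
    return None   # no operator sequence reaches the goal

print(plan("image", "objects"))
# ['extract-edges', 'group-regions', 'match-models']
```

A real planner would also attach costs to operators, which connects directly to the cost-minimizing plan optimization mentioned next in the text.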
Attempts to find optimal plans usually are included in terms of minimization of cost estimates or maximization of potential for success. In vision the sequence of operators may involve image feature extraction, model matching, and so on (see Planning).

Historical Perspective and Techniques

The historical development of the techniques of image understanding provides an interesting reflection of the major influences in the entire field of AI. The emphasis in the IU community has been primarily in the control structure, and this discussion begins with the sequence of contributions that led to the current types of control mechanisms. Rather little emphasis has been placed on integrating the best of the early vision schemes into IUSs, and one notices the range of weak solutions to the extraction of features. Little discussion is thus provided; however, in the description of control structures for specific systems, appropriate notes are made.

Control Structures. The heart of virtually all IU systems is the control structure (qv). Features universal to all working IUSs are cyclic control involving feedback (see Cybernetics) and the requirement of specific solutions to the problem of uncertainty. This survey of the development of control structure highlights only those systems that require and use explicit models of objects or events of the domain. Other important contributions that impact IUSs are allocated their appropriate historical due but are not considered part of the direct line of development. Finally, with two exceptions, the
Figure 1. The control structure of Roberts (12).
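The processing steps of the Roberts-style control scheme in Figure 1 (extract features, activate models, project into image space, verify) can be caricatured in a few lines. The "image," the line-counting feature extractor, and the model base are all invented toy stand-ins; real systems extract actual line drawings and project geometric models.

```python
# Schematic sketch (invented data) of the four steps in Figure 1:
# extract line features, activate models via those features, project
# each model's expectations, and verify against the data.

MODELS = {"cube": 9, "wedge": 7}   # toy "expected number of lines"

def extract_lines(image):
    """Stand-in feature extractor: count line tokens in a toy image."""
    return image.count("|") + image.count("-")

def interpret(image):
    lines = extract_lines(image)
    activated = MODELS   # activation step: here, trivially all models
    # verification: choose the model whose projection best matches
    return min(activated, key=lambda m: abs(activated[m] - lines))

print(interpret("||| --- |||"))   # 9 line tokens -> 'cube'
```

Note that this is a single forward pass with no feedback, which is exactly the weakness the text attributes to Roberts's scheme; Falk's closed loop lets partial interpretations redirect feature extraction.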
hypothesis of Marr (10) and the intrinsic image concept of Tenenbaum and Barrow (11), only implemented and tested systems are described in this section.

Developing the Cycle of Perception. Roberts was the first (in 1965) to lay out a control scheme for organizing the various components of a vision system (12). They are shown pictorially in Figure 1. He defined several of the major processing steps now found in all vision systems: extract features from the image, in his case, lines; activate the relevant models using those features; project the model's expectations into image space; and finally, choose the best model depending on its match with the data. This is not a true cycle, and because of the lack of feedback, it was very sensitive to noisy input.

In 1972, Falk realized that Roberts's work involved an assumption that would rarely be satisfied in real application domains, namely, that of noise-free data. If noisy data were to be correctly handled, enhancements to Roberts's processing sequence were required (13). In Figure 2 Falk adds a new component, the fill-in-incompleteness step, and closes the loop, allowing partly interpreted data to assist in the further interpretation of the scene. His program was called INTERPRET.

Shirai, in 1973, defined a system for finding lines in blocks-world scenes and interpreting the lines using models of line junctions and vertices for polyhedral objects (14). Thus, he was able to use interpreted lines as guidance in subsequent line finding. He first extracted features from a reduced image, thus smoothing out some of the noise and smaller detail features, and then used these gross features in subsequent guidance. Shirai's cycle is shown in Figure 3.

Shirai, however, was not the first to employ reduced images in a preprocessing stage. Kelly, in 1971, had the intuition that if an image that was reduced in size was processed initially,
Figure 3. The control structure of Shirai (14).
instead of the full-size image, much of the noise could be reduced, and the resulting edges of lines could be used as a plan for where to find edges and lines in the full image (15). This was applied to the domain of face recognition. Kelly reduced an image to 64 x 64 pixel size, thus minimizing noise effects, and then located the outlines of the faces. Those outlines then formed a plan for the full-size image, limiting the search space for the detailed facial outlines. However, Kelly's system contained no models and was a sequential two-step process.

Several incarnations of the cycle appeared subsequently, and one example of note is presented here, namely, the 1977 work of Tenenbaum and Barrow in their interpretation-guided segmentation (IGS) program (11). Their version of the cycle is shown in Figure 4. IGS experimented with several types of knowledge sources for guidance of the segmentation process: unguided knowledge; interactive knowledge, both user driven and system driven; models; and relational constraints. They concluded that segmentation is improved over unguided segmentation with the application of knowledge, and with little computational overhead: the more knowledge, the faster the filtering process.
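Kelly's reduced-image planning can be sketched in one dimension: locate an edge coarsely in a downsampled signal, then search only a small window of the full-resolution signal. The signal and window sizes here are invented for illustration.

```python
# Sketch of Kelly-style plan-then-refine edge finding on a 1-D
# signal: downsample to suppress noise and detail, find the coarse
# edge, then restrict the full-resolution search to a window around
# the coarsely located position. Illustrative only.

def downsample(signal, k):
    return [sum(signal[i:i + k]) / k for i in range(0, len(signal), k)]

def edge(signal):
    jumps = [abs(b - a) for a, b in zip(signal, signal[1:])]
    return jumps.index(max(jumps))   # position of the largest jump

def coarse_to_fine(signal, k=4):
    coarse = edge(downsample(signal, k))          # the "plan"
    lo = max(0, coarse * k - k)                   # restricted search window
    hi = min(len(signal) - 1, (coarse + 2) * k)
    return lo + edge(signal[lo:hi + 1])

sig = [0] * 8 + [10] * 8
print(coarse_to_fine(sig))   # 7: the step lies between samples 7 and 8
```

The payoff is the same one the text attributes to Kelly: the expensive fine-scale search runs over a small window rather than the whole image.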
Figure 2. The control structure of Falk (13). (Starting from an initial partition of the image, the loop tests whether all adjacent interpretation sets are disjoint; on failure it performs the safest merge and repeats, and on success it terminates.)
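The merge loop of Falk's control structure (test whether all adjacent interpretation sets are disjoint; if not, perform the safest merge and repeat) can be sketched over a toy row of regions. The region labels and the choice of "safest" as "largest overlap, merged by intersection" are invented simplifications.

```python
# Sketch of the Figure 2 loop: while some adjacent regions have
# overlapping interpretation sets, merge the pair with the largest
# overlap (the "safest merge", here approximated by intersection).
# Region data are invented for illustration.

def merge_loop(regions):
    """regions: list of interpretation sets for adjacent regions."""
    regions = [set(r) for r in regions]
    while True:
        overlaps = [(len(a & b), i)
                    for i, (a, b) in enumerate(zip(regions, regions[1:]))
                    if a & b]
        if not overlaps:          # all adjacent sets disjoint: terminate
            return regions
        _, i = max(overlaps)      # perform the safest merge
        regions[i:i + 2] = [regions[i] & regions[i + 1]]

print(merge_loop([{"grass", "tree"}, {"tree"}, {"sky"}]))
# [{'tree'}, {'sky'}]
```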
∀v[φ(v)] and ∃v[φ(v)] are well-formed formulas.
(wff.3) Nothing else is a well-formed formula.
Parentheses and brackets will sometimes be omitted when no ambiguity results. For example, each of the following is a wff:
A(x, y)
In(Eiffel-tower, France)
¬Republican(John-F.-Kennedy)
(Capital(Albany, New-York) ∨ B)
∀xFx
∀x[¬Human(x) ∨ Mortal(x)]
¬∃x Unicorn(x)
For some x, x is a philosopher and x is a computer scientist.
There exists an x such that x is a philosopher and x is a computer scientist.

Syntax. A formal syntax for a language ℒ of predicate logic can be presented by giving an alphabet, a recursive definition of term, and a recursive definition of well-formed formula (wff) (given in Tables 1-3). In order to define the notion of a sentence and to give the inference rules, the following definitions are necessary:

(D1) Let φ be a wff prefixed by a quantifier phrase (i.e., either ∀v or ∃v). Then φ is the scope of the quantifier phrase.
Table 1. Alphabet of ℒ

n-place predicate symbols (n an integer): A, . . . , Z; Ai, Bi, . . . (i an integer); any sequence of words separated by hyphens
n-place function symbols (n an integer): f, g, h; fi, gi, hi (i an integer); any sequence of words separated by hyphens
Individual variables: u, . . . , z; ui, . . . , zi (i an integer)
Individual constants: a, . . . , e; ai, . . . , ei (i an integer); any noun phrase (the words separated by hyphens)
Connectives: ¬, ∨
Punctuation: ( ) [ ]
Quantifiers: Universal ∀; Existential ∃
For example, the scopeof Vr rnVxg(r) is g@), but the scopeof 3y in Fy p0) v {) is ,p(y). (D2) Let the variable in a quantified phrase be called its uariable of quantification. Then: (a) An occurrence of an individual variable in a wff p is bound means: the variable occurs in the scopeof a quantifier phrase in E that has that variable as its variable of quantification. (b) An occurrenceof an individual variable in a wff p is free means: the occurrence of that variable is not bound. (c) A variable ts bound means: there is an occurrenceof that variable that is bound. (d) A variable rs free means: there is an occurrence of that variable that is free. For example, in (Fr v VrGr)
LOGIC, PREDICATE
the first occurrence of x is free and the second is bound; the variable x is both free and bound in this wff. Finally,
(D3) A sentence is a wff with no free variables.
(For further discussion of the grammatical syntax of a first-order language and translations of natural-language sentences into it, see Refs. 13-15.)
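Definitions (D2) and (D3) are directly computable. The sketch below uses an invented nested-tuple encoding of wffs (predicates, negation, disjunction, and the two quantifiers) and computes the set of free variables recursively; a wff is a sentence exactly when that set is empty.

```python
# Sketch of the free-variable computation behind (D2) and (D3):
# an occurrence is free unless an enclosing quantifier binds its
# variable. The wff encoding is invented for illustration.

def free_vars(wff):
    op = wff[0]
    if op == "pred":                   # ("pred", name, var1, var2, ...)
        return set(wff[2:])
    if op == "not":                    # ("not", body)
        return free_vars(wff[1])
    if op == "or":                     # ("or", left, right)
        return free_vars(wff[1]) | free_vars(wff[2])
    if op in ("forall", "exists"):     # (op, var, body): var is bound
        return free_vars(wff[2]) - {wff[1]}
    raise ValueError(op)

def is_sentence(wff):
    return not free_vars(wff)          # (D3): no free variables

# (Fx v forall x Gx): x occurs both free and bound, and is free overall
w = ("or", ("pred", "F", "x"), ("forall", "x", ("pred", "G", "x")))
print(free_vars(w))                                      # {'x'}
print(is_sentence(("forall", "x", ("pred", "F", "x"))))  # True
```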
Semantics. Providing a semantics for such a first-order language is somewhat more problematic than it is in the propositional case. The main reason for this is that a decision must be made about the domain (or universe) of discourse. It was noted above that a predicate can name a property (or relation) or a class. But classes are extensional ("two" classes are identical if they have the same members), whereas properties are intensional (i.e., nonextensional). Moreover, there are important questions about what counts as an individual:

1. Can properties or classes themselves be individuals? This is surely plausible; consider such propositions as:
Red is a color.
Colors are properties.
{x: x is a rational number} is countable.
However, care must be taken to avoid paradox, as in Russell's (16) well-known example:
{x: x ∉ x} ∈ {x: x ∉ x} if and only if {x: x ∉ x} ∉ {x: x ∉ x}

2. Must the individual actually exist? If variables and terms may only range over existents, how does one express such sentences as the following?
There are no round squares.
Santa Claus does not exist.
All unicorns are white.

Thus, a semantics for a first-order language cannot be completely specified independently of an ontology, a precise specification of the domain. Nevertheless, the general form of such a semantics (often called formal semantics, see Ref. 8) does not vary. Metatheoretical results are given here in terms of set-theoretic semantics (i.e., in terms of an ontology of sets and their members), which is the way they are given in most of the literature.

Let M be the structure (D, R, F), where D is a nonempty set, R is a set of n-place relations on the elements of D, and F is a set of n-place functions on the elements of D. An interpretation, I, on M for ℒ is a function from the symbols of ℒ to D ∪ R ∪ F such that:

If t is an individual constant or individual variable, then I(t) ∈ D.
If f is a function symbol, then I(f) ∈ F.
If f is an n-place function symbol and t1, . . . , tn are terms, then I(f(t1, . . . , tn)) = I(f)(I(t1), . . . , I(tn)) ∈ D.
If P is an n-place predicate symbol, then I(P) ∈ R.

The notion of "truth on an interpretation" (symbolized as: ⊨I) can be defined recursively as follows:

1. If P is an n-place predicate symbol and t1, . . . , tn are terms, then ⊨I P(t1, . . . , tn) if and only if ⟨I(t1), . . . , I(tn)⟩ ∈ I(P).
2. If φ and ψ are wffs and v is an individual variable, then
(a) ⊨I ¬φ if and only if not ⊨I φ;
(b) ⊨I (φ ∨ ψ) if and only if ⊨I φ or ⊨I ψ;
(c) ⊨I ∀vφ if and only if ⊨I′ φ for every interpretation I′ that differs from I at most on what it assigns to v;
(d) ⊨I ∃vφ if and only if ⊨I′ φ for some interpretation I′ that differs from I at most on what it assigns to v.

Finally, a wff φ is valid in M (written: M ⊨ φ) if and only if ⊨I φ for every interpretation I on M. A structure M is a model for a set H of wffs if and only if M ⊨ Hi for every wff Hi ∈ H.

Expressibility. As is the case with propositional logic, one can choose to employ either a small number of connectives and quantifiers (for elegance and metatheoretical simplicity) or a wide variety (for expressive power). Thus, on the one hand, the formal system presented above may be extended in a natural way to include the other truth-functional connectives or, on the other hand, restricted to using (say) only ¬, ∨, and ∀. The latter can be accomplished as in propositional logic, together with the following definition:
∃vφ =df ¬∀v¬φ

Another variation is to employ restricted quantifiers. Instead of translating
All As are Bs.
Some As are Bs.
as, respectively,
∀x[Ax → Bx] and ∃x[Ax ∧ Bx]
with the noticeable change in syntactic structure, a family of restricted quantifiers can be introduced:
(∀x: φ(x)) and (∃x: φ(x))
Using this notation, the translations become the more uniform-looking
(∀x: Ax)Bx and (∃x: Ax)Bx
This notation has the additional advantage of being extendible to generalized quantifiers for handling such sentences as
Most As are Bs.
Many As are Bs.
as well as numerical quantifiers:
Exactly 4 As are Bs.
Greater than 5 As are Bs.
Between 5 and 10 As are Bs.
Generalized and numerical quantifiers are, however, beyond the scope of first-order logic (for discussions of these issues, see Refs. 17-21).

Other alternatives to first-order languages and logics have been motivated by ontological concerns. As is seen below, when deduction is discussed, ∀xφ(x) implies ∃xφ(x) in a nonempty domain. But what about the empty domain? Why
should logic imply that something exists? Shouldn't logic be independent of ontology? Attempts to broaden the scope of first-order logic have included free logics (i.e., logics that are free of existence presuppositions) and Meinongian logics that allow for representing and reasoning about nonexistents. Both of these kinds of logics often choose to represent existence by a special predicate, E!, rather than by trying to define existence in purely first-order terms (as, e.g., "∃x[x = a]" for "a exists") (for discussions of free logics, see Refs. 10 and 22-27, and for discussions of Meinongian logics, see Refs. 28-36).

Deductive Systems of Predicate Logic

Syntax. As with propositional logic, a deductive system for predicate logic can be presented axiomatically or as a natural deduction system.

Axiomatic Predicate Logic. In this section a set of axioms and rules of inference for predicate logic are presented using the terminology introduced in the entry Logic, propositional. As is done there, the wffs are restricted to those whose only connectives are ¬ and →, and the only quantifier is the universal quantifier. All wffs of the following forms will be axiom schemata:
(A1) (φ → (ψ → φ))
(A2) ((φ → (ψ → χ)) → ((φ → ψ) → (φ → χ)))
(A3) ((¬φ → ¬ψ) → (ψ → φ))
(A4) (∀v[φ → ψ] → (φ → ∀vψ)), where v is not free in φ.
(A5) (∀vφ(v) → φ(t/v)), where φ(t/v) is the result of replacing all free occurrences of v in φ by any term t and where
the variables in t are free at all locations in φ where v occurs.

There are two rules of inference:
Modus ponens: From φ and (φ → ψ), infer ψ.
Universal generalization: From φ, infer ∀vφ.

13. (∃x[Horse(x) ∧ Head-of(a, x)] → ∃z[Animal(z) ∧ Head-of(a, z)]) ; returned to main proof from line 12 of outer subproof
14. ∀y[∃x[Horse(x) ∧ Head-of(y, x)] → ∃z[Animal(z) ∧ Head-of(y, z)]] ; from line 13, by ∀ Introduction
Figure 1. An example of Introduction and Elimination rules to prove the argument that if horses are animals, then every head of a horse is a head of an animal.
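The modus ponens rule can be exercised mechanically: repeatedly add q to the known wffs whenever both p and (p → q) are known. The string-and-tuple encoding of wffs below is invented for illustration and handles only propositional instances of the rule.

```python
# Tiny illustrative forward-chaining closure under modus ponens:
# from p and ("->", p, q), infer q, and repeat until no new wffs
# are produced. Wff encoding is invented for illustration.

def closure(wffs):
    known = set(wffs)
    changed = True
    while changed:
        changed = False
        for w in list(known):
            # modus ponens: from p and (p -> q), infer q
            if isinstance(w, tuple) and w[0] == "->" and w[1] in known:
                if w[2] not in known:
                    known.add(w[2])
                    changed = True
    return known

facts = {"p", ("->", "p", "q"), ("->", "q", "r")}
print(sorted(w for w in closure(facts) if isinstance(w, str)))
# ['p', 'q', 'r']
```

Chaining of this kind computes only what the rule licenses from the given wffs; a full proof system would also instantiate the axiom schemata (A1)-(A5) and apply universal generalization.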
duced here can be extended by introducing a new variable-binding operator in addition to the quantifiers. Unlike the quantifiers, which are wff-producing operators, the definite description operator ι is a term-producing operator. The definition of term can be augmented as follows:

(T5) If φ is a wff and v is an individual variable, then ιv[φ] is a term.

There has been a great deal of controversy over the semantics of such terms. The approach due to B. Russell (61) has become the standard logical one. According to Russell's analysis, sentences of the form ψ(ιx φ(x)) should not be treated as subject-predicate sentences; that is, they should not be parsed as consisting of a noun phrase, ιx φ(x), and a verb phrase, ψ. Rather, they are to be analyzed as
∃x[φ(x) ∧ ∀y[φ(y) → y = x] ∧ ψ(x)]
For instance, to use Russell's famous example,
The present King of France is bald
It is a consequence of this analysis that the sentence comes out false, since there is no present King of France. Similarly,
The book that Knuth wrote is interesting
is false, since Knuth has written more than one book; and
The winged horse captured by Bellerophon is named "Pegasus"
is false, since the winged horse captured by Bellerophon does not exist. The addition to the axiomatic formulation of predicate logic is straightforward: Simply add the axiom schema
(A8) ψ(ιvφ(v)) ↔ ∃v[φ(v) ∧ ∀u[φ(u) → u = v] ∧ ψ(v)]

Substitute(0 u) if u < 0
? Update(L 1 N)
Destructive assignment is often used in order to reuse names as well as memory. The example above could be modified to make L serve as the name of both old and new states of the sequence. This use of names is ambiguous and is logically unsound, yet is permissible in PROLOG, as it is in the use of destructive assignment within conventional languages.

Extensions of Horn-Clause Programming

Negation as Failure. Although Horn-clause form is adequate for computation, extensions of its logic can greatly improve its suitability for practical applications. Negation as failure is typical of such extensions.
LOGIC PROGRAMMING 549

Negation as failure treats failure to prove as proof of negation: A negated procedure call Not(P) is deemed to hold if and only if P cannot be shown to hold in a finite amount of time. For example, given

P(a)
P(b)
? Not(P(c))

the query Not(P(c)) is solved because P(c) fails finitely. Clark (19) has shown that negation as failure is a correct approximation to standard classical negation, provided that the implicit closed-world assumption (CWA), that the program contains a complete description of the relations named in it, is made explicit. In the example above, the CWA is explicitly expressed by

P(x) iff (x = a or x = b)

and a complete specification of =, including inequalities such as a ≠ b, b ≠ a, etc. This more complete specification of P then classically implies ¬P(c). The negation-as-failure rule has been proved complete in a restricted context by Jaffar, Lassez, and Lloyd (20). More generally, however, it is incomplete and oversensitive to context of use. In particular, it cannot deliver values for variables in negated calls. Thus, for example, given appropriate clauses for Q, it can deal with the query ? Not(Q(a)) but not with the logically identical ? Not(Q(x)). Negation as failure was first given prominence in Hewitt's language PLANNER (3).

Conditional Subgoals. Horn clauses can be extended to admit subgoals that are themselves conditional in structure, as in

? (∀x) (P(x) if Q(x))

Such an extension bridges much of the gap between Horn-clause logic and full first-order logic. Programs using it can be converted to Horn clauses augmented by negation as failure. For instance, the query above can be rewritten as

? Not(Q(x) & Not(P(x)))

The original query requires that all solutions of Q(x) also solve P(x). PROLOG execution of the rewritten form treats this as the task of showing, by iterating through the solutions of Q(x) (if any), that none of them fails to solve P(x).
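The finite-failure behavior just described can be simulated in a few lines of Python (an illustrative sketch, not a PROLOG engine; the fact base and function names are invented for this example):

```python
# Illustrative simulation of negation as failure over a finite fact base.
# Under the closed-world assumption the fact base is a complete
# description of P, so failure to prove P(c) counts as proof of Not(P(c)).

facts = {("P", "a"), ("P", "b")}

def provable(pred, arg):
    """A goal holds only if it appears in the (complete) fact base."""
    return (pred, arg) in facts

def naf(pred, arg):
    """Negation as failure: Not(P(x)) holds iff P(x) finitely fails."""
    return not provable(pred, arg)

print(naf("P", "c"))  # True: Not(P(c)) succeeds, since P(c) fails
print(naf("P", "a"))  # False: Not(P(a)) fails, since P(a) is provable
```

Note that, as in the text, this only tests ground calls; it cannot deliver a binding for x in Not(P(x)).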
Sets of Solutions. Another useful extension (essentially a metalevel feature) is the ability to collect all solutions to a call into a set represented by a single term. This facility is commonly called "aggregation." For example, to construct and then count the set y of all persons x liked by John, one could write

? y set-of (x: John likes x) & Length(y n)
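A hypothetical Python analogue of this aggregation query (the likes relation and helper name are invented for illustration) collects the distinct solutions and counts them:

```python
# Simulating the set-of aggregation: collect all distinct solutions x of
# "John likes x" into a set y, then take its length.

likes = [("John", "Mary"), ("John", "Bob"), ("Sue", "Ann"), ("John", "Mary")]

def set_of(solutions):
    """Collect the distinct solutions of a generator of bindings."""
    return set(solutions)

y = set_of(x for (who, x) in likes if who == "John")
n = len(y)
print(sorted(y), n)  # ['Bob', 'Mary'] 2
```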
The set-of call can be implemented by posing the call John likes x and collecting all the distinct solutions for x into y until failing to generate any more. The soundness of this is, like that of the implementations suggested for negation as failure and universally quantified subgoals, dependent on the CWA. Also, like both those extensions, it is susceptible to incompleteness, looping, and context sensitivity through falling short of full classical logic.

Horn-Clause Metalevel Programming. Horn-clause logic and its extensions can be used at the metalevel to enable programs to describe the logical and behavioral properties of themselves and other programs. This use of metalevel logic is exemplified by the expert equation solver PRESS implemented in PROLOG by Bundy and Welham (21). Instead of solving equations by using object-level rules of algebra, PRESS uses metalevel rules describing mathematical problem-solving expertise (see Meta-knowledge, -rules, and -reasoning). Bundy's use of metalogic operates entirely at the metalevel. Systems that amalgamate object-level and metalevel uses of logic have been devised by Weyhrauch (18) and Bowen and Kowalski (22). The basis of these amalgamated systems is the use of a proof predicate Demo(x y) expressing "conclusion y is provable from assumption set x," whose role is analogous to the EVAL function of LISP. The proof predicate can itself be defined in logic and executed either by running its definition directly or by running the object-level logic system recursively. Including the proof predicate in the logic language makes it possible to formalize and reason with such subtle distinctions as that between "a person is innocent if not guilty" and "a person is innocent if not proven guilty." It also allows one to formulate self-referential sentences, not those like "this sentence is false," which is a paradox and not expressible using Demo, but rather ones like "this sentence is unprovable," which is true but unprovable.
This is a direct analogue of Gödel's proof of the incompleteness of axiomatic arithmetic.

Specification versus Programming

Executing Specifications. Logic has been used traditionally in computing to express declarative specifications serving program analysis and construction. However, the mechanization of logic through computer-based proof procedures has made such specifications executable in their own right, so that, e.g., they may be tested on small-scale data or debugged or treated as prototypes in program development. As a preliminary problem description, a naive, nonprocedural, logical specification is, or at least ought to be, both simpler to reason about and more flexible to modify than a program containing greater commitment to a particular problem-solving method. Such specifications can be used not only as precursors to program development but also as queries to a database. A declarative style for such queries is essential to users unconcerned with the database's storage and access mechanisms. Horn-clause logic and its extensions can be used both for specifying and for programming. These uses are distinguished only by their intent and by their relative degrees of proceduralness. The sentence

y is-sort-of x if
y is-permutation-of x & y is-ordered
is more like a declarative specification of the sorting relation than is the sentence

y is-sort-of x if
x decomposes-into (x1 x2)
& y1 is-sort-of x1
& y2 is-sort-of x2
& y is-merge-of (y1 y2)
which anticipates a specific merge-sort algorithm. Both, however, are directly executable, with varying efficiencies, in PROLOG. Logic can also be used to encode and animate (possibly incomplete) knowledge formulated prior to formal specifications in the early stages of user-requirements definition and systems analysis. For instance, the knowledge content of conventional data-flow diagrams can often be transcribed directly into executable Horn clauses. Although mere run-time inefficiency may be tolerable when experimenting with specifications, the deliberate disregard of the problem solver's behavior raises the hazard of nonterminating loops. The naive specification

x is-joined-to y if y is-joined-to x

of connectivity in a network defined by

a is-joined-to b
b is-joined-to c
etc.

when executed by PROLOG to determine the network's connections through a query

? x is-joined-to y

will loop indefinitely, without computing any solutions and without even accessing the data defining the network, if the general rule for is-joined-to textually precedes the data. Fundamentally this is due to the "unfairness" of depth-first search commented on above. However, if the data precede the rule instead, all solutions are generated infinitely often. In either case one is penalized for ignoring the procedural consequences of what one has written. One way of overcoming this while preserving declarative freedom of style is to incorporate loop detection into the problem solver.

Deriving Programs from Specifications. A specification assumed to be a logically correct problem description can be used to derive a more efficient description (program) for the problem. If the specification is itself written in logic, it can serve as an axiom set for deducing computationally useful theorems. For example, the PROLOG clauses

w is-least-of w.NIL
w is-least-of u.v.NIL if u < v & w is-least-of u.NIL
w is-least-of u.v.NIL if u > v & w is-least-of v.NIL

will answer the query

? w is-least-of 3.2.1.4.NIL

much more efficiently than will the naive specification

w is-least-of x iff Member(w x) & w is-lower-bound-for x

together with specifications of Member and is-lower-bound-for and with the assumption that < is transitive. The PROLOG clauses are each logically implied by the specification and easily derivable from it using first-order inference. Program derivation can also be applied to given logic programs in order to transform them to equivalent but computationally different ones. For instance, an alternative, nonlooping program for the former connectivity problem, answering the query ? x connects-to y, is

x connects-to y if x is-joined-to y
x connects-to y if y is-joined-to x
a is-joined-to b
b is-joined-to c
etc.

and is derivable from the previous looping program using the bridging specification

x connects-to y iff (x is-joined-to y or y is-joined-to x)
As a trivial example, from this specification one can infer the conditional sentence

x connects-to y if (x is-joined-to y or y is-joined-to x)

and from this the two principal Horn clauses

x connects-to y if x is-joined-to y
x connects-to y if y is-joined-to x

of the desired program. Studies of logic program verification and derivation can be found in works by Clark (23), Hogger (24), Clark and Darlington (25), and others.

Functional Programming

Functional programming can be regarded as logic programming in the broad sense of computation by deriving consequences from assumptions. Assumptions in functional programs are expressed as equations between individuals constructed by means of variables, constants, and function symbols. For example, the equations

length(NIL) = 0                  (L1)
length(x.y) = length(y) + 1      (L2)
recursively define the function that computes the length of a list in terms of the addition function and the list constructor function ("."). To compute the length of the list D.A.D.NIL it is necessary to derive a conclusion of the form length(D.A.D.NIL) = t where t is expressed only in terms of undefined functors (such as "."). The derivation is performed by using the equations as rewrite rules:
length(D.A.D.NIL) = length(A.D.NIL) + 1
= (length(D.NIL) + 1) + 1
= ((length(NIL) + 1) + 1) + 1
= ((0 + 1) + 1) + 1
= (1 + 1) + 1
= 2 + 1
= 3

In logic programming understood in its narrow sense as backward reasoning applied to Horn clauses and their extensions, defined function symbols such as length are represented by relation symbols and term rewriting is replaced by problem reduction. Thus, the equations L1 and L2 would be expressed as Horn clauses:

Length(NIL 0)
Length(x.y u) if Length(y v) & Plus(v 1 u)

The computation of the length of a list is performed by backward reasoning:

? Length(D.A.D.NIL u)
? Length(A.D.NIL u1) & Plus(u1 1 u)
? Length(D.NIL u2) & Plus(u2 1 u1) & Plus(u1 1 u)
? Length(NIL u3) & Plus(u3 1 u2) & Plus(u2 1 u1) & Plus(u1 1 u)
? Plus(0 1 u2) & Plus(u2 1 u1) & Plus(u1 1 u)
? Plus(1 1 u1) & Plus(u1 1 u)
? Plus(2 1 u)

giving the solution u = 3. The example illustrates the general fact that computation by means of equations used as rewrite rules can be simulated by means of backward reasoning using Horn clauses. Since all computable functions can be represented by rewrite rules, this suggests a particularly simple and transparent proof that all computable functions can be represented by means of Horn clauses and all computation can be performed by backward reasoning. The adequacy of Horn-clause logic for computation was first proved by Aanderaa and Börger (26), who showed that every computable n-ary function can be computed by some Horn-clause program using only terms constructible from the constant 0, unary functor s, and n + 2 variables. The representation of n-ary functions by means of n + 1-ary relations, illustrated above, is only one of several possible correspondences between functional programming and Horn-clause programming. Another correspondence, useful for expressing Horn-clause programs as functional programs, makes use of n-ary Boolean-valued functions to represent n-ary relations. For example, the Horn clauses

Member(x x.y)
Member(x z.y) if Member(x y)

can be represented by equations

member(x x.y) = TRUE
member(x z.y) = member(x y)

The problem with this correspondence is that with the normal algorithm for rewriting terms the equations can be used only to test for membership, whereas the Horn clauses can be used to generate members equally well. The relationship between Horn-clause programming and functional programming is a very active research subject. Much of this activity is centered around the development of hybrid languages that combine the two kinds of programming. The language LOGLISP (27), which combines Horn-clause programming and LISP, was the first of these hybrids. In addition, much attention is being given to the extension of Horn-clause programming by features that have received greater attention within the framework of functional programming. The most important of these features are the treatment of data types, higher order functions, and the development of highly parallel computer architectures. Some higher order effects can be achieved even in unextended Horn-clause logic. For instance, the solution x = u.v.NIL computed from the query

? Length(x 2)

can be regarded as a binary function: given values for u and v (as input) it yields a list containing those two items (as output). This example demonstrates the special power of the logical variable in being able to communicate partially instantiated data.
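The two directions of use of the Length relation can be sketched in Python (an illustrative simulation; unbound logical variables are modeled here as placeholder strings rather than true logical variables):

```python
# A sketch of the Length relation run in both directions,
# not a real PROLOG engine.

def length_of(lst):
    """Forward use: solve Length(lst u) for u, by problem reduction."""
    return 0 if not lst else 1 + length_of(lst[1:])

def list_of_length(n):
    """Backward use: solve Length(x n) for x. The answer is a template
    of n distinct unbound variables, modeled as placeholder strings,
    i.e., the partially instantiated list u.v.NIL when n = 2."""
    return [f"_u{i}" for i in range(1, n + 1)]

print(length_of(["D", "A", "D"]))  # 3
print(list_of_length(2))           # ['_u1', '_u2']
```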
Logic Databases

Logic Databases = Declarative Logic Programming. The notion of logic database arose out of work on question-answering (qv) systems in AI. The main impetus to this work was Green's demonstration that resolution logic could be used for question answering (28). Logic databases share with logic programs the use of logic to represent knowledge (see Representation, knowledge) and the use of deduction to derive solutions to problems. However, whereas logic programming admits both declarative and procedural modes of use, logic databases concentrate on the declarative. In the declarative use of logic the user represents knowledge and formulates problems without concern for the problem-solving process. The problem solver, whether human or machine, is conceptually distinct from the user and can use any problem-solving strategy, including backward reasoning, to solve the problem. In the procedural use of logic, the user formulates knowledge and poses problems bearing in mind the problem solver's problem-solving strategy. The user programs the problem solver by assessing the effect of his statements on the problem solver's behavior. The more sophisticated the problem solver, the more effective it is for declarative modes of use but the more difficult it is for the procedural programmer to predict or control its behavior.
Relational Databases. Relational databases have emerged from the world of commercial data processing. Like logic databases they use logic declaratively to express queries to databases; unlike logic databases they treat databases as model-theoretic relational structures rather than as sentences expressed in formal logic. Question answering is a model-theoretic process of unraveling truth definitions to evaluate the query in the relational structure. Queries can be arbitrary formulas of first-order logic augmented with aggregation operators such as set-of. Alternatively, and equivalently, a relational database can also be viewed as a special case of a logic database in which the database consists of variable-free atomic assertions. Queries are equivalent to Horn-clause queries augmented with negation as failure (29). These two contrasting views of the logical nature of relational databases have led to great confusion, including, e.g., conflicting claims that recursion can be represented in first-order logic and that it cannot (30).

Query Optimization. Many relational database systems use query optimizers to analyze the form of queries and to determine appropriate evaluation strategies. The resulting strategies are generally sensitive to the I/O pattern of relation arguments. Thus, e.g., a query of the form

? John supplies x & x costs y

"Find the cost of articles supplied by John" will be evaluated left to right, whereas a query such as

? x supplies y & y costs 100

"Find suppliers of articles which cost 100" will be evaluated right to left. It can be argued therefore not only that relational database systems can be regarded as logic databases and consequently as declarative logic programs but also that for certain classes of "programs" they are more sophisticated than PROLOG. Of course, this argument ignores the fact that PROLOG has to deal with more complicated databases and has to cater to both declarative and procedural modes of use.
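The optimizer's sensitivity to which arguments are bound can be mimicked in Python (the relations and data are invented for illustration): in each case the literal with a bound argument is evaluated first, restricting the scan of the other relation.

```python
# Toy supplies/costs relations, invented for this example.
supplies = [("John", "nut"), ("John", "bolt"), ("Mary", "bolt")]
costs = {"nut": 10, "bolt": 100}

def cost_of_articles_supplied_by(who):
    """'John supplies x & x costs y': the supplier is bound, so scan
    supplies first (left to right) and look up each article's cost."""
    return {x: costs[x] for (s, x) in supplies if s == who}

def suppliers_of_articles_costing(price):
    """'x supplies y & y costs price': the price is bound, so scan
    costs first (right to left), then find matching suppliers."""
    articles = {y for y, c in costs.items() if c == price}
    return {s for (s, y) in supplies if y in articles}

print(cost_of_articles_supplied_by("John"))        # {'nut': 10, 'bolt': 100}
print(sorted(suppliers_of_articles_costing(100)))  # ['John', 'Mary']
```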
Nonetheless, the argument does point out some of the possibilities for improving logic programming languages such as PROLOG. Greater use can be made both of compile-time program transformation and of more sophisticated run-time execution strategies. Program transformation, as originally developed for functional programming and extended to logic programming, can be viewed as subsuming query optimization. More intelligent execution strategies are discussed below (see Intelligent Execution Strategies).
Element(A 1 2)
Element(A 2 4)
...
Element(A 100 200)

or by a general rule,

Element(A i x) if 1 ≤ i & i ≤ 100 & Times(2 i x)

The use of databases as data structures has the advantage that access to the data can be obtained by arbitrary queries, e.g.,

? Element(A 50 x)
? Element(A x 100)
? Element(A x y) & Element(B z y)

The implementation is responsible for arranging efficient access.
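A Python sketch of the rule-defined relation (names are invented) shows how one definition can serve all three query patterns, with bound arguments acting as filters:

```python
def element_A(i=None, x=None):
    """Solutions of Element(A i x), where A holds x = 2*i for
    1 <= i <= 100. Passing i or x (or neither) selects the query
    pattern; None plays the role of an unbound variable."""
    for k in range(1, 101):
        v = 2 * k
        if (i is None or i == k) and (x is None or x == v):
            yield (k, v)

print(list(element_A(i=50)))   # [(50, 100)]  i.e., ? Element(A 50 x)
print(list(element_A(x=100)))  # [(50, 100)]  i.e., ? Element(A x 100)
print(len(list(element_A())))  # 100          i.e., ? Element(A x y)
```

A real implementation would, as the text notes, arrange indexed access rather than this linear scan.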
The problem with databases as data structures arises when they are updated. Consider the operation that interchanges the ith and jth elements of an array represented by the relation Element as above. The goal

? Interchange(A 2 3 x)

for example, can be solved for x by a single clause,

Interchange(x i j inter(x i j))

However, it requires an inordinate amount of computation to determine the elements of the new array inter(A 2 3), which results from the interchange; the following clauses define its elements:

Element(inter(x i j) i u) if Element(x j u)
Element(inter(x i j) j u) if Element(x i u)
Element(inter(x i j) ...

D3 > D2 > D1. Using this difference information, GPS searches for a state that does not possess D3. Then, without reintroducing D3, it searches for a state that does not possess D2. And finally, it looks for a goal state without reintroducing either D3 or D2. Of course, GPS backtracks if necessary. The invariances of the differences are important. A difference d is invariant over an operator f if, for any state s, both s and f(s) either possess d or both do not. In other words, f neither introduces nor removes d. For example, D2 and D3 are invariant over the 180° moves that are relevant to D1. The invariances can be tabulated as follows:

      F3  F2  F1
D3     1   0   0
D2     1   1   0
D1     1   1   1

F1 are the 180° moves; F2 are the 90° and 270° moves; F3 are the remaining moves. A 0 indicates that the difference is invariant over that class of operators.

Triangularity has been used as the basis of a method (12) for learning difference information. The basic idea is to find a set of differences that gives rise to a triangular table. The row ordering gives the difference ordering, and the diagonal entries give operator relevance. The method starts by looking for properties that are invariant over at least one operator. Then, the properties are combined to form properties all goal states possess. These properties are potential differences, and the method attempts to form a triangular table out of them. The details are given in Ref. 12.
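The triangularity test underlying the learning method can be stated directly; a Python sketch (illustrative, using the table above) checks that every entry strictly above the diagonal is 0, i.e., that each difference is invariant over the operator classes relevant only to later differences:

```python
# Rows are differences in order D3, D2, D1; columns are operator
# classes F3, F2, F1. A 0 means the difference is invariant over
# that class of operators.
table = {
    "D3": {"F3": 1, "F2": 0, "F1": 0},
    "D2": {"F3": 1, "F2": 1, "F1": 0},
    "D1": {"F3": 1, "F2": 1, "F1": 1},
}
rows = ["D3", "D2", "D1"]
cols = ["F3", "F2", "F1"]

def is_triangular(table, rows, cols):
    """Lower-triangular check: every entry above the diagonal is 0."""
    for r, row in enumerate(rows):
        for c, col in enumerate(cols):
            if c > r and table[row][col] != 0:
                return False
    return True

print(is_triangular(table, rows, cols))  # True
```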
Summary

A number of different problem-solving methods have employed some form of means-ends analysis. Empirical results show that it has been rather effective in controlling search. Most of these methods use a specialization of the mechanisms found in GPS in order to remove the requirement for external
information about differences. In addition to GPS's mechanisms, MPS has two important mechanisms it uses very effectively in solving difficult puzzles like Rubik's cube. Some guidelines are given for selecting differences that lead to efficient search.
BIBLIOGRAPHY
1. A. Newell, J. C. Shaw, and H. A. Simon, Report on a General Problem-Solving Program, Proceedings of the International Conference on Information Processing, UNESCO House, Paris, pp. 256-264, 1960.
2. G. W. Ernst and A. Newell, GPS: A Case Study in Generality and Problem Solving, Academic Press, New York, 1969.
3. A. Newell and H. A. Simon, Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ, 1972.
4. J. R. Quinlan and E. B. Hunt, "A formal deductive problem-solving system," JACM 15, 625-646 (October 1968).
5. R. E. Fikes and N. J. Nilsson, "STRIPS: A new approach to the application of theorem proving to problem solving," Artif. Intell. 2, 251-288 (1971).
6. E. D. Sacerdoti, "Planning in a hierarchy of abstraction spaces," Artif. Intell. 5, 115-135 (1974).
7. R. E. Korf, "Macro-operators: A weak method of learning," Artif. Intell. 26(1), 35-78 (April 1985).
8. N. J. Nilsson, Problem-Solving Methods in Artificial Intelligence, McGraw-Hill, New York, 1971.
9. Reference 3, pp. 428-435.
10. R. B. Banerji, GPS and the Psychology of the Rubik Cubist: A Study in Reasoning about Actions, in A. Elithorn and R. B. Banerji (eds.), Artificial and Human Intelligence, Elsevier Science, New York, 1984.
11. R. B. Banerji and G. W. Ernst, A Theory for the Complete Mechanization of a GPS-type Problem Solver, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, 1977, pp. 450-456.
12. M. M. Goldstein and G. W. Ernst, "Mechanical discovery of classes of problem solving strategies," JACM 29, 1-28 (January 1982).

G. Ernst
Case Western Reserve University

MEDICAL ADVICE SYSTEMS

For several decades, collaborating computer scientists and physicians have been building computer programs to diagnose medical illness and to recommend therapy. In the early 1970s four research groups developed programs that differed somewhat from the other medical decision-making programs in that they drew heavily on earlier AI research such as DENDRAL, a program from the late 1960s that had used expert knowledge to derive chemical structure from mass spectral data (1). The resulting work helped define the field of AI in medicine (AIM) and seeded development of expert systems (qv) in other domains (i.e., fields of expertise) as well (2,3). Medical diagnosis and patient management problems helped demonstrate the validity of an emerging AI principle: that domain-specific assertions and extensive knowledge about a problem area are generally more crucial to problem-solving performance than are domain-independent principles of reasoning. Simple reasoning techniques were shown to suffice for expert-level performance so long as the program had comprehensive and accurate knowledge of the domain. AIM research activities are important to medicine not only because medical advice systems will someday become routine tools in clinical practice but also because the education of doctors, which has traditionally emphasized memorization of knowledge, may increasingly emphasize the learning of effective problem-solving techniques, enhanced with the knowledge and advice provided by computer systems.
Theoretical Basis

Protocol Analysis. The theoretical foundation of AIM owes a great deal to psychological research carried out in the mid-1970s. In these experiments physicians were urged to verbalize their thoughts while they solved diagnostic problems. Researchers then analyzed transcripts of those sessions. Investigations of this type (4,5) identified a general problem-solving procedure common to both expert and novice physicians: the hypothetico-deductive approach. Hypotheses emerge quite soon after the physician begins gathering data, and these are tested as new data arrive. Questions may be generated solely to test an active hypothesis or to distinguish between hypotheses. Thus, early generation of hypotheses seems to provide leverage for the diagnostician. Building on those results, researchers at the University of Minnesota (6) examined the performance of both experts and novices and found differences not in their reasoning (regardless of experience, they shared the hypothetico-deductive approach) but in the richness and organization of medical knowledge. Novices had spotty knowledge of diseases, not yet full enough or sufficiently organized to optimize the hypothetico-deductive approach. These results agreed with the results of the expert systems research mentioned earlier in that performance seemed to be critically dependent on domain-specific knowledge.

Knowledge Representation. Two aspects of knowledge representation (qv) are of particular interest in considering the construction of medical advice systems. First, what knowledge do physicians use to make the diagnosis and to plan therapy? Second, what abstract data types are best for computer implementations of that knowledge? It became increasingly clear that the first-generation AIM programs captured only a small portion of the knowledge that physicians actually use in problem solving.
Typically, the medical knowledge represented consisted of weighted associations between findings (i.e., observable descriptors of a patient) and hypotheses or between two hypotheses. The underlying semantics of such associations were not always made clear, and there was generally no distinction made between causal and associational relationships. For example, a diagnostic system might represent a link between the hypothesis of breast cancer and the finding that the patient's mother had breast cancer. In this case the finding is a risk factor, not a clear causal relationship, as a skiing accident might be to a fractured leg. In recent years AIM research has explored various representations for causal knowledge and their integration into advice systems (see Reasoning, causal). Pure causal modeling is rarely applicable in medicine because medicine is an empirical science in which detailed mechanisms are often unknown. However, whenever cause-effect information is available to physicians, they use it in at least five ways. First, if one can confidently follow effect-to-cause
links (i.e., statements of what entities may cause an observed effect) from the patient's complaints back toward primary disorders, an intersection point provides the diagnostician with a common cause of multiple complaints. CASNET is a computer program developed at Rutgers for the diagnosis and treatment of glaucoma; that domain lent itself to this intersection-point technique (7,8). (For historical reasons the names of computer programs are written in uppercase, e.g., CASNET. Occasionally, the name is an acronym, but understanding the acronym seldom helps one understand the system. In this discussion many of the better known computer programs are referred to by an uppercase name, and the acronyms are explained only if necessary to understand the accompanying text.) Second, medical therapy is often unavailable either for the patient's complaints or for the elemental physiologic disorder (primary disease) at the beginning of the causal path. But effective therapy may indeed be available for intermediate states. For example, swollen, painful feet can be caused by abnormal retention of fluid in the body, which is in turn caused by cardiomyopathy. Current medical therapy cannot correct cardiomyopathy, and it would be suboptimal to simply give pain killers for swollen feet, but drug therapy can reverse the fluid retention (intermediate state) and thus relieve the patient of swollen feet. Third, physicians use causal models to interpret the temporal ordering of complaints. Leg cramps that occur during vigorous walking may be due to atherosclerotic disease, in which the leg muscles begin consuming more oxygen than the narrowed leg arteries can deliver. Leg cramps that are relieved by walking cannot be explained by this mechanism. Fourth, causal information can be used by a diagnostician to avoid treating two related findings as though they provide independent support for a hypothesis.
For example, if there are known associations between findings f1 and f2 and hypothesis H, observation of both f1 and f2 might be interpreted as contributing independently to confidence in H. But if it is known that the causal path is H → f1 → f2, then f1 and f2 must be dependent. Cooper (9) uses causal models in this way to establish probability bounds that are consistent with knowledge about cause and effect. Finally, physicians use causal models to partition their knowledge into levels of abstraction. Diagnosis and explanation can then be performed at the clinical level (e.g., fatigue) or the pathophysiological level (e.g., serum partial pressure of carbon dioxide in blood is related algebraically to pH), depending on the complexity of the problem and the demands for explanation. ABEL, a computer program developed at MIT to deal with acid-base and electrolyte disorders, first demonstrated the advantages of using such levels of abstraction (10,11). Another area of increasing emphasis has been the representation of a taxonomy (i.e., hierarchic organization) for the diagnostic hypothesis space. For example, viral hepatitis and alcoholic hepatitis are both inflammatory diseases of the liver. A representation scheme that captures this type of hierarchic relationship might allow the system to begin reasoning at an appropriate level of abstraction, e.g., to identify a patient as having hepatitis before beginning to determine which subtype is present. Disease taxonomies, then, have been used to direct search. The MDX system, a liver disease diagnostic program developed at Ohio State University, contains a taxonomy of diseases that allows the system to direct the search as a progressive refinement of hypotheses, popping back to higher nodes in the hierarchy only when strong contradictions arise (12). Another control scheme that uses taxonomic knowledge
extensively can be found in the design for enhancements to INTERNIST, a diagnostic program for internal medicine developed at the University of Pittsburgh (10). The abstract data types used in AIM systems have been legion, but three classes predominate: production rules (see Rule-based systems), frames (see Frame theory), and semantic networks (qv). AIM researchers have not been uniform in their choice of knowledge representations. Four early AIM computer programs exemplified this diversity of representation schemes: MYCIN experimented with production rules; PIP and INTERNIST used disease frames; and CASNET represented causal relations in an associational network. An excellent discussion of knowledge representation in these four early AIM systems can be found in Ref. 13. Support for the definition of abstract data types is provided by "object-centered programming" languages (see Languages, object-oriented), which can be used to bind algorithms to the data structures on which they operate. Many feel that the development of large systems is more manageable with this encapsulation scheme, and it still allows designs that use production rules, frames, or networks. Several of the object-oriented languages facilitate the construction of taxonomies because the languages provide automatic inheritance of behavior from objects to their subtypes in a hierarchy.

Control. Separation of the knowledge base (data structures) and control (algorithms) is often cited as a central element in expert system design and is a goal of most AIM system designers because the technique preserves the ability to work with each component separately. Designers can experiment with new control schemes, keeping the knowledge base fixed, and observe performance changes (see Control structures).
For example, a new technique for combining evidence might be run on the MYCIN knowledge base, a collection of rules for making infectious disease diagnoses. Or a new INTERNIST differential diagnosis mode might be run on an otherwise unaltered knowledge base. Knowledge acquisition (qv), a primary concern of medical advice systems, can ideally be achieved by adding new instantiations of a data structure (e.g., a new rule or a new disease frame), thereby upgrading the knowledge base without changing the control structure. There are as many control schemes as there are systems and a large number of terms in use. MYCIN searches its rule set using a depth-first control strategy. The system uses backward chaining to invoke and link its rules so that a reasoning network is created dynamically. INTERNIST's control is initiated with a data-directed scheme but evolves into a hypothesis-directed approach after an initial set of hypotheses is invoked (14) (see Processing, bottom-up and top-down). The Serum Protein Diagnostic Program (Helena Laboratories), built with an expert system-building tool known as EXPERT (15), does not require hypothesis-directed control because question selection is not a problem; most of the information is obtained automatically from an electrophoresis instrument with which this program is packaged and sold. Thus, its control is predominantly data directed. The control strategy of Ohio State's MDX system (12) is a breadth-first search of a static tree. As MDX pushes deeper into this taxonomy tree, it refines hypotheses to be more specific. The ATTENDING system, developed at Yale to critique anesthesia management plans (16,17), searches a hierarchical planning network in order to identify alternatives to the user's proposed plan. Starting at the most detailed arcs of this augmented decision network (similar to aug-
586
MEDICALADVICE SYSTEMS
mented transition networks (seeGrammar, ATN) used in natural-language (qv) research), ATTENDING compares the of the user's proposedarc (action) to the risks of parallel ;TI:
the advantages of each can be melded in medical-advicesystems (seeReasonirg, plausible).
EvaluationFunctions.AI chess-playingprograms (seeComputer chessmethods) use an evaluation function to assign scalar values to board positions.Advice systemsin medical management face analogous situations, but the values of medical outcomesare difficult to assess.What are the relative values of chronic pain vs. a lifetime of paralysis vs. loss of life? The absenceof a generally accepted"correct" therapy means that the physician will demand a reasoned argument that addressesthe issues of costs and benefits in a convincing way. This issue is of growing importance to medical AI researchers becausethere is increasing interest in designing therapy systems. Diagnosis systems typically sidestep the difficulties of evaluation functions, except as they relate to test selection during diagnostic workup. Most of these systems consider information-gathering costs,but this doesnot constitute a comprehensive value theory for medical advice systemsbecauseit ignores the utility of acts, i.e., the cost of incorrect action. For example, assumethat a medical-advicesystem concludesthat an infection is most likely causedby organism 1 and much less likely by organism 2. Is it correct management to treat for organism 1 and not for organism 2? Perhaps not if organism 1 causesonly discomfort, organism 2 can cause death, and the treatment for organism 1 may causekidney damage.The cost of diagnostic misclassification drives the real-life diagnostic process.Medical cost containment pressures may force more explicit inclusion of cost-benefit considerations.Someevaluation techniques that AIM management programs have used are included in the discussionof example systems,below. Future researchis likely to draw upon related disciplines such as operations research that provide a formal theory for evaluating the expectedutility of actions.
Additional ongoing research topics for investigators building AIM systems for diagnosis or management advice include knowledge acquisition, explanation, temporal reasoning,and validation.
Inexactlnference(ScoringHypotheses).Inexact inferencein this discussionrefers to use of information that is probabilistic to somedegreerather than purely categorical (seeReasonitg, plausible).Medical evidenceis such that most conclusionscan be drawn only with a limited degreeof certainty. This character of medical evidenceand hypothesis assessmenthas driven AIM researchers to experiment with different scoring schemes.Few AIM systems have used classical probability theory to represent uncertainty. Systemsdevelopedin medical centers have tended to seek representations for uncertainty that reflect physician behavior, and several researchershave argued that probability theory and the use of Bayes' theorem (see Bayesian decision methods) do not model that behavior well (18). They further argued that the application of Bayes' theorem often requires so many simplifying assumptionsthat the theoretical foundations tend to be invalidated in any practical system using a probabilistic approach.Thus, more ad hoc approacheshave becomecompetitors for representation of uncertainty. The MYCIN experiments resulted in the certainty factor model (18).The INTERNIST project produceda calculus of evoking strength and frequency weights (14).Thesealternatives vary in their degree of formalism. It is expected that future work witl better elucidate the features of these alternatives that were not seen in probability theory. The perceived differences between formal systems like probability theory and the alternatives may diminish as researchersidentify how
ResearchThemes
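The organism-1/organism-2 example from the evaluation-function discussion above can be made concrete as a small expected-utility calculation in the operations-research style the text alludes to. All probabilities and utilities below are invented for illustration:

```python
# Hypothetical sketch: treating only the likely organism can still be the
# wrong management when the unlikely organism is deadly. Numbers invented.

p = {"organism1": 0.8, "organism2": 0.2}          # diagnostic likelihoods
# Utility (negative cost) of each action given the true cause:
utility = {
    ("treat1", "organism1"): -10,   # cures discomfort, risks kidney damage
    ("treat1", "organism2"): -100,  # deadly organism 2 goes untreated
    ("treat2", "organism1"): -5,    # discomfort persists, no grave harm
    ("treat2", "organism2"): -20,   # deadly organism covered
}

def expected_utility(action):
    """Probability-weighted utility of an action over the possible causes."""
    return sum(p[cause] * utility[(action, cause)] for cause in p)

best = max(["treat1", "treat2"], key=expected_utility)
print(best)  # -> treat2, despite organism 2 being the less likely cause
```

The diagnosis scores alone would favor treating organism 1; weighing the utility of acts reverses the decision, which is exactly the cost-of-misclassification point made above.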
Knowledge Acquisition. A well-recognized bottleneck in building expert systems is acquiring knowledge from the expert. Work on TEIRESIAS, a program built to interface with MYCIN (19), demonstrated that a program might assist in the on-line transfer of knowledge from a human expert to the consultation program's knowledge base. The expert could disagree with a conclusion, and then the system would trace, step by step, back through the reasoning process until the erroneous rule (or missing rule) was identified. The SEEK program, which operated in concert with the EXPERT program mentioned earlier, also provided assistance in recognizing how a system's knowledge base should be altered (20). Focusing on actual cases, the system suggests refinements to the knowledge base, which take the form of adding or deleting entries from the "major findings" or "minor findings" of a disease.

Explanation. MYCIN was one of the first systems to demonstrate that explanation (qv) capabilities might be key to physician acceptance of computer-based decision support (21). MYCIN allowed users to ask "why?" when they were unclear about the purpose of the system's questioning and "how?" when they wanted to know how the system would (or did) reach certain conclusions. Researchers at MIT enriched the Digitalis Therapy Advisor (22) with causal models of heart rhythm disturbances and principles of antiarrhythmia therapy to create a computer program named XPLAIN (23), which could give the rationale behind a therapy. This work demonstrated that optimal explanation was facilitated by access to the more abstract principles, which do not always appear in the program code. The goals of the NEOMYCIN project at Stanford University are to provide explanation of the diagnostic process not only in terms of diseases and symptoms but also in terms of the overarching principles of medical diagnosis. This work has included a revision of MYCIN's rules and the addition of an explicit model of diagnostic strategy (24,25).
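The "how?" facility described above rests on the trace left behind by backward chaining. A minimal sketch of a chainer that records which premises established each conclusion, so the record can be played back as an explanation (the rules and findings are invented; this is not MYCIN's actual code):

```python
# Minimal backward chainer with a "how" explanation: prove() records, for
# each goal it establishes, the premises of the rule that concluded it.
# Rules and findings below are hypothetical.

rules = [
    # (conclusion, premises)
    ("bacteremia", ("primary infection", "sterile culture site")),
    ("treat with antibiotic X", ("bacteremia",)),
]
known = {"primary infection", "sterile culture site"}  # established findings
trace = {}  # goal -> premises that established it (the "how" record)

def prove(goal):
    """Backward chain: a goal holds if known, or if some rule concludes it
    and all of that rule's premises can themselves be proved."""
    if goal in known:
        return True
    for conclusion, premises in rules:
        if conclusion == goal and all(prove(p) for p in premises):
            trace[goal] = premises
            return True
    return False

if prove("treat with antibiotic X"):
    for goal, premises in trace.items():
        print(f"HOW {goal!r}: concluded from {premises}")
```

Asking "how?" about any conclusion then amounts to looking it up in `trace` and recursing on its premises.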
The ATTENDING system for anesthesia management planning first proposed the critiquing approach to explanation (16,17). Rather than simulating a physician's reasoning and generating a recommended action, critiquing systems center their analysis around the user's proposed management plan. In medical management there is often more than one defensible therapy, so an approach that highlights the pros and cons of each approach is more likely to meet acceptance by the physician. In addition, critiquing systems remain silent on the uncontroversial aspects of the plan.

Temporal Reasoning. Medical advice systems are usually designed with the assumption that data are gathered and inferences are made at one point in time. Since medical diagnosis and management actually take place over time, optimal medical advice systems would allow reevaluation of the patient, assessing the rate of disease progression or the therapeutic response to prior treatment. The Digitalis Therapy Advisor, VM, and ONCOCIN are unusual in that they have attempted to manage patients over time. The Digitalis Therapy Advisor (22) uses the results of previous treatment to alter its model of the patient. For example, if predicted body stores of digitalis are much higher than measured stores, the system adjusts the "oral absorption" parameter downward. VM, a program designed to assist with the management of patients on respiratory-support systems (ventilators), assumes that particular data are only valid for a certain period of time, and the system can represent temporal trends (26). An example of this is VM's ability to detect a rise in mean arterial blood pressure of 15 torr (2 kPa) over 10 min. ONCOCIN (27) follows patients through many cycles of cancer chemotherapy, each cycle lasting weeks. Some of its inference rules are based on the temporal trends of patient parameters (see also Reasoning, temporal).

PREMISE: ($AND (SAME CNTXT INFECT PRIMARY-BACTEREMIA)
               (MEMBF CNTXT SITE STERILESITES)
               (SAME CNTXT PORTAL GI))
ACTION:  (CONCLUDE CNTXT IDENT BACTEROIDES TALLY .7)

IF:   1) The infection is primary-bacteremia, and
      2) The site of the culture is one of the sterile sites, and
      3) The suspected portal of entry of the organism is the gastrointestinal tract,
THEN: There is suggestive evidence (.7) that the identity of the organism is bacteroides.

Figure 1. Rule from MYCIN knowledge base. LISP code at top is dynamically translated to the prose explanation at bottom.

Validation. Diagnosis systems are usually judged by the accuracy of their diagnosis when compared to some "gold standard." Credibility is gained by evaluating the program, informally at first and then in double-blind studies. Several groups have carried out formal evaluations of performance (14,28-30). Evaluation in a clinical setting different from that in which the system was built has the advantage of demonstrating generalizability. Fewer groups have evaluated the acceptability to users, and success in this area is notoriously difficult to achieve. Systems that will involve hands-on use by doctors face additional challenging design issues compared to those systems that analyze instrument data and produce a report. Objectives and guidelines for system validation are discussed in Ref. 31.

Example Systems

Several medical advice systems are now discussed in more detail. The theoretical issues mentioned above are illustrated in three programs designed for medical diagnosis and in four other programs concentrating on management.

Diagnosis Systems. MYCIN is an interactive program designed to be used as a consultant in difficult cases of meningitis or bacteremia. It suggests a set of likely organisms (bacteria) and then proposes therapy that will treat those organisms. MYCIN was one of the first and most significant rule-based expert systems (18). The domain knowledge of MYCIN is represented in a set of about 500 production rules. Most of these rules encode associations between findings and a hypothesis (Fig. 1). These if-then rules were easily understood by both the computer scientists and the physicians collaborating on program construction. Problem-solving behavior could be modified by altering a rule or adding new ones. However, valuable knowledge about disease taxonomy, cause and effect, and temporal ordering between disorders was represented only implicitly, somewhat buried in the rules. This frustrated attempts to use MYCIN's rules for intelligent computer-aided instruction (32). As described earlier in the discussion of control, MYCIN backward chains through the rule base, although for reasons of efficiency, rules are occasionally invoked in a data-directed fashion (33). Rules were meant to represent only domain knowledge, but ultimately they encoded a good deal of control logic as well. Later rule-based systems have attempted to achieve a cleaner separation between control and domain-level knowledge. The MYCIN research is well known for its production rule representation (9) and also for its innovative model of inexact reasoning, the calculus of certainty factors (18). MYCIN became a laboratory for investigations into knowledge acquisition (19), metalevel reasoning (34), intelligent computer-aided instruction (32), explanation (21), and knowledge-engineering tools (35). The evaluation of MYCIN's performance was a careful, blinded study, which demonstrated that MYCIN was competitive with expert clinicians (29).

INTERNIST is designed to diagnose diseases, and combinations of diseases, within the extensive domain of internal medicine (14). This system has been developed over a 10-year period by workers at the University of Pittsburgh. Diseases are explicitly related to their clinical manifestations in data structures resembling frames (Fig. 2). The strengths of association are captured in two numbers: "evoking strengths" and "frequency weights." Evoking strength represents the degree to which the manifestation suggests the disease. Frequency weight represents the likelihood of finding that manifestation in the presence of the given disease. Patient data allow the program to contribute evoking strengths and frequency weights to diagnostic hypotheses. Then, a high-level control chooses one of four hypothesis-directed control schemes (i.e., conclude, pursue, rule out, and discriminate), depending on the number of active hypotheses and how closely they are clustered by weights of evidence. This higher level control scheme models the hypothetico-deductive approach mentioned earlier. The system can handle multiple coexisting diseases through a clever partitioning algorithm that allows it to focus on the differential diagnosis of subsets of findings while it holds the additional patient data in abeyance for later investigation. Question selection is driven by whichever control scheme INTERNIST has chosen. INTERNIST's inexact reasoning technique is a logarithmic system of weights that are additively combined by an algorithm that was empirically derived. INTERNIST uses a coarse cost-classification of findings to decide which test to request next. In descending order of cost they are invasive labs, noninvasive labs, physical exam, and history. In contrast to the MYCIN project, which emphasized knowledge acquisition and explanation techniques, INTERNIST has concentrated instead on the comprehensiveness of its knowledge base and the optimal strategic mode for a differential diagnosis. A careful evaluation of INTERNIST's performance demonstrated broad diagnostic abilities (14). INTERNIST has been the inspiration for a new program, called CADUCEUS, which is intended to address many of the inadequacies of INTERNIST. Plans for CADUCEUS include explicit modeling of disease taxonomies and cause-effect relationships (36).
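MYCIN's certainty-factor calculus, mentioned above, combines evidence from multiple rules incrementally. A sketch of the commonly published combining function from the CF model; the function name is illustrative, and this is not MYCIN's actual code:

```python
# The standard certainty-factor combining function (CFs range from -1 to 1;
# positive = evidence for, negative = evidence against). A sketch.

def cf_combine(cf1, cf2):
    if cf1 >= 0 and cf2 >= 0:                 # both confirm
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:                   # both disconfirm
        return cf1 + cf2 * (1 + cf1)
    # conflicting evidence
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

# Two rules each lending suggestive evidence (.7) to the same identity
# yield a combined belief of .91:
print(cf_combine(0.7, 0.7))
```

Note the order-independence of the combination: evidence can arrive rule by rule, in any order, and converge on the same final belief, which suits MYCIN's dynamically built reasoning network.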
ALCOHOLIC HEPATITIS (excerpt; each manifestation is followed by its evoking strength and frequency weight):

AGE 26 TO 55 ...03
ALCOHOL INGESTION RECENT HX ...24
ALCOHOLISM CHRONIC HX ...24
URINE DARK HX ...13
ABDOMEN TENDERNESS RIGHT UPPER QUADRANT ...24
JAUNDICE ...13
LIVER ENLARGED MODERATE ...13
SKIN PALMAR ERYTHEMA ...13
SKIN SPIDER ANGIOMATA ...23
BILIRUBIN URINE PRESENT ...24
SGOT 120 TO 400 ...23
[some 35 further manifestations omitted]
Figure 2. A portion of one disease "frame" used by the INTERNIST computer program. The strength of association between the disease (alcoholic hepatitis) and its manifestations is captured by the two numbers following each manifestation. The first number, the evoking strength, represents the degree to which the manifestation suggests the disease. The second number, the frequency weight, represents the likelihood of finding that manifestation in the presence of the given disease.
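The two numbers in Figure 2 drive hypothesis scoring. The sketch below is a loose approximation only, not INTERNIST's empirically derived logarithmic calculus: observed manifestations add their evoking strengths to a disease hypothesis, and expected-but-absent manifestations subtract their frequency weights.

```python
# Hedged sketch of frame-based scoring in the INTERNIST style.
# The weights are taken from the Figure 2 excerpt; the scoring rule itself
# is a simplification invented for illustration.

frame = {  # manifestation -> (evoking strength, frequency weight)
    "ALCOHOL INGESTION RECENT HX": (2, 4),
    "JAUNDICE": (1, 3),
    "SGOT 120 TO 400": (2, 3),
}

def score(frame, observed):
    s = 0
    for manifestation, (evoking, frequency) in frame.items():
        if manifestation in observed:
            s += evoking      # finding present: evidence evoking the disease
        else:
            s -= frequency    # expected finding absent: counts against it
    return s

print(score(frame, {"JAUNDICE", "SGOT 120 TO 400"}))  # -> -1
```

Scores of this kind, computed for every frame the patient data touch, are what the high-level control uses to decide whether to conclude, pursue, rule out, or discriminate.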
of a state that is causally connectedto S times the product of the "associative strengths" along the connecting links. One test selection strategy focuses on state Se that is currently considered the most likely state, surveys possible tests Tur,ur,...,hn, &TLd selectsTp,wrth the smallestCp,.If the quotient WplCp,exceeds a predeterminedthreshold Q, the program asks the user for the results of Tp,. If not, it goes to the next best state, repeatittg the test selectionprocedure.If no weight-cost ratio exceedsQ, the system stops. Managementsystems.The Digitalis Therapy Advisor (22) is a program designedto help physicians prescribe a doseof the drug digitalis for particular patients. This program uses body weight, dge, target serum concentration, and other parameters of a pharmacokinetic model and producesan initial dose estimate. Subsequent feedback about toxic and therapeutic states (qualitative information) then guides adjustments to that initial dose.One of the system'scentral data structures is the patient-specificmodel (PSM). The PSM includes not only clinical and laboratory data but also the reason for digitalis administration. This allows the system to evaluate therapeutic responsefrom subsequentclinical information. For example, if atrial fibrillation (an abnormal heart rhythm) were the reasonfor using digitalis, the system would look for heart rate decreaseto determine therapeutic response,but if congestive heart failure were the reason for using digitalis, the system would look for signs of decreasingcongestion,such as resolution of ankle edema.The Digitalis Therapy Advisor'scombination of math modeling and AI techniqueswas novel, especially in a system that took advant age of feedback about the patient's responseto earlier therapeutic actions. The researchissuesin VM (2O lay in modelingthe dynamic environment of the intensive care unit GCU). VM is rule based,using four classesof rules. 
A "status rule" definesgeneral clinical states(e.g.,"stable hemodynamics").Certain vital signs may imply stable hemodynamicsin one stage of ventilator management but not in another stage. The system generates expectationsof what parameter values (e.g.,blood pressure and pulse rate) should be found in a given clinical context.An exampleof a "transition rule" is one that identifies a return to the ventilator from another device called the "T-piece."This data-directedreasoningis neededbecausephysicians do not always inform VM of what they have done. An example of an "instrument rule" is one that identifies potentially artifactual readings. "Therapy rules" make use of the other three rule classesto recommend therapy. The program monitors patients over time, iteratively analyzing instrument readings, making conclusions,and if appropriate, printing messagesto the physicians caring for the patient. The ATTENDING system is best known for exposition of the critiquing approachto medical advicesystems(16,17).ATTENDING is designed to critique an anesthetist's plan for premedication,induction, intubation, and maintenanceof anesthesia.The system handles risk-benefit trade-offsin medical management through a technique called "heuristic risk analysis." The central data structure in ATTENDING is a hierarchy of augmented transition networks (ATN). These networks consist of terminal and nonterminal arcs (Fig. 4). Terminal arcs can be traversed directly and represent a choice of drug or technique. Traversing a nonterminal arc necessitates dropping into a subnetwork and finding a path through that network before popping up and continuing in the upper network. A therapy plan is a path that starts at the top net-
MEDICALADVICE SYSTEMS
z8[lu'f-il Diseose Coleqorres
a
oPENANGLE Aucoil4A GLAucOil4A
A
22tlittcLosuRE
Ar{cLE ACurEAr{cLE AcurE
,R 1 I l\l\ 1.il
8LX?,ffi,^ ,/,ffi
,/' ll ,/ | ! --a / Ill
C/ossrfrcononLrnks
Po t hophysroloqtco/ Slo tes
/
// ,/conneau I/
Cousol Lnks
EDEMATA
GLAUCOiIATOUS VISUALFIELD LOSS
ELEVATED INTRAOCULAR PRESSIJRE
Obser vot ions
Figure 3. Three-level description ofa disease processin CASNET. Observations are direct evidence about a patient. Pathophysiological states are connectedby causal links. Diseasecategories represent patterns of pathophysiological states. Reproducedwith permission from S. M. Weiss et al., Artif. Intell.LL,148 (1978).
work's start node, traverses the network with varying degrees of descent into subnetworks, and ends at the top network's finish node. Analysis pivots around the physician's proposed plan, heuristically collecting alternative plans that are roughly equivalent or superior to the user's choice.Comparisons are made using a "risk magnitude," which is an aggregate of probabilistic information and information about the utility of possible outcomes. Then, "contextual preference knowlrules" refine these comparisonswith more case-specific edge. Finally, another ATN producesa prose explanation of the analysis. The ONCOCIN system is designedto assist in the treatment of cancer patients (27). ONCOCIN is designedto help managepatients over time, interpreting the current sessionin tight of past information whenever necessary.ONCOCIN's domain knowledge is separated from the control. A central goal of this research effort is that the system be used regularly in a busy clinical environment. This imposes constraints on re-
sponsetime, which ONCOCIN addressesby running two independent processes.One of the processes,the Reasoner,performs most of the inference, and the second process,the Interviewer, controls the data-gathering interaction with the physician. Another constraint imposedby clinical use is that the electronic format of data-recording and display must not retard the physician. To this end, the ONCOCIN project has recently begun to transfer its program to LISP machines that use bit-mapped displays to duplicate the visual appearanceof flowcharts traditionally filled out by physicians. Although MYCIN is often describedas a diagnostic prograffi, its principal motivation was therapy planning. There are several goals to MYCIN's antibiotic therapy task, someof them conflicting with each other. One MYCIN project researcherfound the rule-basedformat a difficult representation to work with when he designeda therapy selectionalgorithm; in Ref. 37 he articulates the motivations and design for a "revised therapy algorithm" that he added to the program.
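CASNET's weight-cost test selection, described earlier, can be sketched as a loop over states ordered by weight of evidence. The state names, tests, costs, and threshold below are invented for illustration:

```python
# Sketch of CASNET-style test selection: visit states from most to least
# likely, and ask for the cheapest relevant test whose weight-to-cost
# quotient exceeds the threshold Q. All data here are hypothetical.

Q = 0.5
states = [  # (state, weight of evidence W, tests as (name, cost C))
    ("elevated intraocular pressure", 0.9,
     [("tonometry", 1.0), ("visual field exam", 3.0)]),
    ("corneal edema", 0.4,
     [("slit lamp exam", 1.0)]),
]

def select_test(states, q):
    for state, weight, tests in sorted(states, key=lambda s: -s[1]):
        test, cost = min(tests, key=lambda t: t[1])  # cheapest test first
        if weight / cost > q:
            return test
    return None  # no weight-cost ratio exceeds Q: stop asking questions

print(select_test(states, Q))  # -> 'tonometry'
```

Returning `None` is the stopping rule from the text: when no quotient exceeds Q, further tests and questions are judged unnecessary.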
[Figure 4 diagram: a hierarchy of networks. The top-level ANES network offers arcs such as REGIONAL and a path through INDUCTION, INTUBATION, and MAINTENANCE subnetworks; lower level arcs name drugs and techniques such as THIOPENTAL, KETAMINE, SUCCINYLCHOLINE, PANCURONIUM, GALLAMINE, HALOTHANE, and ENFLURANE, and intubation routes such as MASKCRIC, RAPIDSEQ, and NORMINT; POP arcs ascend to the upper network.]

Figure 4. An ATN from ATTENDING. A proposed management plan is traced in boldface. Courtesy of IEEE, 1983. POP = ascent to the upper network.
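The descend-and-pop traversal of a network hierarchy like the one in Figure 4 can be sketched recursively. The network contents below are invented, loosely echoing the figure's labels:

```python
# Sketch of a hierarchical plan network in the ATTENDING style: terminal
# arcs name a drug or technique; a nonterminal arc drops into a subnetwork
# that must be traversed before popping back up. Contents are hypothetical.

networks = {
    "ANESTHESIA": ["INDUCTION", "INTUBATION", "halothane"],
    "INDUCTION": ["thiopental"],
    "INTUBATION": ["succinylcholine"],
}

def traverse(name):
    """Expand one path through a network into its flat sequence of
    terminal choices (a complete therapy plan)."""
    plan = []
    for arc in networks[name]:
        if arc in networks:          # nonterminal arc: descend, then pop up
            plan.extend(traverse(arc))
        else:                        # terminal arc: a concrete choice
            plan.append(arc)
    return plan

print(traverse("ANESTHESIA"))
# -> ['thiopental', 'succinylcholine', 'halothane']
```

A critiquer in this style would enumerate alternative paths through the same hierarchy and compare their aggregate risks against the path the user proposed.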
Systems in Clinical Use. The CASNET research at Rutgers led to the first commercial application of AI in medicine, the Serum Protein Diagnostic Program (Helena Laboratories) (15). Two other AIM systems in clinical use are PUFF (30) and ONCOCIN (27). All three of these systems are used by practicing doctors. The design requirements of PUFF and the Serum Protein Diagnostic Program are quite different from those of ONCOCIN, however. Both of those systems acquire the needed information automatically from instruments so that data collection, analysis, and recommendation can proceed without direct interaction with the physician. This is quite different from ONCOCIN, where the physician's hands-on interaction with the computer is a major design consideration. In general, systems that will be used interactively face additional design challenges: response time must be short, data collection and analysis must be simple and intuitive to the physician, recommendations must be backed up with good explanations, and finally, system hardware and software must be reliably available.

Summary

Designing medical advice systems for clinical use has influenced the evolution of AI during the last decade. The mutually beneficial relationship between protocol analysis and expert systems research is discussed above, with emphasis on the role of AIM systems as laboratories for experiments in the
representation of causal and taxonomic knowledge, in explanation of reasoning, and in inexact inference. In more detail, example medical advice systems that have made important research contributions to AIM have been examined. The research challenges have not abated, but the future of AI research and applications in medicine promises to be a fruitful one.
BIBLIOGRAPHY

1. B. G. Buchanan and E. A. Feigenbaum, "DENDRAL and Meta-DENDRAL: Their applications dimension," Artif. Intell. 11(1), 5-24 (1978).
2. P. Szolovits (ed.), Artificial Intelligence in Medicine, Westview, Boulder, CO, 1983.
3. W. J. Clancey and E. H. Shortliffe (eds.), Readings in Medical Artificial Intelligence, Addison-Wesley, Reading, MA, 1984.
4. J. P. Kassirer and G. A. Gorry, "Clinical problem solving: A behavioral analysis," Ann. Int. Med. 89, 245-255 (1978).
5. A. S. Elstein, L. S. Shulman, and S. A. Sprafka, Medical Problem Solving: An Analysis of Clinical Reasoning, Harvard University Press, Cambridge, MA, 1978.
6. Reference 3, Chapter 12.
7. Reference 2, Chapter 2.
8. Reference 3, Chapter 7.
9. G. F. Cooper, NESTOR: A Computer-Based Medical Diagnostic Aid that Integrates Causal and Probabilistic Knowledge, Ph.D. Dissertation, Stanford University, Stanford, CA, 1984.
10. Reference 2, Chapter 6.
11. Reference 3, Chapter 4.
12. Reference 3, Chapter 13.
13. Reference 3, Chapter 9.
14. Reference 3, Chapter 8.
15. Reference 3, Chapter 20.
16. P. L. Miller, "ATTENDING: Critiquing a physician's management plan," IEEE Trans. Patt. Anal. Mach. Intell. PAMI-5(5), 449-461 (1983).
17. P. L. Miller, A Critiquing Approach to Expert Computer Advice: ATTENDING, Pitman, London/Boston, 1984.
18. B. G. Buchanan and E. H. Shortliffe (eds.), Rule-Based Expert Systems, Addison-Wesley, Reading, MA, Chapter 11, 1984.
19. Reference 18, Chapter 9.
20. Reference 3, Chapter 18.
21. Reference 18, Chapter 18.
22. G. A. Gorry, H. Silverman, and S. G. Pauker, "Capturing clinical expertise: A computer program that considers clinical responses to digitalis," Am. J. Med. 64, 452-460 (1978).
23. Reference 3, Chapter 16.
24. D. W. Hasling, W. J. Clancey, and G. D. Rennels, "Strategic explanations for a diagnostic consulting system," Int. J. Man-Mach. Stud. 20, 3-19 (1984).
25. Reference 3, Chapter 15.
26. Reference 18, Chapter 22.
27. Reference 18, Chapter 35.
28. D. H. Hickam, E. H. Shortliffe, M. B. Bischoff, A. C. Scott, and C. D. Jacobs, "A study of the treatment advice of a computer-based cancer chemotherapy protocol advisor," Ann. Int. Med. 103, 928-936 (Dec. 1985).
29. Reference 18, Chapter 31.
30. Reference 3, Chapter 19.
31. Reference 3, Chapter 30.
32. Reference 18, Chapter 26.
33. Reference 2, Chapter 3.
34. Reference 18, Chapter 28.
35. Reference 18, Chapter 15.
36. Reference 2, Chapter 5.
37. Reference 18, Chapter 6.

G. RENNELS and E. SHORTLIFFE
Stanford University

MEMORY ORGANIZATION PACKETS

A memory organization packet (MOP) is a unit of representation and memory organization in a theory proposed by Roger Schank of Yale University to explain the way episodic information is stored in human memory. The term is also used to refer to the entire theory of memory organization built around this unit. The theory was initially introduced in Ref. 1 and has been most fully explicated in Ref. 2. A number of computer programs that make use of MOPs have been developed to test the theory, primarily at the Yale Artificial Intelligence Project.

The basic idea behind the MOP theory is that representations of information are dynamic: knowledge structures are constantly being changed and created through learning and generalization. Episodes are indexed in memory in terms of knowledge structures that have been generalized earlier from examples. Episodes are always analyzed at a number of levels simultaneously. This allows information to be stored and generalizations to be made at a variety of concrete and abstract levels during the processing of a single example. When a new episode is understood, information is collected from all relevant knowledge structures and applied to the new example.

Scripts

Historically, MOPs developed out of the understanding theory proposed in Ref. 3, specifically from the knowledge structure known as a script (see Scripts). Scripts were designed to be used to explain events comprised of stereotypical sequences of actions, such as restaurant visits and doctor visits. Although scripts were successfully used in several language-understanding programs, they did have certain problems, particularly when used for memory and learning.

The main problem in using scripts as defined in Ref. 3 for memory and learning is that they are too large and monolithic. Several psychological experiments, e.g., Ref. 4, showed that people would confuse events that occurred in similar local settings even if in different scripts. So, for example, a subject who read about an action that took place in a waiting room during a dentist visit might recall it as having taken place in a story about a visit to a doctor. In addition, learning in situations involving different scripts (such as different kinds of waiting rooms) would be difficult.

The solution to these problems was to develop a system made up of a number of much smaller structures. Each structure describes a small chunk of information about events. These chunks can be used by a variety of higher level structures, providing flexibility in memory organization and learning.

Scenes

The basic unit of memory in the MOP theory is the scene. A scene consists of actions that occur over a short period of time in service of a specific goal. MOPs organize scenes. In Ref. 2 Schank divides scenes into three basic classes: physical, societal, and personal. Physical scenes describe events that take place at a single location. Societal scenes are tied together by a social relationship between people. Personal scenes are unified by idiosyncratic goals unlikely to be shared by many people. MOPs can be broken down into the same three categories. In understanding, these classes lead to questions about what happened physically, what happened socially, and what happened to the participants. Note that all types of memory structures are idiosyncratic, not just personal scenes and MOPs. Physical and societal MOPs and scenes describe an idiosyncratic view that a person assumes to be shared by other people.

Most events that take place are, of course, not simply isolated scenes. Scenes occur together in common patterns. This information is captured by MOP memory structures. Each MOP is a stereotypical sequence of scenes or other MOPs tied together physically, societally, or by a personal goal. An event is usually understood in terms of three or more MOPs, at least one at each of the physical, societal, and personal levels. In understanding, for each MOP found to be relevant, the various scenes can be collected and used much as a script would be
Figure 1. Examples of different MOP types with some component scenes:

Physical: M-GROCERY-SHOP (Get-cart, Examine-fruit, Check-out)
Societal: M-PURCHASE (Make-selection, Determine-availability, Provide-payment)
Personal: M-MAKE-(MY)-DINNER (Preheat-oven, Get-TV-dinner, Eat-and-watch-TV)
in the theory of Ref. 3. The use of the same scenes in a number of MOPs increases generality and the ability to learn. Figure 1 lists a typical MOP of each class, along with some component scenes. The physical MOP contains concrete, if stereotypical, information about a trip to a grocery store; the societal MOP involves social conventions about making a purchase; and the personal MOP is a very idiosyncratic one involving the preparation of a typical dinner for a specific person.

To illustrate how MOPs are applied, consider briefly how a person (or program) might use some of the structures in Figure 1 to understand a story (see Story analysis) about a person going to a grocery store and coming home to make a TV dinner. Understanding such a story would involve all three MOPs in Figure 1. The physical MOP would be used to understand, for example, why the patron took a shopping cart on the way into the store. The societal MOP might be used to understand why a patron who did not have enough cash was able to write a check. Such processing could occur even if the understander only knew about checks in other situations. The personal MOP might be crucial to understanding if the patron turned on the oven before leaving for the store.

The overall flow of understanding is to collect the various relevant physical, societal, and personal scenes and match incoming events against them. This provides the explanatory part of understanding. In the example above, an action in the story involving the patron at the checkout with a TV dinner will be understood in physical terms (the checkout scene is being carried out), societally (a payment is taking place), and in terms of the patron's personal goal of having a TV dinner to eat while watching television, which is being achieved.
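The three-level organization can be sketched in code. The following Python fragment is purely illustrative and is not taken from Schank's programs; the classes, scene names, and the `explain` function are invented for this entry. Each MOP packages a sequence of scenes, and a single incoming action is "understood" by collecting every scene, at every level, that accounts for it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Scene:
    name: str
    actions: List[str]          # short action sequence serving one goal

@dataclass
class Mop:
    name: str
    kind: str                   # "physical", "societal", or "personal"
    scenes: List[Scene] = field(default_factory=list)

m_grocery = Mop("M-GROCERY-SHOP", "physical", [
    Scene("get-cart", ["take-cart"]),
    Scene("examine-fruit", ["inspect-produce"]),
    Scene("check-out", ["unload-cart", "pay"]),
])
m_purchase = Mop("M-PURCHASE", "societal", [
    Scene("make-selection", ["choose-goods"]),
    Scene("provide-payment", ["pay", "write-check"]),
])
m_dinner = Mop("M-MAKE-(MY)-DINNER", "personal", [
    Scene("preheat-oven", ["turn-on-oven"]),
    Scene("get-TV-dinner", ["pay"]),   # buying the dinner serves the personal goal
    Scene("eat-and-watch-TV", ["eat", "watch-TV"]),
])

def explain(mops: List[Mop], action: str) -> List[Tuple[str, str, str]]:
    """Collect every (level, MOP, scene) that accounts for an incoming action."""
    return [(m.kind, m.name, s.name)
            for m in mops for s in m.scenes if action in s.actions]

# Paying at the checkout is understood at all three levels at once:
for level, mop, scene in explain([m_grocery, m_purchase, m_dinner], "pay"):
    print(level, mop, scene)
```

In this toy version the checkout payment is simultaneously the physical check-out scene, the societal provide-payment scene, and a step toward the personal TV-dinner goal, mirroring the story analysis described above.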
Crucial to the idea of MOPs is that understanding must include learning (qv), where the understander examines the cases where input was not adequately explained and determines how the relevant scenes or MOPs should be modified to enable later understanding to take place. Depending on whether earlier expectation violations were similar, the generalizations underlying one or more scenes or MOPs might be changed, or the violation will be indexed so that it can be found if there are similar violations in the future. Developing an algorithm for determining exactly which structures should be modified is one of the most difficult aspects of implementing a MOP-based computer system. In people, accurately determining which structures should be modified seems to be an important component of intelligence.

The first two computer experiments that made use of MOPs were CYRUS (5), developed by Kolodner, and IPP (6), developed by Lebowitz. CYRUS used MOPs to store detailed descriptions of episodes about a single individual. Due to the rich nature of the descriptions and the generalizations made from them, CYRUS was able to answer a variety of questions about its memory. CYRUS was also used to study the reconstructive nature of memory retrieval and question answering (qv). IPP used a MOP-type memory structure to organize information
taken from news stories about international terrorism. The generalization-based memory created from the articles was useful both in studying the cognitive process of organizing information and as a prototype intelligent information system. The use of dynamic memory structures in text understanding was also a major part of the development of IPP.

MOPs have subsequently been used in a number of other computer experiments. MOPs have been used to assist language understanding in Lebowitz's intelligent information system, RESEARCHER (7), and Lytinen's translation system, MOPTRANS (8). They have also been used in a number of case-based problem-solving systems such as Kolodner's MEDIATOR (conflict mediation) (9), Bain's JUDGE (criminal sentences) (10), and Hammond's WOK (cooking) (11). All of these systems are described in Ref. 12.
BIBLIOGRAPHY

1. R. C. Schank, "Language and memory," Cog. Sci. 4(3), 243-284 (1980).
2. R. C. Schank, Dynamic Memory: A Theory of Reminding and Learning in Computers and People, Cambridge University Press, New York, 1982.
3. R. C. Schank and R. P. Abelson, Scripts, Plans, Goals and Understanding, Lawrence Erlbaum, Hillsdale, NJ, 1977.
4. G. H. Bower, J. B. Black, and T. J. Turner, "Scripts in text comprehension and memory," Cog. Psychol. 11, 177-220 (1979).
5. J. L. Kolodner, Retrieval and Organizational Strategies in Conceptual Memory: A Computer Model, Lawrence Erlbaum, Hillsdale, NJ, 1984.
6. M. Lebowitz, "Generalization from natural language text," Cog. Sci. 7(1), 1-40 (1983).
7. M. Lebowitz, RESEARCHER: An Experimental Intelligent Information System, Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, 1985, pp. 858-862.
8. S. L. Lytinen, Frame Selection in Parsing, Proceedings of the Fourth National Conference on Artificial Intelligence, Austin, TX, 1984, pp. 222-225.
9. J. L. Kolodner, R. L. Simpson, and K. Sycara-Cyranski, A Process Model of Case-Based Reasoning in Problem Solving, Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, 1985, pp. 284-290.
10. W. M. Bain, Assignment of Responsibility in Ethical Judgments, in J. L. Kolodner and C. K. Riesbeck (eds.), Memory, Experience and Reasoning, Lawrence Erlbaum, Hillsdale, NJ, 1985, pp. 127-138.
11. K. J. Hammond, Planning and Goal Interaction: The Use of Past Solutions in Present Situations, Proceedings of the Third National Conference on Artificial Intelligence, Washington, DC, 1983, pp. 148-151.
12. J. L. Kolodner and C. K. Riesbeck (eds.), Memory, Experience and Reasoning, Lawrence Erlbaum, Hillsdale, NJ, 1986.

M. Lebowitz
Columbia University
MEMORY, SEMANTIC
The term semantic memory gained currency in AI with the publication of Quillian's memory models in the late sixties (1,2). The term remains associated primarily with Quillian's models and their direct descendants, whereas the broader term semantic networks (qv) (or nets) is preferred for the full range of networklike memory models and knowledge-representation (qv) formalisms. (The terms associative memory (qv) or associative networks are also used.) In cognitive psychology (qv), semantic memory is often distinguished from episodic memory (qv), where the former serves as a long-term store of knowledge needed for language understanding (see Natural-language understanding) and the latter serves as a long-term store of information about specific episodes and events, especially personal experiences (see Ref. 3 for the original formulation of this distinction). This distinction has been less important in AI research on language understanding and question answering (qv), where both kinds of knowledge are usually represented in a formally uniform, structurally integrated fashion (however, see Ref. 4 for a sophisticated system incorporating the distinction).

Quillian's Models

Quillian's models were foreshadowed in networklike representations of sentence meanings developed in the late fifties and early sixties by researchers in mechanical translation (see Machine translation) (e.g., M. Masterman of the Cambridge Language Research Unit and S. Ceccato of Milan University). However, this work lacked many of the most important features of Quillian's approach, perhaps most crucially his emphasis on the role of a large body of associatively interconnected knowledge about language and the world in language understanding. In essence, Quillian's models consisted of nodes representing word senses or their properties and links interconnecting these nodes and providing associative pathways for processes presumed to underlie language comprehension.
In his first model the nodes and links were layered into planes (1). Each plane was headed by a type node for a word sense and contained a set of token nodes accessible from the type node through a series of links. The first such token node supplied the superclass of the word sense, whereas the remaining token nodes supplied its additional properties (where a property could be specified by any number of token nodes interconnected by links of certain fixed types). For example, the plane for PLANT1 (a living plant) specified A STRUCTURE as its superclass and included LIVE, not ANIMAL, WITH3 LEAF, and GETS FOOD FROM3 AIR OR WATER OR EARTH as additional properties. All token nodes in a plane were linked by interplane links to the type nodes they referenced (such as LIVE, ANIMAL, WITH3, etc.).

As a first step toward simulating the human language-understanding process, Quillian constructed programs for comparing and contrasting pairs of words, such as cry and comfort, plant and live, or plant and man. The programs produced simple verbalizations of the relationships they discovered, expressed in terms of disambiguated senses of the given words. They relied on intersection searches (see Search, bidirectional) to accomplish their task: beginning at the word sense nodes of the given words, they propagated "tags" outward along links in pseudoparallel fashion, keeping track of the paths traversed by the tags. Nodes at which the "spheres of spreading activation" intersected were noted, and the paths to them were used to generate the descriptions of similarities and contrasts. For example, comparison of plant and man led to intersections at the ANIMAL and PERSON nodes. The paths to ANIMAL allowed generation of the contrasting sentences "PLANT1 IS NOT A ANIMAL STRUCTURE" and "MAN1 IS ANIMAL," and the paths to PERSON allowed generation of the comparison sentences "TO PLANT3 IS FOR A PERSON TO PUT SOMETHING INTO EARTH" and "MAN3 IS PERSON."

Quillian suggested that intersection searches underlie the elimination of lexical ambiguity in language understanding. He also noted the importance of memory pathways for inference (qv). In particular, a path connecting successively higher level concepts can mediate property inheritance; i.e., lower level concepts lying on such a path inherit the properties of the higher level concepts (see Inheritance hierarchy).

In a follow-up project, called the Teachable Language Comprehender (TLC), Quillian slightly revised his memory model, replacing type nodes by "units" and token nodes by "properties" (2). Units provided explicit slots for a superset pointer and pointers to refining properties. Properties in turn provided slots for a pointer to an attribute (some predicative concept), a pointer to an attribute value (some particular concept), and possibly further "subproperties" augmenting the attribute-value specification. The goal of the TLC project was to expand the knowledge stored in semantic memory by having the system read text. Its success was limited by its nearly exclusive reliance on intersection searches, with little syntactic analysis or inference, and no pragmatic analysis.
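The tag-propagation idea can be sketched as a breadth-first search spreading from both word nodes at once. The following is an illustrative reconstruction, not Quillian's code: the toy network, node names, and function are invented for this entry, and the recording of paths (which Quillian's programs used to verbalize the relationship) is omitted for brevity.

```python
from collections import deque

# Toy associative net; each link is treated as a bidirectional pathway.
links = {
    "plant": ["structure", "live", "leaf"],
    "live": ["animal"],
    "man": ["person", "animal"],
    "person": ["animal"],
    "animal": ["structure"],
    "structure": ["object"],
}

def neighbors(node):
    out = set(links.get(node, []))
    out |= {m for m, ns in links.items() if node in ns}   # follow links both ways
    return out

def intersection_search(a, b):
    """Spread 'tags' breadth-first from both start nodes; return the first
    node reached from both origins (where the two spheres intersect)."""
    tags = {a: {a}, b: {b}}              # which origins have tagged each node
    frontier = deque([a, b])
    while frontier:
        node = frontier.popleft()
        for m in neighbors(node):
            seen = tags.setdefault(m, set())
            new = tags[node] - seen
            if not new:
                continue
            seen |= new
            if len(seen) == 2:           # tagged from both origins
                return m
            frontier.append(m)
    return None

print(intersection_search("plant", "man"))   # 'animal'
```

In this miniature net the comparison of plant and man intersects at animal, echoing the PLANT1/MAN1 example above; a fuller sketch would also record the paths so that contrasting sentences could be generated from them.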
Direct Extensions

In a series of papers during 1969-1972, Collins and Quillian clarified the notion of a property inheritance hierarchy, the semantic distance (number of intervening links) between two concepts, and the relevance of these notions to the theory of human memory, as indicated by reaction-time studies (5,6). Collins and Loftus refined Quillian's ideas about intersection searches (spreading activation) and evaluated their psychological import in detail (7). Carbonell implemented a CAI program called SCHOLAR (specializing in the geography of South America) around a TLC-like semantic memory (8). The memory used a refined measure of semantic distance, with "irrelevancy tags" serving to increase this distance where appropriate. Another notable extension was McCalla and Sampson's MUSE (9), in which a nontrivial syntactic component was used to improve both the range of input sentences the system could convert into TLC memory format and the quality of output sentences.

Further Developments: Semantic Networks

Quillian-like memory models and knowledge-representation (qv) formalisms evolved along several dimensions from 1970 onward. First, their expressive power was augmented to permit representation of episodic information, information about the knowledge, beliefs, etc., of other agents (see Belief systems), and logically compounded and quantified information. Second, ideas about the structure of knowledge at levels "above" the logical level were incorporated into them, including nested subnets or partitions, framelike or schemalike structures, and taxonomies of parts and topics (in addition to the original taxonomies of concepts). Third, they were augmented with knowledge in the form of procedures directly associated with stored concepts. And finally, possible ways of implementing semantic-memory models as active parallel networks were studied, with a view toward building practical intelligent systems or advancing the theory of human memory. The first three lines of development are covered in several collections of articles (10-12) and surveys (13-15), and the last in a collection by Hinton and Anderson (16).
BIBLIOGRAPHY

1. M. R. Quillian, Semantic Memory, Report AD-641671, Clearinghouse for Federal Scientific and Technical Information, 1966. Abridged version in M. Minsky (ed.), Semantic Information Processing, MIT Press, Cambridge, MA, 1968, Chapter 4.
2. M. R. Quillian, "The Teachable Language Comprehender," CACM 12, 459-475 (1969).
3. E. Tulving, in E. Tulving and W. Donaldson (eds.), Organization of Memory, Academic Press, New York, 1972.
4. W. G. Lehnert, M. G. Dyer, P. N. Johnson, C. Y. Yang, and S. Harley, "BORIS: an experiment in in-depth understanding of narratives," Artificial Intelligence 20, 15-62 (1983).
5. A. M. Collins and M. R. Quillian, Experiments on Semantic Memory and Language Comprehension, in L. W. Gregg (ed.), Cognition in Learning and Memory, Wiley, New York, pp. 117-137, 1972.
6. A. M. Collins and M. R. Quillian, How to Make a Language User, in E. Tulving and W. Donaldson (eds.), Organization of Memory, Academic Press, New York, pp. 309-351, 1972.
7. A. M. Collins and E. F. Loftus, "A spreading activation theory of semantic processing," Psychol. Rev. 82, 407-428 (1975).
8. J. R. Carbonell, "AI in CAI: An artificial intelligence approach to computer-aided instruction," IEEE Trans. Man-Mach. Sys. MMS-11, 190-202 (1970).
9. G. I. McCalla and J. R. Sampson, "MUSE: A model to understand simple English," CACM 15, 29-40 (1972).
10. E. Tulving and W. Donaldson (eds.), Organization of Memory, Academic Press, New York, 1972.
11. D. G. Bobrow and A. Collins, Representation and Understanding, Academic Press, New York, 1975.
12. N. V. Findler (ed.), Associative Networks, Academic Press, New York, 1979.
13. A. Barr and E. A. Feigenbaum, Handbook of A.I., Vol. 3, William Kaufmann, Los Altos, CA, pp. 36-64, 1982.
14. R. J. Brachman, On the Epistemological Status of Semantic Networks, in N. V. Findler (ed.), Associative Networks, Academic Press, New York, pp. 3-50, 1979.
15. G. D. Ritchie and F. K. Hanna, "Semantic networks: a general definition and a survey," Inf. Technol. Res. Dev. 2, 1983.
16. G. E. Hinton and J. A. Anderson (eds.), Parallel Models of Associative Memory, Lawrence Erlbaum, Hillsdale, NJ, 1981.

L. K. Schubert
University of Alberta

MENU-BASED NATURAL LANGUAGE

Menu-based natural-language understanding (NLMenu) is an approach to natural-language interfaces (qv) that combines the expressive power of natural language with the ease of use of menus (1-3). The NLMenu approach is unique in that 100% of the queries entered through NLMenu will be understood, it provides a convenient way of combining textual and graphical input, and its simplicity allows interfaces to many applications to be automatically generated.

NLMenu grew out of research on building conventional natural-language interfaces, the kind where users are invited to type whatever questions they have and the natural-language-understanding system will do its best to decipher what the user means. However, the performance and usability of conventional natural-language systems is limited. NLMenu is an attempt to overcome these limitations. NLMenu also provides some opportunities that are not possible with conventional natural-language systems.

Problems with Conventional Natural-Language Systems

A conventional natural-language system is one in which the user is presented with a blinking cursor and the opportunity to type in whatever question he has. It is then the natural-language system's problem to understand what the user wants and respond appropriately. A number of problems with this approach are described (4). Discussion of these problems helps to clarify the benefits of NLMenu.

First, there are mechanical problems. Many users do not know how to type or do not type well. Users often have considerable difficulties with spelling, which can cause problems for language-understanding systems. Finally, users often have trouble getting started. They can find it difficult to articulate what they want to say despite having very explicit problems to solve.

Next, there are problems with understanding language. It is not uncommon to ask a question in a way that conventional systems do not understand; but if properly rephrased, these questions can be understood. This is called exceeding the linguistic coverage of the system. With lots of hard work, system developers might anticipate every possible synonym, paraphrase, metaphor, or point of view and prepare the natural-language system for them all. Thus, with enough hard work, the problem of linguistic coverage could be effectively eliminated. Notice, however, that this could be difficult: imagine providing all possible synonyms for all of the database values and keeping them current with a dynamically changing database.

A problem related to exceeding the linguistic coverage is exceeding the conceptual coverage of the system. If one were to ask "How many trucks did we ship in January?" he might be told that the system did not understand the query. He may assume that he had exceeded the linguistic coverage and rephrase, "How many January truck shipments did we have?" He might again be told to rephrase, and this could go on until he ran out of patience. The problem could be that the system does not know about truck shipments. If so, the questions have exceeded the conceptual coverage of the system.

The limits of coverage, both linguistic and conceptual, are difficult for users to infer. They tend not to learn quickly what is acceptable and what is not. Part of the problem is that natural-language systems fail in very different ways from human understanding, so the strategies for making oneself understood in person-to-person conversations do not apply to person-to-computer conversations.

The last major set of problems relates to the implementation of natural-language systems. Conventional natural-language systems tend to be quite large. Indeed, they must anticipate every likely synonym and paraphrase of questions from users. If they are to provide access to large databases, they must at least be large enough to accept the database values and synonyms for those values. Large natural-language systems require computers with large memories.
Figure 1. The NLMenu interface with a query under construction (command menu items include Rubout, Save Query, Re-start, Output Window, Show Query, Retrieve Query, Execute, Delete Query, and Suspend).

Figure 2. The active menus after "Find all features of" has been selected; only phrases that can legally continue the sentence appear.

Figure 3. Selecting specific part names from a menu of database values.

Figure 4. A completed query and the data retrieved for it.
Menu-Based Natural-Language Interfaces

NLMenu solves the problems with conventional natural-language systems outlined above. In this section NLMenu is illustrated. In the last section the solutions to the various problems of natural-language systems are discussed, as well as the unique advantages of NLMenu.

Users build questions with NLMenu by selecting words and phrases from menus. Figure 1 shows the process in progress. The user has selected "find" and "all features of" from two successive menus and is about to select "parts," the boxed word in the white-background menu. Notice that the sentence he is building appears in the window near the middle of the screen. When he selects "parts," several things will happen. First, the sentence under construction will be updated to "Find all features of parts." Second, other menus will become active (indicated with white backgrounds). Third, the contents of those newly activated menus will be restricted to only those phrases that make sense following "Find all features of parts." This last point is important because this is why all sentences entered into NLMenu will be understood by NLMenu. For example, the large menu at the right of the screen will be made active, and such phrases as "who ship" and "who supply," which do not make sense following "Find all features of," will not appear, as shown in Figure 2.

In Figure 3 the user is about to select "(name)," indicating that he will specify specific part names, which are specific database values. One advantage of the NLMenu approach is that the system always knows when the user intends to enter specific values and so can provide assistance in expressing those values. The assistance can be as simple as a menu of all relevant values (in this case all part names). Another application could present an opportunity for graphical input such as a map. The user can enter latitude and longitude values by pointing at the area of interest on the map. In this way, NLMenu provides a convenient way to combine the expressive power of natural language with the ease of expressing spatial relationships with graphical input.

Figure 4 shows the completed query "Find all features of parts with part name cam or bolt." The data satisfying this command are shown at the bottom of the screen. The user still has an opportunity to further restrict his query by selecting "and" or "or" and adding other clauses.

Performance of NLMenu

NLMenu interfaces provide the same expressive power as conventional natural-language systems, but the problems of conventional systems are largely eliminated. First, the mechanical problems: typing, spelling, and articulating questions. With an NLMenu interface there is no typing. Sentences are built through menu selection. With an NLMenu interface, the user is presented with words and phrases from which he sees what types of questions can be asked. Instead of composing a question, one can think of it as recognizing his question, an easier task, one phrase at a time. In addition, if the system provides useful but unexpected capabilities, such as a graphing option for numerical data, the existence of those capabilities is revealed through the menus. In fact, the full extent of the coverage of an NLMenu system is revealed through the menus. With a conventional system, the user must guess the coverage of the system or find it through trial and error.

The most dramatic advantage of the NLMenu interface is with language understanding itself. As noted above, all queries entered through the NLMenu interface are accepted; the user gets no opportunity to compose a question that would not be accepted. As a result, the problem of linguistic coverage disappears. Similarly, one cannot exceed the conceptual coverage of the system; that problem disappears as well. Notice that the problem of exceeding the coverage has disappeared not because of the massive work of finding all possible paraphrases but because of the elimination of the need for paraphrases.

NLMenu interfaces require less memory and processing than do conventional natural-language systems. They do not need to sift through large grammars and dictionaries to analyze sentences. There are also ways of expressing database queries in such a way that the interfaces can be generated automatically from a description of the database (5). In fact, the example interface for this entry was automatically generated.

As discussed above, NLMenu has many advantages over conventional natural-language systems. It has the same expressive power as conventional systems but solves the biggest problems that natural-language systems have. It also provides opportunities such as mixing textual and graphical input and automatically generating new interfaces from a description of an application.

One question that is frequently asked is whether NLMenu understands language. There are two answers. If conventional natural-language systems understand language, then NLMenu must also. Behind the menus it uses the same technology as they do, representing and translating questions in the same way that they do. Behind the menus one cannot tell the difference between conventional and NLMenu interfaces. Assuming that conventional systems understand language, the answer is yes. The other answer to the question is "Who cares?" This is a technology, and the appropriate forum for evaluating technology is in solving problems. If it provides a flexible, mnemonic, and powerful interface, what difference does it make if it is declared that it does or does not understand language?

BIBLIOGRAPHY
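The core mechanism of constraining menus by the current parse state can be sketched with a tiny state-driven grammar. The grammar, state names, and functions below are entirely invented for this entry (the real NLMenu system used a much richer semantic grammar and window system); the point is only that every completed sentence is parseable by construction, because the menus never offer an illegal continuation.

```python
# Each grammar state lists (phrase, next-state) pairs; a query is any path
# from START to END, so the menus can only build parseable sentences.
GRAMMAR = {
    "START": [("Find", "WHAT")],
    "WHAT":  [("all features of", "NOUN")],
    "NOUN":  [("parts", "MOD"), ("suppliers", "MOD")],
    "MOD":   [("with part name (name)", "END"), ("<done>", "END")],
    "END":   [],
}

def menu(state):
    """The phrases legal in this state; these populate the active menus."""
    return [phrase for phrase, _ in GRAMMAR[state]]

def select(state, phrase):
    """Advance the sentence under construction by one menu selection."""
    for p, nxt in GRAMMAR[state]:
        if p == phrase:
            return nxt
    raise ValueError("phrase was never offered in this state")

state, sentence = "START", []
for choice in ["Find", "all features of", "parts"]:
    assert choice in menu(state)   # the interface only ever displays legal phrases
    sentence.append(choice)
    state = select(state, choice)

print(" ".join(sentence))          # Find all features of parts
print(menu(state))                 # ['with part name (name)', '<done>']
```

After "parts" is chosen, the only phrases offered are those that can still extend to a complete query, which is exactly why linguistic coverage cannot be exceeded.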
1. H. R. Tennant et al., Menu-Based Natural Language Understanding, Proceedings of the Conference of the Association for Computational Linguistics, Cambridge, MA, pp. 151-158, 1983.
2. H. R. Tennant, K. M. Ross, and C. W. Thompson, Usable Natural Language Interfaces through Menu-Based Natural Language Understanding, Proceedings of the Conference on Human Factors in Computing Systems, Cambridge, MA, 1983.
3. C. W. Thompson, Using Menu-Based Natural Language Understanding to Avoid Problems Associated with Traditional Natural Language Interfaces to Databases, Ph.D. Dissertation, Department of Computer Science, University of Texas at Austin, 1984.
4. H. R. Tennant, Evaluation of Natural Language Processors, Ph.D. Dissertation, Department of Computer Science, University of Illinois, 1980.
5. C. W. Thompson et al., Building Usable Menu-Based Natural Language Interfaces to Databases, Proceedings of the Ninth International Conference on Very Large Databases, Florence, Italy, pp. 43-45, 1983.

H. Tennant
Texas Instruments
MERLIN
A system that implemented Newell's data-flow graphs for heuristic search (see Heuristics) encoded as schemas, MERLIN was developed around 1971 by Moore at Carnegie-Mellon University. It represented Newell's Logic Theorist program, with the generators and tests derived by hand. MERLIN could prove theorems (see Theorem proving) by executing the schema (see J. Moore and A. Newell, How Can MERLIN Understand?, in L. Gregg (ed.), Knowledge and Cognition, Erlbaum Associates, Hillsdale, NJ, pp. 253-285, 1974).

K. S. Arora
SUNY at Buffalo
META-KNOWLEDGE, META-RULES, AND META-REASONING

AI research involves building computer systems capable of reasoning (qv) and acting in a variety of environments. For example, these computer systems, or cognitive agents as they are sometimes called, should be capable of talking with other cognitive agents, advising people in complex tasks, and interacting with the world by perceiving situations and carrying out actions. The nature of knowledge is crucial for this research. When building these systems, one must think in terms of what they have to know to perform these tasks. Similarly, one must analyze the performance of cognitive agents in terms of their knowledge. Thus, a system knows about the objects in its domain of application, about how to perform a certain activity, or about the events that take place during that activity.

Research on knowledge representation (qv) in AI concerns the search for models of knowledge that will enable systems to behave intelligently. A particular representation for knowledge is a combination of data structures and procedures that, if represented and used adequately in a program, will lead to intelligent behavior. The knowledge contained in an intelligent system is, for the most part, embodied in these data structures, generally called the knowledge base, and represents the propositions that the system knows or believes. Some of the propositions are represented explicitly, whereas others can be derived from those by applying inference rules. The process of deriving new propositions is done by the inference system (see Inference), either when new information is added to the knowledge base (forward inference) or when a query is posed to the knowledge base (backward inference). Forward inference enables the cognitive agent to make new deductions with information perceived from the world, and backward inference enables the cognitive agent to find out answers to its problems (see Processing, bottom-up and top-down).
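The forward/backward distinction can be illustrated with a toy rule set. The propositions, rule format, and function names below are invented for this sketch (real inference systems use far richer representations and control strategies): forward inference adds every derivable proposition when new information arrives, whereas backward inference reduces a query to subgoals until known facts are reached.

```python
# Rules: each maps a set of premise propositions to one conclusion.
RULES = [
    ({"bird(tweety)"}, "can_fly(tweety)"),
    ({"can_fly(tweety)"}, "can_travel(tweety)"),
]

def forward(facts):
    """Forward inference: run the rules to a fixpoint, adding every
    derivable proposition to the knowledge base."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def backward(goal, facts):
    """Backward inference: work from a query back toward known facts,
    without adding new propositions to the base."""
    if goal in facts:
        return True
    return any(all(backward(p, facts) for p in premises)
               for premises, conclusion in RULES if conclusion == goal)

kb = {"bird(tweety)"}
print(forward(kb))                          # the fact plus both conclusions
print(backward("can_travel(tweety)", kb))   # True
```

The same rules serve both directions; what differs is the control regime, which is precisely the kind of choice that meta-rules and meta-reasoning, discussed below, are meant to govern.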
Although the inference system allows the cognitive agent to perform reasoning, it does not allow it to act in the world. To do this, the cognitive agent needs an acting system that executes actions. Since most actions are not trivial, they must be planned first; thus, the cognitive agent needs, in addition, a planning system that derives appropriate plans to be given to the acting system. The issues involved here are the subject of research in planning (qv), another field of AI. The problem in trying to formulate a plan of action to achieve some goal comes from the multiple interactions that can exist between the subactions that constitute the plan and from the fact that the agent may not have enough information to formulate the plan. It is often necessary to reason about what knowledge is needed to carry out a plan and how that knowledge can be obtained.

A typical action that these cognitive agents may need to perform is to conduct a dialogue with other cognitive agents. This action, besides the problems common to other actions, has problems of its own. They are the subject of research in natural-language understanding (qv), another field of AI. One of these problems is that a cognitive agent engaging in a dialogue has to take into account the knowledge possessed by the other cognitive agent. This usually requires having a model of that agent's knowledge and reasoning about what that agent knows.

Thus, the main question to be answered by these fields of AI is: What kinds of data structures and procedures must the agent know about, and how should they be used by the agent in order to make it behave intelligently? Research in these fields has quickly led to the conclusion that, among other things, a cognitive agent must know about objects, states, and actions. In addition, it is now strongly believed that knowledge about the extent and organization of its own and others' beliefs, about how to use its own reasoning rules, about how to perform an action, and about its own and others' performance are important aspects of intelligent behavior. Several researchers have suggested the use of meta-knowledge, meta-rules, and meta-reasoning to accomplish the integration of all these features in a single cognitive agent.

In a general sense, meta-knowledge is knowledge about knowledge as opposed to knowledge about "things in the world" (1). It enables a reasoning system to "know what it knows" and to make multiple use of its knowledge (2). In addition to using its knowledge directly, the system may have other abilities: knowing what it knows and what it does not know (consciousness) (1-10); knowing where and how to use knowledge to infer other knowledge (planning reasoning or meta-reasoning) (5,6,9-14); knowing where and how to use knowledge to perform actions (planning acting) (9,10,15-19); explaining how and why it used its knowledge (explanation) (5,6,9,10,11); and examining its own knowledge, modifying it, abstracting and generalizing it, and acquiring new knowledge (learning) (2,4-6).
Foundations in Human Cognition

The idea of incorporating meta-knowledge in knowledge-based systems has its foundations in human cognition. Although simulation of human cognition by a machine is not needed or even desired, AI researchers continue to search for answers in human cognition. This is reasonable for two reasons. If one considers that the goal of AI is to better understand human cognition, one must test the theories developed in psychology through the use of computer models. If, on the other hand, one considers that the goal of AI is to develop machines to help humans in activities requiring intelligence, those machines must reason and act like humans so that they can interact smoothly.

Bahr (1) describes some studies of human behavior that demonstrate people's ability to reason about what they know and about how they reason, suggesting that meta-level knowledge and reasoning are an integral part of common cognitive activity in human experience. In his words:
META-KNOWLEDGE, META-RULES AND META-REASONING    599

the concept of meta-level knowledge captures intrinsic, commonplace properties of human cognition that are central to an understanding of knowledge and intelligence.
Several cognitive phenomena illustrate the importance of meta-knowledge and meta-reasoning in human experience. For example, the tip-of-the-tongue phenomenon suggests that one has knowledge about one's knowledge. This phenomenon happens when one knows that one knows some fact even though one cannot recall it. Another phenomenon that is common in human cognition is the knowing-not phenomenon studied by Kolers and Palef (7). It is illustrated when one knows rapidly and reliably that one does not know something. Kolers and Palef collected data that suggest people know what they do not know without having to search their positive knowledge and that negative knowledge is accessed as directly as positive knowledge and sometimes even more rapidly. This phenomenon is not easily captured by common searching models of memory, where negative judgments are made only as a result of a search of positive instances that ends in failure. The fact that some negative knowledge can be accessed more rapidly than some positive knowledge suggests that not even parallel processing can accommodate this fact.

Another interesting phenomenon is what one can conclude from the fact that one does not know something. Bahr (1) views this phenomenon as directly related to meta-knowledge and the knowing-not phenomenon since such reasoning presumes some awareness of not knowing some fact. In the lack-of-knowledge inference the fact that one would know some fact if it were true, but one does not remember it, makes one believe that it is not true. For example, if one were asked whether the President had died one month ago, in normal circumstances it would not be a reasonable answer to say "I don't know." Although one could not find either a positive or a negative answer, the death of the President is such an important fact that, if it had occurred, one would have known about it. Therefore, since one does not know about it, one can conclude that he did not die. Collins (3) studied this phenomenon and pointed out some conditions that increase certainty in these beliefs: the relative importance of the fact and one's own expertise in the topic area. That is, the more important the fact is and the greater one's own expertise in the topic area, the more certain one is that something not remembered is not true.

These phenomena constitute strong evidence that people have an intuitive knowledge of the extent and importance of their own knowledge. The concept of meta-knowledge captures this property of human cognition, and it seems that meta-knowledge could improve the behavior of AI reasoning systems.

Motivations

There were also several problems faced by AI knowledge-based systems, namely expert systems (qv), that motivated the use of meta-knowledge and meta-reasoning in those systems. One problem was how to do acquisition and maintenance of knowledge. Other problems concerned the reasoning process: how to control or plan the reasoning process in those systems, and how they could explain their reasoning behavior in an intelligible manner. These problems with the reasoning process apply to any kind of activity, not just to reasoning. Finally, a more general motivation was to give these systems the capability of reasoning about knowledge that was not available previously. Each of these motivations is discussed below in greater detail.

Acquisition and Maintenance of Knowledge. The development of expert systems, i.e., programs that are skillful in a specific domain of application, emphasized the importance of large stores of domain-specific knowledge as a basis for high performance (20). Assembling and modifying the required knowledge base is a complex process that involves great expertise and careful maintenance. This is usually an ongoing task that often extends over several years and, due to the high dependency of related facts and rules, is often error prone. A key element of this process is the transfer of expertise from a human expert to the program. Due to the expert's lack of knowledge about programming, this usually requires the mediation of a human programmer, called the knowledge engineer. However, this transfer of knowledge through the knowledge engineer has some problems. First, the knowledge engineer is not an expert in the specific domain of application. Second, since most of the expert knowledge is heuristic and experimental, the expert is not capable of conveying it directly to the knowledge engineer. The process usually extends over many sessions in which the knowledge engineer struggles to extract the knowledge from the expert. This suggests that the expert should be able to interact directly with the program. Of course, the program then has to supply the same kind of assistance the knowledge engineer would provide, if possible in a more efficient and flexible way. Davis and Buchanan (2,4,5) suggested the use of meta-knowledge to enable the system to provide this kind of assistance.

Management of knowledge presents a real problem since the internal organization of the data structures and their interrelationships with other data structures are very complex. It is difficult for the expert to keep all these in memory, especially when they are constantly changing, as occurs during the initial phases of development where the refinement of successive prototypes takes place. A second problem is that documentation is usually not well organized and updated, and consequently, changing the system is not a trivial task. Another problem is that since the expert does not know about all knowledge stored in the knowledge base, it is not easy for him to discover what knowledge should be added to the system to increase its performance. As the size of the domain of specific knowledge increases, maintenance becomes a more and more complex task. Systems that allow the explicit declaration of meta-level data structures in their representation schemes, i.e., formalisms that allow the encoding of data structures that "describe" other data structures, will possibly be a solution for this problem (2,4-6). The system can then assist and advise the user in modifying its knowledge and can even provide expectations concerning what knowledge should be acquired next.
Planning the Reasoning Process. A second motivation for using meta-knowledge in knowledge-based systems is to control or plan the process of reasoning (5,6,8,11-14). At each cycle of the reasoning process, the system must reason about how to reason, i.e., must do meta-reasoning. At a certain point, adding more object-level knowledge to the system will no longer improve performance. What is needed is some knowledge about how to use the object-level knowledge selectively. In fact, part
of the definition of intelligence includes appropriate usage of information, not just brute force; so even if the amount of object-level knowledge is small, it is important to use it wisely (5). Also, a main weakness of reasoning systems comes from the fact that they use a severely limited and predetermined subset of reasoning strategies. Sacerdoti (21) suggests that a significant number of strategies should be integrated into a single system. Generally, current AI paradigms have only one strategy, and even that one is embedded in the inference processor. This implicit inclusion makes the systems inflexible and hard to modify and expand. Therefore, it would be convenient to represent these strategies explicitly by meta-rules, i.e., by rules that indicate how to use other rules. One could then change the strategy of the system very easily just by changing the rules in its knowledge base. One could also write rules describing different strategies and have meta-rules of even higher order to decide which strategy to choose in each particular situation (6,11-13).

Explaining the Reasoning Process. An essential aspect of the interaction between cognitive agents is the explanation of their reasoning. Explanation (qv) and meta-knowledge are generally associated since both constitute a trend toward declaratively representing knowledge that previously was encoded procedurally. Moreover, if meta-rules encode strategies to plan the act of reasoning, an explanation facility gives an account of the planning decisions during reasoning. However, much more research has to be done to allow the user to model his own explanation facility in the same sense that he can model his own strategies, i.e., to use meta-knowledge to explain reasoning. According to Davis (6), the fundamental goal of an explanation facility is to enable a program to display a comprehensible account of the motivation for all of its actions.
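One common form of such a facility keeps a goal tree built during reasoning and answers "why" queries by ascending toward the parent goal and "how" queries by descending toward the subgoals. A minimal sketch follows; all structure, goal names, and rule names are invented for illustration and are not TEIRESIAS code:

```python
# A toy goal tree recorded during reasoning. "why" ascends toward the
# parent goal; "how" descends toward the subgoals. (Names are invented.)
tree = {"goal": "investment-advice", "rule": "rule-27",
        "parent": None, "subgoals": []}
child = {"goal": "tax-status", "rule": "ask-user",
         "parent": tree, "subgoals": []}
tree["subgoals"].append(child)

def why(node):
    # Why was this subgoal pursued? Answer in terms of the parent goal.
    p = node["parent"]
    return "Top-level goal" if p is None else (
        f"Because {p['rule']} needs it to establish {p['goal']}")

def how(node):
    # How was this goal established? Answer in terms of its subgoals.
    return f"{node['rule']}, via subgoals: " + \
        ", ".join(s["goal"] for s in node["subgoals"])

print(why(child))  # Because rule-27 needs it to establish investment-advice
print(how(tree))   # rule-27, via subgoals: tax-status
```

Issuing "why" repeatedly climbs to the root and "how" repeatedly descends to the leaves, which is the "entire traversal of the tree" that such facilities provide.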
It is not easy, even for an experienced programmer, to find out how a complex process of reasoning got to where it is. Trying to account for past behavior is even more difficult when dealing with an audience assumed to know nothing about programming. Comprehensibility, then, has to be defined in terms of the application domain rather than in the language of computation. Current explanation facilities are one of the main reasons for the success of expert systems. They use a goal tree built during reasoning as a basis for explanation. Since the goal tree models the control structure of reasoning, it provides a single and easy model for the system's reasoning behavior. Explanation is then viewed in terms of traversal of the goal tree and is generally activated by two commands, "why" and "how," that allow ascent or descent traversal of the tree, respectively. These commands can in general be issued consecutively to allow the entire traversal of the tree. In some systems, like TEIRESIAS (6), the command "why" has an integer argument that allows the explanation of several levels of the tree to take place in a single step, and the command "how" has an argument that can refer to the number of the rule clause to be explained. TEIRESIAS also has the capability of directly examining the rules in the knowledge base to determine which clauses have already been established and which have not yet been tested. In this case the explanation facility interprets the same piece of knowledge that the inference facility is about to use. The explanations are thus expressed in terms of the contents of the rule.

Morgado (9) suggested that the goal tree, or an equivalent
data structure representing the ongoing reasoning process, should be represented in the knowledge base itself, so that this knowledge could be reasoned about as any other kind of knowledge. A system must be able to explain the course of action taken during reasoning in terms of the knowledge that was used during that reasoning and taking into account the previous interaction with the user. In order to give explanations, a system must understand what it knows and what it is doing. So, knowledge about the specific domain of application and knowledge about the ongoing reasoning activity should be encoded uniformly to allow the system to reason about them equally (9). This allows the system to use rules to reason about its own reasoning behavior and therefore to explain it. Reasoning about a previous or ongoing activity is also a precondition to dealing with dialogues in natural-language understanding. One must make use of what has gone on to help interpret what is coming.

Planning and Explaining Activities. What was said about reasoning can be applied to any activity in general. The interaction between knowledge, planning, and action has been the subject of much research (9,10,15-18,22). A cognitive agent must integrate a belief model with an acting model to form a single model (9,10). It must have a uniform representation for beliefs and actions to reason effectively about the interaction between knowledge and action. In particular, the system should be able to reason about what knowledge it must have to perform an action, what knowledge it may acquire by performing an action, and what knowledge it needs to plan an action (15-18,22). These are all aspects of meta-knowledge. In other words, the system must have knowledge about its own knowledge and about acting.

Pushing the Declarative Approach to Represent Knowledge. Finally, the contribution of meta-knowledge to reasoning and acting can be looked at as the ultimate move toward representing most knowledge declaratively (9).
This gives the system the capability of reasoning about knowledge that it could not reason about previously.

What Are Meta-knowledge, Meta-rules, and Meta-reasoning?

Now that the background and the motivations have been presented, the main concepts discussed are defined, as well as how they relate to each other (9,10).

Knowledge and Meta-knowledge. Meta-knowledge, like object knowledge, is composed of assertions (meta-assertions) and rules (meta-rules). Meta-assertions are beliefs about beliefs, and since a rule that is believed to hold is a belief, meta-assertions include beliefs about rules. For example, the belief that John loves Jane is an assertion, whereas the belief that Henry believes that John loves Jane is a meta-assertion representing a belief about a belief. Similarly, the belief that Henry believes that all men are mortal is a meta-assertion representing a belief about the rule that all men are mortal. Other meta-assertions that can be represented in a system are the beliefs that John loves Jane is a belief about John; that all men are mortal is a rule about men; that Bill doesn't know whether John loves Jane; and that I (the system) don't know about the fishing industry in Venezuela.

Rules tell how to derive beliefs from other beliefs. Since a rule that is believed to hold is a belief, one may have rules
about rules as well. These rules are called meta-rules. There are two types of meta-rules: deduction meta-rules and planning meta-rules. Deduction meta-rules are rules that use rules to derive beliefs or that derive rules from beliefs. For example, the rule A → (B → C) is a meta-rule that enables the system to derive the rule B → C if the belief A holds. Similarly, the rule (A → B) → C is a meta-rule that enables the system to derive the belief C in case the rule A → B holds. These meta-rules are represented by a proposition that has the proposition representing a rule (B → C or A → B) appearing in the consequent or antecedent position of the meta-rule, respectively. The second type of meta-rules, planning meta-rules, are rules that encode reasoning strategies. The distinction between deduction rules and planning rules, i.e., between reasoning and meta-reasoning, is discussed below.

Reasoning and Meta-reasoning. Believing is a state of knowledge representing the propositions that the system assumes to be true. Reasoning is the process of inference to form beliefs from other beliefs using deduction rules. Davis proposed the use of meta-rules as a means of encoding strategies for reasoning (5,6,11). Meta-rules specify which rules should be considered and in which order they should be invoked. For example, the two rules from Ref. 11 appearing in Figure 1 are of this type. Planning meta-rules have to be used differently from all the other rules (deduction object rules and deduction meta-rules) since they do not express how to derive beliefs but how to plan the reasoning process. They are inference rules that specify how the deduction rules should be used. Davis proposed a layered control structure to handle reasoning in the TEIRESIAS system (5,6,11). The basic execution cycle in TEIRESIAS consists of selecting the inference strategy to use (backward inference, forward inference, etc.) and applying it to invoke all rules that are relevant to the goal.
But before invoking the rules at one level, the system checks for rules at the next higher level that specify which rules should be selected and in what order they should be used. Morgado and Shapiro (9,10) see this process as a particular case of a more general acting-planning process such as the one proposed by Sacerdoti (23). Acting is the process of executing a plan. Any complex action has to be planned before being performed. Planning is the process of composing a sequence of actions to be executed to achieve a predetermined goal from a given situation; it is reasoning about how to act to achieve
Meta-rule 1:
If
1. you are attempting to determine the best stock to invest in,
2. the client's tax status is nonprofit,
3. there are rules that mention in their premise the income-tax bracket of the client,
then it is very likely (.9) that each of these rules is not going to be useful.

Meta-rule 2:
If
1. the age of the client is greater than 60,
2. there are rules that mention in their premise blue-chip risk,
3. there are rules that mention in their premise speculative risk,
then it is very likely (.8) that the former should be used before the latter.

Figure 1. Selecting and ordering planning meta-rules.
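Ignoring the certainty factors, the net effect of planning meta-rules like those in Figure 1 is to prune and reorder a list of object-level rules before they are tried. A minimal sketch of that effect (the rule names and client attributes below are invented stand-ins, not TEIRESIAS code):

```python
# Object-level rules, each tagged with the premises it mentions (invented).
rules = [
    {"name": "r1", "mentions": {"income-tax bracket"}},
    {"name": "r2", "mentions": {"blue-chip risk"}},
    {"name": "r3", "mentions": {"speculative risk"}},
]

def apply_meta_rules(rules, client):
    # Meta-rule 1: for a nonprofit client, income-tax rules are not
    # useful -> prune them from the candidate list.
    if client["tax_status"] == "nonprofit":
        rules = [r for r in rules if "income-tax bracket" not in r["mentions"]]
    # Meta-rule 2: for a client over 60, blue-chip rules come before
    # speculative ones -> reorder (stable sort: False sorts before True).
    if client["age"] > 60:
        rules = sorted(rules, key=lambda r: "speculative risk" in r["mentions"])
    return rules

ordered = apply_meta_rules(rules, {"tax_status": "nonprofit", "age": 65})
print([r["name"] for r in ordered])  # ['r2', 'r3']: r1 pruned, r2 before r3
```

A real system would also weigh the certainty factors (.9, .8) instead of pruning and reordering outright.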
that goal. The basic planning cycle in NOAH (qv) consists of looking for a plan to achieve the goal and of an iterative process in which new refinements of the plan are continuously expanded and criticized until a final plan is derived. The expansion phase produces a new, more detailed plan. The criticism of the new plan consists of any necessary reordering or elimination of redundant operations to ensure that the local expansions make global sense. After being constructed, a plan of actions may be executed.

Reasoning can be looked at as the sequence of actions performed in applying rules (plans for reasoning) to derive beliefs from other beliefs. Since reasoning is itself an action, and an action has to be planned before being performed, then before reasoning, the system must first plan the reasoning. Since planning is reasoning about acting, and in this case the acting is the act of reasoning, then this planning of the act of reasoning is reasoning about how to reason, or meta-reasoning, and Davis's meta-reasoning cycle can be seen as a special case of the general planning cycle. Morgado and Shapiro conclude, then, that if an acting-planning-reasoning system uses its acting component to carry out its reasoning, its planning component will automatically perform meta-reasoning.

Connecting Theories. In philosophy there is a substantial literature on the logic of knowledge and belief (24-26) and on the theory of reasoning and acting (27-29). These topics (15,22,30), as well as the topics of meta-knowledge and meta-reasoning (1,6,9,10,12-14,31,32), and the interaction between knowledge and acting (15-18) have also received considerable attention in AI recently.
Morgado and Shapiro present a thesis (10) that provides an insight into the relations among these issues in AI knowledge-based systems:

In a knowledge-representation (KR) system in which assertions and rules are represented in the same way as any other concepts, no special mechanism is needed to represent meta-knowledge, where this is understood to include beliefs about beliefs, rules about beliefs, beliefs about rules, and rules about rules. In a knowledge-representation system which has an acting-planning component and which can represent actions and plans, no other mechanism is needed to handle meta-reasoning, where this is understood to include rules about the order of using rules, and reasoning about the process of reasoning. The difference between meta-knowledge and meta-reasoning as formulated above is that the former deals primarily with beliefs while the latter deals with acting. We therefore conclude that, besides the conceptual distinction between the object level and the meta-level, a valuable distinction to focus on when building KR systems which can have meta-knowledge and can do meta-reasoning is that between believing and acting.
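As a toy illustration of this thesis: if propositions are ordinary terms, a belief about a belief or about a rule is just another term, and no special mechanism is required. The constructors and the helper below are invented for the illustration and are not the representation of any of the cited systems:

```python
from collections import namedtuple

# Propositions are ordinary terms, so a proposition can contain another
# proposition: meta-assertions need no special machinery.
Prop = namedtuple("Prop", ["pred", "args"])

loves = Prop("Loves", ("John", "Jane"))        # object-level belief
meta = Prop("Believes", ("Henry", loves))      # belief about a belief
rule = Prop("Implies", (Prop("Man", ("x",)), Prop("Mortal", ("x",))))
meta_rule = Prop("Believes", ("Henry", rule))  # belief about a rule

def mentions(p, name):
    # The same access machinery works at every level of nesting.
    return name in p.args or any(
        isinstance(a, Prop) and mentions(a, name) for a in p.args)

print(mentions(meta, "Jane"))  # True: nesting is transparent
```

Because the nested proposition is a first-class term, rules can match on it exactly as they match on object-level terms, which is the point of the thesis.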
Systems with Meta-knowledge

Two systems that have meta-level components, TEIRESIAS and MOLGEN, are briefly described.

TEIRESIAS. Davis and Buchanan (2,4-6,11) explore in TEIRESIAS the concept of meta-level knowledge in several different forms, each of them supporting one or more of the tasks of acquisition, accumulation, and maintenance of knowledge. Schemata and rule models were built to support acquisition and accumulation of knowledge via interactive transfer of expertise from the human expert to the knowledge base. Function templates and schemata support maintenance of knowledge by giving the system a "picture" of its own knowledge and the way that knowledge is organized. Schemata encode knowledge about the representation of objects and about their relationships. Knowledge about inference rules is encoded in the rule models. A rule model is an abstract description of a subset of rules, built from empirical generalizations about those rules, and it is used to characterize a "typical" member of the subset. Finally, function templates are list structures indicating the order and the type of the arguments in a typical call of a function. They are used for code dissection and generation.

According to Davis and Buchanan (5,6,11), meta-rules embody strategies: knowledge that indicates how to use other knowledge. They show how meta-rules can be used to encode strategies and to define control regimes. They see strategies from the perspective of deciding which knowledge (rule) to invoke next when more than one rule may be applicable. Meta-rules in TEIRESIAS draw conclusions about object-level rules in two ways: they can make deductions about the likely utility of certain object rules or they can indicate a partial ordering between two subsets of object-level rules. Davis and Buchanan stress that meta-rules should make conclusions about the utility of object-level rules, not their validity. They claim that it is because of this fact that it makes sense to distribute knowledge in object-level and meta-level rules; otherwise, it would only be necessary to add another premise clause to each of the relevant object-level rules.

Adding meta-rules to the TEIRESIAS system requires only a minor addition to the control structure. The system retrieves the entire list of rules relevant to the current goal. But before trying to invoke those rules, the system first looks for any meta-rules relevant to that goal.
If it finds any, these are invoked first. This may draw conclusions about the likely utility and relative order of those rules. The list of object rules may be shortened and reordered by those meta-rules, and only then are they used. Viewed in tree-search terms, the implementation of meta-rules in TEIRESIAS can either prune the search space or reorder the branches of the tree. This process is generalized in TEIRESIAS; i.e., there can be an arbitrary number of levels of knowledge, each one guiding the use of the knowledge at the next lower level.

Finally, Davis defends meta-rules, since they enable one to use content-directed invocation. This technique allows the user to define his own invocation criteria, offering him a richly expressive language. Meta-rules also have strong validity, since descriptions are done via direct reference to the knowledge source content itself. In meta-rules, then, the two ideas of generalized invocation criteria and content-directed retrieval are combined. The former gives a high expressiveness to meta-rules since it allows invocation of any knowledge source that fits a given description. The latter gives meta-rules a strong degree of validity because there is a formal link between the knowledge source and its description. Besides this, content-directed invocation offers a strong degree of flexibility in a program, since acquisition and maintenance of knowledge become easier. Editing or adding an object-level rule does not require meta-rules to be edited to make sure they still apply, since meta-rules will adjust to the changes found in the edited rule. On the other hand, editing or adding a meta-rule does not cause problems either, since one does not have to look for all the object rules to which these meta-rules apply in order to mention them in the code. Indeed, as invocation is made by a
description of the code of the object rule itself, this entire operation becomes transparent to the user because again this burden of system upkeep has been transferred to the program. This idea of replacing reference by name with reference by description has its problems, as pointed out by Davis. First, it is not always clear how to generalize from a specific procedure to a general description of the capabilities desired. Second, the overhead in computer time must be considered.

MOLGEN. Stefik (14) recognizes the fact that most of the decisions a planner makes are about the reasoning process as opposed to decisions about the problem, and that this raises a variety of decisions that are usually made implicitly in planning programs with rigid control structures. It is this fact that leads him to propose a layered approach for meta-planning, that is, for planning about planning. His meta-planning model uses operations for hierarchical planning with constraints and integrates two strategies generally used independently in planning programs: the least-commitment (conservative reasoning) and the heuristic (plausible reasoning (qv)) strategies [namely in Sacerdoti's NOAH (23) and Sussman's HACKER (33) (qv), respectively]. By integrating these techniques, MOLGEN makes sense of the use of guessing, but only as a last resort, and so bugs are considered inevitable (as in HACKER), but only when one guesses. Guessing is used to compensate for the lack of knowledge to solve a problem.

The control structure in MOLGEN is composed of an interpreter and three layers, called planning spaces. Each space has operators, objects, and steps and controls the creation and scheduling of steps in the next lower layer in the hierarchy. The lowest layer in the hierarchy, the domain space, is called the laboratory space. This is the space that has knowledge about the objects and operations of the specific domain, a genetic laboratory in MOLGEN. This is not a control level at all; it plays merely an execute role.
The next layer in the hierarchy is called the design space. It is the space charged with designing the plans; i.e., it is this layer that creates and schedules steps in the laboratory space. This is the first control layer in MOLGEN. This space defines a set of operators for designing plans abstractly and for propagating constraints among the refined subproblems in the laboratory plan. The top layer of the hierarchy is called the strategy space. The organizational idea behind the strategy space is the distinction between least-commitment and heuristic modes of reasoning. It relies on cooperation between subproblems via constraint propagation to stay in the least-commitment cycle as long as it can (conservative reasoning), resorting to guessing (plausible reasoning) only as a last choice. This is the space that has knowledge about strategy. Although the design operators plan by creating and scheduling laboratory steps, the strategy operators meta-plan by creating and scheduling design steps. Communication between spaces is done by using control messages that invoke procedures at the next lower level without knowing their names. This guarantees that the communication is uniform, but these procedures redundantly represent the knowledge about operators. Another problem is that scheduling is based on numeric priorities rather than on content-directed invocation.

In summary, MOLGEN uses layers as a way of creating abstraction. Although meta-knowledge is used to combine the least-commitment and the heuristic strategies, meta-knowledge is embedded in the interpreter in the form of two cycles that invoke the strategy operators. Therefore,
MOLGEN has in the strategy space the tools to create several different control regimes, but the way they are combined to specify a particular strategy is controlled by the interpreter. To obtain different strategies, the interpreter would have to be modified.
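The layered organization described above, in which each space creates and schedules steps for the next lower space, can be caricatured as follows. This is a structural sketch only: the operator names are invented, and MOLGEN's constraint propagation, certainty handling, and message protocol are all omitted:

```python
# Three planning spaces; each layer only schedules steps for the one below.
def strategy_space():
    # Meta-planning: decides which design steps to take (least-commitment
    # first, guessing last); here just a fixed schedule of design steps.
    return ["design:propose", "design:refine"]

def design_space(step):
    # Planning: each design step expands into concrete laboratory steps.
    expansions = {
        "design:propose": ["lab:sort", "lab:merge"],  # invented operators
        "design:refine": ["lab:screen"],
    }
    return expansions[step]

def laboratory_space(step):
    # Execution only: the domain space plays no control role.
    return f"executed {step}"

plan = [lab for d in strategy_space() for lab in design_space(d)]
log = [laboratory_space(s) for s in plan]
print(plan)  # ['lab:sort', 'lab:merge', 'lab:screen']
```

The point of the caricature is the one-way flow of control: strategy operators schedule design steps, design operators schedule laboratory steps, and the laboratory merely executes.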
Conclusions Meta-knowledge is knowledge about other knowledge as opposed to knowledge about things in the world. Meta-reasoning is planning the act of reasoning. Meta-rules are rules that "talk" about other rules. They can be deduction meta-rules or planning meta-rules. The planning meta-rules are rules to do meta-reasoning. Recent work suggests that besides the conceptual distinction between the object level and the meta-level, a valuable distinction to focus on when building knowledge-based systems that can have meta-knowledge and can do meta-reasoning is that between believing and acting. The theories of knowledge and belief and of knowledge and action may shed some light on the issues of meta-knowledge, meta-rules, and meta-reasoning.
BIBLIOGRAPHY

1. A. Bahr, Meta-knowledge and Cognition, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 31-33, 1979.
2. R. Davis, Knowledge Acquisition in Rule-Based Systems: Knowledge about Representations as a Basis for System Construction and Maintenance, in Pattern-Directed Inference Systems, Academic Press, New York, 1978.
3. A. Collins, Fragments of a Theory of Human Plausible Reasoning, TINLAP-2, pp. 194-201, 1979.
4. R. Davis, Interactive Transfer of Expertise, in Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, August 1977.
5. R. Davis and B. G. Buchanan, Meta-level Knowledge: Overview and Applications, in Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, 1977.
6. R. Davis and D. Lenat, Knowledge-Based Systems in Artificial Intelligence, McGraw-Hill, New York, pp. 227-490, 1982.
7. P. A. Kolers and S. R. Palef, "Knowing not," Mem. Cog. 4(5), 553-558 (1976).
8. S. C. Shapiro, On Representing About, Extended Abstract, Computer Science Department, SUNY at Buffalo, 1980.
9. E. Morgado, Believing and Acting: An Approach to Meta-Knowledge and Meta-Reasoning, Ph.D. proposal, Department of Computer Science, SUNY at Buffalo, 1980.
10. E. J. Morgado and S. C. Shapiro, Believing and Acting: A Study of Meta-Knowledge and Meta-Reasoning, in Proceedings of EPIA-85 (Encontro Portugues de Inteligencia Artificial), Oporto, Portugal, pp. 138-154, 1985.
11. R. Davis, Generalized Procedure Calling and Content-Directed Invocation, in Proceedings of the AIPL Conference, August 1977.
12. H. Gallaire and C. Lasserre, Controlling Knowledge Deduction in a Declarative Approach, in Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, 1979.
13. M. Genesereth, An Overview of Meta-Level Architecture, in Proceedings of the Third AAAI Conference, Washington, DC, pp. 119-129, 1983.
14. M. Stefik, Planning and Meta-Planning, MOLGEN: Part 2, Computer Science Department, Stanford University, Stanford, CA, 1980.
15. D. A. Appelt, A Planner for Reasoning About Knowledge and Action, in Proceedings of the First AAAI Conference, Stanford, CA, pp. 131-133, 1980.
16. R. Moore, Reasoning About Knowledge and Action, in Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 473-477, 1977.
17. R. Moore, Reasoning About Knowledge and Action, Technical Note 191, AI Center, Computer Science and Technology Division, SRI International, Menlo Park, CA, 1980.
18. B. Smith, Knowledge Representation Semantics, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 987-990, 1977.
19. L. Morgenstern, A First Order Theory of Planning, Knowledge, and Action, Proceedings of the Theoretical Aspects of Reasoning About Knowledge, Monterey, CA, pp. 99-114, 1986.
20. E. Feigenbaum, The Art of Artificial Intelligence: I. Themes and Case Studies of Knowledge Engineering, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 1014-1029, 1977.
21. E. D. Sacerdoti, Problem Solving Tactics, in Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan, pp. 1077-1085, 1979.
22. S. Amarel, On Representations of Problems of Reasoning About Actions, in D. Michie (ed.), Machine Intelligence 3, American Elsevier, New York, pp. 131-171, 1968.
23. E. D. Sacerdoti, A Structure for Plans and Behavior, Elsevier, New York, 1977.
24. J. Hintikka, Knowledge and Belief, Cornell University Press, Ithaca, NY, 1963.
25. J. Hintikka, Semantics for Propositional Attitudes, in Ref. 26, pp. 145-167.
26. L. Linsky (ed.), Reference and Modality, Oxford University Press, London, 1971.
27. B. Aune, Reason and Action, Reidel, Dordrecht, The Netherlands, 1977.
28. M. Brand and D. Walton, Action Theory, Reidel, Dordrecht, The Netherlands, 1976.
29. H.-N. Castaneda, Thinking and Doing, Reidel, Dordrecht, The Netherlands, 1975.
30. A. Maida and S. Shapiro, "Intensional concepts in propositional semantic networks," Cog. Sci. 6, 291-330 (1982).
31. K. Bowen and R. Kowalski, Amalgamating Language and Meta-Language in Logic Programming, Technical Report, School of Computer and Information Science, Syracuse University, New York, 1981.
32. R. Filman, Meta-Language and Meta-Reasoning, Computer Research Center, Hewlett-Packard Laboratories, Palo Alto, CA.
33. G. J. Sussman, A Computer Model of Skill Acquisition, American Elsevier, New York, 1975.

E. Morgado
SUNY at Buffalo
MICRO-PLANNER

MICRO-PLANNER is a subset of the programming language PLANNER (see C. Hewitt, PLANNER: A Language for Proving Theorems in Robots, Proceedings of the First International Joint Conference on Artificial Intelligence, Washington, DC, pp. 295-301, 1969). PLANNER itself has never been implemented completely, but MICRO-PLANNER was implemented by Sussman, Winograd, and Charniak (see G. J. Sussman, T.
Winograd, and E. Charniak, MICRO-PLANNER Reference Manual, Artificial Intelligence Memo No. 208A, MIT, Cambridge, MA, December 1971). MICRO-PLANNER was intended to combine elements of a theorem prover with a normal LISP-like programming language. The mechanism used can best be described as pattern-directed procedure invocation. A theorem prover is a program that blindly searches through a database of assertions and theorems. A normal programming language, on the other hand, has a fixed, prespecified, and inflexible flow of control. MICRO-PLANNER behaves like a theorem prover that makes use of additional procedural information. In this way it becomes possible to specify a goal to be reached instead of a detailed algorithm for how to reach it. Winograd's SHRDLU (qv) program is based on MICRO-PLANNER (see T. Winograd, Understanding Natural Language, Academic Press, New York, 1972). Deficiencies of MICRO-PLANNER resulted in the development of several other languages, most prominently CONNIVER (qv).

J. Geller
SUNY at Buffalo
MILITARY APPLICATIONS

During the last few years AI technology-related activity within the military has increased dramatically. This heightened interest and expanding investment in AI by the Department of Defense (DOD) and the individual services (Army, Navy, Air Force, and Marine Corps) may be attributed to a number of factors, in particular,

1. the very real progress AI technologies have been making and demonstrating at academic centers and in commercial applications;
2. the increasing complexity of modern-day military operations, brought about in great degree by significant advances in the speed and accuracy of sensors and weapons, coupled with the rapid growth in the amount of critical information to be processed, analyzed, and assimilated under severe time constraints with limited manpower; and
3. a growing awareness and acceptance by the military of the potential of AI technologies to help solve military problems.

The possible contributions of AI to defense span the breadth of military activities. Table 1 relates 14 basic AI technologies to a number of military-problem areas. Applicability to seven generic military problem areas as well as a number of more specific task domains is indicated as either major or minor. That the matrix is quite dense is not surprising; each AI technology is applicable to a wide variety of military task areas, and each problem area could profit from a number of AI techniques. Note also that the generic problem entry "operations" is rated as a potential major application area of almost all of the AI technologies considered. The more specific military task areas enumerated in the table are not only primarily operations oriented, but many are vital components in the critical operations area of command, control, communications, and intelligence (C3I). Indeed, military commanders are identifying C3I and its increasingly complex and difficult problems as perhaps the most significant areas for both near- and far-term AI technology applications (1).

Underneath the current surge of attention to military applications of AI lies a history of almost 20 years of DOD support, through agencies such as the Office of Naval Research (ONR) and the Defense Advanced Research Projects Agency (DARPA), to basic AI research at a number of universities. As the discipline has progressed and promising technologies such as expert systems and natural-language processing have emerged, interest has grown in applying these techniques to challenging real-world military problems. In the early 1980s the Navy took the lead among the services and established the Navy Center for Applied Research in AI at the Naval Research Laboratory to address the transition of basic AI research to naval applications. More recently, the Air Force has accelerated AI research and exploratory development at the Avionics and the Flight Dynamics Laboratories at Wright-Patterson Air Force Base and designated Rome Air Development Center at Griffiss Air Force Base as part of a long-range AI effort that includes a consortium of seven New York universities and the University of Massachusetts. The Army, also, is investing in long-term AI research, exploratory development, and training of personnel, in part through liaisons with the University of Texas and the University of Pennsylvania.

A new, far-reaching program involving a number of universities, defense research and development laboratories, and private industry is the Strategic Computing Initiative (SCI). Administered by DARPA and estimated to cost about $600 million for the first 5 years, SCI is aimed toward developing and applying a new generation of machine intelligence technology to critical defense problems (3). Three specific military areas targeted for initial technology applications are an autonomous land vehicle, an intelligent Pilot's Associate, and naval battle management.

Autonomous Land Vehicle. The development of the autonomous land vehicle, with active participation by the Army, will emphasize computer vision and image-understanding technologies. Ultimately, the addition of advanced AI reasoning techniques may allow the vehicle to not only sense and react but interpret its environment and then adapt its mission strategy correspondingly. Initial work is concentrating on designing a vehicle that can automatically determine the path of a road and follow it. Eventually the vehicle must also be able to not only detect an obstacle in its path but also determine its nature (e.g., a shadow, a traversable log, or a large boulder requiring a detour) and react accordingly.

Pilot's Associate. In concert with the Air Force, the Pilot's Associate project is directed toward providing the pilot of a single-place fighter aircraft with the support and expertise of a "phantom flight crew." Rather than addressing the automation of conventional functions in an aircraft, the project is aiming toward providing logical expertise in specified task areas through the concept of an integrated cockpit. Initially, the system is being conceived as a construct of four major interactive expert subsystems: a situation-assessment manager, a tactical-planning manager, a mission-planning manager, and a systems-status manager. Special emphasis is being placed on the pilot-vehicle interface, which will include advanced control, display, and automation techniques that utilize speech
Table 1. Military Applications of AI Technologies (a)

[The body of Table 1 is not recoverable from this copy. It is a matrix indicating, for each of 14 basic AI technologies, major or minor applicability to seven generic military problem areas and a number of more specific task domains.]

(a) Symbols: O, major applicability; X, minor applicability. From Ref. 2. Courtesy of EW Communications, Inc.
recognition (see Speech understanding), natural-language understanding (qv), and voice synthesis.
Naval Battle Management. A goal of the battle management program, a joint effort with the Navy, is to demonstrate how AI technology, particularly expert systems and natural-language understanding, can contribute to the development of automated decision aids for the complex combat environment. Five battle-management functions have been identified as initial application areas within fleet-command center operations. They include force requirements, capabilities assessment, campaign simulation, operations planning, and strategy assessment. These functions are well defined, yet complex, demanding, and labor intensive, requiring skill and expertise to perform, and are thus promising candidates for expert system decision aids. As with the personnel they will support, expert
systems developed for these applications will need to interact and cooperate with each other. Emphasis is also being placed on natural-language understanding, both as an interface between the expert systems and their users and as a means of automating the processing of the ever-increasing command-center message traffic, which can expand 10-fold during a crisis.

Military operations, and in particular C3I, possess significant characteristics that have not always been prominent in other AI application domains. One such characteristic is the time-critical nature of tactical decision making: the need for appropriate, real-time response to dynamic situations. The deployment of increasingly complicated surveillance and weapons systems, both friendly and hostile, has compressed the time available for tactical decision making. Automated decision aiding (and ultimately automated decision making) under these conditions must emphasize efficient solution-space search and pruning techniques and consider finding the first solution that satisfies a given set of conditions or exceeds a specified threshold. In addition, vast amounts of diverse, often incomplete and uncertain data must be interpreted and integrated to form the tactical picture upon which situation assessments and consequent tactical actions are based. Therefore, effective techniques for reasoning under uncertainty will be crucial to automated decision support (see Reasoning, plausible).

The problem of information processing in the military is an enormous one, due both to the vast quantities of data to be handled and to the distributed nature of the generation and usage of the information. Huge databases must not only be maintained and updated but must also be quickly and efficiently accessible by a distributed hierarchy of military personnel with differing needs. The flood of incoming data must be analyzed, disseminated, integrated, stored, and presented appropriately and in a timely manner. Thus, methodologies for efficient distributed database management and information interpretation and integration, as well as man-machine interfaces that accommodate the specialized needs and personal preferences of the system user, will be required.

The geographic and functional distribution of both C3I assets and C3I decision-making authority and responsibility have led C3I to be described as an excellent example of distributed problem solving (qv). A new field of research within AI, called distributed AI, is addressing many of the difficult issues in this area (4).
For example, how may control be most effectively distributed across a network of semiautonomous problem-solving or decision-making nodes and still ensure their cooperation in arriving at consistent and coherent problem solutions or strategies? How may tasks be assigned dynamically among often competing nodes, in what ways should nodes communicate with each other, and what kinds of information should be exchanged? Additionally, how will distributed problem-solving systems recover from the failure of one or more nodes? These questions are but an indication of the challenges in developing intelligent systems for distributed problem solving in such domains as C3I.

To date, military AI application systems are still in the prototype development stage. The following sections describe a small sampling of experimental systems that are demonstrating the feasibility of applying AI techniques to a variety of military needs. Application areas include sensor-information integration for situation description and assessment, combat-resource allocation, mission planning, maintenance and troubleshooting of military equipment, training, and automated natural-language understanding (qv) of military messages. A crucial issue that separates current prototype systems from operational systems is that of robustness: the ability of a system to "keep its head" and not fall apart when faced with input that is unfamiliar, violates internal-system constraints, or contains unresolvable ambiguities. Many AI researchers also feel that operational systems, particularly those that are expert-system based, will need a capability to learn to survive in the complex, dynamic military environment. Although current computer systems cannot adapt and improve themselves significantly on the basis of past mistakes or acquire new abilities through observation (e.g., by example or analogy), machine-learning research, which is receiving increased attention following its evolution from early network approaches to present-day knowledge-intensive techniques, is making progress toward these goals (5).

Sensor Information Integration for Situation Description and Assessment

A central problem in military intelligence is the construction of coherent situation descriptions using sensor information. Situation descriptions provide crucial support to military decision makers over a wide range of activities, from local tactical operations to strategic planning. Sensor information comes from diverse sources in a variety of forms, such as intercepted communications, radar returns, intercepted radar emissions, aerial surveillance, sonar, etc. Such data are often incomplete and uncertain and may be time delayed, ambiguous, and in error. As the technology of warfare escalates, the information-integration problem grows along two dimensions: the quantity of sensor data is increasing at the same time that the variety of such information is proliferating. This combination creates a potentially overwhelming situation for the human analysts who must generate current, coherent situation descriptions and assessments under increasingly restrictive time constraints.

Perhaps the simplest form of sensor information integration occurs when returns from successive sweeps of a single radar are correlated to produce a track of some distant object. Conventional computer algorithms have long been developed for this and other routine correlation tasks. More recently, however, the techniques of AI have been applied to sensor information integration problems that normally require the attention of human analysts since their solutions often involve reasoning with incomplete, uncertain evidence. This section describes two such applications of AI technology to sensor information integration.

The ANALYST program illustrates the use of AI techniques to help generate tactical situation descriptions and assessments on the battlefield (6).
Developed in the early 1980s by the MITRE Corporation as a prototype expert system for the Army, ANALYST uses reports from multiple-sensor sources to generate a real-time battlefield situation display for use by force commanders and their staffs. The premise on which the ANALYST program is based is that the existence of enemy units can be inferred from their basic war-making activities. Thus, the input to ANALYST is in the form of reports involving five types of intelligence: intercepted communications, indications from shooting sensors, photo interpretation, radar interceptions, and moving-target indications. The output of ANALYST is a situation map showing suspected locations of enemy units. The process of fusing the incoming stream of intelligence reports into a coherent situation map is performed in a deductive fashion using production, or if-then, rules. An important constraint of the project was the implementation of the software on computers small enough to be deployed at a battlefield command post.

Three basic types of entities are manipulated by ANALYST. Each of the three entity types is represented using a frame hierarchy (7) (see Frame theory). Thus, intelligence reports, groups of seemingly related intelligence reports (activity clusters), and hypothesized battlefield entities (tanks, command posts, etc.) are all stored as frames. The selection of frames as the basic data structure provides a convenient framework for storing information having taxonomic structure. For example, each photo interpretation report is a specific instance of a generic intelligence report, and therefore it inherits certain properties from intelligence reports in general. In addition, frames provide for the attachment of demon functions, which supply values for slots that may be missing as a result of an incomplete report. Although a good deal of information is inherent in the frame structures, the major portion of the domain knowledge available to ANALYST is stored in six distinct knowledge bases. Each knowledge base consists of a collection of if-then rules that operate on the frame entities.

The first knowledge base serves to associate each incoming intelligence report with clusters of previously processed intelligence reports from the same general geographical area. Patterns among these activity clusters are recognized by the second knowledge base, which creates a frame representing a hypothesized battlefield entity whenever one of its pattern rules fires. A corresponding symbol is placed on the system's graphical situation map to represent the entity. One of the slots in the newly created battlefield-entity frame contains a likelihood that is used to indicate the strength of the evidence used to infer the entity's existence. The inference process of the first two knowledge bases is pursued in a parallel fashion for each of the five types of intelligence reports. Thus, it is possible for the pattern rules of the second knowledge base to create multiple frames representing the same battlefield entity when that entity's existence is supported by more than one form of intelligence. Such duplicate entities are merged into single composite entities by the merge rules of the third knowledge base. This merging process is a crucial step not only because it removes redundancies in the situation map but also because it allows information from diverse sources to be integrated into a coherent situation description.
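The frame machinery just described (a taxonomic hierarchy with property inheritance, plus demon functions that fill in slots left empty by an incomplete report) can be sketched in a few lines of Python. The slot names, demon logic, and values below are hypothetical illustrations, not ANALYST's actual data structures.

```python
class Frame:
    """A minimal frame: named slots, a parent link for taxonomic
    inheritance, and 'demon' functions that compute missing slot values."""
    def __init__(self, name, parent=None, demons=None, slots=None):
        self.name, self.parent = name, parent
        self.slots = dict(slots or {})
        self.demons = demons or {}   # slot name -> function(frame) -> value

    def get(self, slot):
        if slot in self.slots:                  # value supplied by the report
            return self.slots[slot]
        if slot in self.demons:                 # demon fills the missing slot
            value = self.demons[slot](self)
            self.slots[slot] = value            # cache the computed value
            return value
        if self.parent is not None:             # inherit from the generic frame
            return self.parent.get(slot)
        return None

# Generic intelligence report; specific report types inherit from it.
intel_report = Frame("intelligence-report",
                     slots={"reliability": "unknown", "source": "sensor"})

# A photo report arrives without a location; a demon estimates it from
# the reporting sensor's footprint (hypothetical logic).
photo_report = Frame(
    "photo-interpretation-report",
    parent=intel_report,
    demons={"location": lambda f: f.get("sensor-footprint-center")},
    slots={"sensor-footprint-center": (51.2, 13.5)},
)

print(photo_report.get("source"))    # inherited from the generic report
print(photo_report.get("location"))  # supplied by the demon function
```

The same lookup order (local slot, then demon, then parent) gives both taxonomic inheritance and if-needed defaults with one mechanism.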
Tactical and terrain data are used by the fourth knowledge base to refine the descriptions of the hypothesized battlefield entities. The rules of the fifth knowledge base reinforce the existence of hypothesized battlefield entities by examining those activity clusters that were not used by any pattern rules. If a stray activity cluster is sufficiently close to some hypothesized battlefield entity, a reinforcing rule may use that unclaimed cluster to reinforce the entity's existence. One way that the refinement and reinforcement of the fourth and fifth knowledge bases is accomplished is by adjusting the values of the likelihood slots contained in the various battlefield-entity frames. The sixth knowledge base serves to delete hypothetical battlefield entities that have persisted for a sufficient length of time without reinforcement.

ANALYST's rules are segregated into separate knowledge bases for control purposes. Each of the knowledge bases is applied to the incoming data in the order that they have been described. Thus, all possible clusters are formed before the pattern rules are applied, all pattern rules are applied before any merging rules are applied, etc. The partitioning of ANALYST's rules into specialized knowledge bases facilitates controlling this sequential application process. ANALYST works from lower level data to higher level conclusions using a forward-chaining inference mechanism. Thus, conclusions made in the then portion of an if-then rule may be used later to satisfy the if portion of some other rule. Conflict resolution, the process of deciding which rule to select when more than one rule is applicable, is handled by applying the rules in their order of appearance in the knowledge base.

ANALYST was tested using data from a computer simulation of a battlefield environment. The simulation contained
models of enemy units performing some specific mission, and it simulated the intelligence observables the enemy units would produce as a result of their war-making activity. In addition, the simulation employed models of friendly sensors used to capture the intelligence observables. Two capture ratios were used in testing ANALYST: 35 and 20%. (A 35% capture ratio means that the intelligence-gathering apparatus captures only 35% of all the possible events.) In both cases ANALYST produced a quite comprehensive situation map, even given sparse intelligence. For example, even at the 20% capture level, approximately half of the simulated battlefield entities were correctly hypothesized. In some cases hypothesized locations were accurate enough to be used as targeting data for area weapons. In addition, it may be difficult to trick the ANALYST program using decoys since it employs information from diverse intelligence sources.

Just as battlefield entities produce observable information during their activities, ships and submarines produce observable features as they transit the ocean. The observable of interest here is an acoustical signature, energy in certain narrow bands of the sound spectrum and particular harmonics of these fundamental frequencies, that is produced by the propulsion system and other equipment in the vessel. To detect and classify the ships and submarines operating in a certain sector of the ocean, the Navy uses acoustical data collected by submerged hydrophone arrays located at the ocean's periphery. Each hydrophone of the array is directional, so that its sensitivity is concentrated in a cone that projects out into the ocean. The signals collected from the hydrophones are displayed in the form of a sonogram, a time-series display of the acoustical energy spectrum detected at the hydrophone.
Highly trained sonar analysts interpret the sonograms, and by using their knowledge of ship and submarine signature traits, sea-lane characteristics, underwater sound propagation, and intelligence information, they develop a situation board that describes the current state of activity in the ocean sector in question. The most straightforward situation for the analyst occurs when only one source presents itself on a given hydrophone channel. In that case the process of matching the incoming signature with a collection of stored references is complicated primarily by noise, changing acoustical propagation conditions, measurement errors, and the possible incompleteness of the signal data. A more difficult situation is one where radiations from several vessels are captured on the same channel and where several channels are active simultaneously. The process of disentangling these multiple signatures is challenging to even the most experienced analysts. In order to investigate the feasibility of using automated knowledge-based reasoning to aid in this complicated signal-understanding task, DARPA initiated a research project in the early 1970s involving computer scientists at the Stanford Heuristic Programming Project and also at Systems Control Technology. The resulting programs, HASP (Heuristic Adaptive Surveillance Project) and SIAP (Surveillance Integration Automation Project) (8,9), were evaluated in the late 1970s with quite promising results.

There are several superficial similarities between ANALYST and HASP/SIAP. For example, frames are used to store static knowledge about the characteristics of vessels, and entities hypothesized by HASP/SIAP have associated weights as a measure of the confidence in the hypothesis. These weights are used in a fashion similar to ANALYST's likelihoods.
HASP/SIAP represents much of its domain knowledge as production rules. In addition, the information-refinement process in HASP/SIAP is similar to that of ANALYST. ANALYST successively refines intelligence reports into activity clusters and then refines activity clusters into entities. The refinement process of HASP/SIAP begins by detecting harmonic relationships between sonogram lines and associating them into harmonic sets. Harmonic sets are further related to potential shipboard noise sources, and groups of sources may suggest a specific vessel, etc.

The underlying framework for problem solving used by HASP/SIAP, however, is quite different from that of ANALYST. The control strategy employed by ANALYST is to apply knowledge bases sequentially, each of which uses forward chaining and a straightforward conflict-resolution scheme. The control strategy of HASP/SIAP is a much richer one and is known as a blackboard architecture (10,11) (see Blackboard systems). In this implementation the production rules are divided into a hierarchy of knowledge sources. The lowest level in this hierarchy consists of specialist knowledge sources that contain domain knowledge about ocean surveillance. The higher levels of the knowledge-source hierarchy contain strategic knowledge about how to solve ocean-surveillance problems. These problem-solving strategy rules monitor a central data structure called the blackboard, where the current best hypothesis [e.g., at the highest level of analysis, the situation board postulating the most likely vessel(s) based on data available up to that time] is posted. The strategy rules determine opportunistically which of the lower level knowledge sources should be applied to the current best hypothesis in order to provide the most refinement. Thus, all of the knowledge sources operate on the same blackboard under control of the strategy knowledge source, whose job it is to provide focus of attention for the system.
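A toy version of this control loop can be sketched as follows: knowledge sources inspect a shared blackboard and post refinements (sonogram lines grouped into a harmonic set, a harmonic set suggesting a shipboard noise source), and a simple scheduler keeps invoking whichever source can still contribute. The frequencies, tolerance, level names, and the "propeller-shaft" hypothesis are all invented for illustration, not HASP/SIAP's actual rules.

```python
# Shared blackboard with three hypothesis levels (hypothetical names).
blackboard = {"lines": [60.0, 120.0, 180.0], "harmonic-sets": [], "sources": []}

def find_harmonic_sets(bb):
    """Event-driven (forward-chaining) step: group lines that are near
    integer multiples of the lowest line into a harmonic set."""
    fundamental = min(bb["lines"])
    members = [f for f in bb["lines"]
               if abs(f / fundamental - round(f / fundamental)) < 0.05]
    if members and members not in bb["harmonic-sets"]:
        bb["harmonic-sets"].append(members)
        return True        # the board changed
    return False

def propose_source(bb):
    """Higher level step: a harmonic set suggests a shipboard noise
    source, posted with a confidence weight (invented value)."""
    for hs in bb["harmonic-sets"]:
        hypothesis = {"type": "propeller-shaft",
                      "fundamental": min(hs), "weight": 0.6}
        if hypothesis not in bb["sources"]:
            bb["sources"].append(hypothesis)
            return True
    return False

# Crude strategy layer / focus of attention: repeatedly run the first
# knowledge source that can refine the board, until none can.
knowledge_sources = [find_harmonic_sets, propose_source]
while any(ks(blackboard) for ks in knowledge_sources):
    pass

print(blackboard["sources"])   # one hypothesized noise source
```

A real blackboard system would also include expectation-driven sources that take a posted vessel hypothesis and search downward for unexplained sonogram lines to support it; the scheduler above is the simplest possible stand-in for HASP/SIAP's strategy rules.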
The lower level knowledge sources may be invoked in either an event-driven (forward-chaining) or an expectation-driven (backward-chaining) mode. As in ANALYST, event-driven inference combines incoming data to create hypotheses at higher levels of abstraction. For example, a newly found sonogram line might be combined with other lines to form a harmonic set. Expectation-driven inference takes a higher level hypothesis and searches for lower level information to support it. For example, suppose that the current best hypothesis contains a certain type of ship known to possess several noise sources. If not all of the expected sources are present, expectation-driven inference would direct the system to look for lower level information, such as the presence of certain previously unexplained sonogram lines, in order to reinforce further the higher level hypothesis. Thus, even though ANALYST and HASP/SIAP both address the military problem of integrating and interpreting sensor information to develop a situation description, they use dissimilar architectures to accomplish their goals.

The performance of HASP/SIAP was evaluated in a series of three experiments performed by the MITRE Corporation in the late 1970s. During these tests the expert system's performance was compared to that of two expert sonar analysts. In all cases HASP/SIAP developed situation descriptions of similar quality to those of the experts, and in one case it outperformed a human analyst.

Combat Resource Allocation

A critical element of battle management is the allocation of combat resources, both in anticipation of and in response to tactical situations. In particular, battlefield commanders have always been confronted with the problem of determining how to allocate their weapons resources so as to destroy desired targets most efficiently. A wide range of factors, pertaining both to the enemy and to friendly forces, can influence the success of a weapons-assignment strategy. For example, it may be important to consider the enemy's counterfire ability, vulnerability, etc., and at the same time, the state of readiness of friendly forces and the ease with which they can be resupplied. Furthermore, the allocation problem is compounded for modern commanders because of the ever-widening variety of weapons from which to choose.

The Marine Corps is addressing this problem with the introduction of the Marine Integrated Fire and Air Support System (MIFASS). Under MIFASS, fire and air support centers would be established to help solve weapon-to-target allocation problems. These centers would perform weapon-allocation planning using information relayed from forward observers equipped with hand-held digital communications terminals. Originally, MIFASS used a heuristic algorithm for weapon-to-target assignment that approached the allocation problem in a sequential fashion by optimizing on successive weapons (12). However, this simple sequential scheme has several limitations: it does not consider the assignment problem as a whole; it does not allow for more than one weapon to be allocated to a target; and it ignores a significant number of battlefield factors. More recently, BATTLE, a prototype interactive decision-support system that employs AI techniques to solve the weapon-to-target allocation problem, has been developed at the Naval Research Laboratory to remove these limitations (13,14).
In its first phase of operation BATTLE examines each possible weapon-to-target pairing and calculates a measure of its effectiveness. This effectiveness calculation is performed by a computation network that is a generalization of the inference networks of PROSPECTOR (15). The network, which is prepared by a military domain expert, involves an extensive set of over 50 weapon, target, and battlefield-situation factors. Data for a particular battlefield situation are entered interactively by the system user under BATTLE's guidance.

In its second phase BATTLE generates a weapon-allocation tree using the effectiveness measures computed in the first phase together with a user-supplied set of tactical values of the targets. Instead of searching for the optimal solution only, BATTLE allows its user to specify a value, k, such that the best k plans will be found. Because the size of the weapon-allocation tree becomes astronomical in complex battlefield situations, it is not computationally feasible for BATTLE to explore it exhaustively. Rather, a pruning algorithm is used so that only a selected portion of the tree is explored. The pruning algorithm works by applying a heuristic each time a new node is generated. The heuristic calculation finds an upper bound for the overall destructiveness of the current partial assignment. If the upper bound indicated by the heuristic is less destructive than the kth best complete assignment found so far, the current partial path is abandoned.

As in most command and control situations, there is a certain time criticality associated with solving the weapon-to-target assignment problem. As mentioned earlier, BATTLE's first phase considers a multitude of factors in calculating the potential effectiveness of each weapon against each target. It would be time-consuming and tedious if BATTLE always insisted upon asking its user all possible questions about the situation at hand, especially if some of the answers affected
the outcome only marginally. To prevent this problem and thereby accelerate the interrogation process, a new questioning strategy called merit was developed (16). The merit strategy ensures that BATTLE focuses its question asking so that the questions asked first are those whose answers will have the greatest effect upon the final outcome. A cutoff value may be set so that questions having a merit value below the cutoff will not be asked. Experiments have shown that a significant reduction in the number of questions asked occurs when the merit strategy is used to guide questioning.

Mission Planning

Another complex problem facing military commanders is the task of mission planning. As in other military problems, mission planning is performed at many scales, ranging from tactical to strategic. In all cases the planning process is a labor-intensive one, relying on both the common sense and the specialized training and experience of the commanders. The potential for applying AI to mission planning has long been recognized, and several existing efforts illustrate the range of applications. Tactical air planning for the Air Force provides an example of an intermediate-level planning task facing the military. In this case the missions being planned are air strikes against designated targets. The problem is to design a plan in which aircraft and ordnance are assigned in such a way as to ensure the destruction of these targets within some predetermined probability. This planning process begins on receipt of an apportionment order issued by the Joint Task Force Commander. The resulting air tasking order, which may take 24 h to complete manually, specifies a detailed plan that satisfies the original apportionment order. To accelerate both the planning process and the replanning process, the Air Force has funded the development of a planning aid called KNOBS (17). KNOBS (the KNOwledge-Based System) was developed at the MITRE Corporation between 1978 and 1982.
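BATTLE's second-phase search for the k best allocations, described above, can be sketched as a branch-and-bound enumeration. This is a toy illustration: the additive scoring, weapon and target names, and all numbers are invented, and the real system scores allocations through its expert-built computation network rather than a simple product of value and effectiveness.

```python
# Toy branch-and-bound sketch of a k-best allocation search (invented data
# and simplified scoring): a partial plan is abandoned as soon as an
# optimistic upper bound on its completions cannot beat the kth best plan.
import heapq

def k_best_allocations(eff, value, k):
    """Search for the k highest-scoring weapon->target plans.
    eff[w][t]: effectiveness of weapon w on target t; value[t]: tactical value.
    A plan's score is the sum of value[t] * eff[w][t] over its assignments."""
    weapons = list(eff)
    targets = list(value)
    best = []  # min-heap of (score, plan) holding the k best plans found so far

    # optimistic per-weapon bound: the most any single weapon could contribute
    opt = {w: max(value[t] * eff[w][t] for t in targets) for w in weapons}
    remaining_opt = [0.0] * (len(weapons) + 1)
    for i in range(len(weapons) - 1, -1, -1):
        remaining_opt[i] = remaining_opt[i + 1] + opt[weapons[i]]

    def search(i, score, plan):
        if len(best) == k and score + remaining_opt[i] <= best[0][0]:
            return  # prune: even the optimistic bound cannot beat the kth best
        if i == len(weapons):
            heapq.heappush(best, (score, plan))
            if len(best) > k:
                heapq.heappop(best)
            return
        w = weapons[i]
        for t in targets:  # unlike the old sequential scheme, targets may repeat
            search(i + 1, score + value[t] * eff[w][t], plan + ((w, t),))

    search(0, 0.0, ())
    return sorted(best, reverse=True)

eff = {
    "howitzer": {"bunker": 0.9, "column": 0.8},
    "airstrike": {"bunker": 0.95, "column": 0.3},
}
value = {"bunker": 10, "column": 6}
for score, plan in k_best_allocations(eff, value, 2):
    print(round(score, 2), plan)
```

Note that the best plan here puts both weapons on the high-value bunker, exactly the kind of multiweapon assignment the original sequential heuristic could not produce.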
Its specific domain of expertise is planning ground-strike counter-air missions in the European theater. Its knowledge base contains plausible (but, for security reasons, not necessarily accurate) information about a number of potential targets and friendly air bases, generic information about aircraft and ordnance capabilities, information about antiaircraft defenses, and Air Force tactical doctrine. In a typical KNOBS interactive-planning session, the user would enter the desired target and the desired probability of destruction for that target. The user can then specify other particulars for the mission, such as the type and number of aircraft to be used, which air base should supply the aircraft, etc., or the user can have KNOBS make suggestions for each particular. An advantage of allowing KNOBS to make suggestions is that KNOBS will only make suggestions that result in a valid plan, and KNOBS will present its suggestions in an order of preference. At any point in the session the user can have KNOBS check for inconsistencies in the partially created plan. For example, the system can alert the user if the selected aircraft and airfield are too far from the designated target or if the ordnance selected cannot achieve the desired probability of destruction. The process of refining the details of the plan continues, with the user always having the option of letting KNOBS attempt to complete the plan on its own. When the plan is completely specified, KNOBS warns the user about possible antiaircraft defenses in the vicinity of the target, and the interaction is complete. KNOBS approaches the planning process by treating each
plan as an instance of a prototypical plan. Plans, as well as nearly all other objects in KNOBS, are represented using frames. Thus, the construction of a valid plan consists of building an instance of a plan frame in which all slots have been filled in and the values contained in the slots define a valid plan. The validity of the plan is defined in terms of constraints that exist between the various particulars of the plan. For example, the fact that a given aircraft has only a fixed operating range defines a constraint on the distance between that aircraft's airfield and any potential target areas. In KNOBS the planning process is simplified greatly because all of the constraints are known a priori. Although the approach of generating a specific plan by elaborating on a template plan is a limited one, it has been found to be useful in a variety of domains that require somewhat stereotypical planning. By modification of KNOBS's domain-dependent code, the Navy has used KNOBS to plan certain specific categories of naval missions, and NASA is investigating several applications of KNOBS relating to planning for the space shuttle. A further refined planning system is now being developed for the Air Force (18). The planning cycle in the Navy is not unlike that of the Air Force; operational mission planning is initiated by the arrival of an operational order document, which states a mission goal in very general terms. The resulting operational plan, which may take a team of commanders days or weeks to complete, specifies in detail a military plan the planners believe best satisfies the original order. To provide Navy commanders with a planning tool, a knowledge-based problem-solving system called OPPLAN-CONSULTANT is being developed at the Naval Research Laboratory (19). OPPLAN-CONSULTANT solves planning problems in the domain of naval operational planning.
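The frame-and-constraint style of planning described for KNOBS can be sketched in miniature. Everything here is invented for illustration (the slot names, the single range constraint, and the base and aircraft data); the point is only the mechanism: fill slots of a template plan, offering only fillers that keep the partial plan consistent with a priori constraints.

```python
# Toy sketch of template-plan refinement (slot names and data invented):
# a plan is a frame whose slots are filled one at a time; a priori
# constraints reject fillers that would make the plan invalid.

PLAN_SLOTS = ("target", "airbase", "aircraft")

BASE_TO_TARGET_KM = {"base_a": 300, "base_b": 900}   # invented distances
AIRCRAFT_RANGE_KM = {"f16": 500, "f111": 1200}       # invented radii

def range_constraint(plan):
    """The chosen aircraft must be able to reach the target from its base."""
    base, ac = plan.get("airbase"), plan.get("aircraft")
    if base is None or ac is None:
        return True  # constraint not yet applicable to this partial plan
    return AIRCRAFT_RANGE_KM[ac] >= BASE_TO_TARGET_KM[base]

CONSTRAINTS = [range_constraint]

def consistent(plan):
    return all(check(plan) for check in CONSTRAINTS)

def suggest(plan, slot, candidates):
    """Like KNOBS, offer only fillers that keep the partial plan valid."""
    return [v for v in candidates if consistent({**plan, slot: v})]

plan = {"target": "bridge_17", "airbase": "base_b"}
print(suggest(plan, "aircraft", ["f16", "f111"]))  # -> ['f111']
```

Because every constraint is known in advance, inconsistency checking and suggestion generation reduce to the same operation: testing candidate slot fillers against the fixed constraint set.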
Unlike KNOBS, OPPLAN-CONSULTANT is designed to be a general naval planning tool incorporating knowledge across the full spectrum of naval operational planning. The software system used to represent and operate on the domain knowledge is called CK-LOG (Calculus for Knowledge processing in LOGic). CK-LOG is a knowledge-processing system that uses a three-valued logic (true, unknown, and false) in building partial models of world states and a two-valued logic (true and false) for theorem proving. An important feature of CK-LOG is its ability to represent and reason about actions and the temporal dependencies between them. A pilot implementation of the system is currently under construction; however, it is expected that it will take several years before a sufficiently extensive knowledge base has been developed to demonstrate OPPLAN-CONSULTANT in a realistic planning environment.

Maintenance and Troubleshooting of Military Equipment

Since the early 1960s military equipment has increased steadily in complexity and variety, whereas at the same time the pool of trained technicians has been decreasing. A major cost of operations is in fault diagnosis and repair, the procurement of maintenance equipment, and the training of technicians and operators. Each of the services has problems that are unique to its mission, but all share problems of space, difficulty in providing logistics support, and limited technical manpower. These factors, coupled with the demands of operations, place heavy emphasis on speedy and accurate diagnosis and repair in the field. The various difficulties have created prime opportunities for the application of AI, and a number of efforts are underway. This discussion considers AI applications in three key military maintenance areas: automatic test equipment (ATE), built-in test (BIT), and interactive troubleshooting aids.

Automatic Test Equipment (ATE). In any maintenance application where only a limited pool of human experts is available, the application of an expert-system-based maintenance aid is an attractive option. In electronics equipment maintenance the possibilities for immediate benefit are even more apparent. This is particularly true for aircraft electronics (avionics) because of the large number of different systems involved, the heavy reliance of modern aircraft on avionics for mission accomplishment, and the premium placed on rapid turnaround. In avionics maintenance the Navy and Air Force rely heavily on automatic test equipment (ATE) for diagnosis of faults. This reliance is especially evident in the Navy, where scarcity of space limits test equipment and manpower and spares storage. Even though many items of avionics have an intricate built-in test (BIT) with automated testing, high false removal rates, excessive levels of fault ambiguity, and the continued need for human intervention are still problems. (False removal rates as high as 85% are found, and ambiguities involving three to five circuit cards are fairly common.) ATE makes use of test program sets (TPSs) that consist of an interface between the avionics and the ATE and software for fault diagnosis. For each different item of avionics a separate TPS must be provided. Test program generation is highly manpower intensive, and results are variable, with high costs and long delivery times common. A test sequence may take between 20 min and 12 h to diagnose system faults. Moreover, a typical Navy carrier, for example, requires over 600 different TPSs to support the avionics on its various aircraft.
In many cases TPSs are inadequate, either failing to identify faults in a reasonable time or producing a large ambiguity group of suspected faulty components. These factors, coupled with limited expert manpower, make ATE an especially attractive application for AI. Efforts underway in this area are applying expert systems technology toward the performance of efficient, accurate fault isolation either automatically or interactively with maintenance personnel (20-23). A more near-term application of this knowledge-based approach is directed toward the automatic generation of TPSs for execution on existing ATE configurations. For electronics fault diagnosis an expert-system database typically consists of two kinds of information: detailed specifications for the equipment to be diagnosed and results of measurements. For electronics equipment the specifications consist of such information as a functional description, interconnections, nominal values for normal operating parameters, and component values and tolerances. This kind of information must be available for each piece of equipment and is equivalent to the manuals and performance specifications that a technician would use. The additional information in the database consists of symptoms and results of measurements. The rule base of the system consists of general diagnostic methods, rules associated with particular classes of equipment, and finally, rules unique to the specific equipment being tested. The key to the efficient utilization of expert systems in ATE is the automation of the rule and data acquisition process. This particular bottleneck to expert systems development in general is a prime candidate for automation in this application because the design data for military electronic equipment is already available in a computer-usable form from CAD/CAM databases. In addition, it is expected that at least some of the rules can be automatically captured from analysis of system functional descriptions and circuit topology. The possibility for automatic "knowledge compilation" is an important driver in applying expert systems to electronic diagnosis in ATE and should be useful in more conventional maintenance aids as well. In fault isolation systems being developed by the Navy [such as FIS (Fault Isolation System) at the Naval Research Laboratory] and the Air Force, the concept of "functionality" is utilized to add a dimension of deep reasoning without resorting to detailed circuit analysis (23-25). Most existing expert systems are limited to knowledge bases that provide shallow reasoning capability within their area of expertise. Deep reasoning is not even a reasonable objective for many application areas because the level of theoretical understanding is inadequate to permit reasoning from basic principles, and even if it were, the process would be incredibly inefficient. Electronic systems offer a potential for effective deep reasoning because their functions are fully understood and documented, and convenient partitioning of functions can permit a mix of shallow and deep reasoning to be utilized as appropriate. Under the concept of functionality, electronic equipment subsystems are considered to be more than simple nodes in a circuit. In addition to producing a value of output under a given stimulus, a subsystem is considered to provide a specified transformation of information. By reasoning about the relationship of functional elements in addition to tests made concerning nominal measured values, ambiguity about the ultimate cause of faults is expected to be greatly reduced. A prime, long-range objective in using expert systems for diagnosis is to minimize the total testing time to unambiguously isolate the fault.
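The idea of reasoning over functional transformations rather than circuit detail can be illustrated with a toy model. The stages, transfer functions, and meter readings below are invented; real systems such as FIS are far richer, but the basic move is the same: each subsystem is characterized by the transformation it should perform, and the fault is localized to the first stage whose measured output disagrees with its specification.

```python
# Toy sketch of function-level fault isolation (stages and values invented):
# each subsystem is modeled as a transformation of its input; comparing
# predicted against measured values at accessible test points narrows the
# fault to the first stage whose output violates its functional specification.

# a simple signal chain: each stage is (name, specified transformation)
CHAIN = [
    ("preamp", lambda x: 10 * x),
    ("filter", lambda x: x - 2),
    ("output", lambda x: 2 * x),
]

def diagnose(stimulus, measurements, tol=0.05):
    """measurements[i] = observed output of stage i for the given stimulus.
    Returns the first stage whose measured output deviates from prediction,
    or None if every stage is within tolerance."""
    signal = stimulus
    for (name, transform), measured in zip(CHAIN, measurements):
        predicted = transform(signal)
        if abs(measured - predicted) > tol * max(1.0, abs(predicted)):
            return name  # upstream stages checked out; fault isolated here
        signal = measured  # reason forward from the measured value
    return None

# stimulus 1.0 -> predicted 10, 8, 16; the filter output reads 3 instead of 8
print(diagnose(1.0, [10.0, 3.0, 6.0]))  # -> 'filter'
```

Because the reasoning runs at the level of functional blocks, it requires no detailed circuit analysis, which is exactly the mix of "deep but tractable" reasoning the functionality concept is meant to provide.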
An additional shorter-range goal, with substantial benefit in cost and timeliness, is the automation of TPS generation for existing ATE. A current Navy project, Intelligent Automatic Test Generation (IATG), is incorporating the features of FIS along with a performance improvement capability based on actual test fault detection success to generate conventional TPSs and eventually perform as an on-line controller for future ATE. The economic benefits are potentially very high for a successful diagnostic expert system, for it is estimated that TPS and ATE procurement costs could be reduced by 25-50%. This is no small matter since these costs are several billion (10^9) dollars a year for all the services. One key point for the application of expert systems in ATE is the already high level of commitment to automation and the fact that most of the equipment needed for immediate application exists and is designed for computer control. Significant benefits can be achieved in nonavionics applications as well, but in most cases stimulus and measurements require human intervention. The issue is addressed further in the discussion about interactive maintenance aids.

Built-in Test (BIT). For large weapons systems off-line testing using ATE and manual methods may be impractical and inadequate. Typical characteristics of such systems are high value, long operating times, and isolation from sources of spares and test equipment during normal operations. Some typical examples are submarine and surface ships, where replacement of "black boxes" is not practical owing to the nature of the equipment and the difficulty in providing sufficient spares to last for a full deployment. In such circumstances the equipment is typically diagnosed and repaired in place at either the module or component level. Similarly, diagnosis and repair or the activation of redundant systems on large, long-endurance aircraft must be done in flight to ensure adequate capability levels. The Air Force has undertaken a project to develop "smart" BIT for digital systems, with the intent of minimizing false alarms, improving fault coverage, and identifying intermittent faults (26,27). The intent is to provide design concepts that individual designers would use to incorporate smart BIT. Although smart BIT improves performance of individual systems, two Air Force projects are looking at the overall operation of a vehicle and its systems. The integrated maintenance information system (IMIS) project is designed to provide flight-line personnel with access to all onboard diagnostic data available as well as access to the supply system, scheduling data, training, and maintenance records. The B-1B aircraft would be a likely candidate for a demonstration since this aircraft incorporates extensive BIT already. In addition, the Generic Integrated Maintenance Diagnostics project (GIMADS) proposes to use AI coupled with more conventional equipment and software to address the overall diagnostics problem in an integrated system. Also, the Navy is developing an expert-system-based radar maintenance aid for the AEGIS ship combat system, a modern missile-defense system for Navy cruisers and destroyers. The complexity of the systems involved makes conventional software approaches uneconomical; however, these systems are considered excellent applications for expert system technology.

Interactive Maintenance Aids. Many military systems are not adaptable to the approach taken in avionics and large electronics systems either because they are largely mechanical and must be diagnosed in place, lack sufficient built-in sensors to diagnose, or must be repaired in the field under austere conditions.
To address these cases, there is strong interest in maintenance aids that can interact with operators and technicians to guide the diagnosis, provide advice, or make technical information available in a readily usable form. Commercial work in this area has been successful (e.g., the DELTA system at GE), and the military is interested in exactly the same sort of easily transportable interactive system for field use. For direct aids to a man, much more attention must be given to interfacing with the technician than is needed for ATE or BIT. Besides the more austere environment in which he must operate, the technician is likely to be less highly trained and less tolerant of system demands. Such aids must provide him with the information he requires in a readily usable form, in natural language, and with advanced graphics capabilities to be of significant utility. For maintenance aids the problem is not so much free-form communication as it is providing access to large bodies of information in a convenient way. Video disks under computer control are being explored as one solution, but the options are still open at this time. The IMIS, GIMADS, and Integrated Diagnostics projects all expect to develop some form of easily transportable aid of this sort. The Army in particular has need of this type of maintenance aid because of its austere operating environment and the emphasis placed on rapid and accurate repair in the field. The work done in diagnostic expert systems in ATE should be directly applicable to interface with a man, but voice response, natural-language capability, advanced graphics, and perhaps vision and image understanding will be critical elements. The psychology of implementation is very important here, with several key issues to be considered beyond the performance of the AI system itself. To be effective, these man aids must be more than simply intelligent; they must become a partner of the maintainer to a degree well beyond existing systems. To be specifically avoided is the "smart machine-dumb man" philosophy. This approach, perhaps justified in certain instances, can only lead to failure in service due to poor job satisfaction, wasted human capability, failure to capitalize on learning as a by-product of aid, and outright sabotage. The various aspects of natural language, voice recognition, and reasoning needed to produce interactive maintenance aids are very similar to those needed in any interactive environment and need not be expanded further here. The potential for use of AI-based systems to improve human performance is especially evident in the field of diagnosis and repair. For all branches of the military, complex equipment, high costs, the need for rapid and accurate diagnosis, and the relatively high turnover rate of manpower create prime opportunities for AI applications.

Training

An additional, and potentially very important, use of AI technology is in training. As military operations and combat systems increase in technical complexity and personnel resources both shrink in number and increase in turnover, the efficient, yet thorough, training of military personnel is crucial to all the services. Many of the same techniques used to aid decision making can be applied to provide guidance and instruction in the training process. An important example of an intelligent, computer-based military training system is STEAMER, developed at the Navy Personnel Research and Development Center (28). STEAMER's domain is propulsion engineering. Steam propulsion systems are an integral part of most Navy ships, and it is imperative that the engineers who operate them have a thorough understanding of their behavior.
Although these engineers must operate the systems routinely on a day-to-day basis, their understanding must be complete enough to enable them to anticipate the behavior of the system during mechanical failures. The cost of specialized-training simulators is high for such systems, and the use of traditional simulators does not necessarily engender a deep understanding of the system being simulated. Since mathematical models exist for such systems, they may be simulated readily on a digital computer. In addition, the system in question is a physical one, making it a good target for experiments involving aiding humans in the construction of mental models. STEAMER combines these problem features to produce a graphics-oriented trainer whose underpinnings derive from computer simulation. The user of the system is presented with a detailed digital simulation of a steam plant. Elaborate interactive-computer graphics permit users to inspect the simulated plant's operation at many different hierarchical levels. The trainee may use a mouse interface to vary settings of valves and other plant controls and watch how the changes affect the overall system. In addition, for example, the trainee can manipulate fluid levels, which could not ordinarily be manipulated externally in such a plant. The latitude to introduce such changes and observe their effects may be important in developing an intuition about the system. The emphasis in the graphical depictions is to provide trainees with a display that enables them to develop a mental model of the plant's operation similar to the mental models used by experts. The initial implementation of STEAMER relied heavily on mating a traditional digital simulation with newer AI programming techniques, such as object-oriented programming and active display icons. An important component of the overall system is an object-based graphics editor, which includes a wide range of predefined icons for displaying various levels or their rates of change. This editor has enabled nonprogrammers to build complicated steam-plant diagrams by combining primitive icons. In addition, the editor allows system builders to define their own unique display icons, if necessary. Future research will be in the area of the knowledge representations necessary to represent steam-plant operating procedures in terms of their primitive components. STEAMER has been used as a training aid in the Great Lakes Training Center and on Coronado Island. Preliminary results are quite encouraging; they indicate that personnel respond very positively to the interactive system and can learn the same material in a shorter period of time than with traditional instruction methods.

Automated Natural-Language Understanding of Military Messages

Enormous numbers of operational reports are generated and transmitted as part of daily military message traffic. These reports range from messages about employment schedules, equipment failures, and weather to messages concerning force deployment and readiness, tactics, and intelligence and are used at various levels throughout military command hierarchies. Typically operational reports obey strict formatting conventions but also contain important English narrative descriptions.
Although current message-handling systems process the formatted sections by entering the data into appropriate fields in the system's database, message narrative is usually treated as adjunct information and stored in the form of remarks or comments. However, many tasks, such as message dissemination and the recognition of message trends, require information in message narrative and consequently must rely on personnel performing keyword searches and visually scanning individual message narratives, a laborious and time-consuming process. Automation of these and other tasks in future military message systems will require computer interpretation of message content. One effort toward this end is an experimental system being developed at the Naval Research Laboratory that employs techniques of computational linguistics and AI to automatically extract information from Navy messages (29,30). Initially the system is addressing a class of operational reports about shipboard equipment failure called CASREPs (CASualty REPorts). CASREPs are an important message type, providing current information about ship readiness and equipment performance. They inform operational and support personnel about equipment casualties that could affect a unit's ability to perform its mission, as well as reporting the unit's need for technical assistance and for parts to correct the failure. The experimental system uses CASREP message content to assign a distribution list to each message and to generate a summary of the equipment failure (31). To process such messages, the system must provide a representation of message content that can be readily accessed and used for applications such as dissemination and summarization. This is accomplished by a message interpreter that initially decomposes the message to determine its overall structure and then performs narrative analysis to generate the
structures that enable automated interpretation of English narrative. Message decomposition of reports like CASREPs is straightforward because the overall structure is known and report formatting conventions can be used to extract pro forma (strictly formatted) information. However, narrative analysis, the extraction and representation of the particular types of information contained in the narrative portions of a message, is more difficult, principally because the structure of the information, and often much of the information itself, is implicit in the narrative. The experimental system uses an approach to narrative analysis called information formatting, originally developed at New York University (32,33). This technique employs an explicit grammar of English and a classification of the semantic relationships within a suitably restricted domain to derive a tabular representation of the information in a message narrative. Thus, in simplest terms, an information format is a large table, with one column for each type of information that can occur in a class of texts and one row for each sentence or clause in the text (see also Natural-language understanding). The implementation of this approach first requires the development of the information format structure through the identification of the classes of objects and the relationships among them discussed in message texts within the domain. For CASREPs about electronic equipment, the objects include the equipment items and their component parts, the signals and data operated on by the equipment, the people and organizations who operate and maintain the equipment, and the documents involved in the maintenance process. These various classes of objects and their semantic relationships then have their own "slots" in the data structure, so that information can be much more readily retrieved than from the original narrative.
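The tabular representation just described can be sketched in miniature. The column names, sublanguage word classes, and sample clause below are all invented, not drawn from the actual CASREP format; the sketch shows only the core idea that a domain dictionary assigns each word a semantic class, and a clause then becomes one row of a fixed-column table.

```python
# Toy sketch of information formatting (columns, word classes, and the
# sample clause are invented): a domain dictionary assigns each word a
# sublanguage class, and a clause becomes one row of a fixed-column table.

WORD_CLASSES = {
    "receiver": "PART", "antenna": "PART",
    "failed": "STATUS", "degraded": "STATUS",
    "technician": "PERSON",
}

COLUMNS = ("PART", "STATUS", "PERSON")

def format_clause(clause):
    """Map one clause into an information-format row (one cell per column)."""
    row = dict.fromkeys(COLUMNS)
    for word in clause.lower().rstrip(".").split():
        word_class = WORD_CLASSES.get(word)
        if word_class is not None and row[word_class] is None:
            row[word_class] = word
    return row

print(format_clause("Receiver failed."))
# -> {'PART': 'receiver', 'STATUS': 'failed', 'PERSON': None}
```

Once narrative has been reduced to rows like these, downstream tasks such as dissemination and summarization can query cells directly instead of rescanning free text.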
The transformation of the narrative portion of each message into a series of tabular format entries involves three stages of automated processing: parsing, syntactic regularization, and mapping into the information format. Parsing essentially determines sentence structure and resolves lexical ambiguity, such as usage of the word IF both as a noun abbreviation for "intermediate frequency" (a frequent occurrence in CASREPs) and as the more familiar subordinating conjunction. In the second stage the parse trees are syntactically regularized by a series of transformations to simplify the subsequent mapping into the information format. For example, passive assertions are transformed into simple active assertions, some elements missing from sentence fragments are filled in, and a subject-verb-object word order is created for sentences not having one. The third stage of processing moves the phrases in the syntactically regularized parse trees into the information format. The mapping process is controlled in large part by the sublanguage (semantic) word classes associated with each word. These classes, along with syntactic information about the word, are recorded in each word's dictionary entry, which is tailored to the domain. CASREP information formatting is currently being applied to two task areas: dissemination and summary generation. In each area the experimental system contains a knowledge base organized as a production system; productions operate on an initial database of working memory elements that includes data from both the pro forma set and the information formats. Some production rules reflect an understanding of the subject matter of the equipment failure reports, and others are based
on general principles of dissemination and summarization. Taken together, the productions address such matters as malfunction, causality, investigative action, uncertainty, and level of generality. Although production rules for the dissemination system act on data extracted from both the formatted portion (e.g., identity of malfunctioning equipment) and the narrative portion (e.g., in requests for assistance, the type of assistance and from whom) of the message, rules for summarization deal only with message narrative (34). Typically a summary consists of a single clause extracted from a section of text, thereby reducing significantly the material that must be read for such critical uses as detecting patterns of failures for particular types of equipment. Currently each summary is generated manually by reading the entire message and then selecting an appropriate clause from the "remarks" narrative as the summary. Using manual summarization expertise as the basis for its production rules, the experimental summarization system involves three steps: inference, scoring the information format entries for their importance, and finally the selection of the appropriate (highest rated) format entry as the summary. For example, words like inhibit, impair, and prevent trigger inference rules such that if part 1 impairs part 2, one can infer that part 1 causes part 2 to be bad, and one can also infer that part 1 is bad. In scoring the various format entries, the fact that bad is a member of the class of words signifying malfunction will cause entries associated with part 1 and with part 2 to be promoted in importance. In addition, the entry associated with part 1 will score even higher because it is a cause rather than an effect. In an early comparison of computer-generated summaries with those generated manually on a modest set of CASREPs, the summaries agreed on approximately 83% of the messages tested.
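The inference-and-scoring scheme just described can be sketched as follows. The rule weights, entry layout, and sample data are invented for illustration; only the three-step structure (inference, scoring with causes outranking effects, selection of the top entry) follows the description above.

```python
# Toy sketch of the summarizer's three steps (weights and entries invented):
# 1) inference expands "X impairs Y" into malfunction facts,
# 2) facts are scored, with malfunctions promoted and causes ranked above
#    effects,
# 3) the highest-scoring fact is selected as the basis of the summary.

IMPAIR_WORDS = {"impairs", "inhibits", "prevents"}

def infer(entries):
    """From (subject, relation, object) rows, derive malfunction facts."""
    facts = []
    for subj, rel, obj in entries:
        if rel in IMPAIR_WORDS:
            facts.append((subj, "bad", "cause"))   # part 1 is bad (a cause)
            facts.append((obj, "bad", "effect"))   # part 2 is bad (an effect)
        elif rel == "bad":
            facts.append((subj, "bad", "effect"))
    return facts

def summarize(entries):
    def score(fact):
        part, status, role = fact
        return (2 if status == "bad" else 0) + (1 if role == "cause" else 0)
    best = max(infer(entries), key=score)
    return f"{best[0]} is bad"

entries = [("power supply", "impairs", "receiver")]
print(summarize(entries))  # -> 'power supply is bad'
```

As in the system described above, the causal entry wins: the power supply, inferred to be bad because it impairs the receiver, outscores the receiver and so becomes the summary.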
Sometimes the summarization system generated two summary lines (as a result of a tie between two format entries), although the manual summary consisted of only one sentence. Nonetheless, one of the two computer-generated summary lines was also the manual summary. On the other hand, the most significant discrepancies (except where the crucial status word in the narrative was not in the production rule system) involved the system actually selecting more specific causal information than was indicated in the manual summary. Issues yet to be addressed in experimental system development include refinement of the format, intersentential processing, and robustness. A future option for such message systems is to perform message analysis at the point of transmission so that the message sender can be aided by the system in resolving ambiguities and avoiding crucial omissions (35). This could also result in an improvement of message system capabilities by eliminating messages of little or no information content and upgrading overall message quality.
BIBLIOGRAPHY

1. A. J. Baciocco (RAdm), "Artificial intelligence and C3I," Signal 36(1), 24-28 (September 1981).
2. B. P. McCune and R. J. Drazovich, "Radar with sight and knowledge," Def. Electron. (August 1983).
3. P. J. Klass, "DARPA envisions new generation of machine intelligence technology," Aviat. Wk. Space Technol. 122(10), 46-84 (April 22, 1985). Also New-Generation Computing Technology: A Strategic Plan for its Development and Application to Critical Problems in Defense, DARPA, Arlington, VA, October 28, 1983.
4. R. G. Smith, "Report on the 1984 Distributed AI Workshop," AI Mag. 6(3), 234-243 (Fall 1985).
5. R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA, 1983.
6. R. P. Bonasso, Jr., ANALYST: An Expert System for Processing Sensor Returns, MTP-83W00002, The MITRE Corporation, McLean, VA, 1984.
7. M. Minsky, A Framework for Representing Knowledge, AI Memo 306, MIT AI Laboratory, 1974.
8. H. P. Nii, E. A. Feigenbaum, J. J. Anton, and A. J. Rockmore, "Signal-to-symbol transformation: HASP/SIAP case study," AI Mag. 3(1), 23-35 (Spring 1982).
9. H. P. Nii and E. A. Feigenbaum, Rule-Based Understanding of Signals, in D. A. Waterman and F. Hayes-Roth (eds.), Pattern-Directed Inference Systems, Academic Press, New York, pp. 483-501, 1978.
10. L. D. Erman, F. Hayes-Roth, V. R. Lesser, and D. R. Reddy, "The HEARSAY-II speech understanding system: integrating knowledge to resolve uncertainty," ACM Comput. Surv. 12(2), 213-253 (1980).
11. L. D. Erman, P. E. London, and S. F. Fickas, "The design and an example use of HEARSAY-III," Proceedings of the Seventh IJCAI, University of British Columbia, Vancouver, BC, Canada, pp. 409-415 (August 24-28, 1981).
12. K. E. Case and H. C. Thibault, A Heuristic Allocation Algorithm with Extensions for Conventional Weapons for the Marine Integrated Fire and Air Support System, School of Industrial Engineering and Management, Oklahoma State University, Stillwater, Oklahoma, September 1977.
13. J. R. Slagle, E. J. Halpern, H. Hamburger, and R. R. Cantone, A Decision Support System for Fire Support Command and Control, IEEE Trends and Applications Conference Proceedings, National Bureau of Standards, Gaithersburg, MD, pp. 68-75, May 25-26, 1983.
14. J. R. Slagle and H. Hamburger, "An expert system for a resource allocation problem," CACM 28(9), 994-1004 (September 1985).
15. R. O. Duda, P. E. Hart, K. Konolige, and R. Reboh, A Computer-Based Consultant for Mineral Exploration, Artificial Intelligence Center, SRI International, Menlo Park, CA, September 1979.
16. J. R. Slagle, M. W. Gaynor, and E. J. Halpern, "An intelligent control strategy for computer consultation," IEEE Trans. Patt. Anal. Mach. Intell. PAMI-6, 129-136 (March 1984).
17. C. Engelman, J. K. Millen, and E. A. Scarl, KNOBS: An Integrated AI Interactive Planning Architecture, Computers in Aerospace IV Conference, American Institute of Aeronautics and Astronautics, Hartford, CT, October 24-26, 1983.
18. G. Courand, C. O'Reilly, and J. Payne, OCA (Offensive Counter Air) Mission Planning, Advanced Information and Decision Systems, AI/DS-TR-3050-1, Mountain View, CA, 1983.
19. C. V. Srinivasan, The Use of CK-LOG Formalism for Knowledge Representation and Problem Solving in OPPLAN-CONSULTANT: An Expert System for Naval Operational Planning, NRL report in publication, Naval Research Laboratory, Washington, DC, 1985.
20. J. J. King, Artificial Intelligence Techniques for Device Troubleshooting, Computer Science Laboratory Technical Note Series CSL-82-9 (CRC-TR-82-004), Hewlett-Packard, Palo Alto, CA, August 1982.
21. W. R. Simpson and H. S. Balaban, The ARINC Research System Testability and Maintenance Program (STAMP), Proceedings of the 1982 IEEE Autotestcon Conference, Dayton, OH, October 1982.
22. R. R. Cantone, F. J. Pipitone, W. B. Lander, and M. P. Marrone, Model-Based Probabilistic Reasoning for Electronics Troubleshooting, Proceedings of the Eighth IJCAI, Karlsruhe, FRG, August 22-26, 1983, pp. 207-211.
23. K. DeJong, Applying AI to the Diagnosis of Complex System Failures, Proceedings of the Conference on AI, Oakland University, Rochester, MI, April 1984.
24. F. Pipitone, An Expert System for Electronics Troubleshooting Based on Function and Connectivity, IEEE First Conference on AI Applications, Denver, CO, December 1984, pp. 183-188.
25. F. Pipitone, "The FIS electronics troubleshooting system," Computer 19(7), 68-76 (July 1986).
26. K. A. Haller, J. D. Zbytniewski, K. Anderson, and L. Bagnall, Smart BIT, Rome Air Development Center Report RADC-TR-85, June 1985.
27. H. Lahore, Artificial Intelligence Applications to Testability, Rome Air Development Center Report RADC-TR-84-208, October 1984.
28. J. D. Hollan, E. L. Hutchins, and L. Weitzman, "STEAMER: An interactive inspectable simulation-based training system," AI Mag. 5(2) (1984).
29. E. Marsh, J. Froscher, R. Grishman, H. Hamburger, and J. Bachenko, Automatic Processing of Navy Message Narrative, NRL report in publication, Naval Research Laboratory, Washington, DC, 1985.
30. E. Marsh, Utilizing Domain-Specific Information for Processing Compact Text, Proceedings of the Conference on Applied Natural Language Processing, 1983, pp. 99-103.
31. J. Froscher, R. Grishman, J. Bachenko, and E. Marsh, A Linguistically Motivated Approach to Automated Analysis of Military Messages, Proceedings of the 1983 Conference on AI, Oakland University, Rochester, MI, 1983.
32. N. Sager, Natural Language Information Formatting: The Automatic Conversion of Texts to a Structured Data Base, in M. C. Yovits (ed.), Advances in Computers, Vol. 17, Academic Press, New York, pp. 89-162.
33. N. Sager, Natural Language Information Processing, Addison-Wesley, Reading, MA, 1981.
34. R. Granger, "The NOMAD system: Expectation-based detection and correction of errors during understanding of syntactically and semantically ill-formed text," Am. J. Computat. Ling. 9(3-4), 188-196 (1984).
35. E. Marsh, H. Hamburger, and R. Grishman, A Production Rule System for Message Summarization, Proceedings of the National Conference on Artificial Intelligence (AAAI-84), University of Texas, Austin, August 1984, pp. 243-246.

General References

Discussion and documentation of military applications of AI are appearing in an increasing variety of sources. Defense-oriented popular publications such as Signal, Defense Electronics, and Aviation Week and Space Technology provide general articles on current or proposed military AI applications. Papers and reports from defense-research laboratories, industrial defense contractors, and defense-funded university research groups remain the primary source for the more technical descriptions of these applications. However, as service-sponsored conferences and symposia on AI become more numerous, their published proceedings, along with those from the well-known AI national and international conferences, are providing additional valuable references for military applications. Examples include the following.

Proceedings of the Army Conference on Application of Artificial Intelligence to Battlefield Information Management, Battelle Columbus Laboratories, Washington, DC, April 20-22, 1983.
R. Shumaker and J. Franklin, Artificial Intelligence in Military Applications, Signal Magazine 40(10), 29 (June 1986).
The AI Magazine, an official publication of the American Association for Artificial Intelligence (AAAI), Menlo Park, CA, published quarterly.
Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI), which have been held biennially (odd-numbered years) since 1969, every 4k + 1 years in the United States.
Proceedings of the American Association for Artificial Intelligence Conferences, since 1980, every 4k, 4k + 2, and 4k + 3 years.

J. Franklin
Planning Research Corp.

Laura Davis
Randall Shumaker
NRL

Paul Morawski
Mitre Corp.
MINIMAX PROCEDURE

At least two other entries in this volume (Game playing and Alpha-beta pruning) have discussed the idea of minimax search as it is commonly used and understood in AI. This entry places the idea of minimax in context (as a very convenient simplification useful in special cases) with the rest of the field of game theory (including the way it is used in economics) (1). To do this, it is necessary to go a little deeper into the discussion of the idea of a game.

General Model of Games

Although the use of game trees (see Game trees) to model games is almost universal in AI, the game tree cannot reflect all the aspects of games in general. Consider the game of bridge. Each team consists of two players (since their interests are the same), but since the two players in the team do not have the same information (each player knows what cards he holds but does not know what cards the partner holds), the game tree does not quite tell the whole story of the game. There is still a game tree, of course. The deal (a chance move) starts the game. The player on the move knows that the deal is one of the 39!/(13!13!13!) possible ones that give him the hand he is holding. What the bridge player decides to do is not based on the game state, as in chess, but on his imperfect knowledge of the game state.

Imperfect knowledge does not come only from chance moves, as in bridge. In Kriegspiel, chess is played on two boards, one before each player. Neither player can see the position of the other player's pieces, although, if he makes an illegal move in view of the opponent's position, the umpire tells him. So his knowledge of the opponent's pieces is imperfect. Another phenomenon occurs in Kriegspiel that makes the game different from the kind normally encountered in AI: the players do not play alternately. If a player makes an illegal move and is so informed, he can try again. The illegal move, then, yields information but allows him to play again: he can make more than one move in sequence, using the illegal moves as "probes."

Returning to bridge, a player's knowledge of the opponent's
hands as well as of his partner's hand increases during the bidding, after the dummy is laid down, and as the cards fall during the plays. The point is that one decides on moves in a general game, not on the basis of the state, but on the basis of his knowledge that he is in one among a possible given set of states. The result of a player's move places him in one of a set of states. This set may be smaller than the set reachable from the original set: extra information may be gained on the basis of what was learned during the move. The umpire decides, on the basis of the rules of the game, what the player is supposed to know.

Strategies and Payoffs

In the usual games considered in AI, a strategy is an initial decision by the player as to what moves would be made at which state. If the reader does not feel that the idea of a strategy is realistic (that the player has to make a decision on the move only when he is on move), he would do well to think of a strategy as a game-playing program itself. It is important to note that given the strategies of all players in the game, the outcome of the game is determined.

The concept of strategy can be used in the general case also, but with some modification. The strategy now determines a move on the basis not of the state but of the knowledge of the player on move about the state. One may object that the move chosen by a player may not even be a valid move, given his imperfect knowledge; but this can be countered by saying that the very invalidity of the move is a piece of information the player can use to enhance his knowledge. With Kriegspiel this point has already been made; however, for the mathematical ramification of the idea, see the rather complex set-theoretic discussion on which von Neumann and Morgenstern (1) based these concepts. One of the stipulations made by von Neumann and Morgenstern was that at the beginning of the game the player on move knows that it is the beginning of the game.
So his choice is made with complete information, and the result of his move is known to the umpire. If the first move is a chance move, at least there is a known probability distribution. One can proceed by induction from here to show that once each player's strategy is known, including a probability distribution over all the chance moves, a probability distribution over all the leaves of the game tree is known. With the payoff to each player specified by the rules at each leaf, there is an expected payoff to each player, known as a function of the n-tuple of strategies, one chosen by each player.

Given the rules of a game, one can encapsulate its essential structure into what is called a game in normal form. This is a game where each player, instead of choosing a move when his turn arises, is asked at the beginning of the game what his strategy is, i.e., what procedure he will use to choose a move given any state of information that may arise in the game. He makes this choice without any knowledge of what strategies are chosen by all the other players. On the basis of these choices, the payoff to each player is determined. That is, the game is now encapsulated into n tables, one for each player, of his payoff as a function of n variables (the strategies), of which he can control only one. The sizes of these tables are enormous; the entry Game playing has discussed these sizes. But for the present, ignore the practicality of this table and consider only what one would do with it if it were accessible.
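The normal-form view can be made concrete in a short sketch: each player commits to a strategy up front, and the strategy profile indexes directly into one payoff table per player. The tiny two-player game and its payoff numbers below are invented purely for illustration.

```python
import itertools

# Hypothetical two-player game in normal form: each player's strategy
# set is a list of labels, and the rules collapse into one payoff table
# per player, indexed by the complete strategy profile.
strategies = {
    "P1": ["L", "R"],
    "P2": ["l", "r"],
}
# payoff[profile] = (payoff to P1, payoff to P2); numbers are invented.
payoff = {
    ("L", "l"): (3, 3),
    ("L", "r"): (0, 5),
    ("R", "l"): (5, 0),
    ("R", "r"): (1, 1),
}

# Once every player has committed to a strategy, the outcome is determined:
profile = ("R", "l")
print(payoff[profile])          # → (5, 0)

# The whole normal form is finite and can be enumerated:
for p in itertools.product(*strategies.values()):
    print(p, payoff[p])
```

The table's size grows as the product of the strategy-set sizes, which is why, as the text notes, these tables are enormous for real games even though the idea is simple.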
Specializations: Zero-Sum Two-Person Games

Game theory has been studied in economics mostly in terms of the normal-form games. The problem that is posed is the following. Given n different n-dimensional matrices, each of size k1 × k2 × · · · × kn, the first player chooses an integer between 1 and k1, the second player between 1 and k2, etc. The payoff to each player is given in the corresponding cell of his matrix. How would a player, given his matrix, decide what his choice should be?

The last word has in no way been said with regard to the answer to this problem: it is not even clear what is meant by anybody's "best" choice. The reader can be given a whiff of what is involved from the well-known game Prisoner's Dilemma: Two men have robbed a bank and have been arrested on suspicion. Each of them has been given the option of confessing and bearing witness against his partner in return for full pardon. If neither confesses, there is not sufficient evidence against them for the entire crime but enough to send them to jail on lesser charges. If both confess, they are both convicted. If one of the men does not confess, he gets convicted if the other confesses. So, if the two courses of action chosen by the two are "confess" and "don't confess," the 2 × 2 payoff matrix for each is shown below.

            Confess   Don't
Confess        30       0
Don't          30      10
In this matrix the two rows refer to the action of the prisoner whose payoff is seen here. Not confessing is a good way for the two to get lesser charges. But if the other prisoner, using that strategy, does not confess, there is great advantage to confessing and getting the full pardon due a state witness to a major crime. However, that is not a good idea if the other person confesses on the same argument; then total conviction is certain.

Leaving aside the unsolved questions of game theory, return to the case where a few things are known or, at least, agreed upon. This is the case where there are two players and where one's gain is always the other's loss, i.e., a zero-sum game. To discuss zero-sum games, the matrix above is used again, but its interpretation is changed. The payoff shown is once again that of the player whose choices are shown as the rows. The payoff of the player whose choices are shown as columns, however, is the negative of the numbers shown on the matrix. Looking at the above matrix from that point of view, one notes that the opponent now has a vested interest in giving up as little as possible. So if the first row (which is called Confess in the previous discussion; the story is changed now) is chosen, the opponent's best move is Don't, yielding zero to the player. If the player's choice is Don't, the minimum he can have is 10. So his safest move is Don't, since this choice gives him the greatest value of minimum to which he can be pushed by the opponent. Similarly, the opponent can lose 30 if he chooses Confess and only 10 if he chooses Don't. So he minimizes his maximum loss, playing Don't and losing 10. Thus, if both sides move conservatively, they would both play Don't. This is a rather stable situation, different from the previous one; this is because, unlike the previous case, no cooperation is possible between the two players: one gains exactly what the other loses.

The discerning reader will see that this also is really a rather special case: the maximum over the row minima is exactly the minimum over the column maxima. The matrix has a saddle point. In matrices without saddle points the players (if they are to play more than one game) can play different strategies in different games. In this case one gets a mixed strategy, given by the chosen probability distribution over the different strategies. It can be shown in this case that there is a saddle point over the mixed strategies, i.e., that there are mixed strategies p and q of the two players such that the expected payoff over the two strategies satisfies:
payoff(p′, q) ≤ payoff(p, q) ≤ payoff(p, q′)

for all mixed strategies p′ of the maximizing player and q′ of the minimizing player. However, this is not needed for the kinds of games people consider in AI most of the time, when the strategy maps state to move rather than knowledge to move. It can be shown that in these cases the minimax value as calculated over the entire game graph is the same as the minimax value over all strategies, and this value is indeed the saddle point of the normal-form matrix. The strategy for which this saddle point is obtained over the matrix maps each state into exactly the same move dictated by the minimax search of the game tree.

It may be worthwhile to illustrate the point by considering the three-step game shown in Figure 1. Here a, the first move, is the maximizing player's move (i.e., the move of the player whose payoff is given at the leaves of the game tree); A and B are the opponent's moves (the normal "alternating move"); and b, c, d, and e are the maximizer's moves again, leading to the leaves whose payoffs are as shown. The minimizer's strategies are specified by whether the left or the right branch is taken at the points A and B. A strategy where he would go left at A and right at B is denoted by LR. There are thus four possible strategies of the minimizing player. Similarly, the maximizer's strategies are given by the left and right choices
Figure 1. A three-step game.
made at the points a-e. There are 32 possible strategies, denoted by LLLLL to RRRRR. It can be seen that if the maximizer chooses the strategy LLLLL and the minimizer chooses LL, the game will be at A after the first move, at b after the second move, and so end up at the leaf with value 1 after the third move, following the strategy of choosing L at b. Similarly, the strategies LLLLL and RR would yield the value 3. Table 1 shows the 32 × 4 payoff matrix corresponding to the various strategy pairs. Notice that this matrix has various saddle points with value 6; they are at the intersections of the rows R--R- (with all choices at the nodes b, c, and e) and of the columns LL and RL. These are exactly the values and the strategies that the extended-form minimax would yield also: the maximizer chooses the right branch at the first move. It is to the minimizer's advantage to take the left branch, forcing the game to the lower values, after which the maximizer obtains the larger of the two payoffs by taking the right branch again.

This latter analysis, well known in AI and discussed in the entry Game playing, can be clarified if the following is noted. The maximizer's values of the nodes b-e are 2, 4, 6, and 8, respectively. The minimizer's value at A, being the minimum of 2 and 4, is 2. Similarly, the minimizer's value at B is 6. So the maximizer's value at a, the larger of 2 and 6, is 6. So the maximizer in turn plays R to B, the minimizer answers with L to node d, and the maximizer gains 6 by playing R again.
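The minimax computation just described can be verified with a few lines of code. The tree below encodes the game of Figure 1, with leaf payoffs 1 through 8 taken left to right, as recoverable from Table 1.

```python
# Game tree of Figure 1: maximizer moves at a, b, c, d, e;
# minimizer moves at A and B.  Leaves carry the maximizer's payoff.
tree = ("max", [("min", [("max", [1, 2]),     # node b under A
                         ("max", [3, 4])]),   # node c under A
                ("min", [("max", [5, 6]),     # node d under B
                         ("max", [7, 8])])])  # node e under B

def minimax(node):
    if isinstance(node, int):            # leaf: payoff to the maximizer
        return node
    kind, children = node
    values = [minimax(c) for c in children]
    return max(values) if kind == "max" else min(values)

print(minimax(tree))  # → 6
```

The recursion reproduces the hand analysis: b-e evaluate to 2, 4, 6, 8; A and B to 2 and 6; and the root to 6.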
Table 1. Payoffs to the Maximizer for All Strategy Pairs

              Minimizer
Maximizer   LL   LR   RL   RR
LLLLL        1    1    3    3
LLLLR        1    1    3    3
LLLRL        1    1    3    3
LLLRR        1    1    3    3
LLRLL        1    1    4    4
LLRLR        1    1    4    4
LLRRL        1    1    4    4
LLRRR        1    1    4    4
LRLLL        2    2    3    3
LRLLR        2    2    3    3
LRLRL        2    2    3    3
LRLRR        2    2    3    3
LRRLL        2    2    4    4
LRRLR        2    2    4    4
LRRRL        2    2    4    4
LRRRR        2    2    4    4
RLLLL        5    7    5    7
RLLLR        5    8    5    8
RLLRL        6    7    6    7
RLLRR        6    8    6    8
RLRLL        5    7    5    7
RLRLR        5    8    5    8
RLRRL        6    7    6    7
RLRRR        6    8    6    8
RRLLL        5    7    5    7
RRLLR        5    8    5    8
RRLRL        6    7    6    7
RRLRR        6    8    6    8
RRRLL        5    7    5    7
RRRLR        5    8    5    8
RRRRL        6    7    6    7
RRRRR        6    8    6    8
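Table 1 can be generated mechanically from the game tree, and the saddle-point claim checked: the maximum over row minima equals the minimum over column maxima, both 6, matching the minimax value of the extended form.

```python
from itertools import product

# Leaf values of the three-step game: node b -> (1, 2), c -> (3, 4),
# d -> (5, 6), e -> (7, 8); an L/R choice picks the first/second value.
def payoff(maxi, mini):
    """maxi = maximizer's choices at nodes a, b, c, d, e;
    mini = minimizer's choices at A, B."""
    a, b, c, d, e = maxi
    if a == "L":                       # play goes to node A
        node, vals = (b, (1, 2)) if mini[0] == "L" else (c, (3, 4))
    else:                              # play goes to node B
        node, vals = (d, (5, 6)) if mini[1] == "L" else (e, (7, 8))
    return vals[0] if node == "L" else vals[1]

rows = ["".join(p) for p in product("LR", repeat=5)]   # 32 maximizer strategies
cols = ["".join(p) for p in product("LR", repeat=2)]   # 4 minimizer strategies
table = [[payoff(r, c) for c in cols] for r in rows]

maximin = max(min(row) for row in table)               # row player's guarantee
minimax = min(max(col) for col in zip(*table))         # column player's cap
print(maximin, minimax)  # → 6 6
```

Since the two values coincide, the normal-form matrix has a saddle point at value 6, exactly as the tree search predicts.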
Table 2. Payoff Matrix for Maximizer

         1     2     3
LL       4    -2    10
LR       4    10    -2
RL       5    -2    10
RR       5    10    -2
An illustration of what the situation may be when the information is incomplete clarifies some of the ramifications of the von Neumann-Morgenstern formalism: a two-step game, started by the minimizer, who has three choices called 1, 2, and 3. In answer, the maximizer can make one of two moves, called L and R. So there are six possible plays, 1L, 2L, 3L, 1R, 2R, and 3R. After the minimizer's choice, the game is restricted to the plays 1L and 1R if the minimizer plays 1 and to two other corresponding sets if she plays 2 or 3. However, the maximizer, in her turn, may not be informed as to what the minimizer played: if she played 1, the maximizer is told that. In the two other cases the maximizer is not informed what the move was, so the maximizer (from the fact that no information was given) can surmise that either 2 or 3 was played. So, even though the umpire would know that the game has been restricted to (say) 2L and 2R, the maximizer would only know that the game has been reduced to 2L, 2R, 3L, and 3R. After her move, the game reduces to a leaf, which may be either of 2L or 3L if the maximizer plays L or to 2R or 3R if she plays R. So if the minimizer does not play 1, the maximizer chooses her own move on incomplete knowledge. After her move the actual play is determined, however.

In this case the minimizer has three strategies, each determining her first move. The maximizer has four strategies, choosing L or R depending on her state of knowledge. If the values of the plays 1L, 1R, 2L, 2R, 3L, and 3R are 4, 5, -2, 10, 10, and -2, respectively, the payoff matrix for the maximizer is as given in Table 2. The columns indicate the minimizer's choice. The rows correspond to the four maximizer's choices. For example, the strategy LR corresponds to when the maximizer decides to play L in reply to 1 and to play R otherwise. Notice that the matrix has no saddle point. The maximum of the row minima occurs at each row as -2. The minimum of the column maxima is in column 1, at the value 5.
In a game with complete information the minimax value would clearly be 5. As said earlier, most of the ideas described here are not of direct applicability to AI. They appear here merely to place the AI work in context with the rest of game theory. Certain ideas of game theory applicable to AI only use minimax indirectly. These are described in Ref. 2. References 3-5 point out certain limitations of minimax and suggest alternatives and improvements.
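The no-saddle-point claim for Table 2 is easy to verify numerically: the maximum of the row minima (-2) falls short of the minimum of the column maxima (5), so conservative pure strategies leave a gap that only mixed strategies can close.

```python
# Table 2: maximizer's strategies LL, LR, RL, RR (rows) against the
# minimizer's opening moves 1, 2, 3 (columns).
table = {
    "LL": [4, -2, 10],
    "LR": [4, 10, -2],
    "RL": [5, -2, 10],
    "RR": [5, 10, -2],
}
rows = list(table.values())
maximin = max(min(row) for row in rows)        # maximizer's guaranteed floor
minimax = min(max(col) for col in zip(*rows))  # minimizer's guaranteed ceiling
print(maximin, minimax)  # → -2 5  (no saddle point, since -2 != 5)
```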
BIBLIOGRAPHY

1. J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, Princeton, NJ, 1947.
2. R. B. Banerji, Artificial Intelligence: A Theoretical Approach, North-Holland, Amsterdam, 1980.
3. D. S. Nau, "Pathology on game trees revisited, and an alternative to minimaxing," Artif. Intell. 21, 222 (1983).
4. G. M. Baudet, "On the branching factor of the alpha-beta pruning algorithm," Artif. Intell. 10, 173 (1978).
5. G. A. Stockman, "A minimax algorithm better than alpha-beta?" Artif. Intell. 12, 179 (1979).

R. B. Banerji
St. Joseph's College

The preparation of this paper was supported by the National Science Foundation under grant MCS-8217964 and forms a part of an ongoing research on Knowledge Based Learning and Problem Solving Heuristics.

MODAL LOGIC

Modal logic goes back at least to Aristotle, but the current phase of its development begins with the work of the American logician C. I. Lewis (1883-1964) (1), who defined a number of axiomatic systems for it. Modal and related logics like temporal logic, dynamic logic, and the logics of knowledge and belief have recently acquired importance for computer science, largely because of potential applications in AI and program correctness.

Modal logic is concerned principally with the notion of necessity and its companion notion of possibility. A proposition is said to be necessarily true if it could not be the case that it was false. Thus, the propositions "everything that is green is colored" or "two plus two is four" could not possibly have been false (or so it seems) and hence are not just true, but necessarily so. By contrast, the proposition "Ronald Reagan is President of the United States in 1985" is true but might have been false, since he might have lost the election to Mondale. In other words, it is not necessarily true but only contingently so. The formula □A indicates that A is necessary. A proposition is possible if it is not necessary that it be false. Then every true proposition is possible, but not vice versa, and every necessary proposition is true but not vice versa. That A is possible is written ◇A. Note that ◇A is equivalent to ¬□¬A.

Lewis was interested in the notion of necessity because of his dissatisfaction with material- or truth-functional implication. Using → to indicate material implication, it is true that (pigs lack wings) → (you are reading this entry). But there is no connection between the two facts, and Lewis felt that the intuitive notion of implication was not adequately expressed by →. Lewis proposed to remedy this defect by introducing strict implication, where A ⥽ B is an abbreviation for □(A→B). Thus, A strictly implies B if it is impossible that A be true and B false. However, there are paradoxes of strict implication analogous to those of material implication, and it is not clear that Lewis's attempt was wholly successful.

Formalisms for Modal Logic

The language of propositional modal logic is obtained from that of the propositional calculus by adding the operator □ for necessity. The set of formulas of propositional modal logic will then be obtained from some propositional atoms P, Q, etc., by closure under truth-functional connectives (say ¬ and ∨) and the operator □. Note now that under this interpretation the formula □(P∨¬P), which says that necessarily P is true or ¬P is, will be true; but the formula □P∨□¬P, which says that either P is necessarily true or it is necessarily false, may not be true, since P might have been contingently true.

Axiomatic Systems. The system T (sometimes called M) is due essentially to Gödel and Feys (2). It has the language just described above, with axioms consisting of all propositional tautologies (or enough of them) together with the two axiom schemes

(A1) □A→A
(A2) □(A→B)→(□A→□B)
These schemes say, in effect, that every necessary proposition is true and that if one proposition A necessarily implies another, B, the necessity of A implies that of B. The rules of inference are modus ponens (derive B from A and A→B) and necessitation (derive □A from A).

There is a subsystem K of T that lacks axiom scheme (A1) and is used as a basis for deontic logic. In deontic logic the symbol □ would not represent necessity but desirability. In this case the axiom scheme (A1) would be unsuitable, since a proposition that is desired may well be false. Again, in a logic of knowledge, □ will stand for "is known," and such a logic will tend to have axiom scheme (A1), since a proposition cannot be known unless it is true. However, in a logic of belief, □ would stand for "is believed," and (A1) would not be wanted, since false propositions may be believed. [In practice, these logics tend to use symbols other than □ and ◇, but a uniformity of notation makes the comparison easier (see Refs. 3-5 for a discussion of modal logics of knowledge and belief).]

There are two systems stronger than T, namely the systems S4 and S5 of Lewis. S4 is obtained from T by adding the scheme

(A3) □A→□□A

so that all propositions that are necessary are necessarily necessary. It follows at once that all propositions that are possibly possible are possible, i.e.,

(A3′) ◇◇A→◇A

and indeed, (A3) and (A3′) are equivalent. The yet stronger system S5 can then be obtained from S4 by adding the axiom scheme

(A4) ◇A→□◇A

There is a semantics for modal logics, due chiefly to Kripke (6,7), that brings out the differences between the systems in a striking way.

Kripke Models

A Kripke model is based on a frame <W,R>, where W is a set of possible worlds or, more prosaically, a set of states. Individual states are denoted s, t, .... Here R is a binary relation on W called the accessibility relation, and sRt is read "t is accessible from s." Usually, there is a special state r that stands for the real world or the start state. The model M is then obtained from the frame by assigning a truth value to each atom P at each state s. The truth value v(A, s) of an arbitrary formula A at each state s is then defined as

v(A∨B, s) = true iff either v(A, s) = true or v(B, s) = true
v(¬A, s) = true iff v(A, s) = false
v(□A, s) = true iff for all t such that sRt, v(A, t) = true

The models as described above are the K models. In T models the relation R is required to be reflexive. S4 models are obtained by restricting R further, to be also transitive, and for S5 models it must be an equivalence relation, i.e., be reflexive, symmetric, and transitive. Then each of the four axiomatic systems is complete for its particular models. That is, a formula A is a theorem of S, where S is any of the systems K, T, S4, and S5, iff it is true in all S models at all states (6,7).

Temporal Logic

Temporal Logic (with linear time) is a case between S4 and S5. In it W is the set of instants of time, and R is the before-after relation, so that sRt holds just in case t either equals or comes after s. Since R is reflexive and transitive in this case, all theorems of S4 will hold. However, there will also be some additional laws that are not theorems of S4. For example, all formulas ◇□A→□◇A will be valid, but they are not theorems of S4. See Ref. 8 for a computer-science-oriented introduction to temporal logic.

All the systems K, T, S4, and S5 have the finite-model property: if S is any of the four systems, a formula A has an S model iff it has a finite S model. This fact, together with the completeness of the appropriate axioms, yields decidability. However, Ladner (9) has shown recently that the system S5 is NP-complete, whereas the systems K, T, and S4 are PSPACE-complete. Thus, any implemented decision procedure can only decide relatively short formulas.

First-Order Modal Logic

It is easy enough to obtain a language for first-order (or quantified) modal logic. One simply adds the modalities □ and ◇ to the usual language for first-order logic. One can also define a semantics for first-order modal logic as an enrichment of that for propositional modal logic. Frames are as before, but each state s is now a model of ordinary first-order logic, and the semantics of first-order logic can be easily extended to one for first-order modal logic. The new enriched states are now the possible worlds of first-order modal logic.

Unfortunately, there is little agreement on what these possible worlds might be like and hence about the right first-order modal logic. See Ref. 7 for some of the issues that arise. Below is one that has some interest even for nonphilosophers.

The law a = b → (A(a) → A(b)), substitutivity of equals, is a fundamental principle governing identity. However, Quine (10) has pointed out that this law seems to fail in contexts involving modalities, knowledge, or belief. For example, it is true that the number of planets equals 9. It is also true that 9 is necessarily greater than 7. However, it is not true that the number of planets is necessarily greater than 7. Similar examples exist where "necessarily" is replaced by "John knows that" or "Mary believes that." Contexts in which the law fails are sometimes called referentially opaque, whereas those in which it holds are called referentially transparent.

There is an axiomatization of first-order modal logic in constant-domain models, i.e., models in which all worlds have the same individuals, which is due essentially to Ruth Barcan Marcus (7). The axiomatization includes axioms for first-order logic and laws inherited from propositional modal logic of the appropriate kind. It also includes the well-known Barcan formula and its converse. These last two say that the universal
MORPHOTOGY
quantifier commutes with f , so that nVrA(r) is equivalent to VrIA(r) (seeRef. 6 for details). 6.
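The truth conditions above determine a simple recursive evaluation procedure over any finite Kripke model. The following sketch (illustrative Python, not part of the original entry; all names are our own) evaluates formulas built from atoms, ¬, ∨, and □, with ◇ treated as the defined abbreviation ¬□¬:

```python
# Illustrative sketch: recursive evaluation of modal formulas over a finite
# Kripke model.  R is the accessibility relation as a set of pairs (s, t),
# and 'val' maps each atom to the set of states at which it is true.

def holds(formula, s, R, val):
    """Truth value u(A, s) of formula A at state s."""
    op = formula[0]
    if op == "atom":                       # atomic proposition P
        return s in val[formula[1]]
    if op == "not":                        # u(~A, s) = true iff u(A, s) = false
        return not holds(formula[1], s, R, val)
    if op == "or":                         # u(A v B, s): true iff either disjunct is true
        return holds(formula[1], s, R, val) or holds(formula[2], s, R, val)
    if op == "box":                        # u([]A, s): A true at every t with sRt
        return all(holds(formula[1], t, R, val) for (src, t) in R if src == s)
    if op == "dia":                        # <>A abbreviates ~[]~A
        return holds(("not", ("box", ("not", formula[1]))), s, R, val)
    raise ValueError("unknown operator: %r" % op)

# Two states s and t; the atom P is true only at t, and t is the sole
# successor of both states.
R = {("s", "t"), ("t", "t")}
val = {"P": {"t"}}
print(holds(("box", ("atom", "P")), "s", R, val))   # P holds at all successors of s
print(holds(("atom", "P"), "s", R, val))            # but P fails at s itself
```

With this frame, □P holds at s even though P itself does not, which is exactly the separation between truth at a state and truth at its accessible states that the clauses above express.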
Dynamic Logic

Consider a world W whose states s are the possible states of some actual or abstract computer. The formulas A of the language express properties of individual states. Now each program a, considered as an I/O relation, is a binary relation on W and generates a modality [a]. To be precise, let sR_a t mean that there is an execution of the program a that begins in the state s and terminates in t. Then the state s satisfies the formula [a]A if, for every t such that sR_a t, t satisfies A. Since there are infinitely many programs, there are infinitely many accessibility relations and hence infinitely many modalities. These modalities all satisfy the laws of the logic K and other laws that depend on the particular program a. Intuitively, [a]A is the property that has to hold now so that A must hold if and when the program a terminates. The formula ⟨a⟩A, where ⟨a⟩ is ¬[a]¬, says that the property A may hold after a terminates. Since programs are allowed to be nondeterministic, [ ] and ⟨ ⟩ are distinct.

However, there are interactions not only between these modalities and the usual logical notions but also among themselves. For example, if a and b are two programs, and they are composed to form a third program c = a;b, the formula [c]A is equivalent to the formula [a][b]A. A more interesting example expresses a fundamental property of the "while do" construct. If B is a formula, and d is the program "while B do a," then the formula [d]A is equivalent to the formula

(¬B ∧ A) ∨ (B ∧ [a][d]A)

It is now possible to express the partial correctness assertion {A} a {B}, where A and B are formulas and a is a program. The assertion {A} a {B} says that if the formula A holds before the program a begins, then B must hold if and when a terminates. This fact can be expressed by the dynamic logic formula A → [a]B. Thus, dynamic logic becomes an effective tool for studying the properties of programs. However, it also has a potential for applications in the logic of actions and in formalizing legal reasoning, both of which are areas with relevance to AI.

See Refs. 11 and 12 for further reading on dynamic logic and Ref. 13 for an application of dynamic logic to legal reasoning. References 14-16 contain a more detailed treatment of modal logic and various issues connected with it.

BIBLIOGRAPHY

1. C. I. Lewis and C. H. Langford, Symbolic Logic, Dover, Mineola, NY, 1932.
2. K. Gödel, Collected Works, M. Feferman et al. (eds.), Oxford University Press, New York, 1986, pp. 301-302. Originally published as "Eine Interpretation des Intuitionistischen Aussagenkalküls," Ergebnisse eines Mathematischen Kolloquiums 4, 34-38 (1933).
3. J. Hintikka, Knowledge and Belief, Cornell University Press, Ithaca, NY, 1962.
4. J. Halpern and Y. Moses, Knowledge and Common Knowledge in a Byzantine Environment, Proceedings of the Third Annual Symposium on the Principles of Distributed Computing, Association for Computing Machinery, New York, pp. 50-61, 1984.
5. R. Parikh and R. Ramanujam, Distributed Processes and the Logic of Knowledge, in Logics of Programs, Lecture Notes in Computer Science No. 193, Springer-Verlag, New York, pp. 156-168, 1985.
6. S. Kripke, "Semantical considerations on modal logic," Acta Philos. Fenn. 16, 83-94 (1963). Also in Ref. 15, pp. 63-72.
7. G. E. Hughes and M. J. Cresswell, An Introduction to Modal Logic, Methuen, London, 1968.
8. Z. Manna and A. Pnueli, Verification of Concurrent Programs: The Temporal Framework, in Boyer and Moore (eds.), The Correctness Problem in Computer Science, Academic Press, New York, pp. 215-273, 1982.
9. R. Ladner, "The computational complexity of provability in systems of modal propositional logic," SIAM J. Comput. 6, 467-480 (1977).
10. W. V. Quine, Reference and Modality, in From a Logical Point of View, Harper & Row, New York, pp. 139-157, 1961. Also in Ref. 15, pp. 17-34.
11. V. Pratt, Semantical Considerations on Floyd-Hoare Logic, Proceedings of the Seventeenth Annual IEEE Symposium on Foundations of Computer Science, IEEE Computer Society, Piscataway, NJ, pp. 109-121, 1976.
12. D. Harel, First Order Dynamic Logic, Lecture Notes in Computer Science No. 68, Springer-Verlag, New York, 1979.
13. L. McCarty, Permissions and Obligations, Proceedings of the Eighth IJCAI, Karlsruhe, FRG, 1983, pp. 287-294.
14. M. Fitting, Proof Methods for Modal and Intuitionistic Logics, D. Reidel, Boston, MA, 1983.
15. L. Linsky, Reference and Modality, Oxford University Press, New York, 1971.
16. A. Prior, Logic, Modal, in P. Edwards (ed.), Encyclopedia of Philosophy, Vol. 5, Collier-MacMillan, New York, 1967.

R. Parikh
Brooklyn College, City University of New York

MORPHOLOGY

Morphology describes word formation, i.e., inflection, derivation, and compounding. A base form of a word, e.g., reach, can be inflected in a paradigm of forms (reaches, reached, reaching), and new words related to it can be produced using derivational affixes (reachable, reacher, unreachable, etc.) (1). Morphology relies on a lexicon, which contains entries for a set of words. It consists of rules for handling derived and inflected forms by relating them to existing entries in the lexicon.

Aspects of Word Formation

Any AI system using natural language has to recognize inflected words, but the effort required for implementing a morphological component varies greatly from language to language and also depends on the extent of the vocabulary used. English has an extremely simple inflectional system, and natural-language interfaces with a restricted English vocabulary often ignore morphology altogether by listing all distinct word forms in their dictionary (2). Even in the case of English, pronunciation of derivations is phonologically complex enough to deserve a careful treatment in speech synthesis and recognition (3). Other languages like German, Swedish, and Russian have more elaborate inflectional systems, and the listing of all derivations, compounds, and inflectional forms is impractical; in Finnish it is quite impossible because every noun can be
inflected in 2000-odd distinct forms and every verb in some 12,000 forms.

Word formation consists of three parts: specifying the meaning of the resulting entry from the meanings of the components; specifying the components (word roots, derivational and inflectional affixes) and the order in which they may be combined with each other; and the shape in which these components are realized in the actual written or pronounced word form. The meaning can be described in terms of features and values using templates and unification (4). The second task defines the morphotactic structure of words, and the third task consists of rules governing the phonological and morphophonological alternations. For example, in English nominal stems (book, bus, sky) may optionally be followed by a plural suffix (+s):

book+s → books
bus+s → buses
sky+s → skies

The shape of the stem affects the realization of the plural ending (es after s, sh, ch, x, z, y; otherwise s), and the presence of the plural affects the realization of the stem (the final y shows up as i).

Morphological Analysis

Morphological analysis is often carried out with language-specific procedures with little reference to linguistic theories. A straightforward method is to proceed by stripping endings from the end of the word form and by tentatively undoing morphological alterations until a stem in the lexicon is reached (5-7). The problem with these methods is that inflection has to be described in rather artificial terms. The treatment of ambiguous word forms may be defective, compounds written without an intervening space (as in German) need ad hoc procedures, etc. Inflectionally simple languages like English can successfully be handled with these methods, but complex languages such as Sanskrit and Arabic are clearly beyond their scope.

Within linguistics, it has been taken for granted that morphology should be described according to principles universal enough to apply to all natural languages. The formalism of generative phonology satisfies this requirement (8). Generative phonology (see Phonemes) has proved to be difficult to implement as an efficient algorithm (see, however, Ref. 9).

General Methods for Analysis

A computationally feasible general approach to describing the process of word formation is provided by the two-level model (10). It consists of a lexicon system and a rule component. The lexicon system has a set of lexicons, some for the word roots and others for various classes of inflectional and derivational endings. In addition to lexical representations of the units, the system lists the correct sequences in which they may occur. A linkage mechanism using continuation classes handles compounding, derivation, and the stacking of various classes of endings.

The rule component defines how the lexical representations are realized on the surface. All rules operate in parallel without intermediate stages. The realization of sky+s as skies on the surface is governed by rules like y:i and +:e. Rules are compiled into finite-state machines by a compiler, or they can be hand coded directly as finite-state automata. There are several implementations of the two-level model, at least in Pascal and in most dialects of LISP. These programs accept a lexicon system and a set of rules as input and are then ready for analyzing word forms of that particular language. Descriptions exist for several languages (English, French, Finnish, Swedish, Old Church Slavonic, Romanian, Polish). The two-level model is bidirectional, i.e., the programs can both analyze and generate word forms using the same rule automata.

BIBLIOGRAPHY

1. P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, U.K., 1974.
2. T. Winograd, Language as a Cognitive Process, Vol. 1, Syntax, Addison-Wesley, Reading, MA, pp. 544-549, 1983.
3. K. Church, Stress Assignment in Letter to Sound Rules for Speech Synthesis, Proceedings of the Twenty-Third Annual Meeting of the Association for Computational Linguistics, Morristown, NJ, pp. 246-253, 1985.
4. S. Shieber, The Design of a Computer Language for Linguistic Information, Proceedings of the Tenth International Conference on Computational Linguistics (Coling 84), Association for Computational Linguistics, Morristown, NJ, pp. 362-366, 1984.
5. H. L. Resnikoff and J. L. Dolby, "The nature of affixing in written English," Mechan. Transl. 8(3), 84-89 (June 1965); 9(2), 23-33 (June 1966).
6. J. B. Lovins, "Development of a stemming algorithm," Mechan. Transl. 10, 22-31 (1969).
7. M. Kay and G. Martins, The MIND System: The Morphological Analysis Program, Memorandum RM-6265/2-PR, The RAND Corporation, Santa Monica, CA, 1970.
8. C. Sloat, S. H. Taylor, and J. Hoard, Introduction to Phonology, Prentice-Hall, Englewood Cliffs, NJ, 1978.
9. M. Kay, When Meta-rules Are Not Meta-rules, in K. Sparck-Jones and Y. Wilks (eds.), Automatic Natural Language Parsing, Ellis Horwood, Chichester, U.K., pp. 94-116, 1983.
10. K. Koskenniemi, A General Computational Model for Word-Form Recognition and Production, Proceedings of the Tenth International Conference on Computational Linguistics (Coling 84), Association for Computational Linguistics, Morristown, NJ, pp. 178-181, 1984.

K. Koskenniemi
University of Helsinki

MOTION ANALYSIS

However, much simpler linear algorithms are available (54), one of which is described below. Assume there are three 3-D point correspondences

p_i → p'_i,  i = 1, 2, 3

Let

m1 = p2 − p1,  m'1 = p'2 − p'1    (74)
m2 = p3 − p1,  m'2 = p'3 − p'1

Then, from Eq. (73)

m'1 = Rm1,  m'2 = Rm2    (75)

If

m3 = m1 × m2,  m'3 = m'1 × m'2    (76)

(consider m_i and m'_i as vectors), then

m'3 = Rm3    (77)

Combining Eqs. (75) and (77),

[m'1, m'2, m'3] = R[m1, m2, m3]    (78)

whence

R = [m'1, m'2, m'3][m1, m2, m3]⁻¹    (79)

and

T = p'_i − Rp_i,  for i = 1, 2, 3    (80)

Note that the numerical accuracy of this algorithm is usually improved if normalized (to a magnitude of 1) versions of m_i and m'_i are used in the formulation. Two remarks are in order. First, the above algorithm can be used not only for 3-D point correspondences but also for 3-D straight-line correspondences and surface-normal correspondences. In the latter two cases only two correspondences are needed. Second, in the presence of noise in the data (3-D point coordinates), the matrix R obtained from the above algorithm may not be a rotation (i.e., orthonormal and with a determinant of +1).

To obtain accurate estimates from two perspective views, the image resolution has to be quite high (typically 1000 × 1000 picture elements, assuming image-point features can be measured to within one picture element). Theoretical studies or even systematic simulation studies on how the estimation errors depend on various factors are yet to be made. The situation with the two-camera case is somewhat better (52). Some simulation results for the two-camera case are given below to indicate how redundant point correspondences can be used to improve estimation accuracy.

The algorithm of Motion from 3-D Correspondences (above) requires only three 3-D point correspondences. If more than three point correspondences are available, the redundancy can be used to improve estimation accuracy in several ways, two of which are adaptive least squares (56) and RANSAC (59). A hybrid of the two was used in Ref. 52, from which some computer simulation results are quoted. The imaging geometry is as follows: Two pinhole cameras with focal length 28 mm are used, and the two image planes are coplanar; each image is 38 mm × 50 mm and has a resolution of 512 × 512 picture elements. The baseline distance between the two cameras is 400 mm. The 3-D points are chosen randomly in a cube centered at a point 3 m from the cameras, each side of which is 0.75 m long. The true motion is a rotation of 35° about an axis through the origin with direction (0.9, 0.3, 0.316) followed by a translation of (0.8, 0.2, 0.6) m.

The simulation is done as follows. The 3-D points before and after the motion are projected onto the two images. The image coordinates of these points are quantized (with a resolution of 512 × 512). The quantized image points are then used in the method described in Motion from 3-D Feature Correspondences to estimate R and T. That is, triangulation is done using these quantized image points to obtain the 3-D coordinates of the points, which are then used in the algorithm described above. The errors in the estimated R and T are due to the inaccuracies in the 3-D coordinates of the points, which are in turn due to the quantization of the image coordinates. The results are: the average errors (in %) of θ, n1, n2, n3, Δx, Δy, and Δz are, respectively, 5.2, 2.3, 14.5, 8.1, 10.1, 30.7, and 10.7 with seven 3-D point correspondences and 2.2, 1.0, 7.1, 3.1, 4.8, 14.9, and 4.4 with fifteen 3-D point correspondences. For each of the two cases the averages are computed over 100 trials.

Multiple Objects. The methods described in the earlier sections are for a single isolated rigid body. What if the scene contains several rigid bodies moving differently (this includes the special case of a single rigid body moving against a stationary but textured background)? Segmentation needs to be done somewhere along the way. If one is working with the two-view case described in Solution Using Point Correspondences (above) and if the motions of the rigid bodies are small from t1 to t2, the following approach can be tried.

Assuming the motions are small, one can still hope to get correct point correspondences. However, one does not know which points lie on which objects. This one attempts to find by a clustering technique. The basic idea is to take all possible octets from the point correspondences, and for each octet compute R and T using the algorithm described above under A Linear Algorithm. Then clusters are found in the five-dimensional (n1, n2, θ, Δx, Δy)-space. Ideally, each rigid body will give one cluster. To save computation, one uses heuristics (qv) to reduce the number of octets to consider and perhaps does clustering in subspaces of the five-dimensional space. Obviously, the same approach can be used in the binocular case. Here, one only has to deal with triplets. In order to handle the multiple-object case effectively, constraints on the scenario should be used wherever possible. A very impressive piece of work in that direction has been done by Adiv (60).

Nonrigid Objects. Two cases are of particular interest: an articulated object (i.e., an object comprising several rigid parts connected through various joints) and an elastic object. Some aspects of motion analysis of articulated objects have been studied by Asada, Yachida, and Tsuji (61); O'Rourke and Badler (62); and Webb and Aggarwal (63). In particular, Webb and Aggarwal investigated the case where the rotation axis can be assumed fixed in direction throughout the observed image sequence. The same authors (64) have also studied a special case of elastic objects where the object is assumed to be locally rigid, which implies an affine transformation between two image planes under local parallel projection. This approach is being extended by Chen (65) to handle general elastic bodies. Finally, Koenderink and Van Doorn (66) are investigating the special case of bending deformation. The class of bending deformations encompasses all deformations that conserve distances along the surface but not necessarily through space.

Motion Modeling and Prediction. This entry has been concerned mainly with estimating the motion parameters R and T of an object between two time instants t1 and t2 based on image frames taken at these time instants. In most practical problems one is more interested in predicting rather than just estimating motion. In order to predict, one needs a model of the motion that is valid over a number of image frames and contains a small number of parameters that remain constant over these frames. One can first estimate these parameters based on the first few frames and then use these estimated values to predict future motion and hence where the object will be in future frames. One such approach is described in Huang, Weng, and Ahuja (67), where the object has a precessional motion around its center of gravity, which is moving on a polynomial curve (e.g., a parabola) in space.

High-Level Motion Understanding. In many cases the ultimate goal of motion analysis is to come up with a symbolic description of the dynamic scene under study. A complete system can conveniently be thought of as comprising two modules. The first module extracts from the observed raw data (e.g., an image sequence) low/intermediate-level features such as motion and structure parameters. Then the second module arrives at a symbolic description of the dynamic scene by high-level reasoning based on the low/intermediate features as well as other a priori information about the scene.

One can find such complete dynamic scene-analysis systems in the literature in the biomedical area. Two excellent examples are Ref. 68, which describes a rule-based system for characterizing blood cell motion, and Ref. 69, which describes a system for analyzing the motion of left-ventricle walls. In both cases the "scenes" are basically 2-D in nature, and therefore the task of the low/intermediate-level module is greatly simplified.

For truly 3-D scenes a complete dynamic scene-analysis system is hard to construct. The main problem is that the low/intermediate-level features the high-level module needs for its reasoning may be very difficult, if not impossible, to extract from the raw data. In fact, the low/intermediate-level module will probably need help from high-level reasoning to improve its performance. Some impressive examples of high-level modules are Refs. 62, 70, and 71. Reference 70 describes a system that observes traffic scenes and produces natural-language descriptions of them. In particular, the system will recognize and verbalize interesting occurrences (events) in the scene, e.g., one car is overtaking another. Reference 71 describes an expert system for event identification. The applications considered are simple assembly-line tasks. However, in both systems the low/intermediate-level features needed by the high-level modules are furnished at least in part by human operators.

Future Research. To summarize, the following are important research topics in motion analysis:

1. to find robust algorithms for motion estimation,
2. to find algorithms for estimating motion of multiple objects,
3. to find algorithms for estimating motion of nonrigid objects,
4. to find algorithms for predicting motion, and
5. to link and coordinate low/intermediate-level and high-level motion analysis.
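The linear algorithm for motion from 3-D point correspondences, Eqs. (74)-(80) above, can be sketched directly in code. The following is an illustrative Python version (not part of the original entry; all function names are our own), using hand-rolled 3×3 linear algebra and no noise handling:

```python
# Illustrative sketch of the linear algorithm of Eqs. (74)-(80): recover the
# rotation R and translation T of a rigid body from three 3-D point
# correspondences p_i -> q_i, where q_i = R p_i + T.

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def cross(a, b):                                   # vector cross product
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def cols(c1, c2, c3):                              # 3x3 matrix with given columns
    return [[c1[i], c2[i], c3[i]] for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def inverse(M):                                    # adjugate over determinant
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)
    adj = [[e*i - f*h, c*h - b*i, b*f - c*e],
           [f*g - d*i, a*i - c*g, c*d - a*f],
           [d*h - e*g, b*g - a*h, a*e - b*d]]
    return [[x / det for x in row] for row in adj]

def motion_from_3d_points(p, q):
    m1, m2 = sub(p[1], p[0]), sub(p[2], p[0])      # Eq. (74)
    n1, n2 = sub(q[1], q[0]), sub(q[2], q[0])
    m3, n3 = cross(m1, m2), cross(n1, n2)          # Eq. (76)
    R = matmul(cols(n1, n2, n3),
               inverse(cols(m1, m2, m3)))          # Eqs. (78)-(79)
    Rp0 = [sum(R[i][j]*p[0][j] for j in range(3)) for i in range(3)]
    T = sub(q[0], Rp0)                             # Eq. (80)
    return R, T

# Points before and after a 90-degree rotation about z followed by T = (1, 2, 3).
p = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
q = [[1, 3, 3], [0, 2, 3], [1, 2, 4]]
R, T = motion_from_3d_points(p, q)
```

On exact data this recovers R and T directly; on noisy 3-D coordinates the R computed this way is generally not orthonormal, which is the entry's second remark above and the motivation for the least-squares formulations it cites.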
BIBLIOGRAPHY Analysls, Springer-Verlag,Hei1. T. S. Huang (ed.),Image Sequence delberg,FRG, 1981. 2. T. S. Huang (ed.),Image SequenceProcessingand Dynamic Scene Analysls, Springer-Verlag,Heidelb€rg,FRG, 1983. 3. S. Utlman, The Interpretation of Visual Motion, MIT Press,Cambridge, MA, L979. 4. IEEE Trans. PAMI, special issue on Motion and Time-Varying Imagery 2(6),493-588 (November1980). 5. IEEE Comput. Mag., special issue on Computer Analysis of TimeVarying Images 14 (8) pp. 7 -69 (August 1981). 6. Comput. Vis. Graph. I*g. Proc., special issues on Motion and
MOTION ANALYSIS Time-Varying Imagery 2L (1 and 2),I-293 (January and February 1983). 7 . Proceedingsof the Workshop on Computer Analysis of Time-Varying ImagetA, Abstracts, University of Pennsylvania, Moore School of Electrical Engineering, Philadelphia, PA, April L979. 8 . Proceedingsof the ACM Worh,shopon Motion: Representationand Perception,Toronto, Ontario, April 4-6, 1983. 9 . Proceedingsof the IEEE Workshopon Motion: Representationand Analysls, Kiawah Island, SC, May 7-9,1986. 1 0 .J. a. Fang and T. S. Huang, A Corner-Finding Algorithm for Image Analysis and Registration, Pittsburgh, PA, August 18-20, pp. 46-49, L982. 1 1 . H. P. Moravec, Obstacle Avoidance and Navigation in the Real World by a SeeingRobot Rover. Ph.D. Dissertation, Stanford University, September1980. L 2 . H. H. Nagel, "Constraints for the estimation of displacementvector fields from image sequenc€s,"Proc. of the Eighth IJCAI , Karlsruhe, FRG, Aug. 8-L2, 1983;pp. 945-951.
1 3 . R. Kories and G. Ztmmermann, A Versatile Method for the EstiPromation of DisplacementVector Fields from Image Sequences, ceedings of the IEEE Workshop on Motion, Kiawah Island, SC, May 7 -9, 1986.
631
in Ref. ture of a Rigid Body Using Straight Line Correspondences, 2, pp.365-394.
27. Y. C. Liu and T. S. Huang, "Estimation of rigid body motion using straight-line correspondenc€s,"Proceedingsof the IEEE Workshop and Analysis, May 7-9,1986, Kiawah on Motion: Representa,tion Island, SC, pp. 47-52.
28. R. O. Duda and P. E. Hart, Pattern Classificationand SceneAnalysis,Wiley, New York, p. 373, 1973.
29. J. K. Cheng and T. S. Huang, "Image registration by matching relational structures,"Patt. Recog.17(1), 149-160 (1984). 3 0 . A. Mitiche, S. Seida, and J. K. Aggarwal, Line-BasedComputation of Structure and Motion Using Angular Invariance, Proceedings of the IEEE Workshopon Motion: Representationand Analysls, Kiawah Island,SC, May 7-9,1986, pp. 175-180.
3 1 .J. P. Gambotto and T. S. Huang, Motion Analysis of IsolatedTargets in Infrared Image Sequences,Proceedings of the Seuenth ICPR, Montreal, Quebec,July 30-August 2, L984. 32. K. Kanatani, "Detecting the motion of a planar surface by line and surface integrals," Comput. Vis. Graph. Img. Proc.29, 13-22 (1e85).
3 3 . T. Y. Young and Y. L. Wang, "Analysis of 3-D rotation and linear shape changes,"Patt. Recog.Lett. 2r 239-242 (L984).
14. J. Q. Fang and T. S. Huang, "Some experiments on estimating the 3-D motion parameters of a rigid body from two consecutive image frames," IEEE Trans. PAMI 6(5), 547-554 (September 1984).
34. J. A. Orr, D. Cyganski, and R. Yaz, Determination of Affine
18. R. Y. Tsai and T. S. Huang, "Uniqueness and estimation of 3-D motion parameters of rigid bodies with curved surfaces,"IEEE Trans. PAMI 6(1), L3-27 (January 198a). 19. B. L. Yen and T. S. Huang, Determining 3-D Motion Parameters of a Rigid Body: A Vector-Geometric Approach," Proceedingsof the ACM Workshopon Motion, Toronto, Ontario, April 1983.
38. J. Limb and J. Murphy, "Estimating the velocity of moving im-
Transforms from Object Contours with No Point Correspondence Information, Proceedings of the /CASSP 85, Tampa, FL, March 26-29, 1985,pp. 24.10.1-4. 15. W. K. Gu, J. Y. Yang, and T. S. Huang, Matching Perspective 3 5 . D. Cyganski and J. A. Orr, 3-D Motion Parametersfrom Contours Using a Canonic Differential, Proceedings of the /CASSP 85, Views of a 3-D Object Using Composite Circuits, Proceedingsof Tampa, FL, March 26-29, pp. 24.9.I-4. the SeuenthICPR, July 30-August 2, 1984. 16. A. Mitiche and J. K. Aggarwal, A Computational Analysis of 36. K. Kanatani, "Tracing planar surface motion from a projection without knowing the correspondence,"Comput. Vis. Graph. I*9. Time-Varying Images,in T. Y. Young and K. S. Fu (eds.),HandProc. 29, L-12 (1985). book of Pattern Recognition and Image Processing, Academic Press,New York, 1985. 3 7 . F. Rocca,TV Bandwidth CompressionUtilizing Frame-to-Frame Correlation and Movement Compensation, in T. S. Huang and 17. H. C. Longuet-Higgins, "A computer program for reconstructing a O. J. Tretiak (eds.),Picture Bandwidth Compression,Gordon and scene from two projections," Nature 293, 133-135 (September Breach,London, L972. 1981).
20. X. Zhu&trg, T. S. Huang, and R. M. Haralick, "Two-view motion analysis: A unified algorithm," J. Opt.Soc. Am. 3(9), 1492-1500 (September1986). 2L. T. S. Huang, Determining 3-D Motion/Structure from Two PerspectiveViews, in T. Y. Young and K. S. Fu (eds.),Handbook of Pattern Recognition and Image Processing,Academic Press, New York, 1985. 22. H. C. Longuet-Higgins, The Reconstruction of a Scenefrom Two Projections-Configurationsthat Defeat the 8-Point Algorithm," Proceedingsof the First Conferenceon Artifi,cial IntelligenceApplications,Denver, CO, Decembet5-7,1984, pp. 395-397. 23. R. Y. Tsai and T. S. Huang, "Estimating 3-D motion parametersof a rigid planar patch," IEEE Trans. ASSP 29(7),IL47-L152 (De-
agesin TV signals," Comput.Graph.Img. Proc.4,3LL-327 (1975).
39. B. K. P. Horn and B. G. Schunck, "Determining optical flow," Artif. Intell. 17, 185-203 (1981).
40. J. D. Robbins and A. N. Netravali, RecursiveMotion Compensation: A Review, in Ref. 2.
4r. C. Cafforio and F. Rocca,The Differential
Method for Image Mo-
tion Estimation, in Ref. 2.
42. H. H. Nagel, "Displacement vectorsderived from 2nd-orderintensity variations in image sequences,"Comput. Vis. Graph. Img. Proc. 21, 85-117 (January 1983).
43. E. C. Hildreth, The Measurement of Visual Motion, MIT Press, Cambridg", MA, 1984. 44. K. Prazdny, "Egomotion and relative depth map from optical flow," Biol. Cybernet.36, 87 -102 (1980). 45. X. Zhuang, R. M. Haralick, and J. S. Lee, "Rigid body motion and the optic flow under a small perturbation," IEEE Trans. PAMI, in press.
46. H. C. Longuet-Higgins, "The visual ambiguity of a moving plane," cember1981). Proc. Roy.Soc. SeriesB 223(D, 165-170 (1984). 24. R. Y. Tsai, T. S. Huang, and W. L. Zhu, "Estimating 3-D motion parameters of a rigid planar patch, II: Singular value decomposi- 4 7 . A. M. Waxman and S. Ullman, Surface Structure and 3-D Motion tion," IEEE Trans. ASSP 30(4), 525-523 (August 1982); correcfrom Image Flow: A Kinematic Analysis, CAR-TR-?A, CS-TRtion, 31(2), 5L4 (April 1983). L332, Center for Automation Research,University of Maryland, October 1983. 25. R. Y. Tsai and T. S. Huang, "Estimating 3-D motion parametersof a rigid planar patch, III: Finite point correspondencesand the 48. A. M. Waxman and K. Wohn, Contour Evolution, Neighborhood three-view problem," IEEE Trans. ASSP 32(2), 2I3-22A (April Deformation, and Global Image Flow: Planar Surface in Motion, 1984). CAR-TR-58, CS-TR-L394,Center for Automation Research,University of Maryland, April 1984. 26. B. L. Yen and T. S. Huang, Determining 3-D Motion and Struc-
632
MS. MALAPROP
49. A. M. Waxman and K. Wohn, Contour Evolution, Neighborhood Deformation and Image Flow: Textured Surfaces in Motion, in W. Richards and S. Ullman (eds.), Image (Jnderstanding ,94. Ablex, Norwood,NJ, 1984. 50. T. S. Huang and R. Y. Tsai, Image SequenceAnalysis: Motion Estimation, in Ref. 1. 51. T. S. Huang, Three-Dimensional Motion Analysis by Direct Matching, ConferenceDigest, Optical Society of America Topical Meeting on Computer Vision,Incline Village, Nevada, March 2022, L985,pp. FAI-I to 4. 52. T. S. Huang and S. D. Blostein, Robust Algorithms for Motion Estimation Basedon Two Sequential StereoImage Pair s, Proceed,ings of the Conferenceon Computer Vision and Pattern Recognition, San Francisco,CA, June 10-18, 198b. 53. H. H. Chen and T. S. Huang, Maximal Matching of Two 3-D Point Sets, Proceedings of the Internationql Conferenceon Pattern Recognition, Paris, France, October 27-.31,1986. 54. S. D. Blostein and T. S. Huang, Estimating Motion from Range Data, Proceedingsof the First Conferenceon AI Applications, Denver, CO. December1984, 55. O. D. Faugeras and M. Hebert, A 3-D Recognitionand Positioning Algorithm Using Geometrical Matching between Primitive Surfaces,Proceedingsof the Eighth International Joint Conferenceon Artificial Intelligence, Karlsruhe, FRG, August 1989, pp. 996LTO?. 56. T. S. Huang, S. D. Blostein, and E. A. Margeruh, Least-squares Estimation of Motion Parameters from 3-D Point Correspondences,Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition, Miami Beach, FL, Jun e 28-26, 1986. 57. J. Q. Fang and T. S. Huang, "solving 3-D small-rotation motion equatiors," Comput. Vis. Graph. Img. Proc.26, 189-296 (1984). 58. J. Q. Fang and T. S. Huang, "Some experimentson estimating the 3-D motion parameters of a rigid body from two consecutiveimage frames," IEEE Trans. PAMI 6(5), 547-55b (September1984). 59. M. A. Fischler and R. C. 
Bolles, "Random sample consensus:A paradigm for model fitting with applications to image analysis and automated cartography," CACM Z4(G),881-g9b (June 1g81). 60. G. Adiv, "Determining 3-D motion and structure from optical flow generated by several moving objects,"IEEE Trans. PAMI 7(4), 384-401 (July 1985). 61. M. Asada, M. Yachida, and S. Tsuji, "IJnderstanding of B-D motions in blocks world," Pattern Recognition,l7(L), 57-84 (1984). 62. J. O'Rourke and N. Badler, "Model-basedimage analysis of human motion using constraint propagation,"IEEE Trans. ?AMI z, 522-536 (1980).
70. B. Neumann, Natural Language Description of Time-Varying Scenes,Bericht No. 105, FBI-HH-B-105/84, August 1984, Fachberich Informatik, University of Hamburg, FRG. 7L. G. C. Borchardt, A Computer Model for the Representationand Identification of Physical Events, Technical Report T-L42, Coordinated ScienceLaboratory, University of Illinois, Urbana, IL, May 1984. T. S. Huamc University of Illinois The preparation of this entry was supported by Scientific Services Program, Battelle Columbus Laboratories contract DAAG29-81-D0100.
MS. MATAPROP This is a natural-language-understanding (qv) system in which the inference (qt) process is directed by frame-structured knowledge. Knowledge about mundane situations is structured in a modular hierarchy of frames, thereby allowing sharing of information between frames (see Frame theory). Developedaround L977 by Charniak at the University of Geneva (see E. Charniak, Ms. Malaprop, a Language Comprehension Progr&ffi, Proceedings of the Fifth IJCAI, Cambridge, MA, pp. L-7, L977). K. S. Anone SUNY at Buffalo
MULTISENSOR I NTEGRATION
Object-recognition systems using single sensors (typically vision) are still limited in their ability to correctly recognize different three-dimensional objects. By utilizing multiple sensors, in particular vision (qv) and touch, more information is available to the system. This entry is an attempt to show the utility of multiple sensors and explore the problems and possible solutions to converging disparate sensory data for object recognition (see also Color vision; Motion analysis; Proximity sensing; Sensors).

Humans are able to make use of multiple-sensor input to perform such tasks as object recognition very easily. In trying to recognize an object, one is able to integrate color, motion, touch, shape, and language. The disparate kinds of information supplied by these sensors are somehow able to converge into a coherent understanding of the objects perceived in a scene. Robotic systems (see Robotics) will eventually have to incorporate this multisensor capability (1). Single robotic sensors [e.g., vision and, even more so, touch (2)] are yet to be well understood and utilized on anything approaching a human scale. So why should one bother with more than one sensor? The answer is that multiple sensors can provide information that is difficult to extract from single-sensor systems. Further, multiple sensors can complement each other to provide better understanding of a scene. Similar issues are addressed by Henderson and Fai (3). The additional complexity posed by multiple-sensor environments is tempered by the great rewards in resolving the ambiguity that more than one sensor can bring.
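The complementary-sensor argument can be made concrete with a minimal sketch, assuming an invented model database and invented feature names: vision alone leaves two candidates, and a single touch-derived feature resolves the ambiguity.

```python
# Sketch of resolving recognition ambiguity by combining vision and touch.
# The model database and all feature names/values are invented for illustration.

models = {
    "cup": {"silhouette": "cylinder", "has_handle": True},
    "can": {"silhouette": "cylinder", "has_handle": False},
    "box": {"silhouette": "rectangle", "has_handle": False},
}

def consistent(evidence):
    """Return the models for which no piece of evidence is contradictory."""
    return [name for name, feats in models.items()
            if all(feats.get(k) == v for k, v in evidence.items())]

# Vision can extract the silhouette but cannot feel an occluded handle.
vision_evidence = {"silhouette": "cylinder"}
print(consistent(vision_evidence))   # ['cup', 'can']  (ambiguous)

# A tactile probe of the occluded side supplies the missing feature.
touch_evidence = dict(vision_evidence, has_handle=True)
print(consistent(touch_evidence))    # ['cup']  (resolved)
```

The candidate set shrinks monotonically as evidence accumulates, which is the sense in which the sensors complement rather than duplicate one another.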
The utilization of multiple sensors presents five important issues for object recognition. They are representations for object models, organization of the database of models, accessing the database of models, strategies for using sensors, and convergence of sensory data. This entry is an exploration of these issues involved in converging multiple-sensor data for object recognition. Needless to say, these issues are of interest to both the robotics and AI communities. Recent proceedings of conferences on AI, as well as on automation and robotics, are a manifestation of this point. The discussion focuses on possible solutions to these problems using as an example the limited domain of the kitchen.

Representations for Object Models

In model-based object recognition the sensed data must be related to the object models at hand. If a multiple-sensor environment is postulated, one needs to have multiple representations of objects. The nature of the data sensed from vision and touch is quite distinct and suggests different representational models at work (2,4). Many different object model representations have been used in the past, including generalized cylinders (qv) or cones (5-7), polyhedra (8), and curved-surface patches (3). These systems, in general, try to compute these primitives alone from the sensed data. A major difference between systems is the richness of the models in their databases. Systems that contain large amounts of information about object structure and relationships reduce the number of false recognitions. All of these systems are discrimination systems that attempt to find evidence consistent with a hypothesized
model and for which there is no contradictory evidence (9). Although the approach discussed here is similar, it is not based on a single primitive but on multiple features and surface properties that the sensors can derive. The models used contain geometric, topological, and relational information about the objects to be recognized. Semantic discrimination similar to the net described in Ref. 10 is also an important candidate for inclusion in such a multiple representation system but is beyond the scope of this entry.

In human perception this discrimination is done in many different ways. One might perceive a unique feature, shape, or topology that will lead down a path of recognition (see Feature extraction; Shape analysis). It is not clear a priori what the path will be. For this reason one cannot rely on a single representation from a single sensor as the mechanism for recognition. Rather, one must leave the system open and available to follow any representational avenue presented to it from the multiple-sensed data. It is important that one tries not to impose arbitrary hierarchies on these representations that will limit the strategies that can be used with the sensors. One should be as aggressive and opportunistic as possible in exploring multiple paths toward recognition.

Sensor Environment. Multiple sensors provide the opportunity to discriminate between objects based on features that are derived from different sources. The sensing environment shown in Figure 1 and described in detail in Ref. 11 consists of a stereo pair of CCD cameras along with a robot manipulator containing a tactile sensor (see Stereo vision; Manipulators). The tactile sensor mounted on a finger and on a hand is shown in Figures 2a and 2b, respectively. Recognizing the strengths and weaknesses of each sensor system is important in order to effectively utilize them (see Fig. 3). The cameras are capable of extracting sparse three-dimensional (3-D) data from the scene.
The robot manipulator receives feedback from the touch sensor, allowing it to trace surfaces subject to varying sets of
Figure 1. System overview: the database, vision system (stereo pair), and hand/arm system (PUMA 560 with tactile sensor) are coordinated by a control system to produce an output description.
constraints. The vision is passive in nature and fast, with large bandwidth, whereas the touch sensor is slow, has low bandwidth, and must be actively controlled. Vision is subject to the vagaries of lighting, reflectance, and occlusion. Touch, however, can feel occluded surfaces and report back 3-D world coordinates as well as surface normals.

Figure 2. The tactile sensor on (a) a finger and (b) a hand.

Features for Discrimination. Given this sensing environment, one should be able to derive features for discriminating among objects. The desirable features for the model are gross shape descriptors and surface properties and topology. Gross shape descriptors are important because they limit the search space within the database (12). The most important gross feature to be distinguished is the planarity of an object. If an object appears to be planar, there are different representations and modeling techniques available for recognition than if it is three-dimensional. Determining if an object is planar is difficult with vision alone, but by tracing across the object and analyzing the 3-D data, one can easily determine planarity. A tactile trace across the contour of the object in two orthogonal directions allows one to interpolate a surface and test for planarity. In the kitchen domain planarity is important in distinguishing flatware and plates from 3-D objects such as cups and glasses. Gross shape descriptors can be further extended to 3-D objects by computing volumes, bounding parallelepipeds, and orthogonal slices across the object. Although these descriptors are by themselves not sufficient for recognition, their use with other sensed features allows further discrimination among competing models. Failure to find a gross shape description will cause a greater space to be searched but will not prevent recognition since multiple pathways of sensory recognition are available.

Surface properties are an extremely useful discriminator and are especially important with touch sensing. By modeling the objects as collections of surfaces, it is possible to match a rich set of surface descriptions. The surface characteristics to be computed are area, curvature (including surface cavities), and 3-D moments (if a closed surface). Area is a weak discriminator but will add support to hypotheses. The other measures are much stronger surface characteristics and will narrow the range of possibilities greatly. Holes are distinguishable features that can be measured with touch. Vision processing can hypothesize holes, and touch sensing can be used to verify their existence. Further, touch sensing can quantify the holes to aid in matching. This approach emphasizes computing as many features as possible from the different sources to come up with a consistent set of interpretations of the data. The features range from weak descriptors like gross shape and area to specific descriptors such as surface curvature, holes, and cavities. The conjunction of this sensed data will lead to a correct interpretation. It is important to note that the measures are three-dimensional. This allows one to utilize 3-D features of objects rather than projective features, as Lowe does (6).
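The tactile planarity test just described (trace the contour in two orthogonal directions, then check whether the gathered 3-D points fit a plane) can be sketched as a least-squares plane fit. The tolerance value and sample points below are assumptions for illustration, not parameters of the system described here.

```python
# Sketch of a planarity test over 3-D contact points from tactile traces.
# Fits z = a*x + b*y + c by least squares and checks the worst residual.
# The tolerance and sample points are illustrative assumptions.
import numpy as np

def is_planar(points, tol=1e-3):
    """points: sequence of (x, y, z) contact points from two orthogonal traces."""
    pts = np.asarray(points, dtype=float)
    A = np.c_[pts[:, 0], pts[:, 1], np.ones(len(pts))]
    coeffs, *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
    residual = np.abs(A @ coeffs - pts[:, 2]).max()
    return bool(residual < tol)

# A flat plate-like trace versus a trace with a raised center:
flat   = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0.5, 0.5, 0.0)]
curved = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0.5, 0.5, 0.4)]
print(is_planar(flat))    # True
print(is_planar(curved))  # False
```

A plane of the form z = ax + by + c cannot represent vertical surfaces, which is acceptable here only because the traces are taken across the top of the object.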
Organization of the Model Database

If multiple representations are used as stated above, these representations need to be organized in a coherent way for access to the model information. The important points to consider here are the relationships between different representations and allowing these different representations to converge (as discussed below). The database of object models consists of object records. Each object record contains a vector of features that can be sensed. An object record contains the following information: object name, gross shape, list of surface descriptions that comprise the object, list of holes, list of cavities, and list of boundary curves joining the surfaces in the record.

Gross Shape. Gross shape properties in the object record include the volume of the object, a measure of its planarity, and a description of its bounding rectangular parallelepiped. These properties allow a coarse filtering of the objects to be recognized.

Surface Descriptions. The surface descriptions may be planar, quadric, or bicubic in nature, allowing a wide variety of surface models to be used (as from CAD/CAM systems). The objects in the database are modeled as collections of these surfaces. Besides a parameterized description of the surface, the surface's area and locations of curvature maximum and minimum are included. If the surface is a closed surface that contains a volume, a 3-D moment set is also included for the surface. An example would be the handle of a cup. This is a very powerful feature because it provides the center of mass of the enclosed volume of the surface and the moments of inertia, which form an orthogonal basis (13). The center of mass allows one to find the translational parameters that take one from
Figure 3. The strengths and weaknesses of the vision and touch sensing systems.
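The object record and coarse gross-shape filtering described under Organization of the Model Database can be sketched as a simple data structure. The field names follow the entry's description, while every value and tolerance below is an invented illustration from the kitchen domain.

```python
# Sketch of the object-record database and coarse gross-shape filtering.
# Field names follow the entry's description; all values are illustrative.
from dataclasses import dataclass, field

@dataclass
class ObjectRecord:
    name: str
    volume: float              # gross shape: enclosed volume
    planar: bool               # gross shape: planarity (simplified to a flag)
    bounding_box: tuple        # bounding rectangular parallelepiped (dx, dy, dz)
    surfaces: list = field(default_factory=list)   # parameterized surface patches
    holes: list = field(default_factory=list)
    cavities: list = field(default_factory=list)

database = [
    ObjectRecord("plate", volume=50.0,  planar=True,  bounding_box=(25, 25, 2)),
    ObjectRecord("cup",   volume=300.0, planar=False, bounding_box=(9, 12, 10),
                 holes=["handle opening"]),
    ObjectRecord("glass", volume=250.0, planar=False, bounding_box=(7, 7, 15)),
]

def coarse_filter(db, planar, volume, vol_tol=0.25):
    """Coarse filtering on gross shape before detailed surface matching."""
    return [r for r in db
            if r.planar == planar
            and abs(r.volume - volume) <= vol_tol * r.volume]

# A sensed non-planar object of roughly 280 volume units:
print([r.name for r in coarse_filter(database, planar=False, volume=280.0)])
# ['cup', 'glass']
```

The surviving candidates would then be discriminated further by the stronger surface features (curvature, holes, cavities, 3-D moments) discussed above.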
NATURAL-LANGUAGE GENERATION
ence later decisions. In this instance the abstract linguistic properties doing the constraining are not already bundled and formed as a phrase structure but are distributed as a feature space. The overall specification of the text is determined in recursive layers top-down, as it is in nearly all of the approaches (the prime exception being systems that use phrasal lexicons). Features are accumulated at a given level, e.g., the main clause of a sentence, until all of the aspects in which clauses can vary have been considered and the options settled. During this phase the issue is what functions are appropriate for the clause to carry out given the situation and the speaker's intentions; with those determined, the functional features are realized as a group and specify the clause's form. That form now creates an environment for the constituents of the clause. The determination of what functions each of them should serve is then carried out and, when completed, will lead to the realization of their forms, which in turn will lead to a functional analysis of their own constituents, and so on recursively until the constituents are words, at which point the text is read out as it would be from the description constructed with a FUG.

As a linguistic tradition, systemic grammar owes its form and perspective principally to one person, Halliday (57), who was himself influenced by the London School of functionalism led by Firth (47). The influence of systemic grammar on generation research is considerably wider than just the systems that employ it directly since it is the sole well-known linguistic formalism that has as its very basis the identification of the choices implicit in a language.
Choices form the notational basis of systemic grammars, which, like ATNs, are written as traversable graph structures that define the space of possible control flow for at least the linguistic portion of the generation process. The very small fragment of a grammar shown in Figure 3 illustrates how the graph is formed. Choice systems are given either as AND paths (leading curled brace), where one choice must be made from each of the systems named on the right, or as OR paths (leading square bracket), where only one of the alternative features listed may be selected. The selection of a feature opens the system that it names (note: the feature will be the leftward "root node" of the tree on its side that constitutes a system within the network), which means that a choice from that system must now be made. Choices continue as the locus of control moves left to right through the network (usually simultaneously active in several choices at once due to the presence of the AND systems), until a rightmost system is reached that consists of a
bare feature without an accompanying system. These rightmost nodes are the concrete elements from which specifications of form are built up. Leftward-pointing curled braces indicate path mergers in the control flow, where decisions in disjoint systems have a combined influence.

Two important generation systems have been based on systemic grammar, Davey's PROTEUS (4) (discussed earlier) and Mann and Matthiessen's NIGEL (1,58). NIGEL is the largest systemic grammar in the world and very likely one of the largest machine grammars of any sort. Besides the quite important contribution simply of articulating a systemic grammar so thoroughly, Mann and Matthiessen have developed an original technique for formalizing the usage criteria that govern the choices the grammar defines (59). A set of criterial predicates is defined for each choice system in the grammar, which act as functions from the internal state of the planner and underlying program to features. The generation process is carried out by starting at the leftmost entry system of the network and applying successive "chooser" procedures to determine the path through the network (i.e., the feature set) that best captures the speaker's intentions.

Other Research Areas

The field of natural-language generation, even as seen only by researchers in AI, is considerably larger than this entry has been able to accommodate. Two areas must at least be mentioned in passing.

Planning. Pioneering work by Appelt (2,14) supplied a rigorous logical framework by which to encode basic notions such as intention and reference. His planning technique, the progressive elaboration of goals through the use of Sacerdoti's procedural networks formalism (60), builds on a tradition of viewing the articulation of a generator's goals by chaining backward from fundamental communications goals (49,61).
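The network traversal with chooser procedures described above can be sketched as follows. The tiny network fragment and the chooser logic are invented for illustration and are not taken from NIGEL; they only exhibit the control regime of AND systems (enter every subsystem) and OR systems (a chooser selects one feature, which may itself open a further system).

```python
# Sketch of traversing a systemic network with "chooser" procedures.
# AND systems open every subsystem; OR systems ask a chooser to pick one
# feature, whose own system (if any) is then entered. Rightmost bare
# features accumulate into the feature set describing the clause.
# The fragment below is invented for illustration, not NIGEL's grammar.

network = {
    "clause": ("AND", ["mood", "theme"]),
    "mood": ("OR", ["declarative", "imperative"]),
    "theme": ("OR", ["unmarked", "thematized"]),
}

def traverse(system, choosers, features):
    kind, options = network[system]
    if kind == "AND":                      # enter every subsystem
        for sub in options:
            traverse(sub, choosers, features)
    else:                                  # OR: a chooser selects one feature
        choice = choosers[system]()
        features.append(choice)
        if choice in network:              # the feature may open a new system
            traverse(choice, choosers, features)

# Choosers act as functions from the speaker's intentions to features.
intentions = {"is_command": False, "topic_fronted": True}
choosers = {
    "mood": lambda: "imperative" if intentions["is_command"] else "declarative",
    "theme": lambda: "thematized" if intentions["topic_fronted"] else "unmarked",
}

features = []
traverse("clause", choosers, features)
print(features)  # ['declarative', 'thematized']
```

The accumulated feature set is what would then be realized as a group to specify the clause's form.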
From a complementary direction, McKeown has presented a theory of the organization of paragraphs into groups of conversational moves (8), drawing on earlier work by Grimes (62). She employs paragraph schemas as realizations of high-level moves such as "compare and contrast." The schemas act as templates to organize the content selection and rhetorical structuring that the planner does.

Figure 3. A small fragment of a systemic grammar network for the Clause, with choice systems including Transitivity, Mood (e.g., imperative), Theme (thematization), and Cohesion (cohesive: anaphoric, elliptical, conjunct; noncohesive: nonconjunct) (from Ref. 26).

Psycholinguistic Theory. Once there are generation systems that have a significant capability, it becomes possible to consider deliberately chosen restrictions on the power of the virtual computational engine underlying the system's capacity. Such restrictions gain the possibility of providing an explanatory account of aspects of the human generation process by appealing to intrinsic properties of the machine that make it impossible for its behavior to be otherwise. There has been work toward this end by Kempen and Hoenkamp for restarting phenomena (51) and by McDonald for an account of people's fluency and lack of grammatical error and certain classes of speech errors (50).

Generation is a young research area. It is populated by a vigorous, mutually identifying group of researchers that is growing at an ever-increasing rate. The intellectual climate within the generation community is not unlike that of the language-understanding community of about 1974, with a roughly similar number of players and a similar feeling in the air of significant things happening. There is every reason to believe that the further development and contributions of generation research to AI as a whole in the next 12 years will be every bit as large as the contribution of understanding research in the last 12.
BIBLIOGRAPHY

1. W. Mann and C. Matthiessen, Nigel: A Systemic Grammar for Text Generation, in Freedle (ed.), Systemic Perspectives on Discourse: Selected Theoretical Papers of the Ninth International Systemic Workshop, Ablex, Norwood, NJ, 1985.
2. D. Appelt, Planning English Sentences, Cambridge University Press, Cambridge, UK, 1985.
3. M. Bates and R. Ingria, Controlled Transformational Sentence Generation, Proceedings of the ACL, Stanford, CA, 1980.
4. A. Davey, Discourse Production, Edinburgh University Press, Edinburgh, UK, 1979.
5. J. Clippinger, Meaning and Discourse: A Computer Model of Psychoanalytic Speech and Cognition, Johns Hopkins Press, Baltimore, MD, 1977.
6. K. Kukich, Knowledge-Based Report Generation: A Knowledge Engineering Approach to Natural Language Report Generation, Ph.D. Thesis, Information Science Department, University of Pittsburgh, 1983.
7. D. McDonald, Description-Directed Natural Language Generation, Proceedings of the Ninth IJCAI, Los Angeles, CA, pp. 799-805, 1985.
8. K. McKeown, Text Generation, Cambridge University Press, Cambridge, UK, 1985.
9. N. Goldman, Conceptual Generation, in R. Schank (ed.), Conceptual Information Processing, North-Holland/Elsevier, Amsterdam, pp. 289-372, 1975.
10. D. McDonald, Natural Language Generation as a Computational Problem: An Introduction, in M. Brady and R. Berwick (eds.), Computational Models of Discourse, MIT Press, Cambridge, MA, pp. 209-266, 1983.
11. W. Mann and J. Moore, "Computer generation of multi-paragraph English text," JACL 7(1) (1981).
12. R. Wilensky, Y. Arens, and D. Chin, "Talking to UNIX in English: An overview of UC," CACM, 577-593 (June 1984).
13. H. Thompson, Strategy and Tactics: A Model for Language Production, Proceedings of the Chicago Linguistic Society, 1977.
14. D. Appelt, Problem Solving Applied to Language Generation, Proceedings of the ACL, Philadelphia, PA, pp. 59-63, 1980.
15. L. Danlos, Conceptual and Linguistic Decisions in Generation, Proceedings of the COLING, Stanford, CA, pp. 501-504, 1984.
16. W. Mann, M. Bates, B. Grosz, D. McDonald, K. McKeown, and W. Swartout, "Text generation: The state of the art and literature," JACL 8(2) (1982).
17. T. Winograd, Understanding Natural Language, Academic Press, New York, 1972.
18. K. Forbus and A. Stevens, "Using qualitative simulation to generate explanations," Proc. of the Third Annual Conf. of the Cog. Sci. Soc., Berkeley, CA, August 1981, pp. 219-221.
19. C. Frank, A Step Towards Automatic Documentation, MIT AI Laboratory WP-213, 1980.
20. B. Sigurd, Computer Simulation of Spontaneous Speech Production, Proceedings of COLING, Stanford, CA, July 1984.
21. D. McDonald, Subsequent Reference: Syntactic and Rhetorical Constraints, in Theoretical Issues in Natural Language Processing II, Association for Computing Machinery, New York, pp. 38-47, 1978.
22. R. Granville, Controlling Lexical Substitution in Computer Text Generation, Proceedings of the COLING, Stanford, CA, pp. 381-384, 1984.
23. W. Woods, "Transition network grammars for natural language analysis," CACM 13(10), 591-606 (1970).
24. S. C. Shapiro, "Generalized augmented transition network grammars for generation from semantic networks," JACL 8(1), 12-25 (1982).
25. G. Brown, Some Problems in German to English Machine Translation, MIT LCS TR 142, 1974.
26. M. A. K. Halliday and J. Martin (eds.), Readings in Systemic Linguistics, Batsford Academic, London, 1981.
27. D. McDonald, A Preliminary Report on a Program for Generating Natural Language, Proceedings of the Fourth IJCAI, Tbilisi, Georgia, pp. 401-405, 1975.
28. S. Shapiro, "Generation as parsing from a network into a linear string," JACL Fiche 33, 45-62 (1975).
29. B. Bruce, Generation as Social Action, Proceedings of TINLAP-1, ACM, pp. 74-78, 1975.
30. J. Clippinger, Speaking with Many Tongues: Some Problems in Modeling Speakers of Actual Discourse, Proceedings of TINLAP-1, ACM, pp. 68-73, 1975.
31. N. Goldman, The Boundaries of Language Generation, Proceedings of TINLAP-1, ACM, pp. 74-78, 1975.
32. W. Swartout, A Digitalis Therapy Advisor with Explanations, MIT LCS Technical Report, Cambridge, MA, 1977.
33. W. Clancey, Tutoring Rules for Guiding a Case Method Dialog, IJMMS 11, pp. 25-49, 1979.
34. D. Chester, "The translation of formal proofs into English," Artif. Intell. 8(3), 261-278 (1976).
35. D. McDonald and J. Pustejovsky, TAGs as a Grammatical Formalism for Generation, Proceedings of the ACL, Chicago, July 1985, pp. 94-103.
36. M. Kay, Functional Unification Grammar: A Formalism for Machine Translation, Proceedings of COLING, Stanford, CA, July 1984, pp. 75-78.
37. E. Hovy, Integrating Text Planning and Production in Generation, Proceedings of the Ninth IJCAI, Los Angeles, August 1985, pp. 848-851.
38. J. Becker, The Phrasal Lexicon, Proceedings of TINLAP-1, ACM, pp. 60-64. Also as Bolt Beranek and Newman Report 3081, Cambridge, MA.
39. P. Jacobs, PHRED: A Generator for Natural Language Interfaces, Berkeley Computer Science Department, TR 85/198, 1985.
40. J. Friedman, "Directed random generation of sentences," CACM 12(6), 40-46 (1969).
41. V. H. A. Yngve, A Model and a Hypothesis for Language Structure, Proceedings of the American Philosophical Society, pp. 444-466, 1960.
42. M. Kay, Functional Grammar, Proceedings of the Berkeley Linguistic Society, 1979.
43. G. Ritchie, The Computational Complexity of Sentence Generation Using Functional Unification Grammar, Proceedings of COLING, Bonn, FRG, August 25-29, 1986.
44. J. Bresnan (ed.), The Mental Representation of Grammatical Relations, MIT Press, Cambridge, MA, 1984.
45. S. Bossie, A Tactical Component for Text Generation: Sentence Generation Using a Functional Grammar, University of Pennsylvania, TR MS-CIS-81-5, 1981.
46. F. Danes, Papers on Functional Sentence Perspective, Academia, Czechoslovakian Academy of Science, 1974.
47. J. R. Firth, Papers in Linguistics 1934-1951, Oxford University Press, Oxford, UK, 1957.
48. Reference 14, p. 108.
49. R. Power, "The organisation of purposeful dialogues," Linguistics 17, 107-151 (1979).
50. D. McDonald, Description Directed Control: Its Implications for Natural Language Generation, in Cercone (ed.), Computational Linguistics, Plenum, New York, pp. 403-424, 1984.
51. G. Kempen and E. Hoenkamp, Incremental Sentence Generation: Implications for the Structure of a Syntactic Processor, Proceedings of COLING, Prague, August 1982.
52. P. Jacobs, A Knowledge-Based Approach to Language Production, Berkeley Computer Science Department, TR 86/254, 1985.
53. W. Swartout, personal communication, Information Sciences Institute, Los Angeles, July 1984.
54. R. Simmons and J. Slocum, "Generating English discourse from semantic networks," CACM 15(10), 891-905 (1972).
55. J. Slocum, Question Answering via Canonical Verbs and Semantic Models: Generating English from the Model, University of Texas, Department of Computer Science, TR NL-23, 1973.
56. S. C. Shapiro, The SNePS Semantic Network Processing System, in Findler (ed.), Associative Networks, Academic Press, New York, 1979.
57. M. A. K. Halliday, "Notes on transitivity and theme in English," J. Ling. 3(1), 37-81 (1967).
58. W. Mann, The Anatomy of a Systemic Choice, Information Sciences Institute TR/RS-82-104, 1982.
59. W. Mann, Inquiry Semantics: A Functional Semantics of Natural Language, Information Sciences Institute TR/RS-83-8, 1983.
60. E. Sacerdoti, A Structure for Plans and Behavior, Elsevier North-Holland, Amsterdam, 1977.
61. P. Cohen, On Knowing What to Say: Planning Speech Acts, University of Toronto, TR 118, 1978.
62. J. Grimes, The Thread of Discourse, Mouton, The Hague, 1975.
63. R. Brown, Use of Multiple-Body Interrupts in Discourse Generation, Bachelor's Thesis, MIT, Department of Electrical Engineering and Computer Science, 1974.
64. H. K. T. Wong, Generating English Sentences from Semantic Structures, University of Toronto, Department of Computer Science, TR 84, 1985.

D. D. McDonald
University of Massachusetts

NATURAL-LANGUAGE INTERFACES

The term natural language (NL) is very deceptive. Everyone has an intuitive feel for what it means to communicate in natural language, but it is very difficult to make this notion precise. (The problem is well illustrated by recent fierce debates about whether chimpanzees that have been taught some sign language are really using language.) For most of the history of the human race, the only entities using "natural" language have been human, so it is difficult to separate linguistic capabilities from other human capabilities such as memory, reasoning (qv), problem solving (qv), hypothesis formation, classification, planning (qv), social awareness, and learning (qv). This makes it sometimes difficult to distinguish between a natural language interface (NLI) and the underlying system to which it is an interface. On one hand, one does not want to require a computer to have all of these capabilities before saying that it can use language; on the other hand, without these capabilities, any computer system will use language differently than human beings do and thus is open to the charge that it is not really using language.

Without defining precisely what NL understanding is, most people would accept Woods's (1) statement that "natural language assumes understanding on the listener's part, rather than mere decoding. It is characterized by the use of such devices as pronominal references, ellipsis, relative-clause modification, natural quantification, adjectival and adverbial modification of concepts, and attention-focusing transformations. It is a vehicle for conveying concepts such as change, location, time, causality, purpose, etc., in natural ways. It also assumes that the system has a certain awareness of discourse rules, enabling details to be omitted that can be easily inferred." This characterization absolutely excludes systems that merely use English words to replace symbols in what would otherwise be an "unnatural" language.

Human conversational partners share a lot of information, can model one another's knowledge and capabilities, can process huge amounts of information (even conflicting information), and can update all of these structures in amazing detail as the conversation progresses. Computers are currently a long way from having this very genial, very powerful, very broad-based language capability. Fortunately, a more limited language capability will suffice for many applications, and humans can easily adapt at least some aspects of their language based on their knowledge of their conversational partner. If too much adaptation is required, however, the communication becomes unnatural even if it is conducted in English. One must carefully distinguish between natural-language communication, natural communication (which may use language or not, and requires no learning by the user), and user-friendly interfaces (which generally do not use language and are easy to learn but are not necessarily natural).

A critical obstacle to the use of many computational resources such as database-management systems (DBMS) and decision support systems (DSS) is the mismatch between the needs of users and their ability to communicate these needs to the computer. The development of graphical interfaces such as spreadsheet systems, menu systems, and the "electronic desktop" are important steps toward improving the interface for a class of stereotyped, semirepetitive tasks. For many tasks, however, greater flexibility is needed, and NLIs can provide this capability to a wide range of users. In the area of DBMS, such interfaces allow users who are unfamiliar with the technical characteristics of the underlying database-management system to query a database using typed English input. The output is usually plain data, a statistical summary, or a graphical representation of the required data. NL interfaces
are also used to specify the input to decision-supportsystems and expert systems and to pose questions to them.
Is English the Most Appropriate Interface Language?

In the rush to make computers more accessible, it is easy to be taken in by the following false argument: Not everyone who wants to use a computer can or will take the time to learn a special language for dealing with it; everybody already knows English (or some other natural language); therefore, the only way to get everyone to use computers is to let them use English. This section shows the flaws in this argument and sets the stage for examining NL interfaces in a more realistic way.

Natural language may be useful when the user of a system does not know the capabilities or limitations of the system, when he or she cannot or will not learn a formal interface language, when the underlying interface is not user friendly (and hence would be awkward to use even if the user were prepared to get technical), or when the nature of the task to be performed is not well specified.

Even under the conditions stated in the previous paragraph, natural language is not useful when the content of interactions is so limited that the brevity of an artificial language (such as a menu of choices) is desirable; systems with a sophisticated interactive graphic and menu interface can be operated without English and with very little training. This use of icons is effective only because the users thoroughly understand the conceptual model underlying the domain (opening files, sending messages, etc.) and because only a small amount of detail must be conveyed by the icons and by the user's manipulation of them.

English is not useful when physical controls are appropriate: imagine driving a car or playing most video games using written or spoken English! Thus, in graphics-oriented situations, such as laying out a slide for a presentation, or in computer-aided design (qv), the exact placement of the elements of an image is best done with some form of pointing device. In these cases English may still play a role in initially specifying the images to be placed on the screen if the set of possible stored images is large and not readily broken down in a way that would make single or multiple menu selections appropriate.

English is not useful for object identification when the user can more easily point to something (as with a mouse or a touch-sensitive screen) than describe it. (However, pointing can be as ambiguous as English. For example, does a particular pointing action refer to "that line," "that triangle," "that region of the screen," "the object depicted by that triangle," or something else?)

One intermediate position between formal interfaces and English is the use of treelike menu systems in which each choice of a word or phrase from a menu causes the display of a new menu dependent on that choice (see Menu-based natural language). This is a good alternative to complex interfaces, provided that the speed of the display is comfortable for use, that the amount of data to be presented in any menu is not too large to be visually processed easily, and that the user can identify the branch he or she wants to take at any point.

Even an application as apparently restricted as using a DBMS does not necessarily make the choice of interface easy because of the many different kinds of tasks a user might want to perform. The most obvious DBMS task is to simply retrieve data ("What were our sales last year?"), but one might also want to format that data ("Graph last year's sales by month"), enter new data ("Set my department's projected sales for next month to $87,000"), query the system about its capabilities ("How far back do your sales figures go?"), place standing orders ("Don't show anyone's first name"), or do a myriad of other tasks.

The ambiguity of NL is often an advantage in retrieval tasks but can be a serious problem when updating. A good example of this was given in Kaplan (2): If someone says "Change Brown's manager from Jones to Baker," does this mean that Brown is to be moved from the group managed by Jones to that managed by Baker, or that Jones is being replaced by Baker as the manager of the group that Brown is in?

Even a system that has excellent NL capability within a small domain (such as accessing data about sales figures) may not be useful for users who have no idea of the limitations of that domain or who want to perform tasks outside the scope of the system. Such users may want to ask questions about the system's capability such as "What can you tell me about personnel?" and the system may not be able to give any kind of coherent answer.

State of the Art of Natural-Language Interfaces

In an attempt to jump on the bandwagon of NL interfaces, some software producers simply take their current system interface and modify it slightly so that it uses English words and thus, at first glance, looks like it can understand English. One way to detect such exaggeration is to compare the "English" that is allowed with the underlying interface. If there is a fairly clear correspondence between the two, very little NL processing is going on. Even without access to the underlying interface, it is usually easy to confuse such systems by giving them simple, natural variations of input. If "List male managers" works but "Give me the managers who are men" and "Which managers are male?" do not work, the interface is not very close to English.

Another distinction, and one that is harder to detect by simply observing the system in operation, is what Moore (3) calls special-purpose vs. general-purpose systems. General-purpose systems have the domain-dependent knowledge clearly separated from more general syntactic and/or semantic knowledge; such systems are of great interest to researchers. Special-purpose systems have knowledge about their particular application domain built in at very low levels of processing; they may, for example, be able to recognize units around a key word like "sales" but may not depend at all on general linguistic entities such as noun phrases. They may have special rules of inference for deducing new information from old, but the rules are formulated only for the particular application domain, not in general terms. Special-purpose systems are sometimes called semantic grammar systems or pragmatic grammars because they combine the semantics and/or pragmatics of the domain directly with syntactic analysis in a single grammar (see Grammar, semantic).

By mixing the domain model, database model, syntax, and semantics of a particular domain, special-purpose systems can achieve high performance for that domain. Their drawback is that it is difficult or impossible for anyone but the original system designer to make significant changes to the system, and it must be almost entirely rewritten if a new domain is
NATURAL-LANGUAGE INTERFACES
required. Moore (3) has said that it takes between 2 months and 5 years for programmers experienced in building these systems to produce a special-purpose NL front-end for a small but useful domain.

On the other hand, general-purpose systems offer the promise of easy transportability from one domain to another by changing the lexicon and the domain-dependent semantics. Their disadvantages are the long development time required to produce the domain-independent components and the fact that for some applications this approach brings more to bear on the problem than is necessary, with a corresponding price tag. A critique of this approach is presented in Ref. 4. Research systems such as TEAM (5-7) and IRUS (8) use this model. It will probably be several years before general-purpose (by this definition) systems begin to be widely available, but when they are, the effort required to adapt them to a particular application is expected to be a few weeks or months.

There have been many publications in the research literature about natural-language interfaces. Most, but not all, focus on general-purpose systems. Some of these papers describe research systems that are being used to investigate various aspects of the NL problem or are offered as "proof by example" that (limited) NL understanding is possible (9-21). Others try to present general issues and problems relating to applied NL interfaces, particularly for database access (1,3,22-24). Several conferences have had panels or sessions devoted to this subject (25-27), and several special issues of journals have also focused on it (20,28). These research successes imply that the technological basis for commercial success has been achieved. Commercial ventures using this technology have begun to appear, and more are sure to follow.

How Should Prospective Users Judge NL Systems?

In this section are presented a number of topics that should be investigated when one examines a system that claims to understand English.
It is important to keep in mind that the right question to ask is not "Does system X have feature Y?" (because the answer will almost never be a clear yes or no) but rather "How much of feature Y does system X handle, and how important is it to the application I have in mind?" In a general-purpose system the system developers should be able to describe the mechanisms used to handle these issues; a demonstration of their use in one domain is fairly good evidence of their applicability to another domain. In the case of special-purpose systems, evaluation is more difficult, since the techniques used may be more ad hoc; a demonstration that is impressive in one domain may not be relevant to the kinds of problems that will arise in a different application.
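The paraphrase test described earlier (a system that accepts "List male managers" but not its natural variants) can be made concrete with a small probe. The pattern table, function names, and query strings below are all invented for illustration; this is a sketch of the kind of shallow keyword-template interface being criticized, not any real product.

```python
# Hypothetical sketch: a keyword-template "English" interface that maps a few
# fixed phrasings directly onto queries, plus a probe that feeds it natural
# paraphrases to expose how little NL processing is actually going on.
import re

def shallow_interface(request):
    """Map a handful of fixed English patterns directly onto query strings."""
    patterns = {
        r"^list (male|female) managers$":
            r"SELECT name FROM managers WHERE sex = '\1'",
    }
    for pattern, template in patterns.items():
        match = re.match(pattern, request.lower())
        if match:
            return match.expand(template)
    return None  # anything off the template is simply rejected

def probe(interface, variants):
    """Report which natural variants of the same request are understood."""
    return {v: interface(v) is not None for v in variants}

results = probe(shallow_interface, [
    "List male managers",
    "Give me the managers who are men",
    "Which managers are male?",
])
print(results)
```

Only the first phrasing succeeds; the two paraphrases fail, which is exactly the symptom the text describes for an interface that is "not very close to English."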
Coverage and Habitability. These two properties are related but not identical. Coverage is a characterization of the linguistic competence of a system, whereas habitability measures how quickly and comfortably a user can recognize and adapt to the system's limitations. The coverage of a NL system may be categorized in a number of dimensions, some of which are discussed below.

Lexical Coverage. How large a vocabulary does the system have? The overall size of the vocabulary is not as critical as the relevance of the vocabulary for the application domain, though the system should certainly cover all the "closed-class" words in English such as prepositions, conjunctions, articles, etc. In addition, since it is impossible for any system to have complete coverage, it is important to know how easy or difficult it is to extend the vocabulary of the system. What knowledge of linguistics and the internal structure of the dictionary is required? Can an end user add new vocabulary, or does it take an applications programmer with some short training, or must vocabulary always be added by the system developers? It is also important to distinguish between new words that are essentially synonyms for existing words and new words that involve new concepts for the system.

Syntactic Coverage. What is the range of syntactic phenomena the system can deal with? Does the system handle complex verb forms, relative clauses, various question forms, passives, comparatives, subordinate clauses, time and place adverbials, measure expressions, ellipsis, pronominalization, and conjunction? Although this is the most well studied aspect of natural-language understanding, there is not as yet a benchmark against which to test a system, nor even a generally agreed upon list of phenomena. A useful list of phenomena is given by Winograd (29) in his book on syntactic processing.

Semantic Coverage. How much does the system understand about the domain? For a DBMS retrieval system, does the system have a model of the semantics of the applications domain, or does it merely make a direct translation of certain English phrases into specific queries in a formal retrieval language? This is particularly important if the system is to be able to access new databases or to work when old databases are restructured. There is a major difference between having to ask "Is there an employment record for Jones with Acme Co. in the employer field?" and "Did Jones ever work for Acme?" If the system treats the latter question as simply a variant of the first, it will not be able to handle such a query if the database is modified to list the employees for each company (but not the companies for each employee), nor would one expect it to be able to handle "Did Jones ever work for division 5?" or "Did Jones ever work for Smith?"

Although a system with extremely large coverage is likely to be habitable, even systems with very limited coverage can be habitable if properly designed, and systems with wide variations of coverage may be less habitable than ones with uniformly smaller coverage. The critical issues are whether the system has enough coverage to let users meet a reasonable proportion of their needs (i.e., is there at least one way to express everything a user really needs to say), whether the user can quickly find an appropriate way of expressing a request, and whether the user can easily learn to avoid the system's blind spots.

A system's habitability is reduced if the user is led to believe that the system has capabilities that are beyond it and there is no clear indication of the boundaries. This can happen if the language the system presents to the user is not language that the user can present to the system. In most applications some English is presented to the user, even if it is only canned text. English output from a computer system will either be prestored strings or generated text that comes from a different knowledge base than that used by the language-understanding part of the system (instead of being
integrated as in humans). This means that the language that can be expressed by a computer system may exceed its comprehension, a situation that is precisely opposite that of humans! Human users of a system will, very naturally and unconsciously, be influenced by the computer's language and will assume that the computer can understand the kind of language it produces. Thus, a desirable goal is to ensure that the vocabulary in the output is understood by the interface and that the syntactic constructions used in the output are within its syntactic coverage.

Even if the two capabilities are matched, there is another possible pitfall. In normal conversations people typically use pronouns and other anaphoric expressions like "that purchase order," "those salespeople," and "the average" to refer to entities introduced into the conversation by their dialogue partner. If the system uses canned text for output, or even if it synthesizes English output as needed, it will not be able to understand such anaphoric expressions unless it maintains a model for everything it (as well as the user) has said.

The difficulties in achieving habitability with a semantic grammar are based on the fact that without great care such grammars can give users misleading clues as to coverage. If the system can understand both "list the salespeople who have been under quota for two months" and "what salespeople have been under quota for two months," and the system can understand "list the products that Jones sold to Acme," the user might reasonably expect the system to understand "what products did Jones sell to Acme?" In a special-purpose system, however, the system may have different portions of the grammar for each verb, and it is easy for them to become inconsistent.

Inference. This is the art of drawing logical conclusions based on the data in the database and/or general knowledge of the subject domain (see Inference).
It is often the case that retrieving only data that is explicitly stored in a database is insufficient to meet a normal user's needs. Users will assume that the system has the ability to infer new information from that already in the database. (This is particularly true if the user does not have detailed knowledge of the database.) The "navigation problem" is an example of a simple inference: Suppose a database contains records about employees and records about jobs the company has performed for clients; the employee record has a field for jobs the employee has worked on, and the job record has a field for the client's name. Someone accessing this database might naturally ask "Has Ellen Matthews ever worked for Adams Co.?" Notice that in order to interpret this question correctly, the system must be able to follow the chain of reasoning that Matthews has worked for Adams if she has worked on a job that had Adams as the client, although no job was explicitly mentioned in the query and no relation "work for" exists in the database.

End User Control of Interpretation. Suppose someone asks "What is the largest division in the company?" This could mean largest in terms of number of employees, number of employees of a particular type, gross sales, or some other metric. Either the NL system has some built-in metric or it does not. If it does, it may or may not match what was meant. If it is not what was meant, how does the user find this out (the answer "Division 4" probably will not help), and can the user change it? If the system does not have a default metric, it might have a set of metrics that it can ask about, but the user
will not want to see the question "Do you mean largest number of employees or largest building area or highest sales?" every time he or she uses the term largest. Ideally, the user should be able to set temporary (or permanent) "standing orders" that will be interpreted in context, but this is currently possible only in a limited way.

Use of Pronouns. Any NL system will claim that it can handle pronouns (he, her, it, they, their, himself, etc.) because they are so widely used in English, but every system has limitations in this regard because pronoun use can be extremely complex. For example, pronouns usually refer to objects explicitly mentioned in previous discourse, but sometimes they can refer to objects mentioned later ("After he transferred from Department 22, did John Jones work in Division 6?"). Pronouns can also refer to actions ("Did Smith ever come to work later than 10 am? How often has he done that?"). In NL interfaces, users find it perfectly natural to use pronouns to refer to objects in the computer's previous response, not just objects in their own language (Q: "How many projects are ahead of schedule?" A: "One." Q: "Who is in charge of it?").

Other Kinds of Reference. Pronouns are a specific case of a linguistic phenomenon called anaphoric reference, in which one refers to things without using their full names. Even things that have not been mentioned explicitly can be referred to if it is "obvious" that they should be inferred from the previous context. For example, the multisentence utterance "Seven contracts were concluded last month. Those profits will set a new record." uses the phrase "those profits" to refer to the profits of the contracts just mentioned. A "natural" DBMS interface should also provide some ability to specify items on the basis of previously computed aggregates, e.g., "products whose sales are at least 80% of the average sales of the ten most profitable products."
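The navigation-problem inference described above (following the employee-to-job-to-client chain although no "work for" relation is stored) can be sketched in a few lines. The record structure and names are invented for illustration, following the example in the text.

```python
# Hypothetical sketch: answering "Has Ellen Matthews ever worked for
# Adams Co.?" by chaining employee -> job -> client, even though no
# "work for" relation exists anywhere in the database.
employees = {"Ellen Matthews": {"jobs": ["J-17", "J-42"]}}
jobs = {"J-17": {"client": "Baker Inc."},
        "J-42": {"client": "Adams Co."}}

def worked_for_client(employee, client):
    """Infer 'worked for' by following the job records of the employee."""
    return any(jobs[j]["client"] == client
               for j in employees.get(employee, {}).get("jobs", []))

print(worked_for_client("Ellen Matthews", "Adams Co."))   # True
print(worked_for_client("Ellen Matthews", "Crown Ltd."))  # False
```

The point of the example is that the inference path (which fields to join, in which order) is nowhere in the user's question; a system without such navigation knowledge can only answer questions phrased in terms of the stored fields.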
Ellipsis. In conversation, people often leave out large portions of sentences, assuming that the missing parts can be filled in by the listener, who shares the context being discussed. For example, a user might want to make the following series of queries: "How many people did we hire last month?" "The month before?" "How many do we expect next month?" It is easy to be fooled into thinking that because a system handles a few examples, it can handle any kind of ellipsis (qv).

Quantification. The use of words like some, every, all, and any can complicate NL understanding because their interpretation often depends on wide-ranging commonsense knowledge or on detailed knowledge of the particular domain. The queries "Did every person in department 5 submit his/her trip report?" and "Did every person in department 5 consult his/her department manager?" are structurally equivalent, but the first case refers to multiple trip reports and the second case to a single manager.

Negation. Negation is particularly tricky when combined with quantification. Does "All of the projects weren't completed on time" mean that none of the projects were completed on time or that some were and some were not? Negation can also occur in noun phrases as well as verb phrases: "Who sold no widgets last quarter?"

Time and Tense. This is currently an open research issue. There are no general mechanisms for effectively and efficiently representing events and objects that change over time.
Fortunately, many database applications do not have to be concerned with this issue since they often contain only limited historical data that does not contain complex time relations.

Conjunction and Disjunction. And's and or's are extremely common in English. Often they join complete units ("the book and the author"), but sometimes they join discontinuous segments ("I adjusted for and calculated next quarter's overhead"). Handling simple conjunctions is within reach of current systems, but the combination of conjunctions with ellipses and other phenomena is still an open problem in computational linguistics.

Telegraphic Input. Although full English sentences are easy to say, people who have to type a lot frequently want to abbreviate their input by dropping out "unnecessary" words. For example, "Show sales last year midwest by salesman" is easily understood (by most humans) as a paraphrase of "Show (me) (the) sales (from) last year (in the) midwest (graphed with sales) by salesman." Of course, in the appropriate context it might also mean "Show (to the sales department) (the figures from) last year (graphed with the) midwest (sales) by salesman." An important point to remember about this capability is that, although it is desirable, one pays for it with an increased potential for misunderstanding and (usually) a decreased ability to use the finer points of grammatical structure to influence the processing of nontelegraphic input.
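One common way to handle the elliptical follow-ups discussed earlier ("How many people did we hire last month?" "The month before?") is to keep the previous query as a slot-filled frame and let the fragment overwrite only the slots it supplies. The frame representation and slot names below are invented for illustration; real systems resolve the fragment against a full parse, not a flat dictionary.

```python
# Hypothetical sketch: resolving an elliptical follow-up query by copying
# the previous query's frame and overwriting only the slot the fragment
# supplies ("The month before?" changes just the time period).
def resolve_ellipsis(previous_frame, fragment_slots):
    """Fill in the missing parts of a fragment from the prior query."""
    frame = dict(previous_frame)   # start from everything said before
    frame.update(fragment_slots)   # overwrite only what the fragment gives
    return frame

q1 = {"action": "count", "entity": "hires", "period": "1986-05"}
q2 = resolve_ellipsis(q1, {"period": "1986-04"})   # "The month before?"
print(q2)
```

The sketch also shows why a few successes are misleading: it can only handle fragments that map cleanly onto one slot of the immediately preceding query, which is a tiny fraction of the ellipsis people actually produce.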
Ungrammatical Input. Closely related to the notion of telegraphic input is that of ungrammatical input. In fact, usually the same techniques are used to handle both kinds of nonstandard language.

All of the issues above represent problems that have been at least partially solved in general-purpose research systems. The following list presents some highly desirable attributes of systems for database retrieval that are not so well understood in general terms but may be available in limited form for particular applications.

What-If Capability. This is nearly essential in decision-support systems (DSSs). Simple specifications of conditions are easy to handle, but complex specifications present serious problems, particularly if they are expressed incrementally and modified during a dialogue.

Presentation of Output. This includes formatting reports and tables, interfacing to graphics modules, and generating English output. Simple capabilities are available now and are expanding rapidly.

Tools for Altering Domain and File Structures. Systems that have a very direct correspondence between the input language and the retrieval language can be modified by users (or systems programmers at the users' organizations), but more sophisticated NL capability implies the need for most customization to be done by the developer of the NL system. Software tools that will make it easier to develop, expand, and modify domain-dependent information and DBMS-dependent information will only gradually be developed.

Implementation in Work Stations. For many applications, it is undesirable to use mainframe computer resources to process English queries and commands. Soon some NL systems will fit comfortably in individual work stations or personal computers and will be able to locally translate the user's input into a sequence of commands to be sent to the DBMS on another machine.

There are (at least) two approaches to looking for NL capability in a computer system. One is to look at available systems and say, "If I had it, what could I do with it?" This is likely to be misleading since it is very easy to infer from a few examples that the system can do more than it actually can. A better approach is to determine in advance what kinds of interactions one would like to be able to have with the machine [perhaps by taking protocols of a simulation, as in Bates and Sidner (30), or just by asking potential users of the system to describe a few dozen examples]. Armed with this unbiased language sample, one can then ask, "Will system X be able to handle this input?"

Conclusion: The Future

In the next few years one can expect to find natural-language interfaces to a wide variety of computer systems, including database systems, graphics packages, expert systems, and DSSs. This already large market is certain to grow as personal work stations and network access to data and DSSs become widely available.

Some organizations will choose to develop their own NL interfaces in-house; others will buy that capability elsewhere. Because the development of language systems requires a much different programming approach than, say, accounting or database packages, the in-house systems will tend to be special purpose and difficult to modify as the needs of the people using them grow. Some companies will offer to build special-purpose systems on a contract basis, and fewer will offer general-purpose systems (because of the very limited supply of experts needed to develop them and the lengthy development cycle).

The subject areas for NL applications will be very broad. Some vendors will aim for one or more well-defined user communities and develop specialized packages; others will build a family of more general, tailorable systems. It will not be easy for the purchaser of an NL system to judge whether a particular system is capable of meeting the demands of the proposed application. This problem will continue to require expert advice and consulting.

In summary, the technology for useful, cost-effective, natural-language interfaces is available now and will begin to have a major impact on database retrieval and other areas in the very near future. However, these interfaces will not behave like a human conversational partner, so users must carefully examine such systems to understand their capabilities and limitations.

BIBLIOGRAPHY

1. W. A. Woods, "A personal view of natural language understanding," SIGART Newslett. (61), 17-24 (February 1977).
2. S. J. Kaplan and J. Davidson, Interpreting Natural Language Database Updates, Proceedings of the 19th Annual Meeting of the Association for Computational Linguistics, Stanford University, Stanford, CA, June 1981.
3. R. C. Moore, Practical Natural-Language Processing by Computer, Technical Note 251, SRI International, Menlo Park, CA, October 1981.
4. S. P. Schwartz, Problems with Domain-Independent Natural Language Database Access Systems, Proceedings of the Twentieth Annual Meeting of the Association for Computational Linguistics, University of Toronto, Toronto, Ontario, June 1982, pp. 60-62.
5. B. Grosz, TEAM: A Transportable Natural-Language System, Technical Report No. 263R, SRI Artificial Intelligence Center, Menlo Park, CA, November 1982.
6. B. J. Grosz, Transportable Natural-Language Interfaces: Problems and Techniques, Proceedings of the Twentieth Annual Meeting of the Association for Computational Linguistics, University of Toronto, Toronto, Ontario, June 1982, pp. 46-50.
7. B. J. Grosz, TEAM, a Transportable Natural Language Interface System, Proceedings of the Conference on Applied Natural Language Processing, ACL and NRL, Santa Monica, CA, February 1983, pp. 39-45.
8. M. Bates and R. J. Bobrow, A Transportable Natural Language Interface for Information Retrieval, Proceedings of the Sixth Annual International ACM SIGIR Conference, ACM Special Interest Group on Information Retrieval and American Society for Information Science, Washington, DC, June 1983.
9. E. F. Codd, R. S. Arnold, J.-M. Cadiou, C. L. Chang, and N. Roussopoulos, RENDEZVOUS Version 1: An Experimental English-Language Query Formulation System for Casual Users of Relational Data Bases, Technical Report RJ2144, IBM Research, San Jose, CA, January 1978.
10. M. Epstein and D. Walker, Natural Language Access to a Melanoma Data Base, Proceedings of the Second Annual Symposium on Computer Applications in Medical Care, 1978. Also SRI Technical Note 171, September 1978.
11. J. M. Ginsparg, A Robust Portable Natural Language Data Base Interface, Proceedings of the Conference on Applied Natural Language Processing, ACL and NRL, Santa Monica, CA, February 1983, pp. 25-29.
12. G. Guida and C. Tasso, IR-NLI: An Expert Natural Language Interface to Online Data Bases, Proceedings of the Conference on Applied Natural Language Processing, ACL and NRL, Santa Monica, CA, February 1983, pp. 31-38.
13. G. G. Hendrix, The LIFER Manual: A Guide to Building Practical Natural Language Interfaces, Technical Note 138, SRI International, Menlo Park, CA, February 1977.
14. G. Hendrix, E. Sacerdoti, D. Sagalowicz, and J. Slocum, "Developing a natural language interface to complex data," ACM Trans. Database Sys. 3(2), 105-147 (June 1978).
15. G. G. Hendrix, "Natural-language interface," Am. J. Computat. Ling. 8(2), 56-61 (April-June 1982).
16. S. C. Shapiro and S. C. Kwasny, "Interactive consulting via natural language," CACM 18(8), 459-462 (1975).
17. I. Spiegler, "Modelling man-machine interface in a data base environment," Int. J. Man-Mach. Stud. 18, 55-70 (1983).
18. B. H. Thompson and F. B. Thompson, Introducing ASK, A Simple Knowledgeable System, Proceedings of the Conference on Applied Natural Language Processing, Santa Monica, CA, February 1983, pp. 17-24.
19. D. E. Walker and J. R. Hobbs, Natural Language Access to Medical Text, Technical Note 240, SRI International, March 1981.
20. D. Waltz, "Natural language interfaces," SIGART Newslett. (61), 16-64 (February 1977).
21. D. H. D. Warren and F. C. N. Pereira, An Efficient Easily Adaptable System for Interpreting Natural Language Queries, Technical Report 155, University of Edinburgh, February 1981.
22. G. J. Kaplan and D. Ferris, "Natural language in the DP world," Datamation 28(9), 114-120 (August 1982).
23. R. C. Moore, Natural Language Access to Databases: Theoretical/Technical Issues, Proceedings of the Twentieth Annual Meeting of the Association for Computational Linguistics, University of Toronto, Toronto, Ontario, June 1982, pp. 44-45.
24. M. Templeton and J. Burger, Problems in Natural Language Interface to DBMS with Examples from EUFID, Proceedings of the Conference on Applied Natural Language Processing, ACL and NRL, Santa Monica, CA, February 1983, pp. 3-16.
25. Association for Computational Linguistics, Proceedings of the Nineteenth Annual Meeting of the Association for Computational Linguistics, ACL, Stanford University, Stanford, CA, 1981.
26. Association for Computational Linguistics, Proceedings of the Twentieth Annual Meeting of the Association for Computational Linguistics, University of Toronto, Toronto, Ontario, June 1982.
27. Association for Computational Linguistics and the Naval Research Laboratory, Proceedings of the Conference on Applied Natural Language Processing, ACL, Santa Monica, CA, 1983.
28. S. Kaplan, "Special section: Natural language," SIGART Newslett. (79), 27-109 (January 1982).
29. T. Winograd, Language as a Cognitive Process, Vol. 1: Syntax, Addison-Wesley, Reading, MA, 1982.
30. M. Bates and C. L. Sidner, A Case Study of a Method for Determining the Necessary Characteristics of a Natural Language Interface, in Integrated Interactive Computing Systems, North-Holland, Amsterdam, 1983, pp. 263-278.

M. Bates
BBN Laboratories Inc.

An earlier version of this entry appeared in M. Bates and R. Bobrow, Natural Language Interfaces: What's Here, What's Coming, and Who Needs It, in Artificial Intelligence Applications for Business, Ablex, Norwood, NJ, 1984, pp. 179-193.
NATURAL-LANGUAGE PROCESSING. See Natural-language generation; Natural-language interfaces.
NATURAL-LANGUAGE UNDERSTANDING

Natural-language communication with computers has long been a major goal of AI, both for the information it can give about intelligence in general and for its practical utility. Databases, software packages, and AI-based expert systems all require flexible interfaces to a growing community of users who are not able or do not wish to communicate with computers in formal, artificial command languages. Whereas many of the fundamental problems of general natural-language processing (NLP) by machine remain to be solved, the area has matured in recent years to the point where practical natural-language interfaces to software systems can be constructed in many restricted, but nevertheless useful, circumstances.

This entry is intended to survey the current state of natural-language processing by presenting computationally effective NLP techniques, by exploring the range of capabilities these techniques provide for NLP systems, and by discussing their current limitations. The presentation is organized in two major sections: the first on language recognition strategies at the single-sentence level and the second on language-processing issues that arise during interactive dialogues. In both cases the concentration is on those aspects of the problem appropriate for interactive natural-language interfaces, but an attempt is made to relate the techniques
and systems discussed to more general work on natural language, independent of application domain.

Nature of Natural-Language Processing. Natural-language processing (NLP) is the formulation and investigation of computationally effective mechanisms for communication through natural language. To take the boldface phrases in reverse order: first, the subject area deals with naturally occurring human languages such as German, French, or English. Second, it is concerned with the use of these languages for communication, both communication between people, the purpose for which these languages evolved, and communication between a person and a computer. Third, NLP does not study natural-language communication in an abstract way but by devising mechanisms for performing such communication that are computationally effective, i.e., that can be turned into computer programs that perform or simulate the communication. It is this third characteristic that sets the NLP subarea of AI, itself a subarea of computer science, apart from traditional linguistics and other disciplines that study natural language.

This entry examines the relationship of NLP to two other closely related disciplines: linguistics and cognitive psychology (qv). Linguistics is traditionally concerned with formal, general, structural models of natural language. Linguists, therefore, have tended to favor formal models that allow them to capture as much as possible the regularities of language and to make the most appropriate linguistic generalizations. Little or no attention was paid in the development of these models to their computational effectiveness. That is, linguistic models characterize the language itself, without regard to the mechanisms that produce it or decipher it.
A good example, as shown below, is Chomskian transformational grammar (qv) (1,2), perhaps the best known of all linguistic models, which turns out to be unsuitable as a basis for computationally practical language recognition [although see work by Petrick (3)]. The goal of cognitive psychology (qv), on the other hand, is not to model the structure of language but rather to model the use of language and to do it in a psychologically plausible way, where plausibility here is defined by correlation with experimental results, especially timing studies of language-understanding tasks [see Anderson (4) for a good example of the flavor of this approach]. This is somewhat closer to the spirit of AI-based NLP in its emphasis on the use of language in communication, but again it is not of primary importance to the cognitive psychologist whether his models are computationally effective. Moreover, the models produced are not often targeted at language understanding per se but at more general aspects of human cognition and memory organization, with natural language serving only as the vehicle through which these related phenomena are studied. In addition to relating NLP to the study of language in other disciplines, we should point out a major division that arises within NLP itself. The distinction is between general and applied NLP. One can think of general NLP as a way of tackling cognitive psychology from a computer science viewpoint. The goal is to make models of human language use and also to make them computationally effective. The vehicles for this kind of work are general story understanding, as in the work of Charniak (5), Schank (6), Cullingford (7), Carbonell (8), and others, and dialogue modeling, as in the work of Cohen and Perrault (9), Allen (10), Grosz (11), Sidner (12), and others. One of the most important lessons learned from this
work is that general NLP requires a tremendous amount of real-world knowledge; most of the work just cited is mainly concerned with the representation of such real-world knowledge and its application to the understanding of natural-language input. Unfortunately, AI has not yet reached the stage where it can routinely handle the amount of knowledge required for these tasks, with the result that systems constructed in this area tend to be "pilot" systems that demonstrate the feasibility of a concept or approach but do not contain a large enough knowledge base to make them work on more than a handful of carefully selected example natural-language passages or dialogues.

Applied NLP, on the other hand, is not typically concerned with cognitive simulation but rather with allowing people to communicate with machines through natural language. The emphasis is pragmatic. It is less important in applied NLP whether the machine "understands" its natural-language input in a cognitively plausible way than whether it responds to the input in a way helpful to the user and in accordance with the desires expressed in the input. Typical applications are database interfaces, as in the work of Hendrix (13), Grosz (14), Kaplan (15), and others, and interfaces to expert systems (qv), as in the work of Brown and Burton (16), Carbonell (J. R.) (17), and Carbonell (J. G.) et al. (18). Because such systems must operate robustly with real users, in addition to actually processing well-formed natural language, they must be concerned with the detection and resolution of errors and misunderstandings by the user.

Basic Problem of NLP. If there is one word to describe why NLP is hard, it is ambiguity. It arises in natural language in many different forms, including the following.

Syntactic (or Structural) Ambiguity

John saw the Grand Canyon flying to New York.
Time flies like an arrow.

Is it John or the Grand Canyon doing the flying?
The answer depends on the ambiguous syntactic role of the word flying in this example. Again, is time flying, or are we talking about a species of insect called time flies in the second example? It depends on whether flies is a noun or a verb. (Actually, the second example here has at least six different parsings. See if you can find them all.)

Word Sense Ambiguity

The man went to the bank to get some cash.
The man went to the bank and jumped in.

Here the word bank refers either to a repository for money or to the side of a river, depending on the two different continuations.

Case

He ran the mile in four minutes.
He ran the mile in the Olympics.

Linguistically, a "case" refers to the relation between a central organizing concept, here an act of running, and a subsidiary
concept, here time or location. In both examples the same preposition, in, indicates the two quite different relationships.

Referential

I took the cake from the table and ate it.

What was eaten, the cake or the table? The answer is "obvious," but, independent of real-world knowledge, it could refer to either one. For instance, it would have a different referent in the example above if one were to replace ate with cleaned.

Literalness

Can you open the door?
I feel cold.

What are the correct interpretations here? There are some circumstances in which the first question might be answered quite reasonably yes or no, e.g., before setting off on a long journey to the place where the door is. On the other hand, it is easy to think of circumstances in which the speaker might be very unhappy with such a reply. Again, the second example might be a statement of fact or a request to close a window. The ambiguities here lie in whether to interpret the utterance literally or whether to treat it as an indirect speech act (qv) (19), e.g., an implicit request as in the examples above.

Because of these and other kinds of ambiguity, the central problem in NLP, and this is true for both the general and applied variety, is the translation of the potentially ambiguous natural-language input into an unambiguous internal representation, i.e., one internal to the program doing the processing, as suggested by Figure 1. The second layer of Figure 1 shows an example translation of a natural-language database query into an expression in a database query language, the one used by the LADDER (20) system for access to its database of information about U.S. Navy ships. Note how a potentially ambiguous word such as Kennedy is resolved into the internal name, JOHN F. KENNEDY, of a specific ship, or captain is resolved into the name, COMMANDER, of a field of the relational database conceptually underlying the LADDER system.

"Who is the captain of the Kennedy?"

Figure 1. Translation from a natural-language utterance to unambiguous internal representation.

The specific internal representation used here is, of course, highly specialized. In general, there is no commonly agreed standard for internal representations, and different types are useful for different purposes. A partial list includes:

Expressions in a database query language (for DB access).
Parse trees with word sense terminal nodes (for machine translation).
LISP expressions (most often for expert system requests).
Case frame instantiations (for a variety of applications).
Conceptual dependency (for story understanding).

In general NLP, translation of an utterance into an unambiguous internal representation can require inference based on a potentially unbounded set of real-world knowledge. Consider, for instance,

Jack took the bread from the supermarket shelf, paid for it, and left.

Coming up with an unambiguous representation for this requires answers to such questions as

What did Jack pay for? (the referent of it)
What did Jack leave? (the ellipsed object of left)

and possibly even

Did Jack have the bread with him when he left?

To answer these questions, information on supermarkets, buying and selling, and other real-world topics is required. As mentioned above, AI knowledge representation (qv) techniques have not yet developed to the stage where they can handle at an acceptable level of efficiency the large quantities of such knowledge required to do a complete job of understanding a large variety of topics. Moreover, even if the knowledge could be represented, unresolved problems in inference (qv) techniques remain a barrier to applying the correct knowledge to the input in order to produce the desired unambiguous internal representation. The result is that current general NLP systems are demonstration systems that operate with a very small amount of carefully handcrafted knowledge, specifically designed to enable the processing of a small set of example inputs. The main point of such systems is to investigate the feasibility of certain inference or knowledge representation techniques rather than to achieve broad coverage in the NLP they perform.

Applied NLP systems potentially face exactly the same problem, but they finesse it by taking advantage of certain characteristics of the highly limited domains in which they operate. Suppose the input

How many terminals are there in the order?

was addressed to an expert system that acted as a computer salesman's assistant. Such a system need not consider many of the potential ambiguities lurking in this example. The word terminals, for instance, can be assumed to refer to computer terminals, rather than airport terminals, terminally ill patients, or terminal values of a mathematical series. Also, assuming the system processes one sales order at a time, "the order" can be assumed to refer to the current order without considering any others. In general, the technique is to premake as many inferences as possible in a way appropriate to the task at hand. For suitable tasks in many restricted domains, this has been used very successfully to reduce the amount of knowledge that must be represented and the number of inferences that must be made to manageable proportions. By restricting the natural language dealt with by an interface to that required to handle a limited task in a limited domain, it is thus possible to construct performance systems capable of useful natural-language communication, and this
represents the current state of the art in practical NLP. Clearly, this is far from satisfactory since, in particular, each task and domain that are tackled require careful preanalysis so that the required inferences can be preencoded in the system, thus making it difficult to transfer successful natural-language interfaces from one task to another. Some research (e.g., Ref. 14) is being conducted to improve the portability of current interfaces, but until the problem of preencoding inferences is solved in a more general way, the portability issue will be the one that most hinders the widespread use of natural-language interfaces. A practical alternative, however, is the Language Craft (Carnegie Group Inc.) approach, where a development environment and grammar interpreter are provided to shorten drastically the development of new domain-specific interfaces.

Natural-Language Analysis Techniques

In this section several of the more common techniques for natural-language analysis are examined in some detail, i.e., techniques for translating natural-language utterances into a unique internal representation. Virtually all natural-language analysis systems can be classified into one of the following categories:

Pattern matching (qv) [e.g., ELIZA (qv) (21), PARRY (qv) (22)].
Syntactically driven parsing (qv) [e.g., ATNs (23)].
Semantic grammars (see Grammar, semantic) [e.g., LIFER (13), SOPHIE (16)].
Case frame instantiation [e.g., ELI (24)].
Wait and see [e.g., Marcus (25)].
Word expert [e.g., Small (26)].
Connectionist (see Connectionism) [e.g., Small (27)].
Skimming [e.g., FRUMP (qv) (28), IPP (29)].

The examples provided with each category are the names of language analysis systems following that approach or the names of builders of such systems. Of these categories, the first four represent the bulk of the language analysis systems already constructed and are the only ones covered in detail.
The reader is encouraged to follow up the references provided for further details of the other methods.

Pattern Matching. The essence of the pattern-matching (qv) approach to natural-language analysis is to interpret input utterances as a whole rather than building up their interpretation by combining the structure and meaning of words or other lower level constituents. The approach is thus wholistic rather than constructive. With this approach, interpretations are obtained by matching patterns of words against the input utterance. Associated with each pattern is an interpretation, so that the derived interpretation is the one attached to the pattern that matched. In the simplest case this arrangement is simply a list of correspondences between equivalence classes of utterances (the ones that match a given pattern) and interpretations (the ones associated with each pattern). In more sophisticated variations of the approach, patterns may involve higher level constituents or semantic elements, so that some aspects of the interpretation may become constructive, but the basic flavor of the approach still remains to go as
directly as possible from the input utterance to the interpretation, as suggested by Figure 2. One of the simplest and earliest examples of pattern-matching parsing was the ELIZA system of Weizenbaum (21). This system demonstrated how a very simple mechanism in the form of a computer program was able to mimic apparently intelligent behavior. The carefully selected task of ELIZA was to simulate a Rogerian psychologist as he interviewed a patient. The style of this school of psychology allowed ELIZA to understand virtually nothing of what was said and still, in many cases, come up with convincing replies. ELIZA did not construct an internal representation of its input as such but instead went directly from the input to its reply, which could therefore be thought of as the "interpretation" of the input. The input was matched by a small set of single-level patterns, each of which was associated with several replies. The first pattern found to match an input led to a selected reply from the associated set; if a pattern matched again on a subsequent input, the next reply was given. Variables in the patterns that would match any sequence of words whatsoever allowed text from the input to be incorporated into the reply. Example patterns and corresponding replies, with variables represented by X and Y, include:
X you Y me
Can you think of a specific example? When? Really, always? Why do you think I Y you? You like to think I Y you, don't you. I did Y you. What would that t1:nrthat
These would allow ELIZA (E) conduct the following dialogue with a patient (P): P: People always stare at me. E: Can you think of a specific example? P: For instanceyou keepstaring at me. E: Why do you think I keep staring at you. P: Becauseyou hate me. E: You like to think I hate !ou, don't you?
Matches first pattern
Matches secondpattern with Y matching "keep staring at" First reply for secondpattern with appropriate substitution for Y Secondpattern again with Y matching "hate" Next (second)reply for second pattern
The simplicity of the matching and reply generation meant that most conversations with ELIZA did not go nearly as smoothly as this, but there are several anecdotesabout people being fooled into thinking they were talking to a real person. ELIZA could achieve its results with such a low level of analysis only by ignoring most of what was said. To make more complete analyses of the input using the same tech-
P a t t e r nm a t c h
Figure 2. Parsing by pattern matching.
664
NATURAL.TANGUACE UNDERSTANDING
niques would require far too many patterns-in the extreme, one pattern for every possible utterance. Moreover, many of these patterns would contain common subelements because they mentioned the same objects or had the same concepts aruanged with slightly different syntax. In order to resolve these problems within the pattern-matching approach,hierarchical pattern-matching methods have been developed in which some patterns match only part of the input and replace that part by some canonical result. Other higher-level patterns can then match on these canonical elements in a similar way, until a top-level pattern is able to match the canonicalized input as a whole according to the standard patternmatching paradigm. In this way similar parts of different utterances can be matched by the same patterns, and the total number of patterns is much reduced and made more manageable. The best known example of hierarchical pattern matching is the PARRY system of Colby (22,30).Like ELIZA, this program operates in a psychological domain but models a paranoid patient rather than a psychologist.Using the traditional pattern-matching paradigffi, PARRY interprets its input utterances as a whole by matching them against a set of about 2000 general patterns. The internal representation into which the input is transformed is a set of updatesto a simple model of the paranoid patient's mental state plus a representation of any factual content of the input. Replies are generated from the updated paranoid model plus the factual content. However, before the general patterns are applied, PARRY massagesits input through a series of eight canonicalizing steps, most of which are basedon localizedpattern matching. Examples of these steps include: Canonicalizingrigid idioms (e.g.,"have it in for" -+ "hate"). Noun phrase bracketing using an ATN (seebelow). Canonicalizing flexible idioms (e.g., "lend a hand" + "help "). Clause splitting (e.g.,"I think you need help" + "(I think) (you need help)"). 
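Canonicalizing steps like these can be sketched as a cascade of local rewrites. The idiom tables below are illustrative inventions, not PARRY's actual rules, and only two of the eight steps are shown.

```python
import re

# A sketch of PARRY-style hierarchical canonicalization: ordered passes of
# localized pattern matching rewrite the input before any of the general
# patterns are applied.
RIGID_IDIOMS = {"have it in for": "hate"}                      # fixed strings
FLEXIBLE_IDIOMS = [(re.compile(r"lend\s+(\w+)\s+a\s+hand"),    # allow a gap
                   r"help \1")]

def canonicalize(text):
    text = text.lower().rstrip(".?!")
    for idiom, canon in RIGID_IDIOMS.items():   # pass 1: rigid idioms
        text = text.replace(idiom, canon)
    for pat, repl in FLEXIBLE_IDIOMS:           # pass 2: flexible idioms
        text = pat.sub(repl, text)
    return text
```

Each pass sees the output of the previous one, which is exactly what lets higher level patterns match on canonical elements rather than raw words.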
Using rules of this form, an input such as

Do you have it in for me? I want to lend you a hand.

can be canonicalized into a form similar to

(YOU HATE ME) + INTERROGATIVE
(I WANT) (I HELP YOU)

An appropriate reply is generated by matching against PARRY's 2000 general patterns.

As well as matching patterns of words, it is also possible to analyze natural-language input by matching patterns of semantic elements, with potentially very powerful results, as shown by the pilot machine translation (qv) system of Wilks (31). The goal of this system was to translate English input into French output. To do this, it first analyzed its English input into an internal semantic pattern from which it could generate the French. This analysis was performed by matching the input against a very general set of patterns such as

(MAN FORCE MAN)

which matches all events in which a person compels another person to do something. Other general patterns involved people doing things to objects, objects being in certain states, etc. To allow matches against these patterns, Wilks represented word senses as formulas of the same semantic primitives as appeared in the patterns, so, for instance, interrogate was

((MAN SUBJ) ((MAN OBJE) (TELL FORCE)))

i.e., a person forcing another person to tell something, and crook was one of the following possibilities:

((((NOTGOOD ACT) OBJE) DO) (SUBJ MAN))
((((((THIS BEAST) OBJE) FORCE) (SUBJ MAN)) POSS) (LINE THING))

i.e., a person who does bad things or a long thin thing that a person uses to force animals (normally sheep) to do something. As well as providing an interpretation of the input, the process of matching these formulas against the general patterns also allowed word senses to be disambiguated. So

The policeman interrogated the crook.

is analyzed by matching it against the (MAN FORCE MAN) pattern, and this also chooses the bad-person sense of crook because it matches the second MAN of this pattern. There is also a (MAN FORCE THING) pattern, but this does not match as well because the formula for interrogate specifies MAN for its object. Note that the notion of degree of match is present in this system. As shown below, this idea makes parsing by pattern matching considerably more powerful, especially when the input contains grammatical errors.

To summarize this section on parsing by pattern matching, the basic paradigm is to recognize input utterances as a whole by matching them against patterns of words, wildcards, and/or semantic primitives. The result of the match is the interpretation of the utterance. Unless a very shallow level of analysis is acceptable, the number of patterns required is too large, even for restricted domains. This problem can be ameliorated by hierarchical pattern matching, in which the input is gradually canonicalized through pattern matching against subphrases. The number of patterns can also be reduced by matching with semantic primitives instead of words.
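A toy version of this kind of preference matching might look as follows. The mini-lexicon, primitive names, and scoring are invented for illustration and simplify Wilks's actual formula notation considerably; the point is only that the best-scoring sense combination disambiguates crook.

```python
# Sketch of preference-semantics-style disambiguation (after Wilks):
# noun senses are tagged with a semantic primitive, verb senses prefer
# certain primitives for their subject and object, and the sense
# combination with the highest degree of match wins.
NOUN_SENSES = {
    "policeman": [("person", "MAN")],
    "crook": [("bad person", "MAN"), ("shepherd's staff", "THING")],
}
# verb sense: (name, preferred subject primitive, preferred object primitive)
VERB_SENSES = {"interrogated": [("force to tell", "MAN", "MAN")]}

def disambiguate(subj, verb, obj):
    best, best_score = None, -1
    for s_name, s_prim in NOUN_SENSES[subj]:
        for v_name, want_s, want_o in VERB_SENSES[verb]:
            for o_name, o_prim in NOUN_SENSES[obj]:
                # degree of match: how many of the verb's preferences hold
                score = (s_prim == want_s) + (o_prim == want_o)
                if score > best_score:
                    best, best_score = (s_name, v_name, o_name), score
    return best
```

Because interrogate prefers MAN for its object, the bad-person sense of crook scores higher than the long-thin-thing sense, mirroring the (MAN FORCE MAN) versus (MAN FORCE THING) comparison in the text.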
Syntactically Driven Parsing. Syntax deals with the ways that words can fit together to form higher level units such as phrases, clauses, and sentences. Syntactically driven parsing (qv) is, therefore, naturally constructive, i.e., the interpretations of larger groups of words are built up out of the interpretations of their syntactic constituent words or phrases. In this sense it is just the opposite of pattern matching, in which the emphasis is on interpretation of the input as a whole. The most natural way for syntactically driven parsing to operate is to construct a complete syntactic analysis of the input utterance first and only then to construct the internal representation or interpretation. This leads to considerable inefficiency, and more recent syntactically driven approaches have tried to intermix parsing and interpretation.

Parse Trees and Context-Free Grammars. The most common form of syntactic analysis is known as a parse tree. Figure 3 shows a parse tree for the sentence
The rabbit nibbled the carrot.

The tree shows that the sentence is composed of a noun phrase (subject) and a verb phrase (predicate). The noun phrase consists of a determiner (the) followed by a noun (rabbit), whereas the verb phrase consists of a verb (nibbled) followed by another noun phrase (the direct object), whose determiner is the and whose noun is carrot. Syntactic analyses are obtained by application of a grammar that determines what sentences are legal in the language being parsed. The method of applying the grammar to the input is called a parsing (qv) mechanism or parsing algorithm. A very simple style of grammar is called a context-free grammar, which means that the symbol on the left side of a rewrite rule may be replaced by the symbols on the right side regardless of the context in which the left-side symbol appears. A context-free grammar consists of rewrite rules of the following form:

S → NP VP
NP → DET N | DET ADJ N
VP → V NP
DET → the
ADJ → big | green
N → rabbit | rabbits | carrot
V → nibbled | nibbles | nibble

As this example shows, context-free grammars have the advantage of being simple to define. They have been widely used for computer languages, and highly efficient parsing mechanisms (32,33) have been developed to apply them to their input. However, they also suffer from some severe disadvantages. It should be clear that the above context-free grammar accounts for the parse shown in Figure 3; rewrite rules correspond directly to bifurcations in that tree. Although it accounts for that and several other good sentences, the grammar also allows several bad ones, such as

The rabbits nibbles the carrot.

The problem here is that the context-free nature of the grammar does not allow agreements such as the one required in English between subject and verb. To enforce such an agreement, there would have to be two completely parallel grammars, one for singular sentences and the other for plural. Moreover, a grammar that also allowed passive sentences such as

The carrot was nibbled by the rabbit.

would have to have another completely different set of rules, even though the passive and the active forms of the same sentence have a clear syntactic relation, not to mention semantic equivalence. These duplications are multiplicative rather than additive, leading to exponential growth in the number of grammar rules. Thus, in terms of the number of rules involved and in terms of being unable to capture related phenomena by related rules, context-free grammars turn out to be quite unsuitable for natural-language analysis. Recent work by Gazdar (34) and others has shown that these problems of exponential rule growth can be masked using notational shorthand devices such as "metarules" plus relatively minor extensions to the context-free formalism, and in particular without going to the transformational framework discussed below. However, the computational tractability of generalized phrase structure grammar (qv), as the extended formalism is called, has yet to be determined.

There is one more point to be made with this example, one not specific to context-free grammars but a serious problem for all syntactically driven parsing. The above grammar also allows

The rabbit was nibbled by the carrot.

This is an example of a sentence that is perfectly good syntactically but makes no sense at all. For utterances that are ambiguous syntactically (and for more comprehensive grammars, syntactic ambiguity is very common), such acceptance of nonsensical interpretations can lead to the highly inefficient generation of multiple parses, only one of which has a reasonable translation into the final internal semantic representation.

Transformational Grammar. The problems mentioned above specific to context-free grammars were tackled by linguists, in particular Chomsky (1,2), through transformational grammar (qv). As shown in Figure 4, their answer was to add another type of rule to a context-free grammar. The basic idea was to use the context-free grammar to generate a parse tree just as before but add onto it certain tags, such as one for a plural sentence. The set of transformations on the parse tree would then rearrange things so that the pluralness was transmitted to all parts of the tree concerned and the required agreements could be enforced. The transformations that enforced agreements were called obligatory transformations. A second class of optional transformations was used to capture the relations between, for instance, active and passive sentences; the active and passive versions of the same sentence had the same representation in the base component produced
Figure 3. A parse tree for "the rabbit nibbled the carrot."

Figure 4. Transformational grammar.
by the context-free grammar, but the passive version was the result of applying an extra optional transformation. Transformations are context-sensitive rules that map a parse tree into a related parse tree. Although transformational grammar did a much better job of accounting for the regularities of natural language than context-free grammar, from the point of view of computational effectiveness it was much worse. (Significantly, however, a complete transformational grammar of English has never been produced.) As the above description implied, it was set up as a generative model, i.e., it told you how to produce a sentence starting from the symbol S. Running the model in reverse to do sentence analysis turned out to be a computational nightmare, largely because transformations operate on trees, not strings of words, and so are highly nondeterministic when run backward. For instance, the "equi-NP deletion" transformation deletes without trace the second occurrence of a coreferential noun phrase in certain structures, and it is impossible to run a deletion backward if there is no clue as to what was deleted. Consequently, although some attempts have been made [e.g., Petrick (3)], parsers based on transformational grammar have not played a major role in NLP.

Augmented Transition Networks. Largely in response to the problems of transformational grammar (qv), Bobrow and Fraser (35) proposed, and Woods (23) subsequently developed, a method of expressing a syntactic grammar that was computationally tractable and yet still could capture linguistic generalizations in a concise way, in many cases more concisely than transformational grammar itself. The formalism Woods developed was known as an augmented transition network (ATN) (see Grammar, augmented-transition-network).
It consisted of a recursive transition network (formally equivalent in expressive power to a context-free grammar) augmented by a set of tests to be satisfied before an arc was traversed and a set of registers that could be used to save intermediate results or global state. An example ATN is shown in Figure 5. The network recognizes simple sentences with just a subject, verb, and direct object in all combinations of active, passive, declarative, and interrogative. The symbols attached to the arcs show what constituent must be recognized to traverse the arc: AUX is an auxiliary verb (like is or have); NP is a noun phrase, which is defined by another network in the same formalism as this one; V is a verb; and "by" is the word by. The numbers on the arcs serve as indices into the following table, which lists the tests that must be true to traverse the arcs and the actions that must be performed as the arcs are traversed.
Figure 5. Example ATN.

In this LISP-like notation the asterisk refers to the constituent just parsed, and SETR sets a register, whose name is specified by its first argument, to the value of its second argument. A concrete example of the network in operation will make this clearer. Suppose one wanted to parse

The rabbit nibbled the carrot.

One would start at the leftmost node in the graph and at the left of the sentence. Two arcs lead from that node, but only arc 2 is applicable since in the input one is not looking at the auxiliary verb required by arc 1 but at a noun phrase, "the rabbit," as required by arc 2. One can see from the table that arc 2 has no additional test (indicated by T), so we traverse that link, setting the SUBJ register to the thing just parsed, i.e., "the rabbit," and the TYPE register to DECLARATIVE. One is now at a node with only one arc, arc 3, and that arc requires a verb. Fortunately, one is now looking at "nibbled" in the input, so one can try to traverse it. Arc 3 has an additional test requiring that *, i.e., the present word in the input (the verb), agree with the contents of the subject register; this is the way agreements are enforced in an ATN. In this case the agreement is correct, and one can traverse the arc, setting the V register to the verb. The node one gets to now has a line through it, indicating that this can be the end of the parse provided that there is no input left to consume, so "The rabbit nibbled" would be accepted here. In this example there is another noun phrase, "the carrot," and so one follows arc 6, whose test requires that the verb in the V register be transitive, which "nibbled" is. So one ends up at another terminal node with no further input, and the parse is completed successfully. The result of the parse is the setting of the four registers SUBJ, TYPE, V, and OBJ, and these can be combined into a tree or whatever representation is desired.
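The traversal just described can be sketched in code. The lexicon, the feature names, and the flattening of the NP subnetwork into a simple DET + N pair are simplifying assumptions for illustration, not part of Woods's formalism.

```python
# A sketch of the Figure 5 ATN: registers plus per-arc tests and actions.
LEX = {
    "the": ("DET", {}),
    "rabbit": ("N", {"num": "sg"}), "carrot": ("N", {"num": "sg"}),
    "nibbled": ("V", {"num": "any", "pprt": True, "trans": True}),
    "was": ("AUX", {}), "by": ("BY", {}),
}

def np(words, i):
    """Recognize a flat DET N noun phrase; return (phrase, next index) or None."""
    if i + 1 < len(words) and LEX[words[i]][0] == "DET" and LEX[words[i + 1]][0] == "N":
        return " ".join(words[i:i + 2]), i + 2
    return None

def parse(words):
    regs, i = {}, 0
    m = np(words, i)                          # arc 2: NP -> SUBJ register
    if not m or m[1] >= len(words):
        return None
    regs["SUBJ"], i = m
    regs["TYPE"] = "DECLARATIVE"
    subj_num = LEX[words[i - 1]][1]["num"]    # number feature of the head noun
    if LEX[words[i]][0] == "AUX":             # "was": take the passive path
        i += 1
        if i >= len(words) or not LEX[words[i]][1].get("pprt"):
            return None                       # arc 5 test: past participle
        regs["OBJ"] = regs["SUBJ"]            # arc 5 actions: rearrange registers
        regs["V"], regs["SUBJ"] = words[i], "SOMEONE"
        i += 1
        if i < len(words) and words[i] == "by":   # arcs 7 and 8: agentive "by" NP
            m = np(words, i + 1)
            if m:
                regs["SUBJ"], i = m
    else:                                     # arc 3: main verb
        if LEX[words[i]][1]["num"] not in ("any", subj_num):
            return None                       # arc 3 test: subject-verb agreement
        regs["V"] = words[i]
        i += 1
        m = np(words, i)                      # arc 6: object NP, verb must be transitive
        if m and LEX[regs["V"]][1].get("trans"):
            regs["OBJ"], i = m
    return regs if i == len(words) else None
```

Running `parse` on the active and the passive sentence yields the same SUBJ, V, and OBJ register settings, mirroring the walkthroughs in the text.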
A more interesting use of registers can be seen from the example

The carrot was nibbled by the rabbit.
Arc  Predicate                     Actions
1    T                             (SETR V *) (SETR TYPE 'QUESTION)
2    T                             (SETR SUBJ *) (SETR TYPE 'DECLARATIVE)
3    (AGREES SUBJ *)               (SETR V *)
4    (AGREES V *)                  (SETR SUBJ *)
5    (AND (GETF PPRT) (= V 'BE))   (SETR OBJ SUBJ) (SETR V *) (SETR AGFLAG T) (SETR SUBJ 'SOMEONE)
6    (TRANS V)                     (SETR OBJ *)
7    AGFLAG                        (SETR AGFLAG FALSE)
8    T                             (SETR SUBJ *)
To parse the first three words, we traverse arcs 2 and 3 much as before, with the difference that now "the carrot" is in SUBJ and "was" is in V. One cannot take arc 6 because one is only up to "nibbled" in the input, but one can take arc 5 because nibbled is a verb. The test on arc 5 also requires nibbled to be a past participle, which it is, and the contents of V to be be, and since was is a form of the verb to be, the test is satisfied. The action on arc 5 is interesting; it puts the contents of the SUBJ register in the OBJ register, overwrites the verb register with the past participle verb, sets a flag to true, and puts a placeholder "someone" in the SUBJ register. This corresponds to recognizing that the sentence is in passive form and in our case makes the carrot the object and nibbled the verb. One has reached "by" in the input and so can follow arc 7, which just requires the passive flag to be set; its only action is to turn this
NATURAL-LANGUAGE UNDERSTANDING
flag off, so that the arc cannot be traversed again. Finally, one gets back to the terminal node via arc 8, which puts "the rabbit" in the SUBJ register. Note that the result of this parse is the same as the result of the first example. Now try to follow the parses of
Did the rabbit nibble the carrot?
Was the carrot nibbled by the rabbit?

These brief examples should give some idea of the power of an ATN and of how its tests and registers can be used to capture the regularities of language in a concise and elegant way. Very large ATN grammars of several hundred nodes (36) have been developed that capture large subsets of English. However, ATNs also have several disadvantages:

Complexity and Nonmodularity. As the coverage of an ATN increases, so does its structural complexity. It becomes extremely difficult to modify or augment an existing ATN without causing large numbers of unforeseen side effects. For instance, if another outgoing arc is added to a node with a large number of incoming arcs in order to handle an additional type of phrase that is a valid continuation of the parse represented by one of the incoming arcs, it could lead to spurious and incorrect parses when the node is reached via a different incoming arc. (Fan-out and fan-in factors of 10 or 20 are not uncommon in large realistic grammars.)

Fragility. The current position in the network is a very important piece of state information for the operation of an ATN. If an input should be slightly ungrammatical, even by a single word, it is very hard to find the appropriate state to jump to that would enable the parse to continue, though see the work by Kwasny and Sondheimer (57) and Weischedel and Black (37) on dealing with such extragrammaticality and the work on island-driven ATN parsing for speech input by Bates (38).

Inefficiency through Backtracking Search. Although the above examples are not complex enough to show it, the task of traversing an ATN is in general nondeterministic and requires search. The natural way to search an ATN is through backtracking (qv). Because intermediate failures are not remembered in such a search, major inefficiencies can result through repetition of the same subparses arrived at through different paths through the network.
Chart-parsing techniques (39-41) were designed as alternatives to ATNs precisely to avoid these inefficiencies.

Inefficiency through Meaningless Parses. Normally the grammar of an ATN is purely syntactic, and a complete syntactic parse is produced before any semantic interpretation is performed. In that situation many spurious meaningless parses can be produced, especially if the grammar is large and comprehensive. To combat this, recent parsers (42) in the ATN tradition have tried to interpret each constituent as it was produced, thus preventing completion of constituents that could be predicted to be meaningless.

See Refs. 43 and 44 for more discussion of the relative advantages and disadvantages of ATNs.
Semantic Grammars. Language analysis based on semantic grammars (qv) is similar to syntactically driven parsing except that in semantic grammars the categories used are defined semantically as well as syntactically. Thus, instead of the category "noun phrase" in a syntactic grammar, a semantic grammar might have the category "description of a ship," which is syntactically always a noun phrase but has additional strong semantic constraints. Semantic grammars were introduced by Burton (45) for use in SOPHIE (16), a computer-aided instruction system for electronic circuit debugging, to deal with the problems of inefficiency due to the generation of syntactically correct, but meaningless, parses mentioned above for ATN-based syntactic grammars. The goal was to eliminate the production of meaningless parses by setting up the grammar so that only meaningful parses could be produced. To do this, it was necessary to categorize all the objects and actions that the SOPHIE system needed to parse to conduct a conversation in its domain of electronic circuitry and then to construct the grammar so that, for instance, only a description of a switch could be the object of a "close" action. This technique, while retaining the fragility of an ATN, worked well to reduce parsing inefficiency. Because the relevant semantic categories were available at parse time, it also allowed semantic interpretation to proceed as the parse unfolded. However, the technique only works properly in restricted domains, like the one mentioned above, in which all objects and their relations can be categorized in advance, allowing a grammar to be built around the possible semantic relations. Semantic grammars are thus a technique useful only for applied natural-language processing, not for general NLP. For an example of how semantic grammars can be used, consider the following grammar definition in the formalism used by LIFER, a system for building semantic grammars developed by Hendrix (13).
S → (present) the (attribute) of (ship)
(present) → what is | [can you] tell me
(attribute) → length | beam | class
(ship) → the (shipname) | (classname) class ship
(shipname) → kennedy | enterprise
(classname) → kitty hawk | lafayette

An expanded version of this grammar was used for access to a database of information about U.S. Navy ships in the LADDER (20) system. Even the above "mini" version is capable of recognizing such inputs as

What is the length of the Kennedy?
Can you tell me the class of the Enterprise?
What is the length of Kitty Hawk class ships?

Since the definitions used by LIFER are similar to those used for context-free grammars, the reader should have no difficulty in seeing how these inputs could be recognized by the above grammar. In addition to defining a grammar, LIFER also allowed an interface builder to specify the interpretations to be produced from rules that were used in the recognition of an input. In the above case this resulted in database query language statements corresponding to the inputs being produced as a direct result of the recognition. The database query language statements in effect took the place of a parse tree, and so no separate semantic interpretation stage was required.
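The "mini" grammar above is close enough to a context-free grammar that a small recognizer conveys the idea. The following is a hypothetical Python sketch (LIFER itself was a LISP system; the optional [can you] is encoded as two alternatives, and a plural "ships" is added here since LIFER handled such morphological variation separately):

```python
# Illustrative recursive-descent recognizer for the mini LIFER-style
# grammar above; not LIFER's own machinery.

GRAMMAR = {
    "<present>":   [["what", "is"], ["can", "you", "tell", "me"], ["tell", "me"]],
    "<attribute>": [["length"], ["beam"], ["class"]],
    "<shipname>":  [["kennedy"], ["enterprise"]],
    "<classname>": [["kitty", "hawk"], ["lafayette"]],
    "<ship>":      [["the", "<shipname>"], ["<classname>", "class", "ships"]],
    "<S>":         [["<present>", "the", "<attribute>", "of", "<ship>"]],
}

def match(symbol, words, i):
    """Try to match a grammar symbol at position i; return the new
    position on success or None on failure."""
    if symbol in GRAMMAR:
        for alternative in GRAMMAR[symbol]:
            j = i
            for sym in alternative:
                j = match(sym, words, j)
                if j is None:
                    break
            else:
                return j
        return None
    # terminal word
    return i + 1 if i < len(words) and words[i] == symbol else None

def recognize(sentence):
    words = sentence.lower().replace("?", "").split()
    return match("<S>", words, 0) == len(words)

print(recognize("What is the length of the Kennedy?"))            # True
print(recognize("What is the length of Kitty Hawk class ships?")) # True
```

Note how the semantic categories double as syntactic ones: an input is rejected as soon as a word fails to fit the single category expected at its position.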
Note in the example above that not all the categories are specializations of pure semantic categories; (present), for instance, will parse several phrases, none of which fits into any standard grammatical category. The phrases may differ from each other in their syntactic structure, including the number of verbs they contain. The ability to construct cross-grammatical categories like this allows a semantic grammar to incorporate some features of pattern matching. Also note how strongly directed the recognition is. The word class, for instance, occurs in two quite different ways in the grammar: once as a ship attribute and the other as part of the second type of ship description. Thus, in the (rather silly) question

What is the class of Lafayette class ships?

the appropriate category for class would be used each time it appeared without considering its other role in the grammar. This directedness of recognition is also useful in building spelling correction into the recognition process. In an input like

What is the legnth of the Kennedy?

the spelling of legnth need only be checked against the list of ship attributes rather than the entire system vocabulary because a ship attribute is the only category that can appear at the place where the misspelling occurs. A final advantage of the strong top-down direction available through semantic grammars can be seen in LIFER's ellipsis mechanism, which was intended to deal with input sequences such as

What is the length of the Kennedy? The beam?

Here the fact that beam and length are in the same semantic grammar category allows the second input to be interpreted as "What is the beam of the Kennedy?" rather than, say, "What is the length of the beam?" See below for discussion on ellipsis mechanisms in general.
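The ellipsis mechanism just described can be suggested with a small sketch: record the grammar category of each word of the previous input, and substitute a fragment for the previous word of the same category. This is a hypothetical simplification (the category table and function are invented for illustration, not LIFER's implementation):

```python
# Illustrative category-based ellipsis resolution, assuming the parser
# has recorded the semantic grammar category of each content word.

CATEGORIES = {
    "length": "<attribute>", "beam": "<attribute>", "class": "<attribute>",
    "kennedy": "<shipname>", "enterprise": "<shipname>",
}

def resolve_ellipsis(previous, fragment):
    """Substitute the fragment for the same-category word of the
    previous input; return None if the fragment is not a known
    single-category phrase."""
    frag = [w for w in fragment.lower().rstrip("?").split() if w != "the"]
    if len(frag) != 1 or frag[0] not in CATEGORIES:
        return None
    category = CATEGORIES[frag[0]]
    words = previous.lower().rstrip("?").split()
    replaced = [frag[0] if CATEGORIES.get(w) == category else w
                for w in words]
    return " ".join(replaced)

print(resolve_ellipsis("What is the length of the Kennedy?", "The beam?"))
# what is the beam of the kennedy
```

Because beam and length share the category ⟨attribute⟩, the fragment lands in the attribute slot of the previous question rather than anywhere else.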
In addition to their numerous advantages for limited domain applications, semantic grammars have several disadvantages, chief of which is the requirement that a new grammar be developed for each new domain, since the semantic categories for each domain will be quite different. However, if the applications are similar (e.g., both include database access), there will be many parts of the grammar (e.g., the basic framework for questions) that are the same. A related disadvantage is that semantic grammars tend to get large very quickly, partly because of the repetition of similar constructions in different semantic categories. This makes nontoy semantic grammars quite hard to construct and can result in a very "spotty" kind of coverage of syntactic variation. For instance, adding a rule that allows the possessive to be apostrophized in the description of a ship attribute (i.e., you can say "the Kennedy's length" as well as "the length of the Kennedy") does not also allow possessives to be apostrophized in the description of an attribute of a sailor (i.e., you might not be able to say "officer's rank" even though you can say "rank of an officer") because the two categories are in different parts of the grammar, and their recognition is unrelated. A second rule would be required.

Three approaches have been tried to resolve these problems. One is to go back to recognition by a syntactic grammar before semantic interpretation, but to try to intermix the semantic and syntactic components much more closely, so that every syntactic constituent is interpreted as soon as it is constructed. The RUS system (42) is an example of this approach. It provides some improvement over a pure syntax-first approach but is still not as efficient as pure semantic grammars; it is also difficult to incorporate semantic constraints, a process that requires writing different chunks of LISP code, called "Irules," for each domain. An alternative approach, as exemplified by the TEAM system (14), is to focus in on a specific class of applications, access to relational databases, and to abstract out the linguistically common aspects of a semantic grammar for such a class. Building a specific interface, then, requires only instantiating a template, as it were, with the vocabulary and morphological variation required for a specific database. This approach has the potential to produce highly efficient natural-language interfaces, but at the cost of some expressive power and the inability to go beyond the class of applications without restarting from the ground up. The third approach is to combine the strengths of several parsing strategies, such as semantic grammars, syntactic transformations, and pattern matching, into a single system that maps structures into more canonical forms before attempting to use the full semantic grammar, thus allowing many redundant and unnecessary constructions to be eliminated (46,47). This multistrategy approach has been implemented in the DYPAR system (48) and applied to database query, expert system command, and operating system command interfaces. Although richer in expressive power, this approach demands more sophistication of the grammar writer, requiring knowledge of how to write transformations, context-free rules, and patterns.
Case Frame Instantiation. A major development in computational linguistics (qv) was the inclusion of case-frame instantiation (see Grammar, case) in the repertoire of effective parsing techniques. Case frames were popularized by the linguist Charles Fillmore in his seminal paper "A Case for Case" (49), and their computational import was quickly grasped by several researchers in natural-language processing, including Simmons (50), Schank (6), and Riesbeck (51). Case frame instantiation is one of the major parsing techniques under active research today. Its recursive nature, and its ability to combine bottom-up recognition of key constituents with top-down instantiation of less structured constituents, gives this method very useful computational properties (see also Frame Theory).

What Are Case Frames? A case frame consists of a head concept and a set of roles, or subsidiary concepts, associated in a well-defined manner with the head concept. Initially, only sentential-level case frames were investigated, where the head consists of the main verb, and the cases include the "agent" that carries out the action, the "object" acted upon, the "location" in which the action takes place, etc. For instance, consider the sentence

In Elm Street, John broke a window with a hammer for Billy.

In simplified generic notation, the case frame corresponding to this sentence is
NATURAL.LANGUAGE UNDERSTANDING
[BREAK
    [caseframe
        agent:       JOHN
        object:      WINDOW
        instrument:  HAMMER
        recipient:
        directive:
        locative:    ELM STREET
        benefactive: BILLY
        co-agent:       ]
    [modals …]]

In the notation above, cases, such as agent, are written in lowercase, and their fillers are in uppercase. Case frames, as adopted in computational linguistics, differ markedly from simple, purely syntactic parse trees. The relations between the head of the case frame and the individual cases are defined semantically, not syntactically. Hence, a noun in the subject position can fill the agent case, as in the example above; or it can fill an object case, as in "the window broke" (the window was not the agent that caused the breakage); or it can fill the instrument case, as in "the hammer broke the window." These are different semantic roles played by the same syntactic constituent, "subject." Since the purpose of a natural-language interface is to extract the semantics of the input, it behooves the case frame representation to encode explicitly semantic differences in otherwise similar syntactic parse trees. Thus, parsing into case frames requires semantic knowledge, as well as syntactic information, as shown below.

Consider some other properties of case frames. In the example above, only some of the cases were instantiated. What of the other cases, such as recipient and co-agent? There are examples that illustrate these shortly. First, consider the meaning of each case, as outlined below:

[(HEAD VERB)
    [caseframe
        agent:       (the active causal agent instigating the action)
        object:      (the object upon which the action is done)
        instrument:  (an instrument used to assist in the action)
        recipient:   (the receiver of an action, often the indirect object)
        directive:   (the target of a (usually physical) action)
        locative:    (the location where the action takes place)
        benefactive: (the entity on whose behalf the action is taken)
        co-agent:    (a secondary or assistant active agent)]]

In order to illustrate the directive case, consider "John kicked the ball toward the goal" and "John flew the airplane to New York." In the former example "the goal" fills the directive case, and in the latter "New York" fills the same case, since both express the direction in which each respective action was performed. In some early formulations of case frames no distinction was made between locative and directive, but the need to encode stative vs. dynamic information explicitly, plus the need to represent sentences such as "In Yankee Stadium, John threw the ball at the catcher" that instantiate both cases, led to the acceptance of two semantically distinct cases, one encoding global location, the other a local change in location.

The recipient case is filled by "Mary" in both of the following: "John gave Mary a ball" and "John gave a ball to Mary." Note that in this instance there are syntactically distinct sentences that map onto a unique semantic case frame representation, to wit:

[GIVE
    [caseframe
        agent:     JOHN
        recipient: MARY
        object:    BALL]
    …]

If instead of saying "John broke the window with a hammer," one were to say "John broke the window with Mary," Mary would fill the co-agent case. Presumably John did not swing Mary over his head and use her as a battering ram to shatter the window, much as he would use an instrument like a hammer or a tree branch. Since Mary is taking part in causing the action to happen, regardless of whether her action is independent of, or in support of, John's action, she fills the co-agent case.

Required, Optional, and Forbidden Cases. Each case frame defines some required cases, some optional cases, and some forbidden cases. A required case is one that must be present in order for the verb to make sense. For instance, break requires the object case. A sentence is not complete without it (try constructing one), but no other case is required. "The window broke" is a complete, if not very informative, sentence. An optional case is one that, if present, provides more information to the case frame representation but, if absent, does not harm its integrity. For instance, agent, co-agent, and locative are optional cases of break. Forbidden cases are those that cannot be present with the head verb. The directive and recipient cases are forbidden for the break case frame. (Again, try constructing a sentence with these cases using break as the head verb.)

Conceptual Dependency. It is often useful in natural-language processing to employ a semantic representation that represents information in as canonical a manner as possible. In the ideal canonical representation, different ways of stating the same information would be represented identically, and propositions that encode similar information would map into semantic encodings that highlight the similarities while retaining the differences in an explicit manner. The best-known attempt at a canonical semantic representation is the conceptual dependency (CD) (qv) formalism developed by Schank (6,52,53) as a reductionistic case frame representation for common action verbs. Essentially, it attempts to represent every action as a composition of one or more primitive actions, plus intermediate states and causal relations. To use Schank's example, suppose one wants to represent, in a case frame notation, "John gave Mary a ball" and "Mary took a ball from John."

These sentences differ syntactically, they differ in terms of verb selection, and they differ in how their cases are instantiated (e.g., "John" is the agent of the first sentence and "Mary" of the second sentence). However, both sentences express the proposition that a ball was trans-
ferred from John to Mary, and in both cases one can infer that John had the ball before the action took place, that Mary has it after the action, and that John no longer has it after the action. The only significant difference is that in the first sentence John performed the action, and in the latter Mary did so. In CD there is a primitive action called ATRANS (for Abstract TRANSfer of possession, control, or ownership) that encodes the basic semantics of both of these verbs and many more. The CD representation of these sentences is:

[ATRANS
    rel:       POSSESSION
    actor:     JOHN
    object:    BALL
    source:    JOHN
    recipient: MARY]
"John gave Mary a ball"
[ATRANS
    rel:       POSSESSION
    actor:     MARY
    object:    BALL
    source:    JOHN
    recipient: MARY]
"Mary took a ball from John"
(Some readers may be acquainted with Schank's complex notation of double and triple arrows. The direct simplified notation (shown above) is virtually isomorphic, somewhat clearer, and closer to the data structures used by most of the computer programs that parse into CD and other case frame representations.) These two structures are very simple to match against each other to determine precisely in what aspects the two propositions differ and in what aspects they are identical. Moreover, inference rules associated with ATRANS can be invoked automatically when give and take are parsed into these structures. There are many more verbs that contain the ATRANS primitive (such as bequeath, donate, steal, sell, buy, appropriate, expropriate, etc.). Sometimes ATRANS is used in conjunction with other CD primitives that capture other aspects of the meaning. The verb sell, for instance, involves two ATRANS primitives in mutual causation:
[ATRANS
    rel:       OWNERSHIP
    actor:     JOHN
    object:    APPLE
    source:    JOHN
    recipient: MARY]
        <== CAUSE ==>
[ATRANS
    rel:       OWNERSHIP
    actor:     MARY
    object:    25 CENTS
    source:    MARY
    recipient: JOHN]

"John sold an apple to Mary for 25 cents"
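The slot-by-slot comparison that makes CD structures so simple to match against each other can be sketched by encoding frames as Python dictionaries (an illustrative encoding, not Schank's notation):

```python
# CD frames for "give" and "take" as plain dictionaries; comparing them
# slot by slot isolates exactly where the two propositions differ.

give = {"primitive": "ATRANS", "rel": "POSSESSION", "actor": "JOHN",
        "object": "BALL", "source": "JOHN", "recipient": "MARY"}
take = {"primitive": "ATRANS", "rel": "POSSESSION", "actor": "MARY",
        "object": "BALL", "source": "JOHN", "recipient": "MARY"}

def diff_frames(a, b):
    """Return the set of slots in which two CD frames differ."""
    return {slot for slot in a.keys() | b.keys() if a.get(slot) != b.get(slot)}

print(diff_frames(give, take))   # {'actor'}
```

The single differing slot, actor, is precisely the "only significant difference" identified in the text; everything else, including the shared ATRANS primitive, matches.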
The cases used in CD are similar but not identical to the set used originally in case grammars, although the basic ideas are the same. One refinement in CD was to separate agent into actor and source, as the two can be instantiated by different entities in the underlying semantic primitives. Other CD primitive actions include:
Physical transfer of location Mental transfer of information Create a new idea or conclusionfrom other information Bring any substanceinto the body Appty a force to an object Focus a senseorgan (e.g.,eyes,ears) Produce sounds of any sort
Later work (54) has extended this list to include social and other interpersonal actions.
Parsing into Case Frames. The discussion of case frames thus far has focused on their structural properties, including parsimony and clarity of representation. Now the uses of case frames in parsing natural language are discussed, in particular certain parsing techniques available to parsers whose target representation is based on case frames. In essence, parsers built around case grammars help to combine bottom-up recognition of structuring constituents with more focused top-down instantiation of less structured, more complex constituents. This essential property is demonstrated in the example case frame recognition algorithm presented below (see also Parsing).

Thus far case frames have been mentioned that consist of a header and a collection of semantically defined cases. There is a bit more to it than that. Each case consists of a filler and a positional or a lexical marker. There have been examples of case fillers in the above sections. A positional case marker says that the filler of the case occurs in a predefined location in the surface string. A lexical case marker says that the case filler is preceded by one of a small set of marker words (usually prepositions) in the surface string. For instance, consider the following input to a natural-language interface to an operating system:

Copy the fortran files from the system library to my directory.

"Copy" is the case header; the object case is marked positionally as the noun phrase occupying the simple direct object position (i.e., the first noun phrase to the right of the verb that is not preceded by a preposition). The filler of the object case is constrained semantically to be some information structure in a computer. Hence, the parser knows where in the input to search for the filler of the object case and moreover knows what to expect in that position (a noun phrase denoting an information structure, like a file or directory in a computer).
The source case is marked lexically by the preposition from, and the recipient case is marked by the preposition to. Both case fillers are constrained to be noun phrases denoting information repositories in the computer (directories, tapes, etc.). More explicitly, the case frame information available to the parser is:

[COPY (header-pattern)
    [object:      marker: (POSITIONAL DIRECT-OBJECT)
                  filler: (information-structure)]
    [source:      marker: (LEXICAL (from-marker))
                  filler: (information-repository) | (input-device)]
    [destination: marker: (LEXICAL (to-marker))
                  filler: (information-repository) | (output-device)]]

Where:
    (header-pattern) → copy | transfer | move
    (from-marker)    → from | in
    (to-marker)      → to | into | onto

plus patterns or NP-level case frames to recognize output devices, input devices, information structures, and information repositories. A typical case-frame parsing algorithm that operates on this case frame data structure could be summarized as follows:
1. For each case frame in the grammar, attempt an unanchored match of the header pattern against the input string. If none succeeds, the input is unparsable by the grammar. (An unanchored match is the process of searching for a particular pattern anywhere in the input, as opposed to an anchored match, where the match is attempted only starting at a predefined position in the input string.) If one or more matches are found, perform the following steps for each frame whose header matched; the ones that account for the entire input are the possible parses of the input string.

2. Retrieve the case frame indexed by the recognized case header.

3. Attempt to recognize each required case, as follows:

a. If the case is marked lexically, do an unanchored match for the case marker (a very simple one- or two-word pattern), and if that succeeds, perform the more complex recognition of the case filler by anchored match to the right of the case marker or by a more complex parsing strategy (such as recognizing an embedded case frame starting at that location in the input). "Source" and "destination" in the example above are marked lexically.

b. If the case is marked positionally, do an anchored match of the case filler (or again a more complex recognition strategy) starting at the designated point in the input string. "Object" in the example above is marked positionally.

c. If the case can be marked either way, search first for the lexical marker, and, failing that, attempt to recognize it positionally. For instance, the recipient case in GIVE can be marked by the word to (or unto, etc.) or it can appear positionally in the indirect object location ("John gave an apple to Mary" vs. "John gave Mary an apple").

If one or more required cases are not recognized, return an error condition.
This signifies a possible ellipsis, incorrect selection of the case frame, ill-formed user input, or insufficient grammatical coverage. The following sections address issues of robust recovery from ill-formed user input.

4. Attempt to recognize all the optional cases by applying the same method used to parse the required ones. If some are not recognized, however, do not generate error conditions.

5. If, after all the required and optional cases have been processed, there is remaining input, generate a potential error condition denoting spurious input, insufficient coverage, or garbled or ill-formed input that may be recognized by more flexible parsing strategies.

As the case frame is parsed, the input segments recognized as case fillers are processed and stored as the value of the corresponding cases in the case frame. A partially instantiated case frame can serve to guide error-correction processes or to formulate focused queries to the user (46,55,56). The initial case frame selection phase can be speeded up by indexing the case header patterns by the words they contain and recognizing them in a pure bottom-up manner. This bottom-up index-based process is computationally effective if there are very many case frames and each case header consists of a relatively simple pattern. Otherwise, the top-down unanchored pattern match is sufficiently efficient (few case frames), or both processes require substantial computation (large numbers of case frames with complex header patterns).
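The steps above can be compressed into a small sketch for the COPY frame. This hypothetical Python rendering performs the unanchored header match and attaches input segments to lexically marked cases, with the unmarked segment after the header filling the positional object case; the required/optional bookkeeping and semantic filler checks are omitted:

```python
# Illustrative case-frame parser for the COPY frame described above;
# the marker sets are taken from the frame, everything else is a sketch.

COPY_HEADERS = {"copy", "transfer", "move"}
CASE_MARKERS = {"source": {"from", "in"}, "destination": {"to", "into", "onto"}}

def parse_case_frame(sentence):
    """Step 1: unanchored match of the header pattern; then attach input
    segments to lexically marked cases, the unmarked segment after the
    header filling the positional object case."""
    words = sentence.lower().rstrip(".").split()
    positions = [k for k, w in enumerate(words) if w in COPY_HEADERS]
    if not positions:
        return None                      # unparsable by this frame
    filled, current, segment = {}, "object", []
    for w in words[positions[0] + 1:]:
        case = next((c for c, m in CASE_MARKERS.items() if w in m), None)
        if case is not None:             # lexical case marker found
            if segment:
                filled[current] = " ".join(segment)
            current, segment = case, []
        else:
            segment.append(w)
    if segment:
        filled[current] = " ".join(segment)
    return filled

print(parse_case_frame("Copy the fortran files from the system library to my directory."))
# {'object': 'the fortran files', 'source': 'the system library', 'destination': 'my directory'}
```

Note how little of the work is syntactic: the markers "from" and "to" carve the input into segments, and a real system would then verify each segment semantically against the filler constraints of its case.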
Case-frame instantiation can be applied recursively to parse relative clauses or any other linguistic structures that can be expressed as case frames. Noun phrases with postnominal modifiers (i.e., trailing prepositional phrases that modify the main noun phrase), for instance, can be encoded and recognized by an extension of the sentential-level case-frame instantiation algorithm presented above. Moreover, case-frame instantiation works in concert with semantic grammars or patterns used to recognize any subconstituents, such as case markers, represented as nonterminal nodes in a grammar. The advantages of case-frame instantiation over other parsing techniques can be summarized as follows:

Case frames combine bottom-up recognition of simple structuring constituents, such as case headers and case markers, with top-down recognition of semantically more complex, but syntactically less significant, case fillers. The differential treatment of different constituents provides more efficient parsing in general, allows for ellipsis resolution, and makes possible some forms of error recovery, as discussed below.

Case frames combine syntax and semantics. Positional and case marker information is used in concert with semantic recognition of case fillers, thus reducing (though certainly not eliminating) structural and lexical ambiguity.

Case frames are a fairly convenient representation for back-end systems to use. In contrast, parse trees must first be interpreted semantically and subsequently transformed into a representation more convenient for other modules in the system.

Robust Parsing. Any natural-language interface that is used in a practical application with a multitude of users must be able to handle input that is outside its grammar or expectations in various ways. When people use language spontaneously, whether in spoken or written form, they inevitably make mistakes, resulting in extragrammatical utterances that a natural-language interface will receive.
Given the present limited state of NLP, a natural-language interface must also be prepared for input that is, as far as the user is concerned, perfectly correct but that the parser cannot recognize because of its own limited competence. Some types of extragrammatical utterances [see Refs. 43 and 44 for more complete accounts] are listed below with example utterances that might be encountered by an interface to a college course registration system.

Spelling errors: tarnsfer Jim Smith from Econoics 237 too Mathematics 156
Note that some spelling errors can result in different correctly spelled words (e.g., too).

Novel words: transfer Smith out of Economics 237 to Basketwork 100
Here one supposes that "out of" is not listed as a (multiword) preposition corresponding to the source case marker of transfer and that "Basketwork" is not in the interface's dictionary of department names.

Spurious phrases: please enroll Smith if that's possible in I think Economics 237
Ellipsis or other fragmentary utterances: also Physics 514
This might be a follow-up input to the previous one.

Unusual word order: in Economics 237 Jim Smith enroll

Missing words: enroll Smith Economics 237
Here the in is missing, but the meaning is still perfectly clear.
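The first of these classes, spelling errors, is typically corrected by comparing the unknown word against a candidate list, and, as discussed below, restricting that list (e.g., to department names) makes correction both faster and more reliable. A minimal sketch, using a standard edit-distance metric and hypothetical word lists (not the code of any system described here):

```python
# Illustrative category-restricted spelling correction: compare an
# unknown word only against the words legal at its position.

def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def correct(word, candidates, threshold=2):
    """Return the closest candidate within the threshold, else None."""
    best = min(candidates, key=lambda c: edit_distance(word, c))
    return best if edit_distance(word, best) <= threshold else None

departments = ["economics", "mathematics", "physics"]
print(correct("econoics", departments))   # economics
```

With the candidate list narrowed to the single category predicted at the misspelling's position, a correction like "econoics" to "economics" is found immediately; against the whole vocabulary, a real word such as "too" would never even be flagged.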
Unless a natural-language interface can deal with problems in these classes easily, it will appear very uncooperative and stupid to its users, who will tend either not to use it if they have that choice or to use it with a high level of frustration. Examined below are techniques available to deal with some of the above deviations from grammaticality in more detail.

Spelling errors are the most common and normally the most easily corrected of all grammatical deviations. The basic approach when a word is found to be outside the vocabulary of a natural-language interface is to compare the word against a set of known words and substitute the word (or words) from that list found to be closest to the unknown word according to some metric and subject to some threshold of closeness. There is not space here to go into the methods of comparison, but clearly the process will be made more efficient and less prone to error by shortening the list of words against which to compare the unknown word. For this reason, methods of language analysis, such as semantic grammars and case-frame instantiation, that are able to apply strong top-down constraints to their recognition are at a significant advantage when it comes to spelling correction. For instance, in

transfer Jim Smith from Econoics 237 too Mathematics 156

a system based on case-frame instantiation such as that examined above need only compare Econoics against its list of department names rather than against its whole vocabulary. This ability is particularly important in the case of too in this example. Too is a real word that might well be in the system's vocabulary, and without the strong prediction that it should be a preposition marking a case of transfer, the system would be unable to correct it (a match against the whole vocabulary would make too the best match), or even notice that it is misspelled.

Whereas spelling correction can be dealt with at the lexical level, other forms of grammatical deviation require modification to a NLP system's grammatical expectations. The way in which this can be accomplished differs markedly by approach. In pattern matching, for instance, the obvious approach is partial pattern matching, as attempted in the FlexP system (43). Patterns are deemed to match partially if most, but not all, their elements actually do match the input. Clearly, this can be useful for missing or extra words but is not useful in the case of unusual word order. Moreover, in practice, it turns out that some elements of a pattern are more important than others, and unless allowance is made for these differences, it is difficult to decide exactly how much of a pattern needs to match before the pattern as a whole can be declared to have matched.

Dealing with grammatical deviation in an ATN-based system turns out to be extremely difficult. The current position in the network is a very important piece of state information for the operation of an ATN. If an input should be slightly ungrammatical, even by a single word, it is very hard to find the appropriate state to jump to that would enable the parse to continue. This assumes, moreover, that it is possible to determine exactly where the input has departed from the grammar's expectations. The backtracking search used with most ATNs can make this difficult. Work by Weischedel and Black (37) has dealt with extragrammaticality caused by incorrect agreements that can be resolved by relaxing the predicates on ATN arcs, and Kwasny and Sondheimer (57) have looked into adding extra arcs to ATNs on a dynamic basis to make the grammar fit the input. Earlier work on speech parsing (38) also tried to use ATNs in an island-driven mode.

A more recent development in robust parsing by Carbonell and Hayes (56,58) uses a construction-specific approach that fits in well with semantic grammars and case-frame instantiation. The basic idea is to tailor parsing strategies to specific construction types; this not only results in efficient parsing of grammatical input but also permits built-in recovery strategies that exploit the characteristics of the particular construction type. For instance, the following simple recovery strategy works quite well for simple imperative case frames:

Skip over unexpected input until a case marker is found; parse skipped segments against unfilled cases, using only semantic constraints.

If this strategy is applied to

transfer Economics 247 to Physics 317 Smith

"Economics 247" and "Smith" will initially be skipped over, with "to Physics 317" being correctly parsed since "to" is a valid case marker. Then the skipped segments will be correctly parsed against the unfilled cases "source-course" and "student," respectively, leading to a parse identical to that for

transfer Smith from Economics 247 to Physics 317

Such methods of robust parsing are under active investigation at the moment, with the chief outstanding problem being the coordination of multiple, independent, construction-specific parsing strategies on the same input.

Dialogue Phenomena

In addition to recognizing individual sentences, the problem of interactive communication through natural language, be it communication between man and machine or communication between two people, entails discourse phenomena that transcend individual sentences (see Discourse understanding).

Anaphora. Pronouns and other anaphoric references (words like it, that, or one) refer to concepts described previously in a dialogue. Anaphoric resolution entails identifying the referents of these place-holder words. Interactive dialogues invite the use of anaphora much more than simpler database query situations. Therefore, as natural-language interfaces increase in complexity and expand their domain of application, anaphoric resolution becomes an increasingly important problem.

Definite Noun Phrases. Noun phrases often serve as another type of anaphoric reference by referring to previously
mentioned concepts, much like the less specific anaphors do. Usually such phrases are flagged by a definite pronoun (e.g., the). As Grosz (11) noted, resolving the referent of definite noun phrases or any other anaphors often requires an understanding of the planning structure underlying cooperative discourse.

Ellipsis. People often use sentence fragments to express a complete proposition. These terse utterances must be filled out in the context of the dialogue. Sentential-level ellipsis (qv) has long been recognized as ubiquitous in discourse. However, semantic ellipsis, where ellipsis occurs through semantically incomplete propositions rather than through syntactically incomplete structures, is also an important phenomenon. The ellipsis resolution method presented below addresses both kinds of ellipsis.

Extragrammatical Utterances. Interjections, dropped articles, false starts, misspellings, and other forms of grammatical deviance abound. Developing robust parsing techniques that tolerate errors has been the focus of much recent work (37,56,58-60), as discussed in the preceding section.

Metalinguistic Utterances. Intrasentential metalanguage has been investigated to some degree (61), but its more common intersentential counterpart has received little attention (62). However, utterances about other utterances (e.g., corrections of previous commands, such as "I meant to type X instead" or "I should have said . . .") are not infrequent, and an initial stab is being made at this problem (63). Note that it is a cognitively less demanding task for a user to correct a previous utterance than to repeat an explicit sequence of commands (or worse yet, to detect and undo explicitly each and every unwanted consequence of a mistaken command).

Indirect Speech Acts.
Occasionally users of natural-language interfaces will resort to indirect speech acts (qv) (64-66), especially in connection with intersentential metalanguage or by stating a desired state of affairs and expecting the system to supply the sequence of actions necessary to achieve that state.

Empirical studies suggest that users of natural-language interfaces avail themselves of discourse phenomena whenever such devices help in formulating short, succinct linguistic expressions over lengthier, more explicit ones. This observation is summarized as follows:

Terseness Principle: Users of natural-language interfaces insist on being as terse as possible, independent of task, communication media, typing ability, or instructions to the contrary, without sacrificing the flexibility of expression inherent in natural-language communication. [This principle may be viewed as a surprisingly strong form of Grice's maxim of brevity (67).]

Case Frame Ellipsis Resolution. In order to illustrate the ubiquity of ellipsis in interactive dialogues through a natural-language interface, look at the XCALIBUR project, whose objective is to provide flexible natural-language access (comprehension and generation) to the XSEL expert system (68). XSEL, the Digital Equipment Corporation's automated salesman's assistant, advises on selection of appropriate VAX components and produces a sales order for automatic configuration by the R1 system (69). Part of the XSEL task is to provide the user with information about DEC components, hence subsuming the database query task. However, unlike a pure database query system, an expert system interface must also interpret commands, understand assertions of new information, and carry out task-oriented dialogues [such as those discussed by Grosz (11)]. XCALIBUR, in particular, deals with commands to modify an order, as well as information requests pertaining to its present task or its database of VAX component parts. In the following example dialogue, user inputs are preceded with ">" as a prompt.

>What is the largest 11780 fixed disk under $40,000?
The rp07-aa is a 516 MB fixed-pack disk that costs $38,000.
>The largest under $50,000?
The rp07-aa.
>Add two rp07-aa disks to my order.
Line item 1 added: (2 rp07-aa)
>Add a printer with graphics capability
fixed or changeable font?
>fixed font
lines per minute?
>make it at least 200, upper/lowercase.
OK. Line item 2 added: (1 lxy11-ry)
>Tell me about the lxy11
The lxy11 is a 240-lpm line printer with plotting capabilities.

For details of the XCALIBUR interface, the reader is referred to (70-72). In this entry, only the case-frame ellipsis resolution method is discussed. The XCALIBUR system handles ellipsis at the case-frame level. Its coverage is a superset of the LIFER/LADDER system (13,20) and the PLANES (qv) ellipsis module (73). Although it handles most of the ellipsed utterances we encountered, it is not meant to be a general linguistic solution to the ellipsis phenomenon. The following examples are illustrative of the kind of sentence fragments the current case-frame method handles. For brevity, assume that each sentence fragment occurs immediately following the initial query below.

INITIAL QUERY: "What is the price of the three largest single-port fixed-media disks?"

"Speed?"
"Two smallest?"
"How about the price of the two smallest?"
"Also the smallest with dual ports"
"Speed with two ports?"
"Disk with two ports."

In the representative examples above, punctuation is of no help, and pure syntax is of very limited utility. For instance, the last three phrases are syntactically similar (indeed, the last two are indistinguishable), but each requires that a different substitution be made on the parse of the preceding query.

Ellipsis is resolved differently in the presence or absence of strong discourse expectations. In the former case the discourse expectation rules are tested first, and if they fail to resolve the sentence fragment, the contextual substitution rules are tried. If there are no strong discourse expectations, the contextual substitution rules are invoked directly.
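This two-stage control structure can be sketched schematically as follows. The sketch is a hypothetical illustration with invented names and a dictionary stand-in for case frames; it is not XCALIBUR's actual implementation.

```python
# Schematic sketch of two-stage ellipsis resolution: discourse
# expectation rules are tried first; if none applies, the fragment is
# merged into the case-frame parse of the previous input instead.

def resolve_ellipsis(fragment, expectation_rules, last_parse):
    """Resolve a fragmentary utterance against the dialogue context."""
    for rule in expectation_rules:       # discourse expectation rules first
        parse = rule(fragment, last_parse)
        if parse is not None:            # a rule resolved the fragment
            return parse
    # otherwise fall back on contextual substitution against the
    # parse of the last user input in focus
    merged = dict(last_parse)
    merged.update(fragment)
    return merged

# Toy expectation rule: the system just asked the user to confirm or
# disconfirm a proposed filler, so a bare "yes"/"no" is meaningful.
def confirmation_rule(fragment, last_parse):
    if fragment.get("answer") in ("yes", "no"):
        return dict(last_parse, confirmed=(fragment["answer"] == "yes"))
    return None

print(resolve_ellipsis({"answer": "no"}, [confirmation_rule],
                       {"object": "line printer", "speed": 150}))
```

A fragment that matches no active expectation, such as a bare new case filler, falls through to the contextual substitution branch.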
An exemplary discourse expectation rule follows:

IF: The system generated a query for confirmation or disconfirmation of a proposed value of a filler of a case in a case frame in focus,

THEN: EXPECT one or more of the following:
1) A confirmation or disconfirmation pattern.
2) A different but semantically permissible filler of the case frame in question (optionally repeating the attribute or providing the case marker).
3) A comparative or evaluative pattern.
4) A query for possible fillers or constraints on possible fillers of the case in question.
[If this expectation is confirmed, a subdialogue is entered, where previously focused entities remain in focus.]
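One way the rule above might be encoded is sketched below. The tags and tuple layout are invented for exposition; the entry does not give XCALIBUR's internal rule representation.

```python
# Hedged sketch: the expectations raised by the IF/THEN rule above,
# encoded as (pattern-tag, pattern-argument) pairs.

def confirmation_query_expectations(frame, case, proposed_filler):
    """Expectations raised after the system asks the user to confirm or
    disconfirm proposed_filler for `case` of the in-focus `frame`."""
    return [
        ("confirm-disconfirm", None),                 # 1) yes/no pattern
        ("alternate-filler", (frame, case)),          # 2) different legal filler
        ("comparative-evaluative", proposed_filler),  # 3) "too slow", "better"
        ("filler-query", (frame, case)),              # 4) ask what fillers are legal
    ]

expectations = confirmation_query_expectations("line-printer", "speed", 150)
print([tag for tag, _ in expectations])
```

Each pair would be matched against the next user input; a successful match selects the corresponding interpretation and keeps the focused entities in focus, as the rule's bracketed note indicates.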
The following dialogue fragment, presented without further commentary, illustrates how these expectations come into play in a focused dialogue:

>Add a line printer with graphics capabilities.
Is 150 lines per minute acceptable?
>No, 320 is better                    Expectations 1, 2, & 3
(or) other options for the speed?     Expectation 4
(or) Too slow, try 300 or faster      Expectations 2 & 3

The utterance "try 300 or faster" is syntactically a complete sentence, but semantically it is just as fragmentary as the previous utterances. The strong discourse expectations, however, suggest that it be processed in the same manner as syntactically incomplete utterances since it satisfies the expectations of the interactive task. The terseness principle operates at all levels: syntactic, semantic, and pragmatic.

The contextual substitution rules exploit the case-frame representation of queries and commands discussed in the previous section. The scope of these rules, however, is limited to the last user interaction of appropriate type in the dialogue focus, as illustrated below. The rules search the ellipsed fragment for case fillers (or case marker and filler pairs) to substitute for corresponding cases in the parse of the previous input. Substitution can occur at a top-level (sentential) case frame or in embedded (relative-clause or noun phrase) case frames.

>What is the size of the 3 largest single-port fixed-media disks?
>And the price and speed?

and

>What is the size of the 3 largest single-port fixed-media disks?
>disks with two ports?

Note that it is impossible to resolve this kind of ellipsis in a general manner if the previous query is stored verbatim or as a semantic-grammar parse tree. "Disks with two ports" would best correspond to some (disk-descriptor) nonterminal and hence, according to the LIFER algorithm (13,20), would replace the entire phrase "single-port fixed-media disks" that corresponded to (disk-descriptor) in the parse of the original
query. However, an informal poll of potential users suggests that the preferred interpretation of the ellipsis retains the previous information in the original query. The ellipsis resolution process, therefore, requires a finer grain substitution method than simply inserting the highest level nonterminals in the ellipsed input in place of the matching nonterminals in the parse tree of the previous utterance. Taking advantage of the fact that a case-frame analysis of a sentence or object description captures the meaningful semantic relations among its constituents in a canonical manner, a partially instantiated nominal case frame can be merged with the previous case frame as follows:

Substitute any cases instantiated in the original query that the ellipsis specifically overrides. For instance, "with two ports" overrides "single port" in our example, as both entail different values of the same case filler regardless of their different syntactic roles. ("Single port" in the original query is an adjectival construction, whereas "with two ports" is a postnominal modifier in the ellipsed fragment.)

Retain any cases in the original parse that are not explicitly contradicted by new information in the ellipsed fragment. For instance, "fixed media" is retained as part of the disk description, as are all the sentential-level cases in the original query, such as the quantity specifier and the projection attribute of the query ("size").

Add cases of a case frame in the query that are not instantiated therein but are specified in the ellipsed fragment. For instance, the "fixed-head" descriptor is added as the media case of the disk nominal case frame in resolving the ellipsed fragment in the following example:

>Which disks are configurable on a VAX 11-780?
>Any configurable fixed-head disks?

In the event that a new case frame is mentioned in the ellipsed fragment, wholesale substitution occurs, much like in the semantic grammar approach.
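The three merge rules, together with the wholesale-substitution case, can be sketched as follows. This is an illustrative sketch only: the case names and the dictionary representation of frames are invented, since the entry does not give XCALIBUR's actual frame vocabulary.

```python
# Illustrative sketch of case-frame merging for ellipsis resolution:
# substitute overridden cases, retain uncontradicted ones, add newly
# specified ones, and fall back to wholesale substitution when the
# fragment introduces a new head noun.

def merge_frames(previous, fragment):
    """Merge an ellipsed fragment's partial case frame into the case
    frame of the previous query."""
    # Wholesale substitution: a new head noun ("tape drive" after a
    # query about disks) replaces the whole nominal case frame.
    if "head" in fragment and fragment["head"] != previous.get("head"):
        return dict(fragment)
    merged = dict(previous)            # retain uncontradicted cases
    for case, filler in fragment.items():
        merged[case] = filler          # substitute overridden cases; add new ones
    return merged

prev = {"head": "disk", "ports": "single", "media": "fixed", "quantity": 3}
# "disks with two ports": overrides ports, retains media and quantity
print(merge_frames(prev, {"head": "disk", "ports": "dual"}))
# "How about tape drives?": wholesale substitution
print(merge_frames(prev, {"head": "tape drive"}))
```

The same merge applies recursively to embedded nominal frames in a fuller treatment; here a single flat frame suffices to show the rule interplay.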
For instance, if after the last example one were to ask "How about tape drives?" the substitution would replace "fixed-head disks" with "tape drives" rather than replacing only "disks" and producing the phrase "fixed-head tape drives," which is meaningless in the current domain. In these instances of wholesale context switch the semantic relations captured in a caseframe representation and not in a semantic grammar parse tree prove immaterial. The key to case-frameellipsis resolution is matching corresponding cases rather than surface strings, syntactic structures, or noncanonical representations.Although correctly instantiating a sentential or nominal case frame in the parsing processrequires semantic knowledge, some of which can be rather domain specific,once the parse is attained, the resultittg canonical representation, encoding appropriate semantic relations, can and should be eploited to provide the system with additional functionality such as the present ellipsis resolution method. For more details and examplesof the rules that perform case-frame substitution, see the XCALIBUR report (7L). More Complex Phenomena.In addition to ellipsis and anaphora, there are more complex phenomena that must be addressedif one is to understand and simulate human dis-
UNDERSTANDING NATURAL.LANCUAGE
course. This type of deeper understanding has not yet been incorporated into practical natural-language interfaces. However, as natural-Ianguage interfaces increasein sophistication (as they surely willl tnr.e more complex phenomenarequire attention, so, as the final topic of this entry, someexamplesof these more esoteric discoursephenomena are discussed. Goal DeterminationInference.The interpretation of an utterance may dependon the inferred conversationalgoals of the speaker. Consider the following set of examples,in which the same utterance spoken in somewhat different contexts elicits radically different responses.These responsesdepend on the interpretation of the initial utterance, in which the attribution of goals to the speaker plays a dominant role. Passer-by:Do you know how to get to Elm Street? Person on the street corner: Walk toward that tall building and, EIm Street is the fr.fth or sixth on your left. The passer-by'squestion was quite naturally interpreted as an indirect speech act, since the information sought (and given) was not whether the knowledge of getting to Elm Street was present but rather how actually to get there. Lest the mislaken impression be given that it is a simple matter to identify indirect speech acts computationally, consider the following variant to the examPle: Passer-by:Do you know how to get to Elm Street? person reading a street map and holding an envelope with an Elm Street addresson it: No, I hauen't found it; could you help me? In the secondexample, the listener infers that the goal of the passer-byis to render assistance,and therefore the initial utl*rurrre is interpreted as a direct query of the knowledge state of the listener in order to know whether assistanceis required. Hence,the passer-by'squestion is not an indirect speechact in this example. 
Nor is the task of the interpreter of such utterances only to extract a binary decision on the presence or absence of a speech act from goal expectations. The selection of which indirect speech act is meant often rests on contextual attribution of different goals to the speaker. Consider, for instance, the following contextual variant of our previous example:
Passer-by: Do you know how to get to Elm Street?
Waiting cabbie: Sure, hop in. How far up Elm Street are you going?

In this example, the cabbie interpreted the goal of the passer-by as wanting a ride to an Elm Street location. Making sure the cabbie knows the destination is merely instrumental to the inferred goal. The social relation between a cabbie and a (potential) customer is largely responsible for triggering the goal attribution. Thus, the passer-by's utterance in this example is also interpreted as an indirect speech act, but a different one from the first example (i.e., wanting to be driven to the destination vs. wanting to know how to navigate to the destination). In summary, three totally different speech acts (qv) are attributed to identical utterances as a function of different goals inferred from contextual information (for additional discussion of goal determination inferences in discourse comprehension see Refs. 41, 65, and 74):

Example              Speech act
Original example     Indirect information request
Map reader           Direct information request
Cabbie example       Indirect action request

Social Role Constraints. The relative social roles of the discourse participants affect their interpretation of utterances, as illustrated below:

Army General: I want a juicy hamburger.
Aide: Yes sir!

Child: I want a juicy hamburger.
Mother: Not today, perhaps tomorrow for lunch.

Prisoner 1: I want a juicy hamburger.
Prisoner 2: Yeah, me too. All the food here tastes like cardboard.

Clearly, the interpretation of the sentence "I want a juicy hamburger" differs in each example with no context present beyond the differing social roles of the participants and their consequent potential for action. In the first example a direct order is inferred, in the second a request, and in the third only a general assertion of a (presumably unattainable) goal. Therefore, comprehending a dialogue rests critically on knowledge of social roles (74,75). Moreover, social role constraints provide part of the setting essential in making goal attributions and therefore impinge (albeit indirectly) on goal determination inferences discussed in the previous section. In unconstrained discourse there is strong interaction between goal expectations, social role constraints, indirect speech acts, and metalanguage utterance interpretation.

Conclusion

This entry has presented a brief overview of the current state of the art of NLP: the process of developing computer systems that communicate with their users through natural language. The computational approach to NLP differs from the more general open-ended approach to natural language in linguistics and cognitive psychology. As shown above, practical natural-language interfaces can currently be constructed to perform limited tasks within restricted domains, and the various techniques that have been employed to construct such interfaces have been examined and compared. Further details on any of the systems or techniques described can, of course, be obtained by following the large set of references provided. A reader with a desire for further general information may be particularly interested in Refs. 76-78, and a reader with a desire to see some implementation details of systems illustrative of the cognitive simulation approach may wish to look at Ref. 53, which includes unusually complete descriptions of a small number of NLP systems (see also Ref. 79).

BIBLIOGRAPHY

1. N. Chomsky, Syntactic Structures, Mouton, The Hague, 1957.
2. N. Chomsky, Aspects of the Theory of Syntax, MIT Press, Cambridge, MA, 1965.
3. S. R. Petrick, A Recognition Procedure for Transformational
Grammars, Ph.D. Thesis, Department of Modern Languages, MIT, Cambridge, MA, 1965.
4. J. R. Anderson, Language, Memory, and Thought, Lawrence Erlbaum, Hillsdale, NJ, 1976.
5. E. C. Charniak, Toward a Model of Children's Story Comprehension, TR-266, MIT AI Lab, Cambridge, MA, 1972.
6. R. C. Schank, Conceptual Information Processing, North-Holland, Amsterdam, 1975.
7. R. Cullingford, Script Application: Computer Understanding of Newspaper Stories, Ph.D. Thesis, Computer Science Department, Yale University, New Haven, CT, 1978.
8. J. G. Carbonell, Subjective Understanding: Computer Models of Belief Systems, Ph.D. Thesis, Yale University, New Haven, CT, 1979.
9. P. R. Cohen and C. R. Perrault, "Elements of a plan-based theory of speech acts," Cog. Sci. 3, 177-212 (1979).
10. J. F. Allen, A Plan Based Approach to Speech Act Recognition, Ph.D. Thesis, University of Toronto, 1979.
11. B. J. Grosz, The Representation and Use of Focus in a System for Understanding Dialogues, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 67-76, 1977.
12. C. L. Sidner, Towards a Computational Theory of Definite Anaphora Comprehension in English Discourse, TR-537, MIT AI Lab, Cambridge, MA, 1979.
13. G. G. Hendrix, Human Engineering for Applied Natural Language Processing, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 189-191, 1977.
14. B. J. Grosz, TEAM: A Transportable Natural Language Interface System, Proceedings of the Conference on Applied Natural Language Processing, Santa Monica, CA, February 1983.
15. S. J. Kaplan, Cooperative Responses from a Portable Natural Language Data Base Query System, Ph.D. Thesis, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, 1979.
16. J. S. Brown and R. R. Burton, Multiple Representations of Knowledge for Tutorial Reasoning, in D. G. Bobrow and A. Collins (eds.), Representation and Understanding, Academic Press, New York, pp. 311-349, 1975.
17. J. R. Carbonell, Mixed-Initiative Man-Computer Dialogues, Bolt, Beranek, and Newman, Cambridge, MA, 1971.
18. J. G. Carbonell, W. M. Boggs, M. L. Mauldin, and P. G. Anick, The XCALIBUR Project, A Natural Language Interface to Expert Systems, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, pp. 653-656, 1983.
19. J. R. Searle, Speech Acts, Cambridge University Press, Cambridge, UK, 1969.
20. E. D. Sacerdoti, Language Access to Distributed Data with Error Recovery, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, pp. 196-202, 1977.
21. J. Weizenbaum, "ELIZA: A computer program for the study of natural language communication between man and machine," CACM 9(1), 36-45 (January 1966).
22. R. C. Parkison, K. M. Colby, and W. S. Faught, "Conversational language comprehension using integrated pattern-matching and parsing," Artif. Intell. 9, 111-134 (1977).
23. W. A. Woods, "Transition network grammars for natural language analysis," CACM 13(10), 591-606 (October 1970).
24. C. R. Riesbeck and R. C. Schank, Comprehension by Computer: Expectation-Based Analysis of Sentences in Context, Research Report 78, Computer Science Department, Yale University, New Haven, CT, 1976.
25. M. A. Marcus, A Theory of Syntactic Recognition for Natural Language, MIT Press, Cambridge, MA, 1980.
26. S. L. Small and C. Rieger, Parsing and Comprehending with Word Experts (A Theory and its Realization), in M. Ringle and W. Lehnert (eds.), Strategies for Natural Language Processing, Lawrence Erlbaum, Hillsdale, NJ, pp. 89-147, 1982.
27. S. Small, G. Cotrell, and L. Shastri, Toward Connectionist Parsing, Proceedings of the Second National Meeting of the AAAI, University of Pittsburgh, Pittsburgh, PA, pp. 247-280, August 1982.
28. G. DeJong, Skimming Stories in Real-Time, Ph.D. Thesis, Computer Science Department, Yale University, New Haven, CT, 1979.
29. R. C. Schank, M. Lebowitz, and L. Birnbaum, "An integrated understander," Am. J. Computat. Ling. 6(1), 13-30 (1980).
30. K. M. Colby, Simulations of Belief Systems, in R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, Freeman, San Francisco, pp. 251-286, 1973.
31. Y. A. Wilks, Preference Semantics, in E. Keenan (ed.), Formal Semantics of Natural Language, Cambridge University Press, Cambridge, UK, 1975.
32. J. Earley, "An efficient context-free parsing algorithm," CACM 13(2), 94-102 (1970).
33. M. Tomita, Efficient Parsing for Natural Language, Kluwer Academic Publishers, Boston, MA, 1986.
34. G. Gazdar, Phrase Structure Grammars and Natural Language, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, pp. 556-565, August 1983.
35. D. G. Bobrow and J. B. Fraser, An Augmented State Transition Network Analysis Procedure, Proceedings of the First International Joint Conference on Artificial Intelligence, Washington, DC, pp. 557-567, 1969.
36. W. A. Woods, R. M. Kaplan, and B. Nash-Webber, The Lunar Sciences Language System, Final Report 2378, Bolt, Beranek, and Newman, Cambridge, MA, 1972.
37. R. M. Weischedel and J. Black, "Responding to potentially unparseable sentences," Am. J. Computat. Ling. 6, 97-109 (1980).
38. W. A. Woods, W. M. Bates, G. Brown, B. Bruce, C. Cook, J. Klovstad, J. Makhoul, B. Nash-Webber, R. Schwartz, J. Wolf, and V. Zue, Speech Understanding Systems, Final Technical Report 3438, Bolt, Beranek, and Newman, Cambridge, MA, 1976.
39. R. M. Kaplan, A General Syntactic Processor, in R. Rustin (ed.), Natural Language Processing, Algorithmics, New York, pp. 193-241, 1973.
40. M. Kay, The MIND System, in R. Rustin (ed.), Natural Language Processing, Algorithmics, New York, pp. 155-188, 1973.
41. R. Frederking, A Rule-Based Conversation Participant, Proceedings of the 19th Meeting of the Association for Computational Linguistics, Stanford, CA, ACL-81, 1981.
42. R. J. Bobrow, The RUS System, BBN Report 3878, Bolt, Beranek, and Newman, Cambridge, MA, 1978.
43. P. J. Hayes and G. V. Mouradian, "Flexible parsing," Am. J. Computat. Ling. 7(4), 232-241 (1981).
44. P. J. Hayes and D. R. Reddy, "Steps toward graceful interaction in spoken and written man-machine communication," Int. J. Man-Mach. Stud. 19(3), 211-294 (September 1983).
45. R. R. Burton, Semantic Grammar: An Engineering Technique for Constructing Natural Language Understanding Systems, BBN Report 3453, Bolt, Beranek, and Newman, Cambridge, MA, December 1976.
46. J. G. Carbonell and P. J. Hayes, Robust Parsing Using Multiple Construction-Specific Strategies, in L. Bolc (ed.), Natural Language Parsing Systems, Springer-Verlag, New York, 1985.
47. J. G. Carbonell, Towards a Robust, Task-Oriented Natural Language Interface, Workshop/Symposium on Human Computer Interaction, Georgia Technical Information Sciences, March 1981.
48. J. G. Carbonell, Robust Man-Machine Communication, User Modelling and Natural Language Interface Design, in S. Andriole
(ed.), Applications in Artificial Intelligence, Petrocelli, Boston, MA, 1985.
49. C. Fillmore, The Case for Case, in E. Bach and R. Harms (eds.), Universals in Linguistic Theory, Holt, Rinehart, and Winston, New York, pp. 1-90, 1968.
50. R. F. Simmons, Semantic Networks: Their Computation and Use for Understanding English Sentences, in R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, Freeman, San Francisco, pp. 63-113, 1973.
51. C. Riesbeck, Conceptual Analysis, in R. C. Schank (ed.), Conceptual Information Processing, North-Holland, Amsterdam, pp. 83-156, 1975.
52. R. C. Schank and R. P. Abelson, Scripts, Plans, Goals and Understanding, Lawrence Erlbaum, Hillsdale, NJ, 1977.
53. R. Schank and C. Riesbeck, Inside Computer Understanding, Lawrence Erlbaum, Hillsdale, NJ, 1980.
54. R. C. Schank and J. G. Carbonell, Re: The Gettysburg Address: Representing Social and Political Acts, in N. V. Findler (ed.), Associative Networks, Academic Press, New York, pp. 327-362, 1979.
55. P. J. Hayes, A Construction Specific Approach to Focused Interaction in Flexible Parsing, Proceedings of the Nineteenth Annual Meeting of the Association for Computational Linguistics, Stanford University, pp. 149-152, June 1981.
56. P. J. Hayes and J. G. Carbonell, Multi-Strategy Construction-Specific Parsing for Flexible Data Base Query and Update, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, University of British Columbia, Vancouver, pp. 432-439, August 1981.
57. S. C. Kwasny and N. K. Sondheimer, "Relaxation techniques for parsing grammatically ill-formed input in natural language understanding systems," Am. J. Computat. Ling. 7(2), 99-108 (May 1981).
58. J. G. Carbonell and P. J. Hayes, Dynamic Strategy Selection in Flexible Parsing, Proceedings of the Nineteenth Annual Meeting of the Association for Computational Linguistics, Stanford University, Stanford, CA, pp. 143-147, June 1981.
59. P. J. Hayes and J. G. Carbonell, Multi-Strategy Parsing and its Role in Robust Man-Machine Communication, CMU-CS-81-118, Carnegie-Mellon University Computer Science Department, Pittsburgh, PA, May 1981.
60. S. C. Kwasny and N. K. Sondheimer, Ungrammaticality and Extragrammaticality in Natural Language Understanding Systems, Proceedings of the Seventeenth Meeting of the Association for Computational Linguistics, San Diego, CA, ACL-79, pp. 19-23, 1979.
61. J. R. Ross, "Metalinguistic Anaphora," Ling. Inq. 1(2), 273 (1970).
62. J. G. Carbonell, Interpreting Meta-Language Utterances, Preprints of the Workshop: L'Analyze du Language Naturel par L'Ordinateur, Cadarache, France, 1982.
63. P. J. Hayes and J. G. Carbonell, A Framework for Corrections in Task-Oriented Dialogs, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, FRG, 1983.
64. J. F. Allen and C. R. Perrault, "Analyzing intention in utterances," Art. Intell. 15(3), 143-178 (1980).
65. C. R. Perrault, J. F. Allen, and P. R. Cohen, Speech Acts as a Basis for Understanding Dialog Coherence, Proceedings of the Second Conference on Theoretical Issues in Natural Language Processing,
69. J. McDermott, R1: A Rule-Based Configurer of Computer Systems, Carnegie-Mellon University Computer Science Department, Pittsburgh, PA, 1980.
70. J. G. Carbonell, J. H. Larkin, and F. Reif, Towards a General Scientific Reasoning Engine, CIP #445, Carnegie-Mellon University Computer Science Department, Pittsburgh, PA, 1983.
71. J. G. Carbonell, W. M. Boggs, M. L. Mauldin, and P. G. Anick, The XCALIBUR Project, A Natural Language Interface to Expert Systems, in S. Andriole (ed.), Applications in Artificial Intelligence, Boston, MA, 1985.
72. J. G. Carbonell, Discourse Pragmatics in Task-Oriented Natural Language Interfaces, Proceedings of the Twenty-First Annual Meeting of the Association for Computational Linguistics, Cambridge, MA, ACL-83, 1983.
73. D. L. Waltz and A. B. Goodman, Writing a Natural Language Data Base System, Proceedings of the Fifth IJCAI, Cambridge, MA, pp. 141-150, 1977.
74. J. Carbonell, Subjective Understanding: Computer Models of Belief Systems, UMI Research Press, Ann Arbor, MI, 1981.
75. B. J. Grosz, Utterance and Objective: Issues in Natural Language Communication, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, pp. 1067-1076, 1979.
76. E. Charniak and Y. Wilks (eds.), Computational Semantics, North-Holland, Amsterdam, 1976.
77. R. C. Schank and K. M. Colby (eds.), Computer Models of Thought and Language, Freeman, San Francisco, 1973.
78. T. Winograd, Language as a Cognitive Process, Vol. 1: Syntax, Addison-Wesley, Reading, MA, 1983.
79. "Teaching Computers Plain English," High Technology (1986).

JAIME G. CARBONELL and PHILIP J. HAYES
Carnegie-Mellon University and Carnegie Group Inc.

This research was sponsored in part by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 3597, monitored by the Air Force Avionics Laboratory under Contract F33615-81-K-1539, and in part by the Air Force Office of Scientific Research under Contract F49620-79-C-0143. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of DARPA, the Air Force Office of Scientific Research, or the U.S. government.

NEAR-MISS ANALYSIS. See Concept learning; Learning.

NOAH

A hierarchical planner (qv) developed around 1975 by Earl Sacerdoti at SRI International, NOAH uses procedural nets to represent plans (see E. Sacerdoti, A Structure for Plans and Behavior, Technical Note 109, AI Center, SRI International, 1975).
carbonel, Multi-strategy Role in Robust Man-Machine Communication,CMU-CS-81-118' This research was sponsored in part by the Defense Advanced Reby carnegie-Mellon university computer science Department, search Projects Agency OOD), ARPA Order No' 3597, monitored F33615-81-K-1539' contract under Laboratory 1981' Avionics MaY Air Force the Pittsburgh, PA, and Ex- and in part by the Air Force Office ofScientific Researchunder Con60. s. c. Kwasny and N. K. sondheimer, ungrammaticality The views and conclusions contained in this tragrammaticality in Naturar Language understanding systems, tractF4g620-79-c-0143. proceed.ingsof thi SeuenteenthMeeting of the Association for Com' document are those of the authors and should not be interpreted as expressed or implied, of putationol Linguistics,San Diego, CA, ACL-?9, pp. L9-23, L979' representing the official policies, either or the U'S' govResearch, of Scientific Office Force Air the DARPA, 61. J. R. Ross,"Metalinguistic Anaph ora,"Ling. Inq. L(2),273 o970)' Pre- ernment. 62. J. G. Carbonell, Interpreting Meta-Langu&ge Utterances. L'Orpar Naturel Language prints of the workshop: L',Analyzedu dinateur, Cadarache,France, 1982' Learning' Processing NEAR-MISS ANAIYSIS. See Concept learning; 63. P. J. Hayes and J. G. Carbonell, A Framework for Eighth Corrections in Task-Oriented Dialogs. Proceedings of the KarlsInternational Joint Conferenceon Artificial Intelligence, NOAH 1983. ruhe, FRG, Earl 64. J. F. Allen and C. R. Perrault, "Analyzing intention in utterA hierarchical planner (qv) developed around 1975 by to nets procedural uses NOAH ances,"Art. Intell. 15(3), I43-L78 (1980)' International, SRI at Sacerdoti Plans and for A Structure Basis a (see as Acts speech E. Sacerdoti, plans P. R. Cohen, and Allen, F. represent J. 65. c. R. Perrault, for Understanding Dialog Coherence,Proceedings of the Second Behavior, hechnical Note 109, AI Center, SRI International' Conferenceon Theoretical Issuesin Natural Langua,geProcessing, 1975). 
Cambridge,MA, 1978. K. S. Anona 60. J. R. Searle, Indirect SpeechActs, P. Cole and J. L. Morgan (eds.), SUNY at Buffalo rn Syntax and Semantics, Vol . 3, Speech Acts, Academic Press, New York, L975. NONMONOTONIC tOGlC. See Reasoning,nonmonotonic. 67. H. P. Grice, ConversationalPostulates,in D. A. Norman and D. E. Rumelhart (eds.), Explorations in Cognition, W. H. Freeman' San Francisco,1975. NONMONOTONIC REASONING.See Belief revision; Theo68. J. McDermott, XSEL: A Computer Salesperson'sAssistant, in J. proving. rem (eds.), vol. Intelligence, Machine Hayes, D. Michie, and Y-H. Pao,
NON-VON
The name NON-VON refers to a family of massively parallel "new generation" computer architectures (1) developed at Columbia University for use in high-performance AI applications. The NON-VON machine architecture is based on a very large number (many thousands and, ultimately, millions) of processing elements implemented using specially designed custom integrated circuit chips, each containing a number of processing elements. An initial 63-processor prototype, called NON-VON 1, has been operational at Columbia since January 1985.

This entry begins with a brief overview of the NON-VON architecture. Performance projections derived through detailed analysis and simulation are then summarized for applications in the areas of rule-based inferencing, computer vision, and knowledge base management. The results of these projections, most of which are based on benchmarks proposed by other researchers, suggest that NON-VON could provide a performance improvement of as much as several orders of magnitude on such tasks by comparison with a conventional sequential machine of comparable hardware cost. The entry concludes with a concise explanation of the basis for NON-VON's performance and cost/performance advantages in these superficially dissimilar AI task domains.

NON-VON Architecture

Central to all members of the NON-VON family is a massively parallel active memory. The active memory is composed of a very large number of simple, area-efficient small processing elements (SPEs) that are implemented using custom VLSI circuits. The most recently fabricated active memory chip contains eight 8-bit processing elements. Each SPE comprises a small local RAM, a modest amount of processing logic, and an I/O switch that permits the machine to be dynamically reconfigured to support various forms of interprocessor communication.

In the current version of the general NON-VON machine, the SPEs are configured as a complete binary tree whose leaves are also interconnected to form a two-dimensional orthogonal mesh. Each node of the active-memory tree, with the exception of the leaves and root, is thus connected to three neighboring SPEs, which are called the parent, left child, and right child of the node in question, and each leaf is connected to its parent and to its four mesh-adjacent SPEs, which are called its north, south, east, and west neighbors. In addition, the I/O switches may be dynamically configured in such a way as to support "linear neighbor" communication, in which all SPEs are capable of communicating in parallel with their left or right neighbors in a particular, predefined linear ordering.

NON-VON programs are not stored within the small RAM associated with each SPE but are instead broadcast to the active memory by one or more large processing elements (LPEs), each based on an off-the-shelf 32-bit microprocessor having a significant amount of local RAM. In the simplest NON-VON configuration, which was also the first to be implemented, the entire active memory operates under the control of a single LPE that broadcasts instructions through a high-speed interface called the active memory controller for simultaneous execution by all enabled SPEs. This simple configuration thus restricts NON-VON's operation to what is often referred to as a single instruction stream, multiple data stream (SIMD) mode of execution.

The current version of the general NON-VON design, however, provides for a number of LPEs, each capable of broadcasting an independent stream of instructions to some subtree of the active memory tree, as first described in Ref. 2. The LPEs in the general machine are interconnected using a high-bandwidth, low-latency interconnection network. The incorporation of a number of communicating LPEs gives the general NON-VON architecture the capacity for multiple instruction stream, multiple data stream (MIMD) and multiple SIMD execution, multitasking applications, and multiuser operation. The general NON-VON architecture also includes a secondary processing subsystem based on a bank of "intelligent" disk drives capable of high-bandwidth parallel transfers between primary and secondary storage and of the parallel execution of certain operators at the level of the individual disk heads.

Applications and Performance Evaluation

NON-VON's performance has thus far been evaluated in three AI task areas:

1. rule-based inferencing, implemented using the OPS5 production system language (see Rule-based systems);
2. the performance of a number of low- and intermediate-level image-understanding (qv) tasks; and
3. the execution of certain "difficult" relational algebraic operations having relevance to the manipulation of knowledge bases.

An experimental compiler and run time system for the execution of OPS5 on a one-LPE NON-VON has been written and tested on an instruction-level simulator (3). In order to predict the algorithm's performance when executing real production systems, its running time has been calculated based on measurements obtained by Gupta and Forgy (4) of the static and dynamic characteristics of six actual production systems, which had an average of 910 inference rules each. According to these calculations, a NON-VON configuration having approximately the same cost as a VAX 11/780 would execute approximately 903 productions per second. By way of comparison, a LISP-based OPS5 interpreter executing the sequential Rete Match algorithm on a VAX 11/780 typically fires between 1 and 5 rules per second, and a Bliss-based interpreter executes between 5 and 12 productions per second.

In the image-understanding domain, algorithms have been developed, simulated, and in some cases executed on the actual NON-VON 1 machine for image correlation, histogramming, thresholding, union, intersection, set difference, connected component labeling, Euler number, area, perimeter, center of gravity, eccentricity, the Hough transform (qv), and the "moving light display" problem (5). The results of these comparisons suggest that NON-VON should offer an increase in performance of between a factor of 100 and 1000 by comparison with a VAX 11/780 of approximately the same cost and should in a number of cases improve on the best results reported in the literature for special-purpose vision architectures and other highly parallel machines.

Algorithms for a number of database primitives have been developed for the NON-VON machine, including select, project, join, union, intersection, set difference, aggregation, and various statistical operations. To evaluate NON-VON's applicability to the kinds of database operations most relevant to AI applications, a detailed analysis was performed (6) of the machine's projected performance on a set of benchmark queries formulated by Hawthorn and DeWitt (7). This analysis predicted that NON-VON should provide higher performance than any of the five special-purpose database machines evaluated by Hawthorn and DeWitt at approximately the same hardware cost. Although NON-VON's relative cost/performance advantage over specialized database machines was modest in the case of relational selection, major advantages were found in the case of those computationally demanding operations that appear to be most relevant to AI applications.
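The tree-structured active memory and its broadcast-driven SIMD regime described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual NON-VON instruction set or circuit design: the class and method names, the dictionary standing in for each SPE's local RAM, and the sequential loops standing in for parallel hardware are all assumptions introduced for exposition. The reduction method also illustrates how an algebraically commutative and associative operation such as sum can be combined up the tree in a number of steps proportional to the tree depth.

```python
# Illustrative sketch (hypothetical names, not the real NON-VON ISA): a
# complete binary tree of simple processing elements (SPEs) stored in heap
# order, driven by instructions broadcast from a large processing element.

class SPE:
    """One small processing element: a little local RAM plus an enable flag."""
    def __init__(self):
        self.ram = {}          # local RAM, modeled here as a dict
        self.enabled = True    # SIMD enable bit

class ActiveMemoryTree:
    def __init__(self, depth):
        # heap numbering: node i has children 2i+1 and 2i+2
        self.spes = [SPE() for _ in range(2 ** depth - 1)]

    def broadcast(self, op):
        """Model of SIMD execution: the LPE broadcasts one instruction and
        every enabled SPE applies it to its own local state."""
        for i, spe in enumerate(self.spes):
            if spe.enabled:
                op(i, spe)

    def tree_sum(self, key):
        """Combine values up the tree. On the hardware, all parents at a
        given level absorb their children simultaneously, so a reduction
        takes O(depth) parallel steps (modeled level by level here)."""
        vals = [spe.ram.get(key, 0) for spe in self.spes]
        level_start = len(vals) // 2      # first index of the leaf level
        while level_start > 0:
            parent_start = (level_start - 1) // 2
            for p in range(parent_start, level_start):
                vals[p] += vals[2 * p + 1] + vals[2 * p + 2]
            level_start = parent_start
        return vals[0]

tree = ActiveMemoryTree(depth=4)                    # 15 SPEs
tree.broadcast(lambda i, spe: spe.ram.update(x=1))  # every SPE stores x = 1
print(tree.tree_sum('x'))                           # -> 15
```

A depth-4 tree holds 15 SPEs, so broadcasting "store 1" and then summing yields 15 after only four level-by-level combining steps; the same skeleton works for any commutative, associative operator (e.g., replacing `+=` with `max`).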
NON-VON's strong performanceon any given AI task is probably of less interest than the range of diverse AI tasks that would appear to be efficiently executable within a single machine. It must be noted that there is still insufficient evidence to adequately evaluate the extent to which the NON-VON architecture might serve as the basis for a high-performance "general AI machine." The diversity of AI applications for which NON-VON has been shown to offer significant potential performance and cost/performanceadvantages,however, suggests that some of the essential principles underlying this architecture might point the way toward one possible approach to the ultimate development of such machines (seealso Boltzmann machines; Connection machines; LISP machines).
Sources of NON-VON's Advantages

Different aspects of the NON-VON architecture appear to be responsible for the machine's advantages in different problem areas. It is nonetheless possible to identify a relatively small number of features, several of which are typically operative in the case of any single application, to which the machine's advantages may be attributed:

The effective exploitation of an unusually high degree of parallelism, which is made possible by the very fine granularity of the active memory.

The extensive use of broadcast communication, high-speed content-addressable matching, and other associative processing techniques.

The exploitation of other physical and logical interconnection topologies to support a number of problem-specific communication functions.

The capacity for SIMD, MIMD, and MSIMD execution and for a mixture of synchronous and asynchronous execution within a single algorithm.

The use of the active memory tree to execute algebraically commutative and associative operations (such as sum and maximum) in logarithmic time.

The simplicity and cost-effectiveness with which the machine can be implemented using currently available technology.

BIBLIOGRAPHY

1. D. E. Shaw, Organization and Operation of a Massively Parallel Machine, in G. Rabbat (ed.), Computers and Technology, Elsevier North-Holland, Amsterdam, 1985.
2. S. J. Stolfo and D. E. Shaw, DADO: A Tree-Structured Machine Architecture for Production Systems, Proceedings of the Second National Conference on Artificial Intelligence, Pittsburgh, PA, 1982.
3. B. K. Hillyer and D. E. Shaw, "Execution of OPS5 production systems on a massively parallel machine," J. Parall. Distr. Comput. 3(2), 236-268 (June 1986).
4. A. Gupta and C. L. Forgy, Measurements on Production Systems, Technical Report, Carnegie-Mellon Computer Science Department, Pittsburgh, PA, 1983.
5. H. A. H. Ibrahim, Image Understanding Algorithms on Fine-Grained Tree-Structured SIMD Machines, Ph.D. Thesis, Department of Computer Science, Columbia University, New York, October 1984.
6. B. K. Hillyer, D. E. Shaw, and A. Nigam, "NON-VON's performance on certain database benchmarks," IEEE Trans. Software Eng. SE-12(4), 577-583 (April 1986).
7. P. B. Hawthorn and D. J. DeWitt, "Performance analysis of alternative database machine architectures," IEEE Trans. Software Eng. SE-8(1), 61-75 (January 1982).

D. E. Shaw
Columbia University