Game Sound Technology and Player Interaction: Concepts and Developments

Mark Grimshaw
University of Bolton, UK
Information Science Reference
Hershey • New York
Director of Editorial Content: Kristin Klinger
Director of Book Publications: Julia Mosemann
Acquisitions Editor: Lindsay Johnston
Development Editor: Joel Gamon
Publishing Assistant: Milan Vracarich Jr.
Typesetter: Natalie Pronio
Production Editor: Jamie Snavely
Cover Design: Lisa Tosheff
Published in the United States of America by Information Science Reference (an imprint of IGI Global) 701 E. Chocolate Avenue Hershey PA 17033 Tel: 717-533-8845 Fax: 717-533-8661 E-mail:
[email protected] Web site: http://www.igi-global.com

Copyright © 2011 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher. Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data

Game sound technology and player interaction : concepts and development / Mark Grimshaw, editor.
p. cm.
Summary: "This book researches both how game sound affects a player psychologically, emotionally, and physiologically, and how this relationship itself impacts the design of computer game sound and the development of technology"-- Provided by publisher.
Includes bibliographical references and index.
ISBN 978-1-61692-828-5 (hardcover) -- ISBN 978-1-61692-830-8 (ebook)
1. Computer games--Design. 2. Sound--Psychological aspects. 3. Sound--Physiological effect. 4. Human-computer interaction. I. Grimshaw, Mark, 1963-
QA76.76.C672G366 2011
794.8'1536--dc22
2010035721

British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the authors, but not necessarily of the publisher.
Editorial Advisory Board

Theo van Leeuwen, University of Technology, Australia
Gareth Schott, University of Waikato, New Zealand
List of Reviewers

Thomas Apperley, University of New England, Australia
Roger Jackson, University of Bolton, England
Martin Knakkergaard, University of Aalborg, Denmark
Don Knox, Glasgow Caledonian University, Scotland
Theo van Leeuwen, University of Technology, Sydney, Australia
David Moffat, Glasgow Caledonian University, Scotland
Patrick Quinn, Glasgow Caledonian University, Scotland
Gareth Schott, University of Waikato, New Zealand
Detailed Table of Contents
Foreword
Preface
Acknowledgment

Section 1
Interactive Practice

Chapter 1
Sound in Electronic Gambling Machines: A Review of the Literature and its Relevance to Game Sound
Karen Collins, University of Waterloo, Canada
Holly Tessler, University of East London, UK
Kevin Harrigan, University of Waterloo, Canada
Michael J. Dixon, University of Waterloo, Canada
Jonathan Fugelsang, University of Waterloo, Canada

An analysis of the music and sound used in electronic gambling machines. The psychology at play is discussed: how sound is used to create a sense of winning and how such specific sound design might be useful to computer game sound design in general.

Chapter 2
Sound for Fantasy and Freedom
Mats Liljedahl, Interactive Institute, Sonic Studio, Sweden

The relationship between sound and image in computer games and how, in a reversal of the normal situation, sound can be given priority over the visual. The rationale for such a reversal is demonstrated through practical game design examples.

Chapter 3
Sound is Not a Simulation: Methodologies for Examining the Experience of Soundscapes
Linda O’Keeffe, National University of Ireland, Maynooth, Ireland

What is the relationship between player and the game’s soundscape? How elements of the soundscape are perceived by the player is explained through the principles and theories of acoustic ecology.

Chapter 4
Diegetic Music: New Interactive Experiences
Axel Berndt, Otto-von-Guericke University, Germany

An analysis of diegetic music in games and, in particular, an assessment of issues of interaction and algorithmic performance. A framework is proposed that aids in the design of both individual and social musical performance paradigms into music games.

Section 2
Frameworks & Models

Chapter 5
Time for New Terminology? Diegetic and Non-Diegetic Sounds in Computer Games Revisited
Kristine Jørgensen, University of Bergen, Norway

The terms diegetic and non-diegetic are widely used in the analysis of games (and not solely for sound). A thorough analysis of the application of the terminology to computer game sound is provided, resulting in a new model that accounts for the interactive nature of the medium.

Chapter 6
A Combined Model for the Structuring of Computer Game Audio
Ulf Wilhelmsson, University of Skövde, Sweden
Jacob Wallén, Freelance Game Audio Designer, Sweden

A framework for the analysis and design of computer game sound is provided that builds upon existing frameworks in games and film. A practical example demonstrates the model’s utility.

Chapter 7
An Acoustic Communication Framework for Game Sound: Fidelity, Verisimilitude, Ecology
Milena Droumeva, Simon Fraser University, Canada

Soundscape and communication theories are used to assess the computer game’s soundscape and the ways in which the player perceives it. Different codes of realism are discussed and a model of the player and soundscape combined as acoustic ecology is proposed.

Chapter 8
Perceived Quality in Game Audio
Ulrich Reiter, Norwegian University of Science and Technology, Norway

Perceptual bi-modality and cross-modality between auditory and visual stimuli are discussed in addition to issues of realism and verisimilitude. A design model is put forward that assesses audio quality in computer games on the basis of player interactivity and attention.

Section 3
Emotion & Affect

Chapter 9
Causing Fear, Suspense, and Anxiety Using Sound Design in Computer Games
Paul Toprac, Southern Methodist University, USA
Ahmed Abdel-Meguid, Southern Methodist University, USA

An overview of relevant emotion theories and their potential application to sound design for computer games. In particular, discussion centres around the eliciting of fear and anxiety during gameplay, and the results of experiments in this area are discussed.

Chapter 10
Listening to Fear: A Study of Sound in Horror Computer Games
Guillaume Roux-Girard, University of Montréal, Canada

A thorough analysis of sound design and sound perception in the survival horror game genre that focuses upon sound’s ability to instil fear and dread in the player. An analytical model of sound design is proposed that is founded upon the reception of sound, rather than production, and the use of the model is illustrated through several practical examples.

Chapter 11
Uncanny Speech
Angela Tinwell, University of Bolton, UK
Mark Grimshaw, University of Bolton, UK
Andrew Williams, University of Bolton, UK

An exploration of the genesis of the Uncanny Valley theory and its implications for the design and perception of Non-Player Character speech in horror computer games. Empirical work by the authors on the perception of such speech is discussed, particularly with regard to the evocation of fear and anxiety.
Chapter 12
Emotion, Content, and Context in Sound and Music
Stuart Cunningham, Glyndŵr University, UK
Vic Grout, Glyndŵr University, UK
Richard Picking, Glyndŵr University, UK

A summary of emotion research and its relevance to the design of sound and music for computer games is provided before a discussion on the use and effect of musical playlists during gameplay. In particular, such playlists can be generated automatically according to the real-world environment the player plays in and according to the player’s changing psychology and physiology.

Chapter 13
Player-Game Interaction Through Affective Sound
Lennart E. Nacke, University of Saskatchewan, Canada
Mark Grimshaw, University of Bolton, UK

An assessment of the role and efficacy of psychological, physiological, and psychophysiological measurements of players exposed to sound and music during gameplay. Recent empirical results from a psychophysiological study on computer game sound are presented, followed by a discussion on the implications of biofeedback for game sound design and player immersion.

Section 4
Technology

Chapter 14
Spatial Sound for Computer Games and Virtual Reality
David Murphy, University College Cork, Ireland
Flaithrí Neff, Limerick Institute of Technology, Ireland

An introduction to spatial sound, its application to computer games and the technological challenges inherent in emulating real-world spatial acoustics in virtual worlds. A variety of current technologies are assessed as to their strengths and weaknesses and suggestions are made as to the requirements of future technology.

Chapter 15
Behaviour, Structure and Causality in Procedural Audio
Andy Farnell, Computer Scientist, UK

A critical assessment of the current use of audio samples for computer games from the point of view of creativity and realism in game sound design. Procedural audio is proposed instead and the strengths and opportunities afforded by such a technology are discussed.

Chapter 16
Physical Modelling for Sound Synthesis
Eoin Mullan, Queen’s University Belfast, N. Ireland

A review of the potential for computer game sound design of one branch of procedural audio, viz. physical modelling synthesis. The historical evolution of the process is traced, leading to a discussion of how such synthesis might be integrated into game engines and the implications for player interaction.

Section 5
Current & Future Design

Chapter 17
Guidelines for Sound Design in Computer Games
Valter Alves, University of Coimbra, Portugal & Polytechnic Institute of Viseu, Portugal
Licínio Roque, University of Coimbra, Portugal

A discussion of the relevance and importance of sound to the design of computer games with particular regard to the concepts of resonance and entrainment. Seven guidelines for game sound design are presented and exemplified through an illustrative game design brief.

Chapter 18
New Wine in New Skins: Sketching the Future of Game Sound Design
Daniel Hug, Zurich University of the Arts, Switzerland

The aesthetic debt that computer game sound owes to film sound is described as a prelude to a variety of examples from independent game developers going beyond such a paradigm in their sound design. Suggestions are made as to how game sound design might evolve in the future to take greater account of the interactive potential inherent in the structure of computer games.

Appendix
Compilation of References
About the Contributors
Index
Foreword
BANG! There, that got your attention. OK, so that’s a fairly bad joke to illustrate just what sound can do for you… namely, GET YOUR ATTENTION! Actually, sound does so much more: it connects your visual input to a frame of reference, the audio-visual contract. So, when we create experiences, either in film, TV, live on stage, or in computer games, we use this cerebral connection between sound and vision to intensify your overall experience. Because that’s our goal in any of these mediums–to create an experience! Sound takes up 50% of this experience (maybe not 50% of the budget, but that’s another story).

There’s an old adage we audiophiles use when discussing budgets in the hope that a producer might actually listen to us once in a while. If you get a room full of people to watch great graphics with poor sound and then compare it to poor graphics with great sound, they will almost always perceive the latter as having the better quality graphics. Generally, producers don’t believe this story, but I have witnessed it in real life.

A few years ago I was working on an AAA title–action adventure: cars, guns, gangsters… you get the idea. One evening, the sound designer reworked the “Whacking someone over the head with a pool cue” sound, improving its overall effectiveness with small, subtle, deep thuds, some crunching bone (actually carrots), and a deliciously realistic skin smacking sound (supermarket chicken being hit by a baseball bat). He added his new sound to the game database and went home. The following morning the game team rebuilt the whole game (including the new sound). Later that day many people congratulated the “Whacking someone over the head with a pool cue” animator on his new, improved animation: he was somewhat bemused, to say the least. He hadn’t worked on that animation for several weeks. I’m sure you can work out what happened: people saw the same animation with the new, improved sound and believed they were seeing a better animation.
This is how we use the audio-visual contract to our benefit.

OK, so that’s my practitioner's story in, but let’s take a look at game sound and what you need to study if you are interested in this field… and what’s in this book. There are several axes or dimensions to think about. Emotion is the obvious one: fear, anger, hatred, and so on are all well represented in game sound, from survival horrors to gangster simulations. But what about humour, joy, happiness? Just play Mario Kart, Sonic the Hedgehog, or LocoRoco and I guarantee you’ll soon realise that the sound has a great deal to do with provoking laughter, smiles, and a lightened mood.

So, I’ve now mentioned the breadth of experience our industry creates, but think too of another axis, the history of game sound: from tiny little beeps and bleeps (Pong) to the colossal soundscapes of today’s blockbuster games. It is a story which starts with a few programmers/musicians/sound engineers trying to get “something” out of a paltry 8-bit chip after the graphics guys have already had their fill, and runs through to my point at the beginning of this introduction–persuading a producer to give you some kind of serious sound budget. It is a tale that runs from one guy who does everything (including the voice over) to a small army of specialists: musicians, Foley artists, sound technicians, weapons specialists, vehicle specialists, atmosphere creators; the list goes on. Our game sound pioneers took this journey and, along the way, solved some tricky issues, like repetition–in music, in dialogue, in sound effects–memory management, automated in-game mixing and so on.

I am going to sum this section up by saying there are now many different aspects to game sound: music, diegetic sound, atmospheres, interactive music, development of emotional connection, realism, abstractism, super-realism. What I really like about this book’s approach to game sound is the five core sections, which give it a unique and very practical way of tying together all the axes I mentioned earlier, namely: Interactive Practice, Frameworks & Models, Emotion & Affect, Technology, and Current & Future Design. In conclusion, then, I hope that you, as a reader, enjoy the discussion and findings presented here as much as I have.

Dave Ranyard

Dave Ranyard is the Game Director/Executive Producer of Sony’s hugely successful, 20+ million selling SingStar franchise. He has been in the games industry since the mid-nineties, starting out as an AI programmer at Psygnosis, and later moving to Sony Computer Entertainment Europe's London Studio, where he has held a number of roles over the past 10 years, ranging from audio manager to running the internal creative services group. He has worked on titles including Wip3out, The Getaway and The Getaway: Black Monday, the EyeToy: Play series and, more recently, SingStar. Prior to the games industry he lectured in Artificial Intelligence at the University of Leeds, where he also gained a PhD in the subject. In recent years, Dave has taken a keen interest in GDC and is currently on the advisory board. Dave is a keen musician and he has written and produced many records over the past 15 years.
Preface
A phrase often used when writing about the human ability to become immersed in fantasy is “the willing suspension of disbelief”, which Samuel Taylor Coleridge first coined in the early 19th Century as an argument for the fantastical in prosody and poetry. What is a computer game? At base, it is nothing more than a cheap plastic disc encased within a cheaper plastic tray. And the system it is destined for? A box of electronics, lifeless in a corner. Put the two together, though, throw in the player's imagination and interaction and he or she is delivered of experiences that, to use Diderot's phrase, are “the strongest magic of art”. Disbelief is suspended willingly, sense and rationality recede, and the player becomes engaged with, engrossed in, and, given the appropriate game, immersed in a virtual world of flickering light and alluring sound where the fantastical becomes the norm and the mythic reality.

For the reader interested in that flickering light, there is a plethora of books and scholarly articles on the subject. For the reader interested in the ins and outs of music and sound software, how to rig a microphone to record sound, and how to transfer that sound to a game environment, there likewise is a wealth of handy resources. For the reader truly interested in understanding or harnessing the power of sound in that virtual world, in emulating reality or the creation of other realities, in engaging, engrossing and immersing the player through sound and emotion, there is this book.

This is a book that deals with computer game sound in a variety of forms and from a variety of viewpoints. Sound FX, rather than game music, is the topic, other than where the music is interactive or otherwise intimately bound up with the playing of the game.
Some sound FX are used to emulate acoustic environments of the real world while others deliberately set out to create alternate realities; some are based upon the use of audio samples whilst others are starting to make use of procedural synthesis and audio processing; some sound works hand-in-hand with image and game action to immerse the game player in the gameworld while, in other cases, interactive music is the sole raison d’être of the game. From the simplest of puzzle games to the most detailed and convoluted of gameworlds, sound is the indicator par excellence of player engagement and interaction with the structures of the game and the rules of play.

Academic writing about game sound, its analytical and theoretical drivers, is a developing area and this is reflected by the diversity of theoretical methodologies and the variety of terminology in use. Far from being a weakness, this range points to the potential for the discipline and the wide appeal of its study because it is, at heart, multidisciplinary. The range of subject matter across the chapters reflects the complexity and potential of human interaction with sound in virtual worlds as much as it reflects the passions, backgrounds, and training of the book's contributors. Their contributions to the study of computer game sound bring in disciplines and theories from film studies, cultural studies, sound design, acoustic ecology, acoustics, systems design, computer programming, cognitive sciences, and psychology. The authors themselves have a diversity of experience: some are researchers and academics whilst others are sound practitioners in the games industry. All are experts in their chosen field yet all are students of game sound, forever exploring, forever questioning, forever seeking to drive the study and practice forward.

The readership of this book is intended to be similarly diverse in terms of both discipline and motivation. There is something for everyone here: the student for whom a knowledge of computer game sound leads to that important qualification, a game sound designer wishing to keep abreast of the latest thinking and developmental concepts, or an academic theoretician or researcher working to innovate game sound theory or technology. Furthermore, the appeal of the book is wider than computer games, reaching out to those working in virtual reality or with autism, for example. The reader will not find screeds of instructions for software or hardware, programming recipes or tips on how to break into the industry. Instead, contained within this book will be found lucid essays on philosophical questions, theoretical analyses on aspects of computer game sound, models for conceptualizing sound, ideas for sound design, and provocative discussions about new sound technology and its future implications. All chapters raise further questions as to the fascinating relationship between player and sound.

Reflecting the disciplines the authors come from, some key terms (found at the back of each chapter) are provided with definitions that, prima facie, differ slightly to the definition provided for the same key term in another chapter. As with the authors' preferences for American or British English, this has been allowed to stand in order to illustrate both the diversity of approach to the topic throughout the book and the educational and professional backgrounds of each author.
The study of computer game sound is yet young and the terminology and its application still in flux: the definition for each key term, where minor differences exist, pertains to the chapter the key term belongs to. The term “computer game” has been chosen, in preference to a number of other possibilities, as referring to all forms of digital game, arcade machine, gaming console, PC game, or videogame and the reader may assume that, unless a chapter uses one of those specific forms, “computer game” references the general case. Quite deliberately, the term has been chosen in preference to videogame in order to fly the flag for sound: videogames are not just video but sound too and all chapters proselytize the importance of sound to the game experience even where they reference the relationship of sound to image. “Sound” has generally been chosen in preference to “audio” because the focus of the book is on the relationship between sound and player rather than techniques for creating and manipulating audio data. However, “audio” is the usual term in some disciplines and, here, authors have been given free rein to use whichever terminology they are comfortable with.

The book itself is organized into five sections. None is mutually exclusive in terms of its content. Indeed, the astute reader will pick up divers common threads meandering their way through the chapters: the debt game sound owes to film sound and the need to slough off that used skin, issues of presence and player immersion, realism, the unique, interactive nature of computer game sound, and the potential for the emotional manipulation of the player, for instance. All chapters, too, have an eye on the future and its possibilities and authors have been encouraged to speculate on that future.

An oft-overlooked area in computer gaming (and certainly not the first thing that comes to mind with the term “computer game”) is that of electronic gambling machines: one-armed bandits and their modern equivalents.
Karen Collins and her co-authors open the first section on Interactive Practice by providing a fascinating glimpse into the sound of such machines and how music and sound FX provoke and toy with the user's emotions in an effort to part them from their money. They draw parallels to sound use in other, more typical computer games and suggest ways in which sound use in electronic gambling
machines might provide inspiration for the design and analysis of sound in computer games in general.

Mats Liljedahl's chapter is an attempt to redress the imbalance between visual and auditory modes in computer games. It does this by providing an overview of the use of sound in virtual environments and augmented realities, concentrating in particular on the attention the sound designer must pay to emotion and flow. Using the concept of GameFlow, Liljedahl describes and explains two games in whose design he has been involved and in which the sound modality is purposefully given priority over the visual. The chapter seeks to inspire and serves as an introduction to the art of computer game sound design: Sound for Fantasy and Freedom.

Linda O’Keeffe takes a holistic view of computer game sound by treating it as a dynamic soundscape created anew at each playing. She draws upon soundscape and acoustic ecology theory to elucidate her stance and compares and contrasts game soundscapes to real-world soundscapes. Throughout, O’Keeffe prompts questions as to the listener's perception of, and relationship to, soundscapes: what is noise, and what roles do context and the player's culture and society play? Ultimately, how can (and why should) we design immersive soundscapes for the gameworld?

Next, Axel Berndt takes a close look at the occurrence of diegetic music in games, design principles for music games and issues of interactivity and algorithmic performance. A critique is presented of recent and current games as regards the performance of in-game music, and advice and solutions are offered to improve what is currently a rather static state of affairs, merely scratching at the surface of possibility. Interactivity in music games is assessed through a critique of what is termed visualized music and Berndt proposes a framework of design that incorporates musical performance paradigms both as individual and as social, collaborative practice.
This leads to Frameworks & Models, which is opened by Kristine Jørgensen, whose chapter is both an exhaustive survey of the use of diegetic terminology with regard to game sound and a proposal for a new conceptual model for such sound. The main thrust of her argument is that the concepts of diegetic sound and non-diegetic sound have been transposed from film theory to the study of computer games with frequently scant regard for the very different premises of the two media. The interactive, real-time nature of computer games and the immersive environments of many game genres require a radical reappraisal of sound usage and sound design for games: games are not films and the use-value of game sound is greater than that of film sound.
This is followed by a chapter in which Ulf Wilhelmsson and Jacob Wallén propose a model for the analysis and design of computer game sound that combines two previous models–the IEZA Framework for game sound and Walter Murch's conceptual model for film sound–with affordance and cognition theories. The IEZA Framework accounts for the structural basis of game sound, the function of sound, while Murch's model describes sound as either embodied or encoded, a system that accounts for human perception and cognitive load limits. Combining the two systems, the authors assert, provides a powerful tool not only for analysis but also for the planning and design of computer game sound, and this claim is demonstrated by a practical example.
Milena Droumeva, in her chapter, filters the computer game soundscape through the precepts of Schafer's and Truax's soundscape and acoustic communication theories. Different ways of listening to game sound are proposed with assessments of the role of sound in the perception of realism: Does sound provide fidelity to source or does it provide a sense of verisimilitude, and what are the strengths of each approach as regards computer game sound design?
Droumeva ultimately advocates a view comprising game soundscape and player together as an acoustic ecology and expands that ecology from the virtual world of the game to include concurrent sounds from the real world.
Ulrich Reiter's chapter on Perceived Quality in Game Audio explores the bi-modality and cross-modality of auditory and visual stimuli in gameworlds. It summarizes previous work in this area and proposes a high-level salience model for the design of audio in games that accounts for both interactivity and attention as the bases for the evaluation of audio quality. Issues of level of realism and verisimilitude are discussed while the validity, and use, of Reiter's proposed model is substantiated through experimental methods outlined towards the end of the chapter.
Paul Toprac and Ahmed Abdel-Meguid's chapter introduces the section dealing with Emotion & Affect. The authors present to the reader four relevant emotion theories, then summarize fundamental research they conducted in order to test those properties of diegetic sound best suited to evoke sensations of fear, anxiety, and suspense. Their results, an early example of an empirical and statistical basis for sonic fear and anxiety, lead the authors to devise some rough heuristics for the design of such emotions into computer game sound and to point to directions for future research in the area.
The following chapter, by Guillaume Roux-Girard, also deals with the perception of fear in computer game sound, being an in-depth analysis of sound usage in the survival horror game genre that focuses on sound's ability to instil fear and dread in the player. It proposes a model for sound, based upon film sound practice and existing models for computer game sound, that is user-centric–one based on the reception of sound rather than its production–and Roux-Girard provides several illustrative examples from recent horror games to validate the model.
In Uncanny Speech, Angela Tinwell, Mark Grimshaw, and Andrew Williams continue the horror theme with a look at Non-Player Character speech in horror games and its relationship to the 1970s' theory of the Uncanny Valley.
The authors trace the development of theories of the uncanny from its beginnings in psychoanalysis over 100 years ago through to its practical application in robotics (as the Uncanny Valley theory) and its strong correlation to fear and anxiety in computer games. Recent empirical work by the authors is described and its implication for the design and production of Non-Player Character speech in computer games is discussed.
Stuart Cunningham, Vic Grout, and Richard Picking's chapter looks at Emotion, Content, and Context in Sound and Music. The chapter is an exploration of the interaction that is possible between player and computer game sound, in particular, music playlists used in conjunction with games. The authors provide an overview of emotion research in the context of computer games and consider the emotional and affective value of sound and music to the player. The experimental work that is summarized in the chapter includes the generation of musical playlists according to the environmental context of the player: that is, the environment outside the game. Not only does this raise the intriguing situation of sensory and perceptual overlap and interplay between real-world and virtual, but the authors also suggest further possibilities such as the playlists themselves being responsive to the changing psychology and physiology of the player during gameplay.
Concluding the section Emotion & Affect, Lennart Nacke and Mark Grimshaw's chapter is a study of computer games as affective activity. In this form of activity, sound has a large role to play and the chapter focuses on that role as it affects, indeed effects, flow and, particularly, immersion: in the latter case, a preliminary mathematical equation is supplied for modelling immersion. The authors start with a review of psychological and physiological experiments and, combining these approaches, psychophysiological experiments on the effects of sound and image in virtual environments.
Following a summary of a recent psychophysiological study on computer game sound conducted by the authors, the chapter concludes with a discussion on the advantages and disadvantages of such an empirical methodology before speculating on the implications of biofeedback for computer game sound with reference to player
interaction and immersion. The section on Technology opens with a chapter on Spatial Sound for Computer Games and Virtual Reality by David Murphy and Flaithrí Neff. The authors guide the reader through the basics of human spatial sound processing and the propagation of sound in space while pointing out the problems faced in transferring and accurately replicating these phenomena within computer game systems. They conclude with a survey of existing spatial sound technologies for use in virtual worlds, their strengths and weaknesses, and look to the future possibilities for computer game sound posed by the ongoing development of the technology. Andy Farnell's chapter comprises an in-depth critique of sample-based game audio followed by an analysis of the potential of procedural audio: the real-time design of sound. For Farnell, audio samples have proven to be too limiting, both for the purposes of creativity in game sound design and for the promise of realism: audio samples are predicated upon selection whereas procedural audio is design. A close discussion of procedural audio techniques, both as they have been used and how they might be used to the benefit of computer games, leads the author to the conclusion that it is both pointless and wasteful of computer resources to pursue precise sonic realism: procedural audio can instead be used to provide just the necessary level of realism, a perceptual realism, that is required for the player to comprehend source and source behaviour whilst saving scarce resources for more interesting and immersive tasks. Eoin Mullan's chapter delves further into the promise of procedural audio by providing a detailed exploration of the potential of physical modelling, a branch of procedural audio. He traces the technique's development from the synthesis of musical instrument sounds to its current state where it stands poised to deliver a new level of behavioural realism to computer games. 
For Mullan, this will be achieved through the integration of the technology with game physics engines and through physical modelling's ability to provide unprecedented levels of player-sound interaction.
Current & Future Design is the subject of the next two chapters comprising the final section. First, Valter Alves and Licínio Roque present a lucid case for the importance of sound to the design and experience of computer games: attention should be paid to sound from the start of the design process, the authors assert. Alves and Roque discuss concepts such as resonance and entrainment as means to engage and immerse players in the gameworld. They present seven guidelines for game sound design and detail an illustrative example of the application of these heuristics.
Lastly, in this section, Daniel Hug's chapter is a clarion call for a new aesthetic of computer game sound. Through a discussion of two dominant paradigms in computer game sound discourse, pursuit of reality and cinematic aesthetics, it details the debt that game sound owes to cinema sound but then uses examples, in particular from many innovative game developers and from cinema's own subversive stream, to shrug off that mantle and argue for a new future for game sound design. Rich in ideas and provocative in its discourse, the chapter is full of practical suggestions for making computer game sound not only a different experience to cinematic sound but an engaging and rewarding one too.
Closing the chapters is an appendix which is a lightly edited transcript of an online discussion forum to which the book’s contributors were invited to attempt to debate and answer the question: What will the player experience of computer game sound be in the future? This is, of course, an open-ended question and the unstructured, lively debate that ensues is indicative of the open-ended potential for the future of computer game sound.
Whatever your need in picking up the book, I hope you will find it met.
At the very least, perhaps one of the contributions here will raise intriguing questions in your mind, an itch that will be scratched
by future investigation on your part. Perhaps the ideas contained within will inspire you to develop a new game sound design paradigm or to innovate the technology and push the frontiers of human-sound interaction? After all, the aim of this book is not just to contribute to the development of ever better computer games or more cogent analyses but it is also to cast an illuminating light on at least one part of humankind's relationship with sound as we step out of reality into virtuality.
Mark Grimshaw
University of Bolton, UK
Acknowledgment
An anthology such as this requires the hard work and input of many people, not just the contributing authors: publishers and their staff, the Foreword author, the Editorial Advisory Board, and the reviewers. Each chapter has been exhaustively blind reviewed by two leading academics in the field and I would like to thank each and every one of them for their tireless work in support of this project: Thomas Apperley (University of New England, Australia), Roger Jackson (University of Bolton, England), Martin Knakkergaard (University of Aalborg, Denmark), Don Knox (Glasgow Caledonian University, Scotland), Theo van Leeuwen (University of Technology Sydney, Australia and member of the Editorial Advisory Board), David Moffat (Glasgow Caledonian University, Scotland), Patrick Quinn (Glasgow Caledonian University, Scotland), and Gareth Schott (University of Waikato, New Zealand and member of the Editorial Advisory Board). My thanks too to Dave Ranyard of Sony Computer Entertainment Europe Ltd. for his flattering foreword and my appreciation for the guidance and patience shown towards me by Joel Gamon of IGI Global. Finally, I must extend my apologies to my contributors and fellow authors for the hectoring they were subjected to by the editor: The end surely justifies the means.
Mark Grimshaw
University of Bolton, UK
Section 1
Interactive Practice
Chapter 1
Sound in Electronic Gambling Machines:
A Review of the Literature and its Relevance to Game Sound
Karen Collins, University of Waterloo, Canada
Holly Tessler, University of East London, UK
Kevin Harrigan, University of Waterloo, Canada
Michael J. Dixon, University of Waterloo, Canada
Jonathan Fugelsang, University of Waterloo, Canada
ABSTRACT
A much neglected area of research into game sound (and computer games in general) is the use of sound in the games on electronic gambling machines (EGMs). EGMs have many similarities with commercial computer games, particularly arcade games. Drawing on research in film, television, computer games, advertising, and gambling, this chapter introduces EGM sound and provides an introduction to the literature on gambling sound in general, including discussions of the casino environment, the slot machine EGM, and the physiological responses to sound in EGMs. Throughout the chapter, we address how the study of EGM sound may be relevant to the practice and theory of computer game audio.
DOI: 10.4018/978-1-61692-828-5.ch001
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION
A much neglected area of research into computer game sound is the use of sound in electronic gambling machines (EGMs; also known as slot machines, video slots and video fruit machines). To put the influence of EGMs into perspective, the computer game industry in the United States contributes approximately $8 billion in sales each year to the country’s GDP (Seeking Alpha, 2008). The slot industry, on the other hand, generates approximately $1 billion a day in wagers in the United States alone (Rivlin, 2004). Moreover, this amount is increasing as slot machines grow in popularity and are increasingly found outside of designated casinos. In 1980, an average of 45% of the gaming floor of a Nevada casino was devoted to slots, whereas today this number is at least 77%, with machines generating more than twice the combined revenue of all other types of games (Schull, 2005). Although they are also increasing in complexity (see below), slot machines are attractive to players because they require little or no training or previous experience, they are quick and easy to play and, perhaps most importantly, they produce a number of sights and sounds that make them striking and exciting on the casino floor.
EGMs have many similarities with commercial computer games, particularly arcade games. In fact, many of the early video arcade game companies also had a long prior history of manufacturing slot machines, including Bally and the Williams Manufacturing Company. As such, many of the creators and designers of slot machines today have also worked for computer game companies. Indeed, much of the sound design and music of slots is still outsourced to game sound designers and composers, such as George “The Fat Man” Sanger (composer of 7th Guest, Wing Commander, and others). Furthermore, until the 1990s slot machines had fairly standard mechanical or electro-mechanical reels and parts. Today, however, with the digitization of slot machines there are now considerably
more structural components to slot machine gameplay. Many of these structural components have been adapted from computer games, such as cut scenes, bonus rounds and specialist plays. And while the arm of the “one-armed bandit” remains on many slot machines, more commonly players use simple rectangular or round blinking buttons very similar to those of many arcade games.
There are also, of course, some notable differences between computer games and electronic gambling machines. Historically, the vast majority of EGMs have been exclusively installed in casinos, where the usual age for entry is 21, thus effectively excluding young people from gameplay. However, this is changing as the companies attempt to capture a younger audience and the machines proliferate in non-gambling environments (Rivlin, 2004). Today, EGMs can be found in bars, restaurants, arcades, hotel lobbies, and entertainment and sporting venues. There are also, of course, virtual slot machines online, and these represent a significantly growing proportion of slot income. Research has further shown that casinos and gaming companies are seeking to target women, particularly those over 55, as their main demographic, although as the venues change, the target market is becoming younger.
Electronic gambling machines today are also much faster to play than their mechanical and electronic ancestors. Now, the average player initiates a new game every 6 seconds (Harrigan & Dixon, 2009a, p. 83), playing up to 600 games per hour, and there are even artificially intelligent machines that adapt to the speed of the player: when the player starts slowing down, the machine slows down with them, but works to build them back up after a little break.
Many games aim for “immersion” (what might be best described in terms of Csíkszentmihályi’s concept of “flow”, characterized by concentration on the task at hand, a sense of control, merging of awareness and action, temporal distortion and a loss of self-consciousness; see Csíkszentmihályi, 1990). It is, however, often possible to jam the button
with a piece of card, and let the machine play on its own for even faster results. Most machines also include a “Bet Max” function, a one-button mechanism that simultaneously allows players to wager the maximum allowable amount and to spin the reels, and so encourages both faster wagering and continuous, rapid gameplay requiring a minimum of attention from distracted players.1 Thus, a “nickel slot” can mean wagers of up to about $4 per bet, although these are typically displayed in “credits” of 25-cent allotments so the illusion is that the player is betting less.
The biggest distinction between slot machines and computer games is, of course, the aspect of financial risk added to gameplay, which adds a potential new level of psychological, cognitive, and emotional involvement in the game (we say potential because these distinctions are as yet unexplored in the research). The win-loss component of electronic gambling games is more complicated than it at first appears, with “losses disguised as wins”, and “near-misses” (see below). These are carefully doled out according to a reward schedule, based on scientific research about how long we will play before needing a win to keep motivated (see Brown, 1986). Reward schedules have also been built into computer games, particularly hunter-gatherer type games in which the player must spend considerable time roaming lands and collecting objects.2 Some psychologists suggest that the reward schedule combined with the rapidity of the gameplay is similar in character to the effect of amphetamines, stimulating the on-off cycle that repeatedly energizes and de-energizes the brain. This link is supported by functional magnetic resonance imaging studies showing that active gamblers and active cocaine users exhibit similar patterns of neurocircuitry (Crockford, Goodyear, Edwards, Quickfall, & el-Guebaly, 2005).
It has been suggested that there are many different motivations for gambling, with a distinct dichotomy between arousal/action seekers and those who seek escape/dissociation. In other words, slot machine games are designed
to simultaneously satisfy different needs of different players. In this chapter, we will introduce the literature of EGMs and related phenomena to the reader with a specific focus on the use of sound. A brief introduction to the structural components of gameplay is followed by an examination of existing studies on the sonic elements of casinos and gambling and an exploration of how this knowledge might apply to computer games.
STRUCTURAL COMPONENTS OF EGM GAMES
A slot machine essentially involves three or more reels (in today’s EGMs, these are often computer-generated digital simulations, rather than actual mechanical parts). Touch-screen machines typically do not have handles; rather, the reels are spun by the player pressing a button (the one-armed bandit style pull-lever handle still exists on most slot machines, but is not often used). When the reels stop spinning, three or more icons (often up to five) will line up on the payline for a win, but other combinations of icons can also lead to a win (diagonal lines, and so on), with the amount won relating inversely to the probability of the symbol coming up on the payline (Turner & Horbay, 2004). Payouts vary by country/state/province and by initial betting amount, ranging from about 80 to 95%; in other words, a fairly significant number of plays result in some form of a “win” (see below for information about these “wins”). The amount bet on a play can also vary: the player can, for instance, be playing a “nickel slot” but can end up betting several dollars on a single play by betting on a larger number of potential payout lines. Moreover, with EGMs, the number of payout lines also varies. For example, Lucky Larry’s Lobstermania, made by IGT, has five reels and 15 possible paylines. The maximum wager is 75 credits ($3.75), while the top prize is 50,000 credits ($2,500). There are also two different bonus
rounds available depending on the version of the game: a Great Lobster Escape, and a Buoy Bonus round in which additional payouts are guaranteed but the amount of payout varies.3 In these bonus rounds, the player is asked to select from a variety of options, giving the player the illusion of control and the perception of skill. The use of a stopping device, for instance, in which the players can stop the spinning of the reels voluntarily, increases the perception that the stopping is not random but that there is some form of skill involved: With that control, the player perceives an increased probability of success, which makes the game more attractive (Ladouceur & Sévigny, 2005). Indeed, slot machines today can feature a library of game variations, in order to increase what the industry calls “time on device” (Schull, 2005, p. 67).
Some features of EGMs (and particularly bonus rounds), such as nudge and stop buttons, give the illusion of control to the player, an important component but one that the gaming industry has referred to as being an “idiot skill” (Parke & Griffiths, 2006, p. 154). This perhaps calls to mind the “button-mashing” skill of the early arcade game beat-’em-up genre.4 David Surman (2007) notes that Capcom’s 1987 arcade hit Street Fighter, for instance, was released with a touch-sensitive hydraulic button system in which the increase of the player’s pressure on the button related to the power of the player’s character’s kicking and punching, thus encouraging players to bang and smash on the buttons. He states: “This ‘innovation’ led to many machines being rendered defunct by over-zealous players smashing the control system. The cacophony of these large red buttons being bashed would come to signify the arcades which stocked a number of these first Street Fighter units” (pp. 208-209). When the player has an increased perception of control, they are more likely to engage with the game, play for longer, and spend more money.
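The wager figures quoted for Lucky Larry’s Lobstermania, together with the pace of play reported in the introduction, can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the 5-cent credit value is inferred from the quoted figures (75 credits for $3.75), and the function names are hypothetical rather than taken from any machine's software.

```python
# Illustrative arithmetic only. The 5-cent credit value is an assumption
# inferred from the chapter's figures (75 credits = $3.75).
CREDIT_VALUE_CENTS = 5


def credits_to_dollars(credits: int, credit_value_cents: int = CREDIT_VALUE_CENTS) -> float:
    """Convert a number of credits into dollars (hypothetical helper)."""
    return credits * credit_value_cents / 100


def games_per_hour(seconds_per_game: float) -> float:
    """Number of games started in one hour at a constant pace."""
    return 3600 / seconds_per_game


max_wager = credits_to_dollars(75)       # maximum wager quoted in the text
top_prize = credits_to_dollars(50_000)   # top prize quoted in the text
pace = games_per_hour(6)                 # one new game every 6 seconds

# Upper bound on money staked per hour if every spin is a maximum bet.
# This is the amount wagered, not the amount lost.
max_staked_per_hour = pace * max_wager

print(f"Maximum wager: ${max_wager:.2f}")
print(f"Top prize: ${top_prize:,.2f}")
print(f"Games per hour: {pace:.0f}")
print(f"Maximum staked per hour: ${max_staked_per_hour:,.2f}")
```

Under these assumptions, a player betting the maximum on every spin at the quoted pace would stake roughly $2,250 per hour. This is the amount wagered rather than the amount lost, since a share of it is returned as payouts.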
Bonus or built-in “secret” functions (often a cancel button, slow-down or hints—these are typically not actually secret but often not immediately
apparent) also increase the illusion of control. The bonus elements of gameplay are sometimes hinted at by the sound (as in The Simpsons EGM, in which Krusty the Clown says “Here’s a clue for ya, Jack”). A simple bonus or increased skill component leads to an increased psychological involvement on the part of the player and, it is suggested, has a “significant effect on habitual gambling” (Parke & Griffiths, 2006, p. 176). The use of these functions helps to keep players interested in that they hope that they will learn the “secrets” of the machine and thus be able to demonstrate their skill through winning as well as increase their winnings. Of course, similar bonus rounds and “Easter Eggs” are often built into computer games to reward the regular player who has taken the time to find them—thus upping the player’s credibility amongst other gamers. Usually superfluous to gameplay, Easter Eggs are nevertheless viewed as rewards for the time spent on the device (see Oguro, 2009). But even beyond the world of Easter Eggs, players develop skills beyond the initial simple skills required to technically play a game, notes Surman (2007): While a player new to videogames explores the pleasures of the gameworld with the clumsy curiosity of a toddler, as one becomes a more sophisticated gamer other pleasure registers come into play, which are concerned with a literacy of sorts in which one is sensitive to the codes and conventions of the gameworld, and the panoramic experience of worldliness reduces to a hunt for the telltale graphical or acoustic ‘feedback loops’, confirming success in play. Still higher, as the core gameplay becomes exhausted, players end up centring on the reflexive undoing of the gameworld; pushing it to its limits, exploring and exploiting glitches, ticks, aberrations in the system. (p. 
205)
This description fits closely with Csíkszentmihályi’s (1990) ideas of the requirements for flow (immersion), where a careful balance between difficulty and skill is required to continually engage a player in an activity. As the skill increases, so must the difficulty, or the player will become bored. If the skill required is too difficult for the novice, the player will likewise lose interest.
Equally important to the psychology of the player are the built-in gambling machine concepts of the “near miss” and the “loss disguised as a win”. A near miss is a failure that was close to a win, such as two matching icons arriving on the payline followed by a third reel whose icon sits just off the payline. Slot machine manufacturers use this concept to create a statistically unrealistic number of near misses (Harrigan, 2009), which convinces the player that they are close to winning, and therefore leads to significantly longer playing times (Parke & Griffiths, 2006). As gambling researchers Jonathan Parke and Mark Griffiths (2006) describe: At a behaviourist level, a near miss may have the same kind of conditioning effect on behaviour as a success. At a cognitive level, a near miss could produce some of the excitement of a win, that is, cognitive conditioning through secondary reinforcement. Therefore, the player is not constantly losing but constantly nearly winning. (p. 163)
A loss disguised as a win, on the other hand, is a play in which the player “wins” but receives a payout smaller than the amount wagered, hence actually losing on the wager despite being convinced (sonically) that they have, in fact, won. So for example, a gambler might wager $2 on a play and win $1.50 back. S/he is actually losing 50 cents, but is given the reinforcement cues (see below) of a win. An important contributing factor to all of these illusions that increases playing time and increases money lost is sound.
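The distinction between a genuine win and a loss disguised as a win can be made concrete by comparing the payout with the wager. The Python sketch below is a minimal illustration, not a description of how any manufacturer's software works: the category names, the separate break-even case, and the use of whole cents are assumptions made for the example.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical outcome categories for a single play."""
    LOSS = "loss"
    LOSS_DISGUISED_AS_WIN = "loss disguised as a win"
    BREAK_EVEN = "break even"  # assumption: not a category named in the chapter
    WIN = "win"


def classify_play(wager_cents: int, payout_cents: int) -> Outcome:
    """Classify a play by comparing the payout with the amount wagered."""
    if wager_cents <= 0:
        raise ValueError("wager must be positive")
    if payout_cents < 0:
        raise ValueError("payout cannot be negative")
    if payout_cents == 0:
        return Outcome.LOSS
    if payout_cents < wager_cents:
        # The machine pays something and plays its "win" cues,
        # but the player ends the play with less money than before.
        return Outcome.LOSS_DISGUISED_AS_WIN
    if payout_cents == wager_cents:
        return Outcome.BREAK_EVEN
    return Outcome.WIN


def net_result_cents(wager_cents: int, payout_cents: int) -> int:
    """Net change in the player's balance for one play, in cents."""
    return payout_cents - wager_cents


# The chapter's example: a $2.00 wager that pays back $1.50.
example = classify_play(200, 150)
print(example.value, net_result_cents(200, 150))
```

Working in whole cents avoids floating-point rounding when the payout and the wager are compared. For the chapter's example the play is classified as a loss disguised as a win, with a net result of minus 50 cents.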
A small number of previous studies of sound in slot machines have shown that sound influences a gambler’s impression or perception of the machine, including the quality of the machine (the fidelity of the sound is a primary reason for selecting one EGM over another),
helping to create a sense of familiarity, branding or distinguishing the machine, and creating the illusion of winning, since players may only hear winning sounds (Griffiths & Parke, 2005). Furthermore, Dibben (2001) argues that, for listeners, the reception of music and sounds is not only embedded in the material and physical dimensions of hearing but is also, and critically, grounded in social and cultural knowledge and awareness, based on “listeners’ needs and occupations” (p. 183). This idea, that response to music and sounds can be influenced by culture and personal experience, has self-evident relevance for a study focusing on the role of sound in relation to individuals immersed in gambling environments and/or those at risk for addictive gambling behavior. We will first cover the environment in which the slot machines are commonly found and then focus on the machines themselves.
CASINO SOUND: ENVIRONMENTAL FACTORS
The sound of electronic gambling machines in the context of a casino can play a significant role in the perception of the games. Background music in the casinos or bars changes throughout the day, with pop music played in daytime, and relaxing music in the evenings (Dixon, Trigg, & Griffiths, 2007). The noise and music give the impression of an exciting and fun environment and, critically, that winning is more common than losing. In fact, Anderson and Brown (1984), in a comparison of response to gambling in a laboratory and a casino setting, found that in the casino, the player’s heart rate increases considerably. Moreover, increased exposure to the casino setting in problem gamblers leads to an increased arousal response. They note that “[t]he constant repetition of major changes in autonomic or other kinds of arousal associated in time and place with various forms of gambling activity is likely to have a powerful classical or
Pavlovian conditioning effect on gambling behavior” (p. 400).
There has been considerable research into environmental sound and its impact on consumer behavior in regard to advertising and retail. Servicescapes (that is, the soundscape and landscape of the service environment) have been one recent area of focus in advertising and marketing research. A pleasant ambience, it is felt, is key to a pleasurable shopping experience. Congruency in ambience between the brand, sounds, scent, and other aspects of the store is vital to a positive consumer experience (see Mattilaa & Wirtz, 2001). Companies like the now-defunct Muzak have, of course, built businesses on this idea. Alvin Collis, VP of strategy and brand for Muzak, outlines the concept of the servicescape: I walked into a store and understood: this is just like a movie. The company has built a set, and they’ve hired actors and given them costumes and taught them their lines, and every day they open their doors and say, ‘Let’s put on a show.’ It was retail theatre. And I realized then that Muzak’s business wasn’t really about selling music. It was about selling emotion–about finding the soundtrack that would make this store or that restaurant feel like something, rather than being just an intellectual proposition. (see Owen, 2006)
Certainly, statistics seem to back up Muzak’s ideas, with some studies suggesting that young people spend 36% more time in a shop when music is being played, that if Muzak is played in a supermarket, it will increase the percentage of customers making a purchase there by 17%, and so on (KSK Productions, n.d.). Generally speaking, consumers spend longer in environments when there is some form of background music, as long as the music is low in volume and uncomplex (Garlin & Owen, 2006, p. 761). Music tempo changes can alter the length of time a shopper spends in a store as well as the amount of money spent. Not only this, but music can also influence the perceived amount of time
spent. Young people under 25 perceived that they had spent longer in an “easy listening” store condition, while older shoppers perceived that they had spent longer in a Top 40 store condition. Familiar music led to the impression that they were shopping longer (Yalch & Spangenberg, 2000). Muzak’s website described its music concept (what it terms “audio architecture”) as follows: Its power lies in its subtlety. It bypasses the resistance of the mind and targets the receptiveness of the heart. When people are made to feel good in, say, a store, they feel good about that store. They like it. Remember it. Go back to it. Audio Architecture builds a bridge to loyalty. (Muzak Corporation, n.d.)
Music is, of course, not the only element of environmental sound that plays into the overall ambience. Sound effects, such as in Discovery Channel’s stores with sound zones, or a Canadian supermarket close to one of the authors, Sobey’s, which has chirping birds and frogs in the produce aisle, can also create an overall atmosphere. Both sound effects and music can help to quickly identify a brand for consumers without prior experience of that brand. Music can cue the shopper as to the intended market, and a poor choice of music can clash with the values of the brand (Beverland, Lim, Morrison, & Terziovski, 2006).
Griffiths and Parke (2005) draw on a theoretical model by Condry and Scheibe (1989) regarding persuasion in advertising and adapt this model for slot machine sound. They suggest that there are stages in the persuasion process that leads a person to commit to the machine. This begins with exposure (they must be exposed to the machine and that might be in a bar) and leads to attention (in which sound plays a particularly important role to draw attention in a noisy atmosphere). From there, comprehension and yielding take place: a familiar musical theme helps draw the player in, believing the machine is socially acceptable because the sound is likable and familiar. Finally,
the retention and decision-to-gamble stages occur. In other words, sound is used to draw people in, make them feel comfortable, and convince them to play. The authors hypothesize that the background sounds and music might increase the confidence of the players, increase arousal, help to relax the player, help the player to disregard previous losses, and induce a romantic state leading them to believe that they may win. One study into the effect of background music on virtual roulette found that the speed of betting was influenced by the tempo of the music, with faster music leading to faster betting. Another study suggests that there are two main types of casino design: a playground design (spacious, with warm colors, vegetation, and moving water) and a low-ceilinged, crowded and compact design. This study found that music increased perceived at-risk gambling intentions in the playground design while decreasing those intentions in the other design. In the presence of just ambient sounds, however, this finding was reversed (Marmurek, Finlay, Kanetkar, & Londerville, 2007). What is certain is that the flashing lights, the room lighting, the carpeting and visual design of the space, the conflicting smells of food, perfume and alcohol, and in particular the use of loud sounds serve at once to create feelings of excitement and luxury and to distract the player by increasing cognitive load (the effort involved in processing multi-modal information and the use of working memory) (see Hirsch, 1995; Kranes, 1995; Skea, 1995).
Multiple conflicting stimuli and calls on attention, which lead to this increased cognitive load, cause people to process information using guessing, stereotyping, and automatic responses to stimuli rather than reasoned, rational response and introspection.5 This depends, somewhat, on the type of music involved, as well as on the personal perception of the individual (Carter, Wilson, Lawsom, & Bulik, 1995; McCraty, Barrios, Atkinson, & Tomasino, 1998; Wolfson & Case, 2000).
Some slot machines, however, employ noise cancellation technology to remove any “destructive interference” that may distract a player from the flow of gameplay, to increase immersion (Schull, 2005, p. 67). An Australian study found conflicting responses to background ambience, with some players getting distracted, and others reporting excitement:

“You can go either way when you hear somebody else going, you can get all hyped up and think, gee their machine’s going I could also have it, or it could go the opposite, why isn’t my machine paying. It has a double affect” versus, “The minute I hear the ‘ching, chong China man’, I quickly run around to see”… Two participants noted that the music made them “anxious” and “desperate” as they believed that everyone else around them was winning something, when they were not (Livingstone, Woolley, Zazryn, Bakacs, & Shami, 2008, p. 103).

Computer games today are rarely consumed in an arcade environment whose music and sound can be manipulated, but the use of non-diegetic music in games, as well as the use of ambience, could be adjusted to take into consideration some of the results of these studies. For instance, altering the perception of time through the use of changing tempos, or generating feelings of excitement with carefully timed sound effects in the ambient world, may help to engage the player. There are also implications here relating to games that require further research. In particular, how does the fact that players can substitute their own music in Xbox 360 games influence their perception of gameplay? How does the use of familiar music impact the player’s perception of unfamiliar games? These questions are outside the scope of this chapter, but clearly have important consequences with regard to player engagement with and enjoyment of a game. Of course, the use of sound in the games themselves is more easily manipulated than the environmental space in which gameplay takes place.
SOUND IN EGMs

The earliest slot machines, such as the Mills Liberty Bell of 1907, included a bell that rang with a winning combination, a concept that is still present in most slots today. Playwright Noël Coward noted that sound was a key part of the experience in Las Vegas: “The sound is fascinating . . . the noise of the fruit machines, the clink of silver dollars, quarters, nickels” (in Ferrari & Ives, 2005). As in the nickelodeons of the same period, sound’s most important early role was its hailing function, attracting attention to the machines (Lastra, 2000, p. 98).

Sound in EGMs has advanced alongside the technological changes introduced into the machines in the last few decades. EGMs now use computer-generated graphics, popular music, and high-fidelity sampled sound rather than relying on mechanical ball-bearings, bells or basic square-wave synthesizer chips. Today, sound effects in EGMs are used for a variety of feedback and reward systems. Up until about the early 1990s, slot machines featured about 15 “sound events”, whereas they now average about 400, and these are often carefully researched to manipulate the player (Rivlin, 2004, p. 4). Sound designer George Sanger explained that sound is created “by committee” and that the committee “always want it to be more exciting”, with little consideration for a dynamic range in the excitement portrayed (Personal communication, October 15, 2009, Austin, TX). This includes sound effects of coins falling even though many slot machines neither accept nor pay out coins anymore. Notes Bill Hecht, an audio engineer for IGT, “We basically mixed several recordings of quarters falling on a metal tray and then fattened up the sound with the sound of falling dollars” (Rivlin, 2004, p. 3). Moreover, these false coin sounds can portray wins much larger than the actual win. Unpredictable sounds in particular help to capture and maintain our attention (Glass & Singer, 1972).
There has even been a recent patent to randomize winning sound effects in order to
increase the perception that the sound is more real than it actually is and to reduce the recognition that it is merely careful programming at play. The patent explains:

In the conventional slot machine… the sound effects generated from the speaker are based on only one kind of sound effect pattern. For example, when a big bonus game occurs, a fanfare indicating the occurrence of the big bonus game is sounded, and so forth. Meanwhile, with a slot machine in which a special game has once occurred, the player typically keeps playing games while expecting special games to further occur. In this case, if the sound effects (winning sounds) identical to those at the first occurrence of the special game are generated upon the second or later occurrence, the pleasure of gaming may not fully be enjoyed. (Tsukahara, 2002, p. 1)

Slot machines, however, use pseudo-random number generators carefully programmed to deliver the intended reward schedule, and there is no real skill involved, only manipulations of perception. Recent research findings that music can increase success rate, for instance, are fallacious because such an effect is simply not possible. Yamada (2009), for example, reports that:

Results indicated that the no-music condition showed the best rate of success. Moreover, a “mixed” musical excerpt added “unpleasantness” to the game and, in turn, resulted in a negative effect on the success rate. Increasing the speed increased the “potency” of the game, but did not affect the success rate, systematically. In the second experiment, we used the two excerpts performed in various registers and with various timbres as musical stimuli. (p. 1)

It is unclear whether the test measured the illusion of success and perceptions of gameplay rather than actual success, and neither is it clear
if the game involved was a custom-built game for the purposes of the study (we could find no references to the game through Google), and so Yamada’s findings remain dubious at best. However, what Yamada’s work does show is that it is highly likely that music plays an important role in increasing the illusion of success.

Of course, winning sounds are particularly important to the popularity and attraction of the machines, and losing sounds are rarely heard. When losing sounds are used, they are intentionally employed to antagonize the player, creating a short-term sense of frustration that, it has been suggested, prolongs the play period in what has been called “acoustic frustration”:

Antagonistic sounds invoke frustration and disappointment. For example, on The Simpsons fruit machine, Mr. Smithers smugly informs Homer Simpson that, “You’re fired”, or Chief Wiggam says, “You’re going away for a long time”. At present, we can only speculate about the consequences of such sound effects. In line with hypotheses supporting frustration theory and cognitive regret… this might make the fruit machine more inducing. (Parke & Griffiths, 2006, p. 171)

This idea of acoustic frustration could be adapted and utilized by computer games more effectively than is currently seen. For instance, commentary on gameplay (see below) is common in some types of games but absent in most computer games. Sound effects and music could also provide commentary without using dialogue.

The types of sounds used are particularly important to their affective power. Pulsating sounds that increase in pitch or speed (vibrato and tremolo) have been shown to help increase tension, and verbal reinforcements (both negative and positive) are used to goad the player on with a sensation known as perceived urgency (see Edworthy, Loxley, & Dennis, 1991; Haas & Edworthy, 1996). The deeper a player gets into a
game, the louder and quicker the music usually becomes. High pitched sounds—very common to slot machines—are also very useful in attracting our attention as they perceptually appear closer to us. Notes Millicent Cooley (1998): “Advertisers use this principle when they pump television commercials full of high frequency audio that makes characters sound as if they are intruding into viewers’ homes.” The types of sounds used in EGMs are also carefully chosen according to Western cultural likes and dislikes. As one study of pleasing sounds found, chimes are particularly highly rated: “Our highest rated sounds generally related to escapism (e.g., fantasy chimes, birds singing) and pleasure (children laughing)” (Effrat, Chan, Fogg, & Kong, 2004, p. 64). Large wins in slot machines are characterized by a “rolling sound” with the length of the win tied to the length of the music cue. Winning sounds are often carefully constructed to be heard over the gameplay of other players to draw attention to the machine and to raise the self-esteem of the player, who then becomes the center of attention on the slot machine/casino/bar floor (Griffiths & Parke, 2005, p. 7). Often, this music contains high pitched, major mode songs with lots of chimes and money sounds. Higher pitch also has a tendency to increase the perception of urgency, with that increase in perceived urgency corresponding to an increase in pitch, but it also helps to cut through the ambient noise of a busy casino (Haas & Edworthy, 1996). There are several implications here for computer game sound. First is the reinforcement role of both encouraging and antagonistic sound. Sonic rewards are under-utilized in games and the idea of a reward schedule, while it has been used in computer games, is likewise unusual. To tie the two together—to have a system of sonic rewards at anticipated specific timings in the game—can help to keep a player interested for longer. 
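The idea of tying sonic rewards to anticipated timings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cue names, the fixed-interval schedule, and the functions `reward_times`, `pick_cue` and `plan_rewards` are all hypothetical and are not taken from any game or EGM discussed in this chapter. The sketch also varies the cue from one reward to the next, in the spirit of the randomized winning sounds described earlier.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical cue names; a real game would map these to its own audio assets.
REWARD_CUES = ["chime_rising", "coin_shower", "short_fanfare"]


def reward_times(session_seconds: int, interval_seconds: int) -> List[int]:
    """Return the moments (in seconds) at which a sonic reward is due.

    Assumption: a simple fixed-interval schedule, one reward every
    `interval_seconds` for the length of the session.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return list(range(interval_seconds, session_seconds + 1, interval_seconds))


def pick_cue(rng: random.Random, last_cue: Optional[str]) -> str:
    """Pick a reward cue at random, never repeating the previous one."""
    candidates = [cue for cue in REWARD_CUES if cue != last_cue]
    return rng.choice(candidates)


def plan_rewards(session_seconds: int, interval_seconds: int,
                 seed: int = 0) -> List[Tuple[int, str]]:
    """Pair each scheduled reward time with a varied cue."""
    rng = random.Random(seed)  # seeded so that a plan can be reproduced
    plan: List[Tuple[int, str]] = []
    last_cue: Optional[str] = None
    for moment in reward_times(session_seconds, interval_seconds):
        last_cue = pick_cue(rng, last_cue)
        plan.append((moment, last_cue))
    return plan
```

Calling `plan_rewards(300, 90)` would schedule rewards at 90, 180 and 270 seconds of play, each with a cue that differs from the one before it. A designer might equally prefer a variable schedule; the fixed interval is used here only because it is the simplest case to show.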
Losing sounds, as discussed above, are perhaps the equivalent of player health decreases or death in a computer game. It is quite common for computer
games marketed to children to sonically represent the death of the player’s character as a not particularly negative event. This may even take the form of silence at the character’s or game’s end (the equivalent of not hearing a losing sound in a slot machine), a fun “raspberry”, a game show-like losing sound (as in Rocky and Bullwinkle on the Nintendo Entertainment System (NES)), or cheery “try again” music (as in the Jetsons or Flintstones game-over music for the NES). On the other hand, in more adult-oriented games, the player’s death can be a much more negative event with serious funeral dirges. It may be worthwhile for sound designers to explore the possibility of including more losing-type sounds in other places within the game, in order to increase the acoustic frustration the player feels, thus enhancing the impact of winning sounds and increasing emotional engagement. Psychological studies have shown that frustrative non-rewards are considerably motivating. In simple terms, “failing to fulfill a goal produces frustration which (according to the theory) strengthens ongoing behavior”, leading to cognitive regret, encouraging persistent play in the desire to relieve the regret (Griffiths, 1990; see also Amsel, 1962). King, Delfabbro, & Griffiths (2009) note:

Video games have also become longer and more complex, making a punishment like permanent character death an unappealing feature, particularly for a less committed, casual playing audience. Common forms of punishment in games include having to restart a level, failing an objective, or losing resources of some kind, like items, XP or points. (p. 10)

It is possible, therefore, to improve the sound of losing tied to these lesser events, in order to tap into the acoustic frustration effect seen in slot machines. While we typically hear sounds tied to these events in current games, a stronger sense of loss (and thus, upon winning, reward) may improve player involvement.
Likewise, the concepts of near misses and losses disguised as wins are elements popular in slot machines but rarely, if ever, heard in computer game sound. One might imagine, for instance, a “mini-game” within a larger game in which the player is sonically teased with almost winning a bonus round or is given the impression that they have won more points than they actually have within that bonus round. This would probably, of course, only be useful for certain types of games aimed at certain types of players. One can imagine this effect in a Wii casual game designed for all ages, for instance, but less so for a big-budget first-person shooter title on the Xbox 360.
Slots, Familiarity and Brands

Important to feelings of player comfort and emotional connection to the machine is the role of branding EGMs by using well-known intellectual property. Popular songs are often used to attract a player to the machine and to cause players to feel more comfortable and familiar with that machine. Similarly, sound plays a role in branding for companies that create distinctive winning sounds in an effort to have their sounds heard over the din of the casino. Indeed, branded EGMs are becoming both more commonplace and more popular in casino environments. Whereas once producers of popular culture sought to remain apart from the perceived negative connotations associated with the gambling industry, today films like Top Gun and Star Wars, television game shows like Jeopardy!, Deal or No Deal and The Price Is Right, and musical acts like Elvis Presley, the Village People and Kenny Rogers all have branded EGMs (Dretzka, 2004). Familiarity with a television show, film, person, place, musical act or sport can, for instance, entice players to the machine because it may “represent something that is special to the gambler… Players may find it more enjoyable because they can easily interact with the recognizable images and music they
experience” (Griffiths & Parke, 2005, p. 5). As Dretzka (2004) observed:

Seemingly overnight, casinos actually began sounding different. Instead of clanging bells, mechanical clicks and clacks, and jackpot alarms, the soundtrack was more of an electronic gurgle and hum, with bursts of ‘This is Jeopardy!’, ‘Wheel of Fortune!’ and snippets of rock songs. A generation of Americans raised in front of their television sets ate it up.

Moreover, familiarity and repetition of musical themes have been shown to have a positive influence on our liking of the music (see Bradley, 1971). Verbal reinforcement with known characters (as well as, to a lesser degree, unknown characters) also takes place, as seen above, with familiar characters telling people that they are “cool” or “a genius”. Parke and Griffiths (2006) note that verbal reinforcement that increases play is designed to raise self-esteem, give hints and guidance, and even provide friendship or company (p. 171).

An unexplored area of research is the relationship between verbal reinforcement and the anthropomorphizing of slot machines. With regard to such anthropomorphism, Langer (1975) writes: “Gamblers imbue artifacts such as dice, roulette wheels, and slot machines with character, calling out bets as though these random (or uncontrollable) generators have a memory or can be influenced” (see also Gaboury & Ladouceur, 1989; Toneatto, Blitz-Miller, Calderwood, Dragonetti, & Tsanos, 1997). It is very likely that sound plays a considerable role in the anthropomorphizing of slot machines, particularly in those cases where the machines “talk” to the player, but also in the mere fact that they are sonically responsive to our input. In reference to the game show computer game You Don’t Know Jack, Millicent Cooley (1998) notes that the player:

Will be aggressively challenged to prove that you know jack (anything at all), and you know this,
again, because of the dialog and swaggering, aggressive tone of the host. The machine is in charge and you, the player, are not; the game is quick-paced, there is a sense that you will be rushed along and should try to keep up and prove that you do, in fact, know jack. You feel this pressure because the voice of the host rushes you to sign in, taunting you impatiently at every step. (p. 8)

It is possible that a similar process is at work with slot machines: that is to say, the taunting will increase the speed with which the player plays, antagonizing the player to the point where the player loses focus on what truly matters (that is, the loss of their money). In reference to sonic branding, Jackson (2003) suggests that the voice heard links to the perceived personality (including perceived behavior and perceived appearance) of the speaker and, therefore, of the brand (p. 135), and it is equally likely that a similar effect is seen in the perceived personality of the machine. It has been said that 38% of the effect we have on other people can be attributed to our voice, with only 7% to the actual words we’ve spoken (the rest being body language) (Westermann, 2008, p. 153). In a study into voice and brand, UK telecom provider Orange identified a series of attributes that define the sound of a voice: rhythm (emphasis is placed on what is said); pitch (high versus low); melody (rhythm and pitch together); pace (speed); tone (overall musical quality); intonation (what is said relating to how it is said); energy; clarity; muscular tension; resonance; pause; breath; commitment; and volume (Westermann, 2008, p. 153). These attributes work together to shape our perceptions of what is being said.

Particularly notable is the impact that the voice (and what it says) can have on our perceptions of what we are seeing and/or experiencing. Several studies have shown how the voice influences our perception of sports performances on video.
In a study of sports commentary, Bryant, Brown, Comisky, & Zillmann (1982) discovered that our enjoyment of watching sports
is largely tied to the dramatic embellishments provided by the commentary of the sportscasters. However, it is not only our enjoyment but also the ways that we interpret what we are seeing that are influenced by commentary. In one study, it was found that commentary affected the perception of aggression of the players in an ice hockey match (Comisky, Bryant, & Zillmann, 1977). Not only this, but the more aggressive commentary was also perceived as more enjoyable. Other studies have reached similar conclusions for commentary on a tennis match (Bryant, Brown, Comisky, & Zillmann, 1982), a soccer game (Beentjes, Van Oordt, & Van Der Voort, 2002) and a basketball game (Sullivan, 1992).

This influence of commentary on perception is likely to play an equally important role in slot machines as well as computer games, although this remains another area of game sound largely unexplored. Sports games in particular make use of commentary, although it is also very common to find commentary in games that imitate television game shows. It is possible, therefore, that the addition of a narrative of events in some games may impact the player’s perception of their gameplay, as well as their enjoyment of the game, although the technique is clearly under-utilized.

Another trait discussed above that is highly popular in slot machines but less common in computer games is the use of familiarity and branding tied to the machines. Not only do the games themselves have distinctive sounds, but each company has its own overarching style and aesthetic that can be quickly learned upon spending time on the casino floor. The coin sounds from an IGT slot machine, for instance, sound different from those generated by a Bally machine. While this acoustic branding is particularly relevant in an environment where machines are competing for attention, creating a distinctive sound and branding franchise games or episodic games remains relevant in other environments too.
Some computer games have, of course, employed this technique—the Super Mario Bros series, for
instance, has maintained a distinctive aesthetic through countless incarnations, platforms, and technological improvements. However, there are many games that still do not attempt to capitalize on this ability to entice experienced players to a new version of the game with the creation of a distinct, recognizable sound.6
RESPONSES TO EGM SOUND

The response of players to slot machine sounds is diverse, representing the different needs and desires of the players. For many, music and sound signify success, as one study has found: “I like it when it’s going long [the music], because you know you’re winning plenty of money. When they’re short, I don’t like them…” (Livingstone et al., 2008, p. 103). Other players, those who by their comments appear to be more regular gamblers, dislike the sounds, the study found: “sounds are too loud and attract attention. If someone lets the feature music go on and on they are not serious—the problem gamblers hate hearing it go on and on—and it draws attention to you” (Livingstone et al., 2008, p. 103). A few other participants also reported pressing “collect” straight after a win specifically to stop the music from playing. While some players found the sounds of others winning exciting, others felt that it gave them the impression that “everyone is winning but you” (Livingstone et al., 2008, p. 103). One study regarding sound’s presence (as on or off) showed that players strongly preferred sound to be on (Delfabbro, Fazlon, & Ingram, 2005).

Response to sound, therefore, can vary from player to player, but some typical responses can be summarized. Studies of the physiological response to sound (typically industrial noise, but also including music, speech, and other sounds) have found that sounds can contribute to increases in blood pressure and, most importantly, impair performance on a vigilance task (Smith & Morris
1997). Wolfson and Case (2000) studied heart rate response to manipulation of loudness of sound in a computer game, finding that louder sounds led to increased heart rate, and discussed the impact that physiological arousal has on our attention levels. They note:

People performing a task when minimally aroused are more likely to be slow, indifferent, and spread their attention across a wide range of stimuli. When highly aroused, people tend to be faster but less accurate, and they focus mainly on the most salient aspects of a task. Thus both high and low levels of arousal can have detrimental effects on performance. (Wolfson & Case, 2000, p. 185)

Physiological responses to stimuli can be tested using a variety of measures, including (but not limited to) electroencephalograms (EEGs), facial electromyography, heart rate, pupil dilation and electrodermal response. Galvanic skin response (GSR), one component of electrodermal response, also known as skin conductance response or sweat response, is an affordable and efficient measurement of simple changes in arousal levels, which is one of the reasons why it is the main component of a polygraph device. Essentially, GSR measures the electrical conductivity of the skin, which changes with psychological state. (See Nacke & Grimshaw, 2011 for the use of such measures when assessing psychophysiological responses to computer game sound.)

Studies using GSR on subjects while they are exposed to music date back to at least the 1940s (for example, Dreher, 1947; Traxel & Wrede, 1959) but are highly contradictory due to the conditions in which the studies took place. Sound and music have a known influence on listeners’ arousal and anxiety levels, but this depends on many factors including the degree of musical knowledge, the tempo of the music, the familiarity with the music, preference for the music, and recent exposure to that music. Smith and Morris (1976) found that stimulating music increased worry and anxiety, although they tested their student subjects during an examination. Rohner and Miller (1980) found that music had no influence on anxiety levels. Pitzen and Rauscher (1998) and Hirokawa (2004), on the other hand, more recently found that stimulating music increased energy and relaxation (increasing GSR but not heart rate). Although there are many studies about music in isolation and its physiological effect on listeners, there has been much less research on music’s impact on GSR while taking into consideration the interaction between sound and visual image (for example, Thayer & Levenson, 1983). Perceptual studies (non-physiologically based research) from the field of advertising suggest that image and sound, when used congruently (that is, for instance, when both have a similar message), tend to amplify each other (for instance, Bolivar, Cohen, & Fentress, 1994; Bullerjahn & Güldenring, 1994; Iwamiya, 1994).

There have also been studies into the physiological effects of gambling, which have shown that pupils may dilate, heart rate may increase, and skin conductance levels increase (raising the GSR). Collectively, these are known as arousal levels, and it is the arousal-inducing properties of slot machines that are affected by winning and losing, with increased arousal levels for wins (see, for example, Coventry & Constable, 1999; Coventry & Hudson, 2001; Sharpe, 2004). Additionally, a number of studies, for instance research by Dickerson and Adcock (1987), have questioned whether there is a connection between physiological responses to gambling and wider psychological issues governing perceptions of such elements as gambling environment, luck, and mood. These studies suggest there is some evidence that both psychological and physiological responses to gambling behaviors are fuelled in part by a player’s illusion of control (for example, Alloy, Abraham, & Viscusi, 1981).
In more recent research into computer games and the computer gaming environment, Hébert, Béland, & Dionne-Fournelle (2005) have discovered that, “for the first time…auditory input
contributes significantly to the stress response found during video game playing” (pp. 2371-2372). This research suggests that physiological responses to music in computer games may be linked in part to genre, noting generally that the more aggressive and rapid the music, the more elevated physiological stress levels become.

A recent pilot study into the sounds and sights of losses disguised as wins was undertaken with 16 participants by the University of Waterloo’s Problem Gambling Research Group. Each participant played Lucky Larry’s Lobstermania for 45 minutes while being tested for their arousal levels using GSR. Participants wore a GSR recording device on their fingers while they played, with the GSR output tied to two signal wires that recorded when the player pressed the play button and whether the play resulted in a win, a loss disguised as a win (where the payout is less than the spin wager), or a regular loss (that is, a loss without the reinforcing sounds of a win). As might be expected, the highest GSR rating, indicating the highest arousal level, was found with wins, and the lowest with regular losses. What is particularly interesting, however, is that losses disguised as wins were much closer physiologically to wins than to losses. In other words, hearing the sounds of winning, even though the player has lost money, is enough to trick the mind/body into believing that the player is winning (Dixon, Harrigan, Sandhu, Collins, & Fugelsang, forthcoming).

In the case of losses disguised as wins, these games play on the idea of synchresis. Film theorist Michel Chion (1994) defines synchresis as “the forging of an immediate and necessary relationship between something one sees and something one hears,” combining the ideas of synchronism (simultaneous events) with synthesis (p. 5).
Essentially, sound changes our perception of the image that we see and, despite there being an opposing relationship between sound and image, we view images as connected to sound when they are played concurrently, with the sound dominating
our response. With losses disguised as wins, the numbers displayed on the machine tell us that we are losing (in other words, we “won” 50 cents, but our total credits and cash have been reduced since the last play) but the sound tells us that we are winning. In a sense, the sound overrules our eyes and leads the emotional (and physiological) response to the event. This phenomenon illustrates the importance of sound to our overall perception of audio-visual media, and points to one under-utilized way in which sound could be used in computer games. Far from merely reinforcing image, sound can have a much more complex relationship with what is occurring on screen. We might use a “winning cue” sound, for instance, in a battle scene to trick the player into thinking that the evil “big boss” enemy is dead, only to have the enemy return to life. Or we might use sound to trick the player into thinking that drinking a bottle of potion was a beneficial event, only to reveal later that it was not.
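The three outcome categories used in the pilot study described above can be made concrete with a short Python sketch. It is an illustration only, not the researchers’ code or any manufacturer’s logic: the names `Outcome`, `classify_spin` and `plays_winning_cue` are hypothetical, amounts are assumed to be whole cents, and a payout exactly equal to the wager is treated here as a win, a boundary case the study’s definition does not address.

```python
from enum import Enum


class Outcome(Enum):
    WIN = "win"
    LOSS_DISGUISED_AS_WIN = "loss disguised as a win"
    REGULAR_LOSS = "regular loss"


def classify_spin(wager_cents: int, payout_cents: int) -> Outcome:
    """Classify a single spin from its wager and payout (both in cents)."""
    if payout_cents <= 0:
        # Nothing is paid back: a regular loss.
        return Outcome.REGULAR_LOSS
    if payout_cents < wager_cents:
        # Something is paid back, but less than was wagered.
        return Outcome.LOSS_DISGUISED_AS_WIN
    # Assumption: breaking even or better counts as a win.
    return Outcome.WIN


def plays_winning_cue(outcome: Outcome) -> bool:
    """Winning sounds accompany every outcome except a regular loss."""
    return outcome is not Outcome.REGULAR_LOSS
```

With a wager of 100 cents and a payout of 50 cents, `classify_spin(100, 50)` returns `Outcome.LOSS_DISGUISED_AS_WIN` and `plays_winning_cue` returns `True`: the player’s credits fall while the machine sounds a win, which is the mismatch between image and sound discussed above.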
CONCLUSION

The intent of this chapter has been to explore a comparatively understudied area of computer game sound, namely the role of music and sound in electronic gambling machines (EGMs). We explored the structural components of EGMs and EGM games, tracing the development of technical advances that have led to progressively more enhanced audio interfaces over the past two decades. Central to this discussion is the interrelationship between EGM technology, sound and human behavioral psychology. Research has shown that standard EGM gameplay concepts like, for instance, the “near miss” and “losses disguised as wins”, coupled with enhanced sound prompts and triggers, can encourage both more rapid and longer gameplay. A second correlated point in this study has been our consideration of EGM sound within the wider soundscape of a casino/bar/gaming environment.
An interesting area of research as yet unexplored is determining whether gambling behavior is affected when EGM sounds commingle with, and compete against, external sources of music, sounds, and noise. Further, it would be interesting to explore whether a correlation exists between the concurrent use of image and sound in EGMs. Specifically, it would be useful to determine whether EGM sound and video, individually and together, amplify or reinforce the notion of a loss disguised as a win or, conversely, whether EGM sound and visuals instead work to distract and divert gamblers’ attention away from the machine and, by extension, from the act of gambling. Early research indicates that sound does, in fact, reinforce the idea of winning even when the player is losing. There have been no studies to explore the impact that a similar sonic process has in computer games, but this is an interesting area for future exploration. A particularly important concept that can be taken from slot machines is the idea of customization. Slot machines, as shown, have two basic markets that they cater to: arousal/action seekers, and those who seek escape/dissociation. It may be suggested that computer games have a similar audience, although this simple way of dividing players is perhaps inadequate. What does remain, however, is the concept that players have different needs for gameplay. And while some players enjoy the sounds of slot machines and the casino environment, others clearly would prefer the ability to turn down, or turn off, sound altogether. Computer games, of course, have long recognized this and offered the ability to turn sound on or off and, later, to adjust the volumes of individual elements (ambience/sound effects/dialogue/music). More recently, the option for players to insert their own preferred music into a game has furthered the ability to customize game sound.
Further, some games have “boredom switches” that drop the volume levels automatically after a player has become “stuck” at a particular stage in the game. However, it might also be possible to adjust sound based on the player’s skill level and ability, with more frequent frustration sounds being used as the player advances, for example, and greater sonic encouragement at the start of a game. Different sounds may be used when the game is being played in one-player or in multi-player mode. Recently, with the creation of physiologically aware gaming devices such as the Wii Vitality Sensor, it has become possible to adjust sound in real time based on the player’s physiological response. We believe that this area of computer gaming, what we might call “player aware” games, will become an important future area for research. In particular, it is possible both to craft sound that manipulates the player on the basis of their physiological response and to have the sound itself respond to that response. It might be possible, in other words, for games to “read” our emotional and physiological state and adjust music to keep us interested, to guide us to another state, or to enhance an existing state. Sound clearly plays an important role in the perception of gaming, and will continue to grow in importance as computer games search for ever more ways to keep players interested.
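The adaptive ideas described here can be made concrete with a small sketch. The following Python fragment is only an illustration under our own assumptions: the function names, the two-minute grace period, the fade length, the volume floor, and the linear weighting of cues are all hypothetical and are not taken from any particular game or EGM. It shows a “boredom switch” that lowers the volume once a player has been stuck for a while, and a cue weighting that offers more encouragement early on and more frustration sounds as the player advances.

```python
def boredom_gain(seconds_stuck, grace=120.0, fade=60.0, floor=0.3):
    """Volume multiplier for a hypothetical "boredom switch".

    Full volume during the grace period, then a linear fade down to
    `floor` over `fade` seconds. All default values are assumptions.
    """
    if seconds_stuck <= grace:
        return 1.0
    faded = min((seconds_stuck - grace) / fade, 1.0)  # 0.0 .. 1.0
    return 1.0 - faded * (1.0 - floor)


def cue_weights(progress):
    """Relative weights of encouraging and frustrating cues.

    `progress` runs from 0.0 (start of the game) to 1.0 (end); the
    linear trade-off between the two cue types is an assumption.
    """
    p = min(max(progress, 0.0), 1.0)  # clamp to the valid range
    return {"encouragement": 1.0 - p, "frustration": p}


if __name__ == "__main__":
    for t in (0, 120, 150, 600):
        print(t, round(boredom_gain(t), 2))
    print(cue_weights(0.25))
```

A “player aware” game of the kind suggested above could, in principle, replace the `seconds_stuck` input with a physiological measure; how such a signal should be mapped to volume or cue choice remains an open research question.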
REFERENCES

Alloy, L., Abramson, L., & Viscusi, D. (1981). Induced mood and the illusion of control. Journal of Personality and Social Psychology, 41, 1129–1140. doi:10.1037/0022-3514.41.6.1129
Amsel, A. (1962). Frustrative nonreward in partial reinforcement and discrimination learning: Some recent history and a theoretical extension. Psychological Review, 69(4), 306–328. doi:10.1037/h0046200
Anderson, G., & Brown, R. I. T. (1984). Real and laboratory gambling, sensation-seeking and arousal. The British Journal of Psychology, 75(3), 401–410.
Beentjes, J. W. J., Van Oordt, M., & Van Der Voort, T. H. A. (2002). How television commentary affects children’s judgments on soccer fouls. Communication Research, 29, 31–45. doi:10.1177/0093650202029001002
Beverland, M., Lim, E. A. C., Morrison, M., & Terziovski, M. (2006). In-store music and consumer–brand relationships: Relational transformation following experiences of (mis)fit. Journal of Business Research, 59, 982–989. doi:10.1016/j.jbusres.2006.07.001
Bolivar, V. J., Cohen, A. J., & Fentress, J. C. (1994). Semantic and formal congruency in music and motion pictures: Effects on the interpretation of visual action. Psychomusicology, 13, 28–59.
Bradley, I. L. (1971). Repetition as a factor in the development of musical preferences. Journal of Research in Music Education, 19(3), 295–298. doi:10.2307/3343764
Brown, R. I. F. (1986). Arousal and sensation-seeking components in the general explanation of gambling and gambling addictions. Substance Use & Misuse, 21(9), 1001–1016. doi:10.3109/10826088609077251
Bryant, J., Brown, D., Comisky, P. W., & Zillmann, D. (1982). Sports and spectators: Commentary and appreciation. The Journal of Communication, 32(1), 109–119. doi:10.1111/j.1460-2466.1982.tb00482.x
Bryant, J., Comisky, P., & Zillmann, D. (1982). Drama in sports commentary. The Journal of Communication, 27(3), 140–149. doi:10.1111/j.1460-2466.1977.tb02140.x
Bullerjahn, C., & Güldenring, M. (1994). An empirical investigation of effects of film music using qualitative content analysis. Psychomusicology, 13, 99–118.
Carter, F. A., Wilson, J. S., Lawson, R. H., & Bulik, C. M. (1995). Mood induction procedure: importance of individualising music. Behaviour Change, 12, 159–161.
Chion, M. (1994). Audio-vision: Sound on screen. New York: Columbia University Press.
Comisky, P. W., Bryant, J., & Zillmann, D. (1977). Commentary as a substitute for action. The Journal of Communication, 27(3), 150–153. doi:10.1111/j.1460-2466.1977.tb02141.x
Condry, J., & Scheibe, C. (1989). Non program content of television: Mechanisms of persuasion. In Condry, J. (Ed.), The Psychology of Television (pp. 217–219). London: Erlbaum.
Cooley, M. (1998, November). Sound + image in computer-based design: Learning from sound in the arts. Paper presented at International Community for Auditory Display Conference, Glasgow, UK.
Coventry, K. R., & Constable, B. (1999). Physiological arousal and sensation seeking in female fruit machine players. Addiction (Abingdon, England), 94, 425–430. doi:10.1046/j.1360-0443.1999.94342512.x
Coventry, K. R., & Hudson, J. (2001). Gender differences, physiological arousal and the role of winning in fruit machine gamblers. Addiction (Abingdon, England), 96, 871–879. doi:10.1046/j.1360-0443.2001.9668718.x
Crockford, D., Goodyear, B., Edwards, J., Quickfall, J., & el-Guebaly, N. (2005). Cue-Induced brain activity in pathological gamblers. Biological Psychiatry, 58(10), 787–795. doi:10.1016/j.biopsych.2005.04.037
Csíkszentmihályi, M. (1990). Flow: The psychology of optimal experience. New York: HarperPerennial.
Delfabbro, P., Fazlon, K., & Ingram, T. (2005). The effects of parameter variations in electronic gambling simulations: Results of a laboratory-based pilot investigation. Gambling Research: Journal of the National Association for Gambling Studies, 17(1), 7–25.
Dibben, N. (2001). What do we hear, when we hear music? Music perception and musical material. Musicae Scientiae, 2, 161–194.
Dickerson, M., & Adcock, S. (1987). Mood, arousal and cognitions in persistent gambling: Preliminary investigation of a theoretical model. Journal of Gambling Behaviour, 3(1), 3–15. doi:10.1007/BF01087473
Dixon, L., Trigg, R., & Griffiths, M. (2007). An empirical investigation of music and gambling behaviour. International Gambling Studies, 7(3), 315–326. doi:10.1080/14459790701601471
Dixon, M., Harrigan, K. A., Sandhu, R., Collins, K., & Fugelsang, J. (2011: In press). Slot machine play: Psychophysical responses to wins, losses, and losses disguised as wins. Addiction.
Dreher, R. E. (1947). The relationship between verbal reports and the galvanic skin response. Journal of Abnormal and Social Psychology, 44, 87–94.
Dretzka, G. (2004, December 12). Casinos, celebrities bet on our love for pop culture icons. Seattle Times. Retrieved July 15, 2009, from http://community.seattletimes.nwsource.com/archive/?date=20041212&slug=casinos12.
Edworthy, J., Loxley, S., & Dennis, I. (1991). Improving auditory warning design: relationship between warning sound parameters and perceived urgency. Human Factors, 33, 205–231.
Effrat, J., Chan, L., Fogg, B. J., & Kong, L. (2004). What sounds do people love and hate? Interaction, 11(5), 64–66. doi:10.1145/1015530.1015562
Ferrari, M., & Ives, S. (2005). Slots: Las Vegas gamblers lose some $5 billion a year at the slot machines alone. Las Vegas: An unconventional history. New York: Bulfinch.
Gaboury, A., & Ladouceur, R. (1989). Erroneous perceptions and gambling. Journal of Social Behavior and Personality, 4(41), 111–120.
Garlin, F. V., & Owen, K. (2006). Setting the tone with the tune: A meta-analytic review of the effects of background music in retail settings. Journal of Business Research, 59, 755–764. doi:10.1016/j.jbusres.2006.01.013
Glass, D. C., & Singer, J. E. (1972). Urban stress. New York: Academic.
Griffiths, M., & Parke, J. (2005). The psychology of music in gambling environments: An observational research note. Journal of Gambling Issues, 13. Retrieved July 15, 2009, from http://www.camh.net/egambling/issue13/jgi_13_griffiths_2.html.
Griffiths, M. D. (1990). The cognitive psychology of gambling. Journal of Gambling Studies, 6(1), 31–42. doi:10.1007/BF01015747
Haas, E. C., & Edworthy, J. (1996). Designing urgency into auditory warnings using pitch, speed and loudness. Computing and Control Engineering Journal, 7, 193–198. doi:10.1049/cce:19960407
Harrigan, K. A. (2009). Slot machines: Pursuing responsible gaming practices for virtual reels and near misses. International Journal of Mental Health and Addiction, 7(1), 68–83. doi:10.1007/s11469-007-9139-8
Harrigan, K. A., & Dixon, M. (2009). PAR sheets, probabilities, and slot machine play: Implications for problem and non-problem gambling. Journal of Gambling Issues, 23, 81–110. doi:10.4309/jgi.2009.23.5
Hébert, S., Béland, R., & Dionne-Fournelle, O. (2005). Physiological stress response to videogame playing: the contribution of built-in music. Life Sciences, 76, 2371–2380. doi:10.1016/j.lfs.2004.11.011
Hirokawa, E. (2004). Effects of music, listening, and relaxation instructions on arousal changes and the working memory task in older adults. Journal of Music Therapy, 41(2), 107–127.
Hirsch, A. R. (1995). Effects of ambient odors on slot-machine usage in a Las Vegas casino. Psychology and Marketing, 12(7), 585–594. doi:10.1002/mar.4220120703
Hopson, J. (2001). Behavioral game design. Gamasutra. Retrieved October 23, 2009, from http://www.gamasutra.com/view/feature/3085/behavioral_game_design.php.
Iwamiya, S. (1994). Interaction between auditory and visual processing when listening to music in an audio visual context. Psychomusicology, 13, 133–154.
Jackson, D. (2003). Sonic branding: An introduction. New York: Palgrave/Macmillan. doi:10.1057/9780230503267
King, D., Delfabbro, P., & Griffiths, M. (2009). Video game structural characteristics: A new psychological taxonomy. International Journal of Mental Health and Addiction, 8(1), 90–106. doi:10.1007/s11469-009-9206-4
Kranes, D. (1995). Play grounds. Gambling: Philosophy and policy [Special Issue]. Journal of Gambling Studies, 11(1), 91–102. doi:10.1007/BF02283207
Ladouceur, R., & Sévigny, S. (2005). Structural characteristics of video lotteries: Effects of a stopping device on illusion of control and gambling persistence. Journal of Gambling Studies, 21(2), 117–131. doi:10.1007/s10899-005-3028-5
Langer, E. J. (1975). The illusion of control. Journal of Personality and Social Psychology, 32, 311–328. doi:10.1037/0022-3514.32.2.311
Lastra, J. (2000). Sound technology and the American cinema: Perception, representation, modernity. New York: Columbia University Press.
Livingstone, C., Woolley, R., Zazryn, T., Bakacs, L., & Shami, R. (2008). The relevance and role of gaming machine games and game features on the play of problem gamblers. Adelaide: Independent Gambling Authority of South Australia.
Lucas, G. (Director). (1977). Star Wars [Motion picture]. Los Angeles, CA: 20th Century Fox.
Lucky Larry’s Lobstermania [Computer game]. (2002). Reno, NV: IGT.
Marmurek, H. H. C., Finlay, K., Kanetkar, V., & Londerville, J. (2007). The influence of music on estimates of at-risk gambling intentions: An analysis by casino design. International Gambling Studies, 7(1), 113–122. doi:10.1080/14459790601158002
Mattila, A. S., & Wirtz, J. (2001). Congruency of scent and music as a driver of in-store evaluations and behavior. Journal of Retailing, 77, 273–289. doi:10.1016/S0022-4359(01)00042-2
McCraty, R., Barrios-Choplin, B., Atkinson, M., & Tomasino, D. (1998). The effects of different types of music on mood, tension and mental clarity. Alternative Therapies in Health and Medicine, 4, 75–84.
Muzak Corporation. (n.d.). Why Muzak. Retrieved October 5, 2009, from http://music.muzak.com/why_muzak.
Nacke, L., & Grimshaw, M. (2011). Player-game interaction through affective sound. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Oguro, C. (2009). The greatest Easter eggs in gaming. Gamespot. Retrieved October 5, 2009, from http://www.gamespot.com/features/6131572/index.html.
Owen, D. (2006, April 10). The soundtrack of your life: Muzak in the realm of retail theatre. The New Yorker. Retrieved October 5, 2009, from http://www.newyorker.com/archive/2006/04/10/060410fa_fact.
Parke, J., & Griffiths, M. (2006). The psychology of the fruit machine: The role of structural characteristics (Revisited). International Journal of Mental Health and Addiction, 4, 151–179. doi:10.1007/s11469-006-9014-z
Pitzen, L. J., & Rauscher, F. H. (1998, May). Choosing music, not style of music, reduces stress and improves task performance. Poster presented at the American Psychological Society, Washington, DC.
Productions, K. S. K. (n.d.). Cinematic & Muzak. Retrieved October 20, 2009, from http://www.kskproductions.nl/en/services/cinematic-a-muzak.
Rivlin, G. (2004, May 9). The tug of the newfangled slot machines. New York Times. Retrieved July 15, 2009, from http://www.nytimes.com/2004/05/09/magazine/09SLOTS.html.
Rohner, S. J., & Miller, R. (1980). Degrees of familiar and affective music and their effects on state anxiety. Journal of Music Therapy, 17, 2–15.
Schull, N. D. (2005). Digital gambling: The coincidence of desire and design. The Annals of the American Academy of Political and Social Science, 597, 65–81. doi:10.1177/0002716204270435
Scott, T. (Director). (1986). Top Gun [Motion picture]. Hollywood, CA: Paramount Pictures.
Seeking Alpha. (2008, August 5). The video game industry: An $18 billion entertainment juggernaut. http://seekingalpha.com/article/89124-the-video-game-industry-an-18-billion-entertainment-juggernaut.
Sharpe, L. (2004). Patterns of autonomic arousal in imaginal situations of winning and losing in problem gambling. Journal of Gambling Studies, 20, 95–104. doi:10.1023/B:JOGS.0000016706.96540.43
Skea, W. H. (1995). “Postmodern” Las Vegas and its effects on gambling. Journal of Gambling Studies, 11(2), 231–235. doi:10.1007/BF02107117
Smith, C. A., & Morris, L. W. (1976). Effects of stimulative and sedative music on cognitive and emotional components of anxiety. Psychological Reports, 38, 1187–1193.
Sullivan, D. B. (1992). Commentary and viewer perception of player hostility: Adding punch to televised sports. Journal of Broadcasting & Electronic Media, 35, 487–504.
Super Mario Bros [Computer game]. (1995). Redmond, WA: Nintendo.
Surman, D. (2007). Pleasure, spectacle and reward in Capcom’s Street Fighter series. In Krzywinska, T., & Atkins, B. (Eds.), Videogame, player, text (pp. 204–221). London: Wallflower.
7th guest [Computer game]. (1993). Trilobyte (Developer). London: Virgin Games.
Thayer, J. F., & Levenson, R. W. (1983). Effects of music on psychophysiological responses to a stressful film. Psychomusicology, 3(1), 44–52.
The adventures of Rocky and Bullwinkle [Computer game]. (1992). Radical Entertainment (Developer). Agoura Hills, CA: THQ.
The Flintstones. (1991). The rescue of Dino & Hoppy [Computer game]. Vancouver, Canada: Taito Corporation.
The Jetsons. (1992). Cogswell’s caper! [Computer game]. Vancouver, Canada: Taito Corporation.
Toneatto, T., Blitz-Miller, T., Calderwood, K., Dragonetti, R., & Tsanos, A. (1997). Cognitive distortions in heavy gambling. Journal of Gambling Studies, 13, 253–261. doi:10.1023/A:1024983300428
Too human [Computer game]. (2008). Silicon Knights (Developer). United States: Microsoft Game Studios.
Traxel, W., & Wrede, G. (1959). Changes in physiological skin responses as affected by musical selection. Journal of Experimental Psychology, 16, 57–61.
Tsukahara, N. (2002). Game machine with random sound effects. U.S. Patent No. 6,416,411 B1. Washington, DC: U.S. Patent and Trademark Office.
Turner, N., & Horbay, R. (2004). How do slot machines and other electronic gambling machines actually work? Journal of Gambling Issues, 11.
Westermann, C. F. (2008). Sound branding and corporate voice: Strategic brand management using sound. Usability of speech dialog systems: Listening to the target audience. Berlin: Springer-Verlag.
Wing commander [Computer game]. (1990). Austin, TX: Origin Systems.
Wolfson, S., & Case, G. (2000). The effects of sound and colour on responses to a computer game. Interacting with Computers, 13, 183–192. doi:10.1016/S0953-5438(00)00037-0
Yalch, R. F., & Spangenberg, E. R. (2000). The effects of music in a retail setting on real and perceived shopping times. Journal of Business Research, 49, 139–147. doi:10.1016/S0148-2963(99)00003-X
Yamada, M. (2009, September). Can music change the success rate in a slot-machine game? Paper presented at the Western Pacific Acoustics Conference, Beijing, China.
You don’t know jack [Computer game]. (1995). Berkeley Systems/Jellyvision (Developer). Fresno, CA: Sierra On-Line.
KEY TERMS AND DEFINITIONS

Acoustic Frustration: The use of sound to antagonize a player, creating a short-term sense of frustration that, it has been suggested, prolongs the play period.
Electronic Gambling Machines: EGMs, also known as slot machines, video slots, or video fruit machines, are digital, electronic slot machines. They tend to be much faster than electric or mechanical slots, with an increased number of play options and bonuses.
Galvanic Skin Response: GSR, one component of electrodermal response, also known as skin conductance response or sweat response, is an affordable and efficient measurement of simple changes in arousal levels, which is one of the reasons why it is the main component of a polygraph device. Essentially, GSR measures the electrical conductivity of the skin, which changes with psychological state.
Losses Disguised as Wins: A play in which the player “wins” but receives a payout smaller than the amount wagered, hence actually losing on the wager despite being convinced (sonically) that they have, in fact, won.
Near Miss: A failure that was close to a win, such as two matching icons arriving on the pay-line followed by a third reel whose icon sits just off the pay-line. Slot machine manufacturers use this concept to create a statistically unrealistically high number of near misses (Harrigan, 2009), which convinces the player that they are close to winning, and therefore leads to significantly longer playing times (Parke & Griffiths, 2006).
Reward Schedule: A schedule of pay-off or rewards tied to timings or game actions, resulting in a series of emotional peaks and valleys to keep a player interested in a game.
Rolling Sound: The music or sound effects that are played when a player wins a round on a slot machine. The length of the sound (its roll) is tied to the amount of the win, with larger wins rolling for longer times.
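As a hedged illustration of two of the terms defined above, the sketch below classifies a single play from its wager and payout and derives a length for the rolling sound. The function names, the separate break-even category, and the constants (0.02 seconds of sound per cent won, capped at 15 seconds) are our own assumptions chosen for illustration; they do not describe how any actual machine is programmed.

```python
def classify_play(wager_cents, payout_cents):
    """Label one play using the definitions in the key terms.

    A payout greater than zero but smaller than the wager is a
    "loss disguised as a win". The break-even label is an assumption
    added here to keep the categories exhaustive.
    """
    if payout_cents <= 0:
        return "loss"
    if payout_cents < wager_cents:
        return "loss disguised as win"
    if payout_cents == wager_cents:
        return "break-even"
    return "win"


def roll_seconds(payout_cents, seconds_per_cent=0.02, cap=15.0):
    """Length of the rolling sound, growing with the payout.

    The linear rate and the cap are hypothetical values.
    """
    return min(max(payout_cents, 0) * seconds_per_cent, cap)


if __name__ == "__main__":
    wager, payout = 100, 50  # a 50-cent "win" on an assumed 1-dollar wager
    print(classify_play(wager, payout), roll_seconds(payout))
```

Note that in this sketch any positive payout triggers a rolling sound, which is exactly why a loss disguised as a win can sound like a win even though the player's credits have gone down.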
ENDNOTES

1 It is a common practice for many avid slot machine gamers to play multiple, adjacent machines simultaneously. Further, activities like drinking, smoking and interaction with other gamblers and passersby may also take gamers’ attention away from the machines.
2 For instance, a reward schedule is built into Too Human. Personal conversation, Denis Dyack of Silicon Knights, St. Catharines, Ontario, 2008. See Hopson, 2001.
3 There are different versions of the game available, including a “progressive slot” with varying jackpots, a 25-line slot with a max bet of 1,250 credits and a payout of 500,000 credits.
4 Thanks to the anonymous reviewer of the chapter for this idea.
5 Commission on Behavioral and Social Sciences and Education Committee on the Social and Economic Impact of Pathological Gambling. (1999). Committee on Law and Justice. Commission on Behavioral and
Chapter 2
Sound for Fantasy and Freedom

Mats Liljedahl
Interactive Institute, Sonic Studio, Sweden
ABSTRACT

Sound is an integral part of our everyday lives. Sound tells us about physical events in the environment, and we use our voices to share ideas and emotions through sound. When navigating the world on a day-to-day basis, most of us use a balanced mix of stimuli from our eyes, ears and other senses to get along. We do this naturally and without effort. In the design of computer game experiences, however, most attention has traditionally been given to vision rather than to this balanced mix of stimuli. The risk is that this emphasis neglects types of interaction with the game that are needed to create an immersive experience. This chapter summarizes the relationship between sound properties, GameFlow and immersive experience and discusses two projects in which Interactive Institute, Sonic Studio has balanced perceptual stimuli and game mechanics to inspire and create new game concepts that liberate users and their imagination.
DOI: 10.4018/978-1-61692-828-5.ch002

INTRODUCTION

At the Interactive Institute, Sonic Studio in Piteå, Sweden, we do research on sound and auditory perception in order to find new ways to use sound, new contexts where sound can be utilized, and new applications for sound in general. Of special interest to us is how sound resembles and differs from other sensory stimuli and how this can be put to play. In our work we use perspectives and methods from art, science, and technology and we utilize digital technology as a vehicle for our ideas and experiments. In a series of projects we have explored intuitive, emotional, imaginative, and liberating properties of sound. These projects have resulted in new insights and knowledge as well as in new and innovative applications for sound, audio, and technology. In this chapter I will describe our perspective on a number of sound properties and how
we have put these to work in various ways. The projects are based on and inspired by an ecological and everyday-listening approach to sound, like the ones proposed by R. Murray Schafer, William Gaver, and their followers.

As human beings, we are good at interpreting the soundscape constantly surrounding us. When we hear a sound we can make relatively accurate judgments about, for example, the objects involved in generating the sound, their weight, the materials they are made of, the type of event or series of events that caused the sound, the distance and direction to the sound source, and the environment surrounding the sound source and the listener. Much of the existing research on sound and auditory perception is about how to convey clear and unambiguous information through sound. In computer games, however, the aim is also to create other effects, effects that have as much to do with emotions, the subconscious, intuition, and immersion as they do with clear and unambiguous messages. This article describes a couple of projects in which we have worked with the balance between eye and ear, between ambiguity and unambiguity, between cognition and intuition and between body and mind. The aim has been to create experiences built on a multitude of human abilities and affordances, mediated by new media technology.

In a traditional computer game setting, the TV screen or computer monitor is the center of attention. The screen depicts the virtual game world and the player uses some kind of input device, such as a game pad, a mouse, a keyboard, or a Wiimote, to remotely control the virtual gameworld and the objects and creatures in it. The action takes place in the virtual world and the player is naturally detached from the game action by the gap between the player’s physical world and the virtual world of the game. Much work has to be done and complex technology used in order to bridge that gap and to have the player experience a sense of presence in the virtual gameworld. The aim is to make the player feel as immersed as possible in the game experience and to make her suspend her natural disbelief. To achieve this, the computer game industry must build broader and broader bridges over the reality gap to make the virtual game reality more immersive. The traditional way to increase immersion and suspension of disbelief has primarily been to increase graphics capability, and today we can enjoy near-photorealistic 3D graphics in real time. But there might be alternative ways to tackle the problem. Potentially, computer games could be more engaging and immersive without having to build long and broad bridges over the reality gap. What about narrowing the gap instead of building broader bridges over it?
BACKGROUND

Sound and light work in different ways and reach us on complementary channels. Our corresponding input devices, the visual and auditory perceptions, show both similarities and differences and we have an innate ability to experience the world around us by combining the visual, auditory, touch and olfactory perceptions into one, multimodal whole. We are built for and used to handling the world through a balanced mix of perceptual input from many senses simultaneously. This can be exemplified in different ways. One is by crossmodal illusions, for example, the McGurk effect (Avanzini, 2008, p. 366) which shows how our auditory perception is influenced by what we see. Another example is the ventriloquist illusion in which the perceived location of a sound shifts depending on what we see (O’Callaghan, 2009, section 4.3.1). If the signal on one sensory channel is weak, we more or less automatically fill in the gaps with information from other channels and, in this way, we are able to interpret the sum of sensory input and make something meaningful of that sum. Watching lip movements in order to hear what your friend is saying at a noisy party is just one everyday example of this phenomenon. A third example is Stoffregen and Bardy’s concept of “global array” (Avanzini, 2008, p. 350).
According to this concept, observers are not separately sensitive to structures in the optic and acoustic flows, but are rather sensitive to patterns that extend across these flows: the global array. Another way to describe this is that we do not “see and hear” but rather “see-hear”: what we perceive is the sum of the sensations reaching our different modalities.

What we really hear, what a sound is, where a sound is located and so forth are questions that philosophers have been arguing over for several hundred years. O’Callaghan (2009) gives a broad summary of the history and current state of the field. What most philosophers and sound researchers agree on is that sounds are the result of events in the physical world. Sound holds information about these events and the objects involved in them. This means that, to our perception, sounds are strongly linked to the physical world and we are “hard wired” to treat sounds as tokens of physical activity, matter in motion and matter in interaction.

In this context the pioneering work of William Gaver (1993) on sound classification and listening modes is still often cited and relevant for game sound design. Gaver makes a distinction between musical listening and everyday listening. In musical listening, you listen to the acoustic properties of the sound, for example, its pitch, loudness, and timbre. In everyday listening, on the other hand, you listen to events rather than sounds. When you hear a car passing by or you hear a bottle breaking you do not pay much attention to pitch or loudness but more to the event as such. In everyday listening, the interpretation and the mapping of sounds to the individual’s previous experiences and memories are crucial. When a bottle crashes against the floor, loses its original shape and turns into a number of smaller and larger pieces, this is immediately obvious to the eye.
But, in order to be able to pinpoint the event that caused the sound of the broken bottle, the ear has to learn and form a memory that connects the sound of broken glass to the event of a bottle crashing and losing
its shape. Even if the individual has a previous experience and memory that connects the sound of broken glass to the event of a broken bottle, the ear, not knowing the exact cause of the sound, might hesitate. Was it a bottle that crashed or was it perhaps a large drinking glass that broke? The eye can give the correct answer, whereas the ear is left to interpret and to guess in various degrees. Tuuri, Mustonen, and Pirhonen (2007) have continued along this path and propose a hierarchical scheme of listening modes. Two of these are preconscious, two are source-oriented, three are context-oriented and one is quality-oriented. In the two preconscious-oriented listening modes, the focus is on what reflexive, emotive and associative responses a sound evokes in the listener. In the two source-oriented modes, the focus is on how the listener perceives the source of a sound and what event caused it. In the three context-oriented modes, the focus is on whether the sound had a specific purpose, if it represents any symbolic or conventional meaning, and if the sound in that case was suitable and understandable in the context. In the last, quality-oriented listening mode, the focus is on the acoustic properties of the sound, its pitch, loudness, duration and so forth. To use these or other, complementary, identified listening modes is a powerful way to inform the sound design process of not only computer games, but sound design processes in general. The important thing to notice here is that research on listening modes in general shows that sound can indeed be used to evoke emotions and associations, to communicate properties of physical objects and events and to convey meaning and purpose. Already from the time before we are born our auditory perception starts giving us information about the world around us (Lecanuet, 1996). From day one we start building our library of associations to individual sounds and to whole soundscapes. 
Gradually, we learn what they mean and we train our ability to interpret them. Furthermore, some researchers argue that we experience sounds “as of” a bigger whole. O’Callaghan (2009) argues
that the sound of hooves of a galloping horse is not identical with the galloping. Instead, it is part of the particular event of galloping: “Auditory perceptual awareness as of the whole [sic] occurs in virtue of experiencing the part” (part 2.3.2). This strong linkage between the sounds we hear and the physical world we inhabit can be brought into play in computer games through rich soundscapes in order to convey information about objects, environments, and events in the game world.

Try this simple experiment. Pick an environment with a reasonable number of activities, people, birds, machines or whatever you can find that makes everyday sounds. Close your eyes and try not to interpret, make associations and create mental pictures from what you hear. It is very hard to put the auditory interpreter to rest and this is true for sounds from all types of sources, including the headphones playing sounds from your iPod. This interpretation, mapping, or disambiguation of individual sounds and whole soundscapes involves high-level mental processes related to our conscious and subconscious, cognitive and emotional layers. As such, these processes have the potential to invoke a myriad of physical and mental responses: fear, flight, well-being, happiness, anger, understanding and so on. In computer game design, this means huge potential both to convey cognitive meaning and to create moods and affect.

Auditory perception can be understood as becoming aware of the whole by virtue of the parts. Sounds can also be said to be more ambiguous and to leave wider space for interpretation than visual stimuli do, at least when it comes to interpreting where and what we have heard. In Human Computer Interface (HCI) contexts, ambiguity has often been thought of as a disadvantage and a problem (Sengers & Gaver, 2006) and much, perhaps even a majority, of the research done in the field has tried to overcome this and find ways to create clear and unambiguous systems and interfaces.
Research on sound interaction design is no exception to this, as described by, for example,
Gaver (1997). This is true also when it comes to sound in computer games but, in this context, the need to interpret and disambiguate the computer game system is not the only aspect of the issue. On the contrary, some authors argue that ambiguity and the need to interpret a system instead can be used as an asset (Sengers & Gaver, 2006; Sengers, Boehner, Mateas, & Gay, 2007). Here, we argue that this is certainly the case. When the ideas of ambiguity and interpretation are combined with the concepts of flow and GameFlow described below, the sum can be used to inform the game design process in new ways. Development of computer games has so far mostly been geared towards vision. When it comes to sound in games, much of the work is inspiring case studies but less research. Sweetser and Wyeth list three aspects of usability in games that have previously been in focus for research (Sweetser & Wyeth, 2005). These are interface (controls and display), mechanics (interacting with the gameworld), and gameplay (problems and challenges). Lately, also other dimensions of the design and use of computer games have started to gain interest among game researchers, dimensions that incorporate new and more complex aspects and ideas of player enjoyment and computer game design. Several research groups have, for example, made connections between interactivity in general and, more specifically, player enjoyment in games on the one hand and the concept of flow developed by Mihaly Csíkszentmihályi on the other. In the 1970s and 1980s Csíkszentmihályi conducted extensive research into what makes experiences enjoyable. He found that optimal experiences are the same all over the world and can be described in the same terms regardless of who is enjoying the experience. He called these optimal experiences flow. 
A flow experience is defined by Csíkszentmihályi (1990) as being “so gratifying that people are willing to do it for its own sake, with little concern for what they will get out of it, even when it is difficult, or dangerous” (p. 71).
Judging from the volume and type of work built on and derived from Csíkszentmihályi’s flow principle, it can be argued that the concept is relevant in the context of computer games. Andrew Polaine (2005) has written about The Flow Principle in Interactivity. This work does not relate to computer games per se, but is closely related to the subject in that it connects flow with both “willing suspension of disbelief” (a term borrowed from narratives in theater and film) and the experience of play. The GameFlow model developed by Sweetser and Wyeth builds directly on the concept of flow and is a model for evaluating computer games from an enjoyment perspective. Another example is Kalle Jegers’ (2009) “Pervasive GameFlow” model that takes Sweetser’s and Wyeth’s GameFlow concept to the pervasive game arena. A final example is Cowley, Charles, Black, and Hickey’s (2008) USE model (User, System, Experience) that looks at games, player interaction, and flow from an information system perspective. Built on the Flow concept, Sweetser and Wyeth’s GameFlow model consists of eight elements for achieving enjoyment in games. The model can be used both when designing new games and when evaluating existing game concepts. In summary, according to Sweetser and Wyeth, games must keep the player concentrated through a high workload. At the same time, the game tasks must be sufficiently challenging and match the skill level of the player. The game tasks must have clear goals and the player must be given clear feedback on progression towards these goals. Enabling deep yet effortless involvement in the game can potentially create immersion in the game. According to Sweetser and Wyeth, experiences can be immersive if they let us concentrate on the task of the game without effort. 
“Effortlessly” can, in this context, be interpreted in several ways: one way to think about it is in terms of how true to real life a gaming experience is and how transparent the interaction with the game creating the experience is. How the GameFlow model can
be used in sound design for games is covered in more detail below. A number of research projects report on sound and audio’s ability to create rich, strong and immersive experiences using mobile platforms that give physical freedom to the users. These projects also support the general idea that sound and audio are well suited for use in the design of computer game experiences based on the GameFlow model. Reid, Geelhoed, Hull, Cater, and Clayton report on a public, location-based audio drama called Riot 1831. The evaluation of the project showed that a majority of the users had rich and immersive experiences created from the sounds of an audio-based narrative. Based on the results from this project, the authors argue that “immersion is a positive determinant for enjoyment (and vice versa)” (Reid, Geelhoed, Hull, Cater, & Clayton, 2005). It should be noted that the drama took place in a square in Bristol, UK, which gives this project similarities to pervasive and location-based games where the virtual gameworld and the physical world of the player are blended. Friberg and Gärdenfors (2004) report on a project in which three audio-based games (what the authors term TiM games) were developed. Based on audio communication with the users, the authors report that these games give the users spatial freedom, encourage physical activity and open up possibilities to create new types of interfaces for input to the game. Ekman et al. report on the development of a game for a mobile platform (Ekman et al., 2005). They point out that sound and audio can indeed be used to create immersion, but also that the use of sound does not automatically create immersion. Great care must be taken when designing the game sounds and the developers must also carefully select the best technology and equipment to play back the game audio to get the desired effects.
In two projects, the Interactive Institute’s Sonic Studio has investigated how sound in games can be used to bring the user’s own fantasy into play to create new gaming experiences (Liljedahl, Lindberg, & Berg, 2005; Liljedahl, Papworth, &
Lindberg, 2007). In both these projects the balance point between visible and audible stimuli from the game has been moved away from the visual and towards the audible. In both cases the users of the computer games are given only a minimum of visual information and are, instead, given rich and varied soundscapes. The projects have shown that the users have had rich and immersive gaming experiences and are given other types and amounts of freedom compared to more traditional computer games. These projects will be described in more detail later in this chapter. Humankind has, in recent centuries, invested considerable energy and creativity in creating complex technology. We have a long tradition in replacing human capability with machinery. In the early days it was mostly muscle power that was mimicked, replaced, and superseded by steam, combustion, and, later, electricity. It can be argued that research into artificial intelligence is striving to do the same with human cognitive and emotional capabilities. Following this long tradition, it seems that we often neglect human capabilities, affordances, gifts, and needs when designing computer games and other systems. Much of the focus has been on creating photorealistic 3D-environments in real time and less on how the players’ internal, fantasy-driven, “sound interpreter and mapper” can be put into play to create complementary, mental images. In the following I will describe how we at Interactive Institute, Sonic Studio work with finding ways to increase user satisfaction and involvement in gaming situations by using existing technology in slightly new ways. Often, this has meant moving complexity from technology to the user, decreasing the demands on technology used, and increasing the demands on the user to invest and spend energy physically and mentally in a game experience.
MIND THE GAP: SOUND FOR FEEDBACK AND IMMERSION
Pictures are not the real world; they are merely the shadows of it. René Magritte’s provoking pipe is a painting about exactly this: the picture of a pipe and beneath it the text “Ceci n’est pas une pipe” (This is not a pipe). We are surrounded by still and moving images and we are used to treating pictures as pictures and not the real, physical world. Even the most violent computer games and Hollywood film productions are assumed to be physically and mentally non-hazardous to us just because we are supposed to be able to discriminate between reality and the fictive picture of it. Sound, on the other hand, seems to work slightly differently. When striving for engagement, immersion, and suspension of disbelief in computer games and films, sound plays a very prominent role and, according to Parker and Heerema (2007), “sound is a key aspect of a modern video game”. Natural sounds in the physical world are the result of events in that world and we become aware of physical events to a large degree through sound. It can thus be argued that sound is a strong link to the physical world. In fact, Gilkey and Weisenberger argue that “…an inadequate, incomplete or nonexistent representation of the auditory background in a VE [Virtual Environment] may compromise the sense of presence experienced by users” (quoted in Larsson, Västfjäll, & Kleiner, 2002). It is this mechanism that is utilized when creating the sound tracks to films and games. Just seeing Donald Duck smash into a wall is not enough. It is not until the sound effect is added that the nature and the full consequence of the smash are made evident to the audience. When we hear the sound of the smash, all of us have our own, slightly unique, experiences of and relationship to the sound. The sound has the power to immediately trigger our interpretation machinery and evoke memories and fantasies.
In a fraction of a second the sound makes us re-live our own experiences and we can feel what Donald feels:
pain, anger, and humiliation. In this way it can be argued that the sound is playing us. Like a guitarist plucking a string that generates sound, sound is plucking our interpretation, spawning memories, understanding, and emotions. The string cannot stop the guitarist from plucking it and we cannot stop sound from triggering our understanding, our memories, associations, and emotions. For a computer game to be successful it is crucial that the players can immerse themselves in the gaming experience and that they are invited to a gameworld and game experience in which they are willing to suspend their natural disbelief. After all, World of Warcraft is not the real world. In their GameFlow concept, Sweetser and Wyeth (2005) set up a number of criteria that game designers and game researchers can use when designing and evaluating games with respect to immersion and suspension of disbelief. Some of these criteria are general, overarching principles that relate to many human activities, while other criteria relate more closely to gaming and the media used to convey the game’s metaphor and narrative. The GameFlow model lists the following criteria for player enjoyment in games:
• Concentration. Games should require concentration and the player should be able to concentrate on the game
• Challenge. Games should be sufficiently challenging and match the player’s skill level
• Player Skills. Games must support player skill development and mastery
• Control. Players should feel a sense of control over their actions in the game
• Clear Goals. Games should provide the player with clear goals at appropriate times
• Feedback. Players must receive appropriate feedback at appropriate times
• Immersion. Players should experience deep but effortless involvement in the game
• Social Interaction. Games should support and create opportunities for social interaction (Sweetser & Wyeth, 2005).
As can be seen from the list, these criteria are very general and could be applied to many aspects of life, from children’s play to high school education, working life, and leisure. When it comes to sound design for computer games, some of these criteria are more relevant than others. When looking at Tuuri, Mustonen, and Pirhonen’s (2007) hierarchical listening modes, a clear link to the GameFlow concept’s Feedback and Immersion criteria can be found. Sweetser and Wyeth divide the Feedback criterion into the following parts:
• Players should receive feedback on their progress towards their goals
• Players should receive immediate feedback on their actions
• Players should always know their status or score (Sweetser & Wyeth, 2005).
The Immersion criterion is similarly divided into the following parts:
• Players should be less aware of their surroundings
• Players should be less self-aware and less worried about everyday life or self
• Players should experience an altered sense of time
• Players should feel emotionally involved in the game
• Players should feel viscerally involved in the game (Sweetser & Wyeth, 2005).
Given our ability to listen on several cognitive abstraction levels, as indicated by Tuuri, Mustonen, and Pirhonen’s hierarchical listening modes, it can be argued that sound is well suited to communicate feedback to the user and to substantially add to the game’s ability to immerse the player in the gaming experience. In the following
we will look at how sound can be used and what sound properties could be brought into play in order to give immediate and continuous feedback to users, to help them become less aware of their surroundings and themselves, and to help them get involved in the game.
SOUND PROPERTIES AT YOUR DISPOSAL
There are a number of properties of sound as a physical, acoustic phenomenon that, in conjunction with the inherent workings of our auditory perception and our ability to use different listening modes, are at our disposal to use, explore, and exploit when designing computer game experiences. Most of these properties are well known in everyday contexts and most people will immediately be able to connect to the descriptions of them, have their own experiences of them, and understand their implications. These properties can, of course, be described in physical and acoustic terms of frequency, amplitude, overtone spectrum, envelopes and so forth. Unfortunately these terms say very little about our human experiences of sounds, sound sources, and soundscapes. It is therefore important to also describe sound properties in relation to how our hearing works. The following is a summary of what we have discussed above and an attempt to start making the discussion more concrete and applicable to sound design for computer games.
Omni-Directionality
Sound is omni-directional and reaches our ears from all directions (almost) simultaneously. Actively and consciously, as well as automatically and pre-consciously, we use this omni-directionality to navigate in our everyday lives. Even though we do not have to look out for saber-toothed tigers anymore, we are constantly warned of cars and buses from left and right, falling trees from
behind, and other dangers. Our ears are under a constant bombardment of auditory input from all directions and we cannot simply turn away from a sound. To be able to handle all this information and to avoid fatigue and sensory overload, we handle most of the input subconsciously. Luckily, we also have the ability to focus on specific parts in the soundscape. We can, for example, isolate a conversation with a friend in a noisy restaurant from a dozen nearby, unrelated conversations. This is often referred to as “the cocktail party problem” (Bregman, 1990, p. 529). In GameFlow terms, the omni-directional qualities of sound relate to both feedback and immersion. Sound for feedback from a game does not force the user to look at a special location on a screen: in fact, it does not require a screen at all. Sound is a strong carrier of emotions, events, and objects, as discussed above. In our everyday lives, we are also used to being surrounded by sound. Mimicking this in a computer game scenario can make profound contributions to the immersive qualities of the game.
Uninterruptible
Along the same line is the fact that we do not have “earlids” and cannot just shut our ears to get rid of the sounds around us or choose to hear just one of the sounds of the total mass that reaches our ears. From an evolutionary point of view, it has been an advantage to get early warnings and hear all dangers, not only the dangers you choose to listen to, but all. It also means that our eyes and our ears are designed differently and that the streams of sensory input from those senses complement and interact with each other. Again, this means that a constant stream of input data must be handled. The way to cope with this is to do it subconsciously. In our everyday lives we are submerged in the ever-present stream of sounds from the world surrounding us. By supplying a relevant and well-designed stream of sounds from a computer game, the users can get constant and natural feedback
on their actions, very much like in real life. This in turn adds in a natural way to the sought-after effortless immersion.
Sound Connects to the Physical World
Sound connects you to the physical world by telling about physical objects and events that involve physical objects. We can be described as hardwired to perceive and automatically interpret sounds as results of events occurring in the physical world. This is true even if the sound is mediated through a loudspeaker: our internal interpreter does not make much difference between sounds from a physical coffee cup being placed on a table and the recorded sound of the same event played back through a pair of headphones as long as the technical quality is sufficient. It is still a coffee cup being placed on a table. As with the real-world example you were asked to listen to above, try listening to a film with your eyes shut. It is virtually impossible to turn off the flow of images, feelings and associations flowing through you as you listen. You have to concentrate very hard on something else not to be affected by the sounds that reach your ears. The sound of a dentist’s drill gives a direct bodily sensation and you can almost feel the drill in your own mouth. The picture of the drill alone, without the sound, does not have the same power over our imagination, emotions, and physiology. Again, sound can be used to immerse the user in the gameworld in a way that strongly resembles the way we handle and work in everyday life.
Sound Can Be Ambiguous
We constantly hear sounds from all directions and, to some degree, we can decide the direction and the distance to the sound source. At will, we can consciously filter out discrete sounds of special interest to us from the whole soundscape around
us, but we are also forced to process most of what we hear subconsciously. Often, we do not know exactly what the source of a sound is or from what direction and distance it comes. We can hear a vehicle approaching from behind but have to guess what type of vehicle it is and how fast it is approaching. We can roughly tell if it is a truck or a car and make educated guesses about when it will pass us, but usually not more than that. Sound leaves a relatively large space within which we can (or are forced to) fill out the details ourselves and make assumptions and interpretations based on our individual memories, experiences and associations. When telling stories, making films or designing computer games, this ambiguity can be of great value. By planting a well-designed sound at the right moment, you can trigger a person’s imaginative and emotive mechanisms by forcing her to consciously or subconsciously interpret and disambiguate the sound. Leaving the user space open to her own interpretation, inviting her and giving her the freedom to use her own imagination can potentially help the user to be emotionally and viscerally involved in the game.
Sound Reaches Us on Subconscious Channels
Our ears are constantly capturing the soundscape around us. If all that data were to be processed by the cognitive and conscious layers in our brains, we would either suffer from mental overload or have another brain constitution. But thanks to the limited bandwidth of our consciousness, our subconscious, emotional and intuitive layers process most of the sounds we hear. This does not mean that we are not affected by what our ears pick up and what our brains are processing. What it does mean is that the effect is not totally controllable by us and that we are, to a large degree, victims of the sonic world. Often this is useful, sometimes it is stressful and sometimes it is fun. We are more or less forced
to interpret and react to what we hear. A sound heard spawns meaning and interpretations based on our previous experiences. In games this can be extremely useful as a way to invite the players to invest and get deeply involved in the game. This relates strongly to the GameFlow criterion “immersion” described above.
SOUND TYPES AT YOUR DISPOSAL
There are a number of ways to categorize and classify sounds. In this context it makes sense to use the three categories traditionally used for sound in films and computer games (Sonnenschein, 2001; see also Hug, 2011; Jørgensen, 2011 for more involved taxonomies of computer game sound):
• Speech and dialog. Human language brought to sound, the sounding counterpart to the visual text. The most cognitive and unambiguous of the three types, often used to convey clear messages with the least possible risk of misunderstanding
• Sound effects and the subcategory ambient sounds. The result of events in the physical world. A falling stone hitting the ground; air fluttering in the feathers of a bird; a mechanical clock ticking; a heavy piece of frozen wood dragged over a horizontal, dry concrete floor; the ever-present, ever-changing sounds of the atmosphere
• Music. Sometimes referred to as “the language of emotion”. An integral part of human cultures since the dawn of Homo sapiens.
Note that these categories are only for clarity and discussion. It is important to point out the fact that, in reality, the possible borders between them are floating. The borders between music particularly and the other two categories have been blurred for centuries: for example, music and dialog in opera and musicals, music and ambience, and music and sound effects in games and films.
Speech and Dialog
When you want to convey a clear and unambiguous message, the human voice is a natural choice. The same is true if you want to tell a riddle or recite a poem or just want to be vague and ambiguous. Human language is rich, and there are myriad ways to use it in computer game contexts. Speech and dialog can be used to address several of the criteria for player enjoyment included in the GameFlow concept. They can be used to promote concentration on the game by providing a complementary source of stimuli, getting the player’s attention without disrupting the player’s visual focus, or spreading the total game workload on complementary channels, for instance. Sometimes it is necessary to give instructions to the player on what to do next, or what is expected from the game. If you do not want to exclude the player from an ongoing game sequence or if you have problems with limited screen size, using speech as a complement to text is one solution. Today, more and more computing and gaming platforms have built-in support for voice recognition, which means that the player can control the game by issuing voice commands. Since this is totally in line with what we do in our everyday lives, it also supports a very natural way to co-create the game world and to get a desired sense of impact upon it. Speech is a natural way to get feedback from a game on player progress and distance to game goals without having to force the player to shift visual focus to get the necessary feedback. Speech and human voices are totally natural parts of human society and of everyday lives. The human voice is therefore very well suited to making the players forget that they are participating in the game through a medium and it helps to make the game interface less visible and less obtrusive to the player. Voices can therefore be integrated into
the background soundscape of the game to give a sense of human presence. Apart from the above-mentioned rather objective and technical uses of speech and dialog, all variations of subjective, expressive and dramatic qualities of the human voice are also available. A bad result uttered with an offensive voice will be something radically different from the same result uttered with a friendly and supportive voice. Here, the thin border between computer games, film, theater and other narrative media is clear.
Sound Effects Make It Real
Events in the physical world generate sounds. It is actually very hard to live and be active in this world without giving rise to sounds. Sounds heard in the physical world are the results of events involving physical objects. Explosions in a combustion engine, oscillations of the vocal cords in your throat, putting down your cappuccino cup on the saucer. Sounds are the proofs that you are still firmly attached to the physical world of your senses. The absence of sound, on the other hand, could be the sign that what you are experiencing is not real, that it is a dream or virtual reality. A green rectangle silently moving over a computer screen is probably perceived as just a green rectangle on the screen. But if you add the sound of a heavy stone dragged over asphalt to this simple animation, the green rectangle automagically turns into a heavy stone. Sound and computer game audio is a bridge on which the virtual visual worlds can travel out and become part of the real, physical world. Ambient or background sounds are the sounding counterparts to the graphic background. Having no ambient sounds is like having a pitch-black visual background and can be perceived as an almost physical pressure on the ears. Adding just a virtually inaudible ambient sound to the virtual world of a computer game can create an immediate experience of presence and reality. The silent virtual world that was locked in can be perceived as freed and part of the physical world through the added sound. Friberg and Gärdenfors use a number of categories for the sounds in the TiM games mentioned above. Most of their categories can be seen as subcategories to the traditional sound effect category. The categories listed by Friberg and Gärdenfors (2004) are:
• Avatar sounds refer to the effects of avatar activity, such as footstep sounds, shooting or bumping into objects
• Object sounds indicate the presence of objects. They can be brief, recurring sounds or long, continuous sounds, depending on the chosen object presentation
• Character sounds are sounds generated by non-player characters
• Ornamental sounds are sounds that are not necessary for conveying gameplay information, such as ambient music, although they enrich the atmosphere and add to the complexity of the game.
In GameFlow terms this means that sound effects and ambient, background sounds can add to several of the criteria for player enjoyment. Presenting a lot of stimuli to the player on various channels is crucial for the ability of the player to concentrate on the game. We are also used to constantly interpreting the soundscape surrounding us, and a well designed game soundscape will have great potential to grab the player’s attention and help them focus on the game. Sound effects are today absolutely necessary for feedback to the players of computer games. Everything from game control commands issued by the player to virtual events caused by non-player characters can be signaled and embodied using sounds. Sound effects and ambient sounds are very important for player immersion and to involve the player emotionally and viscerally in the game. Many of the sound stimuli that reach our ears are processed subconsciously and handling sound
on this level of perception is totally natural to us. This fact also supports the idea that sound is very well suited to adding to the total experience of immersion in the game world.
Music Makes You Feel
Sound in general and music in particular have a very strong ability to touch our feelings. Music works emotionally in two significant ways. Firstly, it tells us stories about feelings that we do not necessarily feel ourselves: the music works like sounding pictures of emotions (Gabrielsson & Lindström, 2001, p. 230). Secondly, music can have the power to induce feelings in us, that is, to actually make us feel (Juslin & Västfjäll, 2008, p. 562). Today, the borders between music, sound effects, ambient background sounds and voices become more and more blurred and music is used as sound effects and sound effects can be used as music. It can therefore be hypothesized that the emotional qualities of music are also, to some extent, true for other types of sounds. Research has shown that music alone, in the absence of supporting pictures or other sensory input, can in many cases and for a majority of people induce feelings of happiness and sadness. Most people can also accurately tell if a piece of music is composed and intended to express sadness or happiness. Other, more complex emotions like jealousy or homesickness are harder to distinguish: music alone has less power to induce such feelings and to actually make us feel them. However, if you add pictures and other media to the musical expression, the musical power increases exponentially. Auditory perception tends to dominate judgment in the temporal dimension (Avanzini, 2008, p. 390). Music is a special case of this, since it is sound that is highly structured in time. By synchronizing sound and visual movements, very strong effects can be created. Some of the music we hear affects us very individually: it is not universal and does not communicate the same thing to two persons. But if the music is paired with something else, for example, a film or a game, something happens. People who are said to hate classical music, and would never put on a recording of classical music, can spend hours watching films with music tracks firmly grounded in western classical music tradition, sounding like something composed in the late 19th century by Richard Wagner or Gustav Mahler. When musical sounds meet other sensory inputs, for example, music in an animated film, the individual stimuli tend to blend together and become a new whole. The “film + music” object is perceived as being radically different from the film alone and the music alone. The music becomes more universal and has the ability to communicate relatively universal values, emotions, and moods. Music is normally a very linear phenomenon: a song starts at A and ends at B, and the journey between the two is always the same and takes the same amount of time to travel each time. This is especially true of recorded, mediated music. In a non-linear and interactive context, this linear music concept does not necessarily apply. Most often, music has a form that creates successions of tension and relief, which in turn creates expectations on how the music will continue: the music can therefore not be altered as quickly and easily as other media. To function and be perceived as music, it has to follow at least some basic musical rules of form and continuity. A number of techniques and systems have been developed to cope with the gap between linear music and non-linear environments. Many of these are proprietary systems developed by the commercial game developers and are not available to the general public. What most of the systems seem to agree on is a division between a vertical and horizontal dimension. The vertical dimension controls aspects of musical intensity and emotion and the horizontal dimension controls aspects of time and form.
The vertical dimension is often implemented using a layered approach whereby a number of musical tracks play in parallel. Each track plays music with a certain content representing a level of intensity or emotion and the game engine cross-fades between the tracks to create the correct blend of intensity and emotion. The horizontal dimension is often implemented using short phrases of 1, 2 or 4 bars linked together. When a transition from one musical segment to another is motivated by the state of the game, the current phrase is played to the end and the chain of linked phrases takes another route than if the game state had not changed.
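The two dimensions can be illustrated with a minimal sketch. It is not taken from any of the proprietary systems mentioned above: the function names, the equal-power cross-fade curve, the phrase names and the two game states ("calm" and "danger") are all assumptions made for the example.

```python
import math


def layer_gains(intensity, num_layers):
    """Vertical dimension: return one gain per parallel layer.

    intensity lies in [0, 1]. Adjacent layers are blended with an
    equal-power cross-fade (an assumed curve), so at most two layers
    sound at once and their squared gains sum to 1.
    """
    if num_layers == 1:
        return [1.0]
    intensity = min(max(intensity, 0.0), 1.0)
    position = intensity * (num_layers - 1)    # 1.5 = halfway between layers 1 and 2
    lower = min(int(position), num_layers - 2)
    blend = position - lower                   # 0.0 = lower layer only, 1.0 = upper only
    gains = [0.0] * num_layers
    gains[lower] = math.cos(blend * math.pi / 2)
    gains[lower + 1] = math.sin(blend * math.pi / 2)
    return gains


# Horizontal dimension: every short phrase names its successor for each
# game state. Phrase names and states are hypothetical.
PHRASES = {
    "explore_a": {"calm": "explore_b", "danger": "battle_a"},
    "explore_b": {"calm": "explore_a", "danger": "battle_a"},
    "battle_a": {"calm": "explore_a", "danger": "battle_b"},
    "battle_b": {"calm": "explore_a", "danger": "battle_a"},
}


def next_phrase(current, game_state):
    """Choose the phrase to start once the current phrase has played to its end."""
    return PHRASES[current][game_state]
```

A game loop would call layer_gains whenever the intensity changes, but would consult next_phrase only at a phrase boundary, so that a change of game state never cuts a phrase short.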
3D-Positioned Audio
Since sounds are the results of physical events in three-dimensional space, it is often vital to be able to give the impression of game sound as emanating from a certain point in a 3D space (see Murphy & Neff, 2011). 3D-positioned audio is a powerful technique to bridge the gap between the virtual game reality and the physical world of the player’s senses. This is especially true for sound effects but is also very useful for speech and dialog. Music and ambient sounds are most often not 3D-positioned.
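As a rough illustration of the idea, the following sketch positions a sound source for a listener using only stereo panning and distance attenuation in the horizontal plane. It is a simplification under stated assumptions: the function name, the inverse-distance law and the equal-power pan are chosen for the example, and real 3D audio engines typically add elevation, head-related filtering and room acoustics.

```python
import math


def position_source(listener_xy, heading_deg, source_xy, ref_distance=1.0):
    """Return (left_gain, right_gain) for a source heard by a listener.

    heading_deg is the direction the listener faces, measured clockwise
    from the +y axis ("north"). Only the horizontal plane is modelled,
    so sources in front and behind are not distinguished.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)
    # Direction of the source relative to where the listener is facing;
    # positive angles mean the source is to the listener's right.
    bearing = math.degrees(math.atan2(dx, dy))
    azimuth = math.radians(bearing - heading_deg)
    pan = math.sin(azimuth)                    # -1 = hard left, +1 = hard right
    # Inverse-distance attenuation, clamped so nearby sources do not exceed unity.
    attenuation = ref_distance / max(distance, ref_distance)
    angle = (pan + 1) * math.pi / 4            # equal-power pan law
    return attenuation * math.cos(angle), attenuation * math.sin(angle)
```

Because the gains depend on the listener's heading as well as position, turning towards a source moves it to the centre of the stereo image, which is the kind of cue a player can use to localize a sound by turning and moving.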
SOUND FOR FANTASY AND FREEDOM
We cannot hear away from a sound like we can look away from an object, and we have no “earlids” to shut as we can our eyelids. These simple facts make sound ideal to use if you are looking for new game concepts to contrast with the traditional screen and eye-based computer games. Western societies are often said to be vision-based or eye-centric. This suggests that we rely mostly on our eyes and use our other senses and abilities more or less just as support for what we see. Language reflects this in that we “watch” things. We “watch” TV and films even though silent movies have been history since the 1930s. We even “watch” music concerts (at least this is true in Swedish). Our knowledge and awareness of vision, graphic design and so forth are also remarkably higher, more general and more common than for their sounding counterparts, as are the creative tools available. In the Association for Computing Machinery’s Computing Classification System (2010), sound and audio were added late compared to, for example, computer graphics. Sound and audio are also mentioned on a lower level (level three) in the classification system, whereas computer graphics is a level two item.
Balance the Senses
Our eyes play a dominant role in our everyday lives and computer game development has traditionally put most emphasis on graphics and vision. At the same time, other modalities and media types such as sound and hearing can be described as underused. This suggests that new computer game concepts could potentially be found by changing the balance between modalities and media types. What happens, for example, if we reduce graphics and visual stimuli and instead build the gaming experience more on sound and audition? What would the effect be if you had a computer game with only an absolute minimum of graphics and instead a rich, varied and gameplay-driving soundscape? Potentially such a game would be immersive in other ways and give different types of game experiences compared to more traditional, graphics-based games.
A couple of things immediately become obvious. First of all, the game designer must let qualities other than computer graphics build and drive gameplay. Secondly, the player is liberated from the need to keep her eyes on a 20-something-inch rectangle (in mobile applications only a few inches). Instead, all of a sudden, she becomes free to move over much larger areas or even volumes. Both of these open up possibilities to create radically new types of computer games for radically new computer game experiences. They also represent new challenges for both game designers and computer game players (see Hug, 2011 for an expansion of such ideas).
Our auditory perception is good at interpreting sounds as tokens of events. When we hear a sound we know something has happened: matter has interacted with matter. The sounds of broken glass, of cars colliding, of footsteps, our own breathing, and combustion engines all contain information about materials, weights, speeds, surface roughness and so on. In our everyday lives we are constantly immersed in a soundscape that we receive through two streams, one in the left ear and one in the right. From day one we start training our perception in order to be able to make priorities and pick out the relevant information from these two streams. Since sound reaches us from all directions, it can be hypothesized that most of the events we hear, we do not see.
In the light of the above, it becomes natural to use sounds as a means to convey feedback on both player actions and other events occurring in the virtual world of a computer game. Since sound tells us in a totally natural way about things we do not see, sound can be used to expand the game world far beyond what is displayed on a screen. Sound is very well suited for delivering the feedback and creating the immersion necessary for successful game concepts, as described by the GameFlow concept above.
The use of sound to convey information about events, creatures and things that are not visible adds yet another dimension to the game experience: imagination, a word originally meaning “picture to oneself”. When we hear a sound without seeing the sound source we make an interpretation of what we have heard. The interpretation is based on previous experiences of, memories of, and associations with sounds with similar properties. The interpretation is often subconscious and made without effort. To invite the players to use their imagination, fantasy, and associations to fill out the gaps in this way and complement what they see on the screen is one way to make the players emotionally and viscerally immersed in the game.
In a series of research and development projects we have conducted investigations and experiments based on questions related to the ideas outlined above. These projects have shown that by shifting the balance between graphics and other media types and between eyes and other modalities, games with new qualities can indeed be created: games that attract new user groups and games that can be used in new contexts, in new ways and for new purposes.
In this context it is also relevant to make a distinction between gameplay or game mechanics and metaphor. Gameplay can be defined as the set of rules and the mechanics that drive the game, the game’s fundamental natural laws. Metaphor on the other hand defines the world in which these abstract laws work. Gameplay can, for example, define that you are able to navigate in 4 directions called north, south, east and west, that you will be presented with challenges you can either win or lose, and that you win the game by winning a defined number of these challenges. Metaphor defines the world in which the navigation takes place and the nature of the challenges. When gameplay defines an abstract challenge, metaphor can, for example, show an enemy soldier that must be eliminated or it can present the player with a falling egg that must be caught before it hits the floor. A good game must have both a well-designed gameplay and a metaphor that supports that gameplay: both are equally important. Often, the sound designer works with the metaphor side of a game. The metaphor chosen dictates the possibilities available to the sound designer. A metaphor with a large number of natural sounds that the players are likely to be able to relate to is potentially more immersive than a metaphor with few and/or unknown sounds.
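The distinction between gameplay and metaphor can be made concrete with a small sketch in which the abstract rules know nothing about how they are presented, and two interchangeable metaphors map the same abstract events to different sounds. All class names, event names and sound file names here are hypothetical, chosen only to illustrate the separation.

```python
from dataclasses import dataclass

MOVES = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}


@dataclass
class Gameplay:
    """Abstract rules only: navigation, challenges and the win condition."""
    wins_needed: int
    position: tuple = (0, 0)
    wins: int = 0

    def move(self, direction):
        dx, dy = MOVES[direction]
        self.position = (self.position[0] + dx, self.position[1] + dy)
        return "moved"

    def resolve_challenge(self, won):
        if not won:
            return "challenge_lost"
        self.wins += 1
        return "game_won" if self.wins >= self.wins_needed else "challenge_won"


# Two metaphors dressing the same abstract events in different sounds.
# The file names are placeholders, not assets from any actual game.
CAVE_METAPHOR = {
    "moved": "footsteps.wav",
    "challenge_won": "sword_swing.wav",
    "challenge_lost": "scream_and_falling_rocks.wav",
    "game_won": "brass_fanfare.wav",
}
EGG_METAPHOR = {
    "moved": "hand_on_grip.wav",
    "challenge_won": "egg_caught.wav",
    "challenge_lost": "egg_broken.wav",
    "game_won": "applause.wav",
}


def sound_for(event, metaphor):
    """Look up how a metaphor voices an abstract gameplay event."""
    return metaphor[event]
```

Swapping CAVE_METAPHOR for EGG_METAPHOR changes what the player hears without touching the rules, which is one way to see why the choice of metaphor determines the design space open to the sound designer.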
Two Case Studies
In two projects, alternative ways to balance visual and audible stimuli in computer games have been explored by the Interactive Institute, Sonic Studio. In the first project, called Beowulf, a game for devices with limited screen size, such as cell phones, was developed (Liljedahl, Papworth, & Lindberg, 2007). In this project, the hypothesis was set up that a game with most of the graphics removed, having, instead, a rich, varied and challenging soundscape, can create a new type of immersive game experience. The hypothesis also included the idea that a game built mostly on audio stimuli will be more ambiguous and open to interpretation than a game built on visuals and that the need for the users to interpret and disambiguate the soundscape will create a rich and immersive game experience with new qualities compared to traditional computer games. The game uses both a well-known gameplay and a traditional metaphor to keep as many parameters as possible constant. Although the gameplay is very simple, the game’s sound-based metaphor makes it both a challenging and a rewarding game to play.
The Beowulf game world is graphically represented by a revealing map, a map showing only the parts of the game world you have visited so far as a red track (see Figure 1). Your position in the game world is indicated by a blue triangle pointing in your current direction. The player uses headphones to listen to the gameworld, which is described in much greater detail aurally than visually. The player navigates this gameworld by listening to sound sources positioned in a 3D space. Navigating includes localizing sound sources by turning and moving to experience changes and differences just as in real life. Feedback on player actions and progress is given by footstep sounds, breathing sounds, the sound of a swinging sword, and other sounds natural in the context of the game’s world metaphor. Immersion is created through the natural and effortless interaction with the sounding dimension of the gameworld.
Figure 1. The Beowulf game window
In the second project, called DigiWall, the computer monitor was removed totally (Liljedahl, Lindberg, & Berg, 2005). Instead, a computer game interface in the form of a climbing wall was developed (see Figure 2). The 144 climbing grips are equipped with sensors reacting to the touch of hands, feet, knees, and other body parts. The grips are also equipped with red LEDs and can be lit, turning the wall’s climbing area into an irregular and very low-resolution visual display. A number of games were then developed based on a balanced mix of sounds, physical activity, and the sparse visuals of the climbing grips. The absence of traditional computer game graphics and the shift in balance between modalities and media types give another effect: the games become open for the players to adapt to their own level of physical ability, their familiarity with the games, how they choose to team up, to create variation and so on. In this sense, the new balance between modalities and media types means new freedom for the players.
Figure 2. DigiWall climbing wall computer game interface
Both projects explored questions related to how computer game players could be offered new and unique gaming experiences in terms of freedom and fantasy. In Beowulf the hypothesis was that a shift in balance between eye and ear would invite the players to co-create the game experience and to bring their imagination into play in new ways, compared to traditional, graphics-based games. The studies performed on the game concept showed that, for a majority of players, this was indeed the case. The DigiWall concept is based on the players’ freedom to use their whole bodies and to play the games by moving over the whole 15 m² game interface. The absence of a traditional computer monitor also opens up the rules of play in such a way that the users are invited to co-create and adapt the basic gameplays offered to their own needs and desires.
In this context it is also important to mention the term “user investment”. Both projects eventually showed that the need to interpret and disambiguate the soundscape of the games was in fact an asset. Both games more or less forced the players to use their own imagination and experiences to flesh out the sounding skeletons supplied by the games’ metaphors. In the Beowulf case, the user investment was expressed as high rankings in game satisfaction as well as in vivid descriptions of the gameworld’s environments, materials, temperature, atmosphere, inhabitants and so forth, none of which had any visible cues. In the DigiWall case, positive user investment ranked highly both in player satisfaction and the subsequent publicity and commercial success of the project.
In these projects, audio is used in a number of ways to create a sense of presence and to link, as closely as possible, the virtual reality of the game to the physical reality of the player. Sound was also used to communicate instructions, cues, clues, feedback, and results from the game to the player. The aim was to create new balances between sound and graphics compared to traditional computer game applications and to explore if and how sound could be used to drive gameplay and to create fun, challenging, rewarding and immersive gaming experiences. The aim was also to use sound to blur the borders between the virtual reality of the game and the physical reality of the player. In both cases, game metaphors were chosen to match the gameplays and to present as many possibilities and as large design spaces as possible for the sound designers. Here follows a brief description of how sound was implemented in the two projects.
Ambience and Background to Bridge the Reality Gap
Physical environments are (almost) never silent. Air, water, objects, creatures and machines around us all more or less make sounds. The absence of sound is unnatural and scary; it is an auditory counterpart to pitch black. Sounds are the signs of presence, life and function. By adding just a very soft sound of moving air, an otherwise dead and detached game environment can come alive. If the sound is well designed, it is possible to create an experience where the game-generated sounds blend with the sounds from the gamer’s physical environment, creating an inseparable whole. The gap between the realities closes.
Ambient sounds can be strong carriers of emotion and mood. They share this ability with music, and the border between the two is increasingly blurred by film and game sound designers (Dane Davis, cited in Sonnenschein, 2001, p. 44). Carefully “composing” an ambient or background sound can serve several purposes at the same time. It can create a sense of physical presence, it can set the basic mood and it can communicate emotion and arousal.
In the Beowulf game, the ambient sounds were the sound of air softly flowing through the gameworld’s system of caves and tunnels. The sounds had a slight amount of reverb added to create a sense of volume in the caves and the reverb was removed for tunnels. The ambient sounds were also deliberately freed as much as possible from musical components such as pitch and rhythm: we wanted to give the players as much freedom as possible to use their own imagination, not influencing them in any direction defined by us more than necessary.
Most of the DigiWall games use music tracks as ambient and background sounds. In this case the purpose is the opposite. Music is used to set the basic mood of the games and to encourage physical activity in the players. The music is designed to communicate subconsciously with the players and, for example, “whisper” that speed is increasing or that time is running out and you must hurry.
Sound Effects and Music for Cues and Clues
Often game designers want to encourage the players to go in certain directions or to take certain actions. By carefully planting sound effects and/or music, the player can be guided, inspired or even intentionally misled. Beowulf uses a large number of natural sounds to warn the player of potential dangers such as predators, bottomless holes or boiling lava. The DigiWall games use music and sound effects with musical properties to guide attention in certain directions on the wall. One example is the game Catch The Grip, in which the direction from the last grip caught to the next to catch is represented by a series of notes. The length of the series tells the physical distance on the wall. The panning of the notes in the loudspeaker system signals the direction left/right. In the game Scrambled Eggs, sound effects with a falling pitch denote the movement of “eggs” falling from the top of the wall towards the floor.
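The Catch The Grip cue described above can be sketched as a mapping from two grip positions to a series of panned notes. This is an illustration only, not the DigiWall code: the function name, the wall width of 5 m, the spacing of one note per half metre and the choice to sweep the pan from the last grip towards the next are all assumptions.

```python
import math


def grip_cue(last_grip, next_grip, wall_width=5.0, metres_per_note=0.5):
    """Return one pan value per note, leading from last_grip to next_grip.

    Grips are (x, y) positions in metres with x = 0 at the left edge of
    the wall. The number of notes grows with the distance between the
    grips; each pan value lies between -1 (left) and +1 (right).
    """
    (x0, y0), (x1, y1) = last_grip, next_grip
    distance = math.hypot(x1 - x0, y1 - y0)
    note_count = max(1, math.ceil(distance / metres_per_note))
    pans = []
    for i in range(1, note_count + 1):
        x = x0 + (x1 - x0) * i / note_count    # step from the last grip towards the next
        pans.append(2 * x / wall_width - 1)    # 0 m -> -1, wall_width -> +1
    return pans
```

A longer series therefore signals a grip that is further away, and a series that drifts to one side of the loudspeaker system signals the horizontal direction in which to reach.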
Speech, Music and Sound Effects for Information and Feedback
Many sounds are emotional and meant to create and communicate mood and presence. Other sounds are meant to convey cognitive information about rules, scores, results and so forth. Speech is, of course, very versatile and useful in this case. It is very effective to have a voice read the initial instructions for a game, especially if it is a game with relatively simple gameplay and few rules. The same is true for scores and results. Who won, the left or right team? How many points did you score? To have a voice read these results creates a strong feeling of presence and makes the game come alive. One drawback with speech is, of course, language. For example, Swedish voice-overs in a game do not make very much sense in the UK. As with text, it is necessary to have localized versions and this quickly starts adding cost in terms of computer memory, coding, development time, and other resources. But then again, sometimes it is worth it.
In the DigiWall games serving as an example here, speech is used as an introduction to all games. A majority of the games also present scores and results using speech. The DigiWall game interface is equipped with two buttons, so the players can select one of two available languages. A danger with speech is the risk of wearing out often-repeated phrases. It is therefore useful to give the players the option to skip, for example, instructions when they are no longer needed.
Music and sound effects can also serve as carriers of information, albeit not as clear and unambiguous as speech. This is not an innate disability though, but rather an effect of the way we use music and sound effects. Rhythm, for example, can be used to convey semantics just as well as any speech: what is required is simply to learn the system (Morse code, for example). One of the advantages of sound effects and music is that they are not limited by language, but are more universal. This can of course be used in many ways. In the Beowulf game, each new round starts with a short horn melody, as if it were announcing the approach of the king’s ambassador. The players learn very quickly what this signal means and, since it is very short, the risk of becoming bored with it is minimal. Beowulf also uses pure music to signal success and failure. Success is signaled by a short triumphant brass fanfare and failure is signaled by a short funeral march. By carefully selecting the metaphor aspect of a game’s design, tremendous opportunities to create sound effects for feedback and information can be opened up.
By placing the game in an environment (metaphor) that the players are likely to have some kind of relation to, the designer can choose sounds for feedback and information that are natural in that environment. Using natural sounds that the players can immediately relate to can greatly enhance the gameplay aspect of the same game as well as create the sought-after sense of presence and immersion. The DigiWall game Scrambled Eggs uses the sound of broken eggs to signal points lost and the sound of an egg rescued in the palm of your hand to signal points gained. In Beowulf, if the player enters a forbidden game tile, the sound of a scream receding down a hole together with the sound of falling rocks signals life lost. When this is followed by the funeral march, failure and the end of the game are obvious to anyone, without the need for speech or text.
FUTURE RESEARCH DIRECTIONS
It is often said that sound is still underused and that audio is a media type with potential yet to be unleashed. In order to free this unused potential, research and development efforts must be carried out on several parallel fronts. We need to develop more in-depth knowledge about auditory perception and how heard experiences affect users of computer games and other interactive systems. This also implies richer taxonomies and more developed languages for writing about, talking about and reflecting on this new knowledge and making it useful in wider contexts. Furthermore, a number of current ideas and traditions in the field must be challenged and a set of updated ideas must be developed. Treating ambiguity and wider interpretation spaces as design assets rather than problems in the design of interactive systems is one example. Another example is replacing simple efficiency metrics for player enjoyment with more complex systems for the design and evaluation of computer game experiences, such as the GameFlow concept. Finally, new technology that can carry and realize the new knowledge and ideas must be developed. This includes technologies for procedural audio (see Farnell, 2011; Mullan, 2011 for further descriptions of this technology) and systems for dynamic simulation of room acoustics and acoustic occlusion and obstruction, just to name a few.
CONCLUSION
Sound is a complex stimulus and it is only in recent years that science has started to understand auditory perception in any depth. Much of the knowledge and practice in sound design for computer games and other interactive applications is based on experience and anecdotal evidence. But the awareness of sound’s potential and scientifically-based knowledge in sound design is slowly increasing. This is not only true in the computer game industry, but in industry and society in general. The implications of the fact that our ears and our eyes complement each other are slowly beginning to have an effect. Graphics alone gives one type of experience, sound alone gives another, and graphics plus sound gives new and unique experiences. By working with the balance of ears, eyes and other senses and human abilities, new opportunities emerge for the computer game designer. The Wii, Dance Dance Revolution and DigiWall are just a few examples of this.
Sounds in the physical reality of our bodies are the results of physical events in that same reality. Our hearing is designed and “hardwired” to constantly scan and analyze the soundscape surrounding us and react rationally to the sounds heard. Most of the time this is done subconsciously and our hearing can therefore be described as, to a large degree, intuitive, emotional, or pre-cognitive. The soundscape reaching our ears demands interpretation and disambiguation in other ways than the visual stimuli reaching our eyes. This need to interpret and disambiguate can be turned into a great asset in computer game design. A game with a well-designed, rich, and varied soundscape will play on the user’s intuition and emotions: the game will be immersive and give fun and rewarding gaming experiences.
How we interpret a sound depends on, and draws from, our previous personal experiences. Well-known sounds will spawn a myriad of pictures in our inner, mental movie theaters. Unknown sounds can create both confusion and excitement. Working in parallel with the gameplay and the metaphor aspects of computer game design, and making sure that the two match and support each other, is a powerful way to find and design the sounds that build the total soundscape of the game. By working in parallel with and carefully balancing the graphics and the sounds of a computer game, the users’ bodies and fantasies can be set free, creating unique, immersive, and rewarding gaming experiences.
REFERENCES
Association for Computing Machinery. (2010). ACM computing classification system. New York: ACM. Retrieved February 4, 2010, from http://www.acm.org/about/class/.
Avanzini, F. (2008). Interactive sound. In Polotti, P., & Rocchesso, D. (Eds.), Sound to sense, sense to sound – A state of the art in sound and music computing (pp. 345–396). Berlin: Logos Verlag.
Bregman, A. S. (1990). Auditory scene analysis: The perceptual organization of sound. London: MIT Press.
Cowley, B., Charles, D., Black, M., & Hickey, R. (2008). Toward an understanding of flow in video games. ACM Computers in Entertainment, 6(2).
Csíkszentmihályi, M. (1990). Flow: The psychology of optimal experience. New York: Harper Collins.
Dance dance revolution [Computer game]. (2010). Tokyo: Konami.
DigiWall [Computer game]. (2010). Piteå, Sweden: Digiwall Technology. Retrieved February 10, 2010, from http://www.digiwall.se/.
Ekman, I., Ermi, L., Lahti, J., Nummela, J., Lankoski, P., & Mäyrä, F. (2005). Designing sound for a pervasive mobile game. In Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, 2005.
Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Friberg, J., & Gärdenfors, D. (2004). Audio games: New perspectives on game audio. In Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, 2004, 148-154.
Gabrielsson, A., & Lindström, E. (2001). The influence of musical structure on emotional expression. In Juslin, P., & Sloboda, J. A. (Eds.), Music and emotion: Theory and research. Oxford, UK: Oxford University Press.
Gaver, W. (1993). What in the world do we hear? An ecological approach to auditory event perception. Ecological Psychology, 5(1), 1–29. doi:10.1207/s15326969eco0501_1
Gaver, W. (1997). Auditory interfaces. In Helander, M. G., Landauer, T. K., & Prabhu, P. (Eds.), Handbook of human-computer interaction (2nd ed.). Amsterdam: Elsevier Science. doi:10.1016/B978-044481862-1/50108-4
Gaver, W. W., Beaver, J., & Benford, S. (2003). Ambiguity as a resource for design. Proceedings of the ACM CHI Conference on Human Factors in Computing Systems, 2003, 233-240.
Hug, D. (2011). New wine in new skins: Sketching the future of game sound design. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Jegers, K. (2009). Elaborating eight elements of fun: Supporting design of pervasive player enjoyment. ACM Computers in Entertainment, 7(2).
Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. The Behavioral and Brain Sciences, 31, 559–621.
Larsson, P., Västfjäll, D., & Kleiner, M. (2002). Better presence and performance in virtual environments by improved binaural sound rendering. In AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio.
Lecanuet, J. P. (1996). Prenatal auditory experience. In Deliège, I., & Sloboda, J. (Eds.), Musical beginnings: Origins and development of musical competence (pp. 3–36). Oxford, UK: Oxford University Press.
Liljedahl, M., Lindberg, S., & Berg, J. (2005). Digiwall: An interactive climbing wall. Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, 2005, 225-228.
Liljedahl, M., Papworth, N., & Lindberg, S. (2007). Beowulf: An audio mostly game. Proceedings of the International Conference on Advances in Computer Entertainment Technology, 2007, 200–203.
Mullan, E. (2011). Physical modelling for sound synthesis. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Murphy, D., & Neff, F. (2011). Spatial sound for computer games and virtual reality. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
O’Callaghan, C. (2009, Summer). Auditory perception. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Retrieved January 24, 2010, from http://plato.stanford.edu/archives/sum2009/entries/perception-auditory/.
Parker, J. R., & Heerema, J. (2008). Audio interaction in computer mediated games. International Journal of Computer Games Technology, 2008, 1–8. doi:10.1155/2008/178923
Polaine, A. (2005). The flow principle in interactivity. In Proceedings of the Second Australasian Conference on Interactive Entertainment.
Reid, J., Geelhoed, E., Hull, R., Cater, K., & Clayton, B. (2005). Parallel worlds: Immersion in location-based experiences. In CHI ’05 Extended Abstracts on Human Factors in Computing Systems.
Sengers, P., Boehner, K., Mateas, M., & Gay, G. (2008). The disenchantment of affect. Personal and Ubiquitous Computing, 12(5), 347–358. doi:10.1007/s00779-007-0161-4
Sengers, P., & Gaver, B. (2006). Staying open to interpretation: Engaging multiple meanings in design and evaluation. Proceedings of the 6th Conference on Designing Interactive Systems, 2006, 99-108.
Sonnenschein, D. (2001). Sound design: The expressive power of music, voice and sound effects in cinema. Studio City, CA: Michael Wiese Productions.
Sweetser, P., & Wyeth, P. (2005). GameFlow: A model for evaluating player enjoyment in games. ACM Computers in Entertainment, 3(3).
Tuuri, K., Mustonen, M. S., & Pirhonen, A. (2007). Same sound – different meanings: A novel scheme for modes of listening. In Proceedings of Audio Mostly 2007 – 2nd Conference on Interaction with Sound, 13-18.
World of warcraft [Computer game]. (2010). Reno, NV: Blizzard Entertainment.
ADDITIONAL READING
Altman, R. (Ed.). (1992). Sound theory sound practice. New York: Routledge.
Boehner, K., DePaula, R., Dourish, P., & Sengers, P. (2005). Affect: From information to interaction. In Proceedings of the 4th Decennial Conference on Critical Computing: Between Sense and Sensibility.
Brown, E., & Cairns, P. (2004). A grounded investigation of game immersion. In CHI ’04 Extended Abstracts on Human Factors in Computing Systems.
Juslin, P., & Sloboda, J. A. (Eds.). (2001). Music and emotion: Theory and research. Oxford, UK: Oxford University Press.
Kaptelinin, V., & Nardi, B. A. (2009). Acting with technology: Activity theory and interaction design. Cambridge, MA: MIT Press.
Norman, D. A. (1988). The design of everyday things. New York: Basic Books.
Polotti, P., & Rocchesso, D. (Eds.). (2008). Sound to sense, sense to sound: A state of the art in sound and music computing. Berlin: Logos Verlag.
Schafer, R. M. (1977). The soundscape: Our sonic environment and the tuning of the world. Rochester, VT: Destiny Books.
Sider, L., Freeman, D., & Sider, J. (Eds.). (2003). Soundscape: The School of Sound lectures 1998 – 2001. London: Wallflower Press.
KEY TERMS AND DEFINITIONS
Auditory Perception: The process of attaining awareness or understanding of auditory information or stimulus.
Avatar: A controllable representation of a person or creature in a virtual reality environment.
Feedback: Output from a computer game to inform the user of various changes in game state.
Flow: The mental state of operation in which a person is fully immersed in what he or she is doing by a feeling of energized focus, full involvement, and success in the process of the activity.
Gameplay: The rules and mechanics defining the functionality of a computer game.
Game Metaphor: The embodiment of the virtual environment comprising the game world.
Immersion: Deep mental involvement.
Pervasive Game: A computer game tightly interwoven with our everyday lives through the objects, devices and people that surround us and the places we inhabit.
Suspension of Disbelief: A silent agreement between an audience and an entertainment producer in which the audience agrees to provisionally suspend their judgment in exchange for the promise of entertainment.
Chapter 3
Sound is Not a Simulation: Methodologies for Examining the Experience of Soundscapes
Linda O’Keeffe
National University of Ireland, Maynooth, Ireland
ABSTRACT
In order to design a computer game soundscape that allows a game player to feel immersed in their virtual world, we must understand how we navigate and understand the real world soundscape. In this chapter I will explore how sound, particularly in urban spaces, is increasingly categorised as noise, ignoring both the social significance of any soundscape and how we use sound to interpret and negotiate space. I will explore innovative methodologies for identifying an individual’s perception of soundscapes. Designing virtual soundscapes without prior investigation into their cultural and social meaning could prove problematic.
INTRODUCTION
Simmel (as cited in Frisby, 2002) argues that the exploration and navigation of a space, particularly an urban space, impacts all of the human senses. Equally, he suggests that when exposed to multiple inputs of both internal and external stimuli, we make choices, such as movement and interaction, based on the sensory information of a given space (Simmel, 1979). In the design of gameworlds, we must examine this concept of sensory input as both a method of navigation and socialisation.
In the real world, all the senses are exposed to information: sight, sound, smell, and touch. Within a gameworld, we are currently exposed to an overriding visual experience and minimal sound information. There is a deficit of sensory information within this digital world and, as more people move towards gaming and virtual communities, this deficit must be examined. For digital virtual worlds to create a convincing immersive experience with the technology that is available, we must explore sound as well as sight in the construction of gameworlds from a sociological perspective.
DOI: 10.4018/978-1-61692-828-5.ch003
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Thompson (1995) argues that when we enter virtual spaces or communities we leave orality behind: he sees no space for sound within virtual worlds or online communities. It has been prevalent in social and media theory to ignore the experience of sound in a space, whether that sound is produced by human activity or by other natural sources. It is my argument that sound plays a part in the social construction of space, whether real or virtual, either by its presence or absence. Equally, I will argue that sound which is produced by objects through reverberation and other acoustic qualities can affect how we navigate or place meaning in a space. I will also explore the process of control which is dominating research into the soundscape; this is primarily due to an increasing awareness of the side effects or apparent dangers of loud sounds on people. The need to monitor and control sound in the environment has become a predominant research focus within soundscape studies. Sounds within urban centers are increasingly seen as a by-product of industry and technology: this has led to the creation of noise policies within a number of countries. Sound is increasingly seen as a measure of sound pressure levels rather than being seen as a social structure (Blesser & Salter, 2009, pp. 1, 2). This is significant for sound designers who wish to gather data on the meaning of sound within society. If a sound designer considers sound only in relation to volume, noise, or other objective criteria, they might ignore the meaning of sound beyond its output level. In looking at the social and perceptual aspects of sound we are constructing what Feld (2004) would call an acoustemology of the sound world. He acknowledges that soundscape studies, which react to human interventions in the natural soundscape, ignore the cultural systems that develop as a result of being immersed in and surrounded by sound.
The game space, or any virtual space which asks a person to become immersed in it, needs to be founded upon an understanding of the sociological impact of sound on the individual and society. A game designer must also take into account the more abstract representation of sound that is experienced in art, cinema, and other mediated spaces. There is already a history of the experience of sound through mediatisation (Bull, 2000; Cabrera Paz & Schwartz, 2009; Cohen, 2005; Drobnick, 2004): the difference between these theories and the theory of game sound design is the concept of immersion, interactivity, and simulated reality. What describes a soundscape, who defines the description, and what models are used to categorise levels of sound and their meaning? There are no set methods for the study of acoustic ecology or the soundscape from a sociological perspective. I propose an interdisciplinary method which will draw on social theory, media theory, and sound design. In order to explore the soundscape we must incorporate different methods and theories to analyze the social impact of the soundscape, real or virtual, on the individual and the group.
THE EXPLORATION OF THE SOUNDSCAPE
Some of the earliest documented exploration of the modern soundscape arose from within the arts and modern music composition. Those who practised the art of listening explored the changes in our early soundscape; technology was seen to change our soundscape, but this was not seen as a negative event (Russolo, 1913). Luigi Russolo’s 1913 manifesto, The Art of Noises, posited that sound had reached a limit of invention and that technological sounds allowed for an “enjoyment in the combination of the noises of trams, backfiring motors, carriages, and bawling crowds”. He argued that in listening to and using these sounds as types of music we would create an awareness of the rapidly changing soundscape. In an ever changing technological climate, we would increasingly be exposed to new types of sounds at a faster rate than at any time preceding mechanisation. The
soundscape would also play a much stronger part in the construction of music and sound art with the introduction of audio recording devices. However, over time this modern soundscape became less a usable musical landscape or instrument and more like an environmental pollutant (Bijsterveld, 2008). Bijsterveld (2008) argued that technology became a symbol of the loudness and unhealthy character of the urban soundscape. Schafer’s examination of the soundscape in the 1960s was guided by an awareness of the increased levels of sound within urban centers (Cohen, 2005; Schafer, 1977). He argued that the spread of industrialisation polluted not only such physical spaces as water and land but the hearing space, leading to an alteration of the perceived space for animals and humans. Sound, or what was now being called noise, was increasingly seen as a negative side effect of industry. Schafer’s research focused on a reification of past soundscapes and the preservation of soundmarks (similar to historic landmarks). The World Soundscape Project, established by several people including Murray Schafer in the late 1960s, proposed a practice of recording the landscapes of different spaces around the world. They wanted to record and archive certain landscapes they felt were being transformed as a result of a noisier soundscape. These recordings would then highlight the effect that increased sound levels were having on certain spaces. Although Schafer brought the sound world into the equation as a factor within industrial change, very little focus has been on the positive aspects of contemporary soundscapes or their social meaning. Human activities produce sound and we are also embedded in sound. Space becomes revealed to us through sound and, as spaces become more built up or newly transformed, our ability to see beyond our immediate space becomes limited.
Blesser and Salter (2009) argue that sound allows us to envision our space; a space becomes revealed to us through its “aural architecture”. They examine the ability of humans to restructure their sound
environment, to act back on loud sound spaces and argue that because constructed spaces remain static it is through social behaviour that we have the ability to modify our sound arena.
The Designed Space
De Certeau (1988) argues that the city is a representation of political economy, historical narrative and social forces of capitalism and, while architects and planners see the whole, the vista, the individual who lives and works in the city will never see it in totality. He suggests that we walk the city blindly, reconstructing our own narratives of space. De Certeau implicates sound, without referencing it, as a way to see an invisible whole. He argues against the rationalising of the city or functionalist utopianisms, allowing for the transformation of space by those that live within it. Adams et al. (2006) suggest that “a soundscape is simultaneously a physical environment and a way of perceiving that environment” (p. 2). They see the soundscape as a construct through which we will navigate. Adams et al. and de Certeau understand that the construction of space and our ability to navigate through it are dependent on more information than the visible. In recreating the soundscape in digital landscapes, the designer pays homage to the real world she tries to replicate: she codes, intentionally or not, the universalisms of design into the construction of her virtual space. The space is built to replicate the reflection of sound against object, as if this is the only way sound moves through space or, equally, the only way we perceive it. She is equally guided by the epistemology of sight, as “the epistemological status of hearing has come a poor second to that of vision” (Bull & Back, 2004, p. 1). Like any other visual medium, the design makes assumptions about how sound should be perceived in any constructed space. This functional approach only measures our potential physiological responses to sound. It does not explore the individual or community experience of sound or
the subjective and immersive experience of time and space through either real world listening or mediated listening. Augoyard and Torgue (2006) theorize that sound may guide social behaviours: they argue that no sound event can be removed from “spatial and temporal conditions” and that sound is never experienced in isolation. They have adopted qualitative approaches to the exploration and analysis of sound in urban spaces. Augoyard and Torgue argue that the term “soundscape” is tied to a certain empirical model of measurement which may be too narrow in its meaning, belonging more to a textual than an observational critique of “acoustical sources” and “inhabited spaces” (2006, p. 4). They suggest that the term sonic effect better describes the experience of sound within space. It breaks the analysis of sound into three distinct fields: “acoustical sources, inhabited space, and the linked pair of sound perception and sound action” (Augoyard & Torgue, 2006, p. 6). Each of these fields is required in order to examine the ubiquitous nature of the soundscape as a process which impacts on social, physiological, and psychological behaviour. What is most difficult to analyse, but fundamental to soundscape design, is the subjective experience of sound. When constructing a virtual landscape, the primary consideration (and for a number of games it is the only goal) is the reaction time of game player interaction: if I shoot, will I hear the sound of gunfire instantaneously?
MEDIATED LISTENING
The number of people turning to electronic devices (MP3 players, the Walkman, the iPod, mobile games, and laptops) as a means of shutting out real world sounds has increased exponentially in the last decade (Bull, 2000). The personal headphone has played a part in reconfiguring the landscape, allowing us a choice in how we perceive our world and how we are perceived as taking part in or stepping out
of real time and space. Thompson (1995) explores the change in perception of “spatial and temporal characteristics of social life” (Thompson, 1995, p. 12) due to the development of communications technology. He recognises that the role of oral traditions has changed: face to face contact is eliminated in favour of virtual communications. Bull (2000) argues that mediated listening is now used as a means to escape the “urban overload” of our cities and suggests that the use of mobile technology for listening to the radio or to music collections affords a breather or a meta-physical removal from the real world. How we shift between these acoustic environments, and how our personality and behaviour may be manipulated, both by our apparent control of one type of space and our lack of control over another, may affect social patterns of relating to each other and the world we inhabit.
Sound Control
Research has shown that people put on headphones for numerous reasons (Bull, 2000). Erving Goffman’s (1959) theory of civil inattention addresses this concept. He examines the unwillingness of the individual to be seen in public spaces and explores the notion of contexts structuring “our perception of the social world” (as cited in Manning, 1992, p. 12). Goffman suggests that social spaces are framed and, within these frames, we act a certain way. How we act is perceived as being the acceptable or normal behaviour for those spaces, and he uses the example of the elevator space: when travelling in such a confined space, the “normal” behaviour is to look anywhere but at another person’s face. Mediated spaces contain their own framed context. When we engage in a fully immersive experience, such as gaming or mediated listening, even if this happens in a public space, we are not seen to be ignoring the real world. We are seen to be engaged within another space, one which requires our full attention.
Bull’s (2000) research also highlights how the perception of time becomes distorted when listening to personal headphones. For some, listening is required to manage the boredom of “slow time”. It is also used to negotiate a path through space, a path which is experienced through a virtual soundscape or soundtrack, and this alters the listener’s perception of time. Bull’s studies have revealed that time is almost always a reason for engaging in mediated listening. This concept of controlling space and time through mediated listening suggests that the senses required for listening extend beyond simply hearing. If the experience of listening alters the perception of time and space, then reality also becomes less fixed and more flexible. Lefebvre (2004) argues that time and everyday life exist on multiple levels and that the experience of time contains a value coding, depending on the task being done. He suggests that time is both fundamental and quantifiable and that quantifiable time is an imposed measure which is based on the invention of clocks and watches. When engaged in mediated listening (radio, sound art, audio books, and games, for example), time may be re-appropriated. We are experiencing what Schafer called a schizophonic shift in perception, where, by means of mediated listening, we exist between two time zones, one created by our imagination and the other by the world around us. Devices such as stereo headphones, mobile phones, and portable games, which we use to pull us out of time, also act as filters: they give us the choice to decide what it is we hear and do not hear. Equally, we can choose to hear both spaces, real and mediated, so that we do not become so distracted in our mediated listening that we walk under a car. The increased use of mediated listening devices, particularly in public spaces, might be seen as an adaptation to the increase in sound levels within urban spaces.
It could also be as a result of the sheer diversity of sounds that exist within our world, most of which have no meaning or relevance in our day to day lives.
There are massive assumptions being posited by researchers into the field of noise or increased sound levels. Schafer and the World Forum for Acoustic Ecology argue that increased sound levels are creating a rift between the natural world and humanity’s relationship to it. They support research which is concerned with the “preservation of natural and traditional soundscapes” (Epstein, 2009). This focus on the conservation of older or traditional soundscapes ignores the “everyday urban situations impregnated with blurred and hazy...sound environments” (Augoyard & Torgue, 2006, p. 6).
NOISE: THE SIDE EFFECT OF INDUSTRY
The term noise is often used to describe unwanted sound or sound that, in its make-up, carries certain characteristics that define it as negative. Schafer’s early work on the soundscape explored ways of quantifying noise levels. One of his early explorations into the soundscape used a system of tables which measured the number of complaints made against certain noise sources, and this project was carried out in several countries. Schafer’s research concurred with what most people would suspect: in most modern cities, traffic is seen as a pollutant in terms of both carbon emissions and sound levels. Yet in Johannesburg, South Africa, we see a very different picture in relation to what is seen as noise and what is accepted as city sounds (Schafer, 1977, p. 187). The vast majority of complaints about sounds considered intrusive or annoying were made against the increased sounds of animals and birds within the city: unusually, the smallest number of complaints was directed towards traffic. It could be argued that one type of sound is seen as normal and part of the everyday urban while the more natural sounds no longer fit with the concept of an urban landscape.
Sound as Side Effect
One of the areas on which noise pollution research has focused within the urban soundscape is that of the motor vehicle, which is seen as a major contributor to increased sound levels within cities and towns. Bijsterveld’s (2004) historical analysis of noise laws highlights the increasingly negative public opinion directed towards the motor vehicle since the turn of the century. The city was increasingly seen as a space which had once held silence, and it was felt that this silence needed to be regained, either through the removal of motor vehicles or through severe noise laws. Yet, over the decades, a relationship has developed between motorists and the sounds of their vehicles, an idea which is being explored by Paul Jennings. Jennings’ (2009) research focuses on the positive aspects of sounds produced by cars, from the sound of the door shutting to the sounds of a petrol engine. He explores the various ways of simulating the sounds emitted by cars; studies have revealed that drivers have developed a relationship to the sounds produced by cars, associating them with qualities such as power, control, and drivability. Further research has shown that car sounds exterior to the vehicle are an important factor in orientation, particularly for the blind, the partially sighted, and cyclists (“Fake Engine Noises”, 2008). The sound of a vehicle has become an inherent part of the urban soundscape and it is used to measure distance, speed, and time. In virtual terms, this association with a vehicle’s individual soundscape has new meaning. If, for example, the hybrid car (electric and petrol and very quiet) becomes more prevalent in society, will we change the perceived soundscape of the urban space? For decades, we have associated the sounds of cities with vehicles and they have become a significant part of the urban soundscape, an ambience that defines the metropolis. If this sound disappears, what effect might this have on our relationship to both the city and its transport?
Our Relationship to the Modern Soundscape
Industrialisation has had a major impact on civilisation, and the association of sound with production is seen as implicit. If we introduce noise abatement laws to tackle sound levels, we ignore the relationship that has evolved between humans and the sounds of mechanisation and industrialisation. In our concern for the soundscape and its possible effects on humans we may change our soundscape to create a perceived better sound level or quality, but ultimately we might also change the relationship people now have to cities or industrial centres. It is necessary to fully understand the relationship that groups and individuals have to the urban soundscape, specifically the sounds that are reminders of its urbanity, economy, and population as well as its activities. MacLaran (2003) argues that the urban space is increasingly becoming partitioned and that the individual increasingly tries to locate a private space in which to claim ownership. With geographic boundaries becoming increasingly part of the urban space, defined by economics, politics and as a reaction to overpopulation, the urban space is increasingly seen as a “mirror of the societies that engender them” (MacLaran, 2003, p. 67). Yet Thompson (1995) suggests that a changing landscape is part and parcel of the urban metropolis: people have adapted and will adapt to further architectural or cultural shifts within urban areas, creating new cultures and social movements that stand alongside these changes to the landscape. What is not considered by these researchers is that a city is more than its visual or geographical cues. Thompson argues that within the media, particularly the internet, new social structures will form within virtual spaces, and these will, to a certain extent, replace the physical world, which is increasingly seen as crowded, in developing community and place.
Yet within mediated environments and the real world there is no real consideration of the soundscape and its importance as a social
construct in the formation of identity and society. There is a substratum of symbolic content associated with the visual space; Schafer’s research has created a set of hermeneutics from which soundscape studies may draw. It is necessary to create a dialectic on the soundscape, one which poses questions of meaning, noise, control, structure and interpretation. This becomes more significant as urban and governmental policy moves towards controlling sound. If we operate on the basis that sound is a set of objects which can be assessed by their levels rather than their meaning, we will construct passive digital soundscapes. While the study of sound through the social and physical sciences has advanced towards exploring sound as a subject, we are gradually moving towards an acoustic epistemology which embraces the ephemerality of sound. It is both sensorial and primary, a subject which needs fundamental and theoretical frameworks which can be realised through methodological research. Unfortunately, in rushing towards categorising sound and its effects, certain policies have been created to simply categorize sound as noise, not understanding the many social contexts which may explain why, “despite successful implementation of noise maps and action plans…there is little evidence of preventing and reducing environmental noise” (“Working Group Noise Eurocities”, n.d.). These policies fail to understand that sound has many social contexts and that it is not simply a signifier of some otherness, an association with a producer: a product or side effect of technology, such as car sounds or factory sounds. What this underlines is that there is a need to explore the control issue which has arisen within soundscape research. If sound is being seen as a negative effect of industry and modernism, one which seems beyond the individual’s control, then we have a concept to explore in virtual soundscapes.
The positive act of listening in a virtual soundscape is that the sound can be controlled, be it
through volume or interactive means of changing the sound environment. In the visual world of games certain elements are static and the controller cannot change or affect the environment. This is based on the conceptual approximation of reality (a tree is a tree and must remain so in order to simulate reality). If we introduce ambient sound it too must approximate this idea: a gamer can close their eyes to shut out the world, but no one can close their ears. But, just as in the real world we can create or find spaces of acoustic interest to us, in a virtual environment we can turn off an engine; perhaps a gamer should be able to turn off all engines and close down (or destroy) factories and other sounds they perceive as unwanted in their soundscape. Equally, the soundscape should simulate reality: the ambient soundscape, whatever that is, must be all-surrounding and there must be limits to the control of this sound, that is, if the intention is to approximate the physicality of space. I do not propose that we draw attention to the soundscape within games; the more real a soundscape seems, the less a gamer would notice it. Instead we must consider that, to increase the perception of immersion, the soundscape must reflect or approximate a real world soundscape, rather than being a “bit part player to the visual star” (Grimshaw & Schott, 2007, p. 2). Ambient sound denotes a sound that surrounds all physical space; it has been defined by some as foreground, middle ground and background sound (Adams, 2009; Schafer, 1977). This three part description of a soundscape lays out sound, within both the virtual and the real world, as an assemblage, one which is created as a result of reverberation, dynamics, levels and acoustics. These three characteristics imply that sound can be split apart to understand its workings, and then reconstructed as a virtual soundscape, that is, if we ignore how sound is socially and psychologically perceived.
While technology can break sound apart so that we can hear minute elements of the whole, we physically hear sound in its entirety because we cannot shut out sounds; we do not have what
Schafer called “earlids”. We comprehend that sound may be reaching us from particular distances or places, and we make choices in regard to what we consider important sounds to listen to, but we cannot choose to not hear sounds within our hearing range. Equally, we inhabit and work in spaces that produce sounds that we have to make meaning from and that we contribute to; our entire lives are spent surrounded by sounds. So how do we make meaning from these sounds and how do we measure that meaning? If we wish to simulate the experience of being within a space, whether this space is a war zone, a different planet, or the North Pole, we must understand that sound is socially and culturally constructed (Drobnick, 2004). For sound design this is paramount: if we wish to create a simulacrum of the real, we must understand the extent to which sound plays a part in our navigation, both physical and social, of spaces.
IMMERSION AND SIMULATED REALITY
It is the concept of immersion which guides design within the gaming industry, being seen as the “holy grail of digital game design” (Grimshaw, Lindley, & Nacke, 2008). Graphic design in gaming has evolved through several stages of realism, towards the appearance or “illusion of life” (Hodgkinson, 2009, p. 1). One outcome of this simulation of the real world within digital games can be seen in the film industry. Films are produced which have been based on games: Tomb Raider (West, 2001) and Resident Evil (Anderson, 2002). Equally, we have movies which resemble gameworlds and the gameworld concept: Final Fantasy (Sakaguchi, 2001), Aeon Flux (Kusama, 2005) and most recently Avatar (Cameron, 2009). The focus of digital visual game design seems aimed towards an essential realism, but why this search for the most realistic? Early games were less concerned with the realism of the space or the characters and more with the idea of game and
competition, for example Space Invaders (Taito, 1978), Pac-Man (Namco, 1980) and Donkey Kong (Nintendo, 1981). Has the goal shifted towards the user having a more connected experience or relationship with the virtual or gameworld? If the space is a simulation of the real world, do we engage less with the concept of a game and more with the concept of being able to relate to the space? Bull and Back (2004) would argue that of the human senses “vision is the most ‘distancing’ one” (Bull & Back, 2004, p. 4), revealing only what is real and what is. The goal has evolved to create a sense of co-presence within film and potentially games; 3D cinema examines the possibility of the image creating a sense of surround and presence (again, see Avatar and the new 3D TV from Panasonic). The overall assumption seems to be that the only way to create a sense of reality within a digitally created world is through the imagery, a kind of simulated panoptic vision. What seems to be forgotten within this quest for immersion is that sound is actually three dimensional and listening is not a simulated experience.
Sonic Immersion
Sound is inherently physical and we are always immersed in it: even if we focus our listening towards one sonic experience, we are still hearing the entire sonic effect of any space. This is then the challenge and the goal for digital game sound designers: to create spaces that accept the whole universality of the ambient space, and to be aware of the outside world that will invariably intrude on this design. Therefore sound design must create a sense of displacement or removal from the real, while accepting that the real will equally intrude on the virtual experience. Similarly, digital game designers must address the issue of the senses being, in their entirety, necessary to comprehend a world. Surround sound must then play a part within the design of certain game spaces, for example, first-person shooter (FPS) games. FPS games generally involve a single
player navigating through a space; if they are to feel physically immersed, the sound must seem all-surrounding. The need for surround sound or immersive experiences must also take into account the physics of sound. Connor (2004) argues that sound is both intensely corporeal (it physically moves us) and paradoxically immaterial (it cannot be grasped). He argues that sound does not simply surround us, it enters us; if loud enough or high enough it can cause pain and damage. It is seen as tied to emotion more so than sight, which is seen as neutral. Within social theory sight has overwhelmed the senses; the epistemological status of sight over sound has crossed over to many disciplines including digital game design. In Simmel’s 1896 work Sociological Aesthetics (as cited in Frisby, 2002), he argues that vision gave a fuller expression to the fragmented city: the eye, if “adequately trained”, perceives all of a space. This merging of all visual signals suggests that we do not see in parts but in total. Simmel saw sound as intrusive to the perfection of the visible world; it was the profusion of sounds that distracted one from the beauty of the modern urban space. Tonkiss (2004) argues that within modern sociology the goal was to flatten the city, to will sound to silence, to order it. Tonkiss suggests that vision is spectacle, whereas sound is atmosphere, and she argues that sound offers us a sense of depth and perspective.
SOUND METHODOLOGY AND ANALYSIS
In order to identify what is significant about a soundscape one must adopt a multi-method approach. One method is soundwalking, created by Hildegard Westerkamp and Murray Schafer in the 1970s. Westerkamp’s use of this method involved asking participants to move through an area that was known to them and recording places of significance. These recordings would later become part of radio art works or installations.
The soundwalk technique has been adopted by different researchers for numerous projects around the world since the 1970s. Most recently, Adams adopted the soundwalking method for the Positive Soundscapes Project in 2006. The purpose of the research was to develop a holistic approach to studying the soundscape. The project invited people to listen to their soundscape and then identify sounds of importance. Adams adopted Schafer’s terminology of keynote sounds, soundmarks and sound signals as analytical models with which to assess the data. This method in itself does not clarify contextual or social meaning, so we must explore other qualitative approaches, such as field research and interviews, and decide which qualitative paradigm will best suit this investigation. Traditional sociological methods should play a part in the exploration of the meaning and construction of sound. In Adams’s research, when “prompted to consider spatial layout” (2009, p. 7), the respondents tried to identify the sounds that they heard in the same way they would objects. This proved problematic as the participants had no vocabulary to describe the soundscape or its meaning. Simply focusing on identifying sounds and their meaning may limit the explanation or interpretation of cultural or social meaning. Therefore, other methods that enable the researcher to comprehend the ubiquity of the sound environment must be incorporated into the exploration of the soundscape. Interviews, both structured and open-ended, allow for the retrieval of information beyond the specifics of description. Adopting a soundwalking method alongside personal narrative interviews or life history interviews can connect meaning to hearing. Allowing a participant a longer time to consider their sound environment, such as having them notate or record over a period of time, may reveal anamnesis experiences, in which a sound evokes a memory or sensation of a past experience.
This is not as subjective as it may seem: the soundtrack in films, particularly the leitmotif, is often used to refer to a previous part of the film, causing a kind of anamnesis in the listener (Augoyard & Torgue, 2006; Chion, Gorbman, & Murch, 1994). Sounds become tied to experiences and therefore have a meaning beyond a description of sound and effect. Giving participants a longer time to record or document these kinds of experiences allows further insight into what certain sounds can trigger. Riessman (1993) argues that in the act of telling there is an inevitable gap between the experience and the telling; the sound methods allow participants to embody themselves in the narrated space, as they are situated in the environment to which they are referring. What these combined methods may reveal lies not in how we listen to sound but in what we hear when we actively think about listening. That in itself may highlight how much active listening happens in a person’s life. If it turns out that quite a lot is heard in an individual’s day-to-day experiences, we must consider sound more actively in the design of digital soundscapes; conversely, if we reveal that sound plays only a minor part in a person’s relationship to their environment, we may have to rethink how sound, beyond music, should be part of a digital game space. Sequeira, Specht, Hämäläinen, and Hugdahl’s (2008) research on the hearing impaired noted that clarity is essential in picking up the minutiae within the complexity of sounds, as issues can occur when ambient sound levels are too high. The comprehension of language becomes more difficult when we try to distinguish dialogue that is surrounded by high levels of background sound. Equally, Sánchez and Lumbreras’s (1999) research on the design of digital gameworlds for the blind highlighted the need for 3D audio interfaces as a method by which to navigate space.
They argue that users, when deprived of the sense of sight, are able to recognise spatiality and “localise specific points in 3D space, concluding that navigating space through sound can be a precise task for blind people” (1999, p. 1).
For digital game sound this does not necessarily seem an important issue: the ambient soundscape rarely includes high levels of conversational sound, and game designers rarely design for the blind. Yet in cities and urban centers, vocal sounds and directional sounds are among the dominant sonic and spatial characteristics of the environment. There is interplay between vocal sounds and architecture; they will resonate at different frequencies depending on the construction of the space. Thus, understanding how people distinguish sounds, such as vocals, amongst a variety of other sounds may be relevant if a designer wishes to include this soundtrack of reality in sound design for gaming. Equally, we can choose which direction to take based on acoustic as well as visual information. This could be explored through a series of listening projects in which a focus group must listen to different sets of sounds while trying to engage in other activities. If the level of information, and not the volume, is increased over time, one could ascertain how much information we can process simultaneously while trying to complete tasks.
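One way the stimuli for such a listening task could be prepared is sketched below in Python. It is only an illustration under stated assumptions: the function names, the number of trials per level, and the use of the number of simultaneous sound streams as a stand-in for the "level of information" are hypothetical and do not come from the studies cited above. The sketch relies on the standard result that n uncorrelated streams of equal power sum to n times the power of a single stream, so scaling each stream's amplitude by 1/sqrt(n) keeps the overall level roughly constant while the amount of information grows.

```python
import math


def per_stream_gain(n_streams: int) -> float:
    """Amplitude gain per stream so that n uncorrelated, equal-power
    streams sum to (approximately) the power of one unscaled stream."""
    if n_streams < 1:
        raise ValueError("at least one stream is required")
    return 1.0 / math.sqrt(n_streams)


def build_schedule(max_streams: int, trials_per_level: int = 3) -> list:
    """Return a list of trials in which the number of simultaneous
    streams rises step by step while the summed power stays constant.
    Both parameters are hypothetical experiment settings."""
    schedule = []
    for n in range(1, max_streams + 1):
        gain = per_stream_gain(n)
        for trial in range(trials_per_level):
            schedule.append({"streams": n, "trial": trial, "gain": gain})
    return schedule


if __name__ == "__main__":
    for step in build_schedule(max_streams=4, trials_per_level=1):
        # Summed power of n streams, each scaled by `gain`, relative to one stream.
        total_power = step["streams"] * step["gain"] ** 2
        print(step["streams"], round(step["gain"], 3), round(total_power, 3))
```

Task performance recorded against the `streams` value of each trial would then indicate, under these assumptions, the point at which additional simultaneous information begins to interfere with the activity.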
Contextualizing Game Space
Understanding that there are a variety of ways to experience the gameworld is a necessary condition for deciding what soundscape should or could be placed within this virtual space. What is the operant behaviour of the gamer, what is the participation level, and how much control in the gameworld does the player have? Finally, how does one contextualise oneself within the world? Grimshaw and Schott (2007) noted that there was feedback “for operant behaviours (panting breaths is a good indicator of the player’s energy level)” (p. 475). In examining FPS games, we see that sound is predominantly responsive and reactive, rather than passively situated in the background, and this is a key component of this type of gaming. We may hear the dying groans of another wounded warrior in FPS games, but we
do not hear the voices of hundreds of men dying or in pain, a sound that would exist in a real war. Our experiences of explosions are controlled lest we be deafened, but where is the artillery constantly humming over the horizon, the perpetual whump, whump of helicopters marking or spotting territory? Jørgensen (2008) argues that symbolic sounds are key components in player-versus-player games, more so than background sounds. For her, game context is key: what kind of game is it and what type of space does the avatar inhabit? Jørgensen’s research focuses on the situation-oriented approach, which interprets sounds in reference to events, rather than an object-oriented perspective. She argues that the gamer must understand the rules of the system in order both to manipulate it and to understand that it “can affect individual actions” (Jørgensen, 2008, p. 2). This concept reflects Blumer’s (1986) symbolic interactionist approach, where humans “define each other’s actions instead of merely reacting to each other’s actions” (p. 79). The other person in this case is the gameworld. There may be several schools of thought on sound within gaming. If the sound is too real, would it terrify the gamer, distract them, annoy them, or just confuse them? Both Schafer and Smith have looked at the history of the soundscape and analysed the possible cause and effect of certain soundscapes on the human condition (Schafer, 1977; Smith, 1999, 2004). However, a new research model is needed to identify how certain sounds trigger emotive or psychological responses, particularly in the soundscape that features in a large number of games: war sounds. For a conclusive multi-method approach, we must first decide what is actually needed in a digital game space. For example, if the game has no point of free space where the player can actively listen to their environment, is a detailed soundscape necessary?
This question may be answered by the questionnaire approach; a series of semi-structured interviews may reveal how people hear a space that they only traverse. This type
of interview allows the interviewer a certain level of control which directs the interviewee down particular paths. Equally it allows the interviewee to expand on themes outside the limits of the question, which can reveal unexpected information (Bryman, 2008).
The Mapped Soundscape
If we were to map the soundscape of a city, where would we start? Would we first categorize it under headings from loudest to quietest, or might we break it up into specific human sounds: crowds, individuals, groups of five or more, age-related or gender-specific? Females have a different tonality to their voices compared to men, children have higher-pitched voices than adults, and teenagers are louder than everybody. Then we refer to acoustics: how different do people sound on a pedestrian street as compared to a car-filled street or even a park? We can then examine the architecture of the space, the height of the buildings, their position, and how this might change the reverberant space. Then we could move on to city noises, for example trams running through a city. These would sound at a very low but continuous level, marking specific territories within a city at particular times. Then there is the multitude of cars, trucks and vans and the occasional house alarm, fire alarms, fire trucks, police cars and ambulances sounding off regularly throughout the day, reminding us of sickness, danger and intrusion. There is the continuous hum of traffic that never quite stops but shifts in decibel level throughout the day and sits alongside a cacophony of beeping horns. There is the opening and shutting of thousands of doors onto streets, which might include the hiss of sliding doors, the beeping signals at pedestrian traffic lights, or a robotic voice counting down until we can cross the street. These sounds are part of the ambient soundscape of most cities, but they are still just a small part of the overall sound. Maybe we think we have not heard the sound of a million footsteps pounding a street; it is such a
huge part of the murmur of a city that we no longer distinguish it from the background noise, yet if it stopped… we would notice the silence. There are the street hawkers and the homeless, a perpetual cry of “What do you want?”, “Can you give?”, “Have you got any change?”, “Will you buy?”. Specific sound markers in Dublin are “flowers get your flowers, get your fruit, get your veg, paper, evenin’ paper, any money for a hostel”. These oral announcements could also be considered part of the ambient soundtrack of the city. They would in fact be the sound markers for particular urban spaces. This multitude of sound still leaves out the sounds related to the outside or inside acoustics created by structures and objects such as buildings, cars, trains or metro stations. If one moves to what urban dwellers consider the apparently quiet soundscape of the natural world, we find a multitude of sounds connected to the society of animals, from mating cries to hunting calls, as well as the sounds of eating and foraging, flying, climbing and running. There is the ambient sound of wind through trees, grass or wood bending, rain storms, flowing rivers, rippling water and small streams, all of this situated in one small area. Now relate this minimal soundscape to sounds within gaming. Such a comparison might lead us to ask how we can experience a real, or significantly close to real, soundscape in a virtual world if the sound design is limited to “character or interface sounds” (Grimshaw & Schott, 2007). This description might be considered too linear and too connected to time and human activities. The ability to comprehend space and the sounds within it is not based entirely on the ability to hear; it is also based on the cultural and social context of both the sounds we hear and our interpretation of them. Blesser and Salter (2009) would argue that we cannot interpret and construct sonic architecture without accepting the cultural relativism of the sensory experience.
Therefore, in my description of the urban and rural soundscape I cannot claim to be objective; my choice of sounds relates to my experience of particular spaces, and my interpretation of these sounds lies in my education, upbringing, and the socially constructed meanings that are inevitably tied to certain sounds. We again return to what Augoyard and Torgue (2006) would consider the inherent problem of describing or analysing a soundscape: the subjectivity issue. If each group or individual perceives sounds differently, how can we generalise when constructing a soundscape? This argument could cross over to many disciplines. Within the arts, it is generally understood that a work of art is best understood by the artist who made it. Yet the artist accepts that their work will be interpreted differently by every person who sees it. So what makes a great work of art? Is it tied to cultural phenomena? Can a particular work be representative of a particular time? Do people understand the meaning because it resonates with what is happening at a particular moment, globally, politically and socially? It is not enough to dismiss the study of how the individual experiences sound because it is subjective; we must explore how people understand sound in particular places at particular times and then look for similarities between other places and people. Then perhaps we can generalise in the construction of digital sound design based on data that reveals particular generalities.
CONCLUSION
The interpretation and meaning of sound alter in relation to personal, historical and cultural experiences, as well as the context of our auditory experience. The physicality of sound can alter our perception of the space in which we hear it, expanding or contracting the landscape and shaping our psychological and sociological response to place. If we wish to construct a digital soundscape which simulates reality and creates the sense of immersion, a study of the sociological impact of the soundscape must be undertaken. However, the
consideration of what defines reality and experience must also be explored. As mentioned earlier in the text, the simulated soundscape of war games is not based on the real soundscape of a war zone, but on a sound designer’s definition of war sounds. Against what definition of reality are we measuring the soundscape of virtual worlds, and how real do we want our virtual environments to be? Most of the environments we experience within games are spaces which we may never experience in reality. Our experience of certain soundscapes may be understood in relation to other media representations: television, Internet and cinema. The digital game soundscape then becomes a construct of definitions rather than a simulated reality. If we are trying to simulate a sense of reality in gaming, we must consider how real we wish it to be. Grimshaw (2007) argues that it is only through the audification of gaming that we actually simulate the idea of immersion. This implies that sound in itself provides a sense of reality whether or not the sound is based on reality. So what is it about the physical aspects of sound that creates a sense of being elsewhere? It is not enough to suggest that because sound is physical it creates a sense of immersion. Sound must be understood beyond the physical; a language must be developed as a result of empirical research that explores the sociological phenomena of sound. Thibaud (1998) suggests that we must create a “praxiology” of sound from the natural soundscape before we construct artificial soundscapes. He also argues that beyond just meaning and interpretation, sound can and does affect our choices; we pick up “information displayed by the environment in order to control actions (such as locomotion or manipulation) […] thus, the environmental properties and the actor/perceiver activities cannot be disassociated: they shape each other” (Thibaud, 1998, p. 2). Sound can be both active and passive and this will affect our response to it.
Driving a car, for example, might be considered a passive production of sound: we have no choice in the sound the engine makes, but beeping a horn is active sound making. Thus sound production has an implicit message, the interpretation of which might be subjective. Whether it is perceived as positive or negative can depend on the intention. It may also affect behaviour: do we choose to move out of the way of a vehicle, or allow it to stimulate anger or other emotive responses? This active sound does not simply reference the acoustics of space or a description of noise; it carries a message, a description of a situation that has social and cultural context. If, as Thibaud (1984) suggests, sound is not a “mere epiphenomenon or secondary consequence of activity” (p. 4), then we must consider that all sound has meaning; it is learning how to deconstruct that meaning that will allow for a clearer understanding of the soundscape. With this understanding we can construct digital soundscapes which will challenge the perception that the image is what gives the illusion of the real.
REFERENCES
Adams, M. (2009). Hearing the city: Reflections on soundwalking. Qualitative Research, 10, 6–9.
Adams, M., Cox, T., Moore, G., Croxford, B., Refaee, M., & Sharples, S. (2006). Sustainable soundscapes: Noise policy and the urban experience. Urban Studies (Edinburgh, Scotland), 43(13), 2385. doi:10.1080/00420980600972504
Anderson, P. W. S. (2002). Resident evil [Motion picture]. Munich, Germany: Constantin Film.
Augoyard, J., & Torgue, H. (2006). Sonic experience: A guide to everyday sounds (illustrated ed.). Montreal, Canada: McGill-Queen’s University Press.
Bijsterveld, K. (2004). The diabolical symphony of the mechanical age: Technology and symbolism of sound in European and North American noise abatement campaigns, 1900-40. In Bull, M., & Back, L. (Eds.), The auditory culture reader (1st ed., pp. 165–190). Oxford, UK: Berg.
Bijsterveld, K. (2008). Mechanical sound: Technology, culture, and public problems of noise in the twentieth century. Cambridge, MA: MIT Press.
Blesser, B., & Salter, L. (2009). Spaces speak, are you listening?: Experiencing aural architecture. Cambridge, MA: MIT Press.
Blumer, H. (1986). Symbolic interactionism. Berkeley: University of California Press.
Bryman, A. (2008). Social research methods (3rd ed.). Oxford, UK: Oxford University Press.
Bull, M. (2000). Sounding out the city: Personal stereos and the management of everyday life. Oxford, UK: Berg.
Bull, M., & Back, L. (2004). The auditory culture reader (1st ed.). Oxford, UK: Berg.
Cabrera Paz, J., & Schwartz, T. B. M. (2009). Techno-cultural convergence: Wanting to say everything, wanting to watch everything. Popular Communication: The International Journal of Media and Culture, 7(3), 130.
Cameron, J. (Director). (2009). Avatar [Motion picture]. Los Angeles, CA: 20th Century Fox. Lightstorm Entertainment, Dune Entertainment, Ingenious Film Partners [Studio].
Chion, M., Gorbman, C., & Murch, W. (1994). Audio-vision. New York: Columbia University Press.
Cohen, L. (2005). The history of noise [on the 100th anniversary of its birth]. IEEE Signal Processing Magazine, 22(6), 20–45. doi:10.1109/MSP.2005.1550188
Connor, S. (2004). Edison’s teeth: Touching hearing. In V. Erlmann (Ed.), Hearing cultures: Essays on sound, listening, and modernity (English ed., pp. 153-172). Oxford, UK: Berg.
de Certeau, M. D. (1988). The practice of everyday life. Berkeley: University of California Press.
Donkey kong [Computer game]. (1981). Kyoto: Nintendo.
Drobnick, J. (2004). Aural cultures. Toronto: YYZ Books.
Epstein, M. (2009). Growing an interdisciplinary hybrid: The case of acoustic ecology. History of Intellectual Culture, 3(1). Retrieved December 29, 2009, from http://www.ucalgary.ca/hic/issues/vol3/9.
Fake engine noises added to hybrid and electric cars to improve safety. (2008). Retrieved January 10, 2010, from http://www.switched.com/2008/06/05/fake-engine-noises-added-tohybrid-and-electric-cars-to-improve/.
Feld, S. (2004). A rainforest acoustemology. In Bull, M., & Back, L. (Eds.), The auditory culture reader (1st ed., pp. 223–240). Oxford, UK: Berg Publishers.
Frisby, D. (2002). Cityscapes of modernity: Critical explorations. Cambridge, UK: Polity.
Goffman, E. (1959). The presentation of self in everyday life (1st ed.). New York: Anchor.
Grimshaw, M. (2007). Sound and immersion in the first-person shooter. In Proceedings of 11th International Conference on Computer Games: AI, Animation, Mobile, Educational and Serious Games. Published to CDROM.
Grimshaw, M., Lindley, C. A., & Nacke, L. (2008). Sound and immersion in the first-person shooter: Mixed measurement of the player’s sonic experience. In Proceedings of Audio Mostly Conference 21-26.
Grimshaw, M., & Schott, G. (2007). Situating gaming as a sonic experience: The acoustic ecology of first-person shooters. In Proceedings of Situated Play, 24-28.
Hodgkinson, G. (2009). The seduction of realism. In Proceedings of ACM SIGGRAPH ASIA 2009 Educators Program (pp. 1-4). Yokohama, Japan: The Association for Computing Machinery.
Jennings, P. (2009). WMG: Professor Paul Jennings. Retrieved December 30, 2009, from http://www2.warwick.ac.uk/fac/sci/wmg/about/people/profiles/paj/.
Jørgensen, K. (2008). Audio and gameplay: An analysis of PvP battlegrounds in World of Warcraft. GameStudies. Retrieved January 10, 2010, from http://gamestudies.org/0802/articles/jorgensen.
Kusama, K. (Director). (2005). Aeon flux [Motion picture]. Hollywood, CA: Paramount.
Lefebvre, H. (2004). Rhythmanalysis: Space, time and everyday life. Continuum.
Lumbreras, M., & Sánchez, J. (1999). Interactive 3D sound hyperstories for blind children. In Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit (pp. 318-325). Pittsburgh, PA: ACM.
MacLaran, A. (2003). Making space: Property development and urban planning. London: Hodder Arnold.
Manning, P. (1992). Erving Goffman and modern sociology. Stanford, CA: Stanford University Press.
Pac man [Computer game]. (1980). Tokyo, Japan: Namco.
Riessman, D. C. K. (1993). Narrative analysis (1st ed.). Los Angeles: Sage.
Russolo, L. (1913). Russolo: The art of noises. Retrieved December 30, 2009, from http://120years.net/machines/futurist/art_of_noise.html.
Sakaguchi, H. (Director). (2001). Final fantasy [Motion picture]. Los Angeles: Columbia.
Schafer, R. M. (1977). The tuning of the world. Toronto: McClelland and Stewart.
Sequeira, S. D. S., Specht, K., Hämäläinen, H., & Hugdahl, K. (2008). The effects of different intensity levels of background noise on dichotic listening to consonant-vowel syllables. Scandinavian Journal of Psychology, 49(4), 305–310. doi:10.1111/j.1467-9450.2008.00664.x
Simmel, G. (1979). The metropolis and mental life. Retrieved February 1, 2010, from http://www.blackwellpublishing.com/content/BPL_Images/Content_store/Sample_chapter/0631225137/Bridge.pdf.
Smith, B. R. (1999). The acoustic world of early modern England: Attending to the o-factor (1st ed.). Chicago: University of Chicago Press.
Smith, B. R. (2004). Tuning into London c.1600. In Bull, M., & Back, L. (Eds.), The auditory culture reader (1st ed., pp. 127–136). Oxford, UK: Berg.
Space invaders [Computer game]. (1978). Tokyo, Japan: Taito.
Thibaud, J. (1998). The acoustic embodiment of social practice: Towards a praxiology of sound environment. In Karlsson, H. (Ed.), Proceedings of Stockholm, Hey Listen! (pp. 17–22). Stockholm: The Royal Swedish Academy of Music.
Thompson, J. B. (1995). The media and modernity. Stanford, CA: Stanford University Press.
Tonkiss, F. (2004). Aural postcards: Sound, memory and the city. In Bull, M., & Back, L. (Eds.), The auditory culture reader (1st ed., pp. 303–310). Oxford, UK: Berg.
West, S. (Director). (2001). Lara Croft: Tomb raider [Motion picture]. Hollywood, CA: Paramount.
Working Group Noise Eurocities. (n.d.). Retrieved January 10, 2010, from http://workinggroupnoise.web-log.nl/.
KEY TERMS AND DEFINITIONS
Holistic: In order to understand the whole of a system, one must look at the parts within it that make it up. Within sociology, Durkheim developed a concept of holism which is in opposition to methodological individualism.
Immersion: To be completely surrounded by sound.
Mediatization: Sonia Livingstone’s definition of mediatization is for me the most accurate because it refers “to the meta process by which every day practices and social relations are increasingly shaped by mediating technologies and media organisations” (http://www.icahdq.org/conferences/presaddress.asp, par. 3).
Schizophonic: Murray Schafer describes the term schizophonic as the split between an original sound and an electroacoustic reproduction in a soundscape. I am using it as a metaphor for a split between two types of listening spaces: if one is listening to music while traversing a real space, the attention is split in comprehension between the real-world space and the virtual soundscape.
Social construction of space: Social constructivists examine ways in which individuals and groups participate in the creation of their perceived social reality. In this context, I am focusing on how society can change its perceived space through sound, either by how people listen to or produce sound in a space.
Sonic Architecture: The study of the acoustic effect of objects such as buildings, interior and exterior, on space. Equally, sonic architecture explores how people can construct sonic structures or challenge the sounds of places by creating their own sonic space.
Soundscape: Refers to both natural and man-made sounds that immerse an environment.
Soundwalking: A soundwalk is a journey where the objective is to discover an environment by listening to it.
Symbolic Interactionist: The study of micro-scale social interaction. It is seen as a process that informs and forms human conduct, the premise being that human beings act on and upon things based on the meaning these things have, things being defined as physical objects such as chairs, trees and phones, and human beings such as mothers, shop clerks and so forth.
Chapter 4
Diegetic Music:
New Interactive Experiences
Axel Berndt
Otto-von-Guericke University, Germany
ABSTRACT
Music that is performed within the scene is called diegetic. In practical and theoretical literature on music in audio-visual media, diegetic music is usually treated as a side issue, a sound effect-like occurrence, just a prop of the soundscape that sounds like music. A detailed consideration reveals a lot more. The aim of this chapter is to uncover the abundance of diegetic occurrences of music, the variety of functions they fulfill, and issues of their implementation. The role of diegetic music gains importance in interactive media as the medium allows nonlinearity and controllability as never before. As a diegetic manifestation, music can be experienced in a way that was previously unthinkable except, perhaps, for musicians.
INTRODUCTION
Dealing with music in audio-visual media traditionally leads the researcher first to its non-diegetic occurrence, that is, offstage music. Its interplay with the visuals and its special perceptual circumstances have been largely discovered and analyzed by practitioners, musicologists, and psychologists. Its role is mostly an accompanying, annotating one that emotionalises elements of the plot or
DOI: 10.4018/978-1-61692-828-5.ch004
scene, associates contextual information, and thus enhances understanding (Wingstedt, 2008). Comparatively little attention has been given to diegetic music. As its source is part of the scene’s interior (for example, a performing musician, a music box, a car radio), it is audible from within the scene. Hence, it can exert an influence on the plot and acting and is frequently even an inherent part of the scenic action. In interactive media it can even become an object the user might be able to directly interact with. This chapter addresses the practical and aesthetic issues of diegetic music. It clarifies
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Diegetic Music
Figure 1. A systematic overview of all forms of diegetic music
differences from non-diegetic music regarding inner-musical properties, its functional use, and its staging and implementation. Particular attention is paid to interactivity aspects that hold a variety of new opportunities and challenges in store, especially in the context of modern computer games technology. This directly results in concrete design guidelines. These show that adequate staging of diegetic music requires more than its playback. The problem area comprises, amongst others, the simulation of room acoustics and sound radiation, the generation of expressive performances of given compositional material, and even its creation and variation in real time. The complexity and breadth of these issues might discourage developers. The effort seems too expensive for a commercial product and is rarely invested. Game development companies usually have no resources available to conduct research in any of these fields. But in most cases, this is not even necessary. Previous and recent research in audio signal processing and computer music has created many tools, algorithms, and systems. Even if not developed for the particular circumstances of diegetic music, they approach or even solve similar problems. It is a further aim of this chapter to uncover this fallow potential. This may inspire developers to make new user experiences possible, beyond the limitations of an excluded passive listener.
The key to this is interactivity. However, different types of games allow different modes of interaction. Different approaches to diegetic music follow, accordingly. To lay a solid conceptual basis, this chapter also introduces a more differentiating typology of diegetic music and its subspecies, which is outlined in Figure 1. The respective sections expand on the different types. Before that, a brief historical background and a clarification of the terminology used are provided.
Where Does It Come From?
Early examples of diegetic music can be found in classic theatre and opera works, for instance, the ball music in the finale of W.A. Mozart’s Don Giovanni (KV 527, premiered in 1787) which is performed onstage, not from the orchestra pit. Placing musicians onstage next to the actors may hamper dialog comprehensibility. To prevent such conflicts, diegetic music was often used as a foreground element that replaces speech. It wasn’t until radio plays and sound films offered more flexible mixing possibilities that diegetic music grew to be more relevant for background soundscape design (for example, bar music, street musicians). Such background features could now be set on a significantly lower sound level to facilitate focusing the audience’s attention on the spoken text, comparable to the well-known Cocktail Party Effect (Arons, 1992).
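Setting background music at a lower level than speech can be illustrated with a minimal Python sketch. It is not taken from any of the productions mentioned; the function names and the offset of -18 dB are hypothetical values chosen for illustration. The only fixed ingredient is the standard conversion from a level change in decibels to a linear amplitude factor, 10^(dB/20).

```python
def db_to_gain(level_db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (level_db / 20.0)


def mix_under_dialogue(dialogue, music, music_offset_db=-18.0):
    """Sum two equally long sample sequences, attenuating the music.

    `music_offset_db` is a hypothetical mixing decision: how far the
    diegetic background music sits below the spoken text.
    """
    gain = db_to_gain(music_offset_db)
    return [d + gain * m for d, m in zip(dialogue, music)]


if __name__ == "__main__":
    # -20 dB corresponds to one tenth of the original amplitude.
    print(round(db_to_gain(-20.0), 3))
    print(mix_under_dialogue([0.5, -0.5], [1.0, 1.0], music_offset_db=-20.0))
```

In a stage performance the musicians' level cannot be scaled in this way, which is one reason the flexible mixing of radio plays and sound films made diegetic background music practical.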
A further form of occurrence evolved in the context of music-based computer games, having its origins in the aesthetics of music video clips: music that is visualized on screen. In this scenario, the virtual scene is literally built up through music. Musical features define two- or three-dimensional objects, their positioning, and set event qualities (for example, bass drum beats may induce big obstacles on a racing track or timbral changes cause transitions of the color scheme). The visualizations are usually of an aesthetically stylized type. Thus, the scenes are barely (photo-)realistic but rather surrealistic. Typical representatives of music-based computer games are Audiosurf: Ride Your Music (Fitterer, 2008), Vib-Ribbon (NanaOnSha, 1999), and Amplitude (Harmonix, 2003). However, music does not have to be completely precomposed for the interactive context. Games like Rez (Sega, 2001) demonstrate that player interaction can serve as a trigger of musical events. Playing the game creates its music. One could argue that this is rather a very reactive non-diegetic score. However, the direct and very conspicuous correlation of interaction and musical event and the entire absence of any further sound effects drag the music out of the “off” onto the stage. The surrealistic visuals emphasize this effect as they decrease the aesthetic distance to musical structure. In this virtual world, music is the sound effect and is, of course, audible from within the scene, hence diegetic. The conceptual distance to virtual instruments is not great, as is shown by the game Electroplankton (Iwai, 2005) and the lively discussion on whether it can still be called a game (Herber, 2006). In the context of Jørgensen’s (2011) terminology discussion, a more precise clarification of the use of diegetic and non-diegetic in this chapter is necessary.
The diegesis, mostly seen as a fictional story world, is here used in its more general sense as a virtual or fictional world detached from the conventional story component. It is rather the domain the user interacts with either directly (god-like) or through an avatar which itself is part
of the diegesis. The diegesis does not necessarily have to simulate real-world circumstances. The later discussion on music video games1 will show that it does not have to be visual either, even if visually presented. Again, the diegesis in interactive media is the ultimate interaction domain, not any interposed interface layer. Keyboard, mouse, gamepad, and graphical user interface elements like health indicators and action buttons are extra-diegetic. They serve only to convert user input into diegetic actions or to depict certain diegetic information. The terms diegetic and non-diegetic in their narrow sense describe the source domain of a described entity: diegesis or extra-diegesis. Diegetic sound comes from a source within the diegesis. Many theorists add further meaning to the terms regarding, for instance, the addressee. A soldier in a strategy game may ask the player directly where to go. As the player may also adapt his playing behaviour to non-diegetic information (a musical cue warns of upcoming danger), such information can influence the diegesis. Such domain-crossing effects are unthinkable in linear, that is non-interactive, media. The strict inside-outside separation of the traditional terminology is, of course, incapable of capturing these situations and it may never have been meant to do so. Galloway (2006) deals with this subject in an exemplary way. This chapter does not intend to participate in this discussion. For the sake of clarity, the narrow sense of the terminology is applied in this chapter. This means that the terms only refer to the source domain, not the range of influence. Diegetic is what the mechanics of the diegesis (world simulation, in a sense) create or output. If the higher-level game mechanics produce further output (for example, interface sounds or the musical score) it is declared non- or extra-diegetic.
This is also closer to the principles of the technical implementation of computer games and may make the following explanations more beneficial.
ONSTAGE PERFORMED MUSIC
The primal manifestation of diegetic music is music that is performed within the scene, either as a foreground or background artifact. As such, it usually appears in its autonomous form as a self-contained and very often a pre-existent piece. The most distinctive difference between diegetic music and its non-diegetic counterpart is that the latter cannot be considered apart from its visual and narrative context. Likewise, the perceptual attitude differs substantially. Foreground diegetic music is perceived very consciously, comparable to listening to a piece of music on the radio or a concert performance. Even background diegetic music that serves a similar purpose as non-diegetic mood music is comprehended differently. While mood music describes an inner condition (What does a location feel like?) background diegetic music contributes to the external description (What does the location sound like?) and can be mood-influential only on a general informal level (They are playing sad music here!).
Functions
Therefore, the role of background diegetic music is often regarded as less intrinsic. It is just a prop, an element of the soundscape, which gives more authenticity to the scenario on stage. As such it serves well to stage discos, bars, cafés, street settings with musicians, casinos (see Collins, Tessler, Harrigan, Dixon, & Fugelsang, 2011, for an extensive description of sound and music in gambling environments) and so forth. However, it does not have to remain neutral, even as a background element. It represents the state of the environment. Imagine a situation where the street musicians suddenly stop playing. This is more than an abrupt change of the background atmosphere; it is a signal indicating that something happened that stopped them playing, that something has fundamentally changed.
Conversely, it can also be that dramatic events happen, maybe the protagonist is attacked, but the musical background does not react. Instead, it may continue playing jaunty melodies. Such an indifferent relation between foreground and background evokes some kind of incongruence. This emphasizes the dramaturgical meaning of the event or action. Moreover, it is sometimes understood as a philosophical statement indicating an indifferent attitude of the environment. Whatever happens there, it means nothing to the rest of the world: “life goes on” (Lissa, 1965, p.166). Even though the source of diegetic music is part of the scene it does not have to be visible. The sound of a gramophone suffices to indicate its presence. In this way diegetic music, just like diegetic sound effects, gathers in non-visible elements of the scene and blurs the picture frame, which is particularly interesting for fixed-camera shots. It associates a world outside the window and beyond that door which never opens. Its role as a carrier of such associations takes shape the more music comes to the fore because the linkage to its visual or narrative correlative is very direct and conspicuous (The guy who always hums that melody!). Furthermore, when diegetic music is performed by actors, and thereby linked to them, it can become a means of emotional expression revealing their innermost condition. The actor can whistle a bright melody, hum it absentmindedly while doing something else, or articulate it with sighing inflection. Trained musicians can even change the mode (major, minor), vary the melody, or improvize on it. The more diegetic music becomes a central element of the plot the more its staging gains in importance. Did the singer act well to the music? Does the fingering of the piano player align with the music? It can become a regulator for motion and acting. The most obvious example is probably a dancing couple. 
Very prominent is also the final assassination scene in Alfred Hitchcock’s (1956) The Man Who Knew Too Much. During a concert
performance of Arthur Benjamin’s Cantata Storm Clouds the assassin tries to cover his noise by shooting in synch with a loud climactic cymbal crash. Even screaming Doris Day is perfectly in time with the meter of the orchestra.
Design Principles
However, when a musical piece is entirely performed in the foreground, it creates a problem. It slows the narrative tempo down. This is because change processes take more time in autonomous music than on the visual layer, in films as well as in games. In contrast to non-diegetic music, where changes are provoked and justified by the visual and narrative context, diegetic music has to stand on its own. Its musical structure has to be self-contained, hence, change processes need to be more elaborate. Such compositional aspects of non-diegetic film music and its differences from autonomous music have been discussed already by Adorno and Eisler (1947).
For an adequate implementation of diegetic music, further issues have to be addressed. In contrast to non-diegetic music, it is subject to the acoustic conditions of the diegesis. A big church hall, small bedroom, or an outdoor scene in the woods, each environment has its own acoustics and resonances. Ever heard disco music from outside the building? The walls usually filter medium and high frequencies, the bass is left. This changes completely when entering the dance floor. Diegetic music as well as any other sound effect cannot, and must not, sound like a perfectly recorded and mixed studio production. A solo flute in a large symphony orchestra is always audible on CD but gets drowned in a real life performance. According to the underlying sound design there might, nonetheless, be a distinction between foreground and background mixing that does not have to be purely realistic. Further discussion of this can be found, for instance, in Ekman (2009).
The sound positioning in the stereo or surround panorama also differs from that of studio recordings. Diegetic music should come from where it is performed. The human listener is able to localize real world sound sources with deviations down to two degrees (Fastl & Zwicker, 2007). Depending on the speaker setup, this can be significantly worse for virtual environments. But even stereo speakers provide rough directional information. Localization gets better again when the source is moving or the players are able to change their relative position and orientation to the source. In either case the source should not “lose” its sound or leave it behind when it moves. It would, as a consequence, lose presence and believability. Positioning the music at the performer’s location in relation to the listener is as essential as it is for any other sound effect.
But up to now only a very primitive kind of localization has been discussed: setting the sound source at the right place. In interactive environments, the player might be able to come very close to the performer(s). If it is just a little clock radio, a single sound source may suffice. But imagine a group of musicians, a whole orchestra, the player being able to walk between them, listening to each instrument at close range. Not to forget that the performer, let us say a trumpet player, would sound very different at the front than from behind, at least in reality. Each instrument has its individual sound radiation angles. These are distinctively pronounced for each frequency band. The radiation of high-frequency partials differs from that of medium and low frequencies, a fact that, for instance, sound engineers have to consider for microphone placement (Meyer, 2009).
How far do developers and designers need to go? How much realism is necessary? The answer is given by the overall realism that the developers aim for. Non-realistic two-dimensional environments (cartoon style, for example) are comparably tolerant of auditory inconsistencies.
Even visually (photo-) realistic environments do not expect realistic soundscapes at all. Hollywood cinematic aesthetics, for instance, focus on the affect not on realism. Ekman (2009) describes further situations
where the human subjective auditory perception differs greatly from the actual physical situation. Possible causes can be the listener’s attention, stress, auditory acuity, body sounds and resonances, hallucination and so forth. All this indicates that diegetic music has to be handled on the same layer as sound effects and definitely not on the “traditional” non-diegetic music layer. In the gaming scenario, it falls under the responsibility of the audio engine that renders the scene’s soundscape. Audio Application Programming Interfaces (APIs) currently in use are, for instance, OpenAL (Loki & Creative, 2009), DirectSound as part of DirectX (Microsoft, 2009), FMOD Ex (Firelight, 2009), and AM3D (AM3D, 2009). An approach to sound rendering based on graphics hardware is described by Röber, Kaminski, and Masuch (2007) and Röber (2008). A further audio API that is especially designed for the needs of mobile devices is PAudioDSP by Stockmann (2007). It is not enough, though, to play the music back with the right acoustics, panorama, and filtering effects. Along the lines of “more real than reality”, it is often a good idea to reinforce the live impression by including a certain degree of defectiveness. The wow and flutter of a record player may cause pitch bending effects. There can be interference with the radio reception resulting in crackling and static noise. Not to mention the irksome things that happen to each musician, even to professionals, at live performances: fluctuation of intonation, asynchrony in ensemble play, and wrong notes, to name just a few of them. Those things hardly ever happen on CD. In the recording studio, musicians can repeat a piece again and again until one perfect version comes out or enough material is recorded to cut together a perfect version during postproduction. But at live performances all this happens and cannot be corrected afterwards. Including such imperfections in the performance of diegetic music makes for a more authentic live impression.
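To make two of the acoustic steps discussed above more tangible, the following minimal Python sketch places a source in the stereo panorama and muffles it as if heard through a wall. It is not taken from any of the audio APIs named above; the function names, the constant-power pan law, and the one-pole low-pass filter are assumptions chosen for brevity.

```python
import math


def constant_power_pan(azimuth_deg):
    """Return (left_gain, right_gain) for a source azimuth in degrees.

    Assumed convention: -90 is hard left, 0 is centre, +90 is hard right.
    """
    clamped = max(-90.0, min(90.0, azimuth_deg))
    angle = (clamped + 90.0) / 180.0 * (math.pi / 2.0)  # maps to 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def one_pole_lowpass(samples, cutoff_hz, sample_rate=44100):
    """Attenuate content above cutoff_hz, a crude stand-in for a wall
    that lets mainly the bass through."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    filtered, state = [], 0.0
    for sample in samples:
        state += alpha * (sample - state)  # move a fraction towards the input
        filtered.append(state)
    return filtered


if __name__ == "__main__":
    left, right = constant_power_pan(30.0)  # source slightly to the right
    print(round(left, 3), round(right, 3))
```

A real audio engine would substitute its own spatialisation and occlusion models; the sketch only indicates the kind of processing that diegetic music would share with other sound effects.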
Non-Linearity and Interactivity
However, in the gaming context in particular this authenticity gets lost when the player listens to the same piece more than once. A typical situation in a game: The player re-enters the scene several times and the diegetic music always starts with the same piece as if the performers paused and waited until the player came back. This can be experienced, for example, in the adventure game Gabriel Knight: Sins of the Fathers (Sierra, 1993) when walking around in Jackson Square. Such a déjà vu effect robs the virtual world of credibility. The performers, even if not audible, must continue playing their music and when the player returns he must have missed parts of it. Another very common situation where the player rehears a piece of music occurs when getting stuck in a scene for a certain time. The performers, however, play one and the same piece over and over again. In some games they start again when they reach the end, in others, the music loops seamlessly. Both are problematic because it becomes evident that there is no more music. The end of the world is reached in some way and there is nothing beyond. A possible solution could be to extend the corpus of available pieces and go through it either successively or randomly in the music box manner. But the pieces can still recur multiple times. In these cases it is important that the performances are not exactly identical. A radio transmission does not always crackle at the same time within the piece and musicians try to give a better performance with each attempt. They focus on the mistakes they made last time and make new ones instead. This means that the game has to generate ever new performances. Examples of systems that can generate expressive performances are:
• the rule-based KTH Director Musices by Friberg, Bresin, and Sundberg (2006)
• the machine learning-based YQX by Flossmann, Grachten, and Widmer (2009)
• the mathematical music theory-based approach by Mazzola, Göller, and Müller (2002).
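The requirement that no two performances be exactly identical can be illustrated with a deliberately simple Python sketch. It does not reproduce any of the systems listed above, which model expression far more musically; it merely applies small random deviations to onset times and loudness. The note format (onset in seconds, MIDI pitch, MIDI velocity), the function name, and the default deviation sizes are assumptions.

```python
import random


def vary_performance(notes, seed, timing_sd=0.015, velocity_sd=6.0):
    """Return a slightly different rendition of the same notes.

    notes: list of (onset_seconds, midi_pitch, velocity) tuples.
    seed:  a different seed yields a different performance.
    """
    rng = random.Random(seed)
    varied = []
    for onset, pitch, velocity in notes:
        new_onset = max(0.0, onset + rng.gauss(0.0, timing_sd))
        new_velocity = velocity + rng.gauss(0.0, velocity_sd)
        new_velocity = int(round(min(127, max(1, new_velocity))))
        varied.append((new_onset, pitch, new_velocity))
    return sorted(varied)  # keep the notes in playback order


if __name__ == "__main__":
    phrase = [(0.0, 60, 80), (0.5, 62, 80), (1.0, 64, 80), (1.5, 65, 80)]
    for attempt in range(3):  # three visits to the scene, three renditions
        print(vary_performance(phrase, seed=attempt))
```

Using the visit counter as the seed, as in the usage lines, would be one way of letting each return to a scene produce a new but reproducible rendition.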
Even the expressivity of the performance itself can be varied. This can derive from the scene context (the musician is happy, bored, or sad) or be affected by random deviations (just do it differently next time). Systems to adapt performative expression were developed by Livingstone (2008) and Berndt and Theisel (2008). But modifying performative expression is not the only way to introduce diversity into music. A further idea is to exploit the potential of sequential order, that is, to rearrange the sequence of musical segments. The idea derives from the classic musical dice games which were originally invented by Kirnberger (1767) and became popular through Mozart (1787). The concept can be extended by so-called One Shot segments that can be interposed occasionally amongst the regular sequence of musical segments as proposed within several research prototypes by Tobler (2004) and Berndt, Hartmann, Röber, and Masuch (2006). These make the musical progress appear less fixed. Musical polyphony offers further potential for variance: Building block music2 allows various part settings as not all of them have to play at once. One and the same composition can sound very different by changing the instrumentation (Adler, 2002; Sevsay, 2005) or even the melodic material and counterpoint (Aav, 2005; Berndt et al., 2006; Berndt, 2008). Thus, each iteration seems to be a rearrangement or a variation instead of an exact repetition. Generative techniques can expand the musical variance even more. Imagine a virtual jazz band that improvises all the time. New music is constantly created without any repetition. This can be based on a given musical material, a melody for instance, that is varied. The GenJam system, a genetic approach (Miranda & Biles, 2007), is a well known representative. MeloNet and JazzNet are two systems that create melody ornamentations through trained neural networks (Hörnel, 2000; Hörnel & Menzel, 1999). Based on a graph representation of possible alternative chord progressions (a Hidden Markov Model derivative called Cadence Graph), Stenzel (2005) describes an approach to variations on the harmonic level. Beyond varying musical material it is also possible to generate ever new material. Hiller and Isaacson (1959) already attempted this through the application of random number generators and Markov chains. This is still common practice today, for example, for melody generation (Klinger & Rudolph, 2006). Next, harmonization and counterpoint can be created for that melody to achieve a full polyphonic setting (Ebcioglu, 1992; Schottstaedt, 1989; Verbiest, Cornelis, & Saeys, 2009). Further approaches to music composition are described by Löthe (2003), Taube (2004), and Pozzati (2009). Papadopoulos and Wiggins (1999) and Pachet and Roy (2001) give more detailed surveys of algorithmic music generation techniques. The nonlinear aspects of diegetic music as they have been discussed up to now omitted one fact that comes along with interactive media. Music, as part of the diegesis, not only influences it but can also be influenced by it, especially by the player. Which player is not tempted to click on the performer and see what happens? In the simplest case a radio is just switched on and off or a song is selected on the music box. Interaction with virtual musicians, by contrast, is more complicated. Two modes can be distinguished: the destructive and the constructive mode. Destructive interaction interferes with the musician’s performance. The player may talk to him, jostle him, distract his attention from playing the right notes and from synchronisation with the ensemble. This may even force the musician to stop playing. Destructive interaction affects the musical quality. A simple way to introduce wrong notes is to change the pitch of some notes by a certain interval.
Of course, not all of them have to be changed. The number of changes depends on the
degree of disruption. Likewise for the size of the pitch interval: for example, the diatonic neighbor (half and whole step) with small errors and bigger intervals the more the musician is distracted. In the same way rhythmic precision and synchrony can be manipulated. Making musicians asynchronous simply means adding a plain delay that puts some of them ahead and others behind in the ensemble play. The rhythmic precision, by contrast, has to do with the timing of a musician. Does he play properly in time or is he “stumbling”, in other words, unrhythmical? Such timing aspects were described, investigated, and implemented by Friberg et al. (2006) and Berndt and Hähnel (2009) amongst others. As ensemble play is also a form of communication between musicians, one inaccurate player affects the whole ensemble, beginning with the direct neighbor. They will, of course, try to come together again which can be emulated by homeostatic (self-balancing) systems. Such self-regulating processes were, for instance, described by Eldridge (2002) and used for serial music composition. Constructive interaction, by contrast, influences musical structure. Imagine a jazz band cheered by the audience, encouraged to try more adventurous improvisations. Imagine a street musician playing some depressive music. But when giving him a coin he becomes cheerful, his music likewise. Such effects can rarely be found in virtual gaming worlds up to now. The adventure game Monkey Island 3: The Curse of Monkey Island (LucasArts, 1997) features one of the most famous and visionary exceptions. In one scene the player’s pirate crew sings the song “A Pirate I Was Meant To Be.” The player chooses the keywords with which the next verse has to rhyme. The task is to select the one that nobody finds a rhyme for, to bring them back to work. The sequential order of verses and interludes is adapted according to the multiple-choice decisions that the player makes. 
A systematic overview of this and further approaches to nonlinear music is given by Berndt (2009).
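The destructive mode described above (wrong notes whose number and size grow with the degree of disruption, and plain delays that push ensemble members ahead or behind) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: errors are measured in semitones rather than diatonic steps, and the names `distract` and `shift_part` as well as the error probability are hypothetical.

```python
import random


def distract(notes, distraction, seed=0):
    """Introduce wrong notes into a list of (onset, midi_pitch) tuples.

    distraction: 0.0 (undisturbed) .. 1.0 (heavily disturbed).
    Both the share of wrong notes and the largest possible pitch error
    grow with the distraction level.
    """
    rng = random.Random(seed)
    max_interval = 2 + round(distraction * 5)  # 2 .. 7 semitones (assumed)
    disturbed = []
    for onset, pitch in notes:
        if rng.random() < 0.5 * distraction:  # assumed error probability
            step = rng.randint(1, max_interval)
            pitch += rng.choice((-1, 1)) * step
        disturbed.append((onset, pitch))
    return disturbed


def shift_part(notes, delay):
    """Make one musician asynchronous by a plain delay in seconds.

    A negative delay puts the musician ahead of the ensemble.
    """
    return [(onset + delay, pitch) for onset, pitch in notes]


if __name__ == "__main__":
    riff = [(i * 0.5, 60 + (i % 5)) for i in range(8)]
    print(distract(riff, 0.8, seed=3))
    print(shift_part(riff, 0.04))
```

The homeostatic recovery of the ensemble mentioned above is not modelled here; it would require the delays to shrink again over time.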
So much effort, such a large and complex arsenal for mostly subsidiary background music? Do we really require all this? The answer is ”no”. This section proposed a collection of tools of which the one or other can be useful for rounding off the coherence of the staging and to strengthen the believability of the music performance. Moreover, these tools establish the necessary foundations for music to be more than a background prop, but to come to the fore as an interactive element of the scene. This opens up the unique opportunity for the player to experience music and its performance in a completely different way, namely close up.
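Of the tools collected in this section, the Markov chain approach to generating ever new material is perhaps the easiest to sketch. The following Python fragment is a minimal first-order example: each pitch is chosen at random from the pitches allowed to follow the previous one. The transition table is invented for illustration and is not derived from any of the cited systems.

```python
import random

# Hypothetical transition table: MIDI pitch -> pitches that may follow it.
TRANSITIONS = {
    60: [62, 64, 67],
    62: [60, 64],
    64: [62, 65, 67],
    65: [64, 67],
    67: [60, 64, 65, 69],
    69: [67],
}


def generate_melody(length, start=60, seed=None):
    """Generate a melody of `length` pitches (length >= 1) by walking
    the transition table, starting from `start`."""
    rng = random.Random(seed)
    melody = [start]
    while len(melody) < length:
        melody.append(rng.choice(TRANSITIONS[melody[-1]]))
    return melody


if __name__ == "__main__":
    print(generate_melody(16, seed=7))
```

In practice the table would be learned from existing music or weighted by probabilities, and rhythm, harmonization, and counterpoint would be added in further steps.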
VISUALIZED MUSIC
Beyond visualizing only the performance of music, that is showing performing musicians or sound sources as discussed so far, there is a further possibility: the visualization of music itself. In fact, it is not music as a whole that is visualized but rather a selection of structural features of a musical composition (rhythmic patterns, melodic contour and so on). Moreover, the visual scene does not have to be completely generated from musical information. Music video games just like music video clips often feature a collage-like combination of realistic and aesthetically stylized visuals. The latter is the focus of this section. The Guitar Hero series (Harmonix, 2006-2009) works with such collage-like combinations. While a concert performance is shown in the background the foreground illustrates the guitar riffs which the player has to perform. PaRappa the Rapper (NanaOn-Sha, 1996) also shows the performers on screen and an unobtrusive line of symbols on top that indicates the type of interaction (which keys to press) and the timing to keep up with the music. In Audiosurf, by contrast, the whole scene is built up through music: the routing of the obstacle course, the positioning of obstacles and items, the color scheme, background objects, and visual effects, even the driving speed. So music
not only sets visual circumstances but also event qualities. Some pieces induce more difficult tracks than others.
The Musical Diegesis
The visual instances of musical features are aesthetically looser in video clips. In the gaming scenario they have to convey enough information to put the game mechanics across to the player. Hence, they have to be aesthetically more consistent and presented in a well-structured way. Often a variation of the pitch-time notation, known from conventional music scores (pitch is aligned vertically, time horizontally), forms the conceptual basis of the illustrations. Upcoming events scroll from right to left. Their vertical alignment indicates a qualitative value (not necessarily pitch) of the event. The orientation can, of course, vary. Shultz (2008) distinguishes three modes:
• Reading Mode: corresponds to score notation as previously described and implemented, for example, in Donkey Konga (Namco, 2003)
• Falling Mode: the time-axis is vertically oriented, the pitch/quality-axis horizontally, upcoming events “drop down” (Dance Dance Revolution by Konami (1998))
• Driving Mode: just like falling mode but with the time-axis in z-direction (depth), upcoming events approach from ahead (Guitar Hero).
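The three modes differ only in which screen axis carries time and which carries the qualitative value. A minimal Python sketch of this mapping is given below; the coordinate convention (hit line at the origin, x to the right, y upwards, z into the depth), the function name, and the `speed` parameter are assumptions made for illustration.

```python
def event_position(mode, time_ahead, lane, speed=1.0):
    """Map an upcoming musical event to (x, y, z) scene coordinates.

    time_ahead: seconds until the event has to be hit (0 = on the hit line).
    lane:       qualitative value of the event (pitch, drum type, button ...).
    """
    distance = time_ahead * speed
    if mode == "reading":   # time horizontal, events scroll in from the right
        return (distance, float(lane), 0.0)
    if mode == "falling":   # time vertical, events drop down from above
        return (float(lane), distance, 0.0)
    if mode == "driving":   # time along the depth axis, events approach
        return (float(lane), 0.0, distance)
    raise ValueError("unknown mode: " + mode)


if __name__ == "__main__":
    for mode in ("reading", "falling", "driving"):
        print(mode, event_position(mode, time_ahead=2.0, lane=1))
```

Calling the function every frame with a shrinking `time_ahead` moves the event towards the hit line in each of the three layouts.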
The illustrations do not have to be musically accurate. They are often simplified for the sake of better playability. In Guitar Hero, for instance, no exact pitch is represented, only melodic contour. Even this is scaled down to the narrow ambit that the game controller supplies. It is, in fact, not necessary to translate note events into some kind of stylization. Structural entities other than pitch values can be indicated as well. In Amplitude, it is the polyphony of multiple tracks (rhythm, vocals,
bass, for example) arranged as multiple lanes. Color coding is often used to represent sound timbre (Audiosurf). Other visualization techniques are based on the actual waveform of the recording or on its Fourier transformation (commonly used in media player plug-ins and also in games). For completeness, it should be mentioned that it is, of course, not enough to create only a static scene or a still shot. Since music is a temporal art its visualisation has to develop over time, too. In music video games, as well as in video clips, music constitutes the central value of the medium. It is not subject to functional dependencies on the visual layer. Conversely, the visual layer is contingent upon music, as was already described. Although the visual scene typically does not show or even include any sound sources in a traditional sense (like those described in the previous section), music has to be declared a diegetic entity, even more than the visuals. These are only a translation of an assortment of musical aspects into visual metaphors. They illustrate, comment, concretize, and channel associations which the music may evoke (Kungel, 2004). They simplify conventional visually marked interaction techniques. But the interaction takes place in the music domain. The visuals do not and cannot grasp the musical diegesis as a whole.3 In this scenario the diegesis is literally constituted by music. It is the domain of musical possibilities. In this (its own) world, music is subject to no restrictions. The visual layer has to follow. The imaginary world that derives from this is equally subject to no logical or rational restrictions. The routings of the obstacle courses in Audiosurf run freely in a weightless space: even the background graphics and effects have nothing in common with real sky or space depictions. Practical restrictions, such as those discussed above for onstage performed music (like radio reception interference, wrong notes and so forth), likewise do not exist.
Hence, the performative quality can be at the highest stage, that is, studio level.
Interactivity in the Musical Domain
However, the possibilities to explore these worlds interactively are still severely limited. Often, statically predetermined pieces of music dictate the tempo and rhythm of some skill exercises without any response to whether the player does well or badly. This compares to conventional on-rails shooter games that show a pre-rendered video sequence which cannot be affected by the player whose only task is to shoot each appearing target. A particular piece of music is, here, essentially nothing else but one particular tracking shot through a much bigger world. Music does not have to be so fixed and the player should not be merely required to keep up with it. The player can be involved in its creation: “Music videogames would benefit from an increasing level of player involvement in the music” (Williams, 2006, p.7). The diegesis must not be what a prefabricated piece dictates but should rather be considered as a domain of musical possibilities. The piece that is actually played reflects the reactions of the diegesis to player interaction. An approach to this begins with playing only those note events (or more generally, musical events) that the player actually hits, not those he was supposed to hit. In Rez, for instance, although it is visually an on-rails shooter, only a basic ostinato pattern (mainly percussion rhythms) is predefined and the bulk of musical activity is triggered by the player. Thus, each run produces a different musical output. Williams (2006) goes so far as to state that “it is a pleasure not just to watch, but also to listen to someone who knows how to play Rez really well, and in this respect Rez comes far closer to realising the potential of a music videogame” (p.7). In Rez, the stream of targets spans the domain of musical possibilities. The player’s freedom may still be restricted to a certain extent but this offers a clue for the developers to keep some control over the musical dramaturgy.
This marks the upper boundary of what is possible with precomposed
and preproduced material. Further interactivity requires more musical flexibility. To achieve this, two different paths can be taken:
• interaction by musically primitive events
• interaction with high-level structures and design principles.
Primitive events in music are single tones, drum beats, and even formally consistent groups of such primitives that do not constitute a musical figure in themselves (for instance, tone clusters and arpeggios). In some cases even motivic figures occur as primitive events: they are usually relatively short (or fast) and barely variable. The game mechanics provide the interface to trigger them and set event properties such as pitch, loudness, timbre, and cluster density. Ultimately, this leads to a close proximity to interactive virtual instrument concepts. It can be a virtual replica of a piano, violin, or any instrument that exists in reality. Because of the radically different interaction mode (mouse and keyboard) these usually fall behind their real-world prototypes regarding playability. To overcome this limitation, several controllers were developed that adapt form and handling of real instruments like the guitar controller of Guitar Hero, the Donkey Konga bongos, the turntable controller of DJ Hero (FreeStyleGames, 2009), and not to forget the big palette of MIDI instruments (keyboards, violins, flutes, drum pads and so forth). Roads (1996) gives an overview of such professional musical input devices. But real instruments do not necessarily have to be adapted. The technical possibilities allow far more interaction metaphors, as is demonstrated by the gesture-based Theremin (1924), the sensor-equipped Drum Pants (Hansen & Jensenius, 2006), and the hand and head tracking-based Tone Wall/Harmonic Field (Stockmann, Berndt, & Röber, 2008). Even in the absence of such specialized controllers, keyboard, mouse, and gamepad allow expressive musical input too. The challenge, therefore, is to find appropriate metaphors like
aiming and shooting targets, painting gestural curves, or nudging objects of different types in a two- or three-dimensional scene. Although the player triggers each event manually he does not have to be the only one playing. An accompaniment can be running autonomously in the background like that of a pianist that goes along with a singer or a rock band that sets the stage for a guitar solo. Often repetitive structures (ostinato, vamp, riff) are applied for this purpose. Such endlessly looping patterns can be tedious over a longer period. Variation techniques like those explained in the previous section can introduce more diversity. Alternatively, non-repetitive material can be applied. Precomposed music is of limited length, hence, it should be sufficiently long. Generated music, by contrast, is subject to no such restrictions. However, non-repetitive accompaniment comes with a further problem: it lacks musical predictability and thereby hampers a player’s smooth performance. This can be avoided. Repetitive schemes can change after a certain number of iterations (for example, play riff A four times, B eight times, and C four times). The changes can be prepared in such a way that the player is warned. A well-known example is the drum roll crescendo that erupts in a climactic crash. Furthermore, tonally close chord relations can relax strict harmonic repetition without losing the predictability of appropriate pitches. The player can freely express himself against this background. But should he really be allowed to do anything? If yes, should he also be allowed to perform badly and interfere with the music? In order not to discourage a proportion of the customers, lower difficulty settings can be offered. The freedom of interaction can be restricted to only those possibilities that yield pleasant satisfactory results. There can be a context sensitive component in the event generation just like a driving aid system that prevents some basic mistakes.
Pitch values can automatically be aligned to the current diatonic scale in order to harmonize. A time delay can be used to fit each event perfectly to the underlying meter and rhythmic structure. Advanced difficulty settings can be like driving without such safety systems. They are most interesting for trained players who want to experiment with a bigger range of possibilities.
Interaction with high-level structures is less direct. The characteristic feature of this approach is the autonomy of the music. It plays back by itself and reacts to user behaviour. While the previously described musical instruments are perceived rather as tool-like objects, in this approach the impression of a musical diegesis, a virtual world filled with entities that dwell there and react to and interact with the player, is much stronger. User interaction affects the arrangement of the musical material or the design principles which define the way the material is generated. In Amplitude (in standard gameplay mode) it is the arrangement. The songs are divided into multiple parallel tracks. A track represents a conceptual aspect of the song, like bass, vocals, synth, or percussion, and each track can be activated for a certain period by passing a skill test. Even this test derives from melodic and rhythmic properties of the material to be activated. The goal is to activate them all. The music in Amplitude is precomposed and, thus, relatively invariant. Each run leads ultimately to the same destination music. Other approaches generate the musical material just in time while it is performed. User interaction affects the parameterization of the generation process, which results in different output. For this constellation of autonomous generation and interaction Chapel (2003) coined the term Active Musical Instrument, an instrument for real-time performance and composition that actively interacts with the user: “The system actively proposes musical material in realtime, while the user’s actions [...] influence this ongoing musical output rather than have the task to initiate each sound” (p. 50).
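Returning briefly to the assisted, lower difficulty settings described earlier in this passage, the two safety mechanisms (aligning pitch values to the current diatonic scale and delaying events onto the metric grid) can be sketched as follows. This is a minimal illustration under stated assumptions: pitches are MIDI note numbers, the scale is a major scale given as semitone offsets, the grid is a fixed subdivision of the beat, and the function names `snap_to_scale`, `delay_to_grid`, and `assist` are hypothetical.

```python
import math
from typing import Sequence, Tuple

MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets from the tonic


def snap_to_scale(pitch: int, tonic: int = 0,
                  scale: Sequence[int] = MAJOR_SCALE) -> int:
    """Return the scale pitch nearest to a MIDI pitch (ties resolve downwards)."""
    for offset in (0, -1, 1, -2, 2):
        if (pitch + offset - tonic) % 12 in scale:
            return pitch + offset
    return pitch  # fallback for very sparse scales


def delay_to_grid(onset: float, tempo_bpm: float, subdivision: int = 2) -> float:
    """Delay an onset (in seconds) to the next grid point; never move it earlier."""
    step = 60.0 / tempo_bpm / subdivision
    # The small epsilon keeps onsets that already lie on the grid in place.
    return math.ceil(onset / step - 1e-9) * step


def assist(pitch: int, onset: float, tempo_bpm: float,
           enabled: bool = True) -> Tuple[int, float]:
    """Apply both corrections, or pass the event through on advanced settings."""
    if not enabled:
        return pitch, onset
    return snap_to_scale(pitch), delay_to_grid(onset, tempo_bpm)
```

Events are only ever delayed, never advanced, because a real-time system cannot move player input into the past. Switching `enabled` off corresponds to the advanced setting, where the player's raw input is used as it is.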
Chapel states that an Active Instrument can be constructed around any generative algorithm. The first such instrument was developed by Chadabe (1985). While music is created autonomously, the user controls expressive parameters like accentuation, tempo, and timbre. In Chapel’s case the music generation is based on fractal functions which can be edited by the user to create ever new melodic and polyphonic structures. Eldridge (2002) applies self-regulating homeostatic networks. Perturbation of the network causes musical activity, which offers a possible way to interact with the system. The musical toy Electroplankton for Nintendo DS offers several game modes (called plankton types) that build up a musical domain with complex structures, for example, a melodic progression graph (plankton type Luminaria) and a melodic interpreter of graphical curves (plankton type Tracy). These can be freely created and modified by the user. A highly interactive approach that incorporates precomposed material is the Morph Table presented by Brown, Wooller, & Kate (2007). Music consists of several tracks. Each track is represented by a physical cube that can be placed on the tabletop: this activates its playback. For each track, there are two different prototype riffs represented by the horizontal extremes of the tabletop (left and right border). Depending on the relative position of the cube in-between, the two riffs are recombined by the music morphing techniques which Wooller & Brown (2005) developed. The vertical positioning of the cube controls other effects. The tabletop interface further allows collaborative interaction with multiple users. This anticipates a promising future perspective for music video games. Music making has always been a collaborative activity that incorporates a social component and encourages community awareness, interaction between musicians, and mutual inspiration. What shall be the role of music games in this context? Do they set the stage for the performers or function as performers themselves? In contrast to conventional media players, which are only capable of playing back prefabricated pieces, music video games will offer a lot more.
They will be a platform for the user to experiment with and on which to realize his ideas. And they will be (indeed, they already are) an easy introduction to music for everyone, even non-musicians, who playfully learn musical principles to good and lasting effect.
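The Morph Table idea described above, where a cube's horizontal position recombines two prototype riffs, can be illustrated with a deliberately naive sketch. It is not the morphing technique of Wooller & Brown (2005), whose details are not given here; it simply picks each step from the left or right riff with a probability given by the position. The names `Riff` and `morph`, the step-wise riff representation, and the seeded random generator are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence

# One MIDI pitch per step, or None for a rest (an assumed representation).
Riff = Sequence[Optional[int]]


def morph(riff_left: Riff, riff_right: Riff, x: float,
          seed: int = 0) -> List[Optional[int]]:
    """Recombine two riffs of equal length.

    x is the relative horizontal position: 0.0 yields the left riff,
    1.0 the right riff, and values in between mix the two step by step.
    """
    if len(riff_left) != len(riff_right):
        raise ValueError("riffs must have the same number of steps")
    x = min(1.0, max(0.0, x))   # clamp to the tabletop borders
    rng = random.Random(seed)   # seeded so a given position is reproducible
    return [right if rng.random() < x else left
            for left, right in zip(riff_left, riff_right)]


left = [60, None, 64, 67, 60, None, 64, 67]
right = [62, 65, None, 69, 62, 65, None, 69]
halfway = morph(left, right, 0.5)
```

A step-wise coin flip ignores melodic contour and rhythm, so it should be read only as a stand-in that shows how a single position parameter can steer the recombination of two precomposed riffs.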
INTERACTING WITH MUSIC: A CONCLUSION
Music as a diegetic occurrence in interactive media cannot be considered apart from interactivity. But music being the object of interaction is a challenging idea. It is worth taking up this challenge. The growing popularity of music video games over the last few years encourages us to explore the boundaries of interactivity further and to surmount them. Music does not have to be static. It can vary in its expressivity regarding the way it is performed. Users can interact with virtual performers. These do not have to play fixed compositions. Let them ornament their melodies, vary them, or even improvise on them. Why not just generate new music in realtime while the game is played? Let the players exert an influence on this. Or enable them to playfully arrange or create their own music. Few of these possibilities have been applied in practice up to now. Music is a living art that should be more than simply reproduced; it should be experienced anew each time. It is a temporal art and its transience is an inherent component. This chapter has shown how to raise music in interactive media above the status of mere reproduction. As a domain of interactivity, it invites the users to explore, to create, and to have new musical experiences.
REFERENCES
AM3D (2009). AM3D [Computer software]. AM3D A/S (Developer). Aalborg, Denmark.
Aav, S. (2005). Adaptive music system for DirectSound. Unpublished master’s thesis. University of Linköping, Sweden.
Adler, S. (2002). The study of orchestration (3rd ed.). New York: Norton & Company.
Adorno, T. W., & Eisler, H. (1947). Composing for the films. New York: Oxford University Press.
Arons, B. (1992, July). A review of the cocktail party effect. Journal of the American Voice I/O Society, 12, 35-50.
Berndt, A. (2008). Liturgie für Bläser (2nd ed.). Halberstadt, Germany: Musikverlag Bruno Uetz.
Berndt, A. (2009). Musical nonlinearity in interactive narrative environments. In G. Scavone, V. Verfaille & A. da Silva (Eds.), Proceedings of the International Computer Music Conference (ICMC) (pp. 355-358). Montreal, Canada: International Computer Music Association, McGill University.
Berndt, A., & Hähnel, T. (2009). Expressive musical timing. In Proceedings of Audio Mostly 2009: 4th Conference on Interaction with Sound (pp. 9-16). Glasgow, Scotland: Glasgow Caledonian University, Interactive Institute/Sonic Studio Piteå.
Berndt, A., Hartmann, K., Röber, N., & Masuch, M. (2006). Composition and arrangement techniques for music in interactive immersive environments. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 53-59). Piteå, Sweden: Interactive Institute/Sonic Studio Piteå.
Berndt, A., & Theisel, H. (2008). Adaptive musical expression from automatic real-time orchestration and performance. In Spierling, U., & Szilas, N. (Eds.), Interactive Digital Storytelling (ICIDS) 2008 (pp. 132-143). Erfurt, Germany: Springer. doi:10.1007/978-3-540-89454-4_20
Brown, A. R., Wooller, R. W., & Kate, T. (2007). The morphing table: A collaborative interface for musical interaction. In A. Riddel & A. Thorogood (Eds.), Proceedings of the Australasian Computer Music Conference (pp. 34-39). Canberra, Australia.
Chadabe, J. (1985). Interactive music composition and performance system. U.S. Patent No. 4,526,078. Washington, DC: U.S. Patent and Trademark Office.
Chapel, R. H. (2003). Real-time algorithmic music systems from fractals and chaotic functions: Towards an active musical instrument. Unpublished doctoral dissertation. University Pompeu Fabra, Barcelona, Spain.
Collins, K., Tessler, H., Harrigan, K., Dixon, M. J., & Fugelsang, J. (2011). Sound in electronic gambling machines: A review of the literature and its relevance to game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Ebcioglu, K. (1992). An expert system for harmonizing chorales in the style of J. S. Bach. In Balaban, M., Ebcioglu, K., & Laske, O. (Eds.), Understanding music with AI: Perspectives on music cognition (pp. 294-334). Cambridge, MA: MIT Press.
Ekman, I. (2009). Modelling the emotional listener: Making psychological processes audible. In Proceedings of Audio Mostly 2009: 4th Conference on Interaction with Sound (pp. 33-40). Glasgow, Scotland: Glasgow Caledonian University, Interactive Institute/Sonic Studio Piteå.
Eldridge, A. C. (2002). Adaptive systems music: Musical structures from algorithmic process. In C. Soddu (Ed.), Proceedings of the 6th Generative Art Conference. Milan, Italy: Politecnico di Milano University.
Fastl, H., & Zwicker, E. (2007). Psychoacoustics: Facts and models (3rd ed., Vol. 22). Berlin, Heidelberg: Springer.
Firelight (2009). FMOD Ex. v4.28 [Computer software]. Victoria, Australia: Firelight Technologies.
Fitterer, D. (2008). Audiosurf: Ride Your Music [Computer game]. Washington, DC: Valve.
Flossmann, S., Grachten, M., & Widmer, G. (2009). Expressive performance rendering: Introducing performance context. In Proceedings of the 6th Sound and Music Computing Conference (SMC). Porto, Portugal: Universidade do Porto.
FreeStyleGames (2009). DJ Hero [Computer game]. FreeStyleGames (Developer), Activision.
Friberg, A., Bresin, R., & Sundberg, J. (2006). Overview of the KTH Rule System for musical performance. Advances in Cognitive Psychology. Special Issue on Music Performance, 2(2/3), 145-161.
Galloway, A. R. (2006). Gaming: Essays on algorithmic culture. Electronic Mediations (Vol. 18). Minneapolis: University of Minnesota Press.
Hansen, S. H., & Jensenius, A. R. (2006). The Drum Pants. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 60-63). Piteå, Sweden: Interactive Institute/Sonic Studio.
Harmonix (2003). Amplitude [Computer game]. Harmonix (Developer), Sony.
Harmonix (2006-2009). Guitar Hero series [Computer games]. Harmonix, Neversoft, Vicarious Visions, Budcat Creations, RedOctane (Developers), Activision.
Herber, N. (2006). The Composition-Instrument: Musical emergence and interaction. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 53-59). Piteå, Sweden: Interactive Institute/Sonic Studio Piteå.
Hiller, L. A., & Isaacson, L. M. (1959). Experimental music: Composing with an electronic computer. New York: McGraw Hill.
Hitchcock, A. (1956). The Man Who Knew Too Much [Motion picture]. Hollywood, CA: Paramount.
Hörnel, D. (2000). Lernen musikalischer Strukturen und Stile mit neuronalen Netzen. Karlsruhe, Germany: Shaker.
Hörnel, D., & Menzel, W. (1999). Learning musical structure and style with neural networks. Computer Music Journal, 22(4), 44-62. doi:10.2307/3680893
Iwai, T. (2005). Electroplankton [Computer game]. Indies Zero (Developer), Nintendo.
Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Kirnberger, J. P. (1767). Der allezeit fertige Polonaisen und Menuetten Komponist. Berlin, Germany: G.L. Winter.
Klinger, R., & Rudolph, G. (2006). Evolutionary composition of music with learned melody evaluation. In N. Mastorakis & A. Cecchi (Eds.), Proceedings of the 5th WSEAS International Conference on Computational Intelligence, Man-Machine Systems and Cybernetics (pp. 234-239). Venice, Italy: World Scientific and Engineering Academy and Society.
Konami (1998). Dance Dance Revolution [Computer game]. Konami, Disney, Keen, Nintendo.
Kungel, R. (2004). Filmmusik für Filmemacher: Die richtige Musik zum besseren Film. Reil, Germany: Mediabook-Verlag.
Lissa, Z. (1965). Ästhetik der Filmmusik. Leipzig, Germany: Henschel.
Livingstone, S. R. (2008). Changing musical emotion through score and performance with a compositional rule system. Unpublished doctoral dissertation. The University of Queensland, Brisbane, Australia.
Loki & Creative (2009). OpenAL 1.1 [Computer software]. Loki Software, Creative Technology.
Löthe, M. (2003). Ein wissensbasiertes Verfahren zur Komposition von frühklassischen Menuetten. Unpublished doctoral dissertation. University of Stuttgart, Germany.
LucasArts (1997). Monkey Island 3: The Curse of Monkey Island [Computer game]. LucasArts.
Manz, J., & Winter, J. (Eds.). (1976). Baukastensätze zu Weisen des Evangelischen Kirchengesangbuches. Berlin: Evangelische Verlagsanstalt.
Mazzola, G., Göller, S., & Müller, S. (2002). The topos of music: Geometric logic of concepts, theory, and performance. Zurich: Birkhäuser Verlag.
Meyer, J. (2009). Acoustics and the performance of music: Manual for acousticians, audio engineers, musicians, architects and musical instrument makers (5th ed.). New York: Springer.
Microsoft (2009). DirectX 11 [Computer software]. Microsoft Corporation.
Miranda, E. R., & Biles, J. A. (Eds.). (2007). Evolutionary computer music (1st ed.). USA: Springer. doi:10.1007/978-1-84628-600-1
Mozart, W. A. (1787). Musikalisches Würfelspiel: Anleitung so viel Walzer oder Schleifer mit zwei Würfeln zu componieren ohne musikalisch zu seyn noch von der Composition etwas zu verstehen. Köchel Catalog of Mozart’s Work KV1 Appendix 294d or KV6 516f.
Namco (2003). Donkey Konga [Computer game]. Namco (Developer), Nintendo.
NanaOn-Sha (1996). PaRappa the Rapper [Computer game]. NanaOn-Sha (Developer), Sony.
NanaOn-Sha (1999). Vib-Ribbon [Computer game]. NanaOn-Sha (Developer), Sony.
Pachet, F., & Roy, P. (2001). Musical harmonization with constraints: A survey. Constraints Journal.
Papadopoulos, G., & Wiggins, G. (1999). AI methods for algorithmic composition: A survey, a critical view and future prospects. In AISB Symposium on Musical Creativity. Edinburgh, Scotland.
Pozzati, G. (2009). Infinite suite: Computers and musical form. In G. Scavone, V. Verfaille & A. da Silva (Eds.), Proceedings of the International Computer Music Conference (ICMC) (pp. 319-322). Montreal, Canada: International Computer Music Association, McGill University.
Roads, C. (1996). The computer music tutorial. Cambridge, MA: MIT Press.
Röber, N. (2008). Interacting with sound: Explorations beyond the frontiers of 3D virtual auditory environments. Munich, Germany: Dr. Hut.
Röber, N., Kaminski, U., & Masuch, M. (2007). Ray acoustics using computer graphics technology. In Proceedings of the 10th International Conference on Digital Audio Effects (DAFx-07) (pp. 117-124). Bordeaux, France: LaBRI University Bordeaux.
Schottstaedt, W. (1989). Automatic counterpoint. In Mathews, M., & Pierce, J. (Eds.), Current directions in computer music research. Cambridge, MA: MIT Press.
Sega (2001). Rez [Computer game]. Sega.
Sevsay, E. (2005). Handbuch der Instrumentationspraxis (1st ed.). Kassel, Germany: Bärenreiter.
Shultz, P. (2008). Music theory in music games. In Collins, K. (Ed.), From Pac-Man to pop music: Interactive audio in games and new media (pp. 177-188). Hampshire, UK: Ashgate.
Sierra (1993). Gabriel Knight: Sins of the Fathers [Computer game]. Sierra Entertainment.
Stenzel, M. (2005). Automatische Arrangiertechniken für affektive Sound-Engines von Computerspielen. Unpublished diploma thesis. Otto-von-Guericke University, Department of Simulation and Graphics, Magdeburg, Germany.
Stockmann, L. (2007). Designing an audio API for mobile platforms. Internship report. Magdeburg, Germany: Otto-von-Guericke University.
Stockmann, L., Berndt, A., & Röber, N. (2008). A musical instrument based on interactive sonification techniques. In Proceedings of Audio Mostly 2008: 3rd Conference on Interaction with Sound (pp. 72-79). Piteå, Sweden: Interactive Institute/Sonic Studio Piteå.
Taube, H. K. (2004). Notes from the metalevel: Introduction to algorithmic music composition. London, UK: Taylor & Francis.
Theremin, L. S. (1924). Method of and apparatus for the generation of sounds. U.S. Patent No. 73,529. Washington, DC: U.S. Patent and Trademark Office.
Tobler, H. (2004). CRML: Implementierung eines adaptiven Audiosystems. Unpublished master’s thesis. Fachhochschule Hagenberg, Hagenberg, Austria.
Verbiest, N., Cornelis, C., & Saeys, Y. (2009). Valued constraint satisfaction problems applied to functional harmony. In Proceedings of IFSA World Congress EUSFLAT Conference (pp. 925-930). Lisbon, Portugal: International Fuzzy Systems Association, European Society for Fuzzy Logic and Technology.
Williams, L. (2006). Music videogames: The inception, progression and future of the music videogame. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 5-8). Piteå, Sweden: Interactive Institute, Sonic Studio Piteå.
Wingstedt, J. (2008). Making music mean: On functions of, and knowledge about, narrative music in multimedia. Unpublished doctoral dissertation. Luleå University of Technology, Sweden.
Wooller, R. W., & Brown, A. R. (2005). Investigating morphing algorithms for generative music. In Proceedings of Third Iteration: Third International Conference on Generative Systems in the Electronic Arts. Melbourne, Australia.
KEY TERMS AND DEFINITIONS
Diegesis: Traditionally it is a fictional story world. In computer games, or more generally in interactive media, it is the domain the user ultimately interacts with.
Diegetic Music: Music that is performed within the diegesis.
Extra-Diegetic: The terms extra-diegetic and non-diegetic refer to elements outside of the diegesis. Extra-diegetic is commonly used for elements of the next upper layer, the narrator’s world or the game engine, for instance. Non-diegetic, by contrast, refers to all upper layers up to the real world.
Music Video Games: Computer games with a strong focus on music-related interaction metaphors. For playability, musical aspects are often, if not usually, transformed into visual representatives.
Musical Diegesis: In music video games, the user interacts with musical data. These constitute the domain of musical possibilities, the musical diegesis.
Nonlinear Music: The musical progress incorporates interactive and/or non-deterministic influences.
ENDNOTES
1 Although this book prefers the generic term computer games, here, I use the term music video game both to emphasize the musical interaction and because it is the more commonly used term for this genre.
2 Building block music: translated from the German term “Baukastenmusik” (Manz & Winter, 1976).
3 Likewise, non-diegetic film music does not and cannot mediate the complete visual diegesis.
Section 2
Frameworks & Models
Chapter 5
Time for New Terminology? Diegetic and Non-Diegetic Sounds in Computer Games Revisited
Kristine Jørgensen
University of Bergen, Norway
ABSTRACT
This chapter is a critical discussion of the use of the concepts diegetic and non-diegetic in connection with computer game sound. These terms are problematic because they neither take into account the functional aspects of sound nor indicate how gameworlds differ from traditional fictional worlds. The aims of the chapter are to re-evaluate earlier attempts at adapting this terminology to games and to present an alternative model of conceptualizing the spatial properties of game sound with respect to the gameworld.
DOI: 10.4018/978-1-61692-828-5.ch005
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION
Two concepts from narrative theory that often appear in discussions about game sound are diegetic and non-diegetic (Collins, 2007, 2008; Ekman, 2005; Grimshaw, 2008; Grimshaw & Schott, 2007; Jørgensen, 2007b, 2008; Stockburger, 2003; Whalen, 2004). The terms are used in film theory to separate elements that can be said to be part of the depicted fictional world from elements that the fictional characters cannot see or hear and which should be considered non-existent in the fictional world (Bordwell, 1986; Bordwell & Thompson, 1997). According to this approach, dialogue between two characters is seen as diegetic, while background score music is seen as non-diegetic. In connection with game sound, a likely adaptation of these concepts would describe the response “More work?” from an orc peon unit in the realtime strategy game Warcraft 3 (Blizzard, 2002) as an example of a diegetic sound since it is spoken by a character within the gameworld. Music that signals approaching enemies in the role-playing game Dragon Age: Origins (Bioware, 2009) would according to this view be an example of non-diegetic sound since the music is not being played from a source within the game universe. However, when analyzing the examples more closely, we see that using these terms in computer games is confusing and at best inaccurate. As a
response to a player command, the “More work?” question has an ambiguous status in relation to the gameworld: If we ask ourselves who the peon is talking to, it appears to address the player, who is not represented as a character in the gameworld, but manages the troops and base from the outside of the gameworld. The warning music heard in the role-playing game is also ambiguous. Although there is nothing to suggest that the music is being played by an orchestra in the wilderness, there is no doubt that the music influences the players’ tactical decisions and therefore has direct consequences for the player-characters’ actions and the progression of the game. The confusion arises because game sound has a double status: it provides usability information to the player at the same time as it has been stylized to fit the depicted fictional world. It works as support for gameplay, while also providing a sense of presence in the gameworld (Jørgensen, 2007a, 2009; Nacke & Grimshaw, 2011). From this point of view, diegetic and non-diegetic sounds tend to blend systematically in games, thereby creating additional levels of communication compared to the traditional diegetic versus non-diegetic divide. Although sound may be categorized and discussed in several ways, the diegetic versus non-diegetic divide may be especially attractive for describing modern computer games since they are set in universes that are separate from ours and that, on the surface, resemble the fictional universes of film and literature. This makes the terminology seem like an illustrative approach for describing auditory properties with respect to the represented universe in games. The concepts enable us to separate what is perceived as internal to that universe from what is perceived as external to it.
However, as this chapter will argue, the concepts of diegetic and non-diegetic were developed with traditional media in mind, and are therefore confusing and misleading when attempts are made to transfer them uncritically to computer games. First, the participatory role of the player is not accounted for in this theory, which means that the functional aspects of game sound disappear when diegetic and non-diegetic are applied to game sound. Also, gameworlds cannot be appropriately described by these terms since they are designed for different purposes than traditional fictional worlds. Since gameworlds invite users to enter their domains as players, they are qualitatively different from other fictional worlds, and this makes the traditional diegetic versus non-diegetic divide problematic when applied to computer games. While the aim of the chapter is to evaluate the use of the two concepts in relation to game sound, the chapter will also be a revision of my earlier theory on transdiegetic sounds (Jørgensen, 2008b). I will discuss my own and other attempts at adapting the concepts to game sound, based on the original meaning and uses of diegesis, and present an alternative way of conceptualizing the phenomena in relation to game sound. The main argument of this chapter rests on two principles. One is that the participatory nature of games allows the players a dual position where they are located on the outside of the gameworld but with power to reach into it. The other is that gameworlds differ from traditional fictional worlds in fundamental ways as they are worlds intended for play. This difference requires game sound to be evaluated on terms other than those used for analyzing film sound. A short reader’s guide is appropriate. The chapter is organized according to principles of clarity where an overview of earlier theory creates the basis of the argument and, in order to get the most out of the chapter, it should be read from beginning to end rather than being dipped into. I will introduce the chapter with a discussion of the origin and application of diegetic and non-diegetic in traditional media before going on to present other attempts at categorizing game sound (Collins, 2007, 2008; Huiberts & van Tol, 2008; Stockburger, 2003; Whalen, 2004).
Next, the chapter will review different attempts to adapt diegetic terminology to games (Galloway, 2006) and game sound (Ekman, 2005; Grimshaw, 2008; Jørgensen, 2007b). I will then discuss how gameworlds separate themselves from traditional fictional worlds and how this has consequences for the way we interact with them (Aarseth, 2005, 2008; Klevjer, 2007), and consequently for the application of diegetic and non-diegetic. The last section of the chapter will present an alternative model for analyzing game sound in terms of spatial integration. Throughout the chapter, I will also use data from research interviews with empirical players where this is appropriate. The data concern player interpretations of so-called transdiegetic features in computer games, and support the idea that gameworlds work on premises other than those of traditional fictional worlds. Although this chapter focuses on the auditory aspect of games in particular, it should be noted that the discussion about the relevance of diegetic and non-diegetic features does not concern auditory features alone. However, sound is particularly interesting for several reasons. Since sound is neither tangible nor visible, and has a temporal quality, it has the ability to remain non-intrusive even when it breaks the borders of the gameworld. The ability to seamlessly integrate with the gameworld gives it the opportunity to challenge the relationship between diegetic and non-diegetic in a way that visual information cannot.
BACKGROUND
Diegetic vs. Non-Diegetic Sound
The term diegetic originally stems from The Republic, where Plato distinguishes between two narrative modes that he calls diegesis and mimesis. Diegesis, or pure narrative, is when the poet “himself is a speaker and does not even attempt to suggest to us that anyone but himself is speaking”; while mimesis, or imitation, is when the poet “delivers a speech as if he were someone else” (Plato in Genette, 1983, p. 162). According to film scholar David Bordwell (1986), the term diegesis was revived in the 1950s to describe the “recounted story” of a film, and today it has become the accepted terminology for “the fictional world of the story” (p. 16). According to this terminology, diegetic sound is represented as “sound which has a source in the story world”, while non-diegetic sound is “represented as coming from a source outside the story world” (Bordwell & Thompson, 1997, p. 330). Game scholars who use diegetic and non-diegetic when describing game sound tend to take their point of departure from this newer, film theory understanding of diegesis, and extend the meaning of the “fictional world of the story” to the universe of the game. As mentioned, this is confusing since it implies that the gameworld is a storyworld, and is misleading because game sound works for different purposes compared to film sound. These points will be in focus in the following discussion that critically evaluates the use of diegetic and non-diegetic in relation to computer game sound. Of course, the debate about the relationship between diegetic and non-diegetic features is not unique to game studies. Film theory, too, recognizes the limited ability of these concepts to describe sound precisely. While David Bordwell and Kristin Thompson (1997) define non-diegetic sound as “represented as coming from a source outside the story world” (p. 330), Edward Branigan separates non-diegetic features into extra-fictional and non-diegetic. He argues that when a piece of background film music is accompanying the credits of a film, it should be interpreted as extra-fictional, but when it accompanies a series of shots from a nightclub, and is thus presented as typical of an evening at that location, it should be interpreted as non-diegetic (1992, p. 96). In this view, Branigan claims that non-diegetic sound is related to the diegesis, but does not correspond to the fictional characters’ experience of it (1992, p.
96), while extra-fictional sound exists outside the diegesis and is required to talk about the diegesis as fictional (1992, p. 88). Although not accounting for the participatory nature of games, Branigan’s view
of non-diegetic is more sympathetic towards how, for instance, score music works in games, since there is some kind of bond between the sound and what happens within the diegesis. When discussing film music, Michel Chion also points out that the non-diegetic category is complicated. A central reason, in his view, is that so-called diegetic music, like non-diegetic music, may have a commentary function meant to help the interpretation of what is going on in the film. Chion’s own example is Siodmak’s Abschied, in which the protagonist’s emotional states are being punctuated by the music of his pianist neighbor, thereby questioning the non-diegetic state of the music. Because of such ambiguous cases, Chion argues that the reference to diegetic and non-diegetic music is misleading, and uses pit music and screen music instead. While pit music “accompanies the image from a non-diegetic position, outside the time and space of action”, screen music refers to “music arising from a source located directly or indirectly in the space of time” (Chion, 1994, p. 80). From this approach, screen music could also be used to describe the computer game version of leitmotifs (Gorbman, 1997, pp. 3, 26-29), in which music with an apparent non-diegetic source warns the player about dangers. The relationship between diegetic and non-diegetic is not a simple one in literary theory either. One example of this is provided by Gerard Genette, who points out that the diegetic and non-diegetic levels often blend together in the act of narration. He uses the term metalepsis to describe any transition from one diegetic level to another. While the classics used the term to refer to “any intrusion by the non-diegetic narrator or narratee into the diegetic universe” (Genette, 1983, pp. 234-235), Genette extends the term and calls all kinds of narrative transitions of elements between distinct levels of the literary diegesis “narrative metalepsis”.
In literature, these transitions range from simple rhetorical figures, where the narrator addresses the reader, to extremes in which a man is killed by a character in the novel he is reading.
However, being closely connected to the act of narration—how a story is told—metalepsis only serves as a comparative illustration for the transboundary movement that happens in computer games. These methods of categorization show that the relationship between diegetic and non-diegetic sound is not without debate in film theory and literary theory but, while the concepts work as a point of departure and as a common ground for understanding the narrative levels of traditional fiction, they create confusion in connection with computer game sound because of the participatory nature of games and gameworlds (Collins, 2008, p. 180; Jørgensen, 2006, p. 48, 2007b, p. 106). In films and computer games equally, sound cues the media user’s understanding of the environment, direction, spatiality, temporality, objects and events. However, film sound is limited to informing the audience as to how to interpret what is going on in an inaccessible world while game sound provides information relevant for understanding how to interact with the game system and behave in the virtual environment that is the gameworld (Jørgensen, 2008). This means that game sound has a double status in which it provides usability information to the player at the same time as it has been stylized to fit the depicted universe. This may create confusion with respect to the role of the sound since it appears to have been placed in the game from the point of view of creating a sense of presence and physicality to the game universe while it actually works as a support for gameplay. A comparison serves as illustration. When the players of The Elder Scrolls III: Morrowind (Bethesda, 2002) hear the music change when navigating through a forest, they know that an enemy is approaching, and may act accordingly. 
However, since this music has no source in the gameworld, the player character should not be able to hear it, but since the player does hear it and may act upon it, the character also seems to act as if it knows enemies are approaching even though it does not yet see them coming. In this sense, sound
Time for New Terminology?
that appears to be non-diegetic affects diegetic events, thereby disrupting the traditional meaning of diegetic and non-diegetic sound (Jørgensen, 2007b). In Pulp Fiction (Tarantino, 1994), on the other hand, one of the characters is sitting in his car accompanied by what at first appears to be non-diegetic music. Suddenly he starts whistling along with the music. In this case, the audience is not led to believe that the character hears music that is not present; instead, they re-interpret the music not as non-diegetic, but as diegetic music played on the car radio. On the surface, the situations from the game and the film may appear similar, but in terms of how the music affects its context, there is a huge difference between the film music and the game music. In the case of the film music, we revise our interpretation when we realize that the fictional character actually can hear it (Branigan, 1992, p. 88). There is therefore never any ambiguity connected to the origin of the music, and we are never led to believe that the character hears music that is not present in his world. The game music, on the other hand, has a functional value related to the game system: it provides a warning to the players about a change in game state, namely that an enemy is aware of their presence and about to attack. In this sense, the role of game music is to enable the player to use its informative value to make progress in the game. In this respect, film music and game music have fundamentally different roles. While film music provides clues about moods, upcoming events, and how to interpret specific scenes, game music works as a user interface that provides usability information that helps players progress in the game. Also, while non-diegetic film music never allows the audience to change the protagonists’ behavior or to save them from certain death, game music can enable the player to guide their avatar away from danger or to make them draw their sword even before the enemy has appeared.
This is, of course, a direct result of the difference between players and audiences and it puts emphasis on the fact that the concepts of diegetic and non-diegetic have not been designed to take this difference into account, and are therefore not sufficient for analyzing sound in computer games.
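The functional role described above, where score music changes because the game state changes rather than because anything in the gameworld produces it, can be illustrated with a minimal sketch. It is not taken from any actual game: the state names, cue names and the `MusicDirector` class are all hypothetical and serve only to show that the cue is selected by the game system.

```python
from dataclasses import dataclass
from enum import Enum


class GameState(Enum):
    EXPLORING = "exploring"
    ENEMY_ALERTED = "enemy_alerted"  # an enemy is aware of the player


# Hypothetical cue names; a real game would use its own assets and audio tools.
MUSIC_FOR_STATE = {
    GameState.EXPLORING: "ambient_theme",
    GameState.ENEMY_ALERTED: "battle_theme",
}


@dataclass
class MusicDirector:
    """Chooses the score cue from the game state, not from an in-world source."""

    current_cue: str = MUSIC_FOR_STATE[GameState.EXPLORING]

    def on_state_change(self, new_state: GameState) -> bool:
        """Switch cue if needed; return True when the audible music changed."""
        new_cue = MUSIC_FOR_STATE[new_state]
        changed = new_cue != self.current_cue
        self.current_cue = new_cue
        return changed


director = MusicDirector()
# The enemy has noticed the avatar but is not yet visible on screen.
music_changed = director.on_state_change(GameState.ENEMY_ALERTED)
```

In this sketch the player can act on `music_changed` before any enemy is seen, which is exactly the information that an audience of a film cannot act upon.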
Categorization of Game Sound

There have been different attempts to categorize game sound and, in this section, I will present some of the most fruitful endeavors. Although only a few scholars base their descriptions on whether or not sounds are diegetic and non-diegetic, many refer to the concepts and may in some cases use them as unambiguous ways to look at sound. This section will provide a short overview of such scholarly attempts before the next section goes on to discuss specific attempts to adapt diegetic and non-diegetic concepts to game sound. Axel Stockburger (2003) was perhaps the first academic who came up with a method of categorization for game sound. He defines a number of “sound objects” according to their use in the game environment, and separates between score sound objects, zone sound objects, interface sound objects, speech sound objects, and a range of different effect sound objects connected variously to the avatar, to objects usable by the avatar, to other game characters, to other entities, and to events. Although Stockburger emphasizes the importance of understanding the functional role of sound, his categories do not cover this. Instead, his model describes sound according to what kind of object it is connected to in the game engine. He also uses diegetic and non-diegetic as matter-of-fact and straightforward concepts and does not discuss how they should be interpreted in terms of game sound. One who does argue that diegetic concepts can be usefully applied to game sound is Zack Whalen. He states that non-diegetic game music has two functions: to “expand the concept of a game’s fictional world or to draw the player forward through the sequence of gameplay” (2004). In other words, it can either support the sense of spatiality and presence in the game environment, or support the player’s progression through the
game. His approach is interesting as it takes into account the fact that game music provides information relevant for gameplay, but by being tied to the traditional meaning of non-diegetic it is as misleading as other adaptations of the concepts. A scholar who does see the diegetic/non-diegetic division as complicated is Karen Collins (2007, 2008). She points out that the division between diegetic and non-diegetic sound is problematic since the player is engaging in the on-screen sound playback process directly (2008, p. 125). Her separation between interactive and adaptive sound is based on functionality. Whereas interactive sound refers to sound events occurring in response to player action, adaptive sound reacts to events in the environment (2007, 2008, p. 4). In this respect, sound is understood as a dynamic feature closely related to events, at the same time as it takes into account the agency of the player. Huiberts & Van Tol (2008) also point out that using diegetic and non-diegetic is complicated in connection with game sound, since interactivity allows non-diegetic sounds to affect diegetic events. They still decide to use the terms because they see them as established within game studies. By putting diegetic and non-diegetic in context with setting and activity, their IEZA framework takes into account the interactive aspects of game sound, but does not take into consideration that gameworlds are designed for different purposes compared to diegeses, and that they therefore influence sound in a different way. There are also other models for describing sound in this anthology. Wilhelmsson & Wallén’s (2011) general framework for sound design and analysis combines theories of listening with both the IEZA framework and Murch’s description of five layers between “encoded” and “embodied” sound in film ranging from speech to music via effect sounds. However, like many others, they take the fruitfulness of diegetic and non-diegetic for granted.
In his discussion of diegetic music, Berndt (2011) claims that what he calls visualized music must be considered diegetic. This is the
visualization of structural features of a musical composition, exemplified by the stylized visualization of patterns found in the user interface of music games such as Rock Band (Harmonix, 2007) and Electroplankton (Indies Zero, 2006). From the point of departure of this chapter, this view of diegetic is problematic, since it distances itself from the original use of diegesis and thereby creates confusion. Milena Droumeva, on the other hand, outlines a framework of game sound according to “realism” in terms of fidelity and verisimilitude, and connects these to acoustic ecology and Barry Truax’ idea of an acoustic community that includes physical world sounds that have an impact upon gameplay. Examples of this are the acoustic soundscape of group play, and online conferencing (“live chat”) (Droumeva, 2011). From this perspective, she argues that the use of diegetic and non-diegetic terminology is limited because it fails to acknowledge the importance of these kinds of sounds. Although a valid point when discussing the general soundscape of the gaming activity, this point has only limited value to the argument of this chapter, since it is restricted to how game internal sound works with respect to the gameworld, and only briefly mentions externally produced sounds.
Diegetic Theories of Game Sound

Some of the more critical attempts at adapting diegetic and non-diegetic to games have resulted in analyses that show that game sound has more significant layers of meaning than can be explained by using the terminology above. In this section, I will evaluate the most comprehensive of these adaptations and discuss their strengths and weaknesses. However, even though the following accounts are attentive to how the concepts of diegetic and non-diegetic when used for describing games differ from how they are used for films, emphasizing this difference may lead to a situation in which one keeps leaning too heavily on a terminology that is meant to describe film sound, without being able to free oneself to establish a new model designed to take the particular characteristics of game sound into account. A game scholar who partly succeeds in using diegetic and non-diegetic in his description of games is Alexander Galloway (2006). Focusing on games as activities, he couples the terminology with his own terminology of whether it is the player (operator) or game system (machine) that performs the act. His model describes all actions as executed either inside the “world of gameplay” or outside of it and whether it is the player or the game system that takes a specific action. In this way, he describes all actions from the player firing a gun to configuring the options menu, from the movements of non-playing characters to the spawning of power-ups. While the categories themselves are not crucial to this chapter, Galloway’s perspective is important. He emphasizes the fact that games are activities and that they must be described as such. He also states that when diegetic and non-diegetic are used in connection with games the meaning of the terms changes (Galloway, 2006). However, even though he points this important fact out, Galloway’s use of these terms is somewhat confusing since he, as I do with the term transdiegetic, tries to change the concepts from describing the relative positioning of features in space to describing actions. The model is worth mentioning, however, since the action-oriented perspective supports sound by focusing on temporality: that is, like sound, action is time-based. Galloway’s approach to diegesis as a “world of gameplay” is also closely related to Mark Grimshaw’s radical modification of what should count as diegetic sound in computer games.
He extends the idea of diegetic sound compared to film theory, and states that in computer games, diegetic sound is “defined as the sound that emanates from the gameplay environment, objects and characters and that is defined by that environment, those objects and characters”, and that it must “derive from some entity of the game during play” (Grimshaw,
2008, p. 224). In this respect, sounds do not have to be placed within the game environment in a way that we recognize from the physical world. In other words, as long as the referent is diegetic, the signal does not need to be. There is no need to have a character in the gameworld that produces the sound for it to count as diegetic. For Grimshaw, sounds are diegetic as long as they relate to actions and events in the gameworld. He exemplifies by pointing out that sounds signaling the entrance or exit of players in a multiplayer game should be considered diegetic since they concern entities in the game environment and affect their behavior. Based on this understanding, Grimshaw elaborates that diegetic game sounds are not limited to sounds that exist in the gameworld but that we also need to take into account all sounds that provide information relevant for understanding the gameworld. In effect, this would also include the traditional background music that signals an enemy about to attack in The Elder Scrolls III: Morrowind, and disembodied voiceovers in Warcraft 3. By introducing additional new concepts that specify whether a sound is heard by a specific player (ideodiegetic sounds), and whether such a sound results from the player’s haptic input or not (kinediegetic versus exodiegetic sounds) (Grimshaw & Schott, 2007; Grimshaw, 2008), Grimshaw creates a game-specific terminology that recognizes its theoretical relationship to the diegetic or non-diegetic divide. A concept that is particularly interesting is what he calls telediegetic sounds. Connected to multiplayer situations, these are sounds produced by one player and of consequence for a second player who does not hear that sound. While it may be seen as a paradox to call this information auditory when it is in fact the action of the first player that affects the second player, the concept has interesting implications. 
If we detach the concept from the idea that it must be heard by a first player, it may be extended to all situations in which players appear to react to a sound that they do not hear, such as is the case when players apparently react to the traditionally
speaking non-diegetic music of approaching enemies. However, even though Grimshaw’s theory emphasizes all sounds that have relevance for player actions in the gameworld, it is confusing that he still insists on using the concept diegetic also for sounds that appear to have no source in the game environment and that the avatar should not be able to hear. In any respect, Grimshaw’s extension of what counts as diegetic, and his focus on the player in relation to the concept, are strong arguments for exchanging the existing terminology with new. In my Ph.D. research (Jørgensen, 2007a, 2009), I developed a model of categorization that took into consideration functionality with respect to usability and type of information, location with respect to the gameworld, and referentiality with respect to the relationship between sound signal and the event it refers to (2007a, pp. 84-87). In Jørgensen (2008), the model was further developed to include what generates a specific sound. However, in describing the location of sound with respect to the gameworld, these models both included references to the diegetic/non-diegetic divide by the use of the neologism transdiegetic sounds (Jørgensen, 2007b). This approach described sound as transdiegetic by way of transcending the border between diegetic and non-diegetic: Diegetic sounds may address non-diegetic entities, while non-diegetic sounds may communicate to entities within the diegetic world. Such sounds have an important functional value in computer games by being an extension of the user interface and providing information such as feedback and warnings to the player. Utilizing the border between diegetic and non-diegetic, transdiegetic sounds merge game system information with the gameworld and create a frame of reference that has usability value at the same time as it upholds the sense of presence in the gameworld. 
Using this terminology, I argued that apparently non-diegetic music that provides information relevant for player action in the gameworld is external transdiegetic since the musical source is not found within the
gameworld but is external to it. The same goes for the disembodied warning “Our base is under attack!” in Warcraft 3. It is external transdiegetic because it provides information relevant to player action, but is not produced by anyone within the gameworld. When the avatar in Diablo 2 (Blizzard, 1998) claims “I’m overburdened”, however, I called the sound internal transdiegetic because the avatar as a character existing in the gameworld communicates to the player situated in an external position. The strengths of transdiegetic as concept are that it emphasizes the functional role of the sound in relation to player action in the gameworld, and it points out that the spatial origin of the sound is often relative. It is also able to describe all game sounds by using the same framework. However, it is confusing that it is based on the term diegesis, which creates connotations to the mechanisms of narratology and storytelling. Also, the internal and external variations are flawed as they appear to be two variations over the same theme, while in reality they are not. While internal transdiegetic sounds can easily be interpreted as abstractions of “diegetic” sounds since they are partly integrated into the game environment, external transdiegetic sounds are externally situated but with clear impact on the game environment. Inger Ekman’s approach to game sound (2005) is closely related to that of transdiegetic sounds. Common to Ekman’s and my account is the idea that the space of the gameworld is not absolute, and that information is carried across its boundaries. Another common ground is the idea that game sounds are used to integrate the game system into the environment in which it is set. From a semiotic perspective, she observes that game sounds that traditionally would be labeled diegetic, often have non-diegetic referents, and vice versa. 
In this respect, computer game sound is not limited to being diegetic or non-diegetic, but creates two additional layers that may be used to integrate non-diegetic elements connected to the game system into the diegetic world of the game. Masking sounds is her term for diegetic sound signals with non-diegetic
referents. Such sounds appear to be produced in the gameworld, while its referent is a mechanic of the game system. An example of a masking sound can be found in World of Warcraft when a monster attacks the avatar preemptively. In such cases, a sound specific for that monster will be heard that signals to the player that the avatar has entered the aggression zone of that monster. This sound is hard to interpret as natural to the world of the game since no animal would signal to its prey that it is about to attack. Being represented by a sound signal with a source in the gameworld, the sound has the ability to mask its origin as a system message by being integrated into the gameworld, and thus becomes situated on the border of what is traditionally seen as the diegesis. Ekman calls a sound symbolic, however, in cases where the signal is non-diegetic and the referent is diegetic. An example of this is adaptive game music that is not produced by a source in the gameworld, but refers to an event in the gameworld, such as is the case when the player suddenly hears the music change when an enemy is about to attack in Dragon Age: Origins. Although Ekman’s model is fruitful in explaining how game sound relates to the traditional film theory understanding of diegetic and non-diegetic sound, it also demonstrates the problematic aspects of applying these concepts to games because game sound in many cases is only partially diegetic. Also, there are many examples of sounds that cannot be fully explained by Ekman’s model. When a voice that apparently belongs to the avatar proclaims that “I’m overburdened” in Diablo II, it is not certain whether signal and referent are diegetic or not. 
While the signal gives the impression of being diegetic due to the use of the first person personal pronoun and the fact that it is produced by a voice that seems to belong to the avatar, it may also be interpreted as a non-diegetic system sound masked as diegetic since it is unclear who the avatar is talking to (itself or the player?) and since it provides information about the inventory, which is the game system feature that allows the
player to collect and store items in the game. This interpretation was suggested by two player respondents in my research on the topic of transdiegetic communication:

[…] Well, it is the character’s voice saying this. But still I don’t get the feeling that it is the character speaking. It’s like the game narrator’s voice provides the player with a hint that, okay, you should check your inventory. […] (John (30). Individual interview, Dec 10, 2008.)1

It’s a like some sort of error, or a… if you want to see her as an individual person, it’s really an error. Because then the question is, who is she talking to? […] (Isabel (25). Individual interview, Dec 1, 2008.)

While John sees the above sound signal as a system message masked as diegetic, Isabel thinks of it as an error since it is unclear who the avatar is talking to. In this case, the referent is also ambiguous, since it is not clear whether the sound refers to the fact that the avatar is trying to pick up something in the gameworld but fails or to the fact that the inventory is overloaded. Warcraft 3 provides another example. When the player tries to place a new building on an illegal location, a disembodied voiceover says, “Can’t build there!” At first glance, the signal seems to be non-diegetic since there is no character in the gameworld that produces the sounds. However, this is challenged by the fact that the voice and the accent are very similar to the voices of the other units of that race. The referent is even more ambiguous: while the sound refers to an operation that is illegal according to the game system, it also refers to the fact that this specific location in the gameworld has diegetic properties such as trees or existing structures that make it impossible to build here. As has been demonstrated in the above discussion, the attempts to adapt the concepts diegetic and non-diegetic to game sound point to interesting
aspects that recognize the specificities of game sound compared to sound in other media. At the same time, however, these attempts also demonstrate that the use of concepts designed to explain traditional media is problematic and confusing. There is a need to invent a terminological apparatus that fully grasps the uniqueness of game sound without trivializing it or confusing it with related, but different, features in other media. However, what the adaptations above have in common, is seeing game sound as qualitatively different from sound in other audio-visual contexts. Specifically, there is a tendency to pay attention to the interactive nature of game sound and to see it as a part of the user interface of the game in that it provides information to the player that helps feedback and control (Saunders & Novak, 2006). These adaptations also suggest that gameworlds operate in a different manner compared to storyworlds. This is particularly evident in Grimshaw’s extended understanding of diegetic sound as all sounds that derive from a gameplay event. In the following I will discuss how the understanding of game sound as interface, and the gameworld as a different construct to traditional diegeses, affects the idea of diegetic sound and I suggest alternative ways of discussing the relationship between the gameworld and game sound.
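Ekman’s distinction between signal and referent, as summarized earlier in this section, can be written out as a small lookup. The sketch below is only an illustration of that summary: the names `EkmanCategory` and `classify` are hypothetical, and the use of two yes/no inputs is a simplifying assumption.

```python
from enum import Enum


class EkmanCategory(Enum):
    DIEGETIC = "diegetic"
    NON_DIEGETIC = "non-diegetic"
    MASKING = "masking"    # diegetic signal, non-diegetic referent
    SYMBOLIC = "symbolic"  # non-diegetic signal, diegetic referent


def classify(signal_is_diegetic: bool, referent_is_diegetic: bool) -> EkmanCategory:
    """Map a signal/referent pair to one of the four categories."""
    if signal_is_diegetic and referent_is_diegetic:
        return EkmanCategory.DIEGETIC
    if signal_is_diegetic:
        return EkmanCategory.MASKING
    if referent_is_diegetic:
        return EkmanCategory.SYMBOLIC
    return EkmanCategory.NON_DIEGETIC


# A monster's aggression sound: produced in the gameworld, refers to a game mechanic.
aggro_sound = classify(signal_is_diegetic=True, referent_is_diegetic=False)
# Adaptive music: no source in the gameworld, refers to an event in the gameworld.
adaptive_music = classify(signal_is_diegetic=False, referent_is_diegetic=True)
```

The two yes/no inputs are precisely where the model strains. For “I’m overburdened” or “Can’t build there!”, neither question has a clear answer, so the lookup cannot be applied without first settling the ambiguity discussed above.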
SOUND AND THE GAMEWORLD

I have suggested above that diegetic and non-diegetic are problematic in connection with games and game sound because gameworlds are different constructs compared to traditional fictional worlds, or diegeses, and because of the way the players interact with them. In this section I will go into the characteristics of gameworlds, what makes them different from traditional fictional worlds, and what consequences this has for understanding their sound usage. Rune Klevjer rejects using the term diegesis to describe gameworlds due to its link to storytelling,
and argues that gameworlds are radically different from storyworlds because they are worlds designed for playing games. This means they are unified and self-contained wholes, structured as arenas for participation and contest, and are therefore subject to a coherent purpose (Klevjer, 2007, p. 58). Such worlds are created around a different logic than “fictional storyworlds” and, as long as all elements are explained as being parts of the game system, they do not need to be explained as a credible part of a hypothetical world. Espen Aarseth (2008) makes a clear distinction between gameworlds and fictional worlds by stating that the virtual world of World of Warcraft (Blizzard, 2004) is no fictional world but instead “a functional and playable gameworld, built for ease of navigation” (p.118). This is also emphasized in Aarseth (2005) in which he describes the environmental design of Half-Life 2 (Valve, 2004). It is a carefully designed environment with a specific layout that guides the players through specific areas, and limits the freedom of navigation in order to set up the challenges of the game, at the same time as it is given properties that remind one of the physical world in terms of world-representation. I want to follow up on Klevjer’s and Aarseth’s approaches and further point out that gameworlds are universes designed for the purpose of playing games. This means that they are fitted for very specific uses, and their layouts are decided in terms of functionality according to the game system. Environmental features and dungeon layouts are not created randomly but, because of careful design, they are oriented towards a specific gameplay experience. This view will be the starting point for the following discussion that will focus on the functional aspects of gameworlds and sounds connected to it. 
As we will see, this view of the gameworld is important for understanding how sound is used, and explains why players do not see what I earlier called transdiegetic sounds as interfering. As different constructs compared to traditional fictional worlds, gameworlds operate on other
premises. One characteristic of gameworlds is that they need to have a comprehensive system for player interaction. They need to be able to communicate necessary information about changes in game state and allow the player the necessary degree of control. Many of these interface features, including sounds, are often added to the game as abstractions of specific game mechanics partly integrated into the gameworld and, as such, it is problematic to see them as either diegetic or non-diegetic in traditional terms. Instead of looking at what would be a credible representation of a naturalistic world, we should look at how the gameworld and the game system work to support each other. If the game rules state that monsters growl when attacking, and that individuals respawn with their armor 10% damaged after being killed, this is the premise of the specific gameworld. This is a view that is a familiar one for empirical players. One of the player respondents in my empirical research states it thus:

[…] In this world, you can define whatever you would like there to be, it doesn’t seem that things are very credible in themselves.

Q: So why do we accept it?

Because it’s a game. And that is something completely different from a film. (Isabel (25). Individual interview, Dec 01, 2008)

Here Isabel emphasizes the idea that gameworlds do not need to be a credible alternative to other fictional worlds, and that game designers can decide what they want to include as existent in their world: Because they are integrated with the game system, gameworlds are necessarily different from fictional worlds, such as films. This interpretation supports Grimshaw’s extended view of what counts as diegetic in computer games, but at the same time it amplifies the problematic aspects of using diegesis as explanatory terminology,
since gameworlds functionally are very different from literary or cinematic diegeses. Based on the above, the upholding of the game system by the gameworld also has consequences for the integration and design of sound in games. All game sounds have a function with respect to the gameworld, be it to provide information relevant for gameplay or to provide a specific atmosphere. Specific games and genres use sound in different ways and the degree to which it is incorporated into the gameworld plays an important role for reasons of clarity and consistency and in order to create an immediately understandable relationship between the sound and the gameworld. When designing user interfaces for games, a designer needs to decide how to present information to the player. Central to this is deciding which menus should allow interaction, how and whether the user interface should be integrated into the gameworld, and how sounds and visual elements should work together. Game designers Kevin Saunders and Jeannie Novak (2006) describe two ways of relating the user interface to the gameworld and the gamespace. A dynamic interface supports the idea that all audio-visual aspects of a game should be seen as interface because they all provide the player with some kind of information, and dynamic interfaces are therefore completely incorporated into the gameworld. An example is the way an avatar’s armor and weapons provide information in a massively multiplayer online game (MMO)2 like World of Warcraft: By looking at what gear the opponent has, a player receives vital information about class, level and power of that avatar. A static interface, on the other hand, is an overlay interface that consists of external control elements such as health bar, map, pop-up menus, inventory, action bars and so on.
Since user interface and gameworld often tend to merge, making the boundary between gameworld and interface relative (Jørgensen, 2007b, 2008, 2009), the static/dynamic divide should not be seen as absolute, but as a continuum where the interface may be more or less integrated
into the gameworld. Used as an interface, sound often takes on a relativistic position where it is integrated into the gameworld while remaining part of the game system. Using sound signals that are based on real world sounds, but which have been stylized, user interface designers add sounds that provide the necessary usability information at the same time as ensuring the sounds seem natural to the environment of the game. Ekman’s masking sounds are textbook examples of this. Another example is the response “More work?” by Warcraft 3’s orc peons. As a verbal statement produced by a character in the gameworld, it has a direct link to that gameworld, but at the same time it is an interface sound produced in response to player action. However, the sound is not an actual sound of an event in the gameworld, since it would make little sense if the peon actually were talking to the player.
Gameworld vs. Gamespace

So far we have seen that game system information and game user interface features such as sound may be more or less integrated into the gameworld. However, they will also have a specific relationship to the gamespace of a specific game. Looking at this relationship may provide us with clear insights into how gameworlds work compared to diegeses. Gamespace should be understood as the conceptual space in which the game is played (Juul, 2005, p. 167), independent of any possible fictional universe used as a context for it. It is thus the arena on which gameplay takes place, and includes all elements relevant for playing the game. According to the magic circle theory (Huizinga, 1955, p. 10; Salen & Zimmerman, 2004, pp. 94-95) all games are seen as a subset of the real world, delimited by a conceptual boundary that defines what should be understood as part of the game and not. The magic circle is what separates the game from the rest of the world, and thus defines the gamespace (Juul, 2005, pp. 164-167). One may go as far as claiming that all elements affecting gameplay should be counted within the gamespace, regardless of whether these are part of the original system or design. From this point of view, gamespace seems to be equivalent to Grimshaw’s and Berndt’s understanding of diegesis, since it includes external system features relevant for gameplay, such as voiceovers announcing new players entering the game. Gamespace is therefore also what Droumeva (2011) seems to have in mind when focusing on the importance of live chat and talk that happens during group play. The gamespace is thus separated from the gameworld by including all features that have direct relevance to progress in the gameworld, be it score music signaling approaching enemies or add-on software in World of Warcraft, while the gameworld is the contained universe or environment designed for play in which actions and events take place. In this sense, a static overlay interface of a computer game is part of the gamespace, even though it may not be part of the gameworld, while a dynamic integrated interface would be part of the gameworld. For clarification, take the screenshot from Diablo II in Figure 1 as an illustration. The right half of the screen consisting of inventory, the bottom action bar including health and mana measurements, and the upper left icon of the avatar’s minion are all parts of the overlaid interface. These should not be interpreted as part of the gameworld, which is represented by the virtual environment on the left. The interface features are, however, directly relevant for player progress in the gameworld, and they are also attributes governed by the game system. They must therefore be seen as part of the gamespace; that is, the space of action relevant for the game progression included within the magic circle of the game. Now consider the left side of the screenshot, a screen segment of the gameworld.
One interesting feature in this part of the image is the small illuminated icon above the avatar’s head which represents a boost to the avatar’s stamina. In terms of transdiegeticity, I would have explained this feature as internal transdiegetic because, in a traditional sense, it is a feature that seems alien to the diegesis while at the same time it provides information about the gameworld. However, viewing gameworlds as different constructs compared to traditional fictional worlds, the icon is clearly part of the gameworld, since it is not part of the overlay interface, but a feature picked up as the avatar visited a stamina well and which follows the avatar everywhere he walks. Since gameworlds work on other premises than traditional diegeses, players would have no problem accepting that this is part of the gameworld even though the avatar is not aware of it.

Figure 1. Gamespace vs. gameworld. Diablo 2. ©2000 Blizzard Entertainment, Inc. All rights reserved

There is an important direct link between the gamespace and the gameworld which is particularly accentuated by the use of sound. When the player decides to discard an item in the screenshot above, he will use his mouse to drag and drop the item from the inventory on the right to the virtual environment on the left or, in other words, he will move it from the gamespace to the gameworld. The moment he selects the item in the
inventory, there will be a short, nondescript click which does not seem to represent any actual sound in the gameworld. However, once he discards it in the gameworld, there is a responsive sound resembling that item being dropped to the ground. If it is a potion, there is a bubbling sound and, if it is a weapon, there is the sound of metal hitting the ground. By being adjusted to the atmosphere of the different spaces, the sound clearly emphasizes which frame it belongs to; there is no doubt, though, that it does move from one to the other. However, how this movement from frame to frame is achieved may vary between games and genres. A first-person shooter like Crysis (Crytek, 2007) that integrates the interface as a HUD (see endnote 3) that is part of the avatar’s suit situates the relationship between gameworld and gamespace somewhat differently from third-person, avatar-based games. One of the player respondents in the empirical study elaborates:
I’m absolutely positive to the idea [that the avatar sees the HUD]. It’s presented so that the suit he’s wearing […] in a way provides all the information that you need, through the perspective. And, well, it’s one solution, they probably try to make it an integrated part of this world. (Eric (26), individual interview, Nov 28, 2008)

Here, even the HUD and overlaid features must be interpreted as part of the gameworld and thus the gameworld and the gamespace overlap each other more or less completely. The reason for this is that the game user interface designers have decided to make the interface part of the avatar’s advanced military suit so that all audio-visual information is provided to the avatar in the same manner as it is provided to the player. While all features are part of the gamespace as long as they are not connected to external menus in which one changes the game settings or starts a new game, they may or may not be connected to the gameworld as well (see endnote 4). If they are, they are typically positioned in the gameworld in the same way as what I earlier called internal transdiegetic features. While not appearing to be native to the gameworld, they are still positioned inside it graphically. They may be placed above the heads of non-playing characters in a way that allows the player to move around them: they will move with the environment, and not with the overlay interface that is tied to the edges of the screen. An example of a corresponding auditory feature is the “Hi, you’re a tall one!” response from a non-playing character (NPC) in World of Warcraft. Features I earlier called external transdiegetic, however, are not part of the gameworld, only of the gamespace. They are not integrated into the gameworld but provide information relevant for gameplay. An auditory example of this is music signaling the presence of enemies in The Elder Scrolls III: Morrowind and Dragon Age: Origins.
In this section I have argued that sounds have a particular role in connecting the gamespace and the gameworld, making the boundary between the two more seamless by using interfaces that are integrated into the gameworld in different ways. Since sound is neither tangible nor visible and has a temporary quality, it does not disrupt the sense of a unified space in the same way as alien graphical features would. It therefore seems to be easier to accept the growl of an attacking animal than it is to accept a question mark floating around in thin air. This gives designers greater potential for manipulating auditory information than visual information when creating user interfaces for games. The fact that gameworlds work on other premises compared to traditional fictional worlds is what makes the player accept stylistic and abstract sounds that integrate the game system into the gameworld, but this ability is also part of the reason why gameworlds are accepted as different constructs compared to traditional fictional worlds. This discussion also puts emphasis on the argument that talking about diegesis, and thus diegetic and non-diegetic sound, has crucial shortcomings that are avoided if we instead evaluate gamespaces on their own terms by emphasizing how gameworlds differ from other fictional worlds.
SPATIAL INTEGRATION OF GAME SOUND

If we want to find an alternative model that describes the relative integration of sounds in gameworlds, we need to get away from the biased meaning of diegesis and instead focus on the specificities of game sound. In evaluating the usefulness of the concepts diegetic and non-diegetic in relation to game sound, I have stressed that these do not grasp how sounds are integrated into the gameworld and that they do not emphasize how sounds work as an interface providing action-relevant information to the player. In this section, I will present a game-specific approach to describing game sound that avoids the use of the diegetic/non-diegetic dyad. Due to the scope of this chapter, the model focuses on spatial integration and the difference between gameworlds and storyworlds, but it also reflects awareness of the functional aspects of game sound by looking at it as an interface, and how these aspects transcend the border of the gameworld in a meaningful way.

This model puts emphasis on how well a sound is integrated into the gameworld. It builds on and supports existing theories on how we may understand gameworlds, game sound and how they work together. Grimshaw’s radical interpretation of diegesis is conserved in emphasizing the distinction between gameworld and gamespace, and we also gain new insight into the functional and integrational aspects of so-called transdiegetic sounds. Also, Galloway’s focus on games as activities is preserved as there is a heavy focus on how sounds affect gameplay in addition to the fact that gameworlds are games intended for play. Last but not least, the model avoids all confusion connected to the usage of terminology connected to the diegesis. This approach will be described in detail below. In pointing out that game sounds should be seen as an interface, it places emphasis on the usability aspects of sound in the sense that it provides information to the player such as warnings and responses as well as information relevant to game control, identification, and orientation. See Table 1. This interpretation of sound’s integration into the gameworld is based on Saunders & Novak’s separation of static and dynamic interfaces, but I believe it is more fruitful and more correct to see this separation not as a binary divide but as a continuum that integrates user interface elements
into the gameworld to a lesser or greater degree. Moreover, since sound is part of a game’s user interface, it is also possible to locate different sounds on the same continuum.

Table 1. Game sound and world integration

Category                 Example
Metaphorical interface   Dragon Age: Origins: Enemy music
Overlay interface        C&C3: Mouseclick when selecting actions
Integrated interface     Diablo 2: Sound following boost
Emphasized interface     WoW: “Hi, you’re a tall one!”
Iconic interface         Crysis: Avatar moans when injured

In Table 1, I have identified five points on this continuum where sound signals tend to be located in modern computer games. All categories have a certain degree of integration into the gameworld, with the exception of the first group which is the only one that is not part of the gameworld. I call this group metaphorical interface sounds since they are not “naturally” produced by the game universe but have a more external relationship to the gameworld, even though they also have a metaphorical similarity (Keller & Stevens, 2004) to the atmosphere and the events in it. The enemy music found in Dragon Age: Origins and The Elder Scrolls III: Morrowind is a typical example of this kind of sound, which is usually system-generated and may provide orientating and identifying information as well as working proactively as a warning to the player.

The remaining four categories are all integrated into the gameworld in different ways and to different degrees. Overlay interface sounds have the same relationship to the game as Saunders & Novak’s static user interface when it is added as an overlay. These sounds are directly connected to the overlay menus, maps and action bars, and are typically generated by the player in response to his commands. They are found in most game genres but are particularly common in interface-heavy genres like real-time strategy games. The example in Table 1 is from Command & Conquer 3: Tiberium Wars (EA LA, 2007), where the player typically hears the generic sound of a mouseclick every time he selects an action from any of the menus.

Integrated interface sounds are typically related to user interface elements that have been placed into the gameworld, such as exclamation marks and the icons above the heads of characters. The sound played as the avatar gets a boost to stamina in Diablo 2 is a typical example of this: it is a system-generated sound that works as a notifier and also identifies the boost in question.

Emphasized interface sounds have a somewhat different relationship to the gameworld as they often appear to be generated by friendly NPCs in the gameworld. An example is the lines spoken by NPCs in World of Warcraft in response to player targeting, as when the goblin merchant says “Hi, you’re a tall one!” This is a sound that appears to be diegetic in the traditional sense of the term since it is something a character in the gameworld actually says, but it is in fact a system-generated sound that has been stylized and fitted into the gameworld.

Iconic interface sounds, however, are completely integrated into the gameworld and correspond to Saunders & Novak’s dynamic user interface features. In terms of film theory, these sounds would be labeled diegetic as they seem to belong naturally to the universe in which they occur. They can have any kind of generator and may provide any kind of usability information. An example of an iconic interface sound is heard when the avatar moans because he is injured in Crysis.

While this model is limited to taking into account the spatial integration of game sound, it is fully compatible with my earlier models describing the usability value of a sound (Jørgensen, 2007b) and what generates a sound (Jørgensen, 2008).
When combining these functions, we may study game sound along several dimensions that grasp usability on a more general level by identifying whether a sound provides responsive or urgent information and whether it is related to control functions, orientation or identification. Such a combination would be able to dive into the gameworld, describing what event generates a sound and identifying what that event means for the player’s state. Last but not least, it would take into account how the sound is integrated into the gameworld. Combined, the models will form a comprehensive and detailed analytical tool that describes all gameplay-related sounds in computer games, without creating the confusing association to traditional diegeses.
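For readers who want to apply the continuum when cataloguing sounds, the five categories of Table 1 can be sketched as an ordered data structure. This is only an illustrative sketch: the names `Integration`, `GameSound`, `EXAMPLES` and `by_integration` are hypothetical, the numeric ranks merely encode the order of the continuum (not a measured degree of integration), and the usability and generator dimensions of the earlier models are left out. The rule that only metaphorical interface sounds fall outside the gameworld follows the description of Table 1 given above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Integration(IntEnum):
    """Five points on the continuum, ordered from least to most integrated."""
    METAPHORICAL = 0
    OVERLAY = 1
    INTEGRATED = 2
    EMPHASIZED = 3
    ICONIC = 4


@dataclass(frozen=True)
class GameSound:
    game: str
    description: str
    integration: Integration

    @property
    def in_gamespace(self) -> bool:
        # Every gameplay-relevant sound in Table 1 belongs to the gamespace.
        return True

    @property
    def in_gameworld(self) -> bool:
        # Per the text, only the metaphorical group is not part of the gameworld.
        return self.integration is not Integration.METAPHORICAL


# The examples from Table 1, deliberately listed out of order.
EXAMPLES = [
    GameSound("Crysis", "Avatar moans when injured", Integration.ICONIC),
    GameSound("Dragon Age: Origins", "Enemy music", Integration.METAPHORICAL),
    GameSound("Diablo 2", "Sound following boost", Integration.INTEGRATED),
    GameSound("C&C3", "Mouseclick when selecting actions", Integration.OVERLAY),
    GameSound("WoW", "Hi, you're a tall one!", Integration.EMPHASIZED),
]


def by_integration(sounds):
    """Return the sounds sorted from least to most integrated."""
    return sorted(sounds, key=lambda s: s.integration)


if __name__ == "__main__":
    for sound in by_integration(EXAMPLES):
        print(sound.integration.name, sound.game, sound.in_gameworld)
```

Using an ordered enumeration rather than a yes/no flag mirrors the argument that integration is a continuum and not a binary divide; further points could be added between the existing ones without changing the rest of the sketch.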
CONCLUSION

When sounds work functionally in the sense of providing gameplay-relevant information to the player, they must be seen as part of the user interface of a game. In this respect, we need to acknowledge their status as such and use an approach that allows us to describe them in terms of an interface. However, the traditional distinction between diegetic and non-diegetic is not based on participatory use and does not allow us to describe game sound in this way. This chapter presents a game-oriented alternative to diegetic and non-diegetic that takes into account spatial integration of sounds from a gameplay perspective. The model is also compatible with earlier models characterizing game sound (Jørgensen, 2007a, 2008, 2009) and together they form a framework that allows us to describe the interface aspects of computer game sounds while also paying equal attention to their relationship to the gameworld as an environment that resembles those of fiction but is instead built on game rules.

While this chapter argues for substitution of the terms diegesis, diegetic and non-diegetic when discussing sound in games, it should be stressed that these terms may be fruitful in some respects. They may be used when a scholar wants to compare computer games and game sound with other media and they may also be used the way this chapter does: to show why they are problematic. From these perspectives, Galloway’s, Ekman’s, Grimshaw’s and Jørgensen’s earlier work on the subject are important contributions that are especially fruitful for those seeking to understand how game sound and gameworlds differ from other media. It is, however, important to emphasize the fact that spatiality in computer games operates on very different premises than in film, for instance, and that we talk about a different relationship between sound and environment compared to the traditional separation between diegetic and non-diegetic. A crucial difference is that gameworlds are different constructs from traditional fictional worlds and this must be taken into consideration when discussing the origin of sounds and other features.

It is important to note that the model presented here is not limited to the study of game sound but that it may be used to analyze all interface-related features of a computer game. However, sound is particularly interesting because of its seamless integration and its ability to remain non-intrusive even when it tends to break with the conventions of the gameworld. It should also be mentioned that the framework is supposed to work as a tool to help us better understand how game sound and other game features operate, and as such, it will always be subject to modification.
ACKNOWLEDGMENT

Thanks to Jesper Juul, Matthew Weise, Mark Grimshaw and the anonymous review committee for comments.

REFERENCES

Aarseth, E. (2005). Doors and perception: Fiction vs. simulation in games. In Proceedings of 6th Digital Arts and Culture Conference 2005.
Aarseth, E. (2008). A hollow world: World of Warcraft as spatial practice. In Corneliussen, H., & Rettberg, J. W. (Eds.), Digital culture, play and identity: A World of Warcraft reader. Cambridge, MA: MIT Press.
Berndt, A. (2011). Diegetic music: New interactive experiences. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.
Bordwell, D. (1986). Narration in the fiction film. London: Routledge.
Bordwell, D., & Thompson, K. (1997). Film art. An introduction to film theory. New York: McGraw-Hill.
Branigan, E. (1992). Narrative comprehension and film. London: Routledge.
Chion, M. (1994). Audio-vision, sound on screen. New York: Columbia University Press.
Collins, K. (2007). An introduction to the participatory and non-linear aspects of video games audio. In Hawkins, S., & Richardson, J. (Eds.), Essays on sound and vision. Helsinki: Helsinki University Press.
Collins, K. (2008). Game sound: An introduction to the history, theory, and practice of video game music and sound design. Cambridge, MA: MIT Press.
Command & conquer 3: Tiberium wars. (2007). EA Games.
Crysis. (2007). EA Games, Crytek.
Diablo 2. (2000). Blizzard Entertainment.
Dragon age: Origins. (2009). EA Games, Bioware.
Droumeva, M. (2011). An acoustic communication framework for game sound: Fidelity, verisimilitude, ecology. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.
Ekman, I. (2005). Meaningful noise: Understanding sound effects in computer games. Paper from DAC 2005. Retrieved January 12, 2009, from http://www.uta.fi/~ie60766/work/DAC2005_Ekman.pdf.
Electroplankton. (2006). Nintendo, Indies Zero.
Galloway, A. R. (2006). Gaming: Essays on algorithmic culture. Electronic mediations (Vol. 18). Minneapolis, London: University of Minnesota Press.
Genette, G. (1983). Narrative discourse: An essay in method. Ithaca, NY: Cornell University Press.
Gorbman, C. (1987). Unheard melodies? Narrative film music. Bloomington: Indiana University Press.
Grimshaw, M. (2008). The acoustic ecology of the first-person shooter. City, Country: VDM Verlag.
Grimshaw, M., & Schott, G. (2007). Situating gaming as a sonic experience: The acoustic ecology of first person shooters. In Proceedings of DiGRA 2007. Situated Play.
Half-Life 2. (2004). Sierra Entertainment, Valve Corporation.
Huiberts, S., & van Tol, R. (2008). IEZA: A framework for game audio. In Gamasutra. Retrieved January 12, 2010, from http://www.gamasutra.com/view/feature/3509/ieza_a_framework_for_game_audio.php.
Huizinga, J. (1955). Homo ludens: A study of the play element in culture. Boston: Beacon Press.
Jørgensen, K. (2006). On the functional aspects of computer game audio. In Proceedings of the Audio Mostly Conference (pp. 48-52).
Jørgensen, K. (2007a). ‘What are these grunts and growls over there?’ Computer game audio and player action. Unpublished doctoral dissertation. Copenhagen University, Denmark.
Jørgensen, K. (2007b). On transdiegetic sounds in computer games. Northern lights Vol. 5: Digital aesthetics and communication. Intellect Publications.
Jørgensen, K. (2008). Audio and gameplay: An analysis of PvP battlegrounds in World of Warcraft. In Gamestudies, 8(2). Retrieved January 12, 2010, from http://gamestudies.org/0802/articles/jorgensen.
Jørgensen, K. (2009). A comprehensive study of sound in computer games. Lewiston, NY: Edwin Mellen Press.
Juul, J. (2005). Half-real. Video games between real rules and fictional worlds. Cambridge, MA: MIT Press.
Keller, P., & Stevens, C. (2004). Meaning from environmental sounds: Types of signal-referent relations and their effect on recognizing auditory icons. Journal of Experimental Psychology. Applied, 10(1). doi:10.1037/1076-898X.10.1.3
Klevjer, R. (2007). What is the avatar? Fiction and embodiment in avatar-based singleplayer computer games. Unpublished doctoral dissertation. University of Bergen, Norway.
Nacke, L., & Grimshaw, M. (2011). Player-game interaction through affective sound. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.
Rock band. (2007). EA Games.
Salen, K., & Zimmermann, E. (2004). Rules of play: Game design fundamentals. Cambridge, MA: MIT Press.
Saunders, K., & Novak, J. (2006). Game development essentials: Game interface design. Stamford, CT: Cengage Learning.
Stockburger, A. (2003). The game environment from an auditory perspective. In M. Copier & J. Raessens (Eds.), Proceedings of Level Up: Digital Games Research Conference.
Tarantino, Q. (1994). Pulp fiction. Miramax.
The Elder Scrolls III: Morrowind. (2002). Bethesda Softworks.
Warcraft 3: Reign of chaos. (2002). Blizzard Entertainment.
Whalen, Z. (2004). Play along: An approach to video game music. Gamestudies, 4(1). Retrieved January 12, 2010, from http://www.gamestudies.org/0401/whalen.
Wilhelmsson, U., & Wallén, J. (2011). A combined model for the structuring of game audio. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.
World of Warcraft. (2004). Vivendi Games.
KEY TERMS AND DEFINITIONS

Diegesis: Originally referring to pure narrative, or situations in which the author is the communicating agent of a narrative, diegesis was revived in the 1950s to describe the “recounted story” of a film. It is today the accepted term in film theory to refer to the fictional world of the story.

Diegetic: That which is part of the depicted fictional world. Diegetic sounds are thus sounds that have a source in the fictional world.

Game System: The formal structure of the game consisting of a set of features that affect each other to form a pattern. Includes the rules of a game and the mechanisms that decide how the rules interact.

Gamespace: The conceptual space or arena in which a game is played, independent of any possible fictional universe in which it may be set. Gamespace is defined by the magic circle, and includes potentially all elements relevant for playing, regardless of whether they are part of the original system or not.

Gameworld: A unified and self-contained universe that is functionally and environmentally designed for the purpose of playing a specific game. Gameworlds are oriented towards a specific gameplay experience and do not need to be explained as a credible part of a hypothetical world.

Metaphorical Interface Sounds: Sounds that provide usability information to the player while being placed external to the gameworld. An example is adaptive music which informs the player that an enemy is approaching.

Non-Diegetic: That which is external to the fictional world. Non-diegetic sounds are thus sounds represented as coming from a source outside the fictional world.

Overlay Interface Sounds: Sounds that are associated with the overlay interface placed as a filter on top of the gameworld. An example is the sound of mouseclicks whenever the player makes a selection from the action bar.

Transdiegetic: Transdiegetic features are auditory and visual elements of a computer game which transcend the traditional division between diegetic and non-diegetic by way of merging system information with the gameworld. Transdiegetic features thus create a frame of communication that has usability value at the same time as they are integrated into the represented universe of the game.

Integrated Interface Sounds: Sounds that are connected to user interface elements that have been placed inside the gameworld for usability purposes. An example is system-generated sounds that follow the player’s collecting of coins, boosts or other prizes.

Emphasized Interface Sounds: Sounds that have been stylized and fitted into the gameworld while also remaining clear system-generated features. Examples are the auditory responses from units being selected in strategy games.

Iconic Interface Sounds: System-generated sounds that are completely integrated into the gameworld as if they were natural to that universe. An example is the sound of weapon use in a game.
ENDNOTES

1. All quotes are originally in Norwegian, and have been translated by the author.

2. MMO is short for Massively Multiplayer Online games. These are games in which thousands of players play together on online servers.

3. Originally a military technology, HUD is short for heads-up display which is “an electronic display of instrument data projected at eye level so that a driver or pilot sees it without looking away from the road or course” (Random House Dictionary, 2009).

4. As the formal structure of the game, the game system seems to lie somewhere in between the gamespace and the gameworld. While talk between players during group play in the same physical space would be part of the gamespace, this kind of communication is not an actual part of the formal game system. However, so-called external transdiegetic features, such as music signalling incoming enemies, are clearly part of the game system even though they are not part of the gameworld.
Chapter 6

A Combined Model for the Structuring of Computer Game Audio

Ulf Wilhelmsson, University of Skövde, Sweden
Jacob Wallén, Freelance Game Audio Designer, Sweden
ABSTRACT

This chapter presents a model for the structuring of computer game audio building on the IEZA-framework (Huiberts & van Tol, 2008), Murch’s (1998) conceptual model for the production of film sound, and the affordance theory put forth by Gibson (1977/1986). This model makes it possible to plan the audio layering of computer games in terms of the relationship between encoded and embodied sounds, cognitive load, the functionality of the sounds in computer games, the relative loudness between sounds, and the dominant frequency range of all the different sounds. The chapter uses the combined model to provide exemplifying analyses of three computer games: F.E.A.R., Warcraft III, and Legend of Zelda. Furthermore, the chapter shows how a sound designer can use the suggested model as a production toolset to structure computer game audio from a game design document.
DOI: 10.4018/978-1-61692-828-5.ch006

INTRODUCTION

Computer game audio is an often neglected area when analyzing and producing computer games (Cancellaro, 2006; Childs, 2007; Marks, 2001). The same seems to be the case when analyzing or producing movies (Murch, 1998; Thom, 1999). There is a general lack of functional models for the analysis as well as the production of computer game audio, even though some good examples of functional models, such as Sander Huiberts and Richard van Tol’s (2008) IEZA-framework (Figure 1), are available. The IEZA-framework is also discussed in Droumeva (2011). In this chapter, we use the IEZA-framework in combination with Walter Murch’s (1998) conceptual model for film sound (Figure 2).

Figure 1. Huiberts and van Tol’s IEZA-framework for the analysis and production of computer game audio into which we have added frames for the different categories. Adapted from Huiberts and van Tol (2008)

Why combine these two different areas, that is, a model concerned with computer game audio and another with film sound? As Huiberts and van Tol (2008) have noted, film sound is a “field of knowledge that is closely related to game audio” (p. 2). Although these two areas are related and do share some common ground, they are also quite different in many ways. It is striking that when we think about games we use the term audio, yet when we think about film we seem to primarily use the term sound. In our opinion, there is a difference between the two: audio is a more technology-based term than sound. A sound is something you hear which in turn leads to listening, while audio is something that precedes sound but with stronger technological connotations as a term. Film sound is, as Murch notes, normally a composite of sounds in several layers, an assertion which precedes a more thorough discussion of this model (Figure 2). Murch concludes that we may be wise to limit those layers to a total of five different ones simultaneously played back on the sound track of a movie. A common method of separating the different parts of film sound is a typology consisting of three separate categories: speech, effects, and music (Bordwell & Thompson, 2001; Sobchack & Sobchack, 1980). This typology is originally based on the technology of early sound films and its three tracks, constituting a practice-oriented separation of sound into different categories. It also corresponds well to Murch’s
(1998) conceptual color model (Figure 2), which spans from language that clearly has relations to speech (encoded) via effects to music (embodied). With such a typology rather clearly differentiating the three basic entities of film sound from each other, we might jump to the conclusion that film sound is fairly easy to create and that computer game audio could be modeled, more or less, on the practice and theories of film sound. Since we only have three basic categories of sounds that can be used and combined to create a sonic environment, how hard can it really be? However, film sound is more complex than this initial typology suggests and we address this in this chapter. Furthermore, computer game audio works under quite different conditions than film sound does: film sound is fixed, stored and played linearly. This does not, however, mean that sound in movies needs to be synchronous with the visual, since it might be narrating at a different level that does not have its basis in the present image (Hug, 2011; Kubelka, 1998; Pudovkin, 1929/1985). Computer game audio, on the other hand, is dynamic and stored as a resource for the player to use in a non-linear fashion. An invariant set-up of sounds is stored in a database, but the use of objects that would produce these sounds while the game is being played is likely to be highly variable if the game is not to become extremely linear in its progression and very boring to play (see Farnell, 2011; Mullan, 2011 for technologies offering the chance to break from this paradigm).

Figure 2. Murch’s conceptual model. Adapted from Murch (1998)

The typology for film sound and its three categories can also be compared to how the human auditory system is biased. Humans are biased towards listening for voices (Chion, 1994, p. 6), and towards attempting to interpret voices as words of language, and spoken language is a primary resource for communication. Humans are generally most sensitive in the part of the sound spectrum occupied by the human voice, that is, approximately 150 to 6000 Hz, and especially sensitive within the range of 3000 to 4000 Hz. Spoken language occupies a quite broad part of the sound spectrum in which the threshold is low. In movies and games this part of the sound spectrum is also commonly inhabited by concurrent sounds, such as explosions, music, and so on, which have a naturally broad spectrum (see endnote 1). A voice does not need the same level as a low-pitched boom in order to be perceived as having the same loudness. There are a number of reasons as to why some sounds fuse together into one and why, in some cases, they do not. These include frequency, relative amplitude, timbre, onset, amplitude envelope, and sound source location: Sounds that
100
fuse tend to have one or several of these factors within the same, small range. Additionally, one should not forget the active ability of humans to focus on particular sounds to the exclusion of others: what is commonly referred to as the cocktail party effect. In this chapter we can not discuss the whole field of acoustics and psychoacoustics but will need to focus the attention towards a limited number of issues concerning the complexity of sound such as dynamics, relative amplitude, dominant frequencies, and their relation to semantic value. 2 For the time being, we can conclude that if there are many sounds with the same properties, that clarity might then become a problem and, at worst, the mix will become blurred or distorted. Therefore, it can be useful to consider what types of sounds have already been used when designing a sonic environment, for which our model can be a powerful toolset. Will broad dynamics thus create good sound design by itself? The answer is obviously no. If two or more sounds are played simultaneously, they may blend and be heard as one. In theory, the more the sounds differ in a dominant frequency span and relative loudness, the easier it becomes to distinguish them into 2 different sounds with different semantic values. Reality, on the other hand, is not that simple. In games, 2 audio files can typically be played together in innumerable ways and with different timings. Consequently,
A Combined Model for the Structuring of Computer Game Audio
one of the key problems of computer game audio is the loss of control that a sound designer has over the playback of the sound in the gameplay of a complex game. Two or considerably more different (or identical) audio files might be played simultaneously due to gameplay events induced by the player or the game system. This could lead to a chaotic sonic environment, the “logjam” of sound that Murch (1998) describes in relation to film sound (see also Cancellaro, 2006; Childs, 2007; Marks, 2001; Prince, 1996; Wallén, 2008). This “logjam” does not support or enhance gameplay and may also become very tiresome to listen to during even short sessions of play. To prevent sounds from losing their definition, and thereby their semantic value, a game audio designer needs to plan and structure the game audio as much as possible, which constitutes a major problem. The sound designer can design and deliver the sounds to a game but the player is the one person in control of the play button. The goal for a sound designer should be to retain as much control over the final sonic environment as possible, even though it is hard to define exactly when the sounds are going to be played. Since game sounds are usually loaded on call by certain events in the game, the sounds cannot be edited and mixed in a fashion similar to the mixing of film sound. In other words, the sonic environment has to be laid out beforehand. To avoid a big, undefined wall of sound, the sounds have to be somewhat compatible with each other. One could compare sound layers with a jigsaw puzzle; in order to complete it, each part must fit in with the surrounding parts. If a number of pieces are put on top of each other, the parts at the bottom will be covered and not clearly visible. On the other hand, as Chion (1994) has noted, sounds may be superimposed on top of each other without the conceptualization that they stem from different environments (pp. 45-46).
The problem is that sounds which are similar will blur into one another. By using the entire dynamic and frequency range, as well as the panorama and distribution of the cognitive
load over the brain, the sonic environment is more likely to be clear and distinct. Perhaps every sound does not have to be as loud as possible (Thom, 1999)? If every sound is evaluated and then given values for a set of variables, such as dynamic range, dominant frequency, and cognitive load, the sonic environment can be easier to visualize. This is what our combined model does. So far we have identified a number of key problems in the analysis and production of computer game audio:
• There is a general lack of functional models for the analysis of computer game audio
• There is also a general lack of functional models for the production of game audio
• The loss of control that a sound designer has over the playback of the audio in the gameplay of a complex game may lead to a chaotic blur of sounds which makes them lose their definition and hence their semantic value
• When two or more sounds play simultaneously, the clarity of the mix depends on the type of sounds, which leads to
• The nature of the relationship between encoded and embodied sounds.
Furthermore:
• Sound is often an abstract to game designers, graphical artists and programmers, due to a lack of consistent and communicable terminology.
The overall purpose of this chapter is therefore to present a model (Figures 3 to 8) that solves these problems and makes it possible to plan the audio layering of computer games in terms of:
• The relationship between encoded and embodied sounds
• Cognitive load
• The functionality of the sounds in computer games
• The relative loudness between sounds and
• The dominant frequency range of all the different sounds.
Figure 3. The structure of the combined model of computer game audio (without the addition of specific sounds)
We also try to establish a consistent communicable terminology for the analysis and production of computer game audio. The suggested model (Figure 3) combines Huiberts and van Tol’s (2008) IEZA-framework (Figure 1) with Murch’s (1998) conceptual model (Figure 2). This combined model may be used to plan the sonic environment of computer games in a complete and balanced way, that is, balanced in relation to the sound spectrum available and complete in relation to the visual component of the game. The model constitutes a tool that provides a sonic richness and avoids “the logjam of sounds” (Murch, 1998). It may also be used to
analyze computer game audio, and a number of analyses showing the benefits of this are provided. Through the use of this model, a sound designer will be able to clearly understand how different kinds of games emphasize different parts of the audio due to the genre and gameplay principles. This chapter is structured as follows: We first present the two models for game audio that we have tried to combine, starting with the IEZA-framework for computer game audio and proceeding to Murch’s conceptual model for film sound. This is followed by a presentation of the combined model, how it is structured and what kind of problems it can solve. In order to provide a more theoretical approach to the complexity of computer game audio, a discussion concerned with playing computer games and listening to the sounds, which is sustained by a case study, follows. We then provide 3 exemplifying analyses of existing computer games, F.E.A.R. (Monolith Productions, 2005), Warcraft III (Blizzard Entertainment, 2002) and Legend of Zelda (Nintendo, 1987), to show how the combined model, as an analytical toolset, may be applied. Since this model is also suitable for the production of game audio, we then provide an example of how to actually use it as a production toolset. The final section is a summary of this chapter and our concluding thoughts.
The IEZA-Framework
Although we show that sound is closely related to immersion, most literature on game audio does not deal with fundamental questions, such as those related to what game audio really is, what it consists of and what makes it function in games. It is striking that in this emerging field, theory on game audio is still rather scarce. While most literature focuses on the production and implementation of game audio, like recording techniques and programming of sound engines, surprisingly little has been written in the field of ludology about the structure and composition of game audio. (Huiberts and van Tol, 2008, p. 1)
Huiberts and van Tol were looking for a functional and coherent framework to use for the study of game audio and examined different categorization methods, from games and films respectively. However, they found that none provided any sensible information about the organization and functionality of the audio. This is a problem since the functionality of sound is essential to computer games. While this model, in its original form, does not specifically discuss the semantics of sounds in a detailed way, our combined model emphasizes this important issue.3 Huiberts and van Tol propose that a more coherent way to categorize the audio in a game should also include the function, role and properties of the different sounds. They therefore developed the IEZA-framework (Figure 1) for the categorization and planning of audio in computer games.
The IEZA-framework consists of 4 categories:
1. Interface: Sounds related to the game’s interface. Interface sounds are non-diegetic and belong to the game as a system.
2. Effect: Sounds directly or indirectly triggered by the player’s actions. The sounds of effects are diegetic and the result of activity within the game environment.
3. Zone: Sounds related to the game environment. Zone sounds are diegetic and belong to the setting.
4. Affect: Sounds outside the game environment, mainly intended to set the mood. The sounds of affects are non-diegetic and often used to create anticipation.
The 4 categories are divided along 2 axes in a cross pattern: diegetic versus non-diegetic on the vertical axis and activity versus setting on the horizontal axis. The terms diegetic and non-diegetic are also very often used in film theory (Bordwell & Thompson, 1994; Bordwell & Thompson, 2001; Chion, 1994; Wilhelmsson, 2001) and distinguish the environment inside the movie/game, that is, the diegesis, from the system that carries this world inside the movie/game, that is, the non-diegetic (Cunningham, Grout, & Picking, 2011; Jørgensen, 2011). The IEZA-framework makes a clear distinction between the sounds that belong inside the game environment, the Zone and the Effect sounds, and the sounds that belong to the system as such, the Affect and Interface sounds. The horizontal axis places the sounds on a scale of setting versus activity. The Zone and Affect sounds belong to the setting of the game and the Effect and Interface sounds to the activities during gameplay. This is a good starting point for understanding how computer game audio may be categorized in accordance with its functionality within the sonic environment of a specific game. We agree with Huiberts and van Tol (2008) that this structure enables the IEZA-framework to go deeper than other similar frameworks. We have used the IEZA-framework successfully in game audio courses at the University of Skövde with good and promising results. Nevertheless, we have also realized that the model does not cover all the important issues with regard to creating a sonic richness and at the same time avoiding the smearing of sound all over the sonic environment. The key problem is that the IEZA-framework does not, in itself, produce a visualization of the cognitive load, the relation between the semantic value of different sounds, the relation between encoded and embodied sounds, the dominant frequencies of a sound file or its loudness. Combined with Murch’s (1998) conceptual model, the IEZA-framework can be part of a more elaborate tool for the production and the analysis of computer games. We have now covered the first node of our combined model and it is time to take a closer look at the second node: Murch’s conceptual model of film sound.
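The two axes of the IEZA-framework described above can be written down as a small lookup, which may help when tagging sounds in a spreadsheet or tool. The following is only a minimal sketch: the class name `IEZA`, the tuple encoding and the helper `describe` are illustrative assumptions and are not part of Huiberts and van Tol's framework.

```python
from enum import Enum


class IEZA(Enum):
    """The four IEZA categories placed on the two axes.

    Each value is (diegetic, activity), an encoding assumed for this sketch:
      diegetic=True -> the sound belongs inside the game environment
      activity=True -> the sound belongs to activity rather than setting
    """
    INTERFACE = (False, True)
    EFFECT = (True, True)
    ZONE = (True, False)
    AFFECT = (False, False)

    @property
    def diegetic(self) -> bool:
        return self.value[0]

    @property
    def activity(self) -> bool:
        return self.value[1]


def describe(category: IEZA) -> str:
    """Return a short label giving the category's position on both axes."""
    vertical = "diegetic" if category.diegetic else "non-diegetic"
    horizontal = "activity" if category.activity else "setting"
    return f"{category.name.title()}: {vertical}, {horizontal}"


if __name__ == "__main__":
    for category in IEZA:
        print(describe(category))
```

Keeping the axes as properties of the category, rather than as separate fields on each sound, reflects the point that a category fixes both axis positions at once.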
Murch’s Conceptual Model
One central point made by Murch in his work on the conceptual model (1998) is that just as audible sound may be placed on a scale ranging from approximately 20 Hz to 20,000 Hz, a sound may also be placed on a conceptual scale from Encoded to Embodied covering a spectrum from speech to music via sound effects in order to avoid a “logjam” of sounds. This dimension of film sound is the reason for our choice of Murch’s conceptual model as the second node of our combined model of computer game audio. The IEZA-framework does not, in itself, categorize the different sounds on a scale from encoded to embodied, and no references to Murch’s conceptual model of film sound are made in Huiberts’ and van Tol’s article (2008).
Example from Murch (1998)
1. Violet – Dialogue
2. Cyan/Green – Linguistic/Rhythmic Effects (e.g. footsteps, door knocks etc)
3. Yellow – Equally Balanced Effects
4. Orange – Musical Effects (e.g. atmospheric tonalities)
5. Red – Music.
Before addressing Murch’s conceptual model we need to elaborate the statement that humans are biased towards listening for voices (Chion, 1994, p. 6). Chion states that: “Sound in film is, above all, voco- and verbocentric because human beings in their habitual behavior are as well” (p. 6). He suggests 3 different listening modes: causal, semantic and reduced listening. We first listen in order to identify the cause of a sound (causal listening) and, once it is identified, we listen to find the meaning of the sound (semantic listening). Reduced listening is a special case that is not discussed in this chapter. What, then, is Chion’s suggestion about how listening to a cinematic soundtrack works with regard to the 3 different types of sound, that is, speech, effects, and music?
If the scene has dialogue, our hearing analyzes the vocal flow into sentences, words, hence, linguistic units. Our perceptual breakdown of noises will proceed by distinguishing sound events, the more easily if there are isolated sounds. For a piece of music we identify the melodies, themes, and rhythmic patterns, to the extent that our musical training permits. In other words, we hear as usual, in units not specific to cinema that depend entirely on the type of sound and the chosen level of listening (semantic, causal, reduced). The same thing obtains if we are obliged to separate out sounds in the superimposition and not in their succession. In order to do so we draw on a multitude of indices and levels of listening: differentiating masses and acoustic qualities, doing causal listening, and so on. (Chion, 1994, p. 45)
What does this then mean? Voco- and verbocentrism relates to 2 of the 3 listening modes suggested by Chion (1994), that is, causal listening and semantic listening. In Chion’s work, the part on semantic listening is very brief and only discusses semantic value in relation to a code or spoken language. However, it is fruitful to also use this concept in relation to the system of sounds that a given movie or a given game puts forth, which may be understood as a semiotic system consisting of sound signs. Within such a system the sounds are part of the communication of the environment. This line of thought is also found in Murch’s conceptual model. According to Murch (1998), the clearest example of encoded sound is speech and the clearest example of embodied sound is music. Furthermore, since the human brain normally divides the processing of sound (and other stimuli) between the left and right side of the brain, we are able to discern 5 different layers of sound simultaneously, if they are evenly spread on the conceptual spectrum from encoded/violet to embodied/red. Murch (1998) provides a number of practice-based examples and problems from his work on the film Apocalypse Now (Coppola, 1979):
[…] it appeared to be caused by having six layers of sound, and six layers is essentially the same as sixteen, or sixty: I had passed a threshold beyond which the sounds congeal into a new singularity, dense noise, in which a fragment or two can perhaps be distinguished, but not the developmental lines of the layers themselves. With six layers, I had achieved Density, but at the expense of Clarity. What I did as a result was to restrict the layers for that section of film to a maximum of five. By luck or by design, I could do this because my sounds were spread evenly across the conceptual spectrum.
Murch’s problem in this case concerned the 6 concurrent layers described below:
1. Dialogue (violet)
2. Small arms fire (blue-green ‘words’ which say “Shot! Shot! Shot!”)
3. Explosions (yellow “kettle drums” with content)
4. Footsteps and miscellaneous (blue to orange)
5. Helicopters (orange music-like drones)
6. Valkyries Music (red).
Something had to be sacrificed whilst maintaining density and clarity and Murch therefore decided to omit the music and have a five layer soundtrack consisting of:
1. Dialogue (“I’m not going! I’m not going!”)
2. Other voices, shouts, etc.
3. Helicopters
4. AK-47’s and M-16s
5. Mortar fire.
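Murch's rule of thumb, at most five concurrent layers spread across the conceptual spectrum, can be sketched as a simple check. This is a rough illustration only: the numeric positions (0.0 for encoded/violet up to 1.0 for embodied/red), the equal-width colour bands and the names `band_of` and `review_layers` are assumptions made for this sketch. Murch gives colours, not numbers.

```python
from collections import Counter

# Five colour bands of the conceptual spectrum, from encoded to embodied.
BANDS = ["violet", "cyan/green", "yellow", "orange", "red"]
MAX_CLEAR_LAYERS = 5  # the limit Murch settled on for that section of film


def band_of(position: float) -> str:
    """Map an assumed position in [0.0, 1.0] onto one of five equal bands."""
    index = min(int(position * len(BANDS)), len(BANDS) - 1)
    return BANDS[index]


def review_layers(layers: dict[str, float]) -> tuple[bool, list[str]]:
    """Return (too_many_layers, bands that hold more than one layer)."""
    counts = Counter(band_of(position) for position in layers.values())
    crowded = [band for band in BANDS if counts[band] > 1]
    return len(layers) > MAX_CLEAR_LAYERS, crowded


if __name__ == "__main__":
    # Hypothetical positions for the six layers listed above.
    six_layers = {
        "dialogue": 0.0,
        "small arms fire": 0.3,
        "explosions": 0.5,
        "footsteps and miscellaneous": 0.55,
        "helicopters": 0.7,
        "Valkyries music": 1.0,
    }
    print(review_layers(six_layers))
```

The check reports two separate things, the raw layer count and any crowded band, because the text treats density (how many layers) and even spread (where they sit) as distinct conditions for clarity.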
In Murch’s (1998) example, the instances of “small arms fire” are effect sounds with a semantic value that Murch calls “blue-green ‘words’ which say Shot! Shot! Shot!”. Firing a gun in a game would typically result in a direct response soundwise. The player would also probably anticipate such a feedback. In addition, firing a gun would conceptually evoke a sound and some kind of visual response as well. Sound reveals something about the environment. In this case, it signals the presence of guns and a potential danger; it is a sign that denotes clear and present danger. Humans tend to seek meaning and structure in and from the surrounding environment. In this case, we try to identify the source of a sound and what it might mean in the present context. We have used both causal and semantic listening and found a specific sound among others in the concurrent audio layers. The sound as such is dense and clear at the same time since it is carefully planned to occupy a specific frequency range and to develop within a specific part of the dynamic range. Is Murch on the right track with his conceptual model and his conclusion that five concurrent
layers of sound is the maximum for obtaining density and clarity? Murch’s conceptual model corresponds well to Miller’s (1956) The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. As humans, we have limitations with regard to processing data. Miller (1956) discussed this in terms of bits and chunks:
If the human observer is a reasonable kind of communication system, then when we increase the amount of input information the transmitted information will increase at first and will eventually level off at some asymptotic value. This asymptotic value we take to be the channel capacity of the observer: it represents the greatest amount of information that he can give us about the stimulus on the basis of an absolute judgment. The channel capacity is the upper limit on the extent to which the observer can match his responses to the stimuli we give him.
He used Pollack’s (1952, 1953) work on auditory displays to discuss and explain absolute judgment of unidimensional stimuli which clearly showed the channel capacity for pitch to be “2.5 bits which corresponds to about six equally likely alternatives” (Miller, 1956). It is interesting to note that sound was the focus for this groundbreaking work on the human capacity for processing information. The combination of moving images and sonic environment makes up the setting in which the actions of a movie or a game take place. Watching a movie without any sound added is often somewhat dull. Our experience is that an audience trying to become immersed in old silent movies without any preserved soundtrack grows bored and separated from the events on the screen. This is in accordance with Walter Ong’s (1982/90) remarks on the bipolarity of sight and hearing which we later elaborate in the discussion about playing computer games and listening to the sounds. Adding a musical soundtrack heightens
the immersion radically, and even an old film such as The Phantom Chariot (Sjöström, 1921), with a musical soundtrack composed by Matti Bye in 1998, becomes an interesting movie. This clearly exemplifies that sound, even if it is only music, has the effect of including the audience in the environment and that the moving images do not, in themselves, have the same desired effect, which supports Ong’s claim on the bipolarity of vision and hearing. Only music… well, of course, music should not be solely considered as embodied rather than encoded. Music is a plethora of systems. It can be narrative and it contains many cultural dependent codes. However, Murch’s point is that music works rapidly and is usually aimed more at our emotive rather than our intellectual response. There are, of course, differences between individuals. “For a piece of music we identify the melodies, themes, and rhythmic patterns, to the extent that our musical training permits” (Chion, 1994, p. 45). In the case of Bye’s musical score for The Phantom Chariot, much of the music mimics other kinds of sounds, such as the squeaks of the chariot’s wheels, and therefore the music is also clearly semantic. Bye’s music can very well be said to make use of a scale of sounds from encoded to embodied that comprises the soundtrack. Thus far we can conclude that Murch’s model fits well into the idea of an upper limitation of a simultaneous processing capacity that has been thoroughly investigated since at least 1956 and onwards. Furthermore, the conceptual model he suggests makes it possible to consider sound on a scale spanning from encoded to embodied. This in turn implies that if such a scale, spanning from encoded to embodied sound, were to be used in combination with the IEZA-framework, it would be much easier to structure a sonic environment for a computer game. We also have a new set of parameters that add content to the sound categories suggested by the IEZA-framework. 
This content is the level of meaning a specific sound carries. Meaning, or semantic value, is not only carried by sonic signs such as the spoken words, utterances and so on of language production, although language and its use is a prototypical example of highly encoded sounds which Murch’s model emphasizes.
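The figure Miller quotes for pitch earlier in this section can be checked with a line of arithmetic. A channel capacity of C bits distinguishes N = 2^C equally likely alternatives; the symbols C and N are introduced here only for illustration.

```latex
N = 2^{C}, \qquad C = 2.5 \;\Rightarrow\; N = 2^{2.5} = 4\sqrt{2} \approx 5.66,
\qquad \text{and conversely} \quad \log_{2} 6 \approx 2.58 .
```

So 2.5 bits does correspond to "about six" alternatives. This sits close to Murch's limit of five layers, although the two figures come from different kinds of observation, an absolute-judgment experiment in one case and mixing practice in the other.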
The Combined Model for Game Audio
This section introduces the different parts of the combined model for the layering of computer game audio. The combined model makes it possible to categorize the different sounds for any part of a game in a number of ways. Such a categorization could span from relative dynamic range, dominant frequency areas, and “encoded sound” versus “embodied sound” (Murch, 1998) to whether a sound belongs to the diegesis of the game, is part of the interface, belongs to the activity of playing, or belongs to the setting in general. If, for example, many “encoded sounds” are used, such as spoken language in a game, it is necessary to be attentive to the total sonic environment in which these “encoded sounds” take place and plan for an acoustic niche for the dialogue with few interfering sounds played simultaneously within the same frequency span. If many “embodied sounds” are needed, such as music combined with ambient sounds designating the environment, it will be necessary to make them work together by shaping the sounds to fit and allow each other concurrent presence. As Figures 1 to 3 above show, we have taken the basic differentiation of the game audio into Interface, Effect, Zone and Affect sounds from the IEZA-framework. We have also used, from the IEZA-framework, the horizontal axis that differentiates sounds on the setting versus activity scale and the vertical axis that describes sound as diegetic or non-diegetic. The IEZA-framework is intact within our model; Murch’s conceptual model has, however, been visually adapted. The centre of the circle equates to the left-hand foot of Murch’s arch (violet/encoded) and, moving
away from the centre, Murch’s spectrum is then traversed until the periphery of the circle, which equates to red/embodied. The more central a sound is placed, the higher its level of encoding; the more peripheral a sound is, the lower its level of encoding. This is a clear difference from both source models: the IEZA-framework does not itself allow such a visual differentiation, and Murch’s conceptual model does not, apart from Effects, place the sounds in specific categories (such as Interface, Zone and Affect), nor does it place them on the vertical axis of diegetic versus non-diegetic or the horizontal axis of setting versus activity. Murch does write about the setting versus activity scale, but his conceptual model does not have a structure that visualizes this aspect clearly. Combining these two models, that is, the IEZA-framework and the conceptual model, makes it easier to understand beforehand, and in more detail, what is happening in the sound environment. The sound designer does not need to use actual sounds: The sounds may be derived from a game script or story board prior to production.4 If the sound designer has lots of sounds in the centre of the model, she is most likely to produce a cognitive overload for the player because the sounds in the centre are encoded and need more intellectual processing to be meaningful and distinct. Since controlling dominant frequencies is one way to distinguish one sound stream from another, we have chosen to make this quality of sound visible within the model. We have also chosen to use 3 basic primitives, as Figures 4 and 5 illustrate:
• A circle = a sound in which the bass frequencies are dominant
• A square = a sound in which the midrange frequencies are dominant
• A triangle = a sound in which the treble/high frequencies are dominant.
These 3 basic primitives were chosen since they seemed natural, but this is not to say that they would be the natural shapes to represent these qualities: They are more likely the result of a process of cultural connotation than anything else. A bass sound does not have the same sharpness as a midrange sound. It is often hard to hear from what place a bass sound originates, which is why a circle was chosen. However, a midrange sound has a high degree of definition and is distinct, which is the reason a square was chosen. Furthermore, a triangle was chosen for the treble sound, which is sharp and often pointy.5 But we also need yet another dimension and that is loudness. These primitives have therefore been made in 5 different sizes designating their relative loudness: A larger primitive represents a louder sound.
Figure 4. The different loudness primitives used in the combined model ranging from very low to very loud. Midway should be about normal speech level, assuming that the game’s sound is somewhat dynamic. The scale is, in other words, relative to the specific game’s loudest and quietest sounds
Figure 5. F.E.A.R. analysis example
The model is able to show the following aspects:
• The amount and clustering of encoded sounds
• The amount and clustering of embodied sounds
• The amount and clustering of diegetic sounds
• The amount and clustering of non-diegetic sounds
• The amount and clustering of Interface sounds
• The amount and clustering of Effect sounds
• The amount and clustering of Zone sounds
• The amount and clustering of Affect sounds
• The relative loudness between sounds
• The dominant frequencies in each sound.
The parameters above help the sound designer avoid cognitive overload due to a logjam of sounds. We have now described how the combined model is structured. Before explaining how the sound designer can use it to analyze and/or structure the sonic environments in computer games, we need to elaborate on the discussion of the process of playing computer games and listening to the sonic component of the environment as part of game playing.
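To make the structure of the combined model concrete, one entry per sound can carry the attributes the model visualizes: IEZA category, position between centre (encoded) and periphery (embodied), a shape for the dominant frequency range, and one of five sizes for relative loudness. The sketch below rests on several assumptions: the class `SoundEntry`, the 0.0 to 1.0 encoding scale, the 0.5 split between "encoded" and "embodied", and the linear mapping in `loudness_size` are all hypothetical choices for illustration and are not specified by the model itself.

```python
from collections import Counter
from dataclasses import dataclass

# Shape primitive for each dominant frequency range, as in the model.
SHAPES = {"bass": "circle", "mid": "square", "treble": "triangle"}


@dataclass
class SoundEntry:
    name: str
    category: str    # "Interface", "Effect", "Zone" or "Affect"
    encoding: float  # assumed scale: 0.0 = centre/encoded, 1.0 = periphery/embodied
    band: str        # dominant frequency range: "bass", "mid" or "treble"
    loudness: int    # relative loudness size, 1 (very low) to 5 (very loud)

    @property
    def shape(self) -> str:
        return SHAPES[self.band]


def clustering(entries):
    """Count sounds per IEZA category and per half of the encoding scale."""
    per_category = Counter(e.category for e in entries)
    per_half = Counter(
        "encoded" if e.encoding < 0.5 else "embodied" for e in entries
    )
    return per_category, per_half


def loudness_size(level, quietest, loudest):
    """Map a level onto sizes 1 to 5, relative to the game's own extremes.

    The linear mapping is an assumption; the model only states that the
    scale is relative to the game's loudest and quietest sounds.
    """
    if loudest == quietest:
        return 3
    fraction = (level - quietest) / (loudest - quietest)
    return 1 + max(0, min(int(fraction * 5), 4))


if __name__ == "__main__":
    # Hypothetical sounds, not taken from any analysed game.
    entries = [
        SoundEntry("radio voice", "Effect", 0.1, "mid", 3),
        SoundEntry("gunshot", "Effect", 0.3, "treble", 5),
        SoundEntry("wind", "Zone", 0.8, "bass", 2),
    ]
    print(clustering(entries))
```

A tally that shows many entries in the "encoded" half, or many in one category, corresponds to the crowding near the centre of the diagram that the text associates with cognitive overload.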
Playing Computer Games and Listening to the Sounds
What is game playing? How does a player act and why does she act the way she does? What role does sound have in the playing of a computer game? Can sound be used to manipulate the player into acting in specific ways?
Playing a computer game involves the manipulation of objects within the game environment in a dynamic, sequential flow of events. During play, the player will be processing a lot of data that will need to be made meaningful in order to proceed within the game environment and the game as a system. The player will need to identify the data and turn it into information by categorizing the graphical as well as the sound elements. Objects in a game may be connected to a corresponding sound that carries meaning, that is, there is a semantic level in the sounds and the graphics that is fundamental to gameplay as such. Scripted sounds, or series of sounds, on the other hand, are the result of a player’s position within the environment rather than a specific gameplay action taken by the player. For example, a player reaches a certain point in the game environment and a sound starts. The player has not taken any conscious action and hence does not anticipate any specific feedback. The conceptual spectrum induces different kinds of anticipation. You might very well use scripted sounds to make the player take action, for example, by letting the player walk in a narrow corridor with glass walls on one side. When the player reaches a specific point of observation (Gibson, 1986), you script a sound event to occur, also adding a sudden motion seen through the glass wall. If the gameplay is based on survival/horror, you might induce the player to waste a number of bullets on the supposition that the sound and the motion imply danger is present, that is, you scare her into action through a scripted event. In a film everything is placed in a comparable scripted order when the editing process has been concluded and the film is completed. However, in a computer game, the total environment must support the gameplay. It may contain cut scenes, scripted events as well as those based on player action.
In the first case you have the same control as in traditional movies, while in the second you preplan an event to occur at a specific point in the game environment. For the last case, a
number of options are available soundwise that can be supplied to the player through a database. As an example, you can, and probably should, limit the number of weapons accessible to the player. Every single weapon should be discernible from any other weapon through its sound in order to enhance the semantic value. Our point here is that the player is supplied with a number of objects with which to play the game. Many of them produce sound effects within the diegesis of the game, such as shots. As this example shows, the IEZA-framework is useful in this part of the process, discerning what kind of sound belongs where in the game’s structure. You have some, but not total, control over when, why and where the player will use these play objects. In the above, our focus is on the sonic environment of computer games and the problem of balancing the sounds in relation to each other. However, very few games consist of sound alone (notable examples can be found on websites such as http://www.audiogames.net/). What happens, then, when a game consisting of sound and graphical elements is played? What does sound provide to this experience? In the following section we discuss a case study that relates to this issue in general and the use of sound as a means of directing the player in particular. According to Ong (1982/90), vision and hearing have a basic bipolarity. The aim of the following paragraph is to discuss the relation between vision and hearing, as well as Ong’s suggested bipolarity of these two and how this relates to immersion. Vision separates us from the environment, making the limits of our bodily containers protrude, whereas sound integrates us with the environment, blurring the border between the container of the self and the adjacent environment. Ong’s theory, which is concerned with the differences between written and spoken language, might seem odd to use in relation to a model for the analysis and production of computer game audio. 
Nevertheless, we find his remark about this bipolarity highly relevant with regard
to understanding the function that sound has in audiovisual constructions such as computer games. Think about it; in order to fully take in an environment through vision we need to move around and turn our eyes towards what we would like to see (cf. Ong, 1982/90; Gibson, 1986). Ong actually refers to the immersive effect that high-fidelity audio reproduction accentuates. Sight is limiting, but hearing is not in the same way. We can hear what is behind us and then turn around to see it. If sound integrates rather than separates us from the surrounding environment, it would seem reasonable that sound and immersion have a strong relationship. Integration through sound might lead to immersion. Hearing is, on the other hand, also a selective process. We may, to some extent, filter out uninteresting and disturbing sounds. A construed audiovisual environment is a prefiltered one into which sound and images have been put through the selective processes of their creators. We therefore discuss hearing, vision, the visual, and affordances (Gibson, 1977, 1986) in relation to the sonic environments of computer games. To some extent, the bipolarity of hearing and vision is innate. Biologically, we develop hearing before we can see. A human fetus can normally perceive sound from week 15 after conception and the ears are usually fully developed by week 24. The fetus is surrounded by amniotic fluid and this underwater environment is an immersive one that completely encloses us. We are immersed and can feel touch from week ten. In fact, one of the primary definitions of immersion used in the context of computer games clearly connects the concept of immersion with being under water (Murray, 1997, pp. 98-99). Hearing the environment precedes seeing it, in terms of how these senses develop from conception, and feeling the environment precedes hearing it. In the womb, movement is restricted and, as newborns, we have no locomotion and must be transported by others.
Sight is still limited and objects in the visual field need to be very close to be in sharp focus,
even if the eyes themselves are more or less fully developed from week twenty five. However, we can hear and recognize sound such as the voices of our parents or melodies from a computer game prior to birth and respond to such auditory stimuli by kicking and moving around. It can be asserted that sound activates the fetus. When we grow up, hearing is still physically affective and may induce both conscious and reflexive physical responses. Rapid and loud sounds may be frightening while slow and soft ones may be relaxing. When listening to the long sequence of the breathing sound in the movie 2001: A Space Odyssey (Kubrick, 1968), it is our experience that it is almost impossible not to fall into the same rhythm and breathe in synchronization with the sound of the film. The bottom line is that hearing includes us in the environment, making us part of it rather than separating us from it. Listening is part of feeling immersed and immersion is a perceptual, body-based experience.
Central to human perception and cognition is the configuration of the human body and its ability to move around within an environment. This is the basis for a number of theories on embodied and situated cognition, that is, how humans make meaning of the environment in which they are situated. We propose that the organization of sound within computer game environments would benefit from some basic insight into cognitive theory and the idea of basic level primacy (Lakoff, 1987; Lakoff & Johnson, 1999). A central claim in Lakoff and Johnson’s work is that the objects we encounter may be understood in terms of superordinate, basic, and subordinate levels, of which the basic level is the highest one that provides an object with an overall understandable form and a general user pattern.
The concept of a chair is, for instance, at the basic level, whereas furniture is at a superordinate level and a specific red and white chair, made of steel and concrete, is at the subordinate level. The more detailed meaning we can assign to any given object the more subordinate it is. We do not have
a common user pattern for furniture, nor could the whole category be described by one simple and understandable form. For example, a table and a cupboard are both pieces of furniture but they do not look the same or have the same schemata of use. Although there are good reasons to follow Ong’s idea about the bipolarity of vision and hearing, we argue that sound might also be understood from the perspective suggested by Lakoff and Johnson. Furthermore, we constantly observe the environment through points of observation which include all our senses (Gibson, 1986). The human mind and the human body are not primarily separate units but make up complex systems of which the visual and auditory sensory systems are of great importance for our understanding of the world. The configuration of the human body has effects on human perception as well as human cognition. Gibson (1986) suggests a number of different kinds of vision which are based on the situations they are employed in:
• Snapshot vision: fixating a point and then exposing some other point momentarily
• Aperture vision: successive scanning of the visual stimuli
• Ambient vision: looking around by turning the head
• Ambulatory vision: looking around by moving towards objects.
In a real time strategy (RTS) game like Warcraft III (Blizzard Entertainment, 2002), the player has a larger visual field than the controlled characters she is commanding, enabling an overview of the diegetic environment using 3 of the 4 different types of vision. She can use snapshot vision to fixate a point, aperture vision to perform successive scanning, and ambulatory vision by moving towards objects. If the controlled avatar is turned around, the visual field, as such, does not rotate. This is the common practice in RTS games. The player may, to some extent, change the angle of the visual field which is also somewhat limited
by a highlighted ring. What lies beyond this ring is not visible. In order to see what is hidden, the player must employ ambulatory vision, that is, move the controlled characters. In an adventure game such as Legend of Zelda (Nintendo, 1987), the player’s view is locked in a specific angle heightwise. The player needs to move the avatar towards the end of the visual field in order to reveal what lies beyond the framing of the diegetic environment, that is, she can use snapshot vision, aperture vision and ambulatory vision. In F.E.A.R. (Monolith Productions, 2005), which is a first-person shooter (FPS), the player, on the other hand, sees the world through the eyes of the avatar. An immobile observer, who cannot change the viewpoint within the game, can only use either snapshot vision or aperture vision within the visual field of the computer game environment. A game such as Myst (Cyan Worlds, 1993) works this way, as do Pac-Man (Namco, 1980), Space Invaders (Taito, 1978) and many other early games. Hence “The single, frozen field of view provides only impoverished information about the world […] The evidence suggests that visual awareness is in fact panoramic and does in fact persist during long acts of locomotion” (Gibson, 1986, p. 2). All the games mentioned above have sound to make the environment more connected to the act of playing and to achieve a more prominent and lifelike game environment. The sonic environment of Myst is quite elaborate for its time, being distributed on CD-ROM which allows considerably more data than earlier games. In addition, more data capacity also meant comparably high audio resolution, that is, bit depth and sample rate. Pac-Man and Space Invaders used other kinds of technology in their original form, relying on the hardware rather than the software but, when ported to other platforms, such as consoles and PCs, they were kept quite close to the original limits soundwise.
If sound integrates us in the environment, as Ong (1982/90) proposes, and if sound and immersion are also related, we might also employ different kinds of listening to make the environment meaningful. As mentioned earlier, Chion (1994) suggests 3 different listening modes: causal, semantic and reduced listening. In addition to these, we suggest the following different kinds of listening when playing a computer game, which are analogous to Gibson’s 4 kinds of vision:
• Snapshot listening: fixating a point and then shifting to some other point momentarily by filtering out all other sound sources
• Aperture listening: successive scanning of the audio stimuli
• Ambient listening: increasing the frequency range of the sound by turning the body towards its source for higher definition of the sound
• Ambulatory listening: listening by moving around and using sound as part of the navigation within the environment.
In order to support the idea of our 4 listening modes and their relation to Gibson’s 4 modes of seeing, we briefly present a case study called In the Maze, which was a laboratory-based experiment conducted at the InGaMe Lab at the University of Skövde. This discussion also provides interesting connections to our combined model for game audio. The case study was originally devised to investigate whether or not sound can be said to align with Gibson’s (1977, 1986) ideas of affordances and, if so, whether sound stimuli would make certain locomotive patterns more probable than others.
The affordances of the environment are what it offers the animal, what it provides or furnishes, either for good or ill […] I mean by it something that refers to both the environment and the animal in a way no existing term does. It implies the complementarity of the animal and the environment. (Gibson, 1986, p. 127)
The game environment for our experiment was a hexagonal structure comprising a labyrinth of corridors that made it impossible to adopt a strategy based on always going left or always going right because that would only lead the test subject back to the starting point. We also tried to create a consistent environment, that is, a level consistent with reality and including sound that would match the visual environment. There were, however, differences in the sound played at specific parts in the corridors leading to intersections. The test subjects played the game wearing headphones. At certain points in the game’s 4 levels, the game was scripted to play 2 different kinds of sounds in the right and left headphone speakers. The player did not at first know when such a scripted sound would occur but could turn back in the corridor and the sounds would be triggered again. That is to say, the first time a scripted sound played, it was not the result of any conscious game strategy formulated by the player. The difference between the sounds was that one kind of sound was meant to have the semantic value open and the other closed, at a basic level of categorization. In other words, we tried to propose a certain universal user pattern, to walk towards the open sound rather than the closed one. The basic intention of the test was that if a sound in the right speaker was designed to suggest “closed”, the path to the right would lead to a dead end and vice versa. The only instructions the test subjects received were to play a game. By doing so, we introduced the idea that there ought to be some form of rules and ludus element rather than free playing activity, that is, paidia (Caillois, 1958/1961). The hypothesis was that with only rudimentary instructions the test subjects would need to identify the environment as a maze by exploring it using the 4 different types of vision that Gibson suggests, then devise a strategy for moving through the maze.
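The scripted left/right cues described above can be sketched in a few lines of Python. This is only an illustration of the logic as the text describes it, not the script used in the Half-Life level: the names `Intersection`, `opposite`, and `cues_for` are hypothetical, and the `reversed_level` flag stands in for the one level of the case study in which the cue meanings were swapped.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Intersection:
    """A junction with two exits, exactly one of which leads onward (hypothetical type)."""
    name: str
    open_side: str  # "left" or "right": the side that does NOT end in a dead end


def opposite(side: str) -> str:
    """Return the other headphone channel, rejecting anything but left/right."""
    if side == "left":
        return "right"
    if side == "right":
        return "left"
    raise ValueError(f"side must be 'left' or 'right', got {side!r}")


def cues_for(intersection: Intersection, reversed_level: bool = False) -> Dict[str, str]:
    """Map each headphone channel to the semantic cue ("open" or "closed") it plays.

    On a normal level the "open" cue is heard on the side that leads onward.
    On a reversed level the cues are swapped, so "open" points at the dead end.
    """
    open_channel = intersection.open_side
    if reversed_level:
        open_channel = opposite(open_channel)
    return {open_channel: "open", opposite(open_channel): "closed"}


# Usage: the same junction on a normal and on a reversed level.
junction = Intersection(name="first junction", open_side="left")
normal = cues_for(junction)
swapped = cues_for(junction, reversed_level=True)
```

Keeping the reversal as a single flag makes the experimental manipulation explicit: the geometry of the maze stays the same and only the meaning carried by the sound changes.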
The idea of our 4 modes of listening was not part of the hypothesis but a result deduced from these tests. We collected several layers and types of data for this test:
• Video recordings of the gameplay session from the players’ perspective
• Video recordings of the game player from 3 different angles: face on, from the left side and from above
• Sound recording of the player while playing, for the purpose of capturing spontaneous comments addressed to the game as a system⁶
• Video and audio recordings of semi-structured interviews after test sessions and
• Video and audio recording of a replay of each player’s session, in which they were given a chance to freely comment upon their own gameplay.
Several test subjects adopted an audio-based game strategy even if many were not actually aware of it. What we can deduce from the data collected is that many of the test subjects tried to follow sound to reach the end of each level in the labyrinth. We did put in a reversed level to examine whether it was actually the audio that mattered with regard to choices. That is, one of the 4 levels had the sounds that signaled open leading to dead ends and vice versa. This level indicates a tendency that test subjects really followed sounds and were confused when the pattern was changed. The data shows that a perceptual/cognitive set (Bugelski & Alampay, 1961; Wilhelmsson, 2001) may be constructed of audio and visual stimuli and that such a perceptual set may lead to the formation of a strategy to reach the end state of a game. We used the game engine of Half-Life (Valve Corporation, 1998) for our case study test. Some of the test subjects identified the game as a Half-Life level very quickly, which in turn, due to previous experience of Half-Life, led to immediate speculation about what the game would provide. Half-Life is an FPS game based on the principle of Agôn (Caillois, 1958/1961). Test subjects with previous and deep experience of FPS games therefore presumed they would encounter enemies of some kind and quickly adopted a specific locomotive pattern. When the game began, they rapidly turned around to get an overview, that is, ambient vision; they also tried to watch their back at times and had a tendency to walk in a criss-cross pattern, indicating ambulatory vision. At times, criss-crossing led them to see more of one corridor, depending on which side they were walking or running when they reached an intersection. That is, if they were keeping to the right, they would see more of the corridor to the left and, in some cases, they preferred to move towards what they could see. Test subjects who had a great deal of experience with Half-Life or other similar first-person shooters probably also used snapshot vision to rapidly scan the environment. However, we had no means of measuring this and the only way we can observe the use of snapshot vision from our data is how the centre of the first-person perspective fluctuates. It is probably the case that subjects moving rapidly in the environment need to use snapshot vision due to their velocity but further tests would need to be conducted before jumping to conclusions. Furthermore, inexperienced players tended to always walk in the direction suggested by their starting orientation. The ME-FIRST orientation (Lakoff & Johnson, 1980; Wilhelmsson, 2001), as well as the experience of having a body and moving primarily forwards, overshadowed other possibilities of locomotion:
Since people typically function in an upright position, see and move frontward, spend most of their time performing actions, and view themselves as being basically good, we have a basis in our experience for viewing ourselves as more UP than DOWN, more FRONT than BACK, more ACTIVE than PASSIVE, more GOOD than BAD. (Lakoff & Johnson, 1980, p. 132)
The Game Ego manifestation (Wilhelmsson, 2001) presented the inexperienced player with a direction for walking or running that was not initially questioned. At the same time, the affordance
of the walk-ability is at play: You walk straight on because that is what you do in a corridor. “Moving objects generally receive a FRONT–BACK orientation so that the front is in the direction of motion (or in the canonical direction of motion, so that a car backing up retains its front)” (Lakoff & Johnson, 1980, p. 42). We can also conclude that the culture of play and prior familiarity with this kind of game environment had some influence on how the test subjects tried to move the Game Ego manifestation around within the game environment and not only the visual (and sonic) affordances as such. The results of the case study are, in essence, transferable to the suggested combined model and take into account the relation between vision and sound. In fact, the method used in the case study makes a good integral part of the combined model. Some of the test subjects were affected not only by the ambulatory listening but also by the ambulatory visual position within the game environment. It is important to bear this in mind, since most digital games consist of graphics and sound manifesting some kind of environment. Games are actions undertaken by players and these actions may be induced by sound and/or by sound and graphics in combination. The horizontal axis of the original IEZA-framework is the one that categorizes the game audio in terms of setting versus activity and here we have a clear connection between our test and the combined model. The case study provides the material for the analysis of the sound and image relation and the effect of locomotion, that is, it stresses the horizontal axis of the IEZA-framework that differentiates setting and action. The vertical axis of the IEZA-framework categorizes the game audio in terms of diegetic versus non-diegetic. 
The dynamics of the audiovisual environment and Ong’s (1982/90) suggested bipolarity of vision and hearing can be understood at a deeper level using the combined model that takes into account both the cognitive load on the subject playing the game and the moving around within
Figure 6. Warcraft III analysis example. Numbers are collected from Table 2 as an imaginable snapshot of sounds. The combined model visualizes the possible cognitive load in the audio layering. The more central a sound is placed, the higher its level of encoding; the more peripheral a sound is, the lower its level of encoding
the game environment, exploring its possibilities in a dynamic flow (Figures 3, 5, 6, 7, and 8).
USING THE COMBINED MODEL TO ANALYZE COMPUTER GAMES
In the following section, we provide 3 sample analyses using the combined model. The games used for these analyses are: F.E.A.R. (Monolith Productions, 2005); Warcraft III (Blizzard Entertainment, 2002); and Legend of Zelda (Nintendo, 1987). The aim of the analyses was to find each sound source possible within these games and then estimate the properties of the sounds. Not every single sound is included as it would have taken too long to find them all. Thus, for example, sound 25 in Table 2 “Hero uses magic” represents every sound created when a Hero in the game uses some kind of magic spell. In this way, the results are applicable to the combined model and are presented both as tables (Tables 1, 2, and 3)
and as snapshots of the audio layering within the combined model (Figures 5, 6, and 7). While employing the combined model, we found that the internal emphasis on the different parts of the IEZA-framework varies between different types of games, due to limitations of technology and genre conventions, which in turn depend on the gameplay and the relationship between player and game environment. For example, a first-person shooter or a shoot’em up game will emphasize the Effect sounds, and have fewer Zone sounds. A typical example of this is F.E.A.R. (Monolith Productions, 2005). There are indeed a lot of yellow effect sounds in the game F.E.A.R. (2005). Luckily, not all of these sounds are always played simultaneously. However, sometimes, many are played together, which results in a big wall of sound that is hard to make sense of. This can be used to emphasize chaos and, in the context of an action game, it might well be useful. Nevertheless, that would be its only use. With regard to the analysis, F.E.A.R.
Figure 7. Imaginable snapshot of sounds in Legend of Zelda
also has few interface sounds, probably due to the minimalistic graphical interface. Warcraft III (2002) was chosen for analysis on the basis of the personal pre-understanding that its audio is well balanced and also complete in relation to the visual component of the game. In Figure 6, we used the sounds from Warcraft III to exemplify how to apply the combined model for the analysis of a specific game. Numbers are collected from Table 2 as a possible snapshot of sounds. The combined model visualizes the possible cognitive load in the audio layering. In addition, the model visualizes that there are 3 bass, 4 midrange, and 2 treble sounds. If there are too many sounds in the centre of the model, the cognitive load, in terms of encoded sounds, is higher. Not surprisingly, the midrange sounds are cyan (1 = Cutting wood) and violet (32 = “Our goldmine has collapsed”). Numbers 12, 17 and 18 are sound events connected to Effect and Activity
and all are diegetic sounds. Sound 21 (Crickets) belongs to the Zone and is an ambient sound with a lot of treble. Sound 32 (“Our goldmine has collapsed”) originates from the non-diegetic Affect part of the sonic environment as does sound 43 which is the background music. Our analysis of Legend of Zelda (Nintendo, 1987) was mainly due to curiosity about how earlier games differ in the distribution of audio categories in relation to IEZA and Murch’s conceptual model. The Loudness and Frequency parameters were omitted in the analysis of Legend of Zelda. Due to the technical nature of the system in which Legend of Zelda is played, the number of possible simultaneous sounds is limited to only a few. The dynamic range is also very narrow, therefore all the shapes are sparse and equally sized. The system, as such, does not support spoken language, which is why there are no encoded sounds.
Table 1. F.E.A.R. analysis: F.E.A.R. (2005)

| No. | Sound Event | State | Diegetic? | IEZA | Color | Origin | Loudness | Frequency Band |
|---|---|---|---|---|---|---|---|---|
| 1 | Weapon reload | In-game | Yes | Effect | Yellow | Character | 3 | Middle |
| 2 | Clothes sound | In-game | Yes | Effect | Yellow | Character | 2 | Middle |
| 3 | Player footsteps | In-game | Yes | Effect | Cyan | Character | 2 | Middle |
| 4 | Enemy footsteps | In-game | Yes | Effect | Cyan | Character | 2 | Middle |
| 5 | Landing after jump | In-game | Yes | Effect | Yellow | Character | 3 | Middle |
| 6 | In-game music | In-game | No | Effect | Red | “Orchestra” | 2 | Low |
| 7 | Enemy gunfire | In-game | Yes | Effect | Cyan | Character | 4 | Middle |
| 8 | Fire gun | In-game | Yes | Effect | Cyan | Character | 4 | Middle |
| 9 | Glass shatter | In-game | Yes | Zone | Yellow | Object | 2 | High |
| 10 | Empty shell bounce | In-game | Yes | Effect | Cyan | Object | 2 | High |
| 11 | Enter slow motion mode | In-game | No | Affect | Orange | “Narrator” | 2 | Low |
| 12 | Exit slow motion mode | In-game | No | Affect | Orange | “Narrator” | 2 | Low |
| 13 | Enemy radio chatter | In-game | Yes | Effect | Violet | Character | 3 | Middle |
| 14 | Friendly radio chatter | In-game | Yes | Effect | Violet | Character | 3 | Middle |
| 15 | Radio noise | In-game | Yes | Effect | Orange | Object | 2 | Middle |
| 16 | Throw grenade | In-game | Yes | Effect | Yellow | Character | 1 | Middle |
| 17 | Grenade bouncing | In-game | Yes | Effect | Cyan | Object | 2 | High |
| 18 | Grenade explosion | In-game | Yes | Effect | Yellow | Object | 5 | Low |
| 19 | Enemy talk | In-game | Yes | Effect | Violet | Character | 3 | Middle |
| 20 | Change weapon | In-game | Yes | Effect | Yellow | Object | 3 | Middle |
| 21 | Enemy dies | In-game | Yes | Effect | Violet | Character | 3 | Middle |
| 22 | Breaking environment | In-game | Yes | Zone | Yellow | Object | 2 | Middle |
| 23 | Pause game | In-game | No | Interface | Orange | “Narrator” | 1 | High |
| 24 | Unpause game | In-game | No | Interface | Orange | “Narrator” | 1 | High |
| 25 | Ghost talking | In-game | Yes | Effect | Violet | Character | 2 | Middle |
| 26 | Picking up weapon | In-game | Yes | Effect | Yellow | Object | 2 | Middle |
| 27 | Picking up grenade | In-game | Yes | Effect | Yellow | Object | 2 | Middle |
| 29 | Throw weapon | In-game | Yes | Effect | Yellow | Object | 3 | Middle |
| 30 | Pick up health booster | In-game | No | Interface | Orange | “Narrator” | 3 | Middle |
| 31 | Pick up reflex booster | In-game | No | Interface | Orange | “Narrator” | 3 | Middle |
| 32 | Pick up medkit | In-game | Yes | Effect | Yellow | Object | 2 | Middle |
| 33 | Using medkit | In-game | Yes | Effect | Orange | Object | 2 | Middle |
| 28 | Menu music | Menu | No | Affect | Red | “Orchestra” | 2 | Low |
| 34 | Menu selection | Menu | No | Interface | Orange | “Narrator” | 1 | High |
| 35 | Menu accept | Menu | No | Interface | Orange | “Narrator” | 1 | High |
| 36 | Menu go back | Menu | No | Interface | Orange | “Narrator” | 1 | High |
Table 2. Warcraft III analysis: Warcraft III

| No. | Sound Event | State | Diegetic? | IEZA | Color | Origin | Loudness | Frequency Band |
|---|---|---|---|---|---|---|---|---|
| 1 | Cutting wood | In-game | Yes | Effect | Cyan | Character | 2 | Middle |
| 2 | “I can’t build there” | In-game | Yes | Effect | Violet | Character | 3 | Middle |
| 3 | Insufficient resources | In-game | Yes | Affect | Violet | “Narrator” | 3 | Middle |
| 4 | “Awaiting order” | In-game | Yes | Effect | Violet | Character | 3 | Middle |
| 5 | “Job’s done” | In-game | Yes | Effect | Violet | Character | 3 | Middle |
| 7 | “Accepting order” | In-game | Yes | Effect | Violet | Character | 3 | Middle |
| 8 | New Unit Available | In-game | Yes | Effect | Violet | Character | 3 | Middle |
| 9 | Unit attack order | In-game | Yes | Effect | Violet | Character | 3 | Middle |
| 10 | Click on building | In-game | Yes | Effect | Yellow | Object | 2 | Low/Middle |
| 11 | Building construction | In-game | Yes | Effect | Yellow | Object | 2 | Low/Middle |
| 12 | Goldmine collapse | In-game | Yes | Effect | Yellow | Object | 4 | Low/Middle |
| 13 | Building collapse | In-game | Yes | Effect | Yellow | Object | 4 | Low/Middle |
| 14 | Building on fire | In-game | Yes | Effect | Yellow | Object | 2 | Middle |
| 15 | Building attacked | In-game | Yes | Effect | Yellow | Object | 2 | Middle |
| 16 | Click on “critter” | In-game | Yes | Effect | Yellow | Character | 1 | Middle |
| 17 | Unit Attacked | In-game | Yes | Effect | Yellow | Character | 2 | Low |
| 18 | Falling tree | In-game | Yes | Effect | Yellow | Object | 4 | Low |
| 19 | Singing birds | In-game | Yes | Zone | Yellow | Ambience | 1 | High |
| 20 | Ambient noise | In-game | Yes | Zone | Yellow | Ambience | 1 | High |
| 21 | Crickets | In-game | Yes | Zone | Yellow | Ambience | 1 | High |
| 22 | Frog | In-game | Yes | Zone | Yellow | Ambience | 1 | Middle |
| 23 | Cicadas | In-game | Yes | Zone | Yellow | Ambience | 1 | High |
| 24 | Owl | In-game | Yes | Zone | Yellow | Ambience | 1 | Middle |
| 25 | Hero uses magic | In-game | Yes | Effect | Orange | Character | 2 | All |
| 26 | Unit constructing | In-game | Yes | Effect | Cyan | Character | 2 | Short |
| 27 | Unit dies | In-game | Yes | Effect | Cyan | Character | 3 | Middle |
| 28 | “Victory” | In-game | No | Affect | Violet | “Narrator” | 4 | Low/Middle |
| 29 | “Defeat” | In-game | No | Affect | Violet | “Narrator” | 4 | Low/Middle |
| 30 | “Research complete” | In-game | No | Affect | Violet | “Narrator” | 3 | Middle |
| 31 | “Upgrade complete” | In-game | No | Affect | Violet | “Narrator” | 3 | Middle |
| 32 | “Our goldmine has collapsed” | In-game | No | Affect | Violet | “Narrator” | 3 | Middle |
| 33 | “Our hero has fallen” | In-game | No | Affect | Violet | “Narrator” | 3 | Middle |
| 34 | “Our forces are under attack” | In-game | No | Affect | Violet | “Narrator” | 3 | Middle |
| 35 | Rooster | In-game | No | Affect | Yellow | “Narrator” | 2 | Middle |
| 36 | Wolf Howl | In-game | No | Affect | Yellow | “Narrator” | 2 | Low |
| 37 | Set rally point | In-game | No | Affect | Yellow | “Narrator” | 2 | Low |
| 38 | Set building spot | In-game | No | Affect | Yellow | “Narrator” | 2 | Low |
| 39 | Unavailable Sound | In-game | No | Interface | Yellow | “Narrator” | 3 | Middle |
| 40 | Click upper GUI | In-game | No | Interface | Orange | “Narrator” | 1 | Middle |
| 41 | Mini map signal | In-game | No | Affect | Orange | “Narrator” | 1 | High |
| 42 | Click lower GUI | In-game | No | Interface | Orange | “Narrator” | 1 | Middle |
| 43 | Background Music | In-game | No | Affect | Red | “Orchestra” | 2 | Low/Middle |
| 44 | Meteor Falling | Menu | Yes | Effect | Yellow | Object | 2 | High |
| 45 | Meteor Impact | Menu | Yes | Effect | Yellow | Object | 3 | Low |
| 46 | Rain | Menu | Yes | Zone | Yellow | Ambience | 1 | High |
| 47 | Thunder | Menu | Yes | Zone | Yellow | Ambience | 2 | Low |
| 48 | Click Menu | Menu | No | Interface | Orange | “Narrator” | 2 | Middle |
| 49 | Menu switch | Menu | No | Interface | Orange | “Narrator” | 2 | High |
| 50 | Menu Music | Menu | No | Affect | Red | “Orchestra” | 2 | Low/Middle |
The 3 sample analyses clearly show that the emphasis on specific parts of the IEZA-framework shifts depending on the genre and the kind of platform the game belongs to. The snapshots made with the combined model show how sounds are clustered and provide a visualization of the sound layering. Figure 5 illustrates how the sounds of F.E.A.R. fall mostly within one quarter of the model: the diegetic activity quarter. A fast-paced game such as this would probably have most of its sound in this quarter. Figure 6 shows that Warcraft III has sounds in all 4 quarters, with an emphasis on the diegetic activity quarter, while Figure 7 is an example of an older kind of game in which the technical limitations affect how the sonic environment is structured.
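The shift in emphasis between IEZA categories can be made concrete by tallying the rows of an analysis table. The following Python sketch is only an illustration: it uses a small subset of the F.E.A.R. rows from Table 1 rather than the full table, and the names `SoundEvent`, `FEAR_SAMPLE`, and `ieza_distribution` are hypothetical helpers, not part of the IEZA-framework or of the combined model.

```python
from collections import Counter
from typing import Dict, List, NamedTuple

IEZA_CATEGORIES = ("Interface", "Effect", "Zone", "Affect")


class SoundEvent(NamedTuple):
    """One row of an analysis table, reduced to the columns used here."""
    number: int
    name: str
    diegetic: bool
    ieza: str


# A subset of Table 1 (F.E.A.R.), chosen only to keep the example short.
FEAR_SAMPLE: List[SoundEvent] = [
    SoundEvent(1, "Weapon reload", True, "Effect"),
    SoundEvent(7, "Enemy gunfire", True, "Effect"),
    SoundEvent(9, "Glass shatter", True, "Zone"),
    SoundEvent(11, "Enter slow motion mode", False, "Affect"),
    SoundEvent(18, "Grenade explosion", True, "Effect"),
    SoundEvent(23, "Pause game", False, "Interface"),
]


def ieza_distribution(events: List[SoundEvent]) -> Dict[str, float]:
    """Return the share of sound events that falls into each IEZA category."""
    counts = Counter(event.ieza for event in events)
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in IEZA_CATEGORIES}
    return {category: counts[category] / total for category in IEZA_CATEGORIES}


distribution = ieza_distribution(FEAR_SAMPLE)
for category, share in distribution.items():
    print(f"{category:<10}{share:.0%}")
```

Run over the complete tables instead of a sample, the same tally would give a rough numerical counterpart to the clustering that the snapshots show visually.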
USING THE COMBINED MODEL AS A PRODUCTION TOOLSET
We have declared that the potential loss of control over the sonic environment while producing a computer game is a problem. How can the combined model solve this?
The combined model above (Figure 3) allows the visualization of the sonic environment of a computer game, in terms of cognitive load (Figures 5, 6, 7 and 8). While producing a sonic environment for a game, the different sounds to be used are first categorized, in accordance with the IEZA-framework, and then placed into Murch’s model, as more or less encoded or embodied, which in combination results in our proposed model (Figures 3 to 8). The combined model visualizes the sonic environment of a given environment in a way that makes it possible to see how sound might be clustered; the closer the sounds are to each other the fewer that can be used if clarity of meaning is wanted, that is, a good level of semantic value. The effect will be that the sound designer can see beforehand whether the sonic environment will be biased towards encoded or embodied sound, providing the opportunity to rebalance accordingly. This also balances the frequency spectrum of the sonic environment and distributes the cognitive load in the brain. Even if all possible combinations of sounds cannot be plotted, a sound strategy for limiting these unwanted effects can be adopted by plotting prototypical game events
Table 3. Legend of Zelda analysis: Legend of Zelda

| No. | Sound | State | Diegetic? | IEZA | Color | Origin |
|---|---|---|---|---|---|---|
| 1 | Enter cave/walking stairs | In-game | Yes | Effect | Cyan | Character |
| 2 | Use sword | In-game | Yes | Effect | Yellow | Character |
| 3 | Sword shoots | In-game | Yes | Effect | Yellow | Character |
| 4 | Enemy takes damage | In-game | Yes | Effect | Yellow | Character |
| 5 | Enemy dies | In-game | Yes | Effect | Yellow | Character |
| 6 | Open locked door | In-game | Yes | Effect | Cyan | Object |
| 7 | Door shuts | In-game | Yes | Effect | Cyan | Object |
| 8 | Boomerang | In-game | Yes | Effect | Cyan | Object |
| 9 | Boss sound | In-game | Yes | Effect | Yellow | Character |
| 10 | Sword useless | In-game | Yes | Effect | Yellow | Character |
| 11 | Place bomb | In-game | Yes | Effect | Yellow | Object |
| 12 | Bomb explode | In-game | Yes | Effect | Yellow | Object |
| 13 | Waves against shoreline | In-game | Yes | Zone | Yellow | Ambience |
| 14 | Background music | In-game | No | Affect | Red | “Orchestra” |
| 15 | Letters typing | In-game | No | Effect | Cyan | “Narrator” |
| 16 | Pick up consumable | In-game | No | Effect | Orange | “Narrator” |
| 17 | Pick up quest item | In-game | No | Effect | Orange | “Narrator” |
| 18 | Consumable appear | In-game | No | Effect | Orange | “Narrator” |
| 19 | Collect money | In-game | No | Effect | Orange | “Narrator” |
| 20 | Dungeon music | In-game | No | Affect | Red | “Orchestra” |
| 21 | Key appear | In-game | No | Affect | Orange | “Narrator” |
| 22 | Collect key | In-game | No | Effect | Orange | “Narrator” |
| 23 | Solve a puzzle | In-game | No | Affect | Orange | “Narrator” |
| 24 | Take compass | In-game | No | Effect | Orange | “Narrator” |
| 25 | Take Map | In-game | No | Effect | Orange | “Narrator” |
| 26 | Low health | In-game | No | Affect | Orange | “Narrator” |
| 27 | Player character “dies” | In-game | No | Effect | Orange | “Narrator” |
| 28 | Switch item | In-game | No | Interface | Orange | “Narrator” |
| 29 | Dungeon Complete Music | In-game | No | Affect | Red | “Orchestra” |
| 30 | Menu music | Menu | No | Affect | Red | “Orchestra” |
| 31 | Menu selection | Menu | No | Interface | Orange | “Narrator” |
| 32 | Choose letter | Menu | No | Interface | Orange | “Narrator” |
| 33 | Game over music | Menu | No | Affect | Red | “Orchestra” |
in accordance with the central gameplay aspects of a given game and a given game genre. One can also use the different sizes of the primitives, put into the combined model (Figure 8), to show
the dynamic range, that is, the relative loudness between sounds. It can at first be difficult to see how to utilize the model practically. The model’s present design may well be refined later, but that
Figure 8. Shoot the Ducks level 1
does not really matter. This is not just a model but also a kind of paradigm or, in other words, a way of thinking about these matters. The key is to be pro-active with regard to sound design and to plan the distribution of sounds before they are even created. In most audio-editing software, the colors of Murch’s original conceptual model, and our combined model (Figures 2 and 3), may well be used to designate the status of the sound files as more or less encoded. The music track (affect) could be made red, guns and explosions (effect) yellow, the ambient sounds, such as birds and so on (zone), orange, and the dialogue (encoded) blue, which is in accordance with Murch’s (1998) conceptual model. This feature of the color encoding of specific sound events is found in many commercial products and may very well be used in this manner while creating a sonic environment for a game or movie
in order to avoid cognitive overload. In fact, the combined model might, in itself, be used as an interface for audio editing software. What then are the benefits of using our combined model in practice? Let us provide an example of creating the sound design for a simple game. We first present the game’s design document. The aim here is not to create a stunning new best-selling game, but rather to exemplify how the combined model may be used to plan the sound of the proposed shoot’em up game on the basis of a design document.
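The color tagging described above can be sketched as a small planning helper in Python. This is a minimal illustration under stated assumptions, not a feature of any particular audio editor: the sound names are hypothetical labels loosely based on the Shoot the Ducks sound list (Table 4), the category assigned to each sound is our own guess, and the limit of two simultaneous sounds per color (`max_per_color`) is an arbitrary placeholder rather than a value taken from the combined model.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Color per category, following the assignment suggested in the text.
CATEGORY_COLOR: Dict[str, str] = {
    "affect": "red",     # music track
    "effect": "yellow",  # guns and explosions
    "zone": "orange",    # ambient sounds such as birds
    "dialogue": "blue",  # encoded sound
}


def tag_sounds(planned: List[Tuple[str, str]]) -> Dict[str, str]:
    """Assign each planned sound the color of its category."""
    return {name: CATEGORY_COLOR[category] for name, category in planned}


def crowded_colors(snapshot: List[str], tags: Dict[str, str],
                   max_per_color: int = 2) -> List[str]:
    """List the colors that occur more often than max_per_color in a snapshot.

    A snapshot is the set of sounds expected to play at the same moment.
    The threshold is a hypothetical planning aid chosen by the designer.
    """
    counts = Counter(tags[name] for name in snapshot)
    return sorted(color for color, n in counts.items() if n > max_per_color)


# Hypothetical sound names and categories for a first planning pass.
PLANNED = [
    ("level1_music", "affect"),
    ("gun1_fire", "effect"),
    ("gun2_fire", "effect"),
    ("wall_hit", "effect"),
    ("ambience_loop", "zone"),
]
TAGS = tag_sounds(PLANNED)

# A busy moment in which every planned sound plays at once.
busy_moment = [name for name, _ in PLANNED]
warnings = crowded_colors(busy_moment, TAGS)
```

Because the check runs on planned sounds rather than finished audio files, it fits the pro-active approach argued for here: clusters can be spotted and rebalanced before any sound is created.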
Shoot the Ducks Design Document
Game Objects
The game includes twenty objects: 4 ducks, armor for the ducks for each of the levels from 5 to 8, 2 guns, a pond, and a wall. The wall object
Table 4. The sounds from the Shoot the Ducks design document
A sound that is played as background music for instructions
A sound that is played to indicate game started
A sound that is played as background music for level #1
A sound that is played as background music for level #2
A sound that is played as background music for level #3
A sound that is played as background music for level #4
A sound that is played as background music for level #5
A sound that is played as background music for level #6
A sound that is played as background music for level #7
A sound that is played as background music for level #8
A duck sound that is played while the ducks are swimming
A duck chatter sound
A duck sound that is played when the duck is hit by a shot
A duck sound that is played when a duck dies
A bounce sound that is used when the ducks hit a wall object
A gun handling sound that indicates the change from one gun to another (gun click)
A sound that is played when gun #1 is fired
A sound that is played when gun #2 is fired
A sound that is played when a wall is hit by a shot
A sound that is played as background music for the end titles
A sound that is played to indicate that the player has reached a high score
A sound that is looped that contains ambience sound
A sound that is played to indicate a score change
A sound that is played to indicate a change of level
has a grass-like image, and the playing area, the pond, is surrounded by these grass-like objects. From levels 4 to 8, the pond has small islands of wall objects behind which the ducks can seek shelter (they are programmed to do so). The ducks swim in the pond, which is made of water-like images. The only function of the wall object is to stop the ducks from moving out of the pond. The duck object, which has the image of a duck, moves with varying speed. Whenever it hits a wall object, it bounces. The player can shoot through the walls on levels 4 to 8, but the effect of the shots decreases. Whenever the player hits a duck, the score increases by 10 points. The duck’s speed increases slightly when it jumps to a random place. The gun object is placed at the bottom of the screen, where it is fixed and can only move to the left and right in a half-circular pattern by keyboard commands (see Controls).
Sounds
We use 24 sounds in this game, covering all the categories from the IEZA-framework as well as the span from encoded to embodied sounds (see Table 4). In addition, the sounds are spread across the action versus setting axis and the diegetic versus non-diegetic axis.
Controls
Both mouse and keyboard control the game. The mouse must have left and right buttons and a scroll wheel. The gun is aimed with the A and D keys on the keyboard: the A key moves the gun barrel towards the left and the D key moves it towards the right. The gun is fired with a click of the left mouse button, and scrolling the mouse wheel changes from one gun to the other.
Game Flow
The game starts with the instructions while background music #1 plays; the game begins when the player presses the on-screen start button. This shows the room with the swimming ducks. The game ends when the player presses the <Esc> key.
Scores
At the start of the game the score is set at 0. The number of hits a duck can take before dying depends on the following factors:
• Distance from the gun
• Angle of the shot
• Area hit by the shot
• Whether the shot has passed a grass island
• The armor value of the duck
Levels
The game has 8 levels. The difficulty of the game increases because the initial speed of the ducks increases after each level and they are given armor from levels 4 to 8. The pond also has small islands of grass behind which the ducks can seek shelter from levels 4 to 8. The game ends when the player has killed all the ducks on all the levels or when the player presses <Esc>.
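The level rules above can be sketched as a small configuration table. The following Python sketch is only an illustration under stated assumptions: the names `LevelConfig` and `build_levels` and the speed values are hypothetical and do not come from the design document, which gives no numbers for duck speed. The armor threshold follows the Levels section (levels 4 to 8); since the Game Objects section mentions armor for levels 5 to 8, the threshold is left as a parameter.

```python
from dataclasses import dataclass
from typing import List

NUM_LEVELS = 8  # stated in the design document


@dataclass(frozen=True)
class LevelConfig:
    number: int
    initial_duck_speed: float
    pond_has_islands: bool
    ducks_have_armor: bool


def build_levels(
    base_speed: float = 1.0,   # hypothetical value, not from the design document
    speed_step: float = 0.25,  # hypothetical value, not from the design document
    islands_from: int = 4,     # per the Levels section
    armor_from: int = 4,       # per the Levels section; Game Objects mentions 5
) -> List[LevelConfig]:
    """Return one configuration per level; initial speed rises after each level."""
    return [
        LevelConfig(
            number=n,
            initial_duck_speed=base_speed + speed_step * (n - 1),
            pond_has_islands=n >= islands_from,
            ducks_have_armor=n >= armor_from,
        )
        for n in range(1, NUM_LEVELS + 1)
    ]


if __name__ == "__main__":
    for level in build_levels():
        print(level)
```

Keeping the thresholds as parameters makes it easy to check how the sound plan changes if, for instance, armor (and any armor-related sound) only appears from level 5.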
Utilizing the Combined Model in a Game Design Document
The relationship between game design and sound design obviously depends on the complexity of the game. In this case we mainly focused on how our model can be implemented in the sound design process. We first categorized each sound and determined its position in our combined model (Table 5). Categorizing the sounds in relation to our combined model provides us with a sense of how the sonic environment will be balanced. For the sake of consistency, we chose to use a table that is similar to the analysis examples mentioned earlier (Tables 1, 2 and 3). Furthermore, in order to keep it simple, we clustered all the similar sounds into groups. For example, the music sounds do not have to be specified as single sound events, because they will not be played simultaneously under any circumstances if the game runs as intended. Instead of 10 music sounds we only added one. If we group similar sound events, we do not have to think about sound variations at this stage. We also used this kind of sound grouping in the analyses mentioned earlier in this chapter. However, you might then ask why the Warcraft III analysis (Table 2 and Figure 6) has a lot of quotations in its table whereas the F.E.A.R. table (Table 1 and Figure 5) does not. This is simply because all the quotations from Warcraft III originate from essential sound events that are important for the gameplay. In the F.E.A.R. analysis, we chose to group the sounds of characters because they have sufficient similarities to constitute a single group of sounds. By carefully planning the game audio with two of Chion’s suggested listening modes in mind (causal listening and semantic listening), the sound designer can group the different sounds in relation to their cause and meaning.
Furthermore, if the sound designer emphasizes the basic level of categorization, the game audio, as such, will suggest what the player is supposed to do or provide feedback about what the player has done.
In the following review of a few sound events, we explain the process using our combined model as a production toolset. We begin by looking at quite a tricky sound event: the sound emitted when a duck is swimming. To simplify matters, we quickly decided to use a swimming sound, some kind of movement through water. It is first necessary to determine if a sound for this event would take place in the gameworld. In other words, should it be considered diegetic or non-diegetic? Indeed, it must be diegetic since the ducks live and move within the game’s environment of which the water is very much a part. After deciding that the sound is diegetic, we looked at where it belongs in the IEZA-framework. Here we have two options: a diegetic sound can either be a zone or an effect sound. This is where it becomes tricky if we do not pause for a second. The outcome depends on whether the sound is emitted as a result of player-induced activity or is an integral part of the game’s setting. It should be remembered that in the IEZA-framework, a sound, in order to qualify as being the result of activity, has to be directly or indirectly triggered by the actions of the player. Since this sound event is not triggered by the player, either directly or indirectly, we categorized it as a zone sound. The purpose of the swimming sound is to enhance the environment, as well as to sustain the presence of water in the pond and the motion of the ducks. Our last step was to determine the position of the sound in Murch’s conceptual model. The sound has a kind of rhythmic effect which neatly suits the description of the color cyan. The sound of swimming can also have the semantic value of movement, which is the category of cyan sounds. The music sound was the next to be categorized. The design document did not mention any in-game orchestra that plays music and we therefore categorized the sound as non-diegetic.
This gave us two options: Did the sound depend on activity or setting (in other words, was it an interface or an affect sound)? We chose affect. Referring to
Murch’s (1998) conceptual model, music sounds are red and embodied. We now categorized an interesting sound event which allowed us to make an active decision to minimize the cognitive load. According to Murch, too much dialogue or, more specifically, too much spoken language which is encoded, will make the sonic environment dense and avoiding an excess of spoken language means keeping the sound design clear. In this trivial example of just a few sounds, a cognitive overload is obviously unlikely, but it is, however, good practice to think about. The sound for a level change could utilize some sort of announcer using a voice to inform the player of the level change. If we, for example, choose to use an encouraging musical effect instead, the result would be quite different. A short encouraging musical effect is non-diegetic, so the sound was either affect or interface. Since it belonged to the game system, it was therefore an interface sound and given the color orange, which is the category for musical effects and embodied sounds. This game had no encoded sounds (violet) because we avoided including dialogue and spoken language (see Table 5). As the snapshot (Figure 8) of the combined model below illustrates, the sounds at this level of the game are spread evenly across the dynamic range and the frequency range of the sonic environment. This example shows how the combined model may function as a production toolset. A sound designer can, of course, make a quick pencil drawing based on the game design document. The use of a computer to begin the planning of the game audio is not needed. Making the model easy to draw with pencil and paper is beneficial for the sound designer. She can be part of the production process as soon as there is a design document or, in fact, in the initial discussions before a design document exists. The purpose of the combined model is to be a rapid but, at the same time, very structured way of planning the sonic environment of a game. 
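The two questions asked of each sound event in this walkthrough (is it diegetic, and does it result from activity or belong to the setting?) can be written down as a small decision function. The Python sketch below is only an illustration of that reasoning; the names `IEZA` and `classify` are hypothetical, and the color in Murch’s conceptual model is deliberately left out because it remains a judgment made by the sound designer.

```python
from enum import Enum


class IEZA(Enum):
    INTERFACE = "Interface"  # non-diegetic, activity
    EFFECT = "Effect"        # diegetic, activity
    ZONE = "Zone"            # diegetic, setting
    AFFECT = "Affect"        # non-diegetic, setting


def classify(diegetic: bool, activity: bool) -> IEZA:
    """Map the two planning questions onto an IEZA category.

    diegetic: does the sound belong to the gameworld?
    activity: does the sound result from activity (player actions or the
              game system) rather than from the setting?
    """
    if diegetic:
        return IEZA.EFFECT if activity else IEZA.ZONE
    return IEZA.INTERFACE if activity else IEZA.AFFECT


if __name__ == "__main__":
    # The three sound events discussed in the text.
    print("Duck swimming:", classify(diegetic=True, activity=False).value)
    print("Music:", classify(diegetic=False, activity=False).value)
    print("Level change:", classify(diegetic=False, activity=True).value)
```

For the three events reviewed above, the function reproduces the categories chosen in the text: zone for the swimming sound, affect for the music, and interface for the level-change effect.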
Working with the model is meant to be
Table 5. Shoot the Ducks sounds categorized
Shoot the Ducks Sound Event | Comment | Diegetic? | IEZA | Color | Loudness | Frequency Band
Game started | Beep | No | Interface | Orange | 3 | Middle
Music | Percussion | No | Affect | Red | 2 | Low
Duck swimming | Swimming sound | Yes | Zone | Cyan | 1 | Middle
Duck chatter | - | Yes | Effect | Cyan | 2 | Middle
Duck hit | Duck “scream” | Yes | Effect | Yellow | 3 | Middle
Duck dies | - | Yes | Effect | Yellow | 4 | Middle
Duck bounce off wall | Cartoony bounce | Yes | Zone | Orange | 2 | Low
Change weapon | Click | Yes | Effect | Yellow | 2 | Low
Gun fired | - | Yes | Effect | Yellow | 5 | Low
Bullet through wall | - | Yes | Effect | Yellow | 2 | Middle
Ambience | Tree whisper and birds | Yes | Zone | Yellow | 1 | High
Score change | Beep | No | Interface | Orange | 3 | High
Level change | Short musical effect | No | Interface | Orange | 3 | Middle
As Table 5 above indicates, there is an emphasis on the sounds of effects in the game, which is due to the shooter genre as such. A shooter needs the sounds of effects as the main audio feedback for the player. After all, as a shooter you are supposed to shoot things. You would also expect sounds from weapons as well as those designating hits or missed shots, the handling of different weapons and maybe some big explosions. The internal order of the IEZA-framework would rather be EZIA in this example of Shoot the Ducks.
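The emphasis on effect sounds can be checked by tallying the IEZA column of Table 5. The short Python sketch below does this; the list `TABLE_5` is transcribed from the table, while the variable names are hypothetical and the tally itself is an illustration rather than part of the authors’ method.

```python
from collections import Counter

# (sound event, IEZA category) pairs transcribed from Table 5
TABLE_5 = [
    ("Game started", "Interface"),
    ("Music", "Affect"),
    ("Duck swimming", "Zone"),
    ("Duck chatter", "Effect"),
    ("Duck hit", "Effect"),
    ("Duck dies", "Effect"),
    ("Duck bounce off wall", "Zone"),
    ("Change weapon", "Effect"),
    ("Gun fired", "Effect"),
    ("Bullet through wall", "Effect"),
    ("Ambience", "Zone"),
    ("Score change", "Interface"),
    ("Level change", "Interface"),
]

# Count how many sound groups fall into each IEZA category.
category_counts = Counter(category for _, category in TABLE_5)

if __name__ == "__main__":
    for category, count in category_counts.most_common():
        print(f"{category}: {count}")
```

Effect sounds account for six of the thirteen groups and affect for only one, which fits the E-first, A-last ordering. Zone and interface are tied at three groups each, so the count alone does not separate Z from I.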
an easy process that leads to thinking about sound in a diversified way, providing density and clarity, avoiding a logjam of sound and unwanted sonic artifacts, and producing a clear-cut visualization that is communicable to other members of a game development team. We have deliberately tried to minimize the terminology within the combined model in order to make it comprehensible for team members other than the sound designer.
CONCLUSION
As this chapter has shown, audio in computer games is a complex matter, the understanding of which could be made easier using the suggested combined model for game audio. A summary of the problems addressed in this chapter and our solutions to these problems follows.
• The general lack of functional models for analyzing computer game audio
In order to solve the first problem, we have provided a functional model for the analysis of existing sonic environments in computer games and movies. The combined model covers the dynamic range of the sounds in relation to each other and the frequency range occupied by the different sounds. Encoded sounds, primarily speech, have a natural position in the human frequency response curve. Our work has been anchored in theories spanning from linguistics to semiotics and from film theory to theories of cognition. We have found that different genres of games have different emphases on which types of sound dominate the sonic environment. This might be caused by the choice of technology or by the genre as such.
A shooter needs more effect sounds than a role-playing game, for example. We have briefly presented a case study, In The Maze, to show how sound might be part of a game-playing strategy, although the results of the case study also support the idea that prior experience and anticipation concerning the game’s content, like the visual field, affect the locomotive patterns in a game. Ong’s statement that sight separates us whereas sound integrates us with the environment has been discussed and put in relation to the models for sound suggested by Huiberts and van Tol as well as by Murch.
• The general lack of functional models for the production of game audio
In this chapter, we have attempted to put forth a model for the production of the sonic environments of computer games. We have shown how the sonic environment of computer games (and movies) may be planned to avoid cognitive overload as well as unwanted interference, by using a model that combines Huiberts and van Tol’s (2008) IEZA-framework for computer game audio and Murch’s (1998) conceptual model for film sound. The loss of control a sound designer has over the playback of the audio in the gameplay of a complex game may lead to a chaotic blur of sounds causing them to lose their definition and thereby their semantic value.
• When two or more sounds are played simultaneously, the clarity of the mix depends on the type of sounds, which leads to
• The nature of the relationship between encoded and embodied sounds
The sound designer has some, though limited, control of the sonic environment in a game. To avoid a blurred sonic environment, it will be necessary to define the sound as much as possible. The combined model gives the sound designer an overview of the sonic environment that structures
the sound to avoid cognitive overload, supports density and clarity, diversifies the sounds in the 4 basic categories of Interface, Effect, Zone, and Affect sounds, as well as the setting versus activity axis. It also allows the sound designer to distinguish between diegetic and non-diegetic sounds as well as embodied versus encoded ones. The structure of the combined model provides an overview that enables the clustering of encoded and embodied sounds to be visualized in order to help the sound designer plan the production. Furthermore, the combined model establishes a common ground of terminology that is communicable in a dialogue between the sound designer, the game designer, the game writer, the graphical artist and the programmer. The combined model will need further refinement and might have the potential to function as an interface for sound design software. However, even a small step for a sound designer, such as this, might serve as a good starting point for how to plan and analyze the sonic environments of computer games.
REFERENCES
Bordwell, D., & Thompson, K. (1994). Film history: An introduction. New York: McGraw-Hill.
Bordwell, D., & Thompson, K. (2001). Film art: An introduction. New York: McGraw-Hill.
Bugelski, B. R., & Alampay, D. A. (1961). The role of frequency in developing perceptual sets. Canadian Journal of Psychology, 15(4), 201–211. doi:10.1037/h0083443
Cancellaro, J. (2006). Exploring sound design for interactive media. Clifton Park, NY: Thomson Delmar Learning.
Childs, G. W. (2007). Creating music and sound for games. Boston, MA: Thomson Course Technology.
Chion, M. (1994). Audio-vision: Sound on screen. New York: Columbia University Press.
Coppola, F. F. (Director). (1979). Apocalypse now! [Motion picture]. Hollywood, CA: Paramount Pictures.
Cunningham, S., Grout, V., & Picking, R. (2011). Emotion, content and context in sound and music. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Droumeva, M. (2011). An acoustic communication framework for game sound – Fidelity, verisimilitude, ecology. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Ekman, I. (2008). Comment on the IEZA: A framework for game audio. Gamasutra. Retrieved January 13, 2010, from http://www.gamasutra.com/view/feature/3509/ieza_a_framework_for_game_audio.php
Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
F.E.A.R. (2005). Vivendi Universal Games. Monolith Productions.
Gibson, J. (1977). The theory of affordances. In Shaw, R. E., & Bransford, J. (Eds.), Perceiving, acting and knowing (pp. ##-##). New Jersey: LEA.
Gibson, J. (1986). The ecological approach to visual perception. New Jersey: LEA.
Howard, D. M., & Angus, J. (1996). Acoustics and psychoacoustics. Oxford: Focal Press.
Hug, D. (2011). New wine in new skins: Sketching the future of game sound design. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Huiberts, S., & van Tol, R. (2008). IEZA: A framework for game audio. Gamasutra. Retrieved October 13, 2008, from http://www.gamasutra.com/view/feature/3509/ieza_a_framework_for_game_audio.php
Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Kubelka, P. (1998). Talk on Unsere Afrika Reise. Presented at The School of Sound, London, England.
Kubrick, S. (Director). (1968). 2001: A space odyssey [Motion picture]. Location: Metro-Goldwyn-Mayer.
Lakoff, G. (1987). Women, fire and dangerous things. Chicago: University of Chicago Press.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: University of Chicago Press.
Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh. New York: Basic Books.
Legend of Zelda. (1987). Nintendo.
Loftus, G. R., & Loftus, E. F. (1983). Mind at play. New York: Basic Books.
Marks, A. (2001). The complete guide to game audio. Location: CMP.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Originally published in The Psychological Review (1956), 63, 81-97. (Reproduced, with the author’s permission, by Stephen Malinowski). Retrieved March 10, 2009, from http://www.musanim.com/miller1956/
Mullan, E. (2011). Physical modelling for sound synthesis. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Murch, W. (1998). Dense clarity – Clear density. Retrieved March 10, 2009, from http://www.ps1.org/cut/volume/murch.html
Murray, J. (1997). Hamlet on the holodeck: The future of narrative in cyberspace. Cambridge, MA: MIT Press.
Myst. (1993). Brøderbund.
Ong, W. (1982/1990). Orality and literacy: The technologizing of the word (L. Fyhr, G.D. Hansson & L. Perme Swedish Trans.). Göteborg, Sweden: Anthropos.
Pac-Man. (1980). Namco.
Pollack, I. (1952). The information of elementary auditory displays. The Journal of the Acoustical Society of America, 24, 745–749. doi:10.1121/1.1906969
Pollack, I. (1953). The information of elementary auditory displays II. The Journal of the Acoustical Society of America, 25, 765–769. doi:10.1121/1.1907173
Prince, R. (1996). Tricks and techniques for sound effect design. CGDC. Retrieved October 10, 2008, from http://www.gamasutra.com/features/sound_and_music/081997/sound_effect.htm
Pudovkin, V. I. (1985). Asynchronism as a principle of sound film. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice (pp. ##-##). New York: Columbia University Press. (Original work published 1929)
Sjöström, V. (1921). The phantom chariot. Svensk Filmindustri.
Sobchack, V., & Sobchack, T. (1980). An introduction to film. Boston, MA: Little Brown.
Space Invaders. (1978). Taito.
Thom, R. (1999). Designing a movie for sound. Retrieved July 7, 2009, from http://filmsound.org/articles/designing_for_sound.htm
Valve Corporation. (1998). Half-Life [computer game]. Sierra Entertainment.
Wallén, J. (2008). Från smet till klarhet. Unpublished bachelor’s thesis. University of Skövde, Country. Retrieved month day, year, from http://his.diva-portal.org/smash/record.jsf?searchId=1&pid=diva2:2429
Warcraft III. (2002). Blizzard Entertainment.
White, G. (2008). Comment on the IEZA: A framework for game audio. Retrieved January 13, 2010, from http://www.gamasutra.com/view/feature/3509/ieza_a_framework_for_game_audio.php
Wilhelmsson, U. (2001). Enacting the point of being. Computer games, interaction and film theory. Unpublished doctoral dissertation. University of Copenhagen, Country.
ADDITIONAL READING
Adams, E., & Rollings, A. (2007). Game design and development. Saddle River, NJ: Pearson Prentice-Hall.
Alexander, B. (2005). Audio for games: Planning, process, and production. Berkeley: New Riders.
Alexander, L. (2008). Does survival horror really still exist? Retrieved March 12, 2009, from http://kotaku.com/5056008/does-survival-horror-really-still-exist
Branigan, E. (1992). Narrative comprehension and film. London, New York: Routledge.
Collins, K. (2007). An introduction to the participatory and non-linear aspects of video games audio. In Hawkins, S., & Richardson, J. (Eds.), Essays on sound and vision. Helsinki: Helsinki University Press.
Cousins, M. (1996). Designing sound for Apocalypse Now. In J. Boorman & W. Donohue, Projections 6: Film-makers on film-making (pp. 149-162). Location: Publisher.
Grimshaw, M., & Schott, G. (2007). Situating gaming as a sonic experience: The acoustic ecology of first person shooters. In Proceedings of DiGRA 2007: Situated Play.
Huizinga, J. (1955). Homo ludens: A study of the play element in culture. Boston: Beacon Press.
Jørgensen, K. (2006). On the functional aspects of computer game audio. In Proceedings of the Audio Mostly Conference.
Jørgensen, K. (2007). ‘What are these grunts and growls over there?’ Computer game audio and player action. Unpublished doctoral dissertation. Copenhagen University, Country.
Jørgensen, K. (2008). Audio and gameplay: An analysis of PvP battlegrounds in World of Warcraft. GameStudies, 8(2).
Juul, J. (2005). Half-real. Video games between real rules and fictional worlds. Cambridge, MA: MIT Press.
Katz, J. (1997). Walter Murch in conversation with Joy Katz. PARNASSUS Poetry in Review: The Movie Issue, 22, 124-153.
Klevjer, R. (2007). What is the avatar? Fiction and embodiment in avatar-based singleplayer computer games. Unpublished doctoral dissertation. University of Bergen, Country.
Murch, W. (2000). Stretching sound to help the mind see. Retrieved January 25, 2010, from http://www.filmsound.org/murch/stretching.htm
Neale, S. (2000). Genre and Hollywood. New York: Routledge.
Perron, B. (2004). Sign of a threat: The effects of warning systems in survival horror games. In Proceedings of COSIGN, 2004, 132–141.
Salen, K., & Zimmermann, E. (2004). Rules of play. Game design fundamentals. Cambridge, MA: MIT Press.
Stockburger, A. (2003). The game environment from an auditive perspective. In Proceedings of DiGRA 2003. Level Up.
Taylor, L. (2005). Toward a spatial practice in video games. Gameology. Retrieved June 23, 2007, from http://www.gameology.org/node/809
Whalen, Z. (2004). Play along: An approach to video game music. GameStudies, 4(1).
KEY TERMS AND DEFINITIONS
Affordance Theory: A theory put forth by James J. Gibson (1977 and 1986). An affordance is what an environment provides an animal. A path in a wood is “walk-able”, a chair is “sit-able”, etc.
Ambient Listening: Increasing the frequency range of the sound by turning the body towards its source for higher definition of the sound.
Ambulatory Listening: Listening by moving around and using sound as part of the navigation within the environment.
Aperture Listening: Successive scanning of the audio stimuli.
Combined Model for the Structuring of Computer Game Audio: The model suggested in this chapter that combines the IEZA-framework with Murch’s conceptual model.
IEZA-Framework: A framework suggested by Sander Huiberts and Richard van Tol (2008). The IEZA-framework distinguishes between sounds that belong to: the Interface (I), the Effects (E), the Zone (Z) and the Affects (A) in a computer game.
Murch’s Conceptual Model: A model for the production of film sound put forth by sound designer Walter Murch (1998). The conceptual framework spans from encoded sound (language) to embodied sound (music). It also suggests that in order to obtain density and clarity of a sound mix the sound designer should limit the number of sound layers to five separate layers.
Snapshot Listening: Fixating a point and then shifting to some other point momentarily by filtering out all other sound sources.
ENDNOTES
1 In action movies there has been, and still is, a tendency to equate loud with good (Thom, 1999). Violent explosions, big loud weapons and roars of wild engines fill the sonic environment in far too many action movies produced in the last two decades.
2 The interested reader is referred to a good textbook on acoustics and psychoacoustics such as Howard and Angus (1996).
3 The original article on the IEZA-framework has been criticized for not considering previous work (Ekman, 2008; White, 2008).
4 Which is of course the case also with the original IEZA-framework and Murch’s conceptual model but the level of detail is significantly higher in the combined model.
5 Of course further research into this would be necessary to put forth more solid cognitive ground for these primitives but we do believe that they fill their purpose for our combined model.
6 As Loftus & Loftus (1983) have shown, players may occasionally try to talk to the system in acts of personification of the system.
Chapter 7
An Acoustic Communication Framework for Game Sound: Fidelity, Verisimilitude, Ecology
Milena Droumeva
Simon Fraser University, Canada
ABSTRACT
This chapter explores how notions of fidelity and verisimilitude manifest historically both as global cultural conventions of media and technology and, more specifically, as design goals in the production of sound in games. By exploring these two perspectives on acoustic realism through the acoustic communication framework with its focus on patterns of listening over time, acoustic communities, and ecology, I hope to offer a model for future theorizing and exploration of game sound and a lens for in-depth analysis of specific game titles. As a novel contribution, this chapter offers a set of listening modes that are derived from and describe attentional stances towards historically diverse game soundscapes in the hopes that we may use these to not only identify but also evaluate the relationship between gaming and culture.
DOI: 10.4018/978-1-61692-828-5.ch007
INTRODUCTION
Within game studies, a relatively young discipline itself, the field of game sound has already experienced growth. However, there are still scarce resources and analytical frameworks for understanding the role of sound for purposes of cultural critique, historical analysis or cross-media examination. Frameworks such as the IEZA one (Huiberts & van Tol, 2008; Wilhelmsson & Wallén, 2011), which builds on several existing design guideline systems for game sound (Ekman, 2005; Grimshaw & Schott, 2007; Jørgensen, 2006; Stockburger, 2007), and particularly Grimshaw’s (2008) conceptualization of an acoustic ecology in first-person shooter games are beginning to pave the way for more in-depth explorations into understanding, analyzing and representing the role of sound in games. In addition to the more established foundations of game sound in music synthesis, algorithmic sound generation, and real-time implementation of
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
sound effects (Brandon, 2004; Collins, 2007; Friberg & Gärdenfors, 2004; Roeber, Deutschmann, & Masuch, 2006), there is a need for building more general theoretical and analytical frameworks to describe the various elements of game sound and their role within the game’s designed soundscape and its informational ecology. Examples of rich theoretical works on game sound are still few (Collins, 2008; Grimshaw, 2008). I would like to propose a framework for studying game sound that engenders a multi-disciplinary perspective with a specific focus on listening as a dynamically developing, socio-cultural activity influenced by and influencing cultural production and experience. This framework, based on the acoustic communication model developed by Barry Truax (2001) and inspired by R. Murray Schafer (1977) combines media histories with the current technological and cultural reality and takes a critical analytical stance towards discussing the way media shapes our world. Delivering a full history of any game sound predecessors and tracing critical, socio-cultural perspectives of every game genre in existence is not only an ambitious task, but is one that has been done in parts by both scholars and game writers (Collins, 2008; McDonald, 2008). Instead, I will focus on two particular aspects of game sound—fidelity and verisimilitude—and situate them within the interdisciplinary framework of analysis that the acoustic communication model offers. They are two sides of the same idea representing notions of realism or reality in game soundscapes. They reflect long-standing cultural ideals and production values whose histories transgress radio, cinema, and real-world environments. By juxtaposing the two ideas in this manner I hope to elucidate qualities and features of game sound both in a richer way and within a socio-historical discursive context. 
Fidelity reflects the development of sound in games from a technological perspective while verisimilitude reflects the cultural emergence of authenticity, immersion and suspension of disbelief in cinema,
and characterizes the magic flow state in games. Finally, I’d like to connect both these ideas to acoustic ecology and particularly to the concept of acoustic community, which includes the real situation of a player’s own acoustic soundscape in addition to the game’s sonic environment, interlaced in a complex ecology.
THE ACOUSTIC COMMUNICATION MODEL: BACKGROUND AND RELEVANCE TO GAME SOUND
The concept of acoustic communication articulated by Truax (2001) is a framework that attempts to bring multi-disciplinary perspectives into the study of sound reception as well as sound production and that provides a structure for analyzing and understanding the role of sound in contemporary culture, in media, and in technology. Its roots lie in the tradition of acoustic ecology that was the basis of Schafer’s work in the late 1960s and 1970s: work that is already referenced by several authors (Grimshaw, 2008; Hug, 2011). The following history helps contextualize and focus the particular perspective that acoustic communication has taken on. A pioneer in the field of acoustic ecology, Schafer first defined the notion of a soundscape to mean a holistic system of sound events constituting an acoustic environment and functioning in an ecologically balanced, sustainable way (Schafer, 1977). Responding to the threat of urban noise pollution, Schafer focused on conceptualizing and advocating an ecological balance in the acoustic realm. He developed the terms hi-fi and lo-fi to describe different states of aural stasis in the environment. A hi-fi soundscape, exemplified in Schafer’s view by the natural environment, is one where frequencies occupy their own spectral niches and are heard distinctly, thus creating a high signal-to-noise ratio. A lo-fi soundscape, on the other hand, often exemplified by modern urban city settings, is one where amplified sound,
An Acoustic Communication Framework for Game Sound
traffic, and white noise mask other sound signals and obstruct clear aural communication, creating a low signal-to-noise ratio (Truax, 2001, p. 23). Following Schafer’s work, Truax developed a multi-disciplinary framework for understanding sound based on notions of acoustic ecology as well as communication theory. This framework models sound, listener and environment in a holistic interconnected system, where the soundscape mediates a two-way relationship between listener and environment (Truax, 2001, p. 12). It also places importance on the role of context in the process of listening, emphasizing the listener’s ability to extract meaningful information from the content, qualities, and structure of the sound precisely by situating this process in their knowledge and familiarity with the context and environment (p. 12). Yet Truax also recognizes listening as a product of cultural and technological advances, subject to macro shifts and patterns over time. Such a multi-disciplinary understanding of sound allows us to bring socio-cultural considerations into the soundscape paradigm alongside auditory perception and cognition. Traditional models of auditory perception conceptualize listening as a process of neural transmission of incoming vibrations to the brain (Cook, 1999) that, shaped by our physiology, allows us to experience sound qualities. In fact, as pointed out by Truax (2001) and others, listening is a complex activity involving multi-level and dynamically shifting attention, as well as higher cognitive functions (inevitably dependent on context) such as memory associations, template matching, and foregrounding and backgrounding of sound (p. 11). Again, this model points to the importance of understanding listening as a physiological as well as a cultural and social practice. 
From a design perspective, it is also imperative to understand that listening is a dynamic and fluid activity that in turn affects the perception and experience of sounds in the acoustic or electroacoustic environment and helps mediate the relationship between actor, activity, context
and environment. Two major classifications of listening are everyday listening as put forward by Gaver (1994, p. 426), an omni-directional, semi-distracted, adaptive-interactive listening that focuses on immediate information-processing of sound, and analytic listening (Truax, 2001, p. 163), an expert activity that attends to detail and is focused on an aesthetic or analytical experience of sound, rooted in context as its frame of reference for the extraction of information from sound characteristics. Based on the idea of different classifications of listening, Truax developed a number of categories exemplifying major listening modes and processes (pp. 21-27): see Table 1. Clearly, this ontology of listening needs a significant degree of modification in order to fit the complexities of listening in gameplay contexts, and we will continue returning to, adding to, and re-conceptualizing the idea of listening positions with regard to game soundscapes. This set of listening types is simply a beginning, allowing us a way to access the historical evolution of listening stances as media, technology, and design have changed.
These types of listening, as part of the acoustic communication framework, directly represent macro shifts in the historical and cultural reality of acoustic, electroacoustic, and media listening, and, as an extension, game listening. In analyzing game sound then, this set of listening attentions is to be amended in a similar fashion to uncover and elucidate macro shifts brought about by the socio-historical experience of sound in games. The notions of fidelity, verisimilitude, and ecology are a particular choice too, yet I see the drive towards realism not merely as one aspect of game design and game culture but as a more symbolic movement intersecting many media genres and technologies. Rather than simply a design requirement, it is an ideology of contemporary mediated expressions.
Examples span from immersive cinematic soundscapes for the big screen and surround sound aesthetics
Table 1. List of Listening Positions from the Acoustic Communication framework (Truax, 2001)
Listening Position: Description
Listening-in-search: Active attentional and purposeful listening, a questing out towards a sound source or soundscape. Sometimes listening-in-search involves a determined seeking of a particular sound template in an aurally busy environment. The cocktail party effect, for example, is a special mode of listening-in-search, which involves a zooming in on a particular sound source, often semantic-based (speech) and familiar, in an environment of competing sound information in the same spectrum (Truax, 2001, p. 22).
Listening-in-readiness: Listening-in-readiness involves background listening with an underlying expectation for a particular sound or set of sound signals (such as a baby’s cry). It is a sub-attentional listening in expectation of a familiar sound or signal, a latent alertness.
Background Listening: A non-attentional listening, a receptive stance without a conscious attention or interpretation of sounds or soundscape heard.
Media Listening: An adaptation of media’s flow of perceptual and attentional cues as delivered through sound. Media listening and distracted listening are two positions of listening that Truax (2001) argues are a direct result of the transition to electroacoustic sound and especially the way in which sound has evolved in its use in media. Since much of media is experienced as a background to life, often in the visual background, programming flow has developed sophisticated and strong aural cues in order to manage and direct listeners’ attention to the next item on the media program.
Analytical Listening: A focused, critical expert listening to particular qualities of electroacoustic sounds and recordings.
taking the viewer into a powerful suspension of disbelief, to complete virtual reality, ambient intelligent environments, and computer-augmented physical spaces, which have become the norm for contemporary museums and art galleries. There is also the ever-so-popular genre of reality TV, which has reared and acculturated a “society of the spectacle” generation of audiences.
FIDELITY
Literally, fidelity means faithfulness. In relation to sound, fidelity signifies the accuracy and quality of sound reproduction, that is, the degree to which an electroacoustic iteration faithfully represents the original acoustic source. From there, the notions of hi-fi (high fidelity) and lo-fi have emerged and are now commonly applied to refer to the quality of audio equipment, specific recordings and (cinematic) listening experiences. As noted in the previous section, Schafer (1977) also utilized these two distinctions of fidelity, except he applied them to refer to a soundscape’s ecological balance in terms of a signal-to-noise ratio. In this section, we’ll focus on fidelity as a concept representing
the move from abstract musical chiptunes (8-bit synthetic tunes) to realistic sampled sounds in the design of game soundscapes. Fidelity here will exemplify the technological changes in game sound’s realism.
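Schafer’s hi-fi and lo-fi distinction, as recalled above, is expressed in terms of signal-to-noise ratio. As a minimal sketch of that measure, the following Python fragment computes a ratio in decibels from root-mean-square amplitudes. The tone, the two background beds, and all numeric values are hypothetical and chosen only for illustration; they are not measurements of any game or environment discussed in this chapter.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 20 * log10(rms_signal / rms_noise)."""
    return 20 * math.log10(rms(signal) / rms(noise))

RATE = 8000  # samples per second (assumed for this sketch)

def sine(freq_hz, amplitude):
    """One second of a sine wave at the given frequency and peak amplitude."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / RATE)
            for n in range(RATE)]

# Hypothetical signal: a single clear tone standing in for a sound signal.
tone = sine(440, 0.5)

# Hypothetical backgrounds: a quiet bed versus one as loud as the tone.
quiet_bed = sine(97, 0.005)
loud_bed = sine(97, 0.5)

hi_fi_snr = snr_db(tone, quiet_bed)  # large positive value: signal stands out
lo_fi_snr = snr_db(tone, loud_bed)   # about zero: signal no louder than the bed
```

In this toy case the same tone yields roughly 40 dB against the quiet bed and roughly 0 dB against the loud one, which is the sense in which a soundscape, rather than a recording, can be called hi-fi or lo-fi.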
Role in Game Sound: Socio-Cultural History
In tracing some of the history of game sound, Stephen Deutsch (2003) makes a convincing point about the trajectories that sound for games has taken historically. As he points out, the first game sound designers were essentially musicians and/or experimental composers (p. 31). Historically, there was a split between those who followed Pierre Schaeffer’s musique concrète tradition and those who were interested in electronic music. The second group ended up getting involved in game sound production and laying the foundations of contemporary game sound. This concerns fidelity because, while musique concrète works with sampled sound (that is, real acoustic sources) as material for sonic expressions, electronic musicians were fascinated with the purely abstract world of the synthesizer and
the completely un-real soundscapes it produced. From here we have the tradition of chiptunes: 8-bit synth tunes encoded directly on the microchip of the game console. Initially, of course, space and memory were some of the pragmatic issues driving the minimalistic and synth-based soundscapes in games. With technological improvements, such constraints are no longer relevant; however, the demographic of game sound practitioners still exerts a formative influence, not on what is possible but on what is realized in game sound today and on how associations between sounds and their meanings in a game become forged. As Deutsch puts it, even though game sound emulates film sound in its “filmic reality” of representation, it is often too literal, “sound effects as opposed to sound design” (p. 31); see Figure 1. Invoking what Schafer (1977) might call the listener as composer, many games today utilize adaptive-interactive audio, that is, each player
constructs their own unique soundscape by moving and interacting with their avatar. Yet even then sound effects are “loopy”: they often come from generic sound banks (see Figure 3) and are exactly the same each time they sound, sometimes getting cut off if the player’s actions are faster than the sound file’s duration. They get called up and filtered according to the spatial/contextual demands of the character’s progression; however, it is only in high-end games, typically first-person shooters (FPS), that the richness of a complex soundscape really comes through, with 3D audio rendering and spatialization (Grimshaw, 2008) to account for acoustic coloration and atmospheric variables. FPS games afford the player the unique position of literally listening with the character’s ears since the game presupposes the player is that character. Any other POV (point of view) character stance by definition distances the player from the soundscape, making
Figure 1. Note the compressed, repetitive nature of the waveform, reflecting synthetic strings of sounds, often separated by little sine tone clicks and artificial silences
Figure 2. Historical and cross-genre, cross-platform example of game soundscapes. In the first two examples we see a progression from 8-bit sound to polyphonic synthesized sound, while Fallout 3 reflects a 3D spatialized environment of varied and large dynamic range (highs and lows) to avoid masking and maximize clarity; finally, God of War features a broadband soundscape where many high-quality sounds are mixed in, competing and somewhat masking each other
them more of an audience member as opposed to a true participant in that acoustic ecology. This current model of game sound design has slowly shifted to reflect the interactive, dynamic and personalized nature of game soundscapes, departing from the cinematic tradition and the early game 8-bit sound. It uses sound samples organized
in banks that are called up in real-time to be filtered and mixed in as a player progresses through a game, reflecting the quality of a space, sound behaviour and ambience in real time as well. For example, if our avatar is in an ocean setting we will hear waves, wind and seagulls; similarly, if the avatar is moving down a tunnel looking to
Figure 3. Note the flow of gameplay, comprising a series of loops that are varied slightly but share a uniform attack-sustain pattern, thus still sounding “loopy”, and that are often triggered out of temporal sync, resulting in unrealistic interruptions and overlap. Also, the stereo zoom-in reveals little if any spatialization. Elements that aren’t identified on the diagram are the background music and cave ambiance, as well as a few other uniform sound effects such as footsteps
avoid or preempt enemy attacks, out-of-frame sounds are heard as coming from their respective (implied) locations and from the appropriate distance. There are three ways in which we can examine the shifts of game sound fidelity over time. As pointed out in other game sound histories (Collins, 2008; McDonald, 2008), 8-bit sound from early fantasy and arcade-style games has evolved to polyphonic MIDI orchestrations, higher quality
rendering, and richer textures but with essentially the same melodies and game sound conventions. On the other hand, shifts in interactive-adaptive audio, as a relatively contemporary design standard, are less evident historically, but manifest themselves across different game genres and platforms. For instance, portable platforms feature only a limited sonic variety in representative/environmental sound effects, relying heavily on synthesized polyphonic mixes; more affordable
consoles such as the Gamecube, the Wii, and the PlayStation 2 tend to feature games with more authentic soundscapes and variety, and higher-end consoles such as the PlayStation 3 and Xbox 360 flaunt stellar graphics as well as multi-channel, 3D sound capabilities that deliver the precision of spatialization and timbre characteristic of FPS games. Similarly, fantasy and action role playing games (RPGs) such as Final Fantasy, Prince of Persia, Assassin’s Creed 2, and God of War, to mention a few, use limited and uniform sound effects banks to build environments with minimal acoustic properties, even though the audio is less compressed in quality than in their predecessors. Higher-end military, FPS and strategy games such as Hitman and Metal Gear Solid often combine a rich variety of high-quality sound effects rendered with 3D sound spatialization techniques and sound behaviour physics engines to simulate the temporal and spatial trajectories of competing sonic information in the game space.
Finally, fidelity changes in game sound can also be discussed in terms of Schafer’s classifications of hi-fi and lo-fi soundscapes (1977; Truax, 2001, p. 21) reflecting the ecological acoustic balance in a given environment. Quite simply, as game sound has become more complex, richer in textures, and in need of accommodating an ever-expanding variety of alert cues and signals, game soundscapes have become sites for much sonic masking. If we look at Figure 2 we see a transition from a one-track synthesized music model, which lacks authentic fidelity but has little masking, to more complex games where the soundtracks become a constant broadband spectrum of high-quality music, environmental sound effects, alerts and signals, and ambience coloration.
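The contrast between a sparse one-track soundscape and a dense broadband mix can be sketched with a deliberately crude model. In the Python fragment below, each sound layer is reduced to a frequency band and a level, and a layer is flagged as at risk of masking when an equally loud or louder layer overlaps its band. This rule is an assumption made for illustration, not a psychoacoustic masking model, and the layer names, bands, and levels are hypothetical rather than taken from the games in Figure 2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    """One element of a soundscape, reduced to a band and a level (hypothetical)."""
    name: str
    low_hz: float
    high_hz: float
    level_db: float

def bands_overlap(a, b):
    """True if the two layers share any part of the frequency range."""
    return a.low_hz < b.high_hz and b.low_hz < a.high_hz

def masked_layers(layers):
    """Names of layers overlapped in frequency by an equal-or-louder layer."""
    masked = []
    for layer in layers:
        for other in layers:
            if (other is not layer
                    and bands_overlap(layer, other)
                    and other.level_db >= layer.level_db):
                masked.append(layer.name)
                break
    return masked

# A one-track synthesized soundscape: nothing competes with the melody.
chiptune = [Layer("melody", 200, 2000, -12)]

# A dense broadband mix: score, ambience and an alert share the spectrum.
dense_mix = [
    Layer("orchestral score", 60, 8000, -10),
    Layer("ambience", 100, 6000, -18),
    Layer("low-health alert", 1000, 3000, -14),
]

chiptune_masked = masked_layers(chiptune)
dense_masked = masked_layers(dense_mix)
```

Under these assumed values the single-track case reports no masked layers, while in the dense mix both the ambience and the alert fall under the louder, wider score, which is the lo-fi condition described above.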
However, the newest trend in game sound design (Collins, 2008; Farnell, 2011; Hug, 2011; Phillips, 2009) might be to return to synthesis utilizing much more sophisticated tools: physical modeling and real-time sound synthesis to realistically convey not only every sound occurring
in a game but its every unique variation, coloration, temporal and spatial character, in interaction with other sounds within the electroacoustic environment. Such an approach to game sound synthesis would make the game soundscape truly personalized through subtlety and non-repetition, and it would reverse the tendency to use substitute aural objects or sound images from the cinematic tradition, essentially returning game sound to a realistic modelling of acoustic phenomena. However, would such a turn eliminate the necessity for purposeful sound design? Would it make it all about programmatic representation? After all, sound’s role in games is not simply descriptive, one of reflecting reality in a high-fidelity manner, but it is largely about function! Interface sounds, warning sounds, alerts, and musical earcons must continue to be part of this acoustic ecology, subject to issues of acoustic balance, masking and fidelity, as well as the informational ecology of interactive play.
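The difference between a bank sample that is “exactly the same each time” and a synthesized sound with “every unique variation” can be illustrated with a small sketch. The Python fragment below contrasts replaying one stored buffer with generating a short decaying tone whose pitch, level, and decay are slightly randomized on every trigger. The function name, parameter ranges, and sample rate are hypothetical, and the generator is a toy stand-in, not the physical modeling approach discussed by the cited authors.

```python
import math
import random

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)

def synth_impact(rng, base_hz=180.0, duration_s=0.05):
    """Synthesize one short decaying tone with slightly randomized
    pitch, level and decay rate (all ranges are illustrative)."""
    pitch = base_hz * rng.uniform(0.9, 1.1)
    gain = rng.uniform(0.6, 1.0)
    decay = rng.uniform(40.0, 80.0)
    n = int(SAMPLE_RATE * duration_s)
    return [
        gain
        * math.exp(-decay * i / SAMPLE_RATE)
        * math.sin(2 * math.pi * pitch * i / SAMPLE_RATE)
        for i in range(n)
    ]

# Bank model: one buffer rendered once (standing in for a pre-recorded file)
# and replayed unchanged on every trigger.
bank_sample = synth_impact(random.Random(0))
bank_triggers = [bank_sample for _ in range(3)]

# Synthesis model: a fresh, slightly different buffer on every trigger.
rng = random.Random(7)
synth_triggers = [synth_impact(rng) for _ in range(3)]
```

Every entry in `bank_triggers` is the same buffer, which is what produces the “loopy” quality noted earlier, whereas each entry in `synth_triggers` differs from the others. The sketch says nothing about functional sounds such as alerts and earcons, which may need to stay recognizably identical.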
The Listening Experience
So what types of listening do these aspects of fidelity foster in game players/listeners? Listening is essentially a particular way of paying attention. Truax (2001) describes this phenomenon in terms of listening positions that we have developed both with regard to everyday listening and when engaging with different forms of media (pp. 19-23). Film theorists such as Chion (1994) and Murch (1995), among others, have already spoken about different listening modes: the one proposed by Chion has also been discussed and augmented by Grimshaw and Schott (2007) in their discussion of FPS games. Tuuri, Mustonen, and Pirhonen (2007) provide a more recent compelling account of listening modes in gameplay, identifying a hierarchical attentional structure of listening. Table 2 attempts to summarize popular notions of listening to game sound and organize them according to existing typologies of game functions (Jørgensen, 2006), attentional positions (Stockburger, 2007)
Table 2. An attempt at linking attentional and listening positions with game functions and examples of game sound

Attentional Position: Foreground
Game Functions: Action-Oriented Functions
Listening Position: Analytical Listening (Truax, 2001); Listening-in-search (Truax, 2001); Semantic Listening (Chion, 1994); Causal Listening (Chion, 1994); Functional, Semantic and Critical Modes of Listening (Tuuri, Mustonen, & Pirhonen, 2007)
Examples from Gameplay: Alerts (notifications, warnings, confirmation and rejection); Interface sounds
Reference Frames: Trans-diegetic

Attentional Position: Mid Ground
Game Functions: Orienting Functions; Identifying Functions
Listening Position: Media Listening (Truax, 2001); Navigational Listening (Grimshaw & Schott, 2007); Causal & Empathetic Modes of Listening (Tuuri, Mustonen, & Pirhonen, 2007)
Examples from Gameplay: Contextual sound effects; Auditory icons; Earcons
Reference Frames: Diegetic

Attentional Position: Background
Game Functions: Atmospheric Functions; Control-related functions
Listening Position: Background Listening (Truax, 2001); Reduced Listening (Chion, 1994); Reflexive & Connotative Modes of Listening (Tuuri, Mustonen, & Pirhonen, 2007)
Examples from Gameplay: Musical score; Environmental soundscape
Reference Frames: Extra-diegetic
and states of diegesis (Chion, 1994; Grimshaw, 2008; Huiberts & van Tol, 2008; Jørgensen, 2006). As a side note, Jørgensen’s (2011) newest work in this book brings an important critique of the very usefulness of discussing game sound in terms of diegesis, given that sound in games needs to function on many different levels besides a descriptive/immersive one, and such levels may be non-diegetic according to film theory’s definition of diegesis and yet function as diegetic cues within a game’s soundtrack. As another limitation of diegesis, I will argue in the last section of this chapter that it fails to recognize sounds outside the gameworld which may very much be part of the experience of play: the acoustic soundscape of group play, the arcade environment or online audio conferencing such as Teamspeak.
VERISIMILITUDE
If fidelity refers to the faithfulness of sound quality in computer games, verisimilitude concerns itself with the experience and nature of truthfulness and authenticity in a game context, as conveyed through the game soundscape. In the section above we used the notion of fidelity to trace the move from synthetic tones representing real actions to realistic sound effects attached to character movements that are called up interactively to combine into a unique and (at least in principle) seamless flow. Verisimilitude addresses precisely the nature of this acoustic ecology and its claim to represent a realistic experience in both temporal and spatial terms. In its traditional literary/theatrical definition, verisimilitude reflects the extent to which a work of fiction exhibits realism or authenticity, or otherwise conforms to our sense of reality. In film, the notion of verisimilitude signifies the relative success of cinematography at creating an immersive, engaging fictional world of hyper-realistic proportions both in terms of image and sound, but also of intensity of emotion and experience (Chion, 1994; Deutsch, 2003; Figgis, 2003; Murch, 1995). The core idea in this section is the notion that game sound has developed historically to conform to our sense of reality while at the same time it has constructed a sense of reality, particular to games, that we now expect.
Role in Game Sound: Socio-Cultural History
Cinematic immersion works by presenting a hyperreal universe, a larger-than-life movie world with action and emotion wrought to an exaggeratedly high intensity. It both summons attention and diverts attention. Its visual and auditory elements both attract and construct an experience and work to divert the audience’s attention from realizing that what they see isn’t real. In games, this is even more the case: by definition games are interactive, and their auditory and visual elements are driven by
the player. So already, there is an implication that the auditeur is also a participant, hearing with the ears of the character. As Chion (2003) puts it (in relation to David Lynch’s cinematographic style): “We listen to the characters listening to us listening to them” (p. 153). In FPS games, this relationship is even clearer as the soundscape design is very intentionally oriented towards an authentic experience of listening with the character’s ears (the acoustic field shifting with the avatar’s movement on screen, with the reflections, sound coloration and directionality of sounds dynamically and responsively shifting along), a mode of listening that Grimshaw (2008) defines as first-person audition (p. 83).
Undoubtedly, one of the most important predecessors of game sound is sound in cinema. Expanding the context of significance to other media forms would include radio (the predecessor to film) as well as television and a particular genre of motion picture: cartoons (with their own predecessor, the paper comic). Unlike cinema, however, where sound’s role is highly artistic and affective, or radio and television, where sound is part of a programming flow (Truax, 2001, p. 169), sound in games must aspire to aesthetic and affective as well as informational and epistemic functions. Since games are an interactive medium, these functions often overlap and are interdependent. Verisimilitude as a feature of a designed or supporting soundscape can be traced back to the early days of radio, particularly with radio drama (Truax, 2001, p. 170).
In the absence of a visual reference, in-house generated sound effects came to play a central role in creating a realistic environment to go along with the narrative, thus inadvertently giving birth to some of the most widespread conventions of cinema and game sound. Notable examples are fist-fight sounds and walking-in-snow sounds: the former are generated as an artificial exaggeration of what a punch would sound like, and the latter are easily simulated by grinding a fist into a bag of rice or peppercorns. Foley art, which emerged as the mainstream film
sound craft in the earlier days of modern cinema, and which is experiencing a resurgence today, builds directly on these conventions, generating an ever-increasing repertoire of techniques through which to simulate “real” sounds (typically by using other acoustic materials). In his discussion of film sound, Christian Metz (1985) uses the term aural objects to refer to film’s tendency to solidify an arbitrary relationship between the viewer/listener’s perception of real sounds and the reality of the actual sound sources. The result, as pointed out not only by him but also by other film theorists such as Chion (1994), Deutsch (2003), Figgis (2003) and Murch (1995), to name a few, is that film sound bites become hyper-real: we associate them with certain events and interactions in place of their authentic acoustic counterparts. For example, if someone played back the actual sound of walking in snow and the sound of close-miked grinding into a bag of rice, most of us would perceive the latter as more real. Given such a set of conventions, and media’s natural condition of being an inter-textual and self-perpetuating phenomenon, subsequent media forms and genres simply have to play on and incorporate said conventions. Or do they?
Aural Objects, Flow and Space
As mentioned already, the first RPGs utilized a small corpus of synthesized melodies to denote unique spaces, quintessential game moments and mood. Loosely based on music psychology conventions, these early game soundscapes used major tonality to signify an uplifting mood, minor tonality to signify danger or failure (as in Zelda or the Final Fantasy series), an upward note-trill to denote a jump and a downward note sequence to indicate death or end-game (as in all of the Super Mario-based and derivative series). The bigger picture in the early days consisted of having a continuously running soundtrack of synthesized music where many smaller elements that are
meaningful in themselves mix together to create a flow of gameplay experience (McDonald, 2009) but also a game space. As with narrative support music in cinema, synthesized tunes in early games, specifically in the fantasy genre (titles such as Final Fantasy, Zelda, Castlevania and others), act as a vector (to use Chion’s (1994) term) to the temporal flow of the interactive experience and take on iconic or referential meaning (Deutsch, 2003). It is precisely this quality of game sound that illustrates perfectly the distinction between fidelity and verisimilitude: as technologies, storage capacities and processing speeds of game consoles have improved over time, some games have moved towards a more and more authentic depiction of the acoustic reality, while others continue to preserve the nostalgic qualities of what Murch (1995) calls metaphoric sound, only in better sound quality (see Figure 2). Metaphoric sound, one that does not represent the action seen on the screen realistically, is so ingrained in our cultural memory that it seems odd to even point it out. Popularized by early fantasy games and their predecessors, isomorphic cartoon sounds (Altman, 1992), it contributes to a type of verisimilitude that is very different from the one that richer and more realistic game genres strive for (adventure, military or FPS games). In other words, Super Mario, Zelda or Final Fantasy just wouldn’t be recognizable to their audience or, in our terms, possess verisimilitude, if it were not for their inter-textual references to iconic sounds of the past. Examples are ample: the theme sounds of their game universe or even individual sound effects such as the 1-up sound, the brick-smashing sound or the jumping tune in Super Mario; the battle cries of Zelda’s Link and its iconic chest-opening sounds; or the epic combat rhythms during attacks and boss battles in Final Fantasy, among many others.
Given this, sound designers for classic fantasy titles take great care to preserve these iconic sounds in each platform and each iteration of their titles. As Phillips (2009) mentions in his expose on film and game music,
fantasy game theme songs have long transcended the computer game genre and, particularly in Japan, are frequently re-orchestrated and performed by choirs and symphonic orchestras. Composers of game music, while largely unknown in North America, have star status in most of Asia. There is another issue too: fantasy games deal with imaginary actions that no one has experienced in the real world, such as stepping on enemies’ heads, eating a giant mushroom, or catching a star (references from Super Mario) and, sonically, these actions do not have ‘real’ counterparts in the acoustic reality we are familiar with. Creating the infamous sound of the lightsaber in Star Wars (McDonald, 2008) is a classic story in the history of metaphoric sound using both musical conventions and pop-psychology. Likewise, this quote from a sound designer of Torment illustrates game verisimilitude challenges perfectly:
During Torment, I was processing some sword hits, and they were coming up very interesting. While they didn’t work for the spell I was working on, I gave them a description like ‘reverberant metal tones, good spell source.’ Later, I was looking for something with those qualities, but had forgotten I made those sounds. When I searched my database for ‘metal tones’, I found them, and they were exactly what I needed! (Farmer, 2009)
A less discussed but highly important part of game verisimilitude is the temporal flow of the soundscape, as it is intimately linked to the tradition of sound effects and aural objects. While the fantasy sound of the past presents a highly melodic, musically semantic flow, the interactive-adaptive tradition results in a “loopy”-sounding score of slightly varied bank sound effects (for example, there may be only one footsteps sound that is nevertheless used for all characters) organized around modules of game quests and activities but lacking an overall structure or temporal design (see Figure 3).
Another aspect of verisimilitude in game sound has to do with creating space, specifically
in realistic, rich cinematic RPG/action games. I will begin with Murch’s (1995) notion of worldizing (giving a certain space acoustic qualities that make the player get involved) and combine that with Ekman’s (2005) discussion on diegetic versus non-diegetic sound as acoustic elements that do or do not belong to a gameworld. Historically, it is important to note again how early games (Collins, 2008; McDonald, 2008) instantiated the use of a melody to represent space. For example, in Final Fantasy, towns have a certain melody representing the calm mood of a non-threatening environment, while out-of-town wooded areas use a separate melody which is consistent everywhere in the game and represents mild danger; mission dungeons have their own melody and, within them, entering the space of a boss battle triggers fast-paced tension music that is the same for each boss battle throughout the game. Thus, these games established a situation where mood, space, and call-to-action are rolled into one and are all represented via one single melody/track. With the emergence of more powerful game consoles, the notion of space becomes divorced from the conveyance of mood or a call for a particular action and becomes more representative and realistic, aiming to immerse the player into a gameworld. This connects the idea of diegesis with the notion of verisimilitude through the experience of immersion, as “immersion is a mental construct resulting from perception rather than sensation” (Grimshaw & Schott, 2007, p. 476). While the cinematic concept of diegesis simply refers to whether or not the sound source is in or outside the frame, both Jørgensen (2006) and Ekman (2005) use this term to address whether a sound belongs to a gameworld or not.
There is an important distinction to be made in using diegesis in this way as it puts the emphasis on immersion into the resounding space (Grimshaw & Schott, 2007) of a game and carries an implication that the gameworld already is an acoustic reality that sounds either belong to or do not belong to. On the other hand, regarding diegesis only as a reference to in- or out-of-frame sounds leaves the game soundscape intact as it assumes then that all sounds are part of the gameworld. Such an idea fits perfectly with Schafer and Truax’s notion of an acoustic community (1977; 2001): a sonic locale or context that is formed over time through a dynamic exchange between sounds, soundscape and listeners, becoming an ecology of its own that can be threatened, altered or generally disturbed by the introduction of new, foreign sounds or the removal of familiar signals that local inhabitants (players) depend upon. The question is whether it is an ecology where the listener is consumed by the soundscape in a spectator-based relationship (Westerkamp, 1990), or whether the ecology includes the player in an (inter)active co-production. Again, we have to remind ourselves that immersion is a perception, not a sensation (Grimshaw, 2008, pp. 170-174). The answer is in the ear of the listener, so to speak: while even realistic games represent only a small portion of the game environment sonically (see Figure 4), they do successfully create and maintain a sense of immersion, verisimilitude, and belonging to a gameworld, not to mention conveying information through sonic signals.
LISTENING TO GAME SOUND
It follows that the historic shifts of verisimilitude in game sound have affected the experience of listening as well. With the socio-cultural baggage of radio and film sound, listeners are already conditioned to accept aural objects (Metz, 1985), internalize them, and think of them as more real than the real sounds they represent. Further, listeners of game sound have adopted, in the aural sense, what Colin Ware (2004) refers to in visual studies as a naive physics of perception. That is, players accept and often ignore the clearly artificial behaviour of looped sound bites, their sometimes low or unrealistic quality, and their lack of diversity and complexity (see Figures 3 and 4). Ware’s point is that designers often reduce work and design complexity by counting on the fact that players do not need that much realism, only enough to be hooked. In other words, it is acceptable if many things from the real world do not manifest themselves sonically in the gameworld. Given this, we can now expand the framework of listening positions from Table 1 to include a pattern of attention to sound that ignores the otherwise obvious “loopy”-ness of sound effects and, with it, the predictability of game soundscapes as a whole. A listening of denial, or naive listening, is perhaps a good term to use. It is not that players cannot, when prompted, identify the artificial nature of many sonic elements in a game soundscape; it is that they conditionally and purposefully ignore it, immersing themselves instead in the experience of gameplay. Ideals of game sound become less about fidelity of acoustic sources or of audio quality and more about the verisimilitude of non-engaging engagement with a holistic, interactive environment. From the discussion so far, there are a few other modes of listening that I would like to put forth. Before I introduce them, however, it is important to draw a link between the types of listening fostered by the flow of television and contemporary radio soundscapes, and those encouraged by the gameplay experience in general. The emergence of continuous media such as radio and TV created a brand new type of listening experience: one that Truax calls distracted or media listening (2001, p. 169). In order to accommodate viewers tuning in and out of the program and at the same time attract and keep their attention, TV sound flow uses a number of attention-management techniques such as dynamic shift changes and modular programming structure (Truax, 2001, p. 170). It essentially tells us how to listen. It trains us to increase or decrease our auditory attention by use of carefully crafted cues, until they become second nature.
Figure 4. A sonic excerpt from Grand Theft Auto: San Andreas gameplay. While richer and more varied in dynamic range (including periods of relative silence), the game flow still consists of a series of sound effects strung together, with some distance/amplitude rendering.
These gestalts of auditory perception, then, seamlessly integrate cinema and game sound, carrying the promise of total immersion, suspension of disbelief and verisimilitude. As a result, we begin relying less on active, engaged, information-processing listening, and more on habitual background and media listening in all of our surroundings (Schafer, 1977; Truax, 2001). This is not to forget, however, that games are interactive, and that the player is, in Schafer’s terms, a co-composer of her own game soundscape at the same time that she listens to it. The listening positions that I would like to add, in the interest of engaging with and critically understanding the experience of computer game sound, are presented in Table 3.
Table 3. A set of listening modes emergent out of the current discussion on fidelity, verisimilitude and the ecology of game sound. These modes reflect and attempt to identify macro trends borne out of historical shifts in the qualities, techniques and functions of game sound over time.
Imaginative Listening
A listening that supplies the perceptual conditions for immersion, building up a mental image of an environment from the little that is provided acoustically by the game’s soundtrack; for example, the way a game like Cooking Mama is reminiscent of Super Mario games and evokes a fun, fantastical, care-free world.
Nostalgic Listening
An analytical, culturally-critical type of listening that has emerged over time in experienced players who look for iconic game music themes through platforms and generations of a particular game (some notable examples here being the Final Fantasy, Super Mario, Zelda and Mega Man series).
Disjunctive Listening
A listening position that describes the ability that gamers develop to very quickly and fluidly interchange listening attentions: one moment they may be immersed in the heat and tension of a battle and in the next they may pause to change their settings, entering a user interface type of soundscape (for instance, in the Fallout 3 example in Figure 2, the player shifts constantly between the battlefield ambience/listening position and armour selection/target selection screens).
Naive Listening
A non-analytical, electroacoustic listening that allows the player to feel immersed into the game reality with the minimum amount of auditory complexity. In the absence of truly realistic soundscapes, players effortlessly ignore loops, repetitions and lack of sonic fidelity in order to become more immersed in the game. The name is inspired by Ware’s (2004) naive physics of perception idea.
Conditioned Listening
The type of listening that Truax (2001) calls media [flow] listening (p. 169) where players listen with an underlying expectation of how the flow of the game’s soundscape will unfold, tacitly familiar with the sonic elements of the games in general.
Inter-textual Listening
A result of cross-pollination of different media genres, this listening position addresses situations where game soundscapes contain radio, telephone, or TV sounds (most famously featured in Grand Theft Auto). Conversely, the popular events of Video Game Concerts are settings where game sounds live on outside gameworlds and are performed, listened to, and used for other purposes outside of games.
ECOLOGY
Discussing game soundscapes as sites of local acoustic ecologies is not a novel idea (Grimshaw, 2008) and as Grimshaw and Schott (2007) point out, “the more immersive a game is the more appropriate it is to discuss the game world in terms of an ecology and, therefore, the greater the immersive function of the game sounds” (p. 479). Grimshaw, unlike Schafer, analogizes game soundscapes to an actual bio-ecology where various species (in our case, sounds) interact, co-exist and are co-dependent on each other. He also focuses on the ecology of first person shooter (FPS) game soundscapes as this genre lends itself particularly well to a discussion of ecology in terms of sound. Spatialization and 3D sound rendering are honed to an art form in FPS games and the player literally has to listen through the character’s ears in order to play and succeed in the game. Sounds of shots, enemies in the background or out-of-the-frame (extra-diegetic) sounds are extremely important, as are user interface sounds including warnings and alarms that often require immediate attention and split-second decisions (trans-diegetic sounds, per Jørgensen, 2006). Schafer, however, would still look at ecology from the perspective of balance within an acoustic community where each sound has a meaning in the sonic context and a place within the spectral niche of the soundscape. This acoustic balance may or may not be in stasis: at certain times an element may mask and overpower other sonic elements. For example, in action scenes music often takes on a dominant sonic role overshadowing smaller environmental or game alert sounds (in Figure 4 it is clearly visible in the full sequence layout (top section) that music tracks have a significantly higher/broader dynamic range than all other sound effects). For Schafer, and especially for Truax whose work focuses more on electroacoustic sound, sound balance is not simply about loudness but also about value connotations. Music, for example, is not only a much stronger emotional, affective device than environmental sound within a given game environment, but it also carries a history of being used commercially, to condition consumers into spending time and money in certain settings
(Truax, 2001; Westerkamp, 1991). As Hildegard Westerkamp (1991) points out, the phenomenon of background music is responsible for sound becoming “associated in our memories with environments and products”. In essence it becomes the ambience of the media environment. It does not, however, result in endless diversity of spaces and sounds but, rather, in the emergence of archetypal surrogate environments (Westerkamp, 1991). In the context of games, ominous abstract tones analogous to the cinematic model of the mood track provide such a strong emotional sense, enforced and enriched by previous generations of media listening such as film, radio and TV, that the acoustic qualities of space, reverberation, distance, location and timbre, which are the more subtle yet vital cues of everyday listening, are often lost in the ‘background’. Similarly, music in action and rhythm games often provides a promotion vehicle for indie bands whose sound is conceived as culturally related to the genre of the game itself, thus perpetuating, rather than challenging, the status quo of popular culture and mass media. Essentially, music’s overshadowing of other sonic elements has both a cultural and a political-economic implication for games in addition to an acoustic one.
Ecology of Listening
So far we have been discussing new listening patterns that emerge from the experience of game soundscapes and their socio-cultural and historical evolution. But what about the listening that takes place inside game soundscapes? Does anybody listen within the game itself, or is it a silent vacuum where sound happens but no one can hear it? In other words, how would a game’s acoustic ecology change if characters in it (maybe even all of them!) could listen to one another and to the player’s character, or even to sounds outside the gameworld? In Truax’s (2001) terms this would complete the holistic relationship of true acoustic communication, uniting a constant interplay between listeners, sounds and soundscape, where game characters and the player-driven avatar are all participants in the ecology. However, such algorithmic subtlety is far from reality to date and, partly due to economic reasons but also partly due to notions of value, may never become a generally utilized phenomenon anyway. Even though sound in games has experienced tremendous growth and is now considered an important part of game design, development companies still invest considerably less in it than they do in visual graphics and animation. Sound designers in game development companies are typically pressured to stick with tried and true approaches to composition, design and functionality of audio, and are dissuaded from implementing “risky” new ways of using sound as part of the game mechanics. There are, of course, a few examples where sound is used in more participatory or ecological ways. The Nintendo DS has for some time featured a microphone input, so games such as Electroplankton and, to a lesser degree, titles such as Yoshi’s Island or Guitar Hero bring user-generated vocal elements into the gameplay, mostly in the form of shouting, blowing or speaking into the mike. More complex platforms support a genre called stealth games, where the avatar’s own soundmaking in the game (primarily footsteps) is implied to be heard by the non-player characters. Metal Gear Solid is the best known title, in addition to Hitman, Assassin’s Creed 2, and even youth-themed games such as Harry Potter and the Chamber of Secrets, or Zelda: The Phantom Hourglass, where Link has to walk slowly in the Temple of Time in order not to alert the phantom knights. Even in a rudimentary implementation, such as linking the player/character’s speed to levels of “noise” in a given space, this approach taps into an aspect of acoustic ecology that has been largely overlooked: the character’s experience of listening within the gameworld.
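The rudimentary mechanism just described, in which the avatar’s speed determines how much “noise” non-player characters can hear, can be illustrated with a minimal sketch. It is not taken from any of the titles named above: the linear speed-to-loudness mapping, the inverse-square falloff, the arbitrary loudness units, and all names (Guard, footstep_loudness, heard_loudness, guard_hears) are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Guard:
    """A hypothetical non-player character that can 'listen'."""
    x: float
    y: float
    hearing_threshold: float  # minimum loudness the guard notices (arbitrary units)


def footstep_loudness(speed: float, surface_factor: float = 1.0) -> float:
    """Assumed model: loudness at the source grows linearly with movement speed."""
    return max(speed, 0.0) * surface_factor


def heard_loudness(source_loudness: float, distance: float) -> float:
    """Assumed inverse-square falloff; distances under 1 unit are clamped to 1."""
    return source_loudness / max(distance, 1.0) ** 2


def guard_hears(guard: Guard, px: float, py: float, speed: float,
                surface_factor: float = 1.0) -> bool:
    """Return True if the avatar's footsteps reach the guard above its threshold."""
    distance = math.hypot(guard.x - px, guard.y - py)
    loudness = heard_loudness(footstep_loudness(speed, surface_factor), distance)
    return loudness >= guard.hearing_threshold


guard = Guard(x=0.0, y=0.0, hearing_threshold=0.5)
running = guard_hears(guard, px=3.0, py=4.0, speed=20.0)   # avatar runs past
sneaking = guard_hears(guard, px=3.0, py=4.0, speed=5.0)   # avatar walks slowly
```

In this sketch the same position is safe when the avatar walks slowly and unsafe when it runs, which is the whole of the “listening” attributed to the character; a fuller ecology would also let characters hear one another and other sounds in the space.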
Acoustic Community as a Feature of Game Sound
We have already discussed acoustic community in the context of game soundscapes as a conglomeration of different types of sound cues, sound functions, foreground, midground and background sounds; a community that forms over time and evokes a coherent sense of place in the gameworld. In this section, I would also like to bring up the idea of the acoustic soundscape that is located outside the gameworld but exists synchronously with it: the sounds that surround the player in her physical environment, sounds that may or may not be related to the gameplay, but are nevertheless part of the immediate acoustic community that the player or players are in. Without focusing too much on the minutiae of less significant sonic details such as household sounds, context does offer quite a distinct sense of acoustic community depending on whether a player is at home alone, with friends, at an arcade, at a LAN party, or on a headset with online co-players (see Figure 5). A Rock Band house party, for example, is a particular community where the soundmaking of multiple players and audience members supplies much of what makes this game’s soundscape a great experience. It is precisely the exclamations of joy, frustration and encouragement, and not the designed game sound, that give this acoustic community a sense of both fidelity and verisimilitude. In contrast, many RPG, sports or puzzle games that are played at home, even with company, result in a much quieter soundscape with sporadic and minimal interaction. Using Teamspeak or other voice chat programs for massively multiplayer online role-playing games (MMORPGs) or multi-player military strategy games results in yet another acoustic community, where players’ voices have to fit seamlessly within the spectral niche of the game’s soundscape without masking or obliteration: every second counts and a lot of the designed sonic information is crucial to the gameplay (see Figure 5). Game expos, conventions and professional game championships are another quintessential acoustic community of gaming, filled with amplified public announcements, a constant arcade-like hum of game sounds, the shifting of chairs and mashing of buttons (whether or not players are wearing headphones), and the murmur and exclamations of crowds. In fact the arcade environment, as Phillips (2009) points out, is responsible for some of the early choices in game sound, as each game’s signature soundtrack was designed to attract attention in a loud and noisy acoustic environment of competing game stations: hence gaming’s early and ready acceptance of sonic masking. As games moved into the home and became more technologically sophisticated, game sound changed to provide a fuller, more subtle soundscape, often to be delivered through headphones. With the emergence of MMORPGs, the popularity of game tournaments, expos, LAN parties and, most recently, Guitar Hero and Rock Band house party nights, gaming is once more returning to a social model of play where the sounds of the cultural context and setting are again significant and instrumental in forming that sense of acoustic community that unites designed game sound with the incidental (acoustic and electroacoustic) sound-making and sonic environment.
Figure 5. On the left we see a recording of an arcade ambience: a constant hum of competing, masking sounds, many of which are already distorted synthetic chiptunes (in the zoom-in section). On the right we have a Teamspeak-based recording of a World of Warcraft mission: the progression (upper section) clearly reflects more verbal excitement as the team finally defeats a difficult boss, culminating in celebratory exclamations.
CONCLUSION AND FUTURE DIRECTIONS
This chapter explores the notions of fidelity and verisimilitude as they manifest historically, both as global cultural conventions of media and technology and, more specifically, as design goals in the production of sound in games. By exploring these two perspectives of acoustic realism through the lens of the acoustic communication framework, with its focus on patterns of listening over time, acoustic communities and ecology, I hope to offer a model for future theorizing and exploration of game sound and a lens for in-depth analysis of particular game titles. It is also my hope that placing some much needed emphasis on listening, ecology, and the holistic acoustic setting of the gaming experience will benefit not only sound designers and game theorists but will also continue the trajectory of deepening inquiries into game studies as a rich and unique form of interactive media deserving of its own theoretical attentions. For example, before we go ahead and favour real-time audio synthesis and physical modeling for their realistic acoustic rendering (not an imminent event, I realize: science and programming still have some way to go), we need to generate precisely the type of historical and socio-cultural analysis of game sound touched on in this chapter. We need to understand the importance of all the elements of a game soundscape which, for better or for worse, have become important to audiences or, at the very least, to which we are now habituated. There is a crucial epistemological relationship there: through inter-textual cross-pollination and transference of practices and artefacts, we have internalized many of these arbitrary meanings, and a realistic physical modelling of a game soundscape might not mean much to us or even be conducive to gameplay. Designers, audio engineers and programmers need to know and think about these issues. Further, I believe the focus on listening positions in this chapter is a key to understanding not only some of the cultural practices surrounding gameplay; it can also tell us something about auditory perception that designers or scientists could potentially use. Listening to game sound is now every bit as everyday as everyday listening goes in our media- and technology-saturated environment, so games offer new opportunities to science, given that contextual listening has always evaded laboratory psychoacoustic studies. Clearly, however, my main concerns are with the opportunities for critical and media studies to engage with and treat game sound, and the phenomenon of listening to game sound, as another rich cultural artifact (a text, if you will) that can add to the layers of theory and critique surrounding media, art, and cultural expression.
While fidelity and verisimilitude are only two of the relevant heuristics in the analysis of game sound, it is my hope that the field of media studies will identify others and conduct the same kind of rigorous examination of their historical and cultural roots in order to elucidate their role and importance not only in game sound but in our culture-at-large today.
Finally, my sense of the future directions in the field of game sound is that, as the game industry matures and as playing computer games starts to lose some of its negative notoriety, there is naturally more and more societal and media attention on games as well as on game elements such as sound. With that, the increased popularity of gaming results in industry growth, expanding game genres, and expanding notions of what a game is, how it is played and how it is experienced. Sound plays a crucial role in experience and interactivity, and there has been increased design attention to it both from industry and from independent artists. With that comes a book like this one, and my prediction is that there will (hopefully) be more to come from scholars, critics, media theorists, sociologists, scientists, and designers who will now be better equipped to continue this in-depth conversation about game sound and listening in a way that preserves the complex ecology of people’s interactions with their (media/techno)-soundscapes while expanding the multidisciplinary nature of this maturing field. There has been a resurgence of public concern over noise and the urban soundscape in the context of environmentalism and sustainability, and it only takes one look at the history of game sound, inter-related with similar media forms and genres, to glean its influence on the way in which we listen to, make sense of and experience our physical offline soundscape. More work in this area is not only needed, but is, I am confident, bound to come.
REFERENCES
Altman, R. (1992). Sound theory sound practice. London: Routledge.
Assassin’s Creed 2. (2009). Ubisoft Montreal. Ubisoft.
Brandon, A. (2004). Audio for games: Planning, process, and production. Berkeley, CA: New Riders Games.
Castlevania. (1989). Konami Digital Entertainment.
Chion, M. (1994). Audio-vision, sound on screen. New York: Columbia University Press.
Chion, M. (1999). The voice in cinema. New York: Columbia University Press.
Chion, M. (2003). The silence of the loudspeaker or why with Dolby sound it is the film that listens to us. In Sider, L., Freeman, D., & Sider, J. (Eds.), Soundscape: The School of Sound lectures 1998-2001 (pp. 150–154). London: Wallflower Press.
Collins, K. (2007). An introduction to the participatory and non-linear aspects of video game audio. In Hawkins, S., & Richardson, J. (Eds.), Essays on sound and vision (pp. ##-##). Helsinki: Helsinki University Press.
Collins, K. (2008). Game audio: An introduction to the history, theory, and practice of video game music and sound design. Cambridge, MA: MIT Press.
Cook, P. (Ed.). (1999). Music, cognition, and computerized sound: An introduction to psychoacoustics. Cambridge, MA: MIT Press.
Cooking Mama. (2007). OfficeCreate. Majesco Publishing.
Deutch, S. (2003). Music for interactive moving pictures. In Sider, L., Freeman, D., & Sider, J. (Eds.), Soundscape: The School of Sound lectures 1998-2001 (pp. 28–34). London: Wallflower Press.
Ekman, I. (2005). Understanding sound effects in computer games. In Proceedings of the 6th Annual Digital Arts and Cultures Conference, 2005, Copenhagen, Denmark: IT University Press.
Electroplankton. (2006). Nintendo America. Nintendo.
Fallout 3. (2008). Bethesda Softworks. Bethesda Game Studios.
Farmer, D. (2009). The making of Torment audio. Retrieved July 9, 2009, from http://www.filmsound.org/game-audio/audio.html.
Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.
Figgis, M. (2003). Silence: The absence of sound. In Sider, L., Freeman, D., & Sider, J. (Eds.), Soundscape: The School of Sound lectures 1998-2001 (pp. 1–14). London: Wallflower Press.
Final Fantasy 2. (1988). Squaresoft. Square ENIX.
Friberg, J., & Gärdenfors, D. (2004). Audio games: New perspectives on game audio. In Proceedings of ACM SIGCHI International Conference on Advances in Computer Entertainment Technology (pp. 148-154).
Gaver, W. (1994). Using and creating auditory icons. In G. Kramer (Ed.), Auditory Display: Signification, Audification, and Auditory Interfaces (Santa Fe Institute Studies in the Sciences of Complexity, Vol. 18, pp. 417-446). Reading, MA: Addison-Wesley.
God of War 2. (2007). SCE Studios Santa Monica. Sony Computer Entertainment.
Grand Theft Auto: San Andreas. (2004). Rockstar North. Rockstar Games.
Grimshaw, M. (2008). The acoustic ecology of the first-person shooter: The player, sound and immersion in the first-person shooter computer game. Saarbrücken, Country: VDM.
Grimshaw, M., & Schott, G. (2007). Situating gaming as a sonic experience: The acoustic ecology of first-person shooters. In Proceedings of the Third Digital Games Research Association Conference (pp. 474-481).
Guitar Hero. (2006). Harmonix. RedOctane.
Harry Potter and the Chamber of Secrets. (2002). Eurocom. Electronic Arts.
Hitman. (2002). Io Interactive. Eidos Interactive.
Hug, D. (2011). New wine in new skins: Sketching the future of game sound design. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.
Huiberts, S., & van Tol, R. (2008). IEZA: A framework for game audio. Gamasutra. Retrieved April 4, 2009, from http://www.gamasutra.com/view/feature/3509/ieza_a_framework_for_game_audio.php?page=3.
Jørgensen, K. (2006). On the functional aspects of computer game audio. In Proceedings of the First International AudioMostly Conference (pp. 48-52).
Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.
Marks, A. (2009). The complete guide to game audio: For composers, musicians, sound designers, game developers (2nd ed.). Location: Elsevier Press.
McDonald, G. (2008). A brief timeline of video game music. Retrieved July 8, 2009, from http://www.gamespot.com/gamespot/features/video/vg_music/.
Mega Man. (1993). Capcom. Capcom Entertainment.
Metal Gear Solid. (1998). Konami Japan. Konami Computer Entertainment.
Metz, C. (1985). Aural objects. In Belton, E. W. J. (Ed.), Film sound (pp. ##-##). New York: Columbia University Press.
Murch, W. (1995). Sound design: The dancing shadow. In Boorman, J., Luddy, T., Thomson, D., & Donohue, W. (Eds.), Projections 4: Filmmakers on film-making (pp. 237–251). London: Faber and Faber.
Phillips, N. (2009). From films to games, from analog to digital: Two revolutions in multi-media! Retrieved July 8, 2009, from http://www.filmsound.org/game-audio/film_game_parallels.htm.
Planescape: Torment. (2005). Black Island Studios. Interplay.
Rock Band. (2008). Harmonix. MTV Games.
Roeber, N., Deutschmann, E. C., & Masuch, M. (2006). Authoring of 3D virtual auditory environments. In Proceedings of the First International AudioMostly Conference (pp. 15-21).
Schafer, R. M. (1977). The tuning of the world. Toronto: McClelland and Stewart.
Spyro the Dragon. (2008). Insomniac Games. Sony Computer Entertainment.
Stockburger, A. (2007). Listen to the iceberg: On the impact of sound in digital games. In von Borries, F., Walz, S. P., & Böttger, M. (Eds.), Space time play: Computer games, architecture and urbanism: The next level (pp. ##-##). Location: Birkhäuser Publishing.
Super Mario Bros. NES (1985). Nintendo. Nintendo.
Truax, B. (2001). Acoustic communication (2nd ed.). Location: Ablex Publishing.
Tuuri, K., Mustonen, M., & Pirhonen, A. (2007). Same sound—different meanings: A novel scheme for modes of listening. In Proceedings of the Second International AudioMostly Conference, 2007, 13-18.
Ware, C. (2004). Information visualization: Perception for design (2nd ed.). Location: Morgan Kaufmann Publishing.
Westerkamp, H. (1990). Listening and soundmaking: A study of music-as-environment. In Lander, D., & Lexier, M. (Eds.), Sound by artists (pp. ##-##). Location: Art Metropole & Walter Phillips Gallery.
Wilhelmsson, U., & Wallén, J. (2011). A combined model for the structuring of computer game audio. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.
World of Warcraft. (2004). Blizzard Entertainment. Blizzard.
Yoshi’s Island. (2007). Nintendo Japan. Nintendo.
Zelda: Phantom Hourglass. (2007). Nintendo. Nintendo.
KEY TERMS AND DEFINITIONS
Acoustic Community: A term emerging from Schafer and Truax’s work with the WSP (World Soundscape Project) in the 1970s, referring to stable sonic locales that include a set of sounds which clearly belong there and characterize a community. For example, the sounds of coin machines, yelling, and synthesized music all belong to an arcade acoustic community.
Acoustic Ecology: A movement started by R.M. Schafer and continued through the World Forum for Acoustic Ecology (WFAE), and also a term denoting the sonic balance in a given soundscape through its signal-to-noise ratio.
Chiptunes: A term now popularized by the arts community, referring to 8-bit synthesized melodies or single tones that, in early game development, were originally encoded directly onto a game’s electronic chip memory.
Diegesis: A term from film studies referring to what is in-the-frame of the screen as opposed to what isn’t. In game sound studies, Chion is credited with popularizing it to refer to sounds that are in or out of frame from the player’s perspective. It has also been used by others to refer to sounds that do or do not belong to the gameworld.
Fidelity: Literally means faithfulness; here, it refers to the audio quality of a sound reproduction relative to its original acoustic source.
Listening Positions: Developed as a term by B. Truax, it refers to types of listening attention that have become patterns over time with exposure to certain types of sound environments, habits, or media. For example, background listening is a passive form of listening attention that we all engage in at different times.
Loopy: An adapted term I am using here to denote the quality of game sound flow in many RPG games where short looped sounds from an effects bank are triggered each time an action is performed, thereby often sounding cut-off, too similar, or simply uniform.
Soundscape: A term coined by R.M. Schafer to describe the totality of sounds surrounding us at any given time/place: analogous to a landscape.
Verisimilitude: Literally means similar to reality; it is a theatrical term referring to the ability of an artwork to appear real, to foster a sense of realism in the audience. Here, it refers to the ability of game soundscapes to sound real.
Chapter 8
Perceived Quality in Game Audio
Ulrich Reiter
Norwegian University of Science and Technology, Norway
ABSTRACT
This chapter reviews game audio from a Quality of Experience point of view. It describes cross-modal interaction of auditory and visual stimuli, re-introduces the concept of plausibility, and discusses issues of interactivity and attention as the basis for the qualitative, high-level salience model being suggested here. The model is substantiated by experimental results indicating that interaction or task located in the audio domain clearly influences the perceived audio quality. Cross-modal influence, with interaction or task located in a different (for example, visual) domain, is possible, but is significantly harder to predict and evaluate.
INtrODUctION Perceived quality in game audio is not a question of audio quality alone. As audio is usually only a part in an overall game concept consisting of graphics, physics, artificial intelligence, user input, feedback and so forth, audio has been considered to play a relatively minor role in the overall experience that a game provides. Consequently, a lot of effort has been put into providing near photo-realistic representations of (virtual) game scenarios to the player, but only little into audio. Interestingly, DOI: 10.4018/978-1-61692-828-5.ch008
this assessment has had to be revised over the last years. Learning from other artistic fields like cinema, in which storytelling is a central means of providing “user experience”, game developers have come to know that audio can trigger emotions and provide additional information otherwise hard to convey. Today, although budgets are still limited compared to other aspects of game engineering, audio in games is given more attention by the game developers than ever before. But there is more to audio in games than just an emotional support for a story. Most games are user-centered and non-linear, as opposed to the linear story telling of traditional, non-interactive
content presentation. Therefore, the audio has to be manipulated in real-time depending on the player’s actions. Real-time processing of audio can become computationally very demanding and is a problem for complex game scenarios. This has introduced the concept of plausibility: the main goal in game audio is not to have an audio simulation as exact and close to reality as possible, but to render audio that is plausible in the game scenario, and that provides an overall quality impression that matches the other aspects of the game. One fact well known from home cinema applications is that an improved quality in video can also increase the subjectively perceived audio quality, and that the reverse effect also exists (Beerends & De Caluwe, 1999). It is therefore a most interesting question to see whether these effects can be exploited to increase the subjectively perceived overall quality of a game without actually increasing the computational load. Instead of just rendering more details (equivalent to a higher simulation depth), focusing on those details that are actually relevant in a certain context could provide a much higher Quality of Experience (QoE) (see Farnell, 2011 for a discussion of relevancy and redundancy in procedural audio design). The central question is, therefore, which stimuli in a game scenario are of most importance? Can information that is difficult and cost-intensive to convey in one modality be presented in another modality with less effort but similar perceptual impact? What role does interactivity play in the perception of quality? What are the technical parameters that can influence the perceived quality of a game, and which other factors exist that potentially dominate the perceptual process? This chapter aims at identifying and discussing general quality criteria in multimedia application systems with a focus on games. These criteria contain technical as well as human factors. 
In order to understand these factors, the first section touches upon the mechanisms of human perception: well-known facts about visual and auditory perception are summarized briefly. The second section presents a discussion of cross-modal influences, that is, interaction between auditory and visual stimuli in the perceptual apparatus, and cross-modality in general. A survey detailing the most accepted theories of how audiovisual (bimodal) perception is achieved in the human brain is given. This is far more complex than just adding the results of auditory and visual processing and is therefore worth an extended discussion. This is followed by examples of effects in bimodal perception (based on research in the fields of psychology and cognitive sciences) that can be relevant in the context of game audio. The third section discusses the concept of auditory and audio-visual plausibility. It briefly compares the requirements for exact (room) acoustic simulations versus real-time rendering and details the resulting constraints for computer games. The fourth section gives an overview of issues related to interactivity, such as latency, user input, and perceptual feedback. Interactivity is closely related to the generation of presence, defined as the “perceptual illusion of non-mediation”, or simply the feeling of “being there”. The concept of presence is discussed as an indirect measure for perceived quality. The fifth section elaborates on the concept of attention. The perception of multiple streams is discussed and an introduction to the general model of the Perceptual Cycle according to Neisser is given. From this, the concepts of selective attention and divided attention are discussed and capacity limits of the human perceptual system are explained. Finally, in the sixth section, the resulting factors (technical as well as human) are arranged to form a qualitative model describing human audio-visual perception based on saliency of stimuli. Such a model can serve as a basis for determining the QoE in games in general and specifically for game audio.
Experimental results documenting inner-modal versus cross-modal effects on perceived audio quality are summarized. Finally, a summary is given that reviews the most important concepts leading to the salience model presented in the preceding section. Further research potential is defined.
MECHANISMS OF HUMAN PERCEPTION
Vision (sight) and audition (hearing) are the most important human senses for playing games. In the real world, these senses provide us with information about the more remote surroundings, as opposed to taste (gustation), smell (olfaction), and touch (taction or pressure), which provide information about our immediate vicinity. Because vision and audition communicate spatial and temporal relations of objects, and because the technology necessary to stimulate them is readily available on computer systems used in the home, most games stimulate only these two senses.
Visual Perception
Vision mainly serves to indicate spatial correlation of objects, as the human visual system seldom responds to direct light stimulation. Rather, light is reflected by objects and thus transmits information about certain characteristics of the object. The direction of a visually perceived object corresponds directly to the position of its image on the retina, the place where the light receptors are located in the eye. At the same time, a visual stimulus occupies a position in perceptual space that is defined relative to a distance axis, as well as to the vertical and horizontal axes. In the determination of an object’s distance to the eye, there are a number of potential cues of depth. These include monocular mechanisms like interposition, size, and linear perspective as well as binocular cues like convergence and disparity. All of these are usually evaluated jointly, allowing us to solve even ambiguous situations with contradicting sensory information. All these depth cues can be exploited even when the environment is at rest. As soon as motion (of objects or of the head) is present, motion parallax takes on an important role in depth perception. Motion parallax describes the fact that the image of an object far away from the viewer moves more slowly across the retina than the image of an object at a close distance. Motion parallax also provides cues in the monocular case.
Auditory Perception
Auditory stimuli are perceived to be localized in space. The sound is not heard within the ear, but it is phenomenally positioned at the source of the sound. In order to localize a sound, the auditory system relies on binaural and monaural acoustic cues. Directional hearing in the horizontal plane (azimuth) is dominated by two mechanisms which exploit binaural time differences and binaural intensity differences. For sinusoidal signals, interaural time differences (ITDs, the same stimulus arriving at different times at the left and the right ear) can be interpreted by the human hearing system as directional cues from around 80 Hz up to a maximum frequency of around 1500 Hz. This maximum frequency corresponds to a wavelength of roughly the distance between the two ears. For higher frequencies, more than one wavelength fits between the two ears, making the comparison of phase information between left and right ear equivocal (Braasch, 2005). For signals with frequencies above 1500 Hz, interaural level differences (ILDs) between the two ears are the primary cues (Blauert, 2001). Regardless of the source position, ILDs are small at low frequencies. This is because the dimensions of head and pinnae (the outer ear visible on the side of the head) are small in comparison to the wavelengths at frequencies below about 1500 Hz. Therefore they do not represent any noteworthy obstacle for the propagation of sound.
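The frequency limits quoted above can be checked with a short calculation. The following is a minimal sketch that is not part of the original chapter: it assumes a speed of sound of 343 m/s, a spherical head with a radius of 8.75 cm, and Woodworth's spherical-head approximation for the ITD of a distant source. The constants and function names are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed value for air at roughly 20 degrees C
HEAD_RADIUS = 0.0875     # m, assumed average head radius (illustrative)


def wavelength(frequency_hz, c=SPEED_OF_SOUND):
    """Wavelength in metres of a sinusoid at the given frequency."""
    return c / frequency_hz


def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Interaural time difference in seconds for a distant source.

    Woodworth's spherical-head approximation:
        ITD = (r / c) * (theta + sin(theta))
    for azimuths between 0 (straight ahead) and 90 degrees (to the side).
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    # Wavelengths at the frequencies mentioned in the text.
    for f in (80, 1500, 6000):
        print(f"{f:5d} Hz -> wavelength {wavelength(f):.3f} m")
    # ITD grows from zero (front) to its maximum (source at the side).
    for az in (0, 30, 60, 90):
        print(f"azimuth {az:2d} deg -> ITD {woodworth_itd(az) * 1e6:.0f} microseconds")
```

Under these assumptions the wavelength at 1500 Hz comes out at about 0.23 m, which is of the same order as the dimensions of the head, in line with the limit described above.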
Directional hearing in the vertical plane (elevation) is dominated by monaural cues. These stem from direction-dependent spectral filtering caused by reflection and diffraction at the torso, head, and pinnae. Each direction of incidence (for instance, defined in terms of azimuth and elevation) is related to a unique spectral filtering for each individual. This spectral filtering can be described by head-related transfer functions (HRTFs). In addition to providing localization of sounds in the vertical plane, these spectral cues are also essential for resolving front-back confusions (Blauert, 2001). Pulkki (2001) reports that, for elevation perception, frequencies around 6kHz are especially important. In everyday situations, localization of sound sources seldom relies on auditory cues alone. Knowledge of the potential source of a sound (for example, airplane noises from above, or crunching shoes from below) aids in the localization process. Visual cues heavily influence the localization of sound sources.
CROSS-MODAL INTERACTION BETWEEN AUDIO AND VIDEO
Human perception in real world situations is a multi-modal, recursive process. Stimuli from different modalities usually complement each other and make the perceptual process more unequivocal. Only those stimuli that can actually be perceived by the primary receptors of sound, light, pressure and so on contribute to an overall impression (which is the result of any perceptual process). The human perceptual process, because of its complexity, cannot easily be explained in a simple block diagram without neglecting important features. A number of descriptive models exist, but these only cover certain aspects of the process, depending on the level of abstraction at which the respective model is located. Relatively little is known about the mechanisms of multi-modal processing in the human brain.
The main questions with respect to audio-visual perception are: At what level of perceptual processing do cross-modal interactions occur? And what mechanism underlies them?
Joint Processing of Audio-Visual Stimuli
As early as 1909, Brodmann suggested a division of the cerebral cortex into 52 distinct regions, based on their histological characteristics (Brodmann, 1909). These areas, today called Brodmann areas, were later associated with nervous functions. The most important areas in the audio-visual game context are Primary Visual Cortex (V1), Visual Association Cortex (V2 and V3), as well as Primary Auditory Cortex and anterior and posterior transverse temporal areas (H). This division suggests that the different modalities are related to separate regions of the brain, and that processing of stimuli is performed separately for each modality. Taking a closer look at the brain reveals that the neurons of the neocortex are arranged in six horizontal layers, parallel to the surface. The functional units of cortical activity are organized in groups of neurons. These are connected by four types of fibers, of which the association fibers are especially interesting when looking at information exchange between cortical areas. Short association fibers (called loops) connect adjacent gyri, whereas long association fibers form bundles to connect more distant gyri in the same hemisphere. These association bundles give fibers to and receive fibers from the overlying gyri along their routes. They occupy most of the space underneath the cortex. There are many such connections between different functional areas of the neocortex such that information can be exchanged between them and true multi-modal processing can be achieved. Goldstein (2002) gives an example of a red, rolling ball entering our field of view. Locally distinct neurons are then activated by either motion, shape, or color. Subsequently, dorsal and
ventral streams are also activated. Although the involved neurons are locally distinct, we perceive one singular object, not a separate rolling motion, red color, or round shape. It is still unclear how the processing of multiple characteristics of a single object is organized. A number of theories have been suggested to explain this binding problem, and the exploration of binding in the visual system has become a heavily discussed topic. According to Goldstein (2002), the most prominent theory, suggested by Singer, Engel, Kreiter, Munk, Neuenschwander, and Roelfsema (1997), assumes that visual objects are represented by groups of neurons. These so-called cell-assemblies are activated jointly, producing an oscillatory response. This way, neurons belonging to the same cell-assembly can synchronize. Whenever the reaction to stimuli is synchronized, this means that the respective cortical areas are processing data coming from one single object or context. Yet, this binding-by-synchrony theory has left doubts with respect to the interpretation and processing of the synchrony code. For example, Klein, König, and Körding (2003) postulate that “many properties of the mammalian visual system can be explained as leading to optimally sparse neural responses in response to pictures of natural scenes” (p. 659). According to Goldstein (2002), many others argue that binding can be explained by (selective) attention. Attention is discussed below.
Dominance of Single Modalities
Very often the dominance of visual stimuli over other modalities is accepted naturally as a given. In fact, looking at our everyday experiences we might be inclined to accept this posit without further discussion: because “seeing is believing”, we often think that we tend to trust our eyes more than the other senses. Yet, this assessment is often due to the fact that in the real world we seldom have to face contradictions in the multi-modal stimuli perceived by our senses. There is actually no need to consciously further evaluate the different percepts in terms of relevance, because they usually complement (and do not contradict) each other. In order to actually verify any naturally given order of significance of the perceived stimuli, it is necessary to present the human perceptual system with contradictory sensory information and see which modality generally dominates, if any. There have been a number of scientific efforts to explain in a perceptual relevance model how the human perceptual system weighs the different contradicting percepts. Two such models have been proposed to describe how perceptual judgments are made when signals from different modalities are conflicting. One of these models suggests that the signal that is typically most reliable dominates the competition completely in a winner-takes-all fashion: the judgment is based exclusively on the dominant signal. In the context of spatial localization based on visual and auditory cues, this model is called visual capture because localization judgments are made based on visual information only. The other model suggests that perceptual judgments are based on a mixture of information originating from multiple modalities. This can be described as an optimal model of sensory integration which has been derived based on the maximum-likelihood estimation (MLE) theory. This model assumes that the percepts in the different modalities are statistically independent and that the estimate of a property under examination by a human observer has a normal distribution. In engineering literature, the MLE model is also known as the Kalman Filter (Kalman & Bucy, 1961). Battaglia, Jacobs, and Aslin (2003) report that several investigators have examined whether human adults actually combine information from multiple sensory sources in a statistically optimal manner (that is, according to the MLE model). They explain:
“According to this model, a sensory source is reliable if the distribution of inferences based on that source has a relatively small variance; otherwise the source is regarded as unreliable. More-reliable sources are assigned a larger weight in a linear-cue-combination rule, and less reliable sources are assigned a smaller weight.” (Battaglia et al., 2003, p. 1391)
Looking at it this way, visual capture is just a special case of the MLE model: the highly reliable percept (the visual cue) is assigned a weight of one, whereas the less reliable percept (the auditory cue) is assigned a weight of zero. Battaglia et al. (2003) describe an experiment designed to answer the question whether human observers localize events presented simultaneously in the auditory and visual domain in a way that is best predicted by the visual capture model or by the MLE model. Their report suggests that both models are partially correct and that a hybrid model may provide the best account of their subjects’ performances. As greater amounts of noise were added to the visual signal, subjects used more and more information perceived via the auditory channel, as suggested by the MLE model. Yet most notably, according to their analysis, test subjects seemed to be biased towards using visual information to a greater extent than originally predicted by the MLE model. This means that the model used in the experiments committed a systematic error by consistently underestimating the test subjects’ use of visual information (thus overestimating the use of auditory information).
Shams, Kamitani, and Shimojo (2000, 2002) describe experiments in which a visual illusion was induced by sound, resulting in the auditory cue outweighing the visual cue. They presented test subjects with flashes of light and beeps of sound: whenever a single flash of light was accompanied by multiple auditory beeps, the single flash was perceived as multiple flashes.
They conclude that this alteration of the visual percept is caused by cross-modal perceptual interactions, rather than
having cognitive, attentional, or other origins. This is especially interesting as there was no degradation in the quality of the visual percept offered, which otherwise inevitably provokes the human perceptual system to rely on other modalities. To sum up, the combined results of these experiments suggest that there is no clear, generalized bias of humans toward any of the available modalities in terms of dominance. Apparently, there is no such thing as a general dominance of visual percepts over other stimuli. Instead, whenever such a bias toward any of the available modalities exists, it seems to be highly dependent on the context. Whereas Battaglia et al. (2003) tested subjects with contradicting localization cues and found a bias toward the visual percept, Shams et al. (2000) tested subjects with temporal variations of cues and found a bias toward the auditory percept. This indicates that the human perceptual system tends to prefer those senses (give a higher weight to those percepts) that promise a higher degree of reliability or resolution for the presented perceptual problem. Whereas the horizontal resolution of the human auditory system is roughly 2 to 3 degrees for sinusoidal signals coming from a forward direction (Zwicker & Fastl, 1999), the resolution of the visual system is at least 100 times as high, about 1 minute of arc (Howard, 1982). On the other hand, the auditory system can resolve the temporal structure of sounds as close together as 2 ms (Zwicker & Fastl, 1999), whereas the human visual system can be tricked into believing in a continuously moving object when presented with only 24 sampled pictures of the continuous movement per second.
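The linear cue-combination rule of the MLE model discussed in this section can be illustrated with a few lines of code. This is a minimal sketch under the model's own assumptions (statistically independent, normally distributed cue estimates), in which each cue is weighted by its reliability, taken here as the inverse of its variance. The function name `mle_combine` and the numeric values are hypothetical and are not taken from Battaglia et al. (2003).

```python
def mle_combine(estimates, variances):
    """Combine independent Gaussian cue estimates by inverse-variance weighting.

    Returns a tuple (combined_estimate, combined_variance, weights).
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need exactly one variance per estimate")
    reliabilities = [1.0 / v for v in variances]   # reliability = 1 / variance
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]   # weights sum to one
    combined = sum(w * x for w, x in zip(weights, estimates))
    return combined, 1.0 / total, weights


if __name__ == "__main__":
    # Hypothetical conflicting localization cues (degrees of azimuth):
    # vision places the event at 0, audition places it at 10.
    visual_deg, auditory_deg = 0.0, 10.0
    auditory_var = 4.0
    for visual_var in (1.0, 4.0, 16.0):   # increasing noise on the visual cue
        est, var, w = mle_combine([visual_deg, auditory_deg],
                                  [visual_var, auditory_var])
        print(f"visual variance {visual_var:4.1f}: estimate {est:5.2f} deg, "
              f"weights visual={w[0]:.2f} auditory={w[1]:.2f}")
```

As the variance of the visual cue grows, its weight shrinks and the combined estimate moves toward the auditory cue, which mirrors the trend reported when noise was added to the visual signal. Visual capture corresponds to the limiting case in which the visual variance is very small compared with the auditory variance.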
AUDITORY AND AUDIOVISUAL PLAUSIBILITY
In classic room acoustic simulation, the time necessary to render the room audible (in other words, to perform the room acoustic simulation
itself) is often considered secondary. Instead, the (acoustic) similarity between the simulation and the real situation is considered most important. In games, this situation is reversed: the available computational power is critical, and rendering has to be performed in real-time. Therefore, the concept of plausibility is applied: as long as there is no obvious contradiction between the visual and the acoustic representation of a virtual scene, the human senses merge auditory and visual impressions. Hence, it is usually possible to replace a cost-intensive geometry-based room acoustic simulation with a generic reverberation algorithm, for example, with combinations of all-pass filters and delays according to Schroeder (1962, 1970), with nested all-pass filters according to Gardner (1992), or with feedback delay line structures according to Jot and Chaigne (1991). This way, the auditory part of the presentation provides a rough sketch of the room’s characteristics, whereas the visual part complements the overall impression with an increased level of detail. As long as the information provided in the two modalities is not contradictory, there is a high chance that the player’s perceptual apparatus merges the stimuli and blends them to form a single, multi-modal representation of the scene. In general, it is debatable whether a “perfect” reproduction of the properties of a real life experience will ever be possible in a computer game at all (with the assumption that a simulation is good enough as long as there is no perceptual difference to reality detectable by the human senses in the given situation). A weaker form of this requirement applies to scenes which have no counterpart in reality: their appearance needs to be plausible in every aspect and also in the sense of perfect agreement between the cues offered by the system in the different perceptual domains. In the context of games, this requirement can be further reduced.
Because the visual representation of the scene is limited to a region in the frontal area and is not supposed to fill the field of view entirely, it suffices to require that the one part of
the virtual scene that is displayed (audio-visually) is perceived as plausible. It is thus accepted that stimuli coming from the surrounding real world (which cannot be entirely excluded in a typical computer game playing environment) might interfere with those from the virtual scene. Furthermore, the time and investment necessary to develop completely accurate auditory and visual models is as much of a limiting factor for how much detail will be rendered, as is the computational power alone. It is therefore reasonable to focus only on the most important stimuli and leave out those that would go unnoticed in a real world situation. In order to do so, it is necessary to predict what the most important stimuli or objects in the overall audio-visual percept are.
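To make the idea of a generic reverberation algorithm more concrete, the following is a minimal sketch of a reverberator built from parallel feedback comb filters followed by all-pass filters in series, in the spirit of the Schroeder structures mentioned above. It is an illustration only: the delay lengths and gains are arbitrary assumptions rather than values from Schroeder's papers, the function names are hypothetical, and a real-time implementation would use circular buffers instead of whole-signal lists.

```python
def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n - delay] + g * y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(delay, len(x)):
        y[n] = x[n - delay] + g * y[n - delay]
    return y


def allpass(x, delay, g):
    """All-pass filter: y[n] = -g * x[n] + x[n - delay] + g * y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        x_delayed = x[n - delay] if n >= delay else 0.0
        y_delayed = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + x_delayed + g * y_delayed
    return y


def schroeder_reverb(x, comb_delays=(1557, 1617, 1491, 1422), comb_gain=0.84,
                     allpass_delays=(225, 556), allpass_gain=0.7):
    """Parallel combs (averaged) followed by all-pass filters in series.

    Delays are in samples; all default values are illustrative assumptions.
    """
    wet = [0.0] * len(x)
    for d in comb_delays:
        wet = [a + b for a, b in zip(wet, comb(x, d, comb_gain))]
    wet = [v / len(comb_delays) for v in wet]
    for d in allpass_delays:
        wet = allpass(wet, d, allpass_gain)
    return wet


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 9999            # unit impulse as test signal
    response = schroeder_reverb(impulse)
    energy = sum(v * v for v in response)
    print(f"impulse response length: {len(response)}, energy: {energy:.4f}")
```

The cost of such a structure is a handful of multiplications and additions per sample and does not depend on the geometry of the virtual room, which is what makes it attractive when plausibility rather than physical accuracy is the goal.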
INTERACTIVITY ISSUES AND PRESENCE
The concept of interactivity has been defined by Lee, Jin, Park, and Kang (2005) and Lee, Jeong, Park, and Ryu (2007) based on three major viewpoints: technology-oriented, communication-setting-oriented, and individual-oriented views. Here, the technology-oriented view of interactivity is adopted, which “defines interactivity as a characteristic of new technologies that makes an individual’s participation in a communication setting possible and efficient” (Lee et al., 2007). Steuer (1992) holds that interactivity is a stimulus-driven variable which is determined by the technological structure of the medium. According to Steuer, interactivity is “the extent to which users can participate in modifying the form and content of a mediated environment in real time” (p. 14), in other words, the degree to which users can influence a target environment. He identifies three factors that contribute to interactivity:
• speed (the rate at which input can be assimilated into the mediated environment)
• range (the number of possibilities for action at any given time)
• mapping (the ability of a system to map its controls to changes in the mediated environment in a natural and predictable manner).
These factors are related to technological constraints that come into play when an application is supposed to provide interactivity to the user, as is the case for computer games. These technological constraints are briefly discussed in the following subsections.
Latency
Latency is one of the main concerns in computer games. Latency in the context of interactivity can be defined as the time that elapses between a user input and the apparent reaction of the system to that input. It is closely related to Steuer’s speed factor. Latencies are introduced by individual components of the system. These components may include input devices, signal processing algorithms, device drivers, communication lines and so on. Although these components may interact in more than one way on a game platform, a system’s end-to-end latency should not vary over time, so that it remains predictable. Meehan, Razzaque, Whitton, and Brooks (2003) report a study in which they tested the perceived sense of presence (see below) for two different end-to-end latencies in a Virtual Environment (VE). The low latency was 50 ms, the high latency was 90 ms. Test subjects were presented with a relaxing environment that was switched to a threatening one and their response was observed. Meehan et al. report that subjects in the low-latency group had a higher self-reported sense of presence and a statistically higher change in heart rate between presentations of the two situations. MacKenzie and Ware (1993) conducted the first quantitative experiments with respect to effects of visual latency. Participants completed a
Fitts’ Law target acquisition task in which they had to move the mouse from a starting point to a target, with a latency of between 25 ms and 225 ms from moving the mouse to actually seeing the cursor move on the screen. The authors report that the threshold at which latency started to affect the performance was approximately 75 ms. This effect was also dependent on the difficulty of the task: the harder the task, the greater was the adverse effect caused by increased latency. Wenzel (1998, 1999, 2001) has published a number of reports about the impact of system latency on dynamic performance in virtual acoustic environments with a focus on localization of sound sources. The bottom line is that, depending on the source velocity of the audio signal itself, localization of sound sources might be impaired when total system latency (end-to-end latency) is higher than around 60 ms for audio-only presentations (Wenzel, 1998). On the other hand, error rates in an active localization task, tested on an HRTF-based reproduction system, were comparable for both low and very high latencies, suggesting that subjects were largely able to ignore latency altogether (Wenzel, 2001). Nordahl (2005) examined the impact of self-induced footstep sounds on the perception of presence and latency. Interestingly, for audio-visual feedback in a VE, the maximum sound delay that was possible without latency being perceived as such was around 50% higher than for the audio-only feedback case (mean values of 60.9 ms against 41.7 ms). Nordahl explains this as attention being focused mainly on the visual, rather than the auditory feedback in the audio-visual case. Looking at these experimental results, it is difficult to draw a general conclusion on the maximum allowed latency for computer games. Apparently, the perception of latency as such depends on the system setup itself (screen, loudspeakers/headphones, for example), on the task, and on the content that is displayed.
At the same time, measuring total system latency correctly is not a trivial task. Therefore, a general recommendation
would be to keep latency as low as possible within any such system, that is, preferably below 50ms.
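Because end-to-end latency is the sum of the latencies introduced by the individual components, a simple budget calculation can show how quickly a target such as 50 ms is used up. The sketch below is an assumption-laden illustration: the component names and their values are hypothetical and would have to be measured on a real system. The only formula used is that an audio buffer of N frames at a sample rate of fs adds N / fs seconds of delay.

```python
def buffer_latency_ms(buffer_frames, sample_rate_hz):
    """Delay in milliseconds contributed by one audio buffer of the given size."""
    return 1000.0 * buffer_frames / sample_rate_hz


def end_to_end_latency_ms(components_ms):
    """Total latency as the sum of per-component latencies (all in ms)."""
    return sum(components_ms.values())


if __name__ == "__main__":
    budget_ms = 50.0   # target taken from the recommendation in the text
    # Hypothetical component latencies; real values must be measured.
    components = {
        "input device polling": 8.0,
        "game logic (one frame at 60 fps)": 1000.0 / 60.0,
        "audio processing buffer (512 frames, 48 kHz)": buffer_latency_ms(512, 48000),
        "output driver buffer (512 frames, 48 kHz)": buffer_latency_ms(512, 48000),
    }
    for name, value in components.items():
        print(f"{name:46s} {value:6.2f} ms")
    total = end_to_end_latency_ms(components)
    verdict = "within" if total <= budget_ms else "over"
    print(f"total {total:.2f} ms, {verdict} the {budget_ms:.0f} ms budget")
```

A calculation of this kind only gives a nominal figure. As noted above, measuring the actual total system latency is not trivial, so such a budget is best treated as a planning aid rather than a substitute for measurement.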
Input and Perceptual Feedback
Perceptual feedback is the response that a system provides to the player’s input. In games, perceptual feedback is usually provided in the auditory and visual domains. Input provided by the player can, in the general case, consist of any kind of signal accepted by the system for controlling it: speech, gesture, haptic control, eye tracking and so forth. Input and perceptual feedback are related to Steuer’s (1992) mapping factor, whereas his range factor is related to the kind of interaction that is offered by the game. This depends strongly on the goal of the application or game itself. In a first-person shooter, players might expect a different range of interaction than in a business simulation game. Hence, both input and perceptual feedback define the degree of interactivity a game player can experience.
Presence
Closely related to interactivity is presence. Larsson, Västfjäll, and Kleiner (2003) define presence in interactive audio-visual application systems or VEs “as the feeling of ‘being there’” (p. 98), and as the element that generates involvement of the user. Lombard and Ditton (1997) define presence in a broader sense as the “perceptual illusion of nonmediation” (p. 24). According to Steuer (1992), the level of interactivity (degree to which users can influence the target environment) has been found to be one of the key factors for the degree of involvement of a user. Steuer has found vividness (ability to technologically display sensory-rich environments) to be the second fundamental component of presence. Along the same lines, Sheridan (1994) assumes the quality and extent of sensory information that is fed back to the user, as well as exploration and manipulation capabilities, to be crucial for the
subjective feeling of presence. Other factors have been found to be determinants for presence but these depend on the theoretical concept applied by the researcher. Ellis (1996) points out that presence may not necessarily be the ultimate goal of every interactive audio-visual application system. He holds that successful task accomplishment can be far more important than presence, especially in situations “where the medium itself is not the message” (p. 253). This is easily accepted for player-game interaction, but is also applicable to communication between players in a multi-player game environment, when players have to team up to achieve a certain goal.
ATTENTION
When confronted with an increased number of stimuli, the human perceptual apparatus will try to keep up with the processing required for the input on offer. Generally, this can be achieved using different strategies. According to Pashler (1999), all of them are usually referred to as attention. Many human activities require that information from a multitude of sources is taken in. When we attempt to monitor one stream of information, we pay attention to the source. Usually, natural scenes are multi-modal, thus providing information in more than one modality. Also, natural scenes usually provide more than one informational stream. The question is then, how is attention distributed if a multitude of information is presented in more than one stream? What role does multi-modality of the information play in computer games?
Perception of Multiple Streams
Eijkman and Vendrik (1965) conducted one of the earliest studies on the perception of bimodal stimuli. They asked test subjects to detect increments in the intensity of light and tones. The stimuli lasted one second and were presented
either separately or simultaneously. Subjects detected the increments in one modality without interference from simultaneously monitoring the other modality, and performance of detection was comparable to that of only monitoring one modality. Other studies, for example, by Shiffrin and Grantham (1974) and by Gescheider, Sager, and Ruffolo (1975), also support these results for presentations of short bimodal stimuli. As the stimuli presented in the auditory and the visual modalities were not contextually related in the study of Eijkman and Vendrik (1965), they constituted what could be called separate perceptual streams. Yet, detection of increments in the duration of the same stimuli showed marked interference. This suggests that temporal judgments might be processed by the same processing system (the same cortical areas), a theory that is further supported by the findings of Shams et al. (2000, 2002) already discussed in the subsection on the dominance of single modalities. Interestingly, other studies combining auditory and visual discrimination tasks showed modest but noticeable decrements in terms of performance. This was observed when test subjects were confronted with bimodal stimuli in comparison to unimodal ones. To give an example, Tulving
and Lindsay (1967) presented test subjects with tones and patches of light. Subjects were asked to judge the intensity of either tone or light, and results were compared to the bimodal judgment of intensity of both stimuli. All of these studies characteristically involve magnitude judgments rather than categorical judgments. Therefore, the performance of test subjects in the bimodal case might have been limited by the difficulty of maintaining a standard in memory against which to judge the inputs, rather than by the influence of a second modality itself.
The Perceptual Cycle
Neisser’s model of the Perceptual Cycle describes perception as an arrangement of schemata, perceptual exploration, and stimulus environment (Farris, 2003). These elements influence each other in a continuously updated circular process (see Figure 1). Thus, Neisser’s model describes at a very abstract level how the perception of the environment is influenced by background knowledge, which in turn is updated by the perceived stimuli.
Figure 1. The Perceptual Cycle after Neisser. (Adapted from Farris, 2003)
In Neisser’s model, schemata represent an individual’s knowledge about the environment. Schemata are based on previous experiences and are located in long-term memory. Neisser attributes to them the generation of certain expectations and emotions that steer our attention in the further exploration of our environment. The exploratory process consists, according to Neisser, of the transfer of sensory information (the stimulus) into short-term memory. In the exploratory process, the entirety of stimuli (the stimulus environment) is compared to the schemata already known. Recognized stimuli are given a meaning, whereas unrecognized stimuli modify the schemata, which then in turn direct the exploratory process further (Goldstein, 2002; Farris, 2003).
Returning to the area of games, the differences in schemata between individuals cause the same stimulus to provoke different reactions in different game players. Following Neisser’s model, new experiences (those that cause a modification of existing schemata) are especially likely to generate a higher processing load. Schemata therefore also control the attention that we pay to stimuli. The exploratory process is directed in the same way for multi-modal stimuli as for unimodal stimuli.
Selective Attention
A vast number of studies have tried to identify and describe the strategies that are actually used in the human perceptual process. Pashler (1999) gives an overview and identifies two main concepts of attention: attention based on exclusion (gating) and attention based on capacity (resource) allocation. The first concept defines attention as the mechanism that reduces processing of irrelevant stimuli. It can be regarded as a filtering device that keeps stimuli out of the perceptual machinery that performs the recognition. Attention is therefore identified with a purely exclusionary mechanism. The second concept construes the limited processing resource (rather than the filtering device) as attention. It suggests that when attention is given to an object, it is perceptually analyzed. When attention is allocated to several objects, they are processed in parallel until the capacity limits are exceeded. In that case, processing becomes less efficient or eventually impossible.
Neither of the two concepts can be ruled out by the many investigations performed so far. Instead, all empirical results can be explained in some way under either the gating or the resource interpretation. As a result, it must be concluded that both capacity limits and perceptual gating characterize human perceptual processing. This combined concept is termed controlled parallel processing (CPP). It claims that parallel processing of different objects is achievable but optional. At the same time, selective processing of a single object is possible, largely preventing other stimuli from undergoing full perceptual analysis.
In fact, further conceptualizing attention might not even be possible until we understand in detail the neural circuitry and operations that underlie these processes. In the context of bimodal perception, the more interesting question is whether there are separate perceptual attention systems associated with different sensory modalities or whether a unified multi-modal attention system exists. Are visual and auditory attention the same thing? According to Pashler (1999), investigations have shown that humans are capable of selecting visual stimuli in one location in space and auditory stimuli in another.
Spence, Nicholls, and Driver (2001) examined the effect of expecting a stimulus in a certain modality upon human performance. They measured the reaction time to a stimulus located in the auditory, visual, or tactile modality under different frequencies of occurrence (an equal number of targets in all modalities versus a 75% majority of targets located in one modality). Spence et al. report that reaction times for targets in the unexpected modalities were slower than for the expected modality or with no expectancy at all. They further state that shifting attention away from the tactile modality took longer than shifting away from the auditory or visual modality. These results show that performance depends not only on what actually happens, but also on what is anticipated by a game player. Yet it must also be noted that in this study a faster response time for the most likely modality was always related to priming from an event in the same modality on the previous trial, and not to the expectancy as such.
Alais and Blake (1999) found evidence that attention focused on a visual object markedly amplifies neural activity produced by features of the attended object. They applied single-cell and neuroimaging studies and reinforce the view that visual attention modulates neural activity in several areas within the visual cortex. They state that “attentional modulation seems to involve a boost in the gain of responses of cells to their preferred stimuli, not a sharpening of their stimulus selectivity” (p. 1015). These findings clearly indicate that the perceptual process is actually controlled by attention. They cannot fully answer the question of whether there is one multi-modal attention system or whether separate attention systems are associated with individual modalities. However, there are indicators that favor the latter.
Divided Attention and Perceptual Capacity Limits
One of these indicators is that capacity limits appear to be more severe when multiple stimuli are presented in the same modality than when they are spread across multiple modalities (Pashler, 1999; Reiter, Weitzel, & Cao, 2007; Reiter & Weitzel, 2007; Reiter, 2009). This means that capacity limits may be reached earlier and more frequently if the main task and the so-called distractors (stimuli that are not directly related to the task or the direct focus of attention) are located in the same modality.
In an overview article, Lavie (2001) examines the capacity limits in selective attention. Lavie reasserts what evidence from several studies suggests: that selective attention as discussed in the previous section can result either in selective perception (concept of gating or early selection) or in selective behavior (resource allocation or late selection). Most importantly, she argues that the choice of mechanism actually applied depends on the perceptual load. At low perceptual load, irrelevant information continues to be processed: early selection fails and late selection becomes necessary. When the perceptual load is high, irrelevant information is not processed and resource allocation is no longer needed. She cites a number of experimental studies that support these conclusions: processing of distractors ceases when the perceptual capacity is exhausted.
Interestingly, Lavie claims that distractor processing depends on perceptual capacity limits, rather than on limited information contained in the relevant stimuli. This makes the MLE model of secondary importance: in the MLE model, limited information contained in the relevant stimuli should entail the processing of additional cues among the distractors to check the reliability of that limited information and the correctness of its interpretation. Following Lavie, this is either not possible when the perceptual load is high, or attention needs to be shifted to formerly irrelevant information.
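The load-dependent behaviour described above can be illustrated with a deliberately crude toy model. The sketch assumes a single scalar capacity and a scalar task load, neither of which appears in Lavie (2001); the function name `distractor_processing` and all numbers are hypothetical. It only shows the qualitative pattern: capacity left unused by the relevant task spills over to distractors, and nothing is left once the load reaches the capacity limit.

```python
def distractor_processing(capacity: float, task_load: float) -> float:
    """Return the share of capacity that spills over to distractors.

    Toy assumption (not taken from Lavie, 2001): capacity and load are
    scalars on the same arbitrary scale, and spare capacity is spent
    involuntarily on task-irrelevant stimuli.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    spare = max(0.0, capacity - task_load)
    return spare / capacity


# Hypothetical values on an arbitrary scale.
low_load = distractor_processing(capacity=1.0, task_load=0.25)
high_load = distractor_processing(capacity=1.0, task_load=1.5)
print(low_load, high_load)
```

In this reading, the same-modality case discussed at the start of this subsection would correspond to task and distractors drawing on one shared capacity value, which is again an interpretive assumption rather than a result from the cited studies.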
PERCEPTUAL SALIENCE AND SALIENCE MODEL
Landragin, Bellalem, and Romary (2001) suggest that in the absence of information about the history of an interactive process, a (visual) object can be considered salient when it attracts the user’s visual attention more than the other objects. This definition of salience, originally valid for the visual domain, can easily be extended to what might be called multi-modal salience, meaning that:
• certain properties of an object attract the user’s general attention more than the other properties of that object
• certain objects attract the user’s attention more than other objects in that scene.
A salience model in the game context requires a user model of perception, as well as a task model. The user model describes the familiarity of the game player with the objects’ properties, as the attention paid to the properties of an object may vary with the background and experience of the player. Whereas an avatar of a human being or a human speech utterance can be considered more or less equally salient to all players (because its significance to humans is embedded genetically), an acoustically trained person might focus more on the reverberation in a virtual room than a visually oriented person. The task model describes the fact that salience depends on intentionality: depending on the task a player is given, the player’s focus will shift accordingly.
Salience also depends on the physical characteristics of the objects themselves. In the auditory domain it is known that certain noises with increased measures of properties like sharpness or roughness attract attention more than others (Zwicker & Fastl, 1999). In addition, salience can be due to the spatial or temporal disposition of the objects in a scene.
One of the most interesting aspects of a salience model in the context of computer games is its dependency on the degree of interactivity that the game offers to the player. If the player is allowed to interact freely with the objects in a virtual scene, then it is quite easy to determine the player’s focus: it will be on the object currently being manipulated, so there is a clear indication of where to create a higher agreement of modalities. Consequently, games with fewer interaction possibilities are less likely to provide a sense of being there to the player. Thus, interactivity is important for the perceived realism of games in two different ways: first, it allows the player to do something in the virtual world, and second, it allows the application to determine the player’s momentary focus. This information can then be used to enhance the audio-visual appearance of the object in focus, for instance, by making the sound (effects) related to that object more realistic in terms of acoustic details, frequency range, localization, and so on.
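As an illustration of how an application might use the player’s momentary focus, the following minimal sketch assigns a higher-detail audio preset to the object being manipulated. Everything in it is hypothetical: the `AudioDetail` fields, the two presets and their numbers, and the function `allocate_audio_detail` are assumptions made for this example and are not taken from any particular game engine or from the studies cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioDetail:
    """Hypothetical rendering settings for one sound-emitting object."""
    bandwidth_hz: int        # upper limit of the reproduced frequency range
    reverb_reflections: int  # number of early reflections simulated
    spatialized: bool        # full 3D localization vs. simple panning


# Illustrative presets; the numbers are assumptions, not measured values.
HIGH_DETAIL = AudioDetail(bandwidth_hz=20_000, reverb_reflections=32, spatialized=True)
LOW_DETAIL = AudioDetail(bandwidth_hz=11_000, reverb_reflections=4, spatialized=False)


def allocate_audio_detail(object_ids, focused_id):
    """Give the object the player is manipulating the high-detail preset."""
    return {
        obj_id: HIGH_DETAIL if obj_id == focused_id else LOW_DETAIL
        for obj_id in object_ids
    }


settings = allocate_audio_detail(["door", "radio", "fountain"], focused_id="radio")
for obj_id, detail in settings.items():
    print(obj_id, detail)
```

A two-level scheme is the simplest possible choice; a graded allocation based on a salience ranking would follow the same pattern once such a ranking is available.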
Salience Model
Obviously, there are situations in which the game engine has no or only limited information about the player’s current focus. In these cases, it appears useful to have a salience model classifying the objects contained in the game scene. No such generalized multi-modal salience model exists yet. For the rather limited scope of a gaming situation, a qualitative salience model is suggested here. The salience model comprises the influence factors that control the level of salience of the objects in a game scene.
Figure 2 shows how such a salience model may be structured: it is reasonable to start from the basis of human perception, the stimuli. In games, stimuli are generated by the game system itself, so they depend on a number of factors: the influence factors of level 1. These comprise the audio and visual reproduction setups, as well as the input devices used for player feedback to (and control of) the system, like keyboard, joystick, mouse, or any other dedicated input device. Influence factors of level 1 are those related to the generation and control of stimuli.
Figure 2. A salience model for perceived quality in audio-visual games
The core elements of human perception are sensory perception on the one hand and cognitive processing on the other. Sensory perception can be affected by a number of influence factors of level 2. These involve the physiology of the user (acuity of vision and hearing, masking effects caused by limited resolution of the human sensors, and so on), as well as other factors directly related to the physical perception of stimuli.
Cognitive processing produces a response by the player. This response can be obvious, like an immediate reaction to a stimulus, or it can be an internal response like re-distributing attention/shifting focus or just entering another turn of the Perceptual Cycle. The response is governed by another set of influence factors, those of level 3. These span the widest range of factors and are also the most difficult to quantify: experience, expectations, and socio-cultural background of the player; difficulty of the task in a specific game situation; degree of interactivity; and so forth. Influence factors of level 3 are related to the processing and interpretation of the perceived stimuli.
Cognitive processing will eventually lead to a certain quality impression that is a function of all influence factors of levels 1–3. This quality impression cannot be directly quantified. It needs additional processing to be expressed in the form of ratings on a quality (or quality impairment) scale, as semantic descriptors, and so on.
The overall quality impression is, in turn, the result of evaluating single or combined quality attributes. For example, Woszczyk, Bech, and Hansen (1995) have developed a number of attributes that are believed to be relevant for an overall audio-visual quality impression: they organize these attributes (quality, magnitude, involvement, balance) into 4 dimensions of perception (motion, mood, space, action), resulting in a 4 by 4 matrix of quality criteria. Yet a quantification of their impact is hardly possible as of now. This is because the individual attribute’s weight depends not only on the audio-visual game scene under assessment (the stimuli), but also on the experimental methodology itself. An attribute that is explicitly asked for might be assumed to be of higher importance by a test player (we know from experience that only important things are asked for in any kind of test). The player’s attention will be directed toward the attribute under assessment, which distorts unbiased perception of the audio-visual scene as a whole. Therefore, the player’s reaction in terms of quality rating can be assumed to be influenced as well.
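To make the structure of the 4 by 4 criteria matrix concrete, the following minimal sketch enumerates the 16 combinations of the four attributes and four dimensions named in the text. The attribute and dimension names come from the passage; the dictionary layout and the use of `None` as a placeholder weight are assumptions for illustration only, chosen to reflect that the weights cannot currently be quantified.

```python
from itertools import product

ATTRIBUTES = ("quality", "magnitude", "involvement", "balance")
DIMENSIONS = ("motion", "mood", "space", "action")

# One entry per (dimension, attribute) pair; None marks a weight that is
# not known, since the text stresses that it cannot be quantified yet.
criteria = {pair: None for pair in product(DIMENSIONS, ATTRIBUTES)}

for dimension in DIMENSIONS:
    row = [f"{dimension}/{attribute}" for attribute in ATTRIBUTES]
    print("  ".join(row))
```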
Experimental Results
A number of experiments have shown that player interaction with an audio-visual game might have an effect on the perceived overall quality (Jumisko-Pyykkö, Reiter, & Weigel, 2007; Reiter et al., 2007; Reiter & Weitzel, 2007; Reiter & Jumisko-Pyykkö, 2007; Reiter, 2009). In these experiments, the general assumption was that by offering attractive interactive content, or by assigning the user a challenging task, the user would become more involved and thus experience a subjectively higher overall quality. Along the same lines, it was hypothesized that the subject’s ability to differentiate between different levels of quality would decrease with an increase in difficulty of task/degree of interaction. The results show that this is not generally the case. However, when both the task and the main varying quality attribute were located in the same modality, such an effect could be observed.
More specifically, in the first experiment (Jumisko-Pyykkö et al., 2007; Reiter & Jumisko-Pyykkö, 2007) subjects were presented with a scenario located in a virtual sports gym. In the center of the gym, a loudspeaker was positioned that played back music/speech signals with varying amounts of reverberation (time and strength). Subjects were asked to rate the quality of reverberation under three different degrees of interaction:
1. No interaction (watch task): subjects were automatically moved on a pre-defined motion path through the virtual scenario
2. Limited interaction (watch and press button task): subjects were moved on a pre-defined motion path through the virtual scenario, but were asked to press a button whenever a certain object appeared within their field of view
3. Full interaction (navigate and collect task): subjects were asked to move freely through the scenario by using the computer mouse and to collect as many objects as possible by approaching them.
Interestingly, the ability of subjects to rate the quality of reverberation correctly did not vary with the degree of interaction/difficulty of the task (Friedman χ² = 3.3, df = 2, p > 0.05, ns). Although subjects claimed to have experienced more difficulties in the interactive tasks, this did not show in the statistical analysis of the collected data. Three possible explanations were looked at. The first was that the quality differences were too obvious, that is, the steps between the different amounts of reverberation were too big. This is possible but was not regarded as probable, given the results of informal experiments with a similar variation in reverberation. The second was that the tasks (pressing a button, and navigating/collecting objects) were not demanding enough and that it was too easy for subjects to dedicate part of their attention to the quality-rating task. This was contradicted by the claims of the subjects themselves: a large majority claimed to have been distracted by the navigation task. The third possible explanation was subsequently examined in further experiments: the additional cognitive load (pressing a button, navigating while collecting objects) was located in the visual and haptic domains, whereas the quality differences to be rated were located in the auditory domain.
In a second round of experiments (compare Reiter et al., 2007; Reiter, 2009), both the additional cognitive load and the quality variations were located in the auditory domain. A virtual room (a replica of the entrance hall of a large university building) was equipped with a virtual loudspeaker in the center, and subjects were asked to navigate freely through the room using a computer mouse. The loudspeaker played back a randomized sequence of numbers from 1 to 4 read out loud. The reverberation time of the room acoustic simulation could be adjusted between 1.0 s and 3.0 s in 0.5 s steps, with 2.0 s considered the “reference” reverberation time. In the experiment, the reverberation time was changed from the reference to another value at a single random point in time during a transition time frame beginning 5 seconds after the start and ending 5 seconds before the end of each 30-second trial. A modified Degradation Category Rating scale according to Recommendation ITU-T P.911 (1998) was used, consisting of 5 levels (much shorter, shorter, equal, longer, much longer), to have subjects compare the test reverberation time with the reference reverberation time.
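The trial structure of this second experiment can be sketched in a few lines. The reverberation values, the reference, the trial length, and the 5-second guard intervals are taken from the description above. Two points are assumptions made for this sketch: that the reference value itself may occur as a test value (so that the “equal” category is reachable), and that each 0.5 s step away from the reference corresponds to one category on the 5-level scale. The function name `make_trial` and the dictionary keys are hypothetical.

```python
import random

REVERB_TIMES_S = (1.0, 1.5, 2.0, 2.5, 3.0)   # adjustable range, 0.5 s steps
REFERENCE_S = 2.0
TRIAL_LENGTH_S = 30.0
GUARD_S = 5.0                                 # no change in first/last 5 s
SCALE = ("much shorter", "shorter", "equal", "longer", "much longer")


def make_trial(rng: random.Random) -> dict:
    """Draw one hypothetical trial: test reverberation time and switch time."""
    test_time = rng.choice(REVERB_TIMES_S)
    switch_at = rng.uniform(GUARD_S, TRIAL_LENGTH_S - GUARD_S)
    # Assumption: each 0.5 s step away from the reference moves one category.
    steps = round((test_time - REFERENCE_S) / 0.5)
    return {
        "test_time_s": test_time,
        "switch_at_s": switch_at,
        "correct_rating": SCALE[steps + 2],
    }


rng = random.Random(1)  # fixed seed so the sketch is repeatable
for _ in range(3):
    print(make_trial(rng))
```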
The additional cognitive load consisted of a so-called n-back working memory task, similar to the one introduced by Kirchner (1958). Here, subjects were asked to semantically compare the current stimulus (the current number) with the one presented n steps back (see Figure 3). In the experiment, n was varied between 0 (no additional load) and 2 (high additional load). The hypothesis was, again, that with increasing difficulty of the task, subjects would commit more errors in rating the reverberation time as a measure of perceived quality. For the statistical analysis, the rating errors were restructured according to error size, such that each 0.5 s deviation resulted in one error point. The subsequent analysis was performed on error points. A complete description of the experiment can be found in Reiter (2009, pp. 203-212).
Figure 3. Presented stimuli and correct answers (“Comparison”) for 1-back and 2-back continuous-matching tasks
A comparison of the error rates for “navigation only” with “navigation with 2-back task” resulted in a highly significant difference (T = 20, p ≤ 0.01). Comparing these results to the first experiment described above, it becomes apparent that the inner-modal influence of the task is significantly greater than the cross-modal influence. This might indicate that humans perform a pre-processing of stimuli that, depending on modality, takes place in separate areas of the brain. Thus, in situations where stimuli that belong to different modalities have to be processed at the same time, we are better able to parallelize and distribute the processing accordingly. This is also suggested by the common theories of capacity limits in human attention (see above).
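The n-back comparison and the error-point scoring described in this subsection can be sketched as follows. The function names `n_back_matches` and `error_points` and the example sequence are hypothetical. The scoring function also assumes that a subject’s category rating has already been translated into the reverberation time it implies, which the text does not spell out.

```python
def n_back_matches(stimuli, n):
    """Return, per stimulus, whether it equals the one n steps back.

    Positions without a stimulus n steps back yield None (no answer due).
    n = 0 means "no additional load" in the experiment, so there is
    nothing to compare and the function rejects it.
    """
    if n < 1:
        raise ValueError("n must be at least 1 for a comparison to exist")
    return [
        None if i < n else stimuli[i] == stimuli[i - n]
        for i in range(len(stimuli))
    ]


def error_points(rated_s, actual_s, step_s=0.5):
    """One error point per 0.5 s deviation between rated and actual time."""
    return round(abs(rated_s - actual_s) / step_s)


sequence = [3, 1, 3, 3, 2, 3]          # hypothetical spoken numbers (1 to 4)
print(n_back_matches(sequence, 1))     # compare with the previous number
print(n_back_matches(sequence, 2))     # compare with the number two back
print(error_points(rated_s=3.0, actual_s=2.0))
```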
Game Example
In a third experiment (Reiter & Weitzel, 2007), inspired by Zielinski, Rumsey, Bech, de Bruyn, and Kassier (2003) and Kassier, Zielinski, and Rumsey (2003), it was shown that a cross-modal influence of interaction is indeed possible when stimuli and interaction/task are carefully balanced. For this, a simplified Space Invaders-like arcade game was created, in which two different types of objects (donuts and snowballs) moved through a virtual room. Objects moved straight towards the baseline, on which the player could move left and right. Players were instructed to collect as many donuts as possible and to avoid collisions with snowballs. Each collected donut resulted in an increase of the player’s score, whereas a collision with a snowball decreased the score. The current game score was displayed on the screen near the chimney, which served as the source of the flying objects. Figure 4 shows a screenshot of the game scenario. A typical background music track for an arcade game was chosen for the game.
Figure 4. Grey-scale screenshot of the game scenario
For the experiment, each subject carried out a passive and an active session. The active session involved playing the computer game and evaluating the sound quality of the game music. This session was designed to cause a division of attention between evaluating the audio quality and reaching a high score. In the passive session, subjects were asked to evaluate the audio quality while a game demo was presented. Here, the attention of the subjects was assumed to be directed exclusively to the audio quality. In both sessions, active or passive, either the original (20 kHz) game music or a low-pass filtered version with a cut-off frequency of fc = 11 kHz, 12 kHz, or 13 kHz was played. This was complemented by an anchor with fc = 4 kHz. Thus, a total of 5 items (3 test items, 1 anchor item, and 1 reference item corresponding to the original full-range signal) were presented to the players in the experiment. After each round of the game, players were asked to rate the perceived tonal quality degradation using the standardized 5-level impairment scale of Recommendation ITU-T P.911 (1999).
A total of 32 subjects participated in the experiment. Seven players were female and 25 were male (age M = 25.7, SD = 5.36). Regarding their listening experience, 20 subjects were considered initiated assessors and 12 were classified as naive assessors. The initiated assessors had already gained abilities and knowledge in rating audio quality in preceding unimodal and bimodal subjective assessments. All participants reported normal hearing and normal or corrected-to-normal visual acuity.
A Wilcoxon T test showed that the quality ratings of the active session differed significantly from the ratings of the passive session for cut-off frequencies up to 12 kHz. A significant decrease in rating correctness was shown for the active session in comparison to the passive session for the anchor item (T = 37, p ≤ 0.01), the cut-off frequency fc = 11 kHz (T = 452.50, p ≤ 0.01), and the cut-off frequency fc = 12 kHz (T = 812, p ≤ 0.01). For the cut-off frequency of 13 kHz and the reference item, no significant differences could be found (T = 630.50 and T = 75, resp., p > 0.05, ns).
The data analysis showed that the ratings of the tonal quality degradations in the active session differed significantly from those in the passive session. The low-pass filtering was rated as being less perceptible in the active session than in the passive session, in which active players turned into passive viewers. More generally speaking, the experiment shows that an influence of interaction performed in one modality (visual-haptic) upon the perception of quality in another modality (in this case, auditory) is possible. For such a cross-modal influence to exist, the characteristics of stimuli and interaction/task must be carefully balanced. At this time, it is not possible to determine or quantify that balance a priori. However, some of the influence factors that contribute to this balance have been identified in the salience model in Figure 2 above. These influence factors need to be quantified, and this is a task for the future.
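The stimulus conditions of this game example (a full-range reference, three low-pass test items, and a 4 kHz anchor) can be reproduced in outline with a simple filter design. The text does not state how the low-pass versions were produced, so the following is only a sketch under stated assumptions: a windowed-sinc FIR design with a Hamming window, 101 taps, and a 44.1 kHz sample rate are all choices made for this example, and the names `lowpass_taps`, `apply_fir`, and `gain_at` are hypothetical. Only the cut-off frequencies come from the experiment.

```python
import cmath
import math

SAMPLE_RATE_HZ = 44_100          # assumed; not stated in the text
CUTOFFS_HZ = {"anchor": 4_000, "test_11k": 11_000,
              "test_12k": 12_000, "test_13k": 13_000}


def lowpass_taps(cutoff_hz, num_taps=101, fs=SAMPLE_RATE_HZ):
    """Windowed-sinc FIR low-pass (Hamming window), unity gain at DC."""
    fc = cutoff_hz / fs                      # normalized cut-off (cycles/sample)
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def apply_fir(signal, taps):
    """Filter a list of samples by direct convolution (zero initial state)."""
    return [
        sum(taps[k] * signal[i - k] for k in range(len(taps)) if i - k >= 0)
        for i in range(len(signal))
    ]


def gain_at(taps, freq_hz, fs=SAMPLE_RATE_HZ):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / fs
    return abs(sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps)))


for name, cutoff in CUTOFFS_HZ.items():
    taps = lowpass_taps(cutoff)
    print(name, round(gain_at(taps, 1_000), 3), round(gain_at(taps, 18_000), 5))
```

The printed pairs show the gain well inside the passband (1 kHz) and well inside the stopband (18 kHz) for each condition. A production tool would more likely rely on an established signal-processing library; plain Python is used here only to keep the sketch self-contained.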
SUMMARY AND CONCLUSION
This chapter has reviewed some of the most important issues of perceived quality of audio in computer games. The main conclusion is that audio quality in games, as perceived by a game player, is not independent of other factors (apart from sound quality itself). Because games usually provide information and feedback to the player in more than the auditory modality, it is necessary also to take into account other modalities when judging the impact and quality of audio. A rating of audio quality alone, without the gameplay context, is not meaningful.
The physical mechanisms of human auditory and visual perception are well understood. Cross-modal interaction between the two domains, that is, perceptual processing in the human brain, needs further research before it is possible to model such processes. Still, whether it is possible to come up with a generalized model of cross-modal perceptual processing at all is highly questionable. It is assumed by many that its complexity far exceeds the possibilities for designing a suitable model. Yet it seems feasible to aim at perceptual models that are valid for certain perceptual scenarios only. A specific game-playing scenario can be one of these, as factors like setup (computer screen, loudspeakers/headphones, input devices) and task vary rather little across users, given a certain use case. This has been demonstrated in the game example above.
A salience model as described in this contribution could therefore serve as a starting point for the exploitation of salience effects. Salience is closely related to distribution of attention and perceptual capacity limits. The experimental results summarized in this chapter indicate that effects of capacity limits are more dominant inner-modally than cross-modally. At the same time, capacity limits seem to be more predictable inner-modally than cross-modally. Until we have better models of the perceptual processing underlying the generation of a subjective quality impression, it will be difficult to predict the perceived quality of audio in a multi-modal context in general, or in a game context as discussed here.
Nevertheless, both the experiments described and the literature and effects reviewed here suggest that there is potential for exploitation of such perceptual constraints. Future research should therefore concentrate on methodologies for the subjective evaluation of audio-visual quality, or multi-modal quality in general. Only a few recommendations exist for performing audio-visual experiments, and the impact of interactivity (as naturally given in any gameplay) on the perceived quality is, until now, simply not considered at all. Once proper recommendations exist, it will be much easier to compare and validate experimental results, thus paving the way for a quantification of the salience model described in this chapter.
rEFErENcEs Alais, D., & Blake, R. (1999). Neural strength of visual attention gauged by motion adaptation. Nature Neuroscience, 2(11), 1015–1018. doi:10.1038/14814 Battaglia, P. W., Jacobs, R. A., & Aslin, R. N. (2003). Bayesian integration of visual and auditory signals for spatial localization. Journal of the Optical Society of America, 20(7), 1391–1397. doi:10.1364/JOSAA.20.001391 Beerends, J. G., & De Caluwe, F. E. (1999). The influence of video quality on perceived audio quality and vice versa. Journal of the Audio Engineering Society. Audio Engineering Society, 47(5), 355–362. Blauert, J. (2001). Spatial hearing: The psychophysics of human sound localization (3rd ed.). Cambridge, MA: MIT Press. Braasch, J. (2005). Modelling of binaural hearing . In Blauert, J. (Ed.), Communication acoustics (pp. 75–108). Berlin: Springer Verlag. doi:10.1007/3540-27437-5_4 Brodmann, K. (1909). Vergleichende Lokalisationslehre der Grosshirnrinde in ihren Prinzipien dargestellt auf Grund des Zellenbaues. Leipzig, Germany: Johann Ambrosius Barth Verlag. Eijkman, E., & Vendrik, J. H. (1965). Can a sensory system be specified by its internal noise? The Journal of the Acoustical Society of America, 37, 1102–1109. doi:10.1121/1.1909530 Ellis, S. R. (1996). Presence of mind... A reaction to Thomas Sheridan’s “Musing on telepresence.” . Presence (Cambridge, Mass.), 5, 247–259. Farnell, A. (2011). Behaviour, structure and causality in procedural audio . In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Farris, J. S. (2003). The human interaction cycle: A proposed and tested framework of perception, cognition, and action on the web. Unpublished doctoral dissertation. Kansas State University, USA. Gardner, W. G. (1992, November). A realtime multichannel room simulator. Paper presented at the 124th meeting of the Acoustical Society of America. Gescheider, G. A., Sager, L. C., & Ruffolo, L. J. (1975). Simultaneous auditory and tactile information processing. Perception & Psychophysics, 18, 209–216. Goldstein, E. B. (2002). Wahrnehmungspsychologie (2nd ed.). Berlin: Spektrum Akadem. Verlag. Howard, I. P. (1982). Human visual orientation. New York: Wiley. Jot, J. M., & Chaigne, A. (1991). Digital delay networks for designing artificial reverberators. Paper presented at the AES 90th Convention. Preprint 3030. Jumisko-Pyykkö, S., Reiter, U., & Weigel, C. (2007). Produced quality is not perceived quality— A qualitative approach to overall audiovisual quality. In Proceedings of the 3DTV Conference. Kalman, R. E., & Bucy, R. S. (1961). New results in linear filtering and prediction problems. Journal of Basic Engineering, 83, 95–108. Kassier, R., Zielinski, S., & Rumsey, F. (2003). Computer games and multichannel audio quality part 2—Evaluation of time-variant audio degradation under divided and undivided attention. AES 115th Convention. Preprint 5856. Kirchner, W. K. (1958). Age differences in shortterm retention of rapidly changing information. Journal of Experimental Psychology, 55(4), 352–358. doi:10.1037/h0043688
171
Perceived Quality in Game Audio
Klein, D. J., König, P., & Körding, K. P. (2003). Sparse spectrotemporal coding of sounds. EURASIP Journal on Applied Signal Processing, 7, 659–667. doi:10.1155/S1110865703303051
Meehan, M., Razzaque, S., Whitton, M. C., & Brooks, F. P., Jr. (2003). Effect of latency on presence in stressful virtual environments. In Proceedings of IEEE Virtual Reality, 141-148.
Landragin, F., Bellalem, N., & Romary, L. (2001). Visual salience and perceptual grouping in multimodal interactivity. In Proceedings of International Workshop on Information Presentation and Natural Multimodal Dialogue IPNMD.
Nordahl, R. (2005). Self-induced footsteps sounds in virtual reality: Latency, recognition, quality and presence. In Proceedings of PRESENCE 2005, 8th Annual International Workshop on Presence, 353-354.
Larsson, P., Västfjäll, D., & Kleiner, M. (2003). On the quality of experience: A multi-modal approach to perceptual ego-motion and sensed presence in virtual environments. In Proceedings of First ISCA ITRW on Auditory Quality of Systems AQS-2003, 97-100.
Pashler, H. E. (1999). The psychology of attention. Cambridge, MA: MIT Press.
Lavie, N. (2001). Capacity limits in selective attention: Behavioral evidence and implications for neural activity . In Braun, J., & Koch, C. (Eds.), Visual attention and cortical circuits (pp. 49–60). Cambridge, MA: MIT Press. Lee, K. M., Jeong, E. J., Park, N., & Ryu, S. (2007). Effects of networked interactivity in educational games: Mediating effects of social presence. In Proceedings of PRESENCE2007, 10th Annual International Workshop on Presence, 179-186. Lee, K. M., Jin, S. A., Park, N., & Kang, S. (2005). Effects of narrative on feelings of presence in computer/video games. In Proceedings of the Annual Conference of the International Communication Association (ICA). Lombard, M., & Ditton, Th. (1997). At the heart of it all: The concept of presence. Journal of Computer-Mediated Communication, 3. MacKenzie, I. S., & Ware, C. (1993). Lag as a determinant of human performance in interactive systems. In Proceedings of the ACM Conference on Human Factors in Computing Systems – INTERCHI’93, 488-493.
172
Pulkki, V. (2001). Spatial sound generation and perception by amplitude panning techniques. Unpublished doctoral dissertation. Helsinki University of Technology, Finland. Recommendation ITU-T P.911. (1998/1999). Subjective audiovisual quality assessment methods for multimedia applications. Geneva: International Telecommunication Union. Reiter, U. (2009). Bimodal audiovisual perception in interactive application systems of moderate complexity. Unpublished doctoral dissertation. TU Ilmenau, Germany. Reiter, U., & Jumisko-Pyykkö, S. (2007). Watch, press and catch—Impact of divided attention on requirements of audiovisual quality . In Jacko, J. (Ed.), Human-Computer Interaction, Part III, HCI 2007 (pp. 943–952). Berlin: Springer Verlag. Reiter, U., & Weitzel, M. (2007). Influence of interaction on perceived quality in audiovisual applications: Evaluation of cross-codal influence. In Proceedings of 13th International Conference on Auditory Displays (ICAD), 380-385. Reiter, U., Weitzel, M., & Cao, S. (2007). Influence of interaction on perceived quality in audio visual applications: Subjective assessment with n-back working memory task. In Proceedings of AES 30th International Conference.
Perceived Quality in Game Audio
Schroeder, M. R. (1962). Natural sounding artificial reverberation. Journal of the Audio Engineering Society. Audio Engineering Society, 10(3), 219–223. Schroeder, M. R. (1970). Digital simulation of sound transmission in reverberant spaces (part 1). The Journal of the Acoustical Society of America, 47(2), 424–431. doi:10.1121/1.1911541 Shams, L., Kamitani, Y., & Shimojo, S. (2000). What you see is what you hear. Nature, 408, 788. doi:10.1038/35048669 Shams, L., Kamitani, Y., & Shimojo, S. (2002). Visual illusion induced by sound. Brain Research. Cognitive Brain Research, 14, 147–152. doi:10.1016/S0926-6410(02)00069-1 Sheridan, T. B. (1994). Further Musings on the Psychophysics of Presence. Presence (Cambridge, Mass.), 5, 241–246. Shiffrin, R. M., & Grantham, D. W. (1974). Can attention be allocated to sensory modalities? Perception & Psychophysics, 15, 460–474. Singer, W., Engel, A. K., Kreiter, A. K., Munk, M. H. J., Neuenschwander, S., & Roelfsema, P. R. (1997). Neuronal assemblies: necessity, signature and detectability. Trends in Cognitive Sciences, 1(7), 252–261. doi:10.1016/S13646613(97)01079-6 Spence, C., Nicholls, M. E. R., & Driver, J. (2001). The cost of expecting events in the wrong sensory modality. Perception & Psychophysics, 63(2), 330–336. Steuer, J. (1992). Defining virtual reality: Dimensions determining telepresence. The Journal of Communication, 42(4), 73–93. doi:10.1111/j.1460-2466.1992.tb00812.x Tulving, E., & Lindsay, P. H. (1967). Identification of simultaneously presented simple visual and auditory stimuli. Acta Psychologica, 27, 101–109. doi:10.1016/0001-6918(67)90050-9
Wenzel, E. M. (1998). The impact of system latency on dynamic performance in virtual acoustic environments. In Proceedings of the 15th International Congress on Acoustics and 135th Meeting of the Acoustical Society of America, 2405-2406. Wenzel, E. M. (1999). Effect of increasing system latency on localization of virtual sounds. In Proceedings of the AES 16th International Conference on Spatial Sound Reproduction, 42-50. Wenzel, E. M. (2001). Effect of increasing system latency on localization of virtual sounds with short and long duration. In Proceedings of 7th International Conference on Auditory Displays (ICAD). 185-190. Woszczyk, W., Bech, S., & Hansen, V. (1995). Interactions between audio-visual factors in a home theater system: Definition of subjective attributes. AES 99th Convention. Preprint 4133. Zielinski, S., Rumsey, F., Bech, S., de Bruyn, B., & Kassier, R. (2003). Computer games and multichannel audio quality—The effect of division of attention between auditory and visual modalities. In Proceedings of the AES 24th International Conference on Multichannel Audio, 85-93. Zwicker, E., & Fastl, H. (1999). Psychoacoustics—Facts and models (2nd ed.). Berlin: Springer Verlag.
KEY TERMS AND DEFINITIONS
Binaural: Literally means “having or relating to two ears”. Binaural hearing, along with frequency cues, lets humans determine the direction of incidence of sounds.
Brodmann Areas: 52 different regions of the cortex, defined on the basis of the organization of cells. Named after Korbinian Brodmann’s maps of cortical areas in humans, published 1909.
Cognitive Load: A term describing the load on working memory during instruction (problem solving, thinking, reasoning).
Dorsal Stream: Also known as the parietal stream, the “where” stream, or the “how” stream, proposed to be involved in the guidance of actions and recognizing where objects are in space.
Fitts’ Law: A model of human movement in human-computer interaction and ergonomics which predicts that the time required to rapidly move to a target area is a function of the distance to and the size of the target.
Localization: The ability to detect the direction of incidence of a sound.
Monaural: Literally means “having or relating to one ear”.
Multi-Modal: More than one perceptual modality involved, usually the auditory and the visual domain, sometimes also including haptics.
Perceptual Cycle: A model describing human perception as a cyclic setup of schemata, perceptual exploration, and stimulus environment which influence each other in a continuously updated process, first introduced by US psychologist Ulric Neisser.
Presence: The feeling of being present in an artificial environment, for example a game scenario in a jungle.
Quality of Experience: The overall acceptability of an application or service, as perceived subjectively by the end-user. Quality of Experience includes the complete end-to-end system effects (client, terminal, network, services infrastructure and so on). Overall acceptability may be influenced by user expectations and context.
Salience: The state or quality of an item that stands out relative to neighboring items.
Schema: Previous knowledge, something we already understand or are familiar with.
Single-Cell Recording: A technique used in brain research to observe changes in voltage or current in a neuron, thus measuring a neuron’s activity.
Space Invaders: An arcade video game designed by Tomohiro Nishikado, released in 1978, with the aim of defeating waves of aliens with a cannon, earning as many points as possible.
Ventral Stream: Also known as the “what” stream, associated with object recognition and form representation.
Section 3
Emotion & Affect
Chapter 9
Causing Fear, Suspense, and Anxiety Using Sound Design in Computer Games
Paul Toprac, Southern Methodist University, USA
Ahmed Abdel-Meguid, Southern Methodist University, USA
ABSTRACT
This chapter provides a theoretical foundation for the study of how emotions are affected by game sound as well as empirical evidence for determining how to promote fear, suspense, and anxiety in players using sound effects. Four perspectives on emotions are described: Darwinian, James-Lange, cognitive, and social constructivist. Three basic properties of diegetic sound effects were studied: volume, timing, and source. Results strongly suggest that the best sound design for causing fear is high volume and timed sound effects (synchronized game sound with visual moment) and somewhat suggest that sourced sound effects also promote fear. For anxiety, results strongly suggest that the best sound design is medium volume sound effects. Results also suggest that acousmatic and untimed sound effects evoke suspense rather than anxiety. Low volume sound effects are not effective at evoking fear, suspense, and anxiety due to potential masking by other sounds. Implications and future research directions are presented.
DOI: 10.4018/978-1-61692-828-5.ch009
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION
Computer games are audio-visual entertainment media that provide an escapist experience (Grimshaw, 2007). That is, computer games utilize both audio and visual media to capture players’ attention and engage players’ motor and mental skills, thus immersing the players in the gameworld. This immersion provides an escape for players from everyday life. Immersion occurs when the game: (1) “monopolizes the senses” (Carr, 2006, p. 68), (2) engages the player psychologically, and (3) requires physical action (see Nacke & Grimshaw, 2011 for more on immersion). The authors of this chapter believe that all three components of immersion are highly linked and can be (and are) used to evoke emotions from players.
Visuals and sound are often used to elicit specific emotions among the consumers of computer games. Currently, however, the computer game industry is focused on the quality of the graphics within the game. The computer game industry has clear guidelines for visuals, but not particularly for sound. Yet, sound is at least as important as, if not more important than, visuals for creating immersion and evoking emotions (Anderson, 1996; Grimshaw, 2007), though often underrated by the players (Cunningham, Grout, & Picking, 2011). Sound can change the player’s perception of images to the point where the sound dominates even when the player is presented an opposing relationship between the sound and image (Collins, Tessler, Harrigan, Dixon, & Fugelsang, 2011). Unfortunately, as Collins (2007) states, “work into the sonic aspects of audio-visual media has neglected games [and] video games audio remains largely unexplored” (p. 263). Furthermore, as Serafin (2004) wrote, “[s]o far no quantitative results are available to help designers to build soundscapes which allow the user to feel fully immersed” (p. 4). And, finally, according to Nacke and Grimshaw (2011), “not much work has been put into sensing the emotional cues of game sound in games, let alone in understanding the impact of game sounds on players’ affective responses”.
The purpose of the current chapter is to create a theoretical foundation and empirical evidence for the study of how emotions and affect are impacted by game sound. Although Roux-Girard (2011) “firmly believes that adopting a position that emphasizes reception issues of gameplay can provide a more productive model than one that would be grounded directly in the production aspects (implementation and programming) of game audio”, we believe that researching the impact of the production aspects of game sounds is just as productive. Ultimately, we believe that both approaches are equally viable and should be used to understand the experience of game sounds.
Whereas Roux-Girard attempts to understand the effect of game sounds from a top-down approach, our intent is to build from the bottom up a research foundation upon which further inquiry into the relationship between emotions and game sound can be conducted. Furthermore, our aim is to produce valid results that are able to both explain phenomena and be useful for game designers. Specifically, this chapter describes a study to determine the best sound design principles pertaining to game sound effects (defined here as all diegetic game sound except dialogue) to cause fear and anxiety in players, two common emotions that players feel while playing computer games. The empirical research examines how to manipulate three basic properties of game sound (volume, timing, and source) through a game level designed to evoke fear, suspense, and anxiety. Through this quantitative and qualitative examination, the general design principles of how to develop game sound effects to promote fear and anxiety are better understood.
BACKGROUND: LITERATURE AND FIELD REVIEW
In order to design games and perform research using game sound for promoting fear, suspense, and anxiety, both theories of emotion and the current state-of-the-art design of sound effects in games are important to understand. Emotions and affect are elusive in nature, and difficult to define (Cornelius, 1996). For instance, some consider emotions and affect to be the same psychological construct, while other researchers consider affect to be the conscious experience of emotions. In either case, our research measures the conscious experience of emotion, whether that is considered affect or emotion or both. Furthermore, rather than define emotion and affect, which is attempted in Nacke and Grimshaw (2011), we will describe emotions from the perspectives of four theoretical traditions of research on emotion in psychology (Cornelius, 1996). These schools of thought on the sources and development of emotion are the Darwinian theory of emotions, James-Lange theory, cognitive theory, and social constructivist theory. Our intent is to provide an understanding of the emergence
of emotions while playing a game, which Perron (2005a) termed gameplay emotions. Gameplay emotions are different from everyday emotions. For instance, gameplay emotions can be paradoxical in nature, such as deliberately watching a scary movie to enjoy the sensation of fear (Cunningham et al., 2011). The following discussion on emotions is focused on fear, anxiety, and suspense.
Theories of Emotions and Games
The Darwinian Perspective of Emotions
In the Darwinian perspective of emotions, there are certain basic emotions that are inherited and shared across the human experience. Researchers of the Darwinian perspective, such as Plutchik (1984), have identified several primary emotions, such as rage, loathing, grief, amazement, terror, admiration, ecstasy, and vigilance. Each of these emotions has several levels of intensity. For instance, the less intense levels of the emotion of terror are fear and then apprehension. Game players can be observed showing many of these identified primary emotions. For example, players often feel fear or apprehension at the appearance of the enemy, particularly in survival horror computer games. Plutchik’s theory posits that we can promote fear in everyone. Fear is a psychological experience to prepare individuals for the ‘freeze, fight or flight’ response (Gray, 1971). However, Plutchik’s theory does not easily account for anxiety. Fear is a reaction to a specific danger or threat while anxiety is unspecific, vague, and objectless (May, 1977). Thus, anxiety is not a lower level of intensity of fear, or even apprehension. Anxiety is diffuse with a vague sense of apprehension (Kaplan & Sadock, 1998), rather than apprehension due to a specific stimulus (Gullone, King, & Ollendick, 2000). Anxiety is often thought to be a future-oriented mood (a vague, discomforting sense) that things will go wrong, which can have an adaptive function of enhancing performance at optimal levels
(Barlow, 1988). May (1977) resolves whether anxiety is innate or not by suggesting that all humans have the instinctive capacity to react to threats, whether the threat is concrete (for fear) or unspecific (anxiety). However, what the individual considers threats may be learned and are triggered by the appraisal of particular events or stimuli.
The Cognitive Perspective of Emotions
The importance of appraisals of particular events or stimuli, and their associations with emotions, is illuminated by the cognitive theory of emotions (Cornelius, 1996). Based on the cognitive perspective, emotions and behavior are constantly changing as an individual appraises and reappraises the changing environment (Folkman & Lazarus, 1990). Depending on what the player of a game is consciously thinking of a situation, he or she can experience any of a range of emotions and behaviors. Appraisals and reappraisals are important parts of the emotional experience of survival horror games (Perron, 2004). Game players may feel fear and anxiety by appraising particular sounds as being “scary” or “creepy” or they may appraise the same game sounds as “silly.” Game designers can promote the experience of fear and anxiety through priming cues, such as music, acousmatic sound effects, and visuals, which can encourage “thinking” about the scariness or creepiness of the game. After being primed, the player is more likely to appraise particular stimuli in the way that is desired by the game designer, such as fear when suddenly a monster appears in survival horror computer games.
The James-Lange Perspective of Emotions
Appraisals are also an important part of the James-Lange perspective of emotions. However, in this perspective the appraisals are unconscious evaluations of the body’s response to stimuli (Cornelius, 1996). While playing games, the body
reacts and provides feedback to the brain, which unconsciously appraises the body’s reaction and influences further cognition. For instance, if the player jumps due to a stimulus, the mind may attribute that bodily action to being scared. Research by Wolfson and Case (2000) showed that louder sounds in a computer game increased heart rate and impacted physiological arousal and attention. According to the James-Lange perspective, the player may unconsciously attribute increased heart rate and arousal as a state of fear and/or anxiety. The game designer can use this knowledge to his or her advantage. Loud and sudden noises, for example, can make players instinctively jump, which promotes the feeling of fear. However, it is less clear how to use this perspective for eliciting anxiety. If a player sweats profusely while playing a game, will he or she attribute that to fear or anxiety or something else? Perhaps the answer depends on the stimuli, which leads some researchers back to the Darwinian and cognitive perspectives of emotions, where Darwinians believe that our emotional reactions to stimuli are innate and cognitivists believe that they are learned.
The Social Constructivist Perspective of Emotions
Are our reactions to particular stimuli innate or learned? If learned, how can game designers know what stimuli to use to promote fear and anxiety? Although there is some controversy regarding the answer to the former question, there do seem to be a few, select stimuli that innately promote fear such as sudden, unexpected movements, especially approach-motions, or sounds (Gebeke, 1993) and, for anxiety, the threatened security patterns between an individual and his or her valued significant persons (May, 1977). However, beyond that, it would seem that our reactions to particular events or stimuli are learned, and this leads to the question of how and what is learned. The response to this is best answered by the social constructivist perspective of emotions.
Fear and anxiety responses to particular stimuli are learned through conditioning from family and other valued persons, which, in turn, are part of the larger general culture (May, 1977). Social constructivists believe that emotions are used to maintain interpersonal relationships and identity in a person’s communities (Greeno, Collins, & Resnick, 1996). The community can be friends, relatives, or other game players, who are all influenced by the general culture. For instance, people often feel scared when they suddenly see a cockroach because that is what they have learned from their mothers, who feared and loathed cockroaches. If they did not feel fear, but rather liked cockroaches, their relationship with their mothers may have become strained, which, at a young age, would not be desirable. Thus, these youths appropriate the feeling of fear of cockroaches from their mothers, who, in turn, maintain this fear because it is part of the cultural milieu in which the mothers desired to participate. Finally, as Cunningham, Grout, and Picking (2011) point out, the social and cultural milieu includes the context in which the person is experiencing his or her emotion. Thus, emotions that emerge while playing games (that is, gameplay emotions) are different from everyday emotions because the context of playing games is not the same as the context of typical everyday experiences. For game designers, this means that anything that the player has been taught to fear can be leveraged to promote fear, including such things as death and failure, which are risks (to the player’s avatar) in most computer games. Furthermore, game designers can use sound effects as cues for threats. For example, if a player gets close to an electrical hazard, the game designer can add a loud sparking noise to scare the player. However, game designers must keep in mind that particular graphics and sound may elicit different or less intense emotions between individuals in different cultures.
For instance, the sound of a slide-action shotgun pumping may promote more fear or
anxiety in America than in countries where there are fewer shotguns. Though “response to sound, therefore, can vary from player to player” (Collins et al., 2011), the theories of emotion described in this section provide a framework to understand the sources and range of emotional responses of players to game sound. In the Darwinian perspective, there are certain basic emotions that are inherited and shared across the human experience. The cognitive perspective to emotions contributes to understanding them by explicating the importance of appraisals of stimuli and their underlying association with emotions. Researchers of the James-Lange theory of emotions believe that humans first experience bodily changes as a result of the perception of the emotion-eliciting stimuli and that is the experience of the feeling. Furthermore, the social constructivist perspective posits the theory that emotions are learned and culturally determined. These emotion theories correspond to three generally accepted forms of human expression of emotions: expressive behavior (showing the emotion), subjective experience (appraising the feeling), and the physiological component (sympathetic arousal) (see Cunningham et al., 2011). As previously mentioned, anxiety, at first glance, could be conceptualized as a less intense experience of fear, but this is not considered by most emotion researchers to be the case. Fear and anxiety are closely related but not the same (Gullone, King, & Ollendick, 2000). Fear is an emotional response to a particular event or object and anxiety is an emotional response to an unspecific event or object. Though fear and anxiety are considered as two separate mental processes, representing different affective and cognitive states, the two are considered linked. Fear can feed off anxiety and vice-versa. Game designers may be able to increase players’ fear if the players are already anxious rather than in a state of calm. 
Because the player is already in a state of nervousness and worry, he or she may perceive
a threat to be more dangerous than warranted, resulting in an elevated fear response. So, what is the emotion of suspense? Compared to fear and anxiety, there has been very little research on suspense. However, psychologists quoted in Paradox of Suspense (Carroll, 1996) provide this definition of suspense: “…a Fear emotion coupled with the cognitive state of uncertainty” (p. 78). That is, fear coupled with anxiety. The film scholar, Zillman (1991), describes suspense as “the experience of uncertainty regarding the outcome of a potentially hostile confrontation” (p. 283), which is similar to the definition of anxiety but with more emphasis on specific stimuli that are associated with fear. Thus, we conclude that suspense is the intersection or overlap of fear and anxiety. Suspense can be viewed as fear of imminent threat that is likely to occur, but has not appeared, and/or a state of high anxiety due to an impending dangerous situation. As Krzywinska (2002), a professor in film studies, states: “Many video games deploy sound as a key sign of impending danger, designed to agitate a tingling sense in anticipation of the need to act” (p. 213). Fear, anxiety, and suspense are gameplay emotions that are intentionally promoted in the design of survival horror games. Game designers control all that the player sees and hears within the survival horror game experience, and they have used this control to develop sound design techniques to elevate the player’s fear, suspense, and anxiety. Some of these techniques are explained in the following section.
Sound Design in Games
Sound design is used in almost all computer games. To design the soundscape in a computer game, there are a large number of sound properties that can be manipulated (see Liljedahl, 2011; Wilhelmsson & Wallén, 2011). In this chapter, sound properties are reduced to three independent variables: volume, timing, and source. We believe that these are three of the most basic properties
of sound that are considered when designing the soundscape of a computer game. Volume is the relative loudness at which a sound is heard from a loudspeaker. Timing is the relative synchronization of the sound with its source. Source is the origin of a sound. Game sounds are here categorized into the common typology of three separate categories: music, dialogue, and sound effects (Wilhelmsson & Wallén, 2011). Music is a type of mood-setting technique that typically coincides with the theme of a game. Music is not considered as part of this study because it is typically non-diegetic in nature and music is “heavily controlled by tempo” (Cunningham et al., 2011) rather than the sound properties studied here. Dialogue is diegetic but is not considered in this chapter because properties of speech, such as intonation, may be more important than the selected properties of volume, timing, and source in their impact on players. Sound effects are diegetic game sounds, such as ambient, weapon, and environmental sounds. Examples of sound effects are: ambient noises such as rustling leaves and the steady drip of rain; player avatar sounds that are not related to dialogue, such as pained grunts; and weapon noises, such as the crack of a rifle or the swing of a club. To understand how the sound properties of sound effects are used by game designers to cause fear and anxiety, the state of the art in sound design for games is reviewed in this section. Specifically, computer games in the survival horror genre are reviewed using the sound properties of volume, timing, and source. Survival horror games provide good case studies because they are designed to keep the player in a state of fear, suspense, and anxiety throughout the game: “Crawling with monsters, survival horror games make wonderful use of surprise, attack, appearances, and any other disturbing action that happens without warning” (Perron, 2004, p. 2).
The games chosen for this field review are: Alone in the Dark (2008), Dead Space (2008), Doom 3 (2004), Eternal Darkness (2002), and Silent Hill 2 (2001). The following
provides an overview of the sound design of the five different survival horror games selected:
• Alone in the Dark: This game uses high quality visuals coupled with interspersed moments of surprise to cause player fear and anxiety.
• Dead Space: This game has an abundance of ambient sound effects and clutter to add to the realism and increase the player’s anxiety and suspense. Dead Space also uses a combination of well-timed and high-volume sound effects to elicit fear responses from the player, and has received praise for its sound design.
• Doom 3: The soundscape of Doom 3 focuses on voice acting and ambient sound effects. The ambience succeeds in creating a mood of suspense, while the encounters with monsters focus on creating fear.
• Eternal Darkness: This game takes a minimalistic approach to sound design and only uses sounds very sparingly. This approach allows the player to hear what few sounds are in the game with little difficulty and this increases the effect of each sound.
• Silent Hill 2: This is an older game that uses a minimalist approach to sound design, like Eternal Darkness, but with more of an emphasis on game sounds without visible sources to create suspense.
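The three sound properties that this section treats as independent variables can be written down as a small data model. The following Python sketch is only an illustration: the level names (low, medium, and high volume; timed and untimed; sourced and acousmatic) are taken from the chapter's abstract, while the class names and the full crossing of all levels are assumptions made here and are not presented as the authors' experimental design.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Volume(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Timing(Enum):
    TIMED = "timed"      # synchronized with a visible event
    UNTIMED = "untimed"  # played without regard to a specific event


class Source(Enum):
    SOURCED = "sourced"        # the origin of the sound is visible
    ACOUSMATIC = "acousmatic"  # the origin of the sound is not visible


@dataclass(frozen=True)
class SoundEffectCondition:
    """One combination of the three properties (hypothetical type)."""
    volume: Volume
    timing: Timing
    source: Source


def all_conditions():
    """Enumerate every combination of the three properties.

    The full crossing is an assumption for illustration only.
    """
    return [
        SoundEffectCondition(v, t, s)
        for v, t, s in product(Volume, Timing, Source)
    ]
```

Listing the combinations this way makes it easy to see which pairings a game review or a test level actually covers, for example a high-volume, timed, sourced effect versus a medium-volume, untimed, acousmatic one.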
Volume of Sound Effects
While game designers decide on what level of volume to play their sound effects relative to other sounds, players change the overall volume of the game sounds emitting from their loudspeakers at will. Thus, the game designer can only manipulate the magnitude of volume in relation to other sounds. A “loud” sound has a higher volume than the average sounds currently playing. A “soft” sound has a lower volume than the average. Psychoacoustic research suggests that the lower
in volume a sound effect is, the more likely that players will miss the sound (Healy, Proctor, & Weiner, 2003). For game designers, this means they should create important sound effects that are at least as loud as, if not louder than, the ambient soundscape in order to be perceived. Loud sound effects are more likely to be effective at evoking sudden and shocking emotions in the player than soft sounds. Softer sounds, however, may serve as a good atmospheric tool that can enhance immersion and set a mood. Computer games typically play ambient sound effects at low to medium volume, while emotion-evoking sound effects play at medium to high volumes to maximize the likelihood that the player perceives them. For instance, in Alone in the Dark, the ambient sound effects such as electrical sparks and raging fires are typically abrupt and louder than the background music but soft enough that they do not drown out other important sounds like dialogue and combat sound effects. The ambient sound effects of Dead Space consist of steam vents leaking, garbage rustling, and lights sparking, and are all low to medium volume. In these cases, it is not clear whether the ambient sounds solely promote greater immersion and/or promote anxiety. In Dead Space, the player’s interaction with the monsters is the most important part of the game, and thus, the sound effects related to these interactions are the loudest. For instance, any time the player engages in combat, the monsters screech loudly and very discernibly until they die. The scream instantly tells the player that his or her avatar is in danger and, because the monsters can kill the player’s avatar, these sound effects can cause fear in the player. All of the ambient sounds and music in Doom 3 are loud: they mask out almost all other game sound. The enemy sounds are quieter than the ambient noise, which causes the enemies to seem less menacing.
The only sounds consistently louder than the music and the ambient noise are the player’s gun and the avatar’s pain screams. Doom 3
seems to focus more on visual quality than sound quality. Yet, one section in Doom 3 that stands out among the rest occurs when a screaming, flying skull circles around the player’s avatar, and its volume rises and falls based on its distance from that avatar. This part of the game causes fear due to the perceived danger from the sudden and loud sounds accompanied by the mysterious nature of the flying skulls, though over time the fear subsides as the player becomes more habituated to the situation. Furthermore, Doom 3 has an enemy ambush almost every time the player’s avatar picks up an item. The game has the same sound and the same enemy for many of these ambushes. This eventually becomes repetitive and boring, and players begin to anticipate the ambushes. Eternal Darkness and Silent Hill 2 use the technique of “less is more,” where a few high volume sound effects scattered throughout evoke more fear and anxiety than many high volume recurring sound effects that may eventually cause habituation, as in Doom 3. The use of volume with sound effects varies depending on whether the game designer is attempting to evoke fear or anxiety. Based on the above review of games, high volume, abrupt sound effects seem to be more effective at causing fear, while low to medium volume ambient sound effects may be more effective at creating suspense and anxiety by convincing the player that they are in a dangerous circumstance.
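The relative definition of “loud” and “soft” used in this section can be illustrated with a short sketch. It is not drawn from any of the games reviewed: the function name, the 0.0 to 1.0 mixer gain scale, and the tolerance band are assumptions made purely for illustration.

```python
from statistics import mean


def classify_relative_volume(gain, active_gains, tolerance=0.05):
    """Label a sound relative to the average of the sounds currently playing.

    gain and active_gains are hypothetical mixer gains between 0.0 and 1.0.
    tolerance is an assumed band around the average that counts as "average".
    """
    if not active_gains:
        raise ValueError("at least one currently playing sound is required")
    average = mean(active_gains)
    if gain > average + tolerance:
        return "loud"   # higher than the average of the current mix
    if gain < average - tolerance:
        return "soft"   # lower than the average of the current mix
    return "average"


# Hypothetical ambient mix: steam vent, rustling garbage, sparking light.
ambient_mix = [0.4, 0.5, 0.6]
monster_screech = classify_relative_volume(0.9, ambient_mix)
distant_drip = classify_relative_volume(0.2, ambient_mix)
```

Because the player controls the master volume, only this kind of relative comparison is under the designer's control; the absolute level at the loudspeaker is not.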
Timing of Sound Effects
Game designers decide for each sound effect one of three alternatives for timing: (1) the sound effect is timed to coincide with a corresponding, often visible event or object, (2) the sound effect and the corresponding event or object lag each other, or (3) the sound effect is played without regard to corresponding specific event(s), that is, untimed. Thus, timing can be conceptualized as the degree of synchronization between the sound effect and visible object(s) (see Roux-Girard, 2011). For
instance, when an enemy ambush in a game begins, an appropriate sound effect accompanies the ambush (the game sound is highly synchronized with the visible event), such as a door swinging open or glass shattering. The intended purpose of these synchronized sound effects can be to surprise or startle the player, which can promote fear. While many survival horror games use ambushes regularly, such as the majority of encounters in Dead Space and Doom 3, Silent Hill 2 seldom uses ambushes to scare the player. Rather, this game does the opposite by playing a radio static sound loop, emanating from the avatar’s pocket radio, to warn players of nearby enemies. The player quickly learns that the white noise is a forewarning of an imminent attack. This lagging technique, coupled with the extremely limited visibility in the game, causes players to search for the source of the static whenever they hear it. Players know that a dangerous situation is nearby, which often causes players to feel suspense. This forewarning is an emotional and cognitive cue for problem solving (Perron, 2004). Untimed environmental sound effects are present in Alone in the Dark and Silent Hill 2. In these games, sounds such as crackling fire, whistling wind, or shaking earth are seemingly set to play at random.
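The three timing alternatives described in this section can be sketched as a simple classifier. This is an illustrative sketch only: the function name, the labels, and the 0.1-second synchronization window are assumptions and are not taken from the games or from the study.

```python
def classify_timing(sound_onset_s, event_onset_s=None, sync_window_s=0.1):
    """Classify a sound effect by its timing relative to a visible event.

    sound_onset_s: time (seconds) at which the sound starts.
    event_onset_s: time at which the corresponding visible event starts,
                   or None if the sound has no corresponding event.
    sync_window_s: assumed tolerance within which the two count as synchronized.
    """
    if event_onset_s is None:
        return "untimed"                 # alternative (3): no specific event
    offset = sound_onset_s - event_onset_s
    if abs(offset) <= sync_window_s:
        return "synchronized"            # alternative (1): coincides with event
    # alternative (2): the sound and the event lag each other
    return "sound leads event" if offset < 0 else "sound lags event"


# Glass shattering as an ambush begins (hypothetical timestamps).
ambush = classify_timing(10.0, 10.05)
# Radio static that starts well before the enemy becomes visible.
radio_static = classify_timing(5.0, 12.0)
# Whistling wind with no corresponding event.
wind = classify_timing(3.0)
```

Under these assumptions, the Silent Hill 2 radio static falls into the second category with the sound leading the event, which is what lets it act as a forewarning.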
Source of Sound Effects
According to psychoacoustic theories, humans judge whether a sound comes from an appropriate source by whether a source is visibly available and whether that source could sensibly create the sound (Healy, Proctor, & Weiner, 2003). Almost all games have clearly visible sources for their sound effects, such as the sound of an attacking enemy. Providing a visible source of sound helps the player determine what to do within the game, helps the player navigate through the game by listening (Grimshaw, 2008), and enhances the survival prospects of the player’s avatar (Roux-Girard, 2011). For instance, the player can listen for the location of monsters. Furthermore, ambient game sounds help to immerse the player by bridging the reality gap between the game and real physical environments (Liljedahl, 2011). Some examples of ambient sound effects are leaking ventilation shafts in Alone in the Dark, sparking electrical wires in Dead Space, and rustling leaves in Eternal Darkness, which are all visible to the player when played. These ambient sound effects also help set the mood of the game for players, which may encourage players to appraise objects and events in the game as scary.
In Silent Hill 2, however, most ambient sound effects have no visible source. Players of Silent Hill 2 are unable to find the source of these sounds (that is, acousmatic sounds), such as babies crying, discordant wind chimes clanging together, and tricycle bells ringing. Furthermore, monsters’ sounds are mixed at low volume within the nondiegetic music (Roux-Girard, 2011). These sound production techniques add a strong air of mystery and ambiguity between sound generators (Roux-Girard, 2011), which may cause anxiety.
Based on the literature and field review, our experimental hypothesis for designing sound effects for fear and anxiety is as follows: sound effects that are high in volume, synchronized with the corresponding visual stimulus, and visibly sourced are more effective at creating fear, whereas low to medium volume, scary, eerie, or mysterious acousmatic sound effects are more effective at creating anxiety, with no difference between timed and untimed sound effects for anxiety. The authors believe that if the source of the sound effect can be seen by the player then the synchresis of timing with the visible threat becomes salient, promoting veridicality (Collins et al., 2011; Roux-Girard, 2011) and resulting in the player feeling fear. If the source cannot be seen, that is, if the sound is acousmatic, then synchresis is not achieved and the player cannot determine the relationship between what the player sees and what the player hears: this should promote anxiety.
If the sound effect is not timed with the visible threat, then the sound effect will probably be ignored and will not promote fear or anxiety. Furthermore, timed sound effects (synchronized game sound and visual threat) will not promote anxiety because the threat becomes a clear stimulus. Thus, timing should not promote anxiety. Fear and anxiety were measured in the experiment discussed below, rather than suspense, because fear and anxiety are considered separate emotions, whereas suspense is the overlap of these emotions, which could confound the interpretation of the results. Our hypothesis was tested using the methodology described in the next section.
EXPERIMENTAL EVIDENCE
The hypothesis, as stated in the previous section, was tested using a survival horror game level in Gears of War (2007), which was created in Unreal Editor 3. During each play-through, the participant heard one randomly selected alternative (wolf howl, gunfire, or wretch growl) for the volume test, one randomly selected alternative (thunder, boomer growl, or creaking door) for the timing test, and one randomly selected alternative (locust growl, glass shattering, or footsteps) for the source test. Both quantitative data (from 7-point self-report surveys) and qualitative data were gathered and analyzed. Although there can be issues with “after-the-fact narration” (Nacke & Grimshaw, 2011) by participants completing self-report surveys and interviews, the use of these indirect measures is a common approach to data gathering in research on emotions. Thirty-four participants in the U.S.A. (ten females and twenty-four males) took part in the study. The participants’ average age was 26 years. The average playing time per week was about eleven hours, and fifteen participants (approximately 44%) liked playing survival horror games. (For a full exposition of the methodology and results, see Amdel-Meguid, 2009.)
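The per-participant selection described above can be sketched as follows. The sound names come from the text, but the code itself is an assumption: the chapter does not say how the selection was implemented, so uniform random choice, the dictionary layout, and the function name are all hypothetical.

```python
import random

# Alternatives listed in the chapter for each of the three tests.
TEST_SOUNDS = {
    "volume": ["wolf howl", "gunfire", "wretch growl"],
    "timing": ["thunder", "boomer growl", "creaking door"],
    "source": ["locust growl", "glass shattering", "footsteps"],
}


def assign_conditions(rng):
    """Pick one alternative per test for a single play-through.

    Assumes each alternative is equally likely; the original study may
    have used a different scheme.
    """
    return {test: rng.choice(sounds) for test, sounds in TEST_SOUNDS.items()}


# A seeded generator makes the hypothetical assignment reproducible.
assignment = assign_conditions(random.Random(0))
```

Passing the random generator in as an argument keeps the assignment reproducible, which is useful when a play-through has to be matched to its survey responses afterwards.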
Causing Fear Findings
Results showed a statistically significant (p < 0.05) and large difference (η² > 0.14) in fear due to the volume of sound effects between low volume sound and high volume sound, as well as between medium volume sound and high volume sound. No meaningful qualitative data was gathered for volume-related fear responses. For timing, results showed a statistically significant and very large (see eta squared, Cohen, 1988) increase in fear for timed compared with untimed sound effects. Qualitative data showed that timed sound effects enhanced the fear of many players when accompanied by a visual gameplay element, such as the presence of an in-game enemy, though the sound effect by itself may not have substantially elicited fear. Findings for fear related to sourced sound effects appeared to be considerable but they were not statistically significant. Several participants verbally reported that the acousmatic sound effects, such as a breaking window or footsteps on the ceiling, evoked fear. In particular, participants reported that the acousmatic sound effect required them to be attentive to possible threats.
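The effect sizes reported above are eta squared values. As a minimal sketch of how such a value can be computed for a one-way comparison of groups, the example below uses hypothetical 7-point ratings (the numbers are invented for illustration and are not data from the study) together with the commonly cited benchmarks of 0.01, 0.06, and 0.14 for small, medium, and large effects. The function names are assumptions.

```python
def eta_squared(groups):
    """Eta squared for a one-way design: SS_between / SS_total."""
    values = [x for group in groups for x in group]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    if ss_total == 0:
        raise ValueError("all ratings are identical; eta squared is undefined")
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_between / ss_total


def effect_size_label(eta2):
    """Map eta squared onto the commonly cited small/medium/large benchmarks."""
    if eta2 >= 0.14:
        return "large"
    if eta2 >= 0.06:
        return "medium"
    if eta2 >= 0.01:
        return "small"
    return "negligible"


# Hypothetical fear ratings (1 to 7) for low, medium, and high volume.
low, medium, high = [2, 3, 2, 3], [3, 3, 4, 2], [5, 6, 5, 6]
eta2 = eta_squared([low, medium, high])
label = effect_size_label(eta2)
```

Eta squared is the share of the total variance in ratings that is attributable to group membership, so a value above 0.14 means the manipulated sound property accounts for more than 14% of that variance.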
Causing Anxiety Findings
Results showed a statistically significant and large difference in anxiety due to the volume of sound effects between low volume sound and medium volume sound, as well as between low volume sound and high volume sound. No meaningful qualitative data was gathered for volume-related anxiety responses. There was not a statistically significant difference between timed and untimed sound effects for anxiety. Qualitative data showed that untimed sound effects caught some players off guard, because they could not determine whether the sounds were meant to signal danger, or if they were benign sounds.
There was not a statistically significant difference between sourced and acousmatic sound effects. Some participants reported that they felt anxiety when they could not find the source of a sound effect. Other participants stopped and looked around when an acousmatic sound played, as they did for untimed sound effects, and proceeded to search for the source of the sound effect. One player stated that not finding the source of an untimed sound effect made him worry that he had possibly missed something important.
Discussion
Causing Fear Discussion
Results strongly suggest that high volume sound effects are most effective at causing fear in players. The quantitative data showed a significant, large increase in reported fear responses when a sound effect is louder than the rest of the sounds in the game. For game designers, this implies that the louder they make a sound effect, relative to other sounds, the more effective it is at promoting fear in a player.
In addition, results strongly suggest that sound effects timed to coincide with a visual gameplay element, such as an in-game enemy, are effective at eliciting fear. Quantitative data showed a significant, large increase in fear responses due to timed sound effects compared to untimed sound effects. However, the qualitative data showed that players may not have reacted with fear to the sound itself, but, rather, fear was primarily evoked by the accompanying visual gameplay element. That is, a well-timed sound effect amplifies attention to the gameplay element and enhances the initial fear response caused by the visual perception of that element. The synchronization of the sound and corresponding image enhanced the feeling of fear through the process of synchresis, which promoted veridicality. For game designers, this implies that accompanying a visual gameplay element with a well-timed, appropriate sound effect is more effective at causing fear in players than introducing the gameplay element without a sound effect or with a mistimed sound effect.
Finally, there were mixed results as to whether sourced sounds elicit more fear than acousmatic sound effects. Quantitative data did not show a significant increase in fear responses to sourced sounds compared to acousmatic sound effects. However, qualitative data suggested that an acousmatic sound effect drew attention to a potentially imminent danger in the game, which may have put players in a state of suspense. If this is the case, some participants may not have reported that sourced sound effects evoked significantly more fear than acousmatic sounds because they considered their suspense responses to be closer to fear than anxiety.
Causing Anxiety Discussion
Results showed that medium and high volume sound effects are significantly and substantially more effective at eliciting anxiety in players than low volume sound effects. Perhaps low volume sound effects are not easily perceived because they become masked amidst other higher volume sounds. In contrast, high volume sound effects are easily perceived but not necessary, because there is no significant difference between medium and high volume sound effects for evoking anxiety. Furthermore, given that high volume sound seems to elicit fear reactions from players, the use of this sound technique for evoking anxiety should be avoided, because of its potential confounding effect. For game designers, this implies that the best volume at which to play anxiety-causing sound effects is the same volume as the average soundscape in the game. Low volume sound effects, perhaps, should be used to immerse the player by generating the ambience and mood of the game (Roux-Girard, 2011) rather than to promote specific emotions. The quantitative results did not show a significant change in anxiety between timed and untimed sound effects. Qualitative results indicated that
some players were concerned by the untimed sounds. However, this concern in finding the nature of the untimed sound effect seems to be more about not knowing the source of the sound effect rather than the timing. Likewise, the quantitative results did not show a significant difference between sourced and acousmatic sound effects in evoking anxiety. Nevertheless, the qualitative results indicate that when some players heard an acousmatic sound, they stopped and looked around in an attempt to find the source. Being unable to find the source caused some players to become concerned that something dangerous would occur later in the game, which could mean that the players were in a state of suspense. And, as noted previously, players may have considered suspense to be more of a fear emotion than anxiety, which would have resulted in less reporting of anxiety. The overall implication to game designers is that playing a threatening or eerie sound effect without a visual source may be better at causing suspense in players than accompanying a sound with a visual threat. Furthermore, untimed sound effects can also promote suspense if the player perceives it as independent from the visual source, at which point the player would appraise the sound effect as acousmatic.
CONCLUSION
The aim of the current chapter was to provide a theoretical foundation for the study of evoking emotions using sound design and determine how to cause fear and anxiety through sound design in computer games. The literature and field review that focused on human emotion theory and survival horror games provided an understanding of basic sound design principles of volume, timing, and source in relation to the emotions of fear and anxiety. This study used qualitative and quantitative methods to determine the best use of volume, timing, and source of diegetic sound effects to cause fear and anxiety in players.
The results of this study strongly suggest that the best sound design for causing fear is high volume sound effects that are well-timed with the accompanying visual element. This may seem obvious, but this study has provided statistical validity for using this technique, and these results can be used as a basis for further research. For anxiety, results strongly suggest that the best sound design is medium volume sound effects. Furthermore, qualitative data suggest that suspense was evoked by untimed and acousmatic sound effects. And, although results suggested that medium volume sound effects were able to promote anxiety, players may have been in a state of suspense at this time, as well. Low volume acousmatic sound effects appear not to be effective at evoking fear and anxiety, and possibly any emotion, due to their tendency to become masked by other sounds. Perhaps low volume sound effects may be best used for enhancing immersion or mood.
An interesting interpretation of the current study’s evidence is that anxiety, as a separate gameplay emotion, is difficult to evoke on its own. Rather, the combination of fear and anxiety, that is, suspense, is easier to promote, and probably more desirable. Players play survival horror games to experience fear and suspense (Perron, 2005b). Anxiety is too diffuse and vague to be compelling for players to experience in survival horror games. Players of these games would rather have a more direct and powerful emotional response to perceived events and gameplay.
This chapter provided quantitative and qualitative evidence that game designers can manipulate the sound properties of volume, timing, and source to evoke fear, suspense, and anxiety in players. The literature and field review, methodology, and results of this study can serve as a foundation for future research.
FUTURE RESEARCH DIRECTIONS
One of the limitations of this study is that most participants were male and tended to be heavy gamers. Future studies may want to focus on populations that better represent female and/or casual gamers. Another limitation is that all the sound effects were included in one level and there was possible interaction between the volume, timing, and source variables. For instance, the volume of a sound effect may influence the emotional impact of timing and/or source. In order to control for these possible confounding factors, a future study may focus on the effect of solely one sound property. Furthermore, as in most studies, a larger sample should yield better external validity.
One possible research direction is to continue studying fear, suspense, and anxiety beyond the parameters of the study described in this chapter. For instance, are the sound design techniques the same for causing fear, suspense, and anxiety in game genres other than survival horror? What is the difference between male and female responses to sound design techniques that cause fear, suspense, and anxiety? What are the effects of the absence of sound on fear, suspense, and anxiety? What is the relationship between visual gameplay elements and sound effects on how they affect players’ fear, suspense, and anxiety? How can other types of sounds, such as music and dialogue, increase fear, suspense, and anxiety? How do other sound properties affect fear, suspense, and anxiety?
Finally, future research could study other emotions using the same or other sound properties. For example, how can game designers elicit the emotions of anger, joy, and sadness in players through sound design? This research would lead to inquiry into the questions raised previously, such as the effect of genre, visual gameplay elements, and the type of sound evoking the studied emotion. From this research, we would not only understand how to promote certain emotional experiences from playing computer games through the use of sound design, but we may also be able to add new insights and dimensions to emotional theories, as well.
REFERENCES
Alone in the Dark. (2008). Eden Games.
Amdel-Meguid, A. A. (2009). Causing fear and anxiety through sound design in video games. Unpublished master’s thesis. Southern Methodist University, Dallas, Texas, USA.
Anderson, J. D. (1996). The reality of illusion: An ecological approach to cognitive film theory. Carbondale, IL: Southern Illinois University Press.
Barlow, D. H. (1988). Anxiety and its disorders: The nature and treatment of anxiety and panic. New York: Guilford Press.
Carr, D. (2006). Space, navigation and affect. In Carr, C., Buckingham, D., Burn, A., & Schott, G. (Eds.), Computer games: Text, narrative and play (pp. 59–71). Cambridge, UK: Polity.
Carroll, N. (1996). The paradox of suspense. In Vorderer & Friedrichsen (Eds.), Suspense: conceptualization, theoretical analysis, and empirical explorations (pp. 71-90). Hillsdale N.J.: Lawrence Erlbaum Associates.
Cohen, J. W. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Collins, K. (2007). An introduction to the participatory and non-linear aspects of video games audio. In Hawkins, S., & Richardson, J. (Eds.), Essays on sound and vision (pp. 263–298). Helsinki, Finland: Helsinki University Press.
Collins, K., Tessler, H., Harrigan, K., Dixon, M. J., & Fugelsang, J. (2011). Sound in electronic gambling machines: A review of the literature and its relevance to game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Cornelius, R. R. (1996). The science of emotion. Upper Saddle River, NJ: Prentice-Hall.
Creswell, J. (2005). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (2nd ed.). Upper Saddle River, New Jersey: Pearson Education.
Cunningham, S., Grout, V., & Picking, R. (2011). Emotion, content and context in sound and music. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Dead Space. (2008). Electronic Arts.
Doom 3. (2004). Activision.
Eternal Darkness. (2002). Nintendo.
Folkman, S., & Lazarus, R. S. (1990). Coping and emotion. In Leventhal, N. B., & Trabasso, T. (Eds.), Psychological and biological approaches to emotion (pp. 313–332). Hillsdale, NJ: Erlbaum.
Gears of War. (2007). Microsoft.
Gebeke, D. (1993). Children and fear. Retrieved December 10, 2009, from http://www.ag.ndsu.edu/pubs/yf/famsci/he458w.htm.
Gray, J. A. (1971). The psychology of fear and stress. New York: McGraw-Hill.
Greeno, J. G., Collins, A. M., & Resnick, L. B. (1996). Cognition and learning. In Berliner, D., & Calfee, R. (Eds.), Handbook of educational psychology (pp. 15–46). New York: Simon & Schuster Macmillan.
Grimshaw, M. (2007). Sound and immersion in the first-person shooter. In Proceedings of The 11th International Computer Games Conference: AI, Animation, Mobile, Educational & Serious Games (CGAMES 2007).
Grimshaw, M. (2008). The acoustic ecology of the first-person shooter: The player experience of sound in the first-person shooter computer game. Saarbrucken, Germany: VDM Verlag.
Gullone, E., King, N., & Ollendick, T. (2000). The development and psychometric evaluation of the Fear Experiences Questionnaire: An attempt to disentangle the fear and anxiety constructs. Clinical Psychology & Psychotherapy, 7(1), 61–75. doi:10.1002/(SICI)10990879(200002)7:13.0.CO;2-P
Healy, A. F., Proctor, R. W., & Weiner, I. B. (2004). Handbook of psychology: Vol. 4. Experimental psychology. Hoboken, NJ: Wiley.
Kaplan, H. I., & Sadock, B. J. (1998). Synopsis of psychiatry. Baltimore, MD: Williams & Wilkins.
Krzywinska, T. (2002). Hands-on horror. In King, G., & Krzywinska, T. (Eds.), ScreenPlay: Cinema/Videogames/Interfaces (pp. 206–223). London: Wallflower.
Liljedahl, M. (2011). Sound for fantasy and freedom. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Lincoln, Y. S., & Guba, E. D. (1985). Naturalistic inquiry. Thousand Oaks, CA: Sage Publications, Inc.
May, R. (1977). The meaning of anxiety (revised ed.). New York: Norton.
Nacke, L., & Grimshaw, M. (2011). Player-game interaction through affective sound. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Perron, B. (2004). Sign of a threat: The effects of warning systems in survival horror games. In Proceedings of the Fourth International COSIGN (Computational Semiotics for Games and New Media) 2004 Conference.
Perron, B. (2005a). A cognitive psychological approach to gameplay emotions. In Proceedings of the Second International DiGRA (Digital Games Research Association) 2005 Conference.
Perron, B. (2005b). Coming to play at frightening yourself: Welcome to the world of horror video games. In Proceeding of the Aesthetics of Play conference.
Plutchik, R. (1984). Emotions: A general psychoevolutionary theory. Hillsdale, NJ: Erlbaum.
Roux-Girard, G. (2011). Listening to fear: A study of sound in horror computer games. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Serafin, S. (2004). Sound design to enhance presence in photorealistic virtual reality. In Proceedings of the 2004 International Conference on Auditory Display.
Silent Hill 2. (2001). Konami.
Wilhelmsson, U., & Wallén, J. (2011). A combined model for the structuring of game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey: IGI Global.
Wolfson, S., & Case, G. (2000). The effects of sound and colour on responses to a computer game. Interacting with Computers, 13, 183–192. doi:10.1016/S0953-5438(00)00037-0
Zillman, D. (1991). The logic of suspense and mystery. In Bryant, J., & Zillman, D. (Eds.), Responding to the screen: Reception and reaction processes (pp. 281–303). Hillsdale, NJ: Lawrence Erlbaum Associates.
ADDITIONAL READINGS
Bridgett, R. (2008). Post-production sound: A new production model for interactive media. Soundtrack, 1(1), 29–39. doi:10.1386/st.1.1.29_1
Barbara, S. C. (2003). Hearing in three dimensions. The Journal of the Acoustical Society of America, 113(4), 2200–2200.
Calvert, S. L., & Scott, M. C. (1989). Sound effects for children’s temporal integration of fast-paced television content. Journal of Broadcasting & Electronic Media, 33(3), 233–246.
Gärdenfors, D. (2003). Designing sound-based computer games. Computer Creativity, 14(2), 111–114.
Grimshaw, M. (2009). The audio uncanny valley: Sound, fear and the horror game. In Proceedings of the Audio Mostly 2009 Conference.
Houlihan, K. (2003). Sound design: The expressive power of music, voice, and sound effects in cinema. Journal of Media Practice, 4(1), 69–69. doi:10.1386/jmpr.4.1.69/0
Izard, C. E. (2009). Emotion theory and research: Highlights, unanswered questions, and emerging issues. Annual Review of Psychology, 60(1), 1–25. doi:10.1146/annurev.psych.60.110707.163539
Jennett, C., & Cox, A. L. (2008). Measuring and defining the experience of immersion in games. International Journal of Human-Computer Studies, 66(9), 641–661. doi:10.1016/j.ijhcs.2008.04.004
Jones, K. (2004). Fear of emotions. Simulation & Gaming, 35(4), 454–460. doi:10.1177/1046878104269893
Jørgensen, K. (2007). On transdiegetic sounds in computer games. Northern Lights: Film & Media Studies Yearbook, 5(1), 105–117. doi:10.1386/nl.5.1.105_1
Klimmt, C., Rizzo, A., Vorderer, P., Koch, J., & Fischer, T. (2009). Experimental evidence for suspense as determinant of video game enjoyment. Cyberpsychology & Behavior, 12(1), 29–31. doi:10.1089/cpb.2008.0060
Kofler, A. (1997). Fear and anxiety across continents: The European and the American way. Innovation: The European Journal of Social Sciences, 10(4), 381–404.
Kyosik, K., & Hyungtai, C. (2008). Enhancement of a 3D sound using psychoacoustics. International Journal of Biological & Medical Sciences, 1(3), 151–155.
Levitt, H. (1971). Transformed up-down methods in psychoacoustics. The Journal of the Acoustical Society of America, 40, 467–477. doi:10.1121/1.1912375
Liu, M., Toprac, P., & Yuen, T. (2008). What factors make a multimedia learning environment engaging: A case study. In Zheng, R. (Ed.), Cognitive Effects of Multimedia Learning. Hershey, PA: Idea Group Inc.
Portnoy, S. (1997). Unmasking sound: Music and representation in The Shout and Blue. Spectator: The University of Southern California Journal of Film & Television, 17(2), 50–59.
Raghuvanshi, N. (2007). Real-time sound synthesis and propagation for games. Communications of the ACM, 50(7), 66–73. doi:10.1145/1272516.1272541
Roberts, J. R. (2006). Influence of sound and vibration from sports impacts on players’ perceptions of equipment quality. Journal of Materials: Design & Applications, 220(4), 215–227.
Robertson, H. (2004). Random noises. Videomaker, 19(4), 71–74.
Satoru, O., & Shigeru, A. (2003). Video game apparatus, background sound output setting method in video game, and computer-readable recording medium storing background sound output setting program. The Journal of the Acoustical Society of America, 114(3), 1208–1208.
Schafer, R. M. (1994). The soundscape: Our sonic environment and the tuning of the world. Rochester, VT: Destiny Books.
Sider, L. (2003). If you wish to see, listen: The role of sound design. Journal of Media Practice, 4(1), 5–15. doi:10.1386/jmpr.4.1.5/0
Tinwell, A., & Grimshaw, M. (2009, April). Survival horror games: An uncanny modality. Proceedings of the International Conference Thinking After Dark.
Yantas, A. E., & Azcan, O. (2006). The effects of the sound-image relationship within sound education for interactive media design. Computer Creativity, 17(2), 91–99.
Zwicker, E., & Fastl, H. (1990). Psychoacoustics facts and models. New York: Springer-Verlag.
KEY TERMS AND DEFINITIONS
Anxiety: A generalized mood condition that occurs without an identifiable triggering stimulus.
Cognitive Emotional Theory: Cognitive activity in the form of judgments, evaluations, or thoughts is necessary in order for an emotion to occur.
Darwinian Emotional Theory: Emotions evolved via natural selection and therefore have cross-culturally universal counterparts.
Fear: An emotional response to a perceived threat.
James-Lange Emotional Theory: Emotional experience is largely due to the experience of bodily changes.
Psychoacoustics: The study of subjective human perception of sounds.
Sound Design: The manipulation of audio elements to achieve a desired effect.
Sound Principles: Components that most influence how sound is perceived.
Source: The object emitting sound.
Suspense: A feeling of fear and anxiety about the outcome of certain actions.
Timing: The degree of synchronization between the sound effect and visible object(s).
Volume: The amplitude or loudness of a sound.
Chapter 10
Listening to Fear: A Study of Sound in Horror Computer Games
Guillaume Roux-Girard
University of Montréal, Canada
ABSTRACT
This chapter aims to explain how sound in horror computer games works towards eliciting emotions in the gamer: namely fear and dread. More than just analyzing how the gamer produces meaning with horror game sound in relation to its overarching generic context, it will look at how the inner relations of the sonic structure of the game and the different functions of computer game sound are manipulated to create the horrific strategies of the games. This chapter will also provide theoretical background on sound, gameplay, and the reception of computer games to support my argument.
DOI: 10.4018/978-1-61692-828-5.ch010
INTRODUCTION
Computer game sound is as crucial to the creation of the depicted gameworld’s mood as it is in its undeniable support to gameplay. In horror computer games, this role is increased tenfold as sound becomes the engine of the gamer’s immersion within the horrific universe. From the morphology of the sound event to its audio-visual and videoludic staging, sound cues provide most of the information necessary for the gamer’s progression in the game and, simultaneously, supply a range of emotions from simple surprise to the most intense terror. In horror computer games, it is not recommended that the gamer divert their attention from the various sound events, as careful listening will allow for, or at least favour, the survival of their player character. In his thesis on the sound ecology of the first-person shooter, Mark Grimshaw (2008) underlined that in everyday life, where dangers are limited, the auditory system “can operate in standby mode (or, in cognitive terminology, [the] auditory system is operating at a low level of perceptual readiness) awaiting more urgent signals as categorized by experience” (p. 10). Just as Grimshaw did about the genre at the heart of his study, I suggest that “the hostile world of the [horror computer] game
requires a high level of perceptual readiness in regard to sound” (p. 10). The level of attention required vis-à-vis sound is all the greater because computer game environments are often designed to limit the visual perception of the gamer. Whether it is by means of a constraining virtual camera system (Taylor, 2005), by using stylistic effects such as the thick fog shrouding the streets of Silent Hill (Konami, 1999), or by drastically reducing sources of light, game designers have, over time, found a variety of ways to force the gamer to use their ears in order to help their player character survive in the nightmarish worlds in which they play.
To fully comprehend how horror computer games manage to frighten the gamer, one must understand how sound is structured, as well as be aware of how the gamer makes meaning with the information the sounds carry. From this point, many questions arise. What are the implications of the generic context on the reception of the sounds in horror computer games? On what basis should we approach the sound structure of those games? How does this structure allow for the mise en scène of the dreadful elements or horrific strategies of the games? What are the basic functions of horror computer game sounds and, once again, how can the game work on these functions to create a sentiment of fear and dread in the gamer? As will be further explored in the next sections of this chapter, I hypothesize that sound in computer games should be approached directly in regard to its purposes towards gameplay. After all, gameplay is what mainly distinguishes computer games from their linear audio-visual counterparts: the main difference between computer games and films lies in the participatory and interactive nature of the videoludic medium. Therefore, it is mainly through a study of gameplay that a true understanding of the role of game sound can be achieved.
In this perspective, I also suggest that sound should be addressed in a way that is accessible to both designers and the average gamer. In order to do so, I firmly
believe that adopting a position that emphasizes reception issues of gameplay can provide a more productive model than one that would be grounded directly in the production aspects (implementation and programming) of game sound. Overall, this text aims to explain how horror game sound works to elicit specific emotions in the gamer. Adopting a gamer- and gameplay-centric perspective, it seeks to highlight how the inner relations of the sonic structure and the different functions of game sound are used to create strategies based on the micro events and on the overarching generic context that regulates these events. With examples borrowed from the Alone in the Dark (I-motion, 1992-1995, Infogrames, 2001 & Atari, 2008), Resident Evil (Capcom, 1996-2009) and Silent Hill (Konami, 1999-2008) series, and from the computer game Dead Space (Electronic Arts, 2008), this chapter will also try to demonstrate how the notion of genre, rather than being merely a tool to classify games, shapes the expectations of the gamer and therefore structures the way they organize and make meaning of sound in relation to the game context.1
APPROACHING HORROR COMPUTER GAME SOUND
Before we try to understand what purposes sounds serve in horror computer games and how they contribute to generating fear, it is essential to take a look at the numerous factors which condition the gamer’s journey and influence their listening throughout their gaming sessions.
The Horizon of Expectations
In her book Game Sound: An Introduction to the History, Theory, and Practice of Video Game Music and Sound Design, Karen Collins (2008) noted that “game [sound] has been significantly affected by the nature of technology […] and by the nature of the industry”
(p. 123). Indeed, economic and technological constraints are greatly responsible for the game’s aesthetic as the limits imposed by production time and hardware often force some designers to lessen the richness of the soundscape while encouraging others to find inventive ways to overcome these constraints.2 However, as Collins explained, the games themselves also affect game sound by means of their genre, narrative structure, and participatory nature. She pointed out that “[g]enre in games is particularly important in that it helps to set the audience’s expectations by providing a framework for understanding the rules of gameplay” (Collins, 2008, p. 123). Consequently, the horizon of expectations gamers have of the games is probably the first thing that will influence the production of meaning towards a sound. As Hans Robert Jauss (1982) explains:
The analysis of the literary experience of the reader [or the videoludic experience of the gamer] avoids the threatening pitfalls of psychology if it describes the reception and the influence of a work within the objectifiable system of expectations that arise for each work in the historical moment of its appearance, from a pre-understanding of the genre, from the form and themes of already familiar works, and from the opposition between poetics and practical language. (p. 22)
This horizon of expectations will thus be forged by the gamer’s previous experiences at playing computer games, particularly those in the horror genre, but also by their familiarity with broader horror mythology and conventions such as the ones found in movies and novels. We can also maintain that the notion of genre will play a determining role in the way game sound is produced and then received by the gaming community. This relationship between production and reception is fundamental to understanding the functions and evolution of sound in horror computer games.
Indeed, these games are both generically marked, being works “which rely on generic identification by an audience” (Neale, 2000, p. 28), and generically modelled, in that each “draws on and conforms to existing generic traditions, conventions and formulae” (Neale, 2000, p. 28).3 To be considered a horror game, a videoludic work must then be designed with the purpose of scaring the gamer and must be received as such by the gaming community, which will then treat this intention as a gaming constraint. Accordingly, sound must be exploited to support these design choices, and, to a certain degree, correspond to the expectations the games produce.
What is a Horror Computer Game?
Horror computer games generate fear through mechanisms specifically tied to their videoludic nature even though they often draw their strategies of mise en scène from their cinematographic counterparts’ conventions and mythologies (Whalen, 2004; Perron, 2004). Derived from the “adventure” genre (Whalen, 2004), these computer games exploit horror conventions on the plot level, often by opposing a lone individual, trapped inside a gloomy location, to a flock of bloodthirsty, monstrous creatures which he must confront, or sometimes run from, in order to survive. On the gameplay level, “the gamer has to find clues, gather objects [...] and solves puzzles” (Perron, 2004, p. 133). As mentioned previously, because these games normally limit vision through their formal and aesthetic treatments, sound will play a determining role in helping the gamer to gather the information about their environment that is necessary to stay alive. Horror computer games are not only designed to generate fear based on their narrative setting or the iconography they employ, but are also conceptualised to produce what Bernard Perron called gameplay emotions. According to Perron (2004), these games engender three different kinds of emotions: (1) fictional emotions which “are rooted in the fictional world and the concerns addressed by that world”, (2) artefact emotions emanating “from concerns related to the artefact, as well as
stimulus characteristics based on these concerns”, but mostly (3) gameplay emotions “that arise from the gamer’s action in the game-world and the consequent reactions of this world” (p. 132). While all horror computer games are provided with a more or less elaborate fictional setting, in the end, it remains a part of the experience of gameplay. For horror games to be effective, gameplay mechanics must have been designed with the intent of scaring the gamer, by limiting the quantity of ammunition, for instance.
How to Approach Horror Computer Game Sound?
In the introduction to Sound Theory, Sound Practice, Rick Altman (1992) claimed that cinema, rather than being seen as a self-centered text, should be perceived as an event. Traditionally, film studies modelled the production and reception as gravitating around the film-as-text. However, as Altman explained: “Viewed as a macro-event, cinema is still seen as centred on the individual film, but […] the textual center is no longer the focal point of a series of concentric rings” (pp. 2-3). Following this model, the film-as-text mostly serves as a “point of interchange” between the process of production and the process of reception which mutually influence one another. The film itself thus becomes a representation of this “dialogue” or this “event”. Computer games can be envisioned in a similar fashion. However, while technical aspects of computer game production might enlighten certain points about how the sounds are implemented and structured within the game code, I believe that it is not with regard to this code that horror computer games should be approached. While some PC games offer the option to look at how the files are organised on the disc, most computer games, particularly console games, do not. I will therefore be addressing sounds through the gameplay process of the individuals playing the game and all design matters will be dealt with in regard to creating this gameplay experience.
In this respect, the notion of genre will mostly serve as an overarching catalyst through which the gamer structures their journey in the games.4 Of course, the chapter will still deal with design issues such as the implementation of sound strategies; however, this will be done in order to investigate how designers build these stratagems out of their predictions of how the gamer potentially produces meaning through the sounds, with regard to generic constraints, during their gameplay activity. But then again, what is gameplay? In Half-Real, Jesper Juul (2005) approached the concept of gameplay using Richard Rouse’s definition as a basis: “A game’s gameplay is the degree and nature of the interactivity that game includes, i.e., how the [gamer] is able to interact with the game-world and how that game-world reacts to the choices the [gamer] makes” (Rouse in Juul, p. 87). To further elaborate on the question of gameplay, and to prevent a misunderstanding of the term, Juul added that “gameplay is not a mirror of the rules of a game, but a consequence of the game rules and the dispositions of the game players” (p. 88). Using this quotation as a starting point, and as a way to oppose the fallacy constructed by Manovich’s (2001) definition of an algorithm, Arsenault and Perron (2009) reminded us that “one of the misconceptions of gameplay which needs to be addressed springs out when one does not make a distinction between the process of playing games and the game system itself” (p. 110). Following their logic, gameplay must not be understood as “the” game system but as the “ludic experience” emerging from the relation that is established between the gamer and the game system.
Therefore, it is important to understand that through the eyes of a gamer, the experience of gameplay is not portrayed by a series of codes managed by an algorithm, nor a direct representation of the implementation of sound within this code.5 Consequently, I chose to exploit a terminology which facilitates the understanding of the gamer’s cognitive process during
gameplay. It will allow me to better illustrate how the gamer produces meaning of sounds as a means of completing their main objective: the survival of their player character. Arsenault and Perron (2009) defined computer games as “a chain of reactions” in which “[t]he [gamer] does not act so much as he reacts to what the game presents to him, and similarly, the game reacts to his input” (pp. 119-120). In other words, the gamer responds to events that were programmed by a designer (whose job partly consisted of predicting the gamer’s reactions to the proposed events), and then the game acts in response to the gamer’s input with other preprogrammed events fitting the new parameters. According to their gameplay- and gamer-centric model, the authors explained (a single loop of) gameplay through four steps in which “the game always gets the first turn to speak” (Arsenault & Perron, 2009):
• From the game’s database, the game’s algorithm draws the 3-D objects and textures, and plays animations, sound files, and finds everything else that it needs to represent the game state
• The game outputs these to the screen, speakers, or other peripherals. The gamer uses his perceptual skills (bottom-up) to see, hear and/or feel what is happening
• The gamer analyses the data at hand through his broader anterior knowledge (in top-down fashion) of narrative convention, generic competence, gaming repertoire, etc. to make a decision
• The gamer uses his implementation skills (such as hand-eye coordination) to react to the game event, and the game recognizes this input and factors it into the change of the game state. (pp. 120-121)
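The four steps of this single loop can be illustrated with a minimal Python sketch. It is only an illustration of the reception model under stated assumptions, not a description of how any actual game is programmed: the `GameState` class, the cue names, and the `repertoire` dictionary standing in for the gamer's generic competence are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    """Hypothetical game state: a threat flag and a record of gamer decisions."""
    enemy_nearby: bool = True
    log: list = field(default_factory=list)


def game_output(state):
    """Step 1: the game draws on its data and outputs cues (here, sound only)."""
    return ["off-screen moan"] if state.enemy_nearby else ["ambient hum"]


def perceive(cues):
    """Step 2: bottom-up perception; the gamer registers what is heard."""
    return list(cues)


def analyse(heard, repertoire):
    """Step 3: top-down analysis through the gamer's prior knowledge."""
    for cue in heard:
        if repertoire.get(cue) == "threat":
            return "retreat"
    return "advance"


def react(state, decision):
    """Step 4: the gamer's input is factored into the change of game state."""
    state.log.append(decision)
    if decision == "retreat":
        state.enemy_nearby = False  # assumption: retreating escapes the threat
    return state


def gameplay_loop(state, repertoire, turns):
    """Repeat the single loop; the game always gets the first turn to speak."""
    for _ in range(turns):
        cues = game_output(state)
        heard = perceive(cues)
        decision = analyse(heard, repertoire)
        state = react(state, decision)
    return state
```

For instance, a gamer whose repertoire marks the off-screen moan as a threat retreats on the first pass and advances on the second, once the game state has changed in response to that input.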
However, as the authors recalled, “the most obvious flaw of representing gameplay with a single circle is that the temporal progression—the
evolution of the gamer’s relationship with the game—is left aside” (Arsenault & Perron, 2009, p. 115). To correct this failing, Arsenault and Perron proposed a model, the Magic Cycle (Figure 1), that is based on three interconnected spirals: the heuristic spiral of gameplay, the heuristic spiral of narrative and the hermeneutic spiral. They also clarified that: “[t]he relationship to each other is one of inclusion: the gameplay leads to the unfolding of the narrative, and together the gameplay and the narrative can make possible some sort of interpretation” (p. 118). Their model also took into account the gamer’s experience in gaming and the horizon of expectations of the gamer that are shaped by their previous knowledge of the game or sometimes by an introductory cut scene. In the model, these are respectively represented by the dotted lines entitled “launch window” and by the inverted spiral. From this point the looping process described above will be “repeated countless numbers of time to make up the magic cycle” (p. 121) and to represent the mental image the gamer develops about the game (represented by the Game′ of the model). This perpetual process, alongside the implication of the generic context, will therefore allow for the mental organisation of sounds towards the gamer’s activity inside the game.
STRUCTURING HORROR COMPUTER GAME SOUND
When the gamer is engaged in a horror game, the exercise of gameplay requires them to organise sounds according to their gaming objectives which, in the case of the genre we study here, mainly revolve around allowing their player character to survive the horrors of the game. In order to do so, the gamer tries to answer two basic questions regarding game sound: (1) from where does that sound originate? and (2) what is the cause of that sound? I therefore propose to explore a basic sound structure that will effectively represent the
Figure 1. Arsenault & Perron’s Magic Cycle. (© 2009, Arsenault and Perron. Used with permission)
cognitive process (as previously explained with Arsenault and Perron’s model) that is performed almost unconsciously by the gamer while playing a horror computer game.
Inside and Outside of the “Diégèse”
While glancing at the game sound literature (Collins, 2008; Grimshaw, 2008; Huiberts and van Tol, 2008; Jørgensen, 2006, 2011; Stockburger, 2003), we notice that one of the most common ways to envision the structure and composition of sound in games is relative to its status regarding the diégèse of the game (I am using the French word to avoid any misconception that this term holds the same meaning as Plato’s and Aristotle’s definition of diegesis6). Taking its origin in film studies, the diégèse must be understood as a “mental reconstruction of a world” (Odin, 2000, p. 18, freely translated) that can be “perceived as an inhabitable space” (Odin, 2000, p. 23, freely translated). This definition of diégèse clearly refers to the “historico-temporal” universe in which the story, or in the case that interests us, the simulation, takes place. This definition thus allows more easily for the parallel that is often established between the diégèse and the gameworld.7 From a structural perspective, more than the description of a world, it is particularly in the division that exists between elements considered as being part of the fictional world (diegetic) and elements which are not judged to be components of the fictional world (extra-diegetic8) that this notion has found a niche in works on sound in game studies. Indeed, while listening to horror computer game sound, whether or not a sound is part of the depicted gameworld will have a considerable impact on the decisions the gamer will make regarding this sound. Based on the gameplay model that was introduced earlier, these sound cues will engender many questions in an attempt to recreate the mental image of the game state. Is the sound produced by an instance present in the “diégèse”? If it is, does that instance represent a threat to the player character or is it just a part of the ambience of the gameworld? Furthermore, as hinted by this set of queries, while the diegetic status of a sound holds much importance, recreating the mental image of the game state necessitates a more elaborate set of qualifiers.
Sound Generators
In computer games, much attention must be paid to sound sources as they contribute to the construction of the diegetic space. However, more important than what instance or event emits the sound is what generates the sound. Not only does the notion of generator furnish knowledge on what caused a specific cue, but it also provides information on its relationship to other sounds, its relationship to the game state, as well as the situation in which they are heard. These sound generators, as Kristine Jørgensen (2008) explained, are “not the same as the source of the sounds. While the source is the object that physically (or virtually) produces the sound: the generator is what causes the event that produces the sound” (Player Interpretation of Audio in Context section, para. 2). If we adapt Jørgensen’s example to a horror computer game context, this basically means that the shrieking sound emitted by one of Dead Space’s necromorphs (its source) while being dismembered by the player character’s plasma cutter is in fact generated by the gamer. Therefore, this concept (in its definition) also reflects the interactive nature of computer games by putting forward the agency9 of the gamer within the simulated world, as well as the response of the game to the gamer’s actions. While studying World of Warcraft, Jørgensen (2008) identified five categories of sound generators: the gamer, allies, enemies, the gameworld, and the game system, each of which is organized according to the perspective of the gamer. Even though some horror games offer interaction with friendly non-player characters such as Luis in Resident Evil 4 (Capcom, 2004) or, as in Resident Evil 5 (Capcom, 2009) and Left 4 Dead (Valve Software, 2008), offer a multiplayer co-operative mode, most games of the genre privilege the solitude of the player character and allies are normally quite scarce.
Therefore, this chapter will focus on the dynamic and nondynamic sounds (Collins, 2008) produced by the gamer, the enemies, the gameworld, and the game
system. Accordingly, I will briefly describe these generators following Jørgensen’s definition and adapt them to my own corpus of study. General informative functions of each type of generator will also be mentioned as they will provide a tighter relationship with the next section of this chapter on the functions of horror computer game sounds. A sound generated by the gamer is “caused by [gamer] action” (Jørgensen, 2008, Player Generated Sound section, para. 1). As Jørgensen explained:
The most important informative role of [gamer] generated sounds is to provide usability information, or more specifically to provide response since they always seem to appear immediately after a player action. Player generated sounds also provide spatial information, and sometimes also temporal and [player character] state information. (Player Generated Sound section, para. 1)
In Resident Evil (Capcom, 1996), for instance, these sounds may include footsteps, gunshots, the opening of doors, angry monster growls after they are shot by the gamer, the opening of Chris Redfield’s or Jill Valentine’s inventory menus and so on. For their part, enemy generated cues “are produced externally from the [gamer’s] perspective, by being detached from the [gamer’s] own actions and emerging from the gameworld” (Jørgensen, 2008, Sound Generated By Enemies and Allies section, para. 1). Such sounds will furnish spatiotemporal information and will also serve “presence” purposes as they attest to the existence of enemies in the vicinity. Of course, these sounds also give information about modifications in the game state and serve progression functions in the game: these might include the sounds of off-screen or on-screen monsters, or may indicate that the player character has been wounded after being hit by a zombie. Gameworld generated sounds are similar to what Huiberts and van Tol (2008) described as
zone sounds. These sound cues consist of sounds “linked to the environment in which the game is played” (Huiberts & van Tol, 2008, Zone section, para. 1). While these sounds are often implemented to generate the ambience of the game, they also serve spatial functions and might give certain information about the game state. In Dead Space, these sounds include the rumbling of the ship and some of the gruesome sounds emitted by the preprogrammed bursts of blood coming out of organic matter that can be found on the walls and floor. Game system-generated sounds are by far the most ambiguous. Jørgensen (2008) defined them as sounds “generated by the system to provide information that any [player character] cannot produce on its own, and carry information directly connected to game rules and as well as game and [gamer] state” (Conclusions and Summary section, para. 3). Horror computer games do not include many of those sounds. However, a few examples can be found. The “fuzzing” sound, accompanied by heart pounding, that is emitted when a player character is lethally wounded in Resident Evil 5 could correspond to this description: it is not directly produced by a gamer’s action but generated by the system to warn the gamer that their player character needs immediate health assistance. While it is not explicitly mentioned by Jørgensen, I would argue that the extra-diegetic musical score of the game is also system generated. While this music often plays an affective role in the game, it also serves presence and game state purposes. For instance, in Alone in the Dark: Inferno (Atari, 2008), the music ramps up, signalling that enemies are nearby or attacking the player character. It is mostly according to the relationship between this extra-diegetic music, the gamer, and the gameworld that this category of generators will be examined in this chapter. These generators will be used as a structural basis when studying the creation of horror game sound strategies.
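The distinction between a sound's source and its generator, and the four generator categories retained here, can be summarised in a small Python sketch. This is a hypothetical data model built only from the examples given in this section, not a structure taken from Jørgensen or from any game's code; the class and field names are assumptions, and the allies category is omitted because the chapter sets it aside.

```python
from dataclasses import dataclass
from enum import Enum


class Generator(Enum):
    """The four generator categories retained in this chapter (allies omitted)."""
    GAMER = "gamer"
    ENEMY = "enemy"
    GAMEWORLD = "gameworld"
    GAME_SYSTEM = "game system"


@dataclass(frozen=True)
class SoundCue:
    description: str
    source: str           # the object that (virtually) produces the sound
    generator: Generator  # what causes the event that produces the sound
    diegetic: bool        # whether the cue belongs to the depicted gameworld


# Cues drawn from the examples discussed in the text.
CUES = [
    SoundCue("necromorph shriek while dismembered", "necromorph",
             Generator.GAMER, True),
    SoundCue("off-screen zombie moan", "zombie", Generator.ENEMY, True),
    SoundCue("rumbling of the ship", "ship", Generator.GAMEWORLD, True),
    SoundCue("extra-diegetic musical score", "none (score)",
             Generator.GAME_SYSTEM, False),
]


def by_generator(cues, generator):
    """Return the descriptions of all cues caused by the given generator."""
    return [cue.description for cue in cues if cue.generator is generator]
```

The first entry makes the point of the necromorph example explicit: the creature is the source of the shriek, but the gamer, through the plasma cutter, is its generator.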
THE FUNCTIONS OF HORROR COMPUTER GAME SOUND
To reach their objective, gamers must also gather information about the game state. To do so, they must ask themselves what the functions of a particular sonic cue are and, if the sound serves more than one purpose, which function is more important according to the context. In computer games, sounds contribute to the gamer’s immersion: they construct the mood of the game, and provide information that will be used in gameplay. According to Jørgensen (2006), we can state that sound serves two main functions. On one hand, it “has the overarching role of supporting a user system” and, on the other, it is “supporting the sense of presence in a fictional world” (p. 48). This basically means that sound creates “a situation where the usability information of elements such as [sound] becomes integrated with the sense of presence in the virtual world” (Jørgensen, 2008, Integration of Game System and Virtual World section, para. 1).
The (Double) Causality of Sound
To fulfil the important functions set out by Jørgensen, I believe that sounds first need to create a feeling of causality (1) with the images (and more largely with the gameworld) and (2) with the gamer’s actions. Just like in movies, images and sounds are tightly linked, producing the effect of added value, described by Michel Chion (2003) as a “sensory, informative, semantic, narrative, structural or expressive value that a sound heard during a scene leads us to project on the image, until creating the impression that we see in this image what in reality we ‘audio-see’” (p. 436, freely translated). The added value of a sound on the images creates what Chion called audio-visiogenic effects which can be classified within four categories: (1) effect of sense, atmosphere, content, (2) rendering and matter effect (materializing sound indices)
which creates sensations of energy, textures, speed, volume, temperature, for example, (3) scenography effect concerning the creation of an imaginary space and (4) effect related to time and the construction of a temporal phrasing. These audio-visiogenic effects and materializing sound indices are essential to horror computer games such as Dead Space, as they give an organic texture to an anthropomorphic monster. The gooey sound that accompanies the impact of a plasma cutter blast as blood and guts explode on the screen helps the gamer believe that what they are seeing is real, while in fact what is shown on the screen is a simple translation of coloured polygons. The effectiveness of the added value rests upon three factors that have also been defined by Chion: synchronisation points, each “a more salient moment of a synchronised reunion between concomitant sonic moment and visual moment” (p. 433, freely translated); more broadly, an effect of synchresis; and an effect of rendering, which will give the sound a necessary degree of veridicality (Grimshaw, 2008) for it to seem “real, efficient and adapted” to “recreate the sensation [...] associated to the cause or to the circumstance evoked in the [game]” (Chion, 1990, p. 94, freely translated). For this to be effective, Grimshaw (2008) reminds us that a sound “must be as faithful as possible to its sound source [within the game], containing and retaining, from recording or synthesis through to playback, all the information required for the player to accurately perceive the cause and, therefore, the significance of the sound” (p. 73). However, we must not forget that computer games are not only audio-visual, but also interactive. Therefore, sound must also establish a sentiment of causality between the gamer’s actions, which mostly correspond to handling joysticks and pressing buttons on their controller, and the action performed by the player character on the diegetic level.
For this matter synchronisation points turn out to be less aesthetic and more pragmatic as they become the product of the gamer’s
will in act. This relationship between action and sounds is essential in establishing horror game conventions and greatly contributes to the effect of presence as it gives sensory support to the gamer’s agency.
Gameplay Functions
From a gameplay point of view, and following the loop of Arsenault and Perron’s (2009) model, sound performs two main functions: (1) to give information on the game state and (2) to give feedback on the gamer’s activity in response to the game state. Before we engage in a typology of the different gameplay functions of sounds, I wish to mention that I am fully aware that every sound, while serving gameplay purposes, simultaneously has immersive and affective functions. However, for reasons of brevity, I will not integrate those functional poles together right away. In this line of thought, I will not present an exhaustive list of gameplay functions, and will keep only those useful for my analysis of horror computer game sound strategies.10 Based on Collins’ (2008), Grimshaw’s (2008), Jørgensen’s (2008) and Whalen’s (2004) work, I wish to take a look at five gameplay functions that some horror game strategies are founded upon: spatial functions, temporal functions, preparatory functions, identifying functions, and progression functions. In computer games, it is essential to determine the approximate location of the sound generators. Spatial functions allow for the localization of generators in terms of direction and distance, contribute to the quantification and qualification of game space and help the gamer to navigate through it. More precisely, the sounds will be described as choraplasts which are sounds “whose function is to contribute to the perception of resonating space [volume and time, localization]” (Grimshaw, 2008, p. 113). By privileging a “navigational” mode of listening (Grimshaw, 2008, p. 32), the augmentation or diminution of a sound’s intensity might, for instance, assist the gamer in localizing
the generators and help them decide if they want to advance, or not, in their direction.
Sonic temporal functions are also very important to horror computer games. For example, in Resident Evil 5, the flamethrower and satellite laser-guide that the gamer needs to use in order to kill the dangerous Uroboros monsters must regularly be refilled with fuel and recharged with energy, respectively. To signal that the weapons are recharging, in addition to a visual indicator, the game underlines this process with a distinctive sound. Similarly, when the replenishing is done, a tone will inform the gamer. The same assumptions can be applied to other weapons as reload times and rate of fire are sometimes vital to the survival of the player character. Sounds that are “affording the perception of time passing” are named chronoplasts by Grimshaw (2008, p. 113).
The preparatory functions, a term I have borrowed from Collins (2008), and which correspond to what Jørgensen (2006; 2008) called urgency functions, are served by sounds that alert the gamer that an event has occurred in the diegetic world or that forewarn them of the presence of an enemy within the immediate environment of the player character. For instance, in Dead Space, the alarm signalling that a section of the ship is being put into quarantine serves as an alert, while the off-screen moans of zombies in Resident Evil are considered a forewarning. It must also be acknowledged that adaptive and interactive (Collins, 2008) extra-diegetic music can also occupy these roles as it either punctuates an event or, as in Resident Evil 4, testifies to the presence of infected Ganados.
For their part, identifying functions, which were more accurately theorised by Jørgensen (2006), correspond to the ability of a sound “to identify objects and to imply an objects value” (Identifying Functions section, para. 1).
For example, the heavy footsteps and the characteristic music loop accompanying the presence of Nemesis in Resident Evil 3 (Capcom, 1999), as well as the screeching of Pyramid Head’s gigantic blade
in Silent Hill 2 (Konami, 2001) lead to a quick identification, while at the same time providing these characters with an imposing and threatening demeanour. The identifying functions’ use is not limited to distinguishing and qualifying enemies; it also “has a central role related to changes in game state and player state”11 (Jørgensen, 2008, The Role of Audio in a Gameplay Context section, para. 2). In Dead Space, when Isaac Clarke grunts in pain after taking a hit, it signals to the gamer that the player character’s physiological integrity has been affected. Musical loops can also signify transitions in the game state. In the Resident Evil series, the leitmotif associated with the “save room” means that the player character is safe, while fast-paced music normally implies the presence of a threat or requires immediate attention from the gamer. Progression functions is a term I propose based on my reflections upon the motivational purpose of music proposed by Zach Whalen (2004) in his text Play Along: An Approach to Videogame Music. As Whalen explained, in Silent Hill, “the music is always in a degree of “danger state” in order to impel the player through the game’s spaces. The mood of the game is crucial to the horrific ‘feel’, but it also provides motivation by compelling continual progress through the game” (Silent Hill section, para. 1). I suggest that other sounds, such as the enemies’ sound cues or alarm sounds, can achieve a similar purpose and encourage (or sometimes discourage) the gamer to progress in the game. While these functions are mostly integrated in enemy-generated sounds, some segments of dialogue can also be considered as serving progression functions. For instance, in Dead Space, radio communications with Kendra and Hammond help the gamer to figure out how to reinitialise the ventilation system of the hydroponic station of the U.S.S. Ishimura. Of course, one single sound event can serve many of these functions simultaneously.
Furthermore, as Jørgensen (2008) specified, “the functional roles of sounds [will be] judged with
different urgency in different situations even though the sound is exactly the same” (Player Interpretation of Audio in Context section, para. 1). While this quote was intended to portray the relationship existing between sound and context in multiplayer sessions of World of Warcraft (Blizzard Entertainment, 2004), it is, nevertheless, quite applicable to the single player games which characterize most of the horror computer game genre. It is in regard to the macro and micro contexts of the games that prioritisation of one function over another will be possible. With all this in mind, it is now time to take a look at how horror games partly build their sound strategies by playing with these functions.
HORROR COMPUTER GAMES’ SOUND STRATEGIES
Horror computer games have been around for a long time. During the 1980s, many games such as Atari’s Haunted House (1981), Sweet Home (Capcom, 1989)12 and the videoludic adaptations of the movies Halloween (Wizard Video Games, 1983) and Friday the 13th (LJN, 1989) hit the shelves to satisfy gamers in quest of an adrenalin rush. However, as I explained in a chapter published in Horror Video Games: Essays on the Fusion of Fear and Play, the abstract graphics and synthesised sounds of those games could not provide a simulation of evisceration as convincing as certain computer games can provide today. Indeed, “at that time, the horror was more lurking in the paratextual material than the games themselves” (Roux-Girard, 2009, p. 147). As Mark J. P. Wolf (2003) explained: The boxes and advertising were eager to help players imagine that there was more to the games than there actually was, and actively worked to counter and deny the degree of abstraction that was still present in the games. Inside the box, game instruction manuals also attempted to add
exciting narrative contexts to the games, no matter how far-fetched they were. (p. 59) As Remi Delekta and Win Sical (2003) suggest in an article of the only issue of the Horror Games Magazine: “[Horror computer games] can not exist without a minimum of technical capacities: sounds, graphics, processing speed. Fear to exist needs to be staged and mise en scène needs means” (p. 13, freely translated). It was in 1992 that Alone in the Dark, designed by Frédérik Raynal, shook the entire videoludic scene by incorporating polygonal characters, monsters and objects in two-dimensional, pre-rendered backgrounds. While this simulated three-dimensionality opened a new “game space” allowing for novel possibilities in gameplay, it also created an innovative “playground” for imaginative sound designers.
Between Horror and Terror
Before we begin our analysis of horror games’ sound strategies, I need to clarify that fear, terror, dread, horror, anxiety and disgust, while they are broadly analogous emotions, are not synonymous. Moreover, not all horror computer games try to generate this entire emotional spectrum13. Accordingly, while some games rely on visceral manifestations of fear such as horror and disgust, others create fear at a psychological level, generating suspense, terror and dread. To understand how games manage to scare gamers, we must first take a look at the difference between horror and terror. According to Perron14 (2004), horror is compared to an almost physical loathing and its cause is always external, perceptible, comprehensible, measurable, and apparently material. Terror, as for it, is rather identified with the more imaginative and subtle anticipatory dread. It relies more on the unease of the unseen. (p. 133) Of course, sound design plays a prominent role in setting these two poles up. On one hand,
sounds provoke spontaneous sensations using rendering effects of matter, and on the other, they contribute to the elevation of suspense by creating ambiguity between causes, uncertainty regarding the origin of the sounds and by limiting the information carried by the sound’s affordances. To achieve this, horror computer games rely on a plurality of strategies. In the preceding sections of this chapter, I introduced a number of theoretical tools to help us understand how gamers structure sounds within and without the gameworld and how they produce meaning with the different cues they listen to. I now propose to revisit those concepts in light of a horrific mise en scène to comprehend how horror games develop those strategies.
The Choice of the Sounds
While horror computer games (and mostly survival horror games) utilize a wide range of sound strategies, the staging of fear starts at a purely formal level. The choice of sounds and the way they are used are greatly responsible for the quality of the mood of the games. Some empirical research (quoted in Grimshaw, 2009) attempted to demonstrate that there is a certain degree of correlation between the physical signal of a sound and the emotions felt by listeners. For instance, Cho, Yi, and Cho’s (2001) research on textile sounds shows that loud and high-pitched sounds are unpleasant to the ear, while Halpern, Blake, and Hillenbrand (1986) point to loud, low-mid frequencies as being disagreeable. Whereas these investigations seem contradictory, they nevertheless tend to reveal that the acoustic qualities of sounds can have, amongst other factors, a physiological as well as psychological impact on the gamer. However, to arouse emotions, we need much more than mere frequencies. Borrowing from Pierre Schaeffer’s theory (synthesised by Chion, 1983) on the morphological description of sounds and Quatre écoutes (écouter, ouïr, entendre, comprendre), it is mostly the work performed on
the allure, grain, dynamic profile, and the mass profile of a sound that determines its repercussion on the gamer. During their gameplay activity, the gamer hears (entendre) the morphological qualities of the sounds which allow them to comprehend (comprendre) and experience them as frightening. Therefore, it is not only because the gamer listens (écouter) to what they can identify as a zombie that they are scared, but because they hear (entendre) a moan or a growl, which correspond to the sound motifs contained in their knowledge of horror symbols. In other words, it is not so much because the lamentation is generated by a zombie and comprises low frequencies that it is frightening but because, in its essence, it contains an energy reminiscent of a certain form of pain and agony. Ambiences can have a similar effect as they associate acoustic qualities with unpleasant situations and frightening locales. Reciprocally, the emotions produced by these choices of sound force the gamer to focus on every little detail of the sound design and are partly responsible for the gamer’s high level of “perceptual readiness”. Of course, the selection of sounds must also aim to create uncertainty as this feeling is essential to the creation of suspense. To do so, designers sometimes have to baffle the gamer’s expectations to a certain point. In his book on the Silent Hill series, Perron (2006) observed an evolution, from one title to another, in the sound used to portray the monstrous nurses. As the author explains: “The nurses, which have a much low-pitched ‘voice’ in [Silent Hill], have a penetrating sped up respiration in [Silent Hill 3]” (Perron, 2006, p. 93, translated by the author). In my view, this purely aesthetic strategy has the effect of reducing the gamer’s “launch window” into the game, preventing him from using his prior knowledge to identify (identifying functions) his opponents.
Consequently, Silent Hill 3’s sound design created ambiguity regarding the cause of the sound, and forced the gamer to reconstruct, from game to game, the relation between the sounds and their generators.
However, as we have seen earlier, horror games do not create fear only with their aesthetic dimension, but also with their narrative structure and gameplay. Therefore, some of their strategies are also constructed from these two dimensions.
Creation of a Startle Effect
Sound plays a preponderant role in the creation of a variety of surprise effects. Following an analysis of this phenomenon by Robert Baird, Perron (2004) explained in his text Sign of a Threat: The Effects of Warning Systems in Survival Horror Games, that the essential formula for creating a startle effect can be summed up into three steps: “(1) a character presence, (2) an implied offscreen threat, and (3) a disturbing intrusion [often accentuated by a sound burst] into the character’s immediate space” (p. 133). As noted by the author, it is indeed at the moment of the intrusion of the off-screen threat inside the screen that sound will take on all its importance. At this level, it is a question of contrast in the sonic intensity and synchronisation of the sound and its generator in the visual field of the gamer. Therefore, startle effects depend on the physical limitations of the ears. As ears are slower to react than the eyes, the startle effect will temporarily cloud the gamer’s evaluation and identification operations. To favour such effects, horror games often rely on a refined sound aesthetic and create moments of approximate silence. We can also say that the sounds the gamer cannot hear—the noises an enemy should make while moving towards the gamer that are rendered inaudible—play a role as important as the ones he can hear. In Schaefferian terms, we could say the game plays on the limits of hearing (ouïr) as a way to fool the gamer’s listening (écouter). It is in light of these considerations that the episodes of respite before an attack play a determining role in the staging of a startle effect. This is stressed by Whalen (2004):
As it is the case with horror films, the silence [...] puts the player on edge rather than reassuring him that there is no danger in the immediate environment, increasing the expectation that danger will soon appear. The appearance of the danger is, therefore, heightened in intensity by way of its sudden intrusion into silence. (Silent Hill section, para. 3) It is according to this technique that designers punctuated, by shattering a window, the intrusion of a long-fanged monster in Alone in the Dark (I-Motion, 1992), or in a similar incursion of a zombie-dog in Resident Evil (Capcom, 2002), intensified the attack of a crawling monster in Silent Hill 2 (Konami, 2001), or amplified the brutal opening of an elevator door by a necromorph in Dead Space. As Perron (2004) mentioned: “To trigger sudden events is undoubtedly one of the basic techniques used to scare someone. However, because the effect is considered easy to achieve, it is often labelled as a cheap approach and compared with a more valued one: suspense” (p. 133). Following this line of thought, if sound plays a decisive role when it comes to making a gamer jump out of his shoes, it also plays a role in the creation of suspense. It is in this perspective, towards dread and anticipation, that the next strategies will be explored.
The Impact of Forewarning
To create suspense, forewarning is one of the most effective strategies. Before further developing this concept, it is essential to mention that forewarning is not always exclusively based on sound. Forewarning, which consists of alerting the gamer to the presence of a menace in the surroundings of his player character, can also be based on visual cues, as is the case with Fatal Frame (Tecmo, 2002) when the indicator at the bottom of the screen turns orange, signalling the presence of a ghost. However, many forewarning
systems have been designed through sound. The most renowned case of such a technique and, incidentally, the most studied—being discussed by Carr (2003), Kromand (2008), Perron (2004) and Whalen (2004)—is the pocket radio in the Silent Hill series. This radio, which emits static when a threat is nearby, plays its role as a warning system perfectly. Forewarning can also be created in a more classical way by making use of off-screen sounds (Perron, 2004). This is the case in Alone in the Dark: The New Nightmare (Infogrames, 2001) when, during the numerous seconds necessary for the gamer to go down the stairs leading to the interior court of the fort, it is possible to hear sounds associated with plant monsters coming from outside the frame of the fixed virtual camera shots. If we could believe that such a warning, prefiguring the entrance of a gloomy monster inside the screen, could reduce the feeling of fear or uneasiness in the gamer, research cited in Perron’s (2004) work tends to prove the opposite. As the author himself specifies, “[…] simple forewarning is not a way to prevent intense emotional upset. It is worse than having no information about an upcoming event” (Perron, 2004, p. 135). Such a method creates terror by anticipation based on a fear of the unseen. However, what Perron fails to highlight is that forewarning does not rely only on the sound function of the same name. To be really effective, the forewarning must be unreliable and/or the quantity of information about the localisation of the generator must be limited. This precision offers the opportunity to introduce another strategy of horror computer games which relies on the functions of game sound: luring the gamer with sound.
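The idea that a warning can signal proximity while withholding localisation may be illustrated with a minimal sketch. It is not a description of how the Silent Hill radio is actually implemented, which the sources discussed here do not document; the function name `static_gain`, the linear ramp, and the 20-unit radius are all assumptions made purely for illustration.

```python
import math

def static_gain(player, threats, radius=20.0):
    """Hypothetical radio-static level in the range 0..1.

    The level depends only on the distance to the nearest threat,
    so the cue carries no information about direction.
    """
    if not threats:
        return 0.0
    # Distance to the closest threat; positions are (x, y) tuples.
    nearest = min(math.dist(player, t) for t in threats)
    if nearest >= radius:
        return 0.0
    # Assumed linear ramp: silent at the radius, full static at distance 0.
    return 1.0 - nearest / radius

# A threat 10 units to the north and one 10 units to the east
# produce exactly the same cue.
print(static_gain((0, 0), [(0, 10)]))
print(static_gain((0, 0), [(10, 0)]))
```

Because the two calls return the same value, the listener learns that something is close but not where it is, which is the limitation of information the paragraph above describes.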
Luring the Gamer With Sound
In his master’s thesis, Serge Cardinal (1994) explained that “filmic writing favouring the emergence of a clear spatial structure will have the tendency to anchor the sound with its source,
will privilege without ambiguity the identification and localisation of the source with sound, will submit sound’s diffusion to the sound properties’ logic” (p. 53, freely translated). To create fear and strong feelings of discomfort, horror games execute a reversal of this concept making the generators of the sounds harder to identify and localize. In the example from Alone in the Dark: The New Nightmare described earlier, the designers have avoided creating an evolution in the morphological properties of the sounds of the plant monsters in relation to the player character travelling through the fort’s space. This technique is used to alter the information the sound is carrying regarding the distance separating the threat and the gamer’s player character. While listening carefully, the gamer remarks no variation in the dynamic profile and mass profile of the sound generated by the creatures of darkness even though the player character performs a descent which, if it were scaled, would be equivalent to a little less than a hundred meters. In this case, the designers intentionally reduce the quantity of information carried by the sound in a way that limits the gamer’s interpretation of space and time, as it is impossible to evaluate the distance between the player character and the monsters. However, this tweaking of the spatial and temporal functions of the sound allows for an emphasis to be put on its forewarning purpose, which is bound to influence the progression function of the sound. Preventing the easy localization of the source/generator of the sound has an effect of reinforcing the suspense established by the forewarning while simultaneously forcing the gamer to take a more prudent approach while going down the stairs. Many horror game strategies rely on creating a certain level of ambiguity regarding the origin of sounds within the gameworld. 
While this can be achieved, as suggested by Daniel Kromand (2008), by blurring the frontier between the diegetic and non-diegetic parts of the game, similar exercises can be performed between instances within the diégèse. This partly explains why I chose to
structure my analysis of horror computer game sounds around the concept of sound generators and functions of game sound. Indeed, those notions are best suited to describing the relationship between the different instances of sound, in that there is more in horror computer games than meets the ear.
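The flattening of distance cues described above for Alone in the Dark: The New Nightmare can be contrasted with a conventional distance model in a short sketch. Both functions are hypothetical illustrations: the inverse-distance law and the constant level of 0.6 are assumptions, not values taken from the game.

```python
def attenuated_gain(distance, reference=1.0):
    """Conventional inverse-distance gain (assumed model).

    The gain halves each time the distance doubles and is clamped
    to 1.0 inside the reference distance.
    """
    return min(1.0, reference / max(distance, reference))

def flattened_gain(distance, level=0.6):
    """Gain that ignores distance: the dynamic profile never changes."""
    return level

# As the player character descends, only the first model reveals
# how far away the generator is.
for d in (2.0, 4.0, 8.0):
    print(d, attenuated_gain(d), flattened_gain(d))
```

With the first model the gamer can infer approach or retreat from loudness alone; with the second, the spatial and temporal functions of the sound are suppressed and only its forewarning purpose remains.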
Ambiguity Between Sound Generators
A study of the relations that exist between the different categories of sound generators allows me to put forward some of the sonic strategies of horror computer games. One of the most basic strategies of those games is to design sound in a way that creates ambiguity between the different sound generators of the game. Indeed, if two or more generators manage to produce sounds of a similar nature, it will directly affect the cognitive process of the gamer, making it harder to localize the sources but also harder to classify the cues as more or less important regarding the game context. For the most part, these ambiguities will concern the spatio-temporal and preparatory functions of sound and will generate fear through anticipation. The first technique consists of blurring the line between the sounds generated by the player and those generated by enemies. If these two generators manage to produce similar sound cues through a common source, it is possible to believe that, for example, the movements of the gamer’s player character through space might nourish the suspense. I must admit that this technique is not widespread in horror computer games but, seeing as the game Dead Space manages to create such a doubt, it is worthy of being mentioned as a similar modus operandi might be exploited in future horror games. Indeed, in Dead Space, the sounds emitted by the player character’s footsteps on the viscous organic matter which often covers the floors of the spaceship are very similar to the sounds produced through the interaction of the substance and the deformed limbs of the grotesque monsters roaming with intent to kill the player
character. After hearing the monster’s footsteps for the first time, the gamer’s perceptual readiness will augment regarding these sounds. However, since the sounds emitted by the gamer’s player character are so similar to and blend with those of the enemies, the movement of the player character on the gooey surface might signal a potential presence in the player character’s surrounding environment. The gamer will then be forced to adopt a more careful approach and look around more often than he would have normally done. The flesh covered sections of the spaceship also encouraged the designers to establish a similar relationship, much more common to horror computer games, between the sounds generated by the enemies and the gameworld. The game environments are often designed to generate ambiences that imitate the sounds generated by the threats of the games. As mentioned by Kromand (2008): “The [gamer]’s understanding of affordances can help to perform better [...] as certain sounds pass information regarding nearby opponents, but at the same time these exact affordances are mimicked by the ambiance.” (Welcome to Rapture section, para.4). Once again, this way of conceptualizing sound in the game favours the creation of doubt in the player regarding the real provenance of the sounds. To get back to our Dead Space example, the organic matter is sometimes surmounted by excrescences which randomly squirt blood when the player character passes by. The excretion sound is also reminiscent of the sound made by enemies and tends to mislead the gamer as to what generated the sound. Similarly, other ambiance sounds, such as the creaking of the ship’s hull, the rumbling of the machinery, and other metallic impacts are used to simulate the prowling of a necromorph in an air vent or in one of the ship’s corridors. Of course, Dead Space is not the only game that makes use of such strategies. 
As Ekman and Lankoski (2009) noted, in “Silent Hill 2 and Fatal Frame, the whole gameworld breathes with life, suggesting that somehow the environment itself is alive, sentient, and capable of taking action against the
player” (p. 193). This way of introducing “event sounds with no evident cause, sound not plausibly attributed to an inanimate environment” is, for that matter, the trademark of the Silent Hill series. This way of conceptualizing sound even extends to the atonal, extra-diegetic music of the game. This aesthetic choice allows me to introduce one last case of ambiguity between sound generators. Some horror games aim at creating ambiguity between the game system, the gameworld, and the enemies, the emphasis being put, as suggested by Kromand (2008), on blurring the line between elements that are part of the diegesis and others that are not. By choosing to exploit atonal music, which is closer to musique concrète than traditional orchestral or popular music, that blends and often merges with the ambient and dynamic sound effects of the game, designers manage to lure the gamer into thinking that there are more threats than there actually are. This technique also often succeeds at diverting the gamer’s attention from the real threats in the game. The most flagrant example of such a scrambling between the sounds emitted by enemies and the game system comes from Silent Hill. During a gameplay sequence in the alternate town of Silent Hill (Konami, 1999), the non-diegetic music, which is mostly constituted of metallic, industrial sounds, also includes in its loop a sound that is very similar to the sounds generated by the flying monsters of the game. Since the flying demons’ sounds are mixed very low within the music, the gamer, who is concentrated on his activity, probably won’t notice that this cue is repeated on a fixed temporal line and will be bound to associate this sound to an oncoming monster. A similar type of conception was also privileged in the sound design of Dead Space. As Don Veca, the lead sound designer of the game, underlined: “We […] approached the entire sound-scape as a single unit that would work together to create a dark and eerie vibe. [...] 
In this way, Dead Space has really blurred the line between music and sound design” (cited in Napolitano, 2008, First
Question section, para. 2). Therefore, as mentioned by Kromand (2008), “the constant guessing as to whether the sounds have a causal connection put the [gamer] in unusual insecure spot that might well build a more intense experience” (Conclusion section, para. 2), which has the effect of augmenting the level of fear in the player. As a unit, the techniques which aim at creating ambiguity between sound generators are based on the different circuits a sound can perform between the on-screen, the off-screen and the extra-diegetic. Indeed, it is by regularly making sounds pass from on-screen (which allows the player to identify the cause of the sound) to the off screen (where the sound serves as a forewarning of a threat) to the extra-diegetic (where sound simulates the presence of a threat), that videoludic sound manages to condition the gamer to be wary of everything he hears.
Fear and Context
Of course, fear will not only be induced by the morphological nature of a sound, by its fixed relation with its cause or the constructions of strategies. Fear, horror, and terror mostly depend on the context in which the sound is heard. At this level, many parameters will influence the perception the gamer will have of a sound: the spatial configuration, the general difficulty of the game, the number of enemies, the available resources, the available time and so forth. The global situation related to the perception of a sound will have a determining impact on the attitude a gamer will adopt towards this sound. A videoludic design favouring such game mechanics will therefore be an accomplice to the sound strategies.
CONCLUSION
In an attempt to scare their gamers, horror computer games utilise different strategies of mise en scène. Testament to the dialogue between the
production and reception of the games, these strategies, to be efficient, must play with the gamer’s expectations—regarding the reading and listening constraints imposed by the genre and paratext— and exploit the cognitive schemes that help them to classify the information they receive during their gameplay sessions. In this line of thought, the games must create situations that will generate negative emotions such as fear, horror, and terror. As only the gamer gets access to these emotions, I privileged an approach oriented on the reception of sound in a gameplay situation rather than a mere analysis of technical data. It is consequently with a terminology that does not reference directly the game code or algorithm, but instead focuses on the gamer’s mental reproduction of the videoludic universe, that I attempted to explain the importance of sound in the development of horror computer game strategies. The gamer’s first objective being to insure the survival of their player character, their tasks mainly revolve around detecting all the intrusions that might become hazardous for their character. In these circumstances, gamers must structure the sounds they hear and extract from them all the information they need to properly respond to a given situation. This cognitive process has been broadly presented with the help of Arsenault and Perron’s model (Figure 1). More precisely, the gamer must determine the origin and the cause of the sounds. To do so, they must first determine if a sound is generated by an event present within the videoludic world or overhanging this world. The gamer must then refine this categorisation to establish more precisely what, between their actions, the enemies, the game environment, and the game system, is the generator of the sound. 
At the same time, they must pay attention to the affordances (their functions) of the sounds which might communicate information about the space, the time, the enemies, and the events occurring in the game environment. The gamer must then evaluate which affordance must be prioritised according to the circumstances.
To feel safe, the gamer must be able to quickly find answers to their questions. To arouse fear, horror games block this process. While the morphologic nature of a sound is sometimes enough to induce a strong feeling of discomfort, horror computer games mostly rely on sound strategies to reach their goal. From the startle effects to the creation of ambiguity between the sound generators, the games trick the gamer’s listening by limiting the information the sounds carry. Plunged into a universe of “un-knowledge” (Kromand, 2008), the gamer can only be scared by their gameplay experience. To be really effective, the sound strategies must be part of a whole and integrated into a global staging of fear, which also depends on the relationships between the sound and images, the gameplay, and the game’s narrative. In the end, it is the pressure applied by the genre, and the deconstruction of the structure and the functions of sound by the different in-game situations, that will determine the true impact of the sound strategies on the gamer.
REFERENCES
Alone in the dark. [Computer game]. (1992). Infogrames (Developer). Villeurbanne: Infogrames.
Alone in the dark: Inferno. [Computer game]. (2008). Eden Games S.A.S. (Developer). New York: Atari.
Alone in the dark: The new nightmare. [Computer game]. (2001). DarkWorks (Developer). Villeurbanne: Infogrames.
Altman, R. (1992). General introduction: Cinema as event. In Altman, R. (Ed.), Sound theory, sound practice (pp. 1–14). New York: Routledge.
Arsenault, D., & Perron, B. (2009). In the frame of the magic cycle: The circle(s) of gameplay. In Perron, B., & Wolf, M. J. P. (Eds.), The video game theory reader 2 (pp. 109–132). New York: Routledge.
Arsenault, D., & Picard, M. (2008). Le jeu vidéo entre dépendance et plaisir immersif: les trois formes d’immersion vidéoludique. Proceedings of HomoLudens: Le jeu vidéo: un phénomène social massivement pratiqué (pp. 1-16). Retrieved from http://www.homoludens.uqam.ca/index.php?option=com_content&task=view&id=55&Itemid=63.
Boillat, A. (2009). La «diégèse» dans son acception filmologique. Origine, postérité et productivité d’un concept. Cinémas Journal of Film Studies, 19(2-3), 217–245.
Bordwell, D. (1986). Narration in fiction film. New York: Routledge.
Carr, D. (2003). Play dead: Genre and affect in Silent Hill and Planescape Torment. Game Studies, 3(1). Retrieved from http://www.gamestudies.org/0301/carr/
Chion, M. (1983). Guide des objets sonores: Pierre Schaeffer et la recherche musicale. Paris: Buchet/Chastel.
Chion, M. (1990). L’Audio-vision. Paris: Nathan.
Chion, M. (2003). Un art sonore, le cinéma: histoire, esthétique, poétique. Paris: Cahiers du Cinéma.
Collins, K. (2008). Game sound: An introduction to the history, theory, and practice of video game music and sound design. Cambridge, MA: MIT Press.
Dead space. [Computer game]. (2008). EA Redwood Shores (Developer). Redwood City: Electronic Arts.
Dektela, R., & Sical, W. (2003). Survival horror: Un genre nouveau. Horror Games Magazine, 1(1), 13–16.
Ekman, I., & Lankoski, P. (2009). Hair-raising entertainment: Emotions, sound, and structure in Silent Hill 2 and Fatal Frame. In Perron, B. (Ed.), Horror video games: Essays on the fusion of fear and play (pp. 181–199). Jefferson, NC: McFarland.
Fatal frame. [Computer game]. (2002). Tecmo (Developer). Torrance: Tecmo.
Friday the 13th. [Computer game]. (1989). Pack-In-Video (Developer). New York: LJN.
Grimshaw, M. (2008). The acoustic ecology of the first person shooter: The player experience of sound in the first-person shooter computer game. Saarbrücken, Germany: VDM Verlag Dr. Müller.
Grimshaw, M. (2009). The audio uncanny valley: Sound, fear and the horror game. In Proceedings of Audio Mostly: 4th Conference on Interaction with Sound. Retrieved from http://digitalcommons.bolton.ac.uk/cgi/viewcontent.cgi?article=1008&context=gcct_conferencepr.
Halloween. [Computer game]. (1983). Video Software Specialist (Developer). Los Angeles: Wizard Video Games.
Haunted house. [Computer game]. (1981). Atari (Developer). Sunnyvale: Atari.
Huiberts, S., & van Tol, R. (2008). IEZA: A framework for game audio. Gamasutra. Retrieved from http://www.gamasutra.com/view/feature/3509/ieza_a_framework_for_game_audio.php.
Jauss, H. R. (1982). Toward an aesthetic of reception. Minneapolis, MN: University of Minnesota Press.
Jørgensen, K. (2006). On the functional aspects of computer game audio. In Proceedings of Audio Mostly – A Conference on Sound in Games (pp. 48-52). Retrieved from http://www.tii.se/sonic_prev/images/stories/amc06/amc_proceedings_low.pdf.
Jørgensen, K. (2008). Audio and gameplay: An analysis of PvP battlegrounds in World of Warcraft. Game Studies, 8(2). Retrieved from http://gamestudies.org/0802/articles/jorgensen.
Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Listening to Fear
Juul, J. (2005). Half-real: Video games between real rules and fictional worlds. Cambridge, MA: MIT Press.
Kromand, D. (2008). Sound and the diegesis in survival-horror games. In Proceedings of Audio Mostly 2008 the 3rd Conference on Interaction with Sound (pp. 16-19). Retrieved from http://www.audiomostly.com/images/stories/proceeding08/proceedings_am08_low.pdf
Left 4 dead. [Computer game]. (2008). Turtle Rock Studios (Developer). Kirkland: Valve Software.
Manovich, L. (2001). The language of new media. Cambridge, MA: MIT Press.
Murray, J. (1997). Hamlet on the holodeck: The future of narrative in cyberspace. New York: The Free Press.
Napolitano, J. (2008). Dead Space sound design: In space no one can hear intern screams. They are dead. (Interview). Original Sound Version. Retrieved from http://www.originalsoundversion.com/?p=693
Neale, S. (2000). Genre and Hollywood. New York: Routledge.
Odin, R. (2000). De la fiction. Bruxelles: De Boeck.
Perron, B. (2004). Sign of a threat: The effects of warning systems in survival horror games. In Proceedings of COSIGN, 2004, 132–141. Retrieved from http://www.cosignconference.org/downloads/papers/perron_cosign_2004.pdf
Perron, B. (2006). Silent hill: Il motore del terrore. Milan: Costa & Nolan.
Resident evil 3: Nemesis. [Computer game]. (1999). Capcom (Developer). Sunnyvale: Capcom USA.
Resident evil 4. [Computer game]. (2004). Capcom Production Studio 4 (Developer). Sunnyvale: Capcom USA.
Resident evil 5. [Computer game]. (2009). Capcom Production Studio 4 (Developer). Sunnyvale: Capcom USA.
Cardinal, S. (1994). Occurrences sonores et espace filmique. Unpublished master’s thesis. University of Montréal, Montréal.
Resident evil. [Computer game]. (1996). Capcom (Developer). Sunnyvale: Capcom USA.
Resident evil. [Computer game]. (2002). Capcom (Developer). Sunnyvale: Capcom USA.
Roux-Girard, G. (2009). Plunged alone into darkness: Evolution in the staging of fear in the Alone in the Dark series. In Perron, B. (Ed.), Horror video games: Essays on the fusion of fear and play (pp. 145–167). Jefferson, NC: McFarland.
Silent hill 2. [Computer game]. (2001). KCET (Developer). Redwood City: Konami of America.
Silent hill 3. [Computer game]. (2003). KCET (Developer). Redwood City: Konami of America.
Silent hill. [Computer game]. (1999). KCEK (Developer). Redwood City: Konami of America.
Stockburger, A. (2003). The game environment from an auditive perspective. In Proceedings of Level Up, DiGRA 2003. Retrieved from http://www.stockburger.co.uk/research/pdf/AUDIOstockburger.pdf
Sweet home. [Computer game]. (1989). Capcom (Developer). Osaka: Capcom.
Taylor, L. (2005). Toward a spatial practice in video games. Gamology. Retrieved from http://www.gamology.org/node/809
Whalen, Z. (2004). Play along: An approach to videogame music. Game Studies, 4(1). Retrieved from http://www.gamestudies.org/0401/whalen/
Wolf, M. J. P. (2003). Abstraction in the video game. In Perron, B., & Wolf, M. J. P. (Eds.), The video game theory reader (pp. 47–65). New York: Routledge.
World of Warcraft. [Computer game]. (2004). Vivendi (Developer). Irvine: Blizzard.
KEY TERMS AND DEFINITIONS

Allure: It is the amplitude or frequency modulation of a sound.
Comprendre: According to Schaeffer, comprendre means grasping a meaning, values, by treating the sound like a sign, referring to this meaning as a function of a language, a code.
Dynamic Profile: It is the temporal evolution of the sound’s energy.
Écouter: According to Schaeffer, écouter is listening to someone, to something; and through the intermediary of sound, aiming to identify the source, the event, the cause, it treats the sound as a sign of this source, this event.
Entendre: According to Schaeffer, entendre, here, according to its etymology, means showing an intention to listen [écouter], choosing from what we hear [ouïr] what particularly interests us, thus “determining” what we hear.
Grain: It can be defined as the microstructure of sound matter, such as the rubbing of a bow.
Mass Profile: It is the evolution in the mass of a sound, for example, from pitched to complex.
Mise En Scène: It is the organisation of the different elements that define the staging of a scene, or, in the case that interests us, the simulation of a gameplay sequence.
Ouïr: According to Schaeffer, ouïr is to perceive by the ear, to be struck by sounds; it is the crudest level, the most elementary of perception; so we “hear”, passively, lots of things which we are not trying to listen to nor understand.
Videoludic: It is an adjective linked to videogames. The use of this term opens a door for the utilisation of sonoludic as an adjective for audio-only games or computer games in which gameplay mechanics are mostly based on sound.

ENDNOTES

1. It must be mentioned that this chapter does not wish to theorize the perhaps ill-suited notion of videoludic genres (a fertile field of computer game research that should, in coming years, generate quite a debate) but wishes, rather, to use it as a tool to better understand how gamers structure their gameplay session in survival horror games.

2. For space reasons, I chose to limit my analysis of these specific factors. Just keep in mind that the industry and the technology play a great part in the final rendering of the games.

3. Note that the former definition is largely associated with reception issues while the latter refers to the production aspects of the games.

4. Generic issues of survival horror games will therefore be approached as a “constraint of listening” from which the gamer will organise and evaluate the role of sound in a given context.

5. Indeed, while playing a game, the gamer never has access to this code. As Arsenault and Perron (2009) explained, the gamer “only witnesses the [...] result of the computer’s response to his action. He does not, per se, discover the game’s algorithm which remains encoded, hidden and multifaceted” which means that “the notion that a gamer’s experience and a computer program directly overlap is a mistake” (p. 110). While this statement upholds the approach of this paper, it also calls for a use of terminology that can reflect a game audio structure with accuracy and that can be applied directly to a gameplay situation.

6. I find it necessary to make this distinction because the notion of diegesis, which is now often broadly defined as “the fictional world of the story” (Bordwell, 1986), might be questionable as it sometimes seems to borrow too much from narrative theory. Étienne Souriau (n.d.), in his original definition of the term, conceptualised the “diégèse” as a “‘world’ constructed by representation” (Boillat, 2009, p. 223, freely translated), a notion which, as it is possible to deduce, is not necessarily specific to narrative theory. Following Souriau’s line of thought, “the diegetic level is characterized not only by ‘everything we take into consideration as being represented by the film’ but also by ‘the type of reality supposed by the signification of the film’” (cited in Boillat, 2009, p. 222, freely translated). According to Boillat (2009), Souriau refined this definition by assimilating the “diégèse” to “all that belongs, ‘in the intelligibility’ [...] to the story being told, to the world supposed or proposed by the fiction of the film” (Boillat, 2009, p. 222, freely translated), this “all” making reference to three very important constituents: time, space, and the character. As Boillat also highlights, this second part of the definition is essential to the concept so as to prevent the “reducing [of] the ‘diégèse’, as it was often the case [...] to only the ‘recounted story’” (p. 222, freely translated). However, in his book De la fiction, French semio-pragmatist Roger Odin makes a clarification regarding the dichotomy between the story and the diégèse. As he explained, the “diégèse” “cannot be mixed up with the story” but “provides the descriptive elements the story needs to manifest itself” (cited in Boillat, 2009, p. 234, freely translated). While trying to apply the concept of diégèse to videogames, one must acknowledge that it does not function following the requirement of fictional films and according to a pure “fictionalisation process” (Odin, 2000). The reconstruction of the diegetic stage works differently, based partly on a process of “systemic immersion” (Arsenault & Picard, 2008), allowing for more levels of communication between the gamer’s world and the gameworld. On these premises, whether or not certain sounds generated within the “diégèse” seem to address an instance outside it does not hold that much importance regarding the construction and integrity of the “diégèse”.

7. I personally prefer to use the adjective extradiegetic instead of non-diegetic because I believe that, for example, survival horror games’ music is tightly linked to the events that are taking place in the diegetic world.

8. In Hamlet on the Holodeck, Janet Murray (1997) defines agency as “the satisfying power to take meaningful action and see the results of our decisions and choices” (p. 126).

9. For example, Jørgensen’s (2006; 2008) response functions, even though they play an important role in the actual gameplay of survival horror games, are not as important to the construction of the games’ strategies. For this reason, they will be left out of this chapter.

10. For more information on sound function, see Grimshaw, 2008; Jørgensen, 2006 and 2008; and Collins, 2008.

11. I think “player character state” would be more appropriate as the gamers themselves remain in their living room.

12. Only available in Japan.

13. This allows for the differentiation between horror computer games, which are a broader category of the videoludic horror genre, and survival horror games, which can be referred to as games that maximize the elements of a horrific mise en scène.

14. Following William H. Rockett’s line of thought.
Chapter 11
Uncanny Speech

Angela Tinwell, University of Bolton, UK
Mark Grimshaw, University of Bolton, UK
Andrew Williams, University of Bolton, UK
ABSTRACT

With the increasing sophistication of realism for human-like characters within computer games, this chapter investigates player perception of audio-visual speech for virtual characters in relation to the Uncanny Valley. Building on the findings from both empirical studies and a literature survey, a conceptual framework for the uncanny and speech is put forward which includes qualities of speech sound, lip-sync, human-likeness of voice, and facial expression. A cross-modal mismatch between the fidelity of speech and that of the image can increase uncanniness, so game developers should give as much attention to speech sound qualities as to aesthetic visual qualities in order to control how uncanny a character is perceived to be.
DOI: 10.4018/978-1-61692-828-5.ch011

INTRODUCTION

As technological advancements allow for the representation of high fidelity, realistic, human-like characters within computer games, aspects of a character’s appearance and behaviour are being associated with the Uncanny Valley phenomenon. (A definition of the Uncanny Valley is provided in the first section of this chapter.) It seems that one of the main factors contributing to a character being regarded as lifeless as opposed to lifelike is the character’s speech. In 2006, Quantic Dream revealed a tech demo (The Casting) for the computer game Heavy Rain (2006), in which the main character, Mary Smith, evoked a somewhat negative response from the audience (Gouskos, 2006). Criticism was made of the uncanny nature of Mary Smith’s speech in that it sounded strange and out of context with the given facial expression and emotion portrayed by this character. A closer inspection of the video showed that not only were there errors in the sound recording (disparities between the acoustics and the volume and materials of the room, with excessive plosives contradicting the distant camera and microphone), but a lack of correct pitch and intonation for speech and a lack
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
of synchronization of speech with lip movement were factors that reduced the overall believability for this character (Tinwell & Grimshaw, 2010). A mismatch between the conveyed emotion of Mary Smith’s voice and her gestures and posture exacerbated how unnatural and odd the character was perceived to be. MacDorman (quoted in Gouskos, 2006) observed that a perceived asynchrony of lip movement with speech was one of the factors that people found disturbing about Mary Smith:

In addition, there is sometimes a lack of synchronization with her speech and lip movements, which is very disturbing to people. People ‘hear’ with their eyes as well as their ears. By this, I mean that if you play an identical sound while looking at a person’s lips, the lip movements can cause you to hear the sound differently.

Since Mary Smith was revealed in 2006, increasing technological sophistication for computer games has allowed for heightened realism of human-like characters. Cinematic animation is achieved not only for cut scenes and trailers containing full motion video (FMV) but also for animation during in-game play. One example is the phoneme extractor and facial expression tool Faceposer, designed by Valve for titles such as Left 4 Dead (2008) and Half Life 2 (2008). However, it would seem that speech, as a factor integral to the uncanny phenomenon, is often overlooked when compared to the aesthetic visual qualities of behaviour of a human-like character.

So far there have been limited studies to ascertain which factors contribute to the uncanny for virtual characters. In response to the hearsay in mass media raised by characters such as Mary Smith, Tinwell and Grimshaw (2010) conducted a study to investigate how the cross-modality of image and sound might exaggerate the uncanny. The results from this study are referred to throughout this chapter as the Uncanny Modality (UM) study, unless a different study is explicitly cited.
Prior to this, much of the work on the uncanny had been
visually-based, excluding sound as a factor. As a way towards building a conceptual framework for the uncanny and virtual characters in immersive 3D environments, this chapter defines how characteristics of a character’s speech may exaggerate the uncanny by considering aspects such as synchronization of audio and video streams, articulation, and qualities of speech.

The first section provides an exposition of the Uncanny Valley, describing how the theory came about, previous investigation into the theory, and potential limitations of the theory in relation to virtual characters.

Previous authors (such as Bailenson et al., 2005; Brenton, Gillies, Ballin, & Chatting, 2005; and Vinayagamoorthy, Steed, & Slater, 2005) have suggested that uncanniness is increased when the behavioural fidelity for a realistic, human-like character does not match up with that character’s realistic, human-like appearance. The second section discusses how a cross-modal mismatch between a character’s appearance and speech may exaggerate the uncanny: for instance, whether a character’s speech is perceived as belonging to that character or not, based on the character’s appearance.

The third section discusses how particular qualities of speech, such as slowness of speech, intonation and pitch, and how monotone the voice sounds, may influence perceived uncanniness, and how such qualities might work to the advantage of those characters intended to elicit an eerie sensation.

The results from the UM study (Tinwell & Grimshaw, 2010) revealed a strong relationship between how strange a character is perceived to be and the lack of synchronization of speech and lip movement. (Characters rated as close to perfect synchronization for lip movement and speech were perceived as less strange than those with disparities in synchronization.) The fourth section reviews the findings from this study and also puts forward future experiments that may
help to define acceptable levels of asynchrony for computer games where uncanniness is not desired.

For figures onscreen, an over-exaggeration of pronunciation for particular words can make the figure appear uncanny to the viewer, as the figure seems absurd or comical (Spadoni, 2000). The fifth section considers how the manner of articulation of speech may influence the uncanny by examining the visual representation (viseme) for each phoneme within the choreography tool Faceposer (Valve Corporation, 2008).

A summary is presented in the final section that defines the outcomes from this inquiry as to how speech influences the uncanny for realistic, human-like virtual characters, as a way towards building a conceptual framework for the uncanny. It is intended that this framework is relevant not only to computer game characters but also to characters within a wider context of user interfaces. Examples include virtual conversational agents used within therapeutic applications to interact with autistic children and aid the development of communication skills, and those used to deliver learning material to students within e-learning applications.
THE UNCANNY VALLEY

The subject of the uncanny was first introduced in contemporary thought by Jentsch (1906) in an essay entitled On the Psychology of the Uncanny. Jentsch described the uncanny as a mental state where one cannot distinguish between what is real or unreal and which objects are alive or dead. In 1919, to establish what caused certain objects to be construed as frightening or uncanny, Sigmund Freud made reference to Jentsch’s essay as a way to describe the feeling caused when one cannot detect if an object is animate or inanimate upon encountering objects such as “waxwork figures, ingeniously constructed dolls and automata” (p. 226). Freud characterized the uncanny as similar to the notion of a doppelganger; the body replica
being at first an assurance against death, then the more sinister reminder of death’s omen, “a ghastly harbinger of death” (p. 235).

Building on previous depictions of the uncanny, the roboticist Masahiro Mori (1970, as translated by MacDorman & Minato, 2005) observed that a robot continued to be perceived as more familiar and pleasing to a viewer as the robot’s appearance became more human-like. However, a more negative response was evoked by the robot as the degree of human-likeness reached a stage at which the robot was close to being human, but not fully. Mori plotted a perpendicular slope climbing as the variables for perceived human-likeness and familiarity increased until a point was reached where the robot was regarded as more strange than familiar (see Figure 1). At this point (about 80-85% human-likeness), due to subtle deviations from the human-norm and the resounding negative associations with the robot, Mori drew a valley-shaped dip. A real human was placed, escaping the valley, on the other side. Mori gave examples of objects such as zombies, corpses and lifelike prosthetic hands that lie within the valley. He also predicted that the Uncanny Valley would be amplified with movement as opposed to the still images of a robot.

Mori recommended that robot designers avoid designing complete androids and instead develop humanoid robots with human-like traits, aiming for the first valley peak and not the second, which would risk a fall into the Uncanny Valley. As computer game designers working in particular genres continue the pursuit of realism as a way to improve player experience and immersion, they have the second peak as a goal in order to achieve believably realistic, human-like characters (Ashcraft, 2008; Plantec, 2008). To reach this goal and to assess if overcoming the Uncanny Valley is an achievable feat, further investigation and analysis of the factors that may exaggerate the uncanny is required.
Figure 1. A diagram to demonstrate Mori’s plot of perceived familiarity against human-likeness as the Uncanny Valley (taken from a translation by MacDorman and Minato of Mori’s ‘The Uncanny Valley’)
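The shape described above, a rise in familiarity followed by a dip before the curve climbs again towards a real human, can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions: the function name `find_valleys` and every data point are hypothetical and invented for this example. They are not Mori's values, nor results from any study cited in this chapter.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (human-likeness in %, perceived familiarity)


def find_valleys(points: List[Point]) -> List[Point]:
    """Return every point whose familiarity is strictly lower than
    that of both of its neighbours along the human-likeness axis."""
    ordered = sorted(points)  # order stimuli by human-likeness
    valleys = []
    for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
        if cur[1] < prev[1] and cur[1] < nxt[1]:
            valleys.append(cur)
    return valleys


# Hypothetical ratings, invented purely to mimic the general shape of
# Mori's diagram (a dip somewhere around 80-85% human-likeness).
mori_like = [
    (0, 0.0), (20, 1.0), (40, 2.0), (60, 3.0), (70, 3.5),
    (80, 1.0), (85, -2.0), (90, 0.5), (100, 4.0),
]

print(find_valleys(mori_like))
```

With real rating data the same routine may report several dips or none at all, which is one simple way of checking whether a data set follows the single smooth valley that Mori drew.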
Previous Investigation into the Uncanny Valley

Since Mori’s original theory of the Uncanny Valley over thirty years ago, the increasing realism possible for virtual characters and androids has sparked a renewed interest in the phenomenon (Green, MacDorman, Ho, & Vasudevan, 2008; Pollick, in press; Steckenfinger & Ghazanfar, 2009). However, there have been few empirical studies conducted to support the claims of uncanny virtual characters and androids evident within new media (Bartneck, Kanda, Ishiguro, & Hagita, 2009; MacDorman & Ishiguro, 2006; Pollick, in press; Steckenfinger & Ghazanfar, 2009).

Still images of both virtual characters and robots have been used for experiments investigating the Uncanny Valley. Design guidelines have been authored to help realistic, human-like characters escape from the valley (for example, Green et al., 2008; MacDorman, Green, Ho, & Koch, 2009; Schneider, Wang & Yang, 2007; Seyama & Nagayama, 2007). MacDorman et al. focused on how facial proportions, skin texture, and levels of detail affect the perceived eeriness, human-likeness, and attractiveness of
virtual characters. Schneider et al. investigated the relationship between human-like appearance and attraction, with the results indicating that the safest combination for a character designer seems to be a clearly non-human appearance with the ability to emote like a human.

Hanson (2006) conducted an experiment using still images of robots across a spectrum of human-likeness. An image of a human was morphed to an android on one half of the spectrum and then the android to a mechanical-looking, humanoid robot on the other half. The results depicted an uncanny region between the mechanical-looking, humanoid robot and the android. In a second experiment, Hanson found that it was possible to remove the uncanny region within the same plot, where it had previously existed, by changing the appearance of the android’s features to a more “cartoonish” and friendly appearance.

However, the results from these experiments only provide a somewhat limited interpretation of perceived uncanniness based on inert (unresponsive) still images. Most characters used in animation and computer games are not stationary, with motion, timing and facial animation being the main factors contributing to the Uncanny Valley
(Richards, 2008; Weschler, 2002). For realistic androids, behaviour that is natural and appropriate when engaging with humans, referred to as “contingent interaction” by Ho, MacDorman, and Pramono (2008, p. 170), is a key factor in assessing a human’s response to an android (Bartneck et al., 2009; Kanda, Hirano, Eaton, & Ishiguro, 2004). Previous authors (such as Green et al., 2008; Hanson, 2006; MacDorman et al., 2009; Schneider et al., 2007) state that the conclusions drawn from their experiments where still images had been used may have been different had movement (and sound) been included as a factor.

The perception of the uncanny does not always have to have a negative impact on the viewer (MacDorman, 2006). The principles of the uncanny theory can work to the advantage of engineers when designing robots with the purpose of being unnerving within an appropriate setting and context. Similarly, the uncanny may help in the success of the horror game genre for zombie-type characters. Building on these findings, Tinwell and Grimshaw (2010) conducted the UM study, using video clips with sound, to investigate how the uncanny might enhance the fear factor for horror games. The results showed that combined factors such as appearance and sound can work together to exaggerate the uncanny for virtual characters.
Not only was it suggested that a lack of lip/vocalization synchronization reduced how familiar a character was perceived to be, but a perceived lack of human-likeness for a character’s voice, facial expression, and doubt in judgement as to whether the voice actually belonged to the character or not, also reduced perceived familiarity.
Limitations of Mori’s Theory

Recent studies demonstrate weaknesses within the Uncanny Valley theory and suggest it may be more complex than the simplistic valley shape that Mori plotted in his original diagram (see Figure 1). Various factors (including speech) can influence how uncanny an object is perceived to be (Bartneck et al., 2009; Ho et al., 2004; Minato, Shimada, Ishiguro, & Itakura, 2004; Tinwell & Grimshaw, 2009). Attempts to plot Mori’s Uncanny Valley shape cannot confirm the two-dimensional construct that Mori envisaged. The results from experiments that have been conducted using cross-modal factors such as motion and sound imply that it is unlikely that the uncanny phenomenon can be reduced to the two factors of perceived familiarity and human-likeness; it is instead better described by a multi-dimensional model (see Figure 2).
Figure 2. The Uncanny Wall (Tinwell & Grimshaw, 2009)
When ratings for perceived familiarity were plotted against human-likeness, the results from Tinwell and Grimshaw’s experiment, using 100 participants and 15 videos ranging from humanoid to human with character vocalization, depicted more than one valley shape. The plot is more complex than Mori’s smooth curve and the valley shapes are less steep than Mori’s perpendicular climb. The most significant valley occurs between the humanoid character Mario, on the left, and the stylized, human-like Lara Croft, on the right. The nadir for this valley shape is positioned at about 50-55% human-likeness, which is lower than Mori’s original prediction of 80-85% human-likeness.

Results from studies using robots with motion and speech are also inconsistent with Mori’s Uncanny Valley. MacDorman (2006) plotted ratings for perceived familiarity against human-likeness for an experiment using videos of robots from mechanical to human-like, including some stimuli with speech. The results showed no significant valley shape in keeping with the depth and gradient of Mori’s plot, and showed that robots rated with the same degree of human-likeness can have a different rating for familiarity. Bartneck et al. (2009) compared a robotic copy of a human to that human under two conditions, movement (with motion and speech) and still. Despite a significant difference in perceived human-likeness between the human and the android, there was no significant difference in perceived likeability between the android and the human. These results imply that movement may not be the only factor to influence the uncanny. Further investigation is required to assess how speech may contribute to a more multi-dimensional model to measure the uncanny.

Uncertainty exists as to whether the meaning for Mori’s original concept may have been “lost in translation” (Bartneck et al., 2009, p. 270).
The word that Mori used in the title for the Uncanny Valley is bukimi, which, translated from Japanese, means “weird, ominous, or eerie”. In English, “synonyms of uncanny include unfamiliar, eerie, strange, bizarre, abnormal, alien, creepy, spine tingling, inducing goose bumps, freakish, ghastly and horrible” (MacDorman & Ishiguro, 2006, p. 312), while Freud used the word unheimlich to define the uncanny. Further confusing the issue, the root heimlich has two meanings, viz. familiar or agreeable, and that which is concealed and should be kept from sight. Freud discussed both meanings in his 1919 essay and they are not necessarily mutually exclusive, as we show below.

However, despite a generic understanding of the word that Mori used, the appropriateness of the term shinwa-kan (translated as familiarity), which Mori used in his original paper as a variable to measure and describe uncanniness, has been addressed by previous authors. Shinwa-kan is an uncommon word within Japanese culture and has no direct English equivalent. The word familiarity stands for the opposite of unfamiliarity (one of the synonyms for bukimi), yet the word familiarity may be open to misinterpretation. Whilst strange is a typical term for describing the unfamiliar, familiarity might be interpreted with a variety of meanings including how well-known an object appears: for example, a well-known character in popular culture or an android replica of a famous person. Bartneck et al. (2009) proposed that, with no direct translation, shinwa-kan could be treated as a “technical term” in its own right; however, this may cause problems when comparing the results from one experiment to another where the more generic translation “familiarity” is used as the dependent variable (p. 271). Other words such as likeability (Bartneck et al., 2009) or unstrange (the opposite of strange) may be closer to Mori’s original intention. Nevertheless, experiments conducted into the uncanny may be more robust if a standard word were to be used as a dependent variable to measure and describe perceived uncanniness: that word has yet to be agreed upon.

Conflicting views exist as to whether it is actually possible to overcome the Uncanny Valley.
One theory put forward is that objects may appear
less uncanny over time as one grows used to a particular object. Brenton et al. (2005) give the example of the life-like sculpture The Jogger by Duane Hanson: the sculpture will appear “less uncanny the second time that it is viewed because you are expecting it and have pre-classified it as a dead object”. The effect of habituation may also apply to those with regular exposure to realistic, human-like virtual characters. 3D modellers working with this type of character, or gamers with an advanced level of gaming experience, may be less able to detect flaws within a particular character because they have grown accustomed to the appearance and behaviour of that character by interacting with it on a regular basis (Brenton et al., 2005).

Recent empirical evidence goes against this theory. The results from a study by Tinwell and Grimshaw (2009) showed that the level of experience of both playing computer games and using 3D modelling software made little difference in detecting uncanniness. (Judgements of perceived familiarity and human-likeness made by those with an advanced level of experience did not differ significantly from those made by participants with less or no experience.) Tinwell and Grimshaw suggest it may never be possible to overcome the Uncanny Valley, as a viewer’s discernment for detecting subtle deviations from the human norm keeps pace with developments in technology for creating realism. With a lack of empirical evidence to support the notion of an Uncanny Valley, the notion of an Uncanny Wall may be more appropriate (see Figure 2). Viewers who may at first have been “wowed” by the apparent realism of characters such as Quantic Dream’s Mary Smith (2006) or characters in animation such as Beowulf (Zemeckis, 2007) or The Polar Express (Zemeckis, 2004) soon developed the skills to detect discrepancies in such characters’ appearance and behaviour.
Indeed, as soon as the next technological breakthrough in achieving realism is released, a viewer may be reminded of the flaws for a character that at first did not seem uncanny. In addition to the meaning of uncanny
as used in the Uncanny Wall hypothesis being an exposition of the first Freudian sense of heimlich/unheimlich as described above, the undesired unmasking of the technological processes used in the production of a character, and the perception of those processes as flaws in the presentation of that character, allows us simultaneously and without contradiction to use the second meaning of heimlich: that which should remain out of sight. The concept of the Uncanny Wall (as opposed to the Uncanny Valley, which always holds out the hope for a successful traversal to the far side) evokes a variety of myths, legends and modern stories (Frankenstein’s monster, for example, or the Golem) in which beings created by man are condemned to forever remain pale shades of those created by gods.

Further studies would be required to provide evidence for the Uncanny Wall and to substantiate the hypothesis that the Uncanny Valley is impossible to surmount for realistic, human-like virtual characters. As soon as the next character is released and announced as having overcome the Uncanny Valley, we intend to conduct another test using the same characters as in the previous experiment. If those characters previously rated as close to escaping the valley, such as Emily (Image Metrics, 2008), are placed beneath the new character as perceived strangeness increases, our prediction may be justified. In the meantime, a conceptual guide for uncanny motion and sound in virtual characters may be beneficial in aiding computer game developers to manipulate the degree of uncanniness.
CROSS-MODAL MISMATCH

For androids, if a human-like appearance causes us to evaluate an android’s behaviour from a human standard, we are more likely to be aware of disparities from human norms (MacDorman & Ishiguro, 2006; Matsui, Minato, MacDorman, & Ishiguro, 2005; Minato et al., 2004). Ho et
al. (2008) observed that a robot is eeriest when a human-like appearance creates an expectation of human form that nonhuman-like elements then fail to meet. Also, a mismatch in the human-likeness of different features for a robot, for example, a nonhuman-like skin texture combined with human-like hair and teeth, elicited an uncanny sensation for the viewer. With regard to virtual characters it has been suggested that a high graphical fidelity for realistic human-like characters raises expectations for the character’s behavioural fidelity (Bailenson et al., 2005; Brenton et al., 2005; Vinayagamoorthy et al., 2005). Any discrepancies from the human norm in how a character spoke or moved would appear odd. For humanoid or anthropomorphic characters with a lower fidelity of human-likeness (for example, Mario or Sonic the Hedgehog), differences from the human norm would be more acceptable to the viewer: Expectations are lowered based on the more stylized and iconic appearance for that character. Despite seemingly strange behaviour with jerky movements or a less than human-like voice, the viewer will still develop a positive affinity with the character. Empirical evidence implies that humanoid and anthropomorphic type characters do escape the valley dip as Mori predicted, being placed before the first peak in the valley (Tinwell, 2009; Tinwell & Grimshaw, 2009). Evidence shows that for virtual characters (and robots) a perceived mismatch in the human-likeness for a character’s voice based on that character’s appearance exaggerates the uncanny. As part of the Uncanny Modality survey (Tinwell & Grimshaw, 2010), 100 participants rated how human-like the character’s voice sounded and how human-like the facial expression appeared using a scale from 1 (nonhuman-like) to 9 (very human-like). Strong relationships were identified between the uncanny and perceived human-likeness for a character’s voice and facial expression.
The less human-like the voice sounded, the more strange the character was regarded to be. Uncanniness
also increased for a character the less human-like the facial expression appeared. Laurel (1993) suggests that to achieve harmony, there is an expectation for the sensory modalities of image and sound to have the same resolution. So that there is accord between visual appearance and behaviour for virtual characters we put forward that the degree of fidelity of human-likeness for a character’s voice should match that character’s appearance, or otherwise risk discord for that character. To avoid the uncanny, attention should be given to the fidelity of human-likeness for a character’s voice in accordance with that character’s appearance. For high fidelity human-like characters it is expected that the character should have a human-like voice of a resolution that matches its realistic, human-like appearance. However, for mechanical-looking robots, a less human-like and more mechanical-sounding voice is preferable. The humanoid robot Robovie was intentionally given a mechanical sounding voice so that it appeared more natural to the viewer (Kanda et al., 2004). A voice that was too human-like may have been regarded as unnatural based on the robot’s appearance, thus exaggerating the uncanny for the robot. To test the Uncanny Valley theory with virtual characters, it has been suggested that it is not necessary to include characters from computer games as the level of realism achieved from gaming environments generated in real-time is less than that achieved for animation and film (Brenton et al., 2005). Some characters created for television and film have been proclaimed as overcoming the Uncanny Valley: In 2008, Plantec hailed the character Emily as finally having done so.
Walker, of Image Metrics, states that whilst computer games would benefit from these more realistically rendered faces, it is not yet possible to achieve the same high level of polygon counts for in-game play as achieved for television and film due to technical restrictions: “We can produce Emily-quality animation for games as well, but
it just can’t work in a real-time gaming environment” (as quoted in Ashcraft, 2008). Accordingly, for virtual characters used within computer games that are approaching levels of realism as achieved for the film industry, it may be advisable to reduce the level of human-likeness for a character’s voice to a level that is in keeping with that character’s appearance. Actors’ voices are typically used for realistic, human-like characters’ speech in computer games. Yet, if the level of fidelity for achieving human-like realism for computer games is less than that achieved for film, a less than human-like voice should be used to avoid the character being perceived as unnatural. Hug (2011) makes a similar point when discussing the similarities between indie game and animation film aesthetics. Hug describes how the sound used in animation film or cartoons matches the aesthetic style of the animation: “[S]ounds that are more or less de-naturalized in a comical, playful, or surreal way, which is characterized by a subversive interpretation of sound-source associations”. He further uses the example of an explosion that occurs within the arcade game Grey Matter (McMillen, Refenes, & Baranowsky, 2008) as an intriguing case of “cartoonish” sound design: “when an abstract dot hits a flying cartoon brain, the latter ‘explodes’ with sounds of broken glass”. Although a more cartoonish style of sound is used for the explosion, the sound seems more in keeping with the stylized appearance of the object to which the sound belongs. The visceral sounds of the impact are still evident despite the more simplistic nature of the sound. The acoustics appear more natural as the level of detail appears to match the stylized aestheticism of the game’s environment.
Of course we do not suggest that cartoon-like voices be used with characters that are approaching believable realism in computer games, however the level of human-likeness may be subtly modified so that the perceived style of the voice sound matches the aesthetic appearance of the character. This absurd juxtaposition may be necessary to
reduce the uncanny for computer game characters due to the fact that they will always be playing catch up to the level of realism achieved for film. Refinements made to characters’ voices over a spectrum of human-likeness, ranging from human-like to mechanical, may perhaps help to remove the uncanny where it was previously evident. Reiter (2011) notes that recently, more attention has been given to the quality of sound in computer games to keep up with the quality of realism achieved visually for in-game play and to provide a more cinematic experience. As a method of communication both diegetic and non-diegetic game sound enhances a game’s plausibility in that sound can “trigger emotions and provide additional information otherwise hard to convey” (Reiter, 2011). Distinctions made as to the quality of game sound are not simply due to the level of clarity, resolution, or digital output achievable for sound: “Perceived quality in game audio is not a question of audio quality alone” (Reiter, 2011). For speech, textures, emotive qualities and delivery style are attributes that contribute to the perceived quality and overall believability for a character. (Qualities of speech and the uncanny are discussed further in the following section.) Quality of speech is critical in portraying the emotive context of a character convincingly. However, with regard to the uncanny, if the perceived realism and quality for a voice goes beyond that of the quality and realism for a character’s appearance, such a cross-modal mismatch could exaggerate the uncanny. Further experiments are required to test this theory. Building on the premise of Hanson’s (2006) experiment where the uncanny was removed from a morphed sequence of images from robot to human by making a robot’s features more “cartoonish” and friendly, similar changes could be made to the acoustics of speech for videos of realistic, human-like characters.
Whilst the videos of characters would remain constant, the speech sound would be changed across a spectrum of human-likeness from mechanical to human-like. If our predictions are correct, characters will be perceived as more strange when the speech sounds too mechanical or too human-like in relation to the fidelity of human-likeness for a character’s appearance. A character may appear more natural and be perceived as more familiar once the fidelity of human-likeness for speech is adjusted to be regarded as matching that of a character’s appearance.
QUALITIES OF SPEECH

Bizarre qualities and textures of speech served to gratify the pleasure humans sought in frightening themselves with early horror film talkies, for example the monster in Browning’s (1931) film Dracula. Some cinematic theorists argue that the success of films such as Dracula was due to an uncanny modality that occurred during the transition from silent to sound cinema (Spadoni, 2000, p. 2). Sounds that may have been perceived as unreal or strange due to technical restrictions of sound recording and production at the time were used to the advantage of the character Dracula. For early sound film, to produce the most intelligible dialogue for the viewer, the recording process required that words were pronounced slowly, emphasizing every “syl-la-ble” (Spadoni, 2000, p. 15). However, whilst words could be easily interpreted by the viewer, this impeded delivery style made the speech sound unnatural and unreal. Delivery of speech style also influenced how strange Dracula was perceived to be. In the role of Dracula, the acoustics of Bela Lugosi’s speech set the standard for what the “voice of horror” should be (Spadoni, 2000, pp. 63-70). The weird textures of Bela Lugosi’s voice were manipulated to create a greater conceptual peculiarity for the viewer, thus setting the eponymous character apart from those of other horror films. The distinctive vocal tone and pronunciation of Dracula’s speech were characteristics that critics acclaimed as the most shocking and chilling: “slow painstaking voices pronouncing each syllable at
a time like those of radio announcers filled the theatre” (p. 64). As Tinwell and Grimshaw (2010) state, paraphrasing Spadoni, the unique textures and delivery style for Dracula’s speech increased the uncanny for Dracula:

Dracula’s voice, the ethereal voice of the undead, is compared to the voice of reason and materiality that is Van Helsing’s. In the former, the uncanny is marked by uneven and slow pronunciation, staggered rhythm and a foreign (that is, not English) accent and all this produces a disconnect between body and speech. Van Helsing’s speech, by contrast, is the embodiment of corporiality; authoritative, clearly enunciated and rational in its delivery and meaning.

For zombie characters in computer games, comparisons have been made with horror film talkies as to the methods used to create and modify sound to induce an ambience of fear (Brenton et al., 2005; Perron, 2004; Roux-Girard, 2011; Toprac & Abdel-Meguid, 2011). Results from the Uncanny Modality (UM) study by Tinwell and Grimshaw (2010) to define cross-modal influences of image and sound and the uncanny in virtual characters show that particular qualities of speech (similar to those observed for early horror talkies) can exaggerate how uncanny a virtual character is perceived to be. Thirteen video clips of one human and twelve virtual characters in different settings and engaged in different activities were presented to 100 participants. The twelve virtual characters consisted of six realistic, human-like characters: (1) the Emily Project (2008) and (2) the Warrior (2008) both by Image Metrics; (3) Mary Smith from The Casting (Quantic Dream, 2006); (4) Alex Shepherd from Silent Hill Homecoming (Konami, 2008) and two avatars (5) Louis and (6) Francis from Left 4 Dead (Valve, 2008); four zombie characters, (7) a Smoker, (8) The Infected, (9) The Tank and (10) The Witch from Left 4 Dead; (11) a stylised, human-like Chatbot character “Lillien” (Daden Ltd, 2006); (12) a realistic, human-like
zombie (Zombie 1) from the computer game Alone in the Dark (Atari Interactive, Inc, 2009) and (13) a human. Table 1 shows the median ratings for a character’s strangeness and for the speech qualities: whether the speech seemed (a) slow, (b) monotone, (c) of the wrong intonation, (d) if the speech did not appear to belong to a character, or (e) none of the above. Characters with the same median value for strangeness were grouped together and the median values for speech qualities were then calculated for those characters or groups. (Median values were used to indicate a central tendency for results, to help establish a clear overall picture of the vital relationships over multiple qualities of speech.) The results implied that slowness of speech, incorrect intonation and pitch, and how monotone the voice sounded all increased uncanniness. A strong inverse relationship was identified between individual ratings for the variables “the speech intonation sounds incorrect” and “the voice belongs to the character”. This implies that if the intonation for a character’s voice is in keeping with what the viewer may have expected, this characteristic may contribute to the overall believability for that character. The two zombies the Witch and the Tank, from the computer game Left 4 Dead (Valve, 2008), were regarded as the most uncanny with a median strangeness rating
of just 2 (see Table 1). However it seems the unintelligible hisses and snarls from the Tank were regarded as sounds that this character was likely to make based on the Tank’s appearance and how he behaved. Likewise the inhuman cries and screeches from the Witch matched her seemingly pathetic and wretched appearance. Such sounds enhanced the believability of these characters as they were in keeping with their nonhuman-like appearance. The findings from the UM study provide empirical evidence to support the claims made by MacDorman (as quoted in Gouskos, 2006) that Mary Smith’s speech was one of the main contributing factors as to why she was perceived as uncanny. Twenty percent of participants observed a lack of correct pitch and intonation for Mary Smith’s speech. This implies that the pitch and tone for her voice may not have matched the facial expression exhibited by this character. The emotive qualities of speech may have seemed either inappropriate or out of context with how this character appeared to look and behave. The facial expression may not have matched nor accurately conveyed the emotive qualities of her voice. Attributes such as these raised doubts as to whether the voice actually belonged to this character or not, thus increasing the sense of perceived eeriness for this character.
Table 1. Median ratings for speech qualities for those characters or groups with the same median strangeness value (Tinwell & Grimshaw, 2010). Note. Judgements for strangeness were made on 9-point scales (1 = very strange, 9 = very familiar).

| Median Strangeness for Character or Group | Slow | Monotone | Wrong intonation | Belongs | None |
|---|---|---|---|---|---|
| The Tank, The Witch (Mdn = 2) | 10 | 9.5 | 23.5 | 56.5 | 16.5 |
| The Infected, The Smoker, Zombie 1, Chatbot (Mdn = 3) | 24 | 21.5 | 40 | 42 | 8.5 |
| Mary Smith (Mdn = 4) | 8 | 3 | 20 | 20 | 8 |
| The Warrior, Alex Shepherd (Mdn = 6) | 14 | 17 | 17 | 62.5 | 7.5 |
| Louis, Francis (Mdn = 7) | 2.5 | 3.5 | 6.5 | 79.5 | 4.5 |
| Emily (Mdn = 8) | 2 | 0 | 2 | 87 | 6 |
| Human (Mdn = 9) | 1 | 15 | 4 | 72 | 6 |
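The grouping procedure behind Table 1 (characters with the same median strangeness are pooled, and the median of each speech-quality value is then taken within the group) can be illustrated with a minimal Python sketch. The character names and figures below are hypothetical placeholders, not the study’s raw data, and the function name `group_medians` is an assumption introduced only for this illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical per-character values (NOT the study's raw data):
# median strangeness (1 = very strange, 9 = very familiar) and the
# share of participants who judged the character's speech to be slow.
characters = {
    "Character A": {"strangeness": 2, "slow": 8.0},
    "Character B": {"strangeness": 2, "slow": 12.0},
    "Character C": {"strangeness": 7, "slow": 3.0},
}

def group_medians(data, quality):
    """Pool characters that share a median strangeness value, then take
    the median of the chosen speech-quality value within each pool."""
    groups = defaultdict(list)
    for values in data.values():
        groups[values["strangeness"]].append(values[quality])
    return {mdn: median(vals) for mdn, vals in sorted(groups.items())}

print(group_medians(characters, "slow"))
```

A group of two characters yields the midpoint of their two values, which would be consistent with the half-point entries that appear for the two-character groups in Table 1.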
As well as speech regarded as being of the wrong pitch, speech that is delivered in a slow, monotone way increased the uncanny for both zombie characters and human-like characters not intended to contest a sense of the real. Within the UM study, the Chatbot character received a less than average rating for perceived familiarity and was placed with three zombie characters with a median strangeness value of just three (see Table 1). The Chatbot’s voice was rated individually as being slow (75%), monotone (59%), and of an incorrect intonation (76%). The “speech” for Zombie 1, grouped with the Chatbot character with a median strangeness value of three, was also judged individually as being monotone (29%), slow (42%), and of an incorrect intonation (34%). Including such qualities of speech for the zombie may have been a conscious design decision by developers to increase the perceived eeriness for a character intended to elicit an uncanny sensation. (As mentioned above, such qualities enhanced the overall impact for the monster Dracula.) However, the crippled speech style for the Chatbot appeared unnatural and unreal. Such qualities for this character’s speech were factors that viewers found most annoying and irritating, exaggerating the uncanny for this character when perhaps this was not intended. Our results imply that uncanniness is increased if speech is judged to be of the wrong pitch, too monotone, or slow in delivery style. Whilst such qualities can work to the advantage of antipathetic characters by increasing the fear factor, these qualities may work against empathetic characters in the role of hero or protagonist within a game. A designer may wish the player to have a positive affiliation with the protagonist character, yet the designer may unwittingly create an uncanny sensation for the player with speech qualities that sound strange to the viewer.
Speech prerecorded in a manner that is too slow or monotone to aid clarity for post-production purposes may be judged as unnatural and should be instead recorded at an appropriate tempo. Pitch and tone of speech that do not match the facial expression or given
circumstance for a character may be regarded as out of context and confusing for a viewer. To avoid the uncanny, attention should be given to ensuring that the pitch of voice accurately depicts the given emotion for a character and, once speech has been recorded at the correct pitch, that the facial expression conveys that emotion convincingly.
LIP-SYNCHRONIZATION VOCALIZATION

The process of matching lip movement to speech is an integral factor in maintaining believability for an onscreen character (Atkinson, 2009). For first-person shooters (FPS) and other similar types of action game, there are limited periods during gameplay when attention is focused solely on a headshot of a speaking character. Close up shots of a player’s character, comrades or antagonists are predominantly used when exchanging information during gameplay or during cinematic cut scenes and trailers. The music genre of computer games provides an outlet for musicians to promote and sell their work (Kendall, 2009; Ripken, 2009). As well as FPS games, music games can provide a challenge for developers with regard to facial animation and sound. The Beatles: Rock Band (EA Games, 2009) highlights the recent success of the merger of music and computer games that use realistic, human-like characters to represent music artists. It has been found, however, that uncanny traits can leave viewers dissatisfied with particular characters within the context of a computer game (Tinwell, 2009). With emphasis directed at a character’s mouth as the vocals are matched to the music tracks, it is important that an artist’s identity be transferred effectively within this new medium (Ripken, 2009). Factors such as asynchrony may result in a negative impact on the overall believability for such characters. This section discusses the outcomes of a lack of synchrony for lip-vocalization narration in film
and television and the corresponding implications for characters in computer games.
Lip Syncing for Television and Film

The process of a viewer accepting that sound and image occur simultaneously from one given source is referred to as synchresis (Chion, 1994) or synchrony (Anderson, 1996).¹ For early sound cinema, various methods of sound recording and post production techniques were applied before a viewer no longer doubted that a voice actually belonged to a figure onscreen. A perceived lack of synchronization between image and sound has been equated with much of the uncanny sensation evoked by films within the horror genre in early sound cinema (Spadoni, 2000, pp. 58-60). Errors in synchrony evoked the uncanny for a scene in Browning’s Dracula (1931). As a figure’s lips remained still, human laughter resonated within the scene. With no given body or source, the laughter is regarded as an eerie, disembodied sound. Whilst technology allows for some improvement with cinema speakers, televisions and personal computers, most sound is still delivered through some mechanism that is physically disjunct from the onscreen image (for example, via headphones or separate speakers). Tinwell and Grimshaw (2010) note that future technologies may overcome issues with asynchrony within the broadcasting industry: “Presumably, there will be no need for such perceptual deceit once flat-panel speakers with accurate point-source technology provide simultaneously a visual display” (p. 7). For human figures in television and film, viewers are more sensitive to an asynchrony of lip movement with speech than for visual information presented with music (Vatakis & Spence, 2005). Viewers are also more sensitive to asynchrony when sound precedes video and less so when sound lags behind video (Grant et al., 2004). Grant et al. found that for continuous streams of audio-visual speech presented onscreen, detectable asynchrony occurred at 50ms when sound preceded video,
with a larger window of acceptable asynchrony for when sound lagged behind video at 220ms. Standards set by the television broadcasting industry require that the audio stream should not precede the video stream by more than 45ms and that the audio stream should not lag behind the video stream by more than 125ms (ITU-R, 1998). An asynchrony for speech with lip movement can lead to one misinterpreting what has been said: the McGurk Effect (1976). As a viewer, one can interpret what has been heard by what has been seen. Depending on which modality one’s attention may be drawn to for audio-visual speech (and depending on which syllable is used), the pronunciation of a visual syllable can take precedence over the auditory syllable. Conversely, a sound syllable can take precedence over the visual syllable. Alternatively, as one comprehends the visual articulatory process of speech both automatically and subconsciously, one can combine the sound and visual syllable information to create a new syllable. For example, a visual “ga” coinciding with the sound “ba” can be interpreted as a “da” sound. (This type of effect was observed by MacDorman (2006) for the speech of the character Mary Smith, who was criticized for being uncanny.) A viewer’s overall enjoyment of a television programme can be disrupted if delays occur between transmission devices for video and audio signals. To prevent confusion or irritation for the viewer, subtitles are often preferred to dubbing of speech for foreign works (Hassanpour, 2009). Errors in the synchronization of lip movements with voice for figures onscreen (lip sync error) can result in different responses from the viewer depending upon the context within which the errors are portrayed.
A study by Reeves and Voelker (1993) found that not only is lip sync error potentially stressful for the television viewer, but it can also lead to a dislike for a particular program and viewers evaluating the people displayed on the screen more negatively and as “less interesting, more unpleasant, less influential, more agitated, more confusing, and less successful” (p. 4). On the
contrary, lip sync error has also been deliberately used to provoke a humorous effect for the viewer, where the absurd is regarded as comical as opposed to annoying: for example, the intentionally bad dubbing for characters in “Chop-Socky” movies (Tinwell & Grimshaw, 2010).
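The broadcast limits quoted in this section (audio leading video by no more than 45ms, audio lagging video by no more than 125ms) can be expressed as a simple check. The following Python sketch is illustrative only: the function names and the sign convention (positive offset means that audio lags video) are assumptions made for this example and are not taken from the ITU-R text.

```python
# Limits quoted in the text (ITU-R, 1998), in milliseconds.
MAX_AUDIO_LEAD_MS = 45    # audio may precede video by at most this much
MAX_AUDIO_LAG_MS = 125    # audio may lag behind video by at most this much

def audio_offset_ms(audio_time_ms, video_time_ms):
    """Assumed convention: a positive result means the audio event arrives
    after its video event (lag); a negative result means it arrives before
    (lead)."""
    return audio_time_ms - video_time_ms

def within_broadcast_limits(offset_ms):
    """True if the offset falls inside the asymmetric window above."""
    return -MAX_AUDIO_LEAD_MS <= offset_ms <= MAX_AUDIO_LAG_MS

# A 100ms lag is tolerated, whereas a 60ms lead is not.
print(within_broadcast_limits(audio_offset_ms(1100, 1000)))
print(within_broadcast_limits(audio_offset_ms(940, 1000)))
```

The window is asymmetric because, as noted above, viewers are more sensitive to sound that precedes the image than to sound that follows it.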
Lip Syncing for Computer Games

With increasing technological sophistication in the creation of realism in computer games, text-based communication systems have been replaced with virtual characters using actors’ voices. To create full voice-overs for characters, automated lip-syncing tools extract phoneme sounds from prerecorded lines of speech. The visual representation (viseme) for a particular sound is retrieved from a database of predetermined mouth shapes. Muscles within the mouth area for a 3D character are modified to create a particular mouth shape for each phoneme. Interpolated motion is inserted between each mouth shape and that of the next phoneme to enable continuity of lip movement for words within a given sentence. For example, a specific mouth shape can be selected for the sound “sh” to be used in conjunction with other sounds within a word or line of speech. Full voice-overs for characters were generated for titles developed by Valve such as Left 4 Dead (2008) and Half Life 2 (2008) using this technique. A phoneme extractor tool within Faceposer allowed for the detection and extraction of phoneme sounds from prerecorded speech to be synchronized with a character’s lips. Whilst research has been undertaken to improve the motion quality of real-time data driven approaches for realistic visual speech synthesis (Cao, Faloutsos, Kohler, & Pighin, 2004), prior to the UM study (Tinwell & Grimshaw, 2010) there have been no attempts to investigate what impact lip-synchronization may have on viewer perception and the uncanny in virtual characters. Videos of 13 characters ranging from humanoid to human were rated by 100 participants as to how uncanny and how synchronized speech with lip movement was perceived to be. (A full description of the stimuli used in the experiment is provided in the third section.) The results revealed a strong relationship between how uncanny a character was perceived to be and a lack of synchronization between lip movement and speech: those characters with disparities in synchronization were perceived as less familiar and more strange than those characters rated as close to perfect lip-synchronization. Synchronization problems with the recorded voice for early sound cinema heightened a viewer’s awareness that the figure was not real and was simply a manufactured artifact (Spadoni, 2000, p. 34). A viewer was reminded that figures onscreen were merely fabricated objects created within a production studio. The uncanny was increased as figures were perceived as “a reassembly of a figure” easily disassembled within a movie theatre (Spadoni, 2000, p. 19). The results from the UM study (Tinwell & Grimshaw, 2010) imply that the implications of asynchrony for speech and the uncanny for human figures within the classic horror cycle of Hollywood film also apply to virtual characters intended for computer games. The zombie characters the Witch and the Tank from the computer game Left 4 Dead (2008) received less than average scores for perceived lip-synchronization. The jerky, haphazard movement of the Witch’s lips appeared disparate from the high-pitched cries and shrieks spewed out by this character. As the Witch proceeded to attack, her presence seemed evermore overwhelming as sounds appeared to emanate from an incorporeal and uncontrollable being in a similar manner to Dracula’s laughter noted earlier. Similarly, participants seemed somewhat confused by the chaotic movement and irregular sounds generated by the Tank character, making the viewer feel panicked and uncomfortable. The stimuli for this study were presented in different settings and performing different actions.
Some were presented as talking heads, for example the
Chatbot character, whilst others moved around the screen, for example the Tank and the Witch. A further study is required to determine the actual causality of lip-synchronization as a significant contributor towards the uncanny when not associated with other factors of facial animation and sound. Thus, we intend a further experiment to test the hypothesis: Uncanniness increases with increasing perceptions of lack of synchronization between the character’s lips and the character’s sound. At present there are no standards set for acceptable levels of asynchrony for computer games as there are for television. It may well be that these acceptable levels are the same across the two media but it might equally be the case that the interactive nature of computer games and the use of different reproduction technologies and paradigms propose a different standard. For example, perhaps it is the case that current technological limitations in automated lip-syncing tools require a smaller window of acceptable asynchrony for computer games than previously established for television. We hope the future experiment noted above will also ascertain if viewers are more sensitive to an asynchrony of speech for virtual characters where the audio stream precedes video (as has been previously identified for the television broadcasting industry).
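The viseme lookup and interpolation described in this section can be sketched as follows. This is a minimal illustration under stated assumptions and not a description of how Faceposer or any particular engine is implemented: the `VISEMES` table, its weight names (`jaw_open`, `lip_round`), the timeline format and the choice of linear interpolation are all hypothetical.

```python
# Hypothetical viseme table: each phoneme maps to a mouth shape expressed
# as weights between 0 and 1 (names and values are illustrative only).
VISEMES = {
    "sh": {"jaw_open": 0.25, "lip_round": 1.0},
    "aa": {"jaw_open": 0.75, "lip_round": 0.0},
    "m":  {"jaw_open": 0.0,  "lip_round": 0.5},
}

def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t

def mouth_shape(timeline, time_s):
    """Return the blended mouth shape at time_s.

    timeline is a list of (start_time_in_seconds, phoneme) pairs sorted by
    time, such as a phoneme extractor might produce from recorded speech.
    """
    if time_s <= timeline[0][0]:
        return dict(VISEMES[timeline[0][1]])
    for (t0, p0), (t1, p1) in zip(timeline, timeline[1:]):
        if t0 <= time_s < t1:
            t = (time_s - t0) / (t1 - t0)
            start, end = VISEMES[p0], VISEMES[p1]
            return {key: lerp(start[key], end[key], t) for key in start}
    # Past the last phoneme: hold the final mouth shape.
    return dict(VISEMES[timeline[-1][1]])

timeline = [(0.0, "sh"), (0.2, "aa"), (0.5, "m")]
print(mouth_shape(timeline, 0.1))   # halfway between "sh" and "aa"
```

In such a scheme any error in the phoneme start times shifts the lip movement relative to the audio, which is one way in which the asynchrony discussed above could arise.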
ARTICULATION OF SPEECH

Hundreds of individual muscles contribute to the generation of complex facial expressions and speech. The mouth is one of the most complex muscular regions of the human body and, with increased realism for characters, generating realistic animation for mouth movement and speech is a challenge for designers (Cao et al., 2004; Plantec, 2007). Even though the dynamics of each of these muscles are well understood, their combined effect is very difficult to simulate precisely. Whilst motion capture allows for the recording of high fidelity facial
animation and expression, this technique is mostly useful for full motion video (FMV). Recorded motions are difficult to modify once transferred to a three-dimensional model and the digital representation of the mouth remains an area requiring further modification. Editing motion capture data often involves careful key-framing by a talented animator. A developer may edit individual frames of existing motion capture data for prerecorded trailers and cut scenes yet, for computer games, most visual material is generated in real-time during gameplay. For in-game play, automatic simulation of the muscles within and surrounding the mouth is necessary to match mouth movement with speech. Motion capture by itself cannot be used for automated facial animation. To create automatic visual simulation of mouth movement with speech, computer game engines require a set of visemes as the visual representation for each phoneme sound. Faceposer (Valve, 2008) uses the phoneme classes phonemes, phonemes strong, and phonemes weak with a corresponding viseme to represent each syllable within the International Phonetic Alphabet (IPA). Prerecorded speech is imported into a phoneme extractor tool that extracts the most appropriate phoneme (and corresponding viseme) for recognized syllables. Editing tools allow for the creation of new phoneme classes, or the modification of the mouth shape for an existing viseme. The UM study (Tinwell & Grimshaw, 2010) identified a strong relationship between how uncanny a character was perceived to be and a perceived exaggeration of facial expression for the mouth. The results implied that those characters perceived to have an over-exaggeration of mouth movement were regarded as more strange. Thus, uncanniness increases with increasing exaggeration of articulation of the mouth during speech. Finer adjustments to mouth shapes using tools such as Faceposer may prevent a perceived over-exaggeration of articulation of speech, yet such adjustments are time consuming for the developer.
If no original visual footage is available for speech, judgements made to correct mouth shapes that appear too strong or too weak are likely to be based on the subjective opinion of an individual developer. Even then, the developer is still constrained by the number of mouth and facial muscles available to modify within the 3D model, which may not include an exhaustive depiction of every single muscle used in human speech. To avoid the uncanny while working within the range of mouth shapes and facial expression that tools such as Faceposer currently allow, the developer should at least avoid an articulation of speech that may appear over-exaggerated. The mouth shape for the phoneme used to pronounce the word “no” (“n” in Faceposer) may be applicable if the word is pronounced in a strong, authoritative way, but would appear overdone and out of context if the same word were used to provide reassurance in a calming and less domineering manner. Conversely, if the developer wishes to create an uncanny sensation for a zombie character, adjusting mouth shapes so that articulation of speech appears over-exaggerated may enhance the fear factor for such characters by increasing perceived strangeness. In the same way that a snarling dog or ferocious beast may raise the corners of its mouth to show its teeth in an aggressive way, viewers may be made to feel uncomfortable by overstated mouth movements that suggest a possible threat.
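The mapping from phonemes to visemes, and the idea of limiting how strongly each mouth shape is articulated, can be illustrated with a minimal Python sketch. It is not Faceposer's implementation: the viseme table, the class scaling factors, and the `max_weight` cap are all hypothetical values chosen only to show how a "strong" or "weak" phoneme class might scale a mouth shape while a cap guards against over-exaggeration.

```python
from dataclasses import dataclass

# Hypothetical viseme table: phoneme label -> (viseme name, default weight 0..1).
# Labels and weights are illustrative assumptions, not Faceposer data.
VISEMES = {
    "n": ("tongue_up", 0.35),
    "ow": ("rounded", 0.60),
    "aa": ("open", 0.90),
    "m": ("closed", 0.05),
}

# Assumed scaling per phoneme class, loosely mirroring the strong/weak idea.
CLASS_SCALE = {"weak": 0.6, "normal": 1.0, "strong": 1.3}


@dataclass
class Keyframe:
    time: float    # seconds from the start of the spoken line
    viseme: str    # name of the mouth shape to blend towards
    weight: float  # 0 = neutral mouth, 1 = fully articulated


def visemes_for(phonemes, phoneme_class="normal", max_weight=0.8):
    """Turn (time, phoneme) pairs into viseme keyframes.

    The weight is scaled by the phoneme class and then clamped to
    max_weight so that articulation never exceeds the chosen cap.
    """
    scale = CLASS_SCALE[phoneme_class]
    frames = []
    for time, phoneme in phonemes:
        viseme, base = VISEMES[phoneme]
        weight = min(base * scale, max_weight)
        frames.append(Keyframe(time, viseme, round(weight, 3)))
    return frames
```

Calling `visemes_for([(0.0, "n"), (0.12, "ow")], "strong")` would give an authoritative "no", whereas the same phonemes with `"weak"` would give the gentler mouth shapes suited to reassurance. Raising `max_weight` towards 1 is the kind of deliberate over-articulation a developer might choose for a zombie character.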
SUMMARY AND CONCLUSION

In summary, attributes of speech that may exaggerate the uncanny for realistic, human-like characters in computer games are:
1. A level of human-likeness for a character’s speech that does not match the fidelity of human-likeness for a character’s appearance
2. An asynchrony of speech with lip movement
3. Speech that is of an incorrect pitch or tone
4. Speech delivery that is perceived as slow, monotone, or of the wrong tempo
5. An over-exaggeration of articulation of the mouth during speech.
Whilst such characteristics of speech may heighten the spine-tingling sensation associated with the uncanny for antipathetic characters in the horror genre of games, a developer may risk the uncanny if such characteristics exist for empathetic characters. The protagonist Mary Smith, as featured in the tech demo for the adventure game Heavy Rain (2006), may have been intended to evoke affinity and sympathy from the audience. Instead, Mary Smith was regarded as strange and abnormal: uncanny speech for this character contributed to just such a negative response from the audience. The speech was not only judged as lacking synchronization with lip movement, but an inaccurate pitch and lack of human-likeness also raised doubt as to whether the voice actually belonged to the character. Attributes such as these reduced the overall believability of Mary Smith. However, for zombies such as the Tank and the Witch from the survival horror game Left 4 Dead (Valve, 2008), uncanny speech increased (in a desired manner) how strange and freakish these characters were perceived to be.

The outcomes from this investigation show that the majority of characteristics for uncanny speech in computer games may be induced by current technological limitations in the production, reproduction, and control of virtual characters. Restrictions on the range of facial muscles available to manipulate in the automated facial animation tools used to generate footage in real-time are a current constraint on achieving realism in computer games comparable to film. It seems there is a lack of an exhaustive range of mouth shapes to fully represent each phoneme sound and the variation of interpolation between syllables in a range of different contexts. Such constraints may contribute to a perceived asynchrony of speech, and to mouth shapes being used for syllables that do not accurately convey the prosody or context of speech.

Computer games may always be playing catch-up with the levels of anatomical fidelity achieved in film for facial animation; however, developments in procedural game audio and animation may provide a solution for uncanny speech. As Hug states, the future of sound in computer games is moving towards procedural sound techniques that allow for the generation of bespoke sounds, to create a more realistic interpretation of life within the 3D environment. For in-game play, dynamic sound generation techniques, “such as physical modelling, modal synthesis, granulation and others, and meta forms like Interactive XMF” will create sounds in real-time responding to both user input and the timing, position, and condition of objects within gameplay (Hug, 2011). Using procedural audio (speech synthesis in this case), a given line of speech may be generated over a range of tempos using a delivery style appropriate for the given circumstance. For example, the sentence “I don’t think so” may be said in a slow, controlled manner if carefully contemplating the answer to a question. In contrast, a fast-paced tone may be used if intended as a satirical plosive when at risk of being struck by an antagonist. Procedural animation techniques for the mouth area may also allow for a more accurate depiction of articulation of mouth movement during speech. Building on the existing body of research into real-time, data-driven, procedural generation techniques for motion and sound (for example, Cao et al., 2004; Farnell, 2011; Mullan, 2011), a tool might be developed that combines techniques for the procedural generation of emotive speech in response to player input (actions or psychophysiology) (Nacke & Grimshaw, 2011) or game state.
Interactive conversational agents in computer games, or within a wider context of user interfaces, may appear less uncanny if the tempo, pitch, and delivery style of their speech vary in response to the input from the person interacting with the interface. Such a tool would aid in fine-tuning the qualities of speech that, depending on the desired situation, reduce or enhance uncanny speech.
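As a rough illustration of how such a tool might vary delivery, the following Python sketch selects a tempo and pitch offset for a line of speech from a delivery style and a measure of player arousal. Everything here is an assumption made for illustration: the style names, the baseline values, the size of the arousal adjustment, and the `Prosody` structure do not come from any existing speech synthesiser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prosody:
    words_per_minute: float
    pitch_shift_semitones: float


# Hypothetical baseline delivery styles (values chosen only for illustration).
STYLES = {
    "contemplative": Prosody(110.0, -1.0),
    "neutral": Prosody(160.0, 0.0),
    "urgent": Prosody(230.0, 2.0),
}


def choose_prosody(style, arousal):
    """Adjust a style's baseline using player arousal in the range 0..1.

    An arousal of 0.5 leaves the baseline unchanged; higher values speed
    the line up (by at most 10%) and raise pitch (by at most 1 semitone).
    """
    if not 0.0 <= arousal <= 1.0:
        raise ValueError("arousal must lie between 0 and 1")
    base = STYLES[style]
    offset = arousal - 0.5
    return Prosody(
        base.words_per_minute * (1.0 + 0.2 * offset),
        base.pitch_shift_semitones + 2.0 * offset,
    )


def line_duration(text, prosody):
    """Approximate spoken duration of a line, in seconds."""
    return 60.0 * len(text.split()) / prosody.words_per_minute
```

With these assumed values, "I don't think so" lasts a little over two seconds in the contemplative style and about one second in the urgent style, which mirrors the contrast between the considered answer and the hurried retort described above.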
REFERENCES

Alone in the dark [Computer game]. (2009). Eden Games (Developer). New York: Atari Interactive, Inc.
Anderson, J. D. (1996). The reality of illusion: An ecological approach to cognitive film theory. Carbondale, IL: Southern Illinois University Press.
Ashcraft, B. (2008). How gaming is surpassing the Uncanny Valley. Kotaku. Retrieved April 7, 2009, from http://kotaku.com/5070250/how-gaming-is-surpassing-uncanny-valley.
Atkinson, D. (2009). Lip sync (lip synchronization animation). Retrieved July 29, 2009, from http://minyos.its.rmit.edu.au/aim/a_notes/anim_lipsync.html.
Bailenson, J. N., Swinth, K. R., Hoyt, C. L., Persky, S., Dimov, A., & Blascovich, J. (2005). The independent and interactive effects of embodied-agent appearance and behavior on self-report, cognitive, and behavioral markers of copresence in immersive virtual environments. Presence (Cambridge, Mass.), 14(4), 379–393. doi:10.1162/105474605774785235
Ballas, J. A. (1994). Delivery of information through sound. In Kramer, G. (Ed.), Auditory display: Sonification, audification, and auditory interfaces (pp. 79–94). Reading, MA: Addison-Wesley.
Bartneck, C., Kanda, T., Ishiguro, H., & Hagita, N. (2009). My robotic doppelganger - A critical look at the Uncanny Valley theory. In Proceedings of the 18th IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN2009, 269-276.
Brenton, H., Gillies, M., Ballin, D., & Chatting, D. J. (2005, September 5). The Uncanny Valley: Does it exist? Paper presented at the HCI 2005, Animated Characters Interaction Workshop, Napier University, Edinburgh, UK.
Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Browning, T. (Producer/Director). (1931). Dracula [Motion picture]. England: Universal Pictures.
Ferber, D. (2003, September). The man who mistook his girlfriend for a robot. Popular Science. Retrieved April 7, 2009, from http://iiae.utdallas.edu/news/pop_science.html.
Busso, C., & Narayanan, S. S. (2006). Interplay between linguistic and affective goals in facial expression during emotional utterances. In Proceedings of 7th International Seminar on Speech Production, 549-556.
Calleja, G. (2007). Revising immersion: A conceptual model for the analysis of digital game involvement. In Proceedings of Situated Play, DiGRA 2007 Conference, 83-90.
Cao, Y., Faloutsos, P., Kohler, E., & Pighin, F. (2004). Real-time speech motion synthesis from recorded motions. In R. Boulic & D. K. Pai (Eds.), Eurographics/ACM SIGGRAPH Symposium on Computer Animation (2004), 345-353.
Chion, M. (1994). Audio-vision: Sound on screen (Gorbman, C., Trans.). New York: Columbia University Press.
Edworthy, J., Loxley, S., & Dennis, I. (1991). Improving auditory warning design: Relationship between warning sound parameters and perceived urgency. Human Factors, 33(2), 205–231.
Ekman, I., & Kajastila, R. (2009, February 11-13). Localisation cues affect emotional judgements: Results from a user study on scary sound. Paper presented at the AES 35th International Conference, London, UK.
Emily Project. (2008). Santa Monica, CA: Image Metrics, Ltd.
Faceposer [Facial Animation Tool as Part of Source SDK]. (2008). Bellevue, WA: Valve Corporation.
Freud, S. (1919). The Uncanny. In The standard edition of the complete psychological works of Sigmund Freud (Vol. 17, pp. 219–256). London: Hogarth Press.
Gaver, W. W. (1993). What in the world do we hear? An ecological approach to auditory perception. Ecological Psychology, 5(1), 1–29. doi:10.1207/s15326969eco0501_1
Gouskos, C. (2006). The depths of the Uncanny Valley. Gamespot. Retrieved April 7, 2009, from http://uk.gamespot.com/features/6153667/index.html.
Grant, W., Wassenhove, V., & Poeppel, D. (2004). Detection of auditory (cross-spectral) and auditory-visual (cross-modal) synchrony. Speech Communication, 44(1/4), 43–53. doi:10.1016/j.specom.2004.06.004
Green, R. D., MacDorman, K. F., Ho, C. C., & Vasudevan, S. K. (2008). Sensitivity to the proportions of faces that vary in human likeness. Computers in Human Behavior, 24(5), 2456–2474. doi:10.1016/j.chb.2008.02.019
Grey Matter [INDIE arcade game]. (2008). McMillen, E., Refenes, T., & Baranowsky, D. (Developers). San Francisco, CA: Kongregate.
Grimshaw, M. (2008a). The acoustic ecology of the first-person shooter: The player experience of sound in the first-person shooter computer game. Saarbrücken, Germany: VDM Verlag Dr. Mueller.
Grimshaw, M. (2008b). Sound and immersion in the first-person shooter. International Journal of Intelligent Games & Simulation, 5(1).
Jentsch, E. (1906). On the psychology of the Uncanny. Psychiat.-neurol. Wschr., 8(195), 219-21, 226-7.
Grimshaw, M., Nacke, L., & Lindley, C. A. (2008, October 22-23). Sound and immersion in the first-person shooter: Mixed measurement of the player’s sonic experience. Paper presented at Audio Mostly 2008, Piteå, Sweden.
Kanda, T., Hirano, T., Eaton, D., & Ishiguro, H. (2004). Interactive robots as social partners and peer tutors for children: A field trial. Human-Computer Interaction, 19(1), 61–84. doi:10.1207/s15327051hci1901&2_4
Half Life 2 [Computer game]. (2008). Valve Corporation (Developer). Redwood City, CA: EA Games.
Kendall, N. (2009, September 12). Let us play: Games are the future for music. The Times: Playlist, p. 22.
Hanson, D. (2006). Exploring the aesthetic range for humanoid robots. In Proceedings of the ICCS/CogSci-2006 Long Symposium: Toward Social Mechanisms of Android Science, 16-20.
Laurel, B. (1993). Computers as theatre. New York: Addison-Wesley.
Hassanpour, A. (2009). Dubbing. The Museum of Broadcast Communications. Retrieved July 14, 2009, from http://www.museum.tv/archives/etv/D/htmlD/dubbing/dubbing.htm.
Ho, C. C., MacDorman, K., & Pramono, Z. A. D. (2008). Human emotion and the uncanny valley. A GLM, MDS, and ISOMAP analysis of robot video ratings. In Proceedings of the Third ACM/IEEE International Conference on Human-Robot Interaction, 169-176.
Hoeger, L., & Huber, W. (2007). Ghastly multiplication: Fatal Frame II and the videogame Uncanny. In Proceedings of Situated Play, DiGRA 2007 Conference, Tokyo, Japan, 152-156.
Hug, D. (2011). New wine in new skins: Sketching the future of game sound design. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
ITU-R BT.1359-1. (1998). Relative timing of sound and vision for broadcasting. Question ITU-R, 35(11).
Left 4 dead [Computer game]. (2008). Valve Corporation (Developer). Redwood City, CA: EA Games.
Lillian - A natural language library interface and library 2.0 mash-up. (2006). Birmingham, UK: Daden Limited.
MacDorman, K. F. (2006). Subjective ratings of robot video clips for human likeness, familiarity, and eeriness: An exploration of the Uncanny Valley. ICCS/CogSci-2006 Long Symposium: Toward Social Mechanisms of Android Science.
MacDorman, K. F., Green, R. D., Ho, C. C., & Koch, C. T. (2009). Too real for comfort? Uncanny responses to computer generated faces. Computers in Human Behavior, 25, 695–710. doi:10.1016/j.chb.2008.12.026
MacDorman, K. F., & Ishiguro, H. (2006). The uncanny advantage of using androids in cognitive and social science research. Interaction Studies: Social Behaviour and Communication in Biological and Artificial Systems, 7(3), 297–337. doi:10.1075/is.7.3.03mac
Matsui, D., Minato, T., MacDorman, K. F., & Ishiguro, H. (2005). Generating natural motion in an android by mapping human motion. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 1089-1096.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264(5568), 746–748. doi:10.1038/264746a0
McMahan, A. (2003). Immersion, engagement, and presence: A new method for analyzing 3-D video games. In Wolf, M. J. P., & Perron, B. (Eds.), The video game theory reader (pp. 67–87). New York: Routledge.
Minato, T., Shimada, M., Ishiguro, H., & Itakura, S. (2004). Development of an android robot for studying human-robot interaction. In R. Orchard, C. Yang & M. Ali (Eds.), Innovations in applied artificial intelligence, 424-434.
Mori, M. (1970/2005). The Uncanny Valley (K. F. MacDorman & T. Minato, Trans.). Energy, 7(4), 33–35.
Mullan, E. (2011). Physical modelling for sound synthesis. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Nacke, L., & Grimshaw, M. (2011). Player-game interaction through affective sound. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Perron, B. (2004, September 14-16). Sign of a threat: The effects of warning systems in survival horror games. Paper presented at COSIGN 2004, University of Split, Croatia.
Plantec, P. (2007). Crossing the Great Uncanny Valley. In Animation World Network. Retrieved August 21, 2010, from http://www.awn.com/articles/production/crossing-great-uncanny-valley/page/1%2C1.
Plantec, P. (2008). Image Metrics attempts to leap the Uncanny Valley. In The Digital Eye. Retrieved April 6, 2009, from http://vfxworld.com/?atype=articles&id=3723&page=1.
Pollick, F. E. (in press). In search of the Uncanny Valley. In Grammer, K., & Juett, A. (Eds.), Analog communication: Evolution, brain mechanisms, dynamics, simulation. Cambridge, MA: MIT Press.
Reeves, B., & Voelker, D. (1993). Effects of audiovideo asynchrony on viewer’s memory, evaluation of content and detection ability. (Research Report prepared for Pixel Instruments, CA). Palo Alto, CA: Stanford University, Department of Communication.
Reiter, U. (2011). Perceived quality in game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Richards, J. (2008, August 18). Lifelike animation heralds new era for computer games. The Times Online. Retrieved April 7, 2009, from http://technology.timesonline.co.uk/tol/news/tech_and_web/article4557935.ece.
Ripken, J. (2009, October 19). Game synchronisation: A view from artist development. Paper presented at the Music and Creative Industries Conference 2009, Manchester, UK.
Roux-Girard, G. (2011). Listening to fear: A study of sound in horror computer games. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Schafer, R. M. (1994). The soundscape: Our sonic environment and the tuning of the world. Rochester, VT: Destiny Books.
Schneider, E., Wang, Y., & Yang, S. (2007). Exploring the Uncanny Valley with Japanese video game characters. In Proceedings of Situated Play, DiGRA 2007 Conference, 546-549.
Seyama, J., & Nagayama, R. S. (2007). The uncanny valley: The effect of realism on the impression of artificial human faces. Presence (Cambridge, Mass.), 16(4), 337–351. doi:10.1162/pres.16.4.337
Vatakis, A., & Spence, C. (2006). Audiovisual synchrony perception for speech and music using a temporal order judgment task. Neuroscience Letters, 393, 40–44. doi:10.1016/j.neulet.2005.09.032
Silent hill homecoming [Computer game]. (2008). Double Helix & Konami (Developer/Co-Developer). Tokyo, Japan: Konami.
Vinayagamoorthy, V., Steed, A., & Slater, M. (2005). Building characters: Lessons drawn from virtual environments. In Proceedings of Toward social mechanisms of android science, COGSCI 200, 119-126.
Spadoni, R. (2000). Uncanny bodies. Berkeley: University of California Press.
Steckenfinger, A., & Ghazanfar, A. (2009). Monkey behavior falls into the uncanny valley. Proceedings of the National Academy of Sciences of the United States of America, 106(43), 18362–18366. doi:10.1073/pnas.0910063106
The Beatles: Rock band [Computer game]. (2009). Harmonix (Developer). Redwood City, CA: EA Games.
The casting [Technology demonstration]. (2006). Quantic Dream (Developer). Foster City, CA: Sony Computer Entertainment, Inc.
Tinwell, A. (2009). The uncanny as usability obstacle. In A. A. Ozok & P. Zaphiris (Eds.), Online Communities and Social Computing workshop, HCI International 2009, 12, 622-631.
Tinwell, A., & Grimshaw, M. (2009). Bridging the uncanny: An impossible traverse? In Proceedings of Mindtrek 2009.
Tinwell, A., Grimshaw, M., & Williams, A. (2010). Uncanny behaviour in survival horror games. Journal of Gaming and Virtual Worlds, 2(1), 3–25. doi:10.1386/jgvw.2.1.3_1
Toprac, P., & Abdel-Meguid, A. (2011). Causing fear, suspense, and anxiety using sound design in computer games. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Warren, D. H., Welch, R. B., & McCarthy, T. J. (1982). The role of visual-auditory “compellingness” in the ventriloquism effect: Implications for transitivity among the spatial senses. Perception & Psychophysics, 30(6), 557–564.
Warrior Demo. (2008). Santa Monica, CA: Image Metrics, Ltd.
Weschler, L. (2002). Why is this man smiling? Wired. Retrieved April 7, 2009, from http://www.wired.com/wired/archive/10.06/face.html.
Zemeckis, R. (Producer/Director). (2004). The polar express [Motion picture]. California: Castle Rock Entertainment.
Zemeckis, R. (Producer/Director). (2007). Beowulf [Motion picture]. California: ImageMovers.
KEY TERMS AND DEFINITIONS

Audio-Visual: An artifact with the components image and sound.
Cross-Modal: Interaction between sensory and perceptual modes, in this case, of vision and hearing.
Realism: Representation of objects as they may appear in the real world.
Uncanny Valley: A theory that as human-likeness increases, an object will be regarded as less familiar and more strange, evoking a negative effect for the viewer (Mori, 1970).
Virtual Character: A digital representation of a figure onscreen.
Viseme: A visual representation of a mouth shape for a particular speech utterance such as “k,” “ch” and “sh.” Those with hearing impediments can use visemes to lip read and understand the spoken language when unable to hear sound.
ENDNOTE

1. In the field of psychoacoustics, synchrony and synchresis are closely related to the ventriloquism effect.
Chapter 12
Emotion, Content, and Context in Sound and Music
Stuart Cunningham, Glyndŵr University, UK
Vic Grout, Glyndŵr University, UK
Richard Picking, Glyndŵr University, UK
ABSTRACT

Computer game sound is particularly dependent upon the use of both sound artefacts and music. Sound and music are media rich in information. Audio and music processing can be approached from a range of perspectives which may or may not consider the meaning and purpose of this information. Computer music and digital audio are being advanced through investigations into emotion, content analysis, and context, and this chapter attempts to highlight the value of considering the information content present in sound, the context of the user being exposed to the sound, and the emotional reactions and interactions that are possible between the user and game sound. We demonstrate that by analysing the information present within media and considering the applications and purpose of a particular type of information, developers can improve user experiences and reduce overheads while creating more suitable, efficient applications. Some illustrated examples of our research projects that employ these theories are provided. Although the examples of research and development applications are not always examples from computer game sound, they can be related back to computer games. We aim to stimulate the reader’s imagination and thought in these areas, rather than attempt to drive the reader down one particular path.
DOI: 10.4018/978-1-61692-828-5.ch012
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

Music and sound stimulate one of the five human senses: hearing. Any form of stimulation is subject to psychological interpretation by the individual and a cause-and-effect relationship occurs. Whilst this relationship is unique to each individual up to a point, it is safe to assume that broad, often shared, experiences occur across multiple listeners. It can be argued that the emotional reaction and response of a listener to a sound or piece of music is the single most important event resulting from that experience.

The goal of this chapter is to explore the relationship between sound stimuli and human emotion. In particular, this chapter examines the role sound plays in conveying emotional information, even from sources that may be visual in origin. Equally, the chapter seeks to demonstrate how human emotion is able to flip this paradigm and influence music and sound selection, based on emotional state and consideration of the context of the user.

Figure 1. Idealised Role of Emotion, Content and Context in a Computer Application

The content being represented digitally provides the opportunity to gain a greater understanding of the information present in a data set. Information being stored often has a number of characteristic features and structural elements that can be identified automatically. For example, music generally contains an identifiable structure, which might consist of several movements, parts, or, more commonly, verses and choruses. However, such structure can almost be considered fractal, in that there are microscopic and macroscopic levels of organisation and also repetition, ranging from musical beats, bars, verses and choruses to the level of the song itself.
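To make the idea of automatically identifiable repetition at several scales more concrete, the following Python sketch searches a song, represented as one chord label per bar, for segments that occur more than once. The representation, the chord labels, and the function name `repeated_segments` are assumptions made for illustration; analysing real audio would first require extracting such labels from the signal.

```python
from collections import defaultdict


def repeated_segments(bars, length):
    """Map each run of `length` consecutive bars that occurs more than
    once to the list of bar indices at which it starts."""
    positions = defaultdict(list)
    for start in range(len(bars) - length + 1):
        segment = tuple(bars[start:start + length])
        positions[segment].append(start)
    return {seg: starts for seg, starts in positions.items() if len(starts) > 1}


# Hypothetical song: one chord label per bar.
VERSE = ["C", "G", "Am", "F"]
CHORUS = ["F", "G", "C", "C"]
SONG = VERSE + VERSE + CHORUS + VERSE + CHORUS

if __name__ == "__main__":
    # Look for repetition at two scales: four-bar and eight-bar segments.
    for length in (4, 8):
        for segment, starts in repeated_segments(SONG, length).items():
            print(length, "-".join(segment), starts)
```

At a length of four bars the verse is found starting at bars 0, 4 and 12 and the chorus at bars 8 and 16, alongside overlapping segments that straddle the two; at eight bars only the verse-plus-chorus pair recurs. Varying `length` is a crude way of moving between the microscopic and macroscopic levels of organisation.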
Contextual data provides additional information about the external factors that influence the decision making and emotion of the user; acquiring this knowledge helps to make the interaction experience more relevant and effective. The conceptual diagram in Figure 1 shows an idealised situation in which a large database of audio media is presented to the user through a suitable application (such as a computer game). In this scenario, the user’s emotion and context are analysed and compared against an analysis of the available media content. This allows selection of the ‘best fit’ media that will further stimulate and engage the user in the most effective way.

The chapter explains the fundamentals of emotional stimulation using sounds and music, whilst retaining relevance to the audiologist. We demonstrate that by analysing the information present within media and considering its applications, significant advantages can be gained which improve user experiences, reduce overheads, and aid in the development of more suitable, efficient applications, whether they be computer games or other audio tools.
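The 'best fit' selection in the idealised scenario of Figure 1 might be sketched as follows. This is only one possible reading: it assumes that content analysis has already reduced each piece of media to a valence and arousal value and a set of context tags, and that the user's desired emotional state is expressed on the same two axes. Neither the two-dimensional representation nor the names used here are prescribed by the chapter.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    title: str
    valence: float       # -1 (negative) .. 1 (positive), assumed from content analysis
    arousal: float       # 0 (calm) .. 1 (intense), assumed from content analysis
    contexts: frozenset  # situations the track is tagged as suitable for


def best_fit(tracks, target_valence, target_arousal, context):
    """Return the track suited to `context` whose emotional profile lies
    closest to the target, or None if no track matches the context."""
    candidates = [t for t in tracks if context in t.contexts]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda t: math.hypot(t.valence - target_valence,
                                 t.arousal - target_arousal),
    )


# Hypothetical media database.
LIBRARY = [
    Track("Meadow", 0.7, 0.2, frozenset({"exploration"})),
    Track("Ambush", -0.6, 0.9, frozenset({"combat"})),
    Track("Standoff", -0.2, 0.6, frozenset({"combat", "exploration"})),
]
```

For example, `best_fit(LIBRARY, -0.5, 0.8, "combat")` selects "Ambush", whereas the same emotional target in an "exploration" context selects "Standoff", because the context filter is applied before the emotional distance is compared.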
EMOTION

Emotion is a key factor to consider in computer applications, given that almost all applications will have some form of Human Computer Interface (HCI). Humans are emotional beings and interaction with the machine will have some emotional effect on them, to a greater or lesser extent. The computer, therefore, has an ability to invoke an emotional response in the user. Users may also bring their own emotions to an interactive experience, emotions that have been affected by external factors in the environment around them (Dix, Finlay, Abowd, & Beale, 2003). The quality of the experience that a user has with a machine is important, and this is also true when we consider the frequent interaction that we have with entertainment media and computer games.
Emotion in Multimedia

The use of sound in multimedia, and especially in computer games, is commonplace. This is unsurprising if one considers that, in order to successfully engage a human user in an immersive experience, the interaction must be achieved through one of the primary human senses. Speech and hearing are hugely important in our daily lives and allow us as humans to send and receive large amounts of information on an ad-hoc basis. Naturally, it is hearing and the use of sound that we are interested in examining in this chapter. Sound is used to complement and augment other stimuli, especially visual ones. Consider, for example, the last time you watched a horror movie and were embarrassed by the unintended jump or flinch you experienced at a big bang or crescendo that accompanied the appearance of the bad guy in the movie! Proof, if it were needed, that the constructive use of music and sound can provoke one of the most primal of human emotional instincts: fear. Sound in multimedia environments is classified into two distinct categories (see Jørgensen, 2011 for a fuller analysis of these terms):
• Diegetic. Sound or music that is directly related, or at least perceived to be related, to the environment in which the subject is intended to be immersed. For example, in a movie this could be the sound coming from a television that is in the room pictured on screen. Another example would be the voices of the characters on screen or the sound of a character firing a gun or driving a car. In a nutshell, the subject is able to reasonably identify the source of the sound given the surrounding virtual environment.
• Non-diegetic. These sounds are generally presented to augment or complement the virtual environment but come from sources that the subject cannot identify in the current environment. To go back to the horror movie example again, consider the famous shower scene from Alfred Hitchcock’s classic Psycho from 1960: the screeching, stabbing violin sounds as the character of Marion Crane is stabbed by Norman Bates (dressed as his mother). There is no reason for the watcher of Psycho to believe that there is a collection of violinists in the bathroom with Norman and Marion; rather, the music is there to enhance the environment that is presented.
Emotion in Computer Games

Game players exhibit larger emotional investment in games than in many other forms of digital entertainment, primarily due to the interactive nature of the medium. Jansz (2006) argues that game players often emotionally immerse themselves in games to experience emotional reactions that cannot reasonably be stimulated in the real world: a sandbox environment for emotional development and experience. This notion will probably be familiar to most readers, as many of us will have deliberately watched a scary movie to try to frighten ourselves because we enjoy experiencing the sensations and physical responses of being frightened, provided we are within a controlled environment. Freeman (2004) provides a list of reasons that support the activation of emotion in computer games, citing “art and money” (p. 1) as the principal drivers, although his work focuses mainly on the latter, such as competitive advantages for games development companies, rather than direct benefit to consumers and game players. Nevertheless, as Freeman advocates, this awareness in the industry of the need to integrate emotion further into computer gaming is evidence of market demand and big business interest in this exciting field.

Emotion manifests itself in many ways, often with an identifiable physical symptom in the user. Whilst the studies discussed later concentrate on identifying physical emotional reaction, these have not always been directly linked to the player’s physical interaction with the game. However, research by Sykes and Brown (2003) describes an initial study that investigates not just emotional response or reaction in users but emotional interaction with a game. Sykes and Brown also support the theory that emotional reaction and interaction represent significant potential in being able to adapt and manipulate gaming environments in response to the emotional and affective states of the user. Their investigation dealt with determining if the amount of pressure applied to the buttons of a computer game controller pad correlated with an increased level of difficulty in the game environment. A benefit of using this approach, as opposed to galvanic skin response or heart rate monitoring, is that those mechanisms can be altered by environmental changes around the user, whereas changes in pressure applied to the game controller are much more likely to have been caused by events occurring in the game. Their results indicated that players did indeed apply greater pressure to the game controller when a greater level of difficulty and concentration was required in the game.
Although the study is preliminary and relatively small-scale, the authors’ methods of analysis employ significance testing of the data collected. Ravaja et al. (2005) conducted experiments that attempt to evidence the impact of computer gameplay upon human emotions by employing an array of biometric measurements. This is based upon the generally held theory that emotion is expressed by humans in three forms: “subjective experience (e.g., feeling joyous), expressive behavior (e.g., smiling), and the physiological component (e.g., sympathetic arousal)” (p. 2). Taking this further, the authors make the point that the psychological connection between a player and a computer game exceeds pure emotion and touches cognition, where players make assertions and links to the game: believing they are a super-hero or ninja warrior, for example. This work also highlights the issue that, until recently, research into emotional enjoyment and influence has focused upon non-interactive, mass media communication channels, such as television, film and radio. The range of measurements used by Ravaja et al. is wide and, as the authors indicate, few other studies have employed such a range of metrics when investigating emotional connection with computer gameplay. The authors use electrocardiogram (ECG)/inter-beat intervals (IBI), facial electromyography (EMG) and skin conductance level (SCL) as measurements during their experiments. The experiments showed that reliable results are achieved across a range of subjects in response to significant events in a game scenario (such as success, failure, poor performance and so on). This work provides very strong evidence that subjects exhibit strong, identifiable physical reactions, typical of emotional arousal, when playing computer games. It supports the argument, made in this chapter, that physical disturbance is a strong indicator of emotional state and response when interacting with computer games.
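One simple way to relate a physiological signal to game events, offered here as an illustration rather than as the analysis used in the studies above, is to compare the mean of the signal shortly before each event with the mean shortly after it. The Python sketch below assumes a regularly sampled signal (for example, skin conductance level) and a list of event times in seconds; the function name and the two-second default window are hypothetical choices.

```python
from statistics import mean


def event_response(samples, sample_rate_hz, event_times, window_s=2.0):
    """Average change in a sampled signal around game events.

    For each event, the mean of the window after the event is compared
    with the mean of the window before it. Events too close to either
    end of the recording are skipped. Returns None if no event is usable.
    """
    n = int(window_s * sample_rate_hz)
    changes = []
    for t in event_times:
        i = int(t * sample_rate_hz)
        if i - n < 0 or i + n > len(samples):
            continue  # not enough data on one side of the event
        before = mean(samples[i - n:i])
        after = mean(samples[i:i + n])
        changes.append(after - before)
    return mean(changes) if changes else None
```

A positive result indicates that the signal tends to rise after the chosen events. In practice such a figure would need to be tested for significance across participants and complemented by subjective reports before any conclusion about emotional response could be drawn.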
Broadly speaking, positive and negative game events correlated to positive and negative emotional reactions in players. However, one
Emotion, Content, and Context in Sound and Music
point of note from the study is that the intuitively expected emotional response was not always the one that was encountered in subjects. One criticism of Ravaja et al.’s study is that, although a reasonable sample size was used (36 participants), the gender balance was almost 70% in favour of male participants. Whilst it can be argued that the gaming population is likely to be male in majority, the study could have reflected the situation more accurately. The paper does not attempt to account for this disparity or investigate whether a significant difference was present between the results of the male participants and female participants (see Nacke & Grimshaw, 2011 for indications of gender difference in response to game sound). Although beyond the scope of their paper, the work could have been much strengthened by performing some form of subjective response with subjects on their performance in the game scenarios, thus allowing a more valid conclusion by employing triangulation of quantitative and qualitative methods. This would complement the reliable results attained through their objective measurements. There is no doubt that emotion plays a significant affective role in computer gaming and that it has the potential to be used both as a reactive and interactive device to stimulate users. The emotion elicited in gamers is a function of both the content of the game as well as the context in which the user is placed, further justifying the aims and underlying concept of this chapter: that these three traits are inextricably linked and that further understanding and utilising them must therefore lead to more intense, immersive, and interactive gaming. Conati (2002), for example, considers how probabilistic models can be employed to develop artificial intelligence systems that are able to predict emotional reactions to an array of content and contextual stimuli in education games, with the aim of keeping the player engaged with the game. 
But what of sound linked to emotion in games?
The Use of Emotional Sound in Games
Research by Ekman (2008) bridges the gap between traditional movies and modern computer games by explaining how sound is used to stimulate emotions in each of these media. Ekman enhances her discussions with summaries of some of the numerous theories in the portrayal of emotional involvement experienced through sound and music. Perhaps most importantly in her work, Ekman emphasises the difference between the role of sound in movies as opposed to computer games. Principally, this is that sound in movies is present to enhance the narrative and heighten the experience whereas, in computer games, sound must perform not only this function but also serve as a tool for interaction, often to the extent where the narrative element is sacrificed in favour of providing informational content. Ekman’s work therefore suggests that incorporating diegetic and non-diegetic sounds into computer games significantly increases the level of complexity for the sound designer. Kromand (2009) feels strongly that sound can be used to influence a game player’s stress and awareness levels by incorporating suitable mixtures of diegetic and non-diegetic sound. He provides examples of several contemporary computer games that feature such affective sound. In particular, his work focuses on the popular BioShock, F.E.A.R. and Silent Hill 2 titles. Kromand’s work is an interesting starting point and introduction to the use of sound in games, especially in inducing more unpleasant sensations. He provides extensive discussion and illustrative examples and considers the concept of trans-diegetic sounds (Jørgensen, 2011): those which transcend the traditional barrier between diegetic and non-diegetic. Kromand concludes by proposing that mixtures of diegetic and non-diegetic sound can lead to confusion and uncertainty about the environment and actions around the game player. He hypothesises that this confusion is purposefully implemented in the game
environment and that the uncertainty of events taking place adds to the emotional investiture of the player in the game. Though not as up-to-date as other works concerning computer games and human emotion, a corresponding work, which also looks at methods of eliciting emotional state in computer gamers, comes from Johnstone (1996). The age of this paper alone demonstrates the importance and significance of the emotional link between computer games and game players. His study concerns the discernment of emotional arousal by speech sounds made by users during their interaction with a computer game. Part of the rationale behind his approach is hypothesised to be because the feedback equipment of today (heart rate monitors and skin conductance devices) was not so readily or cheaply available in 1996. An interesting concept that is partially addressed by Johnstone is that spontaneous emotional speech sounds differ acoustically from those that are planned and considered. If this theory holds true, then it means that genuine emotional responses can be distinguished from planned responses. In effect, this is somewhat analogous to the use of voice stress analysis in lie detection scenarios. Johnstone indicates that this ability is also useful in a truly interactive manner, since it not only means that users or game player responses can be analysed to determine emotional valences, but also that synthesised speech, such as the voices of characters in games, could be manipulated in similar acoustic ways to provide more realistic and affective game environments and conjunctions. For diegetic sounds in particular, this presents a world of opportunity. The results of Johnstone’s initial study are promising though there are some methodological aspects of the research that would have benefited from tighter control. For example, subjects’ spontaneous speech sounds were recorded and analysed but they were also required to answer subjective questions to provide speech samples. 
By the very nature of such an enquiry, the subjects would
have been required to consider their response during which time the effects of spontaneity or the moment could well have been depleted. The results gained are not enough to fully support the idea of distinguishing spontaneous sounds from planned although there is evidence to suggest that this might be a logical progression in future. Nevertheless, the data collected shows promise in being able to determine notions of urgency and felt difficulty in the game environment from events that are associated with achieving the objectives of the game. Primarily this can be measured by changes in spectral energy levels, low frequency energy distribution, and shorter speech duration. In more recent work, Livingstone and Brown (2005) present theories and results that support the use of auditory stimuli to provide dynamic and interactive gaming environments. Whilst their paper explores the use of musical changes and emotional reactions in a general sense, part of their work is also devoted to investigating the application in gaming. Their underlying concept is that musical changes in the game can trigger emotional reactions in game players in a more dynamic manner than is currently the norm. Livingstone and Brown employ a rule-based analysis of symbolic musical content that relates to a fixed set of emotional responses. Their work demonstrates that by dynamically altering the musical characteristics of playing music, such as the tempo, mode, loudness, pitch, harmony and so forth, the user perceives different emotional intentions and contexts within the piece of music that is currently playing. Music, then, stimulating one of the five human senses, is capable of influencing emotional change within humans in a computer game environment. The work of Parker and Heerema (2008) presents a useful overview of how sound is used in diegetic and non-diegetic forms within computer games. 
They argue that greater use should be made of sound in order to enhance the game environment and experience. A primary exemplar used by them, is that sound should also serve as a tool for input
and interaction with the game, rather than being present purely to be heard. They reiterate that sound in games at present is reactive rather than interactive. However, in this chapter, we suggest that sound is simply a tool of the emotions and that it is player emotion that should be interactive, rather than reactive, in order to provide a new level of computer gaming experiences. We feel this can be strongly underpinned by the use of sound. Parker and Heerema go on to describe audio gaming and provide a series of examples and discussions of scenarios where sound can be used as the primary interaction mechanism between the player and the computer game. These range from the player reacting to audio cues, providing the game with input using speech, or other sonic input, and by directly controlling sound and music in the game. Although concise and valid at representing the current state of play of sound in games, their work does not consider the affective nature of using sound in games. Emotion is triggered by sound and the two are intrinsically linked. Recent work by Grimshaw, Lindley, and Nacke (2008) seeks to formalise the relationship between a subject’s immersion in a game environment as a function of the auditory content. Grimshaw et al. employ a series of biometric techniques to provide insight into the human emotional and physiological response to the sonic actions and environment of a first-person shooter game. Their method employs a significant array of quantitative, physiological measurements that are correlated with subjective questioning. The deep complexity of human emotion and psychology is exposed in their work as a strong relationship between the results of these two investigative methods cannot be found. This deficiency is the subject of significant discussion by the authors and, unsurprisingly, it is suggested as an area for significant future investigation. 
It is important to place an emphasis on this point: although broad hypotheses and empirical evidence show sound and music play a large part in stimulating emotional responses in human subjects, the quantification of these effects, especially
objective measurement, is elusive. Subjective investigation has traditionally been the forte of psychological and sociological researchers. It is for this reason that sound designers and scientists working in the field must have an awareness of these issues, especially the sound designer working in computer game and multimedia development. In short, emotion is highly difficult to measure in an absolute way. Bridging this gap must be done carefully and backed up by considered research and investigation. There is a wealth of literature relating to the emotional impact of games. Equally, there is an increasing amount of published work concerning audio games, although the majority of the literature still concerns itself with traditional, visually-focused games. As the reader may have noticed in this section of this book, there are few studies that have concerned themselves with using sound as the primary interactive method whilst also monitoring and responding to the emotional reactions of the game player. It is just this sort of scenario that the studies and ideas presented in this chapter aim to inspire, support and help stimulate.
Are Sound and Music Really Important in Games?
It is interesting to consider to what extent sound is perceived as being important by users in computer games. If we consider the move from the beeps and clicks that early computer games such as Space Invaders and Pong made to modern alternatives such as the Guitar Hero series, we can see that the computer games industry has certainly placed an increased focus on the use of sound and music in games. To this extent, we conducted research, by means of a user survey, into determining user awareness of sound in computer games. The work is documented in greater detail in Cunningham, Grout, and Hebblewhite (2006), but a summary of the important findings and discussions is provided here.
Table 1. Overall Game Genre Preference of Survey Participants in Rank Order

Game Genre                  Preference (%)
Role-Playing Game (RPG)     39
Shoot-em-up                 24
Strategy/Puzzles            12
Adventure                   9
Sports                      9
Simulation                  3
Other                       3
This survey was undertaken to establish the various factors that subjects considered important when it came to purchasing a new computer game. Our initial hypothesis was that users would rate factors such as playability and visuals of a game, much higher than the sound and music, demonstrating that the focus in the computer gameworld tends to be in the areas of the graphical and gameplay domains. The survey had a total of 34 respondents. A profile of the gamers participating in the survey, in terms of their game type preference, is shown in Table 1. We believe a future study should investigate whether the favoured game genre affects particular factors that users specifically look for in games. For example, role-playing games have been traditionally much more limited in terms of their graphic and aural flamboyance, with greater emphasis being placed upon story-line and depth, whilst action and adventure games are often much more visually stimulating. Figure 2 and Figure 3 illustrate the results of questions where participants were asked to indicate the most important and, since it was assumed prior to the study that the playability or gameplay would most likely be rated highly, the second most important feature that influenced game purchasing decisions. Not surprisingly, we found that the most important factor is playability. The rating for all the other possible factors are negligible, although somewhat surprising is that no participants rated
the sound or musical elements to be important when deciding upon a game to buy. Intriguingly, the ability to play a game online with other users took favour over sound, which is an intriguing insight into the mind of the 21st Century games player. Users who selected the “Other” category were prompted to provide a more detailed explanation. The responses received here all related to one of the following comments: “depth and creativity”, “the whole package” and two participants stated that the “story or scenario” was most important. It is argued, on the basis that playability will always rate highest, that the results in Figure 3 are more insightful than those in Figure 2. After all, the whole notion of computer games is that they are to be played with! This time we see, as we might well have expected, that the graphics and visual stimulation was the most popular factor. As expected, the sound present in a game was cited by a low percentage of those surveyed. The users who chose the “Other” category on this occasion also stated that the factor important to them related to the story of the game. Encouragingly, however, and still applicable in the context of sound in games, is the percentage of users that value the interface. If we consider some of the most recent successful games, where the use of music and sound has been prominent, these titles almost all employ an interactive sound interface of some form. Prime examples include the Guitar Hero, Rock Band,
Dance Dance Revolution (DDR), and SingStar series of games, as well as Battle of the Bands, Ultimate Band and Wii Music, to name just a few. For the budding entrepreneur game developer, it is probably worth taking note that the majority of these titles revolve around the player being placed in a live music performance scenario or band. We briefly attempted to analyse these two assumptions through our survey, though the results are inconclusive with an almost 50/50 split between positive and negative responses. However, it is worth bearing in mind that these responses are now somewhat dated. An overview of the responses is presented in Table 2.

Figure 2. Rating the Most Important Game Feature

Figure 3. Rating the Second Most Important Game Feature

Table 2. Survey of Participants’ Interest in Game Soundtrack and Music
Does the soundtrack/music of a computer game make you more interested in playing or buying it?

Response      Percentage
Yes           50%
No            44%
Don’t Know    6%

It is reasonable to suggest that the soundtrack of a game brings an added attraction when it comes to a gamer parting with their hard-earned cash. As mentioned earlier, game series like Grand Theft Auto, FIFA and Dave Mirra feature music by well-known recording artists, in some cases including music that has been commissioned specifically for that game. It can be seen in the results summarised in this section that, other than the added value suggested above, users do not place any particular emphasis on game sound. As was expected, the main aspects users were interested in were the playability and graphics of a game, although interaction with sound offers great potential. The development of new sound-motivated games will be a dynamic and challenging field in the years to come, though we must not forget the golden rule of a successful game: playability. The under-use of sound in games is further supported by Parker and Heerema (2008), a source the reader is encouraged to investigate if they are still in doubt as to the true potential of sound in the gaming environment. To quote directly from their work: “The use of sound in an interactive media environment has not been advanced, as a technology, as far as graphics or artificial intelligence” (p. 1). Their work goes on to justify these assertions and they explain that poor quality sound in a game often results in the game being unsuccessful in the marketplace, whilst the success of a game containing an acceptable or higher quality of sound will be based upon other factors such as playability or graphics. It is clear from the discussions and investigation covered in this section that human interaction and psychological and emotional links with games are more and more to the fore, as well as becoming increasingly important in the development of successful gaming experiences. It is fair to assume that users are not only affected by sound and music but that they also respond to feedback and interact with the game, essentially providing full-duplex communication between human and machine that is becoming increasingly information-rich. It is these interactions and information that the rest of this chapter focuses on.
CONTENT
Digital audio data holds much more information than the raw binary data from which it is constituted. At its barest, sound and music are generally provided to augment and provide realism to the current scenario. However, as we demonstrated in the previous section of this chapter, computer games are truly multimedia experiences that combine a range of stimuli to interact with the user. In short, we see the area of content analysis as providing a semi-intelligent mechanism with which to tie together one or more media employed in a multimedia environment in order to provide even more effective and efficient interaction and experience. Content of a particular medium can take many different forms, some of which will be shared across a range of media while others will be exclusive to a particular media type. The following is an attempt to briefly describe and exemplify these two categories:
• Shared content information. If we consider an entire multimedia artefact as being a hierarchical object, greater than the sum of its parts, then shared content information would be found attached to each of the media components present. For example, the publishing house, year of production, copyright information, and name of the game in which a multimedia object (a sound or otherwise) appears will always be the same. This information is generally that which is exclusively available in the form of meta-data and requires little data mining to extract.
• Exclusive content information. This is information about the content that can only be found in a given type of media. Although the same content information may appear in multiple instances of that media, it will generally be exclusive to that type of media. For example, if we consider the music present in a computer game, the exclusive content information would include the tempo, amplitude range, time signature, spectral representation, self-similarity measurement, and so on.
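The two categories above can be illustrated with a minimal sketch. It is not taken from the chapter: the class names, field names, and the `analyse_audio` function are hypothetical, and the three features computed are deliberately simple stand-ins for the richer measures listed (tempo, time signature, spectral representation). The point is only that shared information is carried as meta-data, whereas exclusive information has to be computed from the audio samples themselves.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence


@dataclass(frozen=True)
class SharedContent:
    """Meta-data that is identical for every media object in the same game."""
    game_title: str
    publisher: str
    year: int


@dataclass(frozen=True)
class AudioContent:
    """Features that only exist for audio and must be computed from samples."""
    amplitude_range: float     # largest sample minus smallest sample
    rms_level: float           # root-mean-square level of the samples
    zero_crossing_rate: float  # fraction of adjacent sample pairs that change sign


def analyse_audio(samples: Sequence[float]) -> AudioContent:
    """Compute a few illustrative (hypothetical) exclusive content features."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    amplitude_range = max(samples) - min(samples)
    rms_level = sqrt(sum(s * s for s in samples) / len(samples))
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return AudioContent(amplitude_range, rms_level, crossings / (len(samples) - 1))
```

In such a design, one `SharedContent` record could be attached to every sound, image, and video in a game, while each sound would carry its own `AudioContent` record.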
The relationship between sound and visual elements has been a mainstay of the media field since its inception. Consider the music video and the Hollywood movie. In both, the sound and visual content presented to the viewer are carefully correlated. Prime examples of this include the synchronisation between actions and transitions appearing in the visual field and the sound content. An illustration of this that the authors find particularly effective is in the opening sequence of the 1977 movie Saturday Night Fever. This particular scene sees the watcher treated to shots of John Travolta’s feet, pounding the streets of New York in time to the Bee Gees’ classic Stayin’ Alive: a classic in its own right and an almost ridiculously simple example of the sound content being combined in the production of the visual content to produce something that has a much greater impact than either of the two individual components.
As Zhang and Jay Kuo (2001) demonstrated, it is quite possible to extract and classify a range of different sound content types from multimedia data, especially the kind of mixes found in traditional entertainment like television shows and movies. Though their work is focused on the traditional media of multimedia communication, the computer game environment is simply a natural extension of this, with the major difference being the integration of an element of interactivity. It is these principles that we hope content analysis allows us to build upon and utilise in the field of electronic media processing and development. In particular, we hope that game sound content can be analysed to provide an enhanced gaming experience. As a good starting point for consideration, we began to explore the relationship between visual information and music in electronic media, to provide an augmented experience when viewing the visual data. In another of our works (Davies, Cunningham, & Grout, 2007) we attempted to generate musical sequences based upon analysis of digital images: in that particular case, those of photographs and traditional works of art. The underlying thoughts and questions that motivated that research revolved around suggestions such as: What would the Mona Lisa sound like? We felt this would also provide additional information for people who were, for example, visually impaired, and it could be used to provide added description and emotional information relating to a particular still image. It became a logical ethos that the only way in which this could be achieved would be to analyse the content of the image, as it is this that contains the information and components required to relay the same information but in an alternative format. A tool that we have found very effective in analysing musical content is that of the Audio Similarity Matrix (ASM), based upon ideas initially proposed and demonstrated by Foote (1999). 
This allows a visual indication of the self-similarity, and therefore structure, of a musical piece. We suggest further reading into Foote’s work as a good starting point to stimulate the imagination into how content analysis can provide highly useful information for a variety of scenarios. A graphical example of an ASM is presented in Figure 4 as an exemplar, where dark colours represent high similarity and bright colours show low similarity.

Figure 4. ASM of ABBA’s “Head Over Heels” (28 second sample)

Whilst we do not limit the application of content analysis to computer games, we suggest a few examples of appropriate situations where it may be used. Simple examples relate to the link between visuals and sound. In a game where the scene is bright and full of strong, primary colours, it would be pertinent to include sound that reflects this notion: bright and strong in timbre. On the other hand, a dark, oppressive scene would require slower, darker music with a thinner and sharper timbre, inducing a different set of emotions. In today’s dynamic computer games, where the user has an apparently boundless freedom to explore a virtual environment, the dynamic updating of sound content to match the visuals requires some form of content analysis. Even a simple parameter that defined the “colour” of the scene or presence of tagged objects nearby would suffice on a basic level. Another suggestion would be to manipulate the gameplay by the choice of music and sound prescribed by the user. For example, the same game scenarios and task may be undertaken at different speeds, levels of difficulty, and in different environments, based upon the choice of music the user makes. Consider the scenario where a user may decide to play dance music whilst interacting with a game, thereby instigating a bright, quantised environment with predictable, rhythmic, structured gameplay content. Whereas if they choose highly random, noisy, alternative noise-core they would be presented with chaotic, overwhelming game scenarios: in both cases, a reflection of the structure and content of the music that can be achieved only through detailed signal and structural content analysis of the audio data.
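To make the Audio Similarity Matrix discussed earlier in this section more concrete, the following is a minimal sketch of how a self-similarity matrix might be computed. It rests on several assumptions that are not taken from the chapter: each frame is described by a log-magnitude spectrum obtained from a naive discrete Fourier transform (chosen only to keep the example dependency-free, and not necessarily the features used in Foote's work), frames are compared by cosine similarity, and the function names and parameters are hypothetical.

```python
from math import cos, sin, pi, sqrt, log
from typing import List, Sequence


def frame_features(samples: Sequence[float], frame_size: int, hop: int) -> List[List[float]]:
    """Split the signal into frames; return one log-magnitude spectrum per frame."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        spectrum = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT bin k (fine for short frames; an FFT would be used in practice).
            re = sum(x * cos(2 * pi * k * n / frame_size) for n, x in enumerate(frame))
            im = -sum(x * sin(2 * pi * k * n / frame_size) for n, x in enumerate(frame))
            spectrum.append(log(1.0 + sqrt(re * re + im * im)))
        features.append(spectrum)
    return features


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Entry [i][j] compares frame i with frame j; the matrix is symmetric."""
    n = len(features)
    return [[cosine_similarity(features[i], features[j]) for j in range(n)]
            for i in range(n)]
```

For a signal built from a tone, a different tone, and the first tone again, the matrix would show high values where the first and third frames are compared and low values against the second, which is the kind of repeated structure an ASM image reveals. Mapping values to dark or bright colours, as in Figure 4, is a separate rendering choice.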
CONTEXT
Context awareness also provides opportunities for a heightened user experience with digital media systems, particularly those that hold large data sets, the content of which may only be relevant to a user in certain usage scenarios. We believe the incorporation of contextual information into digital devices provides a more tailored experience for users. Contextual information can be considered as an added extra in digital media systems, allowing more defined information about the user to be brought into software systems. Recommendation systems are a good example of where contextual data can be included. Schmidt and Winterhalter’s (2004) work in e-learning shows how context awareness can be incorporated into digital, computer-based communication media. In their field, the context of the user is particularly important as it allows greater control and focusing of learning and teaching materials in order to engage at a deeper level with the user. Their work emphasises that the key stages of context awareness are in first acquiring contextual information and then building a suitable user-context model so as to estimate the current context of the user. Schmidt and Winterhalter also reinforce notions that good contextual modelling comes by acquiring information from a range of sources. Most importantly, in discussions of the importance of user context, Schmidt and Winterhalter hit upon the key question that context awareness is able to begin to address: “How do we know what the user currently does, or what he intends to do?” (p. 42). Schmidt and Winterhalter choose to employ more passive mechanisms for contextual data acquisition, such as those which passively track user progress through tasks and record commonly accessed information. This is perfectly suitable for e-learning applications but, in the field of computer games and interactive entertainment, we feel that something more interactive is required.
A crucial work that backs up these notions of more interactive and reliable context awareness, especially when it comes to the surrounding environment, can be found in Clarkson, Mase, and Pentland (2000). Although this work may now be slightly dated, the principles and techniques employed in their work are effective and provide good examples of the type of contextual information that can be acquired by using simple sensor input. Their work investigates how context, such as whether the user is on a train or at work and whether they are in conversation or not, can be estimated from sensor input, primarily a camera and microphone. Such work provides a strong basis from which to lead into more specific analysis of context that is relevant to the current activity or software application. This is further elaborated upon in the context of mobile device usage by Tamminen, Oulasvirta, Toiskallio, and Kankainen (2004), who consider determining contextual information in mobile computing scenarios. Computers, gaming consoles and mobile devices have all become much more powerful in recent years and interface with a range of local and remote information sources. These information sources range from the traditional tactile input devices to accelerometers, touch screens, cameras, microphones and so on. The Nintendo Wii and Apple iPhone and iPod are prime examples of such low-cost, sensor-rich, powerful computational devices. The technology available in these devices, as well as those devices that can be further added into the chain, mean a wide range of contextual information can potentially be extracted from a game player, be they mobile or static. We consider that the foremost sources of contextual information come from the user themselves and from the surrounding environment in which the user is currently immersed. 
This is further ratified by Reynolds, Barry, Burke, and Coyle (2007) who also consider the importance and usage of contextual input parameters from these two domains in their own research.
Information from the user is arguably the most useful data that can be acquired in determining the context of the user. It allows the researcher to begin to investigate factors such as the user’s level of activity, stress level and emotional state. We propose that measurements such as skin conductivity, motion and heart rate, acquired directly from the user, would prove particularly useful in monitoring their contextual state.
Factors in the environment around the user are likely to have an effect on their performance in a game, their general attitudes, and their emotional state. A number of metrics can provide suitable input to a software system to estimate environmental context, including the amount of ambient noise, light levels, time of day and year, and temperature. We feel that the devices and information in this scenario are relevant to many contextual extraction applications, not only those of digital entertainment and games. We hope that the examples and discussions in this section of the chapter give the reader an insight into the usefulness of contextual information.
The next section of this chapter seeks to exemplify how context (as well as emotion and content) can be employed in digital multimedia applications, especially those that relate to sound. We feel that, in computer games, the virtual gameplay environment can be tailored to reflect the real environment of the player. This should provide a deeper, more immersive experience and help the player to develop greater emotional and personal investment in the game. It will also be interesting to see whether such a game can contribute to altering the emotional state of the user and impact upon their own personal context. For example, can games be designed that would relax a user, reduce their stress levels and heart rate, and even make them alter their surrounding environment to reflect their new, calmer state? Only through more contextual awareness and pervasive interactions with games will we know the answer to this question and others.
DETERMINING USER PERCEPTIONS OF MUSIC
In this section of the chapter, we aim to gain more insight into how emotion, content and context are attached to music by human listeners. By investigating the perceptions and semantic terms that users relate to different musical genres, it is possible to gain a deeper understanding of the ways in which listeners relate emotion, musical content and context to different types of music. A wide range of semantics is frequently employed to portray musical characteristics, from technically-related terms to experiential narratives (Károlyi, 1999). We propose that the characteristics of a piece of music are difficult to quantify in a single term or statement. Whilst high-level abstractions that categorise the music or provide an overview of the timbre may be possible, they are a highly subjective and individual (and potentially emotionally influenced) expression of a listener’s experience of the music. Such an investigation also allows the terms to be grouped, and the relationships between these groups to be mapped and understood.
Repertory Grid Technique
In order to extract the descriptive features and semantics that are most meaningful and most widely understood, it is better to employ a technique in which the listener may use their own descriptions of the elements under investigation. George Kelly’s work (1955) on personal construct theory (PCT) and personal construct psychology (PCP) provides a suitable mechanism, known as repertory grid analysis, by which such descriptions can be elicited from subjects, correlated and employed in measuring subject experiences. Kelly’s work in this area is grounded in principles of constructivism, where subjects identify and deal with the world
around them based upon their own experiences and interpretation of events and objects. Repertory grid analysis consists of defining a particular subject or domain to be investigated within a particular context. Instances or examples of the domain are known as elements, and bi-polar descriptions of the elements, known as constructs, are rated on a scale (usually 5 or 7 point). For example, the domain being investigated might be movies, the elements could be a number of popular movies, and the bi-polar constructs used to describe and rate the movie elements could be violent or non-violent, an adult’s film or a children’s film, and so on. Constructs are defined by the subject with the help of the interviewer, who enables the subject to produce more constructs by defining the relationships and differences between the nominated elements. This can be enhanced through interview techniques involving triads, where three random elements are chosen and the subject is asked to choose the least similar of the three and define the construct that separates it from the other two elements (Bannister & Mair, 1968). Subjects then provide a rating on the point scale for each element against each of the constructs they have defined in order to complete their grid. Alternatively, and particularly when subjects struggle to separate two elements from a third, the interviewer can present a subject with two elements and ask them to explain the factor that differentiates the two. This will often provide one pole of a construct, and the subject is then asked what the opposite pole of that construct would be. Once a desired number of subjects have completed a repertory grid each, the grids are then
concatenated and can be immediately visualised as one large grid but, more crucially, the opportunity is available to determine the importance of elements and constructs within the larger grid. The bi-polar nature of defining constructs allows the context and relationship of a construct to be articulated and better understood by the researcher. This further removes ambiguity when a subject provides a rating, since the interrelation between the opposing ends of the scale has been specified by the subject themselves (Kelly, 1955). Further detail of PCP and repertory grid technique is beyond the scope of this work and can be found in Kelly’s seminal text.
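As a rough illustration of the structure described above, the following Python sketch represents one subject’s repertory grid as a set of elements, bi-polar constructs and ratings, and concatenates the grids of several subjects into one larger grid. The class name `RepertoryGrid`, the function `concatenate`, and all constructs and ratings shown are hypothetical and chosen only for illustration; they are not data from the study reported in this chapter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RepertoryGrid:
    """One subject's grid: ratings[c][e] rates element e on construct c."""
    elements: List[str]                 # e.g. musical genres
    constructs: List[Tuple[str, str]]   # (left pole, right pole) pairs
    ratings: List[List[int]]            # one row per construct
    scale: int = 5                      # 5-point scale assumed here

    def __post_init__(self):
        if len(self.ratings) != len(self.constructs):
            raise ValueError("one row of ratings is needed per construct")
        for row in self.ratings:
            if len(row) != len(self.elements):
                raise ValueError("one rating is needed per element")
            if any(not 1 <= value <= self.scale for value in row):
                raise ValueError("rating lies outside the scale")


def concatenate(grids: List[RepertoryGrid]) -> RepertoryGrid:
    """Stack the constructs of several subjects who rated the same elements."""
    first = grids[0]
    for grid in grids[1:]:
        if grid.elements != first.elements or grid.scale != first.scale:
            raise ValueError("grids must share elements and rating scale")
    return RepertoryGrid(
        elements=list(first.elements),
        constructs=[c for grid in grids for c in grid.constructs],
        ratings=[list(row) for grid in grids for row in grid.ratings],
        scale=first.scale,
    )


# Hypothetical grids from two subjects (illustrative values only).
genres = ["Pop", "Rock", "Jazz"]
subject_1 = RepertoryGrid(
    genres,
    [("Mellow", "Aggressive"), ("Acoustic", "Electronic")],
    [[3, 5, 2], [5, 4, 2]],
)
subject_2 = RepertoryGrid(genres, [("Simple", "Complex")], [[1, 2, 5]])
combined = concatenate([subject_1, subject_2])
```

Keeping each construct as a pair of poles preserves the bi-polar meaning of every rating when the grids are merged, which is the property the technique relies upon.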
Using a Repertory Grid to Understand Perceptions of Music
A set of elements was defined to include a representative spectrum of musical genres. Subjects were asked to define their own bi-polar personal constructs for these elements, based on their experiences of listening to music and their perceptions of the characteristics of each genre. Whilst there are many sub-genres and related musical styles, this set provides an appropriately broad spread for the purposes of this investigation without making the interview process overly laborious for the participant in terms of time and effort. The elements defined are shown in Table 3. Subjects for the investigation were drawn from a random sample of the population. Subjects were interviewed individually and told that the purpose of the exercise was to get them to express their perceptions about the characterising features of different types of music. To carry out
Table 3. Musical Genre Elements Used in Repertory Grid Experiment
• Pop
• Rock
• Dance
• Jazz
• Classical
• Soul
• Blues
• Rap
• Country
the rating of elements against their defined constructs, subjects were asked to perform a card sorting exercise for each pair of constructs. Triads were used to elicit constructs by randomly selecting three elements; once subjects began to struggle with triads, they were asked to differentiate between two randomly selected elements instead. A total of 10 subjects participated in the elicitation process of the repertory grid interview. The age of subjects ranged from 16 to 59, with an average age of 34, and there was a 50/50 male/female gender split. The results of the ten repertory grid interviews are presented in Figure 5 and Figure 6. Though ten subjects appears at first glance to be a small sample, the granularity of these interviews comes from the total number of constructs elicited across all participants. Furthermore, the data retrieved using constructs provides both qualitative and quantitative information regarding the domain of enquiry. In addition to the visual analysis of a repertory grid, a PrinCom map, which makes use of Principal Component Analysis (PCA), can be derived that relates elements and constructs graphically, such that the visual distance between elements and constructs is significant. The PrinCom mapping integrates both elements and constructs on a visual grid and shows the relationship between the two. A PrinGrid for the repertory grid derived in this investigation is shown in Figure 7. It is the constructs elicited that are of particular interest within the scope of this work, since they provide insight into how subjects perceive music.
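To make the idea of a principal components map more concrete, the sketch below shows one way in which element coordinates and construct loadings could be derived from a small grid using only the Python standard library. It is a minimal sketch under stated assumptions: the grid values are hypothetical, the eigenvectors are found by simple power iteration with deflation, and the procedure is not claimed to match the software used to produce Figure 7.

```python
import math

# Hypothetical grid: rows are constructs, columns are elements (1-5 scale).
ELEMENTS = ["Pop", "Rock", "Jazz", "Classical", "Blues"]
CONSTRUCTS = ["Mellow-Aggressive", "Acoustic-Electronic", "Simple-Complex"]
RATINGS = [
    [3, 5, 2, 1, 2],
    [5, 4, 2, 1, 2],
    [1, 2, 5, 5, 3],
]


def centre_rows(grid):
    """Subtract each construct's mean rating so that every row averages zero."""
    centred = []
    for row in grid:
        mean = sum(row) / len(row)
        centred.append([value - mean for value in row])
    return centred


def construct_covariance(centred):
    """Covariance between constructs, treating elements as observations."""
    n_elements = len(centred[0])
    return [[sum(a * b for a, b in zip(row_i, row_j)) / (n_elements - 1)
             for row_j in centred] for row_i in centred]


def dominant_eigenpair(matrix, iterations=500):
    """Power iteration: (eigenvalue, unit eigenvector) of the largest eigenvalue."""
    size = len(matrix)
    start = [float(i + 1) for i in range(size)]      # arbitrary non-zero start
    norm = math.sqrt(sum(s * s for s in start))
    vector = [s / norm for s in start]
    for _ in range(iterations):
        product = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:
            break
        vector = [p / norm for p in product]
    product = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
    eigenvalue = sum(v * p for v, p in zip(vector, product))  # Rayleigh quotient
    return eigenvalue, vector


def principal_axes(matrix, count=2):
    """Top `count` eigenpairs, removing each found component by deflation."""
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(count):
        value, vector = dominant_eigenpair(work)
        pairs.append((value, vector))
        for i in range(len(work)):
            for j in range(len(work)):
                work[i][j] -= value * vector[i] * vector[j]
    return pairs


centred = centre_rows(RATINGS)
axes = principal_axes(construct_covariance(centred))

# Element coordinates: each element's centred ratings projected onto the axes.
element_coords = {
    name: tuple(sum(axis[c] * centred[c][e] for c in range(len(CONSTRUCTS)))
                for _, axis in axes)
    for e, name in enumerate(ELEMENTS)
}
# Construct loadings: the weight of each construct on each axis.
construct_loadings = {
    name: tuple(axis[c] for _, axis in axes)
    for c, name in enumerate(CONSTRUCTS)
}
```

Plotting `element_coords` as points and `construct_loadings` as vectors on the same two axes gives a map in which elements rated similarly lie close together. The sign of each axis is arbitrary, so only relative positions are meaningful.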
As can be seen from the grid in Figure 5 and Figure 6, the range of constructs elicited provides an insight not only into how subjects typically perceive the sound content of each musical genre, but also into the context in which they place each genre, with occasional indications of the emotional impact of each genre. For example, by also
looking at Figure 7 we can produce the notion that blues music is “emotionally evocative”, has “specific geography & history”, is placed in the context of being “African American”, and is “mellow”. Naturally, there is some subjectivity present here and these statements are open to interpretation, but we expect that, for most readers, these constructs represent the group norm. A limitation of the repertory grid technique encountered during this particular study is the subjects’ familiarity with the elements under investigation. During interviews, there were clearly some elements with which subjects were less familiar than others. It was observed that subjects would often group together the elements they were less familiar with when rating elements against their chosen constructs. Whilst this phenomenon is likely to be particularly pronounced in this study, because music awareness depends heavily upon personal preference or taste, it is likely to occur in other scenarios too. The repertory grid was used to elicit human-friendly descriptions of musical characteristics. Although not strictly timbral definitions, these constructs succeed in describing the characteristics of musical genres. To put this into the context of a visual metaphor: whilst timbre is a human description of the colour of a sound or piece of music, these constructs can be thought of as describing the patterns; the mix of shapes and colours that provides deeper information about the content and the bigger picture. We find repertory grid investigation to be a highly useful tool in determining group norms and perceptions of important factors in any field that is being explored.
In the context of this chapter, we hope it can be seen that such techniques would allow information to be gathered about how a group of users perceives a game and its sound, in terms of the content that constitutes the game, their emotional perceptions of the game and the context in which they view it.
Figure 5. Repertory Grid Ratings of Musical Genres (Part a).
EXAMPLE APPLICATIONS IN CURRENT RESEARCH
Presented in this section are summaries of research work that has been influenced by, or has involved, the use of emotion, content and context in various guises. A number of the works presented here are studies involving a small set of music. For convenience, this small database of music is shown in Table 4 so that the reader may
Figure 6. Repertory Grid Ratings of Musical Genres (Part b).
Figure 7. Musical Genre Principal Components Analysis.
cross-reference the ID number to the song, where appropriate. We feel that this small selection of songs represents a reasonable cross-section of contemporary popular musical genres.
Responsive Automated Music Playlists
Some of our most recent and closely related work combining the use of emotion, content and context in musical applications has been in the area of intelligent playlist generation tools; this work is explored in greater detail elsewhere (Cunningham, Caulder, & Grout, 2008). However, to show the effectiveness of combining all three of these areas, a summary of that work to date is provided here.
Our main motivation in this area of research and development was to address some of the shortcomings of the approaches traditionally employed in automatic recommendation and playlist generation tools. Historically, these tools evolved in a similar way to Automated Collaborative Filters (ACFs). That is to say, simple measurements of user preference and the preference of a typical population were used to build a ranked table of music in a library. These tools analysed information such as the most frequently played tracks, a user rating of each track, favourite artists and musical genres, and other meta-data attached to a song (Cunningham, Bergen, & Grout, 2006). However, this is not to trivialise the area of automatic playlist generation, since a number of systems exist that employ much more advanced learning and
Table 4. Mini Music Database Used in Testing
| ID | Artist | Song |
| --- | --- | --- |
| 0 | Daft Punk | One More Time (Radio Edit) |
| 1 | Fun Lovin’ Criminals | Love Unlimited |
| 2 | Hot Chip | Over and Over |
| 3 | Metallica | Harvester of Sorrow |
| 4 | Pink Floyd | Comfortably Numb |
| 5 | Sugababes | Push The Button |
| 6 | The Prodigy | Breathe |
| 7 | ZZ Top | Gimme All Your Lovin’ |
analysis techniques and technologies (Aucouturier & Pachet, 2002; Platt, Burges, Swenson, Weare, & Zheng, 2002). In recent years, as computational power and resources have increased, the tools that underpin musical and sound content analysis have migrated deeper into the field of playlist generation, allowing greater scope and accuracy for classification of musical features (Logan, 2002; Gasser, Pampalk, & Tomitsch, 2007). However, although these advances have been significant, such methods have focused on musical and sonic information extraction, and few systems have considered the wider scope of the user and his or her environment. It is reasonable to expect that factors relating to the emotion, state of mind and current activities of a listener will greatly influence their current and, most importantly, next selection of music.
Emotion, Content and Context in Automatic Playlist Generation
In this review of our recent work in the area of intelligent automatic playlist generation, we provide details of the development principles, investigation and analysis into the viability of playlist generation that considers the wider circumstance of the listener. To achieve more accurate and useful playlist generation, we propose to not only build upon the established principles of using information about the musical content but also
to examine the context the listener is in and how these external factors might affect their choice of suitable music. Additionally, the emotional state of the user is of interest, since this is also likely to influence their choice of music. An idealised scenario summarising the complete system we describe here is shown in Figure 8. To realise a system that will potentially have to deal with and correlate a wide range of input parameters, we employ approaches that utilise fuzzy logic and self-learning systems. Determining the user’s emotional state is of key importance to a truly successful and useful playlist generation system. This estimate is informed by and correlated with the state of the surrounding environmental factors for the user, as well as their current levels of movement, physical activity, heart rate, stress levels, and so on. Within our work, we felt that it was initially most important to investigate the current locomotive state of the user. For example, a user who is moving a lot and accelerating rapidly may be participating in an energetic activity such as running, dancing, cycling, or exercising in some way. It is reasonable to suggest that most people listening to music in such scenarios would want music that reflects the nature of their physical motion, such as music with a dominant, driving rhythm and a high tempo, greater than 120 or 130 beats per minute, for instance.
Figure 8. Emotion, Content and Context Aware Playlist System
Given currently available technology, it is also relatively easy to find equipment that allows the measurement of a user’s movement. In our case, this was realised by employing the wireless hand controller from the Nintendo Wii: the Wiimote. The Wiimote, when compared to other alternatives, is a cheap device that allows measurement of movement in three dimensions. It is almost universally accessible, since it employs the Bluetooth communication protocol to send and receive data to and from a paired host. As Maurizio and Samuele (2007) demonstrate, valuable motion information can be retrieved through the accelerometers contained in the Wii controller.
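As a hedged illustration of how three-axis accelerometer data might be reduced to a coarse locomotive state (such as standing, walking, jogging or running), consider the Python sketch below. It assumes that samples have already been read from the device and converted to units of g; the threshold values and function names are hypothetical and would need calibrating against real measurements. It is not the procedure used in our own experiments.

```python
import math

GRAVITY = 1.0  # samples are assumed to be expressed in units of g

# Hypothetical upper bounds on mean dynamic acceleration (in g) per state.
THRESHOLDS = [
    (0.05, "standing"),
    (0.30, "walking"),
    (0.80, "jogging"),
]


def mean_dynamic_acceleration(samples):
    """Average absolute deviation of the acceleration magnitude from 1 g."""
    deviations = [
        abs(math.sqrt(x * x + y * y + z * z) - GRAVITY)
        for x, y, z in samples
    ]
    return sum(deviations) / len(deviations)


def locomotive_state(samples):
    """Map a window of (x, y, z) samples to a coarse locomotive state."""
    level = mean_dynamic_acceleration(samples)
    for upper_bound, label in THRESHOLDS:
        if level < upper_bound:
            return label
    return "running"


# Illustrative windows of samples (made-up values, not measured data).
still_window = [(0.0, 0.0, 1.0)] * 10
walking_window = [(0.0, 0.0, 0.8), (0.0, 0.0, 1.2)] * 5
running_window = [(0.0, 0.0, 2.0), (0.0, 0.0, 0.0)] * 5
```

Using the magnitude of the acceleration vector makes the estimate independent of how the controller is oriented in the hand, at the cost of discarding directional information.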
Implementation and Initial Results
Our initial work in this field sought to demonstrate the ability to obtain, analyse and correlate content-related data about music and contextual information acquired from the user to arrive at an estimate of the user’s emotional state (E-state). To achieve this, we developed a small-scale system that would work from a number of simulated factors (controlled by the researcher) and also live data extracted from sensors, principally the Wii controller. This system was designed to work with a small music database consisting of eight songs, shown in Table 4, and to rank these songs in order of suitability, based upon the estimated E-state. To begin working with the motion data from the Wiimote controller, we worked with four simple locomotive states: standing, walking, jogging, and running. These simple locomotive states were believed to be detectable not only from the Wiimote but also from a range of motion measurement devices, such as the accelerometers built into the Apple iPhone/iPod, as well as higher-level systems such as a Qualisys motion capture system (which we also had access to and which allowed us to verify the
results obtained from the Wiimote). Similarly, a range of states was defined for each of the other input parameters we work with, such as weather conditions and amount of light. As previously mentioned, due to limitations of time and resources, we focused only on the locomotive state of the user. It is hoped that, in future, further verification will be achieved by using other user measurements such as heart rate and galvanic skin response. To avoid additional complication, and to allow greater control over the testing procedure, this set of parameter ranges and values was loosely defined according to the empirical and historic knowledge of the individual. However, such crisp definitions are too rigid to reflect the way things tend to work in the real world: therefore, to make these values more representative of real scenarios, they are fuzzified when defined in the fuzzy logic system. In simple terms, this means that each point on a scale is described by its degree of membership of each state, so that the boundaries between neighbouring states are gradual rather than sharp. The implemented fuzzy logic system provides a single output value that represents the predicted E-state of the listener. For simplification, we began by defining five emotional states and assigned a numeric range to each state, to allow the representation of varying degrees of each state and of the overlap where states merge into one another. This is appropriate, since it is very difficult to place an absolute, quantitative metric onto the complex emotions felt by humans. These allocations are shown in Table 5.
Table 5. Defined Set of Emotional States (E-states)
| E-State | Numeric Range |
| --- | --- |
| Depressed | 0-3 |
| Unhappy | 3-4 |
| Neutral | 4-6 |
| Happy | 5-8 |
| Zoned | 7-9 |
To verify the effectiveness of using the Wii controller as a device to measure movement, and particularly locomotive state, we performed a number of experiments benchmarked against a
Qualisys motion detection system. By determining the rate of acceleration from the accelerometers in the Wii controller and asking a subject to perform three locomotive states (walking, jogging, and running), we can see that the Wii controller allows rapid identification of each of these states, as the graph in Figure 9 shows. We defined a number of scenarios, each combining a range of contextual parameters that a user might typically encounter. These scenarios are shown in Table 6. The parameters included those that the user has control over, in this case the locomotive state, and external, environmental factors beyond the control of the user. We carried out a quantitative investigation, using 10 subjects, in which each subject was asked to map one of the emotional states from Table 5 to each of the scenarios from Table 6. The average E-state response rating for each scenario is shown in Table 7. Although the use of an average response is not ideal, it provides sufficient insight into the common perception of each scenario and, when we analysed each response, there was strong correlation across all of the subjects. Each of the songs in the mini-database was allocated a range of values from the E-state table by the research team. These values reflected the researchers’ perceptions of the content and emotional indicators present in the music. Naturally, in future research, we will explore the perceived emotional state attached to each song by employing a more detailed sample of a suitable population. For now, however, these allocations were treated as a controlled factor in this particular investigation. These allocations were then mapped
Figure 9. Wii Acceleration Curves for 3 States of Locomotion
against the emotional states extracted from the user-scenario study and each song’s E-state was ranked against each scenario’s E-state to provide a grade for each song, using a simple Euclidean distance measurement of the form

$$G(p, q) = \sqrt{(p - q)^{2}}. \qquad (1)$$
Table 8 shows the resultant ranking of songs (by their ID), for each of these scenarios.
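The grading in Equation (1) can be sketched in a few lines of Python. The song identifiers and E-state values below are hypothetical placeholders (the allocations made by the research team are not listed in this chapter), and each song is reduced to a single representative value rather than a range, which is a simplifying assumption made only for this example.

```python
import math


def grade(p, q):
    """One-dimensional Euclidean distance between two E-state values."""
    return math.sqrt((p - q) ** 2)


def rank_songs(song_e_states, scenario_e_state):
    """Order song identifiers from most to least suitable for a scenario.

    The smallest distance is ranked first; ties are broken by identifier.
    """
    return sorted(
        song_e_states,
        key=lambda song: (grade(song_e_states[song], scenario_e_state), song),
    )


# Hypothetical representative E-state per song (not the values used in the study).
HYPOTHETICAL_SONG_E_STATES = {
    "song_a": 7.0,
    "song_b": 4.0,
    "song_c": 8.5,
    "song_d": 1.5,
}

# Scenario E-states of 6.8 and 0.0 are taken from Table 7 (scenarios 3 and 2).
playlist_for_scenario_3 = rank_songs(HYPOTHETICAL_SONG_E_STATES, 6.8)
playlist_for_scenario_2 = rank_songs(HYPOTHETICAL_SONG_E_STATES, 0.0)
```

In one dimension the measure reduces to the absolute difference between the two values; the square-root form is kept so that the same function shape would extend to several dimensions.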
At this stage, it is recognised that the system is more limited than the idealised scenario presented earlier in Figure 8. A number of factors have been simulated and a number of assumptions have been made. However, using a Fuzzy Rule Based System (FRBS), we have been able to implement a limited version of the system outlined in Figure 8 that dynamically outputs an E-state based on live sensor data from a Wii controller and simulated environmental parameters. An outline of the Takagi-Sugeno-Kang (TSK) type
Table 6. Range of Scenarios in Subject E-state Evaluation
| ID | Scenario Description |
| --- | --- |
| 1 | Walking, temperature is hot, lighting is dark/grey and weather is light rain. |
| 2 | Stationary, temperature is cold, lighting is dark and weather is raining. |
| 3 | Stationary, temperature is warm, lighting is brightening and weather is dry. |
| 4 | Running, temperature is hot, lighting is daylight/getting brighter and dry. |
| 5 | Walking, temperature is getting hot, lighting is dark and weather is drizzling. |
| 6 | Stationary, temperature is hot, lighting is grey and weather is dry. |
| 7 | Walking/Jogging, temperature is mild, daylight and it’s dry. |
Table 7. Average User E-state for each Scenario
| Scenario ID | Emotion (0 = Unhappy; 100 = Very Happy) |
| --- | --- |
| 1 | 4.3 |
| 2 | 0.0 |
| 3 | 6.8 |
| 4 | 7.7 |
| 5 | 3.0 |
| 6 | 3.8 |
| 7 | 6.5 |
FRBS we employed, along with the fuzzified input parameters, is shown in Figure 10. Whilst a number of parameters, such as light and temperature measurement, have been simulated at this stage, incorporating live sensor data from such devices is straightforward and is limited only by the current lack of hardware resources. In its current state, the system demonstrates the ability to read contextual data from the user and correlate this with information from the environment to make an informed judgement of the user’s emotional state. With future development, and with feedback from the user whilst using the system, it will be possible to teach the playlist generator about the user’s preferences and to improve the accuracy with which the system estimates the emotional state of the user.
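For readers unfamiliar with TSK-type fuzzy systems, the following Python sketch shows the general shape of a zero-order TSK rule base: inputs are fuzzified with overlapping membership functions, each rule fires to a degree, and the output is the firing-strength-weighted average of constant rule outputs. The two inputs, the membership functions, the four rules and their output values are all hypothetical and are not the rule base shown in Figure 10; only the E-state ranges are taken from Table 5.

```python
def triangular(x, left, peak, right):
    """Degree of membership (0..1) of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(x):
    """Overlapping 'low' and 'high' sets over a normalised input in [0, 1]."""
    x = min(max(x, 0.0), 1.0)
    return {
        "low": triangular(x, -1.0, 0.0, 1.0),
        "high": triangular(x, 0.0, 1.0, 2.0),
    }


# Hypothetical rules: (activity set, light set) -> constant E-state output (0-9).
RULES = [
    ("low", "low", 2.0),
    ("low", "high", 5.0),
    ("high", "low", 5.0),
    ("high", "high", 8.0),
]

# E-state ranges as listed in Table 5.
E_STATE_RANGES = {
    "Depressed": (0, 3),
    "Unhappy": (3, 4),
    "Neutral": (4, 6),
    "Happy": (5, 8),
    "Zoned": (7, 9),
}


def estimate_e_state(activity, light):
    """Zero-order TSK inference: weighted average of the rule outputs."""
    activity_sets, light_sets = fuzzify(activity), fuzzify(light)
    weighted_sum = total_weight = 0.0
    for activity_label, light_label, output in RULES:
        # Fuzzy AND is taken as the minimum of the two memberships.
        strength = min(activity_sets[activity_label], light_sets[light_label])
        weighted_sum += strength * output
        total_weight += strength
    return weighted_sum / total_weight


def labels_for(e_state):
    """All E-state labels whose (overlapping) range contains the value."""
    return [name for name, (low, high) in E_STATE_RANGES.items()
            if low <= e_state <= high]
```

For example, `estimate_e_state(0.5, 0.5)` fires all four rules equally, and `labels_for` then shows how a single output value can fall within two neighbouring E-states where their ranges overlap.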
FUTURE RESEARCH IDEAS AND CONCLUSIONS
We have seen that awareness of the presented issues is beneficial in not only providing richer interactive experiences and more appropriate information, but that knowledge of the purpose of information can be used to optimise computational challenges. Furthermore, information, such as that presented visually, can be analysed in terms of content and context with the goal of being able to stimulate emotions in a user who might otherwise be disadvantaged from such an experience, due to visual impairment. It is hoped that we have demonstrated the applicability of determining and analysing features related to emotion, content and context in relation to improving systems where user-interaction is of particular significance.
Table 8. Ranked Playlist Ordering for Set of Given Scenarios
| Scenario | E-state | Playlist order |
| --- | --- | --- |
| 1 | 4.3 | 1; 4; 0; 7; 3; 6; 2; 5 |
| 2 | 0 | 6; 3; 4; 1; 0; 7; 2; 5 |
| 3 | 6.8 | 0; 7; 2; 5; 1; 4; 3; 6 |
| 4 | 7.7 | 2; 5; 0; 7; 1; 4; 3; 6 |
| 5 | 3 | 4; 3; 6; 1; 0; 7; 2; 5 |
| 6 | 3.8 | 4; 1; 3; 6; 0; 7; 2; 5 |
| 7 | 6.5 | 0; 7; 1; 2; 5; 4; 3; 6 |
Figure 10. Simplified Overview of TSK-type FRBS Used in Playlist Generation
For the budding researcher interested in these areas, we suggest the following broad, non-exhaustive list of some of the key research themes and areas that would greatly benefit from further investigation:
• Explore the commonly perceived emotions of users playing computer games and determine the most reliable methods with which to record and model these emotions
• Develop a common software toolbox to allow audio content analysis to be easily bolted into a range of software products
• Further examine the value of environmental context for users playing computer games compared with user-centred contextual data
• Assess gameplay parameters that are best influenced and reflected in the emotion, content and context of the user
• Develop fuzzy logic systems that can accurately read a range of content and contextual data and output a robust, truly reflective emotional state for the majority of a sample user population.
Above all, we hope that this chapter has stimulated intellectual thought and got the creative juices flowing. Our aim here is not to provide a cast-in-stone set of data and instructions for the budding developer and researcher to follow and obey: far from it! By all means question, criticise and make up your own mind. Anyone working in the field of computer game and multimedia development that involves sound will be aware not only of the technical, computing, and engineering issues of their field (the logical ones), but will doubtless also have their own opinions, tastes and creativity. If there is a lesson to be learned from this chapter, it is that we hope you will consider the bigger picture, the wider implications, the external factors and the notion of a user-centred design process. We feel that the three areas of emotion, content and context
epitomise these views and will be the crucial issues in future technological and entertainment areas. Think big. In our opinion, the ‘blue sky’ and ‘off the wall’ ideas are some of the most fun and interesting things you can do when it comes to being creative with technology. Have fun with your work and work with fun stuff!
Cunningham, S., Grout, V., & Hebblewhite, R. (2006). Computer game audio: The unappreciated scholar of the Half-Life generation. In Proceedings of the Audio Mostly Conference on Sound in Games.
rEFErENcEs
Davies, G., Cunningham, S., & Grout, V. (2007). Visual stimulus for aural pleasure. In Proceedings of the Audio Mostly Conference on Interaction with Sound.
Aucouturier, J. J., & Pachet, F. (2002). Scaling up music playlist generation. In Proceedings of the IEEE International Conference on Multimedia Expo.
Dance dance revolution. (1998). Konami. Dave mirra freestyle BMX. (2000). Z-Axis.
Davis, H., & Silverman, R. (1978). Hearing and deafness (4th ed.). Location: Thomson Learning.
Bannister, D., & Mair, J. M. M. (1968). The evaluation of personal constructs. London: Academic Press.
Dix, A., Finlay, J., Abowd, G. D., & Beale, R. (2003). Human computer interaction (3rd ed.). Essex, England: Prentice Hall.
Battle of the bands. (2008). Planet Moon Studios.
Ekman, I. (2008). Psychologically motivated techniques for emotional sound in computer games. In Proceedings of the Audio Mostly Conference on Interaction with Sound.
BioShock. (2007). Irrational Games. Bordwell, D., & Thompson, K. (2004). Film art: An introduction (7th ed.). New York: McGrawHill. Clarkson, B., Mase, K., & Pentland, A. (2000). Recognizing user context via wearable sensors. In Proceedings of the Fourth International Symposium of Wearable Computers. Conati, C. (2002). Probabilistic assessment of user’s emotions in educational games. Applied Artificial Intelligence, 16(7/8), 555–575. doi:10.1080/08839510290030390 Cunningham, S., Bergen, H., & Grout, V. (2006). A note on content-based collaborative filtering of music. In Proceedings of IADIS - International Conference on WWW/Internet. Cunningham, S., Caulder, S., & Grout, V. (2008). Saturday night or fever? Context aware music playlists. In Proceedings of the Audio Mostly Conference on Interaction with Sound.
260
F.E.A.R. First encounter assault recon. (2005). Monolith Productions. FIFA. (1993-). EA Sports Foote, J. (1999). Visualizing music and audio using self-similarity. Proceedings of the seventh ACM international conference on Multimedia (Part 1), 77-80. Freeman, D. (2004). Creating emotion in games: The craft and art of emotioneering™. Computers in Entertainment, 2(3), 15. doi:10.1145/1027154.1027179 Gasser, M., Pampalk, E., & Tomitsch, M. (2007). A content-based user-feedback driven playlist generator and its evaluation in a real-world scenario. In Proceedings of the Audio Mostly Conference on Interaction with Sound. Grand theft auto. (1993-). Rockstar Games.
Emotion, Content, and Context in Sound and Music
Grimshaw, M., Lindley, C. A., & Nacke, L. (2008). Sound and immersion in the first-person shooter: Mixed measurement of the player’s sonic experience. In Proceedings of the Audio Mostly Conference on Interaction with Sound. Guitar hero. (2005-). [Computer software]. Harmonix Music Systems (2005- 2007)/ Neversoft (2007-). Hitchcock, A. (Director) (1960). Psycho. Hollywood, CA: Paramount. Jansz, J. (2006). The emotional appeal of violent video games. Communication Theory, 15(3), 219– 241. doi:10.1111/j.1468-2885.2005.tb00334.x Johnstone, T. (1996). Emotional speech elicited using computer games. In Proceedings of Fourth International Conference on Spoken Language (ICSLP96). Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited . In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global. Károlyi, O. (1999). Introducing music. Location: Penguin. Kelly, G. A. (1955). The psychology of personal constructs. New York: Norton. Kromand, D. (2008). Sound and the diegesis in survival-horror games. In Proceedings of the Audio Mostly Conference on Interaction with Sound. Livingstone, S. R., & Brown, A. R. (2005). Dynamic response: Real-time adaptation for music emotion. In Proceedings of the Second Australasian Conference on Interactive Entertainment. Logan, B. (2002). Content-based playlist generation: Exploratory experiments, In ISMIR2002, 3rd International Conference on Musical Information (ISMIR).
Maurizio, V., & Samuele, S. (2007). Lowcost accelerometers for physics experiments. European Journal of Physics, 28, 781–787. doi:10.1088/0143-0807/28/5/001 Moore, B. C. J. (Ed.). (1995). Hearing: Handbook of perception and cognition (2nd ed.). New York: Academic Press. Moore, B. C. J. (2003). An introduction to the psychology of hearing (5th ed.). New York: Academic Press. Nacke, L., & Grimshaw, M. (2011). Player-game interaction through affective sound . In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global. Parker, J. R. & Heerema, J. (2008). Audio interaction in computer mediated games. International Journal of Computer Games Technology. Platt, J. C., Burges, C. J. C., Swenson, S., Weare, C., & Zheng, A. (2002). Learning a Gaussian process prior for automatically generating music playlists. Advances in Neural Information Processing Systems, 14, 1425–1432. Pong. (1972). Atari Inc. Ravaja, N., Saari, T., Laarni, J., Kallinen, K., Salminen, M., Holopainen, J., & Järvinen, A. (2005). The psychophysiology of video gaming: Phasic emotional responses to game events. In Proceedings of DiGRA 2005 Conference: Changing Views - Worlds in Play. Reynolds, G., Barry, D., Burke, T., & Coyle, E. (2007). Towards a personal automatic music playlist generation algorithm: The need for contextual information. In Proceedings of the Audio Mostly Conference on Interaction with Sound. Rock band. (2005-2007). Harmonix Music Systems.
Emotion, Content, and Context in Sound and Music
Schmidt, A., & Winterhalter, C. (2004). User context aware delivery of e-learning material: Approach and architecture. Journal of Universal Computer Science, 10(1), 38–46. Silent hill 2. (2001). Konami. SingStar. (2004). London Studio. Space invaders. (1978). Taito Corporation. Stigwood, R., & Badham, J. (Producers). (1977). Saturday night fever [Motion picture]. Hollywood, CA: Paramount. Sykes, J., & Brown, S. (2003). Affective gaming: Measuring emotion through the gamepad. In Proceedings of Conference on Human Factors in Computing Systems (CHI ‘03). Tamminen, S., Oulasvirta, A., Toiskallio, K., & Kankainen, A. (2004). Understanding mobile contexts. Personal and Ubiquitous Computing, 8(2), 135–143. doi:10.1007/s00779-004-0263-1 Ultimate band. (2008). Fall Line Studios. Wii Music. (2008). Nintendo. Yost, W. A. (2007). Fundamentals of hearing: An introduction (5th ed.). New York: Academic Press. Zhang, T., & Jay Kuo, C. C. (2001). Audio content analysis for online audiovisual data segmentation and classification. IEEE Transactions on Speech and Audio Processing, 9(4), 441–457. doi:10.1109/89.917689
ADDITIONAL READING
Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer. Brewster, S. A. (2008). Nonspeech auditory output. In Sears, A., & Jacko, J. (Eds.), The human computer interaction handbook (2nd ed., pp. 247–264). Philadelphia: Lawrence Erlbaum Associates.
Cunningham, S., & Grout, V. (2009). Audio compression exploiting repetition (ACER): Challenges and solutions. In Proceedings of the Third International Conference of Internet Technologies and Applications (ITA 09). Ekman, I., & Lankoski, P. (2009). Hair-raising entertainment: Emotions, sound, and structure in Silent Hill 2 and Fatal Frame . In Perron, B. (Ed.), Gaming after dark. Welcome to the world of horror video games (pp. 181–199). Jefferson, NC: McFarland & Company, Inc. Freeman, D. (2003). Creating emotion in games. Indianapolis, IN: New Riders. Grimshaw, M. (2008). The acoustic ecology of the first-person shooter: The player, sound and immersion in the first-person shooter computer game. Saarbrücken, Germany: VDM Verlag Dr. Mueller. Grimshaw, M. (2009). The audio Uncanny Valley: Sound, fear and the horror game. In Proceedings of the Audio Mostly Conference on Interaction with Sound. Loy, G. (2006). Musimathics: The mathematical foundations of music (Vol. 1). Cambridge, MA: MIT Press. Loy, G. (2007). Musimathics: The mathematical foundations of music (Vol. 2). Cambridge, MA: MIT Press. Papworth, N., Liljedahl, M., & Lindberg, S. (2007). Beowulf: A game experience built on sound effects. In Proceedings of the 13th International Conference on Auditory Display (ICAD). Röber, N., & Masuch, M. (2005). Leaving the screen: New perspectives in audio-only gaming. In Proceedings of 11th International Conference on Auditory Display (ICAD).
KEY TERMS AND DEFINITIONS
Content: The definable qualities and characteristics for any given piece of information.
Context: The scenario and environment in which a user or application is placed.
Emotional Interaction: A digital system capable of inducing emotional reactions in a user and being able to dynamically respond to human emotional states.
Emotional Reaction: A human affective response or feeling in response to one or more stimuli.
Emotional State: The dominant, overriding emotional sensation of a human at a given moment.
Playlist Generation: The production of a sequence of songs, often to be listened to on a portable music player.
Chapter 13
Player-Game Interaction Through Affective Sound
Lennart E. Nacke
University of Saskatchewan, Canada
Mark Grimshaw
University of Bolton, UK
ABSTRACT
This chapter treats computer game playing as an affective activity, largely guided by the audio-visual aesthetics of game content (of which, here, we concentrate on the role of sound) and the pleasure of gameplay. To understand the aesthetic impact of game sound on player experience, definitions of emotions are briefly discussed and framed in the game context. This leads to an introduction of empirical methods for assessing physiological and psychological effects of play, such as the affective impact of sonic player-game interaction. The psychological methodology presented is largely based on subjective interpretation of experience, while psychophysiological methodology is based on measurable bodily changes, such as context-dependent, physiological experience. As a means to illustrate both the potential and the difficulties inherent in such methodology we discuss the results of some experiments that investigate game sound and music effects and, finally, we close with a discussion of possible research directions based on a speculative assessment of the future of player-game interaction through affective sound.
DOI: 10.4018/978-1-61692-828-5.ch013
INTRODUCTION
Digital games have grown to be among the favourite leisure activities of many people around the world. Today, digital gaming battles for a share of your individual leisure time with other traditional activities like reading books, watching movies, listening to music, surfing the internet, or
playing sports. Games also pose new research challenges for many scientific disciplines – old and new – as they have been hailed as drivers of cloud computing and innovation in computer science (von Ahn & Dabbish, 2008), promoters of mental health (Miller & Robertson, 2009; Pulman, 2007), tools for training cognitive and motor abilities (Dorval & Pepin, 1986; Pillay, 2002), and as providers of highly immersive and emotional environments for their players (Ravaja, Turpeinen,
Saari, Puttonen, & Keltikangas-Järvinen, 2008; Ryan, Rigby, & Przybylski, 2006). Gaming is a joyful and affective activity that provides emotional experiences and these experiences may guide how we process information. Regarding emotions, Norman’s (2004) definition is that emotion works through neurochemical transmitters which influence areas of our brain and successively guide our behaviour and modify how we perceive information and make decisions. While Norman makes a fine distinction between affect and cognition, he also suggests that both are information-processing systems with different functionality. Cognition refers to making sense of the information that we are presented with, whereas affect refers to the immediate “gut reaction” or feeling that is triggered by an object, a situation, or even a thought. Humans strive to maximize their knowledge by accumulating novel, but also interpretative information. Experiencing novel information and being able to interpret it may be a cause of neurophysiological pleasure (Biedermann & Vessel, 2006). Cognitive processing of novel information activates endorphins in the brain, which moderate the sensation of pleasure. Thus, presenting novel cues in a game environment will affect and mediate player experience and in-game learning. This is an excellent example of how cognition and affect mutually influence each other, which is in line with modern emotion theories (Damasio, 1994; LeDoux, 1998; Norman, 2004). Norman (2004) proceeds to define emotion as consciously experienced affect, which allows us to identify who (or what) caused our affective response and why. The problem of not making a clear distinction between emotion and affect is further addressed by Bentley, Johnston, & von Baggo (2005), who recall Plutchik’s (2001) view on emotion as an accumulated feeling which is influenced by context, experience, personality, affective state, and cognitive interpretation.
They also explain that user experience for desktop software or office-based systems is more dependent on performance factors while, for digital games,
user experience depends much more on affective factors. Affect is defined as a discrete, conscious, subjective feeling that contributes to, and influences, an individual’s emotion (Bentley, et al., 2005; Damasio, 1994; Russell, 1980). We will revisit this notion later in the text. In addition, Moffat (1980) introduced an interesting notion about the relationships between personality and emotion, which are distinguished along two dimensions: duration (brief and permanent) and focus (focused and global). For example, an emotion might develop from brief affection into a long-term sentiment, or a mood that occurs steadily might become a personality trait. The two dimensions can be plausibly identified at a cognitive level, making a strong case for the relation between emotion, cognition, and personality both at the surface and at a deep, structural level. Psychophysiological research shows that people exhibit more activity in facial muscles such as corrugator supercilii (indicating negative appraisal) and higher arousal when they have to process unpleasant sound cues (e.g., bomb sounds), which shows that sound cues can be used in games to influence players’ emotional reactions (Bradley & Lang, 2000). Sound and music are generally known to enhance the immersion in a gaming situation (Grimshaw, 2008a). Music has also been credited with facilitating absorption in an activity (Rhodes, David, & Combs, 1988), and it is generally known to trigger the mesolimbic reward system in the human brain (Menon & Levitin, 2005), allowing for music to function as a reward mechanism in game design and possibly allowing for reinforcement learning (Quilitch & Risley, 1973). The recent explosion of interactive music games is a testament to the pleasure-enhancing function of music in games. Examples of interactive music games are Audiosurf (2008), the Guitar Hero series (2005-2009), SingStar (2004), and Wii Music (2008).
They make heavy use of reinforcement learning, as both positive
and negative reinforcement are combined when learning to play a song in Guitar Hero (2005), for example (for a comprehensive list of interactive music games see the list at the end of this chapter). Hitting the button and strumming with the right timing leads to positive reinforcement in the way that the guitar track of the particular song is played back and suggests player finesse, while a cranking sound acts as negative reinforcement when the button and strumming are off. Such reward mechanisms that foster reinforcement learning are a very common design element in games (see Collins, Tessler, Harrigan, Dixon, & Fugelsang, 2011). Applying them to diegetic composition of music is new and warrants further study, as sound and music effects in games are currently not studied with the same scientific rigour that is present, for example, in the study of violent digital game content and aggression (Bushman & Anderson, 2002; Carnagey, Anderson, & Bushman, 2007; Ferguson, 2007; Przybylski, Ryan, & Rigby, 2009). In addition to the reinforcement learning techniques in game design, another design feature is what Bateman (2009, p. 66) calls toyplay, facilitating the motivation of playing for its own sake. Toyplay denotes an unstructured activity of play guided by the affordances of the gameworld and is largely of an exploratory nature (Bateman, 2009; Bateman & Boon, 2006), being similar to games of emergence (Juul, 2005, p. 67) and unstructured and uncontrolled play termed “paidia” (Caillois, 2001, p. 13). Many music games work completely without a narrative framing and derive the joy of playing simply out of their player-game interaction. For example, Audiosurf (2008) eliminates most design elements not necessary for the interaction of the player with the game, which is essentially the production of music by “surfing” the right tones. The colourful representation of tones and notes is a visual aesthetic that drives the player to produce music.
This simple concept is brought to stellar quality in games such as Rez (2001) or SimTunes (1996), which truly appeal to the toyplay aspect of gaming. Therefore, toyplay
elements and reinforcement learning techniques are two design methods most pronounced in music interaction games and that drive affective engagement with sound and music.
With recent efforts in the field of human-computer interaction (Dix, Finlay, & Abowd, 2004), the sensing and evaluation of the cognitive and emotional state of a user during interaction with a technological system has become more important. The automatic recognition of a user’s affective state is still a major challenge in the emerging field of affective computing (Picard, 1997). Since affective processes in players have a major impact on their playing experiences, recent studies have emerged that apply principles of affective computing to gaming (Gilleade, Dix, & Allanson, 2005; Hudlicka, 2008). The field of affective gaming is concerned with processing of sensory information from players (Gilleade & Dix, 2004), adapting game content (Dekker & Champion, 2007) – for example, artificial behaviour of non-player character game agents to player emotional states – and using emotional input as a game mechanic (Kuikkaniemi & Kosunen, 2007). However, not much work has been put into sensing the emotional cues of game sounds in games (Grimshaw, Lindley, & Nacke, 2008), let alone in understanding the impact of game sound on players’ affective responses. We start by discussing general theories of emotion and affect and their relevance to games and psychophysiological research (for a more general introduction to emotion, see Cunningham, Grout, & Picking’s (2011) chapter on Emotion, Content & Context). For instance, we suggest it is emotion that drives attention and this has an important effect upon both engagement with the game and immersion (in those games that strive to provide immersive environments).
Immersion is an important and current topic in games literature. Rather than attempt to define it (that is attempted elsewhere in this book), we limit ourselves to a brief overview of immersion theories and their relationship to theories of emotion, flow,
and presence before discussing empirical studies and theoretical stratagems for measuring player immersion as aided by game sound. Once we can understand under what sonic conditions immersion arises, we can then design more precisely for immersion.
THEORIES OF EMOTION
Psychophysiological research, affective neuroscience as well as affective and emotive computing are supporting the assumption that a user’s (or in our case a player’s) affective state can be measured by sensing brain and body responses to experienced stimuli (Nacke, 2009). Emotions in this sense can be seen as psychophysiological processes, which are evoked by sensation, perception, or interpretation of an event and/or object which is referred to in psychology as a stimulus. A stimulus usually entails physiological changes, cognitive processing, subjective feeling, or general changes in behaviour. This is of general interest, since playing games includes all sorts of virtual events taking place in virtual environments containing virtual objects. Emotions blur the boundaries between physiological and mental states, being associated with feelings, behaviours and thoughts. No definitive taxonomy has been worked out for emotions, but several ways of classifying emotions have been used in the past. One of the first and most prominent theories of emotion is the James-Lange theory, which states that emotion follows from experiencing physiological alterations: The change of an outside stimulus (either event or object) causes the physiological change which then generates the emotional experience (James, 1884; Lange, 1912). The Cannon-Bard theory offered an alternative explanation of the processing sequence of emotions, stating that, after an emotion occurs, it evokes a certain behaviour based on the processing of the emotion (Cannon, 1927). Thus, the perception of a certain emotion is likely to influence the psychophysiological reaction. This theory already tries to account for a combination of cognitive and physiological factors when experiencing emotions, in which case an emotion is not purely physiological (i.e. it is separate from mental processing). Another important emotional concept is the two-factor theory of emotions which is based on empirical observations (Schachter & Singer, 1962) and considers emotions to arise from the interaction of two factors: cognitive labeling and physiological arousal (Schachter, 1964). In this theory, cognition is used as a framework within which individual feelings can be processed and labeled, giving the state of physiological arousal positive or negative values according to the situation and past experiences. These theories have spawned modern emotion research in neurology and psychophysiology (Damasio, 1994; Lang, 1995; LeDoux, 1998; Panksepp, 2004) which is gathering evidence for a strong connection between affective and cognitive processing as underlying factors of emotion in line with the definition of Norman (2004) which we initially provided.
From Emotions to Experience
Modern emotion research typically uses one of two taxonomies which try to account for emotions as either consisting of a combination of a few fundamental emotions or as comprising different dimensions usually demarcated by extreme characteristics on the ends of the dimensional scales:
1. Emotions comprise a set of basic emotions. In the vein of Darwin (1899), who observed fundamental characteristic expressive movements, gestures, and sounds, researchers like Ekman (1992) and Plutchik (2001) argue for a set of basic discrete emotions, such as fear, anger, joy, sadness, acceptance, disgust, expectation, and surprise. Each basic emotion can be correlated to an individual physiological and behavioural reaction, for example a facial expression, as Ekman (1992; Ekman & Friesen, 1978) found after studying hundreds of pictures of human faces with emotional expressions.
2. Emotions can be classified by means of a dimensional model. Dimensional models have a long history in psychology (Schlosberg, 1952; Wundt, 1896) and are especially popular in psychophysiological research. Wundt (1896) was one of the first to classify “simple feelings” into a three-dimensional model, which consisted of the three fundamental axes of pleasure-displeasure (Lust-Unlust), arousal-composure (Erregung-Beruhigung), and tension-resolution (Spannung-Lösung). A more modern approach and currently the most popular dimensional model was suggested by Russell (1980). His circumplex model (see Figure 1) assumes the possible classification of emotional responses in a circular order on a plane spanned by two axes, emotional valence and arousal. The mapping of emotions to the two dimensions of valence and arousal has been used in numerous studies (Lang, 1995; Posner, Russell, & Peterson, 2005; Watson & Tellegen, 1985; Watson, Wiese, Vaidya, & Tellegen, 1999) including studies of digital games (Mandryk & Atkins, 2007; Nacke & Lindley, 2008; Ravaja, et al., 2008).
Figure 1. The two-dimensional circumplex emotional model based on Russell (1980)
The current popularity of dimensional models of emotion in psychophysiology can be explained by the fact that Wundt (1896) was one of the first researchers to correlate physiological signals, such as respiration, blood-pressure, and pupil dilation with his “simple feelings” dimensions. Bradley and Lang (2007) note that discrete and dimensional models of emotion need not be mutually exclusive but, rather, these views of emotion could be seen as complementary to each other. For example, basic emotions can be classified within affective dimensions. Finding physiological and behavioural emotion patterns as responses to specific situations and stimuli is one of the major challenges that psychophysiological emotion research faces currently. However, new evidence from neurophysiological functional Magnetic Resonance Imaging (fMRI) studies supports the affective circumplex model of emotion (Posner et al., 2009), showing neural networks in the brain that can be connected to the affective dimensions of valence and arousal: in this case, affective pictures were used as stimuli. The measurement of emotions induced by sound stimuli in a game context is, however, more complex. To identify how a certain sound, or a game element in general, is perceived, a subjective investigation is necessary, usually done after the experimental session. Gathering subjective responses in addition to psychophysiological measurements of player affect allows cross-correlation and validation of certain emotional stimuli that may be present in a gaming situation. This ‘after-the-fact’ narration is not, however, without its self-evident problems. A further major challenge remains the distinction between auditory and visual stimuli within games, as many games evoke highly immersive, audio-visual experiences, which can also be influenced by setting, past experiences, and social context.
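The valence-arousal plane of the circumplex model discussed above can be made concrete with a small sketch. The following Python fragment is only an illustration under our own assumptions: the [-1, 1] scaling of both axes, the class name AffectSample, the function name circumplex_quadrant, and the four quadrant labels are hypothetical and are not taken from Russell (1980) or from any of the studies cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffectSample:
    """One point on the valence-arousal plane (scaling to [-1, 1] is an assumption)."""
    valence: float  # -1 = most unpleasant, +1 = most pleasant
    arousal: float  # -1 = most calm, +1 = most activated


def circumplex_quadrant(sample: AffectSample) -> str:
    """Return a coarse, illustrative label for the quadrant the sample falls in."""
    if not (-1.0 <= sample.valence <= 1.0 and -1.0 <= sample.arousal <= 1.0):
        raise ValueError("valence and arousal are assumed to lie in [-1, 1]")
    pleasant = sample.valence >= 0
    activated = sample.arousal >= 0
    if pleasant and activated:
        return "pleasant, high arousal"
    if not pleasant and activated:
        return "unpleasant, high arousal"
    if not pleasant:
        return "unpleasant, low arousal"
    return "pleasant, low arousal"


# Hypothetical reading taken while a player hears an alarming in-game sound.
print(circumplex_quadrant(AffectSample(valence=-0.6, arousal=0.8)))
```

A quadrant label discards most of the information in the two continuous values, so a sketch like this is better suited to summarising data than to analysing it.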
Thus, we suggest that for measurement of emotional responses to game sound, three broad strategies are available for a full, scientific comprehension of player experiences. This means that there are at least three ways of understanding the emotional player experience in games (each illustrated by a particular stratagem) but the third, being a combination of the previous two, is likely to be the most accurate:
1. As objective, context-dependent experience – Physiological measures (using sensor technology) of how a player’s body reacts to game stimuli can inform our understanding of these emotions.
2. As subjective, interpreted experience – Psychological measures of how players understand and interpret their own emotions can inform our understanding of these emotions.
3. As subjective-objective, interpreted and contextual experience – Inferences drawn from physiological reactions and psychological measures allow a more holistic understanding of experience.
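One simple way to relate the subjective and objective measures listed above is to correlate them across game events. The sketch below is a minimal illustration, assuming that each game event yields one self-reported arousal rating and one physiological reading; the Pearson coefficient is our illustrative choice rather than a method prescribed by the studies cited in this chapter, and the function name pearson_r and all data values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical data for five game sound events (not from any real experiment):
# post-session self-reported arousal (1-9 scale) and a physiological arousal index.
self_reported_arousal = [2, 4, 5, 7, 8]
physiological_arousal = [0.1, 0.3, 0.4, 0.6, 0.9]

print(round(pearson_r(self_reported_arousal, physiological_arousal), 3))
```

A high coefficient would only indicate that the two measures move together for these events; it would not show which sound, visual, or contextual stimulus caused the response.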
One of our primary research goals is to understand gaming experience, which has been connected to positive emotions (Clark, Lawrence, Astley-Jones, & Gray, 2009; Fernandez, 2008; Frohlich & Murphy, 1999; Hazlett, 2006; Mandryk & Atkins, 2007), but also to more complex experiential constructs like, for example, immersion (Calleja, 2007; Ermi & Mäyrä, 2005; Jennett, et al., 2008), flow (Cowley, Charles, Black, & Hickey, 2008; Csíkszentmihályi, 1990; Gackenbach, 2008; Sweetser & Wyeth, 2005) or presence (Lombard & Ditton, 1997; Slater, 2002; Zahorik & Jenison, 1998). Thus, we will provide an overview of the current understanding of immersion, flow and presence in games and then provide suggestions as to how this could be measured using objective and subjective approaches.
IMMERSION, FLOW, AND PRESENCE
In the fields of game science, media psychology, communication and computer science, many studies are concerned with uncovering experiences evoked by playing digital games. There is a lot of work directed towards investigating the potentials, definition, and limitations of immersion in digital games (Douglas & Hargadon, 2000; Ermi & Mäyrä, 2005; Jennett et al., 2008; Murray, 1995). A major challenge of studying immersion is defining what exactly is meant by the term “immersion” and how it relates to similar game experience phenomena such as flow (Csíkszentmihályi, 1990), cognitive absorption (Agarwal & Karahanna, 2000) and presence (Lombard & Ditton, 1997; Slater, 2002).
From Immersion to Flow and Presence
In a very comprehensive effort, Jennett et al. (2008; Slater, 2002) give an extensive conceptual overview of immersion. According to their definition, immersion is a gradual, time-based, progressive experience that includes the suppression of all surroundings (spatial, audio-visual, and temporal perception), together with attention and involvement mediating the feeling of being in a virtual world. This suggests immersion to be an experience related to cognitive processing and attention: the more immersive an experience is, the more attentionally demanding it is (see Reiter, 2011 for a discussion of attention and audio stimuli). One could hypothesize that emotional state drives attention (Öhman, Flykt, & Esteves, 2001) and therefore, the more affective an experience is, the more likely it is to grab individual attention and consequently to immerse the player. Thus, immersion would be elicited as the result of an action chain that starts with affect. This prompts an emotional response that influences attention and, as a consequence, leads to immersion. It remains to be shown whether, and how, affective
responses of players influence immersion and what measures of player affect are most suitable to evaluate immersion. Immersion is seen in some literature (Sweetser & Wyeth, 2005) – based on qualitative analysis – as an enabler of a fleeting experience of peak performance labeled flow (Csíkszentmihályi, 1990; Nakamura & Csíkszentmihályi, 2002). Flow is a little understood, but often-used experiential concept for describing one kind of game experience. Some examples from game studies and human-computer-interaction literature try to use flow for analyzing successful game design features of games (Cowley et al., 2008; Sweetser & Wyeth, 2005). However, originally, flow was conceived by Csíkszentmihályi (1975) on the basis of studies of intrinsically motivated behaviour of artists, chess players, musicians, and sports players. This group was found to be rewarded by executing actions per se, experiencing high enjoyment and fulfilment in the activity itself rather than, for example, being motivated by future achievement. Csíkszentmihályi describes flow as a peak experience, the “holistic sensation that people feel when they act with total involvement” (p. 36). Thus, complete mental absorption in an activity is fundamental to this concept, which ultimately makes flow an experience mainly found in situations with high cognitive loading accompanied by a feeling of pleasure. According to a more recent description from Nakamura and Csíkszentmihályi (2002), it should be noted that for entering flow, two conditions should be met: (1) a matching of challenges or action opportunities to an individual’s skill and (2) clear and close goals with immediate feedback about progress. 
Flow itself can be described through the following manifested qualities (which are admittedly too fuzzy for a clear evaluation using subjective or objective methods): (1) concentration focuses on present moment, (2) action and consciousness merge, (3) self-awareness is lost, (4) one is in full control over one’s actions, (5) temporal perception is distorted, and (6) doing the activity is rewarding
in itself (Nakamura & Csíkszentmihályi, 2002). Flow even shares some properties with immersion, such as a distorted temporal perception and lost or blurred awareness of self and surroundings. Jennett et al. (2008) argue that immersion can be seen as a precursor for flow experiences, thus allowing immersion and flow to overlap in certain game genres, while noting that immersion can also be experienced without flow: Immersion, in their definition, is the “prosaic experience of engaging with a videogame” (p. 643) rather than an attitude towards playing or a state of mind. One important question in the discussion about flow and immersion is whether flow is a state or a process. Defining flow as a static rather than a procedural experience would be in contrast to the process-based definitions of immersion such as the challenge-based immersion of Ermi and Mäyrä (2005). This kind of immersion oscillates around the success and failure of certain types of game interactions. Another important differentiation between flow and immersion is that immersion could be described as a “growing” feeling, an experience that unfolds over time and is dependent on perceptual readiness of players as well as the audio-visual sensory output capabilities of the gaming system. Past theoretical and taxonomical approaches have tried to define immersion as consisting of several phases or components. For example, Brown and Cairns (2004) describe three gradual phases of immersion: engagement, engrossment, and total immersion, where the definition of total immersion as an experience of total disconnection with the outside world overlaps with definitions of telepresence, where users feel mentally transported into a virtual world (Lombard & Ditton, 1997). The concept of presence is also discussed by Jennett et al. (2008) in relation to immersion, but defined as a state of mind rather than a gradually progressive experience like immersion. 
If we assume for a moment that immersion is an “umbrella” experience, immersion could incorporate notions of presence and flow at certain stages of its progress. It remains, however, unclear
through what phases immersion unfolds and what types of stimuli are likely to foster immersive experiences. In what situations is immersion likely to unfold and what situational elements make it progress? When does it reach its peak and how much immersion is too much? More research is needed to investigate such questions, as well as a possible link between high engagement and addiction, as studied by Seah and Cairns (2008) or the differences between high engagement and addiction as suggested in a study by Charlton and Danforth (2004).
The SCI Immersion Model
Ermi and Mäyrä (2005) subdivide immersive game experiences into sensory (as mentioned above), challenge-based and imaginative immersion (the SCI-model) based on qualitative surveys. The elements of this immersion model account for different facilitators of immersion, such as the experience of elements (in a gaming context) through which immersion is likely to take place. The three immersive game experiences Ermi and Mäyrä give implicitly provide different immersion models of static state and progressive experience. Sensory immersion can be enhanced by amplifying a game’s audio-visual components, for example, using a larger screen, a surround-sound speaker system, or greater audio volume. If immersion is actually facilitated in this way, immersion would be an affective experience, as evidence points to the fact that enhanced audio-visual presentation results in an enhanced affective gaming experience (Ivory & Kalyanaraman, 2007). By jamming the perceptive systems of players (as a result of mental workload associated with auditory and visual processing of game stimuli), sensory immersion is probably also a facilitator of guiding players’ attention (see Reiter, 2011). This strengthens the hypothetical link between attentional processing and immersive feeling found in related literature (Douglas & Hargadon, 2000) but, while the link remains, the cognitive direction is the reverse of
those discussed earlier. Imaginative immersion describes absorption in the narrative of a game or identification with a character which is understood to be synonymous with feelings of empathy and atmosphere. However, atmosphere might be an agglomeration of imaginative immersion and sensory immersion (since certain sounds and graphics might facilitate a compelling atmospheric player experience): the use of this term raises the need for a clearer definition of the concept of atmosphere and this is not provided by Ermi and Mäyrä (2005). If ‘imaginative’ refers mainly to cognitive processes of association, creativity, and memory recall, it is likely to be facilitated by player affect. However, individual differences are huge when it comes to pleasant imagination (this is probably a matter of personal preference), which would make it very difficult to accurately assess this kind of immersion using empirical methodology. The last SCI dimension, viz. challenge-based immersion, conforms closely with one feature of Csíkszentmihályi’s (1990) description of flow. This is the only type of immersion in this model that suggests it might be a progressive experience because challenge level is never simply static but is something that oscillates around the success and failure of certain types of interaction over time. If we assume now that immersion is linked to either successful or failed interactions in a game that are likely to strengthen or weaken the subjective feeling of immersion, we can try to establish the following relationship between game interactions and immersion. Given a number of successful interactions σ, a number of failed interactions φ, and incremental playing time τ, then two descriptions of the magnitude of immersion ι could be considered:
(1) For σ, φ > 0: If σ > φ, then ι = σ/φ × τ.
(2) For σ, φ > 0: If σ ≤ φ, then ι = σ/(φ × τ).
These equations would suggest that the longer people play with a higher success than failure rate, the more immersed they would feel. If the failure rate is higher than the success rate, the feeling of immersion for players will decrease over time. Many sonic interactions in games are implicitly challenge-based because they require interpretation (or are understood from previous experience), but an example of explicitly challenge-based sonic interaction in games is given by Grimshaw (2008a) in his description of the navigational mode of listening (p. 32). It remains to be tested whether such an equation could account for immersion itself or whether this would only measure one aspect of the immersive experience. Ideally, such a ratio would be extended and combined with psychophysiological variables that measure a player’s affective response over time.
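The two cases proposed above can be made concrete with a minimal sketch. The function name and argument names are hypothetical, and the requirement that playing time be strictly positive is an assumption added here so that the second case is always defined; the chapter states only that σ and φ must be greater than zero.

```python
def immersion_magnitude(successes: int, failures: int, playing_time: float) -> float:
    """Illustrative magnitude of immersion from the two cases in the text.

    successes    -- number of successful interactions (sigma), must be > 0
    failures     -- number of failed interactions (phi), must be > 0
    playing_time -- incremental playing time (tau); assumed > 0 here
    """
    if successes <= 0 or failures <= 0:
        raise ValueError("both interaction counts must be greater than zero")
    if playing_time <= 0:
        raise ValueError("playing time must be greater than zero (assumption)")

    ratio = successes / failures
    if successes > failures:
        # Case (1): more successes than failures, immersion grows with time.
        return ratio * playing_time
    # Case (2): failures match or exceed successes, immersion shrinks with time.
    return ratio / playing_time
```

With six successes, three failures, and two units of playing time, the first case applies and the sketch returns 4.0; swapping the counts selects the second case and returns 0.25. The sketch also makes visible that the two cases do not join smoothly where σ approaches φ, which is one reason the relationship would need empirical testing before use.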
Implications for Player-Game Interaction and Affective Sound
In the context of sound and immersion in computer games, other work investigates the role of sound in facilitating player immersion in the gameworld. A strong link between “visual, kinaesthetic, and auditory modalities” is hypothetically assumed to be key to immersion (Laurel, 1991, p. 161). The degree of realism provided by sound cues is also a primary facilitator of immersion, with realistic audio samples being drivers of immersion (Jørgensen, 2006), as is the use of spatial sound (Murphy & Pitt, 2001). Some authors, however, as noted by Grimshaw (2008b), argue for an effect of immersion through perceptual realism of sound (as opposed to a mimetic realism), where verisimilitude, based on codes of realism, proves as effective as, if not more effective than, emulation and authenticity of sound (see Farnell, 2011). Self-produced, autopoietic sounds of players, and the immersive impact that sounds have on the relationship between players and the virtual environment a game is played in, have been framed in discussions on acoustic ecologies in first-person shooter (FPS) games, which provide a range of conceptual tools for analyzing immersive functions of game sound (Grimshaw, 2008a; Grimshaw & Schott, 2008). In an argument for physical immersion of players through spatial qualities of game sound (Grimshaw, 2007), we find the concept of sensory immersion recurring (Ermi & Mäyrä, 2005). The perception of game sound in this context not only loads players’ mental and attentional capacities but also affects the player’s unconscious emotional state. The phenomenon of physical sonic immersion is not new: it has been observed before in movie theatre audiences, and the concept has been transferred to sound design in FPS simulations and games (Shilling, Zyda, & Wardynski, 2002). In some cases, the sensory intensity levels of game sound may be such that affect really is a gut feeling, as alluded to earlier in this chapter. Immersion through computer game sound may even be strong enough to enable a similar affective experience when playing with audio only, as investigations in this direction suggest (Röber & Masuch, 2005).
PSYCHOPHYSIOLOGICAL MEASUREMENT OF EMOTIONS
As we have discussed before, a rather modern approach is the two-dimensional model of emotional valence and arousal suggested by Russell (1980, see Figure 1). Ekman’s (1992) insight that basic emotions are reflected in facial expressions was fundamental for subsequent studies investigating physiological responses of facial muscles using a method called electromyography (EMG), which measures subtle reactions of muscles in the human body (Cacioppo, Berntson, Larsen, Poehlmann, & Ito, 2004). For example, corrugator muscle activity (in charge of lowering the eyebrow) was found to increase when a person is in a bad mood (Larsen, Norris, & Cacioppo, 2003). In contrast, zygomaticus muscle activity (on the cheek) increases during positive moods. High orbicularis oculi muscle activity (responsible for closing the eyelid) is associated with mildly positive emotions (Cacioppo, Tassinary, & Berntson, 2007). An advantage of physiological assessment is that it can assess covert activity of facial muscles with great sensitivity to subtle reactions (Ravaja, 2004). Measuring emotions in the circumplex model of emotional valence and arousal is now possible during interactive events, such as playing games, by covertly recording the physiological activity of brow, cheek, and eyelid muscles (Mandryk, 2008; Nacke & Lindley, 2008; Ravaja, et al., 2008). For the correct assessment of arousal, additional measurement of a person’s electrodermal activity (EDA) is necessary (Lykken & Venables, 1971). EDA is measured either from palmar sites (thenar/hypothenar eminences of the hand) or from plantar sites (e.g. above abductor hallucis muscle and midway between the proximal phalanx of the big toe) (Boucsein, 1992). The conductance of the skin is directly related to the production of sweat in the eccrine sweat glands, which is entirely controlled by the sympathetic nervous system. Increased sweat gland activity raises the electrical conductance of the skin. Using EMG measurements of facial muscles that reliably measure basic emotions and EDA measurements that indicate a person’s arousal, we can correlate emotional states of users to specific game events or even complete game sessions (Nacke, Lindley, & Stellmach, 2008; Ravaja, et al., 2008). Below, we refer to several experiments analyzing cumulative measurements of EMG and EDA to assess the overall affective experience of players in diverse game sound scenarios.
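As a rough illustration of how such recordings might be reduced to the two dimensions of the circumplex model, the following sketch derives a crude valence index from brow and cheek EMG and a crude arousal index from skin conductance. It is not the procedure used in the cited studies: the function name, the baseline dictionary, the subtraction of brow change from cheek change, and the log-difference arousal proxy are all assumptions made for illustration, and eyelid EMG is omitted for brevity.

```python
import math
from statistics import mean


def valence_arousal_indices(corrugator, zygomaticus, eda, baseline):
    """Crude valence and arousal indices for one recording window.

    corrugator, zygomaticus -- EMG amplitudes for brow and cheek (hypothetical units: microvolts)
    eda                     -- skin conductance samples (microsiemens, must be > 0)
    baseline                -- resting means under the keys 'corrugator', 'zygomaticus', 'eda'
    """
    brow_change = mean(corrugator) - baseline["corrugator"]
    cheek_change = mean(zygomaticus) - baseline["zygomaticus"]

    # Cheek activity rises with positive mood and brow activity with negative
    # mood, so their difference serves as an illustrative valence index.
    valence = cheek_change - brow_change

    # Illustrative arousal proxy: change in log-transformed skin conductance
    # relative to rest (positive when conductance is above baseline).
    arousal = math.log(mean(eda)) - math.log(baseline["eda"])
    return valence, arousal
```

A window in which cheek activity rises above its resting level while brow activity stays flat yields a positive valence index, and skin conductance at twice the resting level yields an arousal index of log 2. Real analyses would also need artefact removal, normalisation across participants, and synchronisation with game events.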
Pointers from Psychophysiological Experiments
A set of preliminary experiments (Grimshaw et al., 2008; Nacke, 2009; Nacke, Grimshaw, & Lindley, 2010) investigated the impact of the sonic user experience and psychophysiological effects of
game sound (i.e., diegetic sound FX) and music in an FPS game. They measured EMG and EDA responses together with subjective questionnaire responses for 36 undergraduate students in a 2 × 2 repeated-measures factorial design, using sound (on and off) and music (on and off) as predictor variables, with a counter-balanced order of sound and music presentation in an FPS game level. Among many results, two are particularly interesting: (1) higher co-active EMG brow and eyelid activity when music was present than when it was absent (regardless of other sounds) and (2) a strong effect of sound on gameplay experience dimensions (IJsselsteijn, Poels, & de Kort, 2008). In the case of the latter result, higher subjective ratings of immersion, flow, positive affect, and challenge, together with lower negative affect and tension ratings, were found when sound was present than when it was absent (regardless of music). The psychophysiological results of this study put the usefulness of (tonic) psychophysiological measures to the test, since the literature points to expressions of antipathy when the facial muscles under investigation are activated at the same time (Bradley, Codispoti, Cuthbert, & Lang, 2001). The caveat here is that the most common stimuli used in psychophysiological research are pictures (Lang, Greenwald, Bradley, & Hamm, 1993). Using music, especially in a highly immersive environment such as a first-person perspective digital game, may lead to a number of emotions being elicited simultaneously, and these might lie outside the dimensional space used in Russell’s (1980) model. This view holds that a person’s emotional experience is a cognitive interpretation of this automatic physiological response (Russell, 2003).
But the bipolarity of the valence-arousal dimensions has been criticized before, as the model is too rigid to allow for simultaneous (i.e., positive and negative) emotion measurements (Tellegen, Watson, & Clark, 1999). Using sound and music in a digital game is, however, a very ambiguous and complex use of stimuli, and prior research has
suggested that the emotional responses to such complex stimuli can be simultaneously positive and negative (Larsen, McGraw, & Cacioppo, 2001; Larsen, McGraw, Mellers, & Cacioppo, 2004). Tellegen et al. (1999) proposed a structural hierarchical model of emotion which might be better suited to this context, providing for both independent positive emotional activation (PA) and negative emotional activation (NA) organized in a three-level hierarchy. The top level is formed by a general bipolar Happiness-Unhappiness dimension, followed by the PA and NA dimensions, with discrete emotions forming the base. With this model, we could argue that the findings of Nacke et al. (2010) show an independent positive and negative emotional activation during the music conditions. This would, however, also indicate that the physiological activity is not a direct result of the sound and music conditions but, arguably, of a combination of stimuli present during these conditions. In addition, greater electrodermal activity was found for female players when both sound and music were off, while the responses for male players were almost identical (see Figure 2). The authors assumed music to have a calming effect on female players, resulting in less arousal during gameplay. For females, music was also connected with pleasant emotions, as higher eyelid EMG activity indicated. Overall, the psychophysiological results from that study pointed toward a positive emotional effect of the presence of both sound and music (see also Nacke, 2009). It is interesting in this context that music does not seem to be experienced significantly differently on a subjective level, whereas sound was clearly indicated as having an influence on game experience. Higher subjective ratings of immersion, flow, positive affect, and challenge, together with lower negative affect and tension ratings when sound was present, paint a positive picture of sound for a good game experience (particularly so when music is absent).
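The 2 × 2 repeated-measures design with counter-balanced presentation order described for these experiments can be sketched in code. The chapter does not state which counter-balancing scheme was used, so the balanced Latin square below is an assumption chosen for illustration, and the function names and condition labels are hypothetical.

```python
from itertools import product


def balanced_latin_square(n):
    """Rows of a balanced Latin square for an even number of conditions n."""
    rows = []
    for r in range(n):
        row = []
        for c in range(n):
            # Standard construction: 0, 1, n-1, 2, n-2, ... shifted by the row index.
            offset = (c + 1) // 2 if c % 2 else -(c // 2)
            row.append((r + offset) % n)
        rows.append(row)
    return rows


def assign_orders(n_participants):
    """Give each participant one presentation order of the four conditions."""
    # The four cells of the 2 x 2 design: sound (on/off) crossed with music (on/off).
    conditions = list(product(("sound on", "sound off"), ("music on", "music off")))
    square = balanced_latin_square(len(conditions))
    return [
        [conditions[i] for i in square[p % len(square)]]
        for p in range(n_participants)
    ]
```

Calling `assign_orders(36)` cycles through the four rows of the square, so each presentation order is used by nine participants, every condition appears once in every serial position, and each condition immediately follows every other condition equally often across the four orders.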
The results discussed above run the gamut from the expected (sound contributes
Figure 2. Results of electrodermal activity (EDA averages in log [µS]) from the Nacke et al. (2010) study, split up between gender, sound, and music conditions in the experiment (see also Nacke, 2009)
positively to the experience of playing games) to the interesting and meriting further investigation (for example, gender differences in sound affect in the context of FPS games). Being the results of preliminary experiments, they typically provoke more questions than they answer and such results should, for the time being, be viewed in the light of several limiting factors. For example, the experiments provided audio-visual stimuli (not solely audio) and the sub-genre of game used – the FPS game – proposes a hunter-and-the-hunted scenario which, perhaps, might account for the gender affect differences noted. Another limitation that needs to be considered in psychophysiological research is the effect of familiarity with a particular game genre and a psychological mindset. Thus, a personality test and demographic questions regarding playing habits and behaviour
will help circumvent possible priming effects of familiarity or non-familiarity with games in the experimental analysis. In our experiments, personality assessments and demographic questionnaires were handed out prior to each study so that priming elements could be factored out later in the statistical analysis. Finally, it is difficult to correlate objective measurements taken during gameplay with subjective, post-experiential responses, and it may well be that such psychophysiological measurements are not the best method for assessing the role of sound in digital games.
CONCLUSION
In this chapter, we have given an overview of the emotional components of gameplay experience
with a special focus on the influence of sound and music. We have discussed the results of experiments that have made use of both subjective and objective assessments of game sound and music. After these pilot studies, and our discussion of emotional theories and experiential constructs, we have to conclude that the detailed exploration of game sound and music at this stage of our knowledge is still difficult to conduct because there are few comparable research results available and there is not yet a perfect measurement methodology. The multi-method combination of subjective and objective quantitative measures is a good starting point from which to create and refine more specific methodologies for examining the impact of sound and music in games.
Important Questions and Future Challenges
The important questions regarding game design that aims to facilitate flow, fun, or immersive experiences are: should tasks be provided by the game (i.e., created by the designer), should they be encouraged by the game environment, or should finding the task be part of the gameplay? The latter is rather unlikely, since finding only one task at a time sequentially might frustrate players and choosing a pleasant task according to individual mood, emotional, or cognitive disposition will probably provide more fun. Thus, instead of saying players need to face tasks that can be completed, it might be better design advice to provide several game tasks at the same time and design for an environment that encourages playful interaction. An environment that facilitates flow, fun, or immersion would provide opportunities for the player to alternate between playing for its own sake (i.e., setting up their own tasks) and finding closure by completing a given task. Some of the future challenges here will include finding good experimental designs that clearly distinguish audio stimuli, while still being embedded in a gaming context, in order that the
measurements and results obtained remain valid and thus more readily informative for the design suggestions above. We also see a lot of potential in cross-correlation of subjective and objective measures in terms of attentional activation, such as the exploration of brain wave (that is, EEG) data to find out more about the cognitive underpinnings of gameplay experience, thereby potentially separating experiential constructs from an affective emotional attribution and aligning them to an attentive cognitive attribution. Experiments might be designed to answer the question: does attention guide immersion or vice versa? Others might investigate sound and affect in game genres other than FPS.
Potential of These New Technologies for Sound Design
Why go to all this experimental trouble? After all, most digital games seem to function well enough with current sound design paradigms. The answer lies in two technologies, both of which have great potential for the future of sound design. The first is procedural audio and, as other chapters here deal with the subject in great depth (Farnell, 2011; Mullan, 2011), we limit ourselves to highlighting the importance of the ability to stipulate affective-emotional parameters for the real-time synthesis of sound. It is generally accepted that a sudden, loud sound in a particular context (perhaps there is a preceding silence and darkness wrapped up in a horror genre context) is especially arousing. However, what is less understood is the role, for example, of timbre in affect and emotion and, in the context of digital games and virtual environments, immersion. Would it be effective to design an affective real-time sound synthesis sub-engine as part of the game engine where the controllable parameters are not amplitude and frequency but high-level factors such as fear, happiness, arousal, or relaxation? Perhaps these parameters could be governed by the player in the game set-up menu who might opt, for instance, for a more or less
emotionally intense experience through the use of a simple fader. This brings us to the second technology. Although rudimentary and imprecise, consumer biofeedback equipment for digital devices (including computers and gaming consoles) is beginning to appear.1 Pass the output of these devices (which are variations of the EMG and ECG/EKG technologies used in the experiments previously described) to the controllable parameters of the game sound engine proposed above and procedural audio becomes a highly responsive, affective, and emotive technology.2 Furthermore, a feedback loop is established in which both play and sound emotionally respond to each other. In effect, the game itself takes on an emotional character that reacts to the player’s affect state and emotions and that elicits affect responses and emotions in turn – perhaps the game’s character might be empathetic or antagonistic to the player. This is the future of game sound design and the reason for pursuing the line of enquiry described in this chapter.
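To make the proposal more tangible, the sketch below shows one way that high-level affective controls might be translated into low-level synthesis parameters. Everything in it is hypothetical: the names, the value ranges, and in particular the mapping of greater arousal to a louder and brighter sound, which is a placeholder given that the role of timbre in affect is, as noted above, not yet well understood. All inputs are assumed to be normalised to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class SynthParams:
    gain: float       # linear amplitude, 0.2 to 1.0 in this sketch
    cutoff_hz: float  # low-pass filter cutoff, 500 to 8000 Hz in this sketch


def clamp(value, low=0.0, high=1.0):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def affective_to_synth(target_arousal, measured_arousal, intensity_fader):
    """Map affective controls to synthesis parameters (illustrative only).

    target_arousal   -- arousal the game would like to evoke (0..1)
    measured_arousal -- arousal estimated from a biofeedback device (0..1)
    intensity_fader  -- player's chosen emotional intensity from a set-up menu (0..1)
    """
    target = clamp(target_arousal)
    # Push harder when the player is calmer than intended, ease off otherwise.
    error = target - clamp(measured_arousal)
    drive = clamp(target + error) * clamp(intensity_fader)
    return SynthParams(gain=0.2 + 0.8 * drive, cutoff_hz=500.0 + 7500.0 * drive)
```

Called once per update with a fresh biofeedback reading, such a function closes the loop sketched in the text: the sound responds to the player’s measured state, and the player’s state in turn responds to the sound. A fader setting of zero keeps the output at its quietest and dullest setting regardless of the other inputs.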
REFERENCES
Agarwal, R., & Karahanna, E. (2000). Time flies when you’re having fun: Cognitive absorption and beliefs about information technology usage. Management Information Systems Quarterly, 24(4), 665–694. doi:10.2307/3250951
Audiosurf. [Video game], (2008). Dylan Fitterer (Developer), Bellevue, WA: Valve Corporation (Steam).
Bateman, C. (2009). Beyond game design: Nine steps towards creating better videogames. Boston: Charles River Media.
Bentley, T., Johnston, L., & von Baggo, K. (2005). Evaluation using cued-recall debrief to elicit information about a user’s affective experiences. In T. Bentley, L. Johnston, & K. von Baggo (Eds.), Proceedings of the 17th Australian conference on Computer-Human Interaction (pp. 1-10). New York: ACM.
Biedermann, I., & Vessel, E. A. (2006). Perceptual pleasure and the brain. American Scientist, 94(May-June), 247–253.
Boucsein, W. (1992). Electrodermal activity. New York: Plenum Press.
Bradley, M. M., Codispoti, M., Cuthbert, B. N., & Lang, P. J. (2001). Emotion and motivation I: Defensive and appetitive reactions in picture processing. Emotion (Washington, D.C.), 1(3), 276–298. doi:10.1037/1528-3542.1.3.276
Bradley, M. M., & Lang, P. J. (2000). Affective reactions to acoustic stimuli. Psychophysiology, 37, 204–215. doi:10.1017/S0048577200990012
Bradley, M. M., & Lang, P. J. (2007). Emotion and motivation. In Cacioppo, J. T., Tassinary, L. G., & Berntson, G. G. (Eds.), Handbook of psychophysiology (3rd ed., pp. 581–607). New York: Cambridge University Press. doi:10.1017/CBO9780511546396.025
Brown, E., & Cairns, P. (2004). A grounded investigation of game immersion. In Dykstra-Erickson, E., & Tscheligi, M. (Eds.), CHI ’04 extended abstracts (pp. 1297–1300). New York: ACM.
Bushman, B. J., & Anderson, C. A. (2002). Violent video games and hostile expectations: A test of the General Aggression Model. Personality and Social Psychology Bulletin, 28(12), 1679–1686. doi:10.1177/014616702237649
Bateman, C., & Boon, R. (2006). 21st century game design. Boston: Charles River Media.
Cacioppo, J. T., Berntson, G. G., Larsen, J. T., Poehlmann, K. M., & Ito, T. A. (2004). The psychophysiology of emotion. In Lewis, M., & Haviland-Jones, J. M. (Eds.), Handbook of emotions (2nd ed., pp. 173–191). New York: Guilford Press.
Cacioppo, J. T., Tassinary, L. G., & Berntson, G. G. (2007). Handbook of psychophysiology (3rd ed.). Cambridge, UK: Cambridge University Press. doi:10.1017/CBO9780511546396
Caillois, R. (2001). Man, play and games. Chicago: University of Illinois Press.
Calleja, G. (2007). Digital games as designed experience: Reframing the concept of immersion. Unpublished doctoral dissertation. Victoria University of Wellington, New Zealand.
Cannon, W. B. (1927). The James-Lange theory of emotions: A critical examination and an alternative theory. The American Journal of Psychology, 39(1/4), 106–124. doi:10.2307/1415404
Carnagey, N. L., Anderson, C. A., & Bushman, B. J. (2007). The effect of video game violence on physiological desensitization to real-life violence. Journal of Experimental Social Psychology, 43(3), 489–496. doi:10.1016/j.jesp.2006.05.003
Charlton, J. P., & Danforth, I. D. W. (2004). Differentiating computer-related addictions and high engagement. In Morgan, K., Brebbia, C. A., Sanchez, J., & Voiskounsky, A. (Eds.), Human perspectives in the internet society: culture, psychology and gender. Southampton: WIT Press.
Clark, L., Lawrence, A. J., Astley-Jones, F., & Gray, N. (2009). Gambling near-misses enhance motivation to gamble and recruit win-related brain circuitry. Neuron, 61(3), 481–490. doi:10.1016/j.neuron.2008.12.031
Collins, K., Tessler, H., Harrigan, K., Dixon, M. J., & Fugelsang, J. (2011). Sound in electronic gambling machines: A review of the literature and its relevance to game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Cowley, B., Charles, D., Black, M., & Hickey, R. (2008). Toward an understanding of flow in video games. Computers in Entertainment, 6(2), 1–27. doi:10.1145/1371216.1371223
Csíkszentmihályi, M. (1975). Beyond boredom and anxiety. San Francisco: Jossey-Bass Publishers.
Csíkszentmihályi, M. (1990). Flow: The psychology of optimal experience. New York: Harper Perennial.
Cunningham, S., Grout, V., & Picking, R. (2011). Emotion, content and context in sound and music. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Damasio, A. R. (1994). Descartes’ error. New York: G. P. Putnam.
Darwin, C. (1899). The expression of the emotions in man and animals. New York: D. Appleton and Company.
Dekker, A., & Champion, E. (2007). Please biofeed the zombies: Enhancing the gameplay and display of a horror game using biofeedback. In Proceedings of DiGRA: Situated Play Conference. Retrieved January 1, 2010, from http://www.digra.org/dl/db/07312.18055.pdf.
Dix, A., Finlay, J., & Abowd, G. D. (2004). Human-computer interaction. Harlow, UK: Pearson Education.
DJ hero. [Video game], (2009). FreeStyleGames (Developer), Santa Monica, CA: Activision.
Donkey Konga. [Video game], (2004). Namco (Developer), Kyoto: Nintendo.
Dorval, M., & Pepin, M. (1986). Effect of playing a video game on a measure of spatial visualization. Perceptual and Motor Skills, 62, 159–162.
Douglas, Y., & Hargadon, A. (2000). The pleasure principle: Immersion, engagement, flow. In Proceedings of the eleventh ACM on Hypertext and Hypermedia (pp. 153-160). New York: ACM.
Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3/4), 169–200. doi:10.1080/02699939208411068
Ekman, P., & Friesen, W. V. (1978). Facial action coding system: A technique for the measurement of facial movement. Palo Alto, CA: Consulting Psychologists Press.
Electroplankton. [Video game], (2006). Indies Zero (Developer), Kyoto: Nintendo.
Elite beat agents. [Video game], (2006). iNiS (Developer), Kyoto: Nintendo.
Ermi, L., & Mäyrä, F. (2005). Fundamental components of the gameplay experience: Analysing immersion. In Proceedings of DiGRA 2005 Conference Changing Views: Worlds in Play. Retrieved January 1, 2010, from http://www.digra.org/dl/db/06276.41516.pdf.
Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Ferguson, C. J. (2007). Evidence for publication bias in video game violence effects literature: A meta-analytic review. Aggression and Violent Behavior, 12(4), 470–482. doi:10.1016/j.avb.2007.01.001
Fernandez, A. (2008). Fun experience with digital games: A model proposition. In Leino, O., Wirman, H., & Fernandez, A. (Eds.), Extending experiences: Structure, analysis and design of computer game player experience (pp. 181–190). Rovaniemi, Finland: Lapland University Press.
Frequency. (2001). Sony Computer Entertainment (PlayStation 2).
Frohlich, D., & Murphy, R. (1999, December 20). Getting physical: what is fun computing in tangible form? Paper presented at the Computers and Fun 2 Workshop, York, UK.
Gackenbach, J. (2008). The relationship between perceptions of video game flow and structure. Loading..., 1(3).
Gilleade, K. M., & Dix, A. (2004). Using frustration in the design of adaptive videogames. In Proceedings of ACE 2004 (pp. 228–232). New York: ACM.
Gilleade, K. M., Dix, A., & Allanson, J. (2005). Affective videogames and modes of affective gaming: Assist me, challenge me, emote me. In Proceedings of DiGRA 2005 Conference: Changing Views: Worlds in Play. Retrieved January 1, 2010, from http://www.digra.org/dl/db/06278.55257.pdf.
Gitaroo man. [Video game], (2001). Koei/iNiS (Developer) (PlayStation 2).
Grimshaw, M. (2007). The resonating spaces of first-person shooter games. In Proceedings of The 5th International Conference on Game Design and Technology. Retrieved January 1, 2010, from http://digitalcommons.bolton.ac.uk/gcct_conferencepr/4/.
Grimshaw, M. (2008a). The acoustic ecology of the first-person shooter: The player, sound and immersion in the first-person shooter computer game. Saarbrücken: VDM Verlag Dr. Mueller.
Grimshaw, M. (2008b). Sound and immersion in the first-person shooter. International Journal of Intelligent Games & Simulation, 5(1), 2–8.
Grimshaw, M., Lindley, C. A., & Nacke, L. (2008). Sound and immersion in the first-person shooter: Mixed measurement of the player’s sonic experience. In Proceedings of Audio Mostly 2008 - A Conference on Interaction with Sound. Retrieved January 1, 2010, from http://digitalcommons.bolton.ac.uk/gcct_conferencepr/7/.
Grimshaw, M., & Schott, G. (2008). A conceptual framework for the analysis of first-person shooter audio and its potential use for game engines. International Journal of Computer Games Technology, 2008.
Guitar hero 5. [Video game], (2009). RedOctane (Developer), Santa Monica, CA: Activision.
Guitar hero II. [Video game], (2006). RedOctane (Developer), Santa Monica, CA: Activision.
Guitar hero III. [Video game], (2007). RedOctane (Developer), Santa Monica, CA: Activision.
Guitar hero: On tour. [Video game], (2008). RedOctane (Developer), Santa Monica, CA: Activision. (Nintendo DS).
Guitar hero. [Video game], (2005). RedOctane (Developer), New York: MTV Games.
Guitar hero world tour. [Video game], (2008). RedOctane (Developer), Santa Monica, CA: Activision.
Hazlett, R. L. (2006). Measuring emotional valence during interactive experiences: Boys at video game play. In Proceedings of CHI’06 (pp. 1023–1026). New York: ACM.
Hudlicka, E. (2008). Affective computing for game design. In Proceedings of the 4th International North American Conference on Intelligent Games and Simulation (GAMEON-NA). Montreal, Canada.
IJsselsteijn, W., Poels, K., & de Kort, Y. A. W. (2008). The Game Experience Questionnaire: Development of a self-report measure to assess player experiences of digital games. FUGA Deliverable D3.3. Eindhoven, The Netherlands: TU Eindhoven.
Ivory, J. D., & Kalyanaraman, S. (2007). The effects of technological advancement and violent content in video games on players’ feelings of presence, involvement, physiological arousal, and aggression. The Journal of Communication, 57(3), 532–555. doi:10.1111/j.1460-2466.2007.00356.x
James, W. (1884). What is an emotion? Mind, 9(34), 188–205. doi:10.1093/mind/os-IX.34.188
Jennett, C., Cox, A. L., Cairns, P., Dhoparee, S., Epps, A., & Tijs, T. (2008). Measuring and defining the experience of immersion in games. International Journal of Human-Computer Studies, 66, 641–661. doi:10.1016/j.ijhcs.2008.04.004
Jørgensen, K. (2006). On the functional aspects of computer game audio. In Audio Mostly: A Conference on Sound in Games.
Juul, J. (2005). Half-real: Video games between real rules and fictional worlds. Cambridge, MA: MIT Press.
Kuikkaniemi, K., & Kosunen, I. (2007). Progressive system architecture for building emotionally adaptive games. In BRAINPLAY ’07: Playing with Your Brain Workshop at ACE (Advances in Computer Entertainment) 2007.
Lang, P. J. (1995). The emotion probe. Studies of motivation and attention. The American Psychologist, 50, 372–385. doi:10.1037/0003-066X.50.5.372
Lang, P. J., Greenwald, M. K., Bradley, M. M., & Hamm, A. O. (1993). Looking at pictures: Affective, facial, visceral, and behavioral reactions. Psychophysiology, 30, 261–273. doi:10.1111/j.1469-8986.1993.tb03352.x
Mandryk, R. L. (2008). Physiological measures for game evaluation . In Isbister, K., & Schaffer, N. (Eds.), Game usability: Advice from the experts for advancing the player experience (pp. 207–235). Burlington, MA: Elsevier.
Lange, C. G. (1912). The mechanism of the emotions. In Rand, B. (Ed.), The classical psychologists (pp. 672–684). Boston: Houghton Mifflin.
Mandryk, R. L., & Atkins, M. S. (2007). A fuzzy physiological approach for continuously modeling emotion during interaction with play environments. International Journal of Human-Computer Studies, 65(4), 329–347. doi:10.1016/j.ijhcs.2006.11.011
Larsen, J. T., McGraw, A. P., & Cacioppo, J. T. (2001). Can people feel happy and sad at the same time? Journal of Personality and Social Psychology, 81(4), 684–696. doi:10.1037/0022-3514.81.4.684
Larsen, J. T., McGraw, A. P., Mellers, B. A., & Cacioppo, J. T. (2004). The agony of victory and thrill of defeat: Mixed emotional reactions to disappointing wins and relieving losses. Psychological Science, 15(5), 325–330. doi:10.1111/j.0956-7976.2004.00677.x
Larsen, J. T., Norris, C. J., & Cacioppo, J. T. (2003). Effects of positive and negative affect on electromyographic activity over zygomaticus major and corrugator supercilii. Psychophysiology, 40, 776–785. doi:10.1111/1469-8986.00078
Laurel, B. (1991). Computers as theatre. Boston, MA: Addison-Wesley.
LeDoux, J. (1998). The emotional brain. London: Orion Publishing Group.
Lego rock band. [Video game], (2009). Harmonix (Developer), New York: MTV Games.
Lombard, M., & Ditton, T. (1997). At the heart of it all: The concept of presence. Journal of Computer-Mediated Communication, 3(2).
Lykken, D. T., & Venables, P. H. (1971). Direct measurement of skin conductance: A proposal for standardization. Psychophysiology, 8(5), 656–672. doi:10.1111/j.1469-8986.1971.tb00501.x
Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological connectivity of the mesolimbic system. NeuroImage, 28(1), 175–184. doi:10.1016/j.neuroimage.2005.05.053
Miller, D. J., & Robertson, D. P. (2009). Using a games console in the primary classroom: Effects of ‘Brain Training’ programme on computation and self-esteem. British Journal of Educational Technology, 41(2), 242–255. doi:10.1111/j.1467-8535.2008.00918.x
Moffat, D. (1980). Personality parameters and programs. In Trappl, R., & Petta, P. (Eds.), Creating personalities for synthetic actors (pp. 120–165). Berlin: Springer.
Mullan, E. (2011). Physical modelling for sound synthesis. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Murphy, D., & Pitt, I. (2001). Spatial sound enhancing virtual storytelling. In Proceedings of the International Conference ICVS, Virtual Storytelling Using Virtual Reality Technologies for Storytelling (pp. 20-29). Berlin: Springer.
Murray, J. H. (1995). Hamlet on the holodeck: The future of narrative in cyberspace. New York: Free Press.
Nacke, L., Lindley, C., & Stellmach, S. (2008). Log who’s playing: Psychophysiological game analysis made easy through event logging. In P. Markopoulos, B. Ruyter, W. IJsselsteijn, & D. Rowland (Eds.), Proceedings of Fun and Games, Second International Conference (pp. 150-157). Berlin: Springer.
Nacke, L., & Lindley, C. A. (2008). Flow and immersion in first-person shooters: Measuring the player’s gameplay experience. In Proceedings of the 2008 Conference on Future Play: Research, Play, Share (pp. 81-88). New York: ACM.
Phase. [Video game], (2007). Harmonix Music Systems.
Picard, R. W. (1997). Affective computing. Cambridge, MA: MIT Press.
Pillay, H. K. (2002). An investigation of cognitive processes engaged in by recreational computer game players: Implications for skills of the future. Journal of Research on Technology in Education, 34(3), 336–350.
Plutchik, R. (2001). The nature of emotions. American Scientist, 89(4), 344–350.
Nacke, L. E. (2009). Affective ludology: Scientific measurement of user experience in interactive entertainment. Unpublished doctoral dissertation. Blekinge Institute of Technology, Karlskrona, Sweden. Retrieved January 1, 2010, from http:// affectiveludology.acagamic.com.
Posner, J., Russell, J. A., Gerber, A., Gorman, D., Colibazzi, T., & Yu, S. (2009). The neurophysiological bases of emotion: An fMRI study of the affective circumplex using emotion-denoting words. Human Brain Mapping, 30(3), 883–895. doi:10.1002/hbm.20553
Nacke, L. E., Grimshaw, M. N., & Lindley, C. A. (2010). More than a feeling: Measurement of sonic user experience and psychophysiology in a firstperson shooter. Interacting with Computers, 22(5), 336–343. doi:10.1016/j.intcom.2010.04.005
Posner, J., Russell, J. A., & Peterson, B. S. (2005). The circumplex model of affect: An integrative approach to affective neuroscience, cognitive development, and psychopathology. Development and Psychopathology, 17, 715–734. doi:10.1017/ S0954579405050340
Nakamura, J., & Csíkszentmihályi, M. (2002). The concept of flow . In Snyder, C. R., & Lopez, S. J. (Eds.), Handbook of positive psychology (pp. 89–105). New York: Oxford University Press. Norman, D. A. (2004). Emotional design. New York: Basic Books. Öhman, A., Flykt, A., & Esteves, F. (2001). Emotion drives attention: Detecting the snake in the grass. Journal of Experimental Psychology. General, 130(3), 466–478. doi:10.1037/00963445.130.3.466 Panksepp, J. (2004). Affective neuroscience: the foundations of human and animal emotions. Oxford: Oxford University Press. PaRappa the rapper. [Video game], (1996). Sony Computer Entertainment.
282
Przybylski, A. K., Ryan, R. M., & Rigby, S. C. (2009). The motivating role of violence in video games. Personality and Social Psychology Bulletin, 35(2), 243–259. doi:10.1177/0146167208327216 Pulman, A. (2007). Investigating the potential of Nintendo DS Lite handheld gaming consoles and Dr. Kawashima’s Brain Training software as a study support tool in numeracy and mental arithmetic. JISC TechDis HEAT Scheme Round 1 Project Reports. Retrieved June 6, 2009, from http://www.techdis.ac.uk/index.php?p=2_1_7_9. Quilitch, H. R., & Risley, T. R. (1973). The effects of play materials on social play. Journal of Applied Behavior Analysis, 6(4), 573–578. doi:10.1901/ jaba.1973.6-573
Player-Game Interaction Through Affective Sound
Ravaja, N. (2004). Contributions of psychophysiology to media research: Review and recommendations. Media Psychology, 6(2), 193–235. doi:10.1207/s1532785xmep0602_4 Ravaja, N., Turpeinen, M., Saari, T., Puttonen, S., & Keltikangas-Järvinen, L. (2008). The psychophysiology of James Bond: Phasic emotional responses to violent video game events. Emotion (Washington, D.C.), 8(1), 114–120. doi:10.1037/1528-3542.8.1.114
Schachter, S. (1964). The interaction of cognitive and physiological determinants of emotional state . In Berkowitz, L. (Ed.), Advances in experimental social psychology (Vol. 1, pp. 49–80). New York: Academic Press. doi:10.1016/S00652601(08)60048-9 Schachter, S., & Singer, J. (1962). Cognitive, social, and physiological determinants of emotional state. Psychological Review, 69, 379–399. doi:10.1037/h0046234
Reiter, U. (2011). Perceived quality in game audio . In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Schlosberg, H. (1952). The description of facial expressions in terms of two dimensions. Journal of Experimental Psychology, 44(4), 229–237. doi:10.1037/h0055778
Rez. [Video game], (2001). Sega (Developer, Dreamcast), Sony Computer Entertainment Europe (Developer, PlayStation 2).
Seah, M., & Cairns, P. (2008). From immersion to addiction in videogames. In [New York: ACM.]. Proceedings of BCS HCI, 2008, 55–63.
Rhodes, L. A., David, D. C., & Combs, A. L. (1988). Absorption and enjoyment of music. Perceptual and Motor Skills, 66, 737–738.
Shilling, R., Zyda, M., & Wardynski, E. C. (2002). Introducing emotion into military simulation and videogame design: America’s Army: Operations and VIRTE. In Conference GameOn 2002. Retrieved January 1, 2010, from http://gamepipe. usc.edu/~zyda/pubs/ShillingGameon2002.pdf.
Röber, N., & Masuch, M. (2005). Leaving the screen: New perspectives in audio-only gaming. In Proceedings of 11th International Conference on Auditory Display (ICAD). Rock band. [Video game], (2007). New York: MTV Games. Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178. doi:10.1037/h0077714 Russell, J. A. (2003). Core affect and the psychological construction of emotion. Psychological Review, 110(1), 145–172. doi:10.1037/0033295X.110.1.145 Ryan, R., Rigby, C., & Przybylski, A. (2006). The motivational pull of video games: A self-determination theory approach. Motivation and Emotion, 30(4), 344–360. doi:10.1007/s11031-006-9051-8
SimTunes. [Video game], (1996). Maxis (Developer). SingStar. [Video game], (2004). Sony Computer Entertainment Europe (PlayStation 2 & 3). Slater, M. (2002). Presence and the sixth sense. Presence (Cambridge, Mass.), 11(4), 435–439. doi:10.1162/105474602760204327 Sweetser, P., & Wyeth, P. (2005). GameFlow: A model for evaluating player enjoyment in games. [CIE]. Computers in Entertainment, 3(3), 3. doi:10.1145/1077246.1077253 Tellegen, A., Watson, D., & Clark, A. L. (1999). On the dimensional and hierarchical structure of affect. Psychological Science, 10(4), 297–303. doi:10.1111/1467-9280.00157
283
Player-Game Interaction Through Affective Sound
Traxxpad. [Video game], (2007). Eidos Interactive (PlayStation Portable). von Ahn, L., & Dabbish, L. (2008). Designing games with a purpose. Communications of the ACM, 51(8), 58–67. doi:10.1145/1378704.1378719 Watson, D., & Tellegen, A. (1985). Toward a consensual structure of mood. Psychological Bulletin, 98(2), 219–235. doi:10.1037/0033-2909.98.2.219 Watson, D., Wiese, D., Vaidya, J., & Tellegen, A. (1999). The Two General Activation Systems of Affect: Structural findings, evolutionary considerations, and psychobiological evidence. Journal of Personality and Social Psychology, 76(5), 820–838. doi:10.1037/0022-3514.76.5.820 WiiMusic. [Video game], (2008). Kyoto: Nintendo. Wundt, W. (1896). Grundriss der Psychologie. Leipzig, Germany: Alfred Kröner Verlag. Zahorik, P., & Jenison, R. L. (1998). Presence as being-in-the-world. Presence (Cambridge, Mass.), 7(1), 78–89. doi:10.1162/105474698565541
KEY TERMS AND DEFINITIONS
Affective Gaming: The research area exploring game designs and mechanics that evoke player emotions and affects.
Affective Sound: One auditory stimulus or multiple auditory stimuli (here in a gaming context) that evoke affect and emotion.
Audio Entertainment: An activity that involves the manipulation or reception of one sonic entity or multiple sonic entities that permits the users to amuse themselves.
Empirical Methods (Quantitative): The collection of quantitative data on which a theory can be based or which facilitates reaching a scientific conclusion.
Human-Centered Design: Also known as user-centered design (UCD), a design philosophy that values the needs, wants, and limitations of users during each iterative step of the design process.
Human-Computer Interaction (HCI): The research area studying how people interact with computational machines.
Interaction Design: The creation and study of hardware devices and/or software that users can interact with.
Psychophysiology: A branch of psychology concerned with the way psychological activities produce physiological responses.
User Experience (UX): The field of study concerned with the experience people have as a result of their interactions with products, technology and/or services.
User Studies: Experimental studies involving human participants to evaluate the impact of software or hardware on users.
ENDNOTES
1 For example, the NIA: http://www.ocztechnology.com/products/ocz_peripherals/nia-neural_impulse_actuator.
2 In terms of the creation of static sound objects, the future might see the sound designer being able to think sounds rather than having to design sounds.
Section 4
Technology
Chapter 14
Spatial Sound for Computer Games and Virtual Reality
David Murphy
University College Cork, Ireland
Flaithrí Neff
Limerick Institute of Technology, Ireland
ABSTRACT
In this chapter, we discuss spatial sound within the context of Virtual Reality and other synthetic environments such as computer games. We review current audio technologies, sound constraints within immersive multi-modal spaces, and future trends. The review process takes into consideration the widely varying levels of audio sophistication in the gaming and VR industries, ranging from standard stereo output to Head Related Transfer Function implementation. The level of sophistication is determined mostly by hardware/system constraints (such as mobile devices or network limitations); however, audio practitioners are developing novel and diverse methods to overcome many of these challenges. No matter what approach is employed, the primary objectives are very similar: the enhancement of the virtual scene and the enrichment of the user experience. We discuss how successful various audio technologies are in achieving these objectives, how they fall short, and how they are aligned to overcome these shortfalls in future implementations.
DOI: 10.4018/978-1-61692-828-5.ch014

INTRODUCTION
In the past, sound has often been a secondary consideration in visually intensive environments, such as Virtual Reality (VR) systems and computer games. However, hearing and several other perceptual modalities are now considered equally relevant to the user-experience within artificial and simulated domains. Linear soundscape composition, especially within computer games, has been facilitated by advancements in computer hardware and storage capacities. The sonic contribution of linear music to the virtual scene is extremely important, especially during gameplay, as it adds atmosphere, drama, emotion, and sometimes fantasy to the overall scene. However, interactive sounds and environmental acoustics are also important in enhancing the user-experience by immersing the user in the gameplay or VR scene. These types of sounds at present are still used mostly as effects rather than as authentic references to the virtual landscape. Accurate spatialization and real-time interactive sonic elements are essential if the user-experience is to be brought to the next level in future developments.

Many of the recently developed audio tools are based on well-established theory, but remain limited in their implementation of true spatial sound by hardware constraints. Techniques that have been successfully implemented to varying degrees include Interaural Time Difference (ITD), Interaural Intensity Difference (IID), the Doppler Shift, and Distance Attenuation. However, many more spatial attributes remain difficult to render in real-time, such as high fidelity simulation of ear geometry and head/shoulder shadow. As mentioned earlier, some of the basic principles and techniques are now readily available to developers, but the underlying theory in this field indicates that, for true spatial sound to be delivered to the listener, individualization¹ of the listening experience is key to its success. Despite the advances in hardware in recent times, all of the current spatialization techniques used within gaming and VR environments remain focused on a generalized listening experience and, as yet, no commercially viable method has been successfully implemented that achieves true individualized spatial sound. The generation of individualized Head Related Transfer Functions (HRTFs) for commercial dissemination is one of the remaining milestones to be affected by hardware limitations. Many in the industry argue that generic solutions are sufficient for achieving an accurate sense of immersion in virtual environments for most users. This argument may well hold true, except that it cannot truly be tested until we can compare it to individualized spatial sound on a commercial scale.
In addition to the limitations of implementing individualized spatial listening, there still persists the problem of rendering accurate room and outdoor acoustics. This is, again, down to the constraints of available hardware resources. Rendering what may be considered a simple scene in the visual domain could easily entail several very complex models of various acoustically dependent elements. For example, an accurate rendition of the listener closing a room door would need to model the room itself, the door's material and structure, and the change in acoustic space during the act of closing the door (from a coupled space to a singular space). In addition, other very important factors, such as the material of the floor, walls and ceiling, and reflective and absorbing objects within the space, also need to be modelled. All this, of course, in real-time!

In spite of the current limitations to implementing commercial solutions for individualized spatialization, the industry is employing very interesting and creative workarounds. It has not only tackled the distribution of sound in virtual space intuitively, but has also efficiently tackled problems relating to large audio data file sizes and bandwidth. Therefore, in this chapter, we not only review sound spatialization techniques but, in tandem, also discuss audio compression technology and how this theme goes hand-in-hand with spatialized sound for VR and computer games.
PERCEPTUAL PROCESSING OF SOUND
The cognitive mechanisms involved in the aural perception of space are highly evolved and complex, and can be categorized into two distinct groups: direct analyses of physical/sensory information, and higher cognitive influences (see Figure 1). Both groups play a crucial role in our everyday hearing processes. Even in cases of perceived silence, background noise stimulates auditory spatial awareness by communicating spatial information about the surrounding environment to the listener based on both acute sensory detection and environmental experience (Ashmed & Wall, 1999).

Figure 1. A simplified outline of the human auditory processing system. Higher cognitive processes influence what we hear and how we hear the external world

The physical interactions between moving sound-waves and static/moving objects in space are well understood (HRTF, ITD, ILD, Aural Occlusion, Doppler Shift and so on), but the higher cognitive mechanisms involved in audition have yet to be fully explored and explained. Many of these mechanisms involve abstract concepts, such as the influence of cultural and personal experience on our sonic perception of space. This aspect of spatial sound design is vast, and scientific methodology has yet to be fully developed for many of the issues. An understanding of the metamorphosis from raw auditory sensation to a spatial awareness that has meaning is perhaps the final chapter in creating true spatial sonic interfaces for VR and computer games. However, from a commercial standpoint, these issues are for future consideration, given that convincing solutions for the "simpler" elements of spatial sound (that is, the physics of sound-wave propagation in the external environment) remain to be solved within reasonable cost and performance (Ashmed & Wall, 1999). Music is perhaps the closest, and one of the oldest, methods of interacting with higher cognitive processes. Figure 2 illustrates a forecast for the evolution of spatial sound implementation in VR and game technology.

Figure 2. Evolution of spatial sound implementation in VR and game technology

In essence, raw auditory sensation of sound involves the physical transportation of both the attributes of the sonic event/source and the properties of an acoustic space (whether virtual or real-world) to the listener. Not only does the human auditory system detect and analyze the acoustic attributes of the sound source itself and localize it within a complex soundscape, but it also determines the acoustic make-up of the space based on the difference between the direct sound and the physical effects the space has imposed on the sound as it travels through. A non-exhaustive list of these alterations includes absorption, reflection, refraction, dispersion, and occlusion. But raw auditory sensation doesn't end there. Before it enters the middle ear, the sound is filtered by the listener's unique ear and head shape/size. These processes of filtering, along with internal higher cognitive processes, combine to infuse the processed sound with a spatial signature unique to the listener.
Human Auditory System: Sound Localization Mechanisms
The basic (physiologically based) understanding of sound localization in humans comprises two distinct categories: the first deals with the horizontal plane (left, right, front, back) and the other with the vertical plane (above and below the head position). With regard to sound localization on the horizontal plane, a number of factors come into play. One of the most obvious mechanisms is ITD, where a sound reaches one ear before it reaches the other (the speed of sound remains constant; however, apparent differences in arrival time result from phase differences between the two signals). This mechanism is most useful for frequencies between 20 Hz and 2 kHz. In ITD, sound coming directly from the right will reach the right ear 0.6 msec before reaching the contralateral left ear (Begault, 1993). The accuracy of the human auditory system in locating sounds based on ITD is very impressive, with studies (Begault, 1993) showing discrimination of an angle as small as 1° or 2° (which translates to a time difference of about 11 µsec!), depending on the position. Refer to Figure 3.

Figure 3. a) Sound-wave arriving from the right. b) Sound-wave information reaches the right ear 11 µsec before reaching the left. This corresponds to a sound-source detection precision of as little as 2°

However, what happens when sounds are continuous and we do not have the onset information available in sudden, brief sounds? In such cases the human auditory system uses a slight variant of ITD: the phase discrepancy between the right and left ears is analyzed and the location of the sound source is determined from it. In other words, the peak of a wave-cycle reaches the right ear before it reaches the left. This method, however, introduces another problem. For high frequencies (in humans this would be around 2 kHz to 20 kHz), the techniques involving ITD and its variant are not applicable. This is due to the nature of high frequencies, where the period of each cycle is very short, meaning that many cycles occur within the distance between the two ears. Therefore, when high continuous sounds are presented to the listener, phase discrepancy becomes unreliable. A very different analysis procedure is required to determine the location of a continuous high-frequency sound source. To this end, IID is employed.

IID is a cue used by the auditory system that describes the difference in intensity levels between the sound signals arriving at the two ears. In effect, this procedure takes into account the interaction between the external sound-wave and the listener's head and shoulders. As a solid body, the head and shoulders will reflect and absorb energy from a sound-wave as it travels past. In essence, this means that a sound traveling from the right will reach the right ear with a particular intensity level, but will reach the left ear with a lower intensity because of the head and shoulder interference.

The combination of ITD and IID (known as the Duplex Theory of sound localization) means that the human hearing system is very efficient at localizing sound on the horizontal plane. A very different approach is believed to take place where localization on the vertical plane is concerned. Comparing the inputs of a signal (time, phase or intensity) reaching both ears is not effective when localizing sound on the vertical plane. It is easy to comprehend why: a signal coming from above or below the listener is likely to reach both ears at approximately the same time, and the head and body shadow the input of both ears almost equally. An alternative, albeit technically more complicated, analysis is used. The process entails the filtering of the signal before it enters the auditory canal due to the geometric features of the pinnae. (Refer to Figure 4 for a detailed view of a pinna.) The folds of the pinnae reflect certain frequencies of an incoming signal and, as a sound source moves vertically, the combination of direct sound and reflected sound changes dynamically. See Figure 4a, showing the structure of a pinna, and Figure 4b, illustrating the combination of direct and reflected signal paths before entering the auditory canal.
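The order of magnitude of the ITD figures quoted in this section can be checked with a short calculation. The following is a minimal sketch, assuming a rigid spherical head and a distant source (Woodworth's classic approximation); the head radius, the speed of sound, and the function names are illustrative assumptions and are not taken from this chapter.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 °C (assumed value)
HEAD_RADIUS = 0.0875     # m, a commonly used average head radius (assumed value)

def itd_seconds(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Woodworth's spherical-head estimate of ITD for a distant source.

    azimuth_deg: 0 is straight ahead, 90 is directly to one side.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))

def phase_ambiguity_hz(ear_spacing=2 * HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Frequency whose wavelength equals the assumed spacing between the ears."""
    return c / ear_spacing

for angle in (1, 2, 90):
    print(f"{angle:>2} deg: ITD = {itd_seconds(angle) * 1e6:.1f} microseconds")
print(f"wavelength equals ear spacing near {phase_ambiguity_hz():.0f} Hz")
```

Under these assumptions the estimate is roughly 9 to 18 µsec for offsets of 1° to 2° and about 0.66 msec for a source directly to one side, which is the same order as the values cited above. The second function gives a frequency just under 2 kHz, close to the point where the text notes that phase comparison becomes unreliable.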
Figure 4. a) The human pinna structure, b) a sound impinging and interacting with the pinna
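The way a pinna reflection colours the incoming signal can be illustrated with a single-reflection comb filter. This is a deliberately simplified sketch, not a pinna model: it assumes one reflected path with a fixed gain, and the extra path lengths and the reflection gain are hypothetical values chosen only to show that the spectral notch moves as the path difference changes.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed value)

def notch_frequencies(extra_path_m, max_hz=20000.0, c=SPEED_OF_SOUND):
    """Frequencies where a single reflection cancels the direct sound.

    With delay tau = extra_path / c, cancellation occurs at odd multiples
    of 1 / (2 * tau).
    """
    tau = extra_path_m / c
    notches = []
    k = 0
    while (2 * k + 1) / (2 * tau) <= max_hz:
        notches.append((2 * k + 1) / (2 * tau))
        k += 1
    return notches

def comb_gain_db(freq_hz, extra_path_m, reflection_gain=0.5, c=SPEED_OF_SOUND):
    """Level of direct plus reflected sound relative to the direct sound alone."""
    tau = extra_path_m / c
    phase = 2 * math.pi * freq_hz * tau
    real = 1 + reflection_gain * math.cos(phase)
    imag = -reflection_gain * math.sin(phase)
    return 20 * math.log10(math.hypot(real, imag))

# Hypothetical extra path lengths (metres) standing in for different elevations.
for extra_path in (0.02, 0.03, 0.04):
    first = notch_frequencies(extra_path)[0]
    print(f"extra path {extra_path * 100:.0f} cm: first notch near {first:.0f} Hz, "
          f"gain there {comb_gain_db(first, extra_path):.1f} dB")
```

With a 2 cm path difference the first notch falls near 8.6 kHz, and it moves downward as the path difference grows. A moving notch of this kind is one plausible reading of the "dynamic change" described above, although real pinnae produce several reflections whose strength varies with frequency.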
SOUND PROPAGATION IN REAL SPACE
Synthetic environments in VR and computer games are very varied and sometimes very complex. Visual simulation of real-world scenes has come a long way in terms of player immersion and photo-realism. The more complex and detailed the visual representations become, the more elaborate and intricate the sonic attributes need to be in order to match user expectations. This can pose many problems for sound designers in terms of realistic acoustic simulation and sound source emission. Again, as with the theoretical understanding of psychoacoustics, environmental/room acoustics are well understood, but the problem of implementing the theory in VR and computer games lies mainly with resource allocation and hardware constraints.

The propagation of sound can vary dramatically from scene to scene or from level to level during an instance of gameplay. The player can be within a small room enclosure in one scene and change suddenly to a wide, open space in another. Some scenes take place in unusual environments, such as under water or in outer-space, where the acoustics are very different from the typical air-based medium. Not only do the shape, size, and context of the space influence acoustics, but so too do static and moving objects within that space, as well as the materials of large surfaces such as tiled walls.
Indoor Acoustics
In the most basic terms, sound in an empty room is both absorbed by and reflected off surfaces. The energy that is reradiated is dispersed around the room and the listener hears both direct and reflected sound as a result. Between each pair of opposing walls, a standing wave at a fundamental frequency, along with its associated multiples, resonates. The standing waves that are produced express the room's resonant characteristic: there are multiple resonant frequencies in any one room. The acoustic result for the listener is a reinforcement of those resonant frequencies (by way of emphasized energy) when they are present in the soundscape. (Refer to Figure 5.)

Figure 5. Standing waves between two parallel walls

Another basic feature of indoor acoustics is reflection, which is often applied broadly in VR and computer game environments using general delay and reverberation units. In Figure 6a, the most basic scenario is illustrated, in which a sound source emits next to a single-walled surface in free space. Sound is reflected off the surface and these reflections act as if they emanate from an exact copy of the original source, located on the opposite side of the wall and at the same distance from the wall as the original. Reflected sound is generally categorized according to the time it takes to reach the listener after the direct sound. Reflected sound reaching the listener within approximately 50-80 ms of the direct sound is often referred to as early sound and can be indistinguishable from direct sound. To the listener, early sounds increase the perceived presence of the direct sound and have a slight intensity drop compared to the original direct sound. Reflections arriving after early sounds are usually more numerous, have shorter time gaps between each instance, and contribute to the creation of reverberation (see Figure 6b).

Figure 6. a) A sound is reflected from a surface in free space. The reflected sound-waves act like a copy of the original, but at the opposite side and the same distance from the surface. Based on an image from Everest (2001). b) Reverberation, comprised of direct sound, early reflections, and reverberation

Figure 6b illustrates a simple, typical indoor reflected sound. However, imagine a listener positioned within a simple rectangular room. The sound source is now reflected off six surfaces (not to mention the listener). The combination of direct sound and reflected sound, together with the listener's position in the room, quickly becomes more complex to simulate. Couple this with the fact that the process of surface reflection is a function of the sound's frequency (wavelength), and a fully accurate simulation would need to consider any frequency within the human hearing range (circa 16 Hz to 20000 Hz, or wavelengths from 21.5 m to 0.017 m). To deal with this large frequency range, sound designers and audio developers conform to modes whereby frequencies are arbitrarily grouped in relation to geometric acoustic data. In addition to room modes, reflections from static or moving objects also need to be considered, as well as the absorption coefficients of the various materials. All of these factors make spatial sound simulation of indoor environments resource-intensive and a complex component in overall computer game and VR design.

At points where there is semi-occlusion, such as glass-free windows or a ¾ wall partition (see Figure 7), there is at least some transmission of sound through the barrier, some diffraction around the barrier, and some reflection around the barrier. These indirect paths to the listener on the other side, as well as filtering based on the type of material the barrier consists of, impact on the final sound heard by the player/listener.
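Two of the indoor effects described in this section, standing waves between parallel walls and the mirror-image behaviour of a wall reflection, reduce to short formulas. The sketch below assumes ideal rigid walls and a two-dimensional layout with the wall lying along x = 0; the room dimension, positions, and function names are hypothetical and serve only as an illustration.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s (assumed; consistent with the wavelength range quoted above)

def axial_mode_frequencies(wall_spacing_m, count=4, c=SPEED_OF_SOUND):
    """Standing-wave frequencies between two parallel walls: f_n = n * c / (2 * L)."""
    return [n * c / (2.0 * wall_spacing_m) for n in range(1, count + 1)]

def image_source(source_xy, wall_x):
    """Mirror a source across a wall lying along the line x = wall_x."""
    x, y = source_xy
    return (2.0 * wall_x - x, y)

def reflection_delay_ms(source_xy, listener_xy, wall_x, c=SPEED_OF_SOUND):
    """Extra arrival time of the first-order wall reflection after the direct sound."""
    direct = math.dist(source_xy, listener_xy)
    reflected = math.dist(image_source(source_xy, wall_x), listener_xy)
    return (reflected - direct) / c * 1000.0

# Hypothetical layout: walls 5 m apart; source 1 m and listener 3 m from the wall at x = 0.
print([round(f, 1) for f in axial_mode_frequencies(5.0)])
print(image_source((1.0, 2.0), wall_x=0.0))
print(round(reflection_delay_ms((1.0, 2.0), (3.0, 2.0), wall_x=0.0), 2))
```

For this layout the modes fall at multiples of 34.4 Hz and the wall reflection trails the direct sound by a little under 6 ms, so it would count as early sound in the sense used above. A full room simulation would repeat the mirroring for all six surfaces and for higher-order reflections, which is where the cost discussed in the text comes from.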
Figure 7. Sound is transmitted to the listener on the other side of an occluding barrier through and around the barrier, as well as by reflection from surfaces such as a ceiling

Outdoor Acoustics
In an open environment, away from reflective buildings or nearby surfaces, the outdoor scene could be considered a free field in the most simplistic terms. In such circumstances simple rules, such as the inverse square law, can apply when dealing with sound pressure and intensity levels between sound source and listener. However, increasingly rich visual representations of outdoor environments within VR and computer games lead players to expect increasingly accurate acoustic simulation. Bearing this in mind, future implementations may need to consider factors such as atmospheric absorption, refraction, turbulence, diffraction, humidity, temperature, ground material, and the listener's proximity to the ground (Fletcher, 2004).

Sound attenuation outdoors is frequency dependent, with high frequencies losing much of their energy due to the elements described above. An example in nature is lightning and thunder. Lightning occurring close to the listener results in thunder that is quite rich in both high and low frequencies. However, if the listener were much further from the source, only the low frequencies would survive the distance.

Most gameplay occurs on or close to the virtual ground, where sound is reflected back to the listener at varying rates depending on the surface type. Other factors that can influence outdoor sound propagation at ground level include low-lying mist or fog. These conditions alter the sonic properties of the outdoor environment, with the effect of increasing the apparent loudness of distant sounds. This phenomenon is caused by a temperature inversion, where cold air is closer to the ground than warm air. This forces some sound-waves to bend back towards the ground at the point of temperature inversion (from cold to warm), instead of propagating upward and diminishing. See Figure 8a and 8b. However, an opposing factor may also come into play in such an environment, where some sound may be attenuated or muffled due to humidity levels in the low-lying fog.
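The free-field behaviour and the frequency-dependent loss described above can be combined in a simple distance model. This is a minimal sketch under stated assumptions: spherical spreading from a point source, plus an excess absorption term expressed in dB per kilometre. The absorption figures and the source level are hypothetical placeholders chosen only to show the trend, not measured values.

```python
import math

def spreading_loss_db(distance_m, reference_m=1.0):
    """Inverse square law: level falls about 6 dB per doubling of distance."""
    return 20.0 * math.log10(distance_m / reference_m)

def received_level_db(source_level_db, distance_m, absorption_db_per_km):
    """Level at the listener after spherical spreading and extra absorption."""
    absorption = absorption_db_per_km * distance_m / 1000.0
    return source_level_db - spreading_loss_db(distance_m) - absorption

# Hypothetical absorption figures (dB per km) chosen only to show the trend
# that high frequencies lose more energy over distance than low ones.
ABSORPTION_DB_PER_KM = {"low (100 Hz)": 0.5, "high (4 kHz)": 30.0}

for distance in (100.0, 3000.0):
    for band, alpha in ABSORPTION_DB_PER_KM.items():
        level = received_level_db(120.0, distance, alpha)
        print(f"{distance:>6.0f} m, {band}: {level:.1f} dB")
```

At 100 m the two bands arrive at similar levels, whereas at 3 km the high band has dropped far below the low band, mirroring the thunder example: nearby thunder is spectrally rich, distant thunder is mostly a low rumble.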
Furthermore, ground-level gameplay may also incorporate wind elements that affect sound-wave direction. Sound may be carried toward listeners if they are positioned downwind, or they may have difficulty hearing the sound if they are upwind (see Figure 8c). Slopes and valleys also have an effect, as do ground surfaces themselves. For example, grass surfaces do not tend to affect frequencies below 100 Hz, but can seriously attenuate higher frequencies, by up to 40 dB/km at 1 kHz (Fletcher, 2004). Similarly, trees and foliage scatter mid to high frequencies, whilst their effect on low frequencies is minimal. Similar effects on sound propagation would also occur for super-hero characters capable of flying well above ground level. These effects are predominantly due to temperature change (the air getting colder with increased height): as the character flies further upward, colder temperatures slow the speed of sound (the difference being as much as 10% between ground level and 10,000 meters) (Fletcher, 2004). Other
Figure 8. Typical dispersion of sound from a ground-based source showing the effects of a) warmer air close to the ground and b) colder air close to the ground; c) wind can also affect sound energy and dispersion. Based on an image from Everest (1997)
factors affecting the sonic environment of our flying super-hero may also need to be considered, such as turbulence, which itself can generate low frequency sound in addition to scattering high frequencies.
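The effect of altitude on the speed of sound can be estimated with a standard approximation for dry air. The sketch below is illustrative only: the 15 °C ground temperature and the linear cooling rate of 6.5 °C per kilometer are assumptions introduced here, not figures from the chapter.

```python
import math

def speed_of_sound(temp_celsius):
    """Approximate speed of sound in dry air (m/s) at a given temperature."""
    return 331.3 * math.sqrt(1.0 + temp_celsius / 273.15)

def temperature_at_altitude(ground_temp_celsius, altitude_m, lapse_rate_c_per_km=6.5):
    """Assumed linear cooling with height (a simplification for the lower atmosphere)."""
    return ground_temp_celsius - lapse_rate_c_per_km * altitude_m / 1000.0

ground = speed_of_sound(15.0)
aloft = speed_of_sound(temperature_at_altitude(15.0, 10000.0))
reduction_percent = 100.0 * (ground - aloft) / ground
print(round(ground, 1), round(aloft, 1), round(reduction_percent, 1))
```

Under these assumptions the reduction at 10,000 meters is on the order of 10%, which is broadly in line with the figure quoted above.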
VIRTUAL SPATIAL SOUND IMPLEMENTATION: INTRODUCTION
The level of fidelity in the implementation of sound localization varies considerably from system to system. Most of the time, constraints such as network bandwidth, processing speeds, storage restrictions, and memory considerations
limit the flexibility required by sound designers. Such restrictions have led scientists to find imaginative alternatives for rendering spatial audio in synthetic environments, using strategies such as compact file sizes, low bit rates, and client-based synthetic sound rendering, without impacting on perceived sound quality. In this section, we will review examples of some of the popular approaches to rendering spatial sound in games and VR. In general, left-right source positioning is relatively easy to achieve on both headphone-based and speaker-based implementations. However, front-back sound positioning is often less successful, especially when reproduced on headphones. The primary reason is that listeners rely on head movement to determine whether a sound is located in front of or behind them. A surround-speaker setup does not have the same level of difficulty because its discrete channels are physically placed around the listener. In more sophisticated headphone scenarios, some VR systems incorporate head-tracking to allow for the integration of head orientation and movement in the virtual scene. Typically, however, most headphone implementations for computer games and/or mobile devices use generic filtering processes. These are especially vulnerable to difficulties when attempting to externalize the sound in front-back scenarios. The term “externalize” in this context relates to the impression the listener has of the sound being some distance out from (either in front of, behind, above, or below) their own listening position. Front-back spatial sound through headphones usually results in sound sources being heard inside the head rather than being virtually projected outward, so the impression of depth is lost.
The result arising from this situation is front-back confusion, something that can negatively impact on a listener’s experience during gameplay or VR navigation, where characters or objects are
heard momentarily in a different location from their visual origin. In addition to front-back confusion, vertical sound localization remains a significant challenge in VR and computer games. Until a commercially viable method for obtaining individualized HRTF measurements and accurate real-time processing of head-tracking is achieved, these issues will continue to task developers, who will have to rely on simpler approaches. The current method for obtaining an individual’s HRTF is to measure the right and left Head-Related Impulse Responses (HRIRs), which can then be convolved with any mono signal source. Essentially, the HRTF is the Fourier Transform of the HRIR. The HRTF measurements are usually undertaken in an anechoic chamber with specialized equipment. Most HRTF implementations in computer games and VR systems are derived from generic HRTF databases developed from a specialized human head manikin or derived from an average set of HRTFs taken from a particular population. The pinnae and head dimensions of the manikin head devices are taken from statistical data of average human biometrics. Of course, the disadvantage of this approach is that the player’s ear and head shape may be very different from that of the manikin, which results not only in the lack of a true, individualized spatial experience, but also in the distortion of spatial listening cues. However, novel ways of acquiring individualized characteristics of the human hearing system are being explored, paving the way for exciting developments in spatial audio for VR and computer games (Otani & Ise, 2003; Satoshi & Suzuki, 2008).
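The convolution of a mono signal with a left and right HRIR, as described above, can be sketched in a few lines. This is a minimal direct-form illustration, not an optimized renderer; the HRIR values are toy numbers invented for the example and do not come from any measured database.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, sample in enumerate(signal):
        for k, coeff in enumerate(impulse_response):
            out[n + k] += sample * coeff
    return out

def binaural_render(mono, hrir_left, hrir_right):
    """Return (left, right) ear signals for one mono source."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Hypothetical HRIR pair for a source on the listener's left:
# the right-ear response is delayed by two samples and quieter.
hrir_left = [1.0, 0.3]
hrir_right = [0.0, 0.0, 0.6, 0.2]
mono = [1.0, 0.5, 0.25]

left, right = binaural_render(mono, hrir_left, hrir_right)
print(left)
print(right)
```

Real HRIRs are typically hundreds of samples long, which is why practical systems rely on fast (frequency-domain) convolution rather than the nested loop shown here.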
Ambisonics
Ambisonics is a spatial audio system that was developed in the 1970s by Michael Gerzon and is often touted as being superior in terms of spatial reproduction when compared to commercial domestic multichannel formats. It is a system that
includes audio capturing techniques, the representation or encoding of the signal as a soundfield (referred to as ‘B-format’), and the decoding of the signal during reproduction. Particular microphone patterns and positions, or a specially built microphone with four specifically arranged capsules (Soundfield Microphone), are required to capture the signal in a compatible way. The result is four signals, conventionally referred to as X, Z, Y, and W in first-order Ambisonics:
• X = Front minus Back
• Z = Up minus Down
• Y = Left minus Right
• W = A non-directional reference signal: Front + Up + Left + Back + Down + Right
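The four signals listed above can also be synthesized for a virtual mono source rather than captured with a microphone. The sketch below uses the traditional first-order B-format encoding equations, in which W is scaled by 1/√2; the function name and the angle convention (azimuth measured anticlockwise from straight ahead, so positive values are to the left) are assumptions made for this example.

```python
import math

def encode_b_format(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order B-format (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # non-directional reference
    x = sample * math.cos(az) * math.cos(el)  # front minus back
    y = sample * math.sin(az) * math.cos(el)  # left minus right
    z = sample * math.sin(el)                 # up minus down
    return w, x, y, z

print(encode_b_format(1.0, 0.0, 0.0))    # source straight ahead
print(encode_b_format(1.0, 90.0, 0.0))   # source directly to the left
```

Because direction is carried entirely by the relative levels of X, Y, and Z, a moving source only requires the angles to be updated per sample or per block; decoding to a particular speaker layout is a separate step.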
Although considered one of the most advanced and realistic spatial reproduction systems available, the Ambisonics format has suffered from commercial setbacks. These can be attributed to bad timing in entering the marketplace, misleading associations with ill-fated quadraphonic techniques, and the lack of uptake by key music industry players during its development. However, recently the computer game industry and virtual reality researchers have stimulated renewed interest in Ambisonics.
Real-Time Processing
The real-time simulation of physical environments in computer games and VR systems requires a significant degree of real-time processing. Because of the complexity of this task, a certain amount of latency can be expected, but limits must be placed on the extent of the latency so as not to degrade the user experience. With current hardware, latency issues remain a concern, especially when attempting to reproduce a real-world scenario that requires rapid and consistent refresh rates of visual, aural, haptic, and motional events. Even within a sound-only game, the continuous motion of a sound source around the listener requires a
significant amount of computation if its trajectory is to be smooth and uninterrupted. A balance needs to be struck and compromises are necessary. It is broadly accepted that update rates of 60 Hz and a total delay time of up to 50 ms are acceptable for acoustic virtual sound (Vorländer, 2008). In many instances, a perceptual measure referred to as the Just-Noticeable Difference (JND) is a useful instrument. This psychoacoustic evaluation procedure can be used for pitch differentiation or temporal differentiation, to name just two. In terms of spatial sound, it is useful to know the accuracy of the human hearing system in differentiating between the same sound source at different angles in space. With this information, calculations whose results the listener cannot perceive can be disregarded. JND values therefore allow redundant data to be reduced, freeing resources for other processes such as sound propagation. The performance of human listeners in point-to-point localization on the azimuth is most accurate in the frontal direction, at 1°. On the left/right axis, the performance diminishes significantly to 10°. The JND in the rear direction is 5°, and the JND on the vertical plane is the least accurate at 20° (Vorländer, 2008). Therefore, a computer game or VR system can effectively render spatial sound with diminished precision at certain locations around the listener. Real-time binaural synthesis (the generation of spatial sound for headphone reproduction) is also a very resource-demanding process, and interesting techniques are used to address the problem. One of the methods used is to preprocess binaural sound for reproduction when the listener reaches particular coordinates and orientation in the virtual room. A best-matched filter is applied as the listener’s position changes. The key to this approach is to make the changeover from filter to filter as inaudible as possible.
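One way of exploiting the JND values quoted above is to skip spatial updates whose angular change the listener could not detect. The following is a minimal sketch of that idea for the horizontal plane only. The division of the azimuth into frontal, lateral, and rear regions at ±45° and ±135° is an assumption made for illustration, as are the function names.

```python
def fold_azimuth(azimuth_deg):
    """Fold any azimuth onto 0..180 degrees away from straight ahead."""
    return abs((azimuth_deg + 180.0) % 360.0 - 180.0)

def azimuth_jnd_deg(azimuth_deg):
    """JND for the region containing azimuth_deg (region limits are assumed)."""
    folded = fold_azimuth(azimuth_deg)
    if folded <= 45.0:
        return 1.0    # frontal direction
    if folded >= 135.0:
        return 5.0    # rear direction
    return 10.0       # left/right axis

def angular_difference(a_deg, b_deg):
    """Smallest absolute angle between two azimuths."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)

def needs_update(rendered_azimuth_deg, new_azimuth_deg):
    """True if the source has moved by at least one JND since it was last rendered."""
    moved = angular_difference(rendered_azimuth_deg, new_azimuth_deg)
    return moved >= azimuth_jnd_deg(rendered_azimuth_deg)

print(needs_update(0.0, 2.0))     # frontal source, 2 degree move
print(needs_update(90.0, 95.0))   # lateral source, 5 degree move
```

A renderer built this way would recompute filters more often for sources in front of the listener than for sources at the sides, where localization is coarser.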
Fast convolution is required no matter which real-time binaural synthesis approach is taken. With many channels being processed in parallel, and with the continuous updating of listener position, convolution must be rapid and dynamic with no perceived artifacts during the many transitions. Multi-processing systems have aided in achieving more realistic rendering, as have techniques such as optimized fading between impulse response updates. Head-tracking technology also introduces an amount of latency into a real-time processing system. The technologies usually employed are optical, inertial, mechanical, ultrasonic, and electromagnetic. However, new developments in eye tracking and image recognition are being explored to reduce the amount of hardware encumbrance placed on the user. These techniques are also finding interesting applications in the computer game industry by taking advantage of the webcams built into most modern consumer computers.
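The fading between impulse response updates mentioned above can be illustrated with a simple linear crossfade. In this sketch, which is one possible approach rather than the optimized technique the text refers to, the same audio block is assumed to have been rendered twice, once with the outgoing filter and once with the incoming filter, and the two results are blended across the block.

```python
def crossfade(block_old, block_new):
    """Linearly fade from block_old to block_new over the length of one block."""
    if len(block_old) != len(block_new):
        raise ValueError("blocks must have the same length")
    n = len(block_old)
    if n == 1:
        return [block_new[0]]
    mixed = []
    for i, (old, new) in enumerate(zip(block_old, block_new)):
        t = i / (n - 1)  # 0.0 at the start of the block, 1.0 at the end
        mixed.append((1.0 - t) * old + t * new)
    return mixed

# Hypothetical block rendered with the old filter (all ones)
# and with the new filter (all zeros).
print(crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

The cost of this approach is that each transition block is convolved twice, which is one reason why the number and timing of filter changes matter in a real-time system.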
Developing Spatial Sound Environments
So far in this chapter we have explored some of the key concepts in spatial sound and how they might apply to computer games and VR environments. The next section examines a number of key implementations of spatial sound. Where possible the emphasis is upon standardized implementations, such as MPEG-4, Java 3D, and OpenSL-ES, which are very stable, unlikely to change in the near term, and have also informed the development of other implementations. There are also other implementations that are introduced by virtue of their prevalence within the industry.
Java 3D Sound API
Although the Java 3D API specification was originally intended for 3D graphics, it has proved to be a suitable vehicle for the rendering of three-dimensional sound. It makes sense from a developer’s point of view to keep all of the three-dimensional functionality within the same API set.
The implementation of spatial sound in the Java 3D specification employs a hierarchy of nodes that comprises:
• Sound node
• PointSound node
• ConeSound node
• BackgroundSound node
There are also two Java classes for defining the aural attributes of an environment. These are the Soundscape Node and the AuralAttributes Object. Each node is defined in a SceneGraph. The SceneGraph is a collection of nodes that constitute the three-dimensional environment. The application reads the nodes and their associated parameters from the SceneGraph and constructs the three-dimensional world with that information. The BackgroundSound node is not a spatial sound rendering node. Its purpose is to facilitate the use of ambient background sounds within the Java application. The audio input to this node is normally a mono or stereo audio file.
Spatial Sound in Java 3D
The Sound Node itself does not address the spatial rendering of the sound source: this is accomplished in one of two ways. Firstly, by explicitly constructing the spatial attributes of the sound using either the PointSound Node or the ConeSound Node, or secondly, by configuring the acoustical characteristics of an environment using the Soundscape Node. The first technique, constructing the spatial attributes, is dependent upon the type of sound source that is being used. If the sound source is a uniformly radiating sound (positional sound) then the PointSound node should be used, otherwise the developer should use the ConeSound node (directional sound). Distance attenuation, as implemented in the Java 3D specification, employs distance attenuation arrays, which modify the amplitude of positional and directional sound sources, and also applies angular attenuation modifications to the amplitude of directional sound sources. When a sound object is created it has to be assigned an initialGain value: if this field is empty then the value defaults to 1.0 (where 1.0 is the maximum gain and 0.0 is the equivalent of a gain value of -60 dB). In relation to the generic Sound Node, no distance attenuation is applied. This would seem to be a shortcoming in the specification, as distance attenuation is one of the strongest cues in establishing depth perception for sound and should be accessible from the generic object. If developers did not want the sound to have distance attenuation then they could simply leave the distance field blank.
The Soundscape node (refer to Figure 9) configures the acoustical properties of the listener’s environment. An unlimited number of Soundscape nodes can be contained within a scene. The defined Soundscape node region determines which sets of acoustical properties are to be used. As a result of being able to specify several Soundscape node regions, one can generate a number of aural environments within the scene. For instance, within the one scene there could be three rooms, each with a different acoustical signature. Alternatively, with more detailed scene description, one could set up a number of acoustical regions within a single room using a number of Soundscape nodes. The acoustical properties, that is, the reverberation and atmospheric attributes of the Soundscape node, are specified in the AuralAttributes Object. The AuralAttributes Object is a component object of the Soundscape Node. It specifies the following aural properties: reverberation, Doppler effect, distance frequency filtering, and atmospheric rolloff. Table 1 contains a list of the parameters and their default values for when an AuralAttributes Object is first constructed.
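The distance attenuation arrays described above pair distances with gain values. The sketch below shows how such an array might be evaluated for a positional source. It is not Java 3D code: the function name is hypothetical, and the behavior assumed here (linear interpolation between listed distances, with the first and last gains held outside the listed range) is an assumption made for illustration.

```python
def distance_gain(distance, attenuation_pairs, initial_gain=1.0):
    """Gain for a source at `distance`, from (distance, gain) pairs.

    Assumes linear interpolation between pairs and clamping outside them.
    An empty array means no distance attenuation is applied.
    """
    pairs = sorted(attenuation_pairs)
    if not pairs:
        return initial_gain
    if distance <= pairs[0][0]:
        return initial_gain * pairs[0][1]
    if distance >= pairs[-1][0]:
        return initial_gain * pairs[-1][1]
    for (d0, g0), (d1, g1) in zip(pairs, pairs[1:]):
        if d0 <= distance < d1:
            t = (distance - d0) / (d1 - d0)
            return initial_gain * (g0 + t * (g1 - g0))
    return initial_gain * pairs[-1][1]

# Hypothetical attenuation array: full gain up to 10 m, silent from 40 m.
pairs = [(10.0, 1.0), (20.0, 0.5), (40.0, 0.0)]
for d in (5.0, 15.0, 30.0, 50.0):
    print(d, distance_gain(d, pairs))
```

An array of this kind lets a sound designer shape the rolloff by hand rather than relying on a fixed physical law, which is useful when audibility matters more than realism.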
The AuralAttributes node describes reverberation with three components: delay time, reflection coefficient, and feedback loop. Delay time is used to calculate the amount of time
taken for the sound to reach the listener having undergone one reflection. This component is either set explicitly or implied by the bounding regions of the volume. Note that the bounding region is not necessarily the same as the region specified for the Soundscape Node. Delay time is measured in milliseconds. The reflection coefficient is used to determine the attenuation factor for the sound. The reflection coefficient(s) represent the reflective or absorptive properties of the environment. A value of 1.0 represents an un-attenuated sound and a value of 0.0 represents a sound that has been fully absorbed. These coefficients are applied as a uniform attenuation across the spectrum. This is not a very refined scheme, as most reflective/absorptive materials alter the spectrum of the sound in a non-uniform manner. Using the specification’s present implementation of reflection/absorption, there would be very little timbral difference between a sound that has been reflected by plaster and one that has been reflected by a metallic object. The final component, feedback loop, specifies the number of times a sound is reflected, that is, the order of reflection. If the feedback loop has a value of 0.0, no reverberation is performed; if it is set to 1.0, the listener hears an echo; and if it is set to -1.0, the reverberation continues until the amplitude of the signal dies away to -60 dB. This is known as effective zero in the Java 3D specification. Effective zero relies upon a -6 dB drop in gain for every doubling of distance (inverse square law). Other positive values specify the number of iterations of the loop. The rolloff parameter is used to alter the speed of sound in order to mimic the effects of atmospheric change. The default speed is 0.344 meters per millisecond, which is the speed of sound at room temperature. This value is then multiplied by the scale factor specified by the developer.
A value greater than 1.0 will increase the speed of sound and conversely a value less than 1.0 will decrease the speed of sound.
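The interaction between the reflection coefficient, the feedback loop, and effective zero can be made concrete with a small calculation. The sketch below is an interpretation of the description above, not code from the Java 3D API: it assumes that each pass through the loop multiplies the amplitude by the reflection coefficient, and all function names are hypothetical.

```python
import math

EFFECTIVE_ZERO_DB = -60.0
BASE_SPEED_OF_SOUND = 0.344  # meters per millisecond, as quoted in the text

def reflections_until_effective_zero(reflection_coeff):
    """Number of reflections before the amplitude falls to -60 dB or below."""
    if not 0.0 < reflection_coeff < 1.0:
        raise ValueError("coefficient must lie strictly between 0 and 1")
    db_per_reflection = 20.0 * math.log10(reflection_coeff)
    return math.ceil(EFFECTIVE_ZERO_DB / db_per_reflection)

def reflection_gains(reflection_coeff, reverb_order):
    """Amplitude of each successive reflection.

    reverb_order 0 gives no reverberation, a negative value loops until
    effective zero, and a positive value limits the number of iterations.
    """
    if reverb_order == 0:
        return []
    limit = reflections_until_effective_zero(reflection_coeff)
    count = limit if reverb_order < 0 else min(reverb_order, limit)
    return [reflection_coeff ** n for n in range(1, count + 1)]

def scaled_speed_of_sound(scale_factor):
    """Speed of sound after applying the developer's scale factor."""
    return BASE_SPEED_OF_SOUND * scale_factor

print(reflections_until_effective_zero(0.5))
print(reflection_gains(0.5, 3))
print(scaled_speed_of_sound(1.1))
```

With a coefficient of 0.5, each reflection loses about 6 dB, so the loop reaches effective zero after ten reflections.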
Figure 9. Sound node hierarchy in the Java 3D specification
The Doppler effect is achieved by taking the value of the speed of sound, multiplying it by the velocity scale factor (which is the change of speed relative to the listener’s position) and then proportionally applying the frequency scale values. If the velocity scale factor is 0.0, no Doppler effect is processed. For further information on the Java 3D API the reader is referred to Murphy (1999a), Murphy (1999b), Murphy and Rumsey (2001), and Murphy and Pitt (2001).

Table 1. AuralAttributes Object properties
Parameter               Default Value
attributeGain           1.0
rolloff                 1.0
reflectionCoeff         0.0
reverbDelay             0.0
reverbBounds            null
reverbOrder             0
distanceFilter          null
frequencyScaleFactor    1.0
velocityScaleFactor     1

XNA/XACT Implementation of Surround Sound
XNA/XACT is a collection of game development technologies produced by Microsoft for the various Windows platforms (including the Xbox console). In the current mass market, 5.1 channel sound is the standard deployment for multichannel sound output. This consists of left, center, right, left surround, and right surround channels (5) and Low-Frequency Enhancement (.1). 7.1 sound is also popular and 10.2 (see Figure 10) is the emerging next generation of surround speaker setup. Work is also being done on 22.2-surround sound. Games generally follow either a 2.0 (stereo) or 5.1-channel standard. 5.1-surround sound is effective to a point. However, it is essentially a coarse representation of the spatial sound field. Stepping between the speakers, as sound moves dynamically from frontal speakers to surrounds, can be quite audible. A natural sound (that emanates directly from an acoustic sound source and not via speakers) retains its timbral characteristics despite colouration from
room acoustics and HRTFs. As it dynamically moves from front to surround positions around the listener’s head, the sound is filtered in a number of ways. However, the listener can distinctly recognize that the sound is from the same source, as it retains its fundamental timbral characteristic. With exactly matched speakers, therefore, one would expect the same to hold true for 5.1-surround sound, since a sound passed between discrete speaker positions resembles a natural sound source traveling around the listener’s head. Unfortunately, this is not the case, as 5.1-surround is a system that represents too few steps in the sound field. Therefore, sound designers need to compensate for timbral instabilities by equalizing the signal as it reaches the surround speakers. This is a difficult task, as sound designers cannot predict the type of room in which the listener will place their gaming system.
MPEG-4 and Spatial Sound
MPEG (Moving Picture Experts Group) is a working group of an ISO/IEC subcommittee that generates multimedia standards. In particular, MPEG defines the syntax of low-bitrate video and audio bit streams, and the operation of codecs. MPEG has been working for a number of years on the design of a complete multimedia toolkit, which can generate platform-independent, dynamic, interactive media representations. This has become the MPEG-4 standard. In this standard, the various media are encoded separately, allowing for better compression, the inclusion of behavioral characteristics, and user-level interaction. Instead of creating a new Scene Description Language (SDL), the MPEG organization decided to incorporate Virtual Reality Modeling Language (VRML). VRML’s scene description capabilities are not very sophisticated, so MPEG extended the functionality of the existing VRML nodes and incorporated new nodes with advanced features. Support for advanced sound within the scene graph was one of the areas developed further
by MPEG. The Sound Node of MPEG-4 is quite similar to that of the VRML/Java 3D Sound Node. However, MPEG-4 contains a sound spatialization paradigm called Environmental Spatialisation of Audio (ESA). ESA can be divided into a Physical Model and a Perceptual Model.
Physical Model (see Table 2): This enables the rendering of source directivity, detailed room acoustics, and acoustic properties for geometrical objects (walls, furniture, and so on). Auralization, another term for realisation of the physical model, has been defined as: “creating a virtual auditory environment that models an existent or non-existent space” (Väänänen, 1998). Three Nodes have been devised to facilitate the physical modelling approach. These are AcousticScene, AcousticMaterial, and DirectiveSound. Briefly, DirectiveSound is a replacement for the simpler Sound Node. It defines a directional sound source whose attenuation can be described in terms of distance and air absorption. The direction of the source is not limited to a directional vector or a particular geometrical shape. The velocity of the sound can be controlled via the speedOfSound field: this can be used, for example, to create an instance of the Doppler effect. Attenuation over the distance field can now drop to -60 dB and can be frequency-dependent if the useAirabs field is set to TRUE. The spatialize field behaves the same as its counterpart in the Sound Node but with the addition that any reflections associated with this source are also spatially rendered. The roomEffect field controls the enabling of ESA and, if set to TRUE, the source is spatialized according to the environment’s acoustic parameters. AcousticScene is a node for generating the acoustic properties of an environment. It simply establishes the volume and size of the environment and assigns it a reverberation time. The auralization of the environment involves the processing of information from the AcousticScene and the
Figure 10. 10.2 enhanced surround sound. The ITU-R BS.775-1 standard for 5.1-surround is L, C, R, LS, RS and one Sub. 10.2-surround expands this by adding an extra Sub, left and right elevated speakers, back surround, wide left and wide right speakers. L = Left. C = Center. R = Right. LW = Left Wide. RW = Right Wide. LH = Left Height. RH = Right Height. L Sub = Subwoofer Left Side. R Sub = Subwoofer Right Side. LS = Left Surround. RS = Right Surround. BS = Back Surround
acoustic properties of surfaces as declared in AcousticMaterial.
Perceptual Model (see Table 3): Version 1 of the MPEG-4 standard only rendered spatial sound based upon physical attributes, that is, geometric properties. However, virtual and synthetic worlds are not constrained by physical laws and properties: it became necessary to introduce a perceptual equivalent of the physical model. To this end, two new nodes were added in version 2 of MPEG-4: PerceptualScene and PerceptualSound. Rault, Emerit, Warusfel, and Jot (1998) highlighted the merits of the perceptual approach in a document to the MPEG group:
A first advantage we see in this concept is that both the design and the control of MPEG4 Scenes is more intuitive compared to the physical approach, and manipulating these parameters does not require any particular skills in Acoustics. A second advantage is that one can easily attribute individual acoustical properties for each sound present in a given virtual scene.
The principal elements of the perceptual model are drawn from research undertaken by IRCAM’s Spatialisateur project, and additional features are derived from Creative Labs’ Environmental Audio Extensions (EAX) and Microsoft’s DirectSound
Table 2. MPEG-4 version 2 Advanced Audio Nodes: physical nodes
Node               Fields
AcousticScene      params, 3DVolumeCenter, 3DVolumeSize, reverbtime
AcousticMaterial   reffunc, transfunc, ambientIntensity, diffuseColor, emissiveColor, shininess, specularColor, transparency
DirectiveSound     direction, intensity, directivity, speedOfSound, distance, location, source, useAirabs, spatialize, roomEffect
API (Burgess, 1992). Using the perceptual model, each sound source’s spatial attributes can be manipulated individually, or an acoustic preset can be designed for the environment. Fields such as Presence, Brilliance, and Heavyness are used to configure the room’s or object’s acoustic characteristics. In all, there are nine fields used to describe, in non-technical terms, the spatial characteristics of a room or a sound object. These fields have been derived from psychoacoustic experiments carried out at IRCAM (the Spatialisateur project). Of the nine subjective fields, six describe perceptual attributes of the environment and three are perceived characteristics of the source. Table 4 lists the parameters for both Environment and Source.
It can also be seen from Table 4 that the last three fields of the Environment section and all of the Source fields are dependent upon the position, orientation, and directivity of the source. The validity of this approach could be questioned in terms of its subjectivity, for example, the choice of words such as Warmth and Brilliance. However, the purpose of using subjective terms as acoustic parameters in this context is to enable the non-specialist to compose a soundscape with convincing acoustic properties. This effectively opens up the complex world of acoustics to the non-specialist. For further information on MPEG spatial sound the reader is referred to Murphy (1999a), Murphy (1999b), Murphy and Rumsey (2001), and Murphy and Pitt (2001).
Table 3. MPEG-4 version 2 Advanced Audio Nodes: perceptual nodes
Node               Fields
PerceptualScene    AddChildren, RemoveChildren, BboxCenter, UseAirabs, UseAttenuation, RefDistance, Latereverberance, Heavyness, Liveness, RoomPresence, RunningReverberance, RoomEnvelopment, Presence, Warmth, Brilliance, Fmin, Fmax
PerceptualSound    direction, intensity, directivity, omniDirectivity, speedOfSound, distance, location, relParams, directFilter, inputFilter, useAirabs, useAttenuation, spatialize, roomEffect, source
Some Considerations
Head Tracking is an important tool in a dynamic virtual environment. Apart from the obvious advantages it brings to the visual presentation, it is also important in the spatial rendering of sound.
According to Burgess: “The lack of these [head-related] cues can make spatial sound difficult to use. We tend to move our heads to get a better sense of a sound’s direction. This ‘closed-loop’ cue can be added to a spatial sound system through the use of a head-tracking device” (Burgess, 1992).
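The value of this closed-loop cue can be illustrated with a deliberately simplified model. In the sketch below, the interaural cue is assumed to vary with the sine of the source azimuth relative to the head, which is a crude stand-in for real interaural time and level differences; the function names and the sign convention are hypothetical.

```python
import math

def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Source azimuth as seen from the rotated head, wrapped to -180..180."""
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

def interaural_cue(source_azimuth_deg, head_yaw_deg=0.0):
    """Simplified lateral cue: 0 on the median plane, 1 fully to one side."""
    return math.sin(math.radians(relative_azimuth(source_azimuth_deg, head_yaw_deg)))

front, back = 30.0, 150.0  # mirror-image positions in front of and behind the listener

# With the head still, both positions produce the same lateral cue.
print(round(interaural_cue(front), 3), round(interaural_cue(back), 3))

# After a 10 degree head turn toward the source side, the cues diverge.
print(round(interaural_cue(front, 10.0), 3), round(interaural_cue(back, 10.0), 3))
```

In this model the cue for the frontal source shrinks as the head turns toward it, while the cue for the rear source grows. A head tracker makes the direction of that change available to the renderer, and therefore to the listener.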
Table 4. Perceptual fields for MPEG-4 spatial audio
Environment Fields      Source Fields
LateReverberance        Presence
Heavyness               Warmth
Liveness                Brilliance
RoomPresence
RunningReverberance
RoomEnvelopment
Recent research has shown that the use of Head Tracking reduces front-back reversals by a ratio of 2:1 (Blauert, 1997) and there is evidence that it assists in the externalisation of sources that would otherwise be located ‘inside-the-head’. Head Tracking is also helpful in simulating and controlling the Doppler Effect and in resolving source-listener movement ambiguities. Blauert (1997) terms this “persistence”:
In connection with spatial hearing, the term ‘persistence’ refers to the fact that the position of the auditory event can only change with limited rapidity. Under appropriate conditions the position of the auditory event exhibits a time lag with respect to a change in position of the sound source. Persistence must always be taken into consideration when using sound sources that change position rapidly. (p. 47)
Virtual Spatial Audio on Mobile Systems
Although not widely known, a number of 3D audio solutions have been available for mobile devices for several years. Despite this, manufacturers have been quite slow in implementing Operating System (OS) and hardware support for these audio APIs, or only offer limited support on a select number of devices. As a consequence, third-party developers cannot rely on 3D audio effects for their mainstream applications as it is
too unreliable in a programming environment, requiring compatibility with a wide variety of device models. Therefore, there seems to be an unusual scenario whereby 3D audio on mobile phones continues to be of extreme interest to interface researchers and API developers, but the practical implementation of the technology is stagnant in terms of mainstream consumerism. This is set to change, however, with the emergence of faster wireless networks, more powerful mobile operating systems and the establishment of digital media broadcast standards for handheld devices (such as DVB-H).
JSR-234
JSR-234, or the Advanced Multimedia Supplements (AMMS), is an API initiated by Nokia and developed under the Java Community Process. It allows for more control over multimedia elements, including the creation of 3D audio environments. It is an optional supplement to the Mobile Media API (MMAPI, JSR-135) designed for J2ME/CLDC mobile devices. Refer to Sun (2010), Li (2005) and Goyal (2006) for more information on CLDC and MIDP specifications. MMAPI is itself an optional low-footprint API, implemented in MIDP 2.0, allowing developers to create Java applications to play back and capture audio and video in a variety of multimedia file formats, perform camera operations, stream radio over a network, generate musical tones and so forth. A large number of mobile phone devices
support MMAPI, but this number is reduced when it comes to fully implementing AMMS. It is important to point out that these multimedia APIs are not part of the actual mobile phone OS: they do, however, enable developers to create third-party applications in the form of a MIDlet (.jar and .jad files). Therefore, whilst developers cannot create 3D audio menu items for the phone’s OS, a MIDlet provides an excellent platform for testing the psychophysical considerations of such an interface in preparation for future spatial audio implementations in the core OS. In JSR-234, a source-medium-receiver model (Paavola & Page, 2005) is employed (see Figure 11). This approach goes beyond the capabilities of MMAPI. Using MMAPI alone, sourced audio data can be started, stopped, paused, and primitively controlled. A summary of the MMAPI process is as follows: the abstract class, DataSource, locates the audio content; a Manager class creates the appropriate Player interface; and the Player in turn incorporates control methods for rendering and primitively controlling the audio content (see Figure 12). Control methods include VolumeControl, ToneControl, PitchControl, StopTimeControl, RecordControl and RateControl. No spatial
attributes are capable if using MMAPI alone and MMAPI’s Manager class cannot be expanded to include this service. In order to fulfil the requirements of spatial audio rendering, AMMS was created to extend MMAPI. Of interest to us in the AMMS package are GlobalManager, Spectator, and Module. The GlobalManager class is similar in action to the MMAPI Manager class, but is also very different in what interfaces it creates. Therefore, it does not extend or replace the Manager class and the MMAPI Manager is still required to create Players (see Figure 13). The GlobalManager handles the creation of Module interfaces and allows access to the Spectator class. The Modules implemented via the GlobalManager are the EffectsModule and the SoundSource3D module. In contrast to MMAPI, AMMS allows several Players to be assigned to one audio effect (that is, mixing several Player instances). This allows common effects to be applied to all Players and this helps to optimize effects on limited resources. The types of control effects possible using the EffectsModule interface are equalization, panning, virtualization, reverb, and chorus/flanger (Paavola & Page, 2005).
Figure 11. An overview of JSR-234
Figure 12. Basic MMAPI process. The Manager class bridges between the DataSource class and Player interface. Primitive controls are implemented via the Player’s methods
The SoundSource3D module performs control effects specifically relating to positioning sources in the virtual space. These include sound source directivity (sound cones), distance attenuation, Doppler shift, location control, specifying the dimensions of an extended sound source (not a point source), obstruction of the sound traveling from source to listener, and reverberation effects (JSR-234 Group, 2005; Paavola & Page, 2005). The Spectator class represents the listener in the virtual space (JSR-234 Group, 2005). As with other sound sources in the virtual space, the listener/spectator must also possess spatial cues. The controls associated with the Spectator class are location, Doppler, and orientation.
OpenSL ES
OpenSL ES is an open standard API for interactive spatial audio for embedded systems developed by the Khronos Group (2009). Target devices for OpenSL ES are basic mobile phones, smart phones, PDAs and mobile digital music players. It is a C-language audio API with some overlap with OpenMAX AL 1.0, a multimedia/recording API for embedded systems from the same group. Like the relationship between MMAPI and AMMS, OpenMAX AL has basic audio capabilities, with OpenSL ES providing advanced 3D audio and sonic effects, and both share many common methods. However, unlike MMAPI and AMMS, both OpenMAX AL and OpenSL ES are entirely independent and each can perform as a standalone API on target devices. Three Profiles are present in the OpenSL ES implementation: Phone, Music, and Game (see Table 5). Different audio capabilities exist for different profiles but manufacturers are free to implement two or all three profiles on their devices. All features of a given profile have to be implemented by the manufacturer in order to ensure compatibility. Therefore, a manufacturer implementing the Phone profile, but wanting to
Figure 13. MMAPI Manager creates Players. AMMS GlobalManager creates Modules that Players hook into
incorporate elements of the Game profile, must fully implement the Game profile also.
CONCLUSIONS AND FUTURE DEVELOPMENTS
Spatial presentation of sound is a very important feature of VR and becoming more important in computer games. Without spatial sound, virtual environments would lack the complex qualities required for a convincing immersive experience. The development of synthetic sound spatialization techniques for immersive environments has lagged behind comparable visual technology. However, there now exist a number of options for developers who wish to incorporate spatial sound into computer games and VR. Java 3D provides a collection of tools that enable developers to integrate spatial sound in a virtual environment. While the API set allows for the construction of reasonable spatial sound experiences, there are a number of shortcomings which hinder the advancement of Java 3D
for spatial sound environments: most notable of these are the lack of support for HRTFs and issues around real-time processing in Java. MPEG-4 version 2 is where the main breakthroughs in the integration of spatial sound in an international standard have been achieved. To cater for both ESA and absolute sound rendering, a dual approach has been developed. This dual approach of both physical and perceptual descriptions of spatial sound seems to encapsulate all of the necessary attributes for a cogent spatial experience. XNA opens up spatial sound for game development. Although, unlike OpenAL, it is not an open platform, the development environment is accessible and fits neatly into the XNA architecture of computer game development. The future holds some remarkable potential for spatial sound in computer games: the designer’s imagination being the only limiting factor. Imagine being immersed completely in sound, where gameplay relies heavily on the sense of hearing. Walking down a dark corridor in a first-person shooter, you hear your footsteps below you and environmental sounds coming from air ducts and doorways. Suddenly, you hear a noise behind and to your left; you turn to be confronted by a ghastly beast who wants you for lunch. You fire your weapon, the piercing impact of the firing mechanism on your ears, the sound reverberating and interacting with the room, shell casings tinkling on the floor, and the creature falls to the ground with a resonating thud. The application of spatial sound will advance and drive the narrative and drama in a computer game, and ultimately lead to an immersive user experience.

Table 5. The table shows audio-only information for the profiles associated with OpenSL ES. MIDI specifications have been excluded from the table. Adapted from Khronos Group (2009)

API Feature                                   Phone  Music  Game
PLAYBACK/PROCESSING CONTROLS
Play multiple sounds at same time             YES    YES    YES
Playback mono & stereo                        YES    YES    YES
Basic playback controls                       YES    YES    YES
End-end looping                               YES    YES    YES
Partial looping                               NO     NO     YES
Set playback position                         YES    YES    YES
Position-related notifications                YES    YES    YES
Sound prioritization                          YES    YES    YES
Audio to several concurrent outputs           YES    NO     NO
Volume Control                                YES    YES    YES
Audio balance & pan control                   NO     YES    YES
Metadata retrieval                            NO     YES    YES
Modify playback rate & pitch                  NO     NO     YES
Play sounds from secondary store              YES    YES    YES
Buffer/queues                                 NO     NO     YES
CAPABILITY QUERIES
Query capabilities of implementation          YES    YES    YES
Enumerate audio I/O devices                   YES    YES    YES
Query audio I/O device capabilities           YES    YES    YES
EFFECTS
Stereo widening                               NO     YES    YES
Virtualization                                NO     YES    YES
Reverberation                                 NO     YES    YES
Equalization                                  NO     YES    YES
Effect control                                NO     YES    YES
3D AUDIO
Positional 3D audio                           NO     NO     YES
Sound cones                                   NO     NO     YES
Multiple distance models                      NO     NO     YES
Source & listener velocity                    NO     NO     YES
Source & listener orientation                 NO     NO     YES
3D sound grouping                             NO     NO     YES
Simultaneous render of multiple 3D controls   NO     NO     YES
REFERENCES
Ashmed, D. H., & Wall, R. S. (1999). Auditory perception of walls via spectral variations in the ambient sound field. Journal of Rehabilitation Research and Development, 36(4).
Begault, D. R., & Wenzel, E. M. (1993). Headphone localization of speech. Human Factors, 35, 361–376.
Blauert, J. (1997). Spatial hearing: The psychophysics of human sound localization (rev. ed.). Cambridge, MA: MIT Press.
Burgess, D. (1992). Techniques for low cost spatial audio. In Proceedings of the 5th annual ACM symposium on User interface software and technology.
Duda, R. O., Algazi, V. R., & Thompson, D. M. (2002). The use of head-and-torso models for improved spatial sound synthesis. In Proceedings of the 113th Audio Engineering Society Convention.
Everest, F. A. (1997). Sound studio construction on a budget. City, ST: McGraw-Hill.
Everest, F. A. (2001). Master handbook of acoustics. City, ST: McGraw-Hill.
Fletcher, T. D. R. N. H. (2004). Principles of vibration and sound (2nd ed.). New York.
Goyal, V. (2006). Pro Java ME MMAPI: Mobile media API for Java Micro Edition. City, CA: Apress Press Inc.
JSR-234 Group. (2005). Advanced multimedia supplements API for Java™ 2 Micro Edition. Nokia Corporation.
Khronos Group. (2009). OpenSL ES specification. The Khronos Group.
Li, S., & Knudsen, J. (2005). Beginning J2ME™ platform: From novice to professional (3rd ed.). City, CA: Apress Press Inc.
Mark, F., Bear, B. W. C., & Paradiso, M. A. (2007). Neuroscience: Exploring the brain (3rd ed.). City, ST/Country: Lippincott Williams & Wilkins.
Murphy, D. (1999). A review of spatial sound in the Java 3D API specification. Institute of Sound Recording, University of Surrey.
Murphy, D. (1999). Spatial sound description in virtual environments. In Proceedings of the Cambridge Music Processing Colloquium.
Murphy, D., & Pitt, I. (2001). Spatial sound enhancing virtual story telling. Springer Lecture Notes In Computer Science, 2197.
Murphy, D., & Rumsey, F. (2001). A scalable spatial sound rendering system. In Proceedings of the 110th AES Convention.
Otani, M., & Ise, S. (2003). A fast calculation method of the head-related transfer functions for multiple source points based on the boundary element method. Acoustical Science and Technology, 24(5), 259–266. doi:10.1250/ast.24.259
Paavola, M. K. E., & Page, J. (2005). 3D audio for mobile devices via Java. In Proceedings of the AES 118th Convention.
Rault, J. B., Emerit, M., Warusfel, O., & Jot, J. M. (1998). Audio rendering of virtual room acoustics and perceptual description of the auditory scene. TCI/SC29/WG11.
Satoshi Yairi, Y. I., & Suzuki, Y. (2008). Individualization of Head-Related Transfer Functions based on subjective evaluation. In Proceedings of the 14th International Conference on Auditory Displays.
Sun Microsystems. (2010). Java ME API. Retrieved February 4, 2010, from http://java.sun.com/javame/reference/apis.jsp.
Väänänen, R. (1998). Verification model of advanced BIFS (systems VM 4.0 subpart 2). ISO/IEC JTC1/SC29/WG11.
Vorländer, M. (2008). Auralization: Fundamentals of acoustics, modelling, simulation, algorithms and acoustic virtual reality (1st ed.). Berlin: Springer.
KEY TERMS AND DEFINITIONS
API (Application Programming Interface): A mechanism in software engineering that allows two separate pieces of software to integrate or interface, for instance a library of functionality being integrated into an application.
Doppler Effect: The apparent shift in pitch/frequency of a sound due to relative motion between the source and the listener.
Fast Convolution: Convolution is an operation that combines two signals by summing the products of one signal with shifted copies of the other. Fast convolution is an optimised form, typically a circular convolution carried out by multiplying the signals’ spectra obtained with a Fast Fourier Transform.
Free-Field: An open space/environment which does not interact with the sound source (as opposed to ‘room interaction’ in a closed space).
Head Tracking: The tracking of position and orientation of head movement by an external sensor in VR or computer games.
HRTF: Head-Related Transfer Function.
Occlusion: The blocking of the effective transmission of sound by an obstacle that absorbs energy or reflects sound waves.

ENDNOTE
1 Individualization refers to a listening experience that is tailored to and unique to the individual listener.
Chapter 15
Behaviour, Structure and Causality in Procedural Audio¹
Andy Farnell
Computer Scientist, UK
ABSTRACT
This chapter expands some key concepts and problems in the emerging field of procedural audio. In addition to historical, philosophical, commercial, and technological themes, it examines why procedural audio differs from earlier “computer music” and “computer sound”. In particular, the extension of sound synthesis to the general case of ordinary, everyday objects in a virtual world, and the requirements for interactivity and real-time computation are examined.
DOI: 10.4018/978-1-61692-828-5.ch015
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION
Procedural audio is sound as a process. Instead of thinking about nouns, we think of verbs, or the actions that cause sounds. Procedural sound is also a structural and reasoned, rather than purely sensible, approach to sound, in which behaviour supplants identity and we ask not what sound is, but what it does. This chapter follows the publication of Designing Sound (Farnell, 2008) in which I lay foundations for procedural audio, particularly its use in real-time virtual worlds. Here I would like to talk about the ideas of behaviour, causality and structure which are implied by the identification of sound as a branch of dynamics requiring energetic change. The task at hand is producing sound for film, computer games, or other interactive entertainment applications. This task requires creativity, knowledge and understanding. Creativity in traditional sound design is directed at capturing, curating and matching audio data to depicted circumstance, whereas creation of procedural sound is from first principles: so it is truly design as opposed to selection. Insights into process are therefore at the root of the work. The end product is sound as code, or sound objects (Polotti, Papetti, Rocchesso, & Delle, 2001). One must create new digital signal processing (DSP) algorithms, or a set of parameters for existing sound objects, rather than actual sound files. The product is “potential sound”, rather than particular audio data as it would appear at the digital-to-analogue converter (DAC). The complete package of a sound object is DSP, control, and encapsulation code compatible with a set of parameters to be supplied in the future. Objects may be instantiated and animated at some later time, in any contrived circumstance. We say this medium has deferred form, exhibiting desired behaviour according to supplied runtime parameters.
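As a minimal sketch of deferred form, the Python fragment below packages a struck object as code plus parameters and produces samples only when it is run with a runtime value. The class name StruckObject, the mode frequencies, the decay rate, and the strike_force parameter are illustrative assumptions, not something specified in this chapter.

```python
import math


class StruckObject:
    """A sound object: DSP code plus named parameters, with no audio stored."""

    def __init__(self, mode_freqs_hz, decay_per_s, sample_rate=44100):
        # Design-time parameters (hypothetical values chosen for illustration).
        self.mode_freqs_hz = list(mode_freqs_hz)
        self.decay_per_s = decay_per_s
        self.sample_rate = sample_rate

    def render(self, strike_force, duration_s):
        """Realise the potential sound as samples for one runtime event."""
        n = int(duration_s * self.sample_rate)
        out = []
        for i in range(n):
            t = i / self.sample_rate
            # Exponentially decaying envelope scaled by the runtime parameter.
            env = strike_force * math.exp(-self.decay_per_s * t)
            # Equal-weight sum of sinusoidal modes, normalised to +/-1.
            s = sum(math.sin(2 * math.pi * f * t) for f in self.mode_freqs_hz)
            out.append(env * s / len(self.mode_freqs_hz))
        return out


# The object is created once; its sound is only realised later, per event.
bell = StruckObject(mode_freqs_hz=[440.0, 1130.0, 1870.0], decay_per_s=6.0)
soft = bell.render(strike_force=0.2, duration_s=0.05)
hard = bell.render(strike_force=0.9, duration_s=0.05)
```

The same object yields a different realisation for each strike, and nothing exists as audio data until render is called.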
Procedural Audio as a Design Philosophy
It’s worth adding that the above goal is not the exclusive end. Merely adopting a procedural way of thinking about sound can inform and enhance traditional design approaches. For this reason I teach a sound design syllabus based upon this premise. Students begin the first semester by considering the physical processes inherent in all sound sources. They then progress to devising models and appropriate synthetic methods in an implementation language. This deeper understanding enables choices and creativity through structural metaphor and simile rather than only empirical (surface) features. For example, we study the design of bells with reference to design traditions on shape and material properties. Combining metallurgy and geometry (which leads to modal interpretation) with the study of basilar physiology and sensory psychoacoustics of Barkhausen/Zwicker (1961) and Plomp and Mimpen (1968), we arrive at a firm understanding of why some bells are sonorous, dissonant, hollow, or foreboding. In another exercise we decompose the sound of fire to arrive at the components corresponding to physical processes of combustion. This results in models of fire with surface parameters like crackles, hiss, and roar, which can be abstracted further into scalable fire models for burning trees or liquid fires. Hybridisation of models at an abstract level is possible, perhaps to combine fluid models like bubbling mud with fire to create lava flows.

If procedural audio implies the sound is based on process, then behavioural audio implies that the internal model and supplied parameters reflect the behaviour of the target object in some way. What is behaviour? Behaviour relates environmental stimulus to what is observed in time. It is a strange concept because it must both change and, to some degree, remain fixed in time. A purely signal interpretation might define behaviour as repeatable changes in one or more output features in light of one or more supplied, dependent variables, with little restraint on the whole cavalcade of qualifiers (discrete, continuous, linear, non-linear, causal, or non-causal) that might apply. A useful behaviour might be supposed to be time invariant on a medium to long scale (in the order of seconds). In other words, behaviour that changes too rapidly ceases to be observed as behaviour. When talking about behaviour we might incorporate statefulness into sound objects: for example, whether a container is full or empty will cause it to respond differently to a collision. Within a larger context behaviour implies a narrative. A drink can or tumbleweed in the street is destined to roll around as it is blown in the wind. Of course it can do many things, but rolling around is its “script”, its purpose in life. One way of understanding the narrative is as a context that places particular emphasis on certain features. In real life the function of glass windows is to keep out the wind while allowing light in. In a first person shooter their primary concern (understood tragic destiny) is to be shot at and broken. As we will examine later, this change of narrative focus (and thus reality) is both a dynamic force in interactive sound, and the source of an error in some programming approaches that assume the need for a literal, uncoloured interpretation of reality.
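The container example can be sketched as a sound object that keeps state between events, so that the same stimulus gives a different but repeatable response depending on how full it is. This is a minimal illustration only: the Container class, the pour method, and the numeric mapping from fill level to pitch and damping are assumptions made for the sketch, not measured behaviour.

```python
class Container:
    """Sound object whose collision response depends on internal state."""

    def __init__(self, empty_freq_hz=900.0, empty_decay_per_s=8.0):
        self.empty_freq_hz = empty_freq_hz
        self.empty_decay_per_s = empty_decay_per_s
        self.fill = 0.0  # state: 0.0 = empty, 1.0 = full

    def pour(self, amount):
        """Change the state, clamping the fill level to the range 0..1."""
        self.fill = min(1.0, max(0.0, self.fill + amount))

    def collision_params(self, impact_velocity):
        """Map a stimulus plus the current state to synthesis parameters."""
        # Assumed mapping: contents lower the pitch and add damping.
        freq = self.empty_freq_hz * (1.0 - 0.4 * self.fill)
        decay = self.empty_decay_per_s * (1.0 + 3.0 * self.fill)
        return {"freq_hz": freq, "decay_per_s": decay, "gain": impact_velocity}


can = Container()
empty_hit = can.collision_params(impact_velocity=0.5)
can.pour(1.0)
full_hit = can.collision_params(impact_velocity=0.5)
```

The dictionaries returned here would be handed to whatever synthesis method realises the collision; only the state-to-parameter step is shown.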
Must Procedural Audio be Real-Time?
Loose definitions of procedural audio might omit the requirement for real-time computation because behaviour and process can be described and computed in an offline system. In some ways the boundary between what is offline and real-time is only a matter of degree, the amount of CPU power available. In other ways there is a stark discontinuity in the temporal design of some models that means they cannot cross the divide. An example is found in the Csound implementation layer, based originally on an offline musical score model in which event durations are given a priori. Although advances in machine technology now allow real-time deployment, sound objects must be structurally redesigned to incorporate “on the fly” MIDI events. The underlying Csound model requires different operations for each approach, which cannot be mixed. One must therefore approach a Csound design with this in mind at the start. For film, pre-recorded radio and other static media, parameters are applied up-front, then the sound is recorded. Computer sound has been used in this way for decades, and indeed this inherits from the MUSIC-V lineage, which includes Csound. For computer sound, this rendering predates similar CGI concepts by a long time. A slightly different use is also possible. Where the synthesis parameters accompany a visual timeline, such as in animation, and where sufficient offline computing power is available, we have a powerful editing system in which changes to the visual elements are automatically reflected in the audio track, which is rendered along with the images. This changes the temporal status of what is traditionally post-production, because sounds don’t have to be created after the picture but are designed as part of it, or even beforehand. Here is a subtle change in the use of the technology rather than the deeper nature of the technology itself. To distinguish these (past, present, and future)
understandings of production I use the terms prior, concurrent and deferred computation. Those coming from a computer animation background, such as in the work of Zheng and James at Cornell, and Moss and Yee at North Carolina, seem to embrace the concurrent model in which realistic sounds are a side effect of realistic physics based animation. However, a fuller definition of procedural and behavioural audio would add the requirement of real-time computation and imply something further about delivery as well as generation. Procedural audio differs from longer established computer music in several ways. It tackles the general case of sound synthesis, not only those sounds which are musically pleasing. It is real-time and interactive. And it anticipates deferred form (and unexpected interactions) as a philosophy. A concurrent approach does not handle non-diegetic sources, high energy events or, at present, more complex systems than rigid body collisions and simple fluid dynamics. We put the full power of procedural audio to use in the context of computer games when we extend it to the general case of all natural sounds in an unpredictable environment. It is for live installation (arts and theatre), virtual reality, or games that sound design with procedural sound objects comes into its own because we can make parametric decisions in real-time, which means we can create sound for situations that have not been planned in advance. This way of thinking is a departure from contemporary sound design à la Hollywood. In some ways it returns us to the era before Foley, of the live theatre sound effects artist employing props: a reactive, adaptive and intelligent sound maker. So, as for nomenclature, we might better employ the established expression “synthesised sound” for prior media like music and film, while reserving “procedural audio” for deferred applications with a real-time element in which sound is created in the moment, as required.
In prior media, the use of the word “sound” implies it has already been realised (concretised) as a signal, while “procedural audio” is still code (potential sound). You
can’t just play procedural audio; you have to run it, in a context that makes sense.
Procedural Audio is More than Physical Modelling
Since it is often conflated precisely with the words synthesis or physical modelling, let’s be clear that procedural audio isn’t necessarily either of these subjects. Physical modelling (insomuch as it refers to finite elements, tensor matrices, mass spring damper systems, waveguides and other implementations) and synthesis (as it pertains to known parametric methods such as additive, granular or non-linear) are pieces of a larger picture. This picture also comprises psychology (perceptual psychoacoustics and auditory cognition), philosophy and epistemology, and object domain knowledge. It is in their combination that we attempt to reduce a sound to its behavioural realisation. The culmination of these disciplines is to program physically informed synthetic or composite source sound objects with behaviourally informed responses to input conditions. This involves some measure of simplification and necessary understanding and is not an attempt to construct a one-to-one model of a system. So, by programming, I do not mean only models, synthesis methods, and implementations but, rather, the total set of principles required to get good behaviour, effective over a range of use cases. In this respect, procedural audio intersects with some of the interesting stuff of orthodox sound design: the tricks, shortcuts, perceptual devices, simplifications, and deceptions of psychology and storytelling.
BEHAVIOUR AND ANALYSIS
Starting with a concept, analysis is the beginning of the real work. It seeds mental prototypical designs which in turn lead to synthesis. As described in Designing Sound, Computer Sound Design and
other examinations of design, this continues in a “dialectic”, testing the new synthesis against the internal model and external reference, and iterating over successive improvements. There are several flavours of analysis that can be employed, such as perceptual analysis and signal (correlation, spectrum, transient) analysis. The one we will consider in this chapter is physical component analysis. Without an interpretive intermediate stage, analysis-synthesis is merely transcoding, in the sense of a Shannon-Weaver signal. In other words, we could resynthesise a sound transparently from an analysis and gain nothing (except data reduction useful for transmission). A proper analysis yields behavioural features and concepts that can be meaningfully manipulated to give new sound transformations. We get proper handles on things. An analysis that makes sense will allow us to creatively intervene in a reasoned way because we are able to abstract something useful from the example sounds. This is dealt with in several places in Representations of Musical Signals (De Poli et al., 1991). An analysis works best when carried out over a long time, or with numerous examples of the target rather than with a single snapshot. Like tomography, it is an attempt to reveal what is hidden within by integrating views. Imagine a wrapped present that you shake, weigh and tap, trying to guess what is inside. A stimulus provokes a response. Consistent responses, mapping domain to range, constitute a behaviour and, as we think of the object as a “black box”, by observing the common responses to lots of environmental stimuli we thus reveal something about the object. In the case of an idiophonic object, say a snare drum, the response may be immediate and consistent, while the increasing complexity of machines and living organisms may require deeper analysis and yield less predictable behaviour.
At the limits of behaviourism, we can either open the box to reverse engineer the system or use signal (superficial) analysis with a phenomenal synthetic method. There may also be accidental characteristics,
untypical of the class and peculiar to the target, like an intermittent rattle due to a broken part. All other environmental variables should remain constant. But this ideal of independent parameters is rare in the real world, so it’s often necessary to play with several data sets. The environment we work in is the physical world. This is another way of saying that physics is an objective representation of reality with which we can describe sound signals (and their effect) produced by objects. We describe the perceived signal in terms of the things that produce the signal: objects and stimuli. Clearly a knowledge of sound physics is useful, particularly the mechanics, solid vibrational physics, fluid dynamics, and gas phase acoustics given in standard textbooks (Elmore & Heald, 1969; Subrahmanyan & Lal, 1974).
MODELS, METHODS AND IMPLEMENTATIONS
Having obtained an analysis, an understanding of the target, we wish to produce a design which simulates its salient features and behaviours. We break the design of sonic objects into three conceptual strata: a model, which is an abstract representation of desired behaviours; one or more methods, which allow us to realise audio signals from a behavioural description; and an implementation, which provides the vehicle for delivery. These decouplings, which I observed during many years of synthetic sound design, allow modular software engineering practice in which each layer may be replaced whilst leaving the others intact. It becomes possible to construct a sounding object, and then completely replace its sound synthesis method, perhaps swapping a subtractive method for an FM one, or re-implementing the same model and methods on a different DSP platform.
Models
We could reduce running water to a model of overlapping exponentially frequency-modulated sinusoids corresponding to Helmholtz oscillators caused by entrained air cavities. We can reduce fire to a componentised model of crackles, hisses, and low frequency noise corresponding to physical features like fuel fragmentation, gaseous expansion, and turbulence during combustion. At the surface, it is only necessary to express the model in clear words, or in simple formulae with correctly ordered relationships between features. Notice that the first case can be taken superficially, only as a surface (signal) analysis, without asking why the water comprises patterns of exponentially rising sine bursts. We could just make note of their average pitch, duration, overlap, and density, arriving at a broad model involving spectral centroid and flux. Or, we can delve deeper into the causes of features. The latter approach brings physical behaviour into the picture and can offer meaningful performance parameters while the former can only ever yield a brittle, phenomenological model. The term performance is borrowed from computer music where we would probably be dealing with a musical instrument model. For the general case of synthesis, a model is “performed” by actions from its environment or energy changes within itself. For a babbling brook we might say its performance parameters are speed of flow and depth of the water. What we desire from a procedural model is that it presents a parametric (performance) interface, with the smallest set of useful control signals corresponding to forces or values in the environment relevant to the narrative. We say the model captures the sound source well when this correspondence is extensive and efficient.
These might be fixed, like the height of a waterfall, continuously variable values, such as speed of fluid flow or temperature, or a discontinuous or ordinal feature set such as a texture tag (wood, metal, stone) taken from the object’s properties
or the surrounding world (such as footsteps over changing ground materials). Making this mapping well behaved can be a difficult programming task that requires an artistic, human contribution. A model must also explain/manifest the relationship between its own parts or subsystems. For example, a helicopter is a complex machine with many parametrically coupled sources: an engine, gearbox, exhaust system, and two propellers. Their individual performance parameters are linked by a higher set of control equations that accord with proper flight manoeuvres. The ideal set of parametric controls, from the viewpoint of game audio design, would be the parameters used to actually fly the vehicle (plus an observation point vector). Consequently, procedural audio relies on much tighter coupling with physics computation, at least if efficiency gains are to be made by avoiding duplicate calculations.
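The running-water model described in this section, overlapping sine bursts whose frequency rises exponentially, driven by the performance parameters of flow speed and depth, can be sketched as follows. This is a rough illustration under stated assumptions: the function name water_bursts, the mapping of flow speed to burst density and of depth to pitch, and every constant are hypothetical choices, not values taken from the chapter or from Designing Sound.

```python
import math
import random


def water_bursts(flow_speed, depth, duration_s=1.0, sample_rate=8000, seed=1):
    """Sum short sine bursts whose frequency rises exponentially in time."""
    rng = random.Random(seed)
    n = int(duration_s * sample_rate)
    out = [0.0] * n
    # Assumed mappings: faster flow gives more bursts, deeper water lower pitch.
    n_bursts = int(20 * flow_speed * duration_s)
    base_hz = 1200.0 / (1.0 + depth)
    burst_len = sample_rate * 30 // 1000          # 30 ms per burst
    for _ in range(n_bursts):
        start = rng.randrange(0, n - burst_len)
        f0 = base_hz * rng.uniform(0.5, 2.0)
        phase = 0.0
        for k in range(burst_len):
            frac = k / burst_len
            freq = f0 * 2.0 ** frac               # exponential rise of one octave
            phase += 2 * math.pi * freq / sample_rate
            env = math.sin(math.pi * frac)        # smooth fade in and out
            out[start + k] += 0.1 * env * math.sin(phase)
    return out


# Two performance parameters are the whole control interface.
brook = water_bursts(flow_speed=2.0, depth=0.5)
```

Only the two performance parameters are exposed to the caller; the burst statistics stay inside the model, which is the kind of small parametric interface the text asks for.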
Methods
Methods are the way we concretise models. They connect the abstract model to the solid implementation as DSP. Methods are drawn from an extensive set of known techniques that provide particular spectral or time domain signal behaviour from an assumed input parameter set, such as additive Fourier, subtractive, non-linear waveshaping, FEM (lumped mass), waveguide, MSD (elastic), FM, AM, granular, wavelet, wavetable, fractal, Walsh and others. These are essentially mathematical formulations, functions of time, systems of linear and non-linear equations from which we obtain audio signals. They fall loosely into two classes: parametric (signal) and source (physical). In the former, the relationship between model and implementation must be described by the method. In the latter, the mapping of model to output signal is the method. The latter promises unparalleled realism at the expense of computation cost and lack of abstract control, while the former trades detailed realism for cost savings
and versatile control. Pragmatic implementations often harness both classes. Synthetic methods are not limited to the production of signals at the audio rate; they apply also to control systems above the audio DSP level, or any other model feature that can be computed. Examples are the rolling drinks can in Designing Sound, which has an inertial control model distinct from the sound production, or the fragmentation models of Zheng and James, which concentrate on the control level of particle debris and leave the actual sound generation to precomputed proxies. In some racing games it is the sonic behaviour of the engine and vehicle as an overall system that could be described as procedural, while the synthesis itself is at best a naïve granular method, and at worst carefully hand blended sample loops played back under the control of the vehicle performance model. As a systematic example let’s return again to the helicopter, whose engine is modelled with a parabolic pulse source and waveguide network, gearbox by a closed form additive expression, and the blades by a subtractive (noise-based) method. But their inter-relation for given flight manoeuvres may be calculated by differential equations at the control rate. These controls are of course independent of the DSP used to compute the audio signatures, which could be replaced by cheaper methods for reduced level of audio detail (LOAD) as the vehicle recedes into the distance. Yet overall, the model still retains its behaviour as the detail of the sound (and cost of production) diminishes. Dynamic adjustment or replacement of model components or methods to obtain a changing level of detail is one very powerful aspect of properly stratified procedural audio.
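To make the separation of model and method concrete, the sketch below keeps one control-level quantity (the blade-pass rate) and swaps the method that realises it according to listener distance. It is an illustration only: the function names, the 200 m switching distance, the filter coefficient, and the choice of a plain sinusoid as the cheap stand-in are assumptions, and the differential equations of flight mentioned in the text are reduced here to a single formula.

```python
import math
import random

SAMPLE_RATE = 8000


def blade_rate_hz(rotor_rpm, n_blades=2):
    """Control-level model: blade passes per second."""
    return rotor_rpm / 60.0 * n_blades


def blades_detailed(rate_hz, n, rng):
    """Noise-based method: low-passed noise, pulsed once per blade pass."""
    out, lp = [], 0.0
    for i in range(n):
        phase = (i * rate_hz / SAMPLE_RATE) % 1.0
        pulse = math.exp(-8.0 * phase)             # sharp swish, then decay
        lp += 0.2 * (rng.uniform(-1.0, 1.0) - lp)  # one-pole filtered noise
        out.append(pulse * lp)
    return out


def blades_cheap(rate_hz, n, rng):
    """Low-detail stand-in: one sinusoid at the blade rate (rng is unused)."""
    return [0.3 * math.sin(2 * math.pi * rate_hz * i / SAMPLE_RATE)
            for i in range(n)]


def render_blades(rotor_rpm, distance_m, n, seed=0):
    """Same model and controls; only the method changes with distance."""
    rate = blade_rate_hz(rotor_rpm)
    method = blades_detailed if distance_m < 200.0 else blades_cheap
    return method(rate, n, random.Random(seed))


near = render_blades(rotor_rpm=300, distance_m=50.0, n=800)
far = render_blades(rotor_rpm=300, distance_m=500.0, n=800)
```

Both renderings follow the same blade rate, so the behaviour is retained while the cost per sample drops for the distant case.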
Implementations Actual implementation of audio signals requires knowledge of practical computer engineering and is not a subject to ponder deeply here. We shall just glance over it, because it changes from case
Behaviour, Structure and Causality in Procedural Audio
to case, and for a high level treatment we can take exchangeable implementations as a given. It’s worth noting that there is an axiomatic set of a few operators, including such “atoms” as the unit delay ($z^{-1}$) and arithmetic operators (plus, minus, multiply, divide), from which all of DSP can be obtained. In native implementations, many constructions can be built upon this foundation, such as the sin() and cos() functions approximated by polynomials (truncated Taylor/Maclaurin), familiar band limited waveforms built on known identities, continued fractions or closed forms, and standard filter blocks built on established topologies (biquad and so forth). Recently attention has been drawn back to computational physical methods (specifically finite difference methods for one and two dimensional wave equations) by Bilbao (2009). Combined with the earlier waveguide work of Smith (1992) and trends toward matrix processors, these offer hope for less expensive versions of direct computation for plates, surfaces, tubes and volumetric extents. In practice, as you develop procedural audio systems, current technologies and runtime constraints will determine implementation for you. Designing Sound gives some coverage, both as proof of methodology and as an example set for implementation using the Pure Data language. Many of these have been translated to SuperCollider by Dan Stowell at Queen Mary University, London. We must also remain mindful of low level programmatic and architectural issues, such as the cost of memory lookups, data bus throughput, stack frame overheads, the danger of denormals, and even simple things like watching for divide by zero, though these are addressed at the C code, compiler or assembly level and depend on the underlying hardware too. The choice of a language and its features is another subject. Visual dataflow is certainly an advantage for productivity, readability and reuse. 
It has a natural congruence with the thought process during design. It has a wonderful granularity combining simplistic concretes alongside abstract
concepts. Yet in the form I use it, as Pure Data, it is far from perfect. Nonetheless it probably represents the best hope currently available for designing procedural audio, due to the ease with which large programs can be constructed. A nice overview of language design for the particular case of musical synthesis is by Günter Geiger (2005). The general case of synthesis (everyday sound effects created using procedural sonic objects) is not addressed by Geiger’s study. While Pure Data seems ideal for design, deployment may be much better served by those languages having a “built from the ground up” client-server model, like ChucK or SuperCollider. Beyond a comparison of algorithmic complexity in relative technologies, such as Avanzini (2001), a research opportunity exists here for cost metrics of implementations on modern vector processors, such as CUDA on Nvidia GPU (see Angus and Caunce 2010). Briefly, the qualities to be considered are trans-structural concurrency (parallelisation), for which functional LISP type evaluations seem suited (though suffering from overheads associated with functional languages), plus instance concurrency (source polyphony), as handled by languages like SuperCollider or ChucK. Other issues are core language size (for example, MPEG-4 SAOL, Csound or Lua Vessel), scalability, whether objects should be pre-compiled or whether an interpreter or server/client model is best, and licensing for commercial development (for example, Pure Data or Max/MSP).
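To make the earlier remark about DSP "atoms" concrete, here is a minimal Python sketch, offered as an illustration rather than as any particular library's implementation. It builds a sine approximation from a truncated Maclaurin series and a direct form I biquad from nothing but unit delays, multiplies, adds and subtracts. The function and class names, and the choice of eight series terms, are assumptions made for the example.

```python
import math


def sin_taylor(x, terms=8):
    """Approximate sin(x) with a truncated Maclaurin series after range reduction."""
    # Wrap x into [-pi, pi) so that the truncated series stays accurate.
    x = (x + math.pi) % (2.0 * math.pi) - math.pi
    term = x
    total = x
    for k in range(1, terms):
        # Each term follows from the previous one: multiply by -x^2 / ((2k)(2k+1)).
        term *= -x * x / ((2 * k) * (2 * k + 1))
        total += term
    return total


class Biquad:
    """Direct form I biquad: two input delays, two output delays, five multiplies."""

    def __init__(self, b0, b1, b2, a1, a2):
        self.b0, self.b1, self.b2, self.a1, self.a2 = b0, b1, b2, a1, a2
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0  # the unit delays (z^-1)

    def process(self, x):
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        # Shift the delay lines by one sample.
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y
```

Feeding a unit impulse into `Biquad(1.0, 0.0, 0.0, -0.5, 0.0)` gives the decaying sequence 1, 0.5, 0.25 and so on, which is the one-pole recursion expressed in the biquad topology.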
COST AND VALUE

Just as design qualities are often placed on the orthogonal axes of cheap, good and fast, the ground in procedural audio is divided between expedient design, aesthetic realism, efficient code, and object flexibility. Flexibility in terms of software engineering benefits is worth investing in because reuse, polymorphism, and free asset generation are amazing advantages available
for computer game development. Paramount throughout decades of development has been computational efficiency. Progress here provides a one-off, lump investment that pays back dividends forever. Any breakthrough or leap of insight into cheaply obtaining a sonic behaviour can reduce the cost of all subsequent models, or make possible those hitherto infeasible. Regardless, there is an inexorable movement towards procedural audio, even if it is not much recognised by the industry at present. Even if it were not for the myriad advantages of procedural audio, there is a steady trend in the direction of run-time rather than prior processing, simply because the march of technology makes it so (Whitmore, 2009). The software engineering principle of continuous integration and revision can take advantage of deferred form to avoid expensive mistakes being fixed into code by premature decisions, which is good for game development that already stretches the limits of Agile development. The real-time requirement has always made insightful analysis and thoughtful construction part of a constant technical and creative quest. Something worth noting about spatial and temporal structure here is the need for reusable componentised objects and connecting flow, not only because this is the natural computational model but also because it is a natural, human-interfacing model that satisfies the need for expedient design and robust, incrementally improvable code. Dataflow programming already goes some way towards this, on the surface at least; languages such as Max/MSP and Pure Data are ideal during model design. Finally, the main goal is aesthetic, and some ability to discern satisfactory results (good ears) is needed. This is possibly one of the hardest qualities to value, and one of the most expensive activities is the refinement, testing, and evaluation of aesthetic quality, since it is often subjective (without metrics or value boundaries) to the point of being arbitrary. 
Like most art forms, conformity to genre norms is an overwhelming consideration under marketing
pressures, and there are precious few who dare to take a chance on artistically progressive projects, indeed the games industry is remarkably conservative in this regard. However, Reiter and Weitzel (2007) attempt some metrics and Reiter (2011) discusses elsewhere in this book the inter-modal effects within an interactive multi-media system.
FROM STATE TO PROCESS

The frequently examined question of whether game sound designers should also be programmers is perhaps misleading. They are programmers to the extent that the tools (often externally supplied closed-source “middleware solutions”) fail to allow them to express and design sonic concepts with procedural reasoning. Rather, the questions are:

• To what extent can audio programmers get away with not understanding sound design? In other words, are fixed tools developed in isolation from the creative design phase (the brick wall model) counterproductive, and if so how can continuous integration and closer working be facilitated by defining new roles in game audio, or consolidating old ones?
• To what extent have we got stuck in a paradigm that is restricting progress in sound? Can middleware products offering an impoverished event-asset model grow to include new approaches, or must they be replaced?
• What new tools and skills will be needed for next generation game sound, and to what extent can these be standardised in industry and taught in higher education?
Some of these issues have been touched upon at the AES35 and AES128 (in panel W10 for instance) and by Nicholas Fournell and myself at other venues such as Brighton Develop Conferences.
LANGUAGE AND PROCESS

In a command like:

    play(scrape.wav)

there is a definite article, a specific, singular sound which exists as a data file or area of memory containing a digitised recording. It is an allusion to an atomic event. In this unqualified case, the event time is implicitly now. We could choose to “bind” the sound to an event, deferring it until some condition is met, by saying something like:

    if (moves(gate)) play(scrape.wav)

In the case of a single, one-shot sound, the game logic is unaware of the sound length. The relationship between the audio engine and game logic is stateless, so any timing relationship between the sound and visual elements must be predefined or contrived at runtime. As a further refinement, a looped sound can be playing or stopped. In such a stateful system, an indeterminate future endpoint is explicitly given by game logic: meanwhile, a looped sample plays repeatedly. As with MIDI, this leaves the possibility of a stuck sound unless there is a safety timeout.

For decades, this has been the dominant model of game audio. Everything of significance can be reduced to a single occurrence, an event, or to a simple set of states. A multi-state example might be an elevator that can be “starting”, “moving”, “stopping”, or “stopped”. We say this is an event-based sound system, and that each event is bound to a sound asset, or to a simple postprocessed control of that asset. State transitions are themselves events. Essentially, the entire game audio system can be reduced to a matrix of event-resource pairings.

Since the turn of this century, more sophisticated approaches have appeared. Multi-state, multi-sample sources borrow from music technology the principles needed to create sampling plus synthesis (S+S), wavetable and velocity-mapped acoustic instruments. This migration of audio technology, from music synthesisers to game audio during the late 1990s and early 21st Century, can be seen as the renaissance of synthesis. The “dark age” of sampling, following an abandonment of game audio synthesis chips like the SID, AY8910 and YM2151 (Collins, 2008, pp. 31-60), is over now that native synthesis capabilities are more than adequate for realistic sound. As mentioned earlier, it is perfectly possible to approach the procedural concept with an implementation using only pre-digitised waves. The line between sampling and synthesis has never been a clear one and at present it is this area, using hybrid sampling-synthesis, that holds the most promise for transitional generation game audio technology during the changeover from data driven to fully procedural systems. Practical transitional systems employed at Electronic Arts and Disney are variations on granular or wavetable S+S, with Steve Rockett’s Blackrock team attempting some ambitious work on vehicle engines. Work done by Kenneth Young and others on the Media Molecule title “Little Big Planet” shows many of the structural and behavioural hallmarks of procedural audio as applied to combinational sound that can be configured for user generated content (UGC). While such endeavours stop short of fully procedural audio, they are a valuable step in the right direction because they establish conceptual foundations necessary for proper structural approaches and they are properly tested in publicly distributed titles. Along with dynamic reconfigurability, which has a bearing on the effectiveness of user generated content, the most important of these approaches are the transition from state/event to continuous parameterisation and the recognition of behavioural audio objects with multi-dimensional parameters. 
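The "matrix of event-resource pairings" described above can be sketched as a lookup table. The following Python is a hedged illustration only: the engine class, the elevator file names and the boolean loop flag are hypothetical, and playback is replaced by simple bookkeeping so that the structure is visible.

```python
# Event-based model: each (object, event) pair is bound to one fixed asset.
# Only scrape.wav comes from the text; the other file names are invented.
EVENT_ASSETS = {
    ("gate", "moves"): ("scrape.wav", False),
    ("elevator", "starting"): ("elevator_start.wav", False),
    ("elevator", "moving"): ("elevator_loop.wav", True),
    ("elevator", "stopping"): ("elevator_stop.wav", False),
    ("elevator", "stopped"): (None, False),
}


class EventAudioEngine:
    """Minimal stand-in for an event/asset audio engine (no real playback)."""

    def __init__(self, table):
        self.table = table
        self.active_loops = {}  # object name -> looping asset currently playing
        self.played = []        # log of one-shot assets, standing in for playback

    def on_event(self, obj, event):
        asset, looped = self.table[(obj, event)]
        # A state transition is itself an event: it ends this object's loop.
        self.active_loops.pop(obj, None)
        if asset is None:
            return
        if looped:
            self.active_loops[obj] = asset
        else:
            self.played.append(asset)
```

Note what is absent: the engine knows nothing of the gate's angle or the elevator's speed. If the "stopping" event never arrives, the loop in `active_loops` plays indefinitely, which is the stuck-sound problem mentioned in the text.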
To take the above example of the screeching gate, we might now express the condition as:
    if (change(angle(gate))) playgrains(scrape(angle(gate)))

The “event” that the audio engine main loop is waiting for is a change in the angle of the gate because a player has moved it. From here, we can dare to abandon entirely the notion of state and pretend that all objects in the world, subject to their position in a masking/priority table, are reactive and continuously parameterised. Useful mask topologies are functions for spatial and temporal relevance (player focus), geometric distance from known listener actors (machine listening objects and human players), occlusion based on raycasting, and unimpeded work (power delta as a measure of how loud the sound could be in free space). The “sound” is now no longer simply a file whose duration should be matched to the rotation time of a fixed animation: It is a command to a granular synthesiser that creates screeching sounds by picking grains from a file and replaying them. It is a sound made as a process or function. The domain of the function comprises two variables, time and angle, while the range is a time-variant audio signal. Further, the function could have hidden internal mappings; for example, the rate of change of angle (angular velocity) could be used to select different grains and timings that stimulate a stick-slip friction model to obtain an impulse signature such as outlined by Rocchesso, Avanzini, Rath, Bresin, and Serafin (2004) or in Farnell (2008). As sound code rather than audio file, advantages are better cohesion and decoupling, since no subsystem need remain aware of the state of any other (an advantage for network replication too). A disadvantage is that more data must pass between the underlying world model (geometry and physics engine) and the audio system. In use, the first obvious advantage is that the player can exert continuous control over the object movement and hear corresponding sounds. A disadvantage is that instead of a linear, sample replay routine
we now need a computationally more expensive grain/segment-based player (for the hybrid case) or a control and DSP synthesis layer for a fully procedural implementation. A further developmental disadvantage in moving beyond the limitation of events to thinking about actions and energy flows is that previously neat boundaries become blurred. A discrete time interpretation of a game makes simple, easy sense to level designers or script writers. In practice, behavioural sound objects coupled to the underlying physics engine (for example, see Mullan, 2009) and game logic may need to be presented such that the boundaries of sounds may remain, in surface appearance at least, based on indivisible events.
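The continuously parameterised gate can be sketched as a small Python object, assuming a hypothetical grain table and mapping: angle selects which grain is replayed, and the rate of change of angle sets its gain. Synthetic windowed sinusoids stand in for grains cut from a recording, and there is no stick-slip friction model here; the sketch shows only the shape of the interface, in which the audio object reacts to a parameter stream, not to a one-off event.

```python
import math


def make_grain_table(num_grains=8, grain_len=64):
    """Stand-in for grains cut from a recording: short windowed sinusoids."""
    table = []
    for g in range(num_grains):
        freq = 0.05 + 0.02 * g  # cycles per sample, chosen arbitrarily
        table.append([math.sin(2.0 * math.pi * freq * i)
                      * math.sin(math.pi * i / grain_len)
                      for i in range(grain_len)])
    return table


class ScrapeObject:
    """Reactive sound object: output is a function of angle and its rate of change."""

    def __init__(self, grains, max_angle=math.pi / 2.0):
        self.grains = grains
        self.max_angle = max_angle
        self.prev_angle = None

    def update(self, angle, dt):
        """Call at control rate; returns samples, or [] when the gate is still."""
        if self.prev_angle is None:
            self.prev_angle = angle
            return []
        velocity = (angle - self.prev_angle) / dt
        self.prev_angle = angle
        if velocity == 0.0:
            return []
        # Angle picks the grain; angular speed (clamped) sets the gain.
        pos = min(max(angle / self.max_angle, 0.0), 1.0)
        index = min(int(pos * len(self.grains)), len(self.grains) - 1)
        gain = min(abs(velocity), 1.0)
        return [gain * s for s in self.grains[index]]
```

A game loop would call `update(angle, dt)` every control tick with the angle reported by the physics engine. This illustrates the cost noted above: more data crosses the boundary between world model and audio system than a single trigger would require.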
BEHAVIOURAL AUDIO

Another advantage of procedural sound is that it can easily obviate the old problem of repetitive sound sources and the need to fake variations. If a procedural sound object is repeatedly triggered with precisely the same parameters, will it not make an identical sound? A comparison I have been fond of is between sound as a film, which can reveal behaviour, and sound as a photograph, which cannot. It accords with a dynamic interpretation of sound where causality is significant. One phenomenon here is iterated complexity, familiar to some as the “butterfly effect” (see Lorenz, 1993). Natural variation can be introduced when a function of several variables is sensitive to initial conditions. Although of course all computations are deterministic, with small deliberate variances even systems of low complexity yield large variation of output in a short time. A weakness in the film analogy is that a film also has a script. Watching the same film twice does not alter the story. Films are also snapshots (singular experiences, regardless of warped time, in the sense of Tarkovsky) in (and of) time, despite an extra dimension to play with. The analogy of a theatrical play with real actors is an improvement,
exposing the concept of deferred form which, I believe, is a vital facet of procedural audio. In this analogy the performance is “played to the audience”, perhaps a little different on each night. It’s adaptive and plastic. So, my favourite procedural sound metaphor is a football game. It encapsulates the essence of football (the players, the rules, the ball and the spectators) and, at each juncture, it is entirely causal: every move sets up every subsequent move, yet its unfolding form and precise outcome are unpredictable even though the essential experience of it remains football. Natural sounds are like this. Given identical contexts, no two snapping twigs would produce the same sound pattern. Game sound is, itself, like a game.
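The sensitivity to initial conditions mentioned above, where a deterministic system of low complexity gives widely varying output from small deliberate variances, can be demonstrated with the logistic map. It serves here only as a convenient textbook example of iterated complexity, not as a sound model from the text; the parameter value 3.9 and the starting values are assumptions chosen to lie in the map's chaotic regime.

```python
def logistic_orbit(x0, steps, r=3.9):
    """Iterate x -> r * x * (1 - x); fully deterministic for a given x0."""
    x = x0
    orbit = []
    for _ in range(steps):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit


# Two control streams whose starting values differ by one part in four thousand.
a = logistic_orbit(0.4000, 50)
b = logistic_orbit(0.4001, 50)
largest_gap = max(abs(p - q) for p, q in zip(a, b))
```

Repeating a call with the same `x0` reproduces the orbit exactly, yet the two orbits above begin almost identically and separate within a few dozen steps. Used as a control signal for, say, grain choice or excitation strength, such a map gives variation that is causal and repeatable, as opposed to random.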
Behavioural Breadth

The football game is bounded by immutable constraints, such as the players not leaving the field and flying about. It is a fair model for physical vibrations and acoustic propagation where unfathomable complexity means that no two time domain waveforms will ever be the same, yet underlying circumstances mean that two indistinguishable strikes can sound from the same bell hit in different places. This non-contradiction requires a model with constraints that allow a wide enough resulting range for the domain of stimulations (behavioural parameters), and narrow enough for the sound to be perceptually defined. We could call this the reality window. In the idiophonic case, the breadth of behavioural parameters may extend to include different excitation points or methods. For example, we may wish to construct a tin can model that can be impacted on its base or sides, be scraped, rolled or crushed, and yet still remain recognisable as the same object under different conditions of behavioural stimulation (see Vicario, 2001).
Depth of Model and the Question of Realism

Clothes you put on don’t change who you are. A soldier is more than a uniform and a doctor is more than a stethoscope. Thus it is with sampled sounds that suffer the weakness of exposing only surface features which will not withstand harder examination. One of the things people mean when describing sounds as “two dimensional” is an inability to work in more than one or two tightly constrained contexts. The idea of multi-sampling only makes sense insofar as all entity interactions can be enumerated. With combinatorial growth this clearly becomes a nonsense. Thus, sampled sounds are straw men, exposed by cursory examination. They capture the look without the feel, the appearance without the behaviour. To use a computer science term, they are brittle. The breakdown comes quickly under the functionalist gaze (if ears can cast such a thing) of a listener who cannot refrain from causal analysis (causal listening in Chion’s sense of seeking an identifying mechanism behind the surface signal). The senses are jarred and the familiarity of being as change (as Heidegger might have it) deeply upset by hearing exactly the same thing twice (it’s completely unnatural), especially in rapid temporal proximity where it may still be in echoic memory. This is the experience of the sound sample. A whole generation has grown used to it. And yet, with a sound photograph (sample) the best, most expensive microphones, studios and playback technology only help to expose the fakery as soon as the same sound plays for a second time. Doesn’t this make bogus the whole quest for “realism” in game audio? 
Such inflexibility has long been the bane of game audio developers struggling to avoid repetition and uniformity (comically depicted in an episode of “The Simpsons” where Sideshow Bob repeatedly steps on garden rakes, trapped in a “simulacra hall of mirrors”, repeating the same experience and briefly reducing the story to the level of a cheap computer game). Many systems
have been, and are still being, developed to spice up sampled sources and impart random features that seem to imitate a behavioural source. Unfortunately for this approach, the underlying features indicative of identifiable behaviour are far from random. Bregman (1992, pp. 10-36), and again Vicario (2001), point out that our sense of realism, in terms of confidence in a concrete identification with well-formed behaviour, increases with the number of instances whose variance reveals an underlying model (the more examples of a face we see the better we get at identifying its true essence). So, contrary to the goals of “increasing realism through variation”, resynthesising or treating samples to obtain random dispersions of attack time, phase, and pitch may just lead to a muddy confusion at the cognitive level, like seeing several distorted and falsely coloured photographs of the same face in the style of Warhol. No underlying behaviour that can reveal the thing in itself is conveyed. So, to answer the question “how real is it?”, I postulate two different kinds of realism. There is a superficial kind of realism, that works from one angle and relies on smoke and mirrors, and there is realism in depth. I also call the former type of surface reality aesthetic or sensible realism and the latter type of realism behavioural, characteristic, or essential realism. Sensible sources just provide the right sensations, while essential sources provide the correct perceptions, in the examined case. This is something obtained for free by a source (as opposed to signal) method. A twist to behavioural approaches, as we shall see, is that you can fake the parametric depth to a degree you can comfortably afford, in constant memory and CPU space. According to Warren (1992), Bregman (1992), and those of the cognitive schools, behavioural parameters are as important in our assessment of reality as surface features. 
We are accustomed to having our senses trick us with a small number of sample points; thus perception is a multi-layered, iterative, convergent affair: a cognitive integration. Whether multi-modal or through repeated examples through the same channel, perception in depth will only stand up if there is behaviour in depth. This can happen on a short timescale, on the acoustic and neurological time-scales in which the sounds of a snapping twig (potential predator) and a raindrop can move from sensation to perception, and then to identification in the conscious awareness of the fore-brain. Or it can happen over a medium timescale given several instances of a source held in echoic memory when we are capable of discerning fine nuances of quality, formant, and scaling (qualitative feature recognition, see McAdams & Bigand, 1992). To discern a Steinway piano from a honky-tonk usually takes more than one note. On still longer timescales that involve stronger memory formation, a collection of asynchronous sample loops, say of windy weather, may fool us for many minutes or hours before we become aware that the behaviour lacks depth. Such discernment may even require the gamer to engage in many sessions of play before the difference between repetitive, data-driven sound and “living” procedural sound is apparent. Here lies a vital development metric: if you never expect the player to play the game more than once, don’t bother with realistic depth at all. To describe the sound as more “alive” might seem strange, but in defence I must say finally on this subject of realism that there is an ineffable quality about any behaviourally designed sounds, regardless of whether the behaviour makes sense. Although I cannot yet formalise this in my own words, it is illustrated by some anecdotes and quotes and is certainly worth further study with more formal experiments. In one observation, students playing with an explosion generator and a billiard ball collision simulator simply did not want to stop. They knew instinctively the difference between this and a system that was just playing back a wide choice of samples. 
(Perhaps this can be accounted for by novelty against a generational experience of samples.) Without any visual accompaniment the
sounds alone, like a musical instrument, seemed engaging enough to just keep pressing the buttons (in a way that nobody would ever do with a sample replay). One described it as “satisfying”, another as “moreish” (addictive). Another, more academic, explained that the continual subtle difference was what forged its identity and begged further exploration (I suppose in the spirit of Deleuze) as opposed to the anaesthetic, throw away emptiness of repetitious samples. One emotional testimony stays in my mind from an online discussion in 2005 about synthetic game sound. It remains inspiring to my work even today. Cosinusoidal (screen name) writes: “Real synthesis in computer games is something I imagine every night to help me drift off into dreams/sleep. I remember being 7 years old and playing C64 computer games and the fact that the computer was some how alive making these sounds brought tears to our eyes. Hopefully games developers will realise that real synthesis is profoundly more immersive than samples could ever be, today reading your views on the same thing has gave me hope [sic] that computer games might one day return to synthesis methods.”
Plausibility and Edge Cases

A sonic object captures some feature of a real source when one or more parameters can be ascribed corresponding to a real physical variable. For example, the order of a low pass filter combined with the delay time of a ground image crudely captures the distance of an aircraft where height and all other variables remain fixed. It sounds real because it directly matches something that happens in reality (multiple path destructive interference causing a comb filter sweep). Beck’s (2000) “acoustic viability” captures the behaviour of a synthetic musical instrument once the parameter space accords with our sensory expectations. These expectations can be set up a
priori, through visual priming, by experience, or on a short timescale by the “rules of auditory scene analysis” (McAdams & Bigand, 1992) which amount to an innate understanding of physics (and by Chion’s (1994) rules of audio-vision synchronisation). I like to extend this idea to encompass “behavioural plausibility” in general. As an example, an innate physical behaviour is that all systems are in energetic decay unless a new source of energy is supplied. A bouncing ball must decay (decreasing period between collisions), because if it makes sound there must be loss, ergo less kinetic energy, and thus shorter subsequent bounces. We know this at a deep level, learned through exposure to a lifetime of physics, or perhaps the result of inherited structure (in the sense of Chomsky’s innate grammar disposition). A ball with increasing energy (apparent through increasing collision intervals, perhaps because a force is silently being applied to it by a basketball player) seems to be playing backwards, even though the energetic decay of individual collisions is correct. Thus the larger scale behavioural feature dominates the smaller scale one. This interpretation of energetic growth, decay or maintenance is given by Pierre Schaeffer (discussed by Miranda, 2002, p. 127). Where is the limit of this behavioural realism? At what point does real become real enough? The answer, in our context, is where it serves its purpose for arts and entertainment. Once the depth of object behaviour is good enough we need go no further. Using super-computers and many hours of computation to produce sounds misses a crucial point. There is a middle ground between improving upon sampled sounds and a scientific simulation on which lives depend. 
Failing to distinguish these goals and chasing brute force implementations with computationally precise, but sensibly inaccurate, or perceptually irrelevant results is a mistake in my opinion (unless you are actually performing simulations for civil aircraft noise abatement purposes). Knowing which are the relevant parameters is what the art of practical
procedural audio is all about (for further discussion on the limits of realism and definitions of perceptual realism, see Grimshaw, 2008; Grimshaw & Schott, 2008; Hug, 2011).
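Returning to the aircraft example at the start of this section, the "delay time of a ground image" can be computed from simple geometry by the image-source construction. The Python sketch below is an assumption-laden illustration: it supposes flat, perfectly reflecting ground with no phase inversion and a fixed speed of sound, and the function names are invented for the example. It shows why one physical variable, distance, sweeps the comb filter.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def ground_image_delay(source_height, listener_height, horizontal_distance):
    """Extra travel time (s) of the ground reflection relative to the direct path."""
    direct = math.hypot(horizontal_distance, source_height - listener_height)
    # The reflected path equals the straight line to the listener's mirror image.
    reflected = math.hypot(horizontal_distance, source_height + listener_height)
    return (reflected - direct) / SPEED_OF_SOUND


def first_notch_hz(delay_s):
    """Lowest cancellation frequency when a signal is summed with a delayed copy."""
    return 1.0 / (2.0 * delay_s)
```

As the aircraft recedes at constant height, the two path lengths converge, the delay shrinks and the comb notches sweep upward. A single distance parameter can therefore drive both this delay and the low pass filter mentioned in the text.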
SCHOOLS OF DESIGN

I would like to talk a little about the “how?” of procedural audio. Rather than dwelling on the dry, mathematical subject of synthetic methods it seems more fun to cluster the use of methods into schools of thought. This sideways glance may provoke some thoughts about application of synthetic methods in procedural systems because each exposes a different view of abstraction and control.
Concrete

I mention this first because the liberation of sound design requires at least a few more punches to the jaw of orthodoxy, if only to wake it up. Let the concrete school represent all that we are trying to attack and reject, the use of recorded samples and the attendant culture of infantilising commodification (“one problem one product”). I like the term concrete, as it derives from musique concrète conveying the spirit in which this technique is usually executed. The real principle of musique concrète is the juxtaposition and recontextualisation of real sounds. Since this is obviated by game audio practice that demands a gunshot represents a gunshot and a falling rock represents a falling rock, it is, in the words of Rand (1971): “Blades of grass glued onto a piece of paper to represent grass”. Thus the value of concrete technique, in technical and traditional sound design, is the substitution of one sound for another. Breaking bones represented by a snapped carrot, tearing flesh by a cabbage. We can learn a lot from sonic metaphor, about which features of signal A make it a useful perceptual substitute for signal B. In other words, a study of sound metaphor exposes a
powerful analytical technique. Perhaps this paucity of imagination is where game audio, driven by an adolescent sense of inadequacy in the “realism” department (which I hope we have established is largely irrelevant anyway), differs from film. The way that game audio will break away from its older brother’s shadow may be to break into a fresh abstract and modernistic movement, not rejecting the concrete in a reactionary way, but finding a proper voice for the living, reactive qualities that the video game can offer as a form.
Essential

Essentialists are preoccupied with modelling exactly, one to one, those physical features of reality that have sonic effect. For hard proponents such as Hiller and Ruiz (1971) or Zheng and James (2009), brute force modelling from elementary equations of material and fluid dynamics is the paradigm. Moderates such as Smith (1992), Cook (2002), Bilbao (2009), and Karplus and Strong (1983) are happier with approximations offering computational efficiency or analogous behaviour. Common analogous methods are waveguides, which are finite or recursive filters designed as acoustic models of spaces; finite element modelling; mass spring damper arrays, used prima facie to model energetic propagation on a per point basis; and tensor arrays aimed at solutions over a continuous manifold. They are, of course, inherently causal. Beyond sound modelling, one can identify radical essentialists who seek not just depth but full concurrency and coherence with the audiovisual, a high ideal in which the sonic and visual characteristics of a modelled object/process are a product of the same underlying behaviour (and computation). Here we would, for example, use the same fluid dynamic equations to render a view of rippling water waves and the sounds made. This breed of essentialist wants to recreate the real world within a computer and loves helium cooled supercomputers, extravagant research budgets and William Gibson novels.
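As an illustration of the "moderate" position, the plucked-string algorithm associated with Karplus and Strong (1983) replaces a physical string with a recirculating delay line and an averaging filter. The Python sketch below is a minimal version under stated assumptions: the function name, the extra `damping` factor and the seeded noise burst are choices made for this example and are not taken from the chapter.

```python
import random


def karplus_strong(frequency_hz, duration_s, sample_rate=44100,
                   damping=0.996, seed=0):
    """Plucked-string sketch: a noise burst recirculating through a delay line."""
    rng = random.Random(seed)
    period = max(2, int(round(sample_rate / frequency_hz)))
    # The delay line starts full of noise: this is the "pluck".
    line = [rng.uniform(-1.0, 1.0) for _ in range(period)]
    out = []
    for i in range(int(duration_s * sample_rate)):
        current = line[i % period]
        following = line[(i + 1) % period]
        out.append(current)
        # Averaging adjacent samples is a gentle low pass filter, so high
        # partials die away first; damping (an added assumption) adds loss.
        line[i % period] = damping * 0.5 * (current + following)
    return out
```

The delay length fixes the pitch and the averaging step gives the frequency-dependent decay. No wave equation is solved, yet the result behaves plausibly like a string, which is the trade of exactness for efficiency described above.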
However, in game audio, this becomes a seductive goal. Take again the case of the bouncing ball. Sonically we do not wish to model the nuance of kinetic, potential and elastic energy exchange; for entertainment purposes (good enough synthesis) we are satisfied with a fair approximation to a timing curve and an adequate model of the modes of a sphere excited at an exterior point. While the ball is unseen (acousmatic) no real advantage accrues. The payoff comes when we have accompanying graphics provided by a physics engine. Some of the parameters needed for our synthesis are available for free, with the added bonus of perfect synchronisation. So, up to a point, a hard essentialist’s goal is a noble one for computer games. At what point does it become pathological? The question is whether over-computation is happening: are we calculating irrelevant information? It’s hard to draw a definite line, but using information bandwidth studies from HCI as a yardstick, once the complexity of the model greatly exceeds the information that can be acquired by a person, we’re probably going down the wrong road.
Behavioural

Behaviouralists are concerned with rendering convincing facsimiles of sonic phenomena by understanding the underlying physical behaviour, though not necessarily modelling it directly. The behaviouralist position admits psychoacoustics and perceptual science into its framework as a necessary component. The aim is to produce perceptual realism (effect, recognition, semantics, and emotion). The mechanism of the underlying process is relevant, though we are concerned with data reduction, ignoring all except those parameters and control systems that are most important. An analogy may be drawn to Searle’s Chinese Room or to the Turing Test. Treated as a black box, so long as we obtain consistent and plausible results, and so long as the effects are subjectively correct, the internal functions are unimportant to our judgment. As we apply more and increasingly difficult tests, the plausibility of the results may eventually break down. Behaviouralists know up front that their models will only work for a predefined range of uses. For the bouncing ball, we might choose to ignore all but a pair of material parameters and the initial height from which the ball falls. Provided that no other changes occur (the surface stays fixed, and gravity doesn’t change), everything else is irrelevant to the subsequent sound pattern. Constraints must be drawn up a priori and be satisfied throughout the sounding, which somewhat breaks the conditions of spontaneity when we make a simplification that “bakes in” some future behaviour for a while.
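The behavioural bouncing-ball model described above can be sketched in a few lines. This is only an illustration under stated assumptions, not a method taken from the chapter: the function name `bounce_events`, the use of a single coefficient of restitution as the material parameter, and the `min_speed` cut-off are all hypothetical choices. Gravity and the surface are held fixed, so the whole timing curve is "baked in" once the height and restitution are known.

```python
import math

def bounce_events(height_m, restitution, gravity=9.81, min_speed=0.05):
    """Return (time_s, impact_speed_m_s) pairs for successive impacts.

    Assumptions (hypothetical, for illustration): a point mass dropped from
    rest, a fixed rigid surface, constant gravity, and a constant
    coefficient of restitution in [0, 1).
    """
    if not 0.0 <= restitution < 1.0:
        raise ValueError("restitution must lie in [0, 1)")
    time_s = math.sqrt(2.0 * height_m / gravity)   # duration of the first fall
    speed = math.sqrt(2.0 * gravity * height_m)    # speed at the first impact
    events = []
    while speed >= min_speed:
        events.append((time_s, speed))
        speed *= restitution               # speed retained after the collision
        time_s += 2.0 * speed / gravity    # up-and-down flight to the next impact
    return events
```

Each returned pair could trigger one excitation of a sphere model, with the impact speed scaling its amplitude. The intervals between impacts shrink geometrically, which gives the familiar accelerating rattle without any further physics.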
Phenomenal

The school of phenomenology (in the sense of Husserl and Merleau-Ponty) might say: “If it works, use it”. That is, it does not even matter whether the underlying process matches some counterpart in reality, or is understood, so long as its facade yields passably aesthetic results. The senses are all. Some of Miranda’s (2002) experiments with cellular automata (with improvements by Serquera, 2010) could be seen as phenomenal synthesis since, although certain generative systems can be discovered yielding astonishing sonic effects, we don’t have any mapping between their parameters and sensible, acoustically viable parameters. The weakness of phenomenal synthesis is that it is brittle: it works for a small range of sounds, or islands, and it easily breaks when pushed too far. Its advantage is that if we only pay attention to the surface features of a sound we can do unusual synthesis cheaply, as in Serquera’s multi-voter method, a way of herding the CA cats into a pattern that appears to be a natural process. Yet careful, reasoned use of phenomenal technique crops up frequently in practical procedural design, and is vital in many places. A chirp impulse replacing an impossibly loud transient, the use of noise bursts in critical band masking, and grain dilation to increase perceived loudness are all psychoacoustic sound design tricks that work at the sensation or perceptual level. They are stock tricks known to sound designers rather than features of a well-modelled reality.
Metaphysical (Purely Abstract)

The metaphysicists would have it that there is a deeper order lying behind even the hard essentialist position, that ultimately mathematics alone can provide what we need, often from simple and elegant roots. The seductive idea that a formula can fall out of the air to replace thousands of cycles of computation provokes intense debate. This ranges from the comparatively concrete, order awaiting discovery (within geometry and topology, with the effect of furthering understanding of signals), through the “bigger things” of which Gauss (1882) speaks, or Feynman’s “chequer board”, to the frankly mystical. If our bouncing ball can obtain a shortcut to spherical modes by Riemann, and a handy computable identity leading to a fast way of making struck solid sphere sounds, that’s wonderful progress. Maybe things turn out to be simple in the end. My bet is that we’ll need to work hard for results for a lot longer yet. But elegant formulas don’t always give elegant methods. The arbiter is computability: not mathematics but, to the extent that our von Neumann machines are able, algorithmic complexity. Waveguide and finite difference methods may be expensive, but they are reliable and provable. When stability becomes an issue in such systems, at least they are able to assert good behaviour for proper conditions. A series that converges too slowly, grows too fast for a set of variables, or has too many expensive terms might never match a crude bunch of heuristics stored in a lookup table in terms of reliability. Ideas of innateness, intrinsic order and Platonic truths give promise of natural models with little computational work. They include self similar fractal geometry, L-systems, cellular and genetic automata, recursive noise, and chaotic systems. All have found use in computer graphics (in procedural geometry and textures) and will certainly yield new audio synthesis methods in time. In some cases they seem to accord with natural processes in an uncanny way. In practice, exact relations to clearly stated and understood natural processes are often open mathematical questions. Many operate on the coarse-grained, emergent level, which is not to say purely aleatoric/stochastic. Contrast this with the precision of the wave equation methods. For this reason there is some overlap in use with the phenomenal approach (it works but we don’t care why), though their ideologies are different.
Pragmatic

A pragmatic stance admits all of the above thinking, using every tool in the box to achieve a goal and bringing the approaches together without insistence on purity of method or on some overarching ideology. This requires the synthesist to understand at least a little of each and their relationships. What influences adherence to one or another way of thinking are the constraints imposed by a production context. For animation and movies, one may, in the spirit of Takala & Hahn (1992), prefer to adopt full-blown physical modelling with the luxury of powerful computers, plentiful budgets, and an offline deadline. On the other hand, an embedded musical greetings card may harness unusual, shallow processes like shift register composition and direct binary instruction (Miranda, 2002) to create useful sound with negligible CPU cost/complexity. Computer games, for the next decade, will probably exploit procedural audio techniques with a behavioural slant towards signal/parametric methods, with essential/source methods appearing as cost allows. This will be driven by the need for a compromise between deep process and efficient code that may run in real-time. Open frameworks that allow artists to play with cheap, mixed methods, and allow programmers to re-write methods, are likely to be the most successful, rather than any grand unified scheme. Diversity and openness are the watchwords for the future of game audio.
STRUCTURE, SPACE AND CAUSALITY IN PROCEDURAL SOUND

We now move on to a deeper discussion of sound object design. In chapter 3 of Designing Sound (Farnell, 2008), the basic concepts of physical sound were explored, at least as they pertain to rigid body vibrations in objects whose size and shape remain fixed. Within the framework of current game physics almost all such sources are taken to be the result of a collision and thus the energetic source is kinetic. Shape (Kunkler-Peck & Turvey, 2000; van den Doel & Pai, 1998) and material constitution are established as a starting point for the synthesis of idiophonic sounds. That a particular sound corresponds to a set of material properties, shape, size and excitation pattern and position is confirmed by many whose work in modal and waveguide methods produces excellent results. A deeper investigation of the role of structure would make a fascinating thesis on its own. The question of whether such a correspondence is strict (injective/one-to-one and surjective/onto) is left aside. Benson (2007, pp. 119-120) references the work of Gordon, Webb, and Wolpert (1992) regarding the Dirichlet spectrum of homomorphic plates. In short, it’s possible for different shapes, similarly excited at different points, to sound identical. This leads to a useful simplification where, for some object, we can ignore the physical arrangement of different sub-parts and consider only their connectivity, like a net-list that represents only their logical relationships, not their spatial relationships. This is one essential feature of modal synthesis, an overview and bibliography for which is given in Adrien (1991), with a more recent discussion of modal methods in Bilbao (2009).
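To make the idea of modal synthesis concrete, the following is a minimal sketch that sums exponentially decaying sinusoids, one per mode. It is an illustration only: the function name `render_modes`, the tuple layout, and the `size` argument are assumptions introduced here. The `size` argument relies on the assumption that, for geometrically similar objects of the same material, modal frequencies scale inversely with linear dimension. Note that the modes are simply summed, so nothing in the code knows where the sub-parts sit in space.

```python
import math

def render_modes(modes, duration_s, sample_rate=44100, size=1.0):
    """Render an impact sound as a sum of decaying sinusoidal modes.

    modes: list of (frequency_hz, amplitude, decay_time_s) tuples.
    size:  hypothetical linear scale factor; every modal frequency is
           divided by it (assumes a geometrically similar object).
    Returns a list of float samples.
    """
    n = int(duration_s * sample_rate)
    out = [0.0] * n
    for freq_hz, amp, decay_s in modes:
        # Phase increment per sample for this mode.
        w = 2.0 * math.pi * (freq_hz / size) / sample_rate
        for i in range(n):
            t = i / sample_rate
            out[i] += amp * math.exp(-t / decay_s) * math.sin(w * i)
    return out
```

For example, `render_modes([(440.0, 1.0, 0.1), (1130.0, 0.4, 0.05)], 0.5)` would produce a short two-mode strike, and passing `size=2.0` would lower both modes by an octave. The mode values here are arbitrary placeholders rather than measurements of any real object.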
We have an innate understanding of size and scale as revealed by sound. Modal parameters that change with size for constant shapes (scaled eigenvectors) are shown by some experiments (Kunkler-Peck & Turvey, 2000) to be universally interpreted. Changing parametric scale (such as speeding up or slowing down playback rate, or shifting formants in a fixed ratio) to indicate a change of size is a common sound design technique. For modal synthesis this technique is understood as a means to change the apparent size of simple rigid bodies. For an excellent annotated bibliography of psychoacoustic aspects see Giordano (2001). Moving on to more complex sonic objects, large scale structures (in which propagation cannot be taken as uniform) and heterogeneous, polymorphic, composite, and variable size objects all present a challenge. To summarise the work above, and that presented in Designing Sound, it is sufficient to say that we can understand and synthesise sounds from objects that change in shape or size (like poured liquids), or have non-linear discontinuities (such as the twanged ruler against a table). As the size and complexity of an object increases we can no longer treat it as a collection of modal resonators connected without concern for propagation. We must consider the journey of sound waves through a series of sub-objects, from some excitation point or temporal origin, towards the listener’s ear through a radiation surface, intervening medium and acoustic context. At this point we introduce causality and the flow of energy into our model. Space, viz. size, now becomes relevant to the modal frequencies and to the time domain structure. An example is a ticking clock model in which power is transferred from a sprung store of elastic potential energy along a series of interconnected cogs and wheels towards a final radiator, which is the face and hands of the device.
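The causal flow of energy in the clock example can be sketched as a chain of stages, each adding a propagation delay and passing on only part of the energy it receives. This is a toy illustration under stated assumptions: the function `propagate_tick`, the `CLOCK` stage list, and every delay and transmission value are hypothetical placeholders, not figures from any real mechanism.

```python
def propagate_tick(tick_time_s, energy, stages):
    """Pass one escapement tick along an ordered chain of sub-objects.

    stages: ordered list of (name, delay_s, transmission) tuples, where
    transmission in (0, 1] is the fraction of energy handed on.
    Returns a list of (name, onset_time_s, energy) excitation events.
    """
    events = []
    t, e = tick_time_s, energy
    for name, delay_s, transmission in stages:
        t += delay_s        # sound takes time to travel through the part
        e *= transmission   # each coupling passes on only some of the energy
        events.append((name, t, e))
    return events

# Hypothetical chain: spring-driven escapement -> gear train -> radiator.
CLOCK = [
    ("escapement", 0.0, 1.0),
    ("gear train", 0.0004, 0.5),
    ("face and hands", 0.0009, 0.3),
]
```

Calling `propagate_tick(1.0, 1.0, CLOCK)` yields one excitation event per sub-object, in causal order and with diminishing energy. Each event could then excite a separate modal model of that part, which is what keeps the parts synchronised rather than sounding independently.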
Each sub-object can be modelled modally, using a variety of methods (additive, subtractive, or non-linear), but the overall behaviour that makes the sound object a clock, as opposed to a collection of cogs dropped onto a table, is the synchronous and causal relationship between the parts. This Newtonian correctness can be extended to acoustic models like vehicle engines, best suited to waveguide methods, or abstracted to a control level, such as for the bouncing ball that loses energy. Application at the control level should be of particular note to designers of audio systems linked to game physics engines, as this represents the perfect interface of simple classical physical models (elasticity, mass, damping, friction, rolling and other common behaviours) to audio DSP. Ideally, such parameters should be exposed at the sub-millisecond refresh rate or at audio block frequency. At a design level there are many advantages to componentised construction. Sound designers have already dreamed of modular systems in which they can create objects by combination. The software systems Cordis Anima, Modalsys, and Mosaic were the musical forerunners of newer systems that allow plug and play modelling. An advantage, explored in my work constructing game objects like guns, rocket launchers, and vehicles, is that one can obtain unexpected behaviours for free. For example, once a weapon body is constructed, reload sounds are available at no extra computational cost. Likewise, construction of a car chassis and bodywork to get the correct engine filter implies the availability of door sounds with little further work. Newtonian simplifications need more attention once we enter a real game scenario. Causality is often represented (in duplicate) at a higher level of abstraction in game systems. Above the physics engine, the collision or action system often maintains a causal trace, an actor-instigator chain, in order to make logical gameplay decisions.
The “one hand clapping” problem (in which we have to ask which of two objects, both of which are mutually excitor and resonator, such as two colliding billiard balls, is the source of sound) is a false dilemma imposed by the faulty logic of non-relativistic representation. Unless one object
is statically coupled to a significant radiator, for the instantaneous sound excitation we should consider symmetrically only the respective structures and the velocity (total kinetic energy) with which they are brought together.
Good Behaviour for Structural and Causal Systems

Perhaps a way to appreciate the importance of proper structure and causality is to consider a case where they are not observed, and the consequences. Often we play with wild settings of a synthesiser and discover a wonderful result, a subversion of a tuba that suddenly sounds exactly like a motorbike. I sometimes call these “islands”, because they are disconnected in behavioural timbre space. This can be seen from two viewpoints, physical and psychoacoustic. Let’s concentrate on the psychoacoustic interpretation first, where we have activated a higher level recognition schema, albeit a false one. This may be due to a spectral match, an associative match (metaphorical or similar) or a partial behavioural (mechanistic) match. Whatever the basis of the match, once identified and without further information, the tuba is a motorbike until we know more. In the analysis of Vicario (2001) these mismatches are interpreted in a Kantian phenomenological sense. He describes a typical causal identification error: the sound of rain on a window that turns out to be branches rattling against the glass. Partial behavioural matches are intriguing as they form one of the pillars of traditional sound design: shaking an umbrella for bird sounds, crunching vegetables to make the sounds of breaking bones. An object that displays some subset of the behaviour of another can often be coerced to produce signals easily mistaken for the target, especially when supplemented with confirming visual stimuli. The trick here, for the sound designer, is to identify those behavioural parameters which might exaggerate or counterpoint a desired artistic direction; as Randy Thom likes to remind us, the narrative is paramount. Knowing this, such insight can equally be applied to the performance of a synthetic model. In Vicario’s (2001) example it is not until further sensible data is obtained that an identification error is revealed. The apparent sound of rain on the window is experienced as rain until the curtain is drawn back to reveal a cloudless sky and leaves brushing against the window in a breeze.

Let’s turn to the physical domain now, where we can see the source of the confusion. A motorbike and a tuba share much in common, with a long tubular exhaust system driven by acoustic pulses. Any source, whatever its mechanism, that happens to coincide with the spectrum of the motorbike will, when taken entirely in isolation as a static spectrum, be a motorbike. The moment of truth comes when we attempt to move one of the generative parameters. The erroneous assumption that a correct causal model (in terms of structure and scale) exists is revealed. Structure is wrong for all other points in the behavioural parameter space. As soon as the pitch rises, the motorbike transforms into a tuba again and the deception is exposed. We can almost always find isolated matches to static examples. That is to say, given an arbitrary synthesiser with a small number of arbitrary parameters and a timbre space that includes the target sound, there are successful methods of converging on the parameter set necessary to mimic the target (Yee-King & Roth, 2008). One of the fascinating things about the system of Yee-King and Roth, which uses genetic algorithms to approach the best approximation for a given time domain example, is that it can find unlikely candidates within the timbre space. It can find islands that are entirely brittle and bear no resemblance to the target sound.
One question I have put to Roth (with whom I currently share a laboratory) is whether, given a structurally and causally well-formed trumpet model and a target snapshot of a trumpet note, the system would converge on a parameter set congruent with the performance space of the trumpet. I strongly suspect the answer is no (given just one example). Roth agrees, but suggests that convergent parameter estimation may work for higher dimensional performance spaces too and that given two or more examples of trumpet notes, and thus the ability to form lines then planes within the performance parameter space, it would do so. Indeed, this is what we would expect of such a multi-dimensional adaptive system. We call them neural networks. It’s what the brain of a musician does while learning to play an instrument.
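The kind of static-snapshot matching discussed above can be illustrated with a toy search. This is emphatically not the system of Yee-King and Roth: it is a simple mutate-and-select loop, a much cruder relative of a genetic algorithm, run against a made-up two-parameter "synthesiser". The names `toy_spectrum`, `spectral_error` and `evolve`, and the parameters `brightness` and `evenness`, are all hypothetical.

```python
import random

def toy_spectrum(brightness, evenness, n_partials=8):
    """Hypothetical two-parameter synth: amplitudes of the first harmonics.

    Odd harmonics fall off as brightness**k; even harmonics are further
    scaled by evenness.
    """
    return [
        (brightness ** k) * (1.0 if k % 2 else evenness)
        for k in range(1, n_partials + 1)
    ]

def spectral_error(a, b):
    """Sum of squared differences between two amplitude lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def evolve(target, generations=200, population=20, sigma=0.1, seed=0):
    """Mutate the best parameter pair and keep any candidate that improves."""
    rng = random.Random(seed)
    best = (rng.random(), rng.random())
    best_err = spectral_error(toy_spectrum(*best), target)
    first_err = best_err
    for _ in range(generations):
        for _ in range(population):
            # Gaussian mutation, clamped to the valid parameter range [0, 1].
            cand = tuple(
                min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in best
            )
            err = spectral_error(toy_spectrum(*cand), target)
            if err < best_err:
                best, best_err = cand, err
    return best, best_err, first_err
```

A call such as `evolve(toy_spectrum(0.6, 0.3))` searches for parameters reproducing one static spectrum. The error can only fall or stay level, but a good match to a single snapshot says nothing about whether neighbouring parameter values behave like the target, which is the point at issue in the text.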
Parameters and Performance

Let’s distinguish time invariant or fixed parameters from behavioural parameters. The fixed parameters for a particular piano note are generally taken as fixed within the duration/lifetime of that sound (although they may themselves have time variance, such as envelope settings). In music we sometimes call the behavioural parameters the performance setup, for instance pedal pressure or keyboard scaling. These describe how higher order parameter changes affect each note or each instance of a sound. The fixed parameters would be the oscillator levels, filter settings and suchlike, while the behavioural parameters are those that change during performance. A well-formed parameter space provides a behaviour captured by the fewest salient variables while allowing the greatest sensible range. For a piano, it’s how hard you hit the key. That’s all; there is no need to alter the weight per unit length of the string or the size of the sound board. A pianist doesn’t need to know that. The interface matches the performance use case, giving no more and no less control than required. Imagine if a piano offered an array of levers for string tension and hammer hardness that had to be set up before pressing each note! A model for raindrops offering ten filter values at its interface would be less useful than one offering only two relevant controls for size and velocity, provided that the size and velocity controls work over a proper range of values. Part of the work of an object designer is to wrap up or “collapse” (often many) parameters into a small set of useful ones. Realistically, for the rain example we would like to pick up only one or two variables from the game engine: how much it is raining (maybe flux in drops per square metre per second) and the material texture tag of an object which will sound (and perhaps this would be discovered automatically from nearby objects within a hearing radius). Parameter range should be conformal and continuously differentiable in the sensible space, that is to say, it should contain no poles or zeros where some combination of parameters causes an anomalous signal output. In other words, we want a model with the fewest meaningful parameters, which for all settings will smoothly produce the correct sonic behaviour. Such a model would at least satisfy the criteria of Beck’s (2000) “acoustic viability”, which I shall paraphrase as a “simple set of parameters that work with a consistent underlying physical process”.

Concerning hybridisation, the subject that was fashionably called morphing in the late 1990s: does this mean that, given two parameter points along a presumably continuous behavioural line, a listener would correctly place a new example? The work of Wessel (1973) and Grey (1975) on classical instrument timbre space seems to say that if sounds are hybridised by simple mixing of spectra or interpolation of envelope curves then the perceptual interpolation is relatively smooth and ‘navigable’. That is to say, given a trumpet and a flute, somewhere in the middle, even though there are numerous candidate parameter combinations, there will be an area of “flumpets” and “trutes” (and also two clear flute and trumpet boundaries with hysteresis such that the matching space is trisected). Yet the more complex the synthesis becomes, the less likely it is that perceptual/behavioural interpolation can be achieved.
Even if isolated points of sensible accordance can be found, unless the model is causally and structurally defined all other points fail, even those close to the working ones. Often the parametric space between superficially similar sounds just gives noise. The FM method can have parametric spaces where this happens.
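Returning to the rain example earlier in this section, the idea of collapsing many internal settings into two exposed controls can be sketched as follows. Everything here is a hypothetical illustration: the function `rain_parameters`, the `MATERIALS` table and its values, and the particular gain curve are assumptions rather than part of any real engine or of the author’s models. The gain mapping is chosen so that it stays smooth and bounded for every non-negative flux, in the spirit of the requirement that the sensible range contain no anomalous points.

```python
# Hypothetical material table: tag -> (filter centre frequency in Hz, resonance Q).
MATERIALS = {
    "glass": (3200.0, 8.0),
    "metal": (1800.0, 12.0),
    "leaf": (900.0, 2.0),
}

def rain_parameters(flux, material, area_m2=1.0):
    """Derive internal synthesis settings from two exposed controls.

    flux:     rain intensity in drops per square metre per second.
    material: texture tag of the surface being struck.
    Returns a dict of internal parameters for a (hypothetical) rain synth.
    """
    if flux < 0.0:
        raise ValueError("flux must be non-negative")
    centre_hz, q = MATERIALS[material]
    return {
        "drops_per_second": flux * area_m2,
        "filter_centre_hz": centre_hz,
        "filter_bandwidth_hz": centre_hz / q,
        # Smooth and bounded in [0, 1) for all flux >= 0.
        "gain": flux / (flux + 100.0),
    }
```

A game engine would then supply only `flux` and a material tag, for instance `rain_parameters(50.0, "glass")`, while the designer remains free to change the internal mappings without touching the interface.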
DEVELOPMENT ADVANTAGES OF PROCEDURAL AUDIO

I will here present only a short overview of the developmental advantages of procedural audio since I have dealt with these in some detail elsewhere. As mentioned above, the side effects of obtaining behaviour for free, as a result of object design, are wonderful. This approach does, however, raise some potentially annoying, though not insurmountable, artistic problems. Coupling between existing abstractions might be problematic. Changing one part of a model may change others in unexpected ways, so object classes might have to be overloaded for special cases and one-off events. The potential solution to combinatorial asset growth is a benefit that alone may be enough for developers to embrace procedural audio. Where space is the issue, something we have not talked about yet is source compactness. As code, sound can be stored and transmitted with orders of magnitude better space efficiency (say 10 kB instead of 10 MB). Its value in replicated network games and mobile applications is high. This is topical at the present time in the UK, where the gold-rush to unbounded 3G bandwidth growth has hit a wall, leaving many business strategies without a paddle. Suddenly data reduction and efficient representation are on the agenda again, at least for casual mobile gaming. Overall cost also improves compared to sample playback (which must grow with linear cost) above about 300 concurrent sources.

In computer graphics, there is a long-established concept of visual level of detail. Objects at a distance, fast moving objects, or those of low relevance are drawn in a cursory fashion using techniques of texture mipmapping and partial rendering. Especially relevant to computer game production is the idea of LOAD, a concept I am hoping to develop further in consultancy for the Thief title with audio director Paul Weir. This is made possible in procedural audio because, while data playback incurs a fixed cost per source, a computational source may have variable cost. Also, the application of a perceptual audio model to a game in which AI enemies respond primarily to sound seems an obvious direction in virtual (simulated) machine listening research.
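The variable cost per source that makes LOAD possible can be illustrated by letting a modal source render fewer modes as it becomes more distant or less relevant. This is a minimal sketch under stated assumptions: the function `modes_for_source`, the inverse-distance rule, the `near_m` radius and the `relevance` weight are all hypothetical, and a real scheme would more likely be driven by a perceptual model.

```python
def modes_for_source(full_modes, distance_m, relevance=1.0, near_m=2.0):
    """Pick how many modes to render for one source.

    Hypothetical rule: full detail inside near_m, then detail falls off
    inversely with distance, scaled by a 0..1 relevance weight.
    At least one mode is always kept so the source stays audible.
    """
    detail = relevance * min(1.0, near_m / max(distance_m, 1e-9))
    return max(1, round(full_modes * detail))
```

If the modes of each object are stored in order of decreasing perceptual importance, rendering only the first `modes_for_source(...)` of them gives a cursory version of the sound at a fraction of the cost, which a sample player with its fixed cost per source cannot do.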
Artistic Advantages of Procedural and Behavioural Thinking

Though I am fond of advocating procedural game audio from a technical position, it seems inescapable that there are numerous artistic advantages too. These have been difficult to formulate, and it has sometimes been extremely hard to find a sympathetic audience for them, despite my also being a sound artist and recognising the ways we have become stuck as sound artists, unable to move forward artistically until the limitations of samples have been surpassed; as I put it earlier, shaking off the concrete and dreary realism. The goal of sonic structuralism and formalism here is only to open a doorway to effective computational models. Once that door is open I think the results will be spectacular, not just technically but artistically. An accusation often levelled by sceptics of computational sound is that it’s too complex for artists to deal with. Oddly, it’s never actual artists or practitioners who are saying this. My thought is that the same was said of Pixar by pen and ink artists, just as the same was said about word-processors. And it is not just a question of tools. It is insulting to artists to assume the requisite concepts for advanced audio are beyond them. It is from here that the most creative of the next generation of sound designers will come to enjoy manipulating and extending the exposed structure and functions available at the new frontier of sonic arts. In many ways the existing event-asset paradigm is as arbitrary as any other. Much creative energy is currently spent learning proprietary interfaces, and getting around the limitations of what are simplistic, inelegant tools. Something I classed previously as a technical development advantage, dynamic level of detail, is also something to be seen as an aesthetic opportunity. Many remedial tools, which exist because the philosophy of game audio is still in essence brute force, might be dispensed with. In a scene of sufficient complexity the procedural approach allowing LOAD is to be preferred, not only because it becomes more efficient, but because it also offers a better quality of sound. This “sparse” quality accrues from the ability to select psychoacoustically relevant structures, thus dovetailing with existing sounds, whereas, with sample playback, there are ultimately limits to unbridled superposition. Game audio is frequently criticised as too dense. Designers are constantly badgered to produce “Big” sounds which are then clumsily overused. The loudness wars have run out of earth to scorch and there’s nowhere left to go. The limits of superposition, grey goo mixes (see also Hug, 2011), can be remedied only by taking something away, which is only possible if you have something to take away. Because procedural mixes are constructed from atomic contributions, it is possible to use psychoacoustic prioritisation (similar to Bark band masking used in MP3 compression) to keep mixes sparse and clean while reducing CPU cost at the same time (for example see Moeck, Bonneel, Tsingos, Drettakis, Viaud-Delmon, & Alloza, 2007). This allows space for significant events to be “punched through” under artistic control. Remedial techniques employed in engines like the Wwise and FMOD middleware can selectively duck and filter sources to achieve focus.
Beyond about 300 sources, the real-time dynamic processing required to tame and focus a mix becomes such a burden that it is probably better to deconstruct sources and to select behaviourally significant structures rather than allowing them to compete in a mix, hoping that the listener can properly attend. This logic is natural to composers and music engineers who understand how, at a certain point in mixing a record, simplifying or removing parts produces a more potent result. Less can be more.

Another conceptual change in procedural audio is that objectified sounds no longer exist. They are only potential sounds. A sound object with essence (behaviour), form (model, methods and implementation), and potentiality (latent signals brought out at run time) exists in its future. This is a movement away from sound as product, to sound as service: Does this sound object have the potential to work for these situations (plural emphasised)? A creative mind limited to exercising choice and selection, reaching for the last popular library, typing a keyword into a search utility, can become mired in self-reference, gratuitous reuse, and aspiration to the last big thing. Getting stuck this way is bad for any artist. You become only as great as your library, and the ability of your memory (assisted by search tools) to navigate it. A sound artist is set free by considering sound as a process again instead of as an asset. Focusing on process reveals possibility and allows one to think in sound again, not merely to think of sound.

I’m also sceptical of current search and database technologies for sound and curious about alternatives to language at the creative stage of sound design. Tags or symbolic tokens applied to sounds are of limited help. I have heard many researchers describe the hard problems in multidimensional search for which sound is a difficult case, even with meticulously crafted meta-data and timbre tag matching, such as those used by the Echo Nest. Sounds are not what they say they are, since beyond simple onomatopoeia the name of the source is not the sound. Linguistics and phenomenology of sound provide wonderful thought experiments. These are not airy philosophies; they are vital definitions that shape tools, influence software interfaces and determine how we get to think about sound.
Challenging the underlying mindset that considers sound as a fixed asset may be helpful. As Rocchesso et al. (2004) put it, is the sound of a
symphony radiating from a loudspeaker cone the sound of a loudspeaker cone? What is the sound and what is the source? How are a rock dropped into a pool and a stone thrown into a lake not the same sound? Language fails to help us where mass, diameter, velocity, fluid viscosity, and depth can: The palette of elementary physics provides a more real set of colours to play with provided some basic understanding is assumed. In time, we may be able to provide simple interfaces that artists with no mathematical or physics knowledge can use purely by experimentation.
SUMMARY

Moving from a data model to a computational model of sound is more than a change of technology; it alters how we think about sound design. It is a move back towards an era of coherent audio-visual modelling that was cut short by a temporarily cheaper, but weaker technology. The breaking away of sound, as though to a separate faculty, has come full circle and we now need to consider reintegrating sound within the larger model. This challenges the episteme of sound design and asks us to re-examine concepts of realism, satisfaction, and immersion. For artists the move is as profound as going from a two dimensional to a three dimensional creative medium. Identity is replaced by an appreciation of behaviour, structure, process, causality, and relativity to the environment. This enriches the ways we can talk to viewers or communicate experience to players. For developers, the challenges are equally tough in taking game and film sound design to another level. Some advantages do compete with existing sampling technology and, in some cases, procedural technology still has far to go to equal the impact of recordings. But “impact”, “richness” and “realism” are ill-defined and overused words in sound design. We need to re-evaluate these words in structural and behavioural terms if they are to have any modern meaning in the critique of 21st century digital audio. This means deeply questioning and perhaps rejecting the “Hollywood” values that have colonised game audio, but may not actually be appropriate to a multi-modal, interactive context. There are new advantages like LOAD, essential realism, space efficiency, sparse superposition, automatic asset generation, and sound object polymorphism that are simply not relevant to data-driven methods of the film era and will never be achieved using samples. Only real-time procedural audio can address these concepts and, now that the necessary processing power is available, a new frontier has opened in game audio which is here to stay.
REFERENCES
Adrien, J. M. (1991). The missing link: Modal synthesis. In De Poli, G., Piccialli, A., & Roads, C. (Eds.), Representations of musical signals (pp. 269–298). Cambridge, MA: MIT Press.
Angus, J. A. S., & Caunce, A. (2010). A GPGPU approach to improved acoustic finite difference time domain calculations. AES 128 (7963) London, UK.
Avanzini, F. (2001). Computational issues in physically-based sound models. Unpublished doctoral dissertation. University of Padova, Italy.
Beck, D. (2000). Designing acoustically viable instruments in Csound. In Boulanger, R. (Ed.), The Csound book: Perspectives in software synthesis, sound design and signal processing (p. 155). Cambridge, MA: MIT Press.
Benson, D. J. (2007). Music: A mathematical offering. Cambridge: Cambridge University Press.
Bilbao, S. (2009). Numerical sound synthesis. Location: John Wiley & Sons.
Bregman, A. S. (1992). Auditory scene analysis: Listening in complex environments. In McAdams, S. E., & Bigand, E. (Eds.), Thinking in sound (pp. 10–36). New York: Clarendon Press/Oxford University Press.
Chion, M. (1994). Audio-vision. New York: Columbia University Press.
Collins, K. (2008). Game sound: An introduction to the history, theory, and practice of video game music and sound design. Cambridge, MA: MIT Press.
Cook, P. R. (2002). Real sound synthesis for interactive applications. Location: AK Peters.
De Poli, G., Piccialli, A., & Roads, C. (1991). Representations of musical signals. Cambridge, MA: MIT Press.
Elmore, W. C., & Heald, M. A. (1969). Physics of waves. Location: McGraw Hill.
Farnell, A. J. (2008). Designing sound. London: Applied Scientific Press.
Gauss, C. F. (1882). General solution of the problem: To map a part of a given surface on another given surface so that the image and the original are similar in their smallest parts. Copenhagen: Journal of Royal Society of Science.
Geiger, G. (2005). Abstraction in computer music software systems. Unpublished doctoral dissertation. Universitat Pompeu Fabra, Barcelona.
Giordano, B. (2001). Preliminary observations on materials recovering from real impact sounds: Phenomenology of sound events. In Polotti, P., Papetti, S., Rocchesso, D., & Delle, S. (Eds.), The sounding object (Sob project) (p. 24). Verona: University of Verona.
Gordon, C., Webb, D. L., & Wolpert, S. (1992). Isospectral plane domains and surfaces via Riemannian orbifolds. Inventiones Mathematicae, 110, 1–22. doi:10.1007/BF01231320
Grey, J. M. (1975). Exploration of musical timbre. Stanford University Dept. Music Technology Report, STAN-M-2.
Grimshaw, M. (2008). Sound and immersion in the first-person shooter. International Journal of Intelligent Games & Simulation, 5(1), 2–8.
Grimshaw, M., & Schott, G. (2008). A conceptual framework for the analysis of first-person shooter audio and its potential use for game engines. International Journal of Computer Games Technology, 2008.
Hiller, L., & Ruiz, P. (1971). Synthesizing musical sounds by solving the wave equation for vibrating objects. Journal of the Audio Engineering Society.
Hug, D. (2011). New wine in new skins: Sketching the future of game sound design. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Karplus, K., & Strong, A. (1983). Digital synthesis of plucked strings and drum timbres. Computer Music Journal, 7(4), 43–55. doi:10.2307/3680062
Kunkler-Peck, A. J., & Turvey, M. A. (2000). Hearing shape. Journal of Experimental Psychology. Human Perception and Performance, 26(1), 279–294. doi:10.1037/0096-1523.26.1.279
LittleBigPlanet. (2008). Sony Computer Entertainment.
Lorenz, E. (1993). The essence of chaos. Seattle, WA: University of Washington Press. doi:10.4324/9780203214589
McAdams, S. E., & Bigand, E. (Eds.). (1992). Thinking in sound: The cognitive psychology of human audition. New York: Clarendon Press/Oxford University Press.
Miranda, E. R. (2002). Towards the cutting edge: AI, supercomputing and evolutionary systems. Computer Sound Design, 157–192. Elsevier.
Moeck, T., Bonneel, N., Tsingos, N., Drettakis, G., Viaud-Delmon, I., & Alloza, D. (2007). Progressive perceptual audio rendering of complex scenes. In Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games (ACM SIGGRAPH), 189–196.
Moss, W., & Yeh, H. (2010). Automatic sound synthesis from fluid simulation. ACM Trans. On Graphics (SIGGRAPH 2010).
Mullan, E. (2009). Driving sound synthesis from a physics engine. IEEE Games Innovation Conference (ICE-GIC 09).
Plomp, R., & Mimpen, A. M. (1968). The ear as a frequency analyzer. The Journal of the Acoustical Society of America, 36, 1628–1636. doi:10.1121/1.1919256
Polotti, P., Papetti, S., Rocchesso, D., & Delle, S. (Eds.). (2001). The sounding object (Sob project). Verona: University of Verona.
Rand, A. (1971). Art and cognition. The Romantic Manifesto. 78. Signet.
Reiter, U. (2011). Perceived quality in game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Reiter, U., & Weitzel, M. (2007). Influence of interaction on perceived quality in audiovisual applications: Evaluation of cross-modal influence. In Proceedings of 13th International Conference on Auditory Displays (ICAD).
Rocchesso, D., Avanzini, A., Rath, M., Bresin, R., & Serafin, S. (2004). Contact sounds for continuous feedback. In Proceedings of International Workshop on Interactive Sonification.
Serquera, J., & Miranda, E. R. (2010). CA sound synthesis with an extended version of the multitype voter model. AES 128 (8029) London, UK.
Smith, J. O. III. (1992). Physical modeling using digital waveguides. Computer Music Journal, 16(4), 74–91. doi:10.2307/3680470
Subrahmanyan, N., & Lal, B. (1974). A textbook of sound. Delhi: University of Delhi.
Takala, T., & Hahn, J. (1992). Sound rendering. Proceedings of SIGGRAPH ’92, 26(2), 211–220.
van den Doel, K., & Pai, D. K. (1998). The sounds of physical shapes. Presence (Cambridge, Mass.), 7(4), 382–395. doi:10.1162/105474698565794
Vicario, G. B. (2001). Prolegomena to the perceptual study of sounds. In Polotti, P., Papetti, S., Rocchesso, D., & Delle, S. (Eds.), The sounding object (Sob project) (p. 13). Verona: University of Verona.
Warren, R. M. (1992). Perception of acoustic sequences. In McAdams, S. E., & Bigand, E. (Eds.), Thinking in sound: The cognitive psychology of human audition. New York: Clarendon Press/Oxford University Press.
Wessel, D. L. (1973). Psychoacoustics and music: A report from Michigan State University. PAGE Bulletin of the Computers Arts Soc., 30.
Whitmore, G. (2009). The runtime studio in your console: The inevitable directionality of game audio. Develop, 94, 21.
Yee-King, M., & Roth, M. (2008). Synthbot: An unsupervised software synthesiser programmer. International Computer Music Conference.
Zheng, C., & James, D. L. (2009). Harmonic fluids. ACM Transaction on Graphics (SIGGRAPH 2009), 28(3).
Zheng, C., & James, D. L. (2010). Rigid-body fracture sound with precomputed soundbanks. ACM Transaction on Graphics (SIGGRAPH 2010), 29(3).
Zwicker, E. (1961). Subdivision of the audible frequency range into critical bands. The Journal of the Acoustical Society of America, 33, 248. doi:10.1121/1.1908630
KEY TERMS AND DEFINITIONS
Auditory Scene Analysis: Determining what is happening in the environment using only the sense of hearing.
Continuous Parameterisation: A system that has no discrete control states but is driven by a set of real numbers.
Deferred Form: A property of something that only becomes fully defined at the moment of use. For example, stock market investments have a deferred value because you don’t know what they will be worth in 20 years.
DSP: Digital signal processing. In our case, computer code designed to create or modify audio signals.
Dynamics: A branch of mathematics or physics concerning the changes occurring within a system.
Dataflow: A programming paradigm based on flow. Usually with a visual (patching) front-end in which boxes represent transformations/functions and lines are wires that connect the boxes together. Examples include Max/MSP, Pure Data, CPS, Reaktor.
Excitation: The means by which energy becomes vibration in a sounding object. Often passing from an outside object to the object that makes the sound (although self-excitation by internal stress is possible). Examples are collision, friction, field and proximity coupling, plucking, turbulence, and induction.
Finite Difference (Also, First Difference): A DSP scheme where mathematical differentiation is represented discretely.
Finite Element: A part in a computational model in which a continuum is broken down into small, notionally atomic parts, each with simple relationships to identical neighbouring parts. Behaviour of the whole then emerges from the simple behaviour of parts.
Good Behaviour: In a computer science sense, a well-behaved algorithm has predictability in time and space resources and its growth is acceptable for all likely input cases. It will not unexpectedly cause clicks, dropouts, or lockups. This is vital in a real-time system where resources may be pushed to the limit.
Grey Goo: The breakdown of all boundaries and dissolving of form as everything becomes uniformly bland. Sometimes used to express fear of a totalitarian transformation or the movement towards a lifeless normalised form without diversity or dynamics.
Idiophonic: Simple, often homogeneous objects having fixed size and shape throughout the sounding process. Examples are dropped objects, bottles, planks and other simple geometric shapes in collision.
Implementation: A specific body of code or hardware for making some synthesis method work. Also, in traditional game development, the act of manually binding sound samples to events.
LOAD (Level of Audio Detail): Changes made to the running synthesis system according to the amount of detail needed in the signal. Can be determined by distance, focus, object relevance, or by other perceptual (psychoacoustic) factors.
Machine Listening: The branch of AI applied, in the widest sense, to recognition, classification, identification, or tracking of audio signal data. Sometimes AI auditory scene analysis, and speech recognition.
Mask Topology: A set of filters/rules for selecting or inhibiting points in a multi-dimensional data set.
Masking: A psychoacoustic phenomenon accounting for the dominance of sounds containing one feature over sounds with the same or similar features. Some perceptual rules define, according to time, amplitude and spectrum, the extent to which one sound may mask another. Taking advantage of this allows data reduction techniques and a knowledge of it is at the heart of skilled synthesis.
Method: A general way of creating an audio signal.
Model: That which captures the overall behaviour of a sound object.
Occlusion: Changes to the intensity and tone of a sound by the intervention of an object between the source and listener.
Parameter: A value given to a sound object (a function of time). Three senses of the word are commonly encountered. Formal parameters are blank slots left in the definition of an abstraction, to be filled later. Actual, instantiation, or fixed parameters, which don’t change within the life of a sound, are used to set up an object when it is created. Real-time or performance parameters are those properties of a simulated world that change (often rapidly) and are fed to a sound object instance while it runs.
Parametric (Signal) Method: A synthesis method that maps input parameters to an audio signal using only functions that define the spectrum, such as FM, AM, or waveshaping.
Physical Modelling: (Misnomer; more consistently written as physical methods.) The class of methods that attempt to more or less precisely and uniformly model the material properties of an object. Mathematical ways of viewing mass, stiffness, force and velocity of some finite elements or continuum.
Physically Informed Model: A simplified, perhaps heterogeneous, model in which physical properties and relationships are encoded as heuristics, simple rules and coarse data. A looser and less precise model than a strict physical model. For example, a model of something as complicated as a helicopter cannot hope to be a physical model; it would contain too many elements. Instead a physically informed model states the relationship between components like pistons, engines, exhausts, rotor blades and so forth. The smaller components may themselves be (large scale) physically informed parts or (small scale) physical models. The parts of a physically informed model don’t necessarily have to be physical models but could be parametric. It is the overall system model that embodies the physical relationships.
Physics Engine: Part of a computer game responsible for modelling the large scale behaviour of solids, fluids and gases according to heuristic Newtonian physics. For example the bouncing, rolling, and deformation of meshes according to mass, gravity, kinetic energy, and buoyancy. This code is usefully seen as distinct from modules responsible for the appearance and sound of objects.
Reactive Audio: Live synthesised sound that requires user input. Most sounding objects are reactive.
Replication: In computer games, replication is the problem of keeping networked clients in a multi-player game in synchrony so that each player on their gaming system feels they are experiencing the same instant in time and causality as other players (even though the clients are separated by significant and unpredictable packet latency). This is a hard problem (and in some senses, because of the speed of light, unresolvable).
S+S (Sampling Plus Synthesis): A synthesis method using a combination of stored wavetables, mixing, and post-processing (mainly consisting of time-variant filters and basic amplitude modulation). Popular in 1990s music synthesisers.
Sound Object (Also Sounding Object): Object oriented computer code implementing a virtual sound source with methods which activate parameters to DSP code. Musically, the analogy is a musical instrument.
Statefulness: Having, and being in, one of two or more discrete and exclusive states; such as existing or not existing.
Tensor: A way of representing a complex vector of forces (relationships between vectors).
Waveguide: A physical model where time delays and filters are used to simulate the motion and edge reflection of a wave in a bounded medium.
ENDNOTE
1. A note on terminology relevant to implementation: In computer science, we have the notion of a procedural language. A procedural sound will exist as happily on a computer running a functional language, a quantum computer if such a thing existed, or a Turing machine made of tin cans and string. The word “procedural” does not apply to the implementation, rather to the treatment of sound as process.
Chapter 16
Physical Modelling for Sound Synthesis
Eoin Mullan
Queen’s University Belfast, N. Ireland
ABSTRACT
While the first computer games synthesised all their sound effects, a desire for realism led to the widespread use of sample playback when technology matured enough to allow it. However, current research points to many advantages of procedural audio which is generated at run time from information on sound producing events using various synthesis techniques. A specific type of synthesis known as physical modelling has emerged, primarily from research into musical instruments, and this has provided audio synthesis with an intuitive link to a system’s virtual physical parameters. Various physical modelling techniques have been developed, each offering particular advantages, and some of these have been used to synthesise audio in interactive virtual environments. Refinements of these techniques have improved their efficiency by exploiting human audio perception. They have been implemented in large virtual environments and linked to third party physics engines, unveiling the potential for more realistic audio, reduced production costs, faster prototyping, and new gaming possibilities.
DOI: 10.4018/978-1-61692-828-5.ch016
INTRODUCTION
Current research is realising the potential of procedural audio for generating sound effects in computer games. Procedural audio is generated from information on specific sound producing events and the result is a unique soundtrack each time a virtual environment is explored. This sound generation process often involves the physical modelling of sound producing objects from an acoustic perspective. This chapter looks back at the relevant history of physical modelling and forward to how it is set to be a part of the future of computer game audio. It is laid out in four sections. The remainder of this section gives a brief history of sound effects in computer games before discussing the shortcomings of sample playback and the potential of procedural audio and physical
modelling. The second section takes a look at the evolution of physical modelling including many lessons which are to be learned from the physical modelling of musical instruments. Some specific techniques are discussed with extra emphasis placed on two techniques, modal synthesis and digital waveguide synthesis, which are particularly useful in real-time applications. The third section presents, in chronological order, the projects that have made advances in the area of physical modelling for sound synthesis in computer games and virtual environments and the last section looks at directions which future research may take as well as an industry perspective on the technique. The earliest computer games to include sound effects synthesised them using whatever hardware was available at the time. The sounds produced were very much influenced by the limitations of the hardware. Although by today’s standards they could not be considered realistic, they were marketed as such and a “drive towards realism […] is a trend we shall see throughout the history of game sound” (Collins, 2008, p. 9). “By 1980, arcade manufacturers included dedicated sound chips known as programmable sound generators, or PSGs into their circuit boards” (Collins, 2008, p. 12). Early consoles also had sound chips that developers used to synthesise sound and again the hardware limitations influenced their work. However, as computers became more powerful, developers began to utilise recorded samples in their quest for realism. Andy Farnell, author of Designing Sound (2008), explains, “[e]arly consoles and personal computers had synthesiser chips that produced sound effects and music in real-time, but once sample technology matured it quickly took over because of its perceived realism thus synthesised sound was relegated to the scrapheap” (p. 298). 
Karen Collins gives a comprehensive account of the earliest chips that performed synthesis up to the first systems that were capable of playing CD-quality samples. When sound effects are required in a virtual environment today, sample playback is the most
common method of producing them (McCuskey, 2003) and it is widely used for good reason. Raghuvanshi, Lauterbach, Chandak, Manocha, and Lin (2007) state that the method of sample playback is “simple and fast”, meaning it is computationally inexpensive and straightforward to implement (p. 68). The method takes advantage of known sound design techniques that have been refined through a long history of use in the movie industry. In the introduction to Real Sound Synthesis for Interactive Applications, Perry Cook (2002) concedes that, with much effort having gone into improving the sample-based approach, “the state of the art has indeed advanced to the point of absolute realism, at least for single sounds triggered only once” (p. xi).
However, there are many drawbacks with sample playback and, in an interactive environment, it cannot provide “absolute realism”. The sounds heard in reality are produced as a result of many factors and information on these factors is contained within the sounds. For example, a piece of wood struck near its centre will sound different from the same piece struck close to its edge. If struck harder it will not only sound louder but it will have a different quality (Cook, 2002, p. xiii). When continuous contact is maintained, for example during rolling or scraping, another variation of sounds is heard. The object used to excite the piece of wood also has an influence. While the sonic differences can be subtle, we intuitively perceive the conditions that cause them and therefore these factors are important in creating realistic audio.
One recording of a piece of wood being struck may be enough to provide realistic audio in a pre-determined scenario but it will be inadequate in a fully interactive environment. A partial solution to this problem is to use multiple recordings. We could, for example, record a block of wood being struck on various points with varying forces using different objects.
When its virtual counterpart is excited, an algorithm can then determine the most suitable sample to play back or interpolate between the most appropriate samples. However, this approach can quickly
become expensive in terms of both the memory and processing power required. Cook (2002) goes on to say: “To have truly realistic continuously interactive sound synthesis, however, essentially infinite memory would be required to store all possible samples” (pp. xi-xii) and “the equivalent of a truly exhaustive physical model would have to be computed in order to determine which subset of the samples and parameters would be required to generate the correct output sound” (p. xii). This means that, while using samples can provide excellent audio in pre-determined situations, in a fully interactive environment it is not capable of producing realistic sound for every eventuality that may occur. Even if one were to ignore, as game developers have been doing, the limitation of sample playback described, there are further drawbacks with the technique. Firstly, it costs time and hence money to record a library of samples, something which is described by Doel, Kry, and Pai (2001) as “a slow, labor intensive process” (p. 537). As some sounds will not be specific to just one game, for example, footsteps or doors opening and closing, it may be possible to reuse samples from a previously compiled library but this leads to another disadvantage of the practice, which is that repeatedly using the same samples as often as is necessary can become repetitive to the listener/gamer. A further disadvantage is that sound effects can only be included for objects that are conceived during a game’s development. There is no scope for allowing the user to create, modify, and then sound his or her own custom objects. 
This is becoming more restrictive with the increasing popularity of games such as Media Molecule’s LittleBigPlanet (Sony Computer Entertainment Europe, 2008) that feature User Generated Content (UGC) and developments like LucasArts’ Digital Molecular Matter technology, which can realistically simulate objects being broken, hence creating a different combination of shattered pieces each time.
The solution is to defer some of the audio processing until runtime when information on the specific sound producing event is known. This information is used to drive more realistic audio production. Director of Audio at Microsoft Game Studios, Guy Whitmore (2009), explains this in a statement written on his then upcoming Audio Keynote speech to the 2009 Develop Conference in Brighton:
fully-mixed wave files, common in the majority of games today, are like 2D art; you can only see one side. They’re big and clunky and typically pre-processed. Yet there’s a desire and need to hear sound and music from all perspectives, from many directions, in various surroundings, and in different contexts. In order to effectively react and adapt to a non-fixed game timeline, audio content must be authored and kept in smaller component parts until runtime, when they are assembled and mixed. So take your entire music studio and sound design suite, including mixers, synths, samplers, DSP, mics, speakers; virtualise everything and make it adequately efficient, then drop it all into your game authoring environment. Stir in a nonlinear DAW [Digital Audio Workstation], the right game ties, and some mixing intelligence, and there you have it; a complete runtime studio, ready with the flexibility to turn on a dime to meet the needs of the game.
This technique improves on simply playing back samples, but researchers wish to take the idea further. “It’s about sound as a process rather than sound as data” (Farnell, 2008, p. 1). Farnell goes on to explain the concept of procedural audio:
The sample based data model requires that most of the work is done in advance, prior to execution on the platform. Many decisions are made in advance and cast in stone. Procedural audio on the other hand is highly dynamic and flexible, it defers many decisions until runtime. (p. 301)
Procedural audio is generated in real-time from all the relevant information available. As it is generated in response to sound producing events it also fits the term “dynamic audio”, which Collins (2008) defines as audio that “reacts both to changes in the gameplay environment, and/or actions taken by the player” (p. 4). Researchers have developed techniques to generate many types of sounds from pouring and bubbling water to fire and thunder and from guns and explosions to animal and vehicle sounds. Farnell details many of these techniques in Designing Sound (2008). While it’s true that synthesising sounds from scratch is more computationally expensive than sample playback, “the rewards are astonishing” (Farnell, 2008, p. 1). The ever increasing capabilities of personal computers and games consoles coupled with the desire for realism means that procedural audio is set to become part of the future of computer game audio.
Farnell (2011) describes the thinking behind procedural audio, with the ultimate goal of creating perceived realism, elsewhere in this book. Most often this will involve a degree of sound synthesis. The current chapter is concerned with a specific type of sound synthesis known as physical modelling, which is mentioned by Martin D. Wilde (2004, p. 158) as a future direction for computer game audio. To physically model an object means to simulate how it behaves physically and, from an audio perspective, this means simulating how an object vibrates in response to excitation and causes sound waves to be radiated from it. Physical modelling can be used to realise procedural audio and would be the preferred method of those whom Farnell (2011) might call “moderate essentialists”. A range of physical modelling techniques have been developed and this is the subject of the next section.
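To make the idea of simulating how an object vibrates in response to excitation concrete, the following minimal Python sketch integrates a single damped mass-spring resonator that is struck by an impulse. It is an illustrative assumption, not a method taken from the works cited in this chapter: the function names, the parameter values, and the choice of a semi-implicit Euler integrator are all hypothetical. Because the model is linear, a harder strike only makes the output louder; capturing the change in quality described earlier would need a richer model.

```python
import math


def natural_frequency(mass, stiffness):
    """Undamped resonant frequency in Hz of a mass on a spring."""
    return math.sqrt(stiffness / mass) / (2.0 * math.pi)


def strike_oscillator(mass, stiffness, damping, impulse,
                      sample_rate=44100, duration=0.5):
    """Return displacement samples of one struck mass-spring-damper.

    All names and units are illustrative assumptions (SI units assumed).
    """
    dt = 1.0 / sample_rate
    position = 0.0
    velocity = impulse / mass  # an impulsive strike sets the initial velocity
    samples = []
    for _ in range(int(duration * sample_rate)):
        # Newton's second law with a Hooke's-law spring and a viscous damper.
        acceleration = (-stiffness * position - damping * velocity) / mass
        velocity += acceleration * dt  # semi-implicit Euler: velocity first,
        position += velocity * dt      # then position from the new velocity
        samples.append(position)
    return samples


if __name__ == "__main__":
    m = 0.001                                # kg, hypothetical value
    k = m * (2.0 * math.pi * 440.0) ** 2     # stiffness chosen to ring near 440 Hz
    tone = strike_oscillator(m, k, damping=0.01, impulse=0.001)
    print(round(natural_frequency(m, k), 1), len(tone))
```

In this sketch the pitch follows directly from the physical parameters: a stiffer spring or a lighter mass raises the frequency, and a larger damping value shortens the decay.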
PHYSICAL MODELLING FOR SOUND SYNTHESIS
Sound synthesis techniques have been developed for musical and sound design purposes since the late 1950s when Mathews first performed wavetable synthesis by generating sound from data stored in tables (Bilbao, 2009). The following decade saw the development of FM synthesis and additive synthesis and, since then, composers and sound designers have learned to use these techniques and others to achieve their aims. However, these techniques, often described as abstract, have no basis in the real world and so the link between the input parameters and the sound produced is not naturally intuitive. This means control systems are often complex for the user (Adrien, 1991) and, although these techniques were commonly used in early computer games, there was no obvious link between their input parameters and in-game variables.
Physical modelling synthesis techniques are, however, based on real physical systems where there will usually be an intuitive link between the configuration of the system and the sound produced. Hence there should be an intuitive link between the parameters of a physical model and the sounds it produces, offering good control to sound designers (and musical composers).
To discuss physical modelling from an acoustic point of view is to discuss the physical modelling of musical instruments as this is a long established research area in which much progress has been made. Much of what has been learned can be applied to the physical modelling of any sounding object and is therefore relevant where sound effects are to be created in a virtual environment. This section looks at some of the methods available before giving a more detailed look at the methods most suitable for operating in real-time and those used in the projects discussed.
Throughout history, musical instruments have evolved to create sound in a sometimes complex way while giving relatively simple control to the musician. Instrument makers have, through centuries of tradition, learned which design subtleties are important in particular instruments and what is necessary to make an instrument sound “good”. In more recent times, the exact way in which instruments produce and radiate sound has been an area of research for physicists and acousticians. Mathematical equations have been devised to describe the physical behaviour that causes sound producing vibrations for many instruments and this has increased our understanding of them. The interested reader is directed to The Physics of Musical Instruments (Fletcher & Rossing, 1991) which gives the derivation of such equations for many popular instruments. Methods have been developed to solve these equations numerically, enabling computers or signal processing hardware to simulate the instruments, thereby revealing more about why they sound as they do. This has not only increased our understanding of instruments and sound-producing objects but it has also given musicians new tools with which to create sounds and music. For a more thorough explanation of the methods described below, and others, one may wish to read Discrete Time Modelling of Musical Instruments (Välimäki, Pakarinen, Erkut, & Karjalainen, 2006).
Physical modelling techniques were first developed in the 1960s based on the causal systems that produce sound in reality (Kelly & Lochbaum, 1962; Ruiz, 1969). Back then, the limited processing power and availability of computers was prohibitive but, as computers became more powerful and accessible, the algorithms being developed could afford to be more computationally demanding and this allowed researchers to develop many different physical modelling techniques.
The finite element method (FEM) has been employed in many areas of scientific research from car crash simulations to complex weather systems. It involves dividing a distributed physical system, which is complex, into discrete elements, which are simple.
The advantage of FEM over other physical modelling techniques is the ease with which it can be applied to complex geometries that do not behave uniformly throughout. As well as this, more accuracy can be gained in important parts of a structure, for example the part of a car which will be impacted upon during a crash simulation, by using a finer grid in that area. However, difficulties arise when one attempts to extract an audio signal from the surface vibrations of an FEM simulation due to the irregularity of the grid and, in general, it is a computationally expensive technique. For this reason FEM is quite uncommon in acoustic applications and, while sometimes used in a preprocessing stage, it is less suitable for direct sound synthesis in real-time.
Methods such as finite difference models and mass-spring networks also discretise an object in space and time. The current position of a point on an object is calculated from its own position at previous temporal intervals and that of neighbouring points. The mass-spring networks method treats an object as a mesh-like network of point masses connected by springs and dampers. The positions of points in the network are calculated using Newton’s second law of motion and Hooke’s law. The finite difference method is based on partial differential equations (PDEs), also derived using Newton’s second law of motion and Hooke’s law, that describe the vibration of an object. To enable these equations to be solved numerically, they are discretised in space and time using finite approximations. With the finite difference method, the same equations are used to describe all parts of the object being modelled and so the method is suited to objects that behave uniformly throughout and that can be fitted to a regular grid. Both these methods can be used in real-time and offer some advantages, for example, it is easy to extract an audio signal from the surface vibrations of an object modelled using these techniques. The techniques allow for real-time interaction, at one or many points on the object, which could be external excitation or coupling with another object.
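The finite difference idea described above, in which each point is updated from its own previous positions and those of its neighbours, can be sketched for the simplest case of an ideal string fixed at both ends. The code below is a hedged illustration rather than a scheme from the cited literature: it assumes the 1D wave equation with a simple frequency-independent loss term, and the function name, grid size, loss value, and raised-cosine pluck shape are all hypothetical choices.

```python
import math


def simulate_string(num_points=60, courant=0.9, loss=0.001, steps=2000,
                    strike_index=12, pickup_index=40, half_width=5):
    """Explicit finite difference scheme for a lossy ideal string.

    courant is c*dt/dx and must not exceed 1 for the scheme to be stable;
    loss is a dimensionless per-step damping amount (an assumed parameter).
    """
    if not 0.0 < courant <= 1.0:
        raise ValueError("courant number must lie in (0, 1] for stability")
    current = [0.0] * num_points
    # Smooth raised-cosine displacement around the excitation point.
    for i in range(strike_index - half_width, strike_index + half_width + 1):
        if 0 < i < num_points - 1:
            phase = math.pi * (i - strike_index) / (half_width + 1)
            current[i] = 0.5 * (1.0 + math.cos(phase))
    previous = list(current)  # same shape one step earlier: zero initial velocity
    lam2 = courant * courant
    output = []
    for _ in range(steps):
        nxt = [0.0] * num_points  # the two end points stay at zero (fixed ends)
        for i in range(1, num_points - 1):
            # Discrete second spatial derivative from the two neighbours.
            laplacian = current[i + 1] - 2.0 * current[i] + current[i - 1]
            nxt[i] = (2.0 * current[i] - (1.0 - loss) * previous[i]
                      + lam2 * laplacian) / (1.0 + loss)
        output.append(nxt[pickup_index])  # read the signal at one grid point
        previous, current = current, nxt
    return output


if __name__ == "__main__":
    signal = simulate_string()
    print(len(signal), max(abs(v) for v in signal))
```

Reading the output at a single grid point reflects how straightforward it is to extract an audio signal from such a model, and moving `strike_index` or adding displacement at several points during the loop is one way interaction at one or many points could be approximated.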
They can also model distributed nonlinear effects, that is, effects due to larger or smaller vibrations beyond the amplitude
Physical Modelling for Sound Synthesis
of the output. Higher spatial and temporal resolutions can be used for a more accurate simulation, but this costs more in terms of processing cycles and these techniques can become computationally expensive for large objects. For this reason, they are not often the preferred method for environments involving many objects or in cases where efficiency is important and sound quality is not the top priority.

In the mid-1980s, digital waveguide synthesis emerged as a computationally cheap method for modelling one-dimensional (1D) wave propagation in systems with no stiffness. As many stringed and woodwind instruments approximately fit this description, the technique proved popular and has been heralded as “the most successful physical modelling technique to date” (Bilbao, 2009, p. 18). It is explained as follows. The vibrational behaviour of a 1D system with no stiffness can be modelled as two travelling waves propagating through the structure, bouncing back at the ends and not interacting with each other. These waves originate from an excitation signal and initially leave the excitation point in opposite directions. Digital waveguide synthesis employs two memory buffers, each acting as a delay line for one direction of travel. The size of the buffers corresponds to the time taken for a sound wave to travel the length of the structure being modelled and is therefore calculated using the speed of sound in the medium. To model attenuation, the waves are passed through a digital filter as they bounce at each end. The filters are designed to apply the correct attenuation for the structure being modelled and can apply frequency-dependent attenuation, giving the output sound the signature of different materials. As locations in the memory buffers correspond to positions on a real object, it is easy to simulate the structure being excited at different points by adding an excitation signal to the corresponding memory location of each buffer. 
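The two-buffer structure can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not a reference implementation: the function names, the parameter values, the inverting reflections (appropriate for a string fixed at both ends) and the two-point averaging filter used for frequency-dependent loss are all choices made here for brevity.

```python
from collections import deque

def delay_length_samples(length_m, wave_speed, sample_rate):
    """Buffer size: time for a wave to travel the structure, in samples."""
    return round(sample_rate * length_m / wave_speed)

def waveguide_pluck(length=50, steps=4000, excite_pos=12, pickup_pos=30, loss=0.995):
    """Hypothetical 1D waveguide: two delay lines plus a simple filter at each end."""
    right = deque([0.0] * length)   # wave travelling towards the far end
    left = deque([0.0] * length)    # wave travelling towards the near end
    # Excitation: add the signal to the matching location of each buffer.
    right[excite_pos] += 0.5
    left[excite_pos] += 0.5
    prev_far = 0.0                  # filter memory at the far end
    prev_near = 0.0                 # filter memory at the near end
    out = []
    for _ in range(steps):
        # Moving the signal along the buffers needs no arithmetic.
        at_far = right.pop()
        at_near = left.popleft()
        # Reflection filters: inversion, overall loss, and a two-point average
        # that attenuates high frequencies more than low ones.
        refl_far = -loss * 0.5 * (at_far + prev_far)
        refl_near = -loss * 0.5 * (at_near + prev_near)
        prev_far, prev_near = at_far, at_near
        left.append(refl_far)
        right.appendleft(refl_near)
        # Output: sum the two travelling waves at the pickup position.
        out.append(right[pickup_pos] + left[pickup_pos])
    return out
```

For example, `delay_length_samples(1.0, 441.0, 44100)` gives a 100-sample buffer. Changing `loss` or replacing the averaging filter alters how quickly the partials decay, which is how different materials would be suggested in such a model.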
Audio output can easily be extracted from the model by summing the contents of the
memory location of each buffer corresponding to the desired pickup position. Compared to the techniques discussed so far, which may require hundreds or thousands of arithmetic operations per temporal interval, digital waveguide synthesis is a very inexpensive technique. The movement of a signal through a buffer requires no arithmetic operations and a second order digital filter will often suffice for applying attenuation at the ends. It should be stressed that this technique does not gain computational efficiency over other methods through some crude approximation or by sacrificing veracity. Instead, it exploits the harmonic nature of waves in a 1D system without stiffness and employs a very cheap method of delaying a signal in time. An obvious drawback with digital waveguides is their limitation to 1D systems. When they have been extended to more dimensions, creating a “waveguide mesh”, the performance gains have been lost and the number of arithmetic calculations involved invariably approaches that of a finite difference scheme (Bilbao, 2006). Another limitation is their inability to model the effect that the vibration amplitude has on how a system resonates, that is, they cannot easily be used to model distributed nonlinearities. This effect is apparent when one listens to how the sound of a gong changes as it resonates. To understand another drawback of digital waveguide synthesis one must understand the effect that bending stiffness has on how an object vibrates. The composite frequencies of a 1D sounding object with no bending stiffness will be harmonically related, that is, the higher frequencies will be integer multiples of the fundamental frequency. Bending stiffness causes waves to propagate more quickly through a medium and, significantly, this effect is more pronounced in higher frequencies. Therefore, the spectrum of a 1D sounding object with bending stiffness will not be harmonic. 
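The contrast between the two cases can be written compactly. The expression for the stiff case below is the standard textbook approximation for a stiff string and is not taken from this chapter; the symbols $f_0$ (the fundamental of the corresponding ideal string) and $B$ (an inharmonicity coefficient that grows with bending stiffness) are introduced here purely for illustration.

```latex
% Partials of an ideal 1D system without bending stiffness (harmonic):
f_n = n f_0, \qquad n = 1, 2, 3, \dots

% Common approximation for a string with bending stiffness (inharmonic):
f_n = n f_0 \sqrt{1 + B n^2}, \qquad B > 0
```

Because the correction term grows with $n^2$, the higher partials are shifted upwards proportionally more than the lower ones, which matches the statement that the effect is more pronounced at higher frequencies.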
Because many stringed and woodwind instruments are not greatly affected by stiffness, this disadvantage has not been enough
to dissuade its widespread use in musical applications. However, the primary sound sources in a typical virtual environment are rarely plucked strings and woodwind instruments and so digital waveguide synthesis has not been so commonly adopted in these situations. The effect of stiffness can be realised by using more sophisticated, and hence more computationally expensive, digital filters. Some research has been carried out in an attempt to do this in an inexpensive way (Essl, Serafin, Cook, & Smith, 2004) but, to accurately realise the full effect of stiffness, the total computation required would equal that of the next physical modelling technique to be discussed. The method which has become the most popular for real-time sound synthesis in interactive virtual environments is modal synthesis. It is a physically based method that has many advantages for real-time interactive sound synthesis. As it is used by many of the projects discussed in this chapter, it is beneficial to understand some theory behind it. There follows a brief description of the technique and, for a more detailed examination, one may wish to read Jean-Marie Adrien’s seminal work on the subject, The Missing Link: Modal Synthesis (1991). To understand modal synthesis, one should consider the sound of a vibrating object as a sum of decaying sinusoids. The frequencies of these sinusoids correspond to an object’s modes of vibration and, in modal synthesis, they are recreated by a bank of oscillators working in parallel. Each mode of vibration also has an associated decay rate and a shape function or shape matrix and, together with its frequency, they represent an object’s modal data, that is, all the information needed to carry out modal synthesis for that object. 
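The description above can be summarised in a single expression. The notation is an assumption made for this sketch: $N$ is the number of modes, $f_n$ and $d_n$ are the frequency and decay rate of mode $n$, and $a_n(x)$ is the gain of that mode for a contact at location $x$, as supplied by the shape function.

```latex
y(t) = \sum_{n=1}^{N} a_n(x)\, e^{-d_n t} \sin(2 \pi f_n t), \qquad t \ge 0
```

The triples $(f_n, d_n, a_n)$ are the modal data; once they are known for an object, the sum can be evaluated by a bank of oscillators running in parallel.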
A shape function, or shape matrix, provides the gain of a given mode for a specific contact location, meaning that modal synthesis can create contact location-specific sounds, an obvious advantage in virtual interactive environments where realism is desired and perceptual cues are important. A common way of implementing oscillators for
modal synthesis is to create a digital filter tuned to resonate at the desired frequency and with the desired decay rate. To simulate object excitation, an impulse signal is input into each filter independently. The location-dependent gain can either be incorporated into the design of each filter, or, if the contact location is to be varied during run time, applied to the impulse signal before input to each of the filters. It is with good reason that modal synthesis has become the most popular physical modelling technique to be applied in interactive virtual environments. Its many advantages will be presented here in detail as they are relevant to the projects discussed later in this chapter. Firstly, modal synthesis generates sound as a sum of sinusoids which correlates naturally with the way in which humans perceive sound. The computational expense of the technique is directly proportional to the number of modes being used and, as in theory this is unlimited, one cannot simply assert that it is a cheap technique. However, many studies have indicated that realistic object contact sounds can be created with little computational expense and that its memory usage is much lower than that of the digital waveguide technique. Furthermore, studies have implemented modal synthesis in such a way that the processing power afforded it can be varied at run time with a graceful, if at all noticeable, degradation in sound quality. The ability to set a sonic level of detail is particularly appealing in a large environment where some sounding objects may be far away from the observer or occluded, or in a busy game environment where the player’s attention is to be focused on a particular aspect. It is also attractive to have an audio engine that can utilise more processing power when it is available and make do with less when it is not. 
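A filter-based oscillator bank of the kind described above can be sketched as follows. This is a minimal example under stated assumptions rather than the implementation used by any of the projects discussed: the class and function names, the two-pole resonator design, and the `max_modes` argument (a simple stand-in for a run-time level of detail) are all hypothetical.

```python
import math

class ModalResonator:
    """Two-pole filter whose impulse response is exp(-d*t) * sin(2*pi*f*t)."""

    def __init__(self, frequency_hz, decay_per_second, sample_rate):
        omega = 2.0 * math.pi * frequency_hz / sample_rate
        r = math.exp(-decay_per_second / sample_rate)   # per-sample decay factor
        self.b1 = r * math.sin(omega)     # feed-forward coefficient on x[n-1]
        self.a1 = 2.0 * r * math.cos(omega)
        self.a2 = -r * r
        self.x1 = 0.0                     # previous input sample
        self.y1 = 0.0                     # previous two output samples
        self.y2 = 0.0

    def tick(self, x):
        y = self.b1 * self.x1 + self.a1 * self.y1 + self.a2 * self.y2
        self.x1 = x
        self.y2 = self.y1
        self.y1 = y
        return y

def strike(modes, location_gains, num_samples, sample_rate=44100, max_modes=None):
    """Excite a bank of resonators with one impulse.

    modes: list of (frequency_hz, decay_per_second) pairs.
    location_gains: one gain per mode for the chosen contact location.
    max_modes: optional cap on the number of modes synthesised.
    """
    active = list(zip(modes, location_gains))[:max_modes]
    bank = [ModalResonator(f, d, sample_rate) for (f, d), _ in active]
    gains = [g for _, g in active]
    out = []
    for n in range(num_samples):
        impulse = 1.0 if n == 0 else 0.0
        # The location-dependent gain is applied to the impulse before it enters
        # each filter, so the contact point can change at run time.
        out.append(sum(res.tick(g * impulse) for res, g in zip(bank, gains)))
    return out
```

A call such as `strike([(440.0, 5.0), (1130.0, 12.0)], [1.0, 0.5], 44100)` would render one second of a two-mode object. Since the cost is one small filter per mode per sample, lowering `max_modes` trades detail for processing time in direct proportion.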
Although, like digital waveguide synthesis, modal synthesis cannot easily model distributed nonlinear effects, it can model the effect of bending stiffness at no extra cost and hence synthesise the sound of inharmonic objects. This is important in a typical virtual environment where the majority
of sounding objects will have a level of stiffness that is significant from an acoustic point of view. Like digital waveguide synthesis, it is possible with modal synthesis to give the effect of sounding different materials by changing how quickly the composite frequencies are damped with time. This is straightforward to implement with modal synthesis–damping can be simply applied to each mode of vibration–as opposed to digital waveguide synthesis which requires filters to be designed with this in mind. Another advantage of modal synthesis is its versatility with regard to how an object’s modal data is obtained. Projects have determined information on an object’s modes of vibration from calculations, from recordings of real sounding objects, and by using finite element analysis with each of these methods having its own advantages and disadvantages. A final physical modelling technique to be aware of is called the functional transformation method (FTM) (Trautmann & Rabenstein, 2003). Originally developed in the 1990s, this more formalised form of modal synthesis derives the modal data directly from the underlying PDEs that describe the object’s vibrations, using Laplace and Sturm-Liouville transformations. The FTM can be applied in 1D, 2D, and 3D linear systems with regular shapes and provides a more structural approach to interconnecting vibrating structures. Multi-rate techniques, which involve simulating lower frequencies with a lower sample rate without affecting the sound quality (a technique which can also be applied to modal synthesis), enable FTM to be used in real-time on a typical desktop PC. However, to date, this technique has not been utilised in an interactive virtual environment.
THE PAPER TRAIL OF PHYSICAL MODELLING IN VIRTUAL ENVIRONMENTS

The idea of sound rendering for computer animation was introduced by Tapio Takala and James Hahn in their paper Sound Rendering (1992) and they pioneered many of the ideas used in more recent projects. They presented a methodology for combining procedural sounds into a synchronised soundtrack based on an animation. Each object in a scene is associated with a characteristic sound and a sound script is created for an animation detailing “how a prototype sound signal will be instantiated, and how it is transformed by the acoustic environment” (Takala & Hahn, 1992, p. 218). Although they acknowledge the potential for physical modelling, saying “sound can be synthesized from physical principles” (Takala & Hahn, 1992, p. 214), and they describe how an object’s complex vibration can be computed as a sum of its vibration modes (modal synthesis), the main focus of the paper is on the modulation of sounds due to propagation in a three-dimensional environment. The authors touch upon the idea of driving sound synthesis from a physics engine when they propose “key events of a script can be […] automatically computed by a behavioural or physically-based motion control” (Takala & Hahn, 1992, p. 215). This idea has been built upon by some of the projects presented later in this chapter which successfully extract and use not only event triggers, but also information from a physics engine. Takala and Hahn also give their insight into how sound is produced when two surfaces slide over each other: the surface features cause both of the objects to vibrate in the same way as a phonograph needle vibrates when dragged in the groove of a record disk. The waveform of the sound generated is similar to the shape of surface imperfections of the objects. The so called 1/f noise could be used
to model this phenomenon for rough surfaces. (p. 214)

While its implementation is not presented in their paper, the idea of using 1/f noise for a scraping sound has been utilized to good effect in more recent projects.

Around the same time as the above work appeared, William W. Gaver published some research on auditory event perception (Gaver, 1993b) and synthesizing sounds for auditory icons (Gaver, 1993a). In the former work, Gaver notes that “sound provides information about an interaction of materials at a location in an environment” (Gaver, 1993b, p. 5) and he explores “everyday listening” as the process of listening to events rather than sounds. From this, a method for describing sounds by attributes of the sound source is proposed. In the latter work, Gaver states that auditory icons can add functionality to a computer environment and, while the digital samples cannot be easily manipulated, it is possible to synthesize them. He then describes algorithms to synthesize many everyday sounds as specified by parameters that describe the sound-producing event, unlike many synthesis methods at the time which synthesized sound in musical terms. This was an important step towards synthesizing object contact sounds based on their physical attributes, a necessity of physically motivated sound synthesis in virtual environments.

A few years later, in 1997, Perry Cook published research carried out on physical modelling. Physically Informed Sonic Modeling (PhISM): Synthesis of Percussive Sounds appeared in the Computer Music Journal (Cook, 1997) and documented Cook’s work in creating a framework which could synthesize the sounds of a wide range of percussion instruments. Here, two synthesis algorithms are presented, motivated by two distinct types of percussive instruments. The first, Physically Informed Spectral Additive Modeling (PhISAM), deals with resonant percussive instruments like a marimba or cowbell. As
the name may suggest, this method uses modal synthesis and the intention is that the modal data has a physical meaning. The framework suggests methods for determining the modal frequencies from the recording of a real sounding instrument, for example “peak-picking from spectra calculated by Fourier analysis” (Cook, 1997, p. 39). Although automation can help in setting the modal data, the author suggests “it relies heavily on human analysis and decisions” (p. 39). To create an appropriate excitation signal, the suggestion is made to record an actual object striking a non-resonant surface. So, if one wished, for example, to simulate a virtual object being struck by a mallet, one should use the recording of a real mallet striking a non-resonant body. If the body being struck is completely rigid and non-resonant then the sound made will be due only to the mallet and should therefore capture the excitation force, which can be used as an excitation signal in the virtual world.

The second algorithm is called Physically Informed Stochastic Event Modeling (PhISEM). This involves “the overlapping and adding of small grains of sound” (Cook, 1997, p. 40), a familiar process known as granular synthesis. The method is motivated by instruments like the maracas or tambourine, the characteristic sounds of which are made up of many short, discrete sound events. While these instruments may be uncommon in the average computer game environment, Cook notes that PhISEM algorithms can also be used to synthesize many everyday sounds like footsteps on gravel or dripping water. The process involves analyzing a sound-producing system and possibly the waveform of the sound it produces. From this analysis, rules are created to map the variables of the system, such as the shake velocity of a maraca, to the parameters of a granular synthesis algorithm.

One of many contributions in the area by Kees van den Doel and Dinesh K. Pai was published in 1998. 
The Sounds of Physical Shapes (Doel & Pai, 1998) documented the creation of their Sonic Explorer application which adds object contact sounds to real-time simulation environments.
They focused on making realistic object contact sounds by considering the shape and material of the sounding object, the location of contact on the object, and the force of the impact. Sound is generated by modal synthesis and an object’s modal data is determined by calculation. This is possible for regularly shaped objects and, in Sonic Explorer, has been carried out for strings, bars, plates, and membranes (in physical modelling terminology, a membrane is a two-dimensional system with no stiffness). The process of calculating modal frequencies for these objects is explained in chapters two and three of The Physics of Musical Instruments (Fletcher & Rossing, 1991) and in Sonic Explorer this is carried out off-line as a pre-processing stage. Doel and Pai explain that the contribution of each mode to the overall vibration depends on the impact location and they derive some general formulas to calculate the mode amplitudes for a given impact location. This means that when an object is struck on different locations there will be subtle differences between the sounds produced that match what the listener intuitively expects to hear. The authors do not take into account the directionality of the emitted sound nor do they accurately model a physical environment as this is beyond the scope of their project but, with a few justified simplifications, they present a suitable method for transforming the vibrations of an object into the sound that is heard. Next, they show how to account for different materials by explaining that an object’s internal friction parameter, which is determined by its material, affects how its sounding frequencies become damped with time. By changing the damping values they can therefore create the effect of sounding different materials. A frequency-independent damping value is applied to all frequencies equally and a frequency-dependent damping has a greater effect on higher frequencies. 
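To make the roles of impact location and material concrete, the sketch below builds modal data for the simplest case, an ideal string fixed at both ends, whose mode shapes are sine functions. It is not the set of formulas derived by Doel and Pai: the function name, the linear damping rule, and the default parameter values are assumptions chosen only to mirror the two kinds of damping described above.

```python
import math

def string_modal_data(fundamental_hz, num_modes, strike_position,
                      base_damping=1.0, damping_per_hz=0.002):
    """Return (frequency, decay_rate, gain) for each mode of an ideal string.

    strike_position: impact location as a fraction of the string length (0..1).
    base_damping: frequency-independent damping, applied equally to all modes.
    damping_per_hz: frequency-dependent damping; larger values (a hypothetical
        stand-in for higher internal friction) mute the upper modes faster.
    """
    modes = []
    for n in range(1, num_modes + 1):
        freq = n * fundamental_hz                        # harmonic: no stiffness
        gain = math.sin(n * math.pi * strike_position)   # mode shape at the impact
        decay = base_damping + damping_per_hz * freq
        modes.append((freq, decay, gain))
    return modes
```

Striking at the midpoint (`strike_position=0.5`) leaves the even-numbered modes almost silent, whereas striking near an end excites many modes weakly, which is the kind of location-dependent difference a listener expects. Raising `damping_per_hz` while keeping the frequencies fixed gives a duller, more quickly muted sound from the same shape.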
The beginning of the new millennium saw a significant contribution from a team at the Polytechnic University of Milan, led by Augusto Sarti. In their paper Object-Based Sound Synthesis for
Virtual Environments (Pedersini, Sarti, & Tubaro, 2000), they present a way to model multiple interacting sounding objects. They model sounding objects by combining digital waveguides and wave digital filters (Fettweis, 1986), which are closely related to digital waveguides. Nonlinear elements are incorporated in object contact conditions and these are exploited to make a dynamic interconnection topology. This allows for connections between sounding objects to be made and broken as the acoustic effects are accurately modelled. They also give an overview of available sound rendering algorithms, with a strong influence from 3D graphics rendering. More recently, this approach has been automated using a binary connection tree (BCT) (De Sanctis, Sarti, Scarparo, & Tubaro, 2005) allowing real-time interactions between sounding objects in an interactive setting. In 2001, Doel and Pai, along with others at the University of British Columbia, Vancouver, developed a system for bringing real world objects into the virtual domain. Scanning Physical Interaction Behaviour of 3D Objects (Pai et al., 2001) extends the idea of simply scanning an object to create a graphical virtual equivalent by capturing its deformation behaviour (how it reacts to external force), its surface texture, and its sound producing properties. This is achieved by scanning the interaction behaviour in a variety of ways using robotic measurement facilities. While the techniques for finding an object’s deformation behaviour and surface texture are fascinating they fall outside the scope of this chapter and so this project will be examined from an audio perspective. Similar to The Sounds of Physical Shapes, this project generates sound by modal synthesis but now the modal data is automatically determined for an object based on robotically obtained measurements. 
To this end, a robotic arm applies an approximation of an impulse force (a short tap) to a location on the object being scanned and the resulting sound is recorded. A technique is described for extracting the object’s modal data (frequency of vibration modes, the amplitude
of these modes, and their damping) from this recording. To enable impact location-dependent sound synthesis, that is, to realize the subtle differences in sound due to an object being struck on different locations, this process is repeated at many points on the object’s surface. In order to model friction, a measure of surface roughness is determined robotically and this is then used for generating continuous contact sounds.

In the same year, Doel and Pai, along with Paul G. Kry, who specializes in computer graphics and physics-based animation, published FoleyAutomatic: Physically-based Sound Effects for Interactive Simulation and Animation (Doel, Kry, & Pai, 2001). In this project the process of sound synthesis is linked to a dynamic simulation (physics engine) that allows user interaction. Sound is generated using modal synthesis, as described in The Sounds of Physical Shapes (Doel & Pai, 1998), and modal data is determined as in Scanning Physical Interaction Behavior of 3D Objects (Pai et al., 2001). In addition to being linked to a dynamic simulation, this project is the first to synthesize continuous contact sounds, that is, rolling and scraping, in a real-time simulation. The technique used in this paper could be implemented with any multi-body dynamic simulation method as the requisite information, such as object collision forces, normal forces during continuous contact, and the position of objects relative to each other, if not directly available, can be easily computed from the object velocities and positions which are available. However, the authors note that methods which provide smooth surface models are preferable to polyhedral approximations and methods which model rolling and sliding are also desirable. In order to create realistic audio, object interactions are simulated at an audio rate which is much higher than the video rate. 
The modelling of object interactions must be carried out quickly and the authors note a stochastic model that involves some random element is often appropriate. When objects make contact, a force signal is calculated at the audio rate and this is used to drive modal
synthesis. In the case of a single impact, it was found that the shape of the force profile was not perceptually important but that the duration of the force gave a good feel for the hardness of the objects in contact. For collisions involving hard surfaces, a burst of impulses at the dominant modal frequencies of the colliding objects was found to convincingly produce the sound due to micro-collisions. To create a scraping impulse, the phonograph model as described earlier in the work of Takala and Hahn (1992) is recalled. To create such a surface profile, noise is filtered to give a 1/f shape and a fractal dimension variable is considered to represent the roughness in the profile produced. In the case of rolling, it is theorized that the impulse is similar to scraping but more “drawn out in time” (Doel et al., 2001, p. 541) and this is realized by applying a low-pass filter to the scraping impulse. Suspecting a stronger coupling between objects during rolling than scraping, it was found that enhancing the spectrum at the resonant frequencies of the objects involved gave better rolling sounds at a higher computational cost. All of the ideas summarized here have been implemented in the application FoleyAutomatic which can “automatically generate high-quality realistic contact sounds” from physical information in order to “increase the feeling of realism and immersion in interactive simulations” (Doel et al., 2001, p. 543). See Figure 1.

Another paper published in 2001, by James F. O’Brien, Perry Cook and Georg Essl, focused on calculating the sound heard due to a vibrating object at a relative location. Synthesizing Sounds from Physically Based Motion (O’Brien, Cook, & Essl, 2001) describes how to calculate the air pressure at two points in an environment (for stereo sound) due to the surface vibrations of an object. Considering Huygens’ principle, the delay and attenuation due to sound propagation is calculated. 
However, the technique requires a deformable body simulator that calculates the surface vibrations of objects at audible frequencies and is therefore not compatible with most real-time
Figure 1. Virtual rock in a wok, from the FoleyAutomatic project. (© 2010 Kees van den Doel. Used with permission.)
interactive simulations. The following year, however, O’Brien, along with Chen Shen and Christine M. Gatchalian published a paper which addressed this. Synthesizing Sounds from Rigid-Body Simulations (O’Brien, Shen, & Gatchalian, 2002) notes that when a body’s sound producing deformations are small enough, which they generally are, they can be decoupled from its rigid-body behaviour. This means that it is possible to use a rigid body simulator to calculate how objects move on a macroscopic scale while audio is generated by a separate process. O’Brien et al. have implemented this twice, with two different third-party physics engines. Modal synthesis is the method of choice here and the unique contribution of this work is the way in which modal data is obtained. Unlike FoleyAutomatic, they do not require any experimental data and, unlike Sonic Explorer, arbitrarily shaped objects can be included because an object’s modal data is determined using a finite element scheme. A tetrahedral method previously used by O’Brien et al. (O’Brien et al., 2001) is now modified to produce modal data for an object of a given size, shape, and material. While this precomputation phase may take a few hours it allows sound synthesis to happen in real-time during the simulation. The authors also note that some parameters can be changed after the computation without affecting the modal frequencies while other changes affect all frequencies by the same ratio and so can be quickly computed. The same year saw a contribution from Doel and Pai, this time along with researchers at the
Institute for Hearing Accessibility Research at the University of British Columbia. Measurements of Perceptual Quality of Contact Sound Models (Doel, Pai, Adam, Kortchmar, & Pichora-Fuller, 2002) explored ways to improve how the modal synthesis technique can be employed. The authors analysed the recording of a metal vase being struck and found 179 modes. However, after “laborious trial and error” (Doel et al., 2002, p. 2) they found that only 10 to 15 of those modes were perceptually important. By conducting listening tests, they then set about finding an algorithm for sorting an object’s modes by perceptual importance so that only the most important modes need actually be synthesized, thus saving processing power. For example, a simple approach was to weight the importance of a mode by its gain while more sophisticated techniques considered the effect of masking. After describing their techniques, experimental procedure, and results, they concluded that, while results varied substantially among participants, the “efficiency of the synthesis can be improved by several orders of magnitude by a careful selection of the modal model” (Doel et al., 2002, p. 4).

The year 2002 also saw the first published contribution to the field from Dylan Menzies. Realising the promise of physical modelling for sound synthesis in virtual environments and the potential of these techniques to work closely with a physics engine, Menzies set about creating a modular framework to enable the sounding and interaction of many objects in a virtual world. Scene Management for Modelled Audio Objects
in Interactive Worlds (Menzies, 2002) describes the beginnings of what would later become the Phya project. This project is detailed later when discussing Menzies’ 2007 paper Physical Audio for Virtual Environments, Phya in Review (Menzies, 2007).

Meanwhile, Doel and Pai continued their research on efficient ways to implement modal synthesis by applying it to complex scenes involving a large number of sounding objects. In Interactive Simulation of Complex Audio-Visual Scenes (Doel, Knott, & Pai, 2004) they describe a process which they call mode pruning in which the effect of auditory masking is used to predict which modes are perceptually unimportant so that they can be excluded from synthesis, hence saving processing power. The technique is not just applied to individual sounding objects, as in previous work: all sounding objects are now considered together. This requires the process to be carried out at run time as objects are being sounded and reaps further gains in efficiency. In addition, this project eases the load on the CPU by offloading as much computation as possible to the Graphics Processing Unit (GPU).

The next major contribution from Doel and Pai was published as a chapter entitled Modal Synthesis for Vibrating Objects (Doel & Pai, 2006) in the book Audio Anecdotes III. As a book chapter, this work is self-contained and as such includes a comprehensive review of the theory of modal synthesis before explaining how they have implemented it using a bank of band-pass filters. In order for the project to facilitate environments that may potentially grow to be very large, with sounding objects continuously being created and destroyed, a modular solution was designed. From a programming point of view, this means that code is written in classes and, at run-time, instances of these classes are created and destroyed as required. 
For example, a class named SonicObject contains functionality for rendering to a system’s audio hardware, ModalSonicObject implements modal synthesis and an AudioForce class provides
functionality for extracting an audio signal from an object. Modular programming also allows a derived class to inherit functionality from a base class and so common code can be written in a base and used in many specialized derived classes. This important advantage ensures the functionality of the system can easily be extended in the future. In this work the authors note that: “Modal synthesis can also be used to model other types of physical systems which can be modeled by excitations acting on resonances, such as car engines, rumbling sounds, or virtual musical instruments” (Doel & Pai, 2006, p. 100). They go on to explain their implementation of a system to synthesize the sound of a four-stroke engine and a four-cylinder engine. To do this, they determine modal data that represents “a lumped model of everything that is vibrating” (p. 114) and drive it with a force signal the generation of which is inspired by the workings of an engine. They state that reasonable results were obtained with their “extremely simple models” (p. 115) and express optimism about the range of engine sounds that could be achieved by spending more time developing the technique. In the same year, Nikunj Raghuvanshi and Ming C. Lin published research on their use of physical modelling in large environments in a paper entitled Interactive Sound Synthesis for Large Scale Environments (Raghuvanshi & Lin, 2006). Their method of determining an object’s modal data is similar to the one used by O’Brien et al. (2002) and can similarly be used for arbitrarily shaped objects. They consider three-dimensional objects as being composed of a thin shell and a hollow inside and they represent it as a mesh of particles connected by damped springs. Off-line computation of a matrix of particle masses and one of elastic forces renders the modal data for an object from which synthesis can be performed. 
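The step from a mass matrix and a stiffness matrix to modal frequencies can be illustrated with a deliberately tiny system. The sketch below is a toy example under stated assumptions, not the method of Raghuvanshi and Lin: it uses only two masses joined by three springs between two walls, ignores damping, and solves the resulting 2x2 eigenvalue problem in closed form. The function and parameter names are hypothetical.

```python
import math

def two_mass_modes(m1, m2, k_left, k_mid, k_right):
    """Modal frequencies (Hz) of wall-spring-m1-spring-m2-spring-wall, undamped.

    Solves K v = w^2 M v, where M = diag(m1, m2) is the mass matrix and K is the
    stiffness (elastic force) matrix of the two-particle network.
    """
    # Stiffness matrix entries from Hooke's law.
    k11 = k_left + k_mid
    k22 = k_mid + k_right
    k12 = -k_mid
    # Mass-normalised symmetric matrix A = M^(-1/2) K M^(-1/2).
    a11 = k11 / m1
    a22 = k22 / m2
    a12 = k12 / math.sqrt(m1 * m2)
    # Eigenvalues of a symmetric 2x2 matrix are squared angular frequencies.
    trace = a11 + a22
    det = a11 * a22 - a12 * a12
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    eigenvalues = [(trace - disc) / 2.0, (trace + disc) / 2.0]
    return [math.sqrt(lam) / (2.0 * math.pi) for lam in eigenvalues]
```

With unit masses and unit springs the two squared angular frequencies are 1 and 3, corresponding to the masses moving together and in opposition. For a mesh with thousands of particles the same eigenvalue problem would be handed to a numerical solver, which is why this stage is carried out off-line.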
A large portion of the paper details ways in which computational efficiency is gained, and three techniques are described. Similar to Doel and Pai in Interactive Simulation of Complex Audio-Visual Scenes, the total computational expense is reduced
Physical Modelling for Sound Synthesis
by decreasing the number of modes used for synthesis, and the aim is to do this with the minimum degradation of sound quality. The first technique, which they call mode compression, exploits the fact that humans find it difficult to discriminate between nearby frequencies (Sek & Moore, 1995). Therefore, a number of nearby frequencies are lumped together to reduce computational expense. The second technique, mode truncation, considers that an oscillator requires processing cycles regardless of how much its output contributes to an overall sound. To improve efficiency, a threshold is introduced below which an oscillator’s output is deemed to be unimportant and therefore is no longer calculated. The third technique, quality scaling, is concerned with synthesizing sound for multiple objects simultaneously. Each sounding object in a scene is given a processing time quota within which to perform modal synthesis, with more important objects given more time than those of lesser importance. At each audio callback, modal synthesis is carried out starting with the most important object and finishing with the least important, allowing each object its full time quota if required. This means the more important objects will be rendered with a higher level of sonic detail than those of less importance. Results indicate that each of these techniques yields an efficiency gain and, when used together, sound can be synthesized for a large scene with hundreds
of interacting objects “with little loss in perceived sound quality” (Raghuvanshi & Lin, 2006, p. 108).
Returning to the research of Menzies, 2007 saw the publication of Physical Audio for Virtual Environments, Phya in Review (Menzies, 2007). This paper described a project now known as Phya: a library that facilitates physical modelling for sound synthesis in tandem with a physics engine. In this work, the author underlines the need for creative thinking in such a project in order to produce a “powerful synergistic perceptual effect” (Menzies, 2007, p. 1) by combining realistic audio with visuals. This may entail relaxing the physical constraints on the sound production process and giving sound designers some freedom to enhance the characteristic sounds of a scene. Ways in which a level of control may be given to a sound designer are highlighted throughout the work. There is also an emphasis on using robust, efficient techniques and on creating a system that can be easily scaled up to handle large environments. By combining techniques described in previous projects with new innovations, Phya facilitates sound synthesis for many different types of contact (Figure 2). These include: simple impacts; multiple impacts, which happen at a rate not captured by most physics engines; scraping and rolling, including the effect of contact jumps where the objects momentarily break contact; grazing sounds, which occur when an object simultaneously bounces and skids off a surface; stick and slip, due to friction between
Figure 2. Screenshot of Phya synthesising the sounds of multiple objects. (© 2010 Dylan Menzies. Used with permission.)
objects; and the buzzing from vibrating objects in light contact with each other. In addition, the project can produce the effect of surface damping (an object becomes less resonant due to being in contact with another object) and also the effect of a change in sound due to a deformable object being forced out of shape. To a first approximation, the effect of distributed nonlinearities described earlier can be simulated, producing interesting results like pitch glide. Further, as yet unreleased, functionality has demonstrated the effect of diffuse resonance, which is when an object’s modes “become very numerous and merge into a diffuse continuum” (Menzies, 2007, p. 3). More recent demonstrations of Phya have introduced loose particle surfaces, for example, gravel, achieved through a PhISEM-like approach, as well as surfaces covered in leaves, plastic packaging, and shallow water. Bear in mind that the author does not claim these effects are created through accurate modelling of the physical phenomena that cause them. Instead, techniques have been developed by combining an understanding of these phenomena with a creative approach to sound synthesis while reserving some aspects to be controlled by a sound designer. In a later publication, Menzies refers to this as “the development of semi-physical perceptual models that provide some freedom for the sound designer to more easily mould a virtual world” (Menzies, 2008, p. 1).
The information required to create these sounds is unlikely to be available directly from a physics engine and therefore a collision update layer of Phya must deduce it from what is available. Often, the physics engine will not provide enough detail, for example when multiple impacts are involved, and so the collision update layer must generate extra information from what is known in a deterministic or stochastic way.
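To make the idea of generating extra information concrete, here is a minimal sketch of how a single impact reported by a physics engine might be expanded into a burst of smaller impacts. This is not Phya's algorithm: the function name, every parameter, and the bouncing-style decay rule are assumptions made for illustration. The deterministic part shrinks the gaps and strengths geometrically, and the stochastic part jitters each gap.

```python
import random


def expand_impact(time_s, impulse, restitution=0.5, first_gap_s=0.03,
                  min_impulse=0.01, jitter=0.2, rng=None):
    """Expand one reported impact into a decaying burst of (time, impulse) events.

    All parameters are hypothetical tuning values, not taken from any engine.
    """
    if not 0.0 < restitution < 1.0:
        raise ValueError("restitution must lie strictly between 0 and 1")
    rng = rng or random.Random()
    events = [(time_s, impulse)]
    gap = first_gap_s
    t = time_s
    strength = impulse * restitution
    while strength >= min_impulse:
        # Stochastic part: perturb each gap by a bounded random factor.
        t += gap * (1.0 + rng.uniform(-jitter, jitter))
        events.append((t, strength))
        # Deterministic part: gaps and strengths shrink geometrically,
        # loosely imitating an object that bounces before settling.
        gap *= restitution
        strength *= restitution
    return events


burst = expand_impact(0.0, 1.0, rng=random.Random(1))
```

Each generated event could then excite the object's modes in the same way as an impact reported directly by the engine; passing a seeded generator keeps the burst repeatable when that matters.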
The collision update process is reported as one of the trickier problems the project has overcome, with the issue of monitoring continuous contact said to be particularly awkward as most physics engines do not
use persistent contacts but instead simply report if two objects are touching in a given frame. Before concluding, Menzies, who himself has experience in both the academic and industrial sides of computer game development, gives his own “tentative explanations” as to why physical modelling has not been embraced outside of research projects despite its potential value. Firstly, as is widely known among sound designers and audio programmers in the games industry, audio is usually considered less important than graphics, meaning that fewer resources are allocated to it in terms of both development and hardware. Therefore “audio programming is often carried out by a non-specialist programmer” and “there is often a natural resistance to acknowledge that out of house technologies could be valuable if they can not readily be reproduced in house” (Menzies, 2007, p. 5). Considering the difficulties associated with correctly harnessing the information available from a physics engine, it is perhaps understandable that developers may not wish to risk spending resources on a concept that has yet to be tested in a commercial sense. In addition, Menzies contends that “published research often focuses on a level of audio modeling detail that goes well beyond that required in a simulation” (Menzies, 2007, p. 5), implying that current research is not entirely relevant to an industry where a level of creative control is desirable, algorithmic efficiency is vital, and absolute authenticity is not.
In 2007, Raghuvanshi and Lin, now collaborating with Christian Lauterbach, Anish Chandak, and Dinesh Manocha, all from the University of North Carolina, published an article on their continued research. Real-Time Sound Synthesis and Propagation for Games (Raghuvanshi et al., 2007) reviews their previously described work (Raghuvanshi & Lin, 2006) before presenting sound propagation functionality that has since been added.
Their approach is an adaptation of beam tracing (Funkhouser et al., 1998) and it is, as they claim, well suited to interactive applications. Their results indicate that, in complex, dynamic scenes
that contain moving sound sources, the technique is effective enough to model sound propagation with several orders of reflection. In 2008, Menzies, in his paper Virtual Intimacy: Phya as an Instrument (Menzies, 2008), considered how Phya might be used to create music. He describes how Phya’s physical behaviour is naturally appealing to humans and how the facility to create physically impractical or even impossible conditions in a virtual world gives users more freedom than in the real world, creating new musical composition possibilities. In 2009, Menzies revealed the beginnings of a complementary application for Phya, in the paper Phya and VFoley, Physically Motivated Audio for Virtual Environments (Menzies, 2009). The development of VFoley is intended to allow users to hear how an object sounds as modifications are being made to it, so that a user may quickly test object interactions (Figure 3). An application I developed (Mullan, 2009) to synthesise sounds for regular shapes allows the user to modify not only an object’s geometric and material parameters but also parameters that directly describe how an object sounds, referred to as its “audibly perceptible” parameters. The completion of projects such as these brings physical modelling for sound synthesis to non-programmers, and hence to more widespread usage.
FUTURE DIRECTIONS AND CONCLUSION
So, what lies in the future of this research area? What are the current barriers to its mainstream adoption? Is physical modelling for sound synthesis set to become the norm for computer game sound effects? Let us first look at possible future research directions. Menzies’ most recent work promises to provide users with an authoring environment for Phya, hence making it accessible to a wider audience. Although it is not yet clear what form this will take, the facility for users to modify an object’s size, shape, and material properties while it is sounded in real time would most certainly be desirable. The most problematic stipulation here is the provision of a means to change an object’s shape while providing sound in real time. To date, any work that allows for the sounding of arbitrarily shaped objects has required an off-line pre-processing stage to determine the object’s modal data and, so far, this cannot happen quickly enough to be used in an application that could be described as interactive. Cynthia Bruyns has carried out related research and has shown how to quickly estimate an arbitrarily shaped object’s modal data based on those of a similar shape (Bruyns, 2006). However, no application exists whereby a level designer may create a unique object and instantly (or within a reasonably short time) hear how it sounds. This would be desirable from a game development point of view and would create new gameplay possibilities.
Figure 3. Screenshot taken from the development of VFoley. (© 2010 Dylan Menzies. Used with permission.)
Some readers may have noticed that, in all the studies described in this chapter, the focus was rarely on formal listening tests. Instead, the developers were usually the ones to evaluate the sounds produced. Although anyone working in this research area should have good listening skills, there is undoubtedly much to be learned from carrying out extensive listening tests on a range of people, including subjects with and without experience in sound design, both gamers and non-gamers. The results could reveal failings or overkill in the current techniques and should inform future research projects. This should, in turn, lead to improvements in efficiency, which is desirable in any aspect of computer game programming, and certainly in audio programming. Finally, there are still barriers to overcome in the adoption of physical modelling for sound synthesis in common game engines. While Menzies highlights the difficulties encountered in linking sound synthesis to a physics engine (Menzies, 2007), he also stresses that these difficulties can be overcome. So far, many projects have used a physics engine by harnessing the information already available from it, but no physics engine has been adapted for sound synthesis from the inside. The potential benefits of this are worth exploring as, in a commercial sense, an in-built sound synthesis module could give a physics engine a competitive edge. Looking forward to a time when the methods discussed here can be fully integrated into computer games, it is important to view physical modelling not as a standalone practice, but as a branch of procedural audio. Within the context of computer game sound effects there is a continuum from pure physical modelling to purely abstract synthesis methods. Audio programmers will require not only the knowledge but also the creativity to know which part of this continuum they should use in different situations in order to create the most compelling sounds with the resources available.
Furthermore, game designers will need to be aware of the new possibilities made
available by developments in physical modelling and procedural audio. At a minimum, games are set to become more realistic due to improved audio. Each time a game is played, a unique soundtrack, tailored to that game experience, will be created. The subtleties therein will match what the gamer’s intuition expects based on what he or she sees, and this will lead to a more immersive experience. Beyond this, new gaming possibilities are opening up, ranging from small enhancements of current common situations to completely new prospects. For example, a common puzzle in The Legend of Zelda: A Link to the Past (Nintendo, 1992) was to identify the weak point of a wall so it could be destroyed. A weakness was usually visible and also produced a different sound when struck with a sword. However, the sound produced by striking either a strong or weak point was always one of two samples. With the incorporation of physical modelling into this situation, the sound of striking a wall could contain information not only on the material and thickness of the wall but also on the sword used to strike it. This would still hold true even if the sword had been created uniquely by the gamer during gameplay, and this links to another range of possibilities facilitated by physical modelling. Games will be able to produce sound for objects that were not conceived during development. As mentioned in the introduction to this chapter, this is particularly desirable in games such as LittleBigPlanet (Sony Computer Entertainment Europe, 2008) which encourage users to create their own unique content. Indeed, such games might require the user to fashion objects that sound a particular way in order to solve a puzzle. As also mentioned earlier, with the increasing sophistication of physics in games, we have now reached the point of objects being realistically shattered, creating a potentially unique combination of fragments each time.
Again, any shattered pieces will not have been conceived during a game’s development and so the only way to create sound for them is by synthesis at
run time. Finally, with creative thinking, there is potential for new game scenarios that are designed with the possibilities of physical modelling in mind. With the emergence of environments like LittleBigPlanet, Crayon Physics Deluxe (KlooniGames, 2009) and Phun – 2D Physics Sandbox (Ernerfeldt, 2008), gamers now have the facility to create their own worlds which evolve naturally due to the effects of physics (like a domino effect). Currently, object contact sounds are not attached to these environments but, with the introduction of physical modelling for sound synthesis, this need not be the case. If object contact sounds were synthesised in these environments, there is the potential to create an intentional sequence of sounds, that is, music. This virtual musical performance would be driven by physics and accompanied by graphics. Such an environment would give musical composers a new composition tool and gamers a creative environment. These are just some suggestions as to how physical modelling for sound synthesis might enhance computer games in the future. As research in the area continues and these techniques become available, the onus will switch to game developers to embrace them, implement them (or buy them), utilise their advantages, and dream up new possibilities.
REFERENCES
Adrien, J. M. (1991). The missing link: Modal synthesis. In DePoli, G., Picialli, A., & Roads, C. (Eds.), Representations of musical signals (pp. 269–297). Cambridge, MA: MIT Press.
Bilbao, S. (2006). Fast modal synthesis by digital waveguide extraction. IEEE Signal Processing Letters, 13(1), 1–4. doi:10.1109/LSP.2005.860553
Bilbao, S. (2009). Numerical sound synthesis: Finite difference schemes and simulation in musical acoustics. Chichester, England: John Wiley and Sons.
Bruyns, C. (2006). Modal synthesis for arbitrarily shaped objects. Computer Music Journal, 30(3), 22–37. doi:10.1162/comj.2006.30.3.22
Collins, K. (2008). Game sound: An introduction to the history, theory and practice of video game music and sound design. Cambridge, MA: MIT Press.
Cook, P. R. (1997). Physically informed sonic modeling (PhISM): Synthesis of percussive sounds. Computer Music Journal, 21(3), 38–49. doi:10.2307/3681012
Cook, P. R. (2002). Real sound synthesis for interactive application. Natick, MA: A K Peters, Ltd.
Crayon Physics Deluxe. (2009). Petri Purho (Developer). San Mateo: Hudson Soft.
De Sanctis, G., Sarti, A., Scarparo, G., & Tubaro, S. (2005). Automatic modelling and authoring of nonlinear interactions between acoustic objects. In K. Galkowski, A. Kummert, E. Rogers & J. Velten (Eds.), The Fourth International Workshop on Multidimensional Systems – NDS 2005 (pp. 116-122).
Doel, K. d., Knott, D., & Pai, D. K. (2004). Interactive simulation of complex audio-visual scenes. Presence (Cambridge, Mass.), 13(1), 99–111. doi:10.1162/105474604774048252
Doel, K. d., Kry, P. G., & Pai, D. K. (2001). FoleyAutomatic: Physically-based sound effects for interactive simulation and animation. In P. Lynn (Ed.), Proceedings of SIGGRAPH ’01: The 28th annual conference on Computer graphics and interactive techniques (pp. 537-544). New York: ACM.
Doel, K. d., & Pai, D. K. (1998). The sounds of physical shapes. Presence (Cambridge, Mass.), 7(4), 382–395. doi:10.1162/105474698565794
Doel, K. d., & Pai, D. K. (2006). Modal synthesis for vibrating objects. In K. Greenebaum, & R. Barzel (Eds.), Audio anecdotes III: Tools, tips, and techniques for digital audio (pp. 99-120). Wellesley, MA: A K Peters, Ltd.
Doel, K. d., Pai, D. K., Adam, T., Kortchmar, L., & Pichora-Fuller, K. (2002). Measurements of perceptual quality of contact sound models. In Nakatsu & H. Kawahara (Eds.), Proceedings of the 8th International Conference on Auditory Display (pp. 345-349). Kyoto, Japan: ATR.
Ernerfeldt, E. (2008). Phun: 2D physics sandbox. Available from http://www.phunland.com/wiki/Home
Essl, G., Serafin, S., Cook, P., & Smith, J. O. (2004). Theory of banded waveguides. Computer Music Journal, 28(1), 37–50. doi:10.1162/014892604322970634
Farnell, A. (2008). Designing sound. London: Applied Scientific Press.
Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Fettweis, A. (1986). Wave digital filters: Theory and practice. Proceedings of the IEEE, 74(2), 270–327. doi:10.1109/PROC.1986.13458
Fletcher, N. H., & Rossing, T. D. (1991). The physics of musical instruments. New York: Springer.
Funkhouser, T., Carlbom, I., Elko, G., Pingali, G., Sondhi, M., & West, J. (1998). A beam-tracing approach to acoustic modelling for interactive virtual environments. In S. Cunningham, W. Bransford & M. F. Cohen (Eds.), Proceedings of SIGGRAPH ’98: The 25th annual conference on Computer graphics and interactive techniques (pp. 21-28). New York: ACM.
Gaver, W. W. (1993a). Synthesizing auditory icons. In S. Ashlund, K. Mullet, A. Henderson, E. Hollnagel & T. White (Eds.), Proceedings of the INTERCHI ’93 conference on Human factors in computing systems (pp. 228-235). New York: ACM.
Gaver, W. W. (1993b). What in the world do we hear? An ecological approach to auditory event perception. Ecological Psychology, 5(1), 1–29. doi:10.1207/s15326969eco0501_1
Kelly, J., & Lochbaum, C. (1962). Speech synthesis. In Proceedings of the Fourth International Congress on Acoustics, 4, 1-4. Retrieved from http://hear.ai.uiuc.edu/public/Kelly62.pdf
LittleBigPlanet. (2008). Media Molecule (Developer). Surrey, UK: Sony Computer Entertainment Europe.
McCuskey, M. (2003). Beginning game audio programming. Boston, MA: Premier Press.
Menzies, D. (2002). Scene management for modelled audio objects in interactive worlds. In Nakatsu & H. Kawahara (Eds.), Proceedings of the 8th International Conference on Auditory Display. Kyoto, Japan: ATR.
Menzies, D. (2007). Physical audio for virtual environments, Phya in review. In W. L. Martens (Ed.), Proceedings of the 13th International Conference on Auditory Display (pp. 197-202). Montreal, Canada: McGill University.
Menzies, D. (2008). Virtual intimacy: Phya as an instrument. In Proceedings of the 8th International Conference on New Interfaces for Musical Expression NIME08. Retrieved from http://www.zenprobe.com/dylan/pubs/menzies08_virtualIntimacy.pdf
Menzies, D. (2009). Phya and VFoley, physically motivated audio for virtual environments. In 35th AES Conference on Audio for Games. Retrieved from http://www.aes.org/e-lib/browse.cfm?elib=15171
Mullan, E. (2009). Driving sound synthesis from a physics engine. In Charlotte Kobert (Ed.), Proceedings of the IEEE Games Innovation Conference 2009 (pp. 256-264). New York: IEEE.
O’Brien, J. F., Cook, P. R., & Essl, G. (2001). Synthesizing sounds from physically based motion. In P. Lynn (Ed.), Proceedings of SIGGRAPH ’01: The 28th annual conference on Computer graphics and interactive techniques (pp. 529-536). New York: ACM.
O’Brien, J. F., Shen, C., & Gatchalian, C. M. (2002). Synthesizing sounds from rigid-body simulations. In T. Appolloni (Ed.), Proceedings of the 2002 ACM SIGGRAPH/Eurographics symposium on Computer animation (pp. 175-181). New York: ACM.
Pai, D. K., Doel, K. d., James, D. L., Lang, J., Lloyd, J. E., Richmond, J. L., & Yau, S. H. (2001). Scanning physical interaction behaviour of 3D objects. In P. Lynn (Ed.), Proceedings of SIGGRAPH ’01: The 28th annual conference on Computer graphics and interactive techniques (pp. 87-96). New York: ACM.
Pedersini, F., Sarti, A., & Tubaro, S. (2000). Object-based sound synthesis for virtual environments using musical acoustics. IEEE Signal Processing Magazine, 17(6), 37–51. doi:10.1109/79.888863
Raghuvanshi, N., Lauterbach, C., Chandak, A., Manocha, D., & Lin, M. C. (2007). Real-time sound synthesis and propagation for games. Communications of the ACM, 50(7), 67–73. doi:10.1145/1272516.1272541
Raghuvanshi, N., & Lin, M. C. (2006). Interactive sound synthesis for large scale environments. In Proceedings of the 2006 symposium on Interactive 3D graphics and games (pp. 101-108). New York: ACM.
Ruiz, P. (1969). A technique for simulating the vibrations of strings with a digital computer. Unpublished master’s thesis. University of Illinois, Urbana, IL.
Sek, A., & Moore, B. C. (1995). Frequency discrimination as a function of frequency, measured in several ways. The Journal of the Acoustical Society of America, 97(4), 2479–2486. doi:10.1121/1.411968
Takala, T., & Hahn, J. (1992). Sound rendering. In Proceedings of SIGGRAPH ’92: The 19th annual conference on Computer graphics and interactive techniques, 26(2), 211-220. New York: ACM.
The legend of Zelda: A link to the past. (1992). Nintendo EAD (Developer). Kyoto, Japan: Nintendo.
Trautmann, L., & Rabenstein, R. (2003). Digital sound synthesis by physical modelling using the functional transformation method. New York: Kluwer Academic/Plenum Publishers.
Välimäki, V., Pakarinen, J., Erkut, C., & Karjalainen, M. (2006). Discrete-time modelling of musical instruments. Reports on Progress in Physics, 69, 1–78. doi:10.1088/0034-4885/69/1/R01
Whitmore, G. (2009, May). The runtime studio in your console: The inevitable directionality of game audio. Develop, 94, 21.
Wilde, M. D. (2004). Audio programming for interactive games. Oxford: Focal Press.
KEY TERMS AND DEFINITIONS
Sound Synthesis: The process of generating audio.
Physical Modelling: Simulating the physical behaviour of a real object or system. From an audio point of view, this means simulating how an object vibrates in the audible frequency range.
Digital Waveguide Synthesis: A physical modelling synthesis technique that models the vibrations of a system as travelling waves. It is particularly efficient for 1D systems with a harmonic frequency spectrum.
Modal Synthesis: A physical modelling synthesis technique which generates sound as a sum of decaying sinusoids.
Discretise: The process of making a continuous model or function discrete. Usually this is performed so a problem can be solved numerically or implemented digitally using a computer.
Inharmonic: Describes a spectrum in which the higher frequencies (partials) are not integer multiples of a fundamental frequency.
Procedural Audio: Audio that is generated at run time from information on sound producing events. Sound generation algorithms are derived from analysis of real sound producing systems. This analysis may be physics based, phenomenological, or a mixture of these.
Section 5
Current & Future Design
Chapter 17
Guidelines for Sound Design in Computer Games
Valter Alves
University of Coimbra, Portugal & Polytechnic Institute of Viseu, Portugal
Licínio Roque
University of Coimbra, Portugal
ABSTRACT
The inconsequential exploitation of sound in most computer games, both in extent and nature, contrasts with its prominence in our daily lives and with the kind of associations that have been explored in domains such as music and cinema. Sound design remains the craft of a talented minority and the unavailability of a public body of knowledge on the subject has greatly contributed to this state of affairs. This leads to a mix of alienation and best-judgment improvisation in the broader development community. A sensitivity to the potential of sound for the enrichment of the experience, with emphasis on game specifics, is therefore necessary. This study presents a contribution to the practice of sound design for computer games. An approach to intentional sound design, informed by multi-disciplinary interpretations of concepts including emotion, context, acoustic ecology, soundscape, resonance, and entrainment, is distilled into a set of design guidelines that holistically address the different sound layers.
DOI: 10.4018/978-1-61692-828-5.ch017
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION
Computer game sound design is in its infancy. It is still a practice almost reserved to a limited number of experts in the game industry who have typically made their own way through the field in the absence of a structured body of knowledge. The consequences are self-evident. To start with, there is no abundance of purposeful sound usage in computer games. More relevant to the study presented here, there is little theoretical support for someone who is not one of those experts to perform intentional sound design. This situation is not an exception in the broader context of human-machine interfaces and interaction systems. Game development, though, is one of the fields where sound is deserving of greater attention, as noted by a number of recent authors (Collins, 2008a; Ekman, 2005, 2008; Grimshaw, 2007; Peck, 2001, 2007). Additionally, in the wider field of Human Computer Interaction (HCI), research on sound is recognized as quite neglected (Brewster, 1994; Frauenberger, 2007; Hermann & Hunt, 2005; Kramer et al., 1997). One conspicuous sign of the lack of a relevant body of knowledge is the unavailability of clear guidelines or best practices. Yet, this kind of support does exist and is widely known with respect to the visual modality (Kramer et al., 1997). What is more, researchers in HCI often resort to computer games as instruments to conduct studies on several aspects (Barr, 2008; A. Jørgensen, 2004) including those related to sound. Sound design in computer games is particularly interesting because it supplies evidence of the pertinence of multiple aspects of sound in interaction. To start with, computer game sound matters to usability, in the sense of “easing the use of the system by providing specific information to the player about states of the system” (K. Jørgensen, 2006, p. 48). It can also work as support to gameplay (K. Jørgensen, 2008). Additionally, sound is a valuable component of overall game aesthetics and affective perception. Furthermore, it may be used to create and enhance emotional impact (Ekman, 2008) and contribute to immersion (Collins, 2008a; Grimshaw, 2007, 2008). Nevertheless, it is important to be aware that interaction in HCI and computer games are not the same: applications typically bracketed under the HCI label are meant to be used, while games are meant to be played (Barr, 2008; Sotamaa, 2009). The relevance of computer games in HCI research is also justified by a growing appreciation for the concept of User Experience (Hassenzahl & Roto, 2007; Hassenzahl & Tractinsky, 2006), which emerged as an attempt to promote a holistic interaction perspective beyond the more traditional efforts, such as usability. Aspects such as efficiency or performance are no longer the sole design concerns: subjective appreciation matters and also influences the former concerns.
Yet again, research has been directed largely at the visual modality, leaving others, like sound, less explored (Alves & Roque, 2009a). The field that is acknowledged to contribute most to game sound, and to many other aspects of game development for that matter, is the movie industry. In fact, practices in game sound are strongly influenced by those from cinema. Still, although this is understandable and legitimate to some extent, it is crucial to understand that fundamental disparities exist between the two media that both impose and propose distinct approaches. It is exactly in this difference that we find most prospective development. Ultimately, what is needed is knowledge on how to compose sound attending to game scenario specifics including nonlinearity, dynamicity, and the need for variability (Collins, 2008a, 2008b). The lack of guidance in sound design has proven to be damaging. On the one hand, developers are discouraged from integrating sound in their projects, leading to unbalanced interfaces when compared to our experiences in daily life or even with other media. On the other hand, and possibly more harmfully, when developers venture into sound integration they have to resort to their best judgment, not necessarily achieving interesting results (Frauenberger, 2007). In turn, all these circumstances have contributed to users/players becoming accustomed to the factual unimportance of sound, even developing some negative associations with sound, of which the urge to reach for the mute button is an emblematic example. Muting is interesting as a transient state, not as the defensive default. Considering such a scenario and refocusing on research and development, two modes of attack seem to be imperative. One is sensitization. This means making more people aware of the low-level appreciation that the audio component currently has and countering this by proposing innovative ways to explore sound potential. The other is to deliver support to enable the implementation of such ideas.
This stretches from providing guidance on the potential concepts that may allow tackling
the intentionality of the design to pragmatic aspects of implementation. In this chapter we contribute to both these approaches. We will start by addressing some fundamental questions. Then, we will present a contribution intended to aid sound design in the form of a set of design guidelines. These guidelines are an expression of findings that we have synthesized from an interdisciplinary literature review and from an extensive analysis of media products, particularly computer games. We brought together research and concepts that include: acoustic ecology; recent studies on emotion, including the latest findings in neuroscience; physical phenomena having repercussions on the psychology and physiology of perception, cognition, and emotion; and context engineering. We will present some background to these concepts and to their prominent relationships to each other. We will also present a report on an exemplary design exercise (Alves & Roque, 2009b), following the method presented here, carried out by a team of game developers with no prior experience in sound design, to demonstrate a possible practical interpretation of our suggestions.
INTENTIONAL SOUND DESIGN
It is essential that the exploration of the usage of sound in some interactive experience does not end up confused with the mere placing of sounds on top of things. Designers should not be searching for excuses to use sound: they should be designing ways in which sound may contribute to the purpose of the application. To put it another way, in this context, sound is a means, not an end. It is not about fitting; it is about profiting. Failing to understand this reinforces the user’s perception that sound is expendable. And the truth is the user does not need our help to hear “things”. The user is not living in a vacuum but is already surrounded by sounds. So, it is probably the case that, unless the sound coming from the application
brings some value–which can be fun, certainly–it is just disturbing the surrounding sounds. And that is when the mute button comes in handy. A sound designer must consider the project as a whole and ponder how sound will best serve the overall purposes in harmony with all other aspects. For that to be possible, it is crucial that sound designers become involved in the general design process as soon as it begins. Unless that happens, the range of possibilities will be severely curtailed by whatever other decisions have been taken. This issue is documented for sound designers in the movie industry (for example, Parker, 2003; Peck, 2001).
Emotions
We have already noted the scarce consideration that sound has so far received in most designed interactive processes. No less relevant is the fact that most of the efforts to leverage sound usage have been focused on utilitarian issues. These include complex data display, event monitoring and reinforcement of critical messages, applications for visually impaired people, and interfaces for eyes-free devices. Of course, these are all most noble quests, but they do not explore a very powerful facet of sound, which is its association with emotions. Research on emotion was not always popular although theories can be traced back at least to Plato and Aristotle. As an area of research, it had a low profile for most of the 20th century and only recently has interest in it resurged (Damásio, 2003; Ledoux, 1998; Nettle, 2006), thanks largely to advances in neuroscience laboratory tools. The fact that it is now possible to have an internal perspective of emotion, rather than dealing with external observations alone, contributes decisively to a new consideration of emotions. To start with, it helps to set apart what is science and what is no more than wishful thinking, lending credibility to the approaches that rightfully find support on the emotional plane. Also, it
reveals new opportunities to act according to the physiological observations of emoting processes. But, and possibly more relevantly, recent scientific findings on brain phenomena and on how cognition and emotion are intertwined (Damásio, 2000, 2003, 2005; Lane, Nadel, Allen, & Kaszniak, 2002; Ledoux, 1998) build support for unprecedented studies that aim to leverage cognitive attributes through the exploration of emotional aspects of the interaction (Norman, 2004). The new thinking on emotion contributed to a new perspective on the interaction process itself, consistent with a move in the research focus from a functionalist view of usability to a broader notion of User Experience (Hassenzahl & Tractinsky, 2006; Mahlke, 2007; Mahlke & Thüring, 2007; Norman, 2002). User Experience privileges quality of interaction over instrumental aspects and introduces “the general notion of technology as a positive aspect of our daily lives” (Mahlke, 2007). In computer games, the experience and the explicit designing of emotions are core concepts (Freeman, 2003; Marks & Novak, 2009; Schell, 2008) and–apart from game categories such as “serious games”–they constitute the ultimate argument for consumption. That said, it seems fair to argue for a more thoughtful exploration of sound, namely in what concerns its potential association to emotions (Ekman, 2008; Follett, 2007; Grimshaw, 2007; Peck, 2001), with both a focus on purely hedonic purposes and through an exploration of how the achievement of specific emotional states may indirectly contribute to pragmatic goals such as various aspects of performance: efficiency, effectiveness, perception, memory, and so forth. Interestingly, in other disciplines, sound has proven to be notably associated with emotion; relationships between sound and emotion have been traditionally explored in areas like music (Juslin & Sloboda, 2001) and cinema (Peck, 2001; Sider, 2003) with a solid body of knowledge. 
One aspect that appears fundamental to the research of sound and emotion in interaction,
and which also remains overlooked, is the need for a holistic perspective on sound, exploring the benefits of considering the auditory component not as a set of independent stimuli but as a coherent composition integrated with the context of the experience.
Acoustic Ecology
Acoustic ecology (Kallmann, Woog, & Westerkamp, 2007; World Soundscape Project, n.d.; Wrightson, 2000), an area founded mostly by music composers, offers much insight into an emotionally meaningful conception of contextualized sound. It is supported by the central concept of soundscape (Schafer, 1973, 1994) and the musical composition developed from it (Truax, 1995, 2001). Together, they represent a meaningful body of knowledge with particular emphasis on context, emotion, and interaction between the listener and the environment. The term soundscape means the “sound heard in a real or virtual environment” (Wrightson, 2000, p. 10) considered as a whole. A soundscape is an ecologically balanced entity where sound mediates the relationships between individuals and the environment. So, acoustic ecology implies a consideration of how the environment is understood by those living within it: regarding sound, the focus is on how it functions, not simply how it propagates. Acoustic ecology also supports the idea that an acoustic environment can be understood as a musical composition. This emphasis on the concepts of harmony and orchestration is not mere lyricism. Studies on natural environments show balance in level, spectra, and rhythm. For instance, it was observed that “animal and insect vocalizations tended to occupy small bands of frequencies leaving ‘spectral niches’ (bands of little or no energy) into which the vocalizations (fundamental and formants) of other animals, birds or insects can fit” (Wrightson, 2000, p. 11). Another implication is that the listener shares responsibility in composition (Wrightson, 2000).
The idea of the listener as a composer is very insightful. First, it gives relevance to the sound the listener himself produces (composes and/or interprets), intentionally or not. Second, and perhaps more impressively, it emphasizes that the user completes the composition by filling in the “meaning” that is absent or that is not evident (Truax, 1995). This process of construction is assumed to be personal since the overall context, where an acoustic environment fits, is different for each person. We consider that these insights from acoustic ecology can be adapted to inform sound design in computer games. From that premise, it becomes relevant to conceive of a translation of the knowledge generated around the concept of soundscape, driven by research on the implications of such an idea for overall perception and the emotional dimension of interaction.
Resonance and Entrainment
One goal inherent to game design is to allow for engaging experiences. Thus, it is important to reflect on reasons that may lead to a player not becoming engaged with a designed setting. Perhaps we have to recognize that, ultimately, such lack of engagement may be explained by the player’s own will and not by flaws in the design process. This is no excuse, however, for ignoring important sound design considerations. In some circumstances, the deviation from the predicted behavior derives from the fact that there is no matching between the player’s state and some desired state in any particular moment. From physics, we borrow two related concepts that allow us to describe and to formalize a model to actuate design with the purpose of addressing this circumstance. These concepts are resonance and entrainment (Augoyard & Torgue, 2005; Sonnenschein, 2001), both physical phenomena having repercussions on the psychology and physiology of perception, cognition, and emotion (Leeds, 2001; Sonnenschein, 2001).
Resonance is the matching between vibratory rates and is found in all periodic, sinusoidal movements. It requires a concordance between the exciting frequency and that of the object put into vibration. A resonant system exists when an object is able to make another resonate. Natural resonance occurs when an object vibrates as a consequence of being excited with its own natural frequency. If the object has the ability to vibrate to a variety of frequencies, resonance can be forced. Accordingly, we can describe the unengaging circumstances mentioned above as failures to achieve resonance: For diverse reasons, there is no match between the game sound features and the player. This is not just a figurative interpretation of the concept: Our interest in resonance is indeed related to how the body, as a system, responds to sound stimuli and this is no different from what has been exploited by music through the ages. One explanation for the failure in the desired matching, derived from the concept’s definition, is that the entities–the player and the setting–are in such different states that no resonance can even be forced. The second concept–entrainment–gives us a hint on how to work on that problem. Entrainment has to do with the synchronization between resonant systems. It “has been found so ubiquitous that we hardly notice it” (Sonnenschein, 2001, p. 97). Entrainment has long been used by music to induce specific states of consciousness (Leeds, 2001; Sonnenschein, 2001). In terms of psychoacoustics, the pertinence is to change the rate of brainwaves, heartbeat, or breath according to verified associations between those rates and cognitive and emotional states. For entrainment to happen, three conditions must be met (Leeds, 2001). Firstly, a system will only entrain another if the latter is able to achieve the same vibratory rate. Secondly, the former needs power enough to prevail over the latter. 
Finally, the former needs to keep the same vibratory parameters until the latter is able to entrain. Regardless of whether we opt to take this literally or as an insightful metaphor, we must realize that if we want a player to resonate to a system’s desired state we may need to first get the system resonating with the player and then progressively bring the system–and the player along–into the desired state. Resonance–including, for our purposes, the related concepts of entrainment, sympathetic vibration, resonant frequencies, and resonant systems–has been said to be “the single most important concept to understand if you are to grasp the constructive or destructive role of sound in your life” (Leeds, 2001, p. 35). We believe resonance is fundamental to the exploration of sound in computer games, notably to support a model that serves as an aid to understanding and, hopefully, overcoming the issue of empathy between a game and its players.
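Read as a design procedure, the "match the player first, then lead" idea can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name, the use of a single numeric rate (for example, musical tempo compared against a measured heart rate), and the `max_step` and `tolerance` values are hypothetical and are not taken from Leeds (2001) or Sonnenschein (2001). Small steps stand in for the first condition (the player must be able to reach the rate) and holding stands in for the third (keep the parameters until the player entrains); the second condition, sufficient power, is not modeled.

```python
def next_system_rate(system_rate, player_rate, target_rate,
                     max_step=2.0, tolerance=3.0):
    """Return the rate the game's sound should adopt for the next interval.

    All names and default values are hypothetical illustrations.
    """
    # Hold the current rate until the player has caught up with it.
    if abs(player_rate - system_rate) > tolerance:
        return system_rate
    # Player and system are matched: take one small step toward the target.
    delta = target_rate - system_rate
    step = max(-max_step, min(max_step, delta))
    return system_rate + step


# Usage: start matched to the player, then lead gradually toward a calmer state.
system = 90.0  # assumed to equal the player's initial measured rate
for measured_player in [90.0, 89.0, 93.0, 87.0]:
    system = next_system_rate(system, measured_player, target_rate=70.0)
```

In the usage loop the system steps down twice, holds once when the measured player rate drifts away, and then resumes, which is the progressive behavior the paragraph describes.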
GUIDELINES FOR SOUND DESIGN IN COMPUTER GAMES
Based on the concepts and findings described here, we have distilled a set of guidelines for sound design in computer games. We encourage readers to understand this set as a work-in-progress. Our purpose is to contribute to the research community by building knowledge that can give us and other researchers the confidence to consider it plausible and worth refining, not least for its use value to computer game sound designers. Therefore, these guidelines have no claim (yet) of truth-value: instead, their value is strictly instrumental to the research and structuring of a body of knowledge in sound design. Also, the guidelines do not prescribe procedures but, instead, establish a mindset that can inform those procedures. In that sense, they state what to care about rather than stipulating how to do it in a particular instance. But, most of all, they are meant to generate understanding, not to be obeyed. The guidelines attend to the identification of several affective aspects of sound design, including: considering the relevance of acoustic properties of elements selected for interaction, namely as to their emotional effect; conveying meaning and coherent consequence to diegetic sound, inside the gameworld; allowing the player to perform through the exploration of the sonic outcome of meaningful actions; exploring the activation of events and interaction elements through the interpretation of the corresponding acoustic expressions; integrating users’ context in the sonic composition; supporting and exploiting resonance and entrainment; and dealing with perception issues during a user’s experience. Each guideline is presented with a description, relevant context, and examples. For the conception of these guidelines, we did not focus on speech-based interaction. Also, although we do not exclude the use of music, we are mainly interested in exploring interaction through non-musical sounds. In terms of sound layers (Peck, 2001, 2007), this does not mean we will not be considering dialog and music because that would ruin our commitment to the holistic approach underpinning our research: Depending on the purpose with which specific sound stimuli are added to the composition, they can play a role in any layer. It simply means we are not attempting to contribute guidelines that specifically go into such matters as dialog generation and interpretation or musical composition in the strict sense.
Guideline 1: Select Elements with High Sonic Potential
It is strategic that the inherent, potential sonic expressiveness is valued when selecting the interaction protagonists in early stages of design. This mindset applies to the full extent of the game’s components, including objects, characters, script, and features such as the gameplay. Actually, this guideline is the mother of all others here presented: In each of them, for the designer to be able to implement the respective idea, a dedicated selection of these components is mandatory. We will avoid stating it as a prerequisite because it
is not supposed to happen before those ideas are set. Both the selection of the elements and the setting of the ideas that will explore them will profit from a tight process of decision-making along the progression which, in turn, ought to be carried out from the very early phases of the overall design process. Also important to notice is that it is not about selecting sounds. It is about selecting game elements, taking into account how they will supply the sonic properties that are required to accomplish some design aspect. This distinction is absolutely fundamental. Unless that is kept in mind, energy will be spent on compounding the mistake of covering things with sound rather than using sound. Actually, using sound to wrap the elements in a game is not an error per se. Metaphorically speaking, we do prefer our gifts when they come in nice wrapping paper. Still, that nice paper can be discounted and disconnected from the gift itself: Even if we opt to keep the paper, the gift and the paper will still be independent entities, neither contributing to the other’s accomplishments and each keeping its separate existence. The attentive selection of interaction elements, prizing rich sonic expression, expands the space of possibilities at design time. This will allow fulfilling the intentionality of the soundscape whilst maintaining contextual consistency. Also, it should be easier to provide a good auditory perception of the environment if objects in it are identifiable or provide context through their sonic properties. Choosing and combining acoustic protagonists may be thought of as the construction of a dialect, specific to the project, which will support its communication model. This calls for a creative effort of collecting and combining possibilities. Still, it is useful to be attentive to some opportunities.
One is that elements may have different states of sonic expression: roughly, the sound emitted while in customary or natural conditions and the sound emitted when the element is “activated”. In some cases, more states,
or even variation in a continuum, may be identified. For example: a squeaky rubber duck has no sonic expression when left alone but possesses a very well known sonic identity when squeezed; conversely, a cicada has a customary expression that ceases when disturbed; a waterfall seems to have the same characteristic sound both on its own and when someone bathes in it; and, a flock of pigeons also emits sound in both situations but these are very distinct (mating and feeding versus alarm and flapping wings). In another vein, if we need a game character to drive fast through the rush-hour traffic, we might consider including a car horn and choose carefully its sound (according to Guideline 3 below). So, there are countless possibilities to explore, depending on what is intended to be communicated. Although some acoustic elements may be added–or patched–along the project, without overall disturbance, others imply strategic decisions and consequently need to be analyzed in the early stages of design. In the latter case, above, resorting to a siren of some emergency vehicle service would imply the necessity to fit such decision in the design options: even considering it would be plausible in the scenario, it might be inappropriate if too many other design decisions had been taken. Finally, a related challenge is to reunite elements, which are coherent among themselves, within the whole project. For instance, unless premeditated, dinosaur roars and bottle pops, would not be compatible, although each one would possibly be associated to ideas that we might need to combine (let’s say, angst and repose). The issue is compatibility, not verisimilitude: we are happy to hear the bad guys’ spaceship exploding in the void, although we know that would be impossible (The Curious Team, 1999).
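The distinction drawn in this guideline between an element's customary sound and its "activated" sound can be written down as a small data structure when cataloguing candidate elements. The following Python sketch is one hypothetical way to do so; the class name, field names, and the sound labels are placeholders chosen for illustration, and only the two-state case is covered (not the continuum of states also mentioned).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SonicElement:
    """A game element described by what it sounds like in two states."""
    name: str
    idle_sound: Optional[str]       # sound in customary conditions, None if silent
    activated_sound: Optional[str]  # sound when the element is "activated"

    def changes_when_activated(self) -> bool:
        """True if activation is audible as a change in the soundscape."""
        return self.idle_sound != self.activated_sound


# Placeholder labels for the four examples given in the guideline.
ELEMENTS = [
    SonicElement("rubber duck", None, "squeak"),
    SonicElement("cicada", "song", None),
    SonicElement("waterfall", "rushing water", "rushing water"),
    SonicElement("flock of pigeons", "cooing", "alarm calls and flapping wings"),
]

audible_on_activation = [e.name for e in ELEMENTS if e.changes_when_activated()]
```

A catalogue of this kind makes it easy to see, early in design, which elements can signal an interaction purely through a change in their sound and which (like the waterfall) cannot.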
Guideline 2: Select Elements Whose Changes in Sonic Expression May Support or Translate Emotions
When designing a game’s emotional script, the designer should evaluate how sound will contribute to it. There is no doubt emotions are core to computer games. Additionally, it is well documented that sounds can be used to support emotional contexts. Actually, that is a common practice–and sometimes the ultimate goal–in some mature fields such as music (Gouk, 2004) and the cinema (Lynch, 2003). It is important to notice we are not claiming that sound should be the way to support emotions in computer games. Sound is one way to contribute to that but one way that should not be forgotten, considering its potential and particular strengths for these purposes. One approach that can be further explored, when selecting each acoustic element according to its association with emotion, is to evaluate it with an emphasis on its ability to support different emotions, that is to say, to express emotional changes through its own sonic alteration. This is not mandatory, since emotional changes may be achieved by resorting to different elements–possibly one to support each different emotion–but it may be advantageous to explore the use of elements capable of supporting several emotional states and signaling the corresponding change. That, for instance, may relieve the user from interpreting new sonic elements for their emotive associations, and may provide gains in effectiveness. Moreover, the swapping of distinct sonic elements in the soundscape is more prone to erroneous interpretations, such as motion of their respective sources, although visual information may be enough for disambiguation. Finally, and more relevantly, this approach is more likely to offer continuity and emotional gradations. As in Guideline 1, this is a matter of creative gathering and the selection of possibilities. A few illustrative examples of elements and their possible associated emotional states would be:
birds (relaxation, attentiveness, fleeing); weather elements (calm, scaring); baby sounds (joy, tranquility, agitation, affliction); nice breakable materials (aesthetic contemplation, trespassing, destruction).
Guideline 3: Allow Sound to Matter in the Gameworld
The nature of the interaction, as perceived by the user, should be extended in order to genuinely integrate sound as an instrument for action in the environment. This is perhaps the most neglected use of sound in computer games. Sound, if used, is predominantly relegated to complementing the visual rendering. It serves as output, which is good but just half the idea. In fact, acting through sound makes perfect sense in a system with a bidirectional interface. There is no reason for sound-driven actions not to deserve the same kind of appreciation as running, jumping, grabbing, or shooting. Allowing the player to perform through sound, either as a consequence of some contextualized and meaningful action or by explicitly deploying some sonic event, has the potential to greatly extend the value of the experience. Moreover, it significantly enlarges the space of possibilities in terms of design of the gameplay. Reasons for the under-exploration of this kind of approach may be that this is something that could hardly be borrowed from music or cinema–the chief contributors to sound design practices in computer games (Deutsch, 2003)–and that it is also commonly neglected in computer application interfaces. It should be noted that we are thinking beyond speech-activated commands. Speech recognition is not a goal in our study. Also, the kind of input suggested in this guideline is particularly meaningful if it does not consist of a mere mapping of commands that otherwise would be entered by pressing a key or button. Although the latter may be useful, it doesn’t truly represent a change in the interaction itself but only in its activation. In
fact, to observe this guideline, the actual activation, at the level of the interface, can still resort to a typical key press instead of true sound input. In our non-digital lives we often resort to sound to make things happen: We make our way through a crowd by saying “excuse me, excuse me” rather than pushing or shooting; we yell at the annoying neighbor’s dog to counter its attack (sometimes it gets worse but we still do it); we cough to make someone notice us; we use the car horn to stop another driver hitting us; we walk more or less loudly according to our intention to make ourselves noticed, even if unconsciously; and so on. Sound plays a huge part as input in the communicational model, not only as dialog, in a strict sense, but also in more indirect ways. So, we have the means to get inspired about what could be different in computer games. In fact, when put this way, it seems that it is not about how to let sound in, but rather how to stop forcing it out of the game: How to escape from the bias of visual predominance and derived solutions, and how to allow for more balanced approaches. One aspect that we believe deserves careful attention is the construction of a sense of coherence. In truth, when we claim the need to consider sound consequences, we are already addressing the issue of coherence between the value of what is seen and what is–or should be–heard. But let us confine, for now, our reasoning to what is heard: The inclusion of aspects in the game that are sound-driven may turn out to be improper if they reveal an incomprehensibly unequal treatment regarding other aspects that are evident candidates for the same behavior. This is not about realism: the coherence is relative to the gameworld, not necessarily to the real world. Instead, it is related to the holistic perspective that is dominant in the notion of the soundscape. Of course, incoherence can become accepted based on the willing suspension of disbelief.
The player can indeed adapt to the game’s reality where, for instance, a very noisy event does not trigger any kind of reaction from enemies but the
slightest imprudence regarding noise in the scope of some other specific event can unleash the devil. Even so, and excluding the merit of well-designed alternate realities, such adaptation demands at least a first effort from the player. That effort has little to do with playing: it is exterior to the gaming experience itself. The player–the game user–gets confronted with the implausible and has to solve it consciously before eventually coming to accept it. In turn, that compromises flow and game immersion. If indeed the required suspension of disbelief comes at a cost with no intended value, just as the player is able to overlook the limitations of a compromised game design, efforts ought to be made to minimize the effect. Some examples of the ideas expressed in this guideline can, in fact, be found in a few existing computer games. In the Thief game series–for example, Thief: Deadly Shadows (Ion Storm Inc, 2004)–and Metal Gear Solid 4 (Kojima Productions, 2008), both stealth games, some items can be thrown in order to make noise and consequently divert enemies’ attention to them. In the latter, it is even possible to knock on nearby objects with similar purpose. In both games and others, such as The Elder Scrolls IV: Oblivion (Bethesda Game Studios, 2006) the sound of the character’s footsteps can broadcast his position. Other hypothetical examples would be: yelling to frighten or as part of the strategy to defeat beasts, whistling to call our dog or horse, clapping hands to scare birds and so on.
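One way to keep sound consequences coherent, in the sense argued for in this guideline, is to route every noise-producing event through the same audibility rule instead of special-casing a few scripted events. The Python sketch below illustrates this under loud assumptions: the function name, the tuple layout for listeners, and the rule itself (a noise is audible within a radius equal to its loudness multiplied by a per-listener hearing factor) are hypothetical simplifications, not a description of how the cited games work.

```python
import math


def alerted_listeners(noise_position, loudness, listeners):
    """Return the names of listeners that can hear a noise.

    Assumption: a noise is audible within a radius of loudness * hearing,
    where hearing is a per-listener sensitivity factor.
    """
    alerted = []
    for name, position, hearing in listeners:
        if math.dist(noise_position, position) <= loudness * hearing:
            alerted.append(name)
    return alerted


# Hypothetical listeners: (name, (x, y) position, hearing factor).
guards = [("guard_a", (3.0, 4.0), 1.0), ("guard_b", (20.0, 0.0), 1.0)]

# A thrown object and the character's own footsteps go through the same rule.
heard_thrown_object = alerted_listeners((0.0, 0.0), 10.0, guards)
heard_footsteps = alerted_listeners((0.0, 0.0), 2.0, guards)
```

Because both events use one function, a very noisy event cannot go unnoticed while a quiet one triggers a reaction, unless the designer deliberately chooses different loudness values.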
Guideline 4: Allow Meaningful Sonic Control for Intended Actions
This works as an inversion of the cause-effect relationship in events with a natural or associated sonic expression. As in Guideline 3, this guideline relates sound and acting; however, this time instead of performing some event X and expecting that other events Y are triggered or shaped by its sonic expression, we are suggesting a way to trigger an event Z by performing its own sonic expression. The idea is to allow the player/character to produce the sound that translates the actions that are intended to occur. An interesting collateral effect is that, in this process, the player/character substitutes or participates in the corresponding sound and, consequently, integrates into the overall composition. In contrast to the former guideline, in order to cope with this one, it seems relevant to allow for actual sound input. Conceptually, this differs from strict voice commands in the sense that the input does not reflect an order for something to happen but rather the actual sonic expression of something as if it were already happening. This is indeed a relevant distinction, with some implications both in format and semantics. One difference is the nature of the emitted message: Text versus expression. Another is the timing and duration of the message. In the case of voice commands the order precedes the action and its duration does not depend on that of the action; in the case of the approach we are suggesting, the stimulus and the action are theoretically simultaneous: The action starts as soon as the stimulus is identified (although, in practice, this will imply some latency) and lasts for as long as the stimulus is maintained. Consequently, there are also differences in the kind of control that is possible for actions that are flexible regarding duration. Also, it is conceivable that we interpret variances in the acoustic parameters along the stimulus (intensity, pitch and so forth) and dynamically shape the action according to preset conventions. Furthermore, there are significant aesthetic differences: For instance, the proposed approach evidences great potential regarding the exploration of the input sound as a component of the game’s artistic value.
Finally, there are differences in terms of the emotional impact underlying each approach: For example, if we are actually giving orders, as in some war games such as Tom Clancy’s EndWar (Ubisoft Shanghai, 2008), voice commands may feel more appropriate, while, in some other scenarios, making non-verbal sounds may provide a better experience. Again, we emphasize that we are not arguing the value of one approach over the other: our aim is to contribute to the enrichment of the space of possibilities. One final point that should not be overlooked is the potential ludic value inherent to making sounds: that is, in performing at the interface. Thus, not only the ludic meaning of the triggered actions but also the activation itself becomes part of the game. This is a rare opportunity. Typically, the activation level is not conceived of for the purposes of providing fun. There is not much joy in the act of pressing keys at the keyboard, moving the mouse, pushing buttons in controllers and so on (although, to be fair, there is fun inherent to the use of some interface devices such as steering wheels and pedals, musical instrument imitations, and some modern game console controllers). Of course, the design of the sounds that are supposed to be input–a matter that fits into Guideline 1–has a determinant importance on the kind of achievements that may become possible at this level of the game. Other hypothetical examples would be: driving a cart on a path while avoiding running over crossing animals by producing the sounds of the engine and possibly the emergency brake, gaining focus over a wooden box to move it on a rock floor by imitating the sound it would make and controlling directions with mouse or keys, making a ball jump different heights according to the modulation of some established sound, shooting a gun by vocalizing the shots, shooting different guns using a feature of automatic weapon selection based on their distinct shot sounds and so forth.
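The timing difference between a voice command and a performed sound, as described in this guideline, can be made concrete with a small sketch. The Python below assumes that some upstream component has already recognized the player's sound and reduced it to one loudness value per frame, normalized to the range 0.0 to 1.0; that component, the function name, and the `threshold` value are hypothetical. The action is active only while the sound is sustained, and its strength follows the loudness of the input.

```python
def action_states(loudness_frames, threshold=0.2):
    """Map per-frame input loudness (0.0 to 1.0) to (active, strength) pairs.

    Assumption: the sound has already been identified; this only decides
    whether the action runs in each frame and how strongly.
    """
    states = []
    for loudness in loudness_frames:
        active = loudness >= threshold          # action lasts while sound is held
        strength = loudness if active else 0.0  # intensity shapes the action
        states.append((active, strength))
    return states


# Usage: silence, a sustained sound that grows louder, then near-silence.
states = action_states([0.0, 0.5, 0.8, 0.1])
```

A voice command, by contrast, would produce a single trigger before the action starts, with no per-frame relationship between the input and the action's duration or strength.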
Guideline 5: Allow Integration of Player’s Context into the Soundscape Composition
Context plays an important role in interaction processes. Also, sound is both part of that context and a way to express it. It is worthwhile to explore the possibilities in terms of soundscape composition and, particularly in respect to affective sound, allowed by the consideration of the player’s context. Actually, all guidelines here presented have been strongly influenced by a constant attention to context. In all aspects–interaction protagonists, emotional support, consequent sound, action through sound–there is always an emphasis on the need to consider a global perspective, both concerning the integration of the different modalities and regarding the different combining approaches in the particular case of sound. The bottom line is that no approach is good unless it fits in the whole. If it does not, either the approach or the whole needs to be adjusted. This guideline goes a little further in terms of the consideration for context. The argument is that the context is not limited to the game itself. A game is played by someone who actually has–and is–context too. So there is no point in trying to figure out how to turn a game into a perfectly designed context piece if we leave out the only element of the context who would possibly appreciate it: the player. Some concepts that have recently become well-known in game design, such as immersion (Grimshaw, 2008) and flow (Csíkszentmihályi, 2008), emphasize, in different ways, the pertinence of getting the player and the game into the same plane of existence. These approaches focus mostly on the migration of the player into the game. We suggest tackling the same issue in a complementary way, which is somehow the reverse method: To extend the game in order to embrace the player, that is, to build the game around the player. Dealing with context poses complex challenges. Conceptually, all aspects of the player’s context matter to whatever is done in the scope of that context. In practice, this has two related implications. One is that, since it is not technically viable to seize all context parameters, it becomes necessary to identify and capture the most meaningful parameters of that context, considering the process we are designing.
The other is that we cannot afford to neglect some aspect of the
context that turns out to be indeed influential to that process, bearing in mind the problem that contextual aspects are inherently non-evident. Another class of challenges is the actual reading of the contextual parameters which, in many cases, demands the usage of probes or sensors. In turn, this is potentially problematic not only in terms of the availability of those devices but because some of them can be considered intrusive or uncomfortable to use.
An example of contextual parameters, which we suggest for the sound designer to consider, is the player’s ambient sound (as in Cunningham, Caulder, & Grout, 2008 and Cunningham, Grout, & Picking, 2011). This might be useful to dynamically equalize each of the categories of game sounds according to the expected ability of the player to perceive them. Or, in a more complex endeavor, it might become interesting to integrate the players’ ambient sound, or some of its acoustic parameters, into the game’s sound. Still, we should not restrict ourselves to sound-to-sound explorations: all possible combinations are relevant to game design, and at the very least those that have sound at either end fit the present guideline. For instance, we are particularly sensitive to acoustic explorations that can be developed from the readings of the players’ physiological indicators, namely heartbeat, breath, and brainwaves. In truth, there are some classical examples of similar exploration in other domains, as evidenced by the relationship between music rhythm and the heartbeat. We believe that, since these indicators provide hints on the player’s emotional state, it will be interesting to consider their potential to dynamically set compositional aspects of sound in game scenarios, thus aiming at a better resonance and possibly serving as the basis for entrainment. This is suggested in Guideline 7 below (see also Nacke & Grimshaw (2011) on the monitoring of psychophysiological states of players and implications for game sound design).
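The suggestion to equalize categories of game sound according to the player’s ambient sound can be made concrete with a small sketch. It is only one possible reading, written in Python under stated assumptions: a block of microphone samples normalized to [-1.0, 1.0] is assumed to be already available, and the category names, sensitivity weights, and level thresholds are hypothetical values chosen for illustration rather than anything proposed in the cited studies.

```python
import math


def rms_dbfs(samples):
    """Root-mean-square level of a block of samples in [-1.0, 1.0], in dBFS."""
    if not samples:
        return -120.0
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square <= 0.0:
        return -120.0  # treat digital silence as a very low floor
    return max(-120.0, 10.0 * math.log10(mean_square))


# Hypothetical weights: how strongly each category of game sound should be
# boosted when the player's room gets noisy (1.0 = full boost, 0.0 = none).
CATEGORY_SENSITIVITY = {
    "dialogue": 1.0,
    "effects": 0.6,
    "music": 0.4,
    "ambience": 0.2,
}


def category_gains_db(ambient_dbfs, quiet_dbfs=-60.0, loud_dbfs=-20.0,
                      max_boost_db=9.0):
    """Map a measured ambient level to a boost (in dB) per sound category.

    The ambient level is normalized to a 0..1 "noisiness" between the two
    assumed thresholds, then scaled by each category's sensitivity.
    """
    span = loud_dbfs - quiet_dbfs
    noisiness = min(1.0, max(0.0, (ambient_dbfs - quiet_dbfs) / span))
    return {
        name: max_boost_db * noisiness * weight
        for name, weight in CATEGORY_SENSITIVITY.items()
    }
```

In such a scheme the gains would be recomputed periodically and smoothed before being applied, so that the mix follows the room without audible jumps; integrating the ambient sound itself into the game’s soundscape, the more complex endeavor mentioned above, is not covered by this sketch.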
An aspect that also deserves some commentary is the possible contradiction between leading
the player into the context of a fantasy world and bonding with the context of the real world. Indeed, once a resonance state between player and game has been established, the player might appreciate being transported to another context. Actually, the sense of escapism is part of the argument for playing computer games. Even so, this is not contradictory with the effort suggested in this guideline. To start with, because it is a prerequisite to first be able to empathize with the player (something we will explore in Guideline 7 which concerns entrainment). Next, the kind of context that is integrated in the experience and the way that context is translated into the experience do not necessarily evidence the bonds in such a manner that they anchor the player to a former state or to the consciousness of a real world existence. Ultimately, the designer may decide that the more immersive the current state the less binding there is with the player’s outer context. But even then, the ability to evaluate the immersion level will probably require reading certain parameters from the player’s current context. Most of all, it seems to be a matter of dynamically adjusting the components of the context that are the most critical to resonance management.
Guideline 6: Consider Shared Context in Multi-player Environments
This is an extension of the previous guideline through the consideration of multi-player environments. Each player’s context may include the perception of aspects of the other players’ context. The argument is that, in a multi-player environment, context is both local and global (Roque, 2005 and discussed in terms of a virtual acoustic ecology by Grimshaw, 2008). It may be advantageous if each player perceives not only other players’ actions but also relevant elements of the context that shaped those actions. The implementation of this guideline calls for the combination of elements deriving from different players, which, in turn, are captured or
integrated according to the techniques mentioned in Guideline 5. Regarding the combination of the stimuli, it is important to be attentive to the insights from acoustic ecology and consider that the design of a shared-context soundscape should support the fitting of individual interventions rather than superposing their disconnected sounds (Wrightson, 2000). This approach may be considered with different purposes: for example, simply aesthetic, taking advantage of aspects of the global complexity; as a mechanism to deliver a sense of presence and of activity of the respective community; as part of the gameplay, making available some aspects and hiding others according to what best serves the game mechanics.
Guideline 7: Integrate Acoustic Elements that May Support Entrainment
Entrainment can be used to support the maintenance or the change of emotional states. Sound is one prominent way to implement entrainment, which can be achieved by progressively moving from one state of resonance into another. In terms of game experience, keeping the player emotionally involved over time, as complexity grows and emotions unfold, is crucial. As the term entrainment suggests, the idea will be to create the conditions for the player to engage with and to be transported on a journey. Still, the path can be too turbulent for the designer to assume the player will have enough of a pleasurable experience to warrant reaching the end. The consequences of such an observation are relevant. The most important is that any tool a game designer has to monitor and direct the course of action in order to avoid losing players will be valuable. In this sense, entrainment, and its support through sound, is instrumental. Also, regarding each particular instant of the experience, the managing of the proximity between a player’s emotional state and the expected (or even required)
emotional state may be addressed through the idea of resonance. Finally, and although resonance must be granted during the whole experience, the initial moment–that is, the first resonant achievement–is particularly challenging. It is clear that it will be harder to go from a state of no resonance to a state of resonance than it will be (later) to move between resonance states. The latter situation, being well designed, should allow a more continuous transition. To address the achievement of initial resonance, at least two approaches can be explored. One is to speculate about the initial mindset and emotional state of the player and gently move from there. That is no different from what is done in other forms of communication: It is a good idea to perform some sort of introduction before getting into the core of the message. Still, the contents of the introduction have to be tuned according to the context of the listeners, which frequently has to be estimated. Although this approach is technically simple it may be ineffective due to the lack of indicators about both the starting context and the evolution of the process. So, a second class of approaches, where there is some way to read indicators that permit a better judgment about those aspects, will allow more efficiency. For this purpose, any known technique to dynamically infer a player’s emotional state will be useful. In the scope of the present study we find particular relevance in those techniques that take into account the player’s physiological rhythms, namely heartbeat, breath rate, and brain waves because of their potential exploitation in terms of sound (see Guideline 5). The problem with the actual reading of such indicators is the device apparatus which is likely to be found intrusive and, as such, contraindicated in terms of the experience. The relationships between emotions and heartbeat, breath rate, and brain waves have long been explored (for example, Atwater, 1997; Leeds, 2001). 
Musical examples are Shamanic drumming, which induces theta brain waves with a consequent approximation to deep sleep and trance states, and
Balinese Gamelan, which has a beat phenomenon that generates frequencies of about 4 to 8 Hz and this also targets the theta brain waves. Another example, more commonly acknowledged, including in computer games–for instance, inFamous (Sucker Punch Productions, 2009) and Uncharted: Drake’s Fortune (Naughty Dog, 2007)–is the use of strong beats that gradually increase in rhythm and intensity in order to emulate the heart rate that would match the designed emotional state. Depending on the intended purpose, these practices may be used to inform game design. Once again, the acoustic elements used to design the conditions for entrainment should fit in with the design of the soundscape according to the principles covered in Guideline 1.
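Two of the mechanisms mentioned in this guideline lend themselves to a brief illustration: the beat phenomenon, where two tones of close frequency produce a pulsation at the difference of their frequencies, and the gradual movement of a pulse from the player’s current state towards the designed one. The Python sketch below is a minimal illustration under stated assumptions, not a description of how any of the cited games work: the heart-rate reading is assumed to be available from some sensor, and the function names and the step size are hypothetical.

```python
def beat_frequency_hz(f1_hz, f2_hz):
    """Rate of the beating heard when two tones of close frequency sound together."""
    return abs(f1_hz - f2_hz)


def tempo_ramp(start_bpm, target_bpm, max_step_bpm=2.0):
    """Return the successive tempi of a pulse moving from start to target.

    The pulse starts at the tempo matching the player's current state (for
    example a measured heart rate) and changes by at most max_step_bpm per
    step, so that the transition stays gradual rather than abrupt.
    """
    if max_step_bpm <= 0:
        raise ValueError("max_step_bpm must be positive")
    tempo = float(start_bpm)
    target = float(target_bpm)
    tempos = [tempo]
    while abs(target - tempo) > max_step_bpm:
        tempo += max_step_bpm if target > tempo else -max_step_bpm
        tempos.append(tempo)
    if tempo != target:
        tempos.append(target)  # final, smaller step onto the target
    return tempos
```

For instance, two tones at 220 Hz and 226 Hz give a beat of 6 Hz, inside the 4 to 8 Hz range cited for theta brain waves. How long each tempo step should last, and whether the player actually follows the pulse, are design and measurement questions that the sketch leaves open.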
APPLYING THE GUIDELINES
When we argue for the relevance of the integration of sound in the design of interaction processes, based on the observation of the discrepancy between current game sound use and the value that sound assumes in everyday life, it may seem we are implicitly arguing for a balance in the gameworld similar to that in the real world regarding the prominence of sound in interaction. This is not the case. We are addressing the design of a virtual world where, in principle, there is no reason for us to be anchored to the constraints of the real world. So, the designer should pursue not fidelity to reality but, rather, creativity. Again, this should not be confused with a discussion around the search for realism, although that is also an interesting matter to approach in the context of this text (see Farnell, 2011). It may be clear by now that we prize an exploration of sound that goes beyond the concerns for realism. We acknowledge that a rich experience does not require a realistic approach to sound. Of course, the ability to achieve realistic features–at some sound layers–is interesting in the sense that it enlarges the boundaries of the space of possibilities, but it
seems fairly evident that it is not a requisite. What is more, paradoxically, approaching realism can be troublesome in terms of perception and emotional response, as is the case of the “uncanny valley” phenomenon (Grimshaw, 2009; Tinwell, Grimshaw, & Williams, 2011) that comprises a feeling of strong discomfort with greater humanlikeness. That is, plausibility and precise realism become issues, and failure to achieve them contributes severely to the degradation of the experience.
Considering that we have been presenting foundations for possible insights that might inform game sound design, we should not let pass unnoticed the importance of the designer having a background in gaming and having carried out an extensive analysis of the widest possible universe of computer games. Particularly, once one is sensitive to sound design, one develops attentiveness to sound facets when playing computer games, even unintentionally. Experimenting with computer games in a genuine setting–that is, playing games–and possibly becoming or taking advantage of being a hardcore player, is also one rich source of information and insights (Aarseth, 2003). Even gaming experiences that are perceived as poor sometimes become most valuable if one can rationalize what seems wrong and what would be an alternative. A different reason why it is relevant to actually play computer games, with a behavioral pattern similar to that of the players who are the typical consumers of the kind of games we are addressing, is that, as we argued, a player’s perceptions are strongly influenced by context. In turn, the context of a certain player is also shaped by the number and diversity of games played before, amount of time usually dedicated to playing, the number of playing hours in a given moment and so forth.
Adding this to the inherent difficulty in grasping other people’s contexts, it seems appropriate to say that the more the researcher or designer is able to feel like a player, the closer the judgments reached will be to those of players (even when considering that no two players are
equal, nor even that one player remains the same through the passage of time). Finally, and somehow in the same vein, it is fundamental to recognize that we will never be designing the players’ behaviors or feelings. Instead, through sound design, we are working with the conditions that will influence those players into what is intended to be a desired emotional experience. But, again, since those players will always be subject not only to the designed conditions but also to other conditions that constitute their own current context–including manifesting their own will and deciding, for example, not to engage–it is not reasonable to be assertive and didactic about effectiveness. In fact, because games are mostly forms of participatory media, the players also are, to some extent, designers of their own experiences.
A DESIGN EXERCISE
We present an example of the application of the guidelines by a group of developers with no prior experience in game sound design. The exercise involved a team of 5 Master’s students on a course in game design and development (Alves & Roque, 2009b). The team was commissioned with the design of a game specifically intended to demonstrate the importance of sound in gameplay. This prompted them to think about a game that could not otherwise be played except with and through sound. Our argument for attaching this example to this chapter is twofold. On the one hand, it serves as an instantiation that may be useful to illustrate a possible interpretation of some of the suggestions this study provides. On the other hand, it goes some way to verifying the plausibility of the guidelines we have presented. Of course, at this point, the simple observance of this experiment does not provide the support for a generalization of the results, but the results are an encouraging indicator nonetheless.
Game Plot and Setting
The game is a single-player adventure, suitable for audiences over the age of 6. It is about a castaway and his rescue from an island inhabited by fictitious creatures. The plot comprises gaining the sympathy of the native creatures in order to get their help in calling the attention of some passing ships. Two input methods were designed: vocalized sound input through a microphone and, alternatively, the use of keystrokes to model the corresponding programmed sounds. The game takes place in an island scenario where the playing character interacts with a set of creatures, one at a time, by interpreting their sound manifestations in the context of the game diegesis. As an example, the player has to “gain trust” of a creature by imitating its pitch, its rhythm and so on with two end results: unlocking some progress in the game and training the ability to recognize and reproduce specific sound characteristics, in the context of other sound sources, in order to achieve a specific composition. The coordination abilities thus gained by the player will then be put to a final test in a final setting.
Story
A castaway arrives on an island inhabited by strange creatures. He notices that ships pass at a distance and that they might rescue him, but, when he tries to signal his presence by yelling to them, he fails to get noticed. On the island, there are several accessible zones and each zone is inhabited by one species. A species population consists of a bunch of cubs and a parent: The cubs are curious, the parent is neutral though vigilant. The cubs’ behavior triggers communication-learning episodes where the castaway iteratively tries to replicate their utterances. After a certain number of such successful episodes, the parent becomes receptive to communication and the castaway, combining expressions learned from the cubs, starts a communication process to win its
sympathy. When he has succeeded, the parent volunteers to accompany the castaway to the beach and to help him to yell for the attention of the ships passing by. While they yell, someone, in one of the ships, appears to have noticed something but assumes it was an illusion because the stimuli coming from the beach were too weak. In each of the other zones in the island the plot repeats: each time a parent becomes a friend, the entire group gathers at the beach for another attempt to catch the attention of the passing ships. With each attempt, the perception grows that the aim is about to be achieved until, after enlisting the aid of a certain number of creatures, the goal is finally reached and the castaway rescued. Non-interactive (cinematographic) scenes include the arrival at the island and lonely call for passing ships, moving onto the beach accompanied by friendly creatures and yelling to the passing ships, and a ship’s crew member wondering about the yelling sounds (this is a distinct scene each time a new creature joins the group).
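The progression described in the story (cub episodes unlock the parent, befriended parents join the group on the beach, and a large enough group is finally noticed) can be summarized as a small state model. The Python sketch below is an illustration only and not the team’s implementation: the class and method names are hypothetical, the two thresholds stand in for the “certain number” that the text leaves unspecified, and the dialog that wins the parent’s sympathy is reduced to a single call.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds: the text only speaks of "a certain number".
CUBS_NEEDED_PER_ZONE = 3
FRIENDS_NEEDED_FOR_RESCUE = 4


@dataclass
class Zone:
    """One island zone, inhabited by a single species (cubs plus a parent)."""
    name: str
    cub_successes: int = 0
    parent_befriended: bool = False

    def parent_receptive(self) -> bool:
        # The parent opens up only after enough successful cub episodes.
        return self.cub_successes >= CUBS_NEEDED_PER_ZONE


@dataclass
class Island:
    """Tracks progress across zones; zones may be visited in any order."""
    zones: dict = field(default_factory=dict)
    rescued: bool = False

    def add_zone(self, name: str) -> None:
        self.zones[name] = Zone(name)

    def record_cub_success(self, zone_name: str) -> None:
        self.zones[zone_name].cub_successes += 1

    def befriend_parent(self, zone_name: str) -> bool:
        """Try to befriend the parent; fails while it is not yet receptive."""
        zone = self.zones[zone_name]
        if zone.parent_receptive():
            zone.parent_befriended = True
        return zone.parent_befriended

    def friends(self) -> int:
        return sum(1 for zone in self.zones.values() if zone.parent_befriended)

    def beach_attempt(self) -> bool:
        """The group yells from the beach; a ship notices once it is large enough."""
        if self.friends() >= FRIENDS_NEEDED_FOR_RESCUE:
            self.rescued = True
        return self.rescued
```

Each call to `beach_attempt` would correspond to one of the non-interactive scenes, with the crew member’s reaction varying with the current value of `friends()`.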
Gameplay
The castaway moves through the island’s zones (there being no predefined sequence) and in each zone there are two types of interaction: with cub creatures and with their parent. All interactions happen between the castaway and only one creature at a time with each interaction comprising an iterative process of alternate interventions in a dialog. The interaction can be aborted before success is achieved by the player’s decision or because a certain number of iterations has been reached. There is no enforced order to the interactions but it is mandatory to successfully interact with several cubs before being able to complete with success the interaction with the parent. The dialog with a cub is initiated and conducted by it while the dialog with the parent is initiated and conducted by the castaway. The success condition in the relationship with a cub depends on sufficiently matching its utterances and the success
condition in the relationship with a parent depends, firstly, on its receptivity to communicate–which, in turn, depends on the number of cubs with whom a successful conversation has been carried out–and, secondly, on the level of satisfaction to which the castaway can lead the creature in a process where, in response to each of the castaway’s sound sequences, the creature manifests the corresponding sympathy reaction. The level of sympathy may drop during the interaction with the parent. Every zone in the island shares the same game mechanics: what differs are the sound stimuli. The relationship with the cubs can be understood as a learning process of the sound stimuli that will eventually allow a successful relationship between the castaway and the cubs’ parent. On the other hand, the relationship with the parent is an exploratory exercise of composition through the combination of these stimuli with some room for creativity. Regarding similarity evaluation criteria of the sound stimuli used in interactions, in a first approach, the following acoustic variables were considered: duration, loudness, and pitch. In practice this means sounds do not have to be strictly identical: they only have to match according to those variables.
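The similarity criterion can be illustrated with a short sketch in which an utterance is reduced to the three variables named above and a match is declared when each falls within a tolerance. This is a minimal Python illustration under stated assumptions: the feature extraction from the microphone or keystrokes is taken as given, and the `Utterance` type, the units, and the tolerance values are hypothetical and not taken from the exercise.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """An utterance reduced to the three variables considered for matching."""
    duration_s: float   # length in seconds
    loudness_db: float  # level in decibels
    pitch_hz: float     # fundamental frequency in hertz, must be positive


def matches(target, attempt, duration_tol=0.25, loudness_tol_db=6.0,
            pitch_tol_semitones=2.0):
    """Return True when the attempt is close enough to the target.

    Duration is compared relative to the target's length, loudness as an
    absolute difference in dB, and pitch as a distance in semitones so that
    the tolerance is the same in low and high registers.
    """
    duration_ok = (abs(attempt.duration_s - target.duration_s)
                   <= duration_tol * target.duration_s)
    loudness_ok = abs(attempt.loudness_db - target.loudness_db) <= loudness_tol_db
    semitones = 12.0 * abs(math.log2(attempt.pitch_hz / target.pitch_hz))
    pitch_ok = semitones <= pitch_tol_semitones
    return duration_ok and loudness_ok and pitch_ok
```

Widening or narrowing the tolerances would be one way to tune difficulty, and a graded score instead of a boolean could feed the parent’s level of sympathy, but both are design choices beyond what the text specifies.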
Critical Reflection on the Exercise
The observation of the design experience surrounding this exercise provided a reinforcement of the idea that the observance of this set of guidelines implies that they must be considered from the early stages of the overall game design process. The guidelines involve fundamental aspects of the interaction which could hardly be tuned and achieved if too many design features had already been decided. That is an important consideration. We may have the need to put it as a prerequisite or accept the limitation of this effort if used upon an already well-developed design. Although, in this exercise, there were the optimal conditions to escape this struggle (the exercise was
designed from scratch), keeping a faithfulness to the principle still demanded tenacity, despite the passionate attentiveness to the guidelines. Ironically, despite all that freedom, it was not particularly easy to come up with a satisfying idea that permitted one to experiment with the set of guidelines. Actually, that was a time consuming task and a valuable lesson that deserves some commentary. It was evident, for those involved in the exercise, that the team was particularly unaccustomed to the opportunity of thinking in auditory terms. For instance, the insights often suffer from too much visual bias: In a moment when auditory possibilities were being experimented with, the team agreed it was desirable to go beyond a simple mapping to visual elements and worked instead to make the gameplay itself as strongly influenced by the audio component as it is by the visual modality. In the early stages of this exercise the team was uneasy about how long the observance of the proposed guidelines would have to be explicitly carried. Yet, and although the circumstances of the research did not allow designers to forget about them, once the design was defined, particularly the game flow and interaction, their requirements became embedded into the whole design and, as intended, subsequent steps related to sound became merely a matter of implementation. One difficulty, more operational than conceptual, had to do with which sound files to use. This was not exactly a surprise since we knew beforehand that “sound designers are often limited by having poor, outdated equipment, not enough off-the-shelf sound libraries, but most importantly, not enough time to go out and get new, original sounds for the game project” and that “sound is art [and] to make a game sound artful […] sound designers [must] have the time and money to practice their art” (Peck, 2001, p. 1). There are several reasons for us to mention our experiences regarding this practical aspect. 
First, to note that the paucity of existing sounds and the lack of time to record new ones were critical factors in this particular exercise.
Second, and more important, to remind one how significant such a bottleneck may be for this kind of endeavor in general. Finally, to acknowledge that, despite the predictability of such difficulties, a priori conditioning the space of possibilities as a function of the already available sound materials would be extraordinarily limiting. Finally, we realize the designed gameplay includes a tacit approach to the problem of the players’ adaptation to the game model, in terms of both interface and game mechanics. This addresses an early concern: The introduction of uncommon ingredients in interaction, unless carefully accomplished, can pose difficulties for players. In the case of this exercise, the interaction with the island creatures occurs as an iterative procedure which is, in fact, a learning process. Most pleasing is that such learning makes sense inside and along the game: It is not an introductory level with a tutorial goal. In that sense, it is the character, not the player, who learns.
CONCLUSION
We exposed the discrepancy that exists between current exploitation of sound in computer games and the value that sound assumes in interaction processes in our daily lives. We reinforced this point by mentioning that in other domains, such as music and cinema, sound has proven to be effective in many aspects that are also critical to the experience of computer games. We also contextualized current game sound design with sound design in the wider scenario of interaction systems, namely those addressed by HCI. We made a point of the fact that noticing the relevance of sound in other fields is insightful and can provide relevant synergy. However, computer games have their own specifics that oblige proper adaptation and, most of all, they provide opportunities that are particular to the field. Considering our assessment of the current status, we argue the need for a collective sensitivity to
the importance of the integration of sound design in game development practices and advocate the requirement of conceptual guidelines for those who will undertake sound design. We reiterate that sound design should serve the project’s intentionality and constitute a whole along with all other aspects of game design. Attempts to do sound design directed by the need to provide “something to be heard” are limited, do not honor sound’s potential, and may even cause problems with other aspects of the game. Implicit in this thought is that this conceptual sound design ought to be performed right from the early stages of the project and be applied to all semantic layers of game sound. We contributed to the recognition of the value of sound design by presenting an approach that is based on a multi-disciplinary interpretation of several concepts. These include: emotions, regarding which we have empathy for the neurological approach because it provides a less context-dependent way to deal with personal behavior; context, which allows us to understand the individual as a complex being blended with others, with the environment, with own prior experiences, and so on; acoustic ecology, which provides a contextual conceptualization of sound with emphasis on the affective dimension; soundscape and soundscape composition, both concepts derived from acoustic ecology; resonance and entrainment, two physical concepts with repercussions for perception, cognition, and emotion and that inspire interpretations of emotion management through a game experience. From a holistic consideration of principles and insights subsidiary to these concepts, a set of guidelines for sound design in computer games has been drawn up. 
The guidelines address several affective aspects of sound design, including: valuing the acoustic properties of all interaction protagonists and their influence on perception and emotions; conveying meaning to the presence of sound in terms of consequence inside the designed world; acting through sound by performing meaningful actions which have valuable sonic expression;
using sound associated with events as an input to control them; ensuring coherence in the use of sound; integrating the player’s context in the sonic composition, including in multi-player games; exploring resonance as an instrument to achieve a binding between the player and the designed intent; and the use of entrainment as a model to create a dynamism of resonance states according to the emotional script. We also presented a report on a brief design case where those guidelines were exercised and conducted by a team of game developers with no prior experience in sound design. We registered some uneasiness on the part of designers to work with the acoustic field as well as they do with the visual field: Fighting the visual bias that leads to sound merely being an extension of visual representations becomes a primary task. Difficulties also arise with quality audio sampling and with communicating sonic design ideas or intentions when compared to drawing visual renderings on paper. In further research we intend to augment and refine the set of design guidelines and to build a significant understanding of their application. Particularly, we will be considering how to enhance the approach to dynamic composition of soundscapes in computer games, with special relevance to designing the experience with nonmusical layers of sound.
ACKNOWLEDGMENT
We thank the Master’s students involved in the design exercise here presented: João Pinheiro, Lara Silva, Nuno Lourenço, Pedro Almeida, and Sandra Mendes. This research is partially supported by FCT, Fundação para a Ciência e a Tecnologia, grant SFRH/PROTEC/49757/2009.
REFERENCES
Aarseth, E. (2003, August). Playing research: Methodological approaches to game analysis. Paper presented at the Digital Arts and Cultures Conference, DAC2003. Melbourne, Australia.
Alves, V., & Roque, L. (2009a). A proposal of soundscape design guidelines for user experience enrichment. In Proceedings of the 4th Conference on Interaction with Sound, Audio Mostly 2009 (pp. 27-32). Glasgow, UK.
Alves, V., & Roque, L. (2009b). Notes on adopting auditory guidelines in a game design case. In Veloso, A., Roque, L., & Mealha, O. (Eds.), Proceedings of Videojogos2009 - Conferência de Ciências e Artes dos Videojogos. Aveiro, Portugal.
Atwater, F. (1997). Inducing altered states of consciousness with binaural beat technology. In Proceedings of the Eighth International Symposium on New Science (pp. 11-15). Fort Collins, CO: International Association for New Science.
Augoyard, J. F., & Torgue, H. (Eds.). (2005). Sonic experience: A guide to everyday sounds. Montreal, Canada: McGill-Queens University Press.
Barr, P. (2008). Video game values: Play as human-computer interaction. Unpublished doctoral dissertation. Victoria University of Wellington, New Zealand.
Bethesda Game Studios (Developer). (2006). The Elder Scrolls IV: Oblivion [Computer game]. 2K Games & Bethesda Softworks.
Brewster, S. A. (1994). Providing a structured method for integrating non-speech audio into human-computer interfaces. Unpublished doctoral dissertation. University of York, Heslington, UK.
Collins, K. (2008a). Game sound: An introduction to the history, theory, and practice of video game music and sound design. Cambridge, MA: MIT Press.
Collins, K. (2008b). Nothing odd about audio. Retrieved September 31, 2009, from http://www.slideshare.net/collinsk/sk-466356
Csíkszentmihályi, M. (2008). Flow: The psychology of optimal experience. London: Harper Perennial.
Cunningham, S., Caulder, S., & Grout, V. (2008). Saturday night or fever? Context aware music playlists. In Proceedings of the 3rd Conference on Interaction with Sound, Audio Mostly 2008 (pp. 64-71). Piteå, Sweden.
Cunningham, S., Grout, V., & Picking, R. (2011). Emotion, content and context in sound and music. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.
Damásio, A. (2000). The feeling of what happens: Body and emotion in the making of consciousness. London: Vintage Books.
Damásio, A. (2003). Emotion, feeling, and social behavior: The brain perspective. The Walter Chapin Simpson Center for the Humanities. Retrieved September 31, 2009, from http://depts.washington.edu/uwch/katz/20022003/antonio_damasio.html
Damásio, A. (2005). Descartes’ error: Emotion, reason, and the human brain. London: Vintage Books.
Deutsch, S. (2003). Music for interactive moving pictures. In Sider, L. (Ed.), Soundscape: The School of Sound lectures 1998-2001 (pp. 28–34). London: Wallflower Press.
Ekman, I. (2005). Meaningful noise: Understanding sound effects in computer games. In Proceedings of Digital Arts and Cultures 2005. Copenhagen, Denmark.
Ekman, I. (2008). Psychologically motivated techniques for emotional sound in computer games. In Proceedings of the 3rd Conference on Interaction with Sound, Audio Mostly 2008 (pp. 20-26). Piteå, Sweden.
Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.
Follett, J. (2007). Audio and the user experience. UXmatters. Retrieved September 31, 2009, from http://www.uxmatters.com/MT/archives/000200.php
Frauenberger, C. (2007). Ears))): A methodological framework for auditory display design. In CHI ‘07 extended abstracts on Human factors in computing systems (pp. 1641–1644). San Jose, CA: ACM Press.
Freeman, D. (2003). Creating emotions in games. Berkeley, CA: New Riders Games.
Gouk, P. (2004). Raising spirits and restoring souls: Early modern medical explanations for music’s effects. In Erlmann, V. (Ed.), Hearing cultures: Essays on sound, listening and modernity (pp. 87–105). Oxford: Berg.
Grimshaw, M. (2007). Situating gaming as a sonic experience: The acoustic ecology of first person shooters. In Proceedings of Situated Play (pp. 474–481). Tokyo, Japan: DIGRA.
Grimshaw, M. (2008). The acoustic ecology of the first-person shooter. Saarbrücken, Germany: VDM Verlag Dr. Muller.
Grimshaw, M. (2009). The audio uncanny valley: Sound, fear and the horror game. In Proceedings of the 4th Conference on Interaction with Sound, Audio Mostly 2009 (pp. 21-26). Glasgow, UK.
Guidelines for Sound Design in Computer Games
Hassenzahl, M., & Roto, V. (2007). Being and doing: A perspective on User Experience and its measurement. Interfaces, 72, 10–12. Hassenzahl, M., & Tractinsky, N. (2006). User Experience—a research agenda [Editorial]. Behaviour & Information Technology, 25(2), 91–97. doi:10.1080/01449290500330331 Hermann, T., & Hunt, A. (2005). Guest Editors’ Introduction: An Introduction to Interactive Sonification. IEEE MultiMedia, 12(2), 20–24. doi:10.1109/MMUL.2005.26 Ion Storm Inc (Developer). (2004). Thief: Deadly Shadows [Computer game]. Eidos Interactive. Jørgensen, A. (2004). Marrying HCI/Usability and computer games: A preliminary look. In Proceedings of the third Nordic conference on Human-computer interaction, NordiCHI ‘04 (pp. 393-396). Tampere, Finland. Jørgensen, K. (2006). On the functional aspects of computer game audio. In Proceedings of the 3rd Conference on Interaction with Sound, Audio Mostly 2006 (pp. 48-52). Piteå, Sweden. Jørgensen, K. (2008). Audio and Gameplay: An Analysis of PvP Battlegrounds in World of Warcraft. Gamestudies, 8(2). Juslin, P. N., & Sloboda, J. A. (Eds.). (2001). Music and emotion: Theory and research. Oxford: OUP. Kallmann, H., Woog, A. P., & Westerkamp, H. (2007). The World Soundscape Project. The Canadian Encyclopedia. Retrieved September 31, 2009, from http://thecanadianencyclopedia.com/ PrinterFriendly.cfm?Params=U1ARTU0003743 Kojima Productions (Developer). (2008). Metal Gear Solid 4: Guns of the Patriots [Computer game]. Konami.
Kramer, G., Walker, B., Bonebright, T., Cook, P., Flowers, J., Miner, N., et al. (1997). Sonification report: Status of the field and research agenda. Retrieved September 31, 2009, from http://www. icad.org/websiteV2.0/References/nsf.html Lane, R. D., Nadel, L., Allen, J. J. B., & Kaszniak, A. W. (2002). The study of emotion from the perspective of cognitive neuroscience . In Lane, R. D., & Nadel, L. (Eds.), Cognitive neuroscience of emotion (Series in affective science) (pp. 3–11). Oxford: OUP. Ledoux, J. (1998). The emotional brain: The mysterious underpinnings of emotional life. London, UK: Phoenix. Leeds, J. (2001). The power of sound. Rochester, VT: Inner Traditions. Lynch, D. (2003). Action and reaction . In Sider, L. (Ed.), Soundscape: The School of Sound lectures 1998-2001 (pp. 49–53). London: Wallflower Press. Mahlke, S. (2007). Marc Hassenzahl on user experience. HOT Topics, 6(2). Retrieved September 31, 2009, from http://hot.carleton.ca/hot-topics/ articles/hassenzahl-on-user-experience/ Mahlke, S., & Thüring, M. (2007). Studying antecedents of emotional experiences in interactive contexts. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 915-918). San Jose, CA: ACM Press. Marks, A., & Novak, J. (2009). Game development essentials: Game audio development. Florence, KY: Delmar Cengage Learning. Nacke, L., & Grimshaw, M. (2011). Player-game interaction through affective sound . In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.
Naughty Dog (Developer). (2007). Uncharted: Drake’s Fortune [Computer game]. Sony Computer Entertainment.
Nettle, D. (2006). Happiness: The science behind your smile. Oxford: OUP.
Norman, D. (2002). Emotion & design: attractive things work better. interactions, 9(4), 36-42.
Norman, D. (2004). Emotional design: Why we love (or hate) everyday things. New York: Basic Books.
Parker, P. (2003). Filling the gaps. In Sider, L. (Ed.), Soundscape: The School of Sound lectures 1998-2001 (pp. 184–194). London: Wallflower Press.
Peck, N. (2001). Beyond the library: Applying film postproduction techniques to game sound design. In Proceedings of Game Developers Conference. San Jose, CA.
Peck, N. (2007, September). Unpublished Presentation. CoFesta/TGS, Tokyo, Japan.
Roque, L. (2005). A sociotechnical conjecture about the context and development of multiplayer online game experiences. In Proceedings of DiGRA 2005 Conference: Changing Views – Worlds in Play. Vancouver, Canada.
Schafer, R. M. (1973). The music of the environment. Cultures, 1973(1).
Schafer, R. M. (1994). The soundscape: Our sonic environment and the tuning of the world. Rochester, VT: Destiny Books.
Schell, J. (2008). The art of game design: A book of lenses. London: Morgan Kaufmann.
Sider, L. (Ed.). (2003). Soundscape: The School of Sound lectures 1998-2001. London: Wallflower Press.
Sonnenschein, D. (2001). Sound design: The expressive power of music, voice and sound effects in cinema. Seattle, WA: Michael Wiese Productions.
Sotamaa, O. (2009). The player’s game: Towards understanding player production among computer game cultures. Unpublished doctoral dissertation. University of Tampere, Finland.
Sucker Punch Productions (Developer). (2009). inFamous [Computer game]. Sony Computer Entertainment.
The Curious Team. (1999). Curious about space: Can you hear sounds in space? Ask an Astronomer. Retrieved September 31, 2009, from http://curious.astro.cornell.edu/question.php?number=8
Tinwell, A., Grimshaw, M., & Williams, A. (2011). Uncanny speech. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global.
Truax, B. (1995, September). Sound in context: Acoustic communication and soundscape research Simon Fraser University. Paper presented at the International Computer Music Conference.
Truax, B. (2001). Acoustic communication (2nd ed.). Westport: Greenwood Press.
Ubisoft Shanghai (Developer). (2008). Tom Clancy’s EndWar [Computer game]. Ubisoft.
World soundscape project. (n.d.). Retrieved September 31, 2009, from http://www.sfu.ca/~truax/wsp.html
Wrightson, K. (2000). An introduction to acoustic ecology. Soundscape: The Journal of Acoustic Ecology, I (I, Spring 2000), 10-13.
KEY TERMS AND DEFINITIONS
Context: Context encompasses intrinsic and extrinsic aspects that surround and influence interaction phenomena. Disregarding context can make all the difference, namely, deviation from a predicted outcome. Context has long challenged engineering and design disciplines.
Emotion: There are many possible levels at which to approach, and therefore define, emotions. In this text we adopt the cognitive neuroscience perspective, which explains emotions as body reactions that include releasing chemicals in brain and blood. Acknowledging this biological basis emphasizes how seriously the matter ought to be taken: It is definitely not something one can decide whether to attend to or not, once exposed to “competent” stimuli. This perspective also supports the notion that changes occurring in the body are accompanied by automatic associations: for instance, joy makes our cognition tend to speed up while sadness slows it down.
Entrainment: Entrainment refers to the synchronization of resonant systems. Breath, heartbeat, and brainwaves are examples of resonant systems for which entrainment may be explored, as studied in psychoacoustics. There are two types of entrainment: internal-to-internal and external-to-internal. Internal-to-internal refers to entrainment among one person’s pulse systems, namely heart, breath, and brain. For instance, when heartbeat increases so does breath rate. External-to-internal has to do with the changing of internal rhythms through external stimulation, in our case, through sound. The latter is what allows for entrainment through design; the former augments the opportunities regarding the system at which that entrainment is targeted.
Resonance: Resonance is the phenomenon in which an object is put into sympathetic vibration by finding a concordance between its frequency and an exciting frequency. There are two types of resonance: natural (also called free), when an object vibrates as a consequence of being excited with its own natural frequency; and forced, if the object has the ability to vibrate to a variety of external frequencies. The functioning of the tympanic membrane is an example of the principle of forced resonance and, here, the limits of what can be forced establish the audible range. The human body is subject to resonance at many levels, depending on the frequencies to which it is exposed.
Sound Layers and Semantics: One way to address the complexity of the components of sound design is by classifying sound stimuli in layers according to their semantics. Classifications, as borrowed from the body of knowledge and practice in film, might include: dialog, which is the discourse; music, for setting the emotional tone; foley, which is the sound of actions; ambience, comprising the sounds of the environment; and sound effects, which are the sounds of abstract or imaginary objects.
Soundscape: Soundscape is a concept that derives from the field of acoustic ecology and refers to the sound of an environment heard as a whole. A soundscape is an ecologically balanced entity where sound mediates relationships between individuals and the environment. This holistic consideration puts emphasis on context, emotion, and interaction between the listener and the environment.
Soundscape Composition: Acoustic ecology supports the notion that a soundscape can be understood as a composition, like a musical composition. What is more, soundscapes can be composed. This inherent sense of harmony and orchestration is not mere lyricism: for instance, studies on animal vocalizations in natural environments show evidence of balance in level, spectra, and rhythm.
Willing Suspension of Disbelief: The term comes from the early 19th Century British poet Samuel Taylor Coleridge who argued that an infusion of reality into the fantastical was required for readers to accept implausible narratives. It has since been widely adapted for the study of computer games and immersive environments.
Chapter 18
New Wine in New Skins: Sketching the Future of Game Sound Design
Daniel Hug
Zurich University of the Arts, Switzerland
ABSTRACT
With the disappearance of technological constraints and their often predetermining impact upon design, computer game sound has the opportunity to develop into many innovative and unique aesthetic directions. This article reflects upon related discourse and design practice, which seems strongly influenced by mainstream Hollywood film and by a striving for naturalism and the simulation of “reality.” It is proposed that this constitutes an unnecessary limitation to the development and maturation of game sound. Interestingly, a closer understanding of aesthetic innovations of film sound, in particular in relation to what can be termed “liberation of the soundtrack,” can indicate thus far unexploited potential for game sound. Combined with recent innovations in creative practice and technology, they serve as inspiration to propose new directions for game sound design, taking into account the inherent qualities of the interactive medium and the technological and aesthetic possibilities associated with it.
INTRODUCTION: THE CREATIVE DEAD-END IN GAME SOUND DESIGN
Growing up and maturing is usually associated with acquiring independence from one’s parents and leaving the family home, to follow one’s own, autonomous destiny. This archetypal human narrative could well be applied to game sound, which, having matured significantly over the last two decades, in many ways still seems to live with its parents, Mrs. Film Sound and Mr. Realism. This manifests itself in both game design practice and technological developments, as well as in the discourse that permeates it all. At present, like a child, game sound has a limited horizon and is oriented very much towards its “parents”.
DOI: 10.4018/978-1-61692-828-5.ch018
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Just a Chip off the Old Block?
You might suggest this is not a problem and, perhaps, even quite a normal situation. In fact, once again using film as an example, movie makers have drawn their aesthetic points of reference from theatre, photography, pure document, and so on. But the greatest advances within the medium occurred when it developed its very own, emancipated aesthetics. In their “Realtime Art Manifesto”, Auriea Harvey and Michaël Samyn suggest that, to develop a unique language for the real time 3D medium and to avoid imitation of any old medium, artists should: “Imitate life and not photography, or drawings, or comic strips or even old-school games” (Harvey & Samyn, 2006). This probably overstates the case (after all, it is a manifesto!) but still raises an important point. I suggest that the computer game represents a young medium that still needs to find its own, autonomous identity, in particular concerning sound aesthetics. I propose that an understanding of the driving force behind the maturing and emancipation of film sound can contribute to aesthetic innovation in game sound design. This sounds contradictory at first, but there are good reasons for this strategy. In principle, I argue that film sound undertook a similar path towards maturity. Of course, it would be misleading and naïve historicism to think that the aesthetic developments of film can be directly applied to games. An examination of the history of film sound predominantly serves as a point of reference, showing what basic strategies of innovation could be used. Underlying principles are carved out and translated to the realm of game sound by relating them to specific qualities of the medium. To this end, historical, theoretical, technical, cultural, and formal aspects of film sound aesthetics are investigated. Ultimately, the idea is to encourage a fresh approach to game sound design that, although inspired by film sound in some ways, actually detaches it from this heritage.
In this article I would like to contribute to a discussion which is slowly emerging and could be labelled: How do we ensure that the wine (sound design,1 aesthetics) we put into the new skins (the medium: computer games) is not mouldy, but fresh and fruity? To cut a long story short: I do not have a definite answer and there probably isn’t one. But a journey through the creative history of film and game sound, and a consideration of some of the intrinsic qualities of computer-based interactive games, suggests several creative approaches that could very well be a useful contribution. The aim of the article is thus not to provide ready-made solutions, but rather to enrich an existing discourse by cross-fertilizing with other fields. Therefore, those of you who believe that the journey sometimes is the reward, please read on.
Overview
The first section will describe the state of the art in game sound design, and shall elaborate upon the reasons for the creative limitations as they stand. I will outline the recent historical developments in creative practice and technology while also taking the underlying discourse into account. I will focus on significant contributions to innovations, inspired by both the technological advances and the innovative approaches taken by mainstream commercial and independent developers. Following this, the second section will provide an overview of relevant developments within film sound design, from the arrival of sound film to the present day. Most importantly, it will describe the new design strategies resulting from the “liberation of the soundtrack”. The third and final section proposes directions for game sound design. These build in particular upon the underlying concepts that motivated the innovative aesthetics in game and film sound and, in part, propose entirely new approaches based on the essential qualities of interactive computer games and the unique experience they provide. These propositions will also be discussed in the light of technological approaches that could be used to implement them and will be illustrated through examples.
GAME SOUND DESIGN TODAY
The Dominating Paradigms: Simulation of Reality and Hollywood Film Aesthetics
A look at the topics discussed at the 2009 AES conference on Game Sound in London (Audio Engineering Society, 2009), as well as at the websites of the Interactive Audio Special Interest Group (IASig2), the Game Audio Network Guild (G.A.N.G.3), Gamasutra4 and others, shows that the hot topics in contemporary game sound are quite evident: dynamic mixing and digital signal processing (DSP), dynamic procedural sound generation techniques, and meta formats like Interactive XMF.5 In many cases, the related discourse concerns the development of a credible “recreation of reality” (Young, 2006). The technical apparatus of computer games provides everything needed for creating sounds that provide a truly coherent simulation of “reality”. With the help of sound design middleware, sounds can be “attached” to sources, placed in Cartesian space and linked to movements and scripted events. A powerful combination of software engine and sound hardware calculates and produces the correct psychoacoustic transformations for creating the illusion of location, movement and spatiality. The creative focus on the simulation of reality is manifest in the interest in providing a naturalistic presentation of complex sound environments. Simon Carlile writes in a comment to an article on the future of Game Audio:
There can be hundreds of simultaneous sound objects when we cross the road but fortunately we hear out the approaching truck pretty reliably. But to allow that capability in games, the sound objects need to be rendered in a way that the brain expects so that the information they represent can be effectively processed. Virtual Reality research demonstrates that plausibility and consistency are very important in generating the sense of presence and supporting in-world performance. There is a need then to attend to the “objective” characteristics of the sound object (particularly environment ambience). (Bridgett, 2009a)
The automatic generation of sound in real time through physical modelling seems to be a logical next step in game sound, just as physics simulations have become established through middleware like Nvidia’s PhysX.6 In this scenario, “creative” interventions are mostly limited, for example, to modifying volume falloff curves in order to fine-tune dynamic spatial mixing, but even this is driven by functional necessities, which are mostly the understandability of dialogue, creating clear distinction between “foreground” and “background”, or preventing an overload of the mix. It is symptomatic that, so far, the only winner of the award for “Most Innovative Use of Audio” (given by the Game Audio Network Guild7) that is not a music game8 follows these strategies and uses the technologies mentioned: Tom Clancy’s Ghost Recon: Advanced Warfighter 2 (Ubisoft, 2007) was awarded for its “audio controlling graphics & physics engine”. This example reinforces the observation that the ideal of a simulation system is driving innovation in game sound design. It is arguable that the only exception to this limiting orientation towards the reproduction of “realism” seems to be found in the genres of horror and survival games as well as in games with suspense-driven settings (see, for example, Kromand, 2008). But within such contexts, “realism” is simply replaced by the aim of fulfilling established aesthetic expectations and conventions using stereotypes and clichés from filmic genres (horror, psycho-thriller and so forth).
On closer examination, this reveals a fundamental dialectic inherent within contemporary computer games:
There is no “real” set where the action takes place and that could serve as a point of orientation for establishing verisimilitude (Grimshaw, 2007). It has been noted that games constitute a “cinematic realism” rather than an “objective” one (Collins, 2008, p. 134), re-creating a sense of immersion and believability within a fantasy world by means already established in film. This leads to the other dominating ideal of aesthetics in current computer game sound discourse, which is achieving a more “filmic” soundtrack. Despite essential (theoretical) differences (I will discuss these in the last section in more detail) the discourse about the state of the art and the possible future of mainstream game sound design often resorts to “Hollywood” and film sound design as a point of reference. In many cases, the production and technology of games and film are also very similar (for example, Collins, 2008; Grimshaw, 2007). On Gamasutra, Rob Bridgett (2006) argues that game sound has progressed “both towards and away from it’s [sic] antecedent of film sound” (p. 5). While the movement away from film sound is mainly concerned with a need to produce event-based, interactive sound, the movement towards film sound has strong aesthetic implications. However, not everybody is satisfied with this situation. A cursory glance at articles within relevant online and offline game magazines reveals that, due to the arrival of the so-called “Next Generation” consoles (Microsoft Xbox 360, Sony PlayStation 3, and, to a certain extent because of technical limitations, the Nintendo Wii), a small but significant number of publications are dealing with aesthetic challenges and the vast unexplored potential lurking in this still relatively new medium.
Inspired by Randy Thom’s (1999) article “Designing a Movie for Sound”, Bridgett (2007a) calls for designing games for sound: A game’s design should be created with sound in mind, from the very beginning, to allow the soundtrack to fulfil its potential. Some authors express their frustration with an aesthetic “dead end”. Peter Drescher (2006a) states on his blog:
Here we have these outrageously powerful desktop machines, easily producing many more channels of audio than were available to $100,000 mixing consoles just a few years ago -- and THIS is the best we can come up with!? tired, recycled 50’s gladiator movie soundtracks? the Matrix again and again? heavy metal guitar cliche after cliche ... ach! my ears!! gimme the volume control ...
In searching for innovative game sound concepts, I shall elaborate on the following two areas: technology that could drive innovation, and innovation that eventually could drive technology or just change the way we see (or hear!) things.
Technological Opportunities
Interactive Mixing and Digital Signal Processing
The sophistication of game sound has been greatly facilitated by several technological advances in the last 10-15 years, both in hardware (DSP chips on soundcards, affordable multichannel output or 3D virtualization technologies for loudspeakers and headphones, distribution on DVDs, and massive increase of memory) and in software technologies (standardized, dynamic audio APIs like OpenAL, powerful sound design middleware like FMOD or Wwise). Two important techniques that have emerged are adaptive (or interactive) mixing and DSP. In some of the more recent postings in the audio thread at Gamasutra.com, Rob Bridgett reflects on his experiences mixing titles like Scarface: The World Is Yours (Vivendi, 2006), and LittleBigPlanet (Sony, 2008). He puts forward the idea of interactive mixing as a key strategy for advancing game sound and proposes a combination of both film standards (such as grouping, auxiliary channels, automation including mixer snapshots and standardization) with game-specific mixing features. Such features include fall-off management, passive (for instance, auto-ducking)
and active, script-driven mixing techniques and a more game-specific use of snapshots. Bridgett also stresses that game sound requires specific artistic techniques. Again, these are mostly inspired by film sound (Bridgett, 2009a). He also discusses the application of these technologies and methods in titles like Scarface: The World Is Yours, Heavenly Sword (Sony, 2007), Fable II (Microsoft, 2008), LittleBigPlanet and Prototype (Activision, 2009). Scott Morgan provides an example of how advanced mixing engines can be used to create more dynamic interactive soundscapes, based on the smart selection and mixing of 18 parallel channels of audio and the application of runtime reverb and filtering to change the overall feeling of the atmosphere when needed (Morgan, 2009). The potential of interactive mixing also raises the question as to whether this technology could be used to manage more advanced narrative functions, which are often inspired by film conventions, to control sound in games. A report put together by the group dealing with interactive mixing at the Project Bar-B-Q Interactive Music Conference raises the following question: How can we introduce a high-level mixing aesthetic to games which would allow for the control of sounds dynamically to create narrative mixes that compare with the best musical and cinematic examples? (Grigg et al., 2006). Drescher, one of the group members, provocatively proposes a fictional tool called “THE Homunculonic AEStheticator”, an “interactive audio mixing engine with real-time Haptic Applicators, capable of producing multiple adaptive soundtracks encoded with True Human Emotions™, using T.H.E. algorithm.” Describing the fictional product, Drescher points out a few issues that should in fact be addressed by the game sound community: dramatic tension, intelligibility, focus, scale, romance, comedy, dynamic volume control, and automatic mastering for various output media. He also notes that, in principle, all the required technologies would be available to address these issues (Drescher, 2006b).
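To make the difference between snapshot-based and passive mixing more tangible, the following is a minimal Python sketch under stated assumptions. It is not the API of FMOD, Wwise or any other middleware: the names `Snapshot`, `blend` and `auto_duck`, the bus names and all gain values are hypothetical and chosen only for illustration. A snapshot is treated as a set of bus gains in decibels, a transition as a linear interpolation between two snapshots, and auto-ducking as a rule that lowers every bus except dialogue while a line is spoken.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """A named set of bus gains in decibels (hypothetical structure)."""
    name: str
    gains_db: dict


def blend(a, b, t):
    """Interpolate bus gains linearly from snapshot a to snapshot b.

    t is clamped to the range 0..1; a bus missing from one snapshot
    is assumed to sit at 0 dB there.
    """
    t = max(0.0, min(1.0, t))
    buses = a.gains_db.keys() | b.gains_db.keys()
    return {
        bus: (1 - t) * a.gains_db.get(bus, 0.0) + t * b.gains_db.get(bus, 0.0)
        for bus in buses
    }


def auto_duck(gains_db, dialogue_active, duck_db=-9.0, protected=("dialogue",)):
    """Passive mixing: lower every unprotected bus while dialogue plays."""
    if not dialogue_active:
        return dict(gains_db)
    return {
        bus: gain if bus in protected else gain + duck_db
        for bus, gain in gains_db.items()
    }


# Two illustrative snapshots; the values are invented for this sketch.
explore = Snapshot("explore", {"dialogue": 0.0, "music": -6.0, "ambience": -3.0})
combat = Snapshot("combat", {"dialogue": 0.0, "music": 0.0, "ambience": -12.0})

halfway = blend(explore, combat, 0.5)            # script-driven transition
ducked = auto_duck(halfway, dialogue_active=True)  # passive rule on top
```

In this reading, the active, script-driven part of the mix decides which snapshots to blend and how far, while the passive rule is applied afterwards regardless of game state. A production system would additionally smooth such changes over time rather than applying them instantly.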
Interactive, Procedural Audio and Physical Modelling
Another “hot topic” in the game sound community concerns procedural audio (see Farnell (2011) and Mullan (2011) for an extended discussion). This technology partially overlaps with interactive mixing and DSP, but is less concerned with traditional studio metaphors like channels, faders, and effects. Procedural audio describes a broad array of systems that are able to produce a range of sound outputs for a range of inputs. This includes synthetic, generative, algorithmic, and AI driven systems, among others. According to Andy Farnell, “it is better to describe procedural audio by what it is not. It is not pre-sequenced, pre-recorded sound and music” (Farnell, 2007, p. 12). There are not many examples of games that follow this approach thoroughly, but some titles, such as LittleBigPlanet or some games by Will Wright, in particular Sim City (3000 and later versions, Maxis 1999 - 2007), the Sims series (Electronic Arts, 2000 -) and Spore (Electronic Arts, 2008) closely adopt this principle. The design of such games was strongly driven by generative design principles and the contribution of programmers from the Demo scene, a computer art subculture that specializes in producing generative, non-interactive audio-visual presentations. For example, while there is a significant amount of audio material in Spore (over two gigabytes of compressed audio, according to Kent Jolly in (Jackson, 2009)), the music (which was produced in collaboration with Brian Eno) is mostly composed of short phrases and samples, which are synchronised in different ways by the audio engine. The criteria for the composition and the mixing are partially event driven and mostly generative. The result is reminiscent of minimal music by Philip Glass or Steve Reich (albeit nowhere near the quality of said composers9), providing an adequate backdrop for an open-ended game, the pace of which fundamentally depends upon how it is played. Robi Kauker states that procedural audio is the
Robi Kauker states that procedural audio is the
New Wine in New Skins
only way to deal with the tremendous amount of content required for non-linear, open-world games like The Sims or Spore. Asked whether the decision to follow a data-driven approach to the audio of these titles was driven by aesthetic or practical considerations, Kauker answers:
It’s practical because there’s really no other way to make it interesting. We could play a big loop, and the ambience would go [hums] all the time. That’s lovely and it works for some types of games, but for our games that are user-developed, we have to vary the world constantly. With Spore and The Sims, you don’t know what the world looks like beforehand. You don’t know what’s going to be in the world beforehand. The only thing you know is that there is a world! That makes it different by the very nature. (Fleming, 2009)
Referring to general sound effect design, Kent Jolly states:
The footstep system is also complicated because you never knew what kind of foot [the player] will put on or how many feet a creature will have. The front two can be humanoid and the back can be hooves, and the hooves can be huge and the front feet tiny. (Jackson, 2009)
All of the character sounds are thus combinations of various sound components depending upon character configuration. The designers created a dynamic system of samples that could be adapted in real time by filtering and other processing techniques to create the endless combinatorics that the gameplay affords, for example to make feet sound bigger or smaller, without having to change the basic sample. At this point the attentive reader may note that I have mentioned the word “sample” several times, and might wonder whether what I am describing really is “procedural audio”. In fact, this depends upon the point of view, which can vary slightly from author to author (a consequence of said discourse-driven approach). According to Farnell, the systems described here represent a (highly evolved) “data model” rather than a truly procedural one. A truly procedural system, according to Farnell, could do a resynthesis of recorded sound, which would provide full real-time control over all its parameters (Farnell, 2007; 2011). Physical modelling of sound is a technology that can be understood as a specific type of procedural audio. A significant number of the systems mentioned come close to an accurate imitation of physical acoustics, but true physical modelling systems are still in an early stage of development. Audiokinetic’s SoundSeed10 is an example of a software-based technology that creates sound variations parametrically from a single source. It uses a physical modelling-inspired method to process the sound according to various models, such as impact or air. This very limited example demonstrates there is a motivation within the market to push the envelope in this direction.
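The “data model” idea of adapting one basic sample to many character configurations can be illustrated with a small Python sketch. It makes no claim about how Spore or SoundSeed actually work: the `Foot` type, the sample file names and the mapping from foot size to playback rate, filter cutoff and gain are all assumptions invented for this illustration. The point is only that playback parameters are derived from the creature's configuration at run time, so that no additional recordings are required.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Foot:
    kind: str    # e.g. "humanoid" or "hoof"; selects the base sample
    size: float  # relative size; 1.0 is the size the sample was recorded for


# Hypothetical sample bank: a single base recording per foot type.
SAMPLES = {"humanoid": "step_humanoid.wav", "hoof": "step_hoof.wav"}


def footstep_params(foot, base_cutoff_hz=8000.0):
    """Derive playback parameters from foot size (assumed mapping)."""
    size = max(0.25, min(4.0, foot.size))  # keep the processing in a sane range
    return {
        "sample": SAMPLES[foot.kind],
        "rate": 1.0 / size ** 0.5,           # bigger feet play back slower and lower
        "cutoff_hz": base_cutoff_hz / size,  # bigger feet sound duller
        "gain": min(1.0, 0.5 * size),        # bigger feet are louder, capped at 1.0
    }


def creature_steps(feet):
    """One parameter set per foot, in the order the feet are listed."""
    return [footstep_params(foot) for foot in feet]


# Tiny humanoid front feet and huge hooves at the back, as in Jolly's example.
creature = [Foot("humanoid", 0.5), Foot("humanoid", 0.5),
            Foot("hoof", 4.0), Foot("hoof", 4.0)]
steps = creature_steps(creature)
```

In Farnell's terms this remains a data model, since the recorded sample is still the source material. A truly procedural system would replace the entry in `SAMPLES` with a synthesis model whose parameters are all exposed in the same way.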
Aesthetics of “Independent” Games
Now let us have a look at what is often understood as the hotbed of creativity, the independent (or “indie”) game developer community. Fuelled by accessible and widespread digital distribution systems and by engines like Unity11 that simplify development and distribution, this movement has gained significant momentum since the mid-nineties, with indie games having an increasing impact upon the mass-market industry. If we look at “typical” indie game titles,12 we see a perspective on technology that is quite different from the one described above. High-end technology is usually not an option for small low-budget productions. It seems that this limitation supports an aesthetic that is oriented more towards animated movies and abstract representations, making references to the arcade age and the first console generations. This can be seen not only as an involuntary tendency caused by said limitations but also as an ideology: Independent
and alternative game artists and developers often emphasize more or less explicitly that the aesthetic power and potential in games does not lie in the simulation of reality alone. In the Realtime Art Manifesto, Harvey and Samyn (2006) express it this way: “Make it feel real, not necessarily look real”.
Musical Approaches
Some titles employ musical elements without being actual music games. In Primrose, a geometrical puzzler by Jason Rohrer (2009), different colours produce different tones that are tuned to a minor chord, and dissolving rows results in an interval of a fifth. When a chain reaction of dissolving rows occurs, the fifth rises by one tone. This and similar approaches resemble design strategies used in some vintage arcade games, such as Cakewalk (Commavid, 1983), Mr. Do! (CBS Electronics, 1983), Oink! (Activision, 1983), or Dig Dug (Atari, 1983), where tonality and musical motives are strongly linked to gameplay and user input. Aquaria (Ambrosia Software, 2007) adopts a more innovative and distinctive approach to linking musicality with interaction. The player has to make her avatar sing to activate certain spells. This is achieved through drawing a sequence of connecting lines between symbols arranged in a circle. Touching a symbol with the mouse produces a tone and each of the tones belongs to a harmonic scale tuned to the game’s soundtrack. An interesting fusion of interface, avatar, interaction, and soundtrack is achieved which transgresses diegetic limits with ease.
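The kind of link between tonality and gameplay described for the puzzle game can be sketched in a few lines of Python. This is not the game's actual implementation: the colour names, the choice of an A minor chord, the MIDI note numbers and the reading of “one tone” as a whole tone (two semitones) are assumptions made purely for illustration.

```python
# Hypothetical colour-to-pitch table: MIDI notes of an A minor chord
# (A, C, E) spread over two octaves.
COLOUR_NOTES = {
    "red": 57, "green": 60, "blue": 64,
    "yellow": 69, "purple": 72, "orange": 76,
}

FIFTH = 7       # semitones in a perfect fifth
WHOLE_TONE = 2  # assumed size of "one tone"


def place_tile(colour):
    """Return the MIDI note sounded when a tile of this colour is placed."""
    return COLOUR_NOTES[colour]


def clear_rows(chain_step, root=57):
    """Return the two MIDI notes of the fifth sounded when rows dissolve.

    chain_step is 0 for the first clear and increases by one for each
    further clear within the same chain reaction.
    """
    low = root + chain_step * WHOLE_TONE
    return (low, low + FIFTH)


def midi_to_hz(note):
    """Convert a MIDI note number to frequency (equal temperament, A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


# A three-step chain reaction yields a fifth that rises step by step.
chain = [clear_rows(step) for step in range(3)]
```

Because every player action maps to a pitch from the same small tonal set, any sequence of moves remains consonant with the others, which is arguably what lets such games sound musical without being music games.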
Animation Film Aesthetics

Of course, a prototypical, “pure” aesthetics of animation film or cartoons does not exist, but still one can speak of a certain affinity of many indie games to animation film. This manifests itself in sounds that are more or less de-naturalized in a comical, playful, or surreal way, characterized by a subversive interpretation of sound-source associations. Interesting examples are titles by Amanita Design, for example Samorost 1 (2003) and Samorost 2 (2005). Blueberry Garden (Svedäng, 2009) is another example where the physical, material representation is questioned: Jumping reminds one of the sound of a wooden stick being very quickly dragged over a rough surface, flying through air distantly reminds one of the synthesized sound of a rope swirling in the air, and oversized fruit falls on the ground with a dull “thud”. Grey Matter13 (McMillen, Refenes, Baranowsky, 2008) is also an interesting case of “cartoonish” sound design: When an abstract dot hits a flying cartoon-brain, the latter “explodes” with sounds of breaking glass. The relationship of sound and animation is motivated strongly by how the explosion is designed: The cartoon-like objects explode into spiky particles, scattering like glass. In these examples, the sound narrative is transformed into a trans-natural entity, whereby the sound design shifts from the “real” to the metaphoric, the iconic and symbolic, without losing the roots of “dirty matter”. Yet, an impact is still an impact and some of its “visceral” characteristics are always maintained. These sonic aesthetics are common in animation movies (Curtis, 1992; Beauchamp, 2005) and have only become possible through the “liberation” of sound from its source, which I will describe in more detail later.
Abstract is Beautiful

Another aesthetic category in experimental games relies upon the total detachment of sound from any actual physical or even metaphorical source. Some of the examples here follow traditional design techniques (emitters, zones, event-based triggering of samples and so forth) but use them in interesting ways to create unique sonic aesthetics. In Brainpipe (Digital Eel, 2008) spatial navigation generates an abstract soundtrack that blurs the borders between music, voice, and sound effects and challenges diegetic borders by pitching down all sounds when the player decelerates. The real time strategy game Darwinia (Ambrosia Software, 2007) features insectoid, geometric life-forms, which produce synthetic sounds reminiscent of actual insect sounds. These are combined with all kinds of energetic sounds, hums and wobbles, which are attached to static or moving game entities, generating an entirely emitter-based soundscape. Dyson (Kremers and May, 2009) is an interesting hybrid between abstract and concrete sounds. Planets, represented by simple circles, have to be conquered by planting seeds. The sounds of seeds rooting in a planet are a combination of a soft rustle with a faint melodic tone emanating from the planet. When seeds start to battle, the soundtrack becomes reminiscent of a swordfight but with a rough, lo-fidelity texture. The different planets also emit different sounds: The sounds emanating from the conquered planets are ambiguous and are reminiscent of the sounds made by machinery in a laboratory, combined with faint beeps that oscillate between machine signals and crickets. Planting new seedlings emits a glasslike, percussive sound which has no connection to the visual representation (nor does it function as a metaphor) but, rather, defines an experiential quality of the interaction with the game.
Interactive Ambience

Interactive ambiences, as Bridgett points out, are still an undervalued design opportunity (Bridgett, 2007b). A related aesthetic concept has been labelled “antimusic” by Ed Lima, describing his approach of using very little musical scoring in Doom 3 (Activision, 2004) and of using carefully crafted interactive ambiences instead (Lima, 2005). This example shows the potential of using just simple, two-dimensional (foreground-background), static ambience design paradigms. Interactive ambiences, or ambiences that are crafted to support dramatic effects, already play a role in some mass-market titles. An early example is Thief: The Dark Project (Eidos, 1998). Some other notable examples are Half-Life (Valve, 1998-), Splinter Cell (Ubisoft, 2002-) and Prey (2K Games/3D Realms, 2005)–not to mention games from the survival horror genre, such as the Silent Hill series (Konami, 1999-2009). Harvey and Samyn from “Tale of Tales”14 are an independent design team who use sound extensively to develop an interactive ambience. The Path (Tale of Tales, 2009) is an interesting example, as it implements a procedural, open gameworld complemented by a distinctive sonic aesthetic that largely builds on interactive ambience. There are no sounds for the direct interactions with objects in the gameworld as such but, instead, the ambience responds to the user’s movements and actions. For example, when the avatar runs, the camera slowly moves away, the screen blurs slightly, and sharp and more aggressive string tones “intrude” upon the melancholic string and piano soundtrack. Moreover, a pumping, dull sound similar to a heartbeat is played, masking the surrounding sounds. In this way, sound takes on a dual role, acting as the traditional soundtrack and as a component of narration, or rather a comment on the player’s action, ultimately making the player feel responsible for the change of ambience that has occurred. It enunciates the action of sustained running as unpleasant, inappropriate, and potentially dangerous.
Summary: Innovation with the Handbrake On

The “parents” indeed have left a deep mark in the mind of their bastard progeny: Film sound often seems to be a lodestar for the game sound community. Like children looking for safety, many productions are oriented towards filmic, stereotypical, “best practices”–after all, if it worked in film, why should it not work for games? Certainly, technological advances have left their mark. Some faint voices call for an innovative exploitation of interactive, procedural technology. But still, the “parental” paradigms of using the technology either to replicate film aesthetics or to “simulate
reality” prevail. As expected, the industry “dissidents” of the family, the independent developers, provide interesting aesthetic approaches to sound design and usually make an effort to find their own style. However, true innovation in sound is relatively rare here too: Most indie games do not move beyond the aesthetic level that animation film has already reached, and the occasional use of procedural technologies has, as yet, not been applied to sound at all. This is not necessarily a big problem. After all, there are many great games with wonderful sound out there. However, from the point of view of a longer-term advance of computer games as a medium and game sound in particular, the current situation represents a dead-end, preventing the development of a unique aesthetic identity. It is time for game sound to come of age!
FILM SOUND: FROM WALLFLOWER TO EMANCIPATION

As stated in the introduction, my proposition is that film sound can teach us a few interesting lessons about how to find aesthetic independence. I will elaborate upon what those lessons could be within this section. I will firstly focus upon the developments that had the biggest impact in the aesthetic history of film sound and which are potentially of interest to computer game sound.
Pioneering Approaches

Ever since the introduction of “talkies,”15 film makers have reflected upon the use of sound in film. The seeds for an experimental aesthetics in sound design were already planted during this pioneering era of film sound when the medium was not yet entirely defined and conventionalized. In their famous Statement, Eisenstein, Pudovkin and Alexandrov (1928) demanded that sound needed to be used contrapuntally in relation to the visual montage in order to avoid the destruction
of the montage. Pudovkin (1929) further elaborated this point, arguing that image and sound are united by the resulting interplay of meanings: The redundant use of sound is to be avoided and image and sound have to be developed along separate rhythmic paths by using counterpoint as an essential compositional device. Around the same time, in a different cultural context, French director René Clair expressed his concern at the stereotyped patterns that sound films exhibited even during the early experimental stage. He saw more potential in the interpretation instead of the imitation of noises, and argued for an alternate use of the visual subject and the sounds produced by it (Clair, 1929). Over the following decades, sound became firmly established in film. As a result, the aesthetic approach to sound was extended and, in some aspects, also revised. While sound was supposed to serve the image (according to earlier writers), directors such as Bresson emphasized the reciprocity of sound and image. Sound should replace image, not complement it, and it can also dominate the image (Bresson, 1985). This concept has been developed further, in particular by Michel Chion (1994). Metz considers that attention towards sound should move beyond a purely phenomenological understanding and towards sound as a socially constructed entity. We experience sound through a body of knowledge and thus its design (and study) takes place within larger cultural and ideological structures (Metz, 1980).
New Hollywood and its Relatives

Let us now take a look at the biggest aesthetic revolution in sound cinema in terms of historic and economic dimensions. Mainstream Hollywood in the 1950s was growing rapidly and productions became increasingly monumental, relying upon rigorous division of labour and tight production and marketing plans. This led to the constant repetition of conventional, formulaic, “designing for the masses” approaches to film. This, in turn,
suffocated a lot of creativity, including sonic creativity. Studio-specific sound libraries were used over and over again, partly for convenience and partly to help larger studios create their signature sounds (Flückiger, 2001; Whittington, 2007). In some ways, the situation of “classic” Hollywood is comparable with the mainstream game industry today. In the late 1960s and early 1970s the tide began to turn. Directors such as George Lucas, Steven Spielberg, and Francis Ford Coppola began to treat sound in an entirely different way. The inspiration for these non-conformist “movie brats” came from avant-garde movements, such as the French Nouvelle Vague: In the late fifties and early sixties, significant changes within the French movie industry provided the creative minds of François Truffaut, Alain Resnais, Chris Marker, and Jean-Luc Godard, amongst others, with the freedom to break from convention and to explore new directions. The Nouvelle Vague was characterized by a critical approach to society but also to cinema itself, emphasizing the role of the author. This was an important inspiration for a new generation of Hollywood film makers, the New Hollywood. I will come back to this later. Firstly, let us consider the factors that contributed to the liberation of the soundtrack.
The Sound of Music

Sonically, avant-garde movements had an important impact, in particular Futurism and Musique Concrète (see, for example, Walter Murch in LoBrutto, 1994, p. 84). This led to highly innovative sound design practices. Sound designers approached sound “in itself”, interweaving the dominant causalistic and naturalistic sound ideology with the “objet sonore”, which is only attainable through “reduced listening”, where the real or supposed source of a sound and the meaning it may convey is ignored. The aesthetic achievements of Musique Concrète inspired the use of what could be termed “musical” design
strategies, and musicality emerged as a principle to create and arrange every noise in a sound track (Flückiger, 2001). Commenting on his sound work for THX 1138 (Lucas, 1971) Murch states that: “It is possible to just listen to the sound track of THX exclusive of the dialogue. The sound effects in the background have their own musical organization” (cited in Whittington, 2007, p. 57). A further consequence of this movement was the combination of synthesized sound with recorded sound that opened up new narrative and aesthetic spaces. One example is the screams of the birds in Hitchcock’s The Birds (1963), created by Bernard Herrmann, Remi Gassmann, and Oskar Sala on an early electronic instrument called the Trautonium.16 Another striking and highly evolved example can be found in Apocalypse Now (Coppola, 1979) where the naturalistic recordings of helicopter sounds are combined, juxtaposed, and fused with wobbling sounds from a synthesizer.
Sonic Gene Technology

As can be seen from these examples, technology often played an important role in the development of aesthetic innovations, even though it was sometimes used in an unorthodox way. A further example relates to the impact of multichannel sound in driving aesthetic innovation: Reporting about the technological and aesthetic challenges posed by the quadraphonic system, Murch states that: “Parenthetically, that’s actually where the concept of sound design came from. I felt that since nobody had ever done this before, I had to design it and figure out how to use this new tool that we’d come up with.” (LoBrutto, 1994, p. 91). Added to this was an unprecedented increase in flexibility of the recording process due to portable technology, in particular the Nagra, invented in 1959 by Stefan Kudelski. This high-fidelity portable recording technology made it possible to work in the field more frequently, and also to record sounds that previously were hard to record,
further liberating the sound recording process. For the first time the messy, wild, aleatoric, banal, everyday, even “abnormal” sounds (broken machines, leftovers and trash found in basements and attics) were embraced, becoming part of the material the “movie brats” worked with. This led to a reconsideration of the constructed nature of any soundtrack and the meticulous de- and reconstruction of the complex fictional sonic event. Sounds were now combined in layers, in complex sonic alloys, combining several qualities into a single sonic “transobject”. A prominent example is the sonic re-engineering of the “used future” in THX 1138 and in Star Wars (Lucas, 1977) with its laser swords, haggard, stuttering spaceship jet engines and worn out androids (Burtt in LoBrutto, 1994). As a consequence of new ideologies and technologies, the limitation imposed when sound is treated as an index of a single, recognizable source was overcome. All levels of association of sound and its source, be it on-screen, off-screen, or even in a reference to a cultural framework beyond the film itself, became possible. Through reduced listening, abstract qualities of sounds, such as structural instability, changes in energy and power, organic or synthetic notions and so on, became important, particularly within the science fiction genre (Whittington, 2007). A common design strategy relates to the ability to “play” with the familiarity of a sound, using de-familiarization as a narrative device. In addition, unidentifiable or ambiguous sounds can create interpretive spaces and activate the viewer. They also can frustrate expectations and even create fear by proposing an unknown source or phenomenon, the alien and incomprehensible. Postmodern, anthropomorphic, ambiguous, hybrid machines could be created: The robot R2-D2 from Star Wars “speaks” with synthetic beeps which remind the viewer of baby talk (Burtt in LoBrutto, 1994).
Despite this ambiguous, uncanny sonic identity, the form is accepted, and even loved, by the listeners.
Is it a Bird? Is it a Plane? No, it’s Ambiguity, Man!

The Russian director Andrei Tarkovsky in particular explored ambiguity as a catalyst for engaging experience and depth. His use of ambiguity in films like Solaris (1972), Stalker (1979) or The Sacrifice (1986) creates a sonic environment in which the audience struggles to make sense of a sound heard, creating meaning through establishing coherence between the heterogeneous elements of the audiovisual narrative. This diegetic playfulness leaves the spectator struggling with her beliefs. Truppin (1992) describes specific design strategies used by Tarkovsky: The revelation or negation of an (unexpected) source of a sound, the subversion of the coherence between sonic and visual space, or the use of sounds on parallel levels in order to enunciate qualities of both the material and the psychological or spiritual. On the other hand, Truppin notes that the use of clearly identifiable, specific and naturalistic sounds in a surreal setting might unhinge established conceptions of the real or provide signs of safety in an otherwise confusing narrative world.
Sonic Perspective Breaks Loose

The liberation of sound from source and the increased flexibility in the recording and production process led also to the deconstruction of the perspective relationship between sound and image. Until then, practices to create sonic perspective were motivated either by the need to understand dialogue or the maintenance of a more or less naturalistic sonic perspective (Wurtzler, 1992). Microphone placement possibilities provided additional restrictions. The critical authors of the Nouvelle Vague fundamentally questioned these conventions. In Godard’s films, for instance, sounds would appear relatively loud, negating the division between foreground and background and would even refuse to disappear when “more important” information appeared. What is more,
protagonists would seem strangely unresponsive to them. Godard would even avoid covering the edits of the sounds (Williams, 1985). This leads to a sonic aesthetic that is diametrically opposed to the transparency of the mix aimed for in most Hollywood films, old and new. Thanks to these unorthodox approaches, sonic aesthetics like these are no longer taboo and are used in several innovative sonic designs, such as subjectivization.
I Feel Good: Subjectivization

The possibility of enunciating subjective experience is an important aesthetic possibility emerging from the liberation of sound from its source and the techniques of montage described above. Sounds were now used in various ways to mark, or even simulate, subjective experience. Flückiger (2001) identifies several sound design strategies, which are very common and unquestioned nowadays: disassociation of sound and image, disappearing sounds, non-naturalistic reverberation, montage of unidentifiable sounds over slow-motion images, enlargement relative to the image, body sounds like breathing and heartbeats, and overemphasized, anti-naturalistic selection. For instance, in The Terminator (Cameron, 1984), and even more so in Terminator 2: Judgement Day (Cameron, 1991) the sounds of the Terminator’s leather clothes and his interactions with his sunglasses are moved towards the foreground. This creates an uneasy intimacy with the deadly man-machine. Another powerful (and aesthetically very different) example of subjectivization can be found in Pi (Aronofsky, 1998): The protagonist suffers from violent headaches. These are both marked as the protagonist’s subjective experience of pain through unidentifiable sounds, heartbeat-like music or metaphorical sounds of grinding stones, and simulated through high pitched screeches. The simulation effect is enhanced through the action-driven ducking of the painful sounds when the protagonist switches the lights off, temporarily relieving the pain for both the character and the
audience. Dream sequences or representations of hallucination represent extreme cases of subjectivization. For a direct and radical confrontation, I recommend David Lynch’s Eraserhead (1977) or a look at the dreams of special agent Cooper in Twin Peaks (Lynch, 1990-1991), in particular the red room at the end of the second episode of season one. Other striking examples are the explicit audio-visual placement of the viewer into one of the protagonists in first-person view, for example in Predator (McTiernan, 1987).
Take Me Higher: High Level Semantics

Last but not least, the liberation of sound from a strictly indexical function facilitated the emergence of complex higher level semantics, where primary semantics (related to the questions: What creates the sound? What is it made of? How does it move? Where is it?) became constituents of higher level meanings (Flückiger, 2001). More than before, sounds could now have symbolic and metaphoric functions standing for cultural, religious, or psychological entities (think of bells, keys, animal sounds and so on). Sounds could be established as “keysounds” within the narrative context of a specific film, for example, the sound of the scanner on the bridge in the original Star Trek series (produced by Roddenberry, 1966-1969) or the sounds of helicopters in Apocalypse Now. From here, new stereotypes and meta-signs could be established where artificial, non-referential sounds achieve a new indexicality through systematic re-use within certain genres or filmic styles. This connects to Altman’s (1992) proposition of understanding cinema as “event”. In this view, cinema is no longer an autonomous aesthetic entity, but a complex socio-cultural artefact which emerges in the interaction of complex production and reception processes (see also Metz, 1980). This encourages us to think about the many complex influences that make all aspects of cinema, not solely sound and image, meaningful. Just consider all the audio commentaries, “Making Of” documentaries and Internet fan sites, educating filmgoers about the production process, which in turn influences their experience and understanding of films (Whittington, 2007). Only through such processes can the start-up sound of an Apple computer become a meaningful sonic event in the Sci-Fi animation movie Wall-E (Stanton, 2008). Through the process of cultural reception, digestion, and reproduction, many of these experimental designs have now entered our collective memories and have become signs that are easy to understand or even clichés. Additionally, there is a culture of constant cross-referencing, citation, and remixing. This began with a strong emphasis on genre in New Hollywood, which soon turned into de- and reconstruction and recombination of genre into pastiche like Star Wars (elements of Western, Swashbucklers, Sci-Fi, Cartoon17) or hybrids like Alien (Scott, 1979) and Predator (Sci-Fi blended with Horror). This post-modern aesthetics became a significant driving force for film sound (Whittington, 2007) and is commonplace now, with films like The Matrix (Wachowski brothers, 1999) combining characteristic sonic signatures of Sci-Fi, Horror, Film Noir, and Martial Arts.
Brats at Work

Economic, structural and technological changes alone would not be enough to drive a significant aesthetic revolution. Fundamental to innovation is the challenging of convention, developing an attitude that rules are there to be broken. It is probably no coincidence that this spirit flourished mostly in communities that were driven by nonconformist ideals, such as the Russian film schools of the 1950s, the French Nouvelle Vague and the “movie brats” of New Hollywood. This mindset was linked to production systems resembling the “Cinema Copain” ideal of the Nouvelle Vague. The biographies of many innovative directors share this similarity: After acquiring financial independence, for instance, through placing some box
office hits, they continued their work in relatively small, independent teams, considering themselves colleagues that could be trusted. Freedom in the creative process was the result, and only in this way was it possible for Ben Burtt (for example) to spend one whole year designing the sound effects of Star Wars through trial-and-error.
Summary: Seeds of Change

As our little voyage through the history of film sound has shown, the long process of “emancipation” of sound, from considerations of audio-visual montage through to its liberation from a naïve indexical straitjacket, has provoked a fundamental shift in aesthetic paradigms and resulted in a series of aesthetic innovations. Asynchrony, counterpoint and complex reciprocal image-sound relations enriched the vocabulary of the time-based medium, leading to a reciprocity of the influence between image and sound and a malleable dominance of the one over the other. The emancipation of sound made it possible to develop a rich semantic vocabulary, relying on symbols, key-sounds and so forth, establishing genre specific stereotypes. The musicalization of the sound track has broken the barrier between musical and non-musical, abstract and concrete, material and synthetic, referential and non-referential sound, naturalizing the fictional and fictionalizing the natural. The embracing of ambiguity led to a rich practice of sonic enunciation of subjectivity and showed that the struggle for understanding can be an enjoyable experience. Finally, an understanding of sound-related discourse, emerging from the socio-cultural process of production and consumption, has opened up and encouraged post-modern playfulness with style, form, and meaning. A few driving forces emerge as preconditions for such a development. Firstly, there is the confrontation with the artistic avant-garde of Futurism and Musique Concrète and the rebel attitude of “Rock’n’Roll” which leads to a critical approach and a playful “joy of subversion” in the
creative process. Additionally, it is important to break from the chains of the big studios and to work in relatively small teams with a less strict division of work. Last, but not least, technology plays an important role as a catalyst for aesthetic innovation, in particular through creative “abuse” or exploration of less common applications.
REDRAWING PATTERNS OF CHANGE: PERSPECTIVES FOR GAME SOUND DESIGN

In this section I will revisit some of the innovations in film sound described above18 and propose “conceptual derivatives” for computer game sound. As previously mentioned, it is clear that aesthetic history will not simply repeat itself, nor can patterns or guidelines derived from film sound phenomena be applied directly to game sound.19 There are certainly commonalities, but there are also essential differences between game and film sound. I will not elaborate here upon these differences; the reader may refer to the discussions in Deutsch (2001), Jørgensen (2007), Grimshaw (2007), Collins (2008), and, in particular, Grimshaw (2008), who offers an in-depth discussion of several core elements of film sound theory and their applicability to game sound.20 Nevertheless, as I have made clear in the introduction to this chapter, historical developments in film sound aesthetics can serve as models of transformation and as points of reference for my proposition of new creative and aesthetic directions for game sound, thus providing an inspirational source and motivation for exploring new directions. The creative leaps in film sound design are interesting also because, despite originating in experimental artistic approaches, they were not only relevant to a small underground scene, but have finally made their way to the mass market and the mainstream.
Defining the Essence of Computer Games

First of all, a common conceptual ground for the understanding of the essential qualities of computer games, on which any further discussion can be based, shall be established. Note that this is by no means meant to be a general definition of computer games. Also, it is not a comprehensive review of the essential literature on the topic. Rather, it forms a working definition, limited to the scope of this chapter and serving to elucidate its thesis. Computer games, as understood here, are computational systems that essentially are procedural, functional and interactive. These autopoietic (Grimshaw, 2007) systems gestate worlds emerging only through the player’s agency and interaction. In this sense, computer games are both narrative spaces and tools for action. According to Neitzel, games have a transitional, hybrid character, oscillating between the closed, symbolic spaces of representative media that function through observation and immersive systems of agency in a virtual world. Depending on genre and type of game, the player is situated, or constantly shifts, between experience and action. Computer games thus resolve the subject as being the centre and origin of diegesis, as it is presented in film (Neitzel, 2000). Using Neitzel’s term, this “playful schizophrenia” is pushed even further through multiplayer games and online worlds, where the computer only provides the setting and rules–a kind of procedural experience system but one where the interaction mainly happens between humans represented by avatars. Narrative may well emerge, but is not a constituent of the game system any more. A similar impact results from the diffusion of games into everyday life through pervasive gaming applications. Here, life is a game, literally. In the immersive, performative experience of playing, the boundary of engagement through the “interface”, which involves physical controllers
as well as virtual representations of artefacts, or even the avatar’s limbs, is dissolved. This has interesting implications for how sound is associated with player agency as it can relate to the physical interaction as well as the action in the gameworld or even to actions of the game system, as will be elaborated below. Thus, the essential quality of a computer game is constituted by the action afforded by the game apparatus and performed by the player, where the interactive system facilitates the emergence of certain experiences that may have a narrative quality, at least in retrospect. I will follow this understanding of games exclusively here, which also means that I will not discuss aspects related to narrative such as diegesis, unless they offer a possibility for aesthetic experimentation. In the following section, I will demonstrate how the inspiration taken from film sound aesthetics and the qualities of computer games described here can be turned into directions for innovative sound design motivated by the inherent qualities of computer games as defined above.
Field of Action 1: Media Aesthetics and Semantics

Sound beyond Simulation and Naturalism

Considering the prominence of the discourse about simulation and realism identified in the review of the state of the art in computer games, this issue shall be addressed first. I have previously criticized a general sonic naturalism and reductionism (Hug, 2008b), mainly by building upon Chion’s observation that there is no sound of a thing as a one-to-one-relationship and, if an unambiguous indexicality is needed, we usually rely on idealized instances of sonic occurrences. Additionally, Chion points out that many sounds suggest abstract qualities of material and process, which he labels “indices sonores matérialisants”, rather than being specific indexes
of a process occurring with an object (Chion, 1998, p. 102). The concept of realism is particularly questionable in the entirely constructed virtual worlds of computer games, where any relation to a “given” reality (as in film) is entirely voluntary. Films already require constant leaps of faith in terms of identifying a sound’s source. It is through psychoacoustics and our imagination and willingness to accept and normalize inconsistencies that the sound from a speaker somewhere behind a screen can become the sound of a thing on screen (see Chion’s description of “magnetization” and “synchresis”, Chion, 1994). But, while a film will always produce a rupture between the world it has recorded and its representation, 3D computer games do not produce this rupture, as the mere possibility of spatial sound and physical modelling naturalizes and justifies every sound produced. Computer games constitute an apparatus where the process that generates its entities and the manifestations of these processes form a closed system, where sounds are calculated according to an ideal physical model and are “naturally” emitted from objects in three-dimensional space. Of course, such a simulative system of recreating reality has its appeal but, if this approach dominates creation processes, the fundamental quality of the technology is missed, its potential to create new, surprising aesthetics is overlooked. It is not a natural fact that any generative system relying on physical modelling is predestined for recreating reality. Let us recall some of the discussions about Virtual Reality when it still was a relatively young medium: Whereas hyperreality still implies some connection, regardless of how faint, to the ethos of verisimilitude, sound has no such loyalty; after all, where there is no ‘thing’ to represent, there can be no ‘misrepresentation.’ Similarly, VR, as a space of computer-generated simulation, renders irrelevant questions of verisimilitude, realism, and authenticity. 
Unlike the camera, the simulation makes no claims to reproduce reality, and in that sense it cannot be wrong, it can only be bad. (Dyson, 1996, p. 84)
Designing for Autopoietic Content Designing for autopoietic content means that the content of a game is designed as a system of potentialities, framed and specified by superordinate qualities of a desired game experience, rather than consisting of a finite sum of fixed assets. This is potentiality embodied within formulations of the methods and procedures of its generation and the models of their mutual interaction. This is not a new design strategy; it is used, for example, in procedural art, and also for experimental music and live electronics. While artistic design strategies can and should be used for creating unique sonic experiences in games, they are of limited relevance as the related “application domains” are free from a-priori functional demands and do not necessarily rely on interactivity and play. Closer to the sought characteristics, even if originating from an entirely different field, are certain sonification methods, in particular Model Based Sonification (Hermann & Ritter, 1999). Here, a sound generating system is set up in such a way that the totality of all components that generate a sound are driven by the dataset that is fed into the system. The system is then treated as a form of “virtual emitter” that can be excited through a user’s interaction, for example, by virtually touching it with an interface. This could very well be an inspiration for an experimental approach to designing the sounds for a game: Following the analogy, all entities of the games could be specific parametrical setups of a general sonic model, which could be derived from an overall aesthetic concept, and the sounds would be generated in real time through the agency of both players, nonplayer characters (NPCs) and the world-system of the game.
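The analogy to a "virtual emitter" can be illustrated with a minimal sketch. It is only loosely inspired by Model Based Sonification and is not Hermann and Ritter's method: each game entity is assumed to be one parameter setup (a list of damped sine modes) of a single shared sonic model, and an interaction excites it with a given strength. The function name `excite`, the mode format, and the example entities are hypothetical.

```python
import math

def excite(entity_modes, strength, sample_rate=8000, duration=0.25):
    """Render one excitation of a hypothetical 'virtual emitter'.

    entity_modes: list of (frequency_hz, decay_per_second, amplitude) tuples,
                  i.e. the parameter setup of one entity in the shared model.
    strength:     how hard the player or an NPC 'touches' the entity.
    Returns a list of float samples.
    """
    n = int(sample_rate * duration)
    samples = []
    for i in range(n):
        t = i / sample_rate
        value = 0.0
        for freq, decay, amp in entity_modes:
            # Each mode is an exponentially decaying sine wave.
            value += amp * math.exp(-decay * t) * math.sin(2 * math.pi * freq * t)
        samples.append(strength * value)
    return samples

# Two entities as different parameter setups of the same general model
# (values are illustrative assumptions, not measured material data).
glass_door = [(880.0, 12.0, 0.6), (1320.0, 18.0, 0.3)]
stone_gate = [(110.0, 4.0, 0.7), (165.0, 6.0, 0.3)]

soft = excite(glass_door, strength=0.2)
hard = excite(glass_door, strength=0.9)
```

Because the sound is computed from the entity's parameters and the strength of the interaction, the same object never needs a fixed sample; changing the aesthetic concept would mean changing the shared model rather than replacing assets.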
In terms of designing sonic objects it would also be worthwhile to investigate ways to use generative technologies, such as physical modelling, in a creative, unorthodox way. The play with qualia, with sonic “traces” of materiality or physical processes, of the human or the animal, the organic or the inorganic, the material and the structural, could be achieved in real time following similar semantic approaches as in film, as outlined above. The system could also be devised as a kind of “real time sound designer”, assembling sonic components into complex sonic amalgams as micro-narratives (Back, 1996) on the fly. This would result in a subversion of the physical modelling paradigm into “fictional physical modelling”, linking it to dynamic interactive processes rather than a “trigger” paradigm. Why not take Farnell’s (2011) proposal of a “behavioural sonic object” further by subjecting the control of the sound generation algorithm to any imaginable narrative or performative expression? In the game Love (Steenberg, upcoming) the available processing power is used for a very distinct and aesthetically innovative graphical post-process in real time. Sound could be approached in the same way: for example, to use the processing power to create new sounds “on the fly”, instead of simulating reality. This would be a sound design which works with potentiality, as demanded, rather than crafting the actual sample itself.
Designing for Autopoietic Second Order Semantics As in film, it is paramount to understand how meaning of sound emerges from socio-cultural processes of production and reception. Sonic metasigns like symbols and key sounds, the practice of citation and remix, and the play with codes are some examples of those complex sonic signs. In a computational, dynamic, modular system, such complex systems of meaning creation can be assembled on the fly, in real time, driven by interaction. The current paradigm of using more
or less elaborate static samples with event based triggers can result in interesting and even outstanding gaming experiences, as we have seen in the examples given in the first part of this article, but this approach will never be able to exhaust the potential of the interactive real time medium that games are. Harvey and Samyn (2006) state in the Realtime Art Manifesto: “The situation is the story. Choose your characters and environment carefully so that the situation immediately triggers narrative associations in the mind of the user.” This also means that sound should be designed in a way that supports a situational emergence of narrative, which of course requires us to rethink the whole sound design process and the unorthodox use of the procedural technology we have at our disposal. An approach could be based on a script language that allows us to denote conditions for certain narrative and compositional reconfigurations of a procedural audio engine. The sound designer’s job in this case would be to define the changes in the parameters and the mappings. The engine registers the patterns in the player’s actions: This acting could be framed by simple, established psychological categories, for example basic emotional states or levels of intentionality. Does a player walk straight to a target or does he explore the surrounding world? Is he low on health, weak, hectic or calm? Where does he look and for how long? Does he first aim, then shoot? Does he collect health potions and use them only when necessary? Is he hitting the target with a last, desperate blow? Does the player often look at the map? Does he constantly rearrange the inventory? Does he switch weapons aimlessly or in a very targeted manner? Does he miss a lot of the hidden pickup objects or secret doors? All this implies certain experiential qualities that can be taken as control elements of the interactive experience. 
This way, the experiential quality, that in film is narrated audio-visually, becomes the actual experiential quality of the player in the gameworld. In some ways, this strategy is comparable to the approach that is used in adaptive music. The
essential difference is that it is basically a system to allow for the creation of complex modification patterns in all aspects of the sound design, from the sonic object to its arrangement in an interactive time-space, depending upon the player’s behaviour in the gameworld. This requires further research into how people play games and the strategies they develop (the field of affective computing is one that researches these questions (see, for example, Picard, 1997)).
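The idea of denoting conditions that reconfigure a procedural audio engine can be sketched as a small rule table. This is a hedged illustration, not an existing scripting language or engine API: the observation keys, thresholds, state names, and parameter names are all assumptions chosen for the example.

```python
def classify(obs):
    """Reduce raw play observations to coarse, hypothetical experiential states."""
    states = {"exploring" if obs["path_directness"] < 0.4 else "goal_directed"}
    if obs["health"] < 0.25:
        states.add("weak")
    if obs["weapon_switches_per_min"] > 10 or obs["map_checks_per_min"] > 6:
        states.add("hectic")
    return states

# Designer-authored mapping: state -> changes to (assumed) engine parameters.
RULES = {
    "exploring": {"ambience_detail": 0.9, "reverb_mix": 0.6},
    "goal_directed": {"ambience_detail": 0.4, "reverb_mix": 0.3},
    "weak": {"lowpass_cutoff_hz": 1200.0, "heartbeat_gain": 0.8},
    "hectic": {"interaction_pitch_jitter": 0.3},
}

DEFAULTS = {
    "ambience_detail": 0.5,
    "reverb_mix": 0.4,
    "lowpass_cutoff_hz": 18000.0,
    "heartbeat_gain": 0.0,
    "interaction_pitch_jitter": 0.0,
}

def engine_parameters(obs):
    """Start from defaults and apply the rules of every active state."""
    params = dict(DEFAULTS)
    for state in sorted(classify(obs)):  # sorted for a deterministic order
        params.update(RULES[state])
    return params

desperate = {
    "path_directness": 0.2,
    "health": 0.1,
    "map_checks_per_min": 1.0,
    "weapon_switches_per_min": 12.0,
}
params = engine_parameters(desperate)
```

In such a setup the sound designer would author the `RULES` table and the thresholds in `classify`, while the engine keeps re-evaluating the observations during play. A real system would probably interpolate between parameter sets over time rather than switch abruptly.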
Agency-Driven Sonic Montage From the standpoint of aesthetical history of film, montage is probably one of the most important aspects of audio-visual design. As discussed above, temporal concepts such as asynchronicity or counterpoint cannot be transferred directly into games, but an agency-driven understanding could be followed. Like the active, script-driven mixing proposed by Bridgett (2009a), and that I mentioned earlier, one could envisage an “active montage”: By motivating the player to do certain things, for instance, to visit the inventory repeatedly or to switch perspective from close-up to total, an agency-driven sonic montage may be achieved. Game mechanics and level design are the fundamental components of design here. Let us also consider cinematic off-screen sound for a moment, an important element in the audio-visual montage. At first view it seems that, in a three-dimensional game, which gives the user a fair amount of control over the camera (both independently or not from the point of view of her avatar) an “off-screen” mode does not exist as a possibility. But by examining this possibility more closely, you will notice that specifically staged “off-screen” sound events actually do exist in games. Common examples are invisible doors, machinery and so on, that are activated by switches and the like. Another example is the spawning of players, NPCs, or objects. Spawning and remote control switches (and their derivatives) are constituent elements of many games. The
sonic design of these off-screen events is usually very stereotypical, for example through some kind of synthesized energy sound or the sound of a mechanism being activated. From the study of film sound, we quickly realize that this is not necessarily the end of it. To begin with, the engine could be aware of the direction in which an avatar looks, and control “off-screen” sounds depending upon this direction. From there, a myriad of design possibilities open up that wait to be explored.
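One way an engine could be "aware" of the viewing direction is sketched below, under simplifying assumptions: positions are 2D, the view is a cone with a fixed half angle, and the result is mapped to two hypothetical mix parameters. The function names and the mapping values are illustrative and do not belong to any particular engine.

```python
import math

def offscreen_amount(avatar_pos, view_dir, emitter_pos, half_fov_deg=45.0):
    """Return 0.0 if the emitter lies inside the view cone, rising
    linearly to 1.0 when it is directly behind the avatar."""
    dx = emitter_pos[0] - avatar_pos[0]
    dy = emitter_pos[1] - avatar_pos[1]
    dist = math.hypot(dx, dy)
    norm = math.hypot(view_dir[0], view_dir[1])
    if dist == 0.0 or norm == 0.0:
        return 0.0  # degenerate case: treat as on-screen
    cos_angle = (dx * view_dir[0] + dy * view_dir[1]) / (dist * norm)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    angle = math.degrees(math.acos(cos_angle))
    if angle <= half_fov_deg:
        return 0.0
    return (angle - half_fov_deg) / (180.0 - half_fov_deg)

def offscreen_mix(amount):
    """Map the off-screen amount to hypothetical mix parameters."""
    return {
        "dry_gain": 1.0 - 0.5 * amount,
        "reverb_send": 0.2 + 0.6 * amount,
    }

# A door mechanism behind the avatar, who looks along the positive x axis.
behind = offscreen_amount((0.0, 0.0), (1.0, 0.0), (-5.0, 0.0))
mix = offscreen_mix(behind)
```

The mapping in `offscreen_mix` is only one possible design decision. The same off-screen amount could equally drive a substitution of the sound itself, a delay, or a deliberately non-naturalistic treatment.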
Sonic Effects: A Helpful Paradigm A useful paradigm that can support the design of dynamic interactive sonic environments is that of “sonic effects”, proposed by Jean-Francois Augoyard and Henry Torgue (2005). “Sonic effects” emerge from the interaction of sonic events with their spatial and social environment and they always also have a perceptual and psychological dimension. This approach, originating in urban studies, provides a useful link between Pierre Schaeffer’s “objet sonore” and Murray Schafer’s “soundscape”. Sonic effects relate to sounds as an instrumentarium to give shape to human relations and the everyday management of urban space, thus stressing the performative aspect of sound. At the same time, the approach roots sounds in specific situations and places. In addition to the acoustic analysis of a sound, or its relation to other sounds and space, it also considers its psychological dimensions and the socio-cultural discourse surrounding it. It seems to me that understanding how sound can be a constituent of experience should be fundamental in thinking about procedural semantics. In their book, Augoyard and Torgue (2005) describe a range of sonic effects and this knowledge could be implemented in game engines as well.
Predictability Killed the Game Star Sound design for games must also embrace limits of control and pre-determination. Harvey and Samyn (2006) state that: “Interactivity is the one
unique element of the realtime medium and it wants to be free”. Thus, it is not possible to attempt to control all aspects of gameplay! Multisensory congruency, consistency, obviousness: they are all useful and often necessary. However, there is no absolute rule stating that artificially created virtual experiences have to follow these ideals. As for sound, already trivial everyday observations show that the multitude of sensory phenomena occurring at any moment does not have to overlap necessarily: I can be looking at a picture of my last holiday while I hear the cars on the street (Ihde, 1976). These components are not semantically related by some kind of embracing narrative, but they can potentially emerge into a “narrative” in my memory which always will be unified into a coherent whole: The same is true for computer games if they are understood and designed as autopoietic systems that have very little or no predetermining story line. Soundscape studies have revealed how soundscapes mediate relationships between listener and environments (Truax, 2001) and these soundscapes are not simply “atmos” or “backgrounds”: They are constituted by sonic manifestations of individual agency, human or animal, culture and nature. By situating games as acoustic ecologies, Grimshaw (2007) developed such an understanding of games and it seems worthwhile to elaborate on it, both theoretically and in experimental practice. It is the fundamental listening experience of the acoustic ecology as emergent, non-conventional and potentially surprising that may encourage us to follow the tracks of Eisenstein, Clair, Tarkovsky, Lynch, and so many others who introduced the poetic, unforeseeable and even indescribable into their works. The game The Path, which was mentioned earlier, is an example of an almost Tarkovskyan audio-visual aesthetic. 
And the spatially extended, delocalized sounds of Thief: The Dark Project are akin to the changing room tones we can hear in Eraserhead in the role they play in our experience of the gameworld.
Learning from the experience of sound designers within movies, we can conclude that we should not avoid the ambiguous and unidentifiable at all costs. Interaction is essentially a process of ambiguity, requiring ongoing negotiation of meaning and goals not just between humans but also between humans and truly interactive computer systems. Instead of associating an image, action and sound with pre-determined, one-to-one relationships, we can create “situations” in which something can take place rather than conveying “one single message”. Games do not have to be “understandable” all the time: They may confront us with situations that seem accidental, but reveal their poetic quality as part of an overall gaming experience. Ambiguity, and also the potential to surprise, represents certain qualities of life that need to be taken into account when designing experience systems like computer games. This demands a consideration of similar techniques used in film, such as the deconstruction of causality or defamiliarization, whereby, in addition to the crafting of static sounds, they can also be achieved procedurally by manipulation in real time. It is important to keep in mind that any sound, even ambiguous ones, can become familiar again, constituting new categories of signifiers and establishing links to the experience from whence they emerged.
Field of Action 2: Sonic Agency and Mediation of Self Hear Me Interact From an anthropological point of view, making sound is one of the first performative acts of a human being, and the crying of the newborn can be considered the first “public statement”. Sound plays an important role in the constitution of our self image and for expressing this self image to others. Through breathing to laughing, crying, and screaming, we have an immense palette of sonic archetypes, and we use these instruments both
intentionally and unintentionally (Seitter, 2007). Playing games with sound also means hearing oneself being active. The player does not just listen to sonic events or a soundscape, he also does sound. Sounds thus manifest his presence and agency in the gameworld. Chion has introduced the term ergo-audition, to describe this “hearing oneself doing something”. First of all, this is relevant as regulatory feedback. But the concept includes also (inter)subjective and socio-cultural dimensions of meaning making. Usually, we hear other people’s activities rather than our own. However, when we break a sonic taboo, for example when sneezing during a classical music concert, we become more, even painfully, aware, of our own sounds. We are also aware of our sound-making capacity in the exact opposite case: when we enjoy hearing ourselves, a phenomenon that Chion labelled “plaisir de l’ergo-audition” (Chion, 1998). The sound of the beer can that I kick around, the sound of the champagne cork popping–they are positive manifestations of my agency and I want them to be heard by others as well. This effect is increased by interesting and surprising relationships between my action and the sounds and also because they encourage exploration. This manifestation of positive agency can be a powerful means of creating compelling experiences in games. I have mentioned the simple but effective feature of singing in the game Aquaria. Although not very explorative, this interaction is very satisfactory, not only because of the pleasing sonic quality but also because it activates the “joy of self-hearing”. Of particular interest is an effect I call “differential of power”: If I press a small button and this results in a massive, powerful sound, I experience a feeling of power and influence. A weak and fragile sound, however, is a sign of weakness and powerlessness. Specifically designing this relationship between player and gameworld contributes to the creation of engaging experiences. 
After the discussion of procedural approaches above, it is easy to imagine now that the control of this relationship could be
done in real time, depending upon the player’s performance and the dramaturgy unfolding in the gameplay, by adding or subtracting elements of complex sound constructions or by just modifying volume, envelope, and/or frequency spectrum of the interaction sound.
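A minimal sketch of such real-time control of the “differential of power” follows. It assumes that an interaction sound is built from layers ordered from the essential to the most imposing, and that a single `power` value (derived, hypothetically, from the player’s performance or the current dramaturgy) decides how many layers are used and how gain, envelope, and spectrum are shaped. All names and numeric ranges are illustrative assumptions.

```python
def interaction_sound(base_layers, power):
    """Choose layers and shaping parameters for one interaction sound.

    base_layers: layer names ordered from essential to most imposing.
    power:       0.0 (weak, fragile) .. 1.0 (massive, powerful).
    """
    power = max(0.0, min(1.0, power))
    # Add or subtract elements of the complex sound construction.
    count = max(1, round(power * len(base_layers)))
    return {
        "layers": base_layers[:count],
        "gain": 0.3 + 0.7 * power,                  # volume
        "attack_ms": 5.0 + 40.0 * (1.0 - power),    # envelope: weak = slow attack
        "lowpass_cutoff_hz": 800.0 + 7200.0 * power,  # frequency spectrum
    }

button_layers = ["click", "thump", "sub_boom", "debris_tail"]

weak_press = interaction_sound(button_layers, power=0.0)
strong_press = interaction_sound(button_layers, power=1.0)
```

The same small button press thus yields a thin click or a full, bright, layered impact depending on the value of `power`, which the game could update continuously during play.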
Sound Mediating the Relationship to the Avatar Another important point related to ergo-audition is the relationship to the player’s avatar, which includes a transformed projection of self, even if no visual representation of an avatar exists in the game. In this sense, games incorporate aspects of theatrical performance and role-playing. For instance, in online role-playing games such as World of Warcraft (Blizzard, released in Europe in 2005), players are explicitly required to create an alter ego which also includes the selection of a virtual race, body, clothes, profession, and skills. Upon entering the gameworld, users act out their role using all elements available. Creating and maintaining virtual identities is thus an important part of many games. So far, however, there are very few titles that allow the player to customize sonic identities without having to access low level functionality of the software. It is somewhat implicit in Spore, as the creation of a life-form also influences how it sounds. This is an area where a fictional physical modelling system would produce exciting new design possibilities. Each piece of clothing or artefact an avatar could put on or use would have a dynamically generated sound which could be modified additionally by attributive settings to achieve a rich sound palette. A team of players in World of Warcraft would also follow sonic criteria when choosing their equipment and clothing and they could further customize their sonic appearance to their liking. The sounds of their approaching avatar could become a signal of peril for their foes or a signal of hope for their friends, just like the sounds of the cavalry and fanfare to the Indians or settlers
in classical Western movies. And what about a tool allowing players to select from a wide range of sounds, or even import their own sounds, to teach the game how they want it to sound? Something like a sonic pipette, following the pipettes of photo-editing software? In particular, using procedural sound design for autonomous, procedural avatars, like the “Drama Princess”,21 is an interesting prospect.
Sound Mediating the Relationship to the Virtual World While film sound often connects elements of visual montage and provides continuity, game sound can link the player action on the interface with the action of her/his virtual presence in the gameworld. A simple click is transformed into a complex dynamic movement or process like opening boxes, operating artefacts and so on. On the one hand, missing visual information can be replaced by sound, on the other hand, through its “immaterial corporeality” (Connor, 2004), sound contributes to a virtual embodiment of a user’s agency. Sound-image synchresis is thus extended to action: What I do (or what I suppose another agent did) has caused that sound. Also the physicalization effect of sound is relevant for action and proprioception as it gives us hints about the meaning and properties of a virtual object and provides a kind of sonic affordance, giving the player hints about what to do next, thus stimulating action. Sound here supplies a form of virtual embodiment, often compensating for the lack of haptic feedback, for example when picking locks in Thief 3: Deadly Shadows (Eidos, 2004): Here, the two dimensional movement of the mouse is transformed audio-visually into a physical, three-dimensional interaction with the lock. Another interesting example is represented by the sounds emanating from the Wii Remote when pulling the virtual string of the bow or using the virtual fishing rod in The Legend of Zelda – Twilight Princess (Nintendo,
2006). This proprioceptive feedback could also be proceduralized and, for example, be influenced by the avatar’s exhaustion. Let us remember the power of ambiguity and unidentifiable sonic objects in this context: Why is it that, even in relatively experimental games, standard gameplay elements such as pick-up sounds, gates, teleporters, system information and so on always sound similar? Considered superficially, this can certainly help the understanding of the game’s function and interface but, on the other hand, exploration, surprise, and adventuring into the unfamiliar are fundamental to, if not the “raison d’être” of, virtual worlds. Again, consider the most famous examples of innovative cinema and you will notice that this quality is what many of them share in their sound design. Additionally, with games being procedural systems, ambiguity can be dynamically controlled in relation to the whole image-sound-action relationship. Why not, for instance, surprise an overly self-conscious player by introducing disturbances into the sounds of artefacts and the environment?
Subjectivization In movies, extensive post-processing, as described above, is used for enunciation markers and the simulation or marking of subjective experiences. Games are essentially subjective experiences, so this would suggest that related sound design strategies could be applied. Chances are, however, that the designs used in movies cannot be applied in a straightforward manner, as in a game we do not have a purely observational position. It is also important to note that we tend to perceive unexpected events, such as the sudden marking of subjective experience, as the system’s agency, which can result in a breakdown of flow and immersion. Therefore, significant experimentation with this design element is still necessary. A possible initial approach could be to relate enunciations of subjective experience to the experiential story of the player during gameplay (experience of
failure, success and so forth) in which the sounds of the events encountered play an important role. A very simple and primitive example implemented in many games is the change of state of the player avatar, such as the “bullet time” in Max Payne (Rockstar Games, 2001). Most examples of the sonic enunciation of subjectivity can be found in First-Person Shooters (FPS) and related genres that feature an internal, first-person point of view. So far, most of the first-person perspectives are staged in a very “neutral” way, except for so-called special game states, in which the changed states are usually strongly enunciated by visual and auditory means. Examples are various variations of rage and invincible modes, for example, in Scarface: The World Is Yours, Haze (Free Radical Design, 2008), or Prey. However, state changes are so far inherently binary concepts and it is necessary to explore more open and dynamic approaches to “player states,” and their sonic enunciation.
Sonic Perspective A further element of mediating the relationship to the virtual world is sonic perspective. Vintage arcade games and the 3D FPS probably are the most restrictive genres in terms of perspective limitation. They also represent two extremes of sonic perspective, the “one dimensionality” and the “emitter paradigm”. In terms of perspective, game sound follows the same paradigms as the early film sound: Intelligibility of dialogue is extended by the need for intelligibility of any sound that is directly relevant for gameplay. These sounds are brought forward into a two dimensional space. The naturalistic spatial perspective representing distance from the camera from film in games is embodied in the emitter paradigm. Why do we not shake up these perspective rules and explore sonic perspective and extension beyond the restrictions of intelligibility and naturalism? Innovative film sound has shown us that it is possible. Of course, this means letting go of the general approach that
treats virtual 3D spaces as constructs analogous to “real” space.
Sonic Manifestation of the System’s Agency This is perhaps the most abstract and general opportunity for intervention to contribute to aesthetic innovation in game sound design, but it nevertheless touches an essential point: In a truly interactive setting, the human player is not the only possible agent. The system itself, and all its manifestations, for example in the form of NPCs, are also agents and thus can be candidates for designing sonic manifestations of agency. I have mentioned the abstraction of the computer as agent in the condition of the breakdown of an interaction flow. However, without breaking the magic circle of immersion, sound can manifest a transformation or a presence in the system. Combining this with the idea of games that do not try to hide the apparatus but instead make the game system the “opponent” (or companion?), this is a very promising outlook.
Field of Action 3: The Schizophonic Game Interface Last but not least, a neglected aspect is the sound design of the artefacts the gamer uses. I have already mentioned the Wii Remote as an interface to control the bow in The Legend of Zelda: Twilight Princess (Nintendo, 2006). With ordinary mice, joysticks and gamepads, the actual actions and sounds of the interface on the one hand, and the actions and sounds of the virtual world they connect to on the other hand, already form a mutual relationship, but this relationship is not subject to design and the two dimensions are rather juxtaposed than integrated. The Wii Remote is a foreshadowing of what would be possible if game controllers were allowed more input modes and always had loudspeakers embedded. The sounds of interaction partially emanate directly from the controller in the player’s hand, which tightens the
bonds between the player’s body, sound, interface, virtual object, and action. As we can see, contrary to movies, the game apparatus has one component that does not necessarily have to be “hidden” or masked–the interface. As such, this interface can be a candidate for sound design as well. Through sound, the interface might also gain significance and become more transparent as part of the gaming experience. For instance, sound helps to transform the Wii Remote into what it stands for in the gameworld, be it a gun, a sword, a tennis bat, or a bow. Game interfaces such as these constitute placeholder objects22 for many kinds of meanings and functions and are inherently ambiguous. They can–within the constraints of their appearance and operation modality–be redefined through sound. The object in its materiality and form integrates into its virtual function. In terms of sound design, the physical properties of the placeholder artefact are connected with the more complex functionalities it offers. The type of operation of the simulated object and its functionality defines the sound: Sounds that relate to the direct manipulation of objects are normally used and are, furthermore, combined with additional semiotic potential. The term “schizophonic” can be used to describe such artefacts. Schizophonia is the term coined by Schafer to denote the separation of sounds from their (natural) sources by means of electroacoustics. For him this concept carries only negative connotations (Schafer, 1977). However, in my understanding, every interactive physical artefact which disposes of some kind of artificial, electroacoustic sound is schizophonic. Schizophonia is thus an essential and exciting aspect to consider when designing innovative game interfaces with sound.23 This also opens up a perspective from the interface of a stationary computer or console game to more experimental gaming artefacts, mobile gaming and so on. 
Here, we enter an exciting new field of theory and practice where sound has enjoyed little attention thus far: The sonic enhancement of interactive artefacts of everyday life, an area of research and design which is investigated by the Sonic Interaction Design community.24 Understanding the possible associations of action, sound, and object (interface and/or virtual) is of great importance in this context. The relationships are manifold, and very seldom do we encounter simple analogies or isomorphisms between sound and action. A simple differentiation could be trigger versus constant manipulation (and their combination). This can be further diversified into more or less isomorphous and more or less direct connections (for more elaborate discussion, see Chion, 1998 and also Jensenius, 2007). These possibilities are of specific interest to game sound design as they contribute directly to the experience of agency, ergo-audition, and the pleasure of self-hearing in a game. I would like to point out here that the most simple and direct mappings are not necessarily the most interesting and pleasing ones. Additionally, let us always consider the use of dynamic, procedural techniques in the design: A gestural-sonic link may be modulated, depending upon the artefact’s or the player’s condition or state, or can convey the shifting of agency from the player to the virtual artefact or the game system and vice-versa.
Conclusion Go Play! In this essay, I have expressed concern that game sound design is being restricted at an early stage of its history to certain implicit paradigms manifested in technological developments, creative practice, and discourse. Certainly, the emergence of a major game industry that serves a mass market has brought many advances and benefits to the medium. However, just as Hollywood has experienced, this threatens to ossify the medium before it can blossom and develop. The examples from the history of film sound have shown that
playing with form, the subversion of convention, the unexpected and ambiguous can be essential driving forces for innovation and engaging experience. In the long run, believing that satisfying expectations is the only way to go, is a dangerous error. Humans tend to get bored and eventually prefer to be challenged (in digestible doses) and to encounter new things. Thus, the first lesson is to leave conventions behind sometimes and to play! Do not just produce for the market, do not make it too easy for your consumers and, more importantly, do not produce for consumers but for players.
Art and Craft Meets Procedurality Game sound design has to marry the high demands for craft and artistic inspiration known from film sound design with the procedurality of the medium. Sound design for games requires not only the design of complex sounds but the design of systems and processes that can generate or modify them, and embed them in an interactive, ever changing experience. Interactive systems may also learn from the player’s agency, from the tactics and strategies he develops while interacting with the gameworld. Ultimately, this leads to a vision of games not as simple event and state machines but rather as evolving and adapting systems, always in dialogue with the player, implicitly or explicitly. The game system becomes another actor.25 Of course, this radical approach is not the only one to take and it will not always make sense to do so. Taking this approach depends on the experience and on the “hermeneutic affordances”26 that the designers want to provide. Real time mixing, digital signal processing, and adaptive and procedural techniques are the technologies that enable this development and some game sound designers have started to move in this direction. This does not replace more traditional approaches to game (sound) design but helps to increase the breadth of possible gaming experiences. In fact, looking at all the technology
available, in particular the most recent generation of consoles and middleware, we have reached a state comparable to essential technical turning points in film sound history. However, in order to explore new directions for game sound, we might also need new game concepts. For example, take the notion of “genre”: In games, genre defines the diegetics, the way music is used and how “realism” or credibility is established (Collins, 2008). Why not play with those genre references instead of subjecting the criteria for sound design to the genre conventions? We can do that either by pushing genre to its extreme and remixing and combining genre elements, as in New Hollywood, or we can design games from the “inside out”, starting with considerations about interactivity, human agency, the play with real time and with procedural story spaces. Experimental approaches can also be formal or structural (comparable to films by David Lynch or Andrei Tarkovsky), they can touch the mainstream and play with it, such as 2001: A Space Odyssey (Kubrick, 1968) or THX 1138, or they can establish new mainstream aesthetics that are rooted in the spirit of works like Star Wars and The Matrix.
Games as Pervasive Event

We can think of a game sound design which does not regard the game as an isolated hermetic system but as something integrated into everyday culture. For this to happen, the complex cultural practices of creation, re-mix, cross-reference, and appropriation need to be integrated into the conceptual process of game (sound) design. Think outside the box, literally. Think about pervasive gaming, games that interweave with our everyday practices, that are played in semi-public spaces and transgress the traditional borders of the medium, also sonically: The complex intertwining of the gaming event with everyday culture and its soundscape becomes all the more evident. Imagine the machine that generates the game becoming part of the game. Imagine the room you play in is part of the world inside the game.
Reduce to the Max

All this seems to offer an incredible number of possibilities. The question is how to turn this into innovative aesthetic approaches to game sound. Leonard Paul and Rob Bridgett (2006) propose several strategies to push innovation in game sound by essentially limiting the freedom for “Next Generation” (which, by now, is of course the “present generation” of consoles) sound design. The authors are convinced that “as we move away from having our boundaries and aesthetics defined for us by the hardware (…) we need to enforce our own ‘boundaries’ or refined aesthetics.” (Original emphasis) They propose to establish a strict aesthetic valuing originality and distinctive style over high production quality. Establishing limits within which to work forms the initial step in this process. One strategy is to start the design from one single voice or sound, limiting tracks and DSP as well as microphones, and instead using resources to explore modulation, dynamics, and more detailed sculpting of sounds. They also propose recording performances or Foley sessions in one go, basically approaching “life-feeling” and forcing concentration upon the “essential”. In relation to interactive mixing techniques, they warn against relying too much upon 3D voices and too much “realism”.
Push the Envelope

While many of these suggestions make sense, the question may be raised whether (artificial) limitation can really be the only key to experimentation, creativity and inspiration. Of course, as the Demo Scene demonstrates impressively, it can. However, it is absurd to follow this path as the only way of dealing with a lack of innovation and aesthetic elaboration. Consider the Cinema Vérité or the Dogma 95 movements in film: They existed only as small movements within a bigger system, infusing it with alternative ideas to debate, but could not (and should not!) become the ultimate guideline for “good” artworks. The same is true for game development. As movie history has shown, technological innovation has always been a fruitful ground for creative re-appropriation. For example, multichannel sound has led to an artistic exploration of the technology at hand by innovative movie makers such as Francis Ford Coppola and Walter Murch. Why not take this example as a motivation for a similar attitude towards new technologies available today in game sound? The point is not to let oneself be restricted creatively by dominating conventions such as “simulating reality”.
Analysis for Inspiration

Analytical concepts such as those presented by Grimshaw (2007) and Jørgensen (2007; 2011) are extremely helpful for game sound designers in terms of inspiration for new concepts and design approaches. What, for example, if other players can also hear a system sound, which presumably was addressing only a single player (under certain conditions)? What if a causal link of a sound is not established through a direct association with a player’s activity, but with certain interaction patterns of whole player communities? The point I would like to make here is that, in the end, one of the biggest advantages of analytic theory lies in its ability to infuse new ideas by playing with them, by deconstructing their “rules” and underlying assumptions in a creative way. This is true for the notion of diegetics as well as for any other analytical concept.
Creative Environments for Creative Minds

We also need new production processes and tools. Drescher’s humorous “Homunculonic AEStheticator” might not be as farfetched as it would first appear. Scripting languages need to be created to ensure that the creation of complex game systems is accessible to artists and designers as well. A look at the impact that visual patching has brought to artistic approaches to programming in general or the effect of accessible open source electronics platforms, such as the Arduino, on artistic appropriation of “physical computing” shows the potential of a broad range of accessible tools.

To make this all possible, education plays an important role. At the Zurich University of the Arts Game Design programme, we have been trying to foster a critical, sometimes subversive and always playful approach to game design. In relation to sound design, we developed a curriculum that constantly oscillates between experiment, research and design (Hug, 2007). We have limited access to the latest technologies including those available to create advanced game sound design. But this does not matter: If the students understand what is at stake in the design of interactive games, they will be able to realize the same innovative ideas with more advanced technology later. It is also important to provide a space for experimentation, literally, to understand our classrooms as experimental laboratories. These labs should offer tools for experimentation with standard computers, but also with embedded systems, sensors, tiny loudspeakers, loudspeaker arrays and so on. Our students have taken to this approach by producing some fantastic games, with some of them performing successfully in festival competitions.27 Of course, there is still a lot of room for improvement, in particular in terms of sound design. All too often, the preconditioning to a conservative approach to sound, and the general inability of our culture to deal with sound in a creative way outside of what is considered “music”, has stifled creativity. However, there are always encouraging projects emerging, which embody many of the ideas presented here. So, there are good reasons to hope for a “next generation” that truly deserves the term.
These are all very general and pretentious statements, easy to say and hard to do. My take on this is that we as game (sound) designers should consider such approaches, watch and listen (again) to experimental films and artworks, and to our everyday soundscapes, and that we should never stop searching for a sonic aesthetic that emerges from the procedural quality of interactive computer games. To sum it all up, in order to advance in this direction, all we have to do is… play!
Acknowledgment

I wish to thank Graeme Coleman for his great support in editing this article and his valuable feedback. I also would like to thank the team at the Game Design department of the Zurich University of the Arts for providing an environment that enables playful experiments with games, and our students for the inspiration they provide with their ideas, questions and products.
References

Altman, R. (1992). Cinema as event. In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge.
Aquaria. (2007). Ambrosia Software.
Aronofsky, D. (1998). Pi. Harvest Filmworks.
Audio Engineering Society. (2009). AES 35th international conference: Audio for games. Journal of the Audio Engineering Society. Audio Engineering Society, 57(4), 254–261.
Augoyard, J. F., & Torgue, H. (2005). Sonic experience - A guide to everyday sounds. Montreal: McGill University Press.
Back, M. (1996). Micro-narratives in sound design: Context, character, and caricature in waveform manipulation. In Proceedings of the 3rd International Conference on Auditory Display.
Beauchamp, R. (2005). Designing sound for animation. Burlington, MA: Elsevier.
Blueberry garden. (2009). Erik Svedäng.
Brainpipe. (2008). Digital Eel.
Bresson, R. (1985). Notes on sound. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press.
Bridgett, R. (2006). Audible words, pt. 2: Updating the state of critical writing in game sound. Gamasutra. Retrieved February 6, 2009, from http://www.gamasutra.com/features/20060831/audio_03.shtml
Bridgett, R. (2007a). Designing a next-gen game for sound. Gamasutra. Retrieved February 13, 2009, from http://www.gamasutra.com/view/feature/2321/designing_a_nextgen_game_for_sound.php
Bridgett, R. (2007b). Interactive ambience. Game Developer Magazine. Retrieved May 8, 2009, from http://www3.telus.net/public/kbridget/aural_fixation_april07.jpg
Bridgett, R. (2009a). The future of game audio: Is interactive mixing the key? Gamasutra. Retrieved May 3, 2009, from http://www.gamasutra.com/view/feature/4025/
Cakewalk. (1983). Commavid.
Cameron, J. (1984). The terminator. Pacific Western.
Cameron, J. (1991). Terminator 2: Judgment day. Pacific Western.
Chion, M. (1994). Audio-vision: Sound on screen. New York: Columbia University Press.
Chion, M. (1998). Le son. Paris: Nathan.
Clair, R. (1985). The art of sound. Excerpts from a series of letters. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press. (Original work published 1929)
Collins, K. (2008). Game sound: An introduction to the history, theory, and practice of video game music and sound design. Cambridge, MA: MIT Press.
Connor, S. (1994). Edison’s teeth: Touching hearing. In Erlmann, V. (Ed.), Hearing cultures: Essays on sound, listening and modernity. Oxford: Berg.
Coppola, F. F. (1979). Apocalypse now. Zoetrope Studios.
Curtis, S. (1992). The sound of the early Warner Bros. cartoons. In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge.
Darwinia. (2007). Ambrosia Software.
Deutsch, S. (2001). Harnessing the power of music and sound design in interactive media. In Earnshaw, R., & Vince, J. (Eds.), Digital content creation. New York: Springer.
Dig dug. (1983). Atari.
Doom 3. (2004). Activision.
Drescher, P. (2006a). GAC(k!). Retrieved June 8, 2009, from http://blogs.oreilly.com/digitalmedia/2006/09/gack-1.html
Drescher, P. (2006b). THE Homunculonic AEStheticator. Retrieved June 8, 2009, from http://blogs.oreilly.com/digitalmedia/2006/11/thehomunculonic-aestheticator-1.html
Dyson, F. (1996). When is the ear pierced? The clashes of sound, technology and cyberculture. In Moser, M. A., & MacLeod, D. (Eds.), Immersed in technology: Art and virtual environments. Cambridge, MA: MIT Press.
Dyson. (2009). Kremers & May.
Eisenstein, S. M., Pudovkin, V. I., & Alexandrov, G. V. (1985). Statement on the sound film. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press. (Original work published 1928)
Fable II. (2008). Microsoft.
Farnell, A. (2007). An introduction to procedural audio and its application in computer games. Retrieved May 20, 2009, from http://www.obiwannabe.co.uk/html/papers/proc-audio
Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Fleming, J. (2009). Planet of sound: Talking art, noise, and games with EA’s Robi Kauker. Retrieved June 3, 2009, from http://www.gamasutra.com/view/feature/3978/planet_of_sound_talking_art_.php
Flückiger, B. (2001). Sound design, die virtuelle Klangwelt des Films. Marburg, Germany: Schüren.
Grey matter. (2008). McMillen, Refenes, & Baranowsky.
Grigg, C., Whitmore, G., Azzarello, P., Pilon, J., Noel, F., Snyder, S., et al. (2006, May). Group report: Providing a high level of mixing aesthetics in interactive audio and games. Paper developed at the Annual Interactive Music Conference Project Bar-B-Q.
Grimshaw, M. (2007). The acoustic ecology of the first person shooter. Unpublished doctoral dissertation. University of Waikato, New Zealand.
Grimshaw, M. (2008). Per un’analisi comparata del suono nei videogiochi e nel cinema. In Bittanti, M. (Ed.), Schermi interattivi saggi critici su videogiochi e cinema (pp. 95–121). (Bittanti, M., Trans.). Roma: Meltemi.
Half-Life series. (1998-). Valve.
Harvey, A., & Samyn, M. (2006). Realtime art manifesto. Retrieved June 6, 2009, from http://tale-of-tales.com/tales/RAM.html
Haze. (2008). Free Radical Design.
Heavenly sword. (2007). Sony.
Hermann, T., & Ritter, H. (1999). Listen to your data: Model-based sonification for data analysis. In Advances in intelligent computing and multimedia systems (pp. 189–194). Baden-Baden.
Hitchcock, A. (1963). The birds. Universal Pictures.
Hug, D. (2007). Game sound education at ZHdK: Between research laboratory and experimental education. In Proceedings of Audio Mostly 2007 - 2nd Conference on Interaction with Sound.
Hug, D. (2008a). Towards a hermeneutics and typology of sound for interactive commodities. In Proceedings of the CHI 2008 Workshop on Sonic Interaction Design.
Hug, D. (2008b). Genie in a bottle: Object-sound reconfigurations for interactive commodities. In Proceedings of Audiomostly 2008, 3rd Conference on Interaction With Sound.
Ihde, D. (1976). Listening and voice: A phenomenology of sound. Athens, OH: Ohio University Press.
Jackson, B. (2009). SFP: The magical world of “Spore”. In Mix Online. Retrieved May 20, 2009, from http://mixonline.com/post/features/sfp-magical-world-spore
Jensenius, A. R. (2007). ACTION --SOUND: Developing methods and tools to study music-related body movement. Unpublished doctoral dissertation. University of Oslo, Department of Musicology.
Jørgensen, K. (2007). ‘What are those grunts and growls over there?’ Computer game audio and player action. Unpublished doctoral dissertation, Copenhagen University, Denmark.
Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Kromand, D. (2008). Sound and the diegesis in survival-horror games. In Proceedings of Audiomostly 2008, 3rd Conference on Interaction With Sound.
Kubrick, S. (1968). 2001: A space odyssey. Metro-Goldwyn-Mayer.
Lima, E. (2005). The devil’s in the details: A look at Doom3’s antimusic. In Music4games. Retrieved June 12, 2009, from http://www.music4games.net/Features_Display.aspx?id=70
LittleBigPlanet. (2008). Sony.
LoBrutto, V. (1994). Sound-on-film: Interviews with creators of film sound. Westport, CT: Praeger.
Love. (forthcoming). Eskil Steenberg.
Lucas, G. (1971). THX 1138. Warner Bros. Pictures.
Lucas, G. (1977). Star wars. LucasFilm.
Lynch, D. (1977). Eraserhead. American Film Institute.
Lynch, D. (1990-1991). Twin Peaks. Lynch/Frost Productions.
Marks, A. (2001). The complete guide to game audio. Lawrence, KS: CMP Books.
Maturana, H. R., & Varela, F. G. (1980). Autopoiesis: The organization of the living. In Maturana, H. R., & Varela, F. G. (Eds.), Autopoiesis and Cognition. Dordrecht, Netherlands: Reidel.
Max Payne. (2001). Rockstar Games.
McTiernan, J. (1987). Predator. Amercent Films.
Metz, C. (1980/1985). Aural objects. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press.
Morgan, S. (2009). Dynamic game audio ambience: bringing Prototype’s New York City to life. Gamasutra. Retrieved May 8, 2009, from http://www.gamasutra.com/view/feature/4043/
Mr. Do! (1983). CBS Electronics.
Mullan, E. (2011). Physical modelling for sound synthesis. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Neitzel, B. (2000). Gespielte Geschichten. Struktur- und prozessanalytische Untersuchungen der Narrativität von Videospielen. Unpublished doctoral dissertation, University of Weimar, Germany.
Oink! (1983). Activision.
Paul, L., & Bridgett, R. (2006). Establishing an aesthetic in next generation sound design. Gamasutra. Retrieved May 25, 2009, from http://www.gamasutra.com/view/feature/2733/
Picard, R. W. (1997). Affective computing. Cambridge, MA: MIT Press.
Prey. (2005). 2K Games/3D Realms.
Primrose. (2009). Jason Rohrer.
Prototype. (2009). Activision.
Pudovkin, V. (1985). Asynchronism as a principle of sound film. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press. (Original work published 1929)
Roddenberry, G. (1966-1969). Star trek. Paramount Television.
Samorost 1. (2003). Amanita Design.
Samorost 2. (2005). Amanita Design.
Scarface: The world is yours. (2006). Vivendi.
Schaeffer, P. (1966). Traité des objets musicaux. Paris: Seuil.
Schafer, R. M. (1977). The soundscape: Our sonic environment and the tuning of the world. New York: Destiny Books.
Scott, R. (1979). Alien. Twentieth Century Fox.
Seitter, W. (2007). Das Spektrum der menschlichen Schallproduktionen. In H. Schulze & C. Wulf (Eds.), Paragrana, Internationale Zeitschrift für Historische Anthropologie, 16(2), 191-205. Berlin: Akademie Verlag.
Silent hill series. (1999-). Konami.
Sim city. (1999-2007). Maxis.
Sonnenschein, D. (2001). Sound design: The expressive power of music, voice, and sound effects in cinema. Studio City, CA: Michael Wiese Productions.
Splinter cell series. (2002-). Ubisoft.
Spore. (2008). Electronic Arts.
Stanton, A. (2008). Wall-E. Pixar Animation Studios.
Tarkovsky, A. (1972). Solaris. Mosfilm.
Tarkovsky, A. (1979). Stalker. Mosfilm.
Tarkovsky, A. (1986). Sacrifice. Argos Films.
The legend of Zelda: Twilight princess. (2006). Nintendo.
The path. (2009). Tale of Tales.
The Sims series. (2000-). Electronic Arts.
Thief 3: Deadly shadows. (2004). Eidos.
Thief: The dark project. (1998). Eidos.
Thom, R. (1999). Designing a movie for sound. Film Sound. Retrieved February 12, 2009, from http://filmsound.org/articles/designing_for_sound.htm
Tom Clancy’s ghost recon: Advanced warfighter 2. (2007). Ubisoft.
Truax, B. (2001). Acoustic communication. Westport, CT: Greenwood Press.
Truppin, A. (1992). And then there was sound: The films of Andrei Tarkovsky. In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge.
Wachowski, L., & Wachowski, A. (1999). The matrix. Warner Bros. Pictures.
Whittington, W. (2007). Sound design & science fiction. Austin: University of Texas Press.
Williams, A. (1985). Godard’s use of sound. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press.
World of warcraft. (2005). Blizzard.
Wurtzler, S. (1992). “She sang live, but the microphone was turned off”: The live, the recorded and the subject of representation. In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge.
Young, K. (2006). Recreating reality. Game Sound. Retrieved February 13, 2009, from http://www.gamesound.org/articles/RecreatingReality.html
Key Terms and Definitions

Arduino: An open-source electronics prototyping platform based on flexible, easy-to-use hardware and software. See http://www.arduino.cc.

Autopoiesis: From Greek “auto” (self) and “poiesis” (creation). The term was originally introduced by Humberto Maturana and Francisco Varela and describes a system organized as “a network of processes of production (transformation and destruction) of components which: (i) through their interactions and transformations continuously regenerate and realize the network of processes (relations) that produced them; and (ii) constitute it (the machine) as a concrete unity in space in which they (the components) exist by specifying the topological domain of its realization as such a network” (Maturana & Varela 1980, p. 79). The concept has been applied in several fields such as sociology and cybernetics particularly where it references self-organizing systems.

Demo Scene: A computer art subculture that specializes in producing non-interactive audiovisual presentations that are entirely generative and run in real-time on a computer. See for instance http://www.demoscene.info.

Futurist Music: Most prominently defined by Russolo in his 1913 manifesto “Art of Noises”, futurist music embraced noise, everyday urban sounds, and, in particular, the sounds of industry and war as material for musical expression.

Soundscape: According to the original definition by R. Murray Schafer, a soundscape is “any portion of the sonic environment regarded as a field for study” (see glossary in Schafer, 1977). It is characterized by a communicative and systematic relationship between sounds, listener, and environment.

Musique Concrète: A form of electroacoustic music without visible source (“acousmatic”) which mainly uses material derived from the manipulation of recordings of found “sonic objects”.

Reduced Listening: In the theory of Pierre Schaeffer, reduced listening is listening to the sound for its own sake, as a sound object, by removing its real or supposed source and the meaning it may convey. He describes his theory of the “objet sonore”, including the various listening modes, in (Schaeffer, 1966). The only substantial English text covering Pierre Schaeffer seems to be Chion’s Audio-Vision (1994).
Sonification: Sonification is the use of nonspeech sound to convey information or perceptualize data. For more information about this research field visit http://www.icad.org.
Endnotes

1. At this point I want to clarify my use of the multi-faceted term “sound design”. I will use the term here to describe the activity of creating and composing what is commonly called “sound effects” (SFX) and does not involve music or speech, except for the cases where the borders between these traditional classifications are blurred. The nature of computer games also includes aspects of implementation and programming.
2. http://www.iasig.org
3. http://www.audiogang.org
4. http://www.gamasutra.com
5. Interactive XMF is mainly used for music, and will be integrated in Sony’s “Awesome” scripting engine for the Playstation 3.
6. http://www.nvidia.com/page/pg_55418.html
7. http://www.audiogang.com
8. I do not discuss musical computer games as they form a specific, highly stereotypical, genre, and are therefore not relevant for the aim of this article.
9. Actually, it is a major challenge to create procedural systems that do not sound random and characterless.
10. http://www.audiokinetic.com/4105/soundseed-introduction.asp
11. http://unity3d.com
12. The various nominees and winners of the Independent Games Festival competition which is being held at GDC in San Francisco provide a useful basis for studying interesting approaches to sound design in independent games. Of particular interest is, of course, the “Excellence In Audio” award, although often this broad scope results in musical games winning the award, which are not necessarily the most innovative titles in terms of sound design.
13. http://www.newgrounds.com/portal/view/467236
14. http://tale-of-tales.com
15. Colloquial term for the first sound movies.
16. Another example of the early adoption of new technologies by innovative filmmakers.
17. Note that the use of sonic codes originally reserved for cartoons (described as iconic and non-indexical by Curtis, 1992) in such movies gives the notion of any objective “realism” as a point of reference the final blow.
18. A minor methodical issue has to be mentioned at this point. Firstly, most of the movies discussed by the authors cited are at least 15 years old. Furthermore, there is no substantial account available on sound design within television series where many aesthetic innovations have been developed in the last few years. Future work shall incorporate a discussion of these relatively new formats.
19. Many of the aesthetic innovations, like mixing many different sounds from all possible sources, time-stretching and pitch-shifting, are “standard” methods for sound design in film and computer games (see, for example, Sonnenschein, 2001, Marks, 2001). They have entered our collective memory built on mass media consumption and, in some cases, have become stereotypes themselves often serving the limited, functional aesthetics of most of today’s games.
20. Note the recent publication dates: It is clear that the discussion of this topic is far from concluded and that the related theory is still very much in a state of flux.
21. Drama Princess is a “reusable autonomous character for realtime 3D focused on dramatic impact rather than simulation of natural behavior”, developed by Harvey and Samyn from Tale of Tales. http://www.tale-of-tales.com/DramaPrincess
22. For a more in-depth discussion of possible categories of sounding interactive artefacts, see (Hug, 2008a).
23. For an elaboration please refer to (Hug, 2008b).
24. http://www.cost-sid.org
25. From this perspective the diegetic question seems obsolete. If the gamer is part of the same system as the player, the narrative world and the existential world of the player merge into one.
26. I use this term to describe the interaction of “given” affordance in the sense of ecological psychology, with the ongoing interpretive process in our interactive experience of artificial sounds.
27. A recent example is the game Feist by Florian Faller and Adrian Stutz (http://www.playfeist.net). More information about the programme is available at http://gamedesign.zhdk.ch and on the work platform at http://www.gametheory.ch (available only in German, so far).
Appendix
Abstract

What will the player experience of computer game sound be in the future? This was the question posed in an online discussion forum to which the book’s contributors were invited to respond. What follows is a free-wheeling debate about the future of game sound. Little editing has been done, other than the most obvious grammar, syntax and spelling errors, in order to maintain the fresh, often off-the-cuff responses. Three related themes become apparent in this discussion: affect, emotion and biofeedback; realism versus alternative realities; and the need for a game-sound design aesthetics. The first opens up interesting possibilities for enhanced player interaction (including player-player interaction across networked games) and immersion. Although authors and games companies often talk about the player being immersed in the gameworld, it is clear that current technology only hints at the potential. Similarly, games companies often praise the realism of their game sounds: even the iconic sound of Atari’s Pong of the early 1970s had its synthetic tones described as “realistic”. But which realism is being alluded to? What precisely does this Holy Grail of realism represent and how should it be attained? Is it the authenticity of sound that contributes to game realism or its verisimilitude in the context? If the latter, does realism derive from expectation, culture and genre and what debt does it owe to other forms of media? If realism refers to an emulation of reality, do we mean social realism, thematic realism, consequential or physical realism and who wants to play reality anyway? These questions directly relate to the need for a game sound design language: something that is still nascent. Game sound involves a very different paradigm to the derivation and perception of sound as found in reality or any other form of recreational medium. Like real-world environments, game sound derives from the actions of and upon its entities but it is triggered from a different rather than issuing directly from those entities. Unlike cinema, games require the willing and active participation of the player to effect the game and its sound. Whatever the future holds, it is clear that we have only begun to discover the possibilities inherent in computer game sound.

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Grimshaw: There are a number of ways to approach this. What will the picture be in 1 year, 10 years, 20 years, how will the technology change, what interfaces/outputs might we have, do we need realism (what type of realism), what will change in the player’s perception, how will sound design change, are there ethical questions involved in biofeedback and so on?

I’ll start this off by saying that games of the future will conduct a form of dialogue between player and sound where the sound itself becomes an active, participating character in the gameworld working in tandem with the player to increase his/her experience and immersion. This will be achieved through biofeedback whereby the game constantly monitors the player’s immediate affect and latent emotion (through EEG, GSR, ECG, EMG etc. -- devices such as Nia and emotiv headsets are tending in this direction) and responds by synthesising new sounds and processing NPC speech. In the former case, parameters of sound such as frequency, timbre, intensity, ADSR envelope will be modified; in the latter, pitch, stress, rhythm and so on will be modified (I’m imagining realtime synthesis of NPC text files with an emotive envelope/pattern applied according to player state and game context).

So, the game engine senses the player is not frightened enough? Up the ‘fear’ controller to alter the synthesis of sounds and add a worried tremor to NPC speech. Perhaps this should be taken the other way. The player is about to have a heart attack so an emotion governor kicks in to calm them down by synthesising soothing sounds (after all, presumably game companies don’t want to be sued). Players can be kept on just the right level of emotional rollercoaster.

Of course, before all this is possible, there needs to be substantial research into what it is about sound that induces fear (or happiness, sadness etc.). This will need to take in context, past experience of the player and their culture/society (various literature reports that different nationalities view very different engine sounds as exciting and sporty -- Ferrari for Italians, Porsche for Germans). The game setup menu might have faders for nationality, gender and age.

This type of technology will void the requirement for game sound designers because it will be the players (their psychophysiological state) who design the sound on the fly. Some role might remain for the creation of specific sounds but sound designers will find their role greatly decreased in the games industry. Of course, it also opens up new avenues, creative ones, outside the industry -- the technology described above leads to the possibility of being able to design/create sound by ‘thinking’ about it -- presumably, the most creative soundscapes created in future will be thought by the most creative minds.

O Keeffe: Perhaps in 20 years, sound will be considered less important, characterised as an interference with game play. In the real world, sound within urban spaces is constantly being categorised as noise. If we continue on the path of highlighting post modern soundscapes as noisy environments we justify silent virtual spaces, places to escape to sound of the real.

Grimshaw: Rather like ‘silent’ rooms in busy corporate and academic campuses.

Liljedahl: There is also the concept of “perceptualization” which is interesting in the view of the very well established “visualization” and the at least in some communities well established “auralization”. In the future, sound, graphics and other media types will be more integrated. Today, what we see and what we hear do not necessarily match. One example is room acoustics. When for example the content of a room changes during the course of a game, the acoustics do not. New, better methods to simulate reverberation and acoustic occlusion culling will add to realism. Along the same line is the idea that the sounds of the game could adapt to the acoustics of the physical room where the player is located. This can be used in pervasive games to blur the borders between the virtual reality of the game and the physical reality of the player.
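The point about room acoustics that follow the contents of a room can be illustrated with a rough calculation. The following is only a minimal sketch, assuming Sabine's classical approximation for reverberation time (RT60 ≈ 0.161 · V / A, with V the room volume in cubic metres and A the total absorption in square metres); the room dimensions, absorption coefficients and all names (`Surface`, `rt60_sabine`) are invented for illustration and do not come from any game engine or middleware.

```python
from dataclasses import dataclass
from typing import Iterable

SABINE_CONSTANT = 0.161  # seconds per metre, for metric units


@dataclass(frozen=True)
class Surface:
    """A surface or object in the room (hypothetical helper type)."""
    area_m2: float
    absorption: float  # 0.0 = fully reflective, 1.0 = fully absorbing


def rt60_sabine(volume_m3: float, surfaces: Iterable[Surface]) -> float:
    """Estimate the reverberation time (RT60) with Sabine's formula."""
    total_absorption = sum(s.area_m2 * s.absorption for s in surfaces)
    if total_absorption <= 0:
        raise ValueError("the room needs at least some absorption")
    return SABINE_CONSTANT * volume_m3 / total_absorption


# Assumed example: a bare 5 m x 4 m x 3 m room with hard walls.
VOLUME = 5.0 * 4.0 * 3.0
bare_room = [Surface(area_m2=94.0, absorption=0.05)]

# The same room after the game places soft furnishings in it.
# (Simplification: the wall area hidden by the objects is not subtracted.)
furnished_room = bare_room + [Surface(area_m2=12.0, absorption=0.6)]

rt_bare = rt60_sabine(VOLUME, bare_room)
rt_furnished = rt60_sabine(VOLUME, furnished_room)
print(f"bare: {rt_bare:.2f} s, furnished: {rt_furnished:.2f} s")
```

A game could, under these assumptions, recompute such an estimate whenever objects enter or leave a room and feed the result to its reverb effect, so that what is heard tracks what is seen.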
need to? In the example of the leg being chopped off, I’m assuming it’s not a real leg attached to a live human being. Yet, the horror works because the context, latex leg, blood, screams and other appropriate sawing and chopping sounds make us “see” what we do not see. Again, this is a case of sound making the image look better -- the scene would lose its power without the sound (just as Tati’s films would be nothing without his absurd sound use).
Liljedahl: I like the idea with silent rooms. It opens up for new types of games where realism is not self evident or the only aspect of the game media.
What would a future be like where, instead of putting sound to image (to make it look better), we put image to sound (to make it sound better)? (I wish to ignore musical forms such as ballet, musicals etc. here.)
Liljedahl: Perhaps we have done what can be done given the sound technology available today. Decades of film, TV and computer game production have exploited many of the possibilities doable with today’s relatively static audio tools. New levels of interactivity will put new demands on the ability to create dynamic soundscapes. New DSP technology will open up totally new possibilities to create far more dynamic soundscapes than today. Grimshaw: I was chatting with a colleague last night who works in (visual) SFX and he was describing some visceral scene in one of the Saw movies (a leg being chopped off or something). According to him: “Sound makes the image look better” (my italics).
Cunningham: Bio-feedback will surely play a big part. It sounds so cumbersome and intrusive now but the technology will come along to let us do it in more discreet and passive ways. In the meantime the scope is there to research the human physiological responses to sounds of fear, joy, sadness, and so on. Improving computer models of emotion and AI engines will mean that the game can, in turn, adapt to the changing state of the player.
I wonder if it’s ever the case that “image can make the sound sound better”? Many of the chapters in the book make the point that sound tends to be subservient to image particularly so in the production process and this is illustrated by my colleague’s comment.
Silent rooms are interesting. I would classify Second Life as being something of a pseudo-game, but I frequently find myself turning the ambient noises and music down or off - as I find it easier to get a handle on what my avatar is doing and the interactions it is having with other characters and objects. Perhaps this comes back to the concept of “realism” in games. I think we can already produce game sound that is 80% or 90% realistic, especially using surround sound. I suppose what we need to consider is: is realism needed and what are the effects of it on game players?
Regarding the comment here that “what we see and what we hear do not necessarily match”. Do they
If we do want realism, then the technology has to get better. I don’t want 5 or 7 speakers dotted
round the walls of my living room (and neither does my missus!). Wavefield synthesis would be great too but it’s not practical for the average man in the street. Do we need to go back to HRTF or are thin, wireless speakers the way to go?
Wilhelmsson: What would a future be like where, instead of putting sound to image (to make it look better), we put image to sound (to make it sound better)? (I wish to ignore musical forms such as ballet, musicals etc. here.) (Grimshaw)
One way to obtain this state would be to start the process of game design with the sound or at the very least let the sound designer become a part of the process from the very beginning. I would say that some ideas from moviemaking could come in handy. Pudovkin’s ideas on asynchronism, for instance. Use the sound to communicate with the player what the images do not and stress that invisible part as game play foundation. Maybe abandon the image as progenitor of the sound more or less totally. However, that is not likely to happen. A more likely development for the player experience of computer game sound would be less compressed audio files which in turn might lead to a sound environment that has a more “natural” dynamic range. If processing power of the sound would keep up with or overshadow the processing power of graphics cards we might have less need for high compression on the audio files. Would it not be nice to have multiple processors and 128 GB to just handle all the audio files in a game? What more? The player will probably benefit from more refined audio technology with directional sound. Why not continue the reconfiguration of our daily living environments yet another step and make full use of directed sound technology? Let the player experience the sound without disturbing others and not force her to use headphones. This sound strategy could very well have the positive side effect that the player might be forced to move around to hear all that there is to hear. In a world
with new control devices, surprisingly little has been done on the side of audio technology, sadly enough.
Cunningham: I guess to answer the question directly, the player experience of computer game sound in the future will be one that is totally transparent, pervasive, and natural.
Alves: One way to obtain this state would be to start the process of game design with the sound or at the very least let the sound designer become a part of the process from the very beginning. (Wilhelmsson)
Yes, I do believe that’s the way to go. Though there will always be the need for people with the technical ability to deal with sound, the real deal is to ensure that sound is explored in its depth. Not only as sonorization but also as part of meaningful components of the game where sound is relevant for the course of action and for the inhabitants of the gameworld. To accomplish that, a sound designer (meaning a designer who is aware and attentive to sound potential) ought to be involved right from the start of the design process. Yet, I do not believe that we should be looking for ways to ensure that a game will have sound. My understanding is that sound as any other modality should be subservient to the global communicative purposes of the game, let’s say to the accomplishment of an emotional script. In that sense, ultimately, if the best use of sound in a particular game is to keep silence, so be it - that will be good sound design too. Anyway, my point is that for the designer to be able to use sound whenever and however it should be more appropriate, it is fundamental to be able
to design the game with such design decisions already present, as opposed to having the game already defined and then trying to find the best way to fit or to wrap it with meaningful sound.
Alves: is realism needed and what are the effects of it on game players? … If we do want realism, then the technology has to get better. (Cunningham)
Realism is certainly interesting. Still, I tend to feel that the actual issue is the adequacy of sound to the “reality” of the gameworld (which can be very disparate from that of the real world). That is, the realism relative to the gameworld. One important aspect of this inner realism, I believe, is coherence. Coherence with visuals, with physics, and among sounds. I mean, the whole setting should be believable (or at least it should not ruin the player’s will to believe in the gameworld).
Alves: In addition to the contribution of sound to the enrichment of the gameworld, I’m also hoping that sound will contribute to make the act of playing a more enjoyable experience, per se. I mean that the use of sound, particularly as input, can change the way a player interacts with the game interface with possible improvements in the overall experience. I’m expecting that sound will become increasingly bidirectional. And that means that someday we will be sending all sorts of sound into the game. In turn, that seems to call for a much more active interpretation of the role of playing -- no more sitting still and mute, just clicking or pressing keys. That is, the actual behavior of the person who is playing, while playing, may change. I also believe that roaring or making some kind of bizarre noise towards the game will tend to
promote collective experiences. This is, of course, just a feeling but I guess if I would have a game where I was supposed to do a lot of yelling, I would certainly guarantee a good laugh if I could use the help of some friends or family (I mean, in loco, not online). It seems to me that a game interface that allows a more expressive activation is also more prominent to sharable experiences. In turn, if there is some truth in this foresight, we are also talking about new opportunities in terms of game design. We may be willing to address more attentively the design of games that are meant to provide collective experiences.
Cunningham: A quick thought in regard to the last post by Alves, discussing the use of sound as a game input and the role playing of the game player. I’ve often thought it might be interesting to conduct a study of the type of speech and nonspeech sounds made by games players during play. I’ve often found myself muttering, cursing, elating, and so on in response to the stresses, failures, and victories in the game scenario. If the game could respond to such utterances this could lead to an even more dynamic and interactive experience.
Droumeva: perhaps in 20 years, sound will be considered less important, characterised as an interference with game play. In the real world, sound within urban spaces is constantly being categorised as noise. If we continue on the path of highlighting post modern soundscapes as noisy environments we justify silent virtual spaces, places to escape to sound of the real. (O Keeffe)
This made me think of a captivating user video I once saw on YouTube from a “walkthrough” of Grand Theft Auto: Liberty City. The player was moving his avatar and narrating his actions, and
basically he wanted to talk about the depth of exploration one can get to in GTA (a game that, as we all know, has had a reputation for being gratuitously violent), how rich the environment is, etc. He walked his character all the way away from the city - saying that “it was too noisy and busy” and into “Central Park” in order to “enjoy some quiet nature sounds and peace of mind, away from it all” - it just struck me as a most curious simulacrum - finding the precious solace of “realism”, of reprieve from a noisy environment in a game - in the quiet natural soundscape of a game! To me, that signifies one possible future for game sound - it will be more and more the “real” environment of young people as opposed to the real soundscapes of the noisy, urban, overcrowded offline world. So design has to be conscious of that, absolutely, how - I’m not sure... mimic closely and thoroughly our surrounding acoustic soundscape, or foster completely imaginary worlds?
Grimshaw: I think foster completely imaginary worlds. It’s the ‘otherness’ of other environments that captivates players and I for one would see no reason to immerse myself in a world exactly the same as this one.
Hug: Following up on the initial question (and some related points made during the discussion)... I agree with Grimshaw that there is a strong possibility biofeedback will be used in some form to control game engine states, including real time sound synthesis. However I am not convinced that this will necessarily lead to improved player experience. The problem is that if players are aware of these mechanisms (and they surely will be, because advertisements, making-ofs and magazines will make a rave about it) this will already alter the way they approach the game. Usually, once we are aware of a certain level of control, we will try to subvert a given system. In
the traditional narrative, the fearful sound works just because we have been “guided” toward it by the linear storyline (and the sensory experiences that accompany it, think “calm before the storm”). In the perspective of an ad-hoc modification of synthesis parameters it might well be that we constantly reflect the fact that our behavior has an impact on the events, which might make the actual events much less interesting. A second issue is to deal with the nuances in perception, behavior, and the possible contrast between measured states and felt states. Biophysical excitement might have different causes, so altering the assumed cause might trigger the wrong feedback loops... But that’s a subject of a lot of research anyway. But I also see a huge chance for creative practice in such technologies. However, this seems to require the invention of new game genres. I think the traditional “narrative” approach to game design (more or less linear, storylines, quests) has a few merits (subtly changing the sounds based on player’s states, as described), but it would not be the most suitable approach to leverage the full potential. The play with emotions in itself could become part of the game, and the player would have to use his self-control over emotional states for actively controlling aspects of the game. Imagine you are a virtual spy and have to trick a lie detector or an investigative detective... Ok that’s more traditional narrative again, but what I want to say is: It might be worthwhile not to try to hide the system, the apparatus, from the player, but make it available to them as tool for action. How this connects to sound? I strongly believe that there is a big potential in the possibility of linking a virtual sound world with the actions of the player. This may be partially controlled by biophysical monitoring systems, but I think at least
equally important are (physical) game controllers. And I mean not just Wii Motes, but the idea is that everything can become a game controller in a “mixed reality”. The interesting differential then (and it’s this differential that is the most exciting to design) is between the player’s actions and the sounds as manifestations of this action in the game world. So biophysical monitoring would not mainly be used to adjust the sensory output of the medium to alter the player’s emotional state, but it would be used to give the player an additional channel of expression. Imagine a game where players learn about the sonic behavior of virtual artifacts (and the way they have to handle them using their physical placeholders or Project Natal - body movements), where totally new and surprising action-sound relationships could be designed. And the mentioned input channel for speech & nonverbal expressions could play an important part in it. Remember the audio-gun from Dune?
Hug: Addressing Droumeva’s point. There is always a fascination in the simulation of “reality”, and actually I think part of the fascination comes from the knowledge that you are not “out there” but sitting at home in your full immersion suit listening to binaural soundscapes. I think this will always have its place and justification. But on the other hand, I think Grimshaw is right. It’s the “otherworld” that we seek to flee into. This otherworld certainly is composed of familiar elements but deconstructs them and surprises us with the unexpected. In general I think that creative and more sustainable potential lies in the definition of new aesthetics rather than simulation of the familiar and “real”. I think game sound could take an example in how film sound was pushed into a media language of its own, establishing design strategies that have become kind of “naturalized”, are inherently part
of the aesthetics of the medium. Game sound thus should explore new directions and for that we need people (artists?) that abolish preconceptions and just try out crazy stuff. I think no one can exactly say what a game sound aesthetics will or should be like, but we can say that we have to explore unorthodox paths and eventually, a new “language” will emerge. I also think that the directions towards which such an exploration could go can be derived from some genuine qualities of the medium. Think about the idea of “montage” in film which was one of the strongest catalysts for audiovisual innovation. In games, it is maybe not so much about audio-visual montage, but about action-feedback montage. De-constructing familiar action-feedback loops and creating new ones. Another field which is prone to artistic exploration is “diegesis” of game sound, as it seems very unclear where diegesis starts and ends in a medium where a narrative is not passively consumed but actively co-created as player experience. Film sound has developed a great variety of ways to establish or support diegesis, as well as how to integrate non-diegetic sound to serve a narrative. In game sound this is still terra incognita to a large extent, in particular if we look at genres with “low narrativity”.
Grimshaw: Certainly the player may subvert the system I propose and that might be part of the fun (and would all players be aware of the possibility and, even if they were, would the apparatus recede into the background with familiarity and the need to play the game in order to reach the desired outcome?). However, why not have the game subvert the player? The sound engine need not slavishly mimic the fear of the player, for example, it could do the reverse and stubbornly refuse to help that emotion along until the player is lulled into a false sense of security and then....!
I like the idea of having to use emotion to navigate the game (and this feeds into more than just sound). You’re right Daniel [Hug], such a game would probably require a new genre (can I bag the name first as ‘emotive gaming’!) where the game itself emotionally engages with the player, becomes a character itself.
Hug: However, why not have the game subvert the player? The sound engine need not slavishly mimic the fear of the player, for example, it could do the reverse and stubbornly refuse to help that emotion along until the player is lulled into a false sense of security and then....! (Grimshaw)
Well I doubt this degree of control over the player’s emotions can ever be achieved, simply because of the ambiguity of interpretation. It’s maybe a bit like with psychoactive drugs: for some a dream for others a horror trip (or even for the same person under different circumstances). This is why I think the power of biofeedback should maybe be seen as another channel of agency for the player rather than hiding it.
Hug: My vision of a tool for game sound design: A hybrid foley box which seamlessly integrates physical modeling and all kinds of synthesis methods as well as real-world recordings and resynthesis. The most important feature will be a sonic pipette: just grab a sonic residue somewhere, drop it into a placeholder object, combine with “sonic drops” from other sources, including virtual ones drawn from physical models, create an envelope in real time by singing into it and then define a set of mapping criteria to attach it to an entity of your game world (objects, NPCs, avatars) and play around with the object and its sounds in realtime in the game world.
If anyone would like to help develop this, let me know! This actually also points to a question related to the future of game audio in general, if procedural methods really are the future: at which point does the actual sound design take place? Will we be merely adjusting parameters of simulated physical entities? How do we achieve the magic and the “bigger than life” effect and surprising, new sounds, if everything is controlled by a “realistic” simulation engine? Is there a way to combine the strengths of procedural audio with old-fashioned compositional sound design? Well, my idea of a hybrid foley box would maybe be a way to join these worlds...
Grimshaw: Well I doubt this degree of control over the player’s emotions can ever be achieved, simply because of the ambiguity of interpretation. It’s maybe a bit like with psychoactive drugs: for some a dream for others a horror trip (or even for the same person under different circumstances). This is why I think the power of biofeedback should maybe be seen as another channel of agency for the player rather than hiding it. (Hug)
I can dream.... Certainly a lot of work needs to be done: fundamental research into mapping emotion/affect to sound parameters, not to mention precise and accurate measurement of such emotion/affect. It may well be that it’s a long time before we move beyond the blunt tool of mere positive/negative valence and are able to precisely identify fear as opposed to anxiety, for example. However, (once that research is well under way) personal emotion profiles for individuals, taken from ‘set-up’ measurements, could be stored and this would allow more precise targeting of individuals.
Grimshaw: This actually also points to a question related to the future of game audio in general, if procedural methods really are the future: at which point does the actual sound design take place? Will we be merely adjusting parameters of simulated physical entities? How do we achieve the magic and the “bigger than life” effect and surprising, new sounds, if everything is controlled by a “realistic” simulation engine? Is there a way to combine the strengths of procedural audio with old-fashioned compositional sound design? Well, my idea of a hybrid foley box would maybe be a way to join these worlds... (Hug)
Now there’s an interesting question. Anyone?
Droumeva: In general I think that creative and more sustainable potential lies in the definition of new aesthetics rather than simulation of the familiar and “real”. I think game sound could take an example in how film sound was pushed into a media language of its own, establishing design strategies that have become kind of “naturalized”, are inherently part of the aesthetics of the medium. Game sound thus should explore new directions and for that we need people (artists?) that abolish preconceptions and just try out crazy stuff. (Hug)
I completely agree actually, got wrapped up in making a point about the experiences of “reality” which, I believe, will still be an important social experience in gaming, albeit - agreed with Grimshaw and Hug that gaming is more about a different reality than re-immersing into a nostalgic version of past realities (whatever it is that those “alternate realities” end up being). And I do retract my previous implication that “reality” should be somehow integrated into game sound design, or be a design principle - I meant it strictly as an important cultural byproduct and social experience.
I love the idea of biofeedback and I do see that coming in a more mainstream way into gaming in the 10-year prediction range - and hopefully by then game sound will operate with a new and improved “media language” - crazy artists would have made their mark on the interactional mappings between game sound and game input, as well as the structure and mechanics of gaming period, so biofeedback might literally control the soundscape or at least the avatar’s own soundmaking in the game (assuming a narrative structure once again, I know) as a mechanic. I think along with biofeedback, I see remote networking tangible controllers - things that can send sensations like touch, temperature, pressure, perhaps sound and breath, vibration, rhythm, to a remote player. I also like the idea of players being made to understand more how game sound is synthesized and be able to take part in that process more actively, though this point makes me wonder if the future of gaming is all about “opening up” the programmatic side of games and making players-as-producers - I don’t know if that might result in really bland, generic game structures that are the “blank slate” upon which players build up game worlds and game feedback. That said, I can definitely see, within a year even, game sound being customizable - i.e. players being allowed to upload their own sound effects to each game, and thus construct their own soundscapes. But regarding the general question of future of game sound - obviously related tightly to the future of game genres, and game mechanics - I also see a rise in “lifestyle gaming” and “human computation”. Lifestyle gaming I’d call things like Wii Fit, Brain Age, the multitude of “games” that are essentially utility applications thinly veiled as games. Game sound is bound to be affected by this shift. Thinking specifically of biofeedback, I can definitely see it being used in “lifestyle games” for anti-anxiety, meditation, stress control, etc. and
that would entail different (perhaps more secondary, limited, or perhaps more information-based/driven) uses of game sound than entertainment games - driven by a quest for playfulness, fun and challenge. Human computation - the use of gaming structures for humans to do actual work, may be fringe now, but I see it rising with trends like education technology (I myself am in that field, somewhat…) and I can’t help but thinking game sound - its potentials for fun and playfulness - might suffer, should pragmatics over-ride aesthetics and playfulness. Just throwing this out there...
Grimshaw: I don’t think game sound will suffer. Pragmatics might be needed should game structures broaden their reach into non-gaming areas because such areas are not intended to be games and therefore do not need (necessarily) game aesthetics and playfulness.
Grimshaw: With regard to Droumeva, you brought up the idea of network tangible controllers. What about extending this to incorporate biofeedback in a multi-player system? We’ve already discussed using a player’s psychophysiology to affect/effect sound in the game and, due to the nature of networked games, assuming the new sound then has a feedback effect upon the player, this will probably have an effect upon gameplay and other players. So far, the biofeedback sound is only heard by the one player -- could the parameters used to drive the sound synthesis/processing also be sent via the network to other players’ audio engines? In a horror game, can players then sense the fear of others?
Hug: There is something in this discussion which strongly reminds me of mid-to-late nineties discussions about full immersion cyberspace,
telepresence, body-suits, etc. This vision might now actually become technically feasible. But just to give this a twist into a slightly different direction: What if in the future (which actually has already begun in some ways) there is no “closed system” of gaming anymore? No dedicated software and hardware interfaces? When gaming is pervasive, where you are, where shopping for food becomes a quest for the one milk bottle which contains the key to level 92? When your CO2 footprint directly links to your avatar’s stats and to bonus programmes offered by a green power syndicate? What of sound, then, being neither a reflection of a (constructed) reality nor the expression of a separate, self-referential aesthetic system (“game sound aesthetics”, think 8 bit...), but an element in a hybrid, electroacoustic soundscape? I’m doing some extensive research on sound design for interactive artifacts for everyday use and there I constantly run into this question. In this scenario there is no distinguished system of aesthetic codes anymore as we know it from film and today’s games, there is no entering or leaving a specific application, environment, cinema, game, etc., there is just a constant “multilayeredness” of presence and agency - or maybe a constant switching between presences. And this poses fundamental questions about what might be suitable (sound) design strategies. How do we combine, merge, juxtapose, subvert the “naturally occurring” physical sounds, and the sounds of a pervasive gaming system? And how do we integrate these sonic events into the socio-cultural fabric of everyday life? This sounds maybe far out, but then again, it is happening already. Do we need to investigate not only the “acoustic ecology” of the game, as pointed out by Grimshaw in his work, but an acoustic ecology of our game-lives?
Or, might it be possible that in the end, humans will always prefer to experience the transition between systems, to know when they are inside or outside a game, work, “real life”?... Where the power switch (and the “mute” button) is?
Alves: hybrid, electroacoustic soundscape (...) might it be possible that in the end, humans will always prefer to experience the transition between systems, to know when they are inside or outside a game, work, “real life”?... Where the power switch (and the “mute” button) is? (Hug)
In such a scenario: no switch, please... We are entitled to a playful life!
I guess if we are able to enhance any aspect of our lives then it should become legitimately persistent (just as shelter, clothes, food, and education). We would deal with ‘new sound’ the same way we currently deal with other sounds we are able to control: should a sound become inconvenient in any particular circumstance, we would behave in a way it would not happen.
ENDNOTES
1 Collins, K. (2008). Game sound: An introduction to the history, theory, and practice of video game music and sound design. Cambridge MA: MIT Press. p. 9.
Compilation of References
(1990). Wing commander [Computer game]. Austin, TX: Origin Systems.
(1995). Super Mario Bros [Computer game]. Redmond, WA: Nintendo.
(2002). Lucky Larry’s Lobstermania [Computer game]. Reno, NV: IGT.
(2008). Emily Project. Santa Monica, CA: Image Metrics, Ltd.
(2008). Faceposer [Facial Animation Tool as Part of Source SDK]. Bellevue, WA: Valve Corporation.
(2008). Warrior Demo. Santa Monica, CA: Image Metrics, Ltd.
(2010). World of warcraft [Computer game]. Reno, NV: Blizzard Entertainment.
Aarseth, E. (2008). A hollow world: World of Warcraft as spatial practice. In Corneliussen, H., & Rettberg, J. W. (Eds.), Digital culture, play and identity: A World of Warcraft reader. Cambridge, MA: MIT Press.
Aarseth, E. (2003, August). Playing research: Methodological approaches to game analysis. Paper presented at the Digital Arts and Cultures Conference, DAC2003. Melbourne, Australia.
Aarseth, E. (2005). Doors and perception: Fiction vs. simulation in games. In Proceedings of 6th Digital Arts and Culture Conference 2005.
Aav, S. (2005). Adaptive music system for DirectSound. Unpublished master’s thesis. University of Linköping, Sweden.
Adams, M. (2009). Hearing the city: Reflections on soundwalking. Qualitative Research, 10, 6–9.
Adams, M., Cox, T., Moore, G., Croxford, B., Refaee, M., & Sharples, S. (2006). Sustainable soundscapes: Noise policy and the urban experience. Urban Studies (Edinburgh, Scotland), 43(13), 2385. doi:10.1080/00420980600972504
Adler, S. (2002). The study of orchestration (3rd ed.). New York: Norton & Company.
Adorno, T. W., & Eisler, H. (1947). Composing for the films. New York: Oxford University Press.
Adrien, J. M. (1991). The Missing link: Modal synthesis. In De Poli, G., Piccialli, A., & Roads, C. (Eds.), Representations of music signals (pp. 269–298). Cambridge, MA: MIT Press.
Agarwal, R., & Karahanna, E. (2000). Time flies when you’re having fun: Cognitive absorption and beliefs about information technology usage. Management Information Systems Quarterly, 24(4), 665–694. doi:10.2307/3250951
Alais, D., & Blake, R. (1999). Neural strength of visual attention gauged by motion adaptation. Nature Neuroscience, 2(11), 1015–1018. doi:10.1038/14814
Alloy, L., Abramson, L., & Viscusi, D. (1981). Induced mood and the illusion of control. Journal of Personality and Social Psychology, 41, 1129–1140. doi:10.1037/0022-3514.41.6.1129
Alone in the Dark. (2008). Eden Games.
Alone in the dark. [Computer game]. (1992). Infogrames (Developer). Villeurbanne: Infogrames.
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Compilation of References
Alone in the dark: Inferno. [Computer game]. (2008). Eden Games S.A.S. (Developer). New York: Atari.
Anderson, P. W. S. (2002). Resident evil [Motion picture]. Munich, Germany: Constantin Film.
Alone in the dark: The new nightmare. [Computer game]. (2001). DarkWorks (Developer). Villeurbanne: Infogrames.
Angus, J. A. S., & Caunce, A. (2010). A GPGPU approach to improved acoustic finite difference time domain calculations. AES 128 (7963). London, UK.
Altman, R. (1992). Sound theory sound practice. London: Routledge.
Aquaria. (2007). Ambrosia Software.
Altman, R. (1992). General introduction: Cinema as event. In Altman, R. (Ed.), Sound theory, sound practice (pp. 1–14). New York: Routledge. Altman, R. (1992). Cinema as event. In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge. Alves, V., & Roque, L. (2009b). Notes on adopting auditory guidelines in a game design case. In Veloso, A., Roque, L., & Mealha, O. (Eds.), Proceedings of Videojogos2009 - Conferência de Ciências e Artes dos Videojogos. Aveiro, Portugal. Alves, V., & Roque, L. (2009a). A proposal of soundscape design guidelines for user experience enrichment. In Proceedings of the 4th Conference on Interaction with Sound, Audio Mostly 2009 (pp. 27-32). Glasgow, UK. AM3D (2009). AM3D [Computer software]. AM3D A/S (Developer). Aalborg, Denmark. Amdel-Meguid, A. A. (2009). Causing fear and anxiety through sound design in video games. Unpublished master’s thesis. Southern Methodist University, Dallas, Texas, USA. Amsel, A. (1962). Frustrative nonreward in partial reinforcement and discrimination learning: Some recent history and a theoretical extension. Psychological Review, 69(4), 306–328. doi:10.1037/h0046200 Anderson, G., & Brown, R. I. T. (1984). Real and laboratory gambling, sensation-seeking and arousal. The British Journal of Psychology, 75(3), 401–410. Anderson, J. D. (1996). The reality of illusion: An ecological approach to cognitive film theory. Carbondale, IL: Southern Illinois University Press.
Aronofsky, D. (1998). Pi. Harvest Filmworks. Arons, B. (1992, July). A review of the cocktail party effect. Journal of the American Voice I/O Society, 12, 35-50. Arsenault, D., & Perron, B. (2009). In the frame of the magic cycle: The circle(s) of gameplay. In Perron, B., & Wolf, M. J. P. (Eds.), The video game theory reader 2 (pp. 109–132). New York: Routledge. Arsenault, D., & Picard, M. (2008). Le jeu vidéo entre dépendance et plaisir immersif: les trois formes d’immersion vidéoludique. Proceedings of HomoLudens: Le jeu vidéo: un phénomène social massivement pratiqué, (pp. 1-16). Retrieved from http://www.homoludens.uqam.ca/index.php?option=com_content&task=view&id=55&Itemid=63. Ashcraft, B. (2008). How gaming is surpassing the Uncanny Valley. Kotaku. Retrieved April 7, 2009, from http://kotaku.com/5070250/how-gaming-is-surpassing-uncanny-valley. Ashmed, D. H., & Wall, R. S. (1999). Auditory perception of walls via spectral variations in the ambient sound field. Journal of Rehabilitation Research and Development, 36(4). Assassin’s Creed 2. (2009). Ubisoft Montreal. Ubisoft. Association for Computing Machinery. (2010). ACM computing classification system. New York: ACM. Retrieved February 4, 2010, from http://www.acm.org/about/class/. Atkinson, D. (2009). Lip sync (lip synchronization animation). Retrieved July 29, 2009, from http://minyos.its.rmit.edu.au/aim/a_notes/anim_lipsync.html.
Atwater, F. (1997). Inducing altered states of consciousness with binaural beat technology. In Proceedings of the Eighth International Symposium on New Science (pp. 11-15). Fort Collins, CO: International Association for New Science. Aucouturier, J. J., & Pachet, F. (2002). Scaling up music playlist generation. In Proceedings of the IEEE International Conference on Multimedia Expo. Audio Engineering Society. (2009). AES 35th international conference: Audio for games. Journal of the Audio Engineering Society. Audio Engineering Society, 57(4), 254–261. Audiosurf. [Video game]. (2008). Dylan Fitterer (Developer), Bellevue, WA: Valve Corporation (Steam).
Ballas, J. A. (1994). Delivery of information through sound. In Kramer, G. (Ed.), Auditory display: Sonification, audification, and auditory interfaces (pp. 79–94). Reading, MA: Addison-Wesley. Bannister, D., & Mair, J. M. M. (1968). The evaluation of personal constructs. London: Academic Press. Barlow, D. H. (1988). Anxiety and its disorders: The nature and treatment of anxiety and panic. New York: Guilford Press. Barr, P. (2008). Video game values: Play as human-computer interaction. Unpublished doctoral dissertation. Victoria University of Wellington, New Zealand.
Augoyard, J. F., & Torgue, H. (Eds.). (2005). Sonic experience: A guide to everyday sounds. Montreal, Canada: McGill-Queens University Press.
Bartneck, C., Kanda, T., Ishiguro, H., & Hagita, N. (2009). My robotic doppelganger—A critical look at the Uncanny Valley theory. In Proceedings of the 18th IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN2009, 269-276.
Augoyard, J., & Torgue, H. (2006). Sonic experience: A guide to everyday sounds (illustrated ed.). Montreal, Canada: McGill-Queen’s University Press.
Bateman, C. (2009). Beyond game design: Nine steps towards creating better videogames. Boston: Charles River Media.
Avanzini, F. (2008). Interactive sound. In Polotti, P., & Rocchesso, D. (Eds.), Sound to sense, sense to sound – A state of the art in sound and music computing (pp. 345–396). Berlin: Logos Verlag.
Bateman, C., & Boon, R. (2006). 21st century game design. Boston: Charles River Media.
Avanzini, F. (2001). Computational issues in physically-based sound models. Unpublished doctoral dissertation. University of Padova, Italy. Back, M. (1996). Micro-narratives in sound design: Context, character, and caricature in waveform manipulation. In Proceedings of the 3rd International Conference on Auditory Display. Bailenson, J. N., Swinth, K. R., Hoyt, C. L., Persky, S., Dimov, A., & Blascovich, J. (2005). The independent and interactive effects of embodied-agent appearance and behavior on self-report, cognitive, and behavioral markers of copresence in immersive virtual environments. Presence (Cambridge, Mass.), 14(4), 379–393. doi:10.1162/105474605774785235
Battaglia, P. W., Jacobs, R. A., & Aslin, R. N. (2003). Bayesian integration of visual and auditory signals for spatial localization. Journal of the Optical Society of America, 20(7), 1391–1397. doi:10.1364/JOSAA.20.001391 Battle of the bands. (2008). Planet Moon Studios. Beauchamp, R. (2005). Designing sound for animation. Burlington, MA: Elsevier. Beck, D. (2000). In Boulanger, R. (Ed.), Designing acoustically viable instruments in Csound. The Csound book: Perspectives in software synthesis, sound design and signal processing (p. 155). Cambridge, MA: MIT Press. Beentjes, J. W. J., Van Oordt, M., & Van Der Voort, T. H. A. (2002). How television commentary affects children’s judgments on soccer fouls. Communication Research, 29, 31–45. doi:10.1177/0093650202029001002
Beerends, J. G., & De Caluwe, F. E. (1999). The influence of video quality on perceived audio quality and vice versa. Journal of the Audio Engineering Society. Audio Engineering Society, 47(5), 355–362. Begault, D. R., & Wenzel, E. M. (1993). Headphone localization of speech. Human Factors, 35, 361–376.
Berndt, A., Hartmann, K., Röber, N., & Masuch, M. (2006). Composition and arrangement techniques for music in interactive immersive environments. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 53-59). Piteå, Sweden: Interactive Institute/Sonic Studio Piteå.
Benson, D. J. (2007). Music: A mathematical offering. Cambridge: Cambridge University Press.
Bethesda Game Studios (Developer). (2006). The Elder Scrolls IV: Oblivion [Computer game]. 2K Games & Bethesda Softworks.
Bentley, T., Johnston, L., & von Baggo, K. (2005). Evaluation using cued-recall debrief to elicit information about a user’s affective experiences. In T. Bentley, L. Johnston, & K. von Baggo (Eds.), Proceedings of the 17th Australian conference on Computer-Human Interaction (pp. 1-10). New York: ACM.
Beverland, M., Lim, E. A. C., Morrison, M., & Terziovski, M. (2006). In-store music and consumer–brand relationships: Relational transformation following experiences of (mis)fit. Journal of Business Research, 59, 982–989. doi:10.1016/j.jbusres.2006.07.001
Berndt, A. (2008). Liturgie für Bläser (2nd ed.). Halberstadt, Germany: Musikverlag Bruno Uetz. Berndt, A. (2011). Diegetic music: New interactive experiences. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global. Berndt, A., & Theisel, H. (2008). Adaptive musical expression from automatic real-time orchestration and performance. In Spierling, U., & Szilas, N. (Eds.), Interactive Digital Storytelling (ICIDS) 2008 (pp. 132–143). Erfurt, Germany: Springer. doi:10.1007/978-3-540-89454-4_20 Berndt, A. (2009). Musical nonlinearity in interactive narrative environments. In G. Scavone, V. Verfaille & A. da Silva (Eds.), Proceedings of the Int. Computer Music Conf. (ICMC) (pp. 355-358). Montreal, Canada: International Computer Music Association, McGill University. Berndt, A., & Hähnel, T. (2009). Expressive musical timing. In Proceedings of Audio Mostly 2009: 4th Conference on Interaction with Sound (pp. 9-16). Glasgow, Scotland: Glasgow Caledonian University, Interactive Institute/Sonic Studio Piteå.
Biedermann, I., & Vessel, E. A. (2006). Perceptual pleasure and the brain. American Scientist, 94(May-June), 247–253. Bijsterveld, K. (2008). Mechanical sound: Technology, culture, and public problems of noise in the twentieth century. Cambridge, MA: MIT Press. Bijsterveld, K. (2004). The diabolical symphony of the mechanical age: Technology and symbolism of sound in European and North American noise abatement campaigns, 1900-40. In Back, L., & Bull, M. (Eds.), The auditory culture reader (1st ed., pp. 165–190). Oxford, UK: Berg. Bilbao, S. (2006). Fast modal synthesis by digital waveguide extraction. IEEE Signal Processing Letters, 13(1), 1–4. doi:10.1109/LSP.2005.860553 Bilbao, S. (2009). Numerical sound synthesis: Finite difference schemes and simulation in musical acoustics. Chichester, England: John Wiley and Sons. BioShock. (2007). Irrational Games. Blauert, J. (2001). Spatial hearing: The psychophysics of human sound localization (3rd ed.). Cambridge, MA: MIT Press. Blesser, B., & Salter, L. (2009). Spaces speak, are you listening?: Experiencing aural architecture. Cambridge, MA: MIT Press.
Blueberry garden. (2009). Erik Svedäng. Blumer, H. (1986). Symbolic interactionism. Berkeley: University of California Press. Boillat, A. (2009). La «diégèse» dans son acception filmologique. Origine, postérité et productivité d’un concept. Cinémas Journal of Film Studies, 19(2-3), 217–245.
Brandon, A. (2004). Audio for games: Planning, process, and production. Berkeley, CA: New Riders Games. Branigan, E. (1992). Narrative comprehension and film. London: Routledge. Bregman, A. S. (1990). Auditory scene analysis: The perceptual organization of sound. London: MIT Press.
Bolivar, V. J., Cohen, A. J., & Fentress, J. C. (1994). Semantic and formal congruency in music and motion pictures: Effects on the interpretation of visual action. Psychomusicology, 13, 28–59.
Bregman, A. S. (1992). Auditory scene analysis: Listening in complex environments. In McAdams, S. E., & Bigand, E. (Eds.), Thinking in sound (pp. 10–36). New York: Clarendon Press/Oxford University Press.
Bordwell, D. (1986). Narration in the fiction film. London: Routledge.
Brenton, H., Gillies, M., Ballin, D., & Chatting, D. J. (2005, September 5). The Uncanny Valley: Does it exist? Paper presented at the HCI 2005, Animated Characters Interaction Workshop, Napier University, Edinburgh, UK.
Bordwell, D., & Thompson, K. (1994). Film history: An introduction. New York: McGraw-Hill. Bordwell, D., & Thompson, K. (2004). Film art: An introduction (7th ed.). New York: McGraw-Hill. Boucsein, W. (1992). Electrodermal activity. New York: Plenum Press. Braasch, J. (2005). Modelling of binaural hearing. In Blauert, J. (Ed.), Communication acoustics (pp. 75–108). Berlin: Springer Verlag. doi:10.1007/3-540-27437-5_4
Bresson, R. (1985). Notes on sound. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press. Brewster, S. A. (1994). Providing a structured method for integrating non-speech audio into human-computer interfaces. Unpublished doctoral dissertation. University of York, Heslington, UK.
Bradley, I. L. (1971). Repetition as a factor in the development of musical preferences. Journal of Research in Music Education, 19(3), 295–298. doi:10.2307/3343764
Bridgett, R. (2006). Audible words, pt. 2: Updating the state of critical writing in game sound. Gamasutra. Retrieved February 6, 2009, from http://www.gamasutra.com/features/20060831/audio_03.shtml
Bradley, M. M., Codispoti, M., Cuthbert, B. N., & Lang, P. J. (2001). Emotion and motivation I: Defensive and appetitive reactions in picture processing. Emotion (Washington, D.C.), 1(3), 276–298. doi:10.1037/1528-3542.1.3.276
Bridgett, R. (2007a). Designing a next-gen game for sound. Gamasutra. Retrieved February 13, 2009, from http://www.gamasutra.com/view/feature/2321/designing_a_nextgen_game_for_sound.php
Bradley, M. M., & Lang, P. J. (2000). Affective reactions to acoustic stimuli. Psychophysiology, 37, 204–215. doi:10.1017/S0048577200990012
Bridgett, R. (2007b). Interactive ambience. Game Developer Magazine. Retrieved May 8, 2009, from http://www3.telus.net/public/kbridget/aural_fixation_april07.jpg
Bradley, M. M., & Lang, P. J. (2007). Emotion and motivation. In Cacioppo, J. T., Tassinary, L. G., & Berntson, G. G. (Eds.), Handbook of psychophysiology (3rd ed., pp. 581–607). New York: Cambridge University Press. doi:10.1017/CBO9780511546396.025
Bridgett, R. (2009a). The future of game audio: Is interactive mixing the key? Gamasutra. Retrieved May 3, 2009, from http://www.gamasutra.com/view/feature/4025/
Brainpipe. (2008). Digital Eel.
Brodmann, K. (1909). Vergleichende Lokalisationslehre der Grosshirnrinde in ihren Prinzipien dargestellt auf Grund des Zellenbaues. Leipzig, Germany: Johann Ambrosius Barth Verlag. Brown, R. I. F. (1986). Arousal and sensation-seeking components in the general explanation of gambling and gambling addictions. Substance Use & Misuse, 21(9), 1001–1016. doi:10.3109/10826088609077251 Brown, E., & Cairns, P. (2004). A grounded investigation of game immersion. In Dykstra-Erickson, E., & Tscheligi, M. (Eds.), CHI ‘04 extended abstracts (pp. 1297–1300). New York: ACM. Brown, A. R., Wooller, R. W., & Kate, T. (2007). The morphing table: A collaborative interface for musical interaction. In A. Riddel & A. Thorogood (Eds.), Proceedings of the Australasian Computer Music Conference (pp. 34-39). Canberra, Australia. Browning, T. (Producer/Director). (1931). Dracula [Motion picture]. England: Universal Pictures. Bruyns, C. (2006). Modal synthesis for arbitrarily shaped objects. Computer Music Journal, 30(3), 22–37. doi:10.1162/comj.2006.30.3.22 Bryant, J., Brown, D., Comisky, P. W., & Zillmann, D. (1982). Sports and spectators: Commentary and appreciation. The Journal of Communication, 32(1), 109–119. doi:10.1111/j.1460-2466.1982.tb00482.x Bryant, J., Comisky, P., & Zillmann, D. (1982). Drama in sports commentary. The Journal of Communication, 27(3), 140–149. doi:10.1111/j.1460-2466.1977.tb02140.x Bryman, A. (2008). Social research methods (3rd ed.). Oxford, UK: Oxford University Press. Bugelski, B. R., & Alampay, D. A. (1961). The role of frequency in developing perceptual sets. Canadian Journal of Psychology, 15(4), 201–211. doi:10.1037/h0083443 Bull, M. (2000). Sounding out the city: Personal stereos and the management of everyday life. Oxford, UK: Berg. Bull, M., & Back, L. (2004). The auditory culture reader (1st ed.). Oxford, UK: Berg.
Bullerjahn, C., & Güldenring, M. (1994). An empirical investigation of effects of film music using qualitative content analysis. Psychomusicology, 13, 99–118. Burgess, D. (1992). Techniques for low cost spatial audio. In Proceedings of the 5th annual ACM symposium on User interface software and technology. Bushman, B. J., & Anderson, C. A. (2002). Violent video games and hostile expectations: A test of the General Aggression Model. Personality and Social Psychology Bulletin, 28(12), 1679–1686. doi:10.1177/014616702237649 Busso, C., & Narayanan, S. S. (2006). Interplay between linguistic and affective goals in facial expression during emotional utterances. In Proceedings of 7th International Seminar on Speech Production, 549-556. Cabrera Paz, J., & Schwartz, T. B. M. (2009). Technocultural convergence: Wanting to say everything, wanting to watch everything. Popular Communication: The International Journal of Media and Culture, 7(3), 130. Cacioppo, J. T., Tassinary, L. G., & Berntson, G. G. (2007). Handbook of psychophysiology (3rd ed.). Cambridge, UK: Cambridge University Press. doi:10.1017/CBO9780511546396 Cacioppo, J. T., Berntson, G. G., Larsen, J. T., Poehlmann, K. M., & Ito, T. A. (2004). The psychophysiology of emotion. In Lewis, M., & Haviland-Jones, J. M. (Eds.), Handbook of emotions (2nd ed., pp. 173–191). New York: Guilford Press. Caillois, R. (2001). Man, play and games. Chicago: University of Illinois Press. Cakewalk. (1983). Commavid. Calleja, G. (2007). Digital games as designed experience: Reframing the concept of immersion. Unpublished doctoral dissertation. Victoria University of Wellington, New Zealand. Calleja, G. (2007). Revising immersion: A conceptual model for the analysis of digital game involvement. In Proceedings of Situated Play, DiGRA 2007 Conference, 83-90.
Cameron, J. (1984). The terminator. Pacific Western. Cameron, J. (1991). Terminator 2: Judgement day. Pacific Western. Cameron, J. (Director). (2009). Avatar [Motion picture]. Los Angeles, CA: 20th Century Fox. Lightstorm Entertainment, Dune Entertainment, Ingenious Film Partners [Studio]. Cancellaro, J. (2006). Exploring sound design for interactive media. Clifton Park, NY: Thomson Delmar Learning. Cannon, W. B. (1927). The James-Lange theory of emotions: A critical examination and an alternative theory. The American Journal of Psychology, 39(1/4), 106–124. doi:10.2307/1415404 Cao, Y., Faloutsos, P., Kohler, E., & Pighin, F. (2004). Real-time speech motion synthesis from recorded motions. In R. Boulic & D. K. Pai (Eds.), Eurographics/ACM SIGGRAPH Symposium on Computer Animation (2004), 345-353.
Chadabe, J. (1985). Interactive music composition and performance system. U.S. Patent No. 4,526,078. Washington, DC: U.S. Patent and Trademark Office. Chapel, R. H. (2003). Real-time algorithmic music systems from fractals and chaotic functions: Towards an active musical instrument. Unpublished doctoral dissertation. University Pompeu Fabra, Barcelona, Spain. Charlton, J. P., & Danforth, I. D. W. (2004). Differentiating computer-related addictions and high engagement. In Morgan, K., Brebbia, C. A., Sanchez, J., & Voiskounsky, A. (Eds.), Human perspectives in the internet society: culture, psychology and gender. Southampton: WIT Press. Childs, G. W. (2007). Creating music and sound for games. Boston, MA: Thomson Course Technology. Chion, M. (1999). The voice in cinema. New York: Columbia University Press. Chion, M. (1983). Guide des objets sonores: Pierre Schaeffer et la recherche musicale. Paris: Buchet/Chastel.
Carnagey, N. L., Anderson, C. A., & Bushman, B. J. (2007). The effect of video game violence on physiological desensitization to real-life violence. Journal of Experimental Social Psychology, 43(3), 489–496. doi:10.1016/j.jesp.2006.05.003
Chion, M. (1990). L’Audio-vision. Paris: Nathan.
Carr, D. (2006). Space, navigation and affect. In Carr, C., Buckingham, D., Burn, A., & Schott, G. (Eds.), Computer games: Text, narrative and play (pp. 59–71). Cambridge, UK: Polity.
Chion, M. (1994). Audio-vision: Sound on screen (Gorbman, C., Trans.). New York: Columbia University Press.
Chion, M. (2003). Un art sonore, le cinéma: histoire, esthétique, poétique. Paris: Cahiers du Cinéma. Chion, M. (1998). Le son. Paris: Nathan.
Carr, D. (2003). Play dead: Genre and affect in Silent Hill and Planescape Torment. Game Studies, 3(1). Retrieved from http://www.gamestudies.org/0301/carr/
Chion, M. (2003). The silence of the loudspeaker or why with Dolby sound it is the film that listens to us. In Sider, L., Freeman, D., & Sider, J. (Eds.), Soundscape: The School of Sound lectures 1998-2001 (pp. 150–154). London: Wallflower Press.
Carroll, N. (1996). The paradox of suspense. In Vorderer & Friedrichsen (Eds.), Suspense: conceptualization, theoretical analysis, and empirical explorations (pp. 71-90). Hillsdale N.J.: Lawrence Erlbaum Associates.
Clair, R. (1985). The art of sound. Excerpts from a series of letters. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press. (Original work published 1929)
Carter, F. A., Wilson, J. S., Lawson, R. H., & Bulik, C. M. (1995). Mood induction procedure: importance of individualising music. Behaviour Change, 12, 159–161.
Clark, L., Lawrence, A. J., Astley-Jones, F., & Gray, N. (2009). Gambling near-misses enhance motivation to gamble and recruit win-related brain circuitry. Neuron, 61(3), 481–490. doi:10.1016/j.neuron.2008.12.031
Castlevania. (1989). Konami Digital Entertainment.
Clarkson, B., Mase, K., & Pentland, A. (2000). Recognizing user context via wearable sensors. In Proceedings of the Fourth International Symposium of Wearable Computers. Cohen, L. (2005). The history of noise [on the 100th anniversary of its birth]. IEEE Signal Processing Magazine, 22(6), 20–45. doi:10.1109/MSP.2005.1550188 Cohen, J. W. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates. Collins, K. (2008). Game sound: An introduction to the history, theory, and practice of video game music and sound design. Cambridge, MA: MIT Press. Collins, K., Tessler, H., Harrigan, K., Dixon, M. J., & Fugelsang, J. (2011). Sound in electronic gambling machines: A review of the literature and its relevance to game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global. Collins, K. (2007). An introduction to the participatory and non-linear aspects of video games audio. In Hawkins, S., & Richardson, J. (Eds.), Essays on sound and vision. Helsinki: Helsinki University Press. Collins, K. (2008b). Nothing odd about audio. Retrieved September 31, 2009, from http://www.slideshare.net/collinsk/sk-466356 Comisky, P. W., Bryant, J., & Zillmann, D. (1977). Commentary as a substitute for action. The Journal of Communication, 27(3), 150–153. doi:10.1111/j.1460-2466.1977.tb02141.x Command & conquer 3: Tiberium wars. (2007). EA Games. Conati, C. (2002). Probabilistic assessment of user’s emotions in educational games. Applied Artificial Intelligence, 16(7/8), 555–575. doi:10.1080/08839510290030390 Condry, J., & Scheibe, C. (1989). Non program content of television: Mechanisms of persuasion. In Condry, J. (Ed.), The Psychology of Television (pp. 217–219). London: Erlbaum.
Connor, S. (2004). Edison’s teeth: Touching hearing. In V. Erlmann (Ed.), Hearing cultures: Essays on sound, listening, and modernity (English ed., pp. 153-172). Oxford, UK: Berg. Cook, P. (Ed.). (1999). Music, cognition, and computerized sound: An introduction to psychoacoustics. Cambridge, MA: MIT Press. Cook, P. R. (1997). Physically informed sonic modeling (PhISM): Synthesis of percussive sounds. Computer Music Journal, 21(3), 38–49. doi:10.2307/3681012 Cook, P. R. (2002). Real sound synthesis for interactive application. Natick, MA: A K Peters, Ltd. Cooking Mama. (2007). OfficeCreate. Majesco Publishing. Cooley, M. (1998, November). Sound + image in computer-based design: Learning from sound in the arts. Paper presented at International Community for Auditory Display Conference, Glasgow, UK. Coppola, F. F. (Director). (1979). Apocalypse now! [Motion picture]. Hollywood, CA: Paramount Pictures. Cornelius, R. R. (1996). The science of emotion. Upper Saddle River, NJ: Prentice-Hall. Coventry, K. R., & Constable, B. (1999). Physiological arousal and sensation seeking in female fruit machine players. Addiction (Abingdon, England), 94, 425–430. doi:10.1046/j.1360-0443.1999.94342512.x Coventry, K. R., & Hudson, J. (2001). Gender differences, physiological arousal and the role of winning in fruit machine gamblers. Addiction (Abingdon, England), 96, 871–879. doi:10.1046/j.1360-0443.2001.9668718.x Cowley, B., Charles, D., Black, M., & Hickey, R. (2008). Toward an understanding of flow in video games. ACM Computers in Entertainment, 6(2). Crayon Physics Deluxe. (2009). Petri Purho (Developer). San Mateo: Hudson Soft.
Creswell, J. (2005). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (2nd ed.). Upper Saddle River, New Jersey: Pearson Education. Crockford, D., Goodyear, B., Edwards, J., Quickfall, J., & el-Guebaly, N. (2005). Cue-Induced brain activity in pathological gamblers. Biological Psychiatry, 58(10), 787–795. doi:10.1016/j.biopsych.2005.04.037 Crysis. (2007). EA Games, Crytek. Csíkszentmihályi, M. (1975). Beyond boredom and anxiety. San Francisco: Jossey-Bass Publishers. Cunningham, S., Grout, V., & Picking, R. (2011). Emotion, content and context in sound and music. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global. Cunningham, S., Bergen, H., & Grout, V. (2006). A note on content-based collaborative filtering of music. In Proceedings of IADIS - International Conference on WWW/Internet. Cunningham, S., Caulder, S., & Grout, V. (2008). Saturday night or fever? Context aware music playlists. In Proceedings of the 3rd Conference on Interaction with Sound, Audio Mostly 2008 (pp. 64-71). Piteå, Sweden. Cunningham, S., Grout, V., & Hebblewhite, R. (2006). Computer game audio: The unappreciated scholar of the Half-Life generation. In Proceedings of the Audio Mostly Conference on Sound in Games. Curtis, S. (1992). The sound of the early Warner Bros. cartoons. In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge. Damasio, A. R. (1994). Descartes’ error. New York: G. P. Putnam. Damásio, A. (2000). The feeling of what happens: Body and emotion in the making of consciousness. London: Vintage Books.
Damásio, A. (2003). Emotion, feeling, and social behavior: The brain perspective. The Walter Chapin Simpson Center for the Humanities. Retrieved September 31, 2009, from http://depts.washington.edu/uwch/katz/20022003/antonio_damasio.html Dance dance revolution. (1998). Konami. Darwin, C. (1899). The expression of the emotions in man and animals. New York: D. Appleton and Company. Darwinia. (2007). Ambrosia Software. Dave mirra freestyle BMX. (2000). Z-Axis. Davies, G., Cunningham, S., & Grout, V. (2007). Visual stimulus for aural pleasure. In Proceedings of the Audio Mostly Conference on Interaction with Sound. Davis, H., & Silverman, R. (1978). Hearing and deafness (4th ed.). Location: Thomson Learning. de Certeau, M. D. (1988). The practice of everyday life. Berkeley: University of California Press. De Poli, G., Piccialli, A., & Roads, C. (1991). Representations of musical signals. Cambridge, MA: MIT Press. De Sanctis, G., Sarti, A., Scarparo, G., & Tubaro, S. (2005). Automatic modelling and authoring of nonlinear interactions between acoustic objects. In K. Galkowski, A. Kummert, E. Rogers & J. Velten (Eds.), The Fourth International Workshop on Multidimensional Systems – NDS 2005 (pp. 116-122). Dead space. [Computer game]. (2008). EA Redwood Shores (Developer). Redwood City: Electronic Arts. Dekker, A., & Champion, E. (2007). Please biofeed the zombies: Enhancing the gameplay and display of a horror game using biofeedback. In Proceedings of DiGRA: Situated Play Conference. Retrieved January 1, 2010, from http://www.digra.org/dl/db/07312.18055.pdf. Dektela, R., & Sical, W. (2003). Survival horror: Un genre nouveau. Horror Games Magazine, 1(1), 13–16.
Damásio, A. (2005). Descartes’ error: Emotion, reason, and the human brain. London: Vintage Books.
Delfabbro, P., Fazlon, K., & Ingram, T. (2005). The effects of parameter variations in electronic gambling simulations: Results of a laboratory-based pilot investigation. Gambling Research: Journal of the National Association for Gambling Studies, 17(1), 7–25. Deutch, S. (2003). Music for interactive moving pictures. In Sider, L., Freeman, D., & Sider, J. (Eds.), Soundscape: The School of Sound lectures 1998-2001 (pp. 28–34). London: Wallflower Press. Deutsch, S. (2001). Harnessing the power of music and sound design in interactive media. In Earnshaw, R., & Vince, J. (Eds.), Digital content creation. New York: Springer. Diablo 2. (2000). Blizzard Entertainment. Dibben, N. (2001). What do we hear, when we hear music? Music perception and musical material. Musicae Scientiae, 2, 161–194. Dickerson, M., & Adcock, S. (1987). Mood, arousal and cognitions in persistent gambling: Preliminary investigation of a theoretical model. Journal of Gambling Behaviour, 3(1), 3–15. doi:10.1007/BF01087473 Dig dug. (1983). Atari. DigiWall [Computer game]. (2010). Piteå, Sweden: Digiwall Technology. Retrieved February 10, 2010, from http://www.digiwall.se/. Dix, A., Finlay, J., & Abowd, G. D. (2004). Human-computer interaction. Harlow, UK: Pearson Education. DJ hero. [Video game]. (2009). FreeStyleGames (Developer), Santa Monica, CA: Activision.
Doel, K. d., Knott, D., & Pai, D. K. (2004). Interactive simulation of complex audio-visual scenes. Presence (Cambridge, Mass.), 13(1), 99–111. doi:10.1162/105474604774048252 Doel, K. d., & Pai, D. K. (1998). The sounds of physical shapes. Presence (Cambridge, Mass.), 7(4), 382–395. doi:10.1162/105474698565794 Doel, K. d., & Pai, D. K. (2006). Modal synthesis for vibrating objects. In K. Greenebaum, & R. Barzel (Eds.), Audio anecdotes III: Tools, tips, and techniques for digital audio (pp. 99-120). Wellesley, MA: A K Peters, Ltd. Doel, K. d., Kry, P. G., & Pai, D. K. (2001). FoleyAutomatic: Physically-based sound effects for interactive simulation and animation. In P. Lynn (Ed.), Proceedings of SIGGRAPH ’01: The 28th annual conference on Computer graphics and interactive techniques (pp. 537-544). New York: ACM. Doel, K. d., Pai, D. K., Adam, T., Kortchmar, L., & Pichora-Fuller, K. (2002). Measurements of Perceptual Quality of Contact Sound Models. In Nakatsu & H. Kawahara (Eds.), Proceedings of the 8th International Conference on Auditory Display, (pp. 345-349). Kyoto, Japan: ATR. Donkey kong [Computer game]. (1981). Kyoto: Nintendo. Donkey Konga. [Video game]. (2004). Namco (Developer), Kyoto: Nintendo. Doom 3. (2004). Activision. Dorval, M., & Pepin, M. (1986). Effect of playing a video game on a measure of spatial visualization. Perceptual and Motor Skills, 62, 159–162.
Dixon, L., Trigg, R., & Griffiths, M. (2007). An empirical investigation of music and gambling behaviour. International Gambling Studies, 7(3), 315–326. doi:10.1080/14459790701601471
Douglas, Y., & Hargadon, A. (2000). The pleasure principle: Immersion, engagement, flow. In Proceedings of the eleventh ACM on Hypertext and Hypermedia (pp. 153-160). New York: ACM.
Dixon, M., Harrigan, K. A., Sandhu, R., Collins, K., & Fugelsang, J. (2011: In press). Slot machine play: Psychophysical responses to wins, losses, and losses disguised as wins. Addiction.
Dragon age: Origins. (2009). EA Games, Bioware.
Dreher, R. E. (1947). The relationship between verbal reports and the galvanic skin response. Journal of Abnormal and Social Psychology, 44, 87–94.
Compilation of References
Drescher, P. (2006a). GAC(k!). Retrieved June 8, 2009, from http://blogs.oreilly.com/digitalmedia/2006/09/ gack-1.html Drescher, P. (2006b). THE Homunculonic AEStheticator. Retrieved June 8, 2009, from http://blogs.oreilly.com/digitalmedia/2006/11/the-homunculonic-aestheticator-1.html Dretzka, G. (2004, December 12). Casinos, celebrities bet on our love for pop culture icons. Seattle Times. Retrieved July 15, 2009, from http://community.seattletimes.nwsource.com/archive/?date=20041212&slug=casinos12. Drobnick, J. (2004). Aural cultures. Toronto: YYZ Books. Droumeva, M. (2011). An acoustic communication framework for game sound: Fidelity, verisimilitude, ecology . In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global. Duda, R. O., Algazi, V. R., & Thompson, D. M. (2002). The use of head-and-torso models for improved spatial sound synthesis. In Proceedings of the 113th Audio Engineering Society Convention. Dyson. (2009). Kremers & May. Dyson, F. (1996). When is the ear pierced? The clashes of sound, technology and cyberculture . In Moser, M. A., & MacLeod, D. (Eds.), Immersed in technology: Art and virtual environments. Cambridge, MA: MIT Press. Ebcioglu, K. (1992). An expert system for harmonizing chorales in the style of J. S. Bach . In Balaban, M., Ebcioglu, K., & Laske, O. (Eds.), Understanding music with AI: Perspectives on music cognition (pp. 294–334). Cambridge, MA: MIT Press. Edworthy, J., Loxley, S., & Dennis, I. (1991). Improving auditory warning design: Relationship between warning sound parameters and perceived urgency. Human Factors, 33(2), 205–231. Effrat, J., Chan, L., Fogg, B. J., & Kong, L. (2004). What sounds do people love and hate? Interaction, 11(5), 64–66. doi:10.1145/1015530.1015562
Eijkman, E., & Vendrik, J. H. (1965). Can a sensory system be specified by its internal noise? The Journal of the Acoustical Society of America, 37, 1102–1109. doi:10.1121/1.1909530 Eisenstein, S. M., Pudovkin, V. I., & Alexandrov, G. V. (1985). Statement on the sound film . In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press. (Original work published 1928) Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3/4), 169–200. doi:10.1080/02699939208411068 Ekman, P., & Friesen, W. V. (1978). Facial action coding system: A technique for the measurement of facial movement. Palo Alto, CA: Consulting Psychologists Press. Ekman, I., & Lankoski, P. (2009). Hair-raising entertainment: Emotions, sound, and structure in Silent Hill 2 and Fatal Frame . In Perron, B. (Ed.), Horror video games: Essays on the fusion of fear and play (pp. 181–199). Jefferson, NC: McFarland. Ekman, I. (2005). Meaningful noise: Understanding sound effects in computer games. In Proceedings of Digital Arts and Cultures 2005. Copenhagen, Denmark. Ekman, I. (2008). Comment on the IEZA: A framework for game audio. Gamasutra. Retrieved January 13, 2010, from http://www.gamasutra.com/view/feature/3509/ ieza_a_framework_for_game_audio.php Ekman, I. (2008). Psychologically motivated techniques for emotional sound in computer games. In Proceedings of the 3rd Conference on Interaction with Sound, Audio Mostly 2008 (pp. 20-26). Piteå, Sweden. Ekman, I. (2009). Modelling the emotional listener: Making psychological processes audible. In Proceedings of Audio Mostly 2009: 4th Conference on Interaction with Sound (pp. 33-40). Glasgow, Scotland: Glasgow Caledonian University, Interactive Institute/Sonic Studio Piteå.
Ekman, I., & Kajastila, R. (2009, February 11-13). Localisation cues affect emotional judgements: Results from a user study on scary sound. Paper presented at the AES 35th International Conference, London, UK. Ekman, I., Ermi, L., Lahti, J., Nummela, J., Lankoski, P., & Mäyrä, F. (2005). Designing sound for a pervasive mobile game. In Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, 2005. Eldridge, A. C. (2002). Adaptive systems music: Musical structures from algorithmic process. In C. Soddu (Ed.), Proceedings of the 6th Generative Art Conference. Milan, Italy: Politecnico di Milano University. Electroplankton. [Video game], (2006). Indies Zero (Developer), Kyoto: Nintendo. Elite beat agents. [Video game], (2006). iNiS (Developer), Kyoto: Nintendo. Ellis, S. R. (1996). Presence of mind... A reaction to Thomas Sheridan’s “Musing on telepresence.” Presence (Cambridge, Mass.), 5, 247–259. Elmore, W. C., & Heald, M. A. (1969). Physics of waves. Location: McGraw Hill. Epstein, M. (2009). Growing an interdisciplinary hybrid: The case of acoustic ecology. History of Intellectual Culture, 3(1). Retrieved December 29, 2009, from http://www.ucalgary.ca/hic/issues/vol3/9. Ermi, L., & Mäyrä, F. (2005). Fundamental components of the gameplay experience: Analysing immersion. In Proceedings of DiGRA 2005 Conference Changing Views: Worlds in Play. Retrieved January 1, 2010, from http://www.digra.org/dl/db/06276.41516.pdf.
Everest, F. A. (1997). Sound studio construction on a budget. City, ST: McGraw-Hill. Everest, F. A. (2001). Master handbook of acoustics. City, ST: McGraw-Hill. F.E.A.R. (2005). Vivendi Universal Games. Monolith Productions. Fable II. (2008). Microsoft. Fake engine noises added to hybrid and electric cars to improve safety. (2008). Retrieved January 10, 2010, from http://www.switched.com/2008/06/05/fake-enginenoises-added-to-hybrid-and-electric-cars-to-improve/. Fallout 3. (2008). Bethesda Softworks. Bethesda Game Studios. Farmer, D. (2009). The making of Torment audio. Retrieved July 9, 2009, from http://www.filmsound.org/game-audio/audio.html. Farnell, A. J. (2008). Designing sound. London: Applied Scientific Press. Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global. Farnell, A. (2007). An introduction to procedural audio and its application in computer games. Retrieved May 20, 2009, from http://www.obiwannabe.co.uk/html/papers/proc-audio Farris, J. S. (2003). The human interaction cycle: A proposed and tested framework of perception, cognition, and action on the web. Unpublished doctoral dissertation. Kansas State University, USA.
Ernerfeldt, E. (2008). Phun: 2D physics sandbox. Available from http://www.phunland.com/wiki/Home.
Fastl, H., & Zwicker, E. (2007). Psychoacoustics: Facts and models (3rd ed., Vol. 22). Berlin, Heidelberg: Springer.
Essl, G., Serafin, S., Cook, P., & Smith, J. O. (2004). Theory of banded waveguides. Computer Music Journal, 28(1), 37–50. doi:10.1162/014892604322970634
Fatal frame. [Computer game]. (2002). Tecmo (Developer). Torrance: Tecmo.
Eternal Darkness. (2002). Nintendo.
Feld, S. (2004). A rainforest acoustemology . In Bull, M., & Back, L. (Eds.), The auditory culture reader (1st ed., pp. 223–240). Oxford, UK: Berg Publishers.
Ferber, D. (2003, September) The man who mistook his girlfriend for a robot. Popular Science. Retrieved April 7, 2009, from http://iiae.utdallas.edu/news/pop_science. html. Ferguson, C. J. (2007). Evidence for publication bias in video game violence effects literature: A meta-analytic review. Aggression and Violent Behavior, 12(4), 470–482. doi:10.1016/j.avb.2007.01.001 Fernandez, A. (2008). Fun experience with digital games: A model proposition . In Leino, O., Wirman, H., & Fernandez, A. (Eds.), Extending experiences: Structure, analysis and design of computer game player experience (pp. 181–190). Rovaniemi, Finland: Lapland University Press. Ferrari, M., & Ives, S. (2005). Slots: Las Vegas gamblers lose some $5 billion a year at the slot machines alone. Las Vegas: An unconventional history. New York: Bulfinch. Fettweis, A. (1986). Wave digital filters: Theory and practice. Proceedings of the IEEE, 74(2), 270–327. doi:10.1109/PROC.1986.13458 FIFA. (1993-). EA Sports Figgis, M. (2003). Silence: The absence of sound . In Sider, L., Freeman, D., & Sider, J. (Eds.), Soundscape: The School of Sound lectures 1998-2001 (pp. 1–14). London: Wallflower Press. Final Fantasy 2. (1988). Squaresoft. Square ENIX. Firelight (2009). FMOD Ex. v4.28 [Computer software]. Victoria, Australia: Firelight Technologies. Fitterer, D. (2008). Audiosurf: Ride Your Music [Computer game]. Washington, DC: Valve. Fleming, J. (2009). Planet of sound: Talking art, noise, and games with EA’s Robi Kauker. Retrieved June 3, 2009, from http://www.gamasutra.com/view/feature/3978/ planet_of_sound_talking_art_.php Fletcher, N. H., & Rossing, T. D. (1991). The physics of musical instruments. New York: Springer. Fletcher, T. D. R. N. H. (2004). Principles of vibration and sound (2nd ed.). New York.
Flossmann, S., Grachten, M., & Widmer, G. (2009). Expressive performance rendering: Introducing performance context. In Proceedings of the 6th Sound and Music Computing Conference (SMC). Porto, Portugal: Universidade do Porto. Flückiger, B. (2001). Sound design, die virtuelle Klangwelt des Films. Marburg, Country: Schüren. Folkman, S., & Lazarus, R. S. (1990). Coping and emotion. In Leventhal, N. B., & Trabasso, T. (Eds.), Psychological and biological approaches to emotion (pp. 313–332). Hillsdale, NJ: Erlbaum. Follett, J. (2007). Audio and the user experience. UXmatters. Retrieved September 31, 2009, from http://www.uxmatters.com/MT/archives/000200.php Foote, J. (1999). Visualizing music and audio using self-similarity. Proceedings of the seventh ACM international conference on Multimedia (Part 1), 77-80. Frauenberger, C. (2007). Ears))): A methodological framework for auditory display design. In CHI ’07 extended abstracts on Human factors in computing systems (pp. 1641–1644). San Jose, CA: ACM Press. Freeman, D. (2004). Creating emotion in games: The craft and art of emotioneering™. Computers in Entertainment, 2(3), 15. doi:10.1145/1027154.1027179 Freeman, D. (2003). Creating emotions in games. Berkeley, CA: New Riders Games. FreeStyleGames (2009). DJ Hero [Computer game]. FreeStyleGames (Developer), Activision. Frequency. (2001). Sony Computer Entertainment (PlayStation 2). Freud, S. (1919). The Uncanny. In The standard edition of the complete psychological works of Sigmund Freud (Vol. 17, pp. 219–256). London: Hogarth Press. Friberg, A., Bresin, R., & Sundberg, J. (2006). Overview of the KTH Rule System for musical performance. Advances in Cognitive Psychology, Special Issue on Music Performance, 2(2/3), 145–161.
Friberg, J., & Gärdenfors, D. (2004). Audio games: New perspectives on game audio. In Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology 2004, 148-154.
Gasser, M., Pampalk, E., & Tomitsch, M. (2007). A content-based user-feedback driven playlist generator and its evaluation in a real-world scenario. In Proceedings of the Audio Mostly Conference on Interaction with Sound.
Friday the 13th. [Computer game]. (1989). Pack-In-Video (Developer). New York: LJN.
Gauss, C. F. (1882). General solution of the problem: To map a part of a given surface on another given surface so that the image and the original are similar in their smallest parts. Copenhagen: Journal of Royal Society of Science.
Frisby, D. (2002). Cityscapes of modernity: Critical explorations. Cambridge, UK: Polity. Frohlich, D., & Murphy, R. (1999, December 20). Getting physical: what is fun computing in tangible form? Paper presented at the Computers and Fun 2 Workshop, York, UK. Funkhouser, T., Carlbom, I., Elko, G., Pingali, G., Sondhi, M., & West, J. (1998). A beam-tracing approach to acoustic modelling for interactive virtual environments. In S. Cunningham, W. Bransford & M. F. Cohen (Eds.) Proceedings of SIGGRAPH ’98: The 25th annual conference on Computer graphics and interactive techniques (pp. 21-28). New York: ACM. Gaboury, A., & Ladoucer, R. (1989). Erroneous perceptions and gambling. Journal of Social Behavior and Personality, 4(41), 111–120. Gabrielsson, A., & Lindström, E. (2001). The influence of musical structure on emotional expression . In Juslin, P., & Sloboda, J. A. (Eds.), Music and emotion: Theory and research. Oxford, UK: Oxford University Press. Gackenbach, J. (2008). The relationship between perceptions of video game flow and structure. Loading... 1(3). Galloway, A. R. (2006). Gaming: Essays on algorithmic culture. Electronic Mediations (Vol. 18). Minneapolis: University of Minnesota Press. Gardner, W. G. (1992, November). A realtime multichannel room simulator. Paper presented at the 124th meeting of the Acoustical Society of America. Garlin, F. V., & Owen, K. (2006). Setting the tone with the tune: A meta-analytic review of the effects of background music in retail settings. Journal of Business Research, 59, 755–764. doi:10.1016/j.jbusres.2006.01.013
Gaver, W. (1993). What in the world do we hear? An ecological approach to auditory event perception. Ecological Psychology, 5(1), 1–29. doi:10.1207/s15326969eco0501_1 Gaver, W. (1997). Auditory interfaces. In Helander, M. G., Landauer, T. K., & Prabhu, P. (Eds.), Handbook of human-computer interaction (2nd ed.). Amsterdam: Elsevier Science. doi:10.1016/B978-044481862-1/50108-4 Gaver, W. (1994). Using and creating auditory icons. In G. Kramer (Ed.), Auditory Display: Sonification, Audification, and Auditory Interfaces (Santa Fe Institute Studies in the Sciences of Complexity, Vol. 18, pp. 417-446). Reading, MA: Addison-Wesley. Gaver, W. W. (1993a). Synthesizing auditory icons. In S. Ashlund, K. Mullet, A. Henderson, E. Hollnagel & T. White (Eds.), Proceedings of the INTERCHI ’93 conference on Human factors in computing systems (pp. 228-235). New York: ACM. Gaver, W. W., Beaver, J., & Benford, S. (2003). Ambiguity as a resource for design. Proceedings of the ACM CHI Conference on Human Factors in Computing Systems, 2003, 233-240. Gears of War. (2007). Microsoft. Gebeke, D. (1993). Children and fear. Retrieved December 10, 2009, from http://www.ag.ndsu.edu/pubs/yf/famsci/he458w.htm. Geiger, G. (2005). Abstraction in computer music software systems. Unpublished doctoral dissertation. Universitat Pompeu Fabra, Barcelona.
Genette, G. (1983). Narrative discourse: An essay in method. Ithaca, NY: Cornell University Press. Gescheider, G. A., Sager, L. C., & Ruffolo, L. J. (1975). Simultaneous auditory and tactile information processing. Perception & Psychophysics, 18, 209–216. Gibson, J. (1986). The ecological approach to visual perception. New Jersey: LEA. Gibson, J. (1977). The theory of affordances. In Shaw, R. E., & Bransford, J. (Eds.), Perceiving, acting and knowing (pp. ##-##). New Jersey: LEA. Gilleade, K. M., & Dix, A. (2004). Using frustration in the design of adaptive videogames. In Proceedings of ACE 2004 (pp. 228–232). New York: ACM. Gilleade, K. M., Dix, A., & Allanson, J. (2005). Affective videogames and modes of affective gaming: Assist me, challenge me, emote me. In Proceedings of DiGRA 2005 Conference: Changing Views: Worlds in Play. Retrieved January 1, 2010, from http://www.digra.org/dl/db/06278.55257.pdf. Giordano, B. (2001). Preliminary observations on materials recovering from real impact sounds: Phenomenology of sound events. In Polotti, P., Papetti, S., Rocchesso, D., & Delle, S. (Eds.), The sounding object (Sob project) (p. 24). Verona: University of Verona. Gitaroo man. [Video game], (2001). Koei/iNiS (Developer) (PlayStation 2). Glass, D. C., & Singer, J. E. (1972). Urban stress. New York: Academic. God of War 2. (2007). SCE Studios Santa Monica. Sony Computer Entertainment. Goffman, E. (1959). The presentation of self in everyday life (1st ed.). New York: Anchor. Goldstein, E. B. (2002). Wahrnehmungspsychologie (2nd ed.). Berlin: Spektrum Akadem. Verlag.
Gordon, C., Webb, D. L., & Wolpert, S. (1992). Isospectral plane domains and surfaces via Riemannian orbifolds. Inventiones Mathematicae, 110, 1–22. doi:10.1007/BF01231320 Gouk, P. (2004). Raising spirits and restoring souls: Early modern medical explanations for music’s effects. In Erlmann, V. (Ed.), Hearing cultures: Essays on sound, listening and modernity (pp. 87–105). Oxford: Berg. Gouskos, C. (2006). The depths of the Uncanny Valley. Gamespot. Retrieved April 7, 2009, from http://uk.gamespot.com/features/6153667/index.html. Goyal, V. (2006). Pro Java ME MMAPI: Mobile media API for Java Micro Edition. City, CA: Apress Press Inc. JSR-234 Group. (2005). Advanced multimedia supplements API for Java™ 2 Micro Edition. Nokia Corporation. Grand Theft Auto: San Andreas. (2004). Rockstar North. Rockstar Games. Grant, W., Wassenhove, V., & Poeppel, D. (2004). Detection of auditory (cross-spectral) and auditory-visual (crossmodal) synchrony. Speech Communication, 44(1/4), 43–53. doi:10.1016/j.specom.2004.06.004 Gray, J. A. (1971). The psychology of fear and stress. New York: McGraw-Hill. Green, R. D., MacDorman, K. F., Ho, C. C., & Vasudevan, S. K. (2008). Sensitivity to the proportions of faces that vary in human likeness. Computers in Human Behavior, 24(5), 2456–2474. doi:10.1016/j.chb.2008.02.019 Greeno, J. G., Collins, A. M., & Resnick, L. B. (1996). Cognition and learning. In Berliner, D., & Calfee, R. (Eds.), Handbook of educational psychology (pp. 15–46). New York: Simon & Schuster Macmillan. Grey Matter [INDIE arcade game]. (2008). McMillen, E., Refenes, T., & Baranowsky, D. (Developers). San Francisco, CA: Kongregate. Grey, J. M. (1975). Exploration of musical timbre. Stanford University Dept. Music Technology Report, STAN-M-2.
Gorbman, C. (1987). Unheard melodies? Narrative film music. Bloomington: Indiana University Press.
Griffiths, M. D. (1990). The cognitive psychology of gambling. Journal of Gambling Studies, 6(1), 31–42. doi:10.1007/BF01015747 Griffiths, M., & Parke, J. (2005). The psychology of music in gambling environments: An observational research note. Journal of Gambling Issues, 13. Retrieved July 15, 2009, from http://www.camh.net/egambling/issue13/jgi_13_griffiths_2.html. Grigg, C., Whitmore, G., Azzarello, P., Pilon, J., Noel, F., Snyder, S., et al. (2006, May). Group report: Providing a high level of mixing aesthetics in interactive audio and games. Paper developed at the Annual Interactive Music Conference Project Bar-B-Q. Grimshaw, M., & Schott, G. (2007). Situating gaming as a sonic experience: The acoustic ecology of first person shooters. In Proceedings of DiGRA 2007: Situated Play. Grimshaw, M. (2008). The acoustic ecology of the first-person shooter: The player experience of sound in the first-person shooter computer game. Saarbrücken, Germany: VDM Verlag. Grimshaw, M. (2008). Sound and immersion in the first-person shooter. International Journal of Intelligent Games & Simulation, 5(1), 2–8.
Grimshaw, M. (2009). The audio uncanny valley: Sound, fear and the horror game. In Proceedings of the 4th Conference on Interaction with Sound, Audio Mostly 2009 (pp. 21-26). Glasgow, UK. Grimshaw, M., & Schott, G. (2008). A conceptual framework for the analysis of first-person shooter audio and its potential use for game engines. International Journal of Computer Games Technology, 2008. Guitar hero 5. [Video game], (2009). RedOctane (Developer), Santa Monica, CA: Activision. Guitar hero II. [Video game], (2006). RedOctane (Developer), Santa Monica, CA: Activision. Guitar hero III. [Video game], (2007). RedOctane (Developer), Santa Monica, CA: Activision. Guitar hero world tour. [Video game], (2008). RedOctane (Developer), Santa Monica, CA: Activision. Guitar hero. (2005-). [Computer software]. Harmonix Music Systems (2005- 2007)/ Neversoft (2007-). Guitar hero. [Video game], (2005). RedOctane (Developer), New York: MTV Games.
Grimshaw, M. (2008). Per un’analisi comparata del suono nei videogiochi e nel cinema . In Bittanti, M. (Ed.), Schermi interattivi saggi critici su videogiochi e cinema (pp. 95–121). (Bittanti, M., Trans.). Roma: Meltemi.
Gullone, E., King, N., & Ollendick, T. (2000). The development and psychometric evaluation of the Fear Experiences Questionnaire: An attempt to disentangle the fear and anxiety constructs. Clinical Psychology & Psychotherapy, 7(1), 61–75. doi:10.1002/(SICI)10990879(200002)7:13.0.CO;2-P
Grimshaw, M. (2007). Sound and immersion in the first-person shooter. In Proceedings of The 11th International Computer Games Conference: AI, Animation, Mobile, Educational & Serious Games (CGAMES 2007).
Haas, E. C., & Edworthy, J. (1996). Designing urgency into auditory warnings using pitch, speed and loudness. Computing and Control Engineering Journal, 7, 193–198. doi:10.1049/cce:19960407
Grimshaw, M. (2007). The acoustic ecology of the first person shooter. Unpublished doctoral dissertation. University of Waikato, New Zealand.
Half Life 2. [Computer game]. (2008). Valve Corporation (Developer). Redwood City, CA: EA Games.
Grimshaw, M. (2007). The resonating spaces of first-person shooter games. In Proceedings of The 5th International Conference on Game Design and Technology. Retrieved January 1, 2010, from http://digitalcommons.bolton.ac.uk/gcct_conferencepr/4/.
Half-Life series. (1998-). Valve. Halloween. [Computer game]. (1983). Video Software Specialist (Developer). Los Angeles: Wizard Video Games.
Hansen, S. H., & Jensenius, A. R. (2006). The Drum Pants. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 60-63). Piteå, Sweden: Interactive Institute/Sonic Studio. Hanson, D. (2006). Exploring the aesthetic range for humanoid robots. In Proceedings of the ICCS/CogSci-2006 Long Symposium: Toward Social Mechanisms of Android Science, 16-20. Harmonix (2003). Amplitude [Computer game]. Harmonix (Developer), Sony. Harmonix (2006-2009). Guitar Hero series [Computer games]. Harmonix, Neversoft, Vicarious Visions, Budcat Creations, RedOctane (Developers), Activision. Harrigan, K. A. (2009). Slot machines: Pursuing responsible gaming practices for virtual reels and near misses. International Journal of Mental Health and Addiction, 7(1), 68–83. doi:10.1007/s11469-007-9139-8 Harrigan, K. A., & Dixon, M. (2009). PAR sheets, probabilities, and slot machine play: Implications for problem and non-problem gambling. Journal of Gambling Issues, 23, 81–110. doi:10.4309/jgi.2009.23.5 Harry Potter and the Chamber of Secrets. (2002). Eurocom. Electronic Arts. Harvey, A., & Samyn, M. (2006). Realtime art manifesto. Retrieved June 6, 2009, from http://tale-of-tales.com/ tales/RAM.html Hassanpour, A. (2009). Dubbing. The Museum of Broadcast Communications. Retrieved July 14, 2009, from, http://www.museum.tv/archives/etv/D/htmlD/dubbing/ dubbing.htm. Hassenzahl, M., & Roto, V. (2007). Being and doing: A perspective on User Experience and its measurement. Interfaces, 72, 10–12. Hassenzahl, M., & Tractinsky, N. (2006). User Experience—a research agenda [Editorial]. Behaviour & Information Technology, 25(2), 91–97. doi:10.1080/01449290500330331
Haunted house. [Computer game]. (1981). Atari (Developer). Sunnyvale: Atari. Haze. (2008). Free Radical Design. Hazlett, R. L. (2006). Measuring emotional valence during interactive experiences: Boys at video game play. In Proceedings of CHI’06 (pp. 1023–1026). New York: ACM. Healy, A. F., Proctor, R. W., & Weiner, I. B. (2004). Handbook of psychology: Vol. 4. Experimental psychology. Hoboken, NJ: Wiley. Heavenly sword. (2007). Sony. Hébert, S., Béland, R., & Dionne-Fournelle, O. (2005). Physiological stress response to video-game playing: The contribution of built-in music. Life Sciences, 76, 2371–2380. doi:10.1016/j.lfs.2004.11.011 Herber, N. (2006). The Composition-Instrument: Musical emergence and interaction. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 53-59). Piteå, Sweden: Interactive Institute/Sonic Studio Piteå. Hermann, T., & Hunt, A. (2005). Guest Editors’ Introduction: An Introduction to Interactive Sonification. IEEE MultiMedia, 12(2), 20–24. doi:10.1109/MMUL.2005.26 Hermann, T., & Ritter, H. (1999). Listen to your data: Model-based sonification for data analysis. In Advances in intelligent computing and multimedia systems (pp. 189–194). Baden-Baden. Hiller, L. A., & Isaacson, L. M. (1959). Experimental music: Composing with an electronic computer. New York: McGraw Hill. Hiller, L., & Ruiz, P. (1971). Synthesizing musical sounds by solving the wave equation for vibrating objects. Journal of the Audio Engineering Society. Hirokawa, E. (2004). Effects of music, listening, and relaxation instructions on arousal changes and the working memory task in older adults. Journal of Music Therapy, 41(2), 107–127.
Hirsch, A. R. (1995). Effects of ambient odors on slot-machine usage in a Las Vegas casino. Psychology and Marketing, 12(7), 585–594. doi:10.1002/mar.4220120703 Hitchcock, A. (1963). The birds. Universal Pictures. Hitchcock, A. (1956). The Man Who Knew Too Much [Motion picture]. Hollywood, CA: Paramount. Hitchcock, A. (Director) (1960). Psycho. Hollywood, CA: Paramount. Hitman. (2002). Io Interactive. Eidos Interactive. Ho, C. C., MacDorman, K., & Pramono, Z. A. D. (2008). Human emotion and the uncanny valley. A GLM, MDS, and ISOMAP analysis of robot video ratings. In Proceedings of the Third ACM/IEEE International Conference on Human-Robot Interaction, 169-176.
Hudlicka, E. (2008). Affective computing for game design. In Proceedings of the 4th International North American Conference on Intelligent Games and Simulation (GAMEON-NA). Montreal, Canada. Hug, D. (2011). New wine in new skins: Sketching the future of game sound design. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global. Hug, D. (2007). Game sound education at ZHdK: Between research laboratory and experimental education. In Proceedings of Audio Mostly 2007 - 2nd Conference on Interaction with Sound. Hug, D. (2008a). Towards a hermeneutics and typology of sound for interactive commodities. In Proceedings of the CHI 2008 Workshop on Sonic Interaction Design.
Hodgkinson, G. (2009). The seduction of realism. In Proceedings of ACM SIGGRAPH ASIA 2009 Educators Program (pp. 1-4). Yokohama, Japan: The Association for Computing Machinery.
Hug, D. (2008b). Genie in a bottle: Object-sound reconfigurations for interactive commodities. In Proceedings of Audiomostly 2008, 3rd Conference on Interaction With Sound.
Hoeger, L., & Huber, W. (2007). Ghastly multiplication: Fatal Frame II and the videogame Uncanny. In Proceedings of Situated Play, DiGRA 2007 Conference, Tokyo, Japan, 152-156.
Huizinga, J. (1955). Homo ludens: A study of the play element in culture. Boston: Beacon Press.
Hopson, J. (2001). Behavioral game design. Gamasutra. Retrieved October 23, 2009, from http://www.gamasutra.com/view/feature/3085/behavioral_game_design.php. Hörnel, D. (2000). Lernen musikalischer Strukturen und Stile mit neuronalen Netzen. Karlsruhe, Germany: Shaker. Hörnel, D., & Menzel, W. (1999). Learning musical structure and style with neural networks. Computer Music Journal, 22(4), 44–62. doi:10.2307/3680893 Howard, D. M., & Angus, J. (1996). Acoustics and psychoacoustics. Oxford: Focal Press. Howard, I. P. (1982). Human visual orientation. New York: Wiley.
Ihde, D. (1976). Listening and voice: A phenomenology of sound. Athens, OH: Ohio University Press. IJsselsteijn, W., Poels, K., & de Kort, Y. A. W. (2008). The Game Experience Questionnaire: Development of a self-report measure to assess player experiences of digital games. FUGA Deliverable D3.3. Eindhoven, The Netherlands: TU Eindhoven. Ion Storm Inc (Developer). (2004). Thief: Deadly Shadows [Computer game]. Eidos Interactive. ITU-R BT.1359-1. (1998). Relative timing of sound and vision for broadcasting. Question ITU-R, 35(11). Ivory, J. D., & Kalyanaraman, S. (2007). The effects of technological advancement and violent content in video games on players’ feelings of presence, involvement, physiological arousal, and aggression. The Journal of Communication, 57(3), 532–555. doi:10.1111/j.1460-2466.2007.00356.x
Iwai, T. (2005). Electroplankton [Computer game]. Indies Zero (Developer), Nintendo.
Jørgensen, K. (2009). A comprehensive study of sound in computer games. Lewiston, NY: Edwin Mellen Press.
Iwamiya, S. (1994). Interaction between auditory and visual processing when listening to music in an audio visual context. Psychomusicology, 13, 133–154.
Jørgensen, K. (2011). Time for new terminology? Diegetic and non-diegetic sounds in computer games revisited . In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Jackson, D. (2003). Sonic branding: An introduction. New York: Palgrave/Macmillan. doi:10.1057/9780230503267 Jackson, B. (2009). SFP: The magical world of “Spore”. In Mix Online. Retrieved May 20, 2009, from http://mixonline.com/post/features/sfp-magical-world-spore James, W. (1884). What is an emotion? Mind, 9(34), 188–205. doi:10.1093/mind/os-IX.34.188 Jansz, J. (2006). The emotional appeal of violent video games. Communication Theory, 15(3), 219–241. doi:10.1111/j.1468-2885.2005.tb00334.x
Jørgensen, A. (2004). Marrying HCI/Usability and computer games: A preliminary look. In Proceedings of the third Nordic conference on Human-computer interaction, NordiCHI ‘04 (pp. 393-396). Tampere, Finland. Jørgensen, K. (2006). On the functional aspects of computer game audio. In Audio Mostly: A Conference on Sound in Games.
Jauss, H. R. (1982). Toward an aesthetic of reception. Minneapolis, MN: University of Minnesota Press.
Jørgensen, K. (2007). ‘What are those grunts and growls over there?’ Computer game audio and player action. Unpublished doctoral dissertation, Copenhagen University, Denmark.
Jegers, K. (2009). Elaborating eight elements of fun: Supporting design of pervasive player enjoyment. ACM Computers in Entertainment, 7(2).
Jørgensen, K. (2007b). On transdiegetic sounds in computer games. Northern lights Vol. 5: Digital aesthetics and communication. Intellect Publications.
Jennett, C., Cox, A. L., Cairns, P., Dhoparee, S., Epps, A., & Tijs, T. (2008). Measuring and defining the experience of immersion in games. International Journal of Human-Computer Studies, 66, 641–661. doi:10.1016/j.ijhcs.2008.04.004
Jørgensen, K. (2008). Audio and Gameplay: An Analysis of PvP Battlegrounds in World of Warcraft. Gamestudies, 8(2).
Jennings, P. (2009). WMG: Professor Paul Jennings. Retrieved December 30, 2009, from http://www2.warwick.ac.uk/fac/sci/wmg/about/people/profiles/paj/. Jensenius, A. R. (2007). ACTION --SOUND: Developing methods and tools to study music-related body movement. Unpublished doctoral dissertation. University of Oslo, Department of Musicology. Jentsch, E. (1906). On the psychology of the Uncanny. Psychiat.-neurol. Wschr., 8(195), 219-21, 226-7. Johnstone, T. (1996). Emotional speech elicited using computer games. In Proceedings of Fourth International Conference on Spoken Language (ICSLP96).
Jot, J. M., & Chaigne, A. (1991). Digital delay networks for designing artificial reverberators. Paper presented at the AES 90th Convention. Preprint 3030. Jumisko-Pyykkö, S., Reiter, U., & Weigel, C. (2007). Produced quality is not perceived quality—A qualitative approach to overall audiovisual quality. In Proceedings of the 3DTV Conference. Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. The Behavioral and Brain Sciences, 31, 559–621. Juslin, P. N., & Sloboda, J. A. (Eds.). (2001). Music and emotion: Theory and research. Oxford: OUP. Juul, J. (2005). Half-real. Video games between real rules and fictional worlds. Cambridge, MA: MIT Press.
Compilation of References
Kallmann, H., Woog, A. P., & Westerkamp, H. (2007). The World Soundscape Project. The Canadian Encyclopedia. Retrieved September 31, 2009, from http://thecanadianencyclopedia.com/PrinterFriendly.cfm?Params=U1ARTU0003743
Kalman, R. E., & Bucy, R. S. (1961). New results in linear filtering and prediction problems. Journal of Basic Engineering, 83, 95–108.
Kanda, T., Hirano, T., Eaton, D., & Ishiguro, H. (2004). Interactive robots as social partners and peer tutors for children: A field trial. Human-Computer Interaction, 19(1), 61–84. doi:10.1207/s15327051hci1901&2_4
Kaplan, H. I., & Sadock, B. J. (1998). Synopsis of psychiatry. Baltimore, MD: Williams & Wilkins.
Károlyi, O. (1999). Introducing music. Location: Penguin.
Karplus, K., & Strong, A. (1983). Digital synthesis of plucked strings and drum timbres. Computer Music Journal, 7(4), 43–55. doi:10.2307/3680062
Kassier, R., Zielinski, S., & Rumsey, F. (2003). Computer games and multichannel audio quality part 2 - Evaluation of time-variant audio degradation under divided and undivided attention. AES 115th Convention. Preprint 5856.
Keller, P., & Stevens, C. (2004). Meaning from environmental sounds: Types of signal-referent relations and their effect on recognizing auditory icons. Journal of Experimental Psychology. Applied, 10(1). doi:10.1037/1076-898X.10.1.3
Kelly, G. A. (1955). The psychology of personal constructs. New York: Norton.
Kelly, J., & Lochbaum, C. (1962). Speech synthesis. In Proceedings of the Fourth International Congress on Acoustics, 4, 1-4. Retrieved from http://hear.ai.uiuc.edu/public/Kelly62.pdf
Kendall, N. (2009, September 12). Let us play: Games are the future for music. The Times: Playlist, p. 22.
Khronos Group. (2009). OpenSL ES specification. The Khronos Group.
King, D., Delfabbro, P., & Griffiths, M. (2009). Video game structural characteristics: A new psychological taxonomy. International Journal of Mental Health and Addiction, 8(1), 90–106. doi:10.1007/s11469-009-9206-4
Kirchner, W. K. (1958). Age differences in short-term retention of rapidly changing information. Journal of Experimental Psychology, 55(4), 352–358. doi:10.1037/h0043688
Kirnberger, J. P. (1767). Der allezeit fertige Polonaisen und Menuetten Komponist. Berlin, Germany: G.L. Winter.
Klein, D. J., König, P., & Körding, K. P. (2003). Sparse spectrotemporal coding of sounds. EURASIP Journal on Applied Signal Processing, 7, 659–667. doi:10.1155/S1110865703303051
Klevjer, R. (2007). What is the avatar? Fiction and embodiment in avatar-based singleplayer computer games. Unpublished doctoral dissertation. University of Bergen, Norway.
Klinger, R., & Rudolph, G. (2006). Evolutionary composition of music with learned melody evaluation. In N. Mastorakis & A. Cecchi (Eds.), Proceedings of the 5th WSEAS International Conference on Computational Intelligence, Man-Machine Systems and Cybernetics (pp. 234-239). Venice, Italy: World Scientific and Engineering Academy and Society.
Kojima Productions (Developer). (2008). Metal Gear Solid 4: Guns of the Patriots [Computer game]. Konami.
Konami (1998). Dance Dance Revolution. Konami, Disney, Keen, Nintendo.
Kramer, G., Walker, B., Bonebright, T., Cook, P., Flowers, J., Miner, N., et al. (1997). Sonification report: Status of the field and research agenda. Retrieved September 31, 2009, from http://www.icad.org/websiteV2.0/References/nsf.html
Kranes, D. (1995). Play grounds. Gambling: Philosophy and policy [Special Issue]. Journal of Gambling Studies, 11(1), 91–102. doi:10.1007/BF02283207
Kromand, D. (2008). Sound and the diegesis in survival-horror games. In Proceedings of Audiomostly 2008, 3rd Conference on Interaction With Sound.
Krzywinska, T. (2002). Hands-on horror. In King, G., & Krzywinska, T. (Eds.), ScreenPlay: Cinema/Videogames/Interfaces (pp. 206–223). London: Wallflower.
Kubelka, P. (1998). Talk on Unsere Afrika Reise. Presented at The School of Sound, London, England.
Kubrick, S. (1968). 2001: A space odyssey. Metro-Goldwyn-Mayer.
Kuikkaniemi, K., & Kosunen, I. (2007). Progressive system architecture for building emotionally adaptive games. In BRAINPLAY ’07: Playing with Your Brain Workshop at ACE (Advances in Computer Entertainment) 2007.
Kungel, R. (2004). Filmmusik für Filmemacher - Die richtige Musik zum besseren Film. Reil, Germany: Mediabook-Verlag.
Kunkler-Peck, A. J., & Turvey, M. A. (2000). Hearing shape. Journal of Experimental Psychology. Human Perception and Performance, 26(1), 279–294. doi:10.1037/0096-1523.26.1.279
Kusama, K. (Director). (2005). Aeon flux [Motion picture]. Hollywood, CA: Paramount.
Ladouceur, R., & Sévigny, S. (2005). Structural characteristics of video lotteries: Effects of a stopping device on illusion of control and gambling persistence. Journal of Gambling Studies, 21(2), 117–131. doi:10.1007/s10899-005-3028-5
Lakoff, G. (1987). Women, fire and dangerous things. Chicago: University of Chicago Press.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: University of Chicago Press.
Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh. New York: Basic Books.
Landragin, F., Bellalem, N., & Romary, L. (2001). Visual salience and perceptual grouping in multimodal interactivity. In Proceedings of International Workshop on Information Presentation and Natural Multimodal Dialogue IPNMD.
Lane, R. D., Nadel, L., Allen, J. J. B., & Kaszniak, A. W. (2002). The study of emotion from the perspective of cognitive neuroscience. In Lane, R. D., & Nadel, L. (Eds.), Cognitive neuroscience of emotion (Series in affective science) (pp. 3–11). Oxford: OUP.
Lang, P. J. (1995). The emotion probe. Studies of motivation and attention. The American Psychologist, 50, 372–385. doi:10.1037/0003-066X.50.5.372
Lang, P. J., Greenwald, M. K., Bradley, M. M., & Hamm, A. O. (1993). Looking at pictures: Affective, facial, visceral, and behavioral reactions. Psychophysiology, 30, 261–273. doi:10.1111/j.1469-8986.1993.tb03352.x
Lange, C. G. (1912). The mechanism of the emotions. In Rand, B. (Ed.), The classical psychologists (pp. 672–684). Boston: Houghton Mifflin.
Langer, E. J. (1975). The illusion of control. Journal of Personality and Social Psychology, 32, 311–328. doi:10.1037/0022-3514.32.2.311
Larsen, J. T., McGraw, A. P., & Cacioppo, J. T. (2001). Can people feel happy and sad at the same time? Journal of Personality and Social Psychology, 81(4), 684–696. doi:10.1037/0022-3514.81.4.684
Larsen, J. T., McGraw, A. P., Mellers, B. A., & Cacioppo, J. T. (2004). The agony of victory and thrill of defeat: Mixed emotional reactions to disappointing wins and relieving losses. Psychological Science, 15(5), 325–330. doi:10.1111/j.0956-7976.2004.00677.x
Larsen, J. T., Norris, C. J., & Cacioppo, J. T. (2003). Effects of positive and negative affect on electromyographic activity over zygomaticus major and corrugator supercilii. Psychophysiology, 40, 776–785. doi:10.1111/1469-8986.00078
Larsson, P., Västfjäll, D., & Kleiner, M. (2002). Better presence and performance in virtual environments by improved binaural sound rendering. In AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio.
Left 4 dead [Computer game]. (2008). Valve Corporation (Developer). Redwood City, CA: EA Games.
Lillian - A natural language library interface and library 2.0 mashup. (2006). Birmingham, UK: Daden Limited.
Larsson, P., Västfjäll, D., & Kleiner, M. (2003). On the quality of experience: A multi-modal approach to perceptual ego-motion and sensed presence in virtual environments. In Proceedings of First ISCA ITRW on Auditory Quality of Systems AQS-2003, 97-100.
Legend of Zelda. (1987). Nintendo.
Lastra, J. (2000). Sound technology and the American cinema: Perception, representation, modernity. New York: Columbia University Press.
Laurel, B. (1991). Computers as theatre. Boston, MA: Addison-Wesley.
Lavie, N. (2001). Capacity limits in selective attention: Behavioral evidence and implications for neural activity. In Braun, J., & Koch, C. (Eds.), Visual attention and cortical circuits (pp. 49–60). Cambridge, MA: MIT Press.
Lecanuet, J. P. (1996). Prenatal auditory experience. In Deliège, I., & Sloboda, J. (Eds.), Musical beginnings: Origins and development of musical competence (pp. 3–36). Oxford, UK: Oxford University Press.
Ledoux, J. (1998). The emotional brain: The mysterious underpinnings of emotional life. London, UK: Phoenix.
Lee, K. M., Jeong, E. J., Park, N., & Ryu, S. (2007). Effects of networked interactivity in educational games: Mediating effects of social presence. In Proceedings of PRESENCE2007, 10th Annual International Workshop on Presence, 179-186.
Lee, K. M., Jin, S. A., Park, N., & Kang, S. (2005). Effects of narrative on feelings of presence in computer/video games. In Proceedings of the Annual Conference of the International Communication Association (ICA).
Leeds, J. (2001). The power of sound. Rochester, VT: Inner Traditions.
Lefebvre, H. (2004). Rhythmanalysis: Space, time and everyday life. Continuum.
Lego rock band [Video game]. (2009). Harmonix (Developer). New York: MTV Games.
Li, S., & Knudsen, J. (2005). Beginning J2ME™ platform: From novice to professional (3rd ed.). City, CA: Apress Press Inc.
Liljedahl, M., Papworth, N., & Lindberg, S. (2007). Beowulf: An audio mostly game. Proceedings of the International Conference on Advances in Computer Entertainment Technology, 2007, 200–203.
Liljedahl, M. (2011). Sound for fantasy and freedom. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Liljedahl, M., Lindberg, S., & Berg, J. (2005). Digiwall: An interactive climbing wall. Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, 2005, 225-228.
Lima, E. (2005). The devil’s in the details: A look at Doom3’s antimusic. In Music4games. Retrieved June 12, 2009, from http://www.music4games.net/Features_Display.aspx?id=70
Lincoln, Y. S., & Guba, E. D. (1985). Naturalistic inquiry. Thousand Oaks, CA: Sage Publications, Inc.
Lissa, Z. (1965). Ästhetik der Filmmusik. Leipzig, Germany: Henschel.
Little BigPlanet. (2008). Sony Computer Entertainment.
Livingstone, C., Woolley, R., Zazryn, T., Bakacs, L., & Shami, R. (2008). The relevance and role of gaming machine games and game features on the play of problem gamblers. Adelaide: Independent Gambling Authority of South Australia.
Livingstone, S. R. (2008). Changing musical emotion through score and performance with a compositional rule system. Unpublished doctoral dissertation. The University of Queensland, Brisbane, Australia.
Lykken, D. T., & Venables, P. H. (1971). Direct measurement of skin conductance: A proposal for standardization. Psychophysiology, 8(5), 656–672. doi:10.1111/j.1469-8986.1971.tb00501.x
Livingstone, S. R., & Brown, A. R. (2005). Dynamic response: Real-time adaptation for music emotion. In Proceedings of the Second Australasian Conference on Interactive Entertainment.
Lynch, D. (1977). Eraserhead. American Film Institute.
LoBrutto, V. (1994). Sound-on-film: Interviews with creators of film sound. Westport, CT: Praeger.
Loftus, G. R., & Loftus, E. F. (1983). Mind at play. New York: Basic Books.
Logan, B. (2002). Content-based playlist generation: Exploratory experiments. In ISMIR2002, 3rd International Conference on Musical Information (ISMIR).
Loki & Creative. (2009). OpenAL (1.1). Loki Software; Creative Technology.
Lombard, M., & Ditton, T. (1997). At the heart of it all: The concept of presence. Journal of Computer-Mediated Communication, 3(2).
Lorenz, E. (1993). The essence of chaos. Seattle, WA: University of Washington Press. doi:10.4324/9780203214589
Löthe, M. (2003). Ein wissensbasiertes Verfahren zur Komposition von frühklassischen Menuetten. Unpublished doctoral dissertation. University of Stuttgart, Germany.
Love. (forthcoming). Eskil Steenberg.
Lucas, G. (1971). THX 1138. Warner Bros. Pictures.
Lucas, G. (Director). (1977). Star Wars [Motion picture]. Los Angeles, CA: 20th Century Fox.
LucasArts. (1997). Monkey Island 3: The Curse of Monkey Island. LucasArts.
Lumbreras, M., & Sánchez, J. (1999). Interactive 3D sound hyperstories for blind children. In Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit (pp. 318-325). Pittsburgh, PA: ACM.
Lynch, D. (2003). Action and reaction. In Sider, L. (Ed.), Soundscape: The School of Sound lectures 1998-2001 (pp. 49–53). London: Wallflower Press.
Lynch, D. (1990-1991). Twin Peaks. Lynch/Frost Productions.
MacDorman, K. F., Green, R. D., Ho, C. C., & Koch, C. T. (2009). Too real for comfort? Uncanny responses to computer generated faces. Computers in Human Behavior, 25, 695–710. doi:10.1016/j.chb.2008.12.026
MacDorman, K. F., & Ishiguro, H. (2006). The uncanny advantage of using androids in cognitive and social science research. Interaction Studies: Social Behaviour and Communication in Biological and Artificial Systems, 7(3), 297–337. doi:10.1075/is.7.3.03mac
MacDorman, K. F. (2006). Subjective ratings of robot video clips for human likeness, familiarity, and eeriness: An exploration of the Uncanny Valley. ICCS/CogSci-2006 Long Symposium: Toward Social Mechanisms of Android Science.
MacKenzie, I. S., & Ware, C. (1993). Lag as a determinant of human performance in interactive systems. In Proceedings of the ACM Conference on Human Factors in Computing Systems – INTERCHI’93, 488-493.
MacLaran, A. (2003). Making space: Property development and urban planning. London: Hodder Arnold.
Mahlke, S. (2007). Marc Hassenzahl on user experience. HOT Topics, 6(2). Retrieved September 31, 2009, from http://hot.carleton.ca/hot-topics/articles/hassenzahl-onuser-experience/
Mahlke, S., & Thüring, M. (2007). Studying antecedents of emotional experiences in interactive contexts. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 915-918). San Jose, CA: ACM Press.
Matsui, D., Minato, T., MacDorman, K. F., & Ishiguro, H. (2005). Generating natural motion in an android by mapping human motion. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 1089-1096.
Mandryk, R. L., & Atkins, M. S. (2007). A fuzzy physiological approach for continuously modeling emotion during interaction with play environments. International Journal of Human-Computer Studies, 65(4), 329–347. doi:10.1016/j.ijhcs.2006.11.011
Mattila, A. S., & Wirtz, J. (2001). Congruency of scent and music as a driver of in-store evaluations and behavior. Journal of Retailing, 77, 273–289. doi:10.1016/S0022-4359(01)00042-2
Mandryk, R. L. (2008). Physiological measures for game evaluation. In Isbister, K., & Schaffer, N. (Eds.), Game usability: Advice from the experts for advancing the player experience (pp. 207–235). Burlington, MA: Elsevier.
Manning, P. (1992). Erving Goffman and modern sociology. Stanford, CA: Stanford University Press.
Manovich, L. (2001). The language of new media. Cambridge, MA: MIT Press.
Manz, J., & Winter, J. (Eds.). (1976). Baukastensätze zu Weisen des Evangelischen Kirchengesangbuches. Berlin: Evangelische Verlagsanstalt.
Bear, M. F., Connors, B. W., & Paradiso, M. A. (2007). Neuroscience: Exploring the brain (3rd ed.). City, ST/Country: Lippincott Williams & Wilkins.
Marks, A., & Novak, J. (2009). Game development essentials: Game audio development. Florence, KY: Delmar Cengage Learning.
Marks, A. (2001). The complete guide to game audio. Lawrence, KS: CMP Books.
Marks, A. (2009). The complete guide to game audio: For composers, musicians, sound designers, game developers (2nd ed.). Location: Elsevier Press.
Marmurek, H. H. C., Finlay, K., Kanetkar, V., & Londerville, J. (2007). The influence of music on estimates of at-risk gambling intentions: An analysis by casino design. International Gambling Studies, 7(1), 113–122. doi:10.1080/14459790601158002
Maturana, H. R., & Varela, F. G. (1980). Autopoiesis: The organization of the living. In Maturana, H. R., & Varela, F. G. (Eds.), Autopoiesis and Cognition. Dordrecht, Netherlands: Reidel.
Vannoni, M., & Straulino, S. (2007). Low-cost accelerometers for physics experiments. European Journal of Physics, 28, 781–787. doi:10.1088/0143-0807/28/5/001
Max Payne. (2001). Rockstar Games.
May, R. (1977). The meaning of anxiety (revised ed.). New York: Norton.
Mazzola, G., Göller, S., & Müller, S. (2002). The topos of music: Geometric logic of concepts, theory, and performance. Zurich: Birkhäuser Verlag.
McAdams, S. E., & Bigand, E. (Eds.). (1992). Thinking in sound: The cognitive psychology of human audition. New York: Clarendon Press. Oxford: University Press.
McCraty, R., Barrios-Choplin, B., Atkinson, M., & Tomasino, D. (1998). The effects of different types of music on mood, tension and mental clarity. Alternative Therapies in Health and Medicine, 4, 75–84.
McCuskey, M. (2003). Beginning game audio programming. Boston, MA: Premier Press.
McDonald, G. (2008). A brief timeline of video game music. Retrieved July 8, 2009, from http://www.gamespot.com/gamespot/features/video/vg_music/.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264(5568), 746–748. doi:10.1038/264746a0
McMahan, A. (2003). Immersion, engagement, and presence: A new method for analyzing 3-D video games. In Wolf, M. J. P., & Perron, B. (Eds.), The video game theory reader (pp. 67–87). New York: Routledge.
Meyer, J. (2009). Acoustics and the performance of music: Manual for acousticians, audio engineers, musicians, architects and musical instrument makers (5th ed.). New York: Springer.
McTiernan, J. (1987). Predator. Amercent Films.
Microsoft. (2009). DirectX 11 [Computer software]. Microsoft Corporation.
Meehan, M., Razzaque, S., Whitton, M. C., & Brooks, F. P., Jr. (2003). Effect of latency on presence in stressful virtual environments. In Proceedings of IEEE Virtual Reality, 141-148.
Mega Man. (1993). Capcom. Capcom Entertainment.
Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological connectivity of the mesolimbic system. NeuroImage, 28(1), 175–184. doi:10.1016/j.neuroimage.2005.05.053
Menzies, D. (2002). Scene management for modelled audio objects in interactive worlds. In Nakatsu & H. Kawahara (Eds.), Proceedings of the 8th International Conference on Auditory Display. Kyoto, Japan: ATR.
Menzies, D. (2007). Physical audio for virtual environments, Phya in review. In W. L. Martens (Ed.), Proceedings of the 13th International Conference on Auditory Display (pp. 197-202). Montreal, Canada: McGill University.
Menzies, D. (2008). Virtual intimacy: Phya as an instrument. In Proceedings of the 8th International Conference on New Interfaces for Musical Expression NIME08. Retrieved from http://www.zenprobe.com/dylan/pubs/menzies08_virtualIntimacy.pdf
Menzies, D. (2009). Phya and VFoley, physically motivated audio for virtual environments. In 35th AES Conference on Audio for Games. Retrieved from http://www.aes.org/e-lib/browse.cfm?elib=15171
Metal Gear Solid. (1998). Konami Japan. Konami Computer Entertainment.
Metz, C. (1980/1985). Aural objects. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. Columbia: Columbia University Press.
Miller, D. J., & Robertson, D. P. (2009). Using a games console in the primary classroom: Effects of ‘Brain Training’ programme on computation and self-esteem. British Journal of Educational Technology, 41(2), 242–255. doi:10.1111/j.1467-8535.2008.00918.x
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Originally published in The Psychological Review (1956), 63, 81-97. (Reproduced, with the author’s permission, by Stephen Malinowski). Retrieved March 10, 2009, from http://www.musanim.com/miller1956/
Minato, T., Shimada, M., Ishiguro, H., & Itakura, S. (2004). Development of an android robot for studying human-robot interaction. In R. Orchard, C. Yang & M. Ali (Eds.), Innovations in applied artificial intelligence, 424-434.
Miranda, E. R., & Biles, J. A. (Eds.). (2007). Evolutionary computer music (1st ed.). USA: Springer. doi:10.1007/978-1-84628-600-1
Miranda, E. R. (2002). Towards the cutting edge: AI, supercomputing and evolutionary systems. Computer Sound Design, 157-192. Elsevier.
Moeck, T., Bonneel, N., Tsingos, N., Drettakis, G., Viaud-Delmon, I., & Alloza, D. (2007). Progressive perceptual audio rendering of complex scenes. In Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games (ACM SIGGRAPH), 189-196.
Moffat, D. (1980). Personality parameters and programs. In Trappl, R., & Petta, P. (Eds.), Creating personalities for synthetic actors (pp. 120–165). Berlin: Springer.
Moore, B. C. J. (Ed.). (1995). Hearing: Handbook of perception and cognition (2nd ed.). New York: Academic Press.
Moore, B. C. J. (2003). An introduction to the psychology of hearing (5th ed.). New York: Academic Press.
Morgan, S. (2009). Dynamic game audio ambience: bringing Prototype’s New York City to life. Gamasutra. Retrieved May 8, 2009, from http://www.gamasutra.com/view/feature/4043/
Mori, M. (1970/2005). The Uncanny Valley (K. F. MacDorman & T. Minato, Trans.). Energy, 7(4), 33–35.
Moss, W., & Yeh, H. (2010). Automatic sound synthesis from fluid simulation. ACM Trans. on Graphics (SIGGRAPH 2010).
Mozart, W. A. (1787). Musikalisches Würfelspiel: Anleitung so viel Walzer oder Schleifer mit zwei Würfeln zu componieren ohne musikalisch zu seyn noch von der Composition etwas zu verstehen. Köchel Catalog of Mozart’s Work KV1 Appendix 294d or KV6 516f.
Mr. Do! (1983). CBS Electronics.
Mullan, E. (2011). Physical modelling for sound synthesis. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Murphy, D., & Neff, F. (2011). Spatial sound for computer games and virtual reality. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Murphy, D. (1999). Spatial sound description in virtual environments. In Proceedings of the Cambridge Music Processing Colloquium.
Murphy, D., & Pitt, I. (2001). Spatial sound enhancing virtual storytelling. In Proceedings of the International Conference ICVS, Virtual Storytelling Using Virtual Reality Technologies for Storytelling (pp. 20-29). Berlin: Springer.
Murphy, D., & Rumsey, F. (2001). A scalable spatial sound rendering system. In Proceedings of the 110th AES Convention.
Murray, J. (1997). Hamlet on the holodeck: The future of narrative in cyberspace. Cambridge, MA: MIT Press.
Muzak Corporation. (n.d.). Why Muzak. Retrieved October 5, 2009, from http://music.muzak.com/why_muzak.
Myst. (1993). Brøderbund.
Mullan, E. (2009). Driving sound synthesis from a physics engine. In Charlotte Kobert (Ed.), Proceedings of the IEEE Games Innovation Conference 2009 (pp. 256-264). New York: IEEE.
Nacke, L. E., Grimshaw, M. N., & Lindley, C. A. (2010). More than a feeling: Measurement of sonic user experience and psychophysiology in a first-person shooter. Interacting with Computers, 22(5), 336–343. doi:10.1016/j.intcom.2010.04.005
Murch, W. (1995). Sound design: The dancing shadow. In Boorman, J., Luddy, T., Thomson, D., & Donohue, W. (Eds.), Projections 4: Film-makers on film-making (pp. 237–251). London: Faber and Faber.
Nacke, L., & Grimshaw, M. (2011). Player-game interaction through affective sound. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Murch, W. (1998). Dense clarity – Clear density. Retrieved March 10, 2009, from http://www.ps1.org/cut/volume/murch.html
Nacke, L. E. (2009). Affective ludology: Scientific measurement of user experience in interactive entertainment. Unpublished doctoral dissertation. Blekinge Institute of Technology, Karlskrona, Sweden. Retrieved January 1, 2010, from http://affectiveludology.acagamic.com.
Murphy, D. (1999). A review of spatial sound in the Java 3D API specification. Institute of Sound Recording, University of Surrey.
Nacke, L., & Lindley, C. A. (2008). Flow and immersion in first-person shooters: Measuring the player’s gameplay experience. In Proceedings of the 2008 Conference on Future Play: Research, Play, Share (pp. 81-88). New York: ACM.
Nacke, L., Lindley, C., & Stellmach, S. (2008). Log who’s playing: Psychophysiological game analysis made easy through event logging. In P. Markopoulos, B. Ruyter, W. IJsselsteijn, & D. Rowland (Eds.), Proceedings of Fun and Games, Second International Conference (pp. 150-157). Berlin: Springer.
Nakamura, J., & Csíkszentmihályi, M. (2002). The concept of flow. In Snyder, C. R., & Lopez, S. J. (Eds.), Handbook of positive psychology (pp. 89–105). New York: Oxford University Press.
Namco (2003). Donkey Konga [Computer game]. Namco (Developer), Nintendo.
Nordahl, R. (2005). Self-induced footsteps sounds in virtual reality: Latency, recognition, quality and presence. In Proceedings of PRESENCE 2005, 8th Annual International Workshop on Presence, 353-354.
Norman, D. (2004). Emotional design: Why we love (or hate) everyday things. New York: Basic Books.
Norman, D. (2002). Emotion & design: attractive things work better. interactions, 9(4), 36-42.
O’Brien, J. F., Cook, P. R., & Essl, G. (2001). Synthesizing sounds from physically based motion. In P. Lynn (Ed.), Proceedings of SIGGRAPH ’01: The 28th annual conference on Computer graphics and interactive techniques (pp. 529-536). New York: ACM.
O’Callaghan, C. (2009 Summer). Auditory perception. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Retrieved January 24, 2010, from http://plato.stanford.edu/archives/sum2009/entries/perception-auditory/.
NanaOn-Sha (1996). PaRappa the Rapper [Computer game]. NanaOn-Sha (Developer), Sony.
Odin, R. (2000). De la fiction. Bruxelles: De Boeck.
NanaOn-Sha (1999). Vib-Ribbon [Computer game]. NanaOn-Sha (Developer), Sony.
Oguro, C. (2009). The greatest Easter eggs in gaming. Gamespot. Retrieved October 5, 2009, from http://www.gamespot.com/features/6131572/index.html.
Napolitano, J. (2008). Dead Space sound design: In space no one can hear intern screams. They are dead. (Interview). Original Sound Version. Retrieved from http://www.originalsoundversion.com/?p=693.
Naughty Dog (Developer). (2007). Uncharted: Drake’s Fortune [Computer game]. Sony Computer Entertainment.
Neale, S. (2000). Genre and Hollywood. New York: Routledge.
Neitzel, B. (2000). Gespielte Geschichten. Struktur- und prozessanalytische Untersuchungen der Narrativität von Videospielen. Unpublished doctoral dissertation, University of Weimar, Germany.
Oink! (1983). Activision.
Nettle, D. (2006). Happiness: The science behind your smile. Oxford: OUP.
Öhman, A., Flykt, A., & Esteves, F. (2001). Emotion drives attention: Detecting the snake in the grass. Journal of Experimental Psychology. General, 130(3), 466–478. doi:10.1037/0096-3445.130.3.466
Ong, W. (1982/1990). Orality and literacy: The technologizing of the word (L. Fyhr, G.D. Hansson & L. Perme Swedish Trans.). Göteborg, Sweden: Anthropos.
Otani, M., & Ise, S. (2003). A fast calculation method of the head-related transfer functions for multiple source points based on the boundary element method. Acoustical Science and Technology, 24(5), 259–266. doi:10.1250/ast.24.259
Owen, D. (2006, April 10). The soundtrack of your life: Muzak in the realm of retail theatre. The New Yorker. Retrieved October 5, 2009, from http://www.newyorker.com/archive/2006/04/10/060410fa_fact.
Paavola, M. K. E., & Page, J. (2005). 3D audio for mobile devices via Java. In Proceedings of the AES 118th Convention.
Peck, N. (2001). Beyond the library: Applying film postproduction techniques to game sound design. In Proceedings of Game Developers Conference. San Jose, CA.
Pachet, F., & Roy, P. (2001). Musical harmonization with constraints: A survey. Constraints Journal.
Peck, N. (2007, September). Unpublished Presentation. CoFesta/TGS, Tokyo, Japan.
Pac-Man. (1980). Namco.
Pedersini, F., Sarti, A., & Tubaro, S. (2000). Object-based sound synthesis for virtual environments using musical acoustics. IEEE Signal Processing Magazine, 17(6), 37–51. doi:10.1109/79.888863
Pai, D. K., Doel, K. d., James, D. L., Lang, J., Lloyd, J. E., Richmond, J. L., & Yau, S. H. (2001). Scanning physical interaction behaviour of 3D objects. In P. Lynn (Ed.), Proceedings of SIGGRAPH ’01: The 28th annual conference on Computer graphics and interactive techniques (pp. 87-96). New York: ACM.
Perron, B. (2006). Silent hill: Il motore del terrore. Milan: Costa & Nolan.
Panksepp, J. (2004). Affective neuroscience: the foundations of human and animal emotions. Oxford: Oxford University Press.
Perron, B. (2004). Sign of a threat: The effects of warning systems in survival horror games. In Proceedings of the Fourth International COSIGN (Computational Semiotics for Games and New Media) 2004 Conference.
Papadopoulos, G., & Wiggins, G. (1999). AI methods for algorithmic composition: A survey, a critical view and future prospects. In AISB Symposium on Musical Creativity. Edinburgh, Scotland.
Perron, B. (2005a). A cognitive psychological approach to gameplay emotions. In Proceedings of the Second International DiGRA (Digital Games Research Association) 2005 Conference.
PaRappa the rapper. [Video game], (1996). Sony Computer Entertainment.
Perron, B. (2005b). Coming to play at frightening yourself: Welcome to the world of horror video games. In Proceedings of the Aesthetics of Play conference.
Parke, J., & Griffiths, M. (2006). The psychology of the fruit machine: The role of structural characteristics (Revisited). International Journal of Mental Health and Addiction, 4, 151–179. doi:10.1007/s11469-006-9014-z
Parker, J. R., & Heerema, J. (2008). Audio Interaction in Computer Mediated Games. International Journal of Computer Games Technology, 2008, 1–8. doi:10.1155/2008/178923
Parker, P. (2003). Filling the gaps. In Sider, L. (Ed.), Soundscape: The School of Sound lectures 1998-2001 (pp. 184–194). London: Wallflower Press.
Pashler, H. E. (1999). The psychology of attention. Cambridge, MA: MIT Press.
Paul, L., & Bridgett, R. (2006). Establishing an aesthetic in next generation sound design. Gamasutra. Retrieved May 25, 2009, from http://www.gamasutra.com/view/feature/2733/
Phase [Video game]. (2007). Harmonix Music Systems.
Phillips, N. (2009). From films to games, from analog to digital: Two revolutions in multi-media! Retrieved July 8, 2009, from http://www.filmsound.org/game-audio/film_game_parallels.htm.
Picard, R. W. (1997). Affective computing. Cambridge, MA: MIT Press.
Pillay, H. K. (2002). An investigation of cognitive processes engaged in by recreational computer game players: Implications for skills of the future. Journal of Research on Technology in Education, 34(3), 336–350.
Pitzen, L. J., & Rauscher, F. H. (1998, May). Choosing music, not style of music, reduces stress and improves task performance. Poster presented at the American Psychological Society, Washington, DC.
Planescape: Torment. (2005). Black Island Studios. Interplay.
Plantec, P. (2007). Crossing the Great Uncanny Valley. In Animation World Network. Retrieved August 21, 2010, from http://www.awn.com/articles/production/crossinggreat-uncanny-valley/page/1%2C1.
Plantec, P. (2008). Image Metrics attempts to leap the Uncanny Valley. In The Digital Eye. Retrieved April 6, 2009, from http://vfxworld.com/?atype=articles&id=3723&page=1.
Platt, J. C., Burges, C. J. C., Swenson, S., Weare, C., & Zheng, A. (2002). Learning a Gaussian process prior for automatically generating music playlists. Advances in Neural Information Processing Systems, 14, 1425–1432.
Plomp, R., & Mimpen, A. M. (1968). The ear as a frequency analyzer. The Journal of the Acoustical Society of America, 36, 1628–1636. doi:10.1121/1.1919256
Plutchik, R. (1984). Emotions: A general psychoevolutionary theory. Hillsdale, NJ: Erlbaum.
Plutchik, R. (2001). The nature of emotions. American Scientist, 89(4), 344–350.
Polaine, A. (2005). The flow principle in interactivity. In Proceedings of the Second Australasian Conference on Interactive Entertainment.
Pollack, I. (1952). The information of elementary auditory displays. The Journal of the Acoustical Society of America, 24, 745–749. doi:10.1121/1.1906969
Pong. (1972). Atari Inc. Posner, J., Russell, J. A., Gerber, A., Gorman, D., Colibazzi, T., & Yu, S. (2009). The neurophysiological bases of emotion: An fMRI study of the affective circumplex using emotion-denoting words. Human Brain Mapping, 30(3), 883–895. doi:10.1002/hbm.20553 Posner, J., Russell, J. A., & Peterson, B. S. (2005). The circumplex model of affect: An integrative approach to affective neuroscience, cognitive development, and psychopathology. Development and Psychopathology, 17, 715–734. doi:10.1017/S0954579405050340 Pozzati, G. (2009). Infinite suite: Computers and musical form. In G. Scavone, V. Verfaille & A. da Silva (Eds.), Proceedings of the International Computer Music Conference (ICMC) (pp. 319-322). Montreal, Canada: International Computer Music Association, McGill University. Prey. (2005). 2K Games/3D Realms. Primerose. (2009). Jason Rohrer. Prince, R. (1996). Tricks and techniques for sound effect design. CGDC. Retrieved October 10, 2008, from http:// www.gamasutra.com/features/sound_and_music/081997/ sound_effect.htm Productions, K. S. K. (n.d.). Cinematic & Muzak. Retrieved October 20, 2009, from http://www.kskproductions.nl/en/ services/cinematic-a-muzak. Prototype. (2009). Activision.
Pollack, I. (1953). The information of elementary auditory displays II. The Journal of the Acoustical Society of America, 25, 765–769. doi:10.1121/1.1907173
Przybylski, A. K., Ryan, R. M., & Rigby, S. C. (2009). The motivating role of violence in video games. Personality and Social Psychology Bulletin, 35(2), 243–259. doi:10.1177/0146167208327216
Pollick, F. E. (in press). In search of the Uncanny Valley. In Grammer, K., & Juett, A. (Eds.), Analog communication: Evolution, brain mechanisms, dynamics, simulation. Cambridge, MA: MIT Press.
Pudovkin, V. (1985). Asynchronism as a principle of sound film. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press. (Original work published 1929)
Polotti, P., Papetti, S., Rocchesso, D., & Delle, S. (Eds.). (2001). The sounding object (Sob project). Verona: University of Verona.
Pulkki, V. (2001). Spatial sound generation and perception by amplitude panning techniques. Unpublished doctoral dissertation. Helsinki University of Technology, Finland.
Pulman, A. (2007). Investigating the potential of Nintendo DS Lite handheld gaming consoles and Dr. Kawashima’s Brain Training software as a study support tool in numeracy and mental arithmetic. JISC TechDis HEAT Scheme Round 1 Project Reports. Retrieved June 6, 2009, from http://www.techdis.ac.uk/index.php?p=2_1_7_9.
Recommendation ITU-T P.911. (1998/1999). Subjective audiovisual quality assessment methods for multimedia applications. Geneva: International Telecommunication Union.
Quilitch, H. R., & Risley, T. R. (1973). The effects of play materials on social play. Journal of Applied Behavior Analysis, 6(4), 573–578. doi:10.1901/jaba.1973.6-573
Reeves, B., & Voelker, D. (1993). Effects of audio-video asynchrony on viewer’s memory, evaluation of content and detection ability. (Research Report prepared for Pixel Instruments, CA). Palo Alto, CA: Stanford University, Department of Communication.
Raghuvanshi, N., Lauterbach, C., Chandak, A., Manocha, D., & Lin, M. C. (2007). Real-time sound synthesis and propagation for games. Communications of the ACM, 50(7), 67–73. doi:10.1145/1272516.1272541
Reid, J., Geelhoed, E., Hull, R., Cater, K., & Clayton, B. (2005). Parallel worlds: Immersion in location-based experiences. In CHI ‘05 Extended Abstracts on Human Factors in Computing Systems.
Raghuvanshi, N., & Lin, M. C. (2006). Interactive sound synthesis for large scale environments. In Proceedings of the 2006 symposium on Interactive 3D graphics and games (pp. 101-108). New York: ACM.
Reiter, U. (2011). Perceived quality in game audio. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Rand, A. (1971). Art and cognition. The Romantic Manifesto. 78. Signet.
Reiter, U., & Jumisko-Pyykkö, S. (2007). Watch, press and catch—Impact of divided attention on requirements of audiovisual quality. In Jacko, J. (Ed.), Human-Computer Interaction, Part III, HCI 2007 (pp. 943–952). Berlin: Springer Verlag.
Rault, J. B., Emerit, M., Warusfel, O., & Jot, J. M. (1998). Audio rendering of virtual room acoustics and perceptual description of the auditory scene. TCI/SC29/WG11. Ravaja, N. (2004). Contributions of psychophysiology to media research: Review and recommendations. Media Psychology, 6(2), 193–235. doi:10.1207/s1532785xmep0602_4 Ravaja, N., Turpeinen, M., Saari, T., Puttonen, S., & Keltikangas-Järvinen, L. (2008). The psychophysiology of James Bond: Phasic emotional responses to violent video game events. Emotion (Washington, D.C.), 8(1), 114–120. doi:10.1037/1528-3542.8.1.114 Ravaja, N., Saari, T., Laarni, J., Kallinen, K., Salminen, M., Holopainen, J., & Järvinen, A. (2005). The psychophysiology of video gaming: Phasic emotional responses to game events. In Proceedings of DiGRA 2005 Conference: Changing Views - Worlds in Play.
Reiter, U. (2009). Bimodal audiovisual perception in interactive application systems of moderate complexity. Unpublished doctoral dissertation. TU Ilmenau, Germany. Reiter, U., & Weitzel, M. (2007). Influence of interaction on perceived quality in audiovisual applications: Evaluation of cross-modal influence. In Proceedings of 13th International Conference on Auditory Displays (ICAD). Resident evil 3: Nemesis. [Computer game]. (1999). Capcom (Developer). Sunnyvale: Capcom USA. Resident evil 4. [Computer game]. (2004). Capcom Production Studio 4 (Developer). Sunnyvale: Capcom USA. Resident evil 5. [Computer game]. (2009). Capcom Production Studio 4 (Developer). Sunnyvale: Capcom USA. Cardinal, S. (1994). Occurrences sonores et espace filmique. Unpublished master’s thesis. University of Montréal, Montréal.
Resident evil. [Computer game]. (1996). Capcom (Developer). Sunnyvale: Capcom USA. Resident evil. [Computer game]. (2002). Capcom (Developer). Sunnyvale: Capcom USA. Reynolds, G., Barry, D., Burke, T., & Coyle, E. (2007). Towards a personal automatic music playlist generation algorithm: The need for contextual information. In Proceedings of the Audio Mostly Conference on Interaction with Sound. Rez. [Video game], (2001). Sega (Developer, Dreamcast), Sony Computer Entertainment Europe (Developer, PlayStation 2). Rhodes, L. A., David, D. C., & Combs, A. L. (1988). Absorption and enjoyment of music. Perceptual and Motor Skills, 66, 737–738. Richards, J. (2008, August 18). Lifelike animation heralds new era for computer games. The Times Online. Retrieved April 7, 2009, from, http://technology.timesonline.co.uk/ tol/news/tech_and_web/article4557935.ece. Riessman, D. C. K. (1993). Narrative analysis (1st ed.). Los Angeles: Sage.
Röber, N., Kaminski, U., & Masuch, M. (2007). Ray acoustics using computer graphics technology. In Proceedings of the 10th International Conference on Digital Audio Effects (DAFx-07) (pp. 117-124). Bordeaux, France: LaBRI University Bordeaux. Rocchesso, D., Avanzini, A., Rath, M., Bresin, R., & Serafin, S. (2004). Contact Sounds for Continuous Feedback. In Proceedings of International Workshop on Interactive Sonification. Rock Band. (2008). Harmonix. MTV Games. Rock band. (2005-2007). Harmonix Music Systems. Roddenberry, G. (1966-1969). Star trek. Paramount Television. Roeber, N., Deutschmann, E. C., & Masuch, M. (2006). Authoring of 3D virtual auditory environments. In Proceedings of the First International AudioMostly Conference (pp. 15-21). Rohner, S. J., & Miller, R. (1980). Degrees of familiar and affective music and their effects on state anxiety. Journal of Music Therapy, 17, 2–15.
Ripken, J. (2009, October 19). Game synchronisation: A view from artist development. Paper presented at the Music and Creative Industries Conference 2009, Manchester, UK.
Roque, L. (2005). A sociotechnical conjecture about the context and development of multiplayer online game experiences. In Proceedings of DiGRA 2005 Conference: Changing Views – Worlds in Play. Vancouver, Canada.
Rivlin, G. (2004, May 9). The tug of the newfangled slot machines. New York Times. Retrieved July 15, 2009, from http://www.nytimes.com/2004/05/09/magazine/09SLOTS.html.
Roux-Girard, G. (2011). Listening to fear: A study of sound in horror computer games. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey, PA: IGI Global.
Roads, C. (1996). The computer music tutorial. Cambridge, MA: MIT Press.
Roux-Girard, G. (2009). Plunged alone into darkness: Evolution in the staging of fear in the Alone in the Dark series. In Perron, B. (Ed.), Horror video games: Essays on the fusion of fear and play (pp. 145–167). Jefferson, NC: McFarland.
Röber, N. (2008). Interacting with sound: Explorations beyond the frontiers of 3D virtual auditory environments. Munich, Germany: Dr. Hut. Röber, N., & Masuch, M. (2005). Leaving the screen: New perspectives in audio-only gaming. In Proceedings of 11th International Conference on Auditory Display (ICAD).
Ruiz, P. (1969). A technique for simulating the vibrations of strings with a digital computer. Unpublished master’s thesis. University of Illinois, Urbana, IL.
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178. doi:10.1037/h0077714 Russell, J. A. (2003). Core affect and the psychological construction of emotion. Psychological Review, 110(1), 145–172. doi:10.1037/0033-295X.110.1.145 Russolo, L. (1913). Russolo: The art of noises. Retrieved December 30, 2009, from http://120years.net/machines/ futurist/art_of_noise.html.
Schafer, R. M. (1977). The tuning of the world. Toronto: McClelland and Stewart. Schafer, R. M. (1977). The soundscape: Our sonic environment and the tuning of the world. New York: Destiny Books. Schafer, R. M. (1973). The music of the environment. Cultures, 1973(1). Schell, J. (2008). The art of game design: A book of lenses. London: Morgan Kaufmann.
Ryan, R., Rigby, C., & Przybylski, A. (2006). The motivational pull of video games: A self-determination theory approach. Motivation and Emotion, 30(4), 344–360. doi:10.1007/s11031-006-9051-8
Schlosberg, H. (1952). The description of facial expressions in terms of two dimensions. Journal of Experimental Psychology, 44(4), 229–237. doi:10.1037/h0055778
Sakaguchi, H. (Director). (2001). Final fantasy [Motion picture]. Los Angeles: Columbia.
Schmidt, A., & Winterhalter, C. (2004). User context aware delivery of e-learning material: Approach and architecture. Journal of Universal Computer Science, 10(1), 38–46.
Salen, K., & Zimmermann, E. (2004). Rules of play: Game design fundamentals. Cambridge, MA: MIT Press. Samorost 1. (2003). Amanita Design. Samorost 2. (2005). Amanita Design. Satoshi Yairi, Y. I., & Suzuki, Y. (2008). Individualization of Head-Related Transfer Functions based on subjective evaluation. In Proceedings of the 14th International Conference on Auditory Displays. Saunders, K., & Novak, J. (2006). Game development essentials: Game interface design. Stamford, CT: Cengage Learning. Scarface: The world Is yours. (2006). Vivendi. Schachter, S., & Singer, J. (1962). Cognitive, social, and physiological determinants of emotional state. Psychological Review, 69, 379–399. doi:10.1037/h0046234 Schachter, S. (1964). The interaction of cognitive and physiological determinants of emotional state . In Berkowitz, L. (Ed.), Advances in experimental social psychology (Vol. 1, pp. 49–80). New York: Academic Press. doi:10.1016/S0065-2601(08)60048-9 Schaeffer, P. (1966). Traité des objets musicaux. Paris: Seuil.
Schneider, E., Wang, Y., & Yang, S. (2007). Exploring the Uncanny Valley with Japanese video game characters. In Proceedings of Situated Play, DiGRA 2007 Conference, 546-549. Schottstaedt, W. (1989). Automatic counterpoint . In Mathews, M., & Pierce, J. (Eds.), Current directions in computer music research. Cambridge, MA: MIT Press. Schroeder, M. R. (1962). Natural sounding artificial reverberation. Journal of the Audio Engineering Society. Audio Engineering Society, 10(3), 219–223. Schroeder, M. R. (1970). Digital simulation of sound transmission in reverberant spaces (part 1). The Journal of the Acoustical Society of America, 47(2), 424–431. doi:10.1121/1.1911541 Schull, N. D. (2005). Digital gambling: The coincidence of desire and design. The Annals of the American Academy of Political and Social Science, 597, 65–81. doi:10.1177/0002716204270435 Scott, R. (1979). Alien. Twentieth Century Fox. Scott, T. (Director). (1986). Top Gun [Motion picture]. Hollywood, CA: Paramount Pictures.
Seah, M., & Cairns, P. (2008). From immersion to addiction in videogames. In [New York: ACM.]. Proceedings of BCS HCI, 2008, 55–63. Seeking Alpha, “The Video Game Industry: An $18 Billion Entertainment Juggernaut” August 05, 2008 http://seekingalpha.com/article/89124-the-video-game-industry-an18-billion-entertainment-juggernaut. Sega (2001). Rez [Computer game]. Sega. Seitter, W. (2007). Das Spektrum der menschlichen Schallproduktionen. In H. Schulze & C. Wulf (Eds.), Paragrana, Internationale Zeitschrift für Historische Anthropologie, 16(2), 191-205. Berlin: Akademie Verlag. Sek, A., & Moore, B. C. (1995). Frequency discrimination as a function of frequency, measured in several ways. The Journal of the Acoustical Society of America, 97(4), 2479–2486. doi:10.1121/1.411968 Sengers, P., Boehner, K., Mateas, M., & Gay, G. (2008). The disenchantment of affect. Personal and Ubiquitous Computing, 12(5), 347–358. doi:10.1007/s00779-0070161-4 Sengers, P., & Gaver, B. (2006). Staying open to interpretation: Engaging multiple meanings in design and evaluation. Proceedings of the 6th Conference on Designing Interactive Systems, 2006, 99-108. Sequeira, S. D. S., Specht, K., Hämäläinen, H., & Hugdahl, K. (2008). The effects of different intensity levels of background noise on dichotic listening to consonantvowel syllables. Scandinavian Journal of Psychology, 49(4), 305–310. doi:10.1111/j.1467-9450.2008.00664.x Serafin, S. (2004). Sound design to enhance presence in photorealistic virtual reality. In Proceedings of the 2004 International Conference on Auditory Display. Serquera, J., Miranda, E. R. (2010) CA sound synthesis with an extended version of the multi-type voter model. AES128 (8029) London, UK. Sevsay, E. (2005). Handbuch der Instrumentationspraxis (1st ed.). Kassel, Germany: Bärenreiter.
Seyama, J., & Nagayama, R. S. (2007). The uncanny valley: The effect of realism on the impression of artificial human faces. Presence (Cambridge, Mass.), 16(4), 337–351. doi:10.1162/pres.16.4.337 Shams, L., Kamitani, Y., & Shimojo, S. (2000). What you see is what you hear. Nature, 408, 788. doi:10.1038/35048669 Shams, L., Kamitani, Y., & Shimojo, S. (2002). Visual illusion induced by sound. Brain Research. Cognitive Brain Research, 14, 147–152. doi:10.1016/S09266410(02)00069-1 Sharpe, L. (2004). Patterns of autonomic arousal in imaginal situations of winning and losing in problem gambling. Journal of Gambling Studies, 20, 95–104. doi:10.1023/B:JOGS.0000016706.96540.43 Sheridan, T. B. (1994). Further Musings on the Psychophysics of Presence. Presence (Cambridge, Mass.), 5, 241–246. Shiffrin, R. M., & Grantham, D. W. (1974). Can attention be allocated to sensory modalities? Perception & Psychophysics, 15, 460–474. Shilling, R., Zyda, M., & Wardynski, E. C. (2002). Introducing emotion into military simulation and videogame design: America’s Army: Operations and VIRTE. In Conference GameOn 2002. Retrieved January 1, 2010, from http://gamepipe.usc.edu/~zyda/pubs/ShillingGameon2002.pdf. Shultz, P. (2008). Music theory in music games . In Collins, K. (Ed.), From Pac-Man to pop music: Interactive audio in games and new media (pp. 177–188). Hampshire, UK: Ashgate. Sider, L. (Ed.). (2003). Soundscape: The School of Sound lectures 1998-2001. London: Wallflower Press. Sierra (1993). Gabriel Knight: Sins of the Fathers [Computer game]. Sierra Entertainment. Silent hill 2. [Computer game]. (2001). KCET (Developer). Redwood City: Konami of America.
Silent hill 3. [Computer game]. (2003). KCET (Developer). Redwood City: Konami of America. Silent hill homecoming [Computer game]. (2008). Double Helix & Konami (Developer/Co-Developer). Tokyo, Japan: Konami. Silent hill series. (1999-). Konami. Silent hill. [Computer game]. (1999). KCEK (Developer). Redwood City: Konami of America. Sim city. (1999-2007). Maxis. Simmel, G. (1979). The metropolis and mental life. Retrieved February 1, 2010, from http://www.blackwellpublishing.com/content/BPL_Images/Content_store/ Sample_chapter/0631225137/Bridge.pdf. SimTunes. [Video game], (1996). Maxis (Developer). Singer, W., Engel, A. K., Kreiter, A. K., Munk, M. H. J., Neuenschwander, S., & Roelfsema, P. R. (1997). Neuronal assemblies: necessity, signature and detectability. Trends in Cognitive Sciences, 1(7), 252–261. doi:10.1016/ S1364-6613(97)01079-6
Smith, J. O. III. (1992). Physical modeling using digital waveguides. Computer Music Journal, 16(4), 74–91. doi:10.2307/3680470 Smith, B. R. (2004). Tuning into London c.1600 . In Bull, M., & Back, L. (Eds.), The auditory culture reader (1st ed., pp. 127–136). Oxford, UK: Berg. Sobchack, V., & Sobchack, T. (1980). An introduction to film. Boston, MA: Little Brown. Sonnenschein, D. (2001). Sound design: The expressive power of music, voice and sound effects in cinema. Studio City, CA: Michael Wiese Productions. Sotamaa, O. (2009). The player’s game: Towards understanding player production among computer game cultures. Unpublished doctoral dissertation. University of Tampere, Finland. Space invaders [Computer game]. (1978). Tokyo, Japan: Taito. Spadoni, R. (2000). Uncanny bodies. Berkeley: University of California Press.
SingStar. [Video game], (2004). Sony Computer Entertainment Europe (PlayStation 2 & 3).
Spence, C., Nicholls, M. E. R., & Driver, J. (2001). The cost of expecting events in the wrong sensory modality. Perception & Psychophysics, 63(2), 330–336.
Sjöström, V. (1921). The phantom chariot. Svensk Filmindustri.
Splinter cell series. (2002-). Ubisoft.
Skea, W. H. (1995). “Postmodern” Las Vegas and its effects on gambling. Journal of Gambling Studies, 11(2), 231–235. doi:10.1007/BF02107117 Slater, M. (2002). Presence and the sixth sense. Presence (Cambridge, Mass.), 11(4), 435–439. doi:10.1162/105474602760204327 Smith, C. A., & Morris, L. W. (1976). Effects of stimulative and sedative music on cognitive and emotional components of anxiety. Psychological Reports, 38, 1187–1193. Smith, B. R. (1999). The acoustic world of early modern England: Attending to the o-factor (1st ed.). Chicago: University Of Chicago Press.
Spore. (2008). Electronic Arts. Spyro the Dragon. (2008). Insomniac Games. Sony Computer Entertainment. Stanton, A. (2008). Wall-E. Pixar Animation Studios. Steckenfinger, A., & Ghazanfar, A. (2009). Monkey behavior falls into the uncanny valley. Proceedings of the National Academy of Sciences of the United States of America, 106(43), 18362–18366. doi:10.1073/pnas.0910063106 Stenzel, M. (2005). Automatische Arrangiertechniken für affektive Sound-Engines von Computerspielen. Unpublished diploma thesis. Otto-von-Guericke University, Department of Simulation and Graphics, Magdeburg, Germany.
Steuer, J. (1992). Defining virtual reality: Dimensions determining telepresence. The Journal of Communication, 42(4), 73–93. doi:10.1111/j.1460-2466.1992.tb00812.x Stigwood, R., & Badham, J. (Producers). (1977). Saturday night fever [Motion picture]. Hollywood, CA: Paramount. Stockburger, A. (2007). Listen to the iceberg: On the impact of sound in digital games . In von Borries, F., Walz, S. P., & Böttger, M. (Eds.), Space time play: Computer games, architecture and urbanism: The next level (pp. ##-##). Location: Birkhäuser Publishing. Stockburger, A. (2003). The game environment from an auditory perspective. In M. Copier & J. Raessens (Eds.), Proceedings of Level Up: Digital Games Research Conference. Stockmann, L. (2007). Designing an audio API for mobile platforms. Internship report. Magdeburg, Germany: Otto-von-Guericke University. Stockmann, L., Berndt, A., & Röber, N. (2008). A musical instrument based on interactive sonification techniques. In Proceedings of Audio Mostly 2008: 3rd Conference on Interaction with Sound (pp. 72-79). Piteå, Sweden: Interactive Institute/Sonic Studio Piteå. Subrahmanyan, N., & Lal, B. (1974). A textbook of sound. Delhi: University of Delhi. Sucker Punch Productions (Developer). (2009). Famous [Computer game]. Sony Computer Entertainment. Sullivan, D. B. (1992). Commentary and viewer perception of player hostility: Adding punch to televised sports. Journal of Broadcasting & Electronic Media, 35, 487–504. Sun Microsystems. (2010). Java ME API. Retrieved February 4, 2010, from http://java.sun.com/javame/ reference/apis.jsp. Super Mario Bros. NES (1985). Nintendo. Nintendo. Surman, D. (2007). Pleasure, spectacle and reward in Capcom’s Street Fighter series . In Krzywinska, T., & Atkins, B. (Eds.), Videogame, player, text (pp. 204–221). London: Wallflower.
Sweethome. [Computer game]. (1989). Capcom (Developer). Osaka: Capcom. Sweetser, P., & Wyeth, P. (2005). GameFlow: A model for evaluating player enjoyment in games. Computers in Entertainment (CIE), 3(3), 3. doi:10.1145/1077246.1077253 Sykes, J., & Brown, S. (2003). Affective gaming: Measuring emotion through the gamepad. In Proceedings of Conference on Human Factors in Computing Systems (CHI ‘03). Takala, T., & Hahn, J. (1992). Sound rendering. In Proceedings of SIGGRAPH ’92: The 19th annual conference on Computer graphics and interactive techniques, 26(2), 211-220. New York: ACM. Tamminen, S., Oulasvirta, A., Toiskallio, K., & Kankainen, A. (2004). Understanding mobile contexts. Personal and Ubiquitous Computing, 8(2), 135–143. doi:10.1007/s00779-004-0263-1 Tarantino, Q. (1994). Pulp fiction. Miramax. Tarkovsky, A. (1972). Solaris. Mosfilm. Tarkovsky, A. (1979). Stalker. Mosfilm. Tarkovsky, A. (1986). Sacrifice. Argos Films. Taube, H. K. (2004). Notes from the metalevel: Introduction to algorithmic music composition. London, UK: Taylor & Francis. Taylor, L. (2005). Toward a spatial practice in video games. Gamology. Retrieved from http://www.gamology.org/node/809. Tellegen, A., Watson, D., & Clark, A. L. (1999). On the dimensional and hierarchical structure of affect. Psychological Science, 10(4), 297–303. doi:10.1111/1467-9280.00157 The 7th guest [Computer game]. (1993). Trilobyte (Developer). London: Virgin Games.
Thayer, J. F., & Levenson, R. W. (1983). Effects of music on psychophysiological responses to a stressful film. Psychomusicology, 3(1), 44–52.
The adventures of Rocky and Bullwinkle [Computer game]. (1992). Radical Entertainment (Developer). Agoura Hills, CA: THQ. The Beatles: Rock band [Computer game]. (2009). Harmonix. Redwood City, CA: EA Games. The casting [Technology demonstration]. (2006). Quantic Dream (Developer). Foster City, CA: Sony Computer Entertainment, Inc. The Curious Team. (1999). Curious about space: Can you hear sounds in space? Ask an Astronomer. Retrieved September 31, 2009, from http://curious.astro.cornell.edu/question.php?number=8 The Elder Scrolls III: Morrowind. (2002). Bethesda Softworks. The Flintstones. (1991). The rescue of Dino & Hoppy [Computer game]. Vancouver, Canada: Taito Corporation. The Jetsons. (1992). Cogswell’s caper! [Computer game]. Vancouver, Canada: Taito Corporation. The legend of Zelda: A link to the past. (1992). Nintendo EAD (Developer). Kyoto, Japan: Nintendo. The legend of Zelda: Twilight princess. (2006). Nintendo. The path. (2009). Tale of Tales. The Sims series (2000-). Electronic Arts. Theremin, L. S. (1924). Method of and apparatus for the generation of sounds. U.S. Patent No. 73,529. Washington, DC: U.S. Patent and Trademark Office. Thibaud, J. (1998). The acoustic embodiment of social practice: Towards a praxiology of sound environment. In Karlsson, H. (Ed.), Proceedings of Stockholm, Hey Listen! (pp. 17–22). Stockholm: The Royal Swedish Academy of Music. Thief 3: Deadly shadows. (2004). Eidos. Thief: The dark project. (1998). Eidos. Thom, R. (1999). Designing a movie for sound. Retrieved July 7, 2009, from http://filmsound.org/articles/designing_for_sound.htm
Thompson, J. B. (1995). The media and modernity. Stanford, CA: Stanford University Press. Tinwell, A., Grimshaw, M., & Williams, A. (2010). Uncanny behaviour in survival horror games. Journal of Gaming and Virtual Worlds, 2(1), 3–25. doi:10.1386/jgvw.2.1.3_1 Tinwell, A., Grimshaw, M., & Williams, A. (2011). Uncanny speech. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global. Tinwell, A. (2009). The uncanny as usability obstacle. In A. A. Ozok & P. Zaphiris (Eds.), Online Communities and Social Computing workshop, HCI International 2009, 12, 622-631. Tinwell, A., & Grimshaw, M. (2009). Bridging the uncanny: An impossible traverse? In Proceedings of Mindtrek 2009. Tobler, H. (2004). CRML—Implementierung eines adaptiven Audiosystems. Unpublished master’s thesis. Fachhochschule Hagenberg, Hagenberg, Austria. Tom Clancy’s ghost recon: Advanced warfighter 2. (2007). Ubisoft. Toneatto, T., Blitz-Miller, T., Calderwood, K., Dragonetti, R., & Tsanos, A. (1997). Cognitive distortions in heavy gambling. Journal of Gambling Studies, 13, 253–261. doi:10.1023/A:1024983300428 Tonkiss, F. (2004). Aural postcards: sound, memory and the city. In Bull, M., & Back, L. (Eds.), The auditory culture reader (1st ed., pp. 303–310). Oxford, UK: Berg. Too human [Computer game]. (2008). Silicon Knights (Developer). United States: Microsoft Game Studios. Toprac, P., & Abdel-Meguid, A. (2011). Causing fear, suspense, and anxiety using sound design in computer games. In Grimshaw, M. (Ed.), Game sound technology and player interaction: Concepts and developments. Hershey: IGI Global.
Trautmann, L., & Rabenstein, R. (2003). Digital sound synthesis by physical modelling using the functional transformation method. New York: Kluwer Academic/Plenum Publishers.
Välimäki, V., Pakarinen, J., Erkut, C., & Karjalainen, M. (2006). Discrete-time modelling of musical instruments. Reports on Progress in Physics, 69, 1–78. doi:10.1088/0034-4885/69/1/R01
Traxel, W., & Wrede, G. (1959). Changes in physiological skin responses as affected by musical selection. Journal of Experimental Psychology, 16, 57–61.
Valve Corporation. (1998). Half-Life [computer game]. Sierra Entertainment.
Traxxpad. [Video game], (2007). Eidos Interactive (PlayStation Portable). Truax, B. (2001). Acoustic communication. Westport, CT: Greenwood Press. Truax, B. (1995, September). Sound in context: Acoustic communication and soundscape research Simon Fraser University. Paper presented at the International Computer Music Conference. Truppin, A. (1992). And then there was sound: The films of Andrei Tarkovsky . In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge. Tsukahara, N. (2002). Game machine with random sound effects. U.S. Patent No. 6,416,411 B1. Washington, DC: U.S. Patent and Trademark Office. Tulving, E., & Lindsay, P. H. (1967). Identification of simultaneously presented simple visual and auditory stimuli. Acta Psychologica, 27, 101–109. doi:10.1016/00016918(67)90050-9 Turner, N., & Horbay, R. (2004). How do slot machines and other electronic gambling machines actually work? Journal of Gambling Issues, 11. Tuuri, K., Mustonen, M., & Pirhonen, A. (2007). Same sound—different meanings: A novel scheme for modes of listening. In Proceedings of the Second International AudioMostly Conference, 2007, 13-18. Ubisoft Shanghai (Developer). (2008). Tom Clancy’s EndWar [Computer game]. Ubisoft.
van den Doel, K., & Pai, D. K. (1998). The sounds of physical shapes. Presence (Cambridge, Mass.), 7(4), 382–395. doi:10.1162/105474698565794 Vatakis, A., & Spence, C. (2006). Audiovisual synchrony perception for speech and music using a temporal order judgment task. Neuroscience Letters, 393, 40–44. doi:10.1016/j.neulet.2005.09.032 Verbiest, N., Cornelis, C., & Saeys, Y. (2009). Valued constraint satisfaction problems applied to functional harmony. In Proceedings of IFSA World Congress EUSFLAT Conference (pp. 925-930). Lisbon, Portugal: International Fuzzy Systems Association, European Society for Fuzzy Logic and Technology. Vicario, G. B. (2001). Prolegomena to the perceptual study of sounds . In Polotti, P., Papetti, S., Rocchesso, D., & Delle, S. (Eds.), The sounding object (Sob project) (p. 13). Verona: University of Verona. Vinayagamoorthy, V., Steed, A., & Slater, M. (2005). Building characters: Lessons drawn from virtual environments. In Proceedings of Toward social mechanisms of android science, COGSCI 200, 119-126. von Ahn, L., & Dabbish, L. (2008). Designing games with a purpose. Communications of the ACM, 51(8), 58–67. doi:10.1145/1378704.1378719 Vorländer, M. (2008). Auralization—Fundamentals of acoustics, modelling, simulation, algorithms and acoustic virtual reality (1st ed.). Berlin: Springer. Wachowski, L., & Wachowski, A. (1999). The matrix. Warner Bros. Pictures.
Ultimate band. (2008). Fall Line Studios. Väänänen, R. (1998). Verification model of advanced BIFS (systems VM 4.0 subpart 2). ISO/IEC JTCI/SC29/WG11.
Wallén, J. (2008). Från smet till klarhet. Unpublished bachelor’s thesis. University of Skövde, Country. Retrieved month day, year, from http://his.diva-portal.org/ smash/record.jsf?searchId=1&pid=diva2:2429 Warcraft 3: Reign of chaos. (2002). Blizzard Entertainment. Ware, C. (2004). Information visualization: Perception for design (2nd ed.). Location: Morgan Kaufman Publishing. Warren, D. H., Welch, R. B., & McCarthy, T. J. (1982). The role of visual-auditory “compellingness” in the ventriloquism effect: Implications for transitivity among the spatial senses. Perception & Psychophysics, 30(6), 557–564. Warren, R. M. (1992). Perception of acoustic sequences . In McAdams, (Eds.), Thinking in sound: The cognitive psychology of human audition. New York: Clarendon Press. Oxford: University Press. Watson, D., & Tellegen, A. (1985). Toward a consensual structure of mood. Psychological Bulletin, 98(2), 219–235. doi:10.1037/0033-2909.98.2.219 Watson, D., Wiese, D., Vaidya, J., & Tellegen, A. (1999). The Two General Activation Systems of Affect: Structural findings, evolutionary considerations, and psychobiological evidence. Journal of Personality and Social Psychology, 76(5), 820–838. doi:10.1037/0022-3514.76.5.820 Wenzel, E. M. (1998). The impact of system latency on dynamic performance in virtual acoustic environments. In Proceedings of the 15th International Congress on Acoustics and 135th Meeting of the Acoustical Society of America, 2405-2406. Wenzel, E. M. (2001). Effect of increasing system latency on localization of virtual sounds with short and long duration. In Proceedings of 7th International Conference on Auditory Displays (ICAD). 185-190. Weschler, L. (2002). Why is this man smiling? Wired. Retrieved April 7, 2009, from http://www.wired.com/ wired/archive/10.06/face.html.
Wessel, D. L. (1973). Psychoacoustics and music: A report from Michigan State University. PAGE Bulletin of the Computers Arts Soc., 30. West, S. (Director). (2001). Lara Croft: Tomb raider [Motion picture]. Hollywood, CA: Paramount. Westerkamp, H. (1990). Listening and soundmaking: A study of music-as-environment. In Lander, D., & Lexier, M. (Eds.), Sound by artists (pp. ##-##). Location: Art Metropole & Walter Phillips Gallery. Westermann, C. F. (2008). Sound branding and corporate voice: Strategic brand management using sound. Usability of speech dialog systems: Listening to the target audience. Berlin: Springer-Verlag. Whalen, Z. (2004). Play along: An approach to videogame music. Game Studies, 4(1). Retrieved from http://www.gamestudies.org/0401/whalen/. White, G. (2008). Comment on the IEZA: A framework for game audio. Retrieved January 13, 2010, from http://www.gamasutra.com/view/feature/3509/ieza_a_framework_for_game_audio.php Whitmore, G. (2009). The runtime studio in your console: The inevitable directionality of game audio. Develop, 94, 21. Whittington, W. (2007). Sound design & science fiction. Austin: University of Texas Press. WiiMusic. [Video game], (2008). Kyoto: Nintendo. Wilde, M. D. (2004). Audio programming for interactive games. Oxford: Focal Press. Wilhelmsson, U., & Wallén, J. (2011). A combined model for the structuring of game audio. In Grimshaw, M. (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Global. Wilhelmsson, U. (2001). Enacting the point of being. Computer games, interaction and film theory. Unpublished doctoral dissertation. University of Copenhagen, Country.
Williams, A. (1985). Godard’s use of sound. In Weis, E., & Belton, J. (Eds.), Film sound: Theory and practice. New York: Columbia University Press.
Williams, L. (2006). Music videogames: The inception, progression and future of the music videogame. In Proceedings of Audio Mostly 2006: A Conference on Sound in Games (pp. 5-8). Piteå, Sweden: Interactive Institute, Sonic Studio Piteå.
Wingstedt, J. (2008). Making music mean: On functions of, and knowledge about, narrative music in multimedia. Unpublished doctoral dissertation. Luleå University of Technology, Sweden.
Wolf, M. J. P. (2003). Abstraction in the video game. In Perron, B., & Wolf, M. J. P. (Eds.), The video game theory reader (pp. 47–65). New York: Routledge.
Wolfson, S., & Case, G. (2000). The effects of sound and colour on responses to a computer game. Interacting with Computers, 13, 183–192. doi:10.1016/S0953-5438(00)00037-0
Wooller, R. W., & Brown, A. R. (2005). Investigating morphing algorithms for generative music. In Proceedings of Third Iteration: Third International Conference on Generative Systems in the Electronic Arts. Melbourne, Australia.
Working Group Noise Eurocities. (n.d.). Retrieved January 10, 2010, from http://workinggroupnoise.web-log.nl/.
World of Warcraft. (2004). Blizzard Entertainment. Blizzard.
World of warcraft. (2005). Blizzard.
World soundscape project. (n.d.). Retrieved September 31, 2009, from http://www.sfu.ca/~truax/wsp.html
Woszczyk, W., Bech, S., & Hansen, V. (1995). Interactions between audio-visual factors in a home theater system: Definition of subjective attributes. AES 99th Convention. Preprint 4133.
Wundt, W. (1896). Grundriss der Psychologie. Leipzig, Germany: Alfred Kröner Verlag.
Wurtzler, S. (1992). “She sang live, but the microphone was turned off”: The live, the recorded and the subject of representation. In Altman, R. (Ed.), Sound theory sound practice. New York: Routledge.
Yalch, R. F., & Spangenberg, E. R. (2000). The effects of music in a retail setting on real and perceived shopping times. Journal of Business Research, 49, 139–147. doi:10.1016/S0148-2963(99)00003-X
Yamada, M. (2009, September). Can music change the success rate in a slot-machine game? Paper presented at the Western Pacific Acoustics Conference, Beijing, China.
Yee-King, M., & Roth, M. (2008). Synthbot: An unsupervised software synthesiser programmer. International Computer Music Conference.
Yoshi’s Island. (2007). Nintendo Japan. Nintendo.
Yost, W. A. (2007). Fundamentals of hearing: An introduction (5th ed.). New York: Academic Press.
You don’t know jack [Computer game]. (1995). Berkeley Systems/Jellyvision (Developer). Fresno, CA: Sierra On-Line.
Young, K. (2006). Recreating reality. Game Sound. Retrieved February 13, 2009, from http://www.gamesound.org/articles/RecreatingReality.html
Zahorik, P., & Jenison, R. L. (1998). Presence as being-in-the-world. Presence (Cambridge, Mass.), 7(1), 78–89. doi:10.1162/105474698565541
Zelda: Phantom Hourglass. (2007). Nintendo. Nintendo.
Zemeckis, R. (Producer/Director). (2004). The polar express [Motion picture]. California: Castle Rock Entertainment.
Zemeckis, R. (Producer/Director). (2007). Beowulf [Motion picture]. California: ImageMovers.
Wrightson, K. (2000). An introduction to acoustic ecology. Soundscape: The Journal of Acoustic Ecology, I (I, Spring 2000), 10-13.
Zhang, T., & Jay Kuo, C. C. (2001). Audio content analysis for online audiovisual data segmentation and classification. IEEE Transactions on Speech and Audio Processing, 9(4), 441–457. doi:10.1109/89.917689
Zillmann, D. (1991). The logic of suspense and mystery. In Bryant, J., & Zillmann, D. (Eds.), Responding to the screen: Reception and reaction processes (pp. 281–303). Hillsdale, NJ: Lawrence Erlbaum Associates.
Zheng, C., & James, D. L. (2010). Rigid-Body Fracture Sound with Precomputed Soundbanks. ACM Transactions on Graphics (SIGGRAPH 2010), 29(3).
Zwicker, E., & Fastl, H. (1999). Psychoacoustics—Facts and models (2nd ed.). Berlin: Springer Verlag.
Zheng, C., & James, D. L. (2009). Harmonic fluids. ACM Transactions on Graphics (SIGGRAPH 2009), 28(3).
Zielinski, S., Rumsey, F., Bech, S., de Bruyn, B., & Kassier, R. (2003). Computer games and multichannel audio quality—The effect of division of attention between auditory and visual modalities. In Proceedings of the AES 24th International Conference on Multichannel Audio, 85-93.
Zwicker, E. (1961). Subdivision of the audible frequency range into critical bands. The Journal of the Acoustical Society of America, 33, 248. doi:10.1121/1.1908630
About the Contributors
Mark Grimshaw is a Reader in Creative Technologies in the School of Business & Creative Technologies at the University of Bolton, United Kingdom, where he runs the Emotioneering Research Group. He possesses an honours degree in music, an MSc in music technology, and a PhD in computer game sound from South Africa, England, and New Zealand and is widely published in the area of computer games, particularly on the topics of immersion and sound. Mark’s previous book was entitled The Acoustic Ecology of the First-Person Shooter and he is also the lead developer for WIKINDX, an Open Source, Virtual Research Environment in wide use around the world.

***

Ahmed Alaa Abdel-Meguid was born in Cairo, Egypt to Alaa Abdel-Meguid and Azza Tawfik. Soon afterwards, his family moved to the Midwestern United States where they soon made a home for themselves. His first video game was Joust for the Atari 5200 at age six. He started in game design as a game-master for tabletop roleplaying games such as Dungeons and Dragons during his high-school years. After earning his Bachelor’s of Organizational Leadership at Illinois State University, he immediately went on to The Guildhall at Southern Methodist University to earn his Masters of Interactive Technology with a specialization in Level Design. As of writing this biography, he is currently working on The Old Republic at BioWare Austin as a World Builder. In his spare time, he plays the guitar and violin, swings fire, and paints little space marines and orcs.

Valter Alves is a lecturer of Computer Science at the Informatics Department of the Polytechnic Institute of Viseu, Portugal. He has taught diverse courses to Informatics Engineering and Technology and Design of Multimedia students.
He holds a degree in Informatics Engineering and an MSc in Information Systems and Technologies, both from the Faculty of Sciences and Technology of the University of Coimbra, where he is now a PhD candidate under the supervision of Professor Licínio Roque. Valter is also a researcher at the Center for Informatics and Systems of the University of Coimbra. His research interests include human–computer interaction, user experience, computer game design, sound design, context, emotions, and research targeting handicapped people. Currently his research is focused on the enrichment of user experience through soundscape design.

Axel Berndt studied computer science and music at the Otto-von-Guericke University in Magdeburg, Germany. He is currently working there as a computer music researcher. His research interests comprise expressive performance analysis and modelling, musical structure analysis, automatic composition, arrangement, and adaptation for interactive media scoring. Beyond that, Axel Berndt is active as a musician and composer.

Karen Collins is Canada Research Chair in Interactive Audio at the Canadian Centre of Arts and Technology, University of Waterloo, Canada. She is the author of a book on game audio, Game Sound: An Introduction to the History, Theory and Practice of Video Game Music and Sound Design published by The MIT Press, and editor of From Pac-Man to Pop Music: Interactive Audio in Games and New Media published by Ashgate.

Stuart Cunningham was awarded the BSc degree in Computer Networks in 2001 and, in 2003, was awarded the MSc Multimedia Communications degree with Distinction, both from the University of Paisley (UK). In 2009 he was awarded the degree of PhD in Data Reduced Audio Coding by the University of Wales (UK). He is a Fellow of the British Computer Society (BCS), Chartered IT Professional (CITP), Member of the Institution of Engineering & Technology (IET) and Member of the Institute of Electrical and Electronics Engineers (IEEE). Dr Cunningham was a member of the MPEG Music Notation Standards (MPEG-SMR) working group. His research interests are in the areas of digital audio, computer music, human perception of sound, and audio compression techniques. In his spare time, Stuart is an avid mountain biker and performs in a Pink Floyd tribute band named Pink Lloyd.

Michael J. Dixon is Professor of Psychology at the University of Waterloo. He is one of the foremost authorities on synaesthesia (an anomalous type of perception). His current research into problem gambling is aimed at identifying those elements of the gambling experience that lead to measurable changes in behaviour, changes which may, potentially, lead to problem gambling.

Milena Droumeva has a Bachelor’s degree in Communication (focusing on acoustic communication and acoustic ecology) and media studies.
Then she completed a Master’s in Interactive Arts and Technologies focusing on interactive soundscape design for responsive environments and ambient intelligent games. She has since worked on a variety of game sound projects and has a particular interest in adapting sonification techniques and environmental sound for games. Currently, she is pursuing a Doctorate in Education exploring the cultural and epistemological implications of secondary orality and the soundscape. She is interested in drawing connections between listening experiences in designed soundscapes, and our practices and conceptions around knowledge. She did not grow up as a gamer but came to gaming and a subsequent keen interest in game sound through procrastination from other graduate work.

Andy Farnell is a computer scientist from the UK specialising in audio DSP and synthesis. Author of Designing Sound, his original research and design work champions the emerging field of Procedural Audio. Between consultancy for pioneering game and audio technology companies he teaches widely, as resident lecturer and visiting professor at several European institutions. Andy is a long-time advocate of free open source software, educational opportunities and universal access to enabling tools and knowledge.

Jonathan Fugelsang is Assistant Professor in Cognitive Psychology at the University of Waterloo. His research interests span several topics in cognitive psychology and cognitive neuroscience, though
his primary focus is in higher level cognition. He has recently joined the problem gambling research team at the University of Waterloo.

Guillaume Roux-Girard is a Master’s Degree student in film studies at the University of Montreal. His current research focuses on the different roles of sound in horror video games. His recent publications include an appendix chapter on film studies and video games in the Video Game Theory Reader 2 (Routledge, 2009) and a chapter on the Alone in the Dark series (1992-2008) in the anthology Horror Video Games: Essays on the Fusion of Fear and Play (McFarland, 2009).

Vic Grout was awarded a BSc in Mathematics and Computing from the University of Exeter in 1984 and a PhD in Communication Engineering from Plymouth Polytechnic in 1988. He has worked in senior positions in both academia and industry for twenty years and has published and presented over 200 research papers and three books. He is currently Professor of Network Algorithms at Glyndŵr University, Wales, where he leads the Centre for Applied Internet Research. Professor Grout is a Chartered Engineer, Chartered Electrical Engineer, Chartered Scientist, Chartered Mathematician and Chartered IT Professional, a Fellow of the Institute of Mathematics and its Applications, British Computer Society and Institution of Engineering and Technology and a Senior Member of the Institute of Electrical and Electronics Engineers. He chairs the biennial international conference series on Internet Technologies and Applications (ITA 05, ITA 07 and ITA 09).

Kevin Harrigan teaches game design and is the lead researcher and contact person for the Problem Gambling Research Team at the University of Waterloo. His primary research interest is in gambling addictions with a focus on why so many slot machine gamblers become addicted.

Daniel Hug has a background in music, sound design, interaction design and project management in applied research.
Since 1999 he has investigated sound and interaction design-related questions through installations, design works and theoretical publications. Since 2005 he has taught sound studies and sound design at the Interaction Design and the Game Design departments of the Zurich University of the Arts, Switzerland. Daniel is currently pursuing a PhD on sound design for interactive commodities at the University of the Arts and Industrial Design of Linz, Austria, is a management committee member in the European COST-initiative Sonic Interaction Design, and greatly enjoys the fact that his profession “requires” him to play computer games regularly.

Kristine Jørgensen is a postdoctoral research fellow at the Department of Information Science and Media Studies, University of Bergen, Norway. She holds a Ph.D. in Media Studies from the University of Copenhagen with a thesis on the functional role of computer game sound. Her current research project is funded by a grant from the Norwegian Research Council, and focuses on the communicative aspects of computer games, fiction in games, and the relationship between the user interface and the gameworld. Jørgensen is also a board member of Joingame, the Norwegian network for games research and development.

Mats Liljedahl has been working with sound, music and digital and interactive media in various forms and contexts since the mid-1980s. Since 2000 he has been at the Interactive Institute, Sweden, involved in research and development projects related to sound and sound design, all built on
and carried by interactive media. Mats Liljedahl has a special interest in how people perceive sound and how sound affects us cognitively, emotionally and intuitively. This interest has led to projects focusing on how sound can be used in new ways and in new contexts. Examples of projects include audio-based games as research tools and as potential new gaming products, sound design for information and new tools and methods for working with sound design.

Eoin Mullan obtained his undergraduate Degree in Electronic and Software Engineering from the University of Ulster in 2005. This included an industrial placement year spent writing software for British Telecom. In 2006 he completed a Masters in Sonic Arts at the Sonic Arts Research Centre (SARC) in Queen’s University Belfast, which combined his background in programming with elements of music, sound design, composition, musical interface design, acoustics, and physical modelling. Eoin returned to SARC to undertake his PhD in the area of physical modelling for real time sound synthesis in computer games and virtual environments. He is currently researching efficient ways to synthesise contact sounds for objects that may be modified in real-time and for arbitrarily shaped objects.

David Murphy is a lecturer and researcher at the Department of Computer Science, University College Cork, Ireland where he is also a director of the Interactive Medical Computing Lab. In a previous life, David was a professional musician, and a Multimedia Engineer at Apple Computer, where he was responsible for Audio and MIDI in Apple products. In 1999 David left Apple to set up the Multimedia section of the Computer Science Department, UCC. His research interests include spatial sound, serious games, and virtual reality.

Lennart Nacke received one of Europe’s first Ph.D. degrees in Digital Game Development from Blekinge Institute of Technology, Sweden.
He is currently working on affective and entertainment computing as a postdoctoral fellow in the Human-Computer Interaction Lab of the University of Saskatchewan, Canada. He chaired and co-organized several expert panels on psychophysiological player measurement and interaction, game usability and UX at academic conferences (e.g., DiGRA, Future Play, CHI) and industry venues (e.g., GDC Canada). He is as much an avid gamer as a passionate scientist. His research interests are psychophysiological player testing and interaction, for example with EEG (i.e., brainwaves), EMG (i.e., facial muscle contractions) or eye tracking, as well as gameplay experience in player-game interaction, technology-driven innovation (e.g., playability metrics, affective computing) and innovative interaction design with digital entertainment technologies.

Flaithrí Neff is a lecturer and researcher at the Department of Electrical & Electronic Engineering, Limerick Institute of Technology, Ireland. He is also a research member of the IDEAS Research Group at the Department of Computer Science, University College Cork, Ireland, where he is currently completing his PhD studies. In 2002 he attained a first class honours MSc degree at the University of Limerick, Ireland specializing in Audio Technology. His research interests are in virtual sonic interface design and intelligent hearing systems. He is particularly focused on applying his research to issues encountered by visually-disabled users of technology.

Linda O Keeffe (www.lindaokeeffe.com) is a sound artist currently pursuing a PhD within the department of sociology, Maynooth and her working title is How I See What I Hear. She has exhibited internationally and in Ireland where she lives. O Keeffe is also in the process of composing a body of work with musician composer Tony Doyle for performance and CD.
Richard Picking is a Reader in Computing at Glyndŵr University in Wales and Deputy Director of the Centre for Applied Internet Research (CAIR). He has a BSc (Hons) degree in Computing and Operational Research from Leeds Polytechnic (UK, 1986), an MSc in Control Engineering and Information Technology (University of Sheffield, UK, 1987) and a PhD in Interactive Multimedia Interface Design from Loughborough University (UK) in 1996. His research interests cover various aspects of user-interface design and usability. Rich is a passionate saxophonist and keen songwriter.

Ulrich Reiter is a researcher and lecturer working in the fields of audiovisual quality perception, subjective assessment methodologies, and interactivity issues in audiovisual applications at the Norwegian University of Science and Technology (NTNU) in Trondheim, Norway. He holds a Master’s degree in electrical engineering from RWTH Aachen, and a PhD in media technology from TU Ilmenau, both in Germany. Ulrich was the development coordinator for the cross-platform, object-based, multi-processing, and real-time audio rendering engine TANGA used in the IAVAS I3D MPEG-4 player. His work has been published in numerous AES-, IEEE- and other journals, conference proceedings and papers. He was the recipient of the ‘IEEE International Symposium on Consumer Electronics (ISCE) Best Paper Award’ in 2005 and 2007. Ulrich’s current research focus is on cross-modal effects in audiovisual media.

Licínio Roque obtained a PhD in Informatics Engineering from the University of Coimbra while developing Context Engineering, a socio-technical approach to Information Systems Development. He has been practicing research and development in diverse fields: management information systems, individual and organizational learning, technologies for online communities, and computer games.
Over the last 10 years he taught postgraduate courses on Software Engineering, Human-Computer Interaction, Ludic Learning Contexts, Game Studies and Development, using studio and project-based methodologies. He also teaches a course on game design as strategy for exploring cultural heritage as part of the EuroMACHS European Master Program. Currently, he does research on design methodology and technologies for multiplayer online games. He is Adjunct Teaching Professor at Carnegie Mellon University, on the MSE Program.

Holly Tessler is Senior Lecturer and Program Leader in the Music Industry Management program at the University of East London, UK. She recently completed her PhD on music and branding at the Institute of Popular Music at the University of Liverpool.

Angela Tinwell is a Senior Lecturer in the School of Business & Creative Technologies at the University of Bolton, where she is researching the subject area of the Uncanny for a PhD. Recent works, including Uncanny as Usability Obstacle, authored for the HCI International Conference 2009, and Survival Horror Games – An Uncanny Modality, for the Thinking After Dark Conference, 2009, investigate the implications of the Uncanny Valley phenomenon for realistic, human-like virtual characters within 3D immersive environments. Angela Tinwell teaches modules on the Computer Games Design and Computer Games Art Courses at the University of Bolton which involve the design and creation of 3D characters for Computer Games.

Paul Komninos Toprac is a lecturer at The Guildhall at Southern Methodist University, where he focuses on teaching and the research, design, and implementation of game technology-based applications. He has more than twenty years of experience in the software industry, in roles ranging from
CEO to product manager to consultant. During his studies at the University of Texas at Austin, Paul was the producer and designer of a science-based computer game called Alien Rescue: The Game, which was used in his dissertation entitled The Effects of a Problem Based Learning Computer Game on Continuing Motivation to Learn Science. He holds a Bachelor’s of Science in Engineering, a Master’s of Business Administration, and a Ph.D. in Curriculum and Instruction from The University of Texas at Austin. In his spare time, Paul hopes to convince universities and schools that students can have fun and learn at the same time.

Jacob Wallén holds a Bachelor of Arts degree in the field of computer game design from the University of Skövde, Sweden. He has been making music and working with sound for the greater part of his life, and the education at the University of Skövde made it possible for him to combine his interest in sound with computer games. His bachelor thesis, Från smet till klarhet ‘from batter to better’, is about creating a complete and balanced sound design for computer games. He has been in charge of sound and music for a couple of smaller game projects and he recently finished working with the game Testament (www.testamentgame.com), a game funded by the Church of Sweden.

Ulf Wilhelmsson holds a Ph.D from the University of Copenhagen, Denmark. His Ph.D dissertation, Enacting the Point of Being, has a focus on computer games and film theory. Wilhelmsson was one of the initiators of the computer game studies programs that have been offered since 2002 by the school of Humanities & Informatics at the University of Skövde, Sweden and he is currently working as senior lecturer and coordinator for these programs. He is a member of the InGaMe Lab research group (www.his.se/iki/ingame) at the University of Skövde.
His research interests lie primarily within computer game studies and integrate film theory, cognitive theory and theories concerned with the audiovisual construction of space and narratives.

Andrew Williams is a Principal Lecturer in the School of Business & Creative Technologies at the University of Bolton. He has published on engagement and motivation in game development processes and on the use of competitive strategy games as a way of motivating students. He is currently leading a project relating to the use of gesture-driven interfaces for games. He leads a team of seven in delivering three games-related undergraduate programmes and teaches on the Advanced Games Technology, Games Design Team Project and Games Evaluation modules. He has sat on a number of review panels for the provision of games undergraduate degrees and he is currently external examiner for the University of Hull’s MSc in Games Programming.
Index
Symbols 3D audio interfaces 53 3D-environments 27 3D-graphics 23 3D-positioned 34 3D space 34, 36, 53 4 dimensions of perception 166
A abstract soundtrack 390 acoustemology 45, 57 acoustic communication 131, 132, 133, 146, 148 acoustic communities 83, 131, 146, 148, 151 acoustic ecology 131, 132, 133, 136, 138, 140, 146, 150, 151, 362, 364, 365, 366, 373, 378, 380, 382, 383 acoustic environment 365, 366 acoustic frustration 9, 10, 20 acoustic realism 131, 148 acoustics 100, 130 acoustic viability 325, 332 Advanced Multimedia Supplements (AMMS) 306, 307, 308, 309 aesthetic independence 392 Affect 103, 107, 109, 116, 117, 118, 119, 120, 124, 125 Affective Gaming 285 Affective Sound 264, 272, 285 Affordance Theory 129 Allure 211 ambient game sounds 183 Ambient Listening 112, 129 ambient sounds 31, 32, 34, 38, 50, 53, 55
Ambisonics 297, 298 ambulatory listening 112, 114, 130 ambulatory visual position 114 Amplitude 62, 68, 70, 73 analytic listening 133 androids 215, 216, 217, 219, 231 anterior and posterior transverse temporal areas (H) 156 anxiety 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 190, 191 Aperture Listening 112, 130 API (Application Programming Interface) 65, 75, 299, 301, 304, 306, 308, 309, 310, 311, 312 Arduino 408, 413 asynchrony 214, 215, 224, 225, 226, 227, 228, 232 attention 153, 154, 157, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173 audio-editing software 121 Audio Entertainment 285 Audio Similarity Matrix (ASM) 245, 246 Audiosurf 62, 67, 68, 73 audio synthesis 340 audio-visiogenic effects 199, 200 Audio-visual 233 audio-visual (bimodal) perception 154, 161, 162, 163, 169 audio-visual media 60 audition (hearing) 155 auditory perception 22, 23, 24, 29, 35, 39, 40, 43 Auditory Scene Analysis 337 aural architecture 46, 57 AuralAttributes Object 299, 300
aural objects 138, 141, 142, 143 Automated Collaborative Filters (ACFs) 253 automatic playlist generation 253, 254 Autopoiesis 411, 413 avatar 32, 43, 111, 112, 129, 135, 136, 140, 146, 179, 181, 182, 183, 390, 391, 398, 400, 401, 403, 404 Avatar sounds 32
B background diegetic music 63 background music 6, 7, 17, 67, 137, 146, 169 background noise 55, 58 BackgroundSound node 299 behavioural audio 314, 315, 321, 322 behaviourally informed 316 behavioural parameters 323, 324, 330, 331 behavioural realism 325 behaviourist 5 Beowulf 35, 36, 37, 38, 39, 41 Bet Max 3 binary connection tree (BCT) 349 Binaural 173 bi-polar 249 bipolarity 106, 110, 111, 114 bits and chunks 106 black box 316, 327 Brodmann areas 156, 173 bukimi 218 butterfly effect 322 button-mashing 4
C cartoonish 216, 221 Character sounds 32 Chion 100, 101, 103, 104, 105, 106, 112, 123, 127 chiptunes 134, 135, 147, 152 civil inattention 47 Cognitive Emotional Theory 177, 178, 190 cognitive load 98, 101, 104, 114, 115, 116, 119, 174 Cognitive processing 165, 166 Combined Model for the Structuring Computer Game Audio 130 Comprendre 211
computer game audio 98, 99, 100, 101, 102, 103, 104, 107, 110, 125, 126, 129 computer game playing 264 computer game sound 78, 80, 81, 85 ConeSound node 299 Constructive interaction 67 constructivism 248 Content 235, 236, 239, 240, 241, 244, 245, 246, 247, 248, 250, 251, 253, 254, 255, 256, 258, 259, 260, 261, 262, 263 context 235, 236, 239, 242, 247, 248, 249, 250, 251, 253, 254, 255, 258, 259, 260, 262, 263, 362, 364, 365, 366, 367, 368, 371, 372, 373, 374, 375, 376, 378, 379, 380, 382, 383 context-oriented 24 Continuous Parameterisation 337 controlled parallel processing (CPP) 163 Cross-modal 233 cross-modal interaction 153 Csound 315, 319, 335 cultural conventions of media and technology 131, 148
D Darwinian Emotional Theory 177, 190 Dataflow 319, 320, 337 DAW [Digital Audio Workstation] 342 Deferred Form 337 Demo Scene 407, 413 Designing Sound 313, 316, 318, 319, 329 Destructive interaction 66 diégèse 83, 87, 88, 89, 90, 93, 197, 205, 209, 212 diegesis 62, 64, 66, 68, 69, 70, 75, 76, 79, 80, 81, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 96, 152 diegetic 60, 61, 62, 63, 64, 65, 66, 68, 71, 73, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 91, 93, 94, 96, 103, 107, 109, 111, 112, 114, 116, 119, 122, 124, 126, 127, 176, 177, 181, 183, 186, 197, 198, 199, 200, 201, 205, 207, 209, 212, 237, 239, 240, 261 diegetic music 60, 61, 63, 64, 65, 66, 75 diegetic sound 78, 80, 81, 82, 83, 84, 85, 86,
87, 91 digital analog convertor (DAC) 314 digital games 265, 268, 270, 275, 276, 279, 280 digital game sound 51, 53 digital game soundscape 56 digital signal processing (DSP) 313, 314, 317, 318, 319, 322, 330, 337, 339 digital visual game design 51 digital waveguide synthesis 341, 345, 346, 347, 360 DigiWall 36, 37, 38, 39, 40 Discretise 360 distractors 164 DJ Hero 69, 73 Donkey Konga 68, 69, 74 Doppler Effect 306, 312 Dorsal Stream 174 Dracula 222, 224, 225, 226, 230 dread 192, 193, 202, 204 Driving Mode 68 Drum Pants 69, 73 dynamic audio 343 dynamic interface 88 Dynamic Profile 211 Dynamics 337
E earlids 29, 34, 51 early sound 293 Easter Eggs 4 ECG/EKG 277 ecology 131, 132, 133, 136, 138, 140, 143, 145, 146, 148, 149, 150 Écouter 211 Effect 103, 107, 109, 115, 116, 117, 118, 119, 120, 124, 125 EGM sound 1, 14, 15 electrocardiogram (ECG) 238 electrodermal activity (EDA) 273, 274, 275 electrodermal response 13, 20 electroencephalograms (EEGs) 13 electromyography (EMG) 238, 273, 274, 277 electronic gambling machines (EGMs) 1, 2, 3, 4, 5, 8, 9, 10, 12, 14, 15, 20 embodied sounds 98, 99, 101, 104, 105, 106,
107, 109, 111, 119, 122, 124, 125, 126, 130 emitter paradigm 404 Emotion 380, 382, 383 Emotional Interaction 263 Emotional Reaction 263 emotional state (E-state) 255, 256, 257, 258, 263 Emphasized Interface Sounds 93, 96 Empirical Methods (Quantitative) 285 encoded 98, 99, 101, 104, 105, 106, 107, 109, 116, 119, 121, 122, 124, 126, 130 Entendre 211 entrainment 362, 366, 367, 372, 373, 374, 378, 379, 383 Environmental Audio Extensions (EAX) 303 ergo-audition 402, 403, 406 essential realism 324, 335 everyday listening 24, 133, 138, 146, 148 Excitation 337 exodiegetic 84 external transdiegetic 85, 91, 97 extra-diegetic 75, 199, 201, 207, 212 eye-centric 34
F Faceposer 214, 215, 226, 227, 228, 230 facial electromyography 13 Falling Mode 68 Fast Convolution 312 fear 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 202, 203, 204, 205, 206, 207, 208, 209, 210 F.E.A.R. 98, 102, 108, 112, 115, 117, 118, 119, 123, 127 Feedback 28, 36, 38, 43 fidelity 131, 132, 133, 134, 137, 138, 140, 141, 143, 145, 147, 148, 152 film music 80, 81, 82, 95 film sound aesthetics 385, 397, 398 film sound design 385, 387, 397, 406 Finite Difference 337 Finite Element 337 finite element method (FEM) 344 First Difference 337
first-person shooter (FPS) 51, 53, 112, 113, 135, 138, 140, 141, 145, 224, 273, 274, 275, 276, 404 Fitts’ Law 160, 174 flow 25, 26, 30, 40, 42, 43 flow experience 25 FoleyAutomatic 350, 351, 357 forewarning 201, 204, 205, 207 Free-Field 312 full motion video (FMV) 214, 227 functional Magnetic Resonance Imaging (fMRI) 269, 282, 284 functional transformation method (FTM) 347 Futurist Music 413 fuzzified 256, 258 fuzzing 199 fuzzy logic 254, 256, 259 Fuzzy Rule Based System (FRBS) 257, 258, 259
G
Galvanic skin response (GSR) 13, 14, 20 game audio 153, 154, 318, 320, 321, 323, 326, 327, 329, 333, 335, 336, 337 game audio designer 98, 101 game design 98, 123, 124, 384, 408 GameFlow 22, 25, 26, 28, 29, 31, 32, 35, 39, 42 game-generated sounds 38 Game Metaphor 43 game music 82, 83, 86, 94, 96 gameplay 35, 43, 101, 102, 103, 109, 113, 115, 120, 123, 126, 129, 177, 178, 179, 180, 184, 185, 186, 187, 189, 192, 193, 194, 195, 196, 197, 199, 200, 202, 203, 204, 207, 208, 209, 211, 212, 343, 355, 356 gameplay emotions 178, 179, 180, 189 game sound 1, 2, 9, 10, 12, 13, 14, 15, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 91, 92, 93, 94, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 147, 148, 149, 150, 152, 192, 193, 194, 196, 197, 199, 200, 205, 206, 264, 266, 267, 269, 273, 274, 276, 277, 362, 363, 366, 367, 372, 374, 375, 377, 378, 382, 384, 385, 386, 387, 388, 391, 392, 397, 403, 404, 405, 406, 407, 408, 409 game sound design 384, 385, 386, 387, 405, 406, 407, 408 game (sound) designers 409 gamespace 45, 53, 54, 88, 89, 90, 91, 92, 96, 97 Game System 96 gameworld 23, 25, 31, 33, 35, 36, 43, 44, 51, 53, 78, 79, 80, 81, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 96, 97, 391, 398, 400, 401, 402, 403, 405, 406 gaming community 194 gaming environment 244 Generated music 70 Good Behaviour 330, 338 Grain 211 Graphical Processing Unit (GPU) 352 Grey Goo 338 Guitar Hero 67, 68, 69, 73
H
Half-Life 113, 114, 128 hardwired 24, 30, 40 Head-Related Impulse Response (HRIR) 297 head-related transfer functions (HRTFs) 156, 287, 288, 289, 297, 302, 309, 312 Head Tracking 305, 306, 312 hermeneutic affordances 406 hi-fi 132, 134, 138 higher level semantics 395 Holistic 59 horror computer games 192, 193, 194, 195, 197, 198, 199, 200, 201, 202, 203, 205, 206, 207, 208, 212 Huiberts 98, 99, 102, 103, 104, 125, 126, 127, 129, 130 Human-Centered Design 285 Human-Computer Interaction (HCI) 266, 283, 284, 285, 327, 363, 378, 381 Human Computer Interface (HCI) 25, 237 human emotion 236, 240, 241 human-likeness 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 224, 228, 233 human-likeness of voice 213
I
Iconic Interface Sounds 93, 96 ideodiegetic 84 Idiophonic 338 idiot skill 4 IEZA 131, 150 IEZA-framework 98, 99, 102, 103, 104, 106, 107, 110, 114, 115, 119, 122, 124, 125, 126, 130 imitation 80 immersion 2, 4, 7, 28, 36, 42, 43, 103, 106, 110, 111, 112, 132, 140, 142, 143, 145, 150, 176, 177, 182, 186, 188, 189, 265, 266, 267, 269, 270, 271, 272, 273, 274, 276, 277, 278, 279, 280, 282, 283, 285 immersive 3D environments 214 immersive user experience 311 Implementation 338 indie game 389 indoor acoustics 292 Inharmonic 360 input 153, 154, 159, 160, 161, 165, 170 Integrated Interface Sounds 93, 96 Interaction Design 285 interactive ambiences 391 Interactive Institute, Sonic Studio 22, 27, 35 interactivity 153, 154, 159, 160, 161, 165, 166, 170, 172 Interaural Intensity Difference (IID) 288, 291 interaural level differences (ILDs) 155 Interaural Time Difference (ITD) 288, 289, 290, 291 inter-beat intervals (IBI) 238 Interface 103, 107, 109, 117, 119, 120, 125, 129, 130 internal transdiegetic 85, 89, 91 International Phonetic Alphabet (IPA) 227
J James-Lange Emotional Theory 177, 180, 190 Just-Noticeable-Difference (JND) 298
K Kalman Filter 157 keynote sounds 52 kinediegetic 84
L
latency 154, 160, 161, 172, 173 Legend of Zelda 98, 103, 112, 115, 116, 120, 121 leitmotif 81, 201 liberation of the soundtrack 384, 385, 393 lifeless 213 lifelike 213, 215 lip-synchronization 213, 225, 226, 227 lip-vocalization 224 listening modes 24, 28, 29, 131, 133, 138, 145 listening positions 133, 134, 138, 139, 143, 144, 148, 152 LOAD (Level of Audio Detail) 318, 332, 333, 335, 338 Localization 174 locomotion 110, 112, 114 lo-fi 132, 134, 138 logjam 101, 102, 104, 109, 125 loopy 135, 137, 142, 143, 152 losses disguised as wins 3, 5, 10, 14, 15, 17, 20
M Machine Listening 338 mapping 160, 161 Masking 338 Mask Topology 338 Massively multi-player online role-playing games (MMORGs) 147, 148 Mass Profile 211 maximum-likelihood estimation (MLE) 157, 158, 164 McGurk Effect 225 meaning of sounds 196 mediated listening 47, 48 Mediatization 59 metalepsis 81 Metaphorical Interface Sounds 96 Method 338 mimesis 80 mise en scène 193, 194, 202, 203, 207, 211, 212 Mobile Media API (MMAPI) 306, 307, 308, 309, 311
modal synthesis 341, 346, 347, 348, 349, 350, 351, 352, 353, 357, 358, 360 mode compression 353 Model 323, 338 mode truncation 353 Monaural 174 mood track 146 morphing 332 morphology 192 movie brats 393, 394, 396 MPEG (Motion Picture Experts Group) 299, 302, 303, 304, 305, 306, 309 multi-modality 161, 174 multi-modal salience 164, 165 multi-player environments 373 multiplayer games 397 Murch’s conceptual model 100, 102, 104, 105, 106, 107, 116, 124, 130 musical diegesis 68, 70, 75 music (embodied) 99 Music Video Games 75 musique concrète 134, 207, 326, 393, 396, 413 Muzak 6, 18, 19
N naive listening 143 naive physics of perception 143, 145 near miss 3, 5, 14, 20 neurochemical transmitters 265 neurophysiological pleasure 265 Next Generation 387, 407 nickel slot 3 noise pollution 49 non-diegetic 60, 61, 62, 63, 64, 65, 73, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 91, 93, 94, 96, 103, 107, 109, 114, 116, 122, 124, 126, 127, 181, 183, 237, 239, 240, 261 non-diegetic music 61, 64, 65 non-diegetic sound 78, 80, 81, 82, 83, 86, 91 non-gambling environments 2 nonhuman-like 220, 223 nonlinear music 67, 75 non-player characters (NPCs) 91, 399, 400, 405 Nouvelle Vague 393, 394, 396
O Object sounds 32 objet sonore 393, 401, 413 Occlusion 289, 312, 338 one-dimensional (1D) 345, 347, 360 OpenSL ES 308, 310, 311 Operating System (OS) 306, 307 Ornamental sounds 32 Ouïr 211 outdoor acoustics 288, 294 Overlay Interface Sounds 96
P Parameter 332, 338 Parametric (Signal) Method 338 partial differential equations (PDEs) 344, 347 PAs (amplified public announcements) 147 Perceptual Cycle 154, 162, 166, 174 perceptual feedback 154, 161 perceptual relevance model 157 personal construct psychology (PCP) 248, 249 personal construct theory (PCT) 248 Pervasive Game 43 phenomenology 327, 334 photorealistic 27 Phya 352, 353, 354, 355, 358, 359 physically informed 316, 338, 339 Physically Informed Stochastic Event Modeling (PhISEM) 348, 354 physical modelling 316, 338, 340, 341, 343, 344, 345, 346, 347, 348, 349, 351, 352, 353, 354, 355, 356, 357, 359, 360 physical world 23, 24, 25, 26, 27, 30, 31, 32, 34 Physics Engine 339 physiological response 12, 15, 46 pit music 81 Playlist Generation 254, 259, 263 points of observation 111 PointSound node 299 polymorphism 319, 335 Precomposed music 70 Presence 154, 160, 161, 171, 172, 173, 174 Primary Auditory Cortex 156 Primary Visual Cortex (V1) 156
Principal Component Analysis (PCA) 250 procedural audio 229, 230, 313, 314, 315, 316, 318, 319, 320, 321, 323, 326, 328, 332, 333, 334, 335, 340, 342, 343, 356, 358, 360 programmable sound generators (PSGs) 341 Progression functions 201 psychoacoustics 100, 127, 130, 190, 191 psychoacoustic sound 328 Psychophysiological research 265, 267 psychophysiology 267, 269, 277, 278, 281, 282, 283, 285 Pure Data 319, 320, 337 pure narrative 80, 96
Q Quality of Experience (QoE) 153, 154, 174 quality-oriented 24 quality scaling 353
R range 160, 161, 165, 166, 169 Reactive Audio 339 Reading Mode 68 Realism 233 reality window 323 real-time 313, 315, 320, 328, 333, 335, 338 real time strategy (RTS) 111 real world 44, 46, 47, 49, 50, 51, 59 real world listening 47 real world sound sources 64 reduced listening 393, 394, 413 reinforcement cues 5 Replication 339 resonance 362, 366, 367, 372, 373, 374, 378, 379, 383 reverberation 292, 293, 294, 300, 302, 308 reward schedule 3, 8, 9, 21 Rez 62, 69, 74 role of sound 264, 272, 275 Rolling Sound 21
S Salience 165, 170, 174 salience model 153, 155, 165, 166, 170
sampling plus synthesis (S+S) 321, 339 Scene Description Language (SDL) 302 SceneGraph 299 Schema 174 schizophonic 48, 59 Scrambled Eggs 38, 39 screen music 81 see and hear 24 see-hear 24 self-learning 254 semantics 371, 383 sensible realism 324 Servicescapes 6 Single-Cell Recording 174 skin conductance level (SCL) 238 slot machines 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 15, 17, 19, 20 smell (olfaction) 155 Snapshot Listening 112, 130 Social construction of space 59 social constructivist theory 177 Sonic Architecture 59 sonic branding 11 sonic effect 47, 51, 401 sonic elements 143, 145, 146 sonic environment 99, 100, 101, 102, 103, 104, 106, 107, 110, 112, 116, 119, 121, 123, 124, 125, 126, 130 sonic events 401, 402 Sonic Explorer 348, 349, 351 sonic expression 368, 370, 371, 378 sonic identity 394 sonic objects 399, 404, 413 sonic re-engineering 394 Sonification 399, 414 sound design 100, 121, 123, 124, 126, 127, 176, 180, 189, 191, 362, 363, 364, 366, 367, 369, 372, 375, 378, 379, 382, 383, 384, 385, 386, 387, 390, 392, 393, 395, 397, 398, 399, 400, 403, 404, 405, 406, 407, 408, 409, 410, 412, 414 sound designers 402, 406, 408 sound effects 31, 32, 176, 177, 178, 179, 181, 182, 183, 184, 185, 186, 187, 189, 340, 341, 342, 343, 355, 356, 357 sound images 138
Sound layers 383 sound localization 290, 291, 296, 297, 311 soundmarks 46, 52, 55 Sound Node 299, 300, 301, 302 sound objects 313, 314, 315, 316, 322, 339 sound positioning 64 Sound Principles 191 sound rendering 65 soundscape 6, 14, 23, 29, 30, 32, 34, 35, 36, 37, 40, 42, 44, 45, 46, 47, 48, 49, 50, 52, 53, 54, 55, 56, 59, 64, 149, 150, 151, 152, 362, 365, 366, 368, 369, 370, 371, 373, 374, 378, 379, 380, 381, 382, 383, 401, 413 soundscape composition 371, 378, 383 Soundscape Node 299, 300 soundscapes 177, 180, 181, 182, 185, 190 sound script 347 sound signals 52 sound synthesis 341, 342, 343, 344, 346, 347, 348, 350, 351, 353, 354, 355, 356, 357, 359 Soundwalking 59 Source 181, 183, 191 Space Invaders 168, 174 spatial audio 297, 307, 308, 311 spatial audio system 297 spatial sound 287, 288, 289, 290, 294, 297, 298, 299, 301, 303, 304, 305, 309, 311 speech (encoded) 99 speed 159, 160 Statefulness 339 static ambience 391 static interface 88 surprise effects 204 surround sound 301, 302, 303 suspense 176, 177, 178, 180, 181, 182, 183, 184, 185, 186, 187, 189, 190, 191 Suspension of Disbelief 43 Symbolic Interactionist 59 synchronisation points 200 synchronism (simultaneous events) 14 synchronized game sound 176, 184 synchrony 224, 225, 230, 233, 234 synthesis 14
T Takagi-Sugeno-Kang (TSK) 257, 259 taste (gustation) 155 telediegetic 84 Tensor 339 three dimensional 334, 347, 349, 350, 359, 385, 387, 391, 398, 404, 405, 407, 412, 414 time on device 4 Timing 181, 182, 191 Tone Wall/Harmonic Field 69 touch (taction or pressure) 155 trans-diegetic 79, 80, 84, 85, 86, 87, 89, 91, 92, 95, 96, 97, 239 transdiegetic sounds 79, 85, 87, 92, 95 two dimensional 319, 323, 334, 342, 347, 357, 358 typology 99, 100
U Uncanny Modality (UM) 214, 217, 222, 223, 224, 226, 227 uncanny speech in computer games 228 Uncanny Valley 213, 214, 215, 216, 217, 218, 219, 220, 229, 230, 231, 232, 233 unstrange 218 urban overload 47 urban soundscape 46, 49 user-centered design (UCD) 285 User Experience (UX) 153, 285 user generated content (UGC) 321 user investment 37 User Studies 285
V van Tol 98, 99, 102, 103, 104, 125, 126, 127, 129, 130 Ventral Stream 174 verisimilitude 131, 132, 133, 140, 141, 142, 143, 145, 147, 148, 152 VFoley 355, 359 videoludic 192, 193, 194, 202, 207, 208, 211, 212 virtual characters 213, 214, 215, 216, 217, 219, 220, 221, 222, 226, 227, 228, 234
Virtual Environment (VE) 27, 56, 81, 89, 90, 160, 237, 246, 340, 341, 343, 346, 347, 348, 351, 358, 359, 365 virtual gameplay environment 248 virtual gameworld 23, 26 virtual physical parameters 340 Virtual Reality Modeling Language (VRML) 302 Virtual Reality (VR) 287, 288, 289, 290, 292, 294, 297, 298, 299, 302, 309, 312 virtual scene 287, 297, 303 virtual soundscape 44, 48, 50, 59 virtual space 45, 46, 53 virtual worlds 23, 32, 35, 44, 55, 62, 65, 70, 313, 397, 404, 405 vision-based 34 Vision (sight) 155 Visual Association Cortex (V2 and V3) 156 visual capture 157, 158 visual representation (viseme) 215, 226, 234 Volume 181, 191
W Walter Murch 98, 129, 130 Warcraft III 98, 102, 111, 115, 116, 118, 119, 123, 128 Waveguide 328, 339 willing suspension of disbelief 370, 383 winning cue 14 World Forum for Acoustic Ecology 48 World Soundscape Project 46
X XNA/XACT 301
Z Zone 103, 107, 109, 115, 116, 117, 118, 119, 120, 125, 129, 130