ANIMAL GROUPS IN THREE DIMENSIONS
Edited by
JULIA K. PARRISH
University of Washington
WILLIAM M. HAMNER
University of California, Los Angeles
CAMBRIDGE UNIVERSITY PRESS
PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
The Pitt Building, Trumpington Street, Cambridge CB2 1RP, United Kingdom

CAMBRIDGE UNIVERSITY PRESS
The Edinburgh Building, Cambridge CB2 2RU, United Kingdom
40 West 20th Street, New York, NY 10011-4211, USA
10 Stamford Road, Oakleigh, Melbourne 3166, Australia

© Julia K. Parrish and William M. Hamner 1997

This book is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 1997
Printed in the United States of America
Typeset in Times

Library of Congress Cataloging-in-Publication Data
Animal groups in three dimensions / edited by Julia K. Parrish, William M. Hamner.
p. cm.
Includes bibliographical references and index.
ISBN 0-521-46024-7 (hc)
1. Animal societies. 2. Animal societies - Simulation methods. 3. Three-dimensional display systems.
I. Parrish, Julia K., 1961- . II. Hamner, William M.
QL775.A535 1997
591.5 - dc21 97-25867 CIP

A catalog record for this book is available from the British Library.

ISBN 0-521-46024-7 hardback
To Akira Okubo
Contents
List of contributors
Acknowledgments
1 Introduction - From individuals to aggregations: Unifying properties, global framework, and the holy grails of congregation
  Julia K. Parrish, William M. Hamner, and Charles T. Prewitt
  1.1 Seeing is believing
  1.2 Defining a framework
  1.3 Properties of animal congregations
  1.4 On adopting new perspectives
  1.5 Central themes and "big picture" questions
  1.6 Organization - from measurement to models
  1.7 What's next?
Part one: Imaging and measurement
2 Methods for three-dimensional sensing of animals
  Jules S. Jaffe
  2.1 Introduction
  2.2 Existing methods
  2.3 Application of three-dimensional measurement techniques to in situ sensing of animal aggregates
  2.4 Conclusions
  Acknowledgments
3 Analytical and digital photogrammetry
  Jon Osborn
  3.1 Introduction
  3.2 The geometry and process of image capture
  3.3 Stereoscopy
  3.4 Tracking
  3.5 Discussion
  3.6 Examples
  3.7 Summary
4 Acoustic visualization of three-dimensional animal aggregations in the ocean
  Charles H. Greene and Peter H. Wiebe
  4.1 Introduction
  4.2 Acoustic visualization
  4.3 Field studies: Hypotheses, methods, and results
  4.4 Discussion
  Acknowledgments
5 Three-dimensional structure and dynamics of bird flocks
  Frank Heppner
  5.1 Introduction
  5.2 Line formations
  5.3 Cluster formations
  5.4 Future directions and problems
  Acknowledgments
  Appendix
6 Three-dimensional measurements of swarming mosquitoes: A probabilistic model, measuring system, and example results
  Terumi Ikawa and Hidehiko Okabe
  6.1 Introduction
  6.2 Probabilistic model for stereoscopy
  6.3 Measuring system for mosquito swarming
  6.4 Spatiotemporal features of swarming and the adaptive significance
  6.5 Viewing extension of the method
  Acknowledgments
Part two: Analysis
7 Quantitative analysis of animal movements in congregations
  Peter Turchin
  7.1 Introduction
  7.2 Analysis of static spatial patterns
  7.3 Group dynamics
  7.4 Spatiotemporal analysis
  7.5 Conclusion
8 Movements of animals in congregations: An Eulerian analysis of bark beetle swarming
  Peter Turchin and Gregory Simmons
  8.1 Introduction
  8.2 Congregation and mass attack in the southern pine beetle
  8.3 An approximate relationship between attractive bias and flux
  8.4 Field procedure
  8.5 Results
  8.6 Conclusion
9 Individual decisions, traffic rules, and emergent pattern in schooling fish
  Julia K. Parrish and Peter Turchin
  9.1 Introduction
  9.2 Experimental setup: Data collection
  9.3 Group-level patterns
  9.4 Attraction/repulsion structuring
  9.5 Discussion
10 Aggregate behavior in zooplankton: Phototactic swarming in four developmental stages of Coullana canadensis (Copepoda, Harpacticoida)
  Jeannette Yen and Elizabeth A. Bundock
  10.1 Introduction
  10.2 Methods
  10.3 Results
  10.4 Discussion
  10.5 Conclusions
  Acknowledgments
Part three: Behavioral ecology and evolution
11 Is the sum of the parts equal to the whole: The conflict between individuality and group membership
  William M. Hamner and Julia K. Parrish
  11.1 Introduction
  11.2 Membership and position
  11.3 Costs and benefits to the individual
  11.4 Group persistence
  11.5 Individuality versus "group" behavior
  11.6 Cooperation or veiled conflict
  11.7 Concluding remarks
12 Inside or outside? Testing evolutionary predictions of positional effects
  William L. Romey
  12.1 Introduction
  12.2 Where should a flocker be in a group?
  12.3 Individual differences in location
  12.4 Differences in selection
  12.5 Differences in motivation
  12.6 Balancing motivations
  12.7 Conclusion
  Acknowledgments
13 Costs and benefits as a function of group size: Experiments on a swarming mysid Paramesopodopsis rufa Fenton
  David A. Ritz
  13.1 Introduction
  13.2 Study animal
  13.3 Laboratory conditions
  13.4 Food capture success versus group size
  13.5 Swarm volume in different feeding conditions
  13.6 Food capture versus swarm size in the presence of a threat
  13.7 Discussion
  13.8 Conclusions
  Acknowledgments
14 Predicting the three-dimensional structure of animal aggregations from functional considerations: The role of information
  Lawrence M. Dill, C. S. Holling, and Leigh H. Palmer
  14.1 Introduction
  14.2 Predicting position
  14.3 Testing the predictions
  14.4 Null models
  Acknowledgments
  Appendix
15 Perspectives on sensory integration systems: Problems, opportunities, and predictions
  Carl R. Schilt and Kenneth S. Norris
  15.1 Introduction
  15.2 Sensory integration systems
  15.3 Problems and opportunities
  15.4 Predictions
  15.5 Conclusions
  Acknowledgments
Part four: Models
16 Conceptual and methodological issues in the modeling of biological aggregations
  Simon A. Levin
  16.1 Introduction
  16.2 The problem of relevant detail
  16.3 Interacting individuals: From Lagrange to Euler
  16.4 Cells to landscapes: From discrete to continuous
  16.5 Evolutionary aspects of grouping
  16.6 Conclusions
  Acknowledgments
17 Schooling as a strategy for taxis in a noisy environment
  Daniel Grünbaum
  17.1 Introduction
  17.2 Asocial searching: Taxis from directionally varying turning rates
  17.3 Simulations of searching with schooling behavior
  17.4 A nonspatial deterministic approximation to social taxis
  17.5 Discussion
18 Trail following as an adaptable mechanism for population behavior
  Leah Edelstein-Keshet
  18.1 Introduction
  18.2 Trail following in social and cellular systems
  18.3 Phenomena stemming from trail following
  18.4 Minimal models for trail-following behavior
  18.5 Discussion
  Acknowledgments
19 Metabolic models of fish school behavior - the need for quantitative observations
  William McFarland and Akira Okubo
  19.1 Introduction
  19.2 Density in mullet schools
  19.3 Mullet school velocity
  19.4 Oxygen consumption of swimming mullet
  19.5 Modeling oxygen consumption within a mullet school
  19.6 Discussion
  Acknowledgments
  Symbols
20 Social forces in animal congregations: Interactive, motivational, and sensory aspects
  Kevin Warburton
  20.1 Introduction
  20.2 Models of attraction and repulsion
  20.3 Sensory modalities mediating the attraction-repulsion system
  20.4 Real-life factors mediating attraction and repulsion, and their effects on group cohesion
  20.5 Models of attraction and repulsion which describe the effects of motivational change
  20.6 Conclusion
  Acknowledgments
References
Subject index
Taxonomic index
Contributors
Elizabeth A. Bundock Finch University of Health Sciences/The Chicago Medical School, North Chicago, Illinois 60064, USA.
[email protected] Lawrence M. Dill Behavioural Ecology Research Group, Department of Biological Sciences, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada.
[email protected] Charles H. Greene Ocean Resources and Ecosystems Program, Corson Hall, Cornell University, Ithaca, New York 14853, USA.
[email protected] Daniel Grünbaum Department of Zoology, Box 351800, University of Washington, Seattle, Washington 98195, USA.
[email protected] William M. Hamner Department of Biology, University of California, Los Angeles, Box 951606, Los Angeles, California 90095-1606, USA.
[email protected] Frank Heppner Zoology Department, University of Rhode Island, Kingston, Rhode Island 02881, USA. C. S. Holling Department of Zoology, University of Florida, Gainesville, Florida 32611, USA.
[email protected] Terumi Ikawa Department of Liberal Arts and Sciences, Morioka College, 808 Sunagome, Takizawa-mura, Iwate-gun 020-01 Japan,
[email protected] Jules S. Jaffe Marine Physical Lab, Scripps Institution of Oceanography, La Jolla, California 92093-0238, USA.
[email protected] Xlll
Leah Edelstein-Keshet Department of Mathematics, University of British Columbia, #121-1984 Mathematics Road, Vancouver, British Columbia, V6T 1Z2, Canada.
[email protected] Simon A. Levin Department of Ecology and Evolutionary Biology, Eno Hall, Princeton University, Princeton, New Jersey 08544, USA.
[email protected] William McFarland Friday Harbor Laboratories, University of Washington, 620 University Road, Friday Harbor, Washington 98250, USA. Kenneth S. Norris Long Marine Laboratory, Institute of Marine Sciences, University of California, 100 Shaffer Road, Santa Cruz, California 95060, USA. Hidehiko Okabe Research Institute for Polymers and Textiles, Higashi, Tsukuba 305, Japan.
[email protected] Akira Okubo
Deceased
Jon Osborn Department of Survey and Spatial Information Science, University of Tasmania at Hobart, GPO Box 252C, Hobart, Tasmania 7001, Australia.
[email protected] Leigh H. Palmer Department of Physics, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada, palmer@sfu.ca Julia K. Parrish Department of Zoology, Box 351800, University of Washington, Seattle, Washington 98195, USA.
[email protected] Charles T. Prewitt Carnegie Institution of Washington, Geophysical Laboratory, 5251 Broad Branch Road N.W., Washington, D.C. 20015, USA.
[email protected] David A. Ritz Department of Zoology, University of Tasmania at Hobart, GPO Box 252C, Hobart, Tasmania 7001, Australia.
[email protected] William L. Romey Department of Biology, Kenyon College, Gambier, Ohio 43022-9623, USA.
[email protected] Carl R. Schilt Institute of Ecology, University of Georgia, Athens, Georgia 30602, USA.
Gregory Simmons USDA, APHIS, 4151 Highway 86, Brawley, California 92227, USA.
Peter Turchin Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269-3042, USA.
[email protected] Kevin Warburton Department of Zoology, The University of Queensland, Brisbane, Queensland 4072, Australia.
[email protected] Peter H. Wiebe Biology Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, USA.
[email protected] Jeannette Yen Marine Sciences Research Center, State University of New York, Stony Brook, New York 11794-5000, USA.
[email protected] Acknowledgments
Many people helped to make this book possible. Lauren Cowles at Cambridge University Press was a patient, understanding editor. Karen Jensen, Trista Patterson, Johanna Salatas, and Jen Cesca typed, edited, and retyped various manuscripts, as well as sent countless emails, letters, and faxes, and xeroxed, collated, and filed hundreds if not thousands of pages. Every author in this book deserves credit for living through an extended process of book preparation which often involved less-than-tactful editing. Mea culpa (JKP). Every author in this book also served as an anonymous reviewer for at least one other chapter; thanks also to E. J. Buskey and T. J. Pitcher for reviewing. Leah Edelstein-Keshet and Simon Levin deserve special thanks for facilitating the inclusion of Chapters 6 and 17, respectively. Chapter 17 is reprinted in large part from an article of the same name in Evolutionary Ecology, and appears by permission of the publishers Chapman & Hall.

Nothing would have been possible without the foresight of Larry Clarke at the National Science Foundation, who served as program director for the initial grant, "Three-dimensional analysis and computer modeling of schooling patterns," OCE86-16487, to Bill Hamner and Charlie Prewitt, as well as to our workshop grant, "Workshop - Animal aggregations: Three-dimensional measurement and modeling," OCE91-06924, which allowed us to aggregate in Monterey. Finally, and certainly foremost, we owe a huge and growing debt of gratitude to Peggy Hamner who served as wife, sister, friend, mother, grandmother, colleague, editor, typist, overseer, mediator, and always supreme debutante.
Introduction - From individuals to aggregations: Unifying properties, global framework, and the holy grails of congregation

JULIA K. PARRISH, WILLIAM M. HAMNER, AND CHARLES T. PREWITT
1.1 Seeing is believing

An aggregation of anything against a background of sameness captures our eye. Congregations of creatures that routinely swarm and cluster or crowd together capture our imagination and generate new descriptive, often florid, collective terms for groups of living things; descriptors that are species-specific and etymologically precise (see Lipton's "Exaltation of Larks," 1991). A swarm of bees, a host of sparrows, and a smack of jellyfish generate crisp images in our mind's eye, while a cloud of goats, a gaggle of flies, and a pod of parrots only generate confusion. There is no collective term in the English language for this wealth of collective adjectives (Lipton 1991), other than terms of "venery" (from the Latin venari, to hunt game), words that initially described aggregations of game animals, clustered conveniently for the huntsman. Some of these terms denote protean behavioral displays that are visually compulsive. However, when our congregations of creatures are behaviorally coordinated in space and time, synchronously moving and wheeling and twisting before us in three-dimensional space, as in a school of smelt or a flock of phalaropes, they subvert our visual ability to focus on an individual animal and, somehow, suddenly the sum of the parts becomes a cohesive whole.

Those of us who are terminally entranced with the three-dimensional, hypnotic beauty of synchronized flocks of birds and schools of fish quite simply cannot be cured. We know there is order within these three-dimensional displays, but it is not immediately obvious how to quantify it. Schools of fish have fascinated evolutionary biologists for many years (Williams 1964) because the individuals in the aggregation do not appear to act selfishly at all; rather they seem to behave and interact as if for the benefit of the school as a whole.
Indeed, if the individuals within a school did not look and behave similarly, then one of the primary antipredatory advantages associated with
schooling, anonymity within the aggregation, could not exist. Odd animals are eaten first; however, this does not necessarily mean the dissolution of group structure as all individuals fight to gain access to the best locations (Krause 1994). Natural selection has produced behavioral patterns which emphasize similarity and uniformity within the group, such that coherence and cohesion are hallmarks of many types of animal congregation. It is this tension between survivorship of the individual within the protection of the school and the constraints imposed by living within a group that poses an evolutionary paradox that still has not been resolved. Collective behavior is illustrative of one of the central philosophical issues of biology in particular and science in general, i.e. the issue of individuality, the dichotomy between the sum of the parts and the whole.
1.2 Defining a framework

1.2.1 The phenomena of aggregation

Aggregation is a pervasive phenomenon. At the most basic level, an aggregation is a collection of parts or units which form some coherent, often cohesive, whole. Molecules aggregate to form the basic building blocks of matter and substance as we know them. Inanimate objects of all shapes and sizes aggregate to form the familiar landscape within which we live. Beaches are made of aggregations of sand grains or cobblestones, glaciers are made up of compacted aggregations of snowflakes, planets are aggregated into solar systems, and solar systems are aggregated into galaxies.

In many cases, inanimate objects are not only aggregated, but sorted along some set of physical gradients. Adjacent sand grains on a beach are apt to be the same size, having been sorted by the physical force of wave action. But beach material is not all the same size. Thus, a beach contains a gradient of grain sizes instead of a random assemblage of sand, pea gravel, and cobble. Furthermore, the sand on a given beach is likely to be predominantly of a single type. Pink sand beaches in Bermuda are made mostly from coral growing in the adjacent reef, whereas black sand beaches in Hawaii are made from the locally abundant volcanic rock. Both physical sorting and local abundance of source material create nonrandom aggregations which then may be arranged into repetitive patterns. Sand on a dune may be arranged by density and size, but dunes are also repetitively arrayed along a beach.

However, aggregation is not only passive sorting. Many objects actively aggregate, such that like materials attract, while foreign materials are repelled. Atoms and molecules are both attracted and repulsed, resulting in cohesion into liquids or solids within which the individual units are held at some minimum distance. Human societies have all adopted the phenomenon of arranged, or ordered, aggregation as basic to living.
We build a brick wall one ordered row at a time
instead of in haphazard arrangements. Engineers and architects instruct us about the structural and aesthetic properties of ordered arrangements designed to make our surroundings both functional and pleasing to the eye. Many of us sort silverware by type: knives with knives and forks with forks. Commuters in automobiles follow each other in columns determined by the locations of roads and freeways, attracted to their ultimate destinations, but repelled from each other for fear of having an accident (not always successful). In short, ordered arrangements of like objects surround us comfortably, as a consequence of our actions.
1.2.2 Animate aggregations

Most of the aggregation that surrounds us, both inanimate and animate, is arrayed in three dimensions, and some of it contains a fourth dimension, time, as well. Like the physical world, animate aggregations and patterns within them can be the result of sorting by physical forces. Assemblages of plants are often found in discrete locations, not only based on where they can grow, but also on where the seeds were carried (Forcell & Harvey 1988). Wind, water, and animals all distribute seeds nonrandomly (e.g. Becker et al. 1985; Skoglund 1990). Animal aggregations also result from physical sorting. In open water, zooplankton are often found in dense aggregations, associated with localized physical phenomena (e.g. Hamner & Schneider 1986). Animals are not necessarily actively attracted to these aggregations; often they are passively transported there via physical processes. These types of assemblages might be called passive aggregation, although this does not preclude the possibility of the aggregation members acting and interacting once within the group.

Within the animate world, aggregations often form around an attractive source, with potential members of the aggregation actively recruited to a specific location. Zooplankton aggregate nightly at the surface of the sea as a result of vertical migration. Clumped patches of any resource, such as food or space, attract animals, especially if the resource is limiting. We refer to these types of aggregations as active aggregations. In these situations the aggregation is apt to disperse if the source of attraction wanes. Once seeds have been consumed, birds no longer visit a feeder. Individuals also may continuously join and leave the aggregation, rather than remain continuous members. Thus, turnover may be high even if the aggregation as a whole remains fairly constant in terms of size, density, shape, or location.
Although attraction to a common source may be responsible for the creation of the aggregation, repulsion also plays a crucial role in determining group structure (Okubo 1980). Unmitigated attraction would result in an aggregation so
dense that the costs to individual members would quickly outweigh the benefits. As density increases, basic resources, such as oxygen in the aquatic environment, are depleted faster than they can be replenished (see McFarland & Okubo Ch. 19). At the same time, waste products are likely to build up faster than they can be advected out of the group. Repulsion may occur on a global level; i.e. individuals are repulsed from an external source (e.g. Payne 1980), creating an open space or vacuole around the repulsion source (e.g. a predator in a fish school; Pitcher & Parrish 1993). Repulsion also occurs on a local level; i.e. regulation of interindividual density (see Parrish & Turchin Ch. 9). The combination of attractive and repulsive forces should thus define the physical attributes of the group as the spacing between many interacting individuals and forms the emergent pattern we see as group structure.

In contrast to plants and pebbles, animals have the ability to react rapidly to changes in their environment. While a sand grain or a seed may fall among others of equal size and origin, it does not do so by choice. Many animal aggregations are formed and maintained by the mutual attraction of members. When the source of attraction is the group itself, we define this behavior as congregation (sensu Turchin 1997). Examples of animal congregations abound: flocks of birds, swarms of insects, schools of fish. Congregations can be shaped by internal, i.e. member-derived, forces, by external forces, and by frictional forces (see Warburton Ch. 20; Okubo 1986). The foraging trails of ant colonies may have structure, determined in part by the surfaces they crawl over. However, given a smooth, featureless environment, the ants would still congregate (Edelstein-Keshet Ch. 18, 1994; Gordon et al. 1993). Thus the phenomenon of congregation may be structured by the larger environment within which the group resides as well (Gordon 1994).
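As an illustrative aside (not a model from any chapter in this volume), the interplay of attraction and repulsion can be sketched as a minimal individual-based simulation. All of the radii, weights, and step sizes below are arbitrary choices made for the sketch: each individual is pushed away from neighbors closer than a repulsion radius, drawn toward neighbors within a larger attraction radius, and ignores everything beyond sensing range.

```python
import math

def step(positions, r_rep=1.0, r_att=5.0, speed=0.1):
    """One synchronous update: repulsion from neighbors closer than
    r_rep (sharply stronger at close range), attraction toward
    neighbors between r_rep and r_att, then a fixed-length move in
    the net direction."""
    new = []
    for i, (xi, yi) in enumerate(positions):
        dx = dy = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            ex, ey = xj - xi, yj - yi
            d = math.hypot(ex, ey)
            if d == 0.0:
                continue
            if d < r_rep:
                w = -((r_rep / d) ** 2)   # repulsion dominates when crowded
            elif d < r_att:
                w = 1.0                   # unit-strength attraction
            else:
                continue                  # beyond sensing range: ignored
            dx += w * ex / d
            dy += w * ey / d
        norm = math.hypot(dx, dy)
        if norm > 0.0:
            xi, yi = xi + speed * dx / norm, yi + speed * dy / norm
        new.append((xi, yi))
    return new

def min_spacing(positions):
    """Smallest pairwise distance in the group."""
    return min(math.hypot(ax - bx, ay - by)
               for i, (ax, ay) in enumerate(positions)
               for (bx, by) in positions[i + 1:])

# An unrealistically dense cluster: repulsion spreads it out while
# mutual attraction keeps it a coherent group.
pts = [(0.1 * i, 0.05 * j) for i in range(4) for j in range(4)]
before = min_spacing(pts)
for _ in range(100):
    pts = step(pts)
after = min_spacing(pts)
```

With these parameters the nearest-neighbor spacing relaxes away from the crowded initial value while the group stays cohesive; removing the repulsion term collapses the group, which is the "unmitigated attraction" problem described above.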
Although a large variety of animals congregate, interactions within the group differ markedly across species (Bertram 1978). Spatially well-defined congregations, such as fish schools, may be composed of individuals with little to no genetic relation to each other (Hilborn 1991), low fidelity to the group (Helfman 1984), and thus no reason for displaying reciprocal altruism. Schooling fish are generally considered "selfish herds" (Hamilton 1971), in that each individual attempts to take the maximum advantage from group living, independent of the fates of neighbors (Pitcher & Parrish 1993). The fact that three-dimensional structure is apparent does not necessarily lead to the conclusion that the individuals within the group interact socially. Rather than active information transfer (i.e. social interaction), information may be transferred passively (sensu Magurran & Higham 1988). For this reason, we refer to these asocial types of congregations as passive. In a passive congregation (the FSH of Romey Ch. 12), individual members are attracted to the group per se, but do not display social behaviors.

Many animal congregations, however, are socially developed. Often the individual members are related, sometimes highly so, as in the social insects (see Edelstein-Keshet Ch. 18; Wilson 1975). Unrelated congregation members will often engage in social interactions if group fidelity is high, such that the chance of each individual meeting any of the others is high (Alexander 1974). Social congregations display a variety of interindividual behaviors, necessitating active information transfer. Antennal contact in ants may be used to transfer a variety of information about individual identity or location of resources (Gordon et al. 1993). The rate of contact may also, to some extent, define the structure of the group (see Edelstein-Keshet Ch. 18). Social congregations frequently display a division of labor, such that large tasks unassailable by an individual are accomplished by the group (e.g. hunting in social carnivores - Kruuk 1972, 1975; Packer & Ruttan 1988). The way in which both passive and social congregations transfer information between members about the larger environment which is unsensible by any single individual is the subject of Chapter 15 by Schilt and Norris. Highly social congregations, such as felid or canid packs, or any number of primate groups, may actually display a lack of regularly defined spatial pattern within the group (e.g. Janson 1990), perhaps because of the level of social development. In these cases, constant proximity of neighbors is no longer a requirement for information transfer and the "structure" of the groups is by relatedness and social hierarchy rather than interindividual distance.
1.3 Properties of animal congregations

Regardless of species or circumstance, many animal congregations share one or more of the following features.

1. Congregations have edges which are usually very distinct; the change in density from inside to outside is abrupt. This is one operational way to define a group. When a congregation moves or changes shape, the edges remain intact. Thus, individuals are either members or isolates, depending on their location.

2. Many types of animal congregations have fairly uniform densities, particularly when on the move (e.g. herds, flocks, schools). Other types of animal congregations may have a broader distribution of densities most of the time (e.g. midge swarms), yet retain the ability to assemble almost instantaneously into a more uniform mass. Feeding birds often display non-uniform distributions around food sources, but if a predator comes into view the flock will take wing as a cohesive, structured unit.

3. Congregations which exist largely as groups of uniform density are often also polarized, with all members facing in the same direction. When a flock of birds is in flight, for instance, it is obvious why this should be so. A bird in the interior of the flock, flying at right angles to the rest, would quite possibly create a significant hazard. However, some animal congregations, notably schooling fish, remain in polarized configurations even at rest. Why this occurs is not known.

4. Within the volume of the group, polarized or not, individuals have the freedom to move with respect to their neighbors. In a resting group this may mean that individuals are constantly shifting positions, even if the position or shape of the congregation as a whole remains static. In moving groups individuals can also re-sort without disturbing the integrity of the group. The ability to shift positions means that individuals can take selfish advantage of moment-to-moment circumstances as well as accrue the more general benefit of group membership.

5. Many congregations display coordinated movement patterns of an almost balletic nature. Flocks on the wing appear to turn simultaneously. Fish in schools arc in a fountain-like pattern in response to attack by a predator, completing the move by reaggregating behind the predator. Ant trails branch out in dendritic structures which coalesce back into main paths.
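Several of these properties can be put into numbers directly. As a simple illustration (standard circular statistics, not a method drawn from any particular chapter), the degree of polarization in property 3 can be measured as the mean resultant length of the members' unit heading vectors:

```python
import math

def polarization(headings):
    """Mean resultant length of unit heading vectors (angles in
    radians): 1.0 when every individual faces the same direction,
    near 0.0 when headings cancel out."""
    n = len(headings)
    cx = sum(math.cos(h) for h in headings) / n
    cy = sum(math.sin(h) for h in headings) / n
    return math.hypot(cx, cy)

aligned = [0.1, 0.0, -0.1, 0.05]                       # a polarized school
scattered = [0.0, math.pi / 2, math.pi, -math.pi / 2]  # headings cancel
```

Here `polarization(aligned)` is close to 1 and `polarization(scattered)` is close to 0; tracking such an index through time is one way to quantify the coordinated maneuvers of property 5.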
1.4 On adopting new perspectives

We live in a three-dimensional medium and we constantly, albeit unconsciously, make thousands of three-dimensional calculations each second. A disproportionately large portion of the human brain is committed to these very functions, yet tiny creatures, like hover flies, make lightning judgments in space and time with hardly any neurological equipment. Even with our stereoscopic, full color, visual abilities we cannot track an individual sardine within a rapidly wheeling school. Perhaps, if we could slow everything down, we would be more effective. So, as scientists, we record the behavior with film or video, and replay the images at slower speed. Again, we are lost. Our films are in two dimensions, and we begin our analysis of three-dimensional behavior at the 0.66 level of confidence. If we film with two or more cameras to capture the third spatial dimension, we then must analytically treat the resulting data set using classical three-dimensional photogrammetric calculations (see Osborn Ch. 3). And then we learn the bad news. Automatic three-dimensional data collection and analysis, for any length of time over several seconds, requires the dedicated attention
of the biggest computers currently on the market. A cloud of gnats obviously does not engage in such time-intensive calculation. There must be simple traffic rules for species engaging in collective movement.

This book is all about animal aggregations in four dimensions, three in space and one in time. It is not confined to just an experimental treatment of the subject. We believe that it will require much more than biology to understand how and why animals do (or do not) congregate in more or less ordered arrangements. Two of us (Parrish and Hamner) experienced the limitations of a purely biological approach, first independently and then together, when we tried to answer questions about how individuals move within groups and how those movements are patterned in space and time. As biologists we found ourselves immersed in a rich literature on why animals aggregate. Hypotheses describing where animals should be in a group, and why they should be there, abound (Alexander 1974; Hamilton 1971; Lazarus 1979; Pitcher et al. 1982b), but the literature on how animals aggregate is much sparser. While we found information on how individuals might match retinal images (Parr 1927), how they might match their speeds (Shaw & Tucker 1965), or how quickly individuals might detect and respond to a stimulus, these papers did not point the way to answering our questions about how these individuals organize themselves in space and time within aggregations.

When we pressed these issues, we quickly found ourselves in a technological morass. Following individual animals (or units of anything within a moving aggregation) in space and time turns out to be very difficult. Tracking requires a known frame of reference within which the object moves. If an object moves very fast, the rate at which its position is sampled must also be fast to accurately record changes in speed and direction.
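The sampling-rate requirement can be made concrete with a small calculation (an illustrative sketch; the numbers are arbitrary). A point moving on a circle at constant speed, sampled too coarsely, appears to travel along short chords, so its speed is systematically underestimated and its turns are lost:

```python
import math

def estimated_speed(radius, period, dt, n_steps=100):
    """Speed inferred from straight-line distances between successive
    sampled positions of a point circling at constant true speed
    2*pi*radius/period, sampled every dt seconds."""
    w = 2.0 * math.pi / period
    total = 0.0
    for k in range(n_steps):
        x0 = radius * math.cos(w * k * dt)
        y0 = radius * math.sin(w * k * dt)
        x1 = radius * math.cos(w * (k + 1) * dt)
        y1 = radius * math.sin(w * (k + 1) * dt)
        total += math.hypot(x1 - x0, y1 - y0)  # chord, not arc
    return total / (n_steps * dt)

true_speed = 2.0 * math.pi * 1.0 / 1.0   # radius 1 m, period 1 s
fast = estimated_speed(1.0, 1.0, 0.01)   # 100 samples per revolution
slow = estimated_speed(1.0, 1.0, 0.3)    # ~3 samples per revolution
```

At 100 samples per revolution the estimate is within a fraction of a percent of the true speed; at roughly 3 samples per revolution it is low by about 14%, and the turning within each sampling interval is missed entirely.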
For confined objects, such as a fish in a tank, this is relatively easy. However, tracking a fish in the ocean is more difficult, as it is likely to swim away. If a tracking device such as a transponder is attached to a fish, then the receiving array must also move with the fish, and it in turn must be accurately tracked. Quite quickly the limits of technology are reached. Following individual units moving in space and time within a group which also moves is nearly impossible. Even if we did manage to collect the requisite four-dimensional data, analytical tools were not readily available. Distilling four-dimensional data on identified individuals into a form where interesting biological questions can be addressed is a daunting task. We were faced with interesting questions and no way to answer them. So, we did what most of us do when confounded, we found someone with complementary skills to help us solve our problems. Prewitt is a crystallographer, used to thinking about the structure of three-dimensional aggregations and trained in how to detect three-dimensional patterns. In the course of our collaboration we
subsequently discovered other people working in the general field of three-dimensional aggregation, many from perspectives that we had not initially considered. Eventually, we came to the conclusion that examining four-dimensional animal aggregations was a multidisciplinary "field" of its own. Like most individuals within any group, researchers interested in four-dimensional problems generally only have a sense of their own work, and little appreciation of that of their "nearest neighbors." Because intellectual disciplines move forward as new ideas are injected into an existing framework, we decided that it was time to reevaluate the framework for the study of animal aggregations. We convened a group of scientists who work on many different aspects of aggregation, both animate and inanimate. This book is the product of our interactions. Because each of us soon saw our studies in a new perspective (new ways to collect data, new methods of analysis, new phenomena to model, new systems for comparison, new questions to ask), we decided to begin at the beginning and review all aspects of the multidisciplinary study of animal aggregation in space-time. This new field encompasses aspects of animal behavior, ecology, and evolution as well as crystallography, geology, photogrammetry, and mathematics. The thread that ties us together is the how and why of aggregation. Sand grains on a beach and fish in a school share some similar properties. Might models of the former elucidate the latter?
1.5 Central themes and "big picture" questions A defining aspect of any field is the set of questions it attempts to answer. As a multidisciplinary group, we have come up with what we refer to as Big Picture Questions (BPQs) — issues central to the study of animal aggregation (a noninclusive list of which follows). One of the central themes connecting all of these questions deals with the basic conundrum of how a set of selfish individuals can apparently act as a cohesive, coherent whole. What are the costs and benefits of group membership? Are they positionally dependent? What information can, and do, individuals use? Do individuals have a sense of the whole? Is there an optimal group size? The study of animal aggregation can be attempted at several levels. The foregoing questions acknowledge the central importance of the individual member, and they attempt to examine the group through the combined action of its members. However, one can also look at the entire group as a unit possessing certain properties. The second core theme embedded in our BPQs addresses the group as a whole. Why are there discrete boundaries? What is the appropriate scale for assessing pattern? Why should pattern exist in three-dimensional aggregations? Is observed three-dimensional structure no more than would result from optimal packing? The third theme attempts to integrate elements of the individual with those of the group - essentially, trying to define the whole as some function of the parts. What are the assembly rules? Which properties of the group are epiphenomena, and which are functional properties that have been selected for? Can models which predict epiphenomena be used to make predictions about individual behavior? None of the BPQs are easy to answer, and several of them are outside the framework of the scientific method; that is, they do not lend themselves to testable predictions. However, we believe these questions are a starting point from which we will launch our studies. In this book, we attempt to address some of these questions, as well as others which are logical extensions of the few presented here.
1.6 Organization - from measurement to models We have organized this book around four central issues: collecting data, analyzing data, the functional biology of aggregation, and modeling aggregation. Within each section the reader will find several chapters devoted to examples of how to address the issue or define the approach. However, each chapter addresses other issues as well. It is impossible to analyze data without first collecting it. It is useful to have model predictions when examining the functional role of individual position within the group. Rather than read cover-to-cover, we encourage readers to follow their own path through the book as each chapter leads to others within and across sections. Neither group structure, nor individual movement within that structure, can be described, analyzed, or modeled without the ability to collect data in X, Y, Z over time. In this respect, many of us have been limited by technology. It is only recently that off-the-shelf systems with the ability to collect four-dimensional information have become available. Prior to the advent of automated data collection, researchers interested in collecting four-dimensional data sets had to repeatedly digitize hundreds, if not thousands, of points. Methods sections in
several fish schooling papers from the 1960s and 1970s are full of agonizing descriptions of the number of frames analyzed (e.g. Partridge et al. 1980 hand-digitized over 1.2 million points). The endless hours of data collection were enough to turn anyone away. Today technology offers us not only visual options for data collection but also acoustic methods. In concert, these sensory modalities will eventually allow us to examine animal aggregations at the level of the individual, the group, and the habitat. The first section of the book - Imaging and Measurement - reviews the technology and specific methods available to resolve three-dimensional images and track moving points through space-time. Jaffe (Ch. 2) gives a broad overview of three-dimensional technology before focusing on acoustic techniques. Greene and Wiebe (Ch. 4) give a specific example of data collected via three-dimensional acoustic technology. While Jaffe uses sound to attempt to follow individual plankters (his FTV system), Greene and Wiebe use sound to map plankton aggregation over volumes of open ocean several kilometers in extent. Thus, acoustic technology lends itself to a tremendously broad range of spatial scales. Osborn (Ch. 3) reviews three-dimensional optical methods which rely on the principles of photogrammetry and gives four short examples of photogrammetric analyses in aquatic systems. The final two chapters provide examples of optical collection of three-dimensional data in aerial systems. Heppner (Ch. 5) discusses the development of devices to follow birds in flocks, along with the underlying reasons for flocking. Ikawa and Okabe (Ch. 6) discuss a system for following the movements of swarming mosquitoes. The search for three-dimensional structure, or animal architecture, has been one of the holy grails of animal aggregation research. Early attempts to detect structure used physical world examples, such as crystals, as a model (Breder 1976).
These attempts were largely unsuccessful because the spacings of animals in a school or flock are not as regular as are atoms or molecules in a crystal and perhaps because these investigators did not employ the full range of possibilities for description that exist in the crystallographic literature. We believe that research on animal aggregations should embrace physical models, especially those created from the study of inanimate aggregation. Several authors in this volume, notably in the sections on Analysis and Models, adapt concepts from the physical sciences that can be useful in a more biological context. For example, the concept of diffusion is used by several authors to describe relative movement of aggregations and/or the movement of individuals within those aggregations. Most people think of diffusion as something that occurs when there is a physical or chemical gradient present in a system. McFarland and Okubo (Ch. 19) use advection and diffusion equations to model oxygen depletion as a function of school size. In contrast to much of the existing literature on
schooling fish, which links schooling with ecological processes such as foraging or predation, these authors suggest that many of the emergent properties of the group (e.g. structure, density, shape) may be dictated by self-imposed physiological constraints. Grünbaum (Ch. 17) models how organisms can detect and aggregate along gradients in a noisy environment. An important aspect of the study of animal aggregation is how one describes relative movement of individuals in schools, flocks, or swarms under varying environmental conditions. Yen and Bundock (Ch. 10), making use of another physically derived concept, measure the fractal dimensions of the trajectories of copepods in two and three dimensions. A fractal dimension by itself is not very useful, but when the dimensions of two or more trajectories are compared, one can conclude that one trajectory is more sinuous than another. This might indicate that individuals making tracks having larger fractal dimensions are more disturbed by the environment than are the others. Analysis of three-dimensional data of aggregating individuals is a relatively new field, emerging as a by-product of our increasing ability to produce three- and four-dimensional data sets. In his introductory chapter to the Analysis section, Turchin (Ch. 7) divides the analysis of animal congregations based on whether data are collected on the movement of individuals (Lagrangian) or on population fluxes (Eulerian). (This dichotomy is echoed in the introduction to the section on models - Levin Ch. 16.) There are costs and benefits to both approaches. As an illustration, Turchin and Simmons (Ch. 8) adopt a Eulerian approach in the study of pine-bark beetle mass attacks. Rather than attempt to follow thousands of individuals, the evolution and decay of the congregation are tracked by measuring the flux of individuals past set spatial coordinates relative to the attraction source (a Southern pine).
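The fractal-dimension comparison described above can be made concrete with a simple box-counting estimator. The sketch below is purely illustrative (the function name, box sizes, and test trajectory are invented here, not taken from Yen and Bundock's chapter); it estimates a trajectory's dimension as the slope of log(occupied boxes) against log(1/box size), so a straight path scores near 1 while a more sinuous, plane-filling path scores closer to 2.

```python
import numpy as np

def box_count_dimension(points, box_sizes):
    """Estimate the box-counting (fractal) dimension of a trajectory.

    points:    (N, d) array of trajectory coordinates.
    box_sizes: sequence of box edge lengths to test.
    Returns the slope of log(occupied boxes) vs. log(1 / box size).
    """
    points = np.asarray(points, dtype=float)
    points = points - points.min(axis=0)      # shift into the positive quadrant
    counts = []
    for eps in box_sizes:
        # Assign each point to a box index; count distinct occupied boxes.
        idx = np.floor(points / eps).astype(int)
        counts.append(len({tuple(i) for i in idx}))
    slope, _ = np.polyfit(np.log(1.0 / np.array(box_sizes)), np.log(counts), 1)
    return slope

# A straight diagonal track should have a dimension near 1.
t = np.linspace(0.0, 1.0, 2000)
line = np.c_[t, t]
sizes = [0.2, 0.1, 0.05, 0.025]
print(box_count_dimension(line, sizes))
```

Comparing the slopes for two measured tracks, rather than reading either number in isolation, is the use the chapter describes.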
For small, moving congregations, following individual trajectories may be more appropriate. Although the Lagrangian approach is fraught with logistical difficulties, it has the advantage of allowing the researcher to analyze behavioral differences between individuals, as well as interactions between congregation members. The final two chapters in this section adopt an individually based approach to analyze the interactions between schooling fish (Parrish & Turchin Ch. 9) and swarming copepods (Yen & Bundock Ch. 10). Whether there is obvious structure or an apparently haphazard arrangement of group members, the turnover of individual members through various positions within the aggregation must be mediated by rules governing the flow of traffic, such that both individuality and cohesion are simultaneously maintained. Much like the freeway at rush hour, traffic rules in animal congregations should describe interindividual interactions and predict group-level phenomena at the same time. Several of the chapters in this book deal with ways of describing interindividual movement, either from real data sets (Turchin & Simmons Ch. 8; Parrish & Turchin Ch. 9; Yen & Bundock Ch. 10), or from models of individual-based interactions (Grünbaum Ch. 17). Physical models and methods of analysis help us discern patterns and provide clues as to how animals might accomplish the monumental task of organizing themselves. However, they do not bear upon the question of why animals organize themselves. Functional considerations, which tend to center on the costs and benefits to the individual, predict a wide range of "optimal" individual actions depending on what selective forces the aggregation is experiencing. Hamner and Parrish set up the dichotomy between the individual and the group within which it exists in the introductory chapter to the section on Behavioral Ecology and Evolution (Ch. 11). If congregations are structured, that does not imply stasis, either of the group as a whole or of the individuals within it. Group members, having made the basic decision to join and remain in the group, have a variety of positional options open to them. However, the freedom to move within the group is restricted by the positions, or even actions, of other group members. Individual animals simply may not be able to pass by their neighbors. Alternatively, they may find it impossible to supersede group members already occupying desirable space. Finally, individuals may find themselves in new positions they did not actively choose to occupy, due to the movement of others around them. This general theme, choices at the level of the individual versus actions at the level of the group, is also addressed by Romey (Ch. 12), who focuses on the question: which positions within the group should individuals choose, and why? The consequences of summed individual actions are addressed by Ritz (Ch. 13) in his examination of the relationship between group size and individual optimality.
Traffic rules also provide a way to predict what neighboring individuals will do, given a certain situation, as long as all individuals "play by the rules." Thus, gregarious animals may be paying attention to a much smaller data set than the trajectories of all groupmates within their sensory range. The kinds of information individual group members might use, and the consequences of information transfer across the group, are the subjects of chapters by Dill, Holling, and Palmer (Ch. 14) and Schilt and Norris (Ch. 15), respectively. Rules governing individual movement within a congregation, and the emergent properties or "group behaviors," are a subject not easily addressed either experimentally or observationally. The final section of this book explores mathematical approaches to the study of animal aggregation. Modeling is a powerful tool because it allows the freedom to test rules of association by setting assumptions and then determining whether the computer congregations display the attributes of their real-world counterparts. Levin, in his introductory chapter to the Models section (Ch. 16), explores several of the conceptual issues inherent in the
construction and use of models of aggregation, including scale, emergent pattern, and the interaction between explanatory and actual reality. The remaining chapters explore various rule sets under which aggregation will either evolve (Grünbaum Ch. 17; Edelstein-Keshet Ch. 18; Warburton Ch. 20) or break down (McFarland & Okubo Ch. 19).
1.7 What's next? The strength of this volume, namely that it is a broadly based approach to the study of three-dimensional animal aggregations, is also its drawback in that no subject is covered exhaustively and many topics are untouched. Rather than a final work, this book represents an initial attempt to both define and understand animal aggregations in three-dimensional space and time. We intend it as a springboard for future thought, discussion, and science. Furthermore, we are neophytes. Our measuring devices, our computers, our words, and our graphics may never let us adequately describe the aesthetic beauty of a turning flock of starlings or a school of anchovies exploding away from an oncoming tuna. What we see as apparent simplicity we now know is a complex layering of physiology and behavior, both mechanistically and functionally. It is our sincere belief that an interactive, multidisciplinary approach will take us farther in understanding how and why animals aggregate than merely pursuing a strictly biological investigation. It is also more fun.
Part one Imaging and measurement
2 Methods for three-dimensional sensing of animals JULES S. JAFFE
2.1 Introduction Most animals have the ability to sense their world three-dimensionally. Using visual, pressure-related, and chemical cues, which are filtered through sophisticated neural circuitry and central processing, animals continually measure the distance to and shape of objects in their environment. If the objects are moving, as in an oncoming predator, or fleeing prey, animals automatically track and predict trajectories, allowing both escape and interception. Of course, all of these complex calculations are processed in real time. When we attempt to emulate these feats of three-dimensional perception with scientific instruments and complex computers, we quickly discover that four-dimensional measurement is extremely difficult. This chapter is a general survey of the area of three-dimensional sensing. In recent years, three-dimensional sensing has seen much development, and there is every indication that the current proliferation of computer techniques and capabilities will fuel the continued acceleration of this field. The primary goal of this chapter is to review present methods used to measure the three-dimensional patterns of individuals as well as aggregations of animals in the laboratory and the field. Secondarily, I will comment on the future potential of methods under development. In a very general sense, the requirements for three-dimensional imaging should be examined with respect to the information that one is interested in. However, most applications require measurement in both space and time. For instance, in the area of medical imaging, both static (i.e. anatomical) and dynamic (i.e. physiological) information is necessary to judge individual health. In the case of astronomy, both the position and the trajectory of heavenly bodies are necessary to deduce the dynamic laws by which the solar system evolves.
In a physical sense, we are interested in measuring the state vector, a function of both position and time, of a system. In principle this is everything we need to know. However, many three-dimensional sensing methods assess state variables indirectly, and these data require considerable subsequent interpretation. To reconstruct the necessary three- or four-dimensional data set, inferences must be made by using the physical relationship of the measured property to the ultimate property that one is interested in measuring. For example, when X-rays are used to map the three-dimensional structure of internal organs, a procedure known as computerized tomography (CT), what is actually being measured is the absorption of electromagnetic radiation by biological tissue, which is proportional to atomic number. With respect to animal aggregations, measurements can be made at several relevant scales. At coarser scales of resolution, one would like to have the capability to measure parameters of an entire animal aggregation, ideally in both space and time. What is its position, what is its shape, and how do these parameters evolve? In addition, a set of much more detailed questions can be asked regarding individual animals. That is, how do the positions of aggregation members vary over time, both in absolute space and relative to each other? However, one need not stop here, because traits of the individual animal, for instance tail beat frequency and amplitude in fish, can be examined in themselves as well as with respect to those of neighboring individuals. A criterion of primary importance to three-dimensional sensing is the degree of spatial resolution, functionally defined by how close together two point objects can be placed without coalescing. Resolution can be specified linearly, e.g. in meters, or, more appropriate to three-dimensional systems, in three-dimensional volumetric resolution elements, or voxels.
These are the three-dimensional analogs of pixels. Adding dimensions quickly increases the complexity of the measurement process. One can easily image the side of a cube containing 1000 elements on an edge; however, in three dimensions, the total number of voxels in the image is 10^9. Three-dimensional imaging thus places severe demands on both processing speed and memory storage. Additional complications often arise because the actual measurement is, in fact, a mathematically transformed attribute of the real three-dimensional object. Suppose that we desire a vector X which consists of 10^9 elements. What actually is measured is a vector Y, related to X via a linear transformation Y = HX. In this case, the matrix H would consist of 10^9 × 10^9 elements, beyond the capability of current computers. To overcome this problem, many high-resolution imaging methods (e.g. X-ray computerized tomography) approximate this space as 1000 sets of 1000 × 1000 matrices, which can be computed quite easily.
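The storage arithmetic behind these numbers can be made explicit. The short sketch below (assuming 4-byte floating-point values, an assumption not stated in the text) tallies the memory implied by a 1000-element-per-side volume, by a dense transformation matrix H over it, and by the slice-by-slice approximation:

```python
# Memory implied by a 1000^3-voxel volume and by the dense matrix H in
# Y = HX, versus the slice-by-slice (tomographic) approximation.
# 4-byte floats are assumed throughout.
n_side = 1000
n_voxels = n_side ** 3                    # 10^9 volume elements
bytes_volume = n_voxels * 4               # the volume itself
bytes_matrix = n_voxels ** 2 * 4          # dense H: 10^9 x 10^9 entries
bytes_slices = n_side * n_side ** 2 * 4   # 1000 independent 1000 x 1000 slices

print(f"volume:         {bytes_volume / 1e9:.0f} GB")   # 4 GB
print(f"dense H:        {bytes_matrix / 1e18:.0f} EB")  # 4 exabytes
print(f"slice-by-slice: {bytes_slices / 1e9:.0f} GB")   # 4 GB
```

The dense matrix is what is "beyond the capability of current computers"; the slice decomposition reduces the problem back to the size of the data itself.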
Methods for three-dimensional sensing
19
Let us imagine, then, that we are interested in examining a state vector, S(X, t), at some number of three-dimensional locations, with some spatial resolution, ΔX, and some temporal resolution, Δt. The following sections will indicate how this has been done in the past and how some of the emerging techniques will contribute to our knowledge of the three-dimensional structure of animal aggregations in the future.
2.2 Existing methods Existing methods for measuring three-dimensional information can be classified in many different ways. I have grouped available techniques into a somewhat ad hoc classification by similarity of mathematical procedure. Three-dimensional sensing can be appreciated through knowledge of mathematical techniques called inversion procedures. A forward model is used to predict the resultant set of observations of an experiment (the outcome) given prior knowledge of the three-dimensional structure of the object, the collection geometry, and the experimental procedure. By contrast, an inversion procedure computes the three-dimensional structure of an object; the latter is a more difficult problem, partly because knowledge of the forward model is a prerequisite to the inversion. Application of these procedures can be found in such diverse fields as seismology (Menke 1984) and medical imaging (Herman 1979; Kak & Slaney 1987). A journal is now dedicated solely to this class of mathematical analyses (Inverse Problems).
2.2.1 Transmission techniques Let us slice a three-dimensional structure into parallel two-dimensional sections (Fig. 2.1). If the complete structure of each of these parallel sections can be obtained, the entire three-dimensional object can be reconstructed. Transmission techniques achieve this goal by passing natural or artificially created radiation through an object (Fig. 2.1) and projecting a shadow image onto a recording device. The incident radiance distribution of intensity I_0 will be attenuated as:

I = I_0 exp( -∫ f(r) dl )    (2.1)

where the path integral dl is taken over the radiation path through the object and f(r) is the object density resulting in the attenuation of the radiation. In the easiest case, scattering will be small compared to attenuation and radiation will
Figure 2.1. Geometry for a transmission tomography experiment.
propagate through the structure in a straight line. Considering only the i-th single plane of density f(x, y, z_i), by taking the logarithm of the measured intensity and neglecting an additive constant, the received intensity can be measured as:

P_θ(t) = ∫_(θ,t) line f(r) dl    (2.2)

Here, the notation (θ,t) line signifies that for a general projection, the integrals are computed along a line specified by θ (view angle) and t (distance along the projection). Fourier transforms of this equation produce:

F_1{P_θ(t)} = F(s_x, s_y),  where  s_x = s cos θ,  s_y = s sin θ    (2.3)
This is the projection slice theorem, which states that the projection of a two-dimensional object can be related to a slice through the Fourier transform of the object. The entire two-dimensional transform of the object can be obtained by taking different projections (corresponding to the shadows of different views) and then "filling up" Fourier space. Once the entire Fourier space is filled at sufficient density, an inverse Fourier transform (two-dimensional) can be used to obtain all of the two-dimensional slices needed to recreate the three-dimensional structure of an object. This latter process is known as tomography.
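The projection slice theorem is easy to verify numerically. The sketch below, assuming NumPy's discrete FFT conventions and a toy random object, checks that the one-dimensional transform of the zero-angle projection (column sums) equals the corresponding central slice of the object's two-dimensional transform:

```python
import numpy as np

# Numerical check of the projection slice theorem for the theta = 0 view:
# the 1-D Fourier transform of a projection equals the central row of the
# object's 2-D Fourier transform.
rng = np.random.default_rng(0)
f = rng.random((64, 64))                  # a toy 2-D "density" f(x, y)

projection = f.sum(axis=0)                # P_theta(t) for theta = 0
slice_1d = np.fft.fft(projection)         # F_1{P_theta(t)}
central_row = np.fft.fft2(f)[0, :]        # slice through the 2-D transform

print(np.allclose(slice_1d, central_row))
```

Sweeping θ over many angles and interpolating the resulting slices onto a grid is exactly the "filling up" of Fourier space described above.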
In the general case, data are represented by the data vector Y, the collection geometry can be embodied in a transformation matrix H, and the unknown values (i.e. the density values in three dimensions) are represented by X. The data-collection process can then be approximated by the linear transformation:

Y = HX    (2.4)

and the unknown density can be obtained from an inversion procedure:

X = H^(-1) Y    (2.5)
The matrix H for a 1000^3-voxel image would contain 10^9 × 10^9 elements, but by using the tomographic approximation the problem can be solved as a set of 1000 two-dimensional problems of 1000 × 1000 voxels each, summarized as:

f(r) = F_2{ W F_1{P_θ(t)} },  where  s_x = s cos θ,  s_y = s sin θ    (2.6)
Here, W is a weighting matrix which takes into consideration some of the redundancy in the Fourier coefficients. F_1 represents a one-dimensional Fourier transform, and F_2 represents an inverse two-dimensional Fourier transform. Many complex three-dimensional problems can be solved by using simplifications of these matrix structures. The most widely known application of three-dimensional inversion techniques is computerized tomography (CT) (Herman 1979). This technique uses either a parallel or fan beam of X-rays which are projected through a body over a wide range of incidence angles. Starting from the shadow of the object, P_θ(t), the data are subjected to a linear inversion procedure, as above, allowing computation of the three-dimensional structure. As is common practice now, three-dimensional computer graphics techniques permit these data to be viewed as a composite three-dimensional object, rather than as a set of two-dimensional slices (Fishman et al. 1987). The electron microscope is another example of the use of three-dimensional transmission imaging techniques. Transmission electron tomography can be used at very high resolution to image individual molecules (Henderson et al. 1990) or, at lower resolution, to image the three-dimensional structure of DNA (Olins et al. 1983). Transmission techniques have also been used for sonar imaging. The original idea was to use the attenuation of a sonar beam in a manner similar to computerized X-ray tomography, known as acoustic tomography (Mueller et al. 1979); however, because of unacceptable scattering, resolution was poor, and a more complex method based on travel times was developed. Acoustic diffraction tomography accommodates the fact that sound waves diffract when traveling through tissue. Unfortunately, both of these methods have met with limited practical success, probably due to multiple scattering.
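The forward model of equation 2.4 and the inversion of equation 2.5 can be exercised at toy scale. In the sketch below the collection matrix H is simply random, standing in for a real projection geometry (an assumption made for illustration only); any full-rank H allows the unknown densities to be recovered by a least-squares pseudo-inverse:

```python
import numpy as np

# Toy forward model Y = HX (eq. 2.4) and its inversion (eq. 2.5).
# x_true is a flattened 3 x 3 "density" grid; H is a random 12 x 9
# collection matrix standing in for a real projection geometry.
rng = np.random.default_rng(42)
x_true = rng.random(9)                     # unknown densities
H = rng.random((12, 9))                    # overdetermined, full rank

y = H @ x_true                             # forward model: simulate data
x_hat, *_ = np.linalg.lstsq(H, y, rcond=None)   # pseudo-inverse recovery

print(np.allclose(x_hat, x_true))          # densities recovered
```

With noiseless data and a full-rank H the recovery is exact to numerical precision; real tomographic inversions differ only in that H encodes line integrals and is far too large to form densely, which is what motivates the slice-by-slice factorization of equation 2.6.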
Deciphering the three-dimensional structure of various biological specimens can also be accomplished with optical microscopy. Serial microscopy (Agard & Sedat 1983) uses the fact that optical microscopes can be set to an extremely narrow depth of field, and thus only one "slice" of a specimen is in focus at a time. Ideally, serial two-dimensional photographs at different depths are inverted to generate three-dimensional structure (Castleman 1979). Unfortunately, data obtained from serial sections are often convolved together in such a way that accurate three-dimensional reconstruction is difficult. A more recent development, confocal microscopy (Wilson 1990), seems to circumvent some of the problems associated with the serial sectioning technique. This method uses a set of camera pinholes to eliminate scattered light (which contains little information about the structure). Procedurally, the system is similar to optical serial sectioning except that a much shorter depth of field is possible. One additional feature of this method is that the resolution of the resultant structures is twice as good as that of a standard microscope. For three-dimensional reconstruction of objects, the short depth of field is invaluable. Many commercial confocal optical imaging systems are in the marketplace today, and a host of biological structures are being explored using this technique. Holographic optical techniques have also recently been used to look at the three-dimensional structure of both natural (zooplankton) and man-made (cavitation nuclei) structures in the ocean. Although several configurations are possible, the technique that seems to be favored consists of photographing an in-line or Fraunhofer hologram. Here, the wave that is scattered by the set of objects is allowed to interfere with the wave that is unscattered and propagated straight through the structure.
The interference pattern created between the unscattered and the scattered beam is recorded on high-resolution film. The three-dimensional structure of the object can be reconstructed by using an optical bench which illuminates the recording film with a coherent light beam and then uses a set of lenses to image the resultant pattern. A computer recording can then be made on a plane-by-plane basis to visualize the entire volume, or the two-dimensional images can be viewed directly. A system currently under development (Schulze et al. 1992) will have the capability of observing zooplankton at very high resolution in a volume of approximately 1000 cm³. In some cases, specific optical properties of the object(s) can be used to facilitate three-dimensional imaging. An example is fluorescence imaging, a technique that will allow three-dimensional mapping of phytoplankton via the fluorescence of their chlorophyll-a distribution (Palowitch & Jaffe 1992) (Fig. 2.2). A light stripe is projected parallel to the camera plane at a distance from it, and an image of the fluorescence induced in the chlorophyll is recorded by the camera. The illumination stripe is then translated to the next plane and another
Figure 2.2. The system for measuring the three-dimensional distribution of chlorophyll-a using fluorescence imaging. Depicted here is an experimental setup that consists of a light source and a camera.
image is collected. This procedure is continued in sequence until a given volume is mapped out. A system of equations can be used to describe the forward model, which can then be inverted to determine the three-dimensional distribution of chlorophyll-a in a way similar to the optical serial microscopy mentioned above. The technique now works for small volumes (1 m³), and the potential exists for imaging larger volumes as well as using other biochemical compounds.
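The plane-by-plane scan is procedurally simple to express. In the hypothetical sketch below, record_plane stands in for whatever the camera actually captures at each stripe position (it is an invented placeholder, not part of the published system); the recorded planes stack directly into a three-dimensional array:

```python
import numpy as np

# Sketch of the slice-scan procedure: step an illumination plane through
# the volume, record one 2-D image per plane, and stack the planes into
# a 3-D array. record_plane(k) is a hypothetical stand-in for the camera
# capture at plane k.
def scan_volume(record_plane, n_planes):
    return np.stack([record_plane(k) for k in range(n_planes)], axis=0)

# Dummy capture: every pixel of plane k just reports k.
volume = scan_volume(lambda k: np.full((4, 4), float(k)), n_planes=8)
print(volume.shape)   # (8, 4, 4): planes x rows x columns
```

The inversion step described in the text then operates on this stacked array, deconvolving the contribution of out-of-plane fluorescence from each recorded plane.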
2.2.2 Emission techniques Emission techniques use natural or induced emission of sound or electromagnetic radiation to locate, and in some cases track, objects in three dimensions. In the case of natural emissions, these methods are referred to as passive techniques. For example, radio astronomy images distant celestial bodies, and sound emitted by animals can be used to track them. In active techniques emissions are induced. Examples include stimulating a nucleus into a higher energy state and imaging the electromagnetic wave emitted during decay, or attaching a sonar "pinger" to an animal. The most widely known contemporary active emission techniques are used in medical imaging. An array of different imaging techniques all use the emission of energy from inside the structure. In some cases, such as magnetic resonance imaging (MRI), an image is formed by scanning through an activated three-dimensional volume. In the other techniques, radionuclides are first ingested and then imaged. In positron emission tomography (PET), a pair of coincident γ rays are given off and are then localized by coincident reception at a ring of detectors. More information on these medical techniques is found in general texts (Kak & Slaney 1987).
2.2.3 Other techniques Two other types of imaging methods which do not really fit into the above scheme but which deserve attention are reflection methods and triangulation methods. Remote sensing, or monostatic reflection methods, represents a class of techniques in which the illumination source and the receiver are close together or even superimposed. A schematic of a simple reflection imaging technique which can be applied to any type of propagating wave is shown in Figure 2.3. In this application one measures the "time of flight" of the reflected light or sound wave over the target range. Here a pulse of illuminating energy is propagated in the medium at either a single angle with a narrow beam or a multiplicity of angles with a wide beam (Fig. 2.3). A single sensor in the first case, or an array of sensors in the second, is used to judge both the intensity of the reflected radiation and the amount of time (time of flight) it takes for the wave to return. For opaque reflective bodies, only one return pulse per look direction is
©Mfc/SIO
Figure 2.3. A simple reflection imaging system.
Methods for three-dimensional sensing
25
measured. For translucent bodies, radiation is backscattered at all ranges. In both cases, by knowing the time the wave was sent and the speed of radiation in the medium, the reflectivity of the three-dimensional scene as a function of range and direction of return of the wave can be computed. In the case of optical techniques, the extremely fast speed of light (3.0 108 m/sec) creates several advantages and disadvantages. On the one hand, range resolution can be limited, due to the extremely fast digitizing hardware that is needed to record the transient wave. On the other hand, the extreme speed of the light wave can allow many pulses over a very short time period, permitting simpler optics and a much higher scan rate. In the case of sonar techniques, the slower speed of sound (1.5 103 m/sec) permits a much slower digitization rate at the expense of a slower scan speed. Triangulation methods are used primarily in conjunction with optical imaging. The basic idea is that if several views of an object (like stereo pairs) are available, then location in three dimensions can be inferred from a computational procedure which uses "optical disparity," or the difference between the images. In the general case, two views are sufficient; however, several problems usually occur. One of them concerns deciding which points in the final image correspond to others in the second images (the correspondence issue). As the number of objects increases, correspondence becomes an increasing problem. In addition, there is an issue of data sensitivity. Unfortunately, these two problems work against each other. For example, given identical views, the correspondence is, in fact, trivial. However, there is no optical disparity from very different images. On the other hand, given substantial optical disparity from very different views, the accuracy in the reconstruction would be great if similar points could be identified. Osborn (Ch. 3) reviews optical triangulation methods extensively.
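The time-of-flight computation described above is simple enough to sketch in a few lines. This is an illustrative example, not code from the chapter; the function name and the sample values are hypothetical, with units in meters and seconds and a round-trip measurement assumed.

```python
def range_from_tof(t_round_trip, c):
    """Target range from round-trip time of flight.

    The pulse travels out to the target and back, hence the factor of 1/2.
    t_round_trip: round-trip travel time in seconds.
    c: propagation speed in the medium, m/sec.
    """
    return c * t_round_trip / 2.0

# Sound in seawater (~1500 m/sec): a 40 ms round trip puts the target at 30 m.
print(range_from_tof(0.040, 1500.0))   # 30.0
# With light (~2.25e8 m/sec in water), resolving the same 30 m range in time
# requires sub-microsecond digitizing, which is the hardware burden noted above.
```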
2.3 Application of three-dimensional measurement techniques to in situ sensing of animal aggregates

Many of the techniques I have described are used primarily to discern the three-dimensional structure of a single object (e.g. the internal configuration of organs within a human body) or the three-dimensional spatial relationship between objects (e.g. fluorescence imaging). At present, these technologies have yet to be applied to track an individual, or group of individuals, moving through space and time. An interesting result of my broad review of these many approaches to three-dimensional mensuration is that almost none of them meet our specific goal of simultaneously good resolution in both time and space. It is also clear that one of the problems in applying the above techniques to sensing individual animals and animal aggregations is the often prohibitive cost. On the other hand, surely there are clever ways of using these principles to learn more about animal aggregations.
2.3.1 Optics

Optical techniques have unique advantages. The ready availability of video and film allows large amounts of information to be stored at high rates. For example, in the case of video, most black-and-white images currently can be approximated by a matrix of grey levels consisting of 360 × 240 elements. Video images are usually recorded at 30 frames/second; if a single element is equal to one byte, aggregate data storage is several megabytes/second. Film also has tremendous storage capability. In the case of 35 mm holographic imaging film, 3000 line pairs/mm can be recorded; here, the amount of information that can be stored is as high as 10⁹ bytes in a single frame! Optical techniques also have good inherent spatial resolution. Graves (1977) used a slowly sinking camera to attempt to measure the density of fish photographically as the device sank through schools of anchovies. Klimley and Brown (1983) used stereophotography to measure size and spacing of hammerhead sharks, and Aoki et al. (1986) did the same with schools of jack mackerel (Trachurus japonicus) and mackerel (Scomber sp.). Graves localized the fish spatially in three dimensions by assuming that the animals were all the same size, so that smaller images represented fish that were farther from the camera. A stereophotographic approach avoids this assumption, but it is still limited in range by the attenuation of light in seawater and the opacity of fish to light. Partridge et al. (1980) and Cullen et al. (1965) judged the three-dimensional position of fishes in a school via a structured lighting technique that cast shadows of the fish on the side or bottom of the aquarium; by judging the distance between a fish and its shadow, its three-dimensional position was deduced. This facilitated studies of many aspects of the behavior of these animals in schools (Partridge 1981). However, this technique is not readily adaptable to the field. Osborn (Ch. 3) provides a detailed discussion of the use of multiple optical images (photogrammetry) to map and track individuals in three dimensions.

However, optical techniques cannot look through the bodies of animals that are opaque to light. Ultimately, one cannot see the forest for the trees: when one animal occludes another, the three-dimensional positions of all the animals cannot be determined. In fact, in most situations only the outside of an aggregation can be mapped. This has limited the use of optical techniques to situations where the densities of the animals are low or where ranges are short, so that the projection of one animal upon another does not occur.
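The video storage figure quoted above can be reproduced directly. This is a back-of-the-envelope sketch using the frame size and rate given in the text; nothing here is from any particular recording system.

```python
# Monochrome video storage rate: 360 x 240 pixels, 1 byte (grey level)
# per pixel, 30 frames per second.
width, height = 360, 240
bytes_per_pixel = 1
fps = 30

rate = width * height * bytes_per_pixel * fps   # bytes per second
print(rate)          # 2592000
print(rate / 1e6)    # ~2.6 "megabytes/second", as the text states
```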
2.3.2 Sonar

Unlike light, sound will both reflect off of, and pass through, most living tissue. Furthermore, because sound travels much farther in water than the attenuation distance of light in even the clearest oceans, acoustic imaging is a much more flexible tool. For example, acoustic imaging can be used at night, or at depth, where the introduced light necessary for optical resolution might alter the behavior of the animals being measured. The use of acoustics in fish stock assessment is not new, and almost all sonar methods can be classified as reflection techniques. Echo sounders have been used in fish detection since the mid 1930s (Sund 1935). Since then, two major approaches to acoustical fish stock assessment have gained prominence: echo counting and echo integration. Echo counting identifies echoes from individual targets, which allows the numerical density of fish to be directly estimated. With a good target strength model (see below), the size distribution of fish within a school can also be estimated. However, this technique depends upon the fish being distributed sparsely enough that individual echoes are distinguishable; abundance in large, dense schools will therefore be underestimated. Echo integration, first proposed by Dragesund and Olsen (1965), relates the total acoustical energy reflected from a school of fish to the amount of biomass it contains. Because it does not require individual echoes, echo integration can be used with higher concentrations of fish than can echo counting. The major drawback is that the relationship between fish size and echo energy must be known before density can be evaluated. This is by no means a simple problem, and it has been the focus of research since the development of the technique. In the typical application, the total reflected energy is assumed to be a known linear function of the number of fish.
In many cases the dependence of target strength (TS) on fish length (L) can be described by the equation (Foote 1983):

TS = 20 log L + constant     (2.7)
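Equation 2.7 implies that doubling a fish's length raises its target strength by about 6 dB, regardless of the species-dependent constant. A small sketch; the value of the constant below is purely illustrative, not a measured figure.

```python
import math

def target_strength(length_m, b=-66.0):
    """TS = 20 * log10(L) + b, in dB (eq. 2.7).

    b is the species-dependent constant; -66.0 here is an arbitrary
    placeholder chosen only so the function returns plausible numbers.
    """
    return 20.0 * math.log10(length_m) + b

# Doubling length adds 20*log10(2) ~ 6.02 dB, whatever the constant is:
print(target_strength(0.4) - target_strength(0.2))   # ~6.02
```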
However, this relationship breaks down when the density of the aggregation substantially attenuates the sound in the observation field. For example, Rottingen (1976) has shown that there are nonlinearities associated with shadowing: fish nearer to the transducer block some of the sound energy, so that more distant fish contribute less. Modifications to the linear equation to account for shadowing have been proposed (Foote 1990). Target strength relates the sound level transmitted to a fish to that reflected from it. It is most notably a function of the presence or absence of a swim bladder (that is, a contained gaseous medium), although the species under consideration,
fish size, and even time of day can also affect target strength. Foote determined that 90–95% of the reflected energy from an individual fish is due to reflection from the swim bladder (MacLennan 1990). Target strengths of Atlantic mackerel (Scomber scombrus), which have no swim bladder, are about 20 dB lower than those of slightly smaller cod (Gadus morhua), which have swim bladders (MacLennan 1990). In addition, MacLennan noted a diurnal variation of about 7 dB in the target strengths of both mackerel and cod. Target strengths have been measured in a variety of ways, including the use of dead fish, live anesthetized fish, caged fish, and finally, free-swimming fish, with the last believed to be the most realistic. Two methods of direct in situ measurement have gained prominence in recent years: dual-beam sonar and split-beam sonar. In general, both approaches give satisfactory results for the target strengths of individual fish; interested readers are directed to Ehrenberg (1979) and, for a general review of different methods, to Foote (1991). An interesting alternative approach to fish size estimation is to measure the Doppler spread of a narrow-band acoustic signal and relate this to tail-beat velocity, which is in turn related to fish length. Holliday (1972, 1974) has also suggested a possible approach to fish identification through the use of wide-band sonar. Because each species has a characteristic size and shape of swim bladder at a given body length, the frequency content of the reflection from a school of fish will be enhanced at the frequencies corresponding to the resonant frequencies of their swim bladders. Despite the obvious benefits of sonar for oceanographic applications of three-dimensional imaging, there has been a paucity of effort to date. In fact, few documents refer to the advantages of using multidimensional arrays for fish assessment.
Sonar sensing of the underwater environment is a natural alternative to visual imaging in situations where greater range capability is desired. However, in contrast to light, the speed of sound in water is relatively slow (1500 m/sec), and this prevents several schemes that are feasible in optical imaging from being used with sound. Underwater "high-frequency" sonar imaging has been an area of intense research interest for some time (Sutton 1979). The basic principles upon which almost all of these sonar devices work stem from the relationship between a group of transducers and its sensitivity pattern (Goodman 1986). Assuming that f(x) represents the object to be imaged and that F(s) is the far-field pattern of this object, the equation relating f(x) and F(s) at the appropriate range can be represented as:

F(s) = exp(ikl) exp[iks²/(2l)] ∫ f(x) exp[−2πi(s·x)/(λl)] dx     (2.8)
The integration is taken over the region of the object f(x). Here l represents the distance from the object to the receiving array, and k is the wave number 2π/λ. Note that this relationship can be viewed as a Fourier transformation. The designer of a sonar imaging system must determine how to convert from the Fourier domain to the real-space domain in order to obtain an image. One option would be to build a sonar lens; however, adequate sonar lenses are extremely temperature sensitive (Sutton 1979). Alternatively, the relationship can be inverted either by implementing the mathematics electronically or in software (Goodman 1986). An additional criterion which describes the performance of a sonar imaging system is the range resolution, or accuracy in the judgment of distance. In this case the resolution, or ability to discriminate the distance between two targets in range, can be related to the length of the pulse T used to create the image as:

l = cT/2     (2.9)
where c is the speed of sound and l is the range-resolving capability of the system. A more sophisticated treatment of this topic takes into account the ability of the imaging apparatus to produce very short bursts of sound. Here, the temporal bandwidth of the transducer (BW) can be related to the shortest temporal pulse that can be created, so that T = 1/BW; that is, the temporal duration of the pulse is the inverse of the bandwidth. This treatment has considered only the simplest type of waveform that can be generated, a gated sinusoid. More sophisticated signal design and its associated processing can result in increased signal-to-noise ratio (SNR), but the fundamental diffraction-limited and bandwidth-limited resolution can rarely be exceeded. An additional complication occurs in sonar imaging that is not typically present in the optical case. Because most sonar imaging systems use nearly single-wavelength sound, and most surfaces are rough with respect to this wavelength (1.5 mm @ 1 MHz), the reflection of sound can be considered equivalent to the superposition of a random distribution of time-delayed wave forms. This leads to a special kind of multiplicative noise called speckle (Goodman 1986). For fully developed speckle the SNR is 1; clearly, this results in very noisy images. Jaffe (1991) has proposed an approximate classification of sonar imaging systems, in order of increasing complexity (Fig. 2.4). In the simplest case, a single transducer is used to obtain an image of continuous backscatter by transmitting sound and then recording the magnitude of the returned wave as a function of time. This is essentially a one-dimensional imaging system. An increase in complexity, and also functionality, can be obtained by pointing the device in different directions and recording the intensity of the backscattered sound as a function of
look angle, to obtain two-dimensional images. See Greene and Wiebe (Ch. 4) for an interesting three-dimensional application of this technique. A third option, used by a number of commercial systems, is to have a one-dimensional array of transmitting and receiving transducers. These systems propagate a beam of sound which is broad in both the horizontal and the vertical direction and then resolve it into a number of narrow beams in the horizontal. The image that is formed is essentially a two-dimensional map of backscatter intensity at a given look angle versus distance. Finally, use of a two-dimensional array can resolve the image into individual beams in both the horizontal and the vertical directions. Such systems can create a three-dimensional map of backscatter intensity versus direction in both horizontal and vertical.

Figure 2.4. A proposed classification for sonar imaging systems.
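The pulse-length and bandwidth limits on range resolution (eq. 2.9, with the pulse duration taken as the inverse of the transducer bandwidth) can be checked numerically. A minimal sketch for the simple gated-sinusoid case; the function name is my own.

```python
def range_resolution(bandwidth_hz, c=1500.0):
    """Range resolution l = c*T/2, with the shortest pulse T = 1/BW.

    bandwidth_hz: transducer temporal bandwidth, Hz.
    c: speed of sound in water, m/sec.
    """
    pulse = 1.0 / bandwidth_hz   # shortest achievable pulse, seconds
    return c * pulse / 2.0       # metres

# 25 kHz of bandwidth at 1500 m/sec gives 3 cm, the figure quoted later
# in the chapter for the proposed laboratory tank system.
print(range_resolution(25e3))   # 0.03
```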
2.3.3 Three-dimensional acoustic imaging

Over the last several years we have been developing a three-dimensional underwater imaging system (Fig. 2.5), primarily for tracking zooplankton. The system is composed of two sets of eight "side scan"-like array elements, operating at a frequency of 450 kHz, which are stacked and pointed in slightly different directions (2 degrees). One set of eight transducers is used as a transmitting array while the other set receives. The system transmits sequentially on all eight transducers in the transmitting array, and reflected sound is continuously received, amplified, and digitized on all eight of the receiving transducers. This process continues until all of the transmitting transducers have played their sounds. If
Figure 2.5. The beam configuration for the FTV system. Imaging sequence: transmit on row 1 and receive on all columns, through transmit on row 8 and receive on all columns. System specifications: center frequency (f₀) 445 kHz; pulse length 40–100 msec; 64 beams; 8 transmit channels and 8 receive channels (16 identical 2° × 20° elements); 2° horizontal and 2° vertical resolution; data acquisition rate 4 frames/sec.
one considers the three-dimensional space as a matrix, our system scans the space by transmitting on the rows, one by one, and receiving on all of the columns each time. The pointing angles of the transmit and receive transducers provide the azimuth and bearing resolution, and the time delay of the signals after transmit provides the range information. The system's specifications are summarized in Figure 2.5. Figure 2.6 shows the result of tracking a moving object in three dimensions. The approximate dimensions of the imaged volume are 1 m × 1 m × 3 m. Shown inside the rectangle on the left-hand side of each frame is a feedback signal which initialized each pulse; the test target is shown on the right-hand side of each frame as a solid three-dimensional patch. The object can clearly be visualized as it moves in three dimensions. In the future the system will be deployed on an underwater robot to track individual animals in the water column. We also plan to use it at a fixed depth to quantify the flux of animals as they migrate vertically through the water column on a daily basis.

Figure 2.6. Successive scenes of a test target from an animated film.

We have also pursued the idea of creating a dual-frequency, dual-resolution imaging system. Because the cost of a sonar system is proportional to the number of independent look directions, it makes sense to maximize the amount of information that can be acquired from a given set of sonar transducers. In some situations a high-resolution image is desirable even though it may be unnecessary to have this high resolution over the entire field of view of the sonar; a lower-resolution image may be suitable over the entire field of view, with a high-resolution image only in the center. I call this type of imaging system "foveal," in analogy to the basic principle of human vision (Fig. 2.7). The system is designed to assess both the extent of aggregations of fish and characteristics therein, such as interanimal distances. Lower frequencies determine the outline of the school of fish, and higher frequencies concurrently obtain interanimal spacings.
Figure 2.7. The proposed foveal sonar imaging system.

We are currently developing a prototype of this system by augmenting our existing 450-kHz imaging system with an additional set of transducers at 1.5 MHz. We plan to create a system which has a set of eight 2/3-degree × 2/3-degree beams with a concentric field of view (Fig. 2.7). This system could be used to image both the extent of a mass of zooplankton and the individual animals inside it. Another possible use of this system is to look at predator–prey interactions between small fish and zooplankton. Because these two classes of animals have very different acoustic target strengths as a function of frequency, it has been difficult to image both targets simultaneously using one frequency. It is possible that a larger version of this system could be used to answer some of the interesting questions posed in this book with respect to movement of individuals and the resultant "group" behavior of fish schools. What requirements would such a system have in space and in time? As a modest goal, one can imagine a laboratory-type system capable of discerning the three-dimensional positions of an aggregation of fish in a tank that is approximately 20 m across (Fig. 2.8). For example, Partridge (1981) analyzed the movement of saithe in a 10-m tank using an optical three-dimensional system. Here, fish velocities were always less than 1 m/sec and the interanimal spacings were never less than approximately 2–5 cm. Given this as a starting point, the basic system can be designed. Because of the speed of sound propagation (1500 m/sec), it takes approximately 1/15 of a second for the sound to be reflected from the farthest object in a
tank of this size. If interanimal spacing is approximately 0.3 m and the top speed of the animals is 1 m/sec, it makes sense to have a system with a spatial resolution of approximately 10 cm and a frame rate of at least 10 frames/sec. With the frame rate and the spatial resolution matched, the animals can then be tracked unambiguously. A system with a center frequency of 200 kHz and an aperture of 0.75 m would certainly fulfill this goal: the resolution of this array would be approximately 10 cm at the full tank width of 10 m. A transmit element of small size (about 1 λ) could be used to insonify a cone of width 60 degrees. A crossed array of receivers, with 100 elements in the horizontal and 100 elements in the vertical, could be used to obtain the necessary resolution in these directions. With a bandwidth of 25 kHz, range resolution would be approximately 3 cm. Crossed arrays are a solution for obtaining two-dimensional beams without fully populated arrays; this system would have 200 receiving elements instead of 10,000. Systems of this type are known as Mills crossed arrays and have well-known trade-offs between side-lobe suppression and angular resolution. Other types of array designs, such as random or aperiodic arrays, are possible but unconventional.

Figure 2.8. The proposed Mills crossed array three-dimensional imaging system.

The advantages of this type of acoustical three-dimensional remote sensing system over optical
systems would be that (1) the sonar system could operate in optically opaque environments and (2) occlusion of animals would not be a problem if the animal densities were not too high.
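The economics of the Mills crossed array sketched above can be checked with a few lines of arithmetic. This is an illustrative sketch using the chapter's example numbers (200 kHz, 0.75 m aperture, 100 elements per axis) and the small-angle beamwidth approximation λ/D; the function names are my own.

```python
def element_counts(n_per_axis):
    """Receiver counts for a fully populated planar array versus a
    Mills cross with the same angular sampling per axis."""
    filled = n_per_axis ** 2
    crossed = 2 * n_per_axis
    return filled, crossed

def beamwidth_rad(wavelength_m, aperture_m):
    """Diffraction-limited beamwidth, small-angle approximation ~ lambda/D."""
    return wavelength_m / aperture_m

filled, crossed = element_counts(100)
print(filled, crossed)   # 10000 200  -- the 10,000 vs 200 figure in the text

# 200 kHz in water: lambda = 1500/200e3 = 7.5 mm. A 0.75 m aperture gives
# ~0.01 rad, i.e. ~10 cm lateral resolution at the 10 m tank width.
print(beamwidth_rad(1500.0 / 200e3, 0.75) * 10.0)   # ~0.1 m
```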
2.4 Conclusions

I have reviewed the most common types of three-dimensional imaging systems in existence today. These systems are in routine use and have been applied to a variety of scientific questions, from geology to cell biology to animal behavior. Although many variations of these systems exist, they can all be placed into a relatively coherent framework via the classification methodology introduced here. I have also addressed the application of these systems to sensing animals and animal aggregations, with special attention to sensing animals in the sea. In this context both optical and sonar imaging systems have been reviewed and their relative advantages listed. Finally, I have discussed several projects underway in my laboratory with the collective goal of obtaining three-dimensional information about the distribution and extent of both phytoplankton and zooplankton in the sea.
Acknowledgments The author would like to thank the National Science Foundation for supporting this work under grant OCE 89-143000, the NOAA National Sea Grant College Program, Department of Commerce, under grant number NA89AA-D-SD128, project OE-15 through the California Sea Grant College, and the Office of Naval Research, grant number N0014-89-1419. The author would also like to thank Andrew W. Palowitch and Duncan E. McGehee for contributing to this chapter and Kenneth Foote for reviewing an early version of the manuscript.
Analytical and digital photogrammetry JON OSBORN
3.1 Introduction

Photogrammetry is the "art, science, and technology of obtaining reliable quantitative information about physical objects and the environment through the process of recording, measuring and interpreting photographic images and patterns of radiant imagery derived from sensor systems" (Karara 1989). This chapter is an introduction to photogrammetry and its application to biological measurement, as it relates to the analysis of an animal's position and motion within a three-dimensional aggregation of similar individuals. The principal analytical methods of measuring three-dimensional coordinates from two photographic images, the geometric basis of stereovision, and three-dimensional visualization are described. Although I concentrate on two-camera systems, the principles can be extended to multistation photogrammetry. I use typical commercial photogrammetric systems and recent applications to illustrate the use of photogrammetry and its costs and complexity. Measurement techniques are centrally important to all of the discussions of three-dimensional animal aggregations in this book. Humans capture images of their surroundings through paired two-dimensional, camera-like eyes, and our brains are able to instantly reconstruct complex images that accurately reflect the three-dimensional world. We make measurements of distance, volume, shape, color, and motion with blinding speed, and we react behaviorally to this incredibly complex array of information with reliability and safety. We take these astonishing feats of visual performance for granted, and it is only when we attempt to recreate simple three-dimensional optical tasks with cameras and computers that we begin to appreciate the actual complexity of three-dimensional optical measurement. Animal aggregations in rapid motion create visual confusion that confounds our ability to track individual animals visually within an aggregation.
Cameras and computers can resolve this confusion because cameras record behavioral
patterns that can be reviewed repetitively and in slow motion, providing a perceptual luxury that we could otherwise never experience. When movement within the aggregation is reviewed in slow motion, most of the visual confusion disappears. We can then begin to sort out how individual animals react to one another and to individuals of other species, such as when a tuna attacks a school of fish or a falcon a flock of finches.
3.2 The geometry and process of image capture

A photograph is a two-dimensional projection of three-dimensional reality, also known as the object space. For a photograph taken with a geometrically perfect camera, every point in three-dimensional object space is imaged through the perspective center of the camera onto a perfectly flat image plane, the photograph. The distance from the perspective center to the image plane (the film) is called the principal distance, and in most respects this is the same as the focal length. For convenience, photogrammetric formulae are normally derived in terms of a positive image, positioned in front of the perspective center (Fig. 3.1). A photograph is not an orthographic projection, and it cannot be read linearly like a map. Mathematical representation of a single photograph is normally based on the principle of collinearity: any point in the object space, its image point on the positive, and the perspective center all lie on a single straight line (Fig. 3.1). This condition is expressed by two equations for every such point. The collinearity condition equations can be extended to include parameters which describe additional systematic errors in the image, such as errors caused by disturbance of the light rays passing through object space, distortion of rays within the camera lens, distortions of the recording medium (film or electronic sensor), and errors attributable to the image-measurement process. To reconstruct an object in three dimensions, it must be photographed from at least two positions. Although the collinearity condition is sufficient to calculate the parameters which describe a stereomodel, the mathematical solution may also be based on the principle of coplanarity: for any object point, there is a single plane described by the object point and the two perspective centers, which includes the two image points on each positive (Fig. 3.1).
This condition is expressed by one equation for every such object point. The mathematical reconstruction of an object using image coordinates measured on two or more photographs is normally achieved by calculating three distinct orientations: interior (within the cameras), relative (between the cameras), and absolute (between the cameras and the object space). I will describe the relevant parameters at each scale of orientation, and then discuss how to solve for them.

Figure 3.1. Geometry of a stereopair.

3.2.1 Interior orientation

Interior orientation describes the internal geometry of the camera with reference to four values: principal distance, principal point, lens distortions, and image distortions. The principal point is the intersection of the camera's optical axis with the focal plane; its coordinates are expressed relative to either the edge of the image, fiducial marks at the corners of the film, or a grid of reseau marks which are imaged onto the film. Lens distortions can be described by radial and decentering distortion functions. Image distortions, caused by film or sensor distortion, are often irregular and are thus harder to quantify. Cameras are classified according to the accuracy and stability of their interior orientation. Metric cameras are precisely constructed instruments, specifically designed for photogrammetric applications, and they exhibit very low and stable lens distortions. Fiducial marks are used to define the image coordinate system, and film-flattening devices such as a vacuum back are used to minimize film distortion. Metric cameras are manufactured by Wild, Zeiss, Hasselblad, and others. There are no truly
metric underwater cameras. Semimetric cameras exhibit relatively low and stable lens distortions. They may contain fiducial marks and, rather than a film-flattening device, contain a grid of reseau marks so that film distortions can be modeled. Semimetric cameras are manufactured by Rollei, Leica, Hasselblad, and others. The Camera Alive CAMEL 70 mm and the Photosea 2000 are semimetric underwater cameras. Digital CCD video cameras and electronic still cameras can usually be considered semimetric. Nonmetric amateur cameras are not designed for photogrammetry, but they are inexpensive and have operational advantages; the Nikonos is a nonmetric underwater camera. Nonmetric cameras exhibit high and unstable distortions and do not contain fiducial or reseau marks. In an analytical solution, it is the unreliability of the distortions more than their magnitude that is of concern, because systematic errors can easily be modeled; even unreliability can be overcome by making frequent calibrations (i.e. for each photograph taken). For most biological applications, the type of camera to be used depends more on project considerations, such as cost, portability, and object space control, than on accuracy. Various calibration techniques are used to determine a camera's interior orientation parameters. Laboratory methods (collimators and goniometers) are used for large-format aerial mapping cameras, but only occasionally for metric terrestrial cameras, and they are usually not appropriate for semimetric or nonmetric cameras. Calibration ranges are three-dimensional arrays of targets with known geometric orientations and distances (control points). The known object space coordinates of the targets and their measured image space (photo) coordinates are used to solve for the interior orientation parameters contained in the modified collinearity condition equations. At least fifteen control points are normally required (e.g. Fig. 3.2).
Because the position of the cameras (particularly the camera-object distance) and the principal distance are usually both unknown and are highly correlated in the solution, it is imperative that the control points be noncoplanar and span as much depth as possible. An alternative approach is analytical plumb-line calibration (Brown 1971; Fryer & Brown 1986), in which a series of known straight lines are photographed. This method is based on the fact that, for a distortion-free lens, each straight line in object space should appear as a straight line on the image; measured departures from collinearity can therefore be attributed to lens distortion. A disadvantage is that the principal distance and the coordinates of the principal point cannot normally be determined using this method. A convenient way to obtain photographs for a plumb-line calibration is to photograph a series of straight lines, rotate the camera by 90 degrees, and rephotograph them. Typically, fifty points along each of approximately ten plumb lines are observed (Fryer 1992).
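The plumb-line principle lends itself to a compact numerical illustration. The sketch below is not from any of the cited implementations: it assumes a single radial distortion coefficient k1 and one simulated line, and recovers k1 simply by searching for the value that best restores straightness. A production calibration would instead solve for several distortion terms and all lines simultaneously (e.g. Fryer & Brown 1986).

```python
import numpy as np

def undistort(pts, k1):
    # approximate inversion of radial distortion: x_u ~ x_d * (1 - k1 * r_d^2)
    r2 = (pts**2).sum(axis=1, keepdims=True)
    return pts * (1.0 - k1 * r2)

def straightness(pts):
    # sum of squared perpendicular residuals about the best-fit line (via SVD)
    c = pts - pts.mean(axis=0)
    return np.linalg.svd(c, compute_uv=False)[-1] ** 2

# simulate one plumb line imaged through a lens with k1 = 0.05 (image units)
t = np.linspace(-1, 1, 50)
ideal = np.stack([0.4 * np.ones_like(t), t], axis=1)   # a straight line at x = 0.4
r2 = (ideal**2).sum(axis=1, keepdims=True)
observed = ideal * (1.0 + 0.05 * r2)                   # apply radial distortion

# grid-search the k1 that best restores collinearity
k1_grid = np.linspace(0.0, 0.1, 201)
scores = [straightness(undistort(observed, k)) for k in k1_grid]
k1_est = k1_grid[int(np.argmin(scores))]
print(k1_est)   # close to the true 0.05
```

Because only departures from straightness are used, nothing in this sketch constrains the principal distance or principal point, which is exactly the limitation noted above.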
Jon Osborn
Figure 3.2. A control frame for underwater calibrations.
On-the-job calibration is used with nonmetric cameras. The technique is the same as the calibration-range method except that the control points must surround the object to be measured and must be used to calibrate the camera for each and every photograph. Although this overcomes the limitations of unstable interior orientation, on-the-job calibration has obvious disadvantages when measuring animal aggregations, namely the inconvenience of requiring some sort of control frame as well as the frame's probable effects on animal behavior. If nonmetric cameras are used with only periodic calibrations, measurement error becomes a consideration. Self-calibration is a more sophisticated method of camera calibration (Shih 1989). Like on-the-job calibration, self-calibration requires a large number of well-defined targets on or surrounding the object; however, the targets do not need to be coordinated (i.e. their object space positions need not be known in advance). Self-calibration techniques are used in very high-accuracy industrial applications of convergent close-range photogrammetry and are of limited general application. When semimetric or nonmetric cameras are used, calibration-range or on-the-job calibration techniques are normally the most appropriate solution in close-range biological applications.
Analytical and digital photogrammetry
3.2.2 Relative orientation
Once the camera has been calibrated, the geometric relationship between the cameras, or relative orientation, must be determined. Relative orientation is a function of the position and angular orientation of one camera relative to the other. Relative orientation parameters are usually solved for analytically by comparing the positions of image points that appear in each photograph. Either the coplanarity or the collinearity condition can be used to solve for the position and orientation of the two cameras in an arbitrary coordinate system. Direct measurement of the orientation and position of the cameras is usually not sufficient to determine relative orientation because it is difficult to physically locate the perspective center. Once interior and relative orientation have been achieved, it is possible to measure three-dimensional coordinates. This allows the creation of a three-dimensional optical or mathematical model. However, there is still no absolute scale of measurement or orientation with respect to the three-dimensional object space.
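The idea of recovering relative orientation from corresponding image points alone can be illustrated with the linear "eight-point" solution for the essential matrix, a standard formulation closely related to the coplanarity condition (it is not the chapter's own algorithm). The cameras, points, and the convention X2 = R(X - t) below are assumptions of this synthetic, noise-free sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# assumed ground-truth relative orientation of camera 2 w.r.t. camera 1
theta = 0.1
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
t = np.array([1.0, 0.2, 0.1])

# random 3-D points in front of both cameras
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))

# normalized (calibrated) image coordinates, focal length 1
x1 = X / X[:, 2:]                   # camera 1 at the origin
Xc2 = (X - t) @ R.T                 # camera 2 frame, assuming X2 = R (X - t)
x2 = Xc2 / Xc2[:, 2:]

# linear estimate of the essential matrix E from x2^T E x1 = 0
A = np.stack([np.kron(b, a) for a, b in zip(x1, x2)])
_, _, Vt = np.linalg.svd(A)
E = Vt[-1].reshape(3, 3)

# the coplanarity (epipolar) residual should vanish for every correspondence
residuals = np.abs(np.einsum('ij,jk,ik->i', x2, E, x1))
print(residuals.max())
```

E is recovered only up to scale, which mirrors the point made above: relative orientation alone fixes no absolute scale; that is the job of absolute orientation.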
3.2.3 Absolute orientation
Absolute orientation relates the cameras and the three-dimensional model to an object space coordinate system. By using points of known position (control points) in the field of view of the cameras, absolute orientation parameters can easily be solved for analytically. For in situ observations, the absolute orientation defines the scale of the stereomodel and relates the fixed camera geometry to the vertical and, if necessary, to absolute position and azimuth (e.g. a north point). There are two common methods of calculating absolute orientation parameters. If the coplanarity condition equations have been used to determine a relative orientation, then coordinates of object points, measured in the model space, can be transformed into an absolute coordinate system with the use of a three-dimensional similarity transformation. A minimum of three object space control points must appear in the stereomodel so that it can be related to the object space coordinate system. Alternatively, the collinearity condition equations can be used to calculate the absolute orientation of each camera without going through the intermediate model space calculations.

Once the interior, relative, and absolute orientations of a stereopair are known, image coordinates of image points appearing in both photographs can be measured and used to calculate corresponding object space coordinates. The mathematical model and computational procedures used to solve for the orientation parameters and object space coordinates vary considerably. Some, such as the Direct Linear Transformation (DLT) (Abdel-Aziz & Karara 1971; Marzan & Karara 1975; Walton 1994), provide a more direct solution than the procedure described above and are well suited to many close-range biological applications. Increasing accuracy, of course, requires increasing rigor and greater complexity. McGlone (1989) and Walton (1994) discuss alternative algorithms.
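The DLT itself is compact enough to sketch. The illustration below uses an assumed synthetic camera and a fifteen-point noncoplanar control frame, solves the eleven DLT parameters by linear least squares, and checks them by reprojection; real data would of course add measurement noise and distortion terms.

```python
import numpy as np

rng = np.random.default_rng(1)

def project(P, X):
    """Project 3-D points with a 3x4 camera matrix; return 2-D image coords."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    uvw = Xh @ P.T
    return uvw[:, :2] / uvw[:, 2:]

def dlt_calibrate(X, uv):
    """Solve the 11 DLT parameters from >= 6 control points (linear least squares)."""
    rows, rhs = [], []
    for (x, y, z), (u, v) in zip(X, uv):
        rows.append([x, y, z, 1, 0, 0, 0, 0, -u*x, -u*y, -u*z]); rhs.append(u)
        rows.append([0, 0, 0, 0, x, y, z, 1, -v*x, -v*y, -v*z]); rhs.append(v)
    L, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.append(L, 1.0).reshape(3, 4)   # fix the 12th parameter to 1

# assumed synthetic camera and a fifteen-point, noncoplanar control frame
P_true = np.array([[800., 0., 320., 100.],
                   [0., 800., 240., 50.],
                   [0., 0., 1., 2.]])
X_ctrl = rng.uniform([-1, -1, 2], [1, 1, 4], size=(15, 3))
uv_ctrl = project(P_true, X_ctrl)

P_est = dlt_calibrate(X_ctrl, uv_ctrl)

# reprojection of independent check points should agree with the true camera
X_chk = rng.uniform([-1, -1, 2], [1, 1, 4], size=(5, 3))
err = np.abs(project(P_est, X_chk) - project(P_true, X_chk)).max()
print(err)
```

Note that the control points must be noncoplanar for the 11-parameter system to be well conditioned, for exactly the reason given in the calibration discussion above.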
3.3 Stereoscopy
One of the most common approaches to photogrammetry is stereoscopy, in which two cameras are mounted with their optical axes nearly parallel. Although this is not a necessary condition for photogrammetry, it is useful if one wishes to view the target as a three-dimensional stereoscopic image. The stereoscopic approach does offer some real advantages, including reliable visual correlation of corresponding images, increased measuring speed, and improved photointerpretation, the last leading to more reliable identification and increased accuracy if the image quality is poor. Close-mounted parallel cameras are also simpler to handle than cameras mounted along orthogonal axes. The geometric basis of stereovision is illustrated in Figure 3.3. The essential requirement for three-dimensional visualization is that the two corresponding image points and the observer's eyes lie in the same optical plane, the epipolar plane. If the observer's left eye observes only the left image and the right eye observes only the right image, changes in parallax (the apparent displacement between the corresponding image points) are interpreted by the brain as height changes in the perceived stereomodel. A variety of techniques is used to ensure optical separation. The simplest example is the stereoscope, in which optical overlap of the two eyes is physically inhibited. Nearly all high-accuracy analytical stereoscopic systems use separate optical trains. Anaglyphic systems use filters of complementary colors, usually one red and one cyan, to produce separate optical trains. The observer wears glasses with left and right filters that correspond to the filters through which the left and right photographs have been projected. This is a cheap and simple solution; however, it does preclude the use of color film and results in a significant amount of light loss. Anaglyphic systems have been implemented in some three-dimensional computer displays.
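For the parallel-axis geometry just described, the parallax-depth relationship reduces to Z = fB/p: range is the principal distance times the base, divided by the measured parallax. A small worked example, with illustrative (assumed) values of f and B:

```python
# Depth from parallax for an idealized parallel-axis stereo pair: a point at
# range Z produces a parallax (left-right image displacement) of p = f * B / Z.
f = 0.028      # principal distance, m (illustrative 28-mm lens)
B = 0.5        # stereo base (camera separation), m (assumed)

p = 0.0035     # measured parallax, m (3.5 mm)
Z = f * B / p  # recovered range
print(Z)       # 4.0 m
```

The inverse dependence on p is why small parallax-measurement errors translate into large range errors at long camera-object distances.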
Polarized light systems use the same principle as anaglyphs except that polarizing filters are used instead of color filters. The advantages are that color film can be viewed and light loss is minimal. One of the most versatile stereoviewing systems currently available is based on the polarized projection system. Using a high-resolution graphics monitor, interlaced images are projected through a synchronized liquid crystal shutter so that differently polarized images are consecutively presented to the viewer. The viewer wears appropriately polarized glasses to view the stereomodel. Stereo-image alternation systems use synchronized mechanical shutters to alternately obscure the projected image and interrupt the observer's line of sight for the left and then the right eye. Graphics monitors based on this approach are available, with the left and right images updated on a graphics screen synchronized with LCD shutters on the left and right sides of the viewer's glasses.

Figure 3.3. The geometric basis of stereovision.

Although there are several techniques for viewing the stereomodel, none of these alone allows for data collection (e.g. intra-object distances). Analytical data reduction of film images is accomplished with comparators or analytical stereoplotters (see Petrie 1990 for a general review). Comparators are used to measure the image coordinates of targets; they do not necessarily perform any further processing of the measured coordinates. Monocomparators measure only
one image, whereas stereocomparators (Fig. 3.4) allow corresponding images to be measured while the observer views a stereomodel. However, the observer must continually adjust the position of the left and right images because stereocomparators do not automatically maintain the stereomodel. The real advantage of stereocomparators is that by viewing a stereomodel, the observer can ensure that the corresponding left and right images of every target are correctly correlated. Reliable correlation of target coordinates measured on a monocomparator can, depending on the number and distribution of targets, be a very difficult task. On the other hand, stereocomparators are generally unsuitable for highly convergent photography because they cannot accommodate the image rotations required to maintain the epipolar geometry necessary for viewing a stereomodel. Image coordinates measured on a comparator can be used to calculate the corresponding object space coordinates. For many biological applications, where only a relatively small number of measurements have to be made and where feedback mechanisms for contour tracing are not required, stereocomparators are an attractive data-reduction system (Fig. 3.4).

Figure 3.4. Schematic illustration of a stereocomparator (adapted from Petrie 1990).

Unlike comparators, analytical stereoplotters automatically maintain the stereomodel (Fig. 3.5). Once the model is set up, three-dimensional object space coordinates are input directly into the computer, the left and right image coordinates are calculated, and the images (or measuring marks) are then shifted accordingly. Analytical stereoplotters are used almost exclusively for analytical aerial mapping because of the requirements of contour plotting and the need to drive the measuring mark to predefined object space coordinates. A wide range of analytical stereoplotters is available (Karara 1989). A number of these run
software specifically designed for close-range photogrammetry and appropriate for biological applications.

Figure 3.5. Schematic illustration of a fully analytical stereoplotter (adapted from Petrie 1990).
3.3.1 Digital imagery
All of the approaches noted above rely on film-based images. However, images can also be captured digitally, that is, projected through a lens directly onto a pixel array at the back of the camera. The advantages of capturing an image digitally include high temporal resolution, three-dimensional visualization of dynamic processes, digital enhancement of the image, and real-time measurement. In addition, digital photogrammetry offers the possibility of automatic target recognition and tracking. Until recently, photogrammetry made little use of digital cameras to extract spatial information, although these techniques are widely used in remote sensing (see Jaffe Ch. 2). In part, this is because photographic emulsions had a much higher resolution and provided greater accuracy than did digital images, metric digital cameras simply were not available, and real-time visualization and measurement were not normally required. Furthermore, storage and manipulation of digital images demanded expensive computers. All of these problems are being resolved. Real-time visualization and measurement are required for a growing number of close-range applications, particularly in robotics, industrial quality assurance, and a wide variety of medical and
biological applications. High-resolution metric digital cameras are becoming available. In the past twenty years, commercially available arrays have increased from 100 × 100 pixels to over 3000 × 2000. High-definition television is certain to increase the availability of cameras containing high-resolution sensors. Accurate and reliable target recognition and correlation techniques are being used in military, industrial, and medical applications, although it is these aspects that remain the greatest impediment to fully automatic digital photogrammetry. Digital photogrammetry requires solid-state digital cameras, of which the charge-coupled device (CCD) is a well-known example. These still contain both geometric and radiometric errors, although all indications are that their geometric reliability is currently at least as high as that of semimetric film cameras. Geometric errors result from a variety of sources, including lens distortion, nonperpendicularity of the image plane and the camera's optical axis, the limited spatial resolution of the pixel array, differing pixel spacing in the x and y directions of the array, and distortions resulting from the analog-to-digital conversion from a CCD video array either directly to the computer or via tape. Although commonly referred to as digital cameras, CCD area array video cameras essentially rely on analog processes. The charge builds up in the array elements and is transferred from the array as an analog signal. The image is usually stored on tape as an analog signal and later converted to a digital image for analysis (A/D conversion) using a frame grabber. To digitize an image properly, the image source (camera or tape player) and the framegrabber must be synchronized. If video cameras are used, most of the errors result from synchronization differences. Beyer (1990, 1992, 1993) describes the radiometric and geometric accuracy of digital cameras and framegrabbers.
Most framegrabbers rely on a line synchronization process called phase-locked loop (PLL) genlocking to control the sampling rate of the video signal. This can lead to significant errors across each line of the digital image and between lines of the image, usually referred to as line-jitter. Line-jitter can be avoided by running the camera and the framegrabber on the same clock, a process known as pixel-synchronous frame grabbing. Unfortunately, many cameras and framegrabbers cannot be driven by an external clock and provide only for PLL genlocking. Digital video cameras and still video cameras, on the other hand, use internal pixel-synchronous frame grabbing and output a digital image free of line-jitter. When a tape player is the framegrabber's image source, there are additional sources of error, the most significant being caused by the instability of the synchronization pulses contained in the video signal. To capture images reliably from a tape it may be necessary to play the signal through a time base corrector,
which normally collects either a full frame or a field of the output image and reestablishes reliable synchronization (e.g. Inoue 1986). Recent advances in computer technology allow the observer to view digitized images in reconstructed three-dimensional image space (i.e. the stereomodel). Digital Photogrammetric Workstations (DPWSs) permit real-time three-dimensional visualization. By using a floating cursor superimposed on the screen, the observer can measure three-dimensional object space coordinates from the image model. Basic image processing routines can be used to enhance the images while viewing in real time. More sophisticated image restoration and enhancement techniques can also be used. The recent availability of DPWSs is a consequence of the dramatic increase in affordable and sufficiently powerful computers. An important additional possibility offered by a digital photogrammetric approach is automation. Two aspects of this process, target recognition and target correlation (see Fig. 3.6), essentially the only "human" activities on a modern analytical plotter, deserve particular comment. To automatically extract three-dimensional data it is necessary to recognize and correlate features in one image with corresponding features in the matching image. There are essentially two approaches: grey-level-based and feature-based matching. Algorithmic aspects of image matching that are primarily concerned with speed - such as employing image pyramids (e.g. Ackermann & Hahn 1991; Schenk et al. 1990), extensions to standard algorithms such as applying geometric constraints (e.g. Grün & Baltsavias 1987; Wrobel & Weisensee 1987) or window shaping (e.g. Norvelle 1992), as well as the more esoteric matching algorithms such as relational matching, Fourier domain methods, and AI methods (e.g. Shapiro & Haralick 1987; Lemmens 1988; Greenfeld 1991; Göpfert 1980) - are outside the scope of this discussion. For a description of these techniques see reviews by Dowman et al.
(1992), Lemmens (1988), or Weisensee and Wrobel (1991). Grey-level-based matching, also known as area- or signal-based matching, relies directly and exclusively on the image signal. The two most commonly employed grey-level-based matching methods are cross-correlation matching and least squares matching. In cross-correlation methods a smaller target array is systematically shifted across a larger search array. The target array may be either an "ideal" target the observer wishes to identify or an actual subsample from one of the digitized images. The result of this process is an array of discrete correlation coefficients from which the most likely position of the match can be interpolated. The mathematical basis of cross-correlation matching and some different correlation coefficients are reported by Duda and Hart (1973), Haggren and Haajanen (1990), Göpfert (1980), Lemmens (1988), Trinder et al. (1990), and Pilgrim (1992). Although relatively simple, cross-correlation approaches have
Figure 3.6. Processing steps for real-time measurement: image acquisition from each camera, enhancement, segmentation, feature extraction and recognition, subpixel point location, and correlation, followed by the photogrammetric solution, which combines orientation parameters and object space control to yield three-dimensional object space coordinates (adapted from El-Hakim 1986; Karara 1989).
some disadvantages (Dowman 1984; Trinder et al. 1990; Mitchell 1991; Pilgrim 1992). Cross-correlation is computationally expensive, especially if subpixel matching is attempted. Furthermore, it does not adapt to geometric distortions (e.g. from scale differences and rotations) in each image, nor does it adapt to radiometric differences (e.g. from illumination effects and shadows) or image differences (e.g. from occlusions). Finally, cross-correlation does not provide reliable feedback on the accuracy of the match. Thus, although cross-correlation techniques can be useful approximate methods, they are unsuitable for high-precision matching.
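A minimal version of cross-correlation matching, assuming greyscale arrays and an exhaustive integer-pixel search (the computational expense the text refers to is visible in the nested loops):

```python
import numpy as np

def ncc_match(search, target):
    """Slide `target` over `search`; return the offset with the highest
    normalized cross-correlation coefficient, and that coefficient."""
    th, tw = target.shape
    t = target - target.mean()
    tn = np.sqrt((t**2).sum())
    best, best_ij = -2.0, (0, 0)
    for i in range(search.shape[0] - th + 1):
        for j in range(search.shape[1] - tw + 1):
            w = search[i:i+th, j:j+tw]
            wc = w - w.mean()
            denom = tn * np.sqrt((wc**2).sum())
            if denom == 0:
                continue                       # skip constant windows
            r = (t * wc).sum() / denom
            if r > best:
                best, best_ij = r, (i, j)
    return best_ij, best

rng = np.random.default_rng(2)
search = rng.uniform(size=(40, 40))
target = search[12:20, 25:33].copy()           # a known 8x8 patch of the image
(i, j), r = ncc_match(search, target)
print(i, j, r)                                 # recovers offset (12, 25)
```

This returns only discrete (integer) offsets; subpixel positions must be interpolated from the coefficient array, and none of the geometric or radiometric adaptation discussed above is present.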
Analytical and digital photogrammetry
49
Least squares matching techniques are appropriate when high-precision matching is needed (see for example Ackermann 1984; Lemmens 1988; Trinder et al. 1990; Rosenholm 1987; Mitchell 1991; or Pilgrim 1992). In this technique, the grey level of each pixel in a defined subset of one digitized image (a window) is related to the grey level of a corresponding pixel in a window from the second image, via some functional model that incorporates both geometric and radiometric differences in the images. Matching is achieved by minimizing the sum of the squared residuals of the grey-level values. The least squares approach has several advantages (Trinder et al. 1990; Mitchell 1991; Weisensee & Wrobel 1991; Pilgrim 1992). The technique is computationally efficient because a pixel-by-pixel search of the target array over the entire search array is not required. Other advantages include the potential for incorporating error detection, such as data snooping, as well as the fact that the model provides data on the quality of the match.

Feature- or attribute-based matching (Förstner 1986), which relies on extracting information about the structure and attributes of images, is a suitable technique for low-accuracy applications or as a precursor to a least squares area-based match (e.g. Chen et al. 1990). The information extracted might include numerical descriptors (contrast, length, or area), topological descriptors (connectedness), or symbolic descriptors (shape classification). Operators are used to detect specific features in the image, for instance edges resulting from object shape, color, or shadows. Once specific features have been recognized, they are matched to corresponding features in the other image on the basis of their attributes (Trinder et al. 1990). Information extraction is normally referred to as segmentation.
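The least squares matching idea described above can be caricatured in one dimension: estimate the subpixel shift between two signals by linearizing the grey-level residuals about the current shift estimate and iterating (Gauss-Newton). The full technique additionally models affine geometric and radiometric differences; the Gaussian test signal below is purely an assumption of the sketch.

```python
import numpy as np

def lsm_shift(g1, g2, iters=20):
    """Estimate the subpixel shift s with g2(x) = g1(x - s) by iteratively
    minimizing the sum of squared grey-level residuals (1-D caricature of
    least squares matching)."""
    x = np.arange(len(g1), dtype=float)
    d = np.gradient(g1)                       # grey-level gradient of template
    s = 0.0
    for _ in range(iters):
        g2s = np.interp(x + s, x, g2)         # resample g2 at the current shift
        r = g2s - g1                          # grey-level residuals
        s -= (d * r).sum() / (d * d).sum()    # Gauss-Newton update
    return s

# a smooth test signal and a copy shifted by 2.3 pixels
x = np.arange(100, dtype=float)
g1 = np.exp(-((x - 50) / 8.0) ** 2)
g2 = np.exp(-((x - 50 - 2.3) / 8.0) ** 2)     # g2(x) = g1(x - 2.3)

s = lsm_shift(g1, g2)
print(s)    # approximately 2.3
```

No exhaustive search is performed, which is the efficiency advantage noted above; the price is that the initial estimate must already be within the pull-in range of the minimum.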
Although there are many different approaches to segmentation, three categories (Pilgrim 1992) are illustrative. When only a few regions of interest exist in an image, thresholding can be used to divide the image into scenes by separating targets from the background on the basis of brightness (i.e. grey level); image thresholding prior to target detection is commonly applied in real-time photogrammetry (e.g. Grün 1988; Haggren 1986; Wong & Ho 1986). Portions of the image with similar attributes, such as a similar statistical distribution of grey levels, can be isolated as discrete regions for matching. Finally, image analysis techniques such as edge or point detectors can be used to isolate specific features in the image. Feature-based matching has several advantages (Förstner 1986; Trinder et al. 1990; Pilgrim 1992). It is more versatile and generally faster than either the cross-correlation or the least squares method and has a greater range of convergence than least squares techniques. Its disadvantages (Trinder et al. 1990; Weisensee & Wrobel 1991; Schenk & Toth 1992) are that accuracy is limited to approximately
the pixel size of the data, estimation of matching accuracy is more difficult, and implementation is more difficult than for least squares (area-based) matching. Much of the current research in object recognition and matching relies on defining targets in terms of generalized geometric objects, such as lines, planes, and cylinders. While this is appropriate for industrial applications such as quality assurance for manufactured parts, these methods are less well suited to biological tasks where animals move and their shape changes as a function of distance and orientation with respect to the cameras. Aloimonos and Rosenfeld (1991) note, "... it seems unlikely that such techniques can handle object recognition in natural three-dimensional scenes, or can deal with scenes that deal with a large number of possible objects." Image correlation often exploits the epipolar geometry illustrated in Figure 3.3. The coordinates of a target in one image and the known exterior orientation parameters are used to define the epipolar plane and thus the epipolar line along which the second image of the same target must lie. Correlation along that line leads to a most likely solution for the position of the corresponding target (e.g. Konecny & Pape 1981). However, correlating targets in an animal aggregation can be considerably more complicated than in more common mapping tasks, because the targets are clustered in three dimensions. Ambiguities arise when more than one animal lies along an epipolar line, or when two animals cross paths. In these situations, three-dimensional coordinates can be resolved only by using three images rather than two, or by relying on humans to edit the model. Such ambiguities lead to outliers, a common problem in photogrammetry. If the image is of a predictable surface, as in most photogrammetric mapping, then outlier detection is not particularly difficult.
For a structure such as an animal aggregation where there is essentially no a priori knowledge of the structure's dimensions, outlier detection can be extremely difficult. If a small number of undetected outliers cannot be tolerated in the model for which the data are being collected, then unsupervised image recognition and measurement routines should not be used.
3.4 Tracking
Once a target has been reliably identified and its coordinates measured in each image, its three-dimensional position can be calculated. The remaining task is to track the object in space and time. The complexity of this task is formidable. Success depends upon individual animal parameters (e.g. speed and path complexity), group-level parameters (e.g. the number and density of animals and the frequency of occlusions), and technical parameters (e.g. the spatial accuracy of the measurements and the image sampling rate). It is difficult to generalize about
the probable success of motion tracking techniques applied to animal aggregations because most of the problems noted above have yet to be resolved. Some preliminary attempts at tracking are described below in this chapter and in Chapter 9 by Parrish and Turchin. Readers interested in photogrammetric three-dimensional tracking systems are referred to Haggren and Leikas (1987), Walton (1990, 1994), Mostafavi (1990), Turner et al. (1991, 1992), Axelsson (1992a, 1992b), Grün (1992), and Dowman et al. (1992).
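As a sketch of the simplest possible linking strategy - greedy nearest-neighbour association between consecutive frames, subject to a maximum plausible one-frame step - the following illustrates the basic bookkeeping; it is not any of the cited systems, and it is exactly the kind of naive tracker that dense aggregations, occlusions, and path crossings defeat.

```python
import numpy as np

def link_frames(prev, curr, max_step):
    """Greedily link 3-D positions across two frames by nearest neighbour.
    Returns (prev_index, curr_index) pairs; animals with no candidate
    within max_step (occlusions, exits) are simply left unlinked."""
    dists = np.linalg.norm(prev[:, None, :] - curr[None, :, :], axis=2)
    links, taken = [], set()
    for i in np.argsort(dists.min(axis=1)):   # most confident links first
        for j in np.argsort(dists[i]):
            if dists[i, j] > max_step:
                break                         # no plausible match remains
            if j not in taken:
                links.append((int(i), int(j)))
                taken.add(j)
                break
    return links

# three animals taking a small step between frames; a fourth animal appears
prev = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
curr = np.array([[0.1, 0.0, 0.0], [1.1, 0.1, 0.0],
                 [0.0, 1.0, 1.2], [5.0, 5.0, 5.0]])
print(link_frames(prev, curr, max_step=0.5))
# -> [(0, 0), (1, 1), (2, 2)]; the animal at (5, 5, 5) starts a new track
```

The spatial accuracy and sampling rate mentioned above enter directly through max_step: the slower the frame rate or the noisier the coordinates, the larger the gate must be, and the more often two nearby animals become interchangeable.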
3.5 Discussion
Accurate and reliable fully automated target recognition, correlation, and tracking are still in the developmental stage. Although accuracy requirements may not be particularly high for biological applications, recognition issues are more complicated than for physical systems, and identifying outliers caused by incorrect correlation can be quite difficult. Most commercial automatic recognition systems rely on feature recognition for binary images (e.g. a high-contrast edge recognized as black against white). Therefore, image recognition of aggregating ants filmed against a high-contrast background is relatively simple. But automatic identification of every fish in a school (e.g. Fig. 3.7), let alone recognition of the head and tail of each individual, becomes a problem of considerable
Figure 3.7. Schooling fish.
difficulty. In my experience, obtaining images of sufficient quality to allow human recognition and correlation is one of the greatest difficulties in photogrammetric analysis of animal aggregations, and automatic recognition is far more challenging still. Tracking the trajectories of individual animals in dense or spatially complex aggregations still requires the human mind's exceptional image-recognition and correlation powers. Photogrammetric systems that allow three-dimensional visualization of video images can be powerful tools for animal behaviorists. The advent of fully automated three-dimensional tracking systems will allow biologists to address a wide range of questions (see Ch. 11 by Hamner & Parrish) that they have only begun to explore.
3.6 Examples
Photogrammetric techniques have been used by various researchers to study animal aggregations, particularly fish schools (Aoki & Inagaki 1988; Aoki et al. 1986; Cullen et al. 1965; Dill et al. 1981; Graves 1977; Hasegawa & Tsuboi 1981; Hellawell et al. 1974; Hunter 1966; Koltes 1984; O'Brien et al. 1986; Partridge et al. 1980; Pitcher 1975; Symons 1971a; Van Long & Aoyama 1985; Van Long et al. 1985). Although several of these references describe approaches that differ from the methods described in this chapter - for example, the cameras may be configured orthogonally, or mirrors may be used to provide the second view - the geometry and data reduction are essentially identical. Rigorous photogrammetric treatment of the images (e.g. O'Brien et al. 1986) is not the norm. Most solutions simplify the mathematics and make assumptions about the physical model. For example, it is commonly assumed that images can be measured from paper prints without allowing for errors introduced during enlargement or for the dimensional instability of photographic paper. It is also assumed that images can be measured with calipers rather than a comparator, that refraction can be ignored, that nonmetric cameras remain stable, that cameras can be fixed in position reliably for constant relative and absolute orientations, and that camera positions and orientations can be known accurately from physical measurement rather than analytical determination using object space control points. It is reasonable to make these assumptions, provided that the resultant errors in object space coordinates are properly understood. O'Brien et al. (1986), however, note that several researchers quote accuracies inconsistent with their experimental techniques.
three-dimensional measurements. The first two examples are described in greater detail in O'Brien et al. (1986) and Osborn et al. (1991); the third is courtesy of Ritz (pers. comm.), and the fourth is taken from Hamner and Hamner (1993).
3.6.1 Laboratory analysis of macroplankton
This example demonstrates that rigorous photogrammetric techniques can be used without expensive equipment (except a rented monocomparator). The only added complexity is the mathematical treatment of the measured image coordinates. For this analysis a flume tank was constructed to study the schooling behavior of planktonic shrimp (Fig. 3.8). Two nonmetric 35-mm film cameras (Pentax MEF) were mounted on a frame approximately 75 cm above the water level of the tank. A glass tray of known thickness and refractive index was floated on the water surface. The glass improved the image quality of the photographs by eliminating surface irregularities. The tray supported a photogrammetric control frame containing twenty-two control points of known distance and geometric orientation. Stereo images of the control frame were used to solve for the interior, relative, and absolute orientation of the two cameras. The image coordinates of one eye of each shrimp were measured with a monocomparator. Because the shrimp and the cameras were in different media, the refractive effects of the water-glass and glass-air interfaces had to be taken into account. The relationship between the true and apparent position of an animal is illustrated in Figure 3.9. The ray-tracing technique used in this project requires that the refractive index of each medium be measured in order to calculate the direction vectors of each ray. After calculating the coordinates of the points of intersection of these rays with the planes defined by the media interfaces, the coordinates of the intersection of the two rays at the object point (i.e. the shrimp eye) could be
Figure 3.8. Section view of the flume tank, showing the cameras mounted above the control frame and the recirculating system (adapted from O'Brien et al. 1986).
found. The mean positional error of this system was less than 0.5 mm, and the mean error in calculated distances between individuals was less than 0.25 mm.

Figure 3.9. Refraction of light rays from the two cameras through the three-media (air-glass-water) environment: the apparent position of an animal lies above its true position (adapted from O'Brien et al. 1986).
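The ray-tracing correction can be sketched with the vector form of Snell's law, tracing a single ray through the air-glass-water stack. The ray angle, glass thickness, and depths below are assumed for illustration and are not the values of the original study.

```python
import numpy as np

def refract(d, n, n1, n2):
    """Vector form of Snell's law: refract unit direction d at a flat
    interface with unit normal n (pointing back into the incident
    medium), passing from refractive index n1 to n2."""
    r = n1 / n2
    cos_i = -d @ n
    sin2_t = r * r * (1.0 - cos_i * cos_i)
    return r * d + (r * cos_i - np.sqrt(1.0 - sin2_t)) * n

def advance(p, d, z_plane):
    """Move along direction d from point p to the plane z = z_plane."""
    return p + ((z_plane - p[1]) / d[1]) * d

n_air, n_glass, n_water = 1.0, 1.5, 1.33
up = np.array([0.0, 1.0])                    # interface normal (2-D: x, z)
camera = np.array([0.0, 0.75])               # camera 0.75 m above the glass
d0 = np.array([np.sin(0.2), -np.cos(0.2)])   # ray leaving 0.2 rad off vertical

p1 = advance(camera, d0, 0.0)                # reach the air-glass interface
d1 = refract(d0, up, n_air, n_glass)
p2 = advance(p1, d1, -0.01)                  # cross a 1-cm glass tray
d2 = refract(d1, up, n_glass, n_water)
p3 = advance(p2, d2, -0.30)                  # true position, 29 cm into water

# the unrefracted (straight) back-projection gives the apparent path
apparent = advance(camera, d0, -0.30)
print(p3[0], apparent[0])   # the true point lies nearer the optical axis
```

Intersecting two such refracted rays, one per camera, yields the true object point; intersecting the straight back-projections instead yields the displaced apparent position shown in Figure 3.9.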
3.6.2 In situ fish stock assessment

Biologists at New Zealand's Ministry of Agriculture and Fisheries, Fisheries Research Centre (NZFRC), use acoustic methods to estimate the total stock biomass of deep-water fish such as orange roughy (Hoplostethus atlanticus). The three-dimensional extent of an aggregation is determined from repeated ship transects, and its density is estimated from reflected signals. Target strength at a particular acoustic frequency depends on the species and size of the fish, body orientation with respect to the acoustic device, and school packing. In order to ground-truth the acoustic data set, the minimum photogrammetric information needed was verification of fish species and an independent estimate of school density. To further refine the estimates produced by acoustic methods and to calibrate the acoustic system, information was also required about the vertical distribution of the target species and the size, orientation, and packing arrangement of individuals within the aggregation.

The NZFRC uses two Lobsiger DS 3000 cameras, equipped with 28-mm Nikkor underwater lenses mounted behind a flat acrylic port, each with a reseau grid in the focal plane. Camera separations can be set at 250 mm, 500 mm, or 700 mm. A control frame (Fig. 3.2) is photographed before and after each cruise and is carried on the vessel during the cruise in case the cameras are disturbed at sea. The control frame contains about sixty control points, which have been coordinated to submillimeter accuracy using traditional survey techniques. The control frame photography is used to calculate the interior, relative, and one component (scale) of the absolute orientation of the two cameras. On both calibration photographs, image coordinates of the control points and the corner reseaus are measured using a stereocomparator. Many different software packages are available to calculate orientation parameters from the measured photocoordinates, some of which are in the public domain (e.g. Karara 1989). In this project a formulation of the collinearity condition equations known as the Direct Linear Transformation (DLT) (Abdel-Aziz & Karara 1971; Marzan & Karara 1975) and the University of New Brunswick Analytical Self Calibration Program (UNBASC1) (Moniwa 1976) are being used to obtain initial and then rigorous solutions for the orientation parameters. Distortions due to the flat camera port are treated as an additional lens distortion and modeled accordingly; this is significantly simpler than the ray-tracing technique needed in the previous example.

Both an ADAM Technology MPS-2 analytical stereoplotter and a Zeiss stereocomparator are used to measure and reduce the data on the photographs containing fish. A typical stereomodel containing some fifteen fish takes about twenty minutes to set up and measure. The estimated accuracy of the NZFRC system is illustrated in Figure 3.10. Note the rapid deterioration in the Z coordinate (parallel to the cameras' optical axes) compared with the X and Y coordinates. This illustrates the importance of defining the requirements of the photogrammetric solution before choosing cameras, deciding on calibration procedures, and designing the stereocamera geometry.

Figure 3.10. Accuracy of the NZFRC system, with cameras separated by 0.7 m: standard deviation (m) of coordinates plotted against distance from the cameras (m), with Z orthogonal to the camera base and X and Y orthogonal to Z.
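The DLT mentioned above expresses the collinearity equations as a system that is linear in eleven parameters, so a camera's orientation can be recovered from six or more non-coplanar control points by least squares. The sketch below shows the basic calibration and reprojection steps; it is a minimal illustration with function names of my own choosing, not UNBASC1 or any published package, and it omits the lens- and port-distortion terms a rigorous solution would add.

```python
import numpy as np

def dlt_calibrate(obj, img):
    """Solve the 11 DLT parameters from >= 6 non-coplanar control points.

    obj: object-space coordinates (X, Y, Z); img: image coordinates (u, v).
    Each point contributes two rows to a linear least-squares system."""
    A, b = [], []
    for (X, Y, Z), (u, v) in zip(obj, img):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        b += [u, v]
    L, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return L

def dlt_project(L, p):
    """Project an object point through a camera described by DLT parameters L."""
    X, Y, Z = p
    den = L[8] * X + L[9] * Y + L[10] * Z + 1.0
    return np.array([(L[0] * X + L[1] * Y + L[2] * Z + L[3]) / den,
                     (L[4] * X + L[5] * Y + L[6] * Z + L[7]) / den])
```

Calibrating both cameras this way, then intersecting the two rays implied by corresponding image points, yields the three-dimensional coordinates of each fish.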
3.6.3 Measuring mysid aggregations

The project was designed to investigate the costs and benefits of aggregation (see Ritz, Ch. 13, for an extended discussion of the behavioral ecology of this research). Approximately 100 mysids were placed in each of two identical glass
tanks, one with an adequate food ration and the other with an insufficient ration. Characteristics of the aggregations, such as volume and density, and properties of the individuals, such as swimming speed, were monitored.

This project illustrates the simplicity of some photogrammetric solutions. Because accuracy requirements were relatively low, a single CCD video camera in conjunction with a mirror was used to generate two images of the tank (Fig. 3.11; see Pitcher 1973, 1975 for other applications of this solution). The mirror was positioned at 45 degrees above the tank, creating a second, orthogonal image. The camera was approximately 5 m from the tank, but a zoom lens was used so that the mirror and the tank filled the field of view. The camera was placed this far from the tank to minimize both the refraction at the water-glass-air interfaces and the difference in scale between animals moving at the maximum and minimum distances from the camera.

The images were grabbed from the S-VHS videotape and loaded into an image-processing system. After contrast enhancement, the images were scaled using three axes marked on the tank. Frame advancement was manual. Calculating the three-dimensional position of animals was very simple because one image contained the scaled X and Y object-space coordinates and the other image contained the scaled Y and Z coordinates. The common Y coordinate was used to correlate the two images. Although it was not needed in this project, a third orthogonal image can be introduced to reduce the possibility of an ambiguous solution when the aggregation size and/or density is high. To improve accuracy, simple empirical corrections can be formulated by measuring test grids within the tank. For this system, mean and R.M.S. errors on calibration targets were X: 0.2 ± 1.0 mm; Y: 0.0 ± 0.9 mm; Z: 0.0 ± 0.6 mm. A single-camera approach has the advantage that image correlation is simplified.
However, placing both images on a single frame does reduce the resolution of each image, an important issue because current video technology already has significantly lower resolution than film.
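Correlating the two views by their shared coordinate, as described above, amounts to pairing each plan-view detection (x, y) with the mirror-view detection (y, z) whose y value lies closest. A minimal greedy sketch follows; the function name and tolerance are illustrative assumptions, not part of the original system.

```python
def match_views(plan_pts, side_pts, tol=2.0):
    """Pair plan-view (x, y) detections with mirror-view (y, z) detections
    by their common y coordinate; returns a list of (x, y, z) triples."""
    used, matches = set(), []
    for x, y in plan_pts:
        # nearest unused mirror-view detection in y
        best = min(
            (j for j in range(len(side_pts)) if j not in used),
            key=lambda j: abs(side_pts[j][0] - y),
            default=None,
        )
        if best is not None and abs(side_pts[best][0] - y) <= tol:
            used.add(best)
            matches.append((x, y, side_pts[best][1]))
    return matches
```

With many animals at nearly the same y, this pairing becomes ambiguous, which is exactly why the chapter notes that a third orthogonal view can be introduced for dense aggregations.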
Figure 3.11. Single-camera mysid tank, shown in plan and section views; the mirror provides a second, orthogonal image of the tank.
3.6.4 Three-dimensional tracking with EV3D

Behavioral studies in open water must deal with three-dimensional movements of animals. To quantify in situ swimming behavior, a submersible-mounted dual-camera video system was developed (Fig. 3.12; Hamner & Hamner 1993). "Off-the-shelf" monochrome video cameras with fixed-focus lenses and wide-angle adapters were permanently mounted in custom housings at the empirically determined distance behind the dome port at which refractive distortions at the water-port interface (see Fig. 3.9) were physically eliminated (Walton 1988). Separation of the cameras was maximized to improve the accuracy of measurements made along the horizontal axis perpendicular to the plane of the two cameras, and the cameras were angled toward each other so that the two fields of view intersected approximately 2 m in front of the submersible, within the volume of water illuminated by the submersible's lights.
Figure 3.12. Schematic of the 3-D video system on the submersible Johnson Sea-Link. The two cameras are mounted outside the submersible, on either side of the sphere, and are linked by cable to the video equipment inside the sphere: a camera control box, two VCRs with time code recorded on their audio tracks, a time-code generator, a microphone, and a monitor (see text).
Camera power and focus were remotely controlled from inside the submersible through a pressure-resistant cable that also transmitted video signals from each camera to a separate video cassette recorder. Prior to data collection the two cameras were calibrated in situ by holding a cuboidal frame with eight control points motionless in front of the cameras with the submersible's claw and recording the grid with both cameras (Fig. 3.13). The calibration frame filled approximately three-quarters of each camera's field of view. After the frame was recorded for several minutes at depth, the manipulator arm on the submersible moved it out of view, allowing behavioral sequences to be recorded. As each scene was recorded, a time-code signal laid down simultaneously on one audio track of each VCR synchronized the two videotapes for subsequent identification of corresponding frames. Later, on work tapes dubbed from these originals, the auditory time code was transformed into a visual code, with the minute, second, and frame number (at 30 fps) imprinted on every frame.

Figure 3.13. The digitized calibration grid in port and starboard views. White cubes at the eight corners were used as targets; one cube was poorly illuminated and its video image could not be digitized. Known spatial coordinates for each target established the x, y, and z axes for subsequent three-dimensional measurements. Any one target could be selected as the origin, and the remaining targets were then identified in relation to that target.

Images on the two synchronized videotapes were automatically tracked through time and in three-dimensional space with an Expert Vision 3-D Tracking System (EV3D), created and sold by the Motion Analysis Corporation. From the known distances between targets on the calibration frame, the EV3D determines the relative and absolute orientation of the X, Y, and Z axes of the volume viewed by both cameras. The outline of each digitized image is reduced to a centroid. The software automatically recognizes each set of image points and calculates the three-dimensional coordinates of each centroid through time. A set of algorithms is then available to calculate speed, direction of travel, and other trajectory parameters.

Behavioral sequences for swimming fish and planktonic invertebrates at about 900-m depth were successfully collected with the research submersible Johnson Sea-Link. In one example sequence, each of the two video cameras recorded the two-dimensional centroid paths of a sergestid shrimp and an unidentified fish as they swam in front of the submersible (Fig. 3.14). The sergestid swam into view from the port side above the fish, which entered the cameras' view from the starboard side. Review of the original video suggests that at least one of the sergestid's long, trailing antennae touched the fish, startling both animals into changing course and speed. Three-dimensional analysis of the approximately 1.5-sec interaction indicates that the sergestid increased its speed from 12 cm/sec to a maximum of 140 cm/sec in response to the fish, while changing course.

Figure 3.14. The two-dimensional paths of a sergestid shrimp and a fish recorded by the port and starboard cameras. Because of the camera angles, the sergestid appears to lobster-tail at two different angles and the two apparent paths of the fish are diametrically opposed. Only when the x,y coordinates in both views were combined to calculate positions in three-dimensional space could the true paths of either animal be determined.
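The image-to-trajectory reduction performed by systems of this kind can be illustrated in two steps: collapse each digitized silhouette to a centroid, then convert the frame-to-frame displacements of the three-dimensional centroid positions into speeds using the 30-fps frame rate. This is only a schematic of the idea, with my own function names, not Motion Analysis code.

```python
import numpy as np

def centroid(mask):
    """Reduce a binary silhouette (2-D array) to its centroid (row, col)."""
    rows, cols = np.nonzero(mask)
    return rows.mean(), cols.mean()

def speeds(track, fps=30.0):
    """Frame-to-frame speeds from an (n, 3) array of 3-D centroid positions,
    one row per video frame; positions in cm give speeds in cm/sec."""
    track = np.asarray(track, float)
    step = np.linalg.norm(np.diff(track, axis=0), axis=1)  # distance per frame
    return step * fps
```

Applied to the sergestid track above, the maximum of speeds() would correspond to the reported burst to 140 cm/sec.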
3.7 Summary

Photogrammetry offers critical advantages over traditional measurement techniques for biological applications, particularly in three-dimensional environments. The mathematics of photogrammetry is relatively straightforward, particularly when applied to simple stereophotography, but it does require careful
analysis and modeling of systematic geometric errors and random measurement errors. A wide range of metric, semimetric, nonmetric, film, and digital cameras is available. The choice of camera depends largely on logistical considerations, such as the trade-off between providing additional object-space control and accepting the cost and restrictive limitations of metric cameras. The introduction of solid-state digital cameras offers great possibilities for real-time visualization and near-real-time measurement, although automatic real-time measurement of digital images is currently possible only if the targets are very well defined. The enormous potential of real-time measurement in industrial, medical, and military applications is certain to ensure significant advances in the near future.

To track the trajectories of individual animals in a dense or complicated aggregation, it is still, and for some time will remain, necessary to use the human mind's exceptional image recognition and correlation powers. Consequently, photogrammetric systems that allow three-dimensional visualization of video images are likely to be particularly useful to researchers studying animal behavior, and some impressive methods of visualizing stereomodels have recently become available. Affordable off-the-shelf technology can be used to make reliable three-dimensional measurements and, in some cases, to track individuals, but it is critical that the system be properly designed and that full consideration be given to the effect of propagating systematic and random errors.
Plate 4.1. Cruise track of the star survey pattern. The color bar corresponds to values of mean acoustic backscatter (integrated volume backscattering, ranging from below 2.01 × 10⁻⁶ to above 6.33 × 10⁻⁶) averaged over 30-sec report intervals and integrated from 10 to 120 m. Bathymetric contours of Fieberling Guyot from 500 to 1500 m are shown in the figure, as well as a rectangle encompassing the area mapped in Plate 4.2.
Plate 4.2. Map of depth-integrated acoustic backscatter in the waters overlying and surrounding the summit of Fieberling Guyot. The irregularly spaced survey data presented in Plate 4.1 were interpolated, using a point kriging algorithm (Isaaks & Srivastava 1989), to produce a regularly spaced, two-dimensional grid of values for mapping. Bathymetric contours of Fieberling Guyot from 500 m to 1500 m are shown in the figure.
Plate 4.3. Three-dimensional visualization of acoustic backscatter in the waters overlying and surrounding the summit of Fieberling Guyot. In this case, the survey data were divided into twenty-two 5-m depth strata and interpolated, using punctual kriging, to produce a regularly spaced, three-dimensional grid of values for volume rendering and visualization. Bathymetric contours of Fieberling Guyot from 500 to 1500 m are projected on the lowest horizontal plane shown in the figure.
Acoustic visualization of three-dimensional animal aggregations in the ocean CHARLES H. GREENE AND PETER H. WIEBE
4.1 Introduction

Pelagic animals exist in a three-dimensional fluid medium and are continuously subjected to the physical processes of advection and turbulent mixing. Despite the tendency of turbulence to mix and homogenize scalar properties in the ocean, most distributions of pelagic animals exhibit patchiness over a wide range of spatial and temporal scales (Haury et al. 1978; Powell 1989; Steele 1991; Levin et al. 1993). Since many pelagic animals are active swimmers, it is perhaps not surprising that the power spectra of their spatial distributions deviate, at least on smaller scales, from those of passive scalar properties such as sea surface temperature and chlorophyll fluorescence (Levin 1990). The patchiness of pelagic animal distributions results from the interaction between physical processes at work on the fluid and animal aggregation responses to biotic and abiotic cues in their fluid environment (Omori & Hamner 1982; Mackas et al. 1985; Hamner 1988; Greene et al. 1994). This interaction between physics and biology is both complex and fascinating; its study will demand new methods in oceanography and ethology that are more sophisticated than those brought to bear on the subject in the past.

Three fundamental problems complicate efforts to study patchiness and animal aggregations in the oceanic environment. First, the ocean presents humans with a relatively hostile environment within which to work. Second, the ocean is largely opaque to light and other forms of electromagnetic radiation. Third, the distributions of pelagic animals are highly dynamic, continuously changing in both space and time. All three of these problems make it difficult to observe or sample pelagic animal distributions without confounding spatial and temporal patterns. In combination, these problems necessitate the development of novel approaches for visualizing the distributions of pelagic animals in the ocean environment (Greene et al. 1994).
61
62
Charles H. Greene and Peter H. Wiebe
4.2 Acoustic visualization

Underwater acoustics have provided an arsenal of powerful tools for scientists interested in remotely sensing the ocean's interior (Clay & Medwin 1977). Unlike electromagnetic radiation, which is rapidly absorbed and scattered by seawater and the small particles suspended within it, low-frequency sound propagates through seawater with relatively little attenuation. To detect animals as small as zooplankton and micronekton, however, much higher-frequency (200 kHz or more) sound is typically required. Unfortunately, these much higher acoustic frequencies are absorbed more rapidly in seawater, and their value in remote sensing is correspondingly diminished. For applications requiring the detection of small individual animals, the reader is referred to the review article by Greene and Wiebe (1990). Since the focus of this volume is on three-dimensional animal aggregations, we will introduce here new methods recently developed for visualizing zooplankton and micronekton distributions in three dimensions using high-frequency sound. These methods, which we refer to as acoustic visualization, have proven to be effective in characterizing the distributions of zooplankton and micronekton on scales ranging from meters to tens of kilometers (Greene et al. 1994). In the section that follows, we will present results from a field study conducted at sea during October 1990. The results from this study provide a useful example for discussing the power and present limitations of acoustic visualization in the three-dimensional analysis of animal aggregations in the ocean.
4.3 Field studies: Hypotheses, methods, and results

The field study described below was conducted as part of the Office of Naval Research's Accelerated Research Initiative on Flow Over Abrupt Topography. This research initiative was designed to explore the physical and biological oceanographic consequences of abrupt topographic features occurring in the open ocean. Fieberling Guyot (32.5 degrees N, 127.7 degrees W), a relatively isolated
seamount in the eastern Pacific, was chosen as the primary study site for field investigations. The minimum summit depth of this seamount is 435 m. The main objective of our investigation was to determine whether or not abrupt topographic features, such as submarine banks, seamounts, and guyots, generate characteristic bioacoustical signatures in the open ocean. For example, one such bioacoustical signature might correspond to the gaps devoid of vertically migrating zooplankton and micronekton which have been hypothesized to form over abrupt topography (Isaacs & Schwartzlose 1965; Genin et al. 1994; Greene et al. 1994). These gaps arise from interactions between the topography and a combination of physical and biological processes including advection, vertical migration, and predation.

Specifically, the following sequence of events was hypothesized to occur at Fieberling Guyot. During the evening, vertically migrating zooplankton and micronekton from deep waters surrounding the seamount's summit ascend to near-surface waters. Since fewer animals ascend from waters directly overlying the seamount's summit, a gap in the distribution of zooplankton and micronekton is formed. In the presence of surface currents, this gap would be advected downstream throughout the night. The following morning, the animals descend back to deep water, except for those trapped by the seamount's summit. During the day, some of the trapped animals may escape by migrating horizontally or by being swept by currents off the summit to where they can descend back to deep water. Many of the remainder are consumed by predators resident to the seamount. In either case, the topography impedes the replenishment of deep-water zooplankton and micronekton by day, thereby setting the stage for gap formation the following evening.
During October 1990, an acoustic survey was conducted aboard the RV Atlantis II to search for a bioacoustical signature of the hypothetical gap overlying Fieberling Guyot (Plate 4.1). Since this gap is hypothesized to be a nighttime feature, we collected survey data only between 21:00 hr and 05:00 hr. Due to the large areal extent of Fieberling's summit, the survey required two nights to complete. On each night, acoustic data were collected with a BioSonics 120-kHz echo sounder as the ship steamed a four-pointed star pattern. The second night's star pattern was designed to have each point of the star offset by approximately 45 degrees from the previous night's pattern. The Global Positioning System (GPS) satellites provided accurate navigational data for each night's cruise track.

The conventional method for analyzing an acoustic data set of the type we collected would involve individually analyzing each section corresponding to each transect line comprising the two stars. This, in fact, should be done, and Nero and Magnuson (1989) provide an excellent description of some of the statistical techniques available for such analyses. Although statistical analyses of the individual sections can provide insights into some of the physical and biological processes at work during the survey, they fail to provide an overview of the entire area surveyed.

Our first attempt at visualizing the area is presented in Plate 4.2. Here, the acoustic backscatter integrated from 10 to 120 m has been mapped for only that portion of the total survey area where the spatial coverage of data appeared adequate. Since the star survey patterns did not produce regularly spaced data, kriging (see caption to Plate 4.2) was used to grid the depth-integrated acoustic data prior to mapping. Despite logistical constraints in the survey's design and implementation, most of the seamount's summit and the adjacent surrounding waters received coverage we deemed adequate for our mapping and visualization procedures. The map of depth-integrated acoustic backscatter is reasonably consistent with the gap hypothesis, but certainly does not provide indisputable evidence by itself. In general, depth-integrated acoustic backscatter in the surface waters overlying Fieberling's summit was found to be low relative to most of the surrounding waters. This was the result we were looking for, but whether one chooses to view it as convincing evidence for a gap is debatable.

Therefore, to improve our view, we created a three-dimensional visualization of acoustic backscatter throughout the entire volume of water being examined (Plate 4.3). This involved kriging the acoustic backscatter data from each 5-m depth stratum individually (from 10 m to 120 m) and then combining them all to generate a three-dimensional data grid. An IBM Power Visualization System (PVS), running IBM's Data Explorer software, was used to visualize these volume-rendered data. The new and improved view had immediate consequences for our interpretation of this large and complex data set. In particular, two features captured our attention immediately.
First, there appeared to be a fairly distinctive gap in the sound-scattering layer overlying the seamount. Although a lack of better survey coverage on Fieberling's western flank makes our case less complete than we would have desired, the evidence for a gap overlying the seamount's summit is clearly more convincing in this visualization than in Plate 4.2. The second feature immediately obvious in Plate 4.3 is the presence of discrete, sound-scattering aggregations on the seamount's upstream and downstream flanks. These aggregations can be associated with the two hot spots on the map of depth-integrated acoustic backscatter (Plate 4.2). The hot spots arose not from local increases in the backscatter intensity of the 30- to 60-m-deep sound-scattering layer, but rather from the appearance of deeper aggregations of animals. Unfortunately, the full vertical extent of these deeper aggregations could not be determined from our acoustic data. Also, we were unable to determine from conventional net sampling the identity of the sound scatterers comprising these
deeper aggregations. They may correspond to predators resident to the flanks of the seamount, such as lophogastrid mysids and sternoptychid fish, which are demersal by day and enter the water column to feed on zooplankton at night. This phenomenon has been described for other seamounts (Boehlert & Genin 1987) and seems like a reasonable hypothesis to explain our observations. Other hypotheses could explain these findings equally well and resolution of the issue will require further investigation.
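The gridding step used throughout this survey, kriging irregularly spaced transect samples onto a regular grid, can be sketched in a minimal form. The exponential variogram model and its sill and range values below are placeholder assumptions; a real analysis fits the variogram to the data (see Isaaks & Srivastava 1989), and the function names are mine, not those of any published package.

```python
import numpy as np

def variogram(h, sill=1.0, vrange=5.0):
    # placeholder exponential variogram; sill and range must be fit to real data
    return sill * (1.0 - np.exp(-h / vrange))

def ordinary_krige(xy, z, grid_xy):
    """Ordinary kriging of samples z at locations xy onto the points grid_xy."""
    n = len(z)
    # pairwise distances among sample locations
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))            # kriging system with unbiasedness row
    A[:n, :n] = variogram(d)
    A[-1, -1] = 0.0
    est = np.empty(len(grid_xy))
    for k, g in enumerate(grid_xy):
        b = np.ones(n + 1)
        b[:n] = variogram(np.linalg.norm(xy - g, axis=1))
        w = np.linalg.solve(A, b)          # n weights plus a Lagrange multiplier
        est[k] = w[:n] @ z
    return est
```

A useful sanity check on any implementation is that kriging is an exact interpolator: evaluated at a sample location, the routine reproduces the sample value.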
4.4 Discussion

One of the fundamental goals of ecology is to understand the processes regulating the distributional patterns of organisms in space and time. This goal has been particularly elusive in the oceanic environment since we rarely have an opportunity to watch pelagic organisms directly, and even when we do, they rarely stay in one place long enough for us to characterize their distributions on scales greater than meters to tens of meters (fine scales; Haury et al. 1978). Sampling with water bottles and nets has proven useful for certain applications, but issues associated with spatial coverage, spatial resolution, and sample processing time pose severe constraints on what we can learn from such methods (Greene 1990). For these reasons and others, we have turned to acoustic methods for studying the distributions of pelagic animals in the ocean (Greene & Wiebe 1990).

Acoustic visualization represents the next logical step in the evolution of acoustic remote-sensing methods. Two-dimensional sections of acoustic transect data have been studied qualitatively and quantitatively for many years. The subject of interest, however, is the three-dimensional distribution of pelagic animals in a volume of ocean. As we have shown, with the powerful computing hardware and software currently available, it is feasible to extrapolate two-dimensional sections of acoustic transect data and create three-dimensional composite visualizations. These visualizations have proven extremely valuable in the search for patterns in the distributions of pelagic animals, but they must be interpreted carefully and with proper attention to methodological limitations. The most exciting aspect of generating an acoustic visualization like Plate 4.3 is the recognition that it is feasible to characterize the distributions of zooplankton and micronekton in a relatively large volume of ocean over a relatively short period of time.
It must be recognized, however, that our acoustic surveying capabilities at present are limited, and these limitations impose important constraints on what we can interpret from such acoustic visualizations. The field study just described exceeded the spatial and temporal limits of what one might call a truly synoptic survey of animal distributions in the waters overlying and surrounding Fieberling Guyot. Given the local advective regime (as
indicated by satellite-tracked drifters: Haury & Genin pers. comm.), the volume of water surveyed the first night would have moved several kilometers prior to the second night. Furthermore, the animals would have undergone one complete diel vertical migration cycle between the beginning and the end of the two-night survey. Thus, it was clearly inappropriate to interpret Plate 4.3 as a snapshot visualization of the nighttime distribution of zooplankton and micronekton over Fieberling Guyot. Instead, we interpret this figure as a composite visualization from the two-night acoustic survey, one that, we argue, helps reveal the locations of recurring or persistent bioacoustical signatures specifically associated with seamount-related processes.

Recurring or persistent bioacoustical signatures might be expected to arise when animal behavior is of comparable importance to physical advection in determining zooplankton and micronekton distributions. The two phenomena proposed earlier, the hypothetical gap overlying the seamount's summit and the midwater aggregations of predators overlying the seamount's flanks, both result from behavioral mechanisms that separate animal trajectories from the streamlines of the local flow regime (Mackas et al. 1985). Both phenomena should generate characteristic bioacoustical signatures: a negative signature associated with the gap and positive signatures associated with the aggregations. Although there was circumstantial evidence for these phenomena, our inability to draw more definitive conclusions about their existence and origin points toward another limitation of acoustic visualization. Relying as it does on acoustic backscatter data, acoustic visualization must be supplemented by other methods to determine the identities of the sound-scattering animals. In this regard, acoustic and satellite remote-sensing methods share comparable limitations.
Acoustic backscatter data are as difficult to relate to the taxonomic composition of zooplankton and micronekton as ocean-color data are to relate to the taxonomic composition of phytoplankton. Despite this limitation, acoustic visualization can provide an unprecedented, relatively large-scale overview of pelagic animal distributions in the ocean. As such, it has the potential to revolutionize the way ocean scientists study the ecology of pelagic animals in much the same way that satellite remote sensing has revolutionized the study of oceanic phytoplankton ecology.
Acknowledgments

We would like to thank the officers and crew of the RV Atlantis II as well as the members of the scientific party who assisted us with our data collection at sea. Eli Meir and Hugh Caffey provided considerable help in the analysis of acoustic data; their help is gratefully acknowledged. Bruce Land provided the brains behind the mouse during visualizations using the IBM Power Visualization System (PVS); his expertise is greatly appreciated. Finally, thanks are extended to Bill Hamner and Julia Parrish for allowing us to participate in the workshop leading to this volume. Our research was supported by the Office of Naval Research and the Department of Defense University Research Initiative. Access to the IBM PVS was provided by the Cornell Theory Center, a center supported jointly by Cornell University, International Business Machines, the National Science Foundation, and the state of New York. This is contribution number 9 of the Bioacoustical Oceanography Applications and Theory Center.
Three-dimensional structure and dynamics of bird flocks FRANK HEPPNER
5.1 Introduction

Of all coordinated groups of moving vertebrates, birds are at the same time the easiest to observe and perhaps the most difficult to study. While fish can be brought into a laboratory for study, and many mammals move in a two-dimensional plane, a single bird in an organized flock can move through six degrees of freedom at velocities up to 150 km/hr. Present three-dimensional analysis techniques generally demand fixed camera or detector positions, so free-flying flocks must either be induced to fly in the field of the cameras, or the cameras must be placed in locations where there is a reasonable probability that adventitious flocks will move through the field. Perhaps because it has been so difficult to obtain data from free-flying natural flocks, there is now a current of imaginative speculation, and lively controversy, in the literature on flock structure and internal dynamics.

Birds can fly in disorganized groups, such as gulls orbiting over a landfill, or organized groups, such as the Vs of waterfowl (Fig. 5.1a). To the evolutionist, behaviorist, or ecologist, any group is of interest, but I will consider primarily the organized groups. Heppner (1974) defined organized groups of flying birds as characterized by coordination in one or more of the following flight parameters: turning, spacing, timing of takeoff and landing, and individual flight speed and direction. The term used for such organized groups was "flight flock," but for consistency in this volume the term "congregation" will be used herein.

Two general questions have driven the examination of bird congregations. The first, usually expressed while observing a skein of geese flying overhead, is, "Why do they fly in this precise alignment?" The second is prompted by the sight of perhaps 5000 European Starlings, Sturnus vulgaris, turning and wheeling over a roost: "How do they manage to achieve such coordination and polarity?"
The first question is usually asked in reference to relatively large birds, like waterfowl, flying in line formations (Heppner 1974; Fig. 5.1a): groups of birds
Figure 5.1. (a) A V-formation of Canada Geese (i.e. line formation). Notice, in this oblique two-dimensional view, the difficulty of determining distance between birds and angular relationships between birds. (b) A cluster formation of mixed blackbirds.
flying in a single line, or joined single lines. Typically, such formations are approximately two-dimensional, the birds all lying in an X-Y plane parallel to the ground. The "how" question is customarily asked about relatively large flocks of small birds, like sandpipers, flying in cluster formations (Fig. 5.1b): flocks characterized by development in the third dimension, and rapid, apparently synchronous turns. Attempts at analysis of structure and dynamics in these two major classes of organized flight formations have been driven both by the characteristics of the formations and the types of questions that have been asked. In the line formations, the functional significance (i.e. why) of the groupings has been of cardinal interest; therefore data have been sought on the values of parameters that might be supportive of hypotheses concerning costs and benefits to the individual. Because these formations have characteristically been interpreted as two-dimensional, structural analysis, although challenging due to the ephemeral nature of flocks, has not required true three-dimensional analysis techniques. In contrast, it is the synchronous and coordinated turning of the cluster flocks that has drawn the greatest interest. Questions such as, "Is there a leader in such groups?" or "If there is no leader, how is coordination achieved?" have spurred the structural analyses. However, because these formations occupy a three-dimensional volume, the formidable technical challenges involved have produced few field studies to date.

In this chapter, I will explore how progress in the study of both line and cluster flocks has proceeded in stepwise fashion, sometimes being stimulated by a new technique, at other times prompted by a testable (as opposed to speculative) idea.
5.2 Line formations

5.2.1 Theoretical considerations

Speculation and unsupported conclusions about the function of apparent structure in line formations have a long history. Rackham (1933) translated Pliny's authoritative observation in the first century A.D. that geese "travel in a pointed formation like fast galleys, so cleaving the air more easily than if they drove at it with a straight front; while in the rear the flight stretches out in a gradually widening wedge, and presents a broad surface to the drive of a following breeze." Nearly two thousand years later, Franzisket (1951) posited that close-formation flight provided an area of turbulence-free air. In contrast, Hochbaum (1955), attempting to explain why waterfowl fly in staggered formation, hypothesized that it was to "avoid the slipstream of rough air produced by the movement of its companions." Hunters often offer the folk suggestion that birds in these formations might be "drafting," like auto or bicycle racers do: tucking in behind the vehicle
ahead to reduce air resistance. Geyr von Schweppenburg (1952) suggested that a phase relationship in wing beating might be important in the aerodynamics of flight in line formations. Nachtigall (1970) found such a relationship in geese, but von Berger (1972) and Gould (1972) did not. Hainsworth (1988) did not see phase synchrony in pelicans. If the wings of each bird in a flock are regarded as independent oscillators, there will be some periods of time in which significant numbers of the flock will be in temporary synchrony, and this may have been what Nachtigall observed. Non-aerodynamic hypotheses also have been offered for line formation flight. One of the most compelling suggestions stated that structured formations facilitate the collection of information by and from flock mates (see Dill, Holling & Palmer, Ch. 14 for a discussion of this possibility in fish schools). Hamilton (1967) suggested that a stagger formation allowed communication between individuals. Forbush et al. (1912) and Bent (1925) suggested that staggered flight permitted a clear field of vision to the front, while at the same time allowing a leader to fly at the head of the formation. Heppner (1974), Molodovsky (1979), and Heppner et al. (1985) all offered the possibility that V or echelon flight lines might be the result of the optical characteristics of the birds' eyes. Until 1970, students of line formations could only offer streams of hypotheses about function, because there was no suggestion about what parameter(s) might be useful to measure to test those hypotheses. However, Lissaman and Shollenberger (1970) published a seminal, but enigmatic, paper that suggested the function of the V-formation was to enable each bird in line to recapture energy lost by the wingtip vortex produced by the preceding bird. According to their hypothesis, birds abreast in line, flying tip-to-tip, should have a range approximately 70% greater than a lone bird. 
Distance between wingtips was inversely proportional to maximal energy recapture. In other words, the closer neighbors are, the higher the potential energy savings. Based on Munk's (1933) stagger-wing theory from aircraft aerodynamics, they predicted that the "optimal" V-formation (Fig. 5.2) was achieved at a tip spacing equal to 1/4 of the wingspan. However, the Lissaman and Shollenberger (1970) paper was vague; it presented neither the theoretical equations nor sample calculations illustrating their predicted energy savings. Furthermore, there were some major deviations from biological reality. For instance, flapping flight (as opposed to gliding aircraft flight) was not considered. May (1979), in a brief review of flight formation, suggested that Lissaman and Shollenberger's (1970) calculations predict an optimal V angle for saving energy of roughly 120 degrees. Hummel (1973, 1983) also felt that significant power savings were possible by V-formation flight and presented the calculations he used to arrive at this conclusion. He argued that the power reduction for a V-formation flock as a whole
Figure 5.2. An optimal V-formation (Lissaman & Shollenberger 1970).

was strongly dependent on the lateral distance between wingtips, whereas energy savings for individuals in a flock could be affected by longitudinal distances between wingtips. Haffner (1977) attempted to replicate Lissaman and Shollenberger's predicted energy savings, also using Munk's stagger theorem, and obtained a calculated potential energy saving of only 22% for formation flight. When Haffner then modified the calculation using Cone's (1968) flapping-wing theory, the potential maximum energy saving dropped to 12%. Several investigators seized this possibility of a testable hypothesis: determine the geometry of a flock and the distance between the birds, and then the tip-vortex hypothesis could be tested. There was now incentive to develop analytic techniques for the spatial structure of line formations.
5.2.2 Data collection

If a photograph of a level V-formation of geese was taken when the birds were directly over the camera position (or if the camera was directly over the birds), there would be little problem in determining either the geometric relationship of the birds in the formation or the distances between two birds. A known distance, say bill-to-tail length, could be used to establish a distance scale. However, the number of times line-flying birds fly directly over an observer in the field is sufficiently small to tax the patience of the most dedicated researcher. Heppner (1978) made a fruitless attempt to fly a radio-controlled, camera-equipped model airplane directly over goose flocks, but the geese were faster than the aircraft. If a photograph is taken at an oblique angle to an oncoming or departing V-formation (Fig. 5.3), perspective will change both the angle between the legs of the V and the distances between birds. Gould and Heppner (1974) published the first technique for determining the angular relationship and distances between Canada Geese, Branta canadensis, flying in a V. Their technique employed a single cine camera and assumed that (1) the birds were flying in a level plane, (2)
Figure 5.3. Overhead view, looking down at a camera mounted on a tripod, tracking a V-formation that maintains a constant angle between the legs during its passage. The apparent angle of the V, as seen through the viewfinder and recorded on the film, changes as the birds approach their nearest point to the camera and then depart. At the point of closest approach, the apparent angle is at a minimum, and a line drawn between the camera position and the head of the formation at closest approach describes a right angle with the flight path. This angular relationship is then used for projective geometry to calculate the true angle (from Gould & Heppner 1974).

the flight path was a straight line, and (3) the shape of the formation did not substantially change in the few seconds needed for filming. The key observation for this technique (Fig. 5.4) was that when the formation was at its closest point to the camera position, the apparent angle of the legs of the V on the film was at a minimum, while the angle of the optical axis of the camera above the horizon was at a maximum. By marking the angular elevation of the camera at this point of closest approach, it was possible to use projective geometry (Slaby 1966) to obtain the true angle of the formation and the distances between birds. In the five formations they measured, the true V angle was 34.2 ± 6.4 degrees, and the distance between the centers of the birds was 4.1 ± 0.8 m. O'Malley and Evans (1982) used this technique to measure the angle of the V-formation in White Pelican, Pelecanus erythrorhynchos, flocks, and found a mean angle of 69.4 ± 4.5 degrees. They noted that the range of their values was
Figure 5.4. Relationship of camera position to flight path, apparent angle in the viewfinder, and camera elevation in a V-formation. This relationship forms the basis for the projective technique used to calculate true angle and distance between birds. (From Gould & Heppner 1974).
large (24-122 degrees) and did not seem to follow a pattern in flight direction, formation type, or flight size. Hainsworth (1988) filmed echelon formations of Brown Pelicans, Pelecanus occidentalis, that flew directly overhead, then in a straight line away from the camera. A known distance, the outstretched wingspread of a pelican, was used as a scale to determine wingtip-to-wingtip spacing, which ranged from −171 cm (overlap) to +183 cm. Williams et al. (1976) used a mobile, modified small-boat radar called an "ornithar" to determine the angle between the legs of a V-formation of Canada Geese. The portable radar technique offered smaller distortion due to perspective than optical methods, approximately 3 degrees maximum, but individual birds were not resolvable. V angles ranged from 38 to 124 degrees. Interestingly, there was greater variance among formations than within the same formation over time. In 1975, Heppner, using Gould and Heppner's (1974) optical method, and Williams et al. (1976), using their radar method, both measured the same formations at the same time at Iroquois National Wildlife Refuge in New York. Two formations met the requirements for measurement for both techniques, i.e. they
were large enough for radar and well-organized enough at the apex for optical measurements. Both methods yielded essentially identical results (Williams et al. 1976). Although it has not yet been used for looking at flocks, a technique developed by Pennycuick (1982), and described in detail by Tucker (1988, 1995), offers potential for following flight paths of individual birds. It makes use of a device called an "ornithodolite," an optical range finder with a 1-m base mounted on a panoramic head that electrically records elevation and azimuth, along with the range indicated by the range finder. In this way, a continuous record of a bird's three-dimensional flight path can be obtained. Error in the system increases with the distance of the bird from the instrument; at a range of 1 km the true position of the bird is somewhere within a 10 m³ "volume of uncertainty." Tucker (1991) used this technique to follow the flight paths of landing vultures. One could, presumably, use a modification of this technique for tracking individual birds in a flock, but one would need either one ornithodolite for each bird, or a combination of ornithodolites and some accessory system for tracking the flock, such as the portable radar or optical techniques already described. Tracking radars (as opposed to plan-position indicator radars, like the familiar airport screen) have been used to follow individual birds. There is no technical reason why the technology that has been developed to track multiple targets for military purposes could not be used for bird flocks. However, as Vaughn (1985) pointed out in his excellent review of birds and insects as radar targets, high cost and limited accessibility of high-precision radar tracking devices have reduced the number of active radar ornithologists to a handful. Nevertheless, there are several excellent studies on tracking of individual birds from the 1970s (DeMong & Emlen 1978; Emlen 1974; Vaughn 1974) that indicated the potential of the technique.
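The geometry of an ornithodolite fix is straightforward: azimuth, elevation, and range define a point in spherical coordinates around the instrument, which converts directly to Cartesian coordinates. A minimal sketch (the axis convention and function name here are illustrative assumptions, not Pennycuick's or Tucker's notation):

```python
import math

def ornithodolite_fix(azimuth_deg, elevation_deg, range_m):
    """Convert one ornithodolite reading (azimuth, elevation, range)
    to Cartesian coordinates with the instrument at the origin.
    Convention assumed here: x = east, y = north, z = up,
    azimuth measured clockwise from north."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    horizontal = range_m * math.cos(el)   # ground-plane distance
    return (horizontal * math.sin(az),    # east
            horizontal * math.cos(az),    # north
            range_m * math.sin(el))       # height above instrument
```

Because the angular errors of such an instrument are roughly constant, the positional error of a fix scales with range, which is why the "volume of uncertainty" grows to about 10 m³ at 1 km.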
If there is such a thing as a key paper in line formation flight in birds, it is probably Lissaman and Shollenberger (1970). They proposed, for the first time, a testable hypothesis about flock formation. Although the Lissaman and Shollenberger paper has had a powerful effect on stimulating thought and action, there has never been a direct experimental test of the aerodynamic assumptions in the paper. Haffner's (1977) unpublished study of Budgerigars, Melopsittacus undulatus, flying in a wind tunnel with smoke plumes suggested that the tip vortex in flapping flight was interrupted during the wingstroke cycle. Both Rayner et al. (1986) and Spedding (1987) found that tip vortices behind flying animals moved both vertically and horizontally during the wingstroke, making precise positioning relative to neighbors less advantageous. Most investigators of the structure of the V-formation have found wide variation in spacing and angular positions (Hainsworth 1988; O'Malley & Evans 1982), but have interpreted this variation
as a failure of birds to maximize energy savings, rather than rejecting Lissaman and Shollenberger's (1970) energy-saving model.
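The closest-approach relationship exploited by Gould and Heppner (1974) can be put in compact form. For a level V viewed broadside at elevation angle θ, projecting the horizontal V onto the image plane foreshortens the tangent of the V's half-angle by a factor of sin θ. The sketch below is a reconstruction of that projection from first principles, not the authors' published derivation:

```python
import math

def true_v_angle(apparent_angle_deg, elevation_deg):
    """Recover the true angle of a level V-formation from the minimum
    apparent angle recorded at the point of closest approach, given the
    camera's elevation angle at that moment. Relation assumed here:
    tan(true half-angle) = tan(apparent half-angle) / sin(elevation)."""
    half_apparent = math.radians(apparent_angle_deg) / 2.0
    elevation = math.radians(elevation_deg)
    half_true = math.atan(math.tan(half_apparent) / math.sin(elevation))
    return math.degrees(2.0 * half_true)

# Viewed directly overhead (elevation 90 degrees) there is no
# foreshortening, so the apparent angle equals the true angle.
```

At lower elevation angles the apparent V is always narrower than the true V, which is why the minimum apparent angle at closest approach carries the needed geometric information.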
5.3 Cluster formations

5.3.1 Function and synchrony

The phenomenon of coordinated flight has been known since antiquity. The redoubtable Pliny (Rackham 1933) noted that "It is a peculiarity of the starling kind that they fly in flocks and wheel round in a sort of circular ball, all making towards the center of the flock." Selous (1931) organized many years of anecdotal observations of cluster flocks and framed the basic question, "There does not appear to be an identifiable leader in such groups, so how do these birds coordinate their movements?" Selous's speculative hypothesis was "thought-transference," and he viewed a coordinated flock as a kind of group mind. Selous was handicapped by lack of a conceptual model that would permit the formation of testable hypotheses and an almost total lack of quantitative information. On the other hand, Heppner and Haffner (1974) presumed that there had to be a leader in such flocks, and then proceeded to demonstrate the formidable obstacles to visual or acoustic communication between such a putative leader and its followers. Synchrony, or apparent synchrony, in the turning movements of cluster flocks has drawn much attention. Observers describe a "flash" that passes, wave fashion, through the flock, and conclude from this that the turn is initiated in one part of the flock and then spreads. Gerard (1943) observed that birds that were pacing his car at 35 mph turned within 5 msec of each other. Unfortunately, like many early studies, the details were vague; we do not know what kind of birds were involved, nor how the determination was made. Potts (1984), Davis (1980), and Heppner and Haffner (1974) describe waves of turning in European Starlings, suggesting the existence of some originating point for the turn, or a possible follow-the-leader model.
Heppner and Haffner (1974) expressed the time lag between the initiation of a turn by a leader on one side of a spherical flock, and a subsequent turn by a follower taking his cue to turn by the sight of the leader turning, as a function of the reaction time of individual birds, and the diameter and density of the flock. Potts (1984) suggested a "chorus-line hypothesis" to explain a rapid wave of turning in cluster flocks. In this hypothesis, turning birds respond not to turning neighbors, but to more distant birds. In other words, they anticipate the approaching wave of motion. However, there is a possibility that observers who have seen a wave of turning may, in fact, have seen instead an artifact of the way that a stationary observer perceives the turn. Birds, like fish, do not reflect light uniformly over their bodies. Starlings have semireflective feathers, and the shorebirds reported in other
Figure 5.5. How simultaneous turns might give the impression of a "flash," or wave of turning. In a dense flock of Dunlin, birds turning catch the light off their bodies, creating a wave of white that passes through the flock. (Photograph by Betty Orians.)
studies are also differentially colored. If a ground observer was watching a group of birds that were turning simultaneously, the "flash" might appear first in one portion of the flock, then give the impression of moving through the flock as the flock's position in three-dimensional space changed relative to the observer and light source (Fig. 5.5). Davis (1975) investigated another apparent simultaneity, in the take-off of a flock of pigeons. In "actor" and "observer" pairs, the "actor" was induced to take off with a mild shock. Observers departed within 0.5 msec, unless the "actor" displayed some preflight intention movements before the shock, in which case its flight tended to be ignored.
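The obstacle Heppner and Haffner (1974) identified for a simple follow-the-leader mechanism is easy to quantify. If a turn propagated strictly neighbor to neighbor, each bird reacting only after seeing its neighbor turn, the lag would grow with every step across the flock. A rough sketch (this one-reaction-per-step chain is a simplification of their treatment, and the numbers are illustrative):

```python
def propagation_lag(diameter_m, spacing_m, reaction_time_s):
    """Lower bound on the time for a turn to propagate neighbor-to-neighbor
    across a flock: one visual reaction time per bird-to-bird step."""
    steps = diameter_m / spacing_m   # number of sequential reactions
    return steps * reaction_time_s

# A 20 m flock with 1 m nearest-neighbor spacing and a 100 ms reaction
# time would need about 2 s for a turn to cross it, far slower than the
# near-simultaneity observers report.
lag = propagation_lag(20.0, 1.0, 0.1)
```

The gap between this chained-reaction estimate and observed turning speeds is what motivated alternatives such as Potts's (1984) chorus-line hypothesis and the simultaneous-turn artifact explanation above.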
5.3.2 Analysis

Major and Dill (1977) provided the first determination of the three-dimensional structure of freely flying bird flocks: European Starlings and sandpiper-like Dunlin, Calidris alpina. In this and a later paper (Major & Dill 1978), they used a
stereoscopic technique that employed two synchronized still cameras mounted on a rigid, 5 m bar. Although they were able to determine interbird distances, there was not enough information to determine flight paths of individual birds. Major and Dill concluded that the internal organization of the cluster flocks they studied strongly resembled that of minnow schools (Pitcher 1973); that is, nearest neighbors tended to be behind and perhaps below a reference fish. Dunlin had a tighter, more compact flock structure than Starlings, somewhat surprising because the Dunlin had flight speeds approximately two to three times faster than the Starlings. One might rather expect that at higher flight speeds, more distance between birds would be desirable for collision avoidance. Pomeroy (1983) and Pomeroy and Heppner (1992) described a technique for plotting the three-dimensional locations and flight paths of individual pigeons in flocks of twelve to twenty birds using two orthogonally placed, synchronized 35-mm still cameras focused on a common point (Fig. 5.6; see Appendix for a description of their method). Although their method of data reduction involved making photographic prints and measuring locations with calipers, the basic technique could be easily modified to use video and a digitizer or video frame-grabber. Being able to plot flight paths over time allowed a more detailed examination of the dynamic structure of cluster flocks. Pomeroy and Heppner (1992) found that birds regularly shifted position within the flock (Fig. 5.7). Birds in front ended up toward the back, and birds on the left ended up on the right at the completion of a turn. This rotation of position was a consequence of the birds flying in similar-radius paths, rather than parallel paths, during the turn, suggesting that no individual bird was the "leader."
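The principle behind two-camera reconstruction, whether stereoscopic or orthogonal, is that each image point defines a ray from a camera's optical center through the bird, and the bird sits where the two rays meet. With measurement error the rays rarely intersect exactly, so a common recipe is to take the midpoint of the shortest segment joining them. A self-contained sketch of that recipe (the geometry here is generic, not Pomeroy and Heppner's (1992) exact data-reduction procedure):

```python
def closest_point_between_rays(p1, d1, p2, d2):
    """Midpoint of the shortest segment between the rays p1 + t*d1 and
    p2 + s*d2 (p = a camera's optical center, d = the direction through
    the bird's image point)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    w0 = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b                 # zero only if the rays are parallel
    t = (b * e - c * d) / denom           # parameter of closest point on ray 1
    s = (a * e - b * d) / denom           # parameter of closest point on ray 2
    q1 = tuple(p + t * u for p, u in zip(p1, d1))
    q2 = tuple(p + s * u for p, u in zip(p2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(q1, q2))
```

With two cameras of known position and orientation, one such computation per bird per synchronized frame pair yields the three-dimensional flight paths.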
The possibility still exists that there might be a kind of rotating positional leadership, although there is no evidence for such a mechanism at this time. Pomeroy (1983) suggested that flocking birds are more easily able to transit through the interstices between neighbors than fish can move between individuals in a school. In most of Pomeroy's trial flocks, nearest-neighbor distances decreased during a turn. Progress in the analysis of cluster flocks was slow, not so much for a lack of analytical tools, but (until recently) for the lack of a conceptual alternative to a leadership model to produce coordinated movements. Presman (1970) was committed to a leader model, but refined Selous's (1931) "thought transference" idea into a more sophisticated model in which electromagnetic fields produced by either the brain or neuromuscular system of the leader would be instantaneously transmitted to other members of the flock, there to act either on the follower's brain or directly on the follower's neuromuscular system. The flock would then become a kind of "superindividual." Neither of these hypotheses had any experimental
Figure 5.6. Nonstereo determination of the three-dimensional location of individual birds. Two cameras, A and B, are aimed at, and equidistant from, a common point, S, which is the center of a sphere of radius CS that represents the maximum volume in which a bird's location may be determined. Point T is the projection of the bird's position in three-dimensional space (from Pomeroy & Heppner 1992).

support and stretched biological communication to, or perhaps beyond, the limit, but there was then no biologically plausible nonleader model available either. Although it is traditional to think of the internal architecture of cluster flocks in terms of potential adaptive significance, there are difficulties presented by some generally accepted suggestions for the adaptive advantage of tight, highly coordinated, and polarized flocks. For example, if coordinated flocking is of advantage against predation by hawks (Tinbergen 1951), why then do European Starlings turn and wheel in highly coordinated fashion for a half-hour to forty-five minutes above a roost before retiring for the night, exposing themselves to what would appear to be an unnecessary risk of predation? Wynne-Edwards
D. Random impact. The fourth term dP(t) was originally included with the idea of more closely simulating a natural environment in which wind gusts, distractions from moving objects on the ground, and predators might randomly perturb the flight paths of individuals. In practice, without the inclusion of this term, it was not possible to produce a coordinated, polarized simulated flock, an interesting observation in terms of both the dynamics of the model system and the transitions between coordinated and uncoordinated flocks in nature. The random impact term was modeled by an n-dimensional, time-homogeneous Poisson process with stochastically independent components. The ith component of dP(t) was zero unless t happened to be an event of the ith component of the Poisson process. In the latter case, dP(t) equals a random three-vector with uniformly distributed components, with a scalar parameter controlling the magnitude of the random vector. Heppner and Grenander's (1990) model produced polarized flocks that would either orbit an external attraction and demonstrate the rotation of individual position seen in Pomeroy and Heppner's (1992) natural pigeon flocks, or escape the influence of the attraction and fly a straight flight path indefinitely, depending on the values attached to the variables in the model, such as preferred spacing. The model would not, however, produce the spontaneous coordinated turns seen in natural flocks.

Heppner and Pakula (unpublished) prepared a computer simulation of a type of natural flock behavior that resembles, in basic character, a simultaneous, or near-simultaneous, departure from a wire or field, and thus bears resemblance to a coordinated, or near-synchronous, turn. In this behavior, flocks of blackbirds will descend to a field and forage. From time to time, individual birds will "pop up" spontaneously to a height of a few meters above the ground (Heppner & Haffner 1974). Occasionally, a bird will depart the area after "popping up," but more typically will settle back to the ground. From time to time, small groups will pop up and leave, but after a variable interval, the entire flock will appear to rise up simultaneously and move to a new foraging area.
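In discrete time, the random-impact term can be approximated by giving each bird a small per-step probability of a Poisson event and, when one occurs, a uniformly distributed random kick. The sketch below is a paraphrase of the idea, not Heppner and Grenander's (1990) exact formulation:

```python
import random

def random_impact(n_birds, rate, dt, magnitude, seed=None):
    """One discrete-time step of the random-impact term dP(t): each bird
    independently experiences a Poisson event with probability ~ rate*dt,
    and when it does, receives a three-vector with components uniform on
    [-magnitude, magnitude]; otherwise its term is the zero vector."""
    rng = random.Random(seed)
    impacts = []
    for _ in range(n_birds):
        if rng.random() < rate * dt:      # Poisson event in this step
            impacts.append(tuple(magnitude * rng.uniform(-1.0, 1.0)
                                 for _ in range(3)))
        else:
            impacts.append((0.0, 0.0, 0.0))
    return impacts
```

Added to each bird's equation of motion at every step, these occasional kicks supply the perturbations without which the model, as noted above, failed to produce a coordinated, polarized flock.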
In Heppner and Pakula's two-dimensional model, an individual bird is represented by a graphic "bird" that can move freely along the Y-axis above the X-axis ground surface. At the beginning of the demonstration, birds are spaced equidistantly along the ground surface. Each bird is a member of a cohort of birds that are within a defined lateral distance of the bird in question, and the cohorts overlap. At the beginning of a run, individual birds pop up randomly in space and time; the mean interval between pop-ups can be varied by the experimenter. When a bird pops up, it rises to a height on the screen where it can "see" the other birds in its cohort. If no other birds in its cohort are in the air, or if the number of other birds in its cohort that have also spontaneously popped up and are airborne is below a preset threshold, the popped-up bird will slowly descend back to the ground. If, however, the threshold number is exceeded, the bird will depart the area by flying vertically off the screen.
Figure 5.8. Demonstration of a "pop-up" model. (A) All birds are on the ground. (B) One bird randomly flies above the decision height, detects no other birds, and (C) returns to ground as two other birds which happen to be next to each other in the same cohort randomly "pop up" above the decision height, but seeing no other birds, (D) return to ground as three birds which happen to be in the same cohort randomly "pop up," and seeing a threshold number of their cohort above the decision height, (E) fly away, as other birds on the ground see a cohort depart and depart themselves.
Birds on the ground will ignore popped-up birds in the cohort, unless the number of birds in the air exceeds a threshold, in which case all of the birds in the cohort will rise simultaneously and depart the area. Birds not in the cohort in question will ignore the behavior of birds in other cohorts, unless the number of cohorts in the air exceeds a preset threshold, in which case all birds in the flock rise up and depart the area (Fig. 5.8). By manipulating the thresholds, it is possible to produce qualitatively a behavior that resembles natural flocks: individuals pop up at random, usually dropping back to the ground; small groups rise up and leave without affecting the flock as a whole; and after a period of time, the balance of the flock rises up almost simultaneously and departs. A similar mechanism might be employed to produce coordinated turns in a flock. In natural flocks, individuals and small groups are constantly turning away from the flock as a whole. Sometimes they return, other times they do not. Individual birds might have a threshold for being influenced by neighboring turners; if only a few neighbors turn at random, they will be ignored, but if a greater-than-threshold number turns, the individual will follow the turners. In this case, coordinated turns, like the formation and cohesion of the flock itself, might be driven by a stochastic process.
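The pop-up logic can be compressed into a few lines. The toy below follows the description above in outline only: birds at integer positions pop up at random, and an airborne bird departs when enough of its cohort is airborne at the same time, otherwise settling back down. All parameter names and values are illustrative, not Heppner and Pakula's:

```python
import random

def popup_flock(n_birds=30, cohort_radius=3, popup_prob=0.2,
                depart_threshold=3, steps=200, seed=1):
    """Simulate a simplified pop-up model and return how many birds
    have departed after the given number of steps."""
    rng = random.Random(seed)
    state = ['ground'] * n_birds          # each bird: 'ground', 'air', or 'gone'
    for _ in range(steps):
        airborne = {i for i, s in enumerate(state) if s == 'air'}
        nxt = list(state)
        for i, s in enumerate(state):
            if s == 'ground' and rng.random() < popup_prob:
                nxt[i] = 'air'            # spontaneous pop-up
            elif s == 'air':
                mates = sum(1 for j in airborne
                            if j != i and abs(j - i) <= cohort_radius)
                # enough cohort-mates airborne at once: depart for good;
                # otherwise settle back to the ground
                nxt[i] = 'gone' if mates >= depart_threshold else 'ground'
        state = nxt
    return sum(1 for s in state if s == 'gone')
```

Varying popup_prob and depart_threshold reproduces the qualitative regimes described above: isolated pop-ups that settle back, small groups that leave, and mass departures once enough cohorts are airborne.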
5.4 Future directions and problems

Low-tech, inexpensive techniques now exist for determining the two- and three-dimensional structure of bird flocks, but they require so much time for manual data reduction that few people would now be willing to employ them. High-tech, expensive methods exist that would solve the data-reduction problem, possibly even permitting real-time three-dimensional analysis, but there remains the problem that essentially killed wide use of radar ornithology - cost and availability. The perfect three-dimensional analysis technique does not currently exist for bird flocks, but if it did, it would have the following properties:

1. Portability. Whether using stereo (e.g. Major & Dill 1977) or orthogonal (e.g. Pomeroy 1983) techniques, present optical methods require a relatively fixed volume of space within which the birds can fly. This greatly reduces their usefulness for the analysis of wild flocks. Ideally, one should be able to set up and take down a recording device in ten minutes or so, to take advantage of blackbird or shorebird flocks whose appearance is unpredictable.

2. Auto-correspondence. To date, three-dimensional analysis methods require images from at least two matching viewpoints. A technique that would permit
rapid, automatic, and accurate correspondence of the images would be very valuable.

3. Low initial cost and data-acquisition cost. Most footage obtained of wild animals in the field is worthless for one reason or another, so many feet of film or tape must be exposed. Cine film is very expensive now, but offers excellent resolution and true slow motion. Videotape is very cheap, but has only a fraction of the resolution of 16-mm film, and with consumer camcorders, does not permit true slow motion.

Perhaps more important than the development of a faster, cheaper, and better analytical technique is resolving the question of what to measure with this technique (see discussion by Dill et al. Ch. 14). What parameters should be measured to address questions of leadership, synchrony, internal structure, and driving mechanism? How do you define a "turn"? Is it when more than a certain fraction of the birds depart from the mean flight path of all birds by a given angular amount? Is the interval between turns significant in some way? To use an ornithological metaphor, there is a chicken-and-egg problem here. Without knowing what the technique has the capacity to measure, it is difficult to set the task for the technique, and without the technique, it is difficult to know what questions can be addressed with it. Ultimately it will be desirable to "truth-test" the models and simulations that have been made. It is all well and good to prepare a stunning and realistic computer simulation of a flock, but how do you know real birds are using a similar algorithm? To test this, it will be necessary to measure parameters in both the simulation and the real flock, and with some appropriate statistical test, compare them.
At this stage in the development of the field, it is not clear what the key parameters should be, so perhaps a shotgun approach might be in order, in which every parameter that can be measured by a technique (such as interbird distances) is measured in both simulations and flocks, to see which offer promise for identifying characteristic flock properties of particular bird species. Model-makers still have much to learn from their models. Heppner and Grenander (1990) noted that the values of the parameters in their model that produced flocking behavior were arrived at serendipitously, and the choice of a Poisson-based force rather than a Gaussian one to drive the model was fortuitous rather than deliberate. In essence, the model worked, but it was not altogether clear why. Attraction-repulsion models (Warburton & Lazarus 1991) may be useful in investigating flock formation. Studying the properties of a flocking model at a screen may permit avian investigators to have the same facility in testing hypotheses as, for example, students of schooling have had in looking at fish in a tank.
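One concrete way to "truth-test" a model is to measure the same parameter, say mean nearest-neighbor distance, in both the simulated and the real flock and compare the two with a permutation test. The recipe below is generic statistical machinery, not a procedure from this chapter:

```python
import random

def mean_nnd(points):
    """Mean nearest-neighbor distance over a list of 3-D points."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    total = 0.0
    for i, p in enumerate(points):
        total += min(dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def permutation_test(flock_a, flock_b, trials=500, seed=0):
    """Two-sided permutation test on the difference in mean
    nearest-neighbor distance between two flocks: pool the positions,
    reshuffle them into two groups of the original sizes, and count
    how often the reshuffled difference is at least as extreme as the
    observed one."""
    rng = random.Random(seed)
    observed = abs(mean_nnd(flock_a) - mean_nnd(flock_b))
    pooled = list(flock_a) + list(flock_b)
    n_a = len(flock_a)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if abs(mean_nnd(pooled[:n_a]) - mean_nnd(pooled[n_a:])) >= observed:
            extreme += 1
    return extreme / trials               # small value: the flocks differ
```

The same scaffolding works for any of the "shotgun" parameters mentioned above; only the statistic computed from each point set needs to change.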
Three-dimensional structure of bird flocks
In summary, the three major immediate tasks that must be addressed in the analysis of the structure and dynamics of bird flocks are (1) the development of an inexpensive, portable device for determining the three-dimensional spatial positions of free-flying individuals over time, whose data can be reduced directly by computer; (2) the determination of what parameters to measure, in either a real flock or a model or both, that will speak directly to questions of leadership, synchrony, initiation of turning and takeoff, internal structure, and coordinating mechanism; and (3) a complete and thorough analysis of the properties of existing simulations to determine what factors influence the formation and movements of simulated flocks.
Acknowledgments
I thank Susan Crider, who did most of the yeoman work in the library, and C. R. Shoop, J. G. T. Anderson, and W. L. Romey, who read early drafts of this manuscript and offered helpful suggestions. J. Parrish did a wonderful job of reducing the manuscript by 20% without causing either hurt or outrage.
Appendix
Absolute position. Information derived from the two photographic prints was first used to establish where in three-dimensional space each bird in the flock was located each time photographic samples were taken. A Cartesian coordinate system was defined for this point-in-space analysis. The X- and Y-axes of the system were perpendicular and crossed at the point of intersection of the optical axes of the two 35-mm cameras. The XY-plane was parallel with the ground. The Z-axis, or vertical axis, of the system was defined as perpendicular to the XY-plane. The elevation (Z-axis) and the bird's displacement along the horizontal grid system (XY-plane) were the real-space coordinates of the bird. Real-space coordinates were calculated for each bird in the flock for every point in time at which the flock was photographed. For the computer program developed to determine the positions of a bird, the horizontal and vertical deviations of a bird's image from the center of a negative were used as the basis for all calculations (Fig. 5.6). The position of a bird on a negative from camera A can be used to locate that bird along a line originating and extending from point T (the optical center of the lens) to point G at infinity. The bird could be anywhere along line TG. Line TG is
Frank Heppner
determined as follows: The horizontal displacement (distance Dx) of the image of the bird's head from the center of the negative is measured to yield the length of side QR in triangle QRT. Side RT of the triangle is the focal length of the camera lens when focused at infinity (58 mm). Angle B in right triangle QRT can be expressed as tan⁻¹ (QR/RT). Triangles QRT and MET are corresponding right triangles, such that angle B₁ in triangle MET is equal to angle B in triangle QRT. Angle B₁ in triangle MET defines the horizontal displacement of line TG on the Y-axis. With this information only, the bird could be in quadrant I or II. The same process is used with data from camera B to locate the bird along line CD. The intersection of lines TG and CD defines point F, which will be the position of the bird in three-dimensional space. It now becomes necessary to determine the X-, Y-, and Z-coordinates of point F. In the example shown in Figure 5.6, the photograph from camera B shows that the bird is left of the center line (Z-axis). In the view taken from camera A, the bird is also left of the Z-axis, placing it in quadrant II of the XY-plane. Lines FE (Z-coordinate), ME (Y-coordinate), and MS (X-coordinate) must now be determined. Triangle TEC in the XY-plane connects the optical center of the lens of camera A (point T), and of camera B (point C), with point E, which is the projection of point F onto the XY-plane. Side TC of triangle TEC, the distance between the cameras, is a measured distance. Angle t (given by ∠B₁ + 45°), angle c (given by 45° − ∠F, which is the angular deviation of CD from the Y-axis as determined from photographs taken by camera B), and angle e (given by 180° − [∠t + ∠c]) are all known. All internal angles and side TC of triangle TEC are now known. Thus, side TE can be determined as TE = [(TC) sin (∠c)]/sin (∠e)
(5.5)
The position of the bird along the Y-axis (side ME of right triangle MET) is given by (TE) sin (∠B₁). The elevation of point F above the XY-plane can be calculated by determining the length of side EF of right triangle TEF. Side TE and angle φ of the triangle are known. Distance EF, the elevation of point F, can be expressed as (TE)[cot (φ)]. The displacement of the bird along the X-axis (side MS) is determined as follows. The distance from the optical center of the lens of camera A to point S is constant (TS = 60.80 m). Side TM of right triangle MET can be calculated as TM = (ME)[tan (∠B₁)]
(5.6)
In this example, where the bird is in quadrant II, distance TM must be subtracted from 60.80 m (the distance TS) to yield MS.
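The triangle solution can be followed numerically. The sketch below assumes a self-consistent set of angle conventions (∠B₁ measured from the X-axis, so ME = TE sin ∠B₁ and TM = TE cos ∠B₁, and φ taken as the elevation angle of line TF above the ground plane); the exact conventions depend on Figure 5.6, which is not reproduced here, so this is an illustration rather than Pomeroy and Heppner's actual program.

```python
import math

def bird_position(TC, t_deg, c_deg, B1_deg, phi_deg, TS=60.80):
    """Recover a bird's real-space coordinates from the measured angles.
    Angle conventions are assumptions for illustration (see lead-in)."""
    t, c = math.radians(t_deg), math.radians(c_deg)
    e = math.pi - (t + c)                      # internal angles sum to 180 degrees
    TE = TC * math.sin(c) / math.sin(e)        # eq. (5.5): law of sines in TEC
    ME = TE * math.sin(math.radians(B1_deg))   # Y-coordinate (side ME)
    TM = TE * math.cos(math.radians(B1_deg))   # displacement toward point S
    MS = TS - TM                               # X-coordinate, bird in quadrant II
    EF = TE * math.tan(math.radians(phi_deg))  # Z-coordinate: elevation of F
    return MS, ME, EF

# Equilateral check: with t = c = 60 degrees, TE must equal TC, so with
# B1 = 30 degrees the Y-coordinate is TC * sin(30 deg) = 50.
x, y, z = bird_position(TC=100.0, t_deg=60, c_deg=60, B1_deg=30, phi_deg=10)
```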
The X, Y, and Z Cartesian coordinates of all birds in the flock were determined for every time at which photographic samples of the flock were taken. Coordinate positions of each possible pairing of birds were used to calculate distances between flock members using the formula D = [(XR − XN)² + (YR − YN)² + (ZR − ZN)²]^0.5
(5.7)
Subscripts R and N in the formula refer to the reference (R) and neighbor (N) birds. Each bird in the flock was analyzed in turn as the reference bird for every time at which the flock was photographed. Distances between each reference bird and all other birds in the flock were calculated to yield a series of values for first-nearest neighbor, second-nearest neighbor, through Nth-nearest neighbor. Data for each of the neighbor-distance categories, and the associated mean values, were plotted over time to represent graphically the structure of the flock (Pomeroy & Heppner 1992).
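Equation (5.7) and the neighbor ranking translate directly into code. The sketch below sorts each reference bird's distances to all others, giving the first- through Nth-nearest-neighbor series for one photographic sample.

```python
import math

def neighbor_series(points):
    """For each reference bird, sorted distances to all other birds:
    1st-nearest, 2nd-nearest, ..., Nth-nearest neighbor (eq. 5.7)."""
    series = []
    for r, pr in enumerate(points):
        dists = sorted(math.sqrt(sum((a - b) ** 2 for a, b in zip(pr, pn)))
                       for n, pn in enumerate(points) if n != r)
        series.append(dists)
    return series

# Three birds on a line, coordinates in meters.
flock = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(neighbor_series(flock)[1])   # middle bird: [1.0, 2.0]
```

Repeating this for each sampling time and plotting each neighbor-distance category over time reproduces the kind of summary described above.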
6 Three-dimensional measurements of swarming mosquitoes: A probabilistic model, measuring system, and example results TERUMI IKAWA AND HIDEHIKO OKABE
6.1 Introduction
Aggregations in flight, such as mating swarms and group migration, are widespread phenomena among insects. However, due to the difficulties of three-dimensional measurement, many questions about insect aggregation, such as the process of group formation, the identification of spatial structure within a group, or aspects of individual behavior such as spacing or mutual interference, remain unanswered. Various methods have been developed for three-dimensional measurements of animal aggregations using the instruments available at the time (for insect swarms: Okubo et al. 1981, Shinn & Long 1986; for fish schools: Cullen et al. 1965, Pitcher 1973, 1975; for bird flocks: Gould & Heppner 1974, Major & Dill 1978). In these methods, still or video cameras are used to record the positions of objects simultaneously from different perspectives. The principles of stereoscopy are then used to reconstruct the full three-dimensional positions of the objects from sets of two-dimensional images. In conventional stereoscopic methods, two problems make three-dimensional measurements difficult. One is the camera calibration problem: precise adjustment of the cameras or other apparatus is essential for minimizing distortion in the images. The other is the correspondence or matching problem: matching the points in each image that correspond to the same object is difficult. Manual matching is an exhausting and unreliable process. Automatic matching is preferable, but so far there have been few algorithms or theories for such methods. Moreover, even if matching is done automatically, there are no effective methods of ensuring accuracy. To overcome these problems, we have constructed a probabilistic model and computer programs for automatic matching and reconstruction of the three-dimensional positions of objects, and then applied them to the design of a portable photographic system for measuring mosquito
swarms in the field (Ikawa et al. 1994). Precise adjustment of the apparatus is not a prerequisite, and the process of matching is automated. Our method has the advantages of flexibility in experimental application as well as efficiency in data processing. It is applicable to three-dimensional measurements of various kinds of animal aggregations, both in the laboratory and in the field. In this chapter, we describe the main features of the probabilistic matching technique, show how it is applied to reconstructing three-dimensional positions, and discuss the actual measuring system used in the field. We also present example data that illustrate spatial and temporal features of mosquito swarming in Culex pipiens pallens Coquillett.
6.2 Probabilistic model for stereoscopy
The two main principles on which most systems of noncontact measurement of three-dimensional position are based are trigonometric stereoscopy and range detection from signal propagation delay. In a stereoscopic system, the essential and difficult problem is to identify, in each view, the object (point, edge, etc.) which corresponds to the identical real object in space. Most methods for solving this problem fall into three categories:

1. When objects can be distinguished on the basis of features such as shape, color, or size, similar objects in both views can be readily matched (Herman et al. 1984; Cavanagh 1987; Tatsumi 1987).
2. If two identical cameras are juxtaposed horizontally and their optical axes are parallel, the projected images of a point should have the same vertical coordinates in both views. Conversely, two points having almost the same vertical coordinates are likely to match. This method can be generalized to arbitrary settings of two cameras and is called the "epipolar plane constraint" method (Ohta & Kanade 1985).
3. Additional views obtained by supplemental cameras can decrease erroneous matches (Ito & Ishii 1986; Morita 1989; Randall et al. 1990).

For a more general survey of three-dimensional measurement and related methods, readers are referred to Chapter 3 by Osborn in this volume. In determining the position of a point from two views, we face a problem of redundancy: although four numbers are obtained (two pairs of plane coordinates of the image of the point), only three are required to calculate the spatial coordinates of the real point. Because it is impossible to eliminate errors in actual measurement of the plane coordinates, the coordinates calculated by each choice of three of the four numbers never coincide, and some
kind of average is usually adopted (Rogers & Adams 1976; Yakimovsky & Cunningham 1978). Finally, to evaluate the precision of the obtained coordinates, some estimate of the expected error should be given. Although some studies have focused on this problem (McVey & Lee 1982; Verri & Torre 1986; Blostein & Huang 1987; Mohan et al. 1989), most practical work on stereoscopic range measurement gives only empirical data for errors. Our method treats these problems of redundancy by means of Bayesian inference. It is basically the epipolar plane constraint method; however, based on the probabilistic model, our criterion for rejecting false matches has a precise meaning. Although our consideration here is limited to the measurement of independent points in space, the approach can be extended to the matching of more complicated objects and to a larger variety of stereoscopic measurement situations.
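One geometric ingredient of any such matching criterion can be sketched as follows: back-project each image point as a ray from its camera's projection center, and measure how closely candidate ray pairs approach one another; a true match gives a small miss distance, a false pairing usually a large one. The camera positions and directions below are hypothetical, and the sketch deliberately omits the Bayesian part of the method.

```python
import math

def ray_distance(p1, d1, p2, d2):
    """Closest-approach distance between two rays, each given by a camera
    projection center p and a direction vector d (the back-projected line)."""
    cross = (d1[1] * d2[2] - d1[2] * d2[1],
             d1[2] * d2[0] - d1[0] * d2[2],
             d1[0] * d2[1] - d1[1] * d2[0])
    norm = math.sqrt(sum(c * c for c in cross))
    w = tuple(b - a for a, b in zip(p1, p2))
    if norm == 0:   # exactly parallel rays: fall back to point-to-line distance
        t = sum(wi * di for wi, di in zip(w, d1)) / sum(di * di for di in d1)
        return math.dist(w, tuple(t * di for di in d1))
    return abs(sum(wi * ci for wi, ci in zip(w, cross))) / norm

# Rays toward the same mosquito nearly intersect (small d); a false pairing
# typically leaves a large gap. The camera geometry here is invented.
d = ray_distance((0, 0, 0), (1, 0, 0), (0, 0, 1), (0, 1, 0))
print(d)   # 1.0: these two skew rays miss by one unit
```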
6.2.1 Binocular system
For simplicity, consider the case of two cameras, yielding a pair of images, each photographed from a different direction (a binocular system). Each photograph is simply a planar projection of real three-dimensional objects. Given six reference points whose three-dimensional coordinates are known, the projection matrix can be estimated using the projected positions of these points on a given photograph (Sutherland 1974). For example, a mosquito at true position x has images a₁ and a₂ on two photographs W₁ and W₂ (Fig. 6.1). Using the coordinates aᵢ and the projection matrix, one can determine the equation of the line lᵢ from the camera's projection center Cᵢ to aᵢ. In the practical case, there is error associated with each of our measurements, so the lines l₁ and l₂ may fail to intersect. They may pass close to x, but will be separated by some small distance d. If we consider many images, such as a swarm of mosquitoes, there will be many such lines, and the problem of correctly matching pairs of lines arises. Define
Figure 6.9. Projections of the spatial positions of swarming mosquitoes onto the ground. The size of each panel is 300 cm X 300 cm.
often visit swarming sites, where they are eventually caught by a male, mate, and then leave the swarming site with their mates (Downes 1969). Figure 6.8 shows how the number of mosquitoes fluctuated throughout the swarming period. This fluctuation may be better observed by examining the x,y coordinates of mosquitoes (i.e. the projections of spatial positions of swarming mosquitoes onto the ground; Fig. 6.9). Panels A, B, and C on Figure 6.9 were taken at intervals of several seconds, when the mosquitoes crowded into the swarming site. Later, mosquitoes dispersed rapidly (Panels D, E, and F). This suggests that mosquitoes did not remain at a single swarming site, but repeatedly entered and left the sites. Figure 6.10 shows the distribution of nearest-neighbor distances in one swarm. Although mosquitoes clustered tightly at times (as evidenced by the peaks at small nearest-neighbor distances), the fluid structure of the swarm was just as likely to dissolve (traces with no tall peaks or with peaks at higher nearest-neighbor distances). In fact, this was due to changing numbers of mosquitoes in the swarm. There was a negative correlation between the number of individuals and nearest-neighbor distance, suggesting that the swarming space did not change significantly with mosquito numbers.
Figure 6.10. A distribution of the distances between nearest neighbors throughout the swarming period. The time intervals between series shown are about 5 min.
In a separate publication, we describe the details of principal component analysis and how it can be applied to describing the overall shape of the mosquito swarm. Using this method, we showed that the swarms we observed resembled ellipsoids with axes roughly in the proportion 50 : 30 : 20, and that this shape was roughly constant over the time period of observation. This suggests that there was little change in the region in which swarming took place. The swarm height above the marker was between 50 cm and 100 cm throughout the swarming period (Fig. 6.11). Height, time of swarming, and marker characteristics seem to be species specific. For example, some sibling species swarm simultaneously above the same marker; however, the swarms are monospecific because they form at different heights (Downes 1969). Such species-specific variations in swarm behavior may serve to ensure species isolation. Within the swarm, we could also trace the paths of individual mosquitoes. However, because our flashlight power was limited, we could follow only very short trajectories. For nine paths traced, mosquito speed varied between 24 cm/sec and 156 cm/sec, with a mean of 81 cm/sec. Thus, individuals within the swarm boundaries move across a broad range of speeds. Our results lead us to conclude that, although mosquitoes act as relatively independent entities, moving quickly in and out of the swarm, the shape, size, and height of a swarm are fairly constant throughout the swarming period. Thus,
Figure 6.11. A distribution of the heights of the mosquitoes from the swarm marker throughout the swarming period. The time intervals between series shown are about 5 min.
individual mosquitoes may swarm primarily with reference to a marker, and their behavior may be influenced by the presence of other individuals only as a secondary effect. Gibson (1985), however, has shown with a two-dimensional analysis of Culex pipiens quinquefasciatus that although mosquitoes change flight speed or flight path in the presence of conspecifics, the swarm area itself does not change. Note the distinction between mosquito swarms and herds, flocks, or schools, in which interactions between individuals are a dominant effect, as shown in many chapters of this book. Are there any adaptive advantages in the spatial and temporal features of mosquito swarming? Swarming behavior gives mosquitoes opportunities for mating. However, this behavior has a cost, because swarms attract many predators (Downes 1969). Therefore, it may be quite important for mosquitoes to increase the efficiency of finding mates and to keep the swarming period short so as to reduce the risk of predation. Constancy of swarm shape, size, and height may serve not only to promote species isolation, but also to increase the efficiency of finding mates. Male mosquitoes locate potential mates by the distinct sound of the female wing beat. Because this sound is of low amplitude, clustering about the marker increases the likelihood that males are within hearing distance of females. Frequent migration from one swarm station to another offers several advantages. First, by visiting a number of markers, males may find a better swarming site where they can find more females. Second, mosquitoes may find mates while flying between sites as well as at the swarming site. The combined strategy of searching for mates in and between swarms may serve to increase the total encounter rate of males with females.
6.5 Viewing extension of the method
The methods and measuring system described in this chapter have a number of advantages over previous techniques: (1) the equipment needed is inexpensive and readily available commercially; (2) the system is portable and compact, and so can be easily transported and used in the field; (3) the precision of placement of the camera and reference unit is not a limiting factor in the measurement, so the apparatus can be used for a variety of field experiments; (4) the procedures are automated once the positions of images and reference points on the photograph are entered into the computer; (5) calculation times are short, even on a personal computer; (6) a three-dimensional graphics program (also developed here) displays the reconstructed images easily on a personal computer. This flexibility and portability make our method applicable to three-dimensional measurement of many kinds of swarms and groups of animals in the field or in the laboratory.
Automatic tracking of the three-dimensional movement of each member of a group recorded on movie film or videotape is a goal of our work. Given a proper kinetic model of the movement of an individual, one could consider two different approaches to reconstructing three-dimensional movement. First, we could track the two-dimensional movement in each view in advance and then match the obtained time sequences of two-dimensional positions by our method. Alternatively, we could calculate the three-dimensional coordinates of the objects by stereoscopy and only then track the three-dimensional movement. In the first approach, tracking two-dimensional movement is difficult because there are many apparent collisions among individuals: lacking information about movement in the perpendicular direction, the kinetic model is incomplete. In the second approach, on the other hand, solving the matching problem can be troublesome when the density of individuals is high. To address these problems, we propose a third approach in which the tracking and matching are done simultaneously. We are now developing a computer vision system to realize this approach for the analysis of swarming mosquitoes.
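As a baseline for the second approach (reconstruct three-dimensional positions first, then track), a greedy frame-to-frame linker can be sketched as below. This is an illustration of the problem, not the computer vision system under development: with dense swarms or large inter-frame motion, such a greedy rule mislinks tracks, which is precisely why simultaneous tracking and matching is attractive. The coordinates and the max_step threshold are hypothetical.

```python
import math

def link_tracks(frames, max_step):
    """Greedy frame-to-frame linking of reconstructed 3-D positions: each
    track is extended to the closest unclaimed point in the next frame,
    provided that point lies within max_step of the track's last position."""
    tracks = [[p] for p in frames[0]]
    for frame in frames[1:]:
        unclaimed = list(frame)
        for track in tracks:
            if not unclaimed:
                break
            best = min(unclaimed, key=lambda q: math.dist(track[-1], q))
            if math.dist(track[-1], best) <= max_step:
                track.append(best)
                unclaimed.remove(best)
    return tracks

# Two well-separated mosquitoes over two frames (coordinates in cm).
frames = [[(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
          [(0.2, 0.0, 0.0), (5.0, 5.2, 5.0)]]
tracks = link_tracks(frames, max_step=1.0)
```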
Acknowledgments
We are most grateful to Dr. L. Keshet, who encouraged us, read the manuscript at various stages, and provided invaluable comments. We thank Dr. T. Ikeshoji and Dr. H. Akami, to whom we owe the motivation for this study. We are indebted to Mr. E. Masuda and other staff of the Nikon service center for technical advice, and to Dr. A. Matsuzaki and the staff of the Tanashi Experimental Farm for allowing us to conduct experiments on site and for their help and kind interest. Finally, we express special thanks to the late Dr. A. Okubo for helpful discussions.
Part two: Analysis
7 Quantitative analysis of animal movements in congregations PETER TURCHIN
7.1 Introduction
Most animals spend part or all of their life in groups (Pulliam & Caraco 1984). Gregarious behavior can strongly affect individual fitness, as well as the spatiotemporal dynamics of populations (Allee 1931; Hamilton 1971; Thornhill & Alcock 1983; Taylor 1986; Part IV of this book). Quantitative analyses of gregarious movement behaviors, however, are rare (Turchin 1989a). In population ecology, for example, most theoretical analyses assume a contagious, or clumped, distribution of organisms (often summarized by a single number, e.g. the variance-to-mean ratio), without attempting to examine the behavioral mechanisms by which organisms clump together. This bias is partly due to the intrinsic difficulty of studying movement, and partly to our limited understanding of how individuals interact within aggregations. It is often difficult to collect data on the spacing and movements of individuals in aggregations, especially in large three-dimensional aggregations such as bird flocks, fish schools, and insect swarms. Recent advances in instrumentation (reviewed in Part I) are beginning to address this problem. However, even when data are available, innovative methods of analysis are needed to test hypotheses about how aggregations are formed, and to build dynamical models of aggregation structure.

Movement by organisms is most generally defined as a change in an organism's spatial position over time. Thus, by its nature, the process of movement involves two scales - a temporal and a spatial. Because the description of spatial position typically involves two or three coordinates, a description and analysis of movement has to be multidimensional (3-D for most terrestrial organisms, and 4-D for aquatic, aerial, arboreal, etc., organisms). A primary conceptual difficulty of analyzing animal movement, thus, is the necessity of dealing with multidimensional data and models. This difficulty has caused most population ecologists to avoid studying movement, concentrating instead on one-dimensional processes of birth, death, and population interactions that can be studied at a point in space, or integrated over a spatial area. To give an example of this avoidance reaction, the "ecologist's bible" (Southwood 1978) devotes only 15 pages out of more than 500 to methods for quantifying movement.

Active aggregation (using the definition of Parrish, Hamner, & Prewitt, Ch. 1) is any movement process that results in a nonuniform spatial distribution of organisms. It could result from organisms responding to a wide variety of stimuli, including avoidance of inimical physical conditions and attraction to patchily distributed resources (food, mates, shelter, etc.). The analysis of aggregation presents even greater difficulties than other kinds of movement, because it involves spatially varying movement rates. Yet, until quite recently, the overwhelming majority of quantitative studies of ecological movement employed models with spatially invariant coefficients, such as simple diffusion (Turchin 1989b). A subset of active aggregation is what I have termed congregation (Turchin 1997): aggregation as a result of behavioral responses of organisms to conspecifics (to congregate is to gather together, as opposed to aggregate, which is to gather at some locality). Congregating organisms may respond directly to neighbors or to defined groups of conspecifics using visual and acoustic stimuli, or indirectly to cues such as pheromones and to population-density cues, e.g. feeding damage on a host plant. Modeling and analyzing congregation presents even greater conceptual difficulties than aggregation or movement in general, because congregation involves a positive feedback between the movements of two neighbors, or between individual movement and population density.
Eulerian models of congregation can be formulated as nonlinear diffusion problems, which pose a number of mathematical challenges (Turchin 1989a; Lewis 1994). Lagrangian models of individual behavior with congregation quickly lead to highly unstable, chaotic kinematics. The preceding paragraphs, with their litany of difficulties, may give the reader the impression that the quantitative analysis of congregation is so difficult as to be practically impossible. This is not, however, the message I intend to convey. Rather, I would argue that we need a clear understanding of the difficulties involved, so that we can design analytical approaches to this difficult, but ultimately tractable and potentially very rewarding, problem. In this introductory chapter, I briefly review various approaches that have been developed for studying congregation. I begin with two broad groups of methods that attempt to reduce the dimensionality of the problem by concentrating on either its spatial or its temporal aspect, and then turn to full spatiotemporal methodologies.
7.2 Analysis of static spatial patterns
Analysis of spatial patterns has received a lot of attention from plant ecologists, because the majority of plants do not move (or rather, only their pollen and seeds move). Although most animals are not sedentary, their spatial distribution at any given point in time can be quantified by similar methods. A number of quantitative measures of the degree of clumping or clustering have been developed (for a review see Pielou 1977). If organisms are distributed in discrete units of habitat, such as herbivorous insects feeding on host plants, then their statistical distribution can be characterized by statistics such as the variance/mean ratio, the negative binomial parameter, Lloyd's indices of mean crowding and patchiness, and so on (see Pielou 1977). Organisms distributed in continuous space can be sampled by imposing an arbitrary spatial discretization, e.g. by placing quadrats and counting the number of individuals in them. The same aggregation indices can then be applied to such counts. Although calculating various aggregation indices is straightforward, interpreting them is not. The size of the sampling quadrats will often have a great influence on the numerical value of an aggregation index. Changes in mean density between two samples are typically confounded with changes in aggregation as measured by an index. By focusing only on the number (or density) of organisms in a discrete unit of habitat or in a quadrat, an aggregation index ignores potential spatial autocorrelation with neighboring units (quadrats). Thus, aggregation indices ignore not only the temporal component of the data, but also most of the spatial one! As a result, while many papers reporting aggregation indices have been published, little insight has been gained into the dynamical process of aggregation.
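For concreteness, the quadrat-count indices mentioned above can be computed as follows, using the standard definitions: the variance/mean ratio, and Lloyd's mean crowding m* = m + V/m − 1 with patchiness m*/m. The counts are invented.

```python
def aggregation_indices(counts):
    """Aggregation statistics from quadrat counts: variance/mean ratio,
    Lloyd's mean crowding (m + V/m - 1), and patchiness (crowding / m)."""
    n = len(counts)
    m = sum(counts) / n
    var = sum((c - m) ** 2 for c in counts) / (n - 1)   # sample variance
    crowding = m + var / m - 1
    return {"variance/mean": var / m,
            "mean crowding": crowding,
            "patchiness": crowding / m}

# The same 16 individuals, clumped into two of six quadrats.
print(aggregation_indices([0, 0, 8, 0, 0, 8]))
```

Regrouping the same individuals into coarser or finer quadrats changes these numbers, which is exactly the quadrat-size sensitivity noted above.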
My subjective impression is that the number of papers reporting an aggregation index for spatial data has declined over the last decade, which I hope reflects a shift to more sophisticated methods of spatial analysis. Analysis of nearest-neighbor distances (NND) has greater potential for yielding insights into the causes of aggregation or, at least, for testing specific hypotheses. For example, Kennedy and Crawley (1967) used an NND analysis to demonstrate what they called "spaced-out gregariousness" in a sycamore aphid: these aphids form recognizable congregations, yet a minimum separation is maintained by their habit of kicking vigorously at neighbors that come too close. Another example of NND analysis is found in Parrish and Turchin (Ch. 9). More generally, the spatial positions of individuals within congregations may be modeled as a spatial point process (Diggle 1983); Andersen (1992) provides a recent example of an ecological application of this methodology.
An alternative to analyzing animal positions in space is to "smear" individuals and analyze their spatial density distribution. Traditional methods for analyzing variation in spatial density are spatial autocorrelation (Sokal & Oden 1978) and spectral (Platt & Denman 1975) analyses. As a result of the current increased interest in landscape ecology and in the analysis of large-scale spatial patterns, ecologists have become interested in adapting geostatistics methodology to their problems. For an overview of statistical methods in landscape ecology see Turner et al. (1991). Although we have apparently progressed beyond aggregation indices in the field of statistical spatial ecology, much work still remains. We still do not know how to make inferences about the mechanisms that produced an observed spatial pattern from a description of the pattern itself (Levin 1992). Moreover, methods that focus exclusively on the spatial pattern ignore the dynamical features of pattern evolution. By throwing out a large component (the temporal dimension) of the data, we decrease the statistical power of our methodology to distinguish between rival mechanistic hypotheses.
7.3 Group dynamics
While the methods briefly reviewed above focus exclusively on the spatial dimension of the data, an opposite approach is to focus exclusively on the temporal dynamics of congregations, or groups. The dynamical variable of interest is group size. For example, Cohen (1971) developed models for the stochastic growth and decline of "casual groups," which continuously lose some individuals while being joined by others. Many such models are reviewed by Okubo (1986:43-54). Okubo (1986) gives a number of applications of these models to data, comparing observed and predicted frequency distributions of group sizes for zooplankton patches, fish schools, and mammalian herds. Of particular interest is his fitting of a dynamical model of bark beetle congregation to field data on the cumulative number of bark beetles attracted to a mass-attacked host tree (Okubo 1986:64-66).
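The flavor of such group-dynamics models can be conveyed by a minimal discrete-time simulation in which one individual may join per step and each member leaves independently. This is an illustrative caricature, not Cohen's (1971) continuous-time model, and the rates are arbitrary.

```python
import random
from collections import Counter

def simulate_group(join_rate, leave_rate, steps, seed=0):
    """Group size over time: each step, one individual joins with
    probability join_rate, and every member leaves independently with
    probability leave_rate. Rates are arbitrary illustration values."""
    rng = random.Random(seed)
    size, history = 0, []
    for _ in range(steps):
        size += rng.random() < join_rate                             # arrival (0 or 1)
        size -= sum(rng.random() < leave_rate for _ in range(size))  # departures
        history.append(size)
    return history

# The simulated frequency distribution of group sizes is the quantity
# compared with field data in applications such as Okubo's (1986).
sizes = simulate_group(join_rate=0.8, leave_rate=0.1, steps=2000)
print(Counter(sizes).most_common(3))
```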
7.4 Spatiotemporal analysis
The most conceptually difficult, but potentially most rewarding, techniques are multidimensional (spatiotemporal) ones. These techniques can be classified into two broad groups. The Lagrangian approach is centered on the individual. Individual movement is characterized by a position, a velocity, and an acceleration (the latter includes turning). The velocity and acceleration can be influenced by the spatial coordinates of the organism (environmental influences). By contrast, the Eulerian approach is centered on a point in space. The spatial point is characterized by densities and population fluxes of moving organisms. These two approaches are reflected in the models used to represent animal movements: random walks and individual-based simulations (Lagrangian) or diffusion models (Eulerian). Which approach is used will be determined partly by the types of questions one wants to ask of the system, and partly by the kinds of data one can collect. Following individuals through space and time can provide very detailed information about movement, and is the preferred approach where it is possible (Turchin et al. 1991; Turchin 1997). A detailed understanding of individual movements can be translated into an understanding of population redistribution (e.g. Patlak 1953; Othmer et al. 1988; Turchin 1989b, 1991, 1997; Grunbaum 1994), but the converse is generally not possible.

Most Lagrangian approaches, especially in terrestrial systems, have employed a random walk framework, in which the paths of organisms are broken into a series of discrete "moves," each characterized by one temporal coordinate, move duration, and two (three if movement is in 3-D space) spatial coordinates, move length and direction (or, sometimes, turning angle). The random walk framework, with various elaborations such as the correlated random walk, has proved a very fruitful approach to modeling and analyzing animal movements, including a number of applications to congregation (e.g. Alt 1980; Turchin 1989a; Lewis 1994; Grunbaum Ch. 17). It is especially appropriate for analyzing the movements of organisms that make periodic stops, such as a butterfly that moves from one host plant to the next (Kareiva & Shigesada 1983). Animals that move continuously present a difficulty. While their paths can be broken into arbitrary moves at some regular time interval (Kareiva & Shigesada 1983), this leads to certain problems at the analysis stage (see Turchin et al. 1991).
Perhaps a more natural way of modeling continuously moving organisms is to break animal movement into a series of discrete accelerations, rather than discrete moves. This kinematic approach was used by Okubo and Chiang (1974) and Okubo et al. (1977), and will be discussed more fully by Parrish and Turchin (Ch. 9). In recent years a novel approach based on calculating fractal dimensions of observed trajectories has become popular. An example of the application of this approach to data on copepod swarming is discussed by Yen and Bundock (Ch. 10). The estimated fractal dimension of a pathway can be used as a measure of trajectory complexity: "sinuosity" or "tortuosity." In addition to using it as a phenomenological measure of trajectory complexity, some authors have proposed that the fractal dimension can be used to extrapolate movement patterns of organisms across spatial scales (e.g. Wiens et al. 1995). This latter approach may not be valid, since strict self-similarity is a necessary condition for such an
extrapolation, yet self-similarity is rarely tested for in fractal analyses of individual paths (Turchin 1996). While individual-based Lagrangian approaches are more mechanistic and provide more detailed information about movement, it is not always possible to follow individuals, because they may be too minute, move too fast, or enter regions inaccessible to human observers or opaque to their recording equipment. In addition, studies based on following individuals are typically limited in spatial extent, temporal duration, and the number of organisms for which trajectories can be obtained. When for some reason it is impractical to follow individuals, one has to resort to observing spatial and temporal changes in their population density. The Eulerian approach, however, should not be considered a poor cousin of the Lagrangian one. It is perfectly adequate, and may even be preferable, in situations where one aims to understand and quantify population-level redistribution processes and their consequences for population dynamics and species interactions. Most typically, the data for an Eulerian investigation of movement are obtained by some variation of mass-marking and recapturing organisms. The essential purpose is to quantify the temporal change in the spatial distribution of population density. At the analysis stage, the data are fitted to a variety of models, most frequently formulated as partial differential equations, although other mathematical frameworks can also be used (e.g. making space, time, or both discrete variables). Good references to diffusion models and their uses in modeling and analyzing population redistribution can be found in the books by Okubo (1980), Edelstein-Keshet (1988), and Turchin (1997). Although Eulerian models are typically fitted to spatial distributions of organism density or numbers, sometimes we have additional information that can be used to sharpen the analysis.
For example, in Chapter 8 Simmons and I analyze both the spatial densities and population fluxes of swarming bark beetles.
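One common way to estimate the fractal dimension of a digitized path, mentioned above, is the divider (compass) method: measure the path length L(δ) with a sequence of ruler spans δ and estimate D from the slope of log L against log δ, since for a self-similar curve L(δ) ~ δ^(1−D). The sketch below is illustrative only; the function names and the least-squares fit are our own, the ruler spans must be small relative to the path's extent, and (as the text cautions) the log-log fit being linear does not by itself establish self-similarity.

```python
import math

def ruler_length(path, delta):
    """Length of a polyline measured by stepping a divider of span delta."""
    count = 0
    cx, cy = path[0]
    for x, y in path[1:]:
        # advance the divider toward the current vertex while it is
        # at least one span away from the anchor
        while math.hypot(x - cx, y - cy) >= delta:
            d = math.hypot(x - cx, y - cy)
            cx += delta * (x - cx) / d
            cy += delta * (y - cy) / d
            count += 1
    return count * delta

def fractal_dimension(path, deltas):
    """Estimate D from the slope of log L(delta) on log delta.

    For a self-similar curve L(delta) ~ delta**(1 - D), so D = 1 - slope.
    """
    xs = [math.log(d) for d in deltas]
    ys = [math.log(ruler_length(path, d)) for d in deltas]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return 1.0 - slope
```

A straight path yields D near 1, while a highly tortuous path yields a larger D; whether the estimate can be extrapolated across scales depends on the self-similarity assumption discussed above.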
7.5 Conclusion

My main exhortation here is that we should not limit ourselves to analyzing only one aspect of the data. In the past, too many analyses focused exclusively on static spatial patterns and ignored the temporal or dynamic component. Full spatiotemporal analysis is conceptually difficult. Construction of explicit models, or better, a set of rival models, to guide the analysis is usually unavoidable. Yet the dramatic increases in computer power and in the quantity and quality of data that we can now collect leave us no excuses for employing outdated, limited techniques of analysis.
8 Movements of animals in congregations: An Eulerian analysis of bark beetle swarming
PETER TURCHIN AND GREGORY SIMMONS
8.1 Introduction

The mechanisms determining three-dimensional dynamics in animal congregations are behavioral, and thus it is appropriate that the approaches covered in this book focus primarily on individuals. This chapter breaks out of that mold, because we use a population-level, Eulerian approach (see Ch. 7 by Turchin). A focus on populations, rather than individuals, was forced upon us by the characteristics of our empirical system: the congregation of southern pine beetles around mass-attacked host trees. In fact, in our first (unsuccessful) attempts at quantifying beetle movements in the vicinity of attractive foci, we tried to use an individual-based approach. However, we were not able to follow flying beetles consistently. A certain proportion of beetles flew upwards, out of sight. Even when they remained low, beetles were easily lost against the forest background because they are very small (about 3 mm in length), are dark colored, and fly fast, following erratic paths. In our second approach, we shifted our focus from the behavior of individuals to the dynamics of groups. Because the Eulerian approach does not keep track of individuals, one cannot directly measure parameters of individual behavior. However, by using behavioral observations and making certain assumptions, we can construct a model of population redistribution, which in turn can be used to interpret population-level data, as well as to make inferences about individual behavior. In this chapter we describe an example of how an Eulerian approach can be usefully employed for measuring certain quantitative features of congregating behavior, in particular the attractive bias exhibited by flying beetles toward the source of congregation pheromones. The organization of this chapter is as follows. We begin by giving a brief description of the biological features of the empirical system. Next, we develop a simple model relating individual parameters of beetle behavior to their population-level characteristics: beetle densities and fluxes. In the section after that we describe the specific methods we used to measure fluxes of beetle density at various spatial points, which allowed us to estimate the attractive bias. We conclude with some final remarks on using an Eulerian approach in the analysis of movement behavior.
8.2 Congregation and mass attack in the southern pine beetle

The southern pine beetle, Dendroctonus frontalis, is a native predator of pines in the southern United States. It is an aggressive bark beetle; that is, it generally needs to kill a host tree in order to complete development in it. Pines, however, possess defenses against bark beetle attack: the oleoresin system (Lorio 1986). An attack by a single female (the pioneering sex in this beetle) or by a small group of females is unlikely to succeed on a healthy pine, because resin flow from wounds will prevent beetles from penetrating the bark and excavating galleries in the inner bark for egg laying. In response, the southern pine beetle has evolved a remarkable strategy for overcoming tree defenses. As pioneering beetles bore into the tree bark, they begin emitting a mix of volatile compounds, of which the most important is frontalin (Payne 1980). The mixture of beetle-produced and host volatiles attracts other beetles. A positive feedback loop is established: As more beetles (primarily females) bore into the tree, they release more pheromone, attracting additional beetles. As beetles congregate on the tree, they literally drain it of its resin resources, nullifying the tree's ability to defend itself (Hodges et al. 1979). It may take 2000-4000 beetles to overcome the defenses of a healthy pine tree (Goyer & Hayes 1991). This phenomenon is known as mass attack. As the mass attack progresses and the larval resource - the inner bark of the tree - starts to fill up, beetles (primarily males) begin releasing a repelling pheromone, which eventually inhibits congregation at the tree (Payne 1980). Although the southern pine beetle is the most serious insect pest of the southern pines, quantitative information on its dispersal and congregation is lacking.
Over the past few years one of us (Turchin) has been involved in research aimed at constructing a model for understanding and predicting spatial population dynamics of this beetle (see Turchin & Thoeny 1993). Because mass attack of host trees plays such an important role in the beetle's biology, it is clear that quantitative understanding and measurement of southern pine beetle congregation is critical for our ability to predict its spatial dynamics. These considerations motivated a field study of beetle attraction to mass-attacked trees that was conducted in Summer-Fall 1991 (full details of this study will be reported elsewhere).
The basic premise of the study was that beetles flying in the vicinity of a mass-attacked tree use chemical (pheromones and host volatiles) and visual (the vertical shape of the tree bole) cues to bias their movements toward the tree (Gara & Coster 1968). This bias results in congregation, which in turn fuels mass attack. The attractive bias is assumed to be a function of the distance and direction from the tree to the flying beetle. This bias is modified by the total number of beetles already boring into the tree. At the beginning of mass attack, the strength of the bias should increase with the number of attacking beetles, since more beetles congregating on the tree release more frontalin. As the tree begins to fill up, the bias should decrease in strength, possibly even becoming negative (repulsion).
8.3 An approximate relationship between attractive bias and flux

Consider movement of a beetle in the vicinity of a congregation focus, a mass-attacked tree. Behavioral observations suggest that the spatial scale of movement λ (the spatial step or the mean free path) is about 1 m. We will model the movement process of beetles as a random walk occurring within a three-dimensional lattice of 1-m³ cells, biased toward the attraction focus. At any given point in time, the magnitude of the bias is a function of direction and distance from the focus; that is, there is an "attraction field" centered on, but not necessarily symmetrical around, the focus. The attraction field will change with time as a result of shifts in the wind direction and in the number of beetles already attacking the focal tree (there are many other factors that could potentially influence the attractive field, but we will ignore them in order to keep the model simple). Without loss of generality, let us orient the x-axis so that it passes through the current position of a beetle and the attraction focus. Let R be the probability per unit of time that the beetle will hop one cell toward the attractive focus (say, to the right), and L the probability of moving one step away from the focus (e.g. to the left). Let the sum of these two probabilities be the motility: μ = R + L. Thus, 1 − μ is the probability that no displacement with respect to the focus will occur during the time interval (either the organism did not leave the 1-m cube, or it moved laterally with respect to the focus). We define the attractive bias (β) as the difference between the probabilities of going toward the focus versus going away, given that some displacement with respect to the focus has occurred: β = (R − L)/μ. The attractive bias can vary from 1 (perfect attraction) to −1 (perfect repulsion), and β = 0 implies random movement with respect to the attractive focus.
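The lattice walk just defined is easy to simulate. The sketch below is a minimal 1-D version (the full model is three-dimensional), with a fixed bias β rather than a full attraction field; the parameter values are chosen for illustration only, not taken from field data.

```python
import random

def simulate_biased_walk(n_beetles=2000, n_steps=500, beta=0.4, mu=0.5, seed=1):
    """1-D biased random walk toward an attractive focus at x = 0.

    Each time step a beetle hops one cell toward the focus with
    probability R, away with probability L, where mu = R + L and
    beta = (R - L) / mu, as defined in the text.  Parameter values
    are illustrative assumptions.
    """
    R = mu * (1 + beta) / 2.0
    L = mu * (1 - beta) / 2.0
    rng = random.Random(seed)
    positions = [rng.randrange(1, 31) for _ in range(n_beetles)]  # 1-30 m out
    for _ in range(n_steps):
        for i, x in enumerate(positions):
            r = rng.random()
            if r < R:
                if x > 0:
                    positions[i] = x - 1   # hop toward the focus
            elif r < R + L:
                positions[i] = x + 1       # hop away from the focus
    return positions

positions = simulate_biased_walk()
# With a positive bias, density piles up near the focus: the mean
# distance drops well below the initial mean of ~15 m.
```

Even this toy version shows the population-level signature of congregation (density concentrated around the focus) emerging from the individual-level parameters μ and β.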
Our goal is to translate the random-walk parameters, in particular the attractive bias, into quantities that can be measured at the population level with experiments using groups of beetles, rather than individuals. A continuum model such as the diffusion equation may help us to interpret population-level data. We know that a biased random walk can be approximated with a diffusion equation (Okubo 1980; Turchin 1989b). If u = u(x,t) is defined as the density of swarming beetles at position x at time t, then it obeys the following partial differential equation:

∂u/∂t = (λ²/2) ∂²(μu)/∂x² − λ ∂(μβu)/∂x    (8.1)

This equation can be rewritten in terms of the flux (J). The flux with respect to the attractive focus at point x, J_x, is the difference between the number of organisms passing through a unit (1-m²) surface at x going toward the attractive focus, and the number going in the opposite direction, per unit of time (Fig. 8.1). The diffusion equation in terms of flux is:

∂u/∂t = −∂J_x/∂x    (8.2)

Thus, the flux J_x is related to the behavioral parameters μ and β as follows:

J_x = λμβu − (λ²/2) ∂(μu)/∂x    (8.3)
Figure 8.1. Flux in relation to the attractive focus, J_x, is defined as the net flow of beetles through a 1-m² surface toward the focus.
The flux has two components. The first term on the right side is the directional component: It is the product of the local density of swarming insects, u, and the difference in the probability of going toward versus away from the attractive source (μβ). The second term on the right side is the random component of flux. It indicates that, as a result of random movements, the net flow of organisms will be down the population density gradient. Because the density of organisms tends to increase toward the center of the swarm, the random component of flux will work against the directed component. Equation (8.3) suggests that we may be able to estimate the attractive component of flux, and thus β, by subtracting the random component of flux from the total flux. Let us return to the random-walk formulation and derive an explicit relationship between the attractive bias β and the quantities that we can observe experimentally. Suppose we put a sticky screen at the boundary between two neighboring cells, with one side facing the attractive focus, and the other side facing the opposite way (Fig. 8.1). Beetles attempting to move from the cell on the left to the cell on the right will hit the screen and stick to its side facing away from the focus (Fig. 8.1). The rate (numbers per unit time) at which beetles hit the screen on its away-facing side, J_x^+, is the product of the density of beetles in the cell on the left and the probability of each beetle moving right per unit time:

J_x^+ = u_{x−1/2,t} R_{x−1/2,t}    (8.4)

Similarly, the rate at which beetles hit the screen side facing toward the focus is

J_x^− = u_{x+1/2,t} L_{x+1/2,t}    (8.5)

Expanding these relationships in a Taylor series, we obtain

J_x^+ = u_{x−1/2,t} R_{x−1/2,t} = uR − (1/2) ∂(uR)/∂x + · · ·    (8.6)

J_x^− = u_{x+1/2,t} L_{x+1/2,t} = uL + (1/2) ∂(uL)/∂x + · · ·    (8.7)

The difference between these two rates is the flux, which is approximately

J_x = J_x^+ − J_x^− ≈ μβu − (1/2) ∂(μu)/∂x    (8.8)
Note that this expression is simply equation (8.3) where all quantities have been expressed in units of λ and τ. Let us now consider S_x, the total number of beetles hitting both sides of the screen. Like J_x, S_x is affected by the population density gradient. However, random-walk simulations indicated that the magnitude of this effect is slight, and in order to simplify the relationships we have chosen to ignore it. Thus, very approximately,

S_x = J_x^+ + J_x^− ≈ u(R + L) = μu    (8.9)

Both J_x and S_x are instantaneous rates of flow through a unit area, while the data are collected at discrete time intervals, in this case once a day (see next section). Thus, the observed numbers (numbers captured on a sticky screen during T = 24 hr) are actually time integrals of J_x and S_x:

j_x = ∫₀ᵀ J_x dt    (8.10)

s_x = ∫₀ᵀ S_x dt    (8.11)

Integrating both sides of equation (8.8), substituting s_x in place of ∫₀ᵀ μu dt, and solving for β, we obtain

β = ( j_x + (1/2) ds_x/dx ) / s_x    (8.12)

We have assumed that β changes slowly compared to T, and that this change can be neglected (this is not a bad assumption, since T equals one day, while the complete course of mass attack took several weeks to develop). Note that a naive estimator of the bias β would be j_x/s_x: the difference between the numbers of beetles crossing a unit area toward versus away from the focus, scaled by the total number of beetles crossing in either direction. Although this quantity resembles the definition of β, it would yield a biased estimate, because of the net flow of beetles down the gradient of population density that results from the random component in their movements. The second term on the right side of equation (8.12) corrects for this flux component.
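The need for the gradient correction in equation (8.12) can be illustrated with a toy simulation: a 1-D lattice walk with known bias toward a reflecting focus, with "sticky screens" counting crossings in both directions. Near equilibrium the net flux is roughly zero, so the naive estimator j_x/s_x is badly biased downward, while the gradient-corrected estimator recovers a value close to the true β. Everything below (parameter values, the 1-D geometry, the reflecting boundary) is an illustrative assumption, not the authors' field procedure.

```python
import random

def estimate_bias(beta=0.2, mu=0.5, n=1000, burn=800, obs=1200, seed=2):
    """Compare the naive and gradient-corrected bias estimators.

    Beetles random-walk on a 1-D lattice with the focus at x = 0
    (reflecting).  Crossings are tallied at screens placed at
    x = 8.5, 9.5, and 10.5; x is distance from the focus, so the
    correction term enters with a minus sign.  Values are illustrative.
    """
    R = mu * (1 + beta) / 2.0          # P(hop toward the focus)
    L = mu * (1 - beta) / 2.0          # P(hop away from the focus)
    rng = random.Random(seed)
    pos = [0] * n                      # release all walkers at the focus
    toward = {8.5: 0, 9.5: 0, 10.5: 0}
    away = {8.5: 0, 9.5: 0, 10.5: 0}
    for t in range(burn + obs):
        for i in range(n):
            x = pos[i]
            r = rng.random()
            if r < R:
                if x > 0:              # reflecting boundary at the focus
                    pos[i] = x - 1
                    if t >= burn and (x - 0.5) in toward:
                        toward[x - 0.5] += 1
            elif r < R + L and x < 60:
                pos[i] = x + 1
                if t >= burn and (x + 0.5) in away:
                    away[x + 0.5] += 1
    s = {k: toward[k] + away[k] for k in toward}
    j = toward[9.5] - away[9.5]        # net crossings toward the focus
    naive = j / s[9.5]
    grad = (s[10.5] - s[8.5]) / 2.0    # ds/dx, with x = distance from focus
    corrected = (j - 0.5 * grad) / s[9.5]
    return naive, corrected

naive, corrected = estimate_bias()
# naive is near 0 (no net flux at equilibrium), while corrected is
# close to the true bias of 0.2.
```

The simulation makes the text's point concrete: a screen count alone can show zero net flow even when individuals are strongly biased, because the directed component is exactly balanced by diffusion down the density gradient.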
8.4 Field procedure

A loblolly pine (Pinus taeda) was selected as the focal tree for mass attack. At each of six distances from the focal tree (1.5, 4, 7.5, 12.5, 20, and 30 m), in each of the four cardinal directions, we placed a 1-m × 1-m hardware cloth screen. These screens were placed 5 m above the ground (we also placed screens at other heights, but here we will concentrate on the data from the 5-m screens). The screens were made sticky by spreading Tanglefoot (a viscous semiliquid material used to capture insects or protect trees from pests) on them, and then spraying the Tanglefoot with a pesticide (otherwise, beetles were able to walk off the screens). The attack was initiated by baiting the focal tree with the synthetic pheromone frontalin, as well as turpentine to simulate host volatiles. As soon as mass attack was underway, the artificial volatiles were removed, allowing the attack to proceed naturally. The course of mass attack on the focal tree was monitored by smoothing 16 square areas of bark (each 1 dm² in area) with a drawing knife and counting entrance holes of boring beetles in each area every day. The smoothed areas were located in pairs (on the east and west sides of the trunk), with a pair at 2, 3, 4, ..., 9 m above the ground. Throughout the course of mass attack, all beetles captured on the sticky screens were counted and removed once per day. We recorded how many beetles were caught on each side of each sticky screen (facing toward and away from the focal tree). The data reported here are based on three focal trees that were studied in August-October 1991. The course of mass attack was relatively slow during this period, varying from two to four weeks among replicates.
8.5 Results

A typical example of the data is shown in Figure 8.2. Numbers are summed over a period of four days, because this provides a less noisy picture of the population fluxes around the attack focus (but still constitutes a short enough segment of the course of mass attack that the strength of attraction did not change very much). The width of the boxes indicates the magnitude of flows toward (filled) and away from (open) the focal tree. Figure 8.2 illustrates two features of the beetle swarm around the attacked tree. First, the population density of flying beetles, as indicated by the total number of beetles captured on both sides of the sticky panel (s_x), increases drastically near the focal tree (Fig. 8.3). Distance from the attractive focus (x) explained 70% of the variance in total captures, as indicated by a linear regression of ln(s_x + 1) on ln x. This result is not surprising, since congregation should result in population concentration around attractive foci. A more striking feature of the data is that the relative flux j_x/s_x exhibits a nonlinear relationship with distance to the attractive focus: j_x/s_x is low close to the tree and far away from the tree, and highest at intermediate distances
Figure 8.2. The spatial structure of the beetle swarm around replicate tree 4 during the period 29 Sept.-2 Oct. 1991. X indicates the focus of congregation (a mass-attacked pine tree). Filled boxes indicate travel toward the focus; open boxes indicate travel away from the focus. Boxes show how many beetles hit each 1-m² sticky screen going toward/away from the tree; the width of the box indicates the actual number of beetles captured.
(Fig. 8.4a). This raises the following question: Is the decrease in relative flux near the focus a result of decreased attraction there, or is it due to undirected flow of beetles down the population density gradient? We can answer this question by estimating the attractive bias and plotting it as a function of distance to the focus. The attractive bias was estimated from the data using equation (8.12), assuming a linear relationship between s_x and ln x. If a and b are the regression intercept and slope, respectively, then equation (8.12) becomes

β̂ = ( j_x − b/(2x) ) / s_x    (8.13)

(the sign of the correction term reflects that x now measures distance from the focus, whereas in the derivation of equation (8.12) the x-axis pointed toward the focus)
Figure 8.3. Average population densities, as measured by s_x, the sum of beetles captured on both sides of each screen, as a function of distance to the focal tree. Attack stages early, middle, and late correspond to the time periods during which 0-25%, 25-75%, and 75-100% of attacks occurred.
Figure 8.4. Comparison between (a) the relative flux, j_x/s_x (averaged over all replicates), and (b) the estimated attractive bias, β̂(x), both as functions of distance from the focus.

(The intercept a reflects the average density of swarming beetles, and does not affect the estimate of β.) The basic data unit for estimating the attractive bias was the number of beetles collected at a given trap during a given interval of time. The difference between the numbers of beetles caught on either side of the trap is j_ikt, where i and k code, respectively, for the distance and direction from the focus to the trap, and t indexes the time period. Analogously, s_ikt is the total number of beetles caught on both sides of the trap. Capturing no beetles at a trap provides no indication about the bias at that point and time; thus, traps with s_ikt = 0 were dropped from the data set. To obtain an estimate of the slope, we regressed ln s_ikt on ln x_i while keeping k and t constant. The slope estimate b_kt was thus different for each combination of direction and time period, because the relationship
between s and x varied from day to day (as the attack progressed, the population density gradient became steeper), and also depended on the direction from the focal tree and the current wind direction. In formal terms, the estimate of the attractive bias for each trap that caught beetles during each time period was

β̂_ikt = ( j_ikt − b_kt/(2x_i) ) / s_ikt    (8.14)
Plotting the average attractive bias against distance from the focus (Fig. 8.4b), we see that, unlike the relative flux (Fig. 8.4a), the attractive bias does not decrease in magnitude near the attractive focus. This observation suggests that the low relative flux near the tree does not reflect low attractive bias at these distances, but instead is due to an increased influence of the undirected flux component. The density gradient of flying beetles, and thus the magnitude of the undirected component, is steepest near the tree (see Fig. 8.3). Sometimes the undirected component can even overpower the attractive bias. For example, the 1.5-m traps situated upwind of the focus tended to exhibit inverse (outward) fluxes of beetles (e.g. the trap nearest to the NE of the focus in Fig. 8.2). To investigate the factors influencing the attractive bias, we performed multiple regressions of β̂_ikt on the distance from the focus to the trap, x_i; the cosine of the angle between the wind direction and the direction from the focus to the trap, c_kt; and the number of attacking beetles per unit area of bark on the focal tree, A_t. We tested the linear effects of these three independent variables, the quadratic terms, and all pairwise interaction terms. Both the x_i and x_i² terms were significant (F(1,364) = 10.96, p < 0.001; and F(1,363) = 8.64, p < 0.005), suggesting that the relationship between the attractive bias and distance from the attractive focus is nonlinear (a conclusion confirmed by Fig. 8.4b). Interestingly, the relationship between the attractive bias and A_t was highly nonlinear: Neither the A_t nor the A_t² term was significant by itself, but they were significant when included in the model jointly (F(2,361) = 3.78, p < 0.025). Wind direction was also influential: Including the three terms c_kt, c_kt², and the interaction term c_kt·A_t doubled the percentage of variance explained by the regression and was highly significant (F(3,241) = 4.57, p < 0.005).
As expected, the attractive bias toward the attacked tree was stronger in the downwind than in the upwind direction. The other two interaction terms did not significantly improve the regression. The coefficient of determination (R²) of the model that included all significant terms was low, at 0.13. This is not surprising, however, since many traps captured just a few beetles, or even one. Thus, the estimate of the relative flux for a trap that captured only one beetle is either 1 or −1, which introduces a lot of variability into the β̂ estimates.
8.6 Conclusion

The analysis of this data set indicates that the flux approach can provide more information about the behavior of congregating organisms than an approach based on measuring population density near an attractive focus. Measuring population density indicated only that beetles were swarming densely around the attractive focus, suggesting that there was an active congregation at the attacked pine tree. Analysis of population fluxes, on the other hand, provided more details about the southern pine beetle congregation. Most importantly, beetles were actively biasing their movements toward the attacked trees, as indicated by positive β̂ at distances of up to 12.5 m. However, there was also a significant undirected (random) element in beetle movement, as evidenced by negative fluxes (away from the focus) just upwind of the tree, where the population density gradient was very steep but the attractive bias was weak. The significant effect of wind direction on the congregative bias supports the hypothesis that congregation is, at least partly, mediated by airborne chemicals such as congregation pheromones and plant volatiles. The number of beetles attacking the focal tree had a highly nonlinear effect on the attractiveness of the tree to beetles. During the early stages of attack, beetles were drawn to the tree from distances of up to 7.5 m (Fig. 8.5, early), and the average attractive bias at these distances was about 0.45. During the middle stages of attack the spatial extent of the attractive field increased to at least 12.5 m (Fig. 8.5, middle), and the average attraction went up to 0.53. This increased attraction was most probably due to the greater numbers of beetles boring into the tree, whose activity elevated the concentration of the congregation pheromone. However, during the late stage of attack, when the tree began to fill up, the average attractive bias decreased to 0.41 (Fig. 8.5, late), probably as a result of an elevated concentration of repelling volatiles.
8.6.1 General implications for individual-based versus population-based approaches

The Lagrangian and Eulerian points of view are distinct but related approaches to studying movements of congregating animals. The Lagrangian approach has an a priori advantage, since complete knowledge of individual behaviors, in principle, allows one to deduce all the population-level patterns. However, by making assumptions and building models one can also deduce individual-based parameters from population data, as this chapter has demonstrated. Moreover, the Lagrangian approach will sometimes be overkill in ecological applications. When all we need to know is the influence of movement on the spatial population
Figure 8.5. The effect of the attack stage on the strength of the estimated attractive bias. Attack stages are defined in the same way as in Figure 8.3.
dynamics of organisms, it is easier and more direct to simply measure the population redistribution parameters, rather than first quantify individual movements and then take the extra step of translating these data into population-level quantities. To summarize, the choice of approach will depend on many factors, e.g. what kinds of questions are important and what kinds of data we can collect. Sometimes the most powerful approach is to collect data at both levels (e.g. Turchin 1991).
9 Individual decisions, traffic rules, and emergent pattern in schooling fish
JULIA K. PARRISH AND PETER TURCHIN
9.1 Introduction

Schools of fish are among the most studied and best known of all animal congregations (see Pitcher & Parrish 1993). Over 25% of the world's fish school throughout their lives, and over 50% school as juveniles (Shaw 1978). Behavioral and evolutionary studies of schooling fish have indicated that group membership is more advantageous than a solitary existence. Group members may incur a lower risk of predation (Turner & Pitcher 1986; Magurran 1990; Romey Ch. 12), have greater access to food resources (Street & Hart 1985; Ryer & Olla 1992), and expend less energy swimming (Zuyev & Belyayev 1970; Weihs 1973, 1975). Regardless of the reason, most studies assume that membership in a stable congregation is beneficial to the individual. This favorable benefit-to-cost ratio is then used as an argument for both the evolution (Hamilton 1971; Mangel 1990) and maintenance (Parrish 1992) of aggregative behavior in fish. However, as with the study of congregation in general, mechanistic approaches to the study of fish schooling have lagged behind functional approaches. While we may have a good idea of why fish congregate, we know relatively little about how fish congregate, let alone how they form polarized schools of synchronously responding individuals. Traditionally, schools have been defined by a polarized orientation of the individuals, regardless of whether the school itself is moving or stationary (see Pitcher & Parrish 1993). Thus it is easy to imagine a congregation of fish slipping into, and out of, a schooling configuration, while still maintaining the same group boundaries, volume, shape, and even relative positions of the individual members. For instance, a school will often form when the group moves from one location to another; however, should a patch of food be encountered, the polarized configuration may break down as each individual begins to feed independently. Therefore, while the literature often refers to schooling species, this does
not necessarily imply that these fish are constantly confined to a polarized configuration, but rather that they have the ability to adopt such an orientation should the circumstances warrant it. Attempts to find structure within animal aggregations have often used a static, time-independent approach (but see Okubo 1986), such as the analysis of the distribution of nearest-neighbor distances from a set of still images, irrespective of how identified individuals are moving within the group (Partridge 1980; Campbell 1990). At its extreme, this approach becomes a search for defined structure (e.g. Symons 1971a, b; Partridge et al. 1983), or even a theoretical attempt to constrain school structure to rigidly predefined arrangements (e.g. the crystalline-lattice structure - Breder 1976). Yet animal aggregations are dynamic entities where individual elements within the group are constantly moving with respect to each other. One of the most striking examples of this fluidity is fish schools, which can continuously change volume, shape, density, and direction, yet maintain a coherent, even patterned, structure to the human eye. The apparent visual simplicity of a fish school or a bird flock is belied by the fact that individuals can constantly re-assort without loss of group-level structure. Mechanistically, the group-level properties, such as the discreteness of boundaries and the apparent ability of the group to respond as a unit, are a result of a set of decisions made by each individual about where to go and where to stay (see also Potts 1984; Adler & Gordon 1992; Pitcher & Parrish 1993). In general, movement decisions of an individual group member can be viewed as a balance of forces (Okubo 1980), in particular a set of attractions to, and repulsions from, various sources or foci (for specific applications see Mullen 1989; Heppner & Grenander 1990).
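The balance-of-forces view can be sketched as a minimal individual-based rule: each fish is repelled by neighbors that are too close and attracted to neighbors farther out. The zone radii, equal weighting of neighbors, and fixed step size below are invented for illustration; they are not parameters from any of the studies cited.

```python
import math
import random

def step_school(fish, repulse_r=1.0, attract_r=5.0, speed=0.2):
    """One synchronous update of a 2-D school.

    Each fish moves a fixed small distance along its net 'force': away
    from neighbors closer than repulse_r, toward neighbors out to
    attract_r.  Zone sizes and weights are illustrative assumptions.
    """
    new = []
    for i, (x, y) in enumerate(fish):
        fx = fy = 0.0
        for j, (ox, oy) in enumerate(fish):
            if i == j:
                continue
            dx, dy = ox - x, oy - y
            d = math.hypot(dx, dy)
            if d < 1e-9:
                continue
            if d < repulse_r:              # too close: repulsion
                fx -= dx / d
                fy -= dy / d
            elif d < attract_r:            # within range: attraction
                fx += dx / d
                fy += dy / d
        norm = math.hypot(fx, fy)
        if norm > 1e-9:
            x += speed * fx / norm
            y += speed * fy / norm
        new.append((x, y))
    return new

rng = random.Random(3)
fish = [(rng.uniform(0.0, 4.0), rng.uniform(0.0, 4.0)) for _ in range(20)]
for _ in range(200):
    fish = step_school(fish)
# The scattered group contracts into a cohesive cluster whose
# near-neighbor spacing is set by the repulsion radius.
```

Even this stripped-down rule produces a bounded, cohesive group with discrete edges from purely local decisions, which is the mechanistic point at issue; adding an alignment tendency (not shown) is the usual route to polarized, school-like motion.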
The important biological question then becomes: what are the foci of attraction and repulsion from the point of view of an individual group member? Individuals may be attracted to "external" sources, such as concentrations of prey (beetles - Turchin & Simmons Ch. 8; fish - Mullen 1989), or "internal" sources, namely each other (Warburton & Lazarus 1991). Once the sources are identified, the next question is whether the strength of these forces can maintain the congregation (not all kinds of attraction will necessarily lead to congregation formation; see Turchin 1989b). Finally, if attraction sources are stable and the balance of attraction/repulsion leads to group formation and maintenance, can these forces be used to explain the observed structure? Documenting association between individuals in a three-dimensional animal aggregation and a focal point (e.g. aggregation center, specified individual, etc.) requires knowledge of the path taken by identified individuals relative to the path taken by the focus. Therefore, individual positions must be resolved in space (X, Y, Z) and plotted through time to obtain trajectories as well as more
specific information about the component of movement (both direction and acceleration) relative to the focus. Once relative movement with respect to the focus has been assessed for all individuals in the group, commonalities in movement pattern, perhaps resulting in group-level architecture, should emerge. Relatively few studies have examined how individuals within a school of fish interact, and how these interactions sum to produce the school as we know it. Aoki (1984) recorded the spatial positions of identified individuals, but limited the school to eight fish, constrained to a planar configuration. Knowledge of individual-based interactions has been technologically hard to obtain, because it requires four-dimensional spatiotemporal information on all individuals (three dimensions for position, and one dimension for time). Thus, many studies collect a series of three-dimensional static images, which lack the temporal component or the reference to identified individuals or both (Graves 1977; Partridge et al. 1980; Koltes 1984). Such studies, therefore, cannot provide direct answers to the questions we have raised above. Partridge (1981) obtained four-dimensional information on a school of saithe constrained to swim within a lighted area. Although individuals were identified, fish frequently swam into and out of the camera view, such that continuous path records were impossible to calculate. However, recent advances in technology (see Osborn Ch. 3; Jaffe Ch. 2) have allowed for automated collection of four-dimensional data on identified individuals, albeit for short periods of time. The other stumbling block has been a lack of analytical tools with which to properly examine four-dimensional data (see Partridge 1981). There are two major approaches to quantifying movements of organisms. Movement of many individuals past fixed locations can be assessed through time, or identified individuals can be followed. 
The first approach is useful if the group size and/or spatial dimensions preclude the second approach. Turchin (Ch. 7) explores the uses of both approaches and provides an example of the former. Our analysis of fish schools adopts the latter method, based on the point of view centered on the moving individual. Individual movement is characterized by velocity and acceleration, which can be influenced by the position of the organism relative to other objects in its environment. The magnitude and the direction of acceleration can be used to formulate hypotheses about cues that individuals use to stay within the group. In this chapter we demonstrate how the individually based approach can be used to identify potential focal points of attraction/repulsion with a data set on the positions of all individuals within congregations of fish, as they move through three-dimensional space and time. Once biologically relevant foci have been identified, our goal is a quantitative description of the rules of individual movement based on the balance of attractive and repulsive forces. Furthermore,
we interpret these rules in the context of functional considerations, that is, the selective advantages which may govern the maintenance and evolution of individual movement decisions (see Warburton Ch. 20).
9.2 Experimental setup: Data collection
Juvenile blacksmith, Chromis punctipinnis, are temperate, nearshore, upper-water-column planktivores, feeding in loose, nonpolarized groups of tens to hundreds of fish. When threatened by predators, either aquatic or aerial, the group coalesces into a tight, polarized school which quickly moves away from the oncoming threat, usually into nearby kelp or rock-reef. Thus, while not an obligate schooling species, these fish are obviously gregarious at all times and are capable of assuming a packed, ordered arrangement on occasion. Observations of fish behavior took place at the Catalina Marine Science Center on Catalina Island, California, during 1989. A captive population was obtained by netting fish near the laboratory dock. The experimental setup consisted of a still-water Plexiglas tank (1 m³), placed in the middle of a white, featureless room. Thus, fish in the tank had no moving objects to react to (apart from each other). Three SVHS video cameras were placed 7.8 m from the center point of the tank, along each of three orthogonal axes. Thus, each dimension was captured redundantly by two camera views (i.e. X, Y; Y, Z; X, Z). The cameras were connected to three Panasonic video tape decks (VCRs) wired to start and stop simultaneously, recording at 30 frames/sec. As the videos recorded, a timecode stamp was laid down on each tape, so that each frame possessed a date and a minute:second:frame code. In this way, frames from all cameras could be matched in time. For each experiment, the tank was filled with filtered seawater, and 5, 10, or 15 individuals were chosen from the captive population and placed in the experimental tank for a one-hour acclimatization period. After the initial transfer, the fish were totally secluded, as all subsequent manipulations were remotely controlled from an adjacent room. The VCRs were set to record for half an hour.
After the recording session finished, the fish were removed and returned to the wild, several hundred meters away from the original capture location, to minimize the chance of recapture. Altogether, nine replicates of each fish congregation size (5, 10, and 15) were taped. Small segments of each taping were randomly chosen for analysis. Information from these segments was digitized in 10-sec clips (300 frames) by an automatic framegrabber (Motion Analysis VP310) connected to a Sun Microsystems 3/110 workstation. Therefore, for each sequence, 900 frames (i.e. 300 from each of three cameras) were processed to obtain three-dimensional
information. Details of the digitization and three-dimensional data collection are provided elsewhere (Osborn Ch. 3). Briefly, the timecode stamp on each frame allowed individual frames from each camera to be readily identified. Rather than digitize the entire image, the VP310 records only the position of pixels along lines (edges) of sufficiently high contrast. For example, a dark fish against a light background would be digitized as an outline. Reducing a complex image to a set of perimeters allows a much larger number of frames to be digitized, at the expense of detailed information about individuals (e.g. fin movement, color pattern, etc.). When all views had been digitized, each clip was trimmed so that the start and stop frames of the sequence were identical among the views. To streamline the computational process, every other frame was deleted from each view, so that the effective recording speed became 15 frames/sec. The three-dimensional coordinates of the centroid of each fish were then calculated by the Motion Analysis EV3D program. Centroids were defined as the average X and Y positions of the set of pixels defining the perimeter of each fish, for each frame. The final centroid position (X, Y, Z) was resolved across the three camera views by matching redundant axes (i.e. X, Y; Y, Z; X, Z). Calculated centroids were within 0.5 cm of the true center of the fish. Error was a by-product of the fact that fish shape (and thus the two-dimensional center point) differed between camera views. Within the EV3D program, all fish were assigned an identifying number (1 through N) at the beginning of each 10-sec clip. The program then kept track of each individual by plotting its trajectory and forecasting the most likely volume within which to search for the next point. Mis-assignments were corrected by the program operator and occurred with regularity only at higher school sizes (i.e. >10 fish).
The final data sets contained an X, Y, Z coordinate for each individual fish, located in the approximate center of the fish, for every second frame, over a 10-sec interval, for each school.
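The resolution of each centroid across the three orthogonal views can be sketched as a simple averaging of the redundantly observed axes. This is an illustrative reconstruction, not the EV3D algorithm; the function name and the assumption of a shared, pre-calibrated world frame are ours.

```python
def resolve_centroid(xy_view, yz_view, xz_view):
    """Resolve one (X, Y, Z) centroid from three orthogonal camera views by
    averaging the redundantly observed axes. Each view supplies two world
    coordinates: (X, Y) from the camera on the Z axis, (Y, Z) from the
    X-axis camera, and (X, Z) from the Y-axis camera. Assumes the views are
    already calibrated to a common world frame (our simplification)."""
    x = (xy_view[0] + xz_view[0]) / 2.0   # X is seen by two cameras
    y = (xy_view[1] + yz_view[0]) / 2.0   # Y is seen by two cameras
    z = (yz_view[1] + xz_view[1]) / 2.0   # Z is seen by two cameras
    return (x, y, z)
```

Disagreement between the two observations of an axis (here simply averaged away) is the source of the roughly 0.5-cm centroid error noted above.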
9.2.1. Analytical approach
Our analytical approach was two-pronged. First, we attempted to discover whether the relative positions of fish in a given school were structured to any degree (i.e. nonrandom). We defined the presence of structure by testing the spatial associations between nearest neighbors, as well as between all pairwise combinations of known fish (e.g. fish 1 and fish 5; fish 3 and fish 7, etc.), against an expectation of random positions throughout the tank, as well as against random positions within a smaller volume constrained by the separation between individuals.
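The first prong, comparing observed nearest-neighbor distances (NND) with those of randomly placed fish, might be sketched as follows. Function names are ours; the published comparisons were Student t-tests on the resulting NND distributions.

```python
import random

def nnd(points):
    """Nearest-neighbor distance (same units as the input) for each point
    in a list of (x, y, z) positions."""
    out = []
    for i, p in enumerate(points):
        out.append(min(sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
                       for j, q in enumerate(points) if j != i))
    return out

def random_school(n, side=100.0, rng=random):
    """n positions drawn uniformly inside a cubic tank (side in cm):
    the null expectation against which a real school is compared."""
    return [tuple(rng.uniform(0.0, side) for _ in range(3)) for _ in range(n)]
```

A real school is then called clumped if its mean NND is significantly smaller than the mean NND of many draws of random_school(n) for the same school size n.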
After finding nonrandom associations between fish, we explored the data set for potential attraction/repulsion foci which might lead to the observed structures. The basic premise of this analytical approach was that fish react to changes in the positions of other members of the school by changing either the direction or the speed (or both) of their movement; in other words, they accelerate. Acceleration is the time derivative, or temporal rate of change, of the fish's velocity, and the second time derivative of its spatial position, X. Because fish positions were measured at discrete time intervals, we use the discrete version of the second derivative:

A_t = (X_{t+T} - 2X_t + X_{t-T}) / T^2    (9.1)

where A_t is the acceleration at time t, X_t is the spatial position at time t, and T is the time interval between successive observations (Fig. 9.1). Like X_t, A_t is a three-dimensional vector. The discrete acceleration measured according to the above formula is a finite-time approximation of the instantaneous acceleration. Thus, the time interval T
Figure 9.1. Diagrammatic representation of the component of an individual fish's discrete acceleration, A_t, projected onto the direction toward the attraction focus. If the projection is positive, the fish is accelerating toward the focus. A negative value indicates the fish is accelerating away from the focus.
should not be too long, or the approximation will be too coarse-grained. However, if the time interval is too small, then the opposite problem of redundancy is encountered. For example, when a fish perceives that it is too close to a neighbor, it will turn (accelerate) away. If the data are collected too frequently, then the acceleration of the fish will occupy several time frames, and discrete accelerations calculated during consecutive frames will not be independent. Indeed, we observed that there were positive serial correlations between discrete accelerations when calculated at time intervals of 1/15 of a second. In other words, the data set is oversampled, and subsequent data points are somewhat redundant (see also Partridge 1981). To overcome this problem, we increased the time interval until the autocorrelation between successive accelerations disappeared. This occurred at T = 0.2 sec, which we subsequently used in our analyses. We are primarily interested in the behavioral response of the fish to some "congregation focus," which could be a neighbor, a group of neighbors, or the entire school. Accordingly, we started with the simplest subset - an individual's nearest neighbor - and then incremented the subset by adding the next nearest neighbor, and so on, until the focus eventually became synonymous with the entire school. For each subset of neighbors (NN1 through the school), we defined the potential focus as the centroid of all relevant fish positions. Then, for each fish, we calculated the projection of its acceleration vector on the direction toward the potential focus of attraction (Fig. 9.1). If this projection is positive, then the fish is accelerating toward the focus; if negative, then the fish is accelerating away from it. Our analysis focused on this component of acceleration in the direction of a focus.
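Eq. (9.1) and the projection of Fig. 9.1 can be sketched directly in code. Variable and function names are ours; positions are (x, y, z) tuples sampled T seconds apart (e.g. T = 0.2 sec, as chosen above).

```python
def discrete_accel(x_prev, x_now, x_next, T):
    """Eq. (9.1): A_t = (X_{t+T} - 2*X_t + X_{t-T}) / T**2, applied
    componentwise to (x, y, z) position tuples sampled T seconds apart."""
    return tuple((n - 2.0 * c + p) / T ** 2
                 for p, c, n in zip(x_prev, x_now, x_next))

def accel_toward_focus(accel, position, focus):
    """Component of the acceleration vector along the unit vector from the
    fish toward the focus (Fig. 9.1): positive means accelerating toward
    the focus, negative means accelerating away from it."""
    d = [f - p for f, p in zip(focus, position)]
    norm = sum(c * c for c in d) ** 0.5
    if norm == 0.0:
        return 0.0                      # fish sits exactly on the focus
    return sum(a * c for a, c in zip(accel, d)) / norm
```

The focus argument here would be the centroid of whichever neighbor subset (NN1 through the whole school) is under consideration.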
9.3 Group-level patterns
In the absence of any disturbance, juvenile blacksmith explored the limits of the tank, both as singletons and as a group. Individuals frequently left the main congregation, and then later rejoined it. Although undisturbed fish did not display any polarized or otherwise recognizably regular configuration, they did appear to be clumped together nonrandomly. When tested against a random arrangement of positions within the volume of the tank, real fish were significantly more clumped than their random counterparts (Student t-tests, P < 0.001 for all school sizes; Fig. 9.2a). However, this comparison is somewhat artificial in that blacksmith were originally chosen because they aggregate. Therefore, the data sets (real and random) were constrained by excluding stragglers, defined as those individuals greater than 18 cm (approximately 3 body lengths) from any other fish. The exclusion of stragglers had several effects: First, sample sizes (the number of nearest-neighbor distances (NND) calculated) were lower in both
Figure 9.2. The mean and standard deviation of the distance between each fish and its nearest neighbor (nearest-neighbor distance, NND, in cm) for randomly derived positions (black) and real fish positions (white). Numbers above each bar are sample sizes (= the number of NNDs calculated per time sequence). Asterisks indicate significant differences. (A) All fish in the tank included. (B) Stragglers, defined as those fish greater than 18 cm (3 body lengths) away from any other fish, eliminated from the data sets.
data sets, albeit much more dramatically in the random position data set (Fig. 9.2a, b). Second, with the effect of these stragglers removed, the remaining fish were more closely associated, although the real schools were tighter (Student t-tests, P < 0.001 for 10- and 15-fish school sizes; Fig. 9.2b). There was no apparent difference between random and real schools at the 5-fish school size; however, this is quite probably due to the extremely small sample size of the random school (6 NND < 18 cm out of a possible 240). Finally, fish in real schools appeared to maintain more constancy in their nearest-neighbor associations (Fig. 9.2b). If the congregations had been rigid, distances between identified pairs would have varied little, resulting in a small measure of deviation around each mean distance, regardless of its absolute size (i.e. whether the fish pair were nearest neighbors or far apart). Conversely, if the fish had been swimming randomly with respect to each other, any measure of deviation around the mean distance between identified pairs would have been high, because of the total lack of linkage between pairs of fish. We would expect real fish to fall somewhere between these two extremes. Obviously, linkages between fish pairs are not rigid, but perhaps elastic. The degree of elasticity may depend on the species and the situation. Although the blacksmith did remain clumped within the boundaries of the tank, they did not maintain any degree of fixed architecture with respect to each other (Fig. 9.3). The range and the shape of the distribution of these pairwise
Figure 9.3. Box-and-whisker plots (vertical bar - mean, box - standard deviation, horizontal line - range) of the standard deviations of all paired distances between all identified individuals in all schools of each size (5, 10, and 15), respectively. For example, within a single 10-sec video clip, the distance between fish 1 and fish 2 is recorded for every frame (= 150 frames). Then a mean and standard deviation are calculated for each pair. The range and shape of the distribution indicate how much individuals move with respect to each other. (X-axis: standard deviation of pairwise distances between all individuals, 0-30 cm.)
Figure 9.4. The mean and standard deviation of the nearest-neighbor distance (cm) for each replicate school (stragglers excluded), plotted by school size (5, 10, and 15 fish). The dashed lines indicate the grand mean for each school size.
deviations are large and unimodal, respectively, indicating that these fish varied between the two extremes. While some fish pairs may have remained more or less equidistant (low standard deviation), other fish pairs ranged widely in interfish distance. Therefore, while fish are not randomly distributed about the tank, they are also far from forming any type of rigid association. Constantly changing the distance between individuals is not necessarily synonymous with a total lack of three-dimensional structure. It may simply mean that associations between unique pairs of individuals are constantly forming and breaking as the fish move past each other, regardless of the degree of regularity of interfish spacing. Across all school sizes, blacksmith maintained a fairly regular distance between themselves and their nearest neighbors, regardless of the identity of that individual (approximately 11 cm, or 2 body lengths; Fig. 9.4). Thus, in the absence of any external structuring force, such as predation or food, the fish nevertheless appeared to congregate and maintain a set average distance between adjacent individuals, even though the identity of nearest-neighbor pairs was constantly changing.
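The elasticity measure behind Fig. 9.3 - the standard deviation, over frames, of the distance between each identified pair - can be sketched as follows. The function name and data layout are ours.

```python
from itertools import combinations

def pairwise_distance_sd(trajectories):
    """trajectories: dict mapping fish id -> list of (x, y, z) positions,
    one per frame. Returns {(i, j): standard deviation over frames of the
    i-j distance}; low values mean a rigid pair, high values a loose one."""
    ids = sorted(trajectories)
    n_frames = len(trajectories[ids[0]])
    out = {}
    for i, j in combinations(ids, 2):
        ds = [sum((a - b) ** 2
                  for a, b in zip(trajectories[i][t], trajectories[j][t])) ** 0.5
              for t in range(n_frames)]
        mean = sum(ds) / n_frames
        out[(i, j)] = (sum((d - mean) ** 2 for d in ds) / n_frames) ** 0.5
    return out
```

A perfectly rigid school would give standard deviations near zero for every pair; random swimming would give uniformly high values.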
9.4 Attraction/repulsion structuring
Maintenance of the congregation, in the absence of external forcing factors, implies that the sources of attraction/repulsion are the fish themselves. How many of their neighbors do fish pay attention to? Fish could be attracted to all school members equally, or they could respond to some subset of the school. Furthermore, any particular focus could be attractive, repulsive, or neutral, depending on the distance separating it from the fish. We explored this issue by examining the influence of a range of foci on fish movement. Foci ranged from an individual's nearest neighbor to the centroid of the school. We regressed the component of each individual's acceleration in the direction of the focus on the distance between the focus and the individual. Distance had to enter the model, because the degree of attraction varies with distance to that focus (see below). This procedure was iterated for all fish in the tank. The influence of the focus on individual movements was quantified by the coefficient of determination of the regression (R2). Our first observation was that the proportion of variance explained by the distance to the focus (for all subsets) was quite low (Fig. 9.5). This means that individual fish move quite freely with respect to their neighbors, rather than constantly accelerating to adjust their position with respect to the positions of other school members. Second, and more interestingly, the highest R2 values tended to be at the extremes of the continuum (Fig. 9.5). In other words, indi-
[Figure 9.5: only fragments survive in the scan - a legend distinguishing 5-, 10-, and 15-fish schools, y-axis tick labels from .04 to .12, and the axis fragments "NN" and "Orientation Angle, θ (a)".]
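The per-fish regressions summarized by R2 can be sketched with a plain least-squares fit. The assumption of a simple linear model is ours (the chapter does not specify the regression's functional form), and the function name is hypothetical.

```python
def r_squared(x, y):
    """Coefficient of determination for the least-squares line y ~ a + b*x,
    with x = distance from fish to focus and y = acceleration component
    toward that focus. Returns 0.0 for degenerate (constant) inputs."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)
```

Computed per fish and per neighbor subset (NN1 through the school centroid), these R2 values are what Fig. 9.5 plots against subset size.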