Lecture Notes in Geoinformation and Cartography Series Editors: William Cartwright, Georg Gartner, Liqiu Meng, Michael P. Peterson
For further volumes: http://www.springer.com/series/7418
Vasily Popovich · Christophe Claramunt · Thomas Devogele · Manfred Schrenk · Kyrill Korolenko

Editors
Information Fusion and Geographic Information Systems: Towards the Digital Ocean
Editors

Prof. Vasily V. Popovich
Deputy Director of SPIIRAS for Research
14 Liniya, 39, SPIIRAS
St. Petersburg, 199178, Russia
[email protected]
[email protected]

Prof. Thomas Devogele
Naval Academy Research Institute
Brest Naval, France
[email protected]

Prof. Christophe Claramunt
Naval Academy Research Institute
Brest Naval, France
[email protected]

Dr. Manfred Schrenk
CEIT ALANOVA gemeinnützige GmbH
Inst. Stadt, Verkehr, Umwelt und Informationsgesellschaft
Am Concorde Park 2, 2320 Schwechat, Austria
[email protected]

Kyrill Korolenko
NAVSEA, Chief Scientist/NUWC
Code 1543, B1320 Howell St. 1176
Newport, RI 02841-1708, USA
[email protected]

ISSN 1863-2246  e-ISSN 1863-2351
ISBN 978-3-642-19765-9  e-ISBN 978-3-642-19766-6
DOI 10.1007/978-3-642-19766-6
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2011931203
© Springer-Verlag Berlin Heidelberg 2011
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Cover design: SPi Publisher Services
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
This volume contains the papers presented at the Fifth International Workshop on Information Fusion and Geographical Information Systems (IF&GIS’2011), held from May 10 to 11, 2011, in Brest, France, with a specific focus on maritime geographical information systems and the digital ocean. This series of workshops is organized by the OOGIS Research Laboratory of the St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS) and the Ecole Navale. In May 2011, IF&GIS’2011 was organized under the umbrella of the Third International Conference on Safer Seas. The workshop successfully continued the biannual series and, as always, attracted representatives of the academic and industrial communities from a wide range of disciplines, including computer science, geography, statistics, mathematics, hydrography, geomorphology, and environmental sciences. The objective of this workshop was to consider the development of Geographic Information Science with a specific relation to maritime GIS applications, GIS data fusion, and computational developments. The scope of the Fifth IF&GIS workshop and, hence, its proceedings address several research issues of modeling, analysis, information processing, and visualization of geographical information, with a specific application to the maritime environment. Maritime GISs are at the cutting edge of novel applications oriented to navigation safety, and several contributions presented at the workshop consider this aspect in depth. The papers selected by the international program committee reflect problems of multidisciplinary approaches to shipping and navigation, monitoring, planning and surveillance of maritime areas, as well as the concepts of maritime safety and security and maritime transportation.
While traditional topics of GIS conferences are well represented and still advancing, several new domains are emerging, with an emphasis on the problems of scientific and technological innovation, on possibilities to make the seas safer and cleaner, and on meeting new challenges arising from economic and shipping trends. The issues of information assurance and protection for GIS have also been integrated into the workshop program, including the protection of computer systems and the maintenance of complex information security through scanning protection mechanisms as applicable to GIS.
The submission process attracted 38 abstracts from 14 countries, and 26 papers were selected for reviewing. After thorough reviewing, the program committee accepted 15 papers from 9 countries for presentation and publication, as well as two invited papers. The accepted papers are organized into four sessions: Ontology and Programming Technologies for GIS and GIS Applications; GIS Data Fusion; GIS Algorithms and Computational Issues; and Novel and Emerging Marine GIS Research Areas. The IF&GIS’2011 program was enriched by contributions from two invited speakers: Distinguished Professor Jean-Claude Thill, University of North Carolina at Charlotte, USA, and Professor Bin Jiang, University of Gävle, Sweden, whose invited papers are also included in this volume. The success of the workshop is largely due to the joint efforts of sponsors, organizers, reviewers, and participants. We would like to acknowledge the contribution of the organizers of the Third International Conference on Safer Seas, as well as that of the program committee members, and especially praise the support and sincere effort of all the reviewers; our deep gratitude also goes to all the participants and all the authors of the submitted papers. We are grateful to our sponsors, the Russian Academy of Sciences, the US Office of Naval Research Global (ONR Global), Brest Métropole Océane, Technopôle Brest Iroise, Pôle Mer Bretagne, Région Bretagne, and the Conseil Général de Bretagne, for their generous support. Finally, we also wish to extend our gratitude to Springer’s team, managed by Dr. Christian Witschel and Agata Oelschläger, for their help and collaboration.

St. Petersburg, Russia
Brest, France
Vienna, Austria
Brest, France
Newport, RI, USA
May 2011
Vasily Popovich Christophe Claramunt Manfred Schrenk Thomas Devogele Kyrill Korolenko
Contents

Part I  Invited Papers

1  Is Spatial Really That Special? A Tale of Spaces
   Jean-Claude Thill

2  A Short Note on Data-Intensive Geospatial Computing
   Bin Jiang

Part II  Ontologies and Programming Technologies for GIS and GIS Applications

3  SOA-Based Modeling for Digital Ocean: Approach and Architecture
   Alexander Smirnov, Alexey Kashevnik, and Nikolay Shilov

4  Advantages of Intelligent Geographic Information System Research Prototype
   Ruslan Sorokin

5  Combining of Scanning Protection Mechanisms in GIS and Corporate Information Systems
   Igor Kotenko, Andrey Chechulin, and Elena Doynikova

Part III  GIS Data Fusion

6  On the Use of Three-Dimensional Self-Organizing Maps for Visualizing Clusters in Georeferenced Data
   Jorge M.L. Gorricha and Victor J.A.S. Lobo

7  Choice of Danger Prevention Strategy Based on Aggregative Indices Method
   Jan Ivakin

8  Beyond K-means: Clusters Identification for GIS
   Andreas Hamfelt, Mikael Karlsson, Tomas Thierfelder, and Vladislav Valkovsky

9  A Time Calibration Algorithm Based on the Data Fitting and Its Application in Fusion of Navigation Information
   Tian-zhen Wang, Tian-hao Tang, and Sheng-jie Zhang

Part IV  GIS Algorithms and Computational Issues

10  Remote Sensing Data Analysis Based on Hierarchical Image Approximation with Previously Computed Segments
    Philipp Galiano, Mikhail Kharinov, and Victor Kuzenny

11  Measuring Surrounding Space to Assess the Pedestrian Visual Aperture Angle in the Urban Fabric: Toward a Kurtosis-Based Isovist Indicator
    Thomas Leduc, Vincent Tourre, Philippe Woloszyn, and Francis Miguet

Part V  Novel and Emerging Marine GIS Research Areas

12  New Tools in Multisensor GIS Integration and Visualization
    Philippe Alain

13  Tsunami Risk Assessment Method Using Geoinformation Technology
    O.V. Smirnova

14  Real-Time 3D Monitoring of Marine Navigation
    Cyril Ray, Rafal Goralski, Christophe Claramunt, and Chris Gold

Index
Organization
Workshop Co-Chairmen
Christophe Claramunt (Naval Academy Research Institute, France)
Vasily V. Popovich (St. Petersburg Institute for Informatics and Automation, Russia)

Program Committee Chairmen
Vasily V. Popovich (St. Petersburg Institute for Informatics and Automation, Russia)
Christophe Claramunt (Naval Academy Research Institute, France)
Manfred Schrenk (MULTIMEDIAPLAN.AT, Vienna, Austria)
Thomas Devogele (Naval Academy Research Institute, France)
Kyrill V. Korolenko (NAVSEA, Newport, USA)
Program Committee
1. G. Andrienko (Fraunhofer Institute IAIS, Germany)
2. S. Axberg (National Defense College, Stockholm, Sweden)
3. T. Badard (Universite Laval, Quebec, Canada)
4. M. Bertolotto (University College Dublin, Ireland)
5. R. Billen (University of Liege, Belgium)
6. J. Breman (ESRI, USA)
7. M. Breunig (University of Osnabrueck, Germany)
8. J. Carswell (Dublin Institute of Technology, Ireland)
9. C. Chauvin (South Brittany University, France)
10. D. Engelke (Spatial Planning, pakora.net)
11. M. Florea (Thales Systems Division, Canada)
12. C. Gold (University of Glamorgan, Wales)
13. F. Gourmelon (University of Brest, France)
14. D.R. Green (University of Aberdeen, UK)
15. E. Guilbert (Hong Kong Polytechnic)
16. A. Hamfelt (Uppsala University, Sweden)
17. N. Hovanov (St. Petersburg State University, Russia)
18. B. Huang (Chinese University of Hong Kong)
19. G. Jacobson (Altusys, Boston, USA)
20. B. Jiang (University of Gävle, Sweden)
21. A. Khenchaf (ENSIETA, France)
22. I. Kotenko (SPIIRAS, St. Petersburg, Russia)
23. M. Kokar (Northeastern University, Boston, USA)
24. S. Levachkine (National Polytechnic Institute, Mexico)
25. V. Lobo (New University of Lisbon, Portugal)
26. N. Moldovyan (SPECTR, Russia)
27. M. Mueller (GEOMAR, Germany)
28. R. Pelot (Dalhousie University, Canada)
29. E. Peytchev (The Nottingham Trent University)
30. A. Poplin (University of Hamburg, Germany)
31. A. Radjesvarane (Naval Academy Research Institute, France)
32. E. Saux (Naval Academy Research Institute, France)
33. A. Smirnov (SPIIRAS, St. Petersburg, Russia)
34. A. Stepnowski (Gdansk University of Technology, Poland)
35. T. Tang (Shanghai Maritime University, China)
36. M. Thériault (Laval University, Canada)
37. R. Thibaud (Naval Academy Research Institute, France)
38. M. Tomko (University of Zurich, Switzerland)
39. V. Valavanis (Hellenic Centre for Marine Research, Greece)
Reviewers
Axberg Stefan (National Defense College, Stockholm, Sweden)
Badard Thierry (Universite Laval, Quebec, Canada)
Bertolotto Michela (University College Dublin, Ireland)
Berman Michael J. (ESRI, USA)
Breunig Martin (Research Centre for Geoinformatics and Remote Sensing, University of Osnabrueck, Germany)
Carswell James D. (Dublin Institute of Technology, Ireland)
Claramunt Christophe (Naval Academy Research Institute, France)
Corgne Samuel (University of Rennes 2, France)
Devogele Thomas (Naval Academy Research Institute, France)
Florea Mihai C. (Thales Systems Division, Canada)
Gourmelon Francoise (University of Western Brittany, France)
Green David (University of Aberdeen, UK)
Guilbert Eric (Hong Kong Polytechnic)
Hamfelt Andreas (Uppsala University, Sweden)
Jiang Bin (University of Gävle, Sweden)
Kokar Mieczyslaw M. (Northeastern University, USA)
Lobo Victor (New University of Lisbon, Portugal)
Moldovyan Nikolay (SPECTR, Russia)
Niculescu Simona (University of Western Brittany, France)
Pelot Ronald (Dalhousie University, Canada)
Peytchev Evtim (The Nottingham Trent University, UK)
Popovich Vasily V. (SPIIRAS, St. Petersburg, Russia)
Saux Eric (Naval Academy Research Institute, France)
Smirnov Alexander (SPIIRAS, St. Petersburg, Russia)
Thibaud Rémy (Naval Academy Research Institute, France)
Thériault Marius (Laval University, Québec, Canada)
Tomko Martin (University of Zurich, Switzerland)
Valavanis Vasilis (Hellenic Centre for Marine Research, Greece)
Part I
Invited Papers
Chapter 1
Is Spatial Really That Special? A Tale of Spaces Jean-Claude Thill
Abstract Spatial sciences have been operating under the mantra that “spatial is special.” The contention of this chapter is that this is contingent upon the implicit assumption that spatial and semantic information are fully separable. Large segments of spatial science research take space as an a priori and invariant backcloth over which events and experiences unfold. A considerable intellectual discourse on the nature and modes of existence of space has identified discordant perspectives that may undermine the universality of the innate distinctiveness of space. The chapter reviews and discusses some of the best established lines of thought on the understanding of space across disciplinary boundaries and underscores their relevance to spatially enabled research. Some of these ideas are articulated in the context of spaces reconstructed in relation to population migration in the USA.
1.1
Introduction
For geography and the other spatial sciences that have spun off from it over the past decades, space is nothing less than “the basic organizing concept” (Whittlesey 1954) according to which all other concepts and constructs assume a meaning. Place, location, direction, distance, interaction, scale, and the miscellaneous semantic information dimensions of features are all notions subservient to their grounding in some theorization of space. The premise is that organization and structure are revealed through the spatial agency of features, events, and experiences (Abler et al. 1971). Operating under the mantra of “Spatial is Special,” contemporary spatial scientists have forcefully staked their claim on this segment of the intellectual discourse. It appears that the case for spatial specificity was first couched in these terms by Luc Anselin (1990) in a contribution titled “What is special about spatial data?” Over the past two decades, this catchy phrase has garnered a great deal of interest
J.-C. Thill
Department of Geography and Earth Sciences, University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28223, USA
e-mail: [email protected]

V. Popovich et al. (eds.), Information Fusion and Geographic Information Systems: Towards the Digital Ocean, Lecture Notes in Geoinformation and Cartography 5, DOI 10.1007/978-3-642-19766-6_1, © Springer-Verlag Berlin Heidelberg 2011
and passion inside and out of spatial sciences. The argument is a cornerstone of Geographic Information Science instruction, as evidenced by the prominent place it occupies in textbooks such as Longley et al. (2005). It has been echoed by many other spatial scientists, including Anselin and Getis (1992), Goodchild (2003), and Andrienko et al. (2007), to name a few. In fact, this has become a well-recognized and well-used phrase, as evidenced by the 8,000 hits that a simple Google® internet search for the phrase “What is special about spatial?” reveals. As advanced by Anselin (1990), the distinguishing characteristics of spatial data boil down to the properties of spatial dependence and spatial heterogeneity. Spatial dependence expresses the tendency of phenomena occurring in closer proximity to exhibit more similarities than events further apart in space (spatial autocorrelation). That a process would drift across space, and thus be nonstationary in some sense, is what spatial heterogeneity is about. In this paper, the contention is that the claim that “spatial is special” is contingent upon the implicit assumption that geographic space and semantic information associated with events and features are fully separable. While the assumed conceptualization of space also serves as a backcloth for the overwhelming majority of spatial science research, a considerable intellectual discourse on the nature and modes of existence of space has identified discordant perspectives that may undermine the universality of the innate distinctiveness of space. The purpose of this paper is to question the “unquestionable” premise of spatial analysis that space exists independently of events and experiences. I review and discuss some of the best established lines of thought on the understanding of space across disciplinary boundaries and underscore their relevance to spatially enabled research from process and analytical perspectives. 
Some of these differences are articulated in the context of spaces reconstructed in relation to population migration in the USA.
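The spatial dependence property described above (spatial autocorrelation) is conventionally quantified with Moran's I statistic. As an illustration only, not part of the original text, here is a minimal NumPy sketch on an assumed toy chain of five regions; the function name and example values are invented for the demonstration:

```python
import numpy as np

def morans_i(values, w):
    """Moran's I: positive when nearby regions carry similar values
    (spatial dependence), negative when neighbors are dissimilar."""
    z = values - values.mean()
    return len(values) * (z @ w @ z) / (w.sum() * (z @ z))

# Toy chain of five regions; each region neighbors the adjacent ones.
W = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

clustered = np.array([1.0, 1.0, 1.0, 5.0, 5.0])  # similar values adjoin
dispersed = np.array([1.0, 5.0, 1.0, 5.0, 1.0])  # neighbors alternate

print(morans_i(clustered, W))  # positive: spatial autocorrelation
print(morans_i(dispersed, W))  # negative: checkerboard pattern
```

The clustered pattern yields a positive statistic and the alternating pattern a negative one, which is the behavioral signature of the spatial dependence Anselin singles out.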
1.2
Concepts of Spaces
The debate on space and on possible understandings of the concept of space is as old as the ability of human beings to cogently reflect on the human condition and their place in the earth environment. It is often closely intertwined with the discourse on the nature of time. Greek philosophers and mathematicians were the earliest contributors on record to formally articulate views on the nature of space. Their understanding was that of an absolute space. It took no less than 2,000 years for a relative concept of space to take hold in Western thought and science.
1.2.1
Absolute Space
From Aristotle to Euclid, Greek understanding of the world was that of a space comprising a primal, unchanging, and immutable empty expanse (void) occupied
with entities (atoms) referenced to a simple coordinate system. Euclidean geometry provided the abstract framework for reasoning and analysis in and on this absolute space. Physicist Isaac Newton imagined space as a kind of absolute grid, within which objects are placed and events occur. Space is objective and acts as a container with non-dynamical space-time structures and geometry to which objects are tied (Curry 1996). Immanuel Kant (1929) subsequently expounded on the same idea by saying that “Space is not an empirical concept which has been derived from outer experiences.” Therefore, as shown by Jammer (1969), Harvey (1969), and others, space is infinite, homogeneous, and isotropic. Abler, Adams, and Gould (1971) remarked that geographers of all stripes regarded space in absolute terms until the 1950s, whether they were proponents of the scientific method, like Christaller (1933), or of idiosyncratic studies, like Hartshorne (1939). Representation of the objective, absolute space has also been the ultimate goal of cartography and mapping sciences from their inception, at least in the western tradition. In spite of methodological impediments and the inherent difficulty of representing the curved reality supported by the earth’s surface, space absolutism has consistently been held as the gold standard (Forer 1978). Jumping on the bandwagon of the quantitative revolution in the social sciences, spatial analysis promptly embraced the view that space is consistently measurable, immutable, uniform, and independent of the social or environmental qualities of the space that is being investigated. The deep-seated belief here is that space and distance relationships between entities have a profound effect on semantic patterns that emerge over a certain territorial expanse (Cliff et al. 1975).
It is argued for instance that, because of spatial dependence, patterns of socio-ethnic population segregation take form in urbanized areas and ecological territories develop with precise mixes of fauna and flora, and so on. Espousing the paradigm of an a priori physical space, spatial statistics, and spatial econometrics (Anselin 1988; LeSage and Pace 2009) have blossomed to constitute one of the pillars of contemporary spatial sciences. Paradoxically, spatial effects are routinely captured through rather simplistic measurements of spatial relationships between entities. Gatrell (1983) is not afraid to use the word “naive” to qualify the mainstream portrayal of spatial relationships in spatial analysis. It is standard indeed to rely on a so-called spatial weight matrix W defined a priori as a time-independent row-standardized table of pairwise polygonal contiguities. First-order contiguity can be handled according to a queen criterion or the more restrictive rook criterion. Higher order adjacency can also be used, as well as some inverse function of distance between polygonal entities. If spatial econometric research is deeply wedded to Euclidean geometry, so is spatial theory as little research has been conducted with alternative geometries [see, for instance, Beguin (1979) and Rushton and Thill (1989)]. There is growing recognition however that, even if space acts as a container, it may not be isotropic and adjustments may be in order to prevent biases. In particular, phenomena such as car crashes, retail businesses, or stream pollution, which are known to happen only on one-dimensional spaces (roads, streets, streams, etc.) embedded in the two-dimensional Euclidean plane, have been analyzed on the basis of spatial weight matrices defined on network spaces (Yamada and Thill 2010; Steenberghen et al. 2004).
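The queen and rook contiguity criteria just mentioned can be made concrete with a short sketch. The regular-grid layout, the function name, and the row-standardization step below are illustrative assumptions (real studies build W from polygon adjacency, not grids):

```python
import numpy as np

def contiguity_weights(nrows, ncols, criterion="queen"):
    """Row-standardized spatial weight matrix W for a regular grid of cells.
    Rook: only shared edges count as contiguity; queen: shared corners too."""
    if criterion == "rook":
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:  # queen adds the four diagonal neighbors
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    W[r * ncols + c, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)  # each row sums to one

W_queen = contiguity_weights(3, 3, "queen")
W_rook = contiguity_weights(3, 3, "rook")
# The centre cell of a 3x3 grid has 8 queen neighbours but only 4 rook neighbours.
```

Row standardization is what makes W a table of pairwise weights rather than raw adjacencies; higher-order adjacency or inverse-distance weights would replace the 0/1 fill step.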
The concept of absolute space has been the cornerstone of geographic information systems (GIS) since their inception in the mid-1960s. Curry (1996) and other commentators have also argued that GIS has dramatically extended the shelf life of the concept of absolute space. The absolute view of space is consistent with both the field-based model and the object-based model of geospatial data, with the minor distinction that the spatial framework comes first in the former, while it becomes explicit only after objects have been identified, in the latter. Conventionally, in a GIS, a standardized coordinate system shared by all the features within a certain geographic perimeter and anchored in the absolute space is used. The elegant simplicity of this approach that mimics and automates the manual overlay of hardcopy maps has greatly facilitated the integration of spatial data. Spatial reasoning and spatial analytical processing within and across layers and scales of information are thus enabled. The integration of data referenced to multiple quantitative referencing systems is possible in a GIS provided that mapping functions between pairs of referencing systems are defined. For instance, a milepost referencing method can be integrated with data stored by mailing address and data referenced to a state plane coordinate system through a multilayered data modeling framework as presented in Dueker and Butler (2000). On the other hand, qualitatively referenced information has so far proved rather elusive to handle (Yao and Thill 2006). Qualitative locations (QL), for instance, have positional and semantic properties that vary across individuals and may vary across contexts (time of the day, purpose of a spatial query, etc.). This variability critically hinders GIS data handling and integration, as well as reasoning with other objective information layers stored in the GIS database. 
It has been shown in Yao and Thill (2005) and Yao and Thill (2007) that a QL-enabled spatial information system designed around the conventional principles of metric space embedded in GIS is feasible. Taken together, these results suggest that GIS can be divorced from the tyranny of spatial absolutism and articulate absolute and relative spaces cognized by various individuals under changing contexts.
1.2.2
Relative Space
A relativist view on space posits that it is not merely a neutral container for natural and socioeconomic processes, but that it is in turn constructed and continually reconstructed by the very events and experiences that take place within it (Jammer 1969). Early breaches to the hegemony of the concept of absolute space were exacted by physicist and philosopher Gottfried Leibniz, who held radical views on motion of material objects and energy that led him to posit that space is no more than the collection of relations between material entities in the world. These ideas were instrumental in fostering Albert Einstein’s theory of relativity where space and time are fused into a single construct known as space-time. Following the indictment of objectivism and spatial absolutism advocated by spatial analysis that was waged by humanists, structuralists, and behaviorists
largely on the ground that spatial patterns exhibited by semantic structures are not inexorably governed externally (tyranny of the physical space), the concept of a relative space defined by relationships among spatial features has attracted geographers and other spatial scientists (Gould 1991). Generally speaking, one concept of relative space should not be favored over another; they are complementary and collectively celebrate the richness of processes creating spatial structures. Couclelis (1992), Gatrell (1983), Forer (1978), and others effectively argued that there is no single best perspective for discussing and comprehending space. Selected schemes are examined later in this section. Since the 1960s, the idea of relative space has transpired through numerous mathematical models of socioeconomic processes (Couclelis 1992), but much less so for environmental processes (Meentemeyer 1989). If geographic space is indeed relative and if it is a creation of perceptions, experiences, interactions, and relationships (Gould 1991), there is no universal spatial “ground truth” anymore. Instead, space is eminently plastic (Forer 1978), fungible, and malleable. The prospect of working with a relative space of one form or another (hence, one of many possible spaces) may appear rather daunting given the contrasting differences that exist between absolute and relative spaces. Table 1.1 summarizes some of the salient characteristics of the two theorizations of space. Challenges presented by nonphysical spaces abound at many levels. They range from capturing the relevant semantic information that frames such spaces, to measuring their latent semantic dimensions, developing compatible data models and visualization forms, developing semantically truthful spatial reasoning and analytical tools, and processes of multispectral information fusion that are not restricted to Euclidean geometries. 
Table 1.1 Summary of properties of absolute and relative spaces

Absolute space | Relative space
Exists independent of any objects | In reference to relational processes only
A “container” | Contents
Geographical space | Attribute space, social space, behavioral space, etc.
Mathematical (Euclidean) space | Non-Euclidean space
Geographical distance | Relative (nongeographical) distance
Constant friction of distance, linear time | Time-space convergence/divergence, nonlinear time
Independent of observer, nonjudgmental observation | Perceptions and experiences shape one’s view of the world
Time is independent, constant and universal | Coexistence and connection of space and time: space-time
Space is fixed, a background for all features | Space and time as attributes of features
Objective | Objective or subjective
Compatible with field-based and object-based data models | Not directly compatible
Associated primarily with study of earth, conventional mathematical models | Associated primarily with studies of functions and interaction and patterns
Rigid geometric structure within which events and features are referenced | Location referenced in terms of relationships between features, topological and temporal
Spatial scientists have worked toward formalizations of spaces encapsulating various forms of relations and experiences, from cognitive to socioeconomic, and political. Constructing a relative space involves the observation, interpretation, and synthesis of processes and patterns of flux of changes within a knowledge domain (Wachowicz 1999). Consequently, a map can be regarded as “a visual or graphical representation of a space” (Gatrell 1983, pp. 4) and may exhibit only remote resemblance to the corresponding absolute (Euclidean) map because of distortions imparted by semantic relations between features. Map transforms and cartograms are common mapping techniques called upon to express cartographically a relative view of space. Various methods exist to create synthetic attribute maps. Some techniques process semantic attributes of features by collapsing measured dimensions to a manageable number of new latent dimensions. Through this exercise of unsupervised data reduction, an attribute space is elicited and observations are projected onto it. For instance, self-organizing maps serve to reveal the complex, unknown, and sometimes concealed structures of competition between airlines serving US airports in Yan and Thill (2009). On the other hand, the method of multidimensional scaling (MDS) works on a matrix of dissimilarities among observations. It provides a spatial configuration of observations such that pairwise distances between observations in this configuration match input dissimilarities as well as possible. Gatrell (1983) reports on countless studies that transcend the view of absolute space, including mappings of time spaces exhibiting space-stretching of urban regional environments along high-speed corridors and space-shrinking in zones where mobility is low and congestion is high, mappings of functional economic spaces, and mappings of cognitive spaces. 
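As a sketch of the MDS idea just described, the classical (Torgerson) variant can be written in a few lines of NumPy. This is an illustration only: the chapter does not specify an algorithm, and the function name and the toy one-dimensional configuration are assumptions:

```python
import numpy as np

def classical_mds(D, dims=2):
    """Classical (Torgerson) MDS: place observations in `dims` dimensions so
    that pairwise Euclidean distances approximate the dissimilarities in D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared dissimilarities
    eigval, eigvec = np.linalg.eigh(B)
    order = np.argsort(eigval)[::-1][:dims]  # keep the largest eigenvalues
    return eigvec[:, order] * np.sqrt(np.maximum(eigval[order], 0.0))

# Dissimilarities among four observations that happen to sit on a line:
coords = np.array([0.0, 1.0, 2.0, 3.0])
D = np.abs(np.subtract.outer(coords, coords))
X = classical_mds(D, dims=1)
# X recovers the configuration up to translation and reflection, so the
# pairwise distances in the embedding reproduce D.
```

When the dissimilarities are not exactly Euclidean, for example cognized distances, the embedding only approximates them, and the residual distortion is precisely what the map transforms and cartograms mentioned above visualize.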
Cognitive maps have, for instance, been instrumental in establishing that distances tend to be perceived logarithmically rather than linearly (remote locations appear proportionately closer than nearby ones) and that their perception is influenced by activity patterns. Maps that people carry in their minds are decidedly not Euclidean (Downs and Stea 1973)! Like time spaces and experiential spaces, they are subjective, whereas the economic space framed by transaction costs of commerce between places has an objective quality. Thus, the relative view of space is a radical departure from space absolutism. While choosing between the two views may depend on the purpose (Galton 2001), the relativist concept opens up the possibility to get to the core of processes that recursively create semantic realities with a geographic expression. For a broad class of spatiotemporal analyses, the concept of relative space strikes as more general and empirically more useful. However, it is not a matter of single-handedly replacing the old absolute paradigm with the concept of relative space, as the two concepts are complementary more than substitute (Peuquet 1994). With new possibilities come new challenges as well. In particular, to remain relevant, spatial analysis must articulate new ways to account for the fact that “semantic content” and “physical container” are not independent entities. Methods that are mindful of relevant socioeconomic and environmental processes are needed to integrate spatial and semantic information across scales. Because the essence of GIS is so tightly linked to the absolute view of space, challenges are numerous. Most notably, geosemantic
1 Is Spatial Really That Special? A Tale of Spaces
data models designed on the basis of object-oriented principles so as to not require the prior definition of a single georeferencing frame would squarely transport spatial sciences into a post-Newtonian mindset.
1.3 The Case of Cognized Migration Spaces
Permanent and long-distance relocation of people (also known as "migration" in the demography and geography literatures) has been known for many decades to respond to two major drivers, namely, the spatial separation between regions (associational properties) and the attractiveness of these regions (attributal properties). Migration flows are well predicted by so-called models of spatial interaction (Roy and Thill 2004), which exhibit formal analogies with Newton's law of gravitation. As a member of the spatial interaction model family, doubly constrained models exactly reproduce observed migrant supplies and demands at all origin and destination regions. This model is a pure flow distribution model aimed at capturing the pattern of region-to-region interactions, rather than the size of regional inflows and outflows. In his study of interstate migrations in the USA, Plane (1984) noted that absolute physical distances usually provide an inferior measure of the relative functional distances that affect the destination choices of migrants. Based on the view that migrants act on a cognized environment (space), rather than the physical space, to select migration destinations, he posited that a large portion of the error of the spatial interaction model in reproducing observed patterns of migration results from using Euclidean, physical distances instead of cognized distances. He proposed to use a reversed doubly constrained spatial interaction model, with observed state-to-state migration flows as input, to calibrate and estimate the cognized distances between states. As a result, the functional "migration space" surrounding origins and destinations can be reconstructed. This approach is used here to illustrate the dramatic differences that may exist between the absolute space and relative spaces, which in this case are subjective constructs of migrants.
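The doubly constrained model can be sketched as follows. This is a minimal NumPy implementation with an assumed exponential distance-decay function and made-up regional totals; operational models add calibration of the decay parameter and richer decay forms:

```python
import numpy as np

def doubly_constrained(O, D, dist, beta, iters=100):
    """Doubly constrained spatial interaction model:
    T[i,j] = A[i]*O[i]*B[j]*D[j]*exp(-beta*dist[i,j]),
    where the balancing factors A and B are found by iteration so that
    row sums equal the origin totals O and column sums the destination
    totals D."""
    F = np.exp(-beta * dist)                 # distance-decay matrix
    A = np.ones(len(O))
    B = np.ones(len(D))
    for _ in range(iters):
        A = 1.0 / (F @ (B * D))              # enforce origin (row) totals
        B = 1.0 / (F.T @ (A * O))            # enforce destination (column) totals
    return (A * O)[:, None] * (B * D)[None, :] * F

# Hypothetical three-region example (totals must balance: 230 = 230)
O = np.array([100., 50., 80.])               # out-migrants per origin
D = np.array([90., 60., 80.])                # in-migrants per destination
dist = np.array([[1., 4., 6.],
                 [4., 1., 3.],
                 [6., 3., 1.]])
T = doubly_constrained(O, D, dist, beta=0.5)
# Row sums reproduce O and column sums reproduce D, as the model requires.
```

The iteration is the familiar biproportional (Furness/IPF) balancing; it converges quickly for small systems such as this one.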
The reverse calibration procedure proposed by Plane (1984) renders a matrix of pairwise cognized distances between state centroids. The estimated associational properties can straightforwardly be compared to the physical distances to gain insight into the relative advantages and disadvantages that select states have in attracting migrants in comparison to the average relationship of the US migration system as a whole. We use distance cartograms to depict the cognized migration space from the perspective of each state of origin or state of destination. Each cartogram forms a local view of the perceived functional distance. Figures 1.1 and 1.2 depict the cognized migration spaces of Arizona immigrants and New York outmigrants, respectively, in year 2000. In these maps, each state is represented by its centroid and labeled with a two-letter abbreviation code. Both spaces reveal dramatic distortions with respect to the physical map of the USA, which underscores the non-Euclidean geometry of migration considerations.
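Plane's reverse calibration procedure is more involved than this, but its core algebraic step — solving the model equation for distance once the balancing factors are known — can be illustrated with hypothetical numbers:

```python
import math

# Suppose the doubly constrained model
#     T[i][j] = A[i]*O[i]*B[j]*D[j]*exp(-beta*d[i][j])
# has been calibrated, so A, B, and beta are known. Solving for d gives the
# cognized distance implied by each observed flow:
def cognized_distance(T_ij, A_i, O_i, B_j, D_j, beta):
    return math.log(A_i * O_i * B_j * D_j / T_ij) / beta

# Hypothetical numbers: a flow twice as large as the unadjusted product
# A_i*O_i*B_j*D_j would imply at unit decay translates into a shorter
# cognized distance between the two states.
d = cognized_distance(T_ij=25.0, A_i=0.02, O_i=100.0, B_j=0.5, D_j=50.0, beta=0.5)
```

A flow larger than physical distance would predict thus yields a cognized distance shorter than the physical one, which is exactly the kind of distortion the cartograms in Figs. 1.1 and 1.2 depict.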
J.-C. Thill
Fig. 1.1 Cognized migration space of Arizona immigrants
Fig. 1.2 Cognized migration space of New York State outmigrants
Yet, distortions are quite different for the two cases illustrated here. Among migrants bound for Arizona, cognized distances from states in the Midwest and Northeast are short, while they are notably stretched from Western states and Florida. Arizona is indeed the recipient of a large stream of retired migrants from the former two regions, who consequently relocate to Arizona at a rate higher than absolute distances would suggest; few Westerners or Floridians relocate to Arizona. Along the same lines, cognized distances between New York, on the one hand, and Southeastern and Southwestern states, on the other hand, are diminished as a result of the large migration of New Yorkers (often retired) to these states.
1.4 Conclusions
In this chapter, I have argued that the absolute view of space that has prevailed in the spatial sciences for many decades has led to the assertion that geographic space and semantic information belong to two separate realms of consideration. In fact, geographic space is often mediated and filtered through cognitive and mental processes, or constructed through socioeconomic or environmental agency, and therefore separability is an overstatement. The belief in the uniqueness and absolutism of space should give way to a broader framework rooted in relational principles, in which geosemantic spaces can be complementarily articulated and their dynamical properties fully vetted.
References

Abler R, Adams JS, Gould P (1971) Spatial organization. Prentice-Hall, Englewood Cliffs, NJ
Andrienko G, Andrienko N, Jankowski P, Keim D, Kraak MJ, MacEachren A, Wrobel S (2007) Geovisual analytics for spatial decision support: setting the research agenda. International Journal of Geographical Information Science 21:836–857
Anselin L (1988) Spatial econometrics: methods and models. Kluwer Academic, Dordrecht
Anselin L (1990) What is special about spatial data? Alternative perspectives on spatial data analysis. In: Griffith DA (ed) Spatial statistics: past, present and future. Institute of Mathematical Geography, Ann Arbor, MI, pp 63–77
Anselin L, Getis A (1992) Spatial statistical analysis and geographical information systems. Annals of Regional Science 26:19–33
Beguin H, Thisse JF (1979) An axiomatic approach to geographical space. Geographical Analysis 11:325–341
Christaller W (1933) Central places in Southern Germany. (Trans)
Cliff AD, Haggett P, Ord JK, Bassett K, Davies RB (1975) Elements of spatial structure: a quantitative approach. Cambridge University Press, London
Couclelis H (1992) Location, place, region and space. In: Abler RF, Marcus MG, Olson JM (eds) Geography's inner worlds. Rutgers University Press, New Brunswick, NJ, pp 215–233
Curry M (1996) On space and spatial practice in contemporary geography. In: Earle C, Mathewson K, Kenzer M (eds) Concepts in human geography. Rowman and Littlefield, Lanham, MD, pp 3–32
Downs RM, Stea D (1973) Image and environment: cognitive mapping and spatial behavior. Aldine, Chicago
Dueker KJ, Butler JA (2000) A geographic information system framework for transportation data sharing. In: Thill JC (ed) Geographic information systems in transportation research. Pergamon, Oxford, pp 13–36
Forer P (1978) A place for plastic space? Progress in Human Geography 3:230–267
Galton A (2001) Space, time, and the representation of geographical reality. Topoi 20:173–187
Gatrell AC (1983) Distance and space: a geographical perspective. Clarendon Press, Oxford
Goodchild MF (2003) GIS: finding the mainstream. GIS Development 7(1):8–15
Gould P (1991) Dynamic structures of geographic space. In: Brunn SD, Leinbach TR (eds) Collapsing space and time: geographic aspects of communications and information. HarperCollins, London
Hartshorne R (1939) The nature of geography: a critical survey of current thought in light of the past. Association of American Geographers, Lancaster, PA
Harvey D (1969) Explanation in geography. Edward Arnold, London
Jammer M (1969) Concepts of space. Harvard University Press, Cambridge, MA
Kant I (1929) Critique of pure reason (trans. Smith NK). Macmillan, Basingstoke and London
LeSage J, Pace RK (2009) Introduction to spatial econometrics. CRC Press, Boca Raton, FL
Longley PA, Goodchild MF, Maguire DJ, Rhind DW (2005) Geographic information systems and science. John Wiley, Chichester, UK
Meentemeyer V (1989) Geographical perspectives of space, time, and scale. Landscape Ecology 3:163–173
Peuquet DJ (1994) It's about time: a conceptual framework for the representation of temporal dynamics in geographic information systems. Annals of the Association of American Geographers 84:441–461
Plane D (1984) Migration space: doubly constrained gravity model mapping of relative interstate separation. Annals of the Association of American Geographers 72:244–256
Roy JR, Thill JC (2004) Spatial interaction modeling. Papers in Regional Science 83(1):339–361
Rushton G, Thill JC (1989) The effect of the distance metric on the degree of spatial competition between firms. Environment and Planning A 21:499–507
Steenberghen T, Dufays T, Thomas I, Flahaut B (2004) Intra-urban location and clustering of road accidents using GIS: a Belgian example. International Journal of Geographical Information Science 18(2):169–181
Wachowicz M (1999) Object-oriented design for temporal GIS. Taylor and Francis, London
Whittlesey D (1954) The regional concept and the regional method. In: James P, Jones CF (eds) American geography: inventory and prospect. Syracuse University Press, Syracuse, NY, pp 21–68
Yamada I, Thill JC (2010) Local indicators of network-constrained clusters in spatial patterns represented by a link attribute. Annals of the Association of American Geographers 100:1–17
Yan J, Thill JC (2009) Visual data mining in spatial interaction analysis with self-organizing maps. Environment and Planning B 36:466–486
Yao X, Thill JC (2005) How far is too far? A statistical approach to context-contingent proximity modeling. Transactions in GIS 9(2):157–178
Yao X, Thill JC (2006) Qualitative queries with qualitative locations in spatial information systems. Computers, Environment and Urban Systems 30(4):485–502
Yao X, Thill JC (2007) Neurofuzzy modeling of context-contingent proximity relations. Geographical Analysis 39(2):169–194
Chapter 2
A Short Note on Data-Intensive Geospatial Computing Bin Jiang
For over 1,000 years, human knowledge has been recorded in printed formats such as books, journals, and reports archived and stored in libraries or museums. In recent decades, since the invention of computers, human knowledge has also come to be hidden in digital data. Although part of the data has been researched and reported, the pace of the data explosion has been dramatic and exceeds what the printed medium can deal with. This is particularly true for the twenty-first century, with enormous volumes of scientific data, including geospatial data (NRC 2003), collected by all kinds of instruments in a variety of disciplines, mostly on a 24/7 basis. The lack of a related cyberinfrastructure for data capture, curation, and analysis, and for communication and publication alike, led Jim Gray to lay out his vision of the fourth research paradigm – data-intensive science (Bell et al. 2009). This fourth paradigm differs fundamentally, in terms of the techniques and technologies involved, from the third paradigm of computational science (ca. 50 years ago) on simulating complex phenomena or processes. Before the computer age, there was only empirical science (ca. 1,000 years ago), and then theoretical science (ca. 500 years ago) like Newton's Laws of Motion and Maxwell's Equations. Nowadays, scientists do not look through telescopes but instead mine massive data for research and scientific discoveries. Every discipline x has evolved, over the past decades, into two computer- or information-oriented branches, namely x-informatics and computational x (Gray 2007, cited from Hey et al. 2009). Geography is no exception. Both geoinformatics, which deals with the data collection and analysis of geoinformation, and computational geography or geocomputation (Gahegan 1999), which has to do with simulating geographic phenomena and processes, are concurrent research issues under the banner of geographic information science.
The data size or scale under the fourth paradigm and for data-intensive computing is up to terabytes or petabytes using high-performance supercomputers.
B. Jiang Department of Technology and Built Environment, Division of Geomatics, University of Gävle, 80176 Gävle, Sweden e-mail:
[email protected] V. Popovich et al. (eds.), Information Fusion and Geographic Information Systems: Towards the Digital Ocean, Lecture Notes in Geoinformation and Cartography 5, DOI 10.1007/978-3-642-19766-6_2, # Springer-Verlag Berlin Heidelberg 2011
Table 2.1 Evolution of PC in terms of data size and cost [the last two columns are partially according to Steve Kopp's talk (Kopp 2010)]

Year    Data size (byte)    1 GB hard disk ($)    256 MB RAM ($)
1980    1                   223,000               200,000
1990    1,000               10,000                50,000
2000    1,000,000           10                    50
2010    1,000,000,000       0.1                   5
What data scale, then, do we deal with daily, using an ordinary personal computer (PC) for geospatial analysis? Let us assume that we were able to deal with data at the scale of bytes in the 1980s. According to Moore's law, by the year 2010 we should be able to deal with gigabytes (Table 2.1). Let us ask ourselves: what data scale are we dealing with in analysis and computation today? Most of us are probably still constrained to the scale of megabytes, if not kilobytes. It seems we are far behind the times, using far less than the computing power of the modern PC. In the meantime, if we take a look at the cost of 1 GB of storage and 256 MB of memory for a PC, it has been plummeting over the three decades since 1981 (Table 2.1), when IBM introduced a PC powered by an Intel 8088 processor. In what follows, I will use three examples to illustrate that the scale of gigabytes is quite feasible for a state-of-the-art PC or affordable server-like machines. Triggered by an obvious morphological difference between European cities (organic or self-organized) and US cities (more planned in nature), I started a study trying to illustrate the difference in a quantitative manner. This was the original study motivation. However, instead of the seemingly obvious difference, the study ended up with a similarity – a universal pattern of urban street networks (Jiang 2007). This pattern can be described by the 80/20 principle: about 80% of streets are less connected (below an average of, e.g., 4 or 5), whereas 20% of streets are highly connected (above the average). This discovery was quite a surprise to me. In the course of the investigation, 40 cities of different sizes (the 10 largest, the 10 smallest, and 20 middle-sized cities) were chosen from over 4,000 US cities, and all bear this pattern or regularity. The pattern holds true even for the smallest US town, Duffield in Virginia. The study goes beyond US cities and includes some cities from Israel, China, and Germany.
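The head/tail split around the mean that defines this 80/20 pattern is easy to check on any vector of street connectivity values; the toy figures below are invented for illustration (a real study would derive connectivity from a named-street graph):

```python
# Count how many streets fall below the mean connectivity. For heavy-tailed
# distributions such as street connectivity, a large majority of values sit
# below the mean — the signature of the 80/20 pattern described above.
# (Toy connectivity values, invented for illustration.)
connectivity = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5, 8, 12, 15, 21, 30, 44]

mean = sum(connectivity) / len(connectivity)          # 8.15 for this toy list
below = [c for c in connectivity if c < mean]
share_below = len(below) / len(connectivity)          # 0.75 here
```

For a symmetric distribution the share below the mean would be near 50%; shares around 70–80%, as here, indicate the heavy tail that the urban street-network studies report.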
The pattern still remains valid. All these data have been archived on the web for free access at http://fromto.hig.se/~bjg/PhyAData/. In the study, the total size of the data examined is up to a couple of gigabytes. The data used in the study come from the freely available US Census Bureau TIGER database. GPS data constitute one of the emerging geospatial data formats of the past decade or so. Pervasive GPS units in various mobile devices provide an unprecedented way of collecting individual-based geospatial data. Similar individual-based data include the data from mobile phone transmitters about phone users' daily movement trajectories. In one study, aimed at exploring GIS-based mobility information for sustainable urban planning, we collaborated with a local taxi company and collected massive GPS data (over 6 GB) about human movement patterns. We analyzed over 72,000 people's moving trajectories, obtained from
50 taxicabs during a 6-month period in a large street network involving four cities. The human movement demonstrates a scaling in terms of travel length (see video clips at http://fromto.hig.se/~bjg/gps/). We illustrated that goal-oriented moving behavior has little effect on the overall traffic distribution (Jiang et al. 2009; see the applet available at http://fromto.hig.se/~bjg/movingbehavior/). We further implemented some agent-based simulations (with both random and purposive moving agents), which took days on a state-of-the-art personal computer to get one result, and a couple of weeks of computer time in total to get all the simulations done. The study ended up with the finding that purposive or random moving behavior has little effect on the overall mobility pattern (Jiang and Jia 2011). This finding indicates that, given a street network, the movement patterns generated by purposive human beings and by random walkers differ little. We are still bound by a nondisclosure agreement, for the taxi company is reluctant to release this part of the GPS data for research purposes. However, we have made the massive simulated data available at http://fromto.hig.se/~bjg/PREData/. Not only analysis but also computation can be done at the gigabyte scale for geospatial data. For example, the computation of shortest paths tends to be intensive and time consuming, in particular when a graph is huge, involving hundreds of millions of vertices and edges. Because of the enormity of road networks, and the demand for real-time computation of shortest paths in map services like Google Maps, it would take dozens of gigabytes of memory to accommodate the road network of the entire world. In this connection, the only data source available at this scale that meets the testing requirement is probably OpenStreetMap (OSM), a wiki-like collaboration to create a free, editable map of the world using data from portable GPS units, aerial photography, and other free sources.
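At the heart of such routing computations is an ordinary shortest-path search; the sketch below is textbook Dijkstra over a hypothetical five-node road fragment. Production routing engines add heavy preprocessing (e.g., hierarchical speed-up techniques) to cope with continental-scale graphs, but the core computation looks like this:

```python
import heapq

def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm; graph maps node -> list of (neighbor, weight)."""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry, skip
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float("inf")                   # target unreachable

# Hypothetical five-node directed road fragment (edge weights in km)
roads = {
    "A": [("B", 2.0), ("C", 5.0)],
    "B": [("C", 1.0), ("D", 4.0)],
    "C": [("D", 1.0)],
    "D": [("E", 3.0)],
}
length = shortest_path_length(roads, "A", "E")   # 2 + 1 + 1 + 3 = 7.0
```

The memory figures quoted in the text come from keeping such a graph (nodes, adjacency lists, and the search state) entirely in RAM for an entire continent.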
Since its inception in 2004, it has attracted over 200,000 registered users (Jackson 2010). OpenStreetMap constitutes one of the most successful examples of crowdsourcing – masses of amateurs performing functions that used to be performed by trained professionals – or volunteered geographic information (VGI), a term coined by Goodchild (2007) to refer to user-generated geographic information on the Web. Using the OSM data, we have been developing an intelligent routing service – FromToMap (http://www.fromtomap.org/). We have conducted some data-intensive computation for deriving the fewest-turn routes. For Europe, there are over 30 GB of OSM data, from which we extracted 17 GB of street data. For the purpose of computing routes, we generated a huge graph involving 10 million nodes and 17 million links, occupying about 30 GB of memory. In addition, we implemented algorithms to compute isochrones from any location within a few minutes. The FromToMap service is hosted by an affordable HP server, costing about $7,000, with 48 GB of memory and a 1 TB hard disk. From the above elaboration, let me add a few reflections on data-intensive geospatial computing. First of all, geospatial research should go beyond the data scale of kilobytes and megabytes. If raw data are in gigabytes, there is probably no need to sample the data. Geographic theory and knowledge should be verified against massive data, to avoid the distorted picture of the elephant in the eyes of the blind men. Current computing storage and processing
capacity are fairly powerful in dealing with data at this scale. Second, geographic space is heterogeneously distributed. The larger the geographic space, the more heterogeneous it appears. For example, some physicists have extracted massive trajectory data of registered dollar notes (another form of VGI) from http://www.wheresgeorge.com/ to study human mobility patterns that bear this heterogeneity (Brockmann et al. 2006). In this regard, I believe that geographers and geospatial researchers have unique contributions to make to this research using cutting-edge geographic information technologies. Third, OSM can serve as benchmark data for geospatial research. OSM data are very rich in content. They contain streets, pedestrian and cycling paths, as well as public transport routes. In addition, points of interest and land use information are also embedded in the data. More importantly, there is no constraint on access to it, since it is owned by no one. This point seems applicable to other forms of VGI as well. Following the third point, the geospatial community should set up a data repository to archive data, algorithms, and source code that can be shared among interested researchers. This way, geospatial research would be based on common datasets, becoming replicable, extendible, and reusable in terms of algorithms and code, in line with open-source activities. Similar efforts have been attempted in many other disciplines, such as biology, physics, and computer science. Finally, many of the analytics or geovisualization tools we have developed do not meet the challenge of data-intensive computing. Many tools are still targeted at verifying a given hypothesis and finding some presumed relationship; current tools are still rather weak in discovering hidden knowledge. Computing power and the physical memory limits under which operating systems operate have been increasing dramatically; there is virtually no upper limit for the recently released Windows 2008.
In the meantime, cloud computing has been fully deployed for those who need more computing power than an ordinary PC offers. In this regard, many Internet giants, such as Amazon, eBay, Google, and Microsoft, all have their own cloud computing services (Armbrust et al. 2009). GIS software suppliers like ESRI have also been working on the next generation of products, which are to be more cloud compatible. We are facing an exciting yet challenging age, when geospatial analysis and computation need to be done at a massive scale. Acknowledgments An early version of this chapter was presented at the National Science Foundation TeraGrid Workshop on Cyber-GIS, February 2–3, 2010, Washington DC.
References

Armbrust M., Fox A., Griffith R., Joseph A. D., Katz R. H., Konwinski A., Lee G., Patterson D. A., Rabkin A., Stoica I., and Zaharia M. (2009), Above the Clouds: A Berkeley View of Cloud Computing, Technical Report available at http://www.eecs.berkeley.edu/Pubs/TechRpts/
Bell G., Hey T. and Szalay A. (2009), Beyond the data deluge, Science, 323, 1297–1298.
Brockmann D., Hufnagel L., and Geisel T. (2006), The scaling laws of human travel, Nature, 439, 462–465.
Gahegan M. (1999), What is geocomputation? Transactions in GIS, 3(3), 203–206.
Goodchild M. F. (2007), Citizens as sensors: the world of volunteered geography, GeoJournal, 69(4), 211–221.
Hey T., Tansley S. and Tolle K. (2009), The Fourth Paradigm: data intensive scientific discovery, Microsoft Research, Redmond, Washington.
Jackson J. (2010), OpenStreetMap Attracts 200,000 Volunteers, PCWorld, http://www.pcworld.com/article/186096/openstreetmap_attracts_200000_volunteers.html
Jiang B. (2007), A topological pattern of urban street networks: universality and peculiarity, Physica A: Statistical Mechanics and its Applications, 384, 647–655.
Jiang B. and Jia T. (2011), Agent-based simulation of human movement shaped by the underlying street structure, International Journal of Geographical Information Science, 25, 51–64, http://arxiv.org/abs/0910.3055.
Jiang B., Yin J. and Zhao S. (2009), Characterizing human mobility patterns in a large street network, Physical Review E, 80, 021136, Preprint, arXiv:0809.5001.
Kopp S. (2010), Toward spatial modeling in the cloud, A position paper presented at the National Science Foundation TeraGrid Workshop on Cyber-GIS, February 2–3, 2010, Washington DC.
NRC (2003), IT Roadmap to a Geospatial Future, The National Academies Press, Washington, D.C.
Part II
Ontologies and Programming Technologies for GIS and GIS Applications
Chapter 3
SOA-Based Modeling for Digital Ocean: Approach and Architecture Alexander Smirnov, Alexey Kashevnik, and Nikolay Shilov
Abstract Today, situation management in maritime areas assumes the acquisition, integration, and processing of huge amounts of information from various sources, such as geographical information system (GIS) data, current and forecasted weather conditions, and current and planned/forecasted positions and movements of ships in the considered area. This has led to the appearance of the new aggregate term "Digital Ocean," integrating various technologies. This chapter proposes an approach to sea rescue operation management based on the service-oriented architecture (SOA) and digital ocean technologies. The approach assumes representation of the operation members and information sources with Web services. This makes it possible to replace maritime situation modeling with modeling of a Web service network. The major idea of the approach is described together with its architecture and major components. Application of the approach is demonstrated on a case study of hurricane/storm situation management.
3.1 Introduction
Digital ocean is a broad term that does not have an exact definition. The Digital Ocean Theme Team (2010; http://web.mit.edu/seagrant/digitalocean/) defines it as extensive digital representations or models of ocean resources and phenomena. In Liu et al. (2009), digital ocean assumes the application of various technologies and methods to ocean description and modeling, including wideband networks, satellite remote sensing imagery data acquisition, mass data storage and compression, interoperability, scientific computing, distributed objects, virtual reality, spatial data warehouses, metadata databases, and Web GIS. Digital ocean technologies can be used for various purposes, including preparation for hurricanes, exploring passive acoustics in fisheries, and supporting the offshore oil industry (The Digital Ocean Theme Team 2010; http://web.mit.edu/seagrant/digitalocean/).
A. Smirnov (*), A. Kashevnik, and N. Shilov SPIIRAS, 39, 14 Line, 199178 St. Petersburg, Russia e-mail:
[email protected];
[email protected];
[email protected] V. Popovich et al. (eds.), Information Fusion and Geographic Information Systems: Towards the Digital Ocean, Lecture Notes in Geoinformation and Cartography 5, DOI 10.1007/978-3-642-19766-6_3, # Springer-Verlag Berlin Heidelberg 2011
In this chapter, intelligent decision-making support for hurricane/storm emergency situation response (referred to as a "sea rescue operation"), based on technologies such as Web services and global positioning systems (GPS), together with an integration of information from heterogeneous sources for hurricane/storm rescue operation management, is proposed. Since sea rescue operation members may belong to different independent companies and organizations, centralized control is likely impossible due to the different subordination of the actors involved, among other reasons. Another disadvantage of centralized control is that its possible failure would stop the entire system. A possible solution for this is the organization of a decentralized network. For information integration purposes, it is currently reasonable to use Web services, since they are becoming a de facto standard for information exchange and sharing. This chapter proposes an approach representing the sea rescue operation members and information sources with Web services. This makes it possible to replace maritime situation modeling with modeling of a Web service network.
3.2 Model-Driven Approach
Model-driven approaches to systems development are a relatively new and intensively elaborated paradigm (e.g., Stahl and Voelter 2006). The model-driven paradigm assumes an orientation to models at all stages of system development, operation, and discontinuation. In software development, this paradigm facilitates the automatic generation of significant amounts of code, which otherwise would need to be generated manually (Seidman and Ritsko 2006). This has led to the appearance of such approaches as, for example, model-driven software factories (Langlois 2004). Application of the above paradigm to a wide range of applications can be fruitful as well. We propose to use it for building an information system for decision support in sea rescue operations. Such decision support would require the integration of information from various sources. There are different research efforts undertaken in this field. In Balmelli et al. (2006), an approach to systems development mainly based on the usage of UML is presented. For distributed systems, the service-oriented architecture is often used as one of the component paradigms (e.g., Atkinson et al. 2002). Definition of information architectures based on the business component identification (BCI) method is described in Albani and Dietz (2006). In this work, an enterprise ontology and three types of relationships for component identification are used: relationships between single process steps, between information objects, and between process steps and information objects. The approach presented in this chapter integrates environmental (digital ocean) information and knowledge in context as a problem model (Smirnov et al. 2005). The main idea of the developed approach is to represent the members of the sea rescue operations by the services they provide (Fig. 3.1). This would make it possible to replace the modeling of the maritime situation with that of the services constituting it.
The flexibility assumes efficient management of information, taking into account the
Fig. 3.1 Generic scheme of the approach
dynamic environment. For this purpose, the proposed approach actualizes information in accordance with the current maritime situation. An ontological model described by the application ontology (AO) is used in the approach to solve the problem of service heterogeneity. This model makes it possible to enable interoperability between heterogeneous information services by providing their common semantics (Dey 2001). Depending on the considered problem, the relevant part of the AO is selected, forming the abstract context that, in turn, is filled with values from the sources, resulting in the operational context (Smirnov et al. 2007). Application of the context model makes it possible to reduce the amount of information to be processed. This model enables management of information relevant to the current situation (Grüninger 1996). The operational context represents the constraint satisfaction problem that is used during the configuration of services for problem solving. The access to the services and the information acquisition, transfer, and processing (including integration) are performed via the technology of Web services. The service-oriented architecture has a number of advantages resulting from its principles (CADRC 2009). Among these, the following ones should be mentioned (the specifics related to maritime situation modeling are indicated in italics): 1. Service Autonomy. Services engineered for autonomy exercise a high degree of control over their underlying run-time execution environment. Autonomy, in this context, represents the level of independence which a service can exert over its functional logic. With regard to the considered problem, the autonomy also reflects the independence of the operation members, who in real life are independent.
2. Service Abstraction. Further supporting service autonomy, service-oriented architecture advocates that the scope and content of a service's interface be both explicitly described and limited to what is absolutely necessary for the service to be effectively employed. Beyond the service interface, abstraction applies to any information, in any form, describing aspects of the service's design, implementation, employed technologies, etc. This principle helps to abstract from the real services provided by the operation members and to concentrate on modeling them via Web services.

3. Service Standardization. As services are typically distributed throughout networks, they must be easily accessible to other entities in terms of discoverability and consequent invocation. Given this requirement, service-oriented architecture recommends that services adhere to standards, including, for example, standards for the language used to describe a service to prospective consumers. In the proposed approach, standardization is achieved via common standards such as WSDL and SOAP, as well as the common terminology described by the AO. As a result, the services constituting the network are fully interoperable and can communicate with each other without difficulty.

4. Service Reusability. Reusability is a central property of any successful service. It denotes the capacity of a service to be employed in support of not just one but a variety of models. Service-oriented architecture promotes such functional reuse through its stipulations for service autonomy and interface abstraction. With these features, the same service can be invoked by multiple consumers, operating in various domains, without requiring the service provider to recode service internals for each application domain. Service reusability significantly facilitates the modeling process and decreases the amount of work required for model building. Besides, the existing services of digital ocean systems can be reused.

In CADRC (2009), two more service-oriented architecture principles are defined, namely service composability and service discovery. However, they are out of the scope of the presented approach.
3.3 Technological Framework
As mentioned, distributed networks provide better coordination for communities of independent parties. The generic scheme of a service network is presented in Fig. 3.2. Each operation member is represented as an intelligent agent-based service (or several services) acting in the system. Integration of intelligent agent technology into Web services makes it possible to develop "proactive" services that can take the initiative in negotiation processes. Each service has its own knowledge stored in its knowledge base. This knowledge is described
3 SOA-Based Modeling for Digital Ocean: Approach and Architecture
Fig. 3.2 Generic scheme of a service network: agent-based services linked by OWL-based and SOAP-based information exchange, with an application ontology and alternative information sources
by a portion of the common shared AO related to the current service's tasks and capabilities, called the context. Capabilities, preferences, and other information about the service are stored in its profile, which is available for viewing by the other agent-based services of the system (Smirnov et al. 2009b). This facilitates communication, which is performed via the communication module responsible for compliance with the protocols and standards used within the system. The services communicate with each other for two main purposes: (1) they establish links and exchange information for better situation awareness; and (2) they negotiate and make agreements to coordinate their activities during the operation. The services may also obtain information from various information sources; for example, navigating channel information can be acquired from a geographical information system (GIS).

To make agent-based services independent, a component called the service directory facilitator (SDF) has been proposed. The SDF is responsible for registration and update of autonomous services. Initially, a service does not have any information about neighboring services, and its portion of the AO is empty. An automatic tool or an expert assigns it a set of AO elements related to the service. For example, for a knowledge source service, the expert can assign classes that can be reified by this service, or attributes whose values can be defined using the content of this knowledge source. For problem-solving services, this could be tasks and methods existing in the problem domain. Services inform the SDF about their appearance, modifications, and intention to leave the system, and send scheduling messages to update the internal repository. The task of the SDF is to build a slice of the AO for each service, update references to the neighboring services, and maintain a list of services (type, location, schedule, and notification of services about changes in the network).
Organization of references between services is a complex task and is out of the scope of this chapter. Well-organized references result in a list of services that are used as initiators of the negotiation process.
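The SDF's bookkeeping can be sketched as follows. This is a minimal, hypothetical Python sketch (the class and method names are invented, and the neighbor-linking rule is a simple stand-in for the reference organization left out of scope above): services sharing application-ontology elements are registered as neighbors.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceRecord:
    """Registration entry kept by the SDF for one agent-based service."""
    name: str
    ao_elements: set                       # slice of the AO assigned to the service
    neighbors: set = field(default_factory=set)

class ServiceDirectoryFacilitator:
    """Registers services, stores their AO slices, and maintains neighbor
    references (here: services sharing at least one AO element)."""
    def __init__(self):
        self.registry = {}

    def register(self, name, ao_elements):
        record = ServiceRecord(name, set(ao_elements))
        for other in self.registry.values():
            if record.ao_elements & other.ao_elements:
                record.neighbors.add(other.name)
                other.neighbors.add(record.name)
        self.registry[name] = record
        return record

    def deregister(self, name):
        """Called when a service announces its intention to leave."""
        record = self.registry.pop(name)
        for other in self.registry.values():
            other.neighbors.discard(name)
        return record

# Example: a weather knowledge source and a planning service sharing "Weather"
sdf = ServiceDirectoryFacilitator()
sdf.register("WeatherSource", {"Weather", "Wind Speed"})
sdf.register("RoutePlanner", {"Weather", "Route"})
print(sdf.registry["RoutePlanner"].neighbors)  # {'WeatherSource'}
```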
3.4 Web Services for Information Acquisition
Integration of information from various sources requires development and application of technologies for information acquisition. To facilitate interoperability within the system, it is proposed to use Web service technology for this purpose. Within the AO, the domain ontology and the tasks-and-methods ontology are interrelated by relationships specifying which class attributes of the domain ontology serve as input arguments for the methods of the tasks-and-methods ontology. Information and knowledge acquired from outside the system are described via references to information/knowledge sources. Knowledge source services are responsible for interaction with information/knowledge sources in the following ways:

- Representation of information provided by information sources by means of the formalism of object-oriented constraint networks (Smirnov et al. 2009a)
- Querying information sources
- Transfer of the information to the system
- Information integration
- Data conversion

The following types of information sources are distinguished:

- Sensors are physical devices that detect a signal, physical conditions, etc.; Web services working with sensors have to support their outputs and interfaces.
- Databases are organized collections of information; Web services that work with databases have to support SQL queries.
- Web sites and RSS (RDF Site Summary) feeds are textual information with a predefined structure; Web services that work with these sources have to be able to work with the RSS structure.
- Users may know a large amount of information and can pass it to the Web service through a graphical user interface (GUI).
- Other information sources allow interaction through Web services, or appropriate Web services can be developed for them.
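The common role of these source types can be illustrated with a small sketch. The adapter interface and classes below are hypothetical, not part of the described system; they only show how heterogeneous sources (a sensor, a database) might be wrapped behind one query interface, as the knowledge source services do.

```python
from abc import ABC, abstractmethod

class InformationSource(ABC):
    """Common interface of a Web service wrapping an information source."""
    @abstractmethod
    def query(self, request: str):
        """Return the raw values available for the requested attribute."""

class SensorSource(InformationSource):
    """Wraps a physical device; returns the latest reading."""
    def __init__(self, readings):
        self.readings = readings
    def query(self, request):
        return [self.readings[request]]

class DatabaseSource(InformationSource):
    """Stand-in for an SQL-backed source; scans rows for the attribute."""
    def __init__(self, rows):
        self.rows = rows
    def query(self, request):
        return [row[request] for row in self.rows if request in row]

# A GUI-backed "user" source could implement the same interface,
# prompting the operator instead of reading a device or a table.
sensor = SensorSource({"wave_height": 2.5})
db = DatabaseSource([{"lifeboat": "LB-1"}, {"lifeboat": "LB-2"}])
print(sensor.query("wave_height"))  # [2.5]
print(db.query("lifeboat"))         # ['LB-1', 'LB-2']
```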
The mechanisms of interaction between information sources and their assigned services are out of the scope of this research. Hence, the term "information source" denotes the information source itself together with the Web service assigned to it. For abstract context instantiation, the values acquired from information sources are used. These values are taken from the Web services identified in the application as suppliers of values for particular attributes. If an attribute obtains values from several sources, this information is integrated; for this purpose, special functions are implemented in the Web services. The values can also be obtained as results of calculations (functional constraints included in the abstract context). Since the approach uses the formalism of object-oriented constraint networks (Smirnov et al. 2009a), the resulting operational context is a constraint network with valued variables.
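The integration of values obtained for one attribute from several sources might look like the following sketch. The fusion strategies here are invented placeholders; the chapter only states that special integration functions are implemented in the Web services.

```python
def integrate(values, strategy="average"):
    """Fuse values acquired for one attribute from several sources.
    Both strategies are illustrative stand-ins for the special
    integration functions implemented in the Web services."""
    if strategy == "average":
        return sum(values) / len(values)
    if strategy == "latest":
        return values[-1]
    raise ValueError("unknown strategy: " + strategy)

# Air temperature reported by two sensors and a weather Web site:
readings = [11.8, 12.4, 12.0]
print(integrate(readings))             # arithmetic mean of the three values
print(integrate(readings, "latest"))   # 12.0
```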
Fig. 3.3 Retrieving information from information sources: attributes 1–3 obtain their values through Web services 1 and 2, which in turn query information sources 1–3
In Fig. 3.3, the scheme of retrieving information from the information sources is shown. One or more class attributes contained in the abstract and operational contexts can request needed information from one Web service that, in turn, can request this information from one or more information sources. When necessary, the Web service calculates the value(s) of the attribute(s) based on information from different information sources. Examples of tasks to be solved in the tasks-and-methods ontology are as follows:

- Define the number of potential victims
- Define the number of lifeboats (rescue vessels) and rescue helicopters required for the current operation
- Determine route (navigating channel) availabilities based on the current weather conditions
- Select lifeboats and helicopters for taking part in the hurricane/storm emergency situation based on their current availabilities, locations, and route availabilities
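As a toy illustration of the second task, the sketch below sizes the rescue fleet from a victim count. All capacities and the air-evacuation share are invented parameters, not values from the chapter; a real method would obtain them from the operational context.

```python
import math

def required_rescue_units(victims, lifeboat_capacity=20,
                          helicopter_capacity=6, helicopter_share=0.2):
    """Toy sizing rule: a fixed share of victims is assumed to need air
    evacuation; the rest are assigned to lifeboats. Returns a pair
    (lifeboats, helicopters). All parameters are illustrative."""
    by_air = math.ceil(victims * helicopter_share)
    by_sea = victims - by_air
    return (math.ceil(by_sea / lifeboat_capacity),
            math.ceil(by_air / helicopter_capacity))

print(required_rescue_units(95))  # (4, 4)
```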
Weather conditions are provided by sensors. The sensors provide values for precipitation, temperature, wind speed and direction, visibility, wave height, and wave speed (an illustrative list of parameters). In order to specify this, a set of methods taking values from the appropriate sensors is introduced in the AO. In the tasks-and-methods ontology these methods are represented as the classes Get Precipitations, Get Temperature, Get Wind Direction, Get Wind Speed, Get Visibility, Get Wave Height, and Get Wave Speed, respectively. To specify that a method takes its value from a sensor, the URI of the Web service responsible for the interactions with this sensor is introduced as an input argument of the method. The output argument of the method is the value the Web service returns. In the operational context, this value instantiates an appropriate attribute specified in the class Weather. An example of "linking" the AO and sensors is given in Fig. 3.4. In the example, a sensor provides values of the current air temperature. The URI of the Web service responsible for these values (attribute Service_URI) is specified as an input argument of the method Get Temperature. The method's output argument is specified as Temperature. In order to specify dependencies between domain and tasks-and-methods constituents within the AO, functional constraints are used. These constraints state what domain
Fig. 3.4 Linking AO and information sources
knowledge is involved in problem solving and in what way. For instance, the method Get Temperature outputs the temperature value. This value is used to instantiate the attribute Air Temperature of the class Weather. In order to specify this, a functional constraint is introduced between the classes Weather and Get Temperature. Figure 3.4 shows the specification of the function between the attribute Air Temperature and the output argument Temperature of the method Get Temperature. The functional constraint Air Temperature is introduced in the class Weather. The value of this function is calculated by the method Get Temperature (the bottom list box). In the top list box, the attribute Air Temperature is chosen as the value returned by the method Get Temperature. The fact that the method Get Temperature returns Temperature as its output argument was specified above.
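The functional constraint between Get Temperature and Weather can be mimicked in code as follows. This is an illustrative sketch only: the URI and the fetch stub are invented, and in the actual system the binding is declared in the ontology rather than programmed.

```python
class GetTemperature:
    """Tasks-and-methods ontology method: takes a Web service URI as
    input argument (Service_URI) and returns the value the service
    provides (output argument Temperature)."""
    def __init__(self, service_uri, fetch):
        self.service_uri = service_uri
        self.fetch = fetch                 # stand-in for the SOAP call

    def __call__(self):
        return self.fetch(self.service_uri)

class Weather:
    """Domain ontology class whose Air Temperature attribute is bound,
    via a functional constraint, to the output of Get Temperature."""
    def __init__(self, get_temperature):
        self._get_temperature = get_temperature

    @property
    def air_temperature(self):
        return self._get_temperature()

# Hypothetical sensor endpoint returning the current air temperature:
get_temp = GetTemperature("http://example.org/sensors/air-temp",
                          fetch=lambda uri: 8.5)
weather = Weather(get_temp)
print(weather.air_temperature)  # 8.5
```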
3.5 Case Study
The approach is illustrated by a hurricane/storm emergency situation response scenario. The AO used for this purpose represents knowledge of the emergency management domain. As soon as information that an emergency situation has occurred enters the system, an abstract context characterizing the situation is created or reused.
Within the abstract context, the acting resources fall into the classes Actor and Job Role. The local rescue services are responsible for providing lifeboats and rescue helicopters for medical care of injured people and/or for transporting them to the shore hospitals, as well as for preventing the vessel in distress from destruction or sinking. Two types of transportation are possible: air transportation performed by helicopters and water transportation performed by lifeboats. The lifeboats also fall into two classes: lifeboats with evacuation facilities (can evacuate injured people) and lifeboats without evacuation facilities (cannot evacuate injured people). Web services implementing tasks concerned with supplying the system with data from the information sources use the following kinds of information resources. The current weather conditions are taken from sensors and Web sites. Information about the locations of the routes (navigating channels) of the maritime area is taken from the GIS. Information about lifeboats and rescue helicopters available in the area is read from a database; information about their locations is provided by GPS-based devices installed on the vessels/helicopters. Information about the emergency location, its characteristics, and the approximate number of victims is provided by the smart sensor of the vessel in distress (a device transmitting coordinates, speed, orientation, acceleration, states of the vessel's different systems, etc.). Information about hospitals available in the region and their locations is read from the healthcare infrastructure database; hospital free capacities are provided by hospital administration systems.
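The resource classification above can be sketched as a small class hierarchy. The Python classes and field names below are illustrative only; the actual classification lives in the abstract context of the AO.

```python
from dataclasses import dataclass

@dataclass
class RescueUnit:
    name: str
    location: tuple          # (lat, lon) from a GPS-based device

@dataclass
class Helicopter(RescueUnit):
    transportation = "air"   # air transportation

@dataclass
class Lifeboat(RescueUnit):
    transportation = "water" # water transportation
    can_evacuate: bool = False  # with/without evacuation facilities

units = [Lifeboat("LB-1", (59.9, 30.3), can_evacuate=True),
         Lifeboat("LB-2", (60.0, 30.1)),
         Helicopter("H-1", (59.8, 30.5))]

# Units able to evacuate injured people: helicopters and equipped lifeboats
evacuators = [u.name for u in units
              if u.transportation == "air" or getattr(u, "can_evacuate", False)]
print(evacuators)  # ['LB-1', 'H-1']
```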
Fig. 3.5 Operational context: current situation
Fig. 3.6 Operational context: plan for actions
The current situation (operational context) and the set of possible action plans are presented to the decision maker (Fig. 3.5). The decision maker chooses one solution (Fig. 3.6) from the generated set; this solution becomes the decision. The decision is delivered to the leaders of the emergency and rescue teams and to the hospital administrations. They have access to the operational context through any Internet browser from a notebook, PDA, mobile phone, etc. (the Internet today can be accessed from almost everywhere via, for example, a satellite link). The system has been implemented as a distributed system for operational decision support. The interface of the system is Web-based; i.e., regular Web browsers can be used to work with the system.
3.6 Conclusion
The chapter proposes an application of the earlier developed approach to intelligent decision-making support for hurricane/storm emergency situation response incorporating digital ocean technologies. The major idea of the approach is to model the sea rescue operation members and information sources with Web services. This makes it possible to replace maritime situation modeling with modeling of a Web service network. The common application ontology is used by the operation members to provide for interoperability at the level of semantics; the
context-based modeling technology enables narrowing of the problem domain, which makes it possible to reduce the solution search space. An agent-based service model is used for negotiation. Agents make Web services "active" components. Interactions of Web services can be supported by the formal interface agreement defined by the Web service technology and enriched with the application ontology semantics. In future work, the authors plan to integrate the research results with existing systems and technologies to estimate the applicability of the approach to real-world applications.

Acknowledgments Some of the results are due to research carried out as a part of projects funded by grants # 09-07-00436, # 11-07-00045, # 09-07-00066, and # 10-08-90027 of the Russian Foundation for Basic Research, and project # 213 of the research program "Intelligent Information Technologies, Mathematical Modelling, System Analysis and Automation" of the Russian Academy of Sciences.
References

Albani A, Dietz J (2006) The Benefit of Enterprise Ontology in Identifying Business Components. In: The Past and Future of Information Systems: 1976–2006 and Beyond. Proceedings of the IFIP 19th World Computer Congress, TC-8, Information System Stream (WCC 2006), pp 243–254

Atkinson C, Groß H-G, Kühne T (2002) Towards a General Component Model for Web-Based Applications. Annals of Software Engineering 13(1–4):35–69

Balmelli L, Brown D, Cantor M, Mott M (2006) Model-Driven Systems Development. IBM Systems Journal 45(3):569–585

CADRC (2009) KML Sandbox: A Facility Based on SOA Principles. CADRC Currents, Fall 2009, Collaborative Agent Design Research Center (CADRC), California Polytechnic State University, San Luis Obispo

Dey AK (2001) Understanding and Using Context. Personal and Ubiquitous Computing Journal 5(1):4–7

Grüninger M (1996) Ontologies: Principles, methods and applications. Knowledge Engineering Review 11(2):93–155

Langlois B (2004) MDSOFA: A Model-Driven Software Factory. In: Position papers for OOPSLA & GPCE Workshop "Best Practices for Model Driven Software Development", http://www.softmetaware.com/oopsla2004/langlois.pdf/, last accessed Sep. 20, 2010

Liu X, Zhang X, Chi T, Qu H, Jiang Y (2009) Study on China Digital Ocean Prototype System. In: Proceedings of the WRI World Congress on Software Engineering (WCSE'09), IEEE, pp 466–469

Seidman DI, Ritsko JJ (2006) Preface. IBM Systems Journal 45(3):449–450

Smirnov A, Levashova T, Shilov N (2009) Context-Based Intelligent Service for Healthcare Applications. In: Dwivedi A (ed) Handbook of Research on Information Technology Management and Clinical Data Administration in Healthcare, Medical Information Science Reference, 1:128–142

Smirnov A, Kashevnik A, Levashova T, Shilov N (2007) Context-Driven Information Fusion for Operational Decision Making in Humanitarian Logistics. In: Popovich V, Korolenko K, Schrenk M (eds) Lecture Notes in Geoinformation and Cartography. Proceedings of the Third International Workshop on Information Fusion and Geographic Information Systems (IFGIS 2007), Springer, pp 69–83
Smirnov A, Pashkin M, Chilov N, Levashova T (2005) Constraint-driven methodology for context-based decision support. Design, Building and Evaluation of Intelligent DMSS (Journal of Decision Systems), Lavoisier, 14(3):279–301

Smirnov A, Shilov N, Levashova T, Kashevnik A (2009) GIS for profile-based context formation in situation management. In: Information Fusion and Geographic Information Systems, Proceedings of the Fourth International Workshop, IFGIS 2009 (electronic proceedings)

Stahl T, Voelter M (2006) Model-Driven Software Development: Technology, Engineering, Management. Wiley, 444 p

The Digital Ocean Theme Team, http://web.mit.edu/seagrant/digitalocean/, last accessed Nov. 19, 2010
Chapter 4
Advantages of Intelligent Geographic Information System Research Prototype

Ruslan Sorokin
Abstract The chapter describes some advantages and accomplishments of intelligent geographic information systems that were successfully employed in research prototyping and in the development of commercial systems. These advantages include improvements of the general scenario-based spatial process simulation framework, creation of a two-level simulation network based on the latest web technologies, and integration of the system with the increasingly popular programming language Clojure, aimed at enhancing information processing. Implementation details and algorithms of motion and transformation of the extended spatial objects outlined during the simulation are also considered.
4.1 Introduction
An intelligent geographic information system (IGIS) is intended for studying complex spatial processes by simulation and modeling. Along with conventional GIS technologies, it incorporates various artificial intelligence techniques for knowledge representation and processing. The IGIS concept was introduced in Sorokin et al. (2003). A complex spatial process was defined in Sorokin (2007) as an aggregate of elementary spatial processes, such as motion of point-like objects and motion and transformation of extended objects on the earth's surface. To describe complex spatial processes, the notion of Scenario was introduced and elaborated in that work. The basic visual integrated environment, used for several years as the IGIS research prototype, is described in Sorokin (2007) and Sorokin (2009); based on this prototype, navy surveillance (Popovich 2009) and training (Popovich 2010) systems have been successfully developed. On the basis of IGIS, an intelligent environment for visual conduct of naval operations and combat actions at sea was created. The first line of improvements dealt
R. Sorokin St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, 14 Line, 39, St. Petersburg 199178, Russia e-mail:
[email protected] V. Popovich et al. (eds.), Information Fusion and Geographic Information Systems: Towards the Digital Ocean, Lecture Notes in Geoinformation and Cartography 5, DOI 10.1007/978-3-642-19766-6_4, # Springer-Verlag Berlin Heidelberg 2011
with the further refinement of the general spatial simulation environment and scenario execution framework. The naval training system required development of a distributed simulation system consisting of many workstations working in coordination on a joint mission. Accordingly, the second direction of improvements led to the integration of the latest web technologies and to building networks of simulation stations on their basis; the third direction concentrated on the integration of IGIS with up-to-date means of information processing. Continuous upgrading of the IGIS prototype has also been performed in order to keep up with the latest versions of the prototype's integral parts. All the advantages and improvements considered are incorporated in the open source project published on the Internet. This chapter describes the above-listed advantages of the IGIS research prototype and outlines some directions of further research efforts.
4.2 General Improvements
In Sorokin (2007), the notion of scenario is defined as a succession of phases and decisions. It is mandated that phases inside a single scenario cannot be executed concurrently. The underlying idea is to make the notion of scenario as close to the notion of algorithm as possible, so any parallelism can take place only at the level of actions inside a single phase. Subsequent use of the system clearly revealed this limitation to be a serious one. In fact, the majority of spatial processes consist of parallel actions and subprocesses, and to adequately implement them it is often necessary to disintegrate, sometimes excessively, a single scenario into subscenarios in order to be able to use an action Subscenario for parallel execution inside one phase. To overcome this complication, the notion of Phase was replaced by the notion of Task, so that several tasks can be executed concurrently inside a single scenario. Figure 4.1 shows a comparison of parallel execution of two activities in the old "phase" paradigm and in the new "task" paradigm. However, this raises the problem of synchronizing parallel tasks when needed; to solve it, the notion of Join, a node where parallel tasks' flows can merge, is introduced.

Another elaboration was done regarding the scenario execution framework; in particular, the notion of "event" was introduced into the scenario ontology as the Event class and its subclasses. In essence, an event here is considered a logical function that returns a true or false value depending on the occurrence of some real event at a particular moment of time. The notion of event has received further specification in the following subclasses of the Event class:

- Property is a property or an attribute of some object that can either hold or not at some moment, i.e., a qualitative property. For instance, an object is "moving" or "stopped" or "near some point." Other types of properties encompass quantitative properties such as "speed" or "course"; here the corresponding event would be, for instance, an object's "speed over 12 knots" or "course equal to 315," and so on.
Fig. 4.1 Parallel execution of two subscenarios in one phase vs. parallel execution of two tasks

- Relation is some interlink or relationship between two or several objects. Relations can also be qualitative or quantitative. Examples of qualitative relations are "near," "far," "abaft," "on beam," "ahead," and "inside." Examples of quantitative ones are "distance," "bearing," and "course angle."
- MultiEvent is a logical function derived by composition of the above-mentioned properties and relations using "and" and "or" connectives.
On the basis of this event hierarchy, a new decision type, EventFork, is introduced. Through this type of decision, a scenario execution flow can be changed depending on the presence or absence of a given event at a particular time instant. Besides, the implementation of the action Calculus, used for arbitrary computations in the Groovy language, was improved to allow direct use of the properties, attributes, and relations of objects.
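The event hierarchy and the EventFork decision admit a compact sketch. The classes below are hypothetical Python stand-ins for the ontology classes (a Relation subclass would follow the same pattern, evaluated over two or more objects); the situation is modeled as a plain dictionary.

```python
class Event:
    """Logical function: true or false for the situation at this moment."""
    def holds(self, situation):
        raise NotImplementedError

class Property(Event):
    """Qualitative or quantitative property of one object."""
    def __init__(self, attr, predicate):
        self.attr, self.predicate = attr, predicate
    def holds(self, situation):
        return self.predicate(situation[self.attr])

class MultiEvent(Event):
    """Composition of events with 'and'/'or' connectives."""
    def __init__(self, op, *events):
        self.op, self.events = op, events
    def holds(self, situation):
        results = (e.holds(situation) for e in self.events)
        return all(results) if self.op == "and" else any(results)

def event_fork(event, situation, if_true, if_false):
    """EventFork decision: choose the execution flow by event presence."""
    return if_true if event.holds(situation) else if_false

fast = Property("speed", lambda v: v > 12)        # "speed over 12 knots"
moving = Property("state", lambda s: s == "moving")
evade = MultiEvent("and", fast, moving)
branch = event_fork(evade, {"speed": 14, "state": "moving"},
                    "intercept-task", "observe-task")
print(branch)  # intercept-task
```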
4.3 Networking
When running the Naval Training System "ONTOMAP-V1" (Software Complex for Information Support and Automation of Staff Personnel Functional Activity at Naval Bases' Combat Information Centers; http://oogis.ru), a set of new requirements was encountered. The above system incorporates several levels of command and control, including the strategic, operational, and tactical levels. Besides, it consists of several subsystems including at least two adversarial forces,
air forces, surface forces, underwater forces, and so on. Moreover, it includes separate subsystems for trainers and trainees. All subsystems at all levels consist of several workstations working together and interacting with each other. Evidently, the task of communication and interaction of these subsystems and workstations should be solved using general, state-of-the-art approaches, and only a network-based approach fits the case. Developing networks for simulation and modeling may solve many problems in the considered area. Obviously, a network of interacting computers simulates a system of interacting objects more naturally and efficiently. Networks for simulation of complex spatial processes can be built using different approaches, from the simplest to quite complex ones. The chapter discusses some results obtained from testing techniques based on the Protégé server, on a classical HTTP server, and on a combination of the latest web technologies including AJAX interacting web pages, the KML web geographical data format, and the OpenLayers open source GIS library.
4.3.1 Using Protégé Server
Initially, the Protégé server was intended for remote ontology editing and was based on Java RMI. Using this technology, several users can concurrently work from different machines with a single ontology located on the server, reading and writing classes and instances of classes, and changing the content of their slots. This technology was extended to an interchange of signals between machines running different simulation scenarios; for this purpose, a simple signal communication ontology was developed, consisting of one abstract class Address with one slot, content, of plain text type. For this abstract class, a set of concrete subclasses representing the concrete participants of a signal communication network should be developed. This ontology is stored on the Protégé server. The algorithm of signal communication using the Protégé server is simple enough. In order to send a signal, one participant of the signal communication network remotely creates on the server a new instance of the class corresponding to the address of another participant and enters a message in its "content" slot, along with other useful information (e.g., the "from" address). To receive signals, the other participant periodically and remotely reads, processes, and deletes from the server the instances of his/her own address class. To propagate this technique to communication between different scenarios run on different machines, two new actions, SendSignal and ReceiveSignal, were added to the scenario ontology, as well as a new decision, SignalControl. When an action of the SendSignal type is executed in one scenario, a corresponding signal is sent to the server, and the scenario execution continues. When an action of the ReceiveSignal type is executed in the scenario on the other machine, the execution halts and waits until the corresponding signal is received. A decision of the type SignalControl can additionally be used where a process must fork depending on the specific signal type.
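The signal-exchange algorithm can be imitated with an in-memory stand-in for the Protégé server. The class below is purely illustrative (the real mechanism works over Java RMI on remote ontology instances): sending creates an "instance" under the addressee's class, and receiving reads and deletes the addressee's own instances.

```python
from collections import defaultdict

class SignalServer:
    """Stand-in for the Protégé server: each address class holds a list of
    instances whose 'content' slot carries the message text."""
    def __init__(self):
        self.instances = defaultdict(list)

    def send(self, to_address, content, from_address=None):
        # Remote creation of an instance of the addressee's class.
        self.instances[to_address].append(
            {"content": content, "from": from_address})

    def receive(self, own_address):
        # The participant periodically reads and deletes its own instances.
        signals, self.instances[own_address] = self.instances[own_address], []
        return signals

server = SignalServer()
server.send("StationB", "start mission", from_address="StationA")
msgs = server.receive("StationB")
print([m["content"] for m in msgs])  # ['start mission']
print(server.receive("StationB"))    # [] -- instances are deleted after reading
```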
4.3.2 Classical HTTP Server
This technique makes it possible to control the simulation process on a web server from web clients using an ordinary web browser; it is based on the HTTP server from the OpenMap GIS library. This original server belongs to the WMS protocol server family and is intended for displaying a map on a web page. Layers of the map are taken from a working OpenMap application; such an application is embedded in the working IGIS simulation environment as a Protégé plug-in and is used to display the simulated spatial process on the server. The OpenMap HTTP server has been specifically refined to control the simulation process as follows. The main web page was divided into two frames. One frame is dedicated to displaying the map with all its layers, including the simulated objects' layer. This frame periodically updates itself to display the motion of simulated objects and, generally, the spatial processes' change in time. The second frame consists of two panels: the control panel and the navigation panel. The control panel contains fields for input of names and coordinates of point objects and fields for input of direction and speed of moving objects; there is also a field for input of names and parameters of missions. A mission here means some scenario of actions of map objects. Accordingly, the control panel provides a minimal yet complete set of instruments to control the simulation process. The navigation panel contains fields for input of the coordinates of the map center and a field for input of the map scale, so a minimal and complete set of instruments for navigation on the map is also available. To support this framework inside Protégé, two classes, HTTPServer and HTTPClient, were created. An HTTPServer class instance describes a configuration of the server, including the port number, the main page and its parts, the list of clients, the list of map layers, and several other parameters; the HTTPClient class contains the set of client instances with their specific parameters. The HTTP server implementation is written as a Groovy script.
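The division of labor between the control and navigation panels can be sketched as a request dispatcher. The paths and parameter names below are invented for illustration; the actual server is the refined OpenMap HTTP server configured through the HTTPServer/HTTPClient classes.

```python
from urllib.parse import urlparse, parse_qs

def handle_control_request(url, simulation):
    """Dispatch a web-client request to the simulation state, in the
    spirit of the refined OpenMap HTTP server (paths/params invented)."""
    parts = urlparse(url)
    params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    if parts.path == "/object":        # control panel: create/steer an object
        simulation.setdefault("objects", {})[params["name"]] = {
            "lat": float(params["lat"]), "lon": float(params["lon"]),
            "course": float(params.get("course", 0)),
            "speed": float(params.get("speed", 0))}
    elif parts.path == "/view":        # navigation panel: map center and scale
        simulation["view"] = {"lat": float(params["lat"]),
                              "lon": float(params["lon"]),
                              "scale": float(params["scale"])}
    return simulation

sim = {}
handle_control_request("/object?name=PT-1&lat=60.1&lon=29.7&speed=12", sim)
handle_control_request("/view?lat=60.0&lon=30.0&scale=500000", sim)
print(sorted(sim))  # ['objects', 'view']
```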
4.3.3 Using Latest Web Technologies
The naval training system "ONTOMAP-V1" (Software Complex for Information Support and Automation of Staff Personnel Functional Activity at Naval Bases' Combat Information Centers; http://oogis.ru) has challenged the use of a more sophisticated technique to create a simulation control framework incorporating several levels, subsystems, and workstations. At the basic level, the scenario simulation machines interact using the signal communication techniques mentioned earlier. By using the latest web technologies, a level for simulation control is created above this level. Distinct web clients at this level must have a more intensive interchange of information with the basic level, and the classical HTTP
technology in this case is not applicable. So, the new AJAX technology is used instead, which allows sending XML HTTP requests from the web client back to the web server and getting responses to these requests without reloading the whole web page. This significantly enhances the performance of web interactions. Likewise, the simple WMS server from OpenMap appeared to be inadequate for representing the more elaborate picture of complex simulation processes in the "ONTOMAP-V1" project. To meet the increased requirements, the OpenLayers open source GIS library (Open source library for displaying map data in web browsers OpenLayers; http://openlayers.org) was selected. This library is specially designed for web applications and has many features for displaying a number of digital map data formats, for specific geographic calculations, and for decorating the map with a set of useful information and control elements. So, OpenLayers is currently used as a general framework for displaying geographical information on web clients. Support of the KML geographical data format (Keyhole 2004) is an important feature of OpenLayers (MetaCarta Labs 2006). Initially designed for the Google Earth program, this format rapidly gained popularity among the broad GIS community and became a de facto standard. A KML document can reasonably carry elaborate geographical data as well as other information, for instance, HTML data, XML data, and, in general, any arbitrary useful data. OpenLayers possesses a collection of well-tested samples of embedding into a web page map layers that take data from KML servers. Moreover, KML data received from KML servers can be efficiently processed by JavaScript functions embedded into the web page. All these advantages allow for designing a two-level network of simulation stations and web control clients functioning as shown in Fig. 4.2. Every simulation station has an embedded mini web server based on Jetty technology (Greg Wilkins 1995).
Upon a request from the web client, this Jetty server generates and returns a KML document containing the current locations of the simulated objects and other geographical data, along with messages from the simulation station to the control station on the web client. The control station sends control messages back to the simulation station using XML HTTP requests. An intermediate Apache web server is needed in the considered case for security reasons: modern web browsers do not permit collecting information from several different remote sources on one web page. So, the use of the Apache “proxypass” technology (Robert Mc Cool 1995) seems quite appropriate here: due to it, KML documents from different machines come to the web control client as if from one machine. Use of the KML format gives an additional advantage: the GoogleEarth program (Keyhole Inc 2005) can be used as a web client to display the spatial simulation processes on its rich background. Communication between all other simulation stations using signals is also possible, based on the Protégé server. Different application levels (strategic, operational, tactical, and so on) are implemented within the framework of the Control Level.
4 Advantages of Intelligent Geographic Information System Research Prototype
39
Fig. 4.2 Two-level simulation network
4.4
Advanced Information Processing
Protégé was originally written in the Lisp programming language; then Java appeared on the scene and became a hit, and Protégé was rewritten in Java. Years passed, and now a dizzying rise of Clojure (Rich Hickey 2007) is being observed, a reincarnation of Lisp for the Java Virtual Machine. This success is determined by the combination of the expressive power of Lisp and mature Java technology, along with Clojure's sophisticated support of concurrent computations and the functional style of programming. Focused upon knowledge representation, the Protégé core has always lacked powerful means for information processing, and over its history plug-ins for efficient knowledge processing such as JessTab, Algernon, CLIPSTab, ScriptTab, DroolsTab, and others (see the comprehensive Protégé plug-in library) were developed. Eventually, the time has come to unify the knowledge representation power of Protégé and the knowledge processing power of Clojure. Bearing in mind that the main objective of IGIS is fast prototyping, Clojure seems most applicable to match such an objective. It is known that a program in Clojure is four to ten times shorter than in Java; the number of useful libraries in Clojure grows rapidly, including wrappers for the most advanced scientific software packages. Due to its direct support of concurrent computations, Clojure is likely to take an important place in “cloud computing.”
To integrate Clojure into the IGIS-based simulation environment, an approach based on the Protégé plug-in technology was selected, and the Clojure tab plug-in ClojureTab was developed. This approach allows for implementing the above developments in the simulation environment as well as in some general purpose cases. Details of this approach are given below. The data model of ClojureTab is represented by the ontology ClojureOntology and includes the following classes:

• CloProgram – a set of functions and global variables1
• CloFunction – a function in Clojure
• CloMacro – a “macro” in Clojure (loosely, a special form of function)
• CloVar – a global (inside a program) variable
• CloNamespace – a “namespace” in Clojure (loosely, a space of names having its own name)
• CloFuncall – a function call with a specific set of parameter values
The graphical user interface of the Clojure tab contains instruments for creating functions, assembling programs out of them, and loading them for execution. It also contains a Read-Eval-Print-Loop (REPL) window to run and debug programs interactively. This feature, unique to Lisp-based programming environments, is especially important in the process of debugging a spatial simulation scenario because it allows interrupting and changing a program implementing some action during scenario execution, as well as during repeated execution of the action.
4.4.1
Implementation of Polygons’ Moving and Transformation Algorithm
Let us consider an algorithm of polygons' moving and transformation as a simple case of using Clojure for knowledge processing. This algorithm is quite general and can be used to represent the displacement and shape variation of extended spatial objects during simulation. For instance, these may be oil spills in the event of accidents at sea, adverse meteorological phenomena, or front-line shifts as a result of operations. The algorithm is well applicable to the cases when the initial and final positions of polygon points are known, and is also useful when a series of intermediate locations of the polygon is known; it implements a comparatively simple interpolation of point positions in time, as shown in Fig. 4.3. In the general case, the initial and final polygons have unequal numbers of points, and that is why an equalization of the number of points is needed prior to the movement and/or transformation. This equalization makes it possible to carry out transformations between any polygons. For instance, the example included in the open source

1 There exists no notion of “the program” in Clojure terminology; a file or a group of files with the same namespace in Clojure terminology matches the notion of CloProgram introduced in ClojureOntology best.
Fig. 4.3 Interpolation of points’ positions of polygon in time
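The interpolation of point positions shown in Fig. 4.3 reduces to linear blending of matched points. A minimal illustrative sketch (assuming already equalized point counts and a time fraction t in [0, 1]; the function name and representation are ours, not taken from the original code):

```python
def interpolate_polygon(initial, final, t):
    """Linearly interpolate each point of a polygon between its initial
    and final position at time fraction t (0 = start, 1 = end)."""
    return [(x0 + (x1 - x0) * t, y0 + (y1 - y0) * t)
            for (x0, y0), (x1, y1) in zip(initial, final)]
```

At t = 0 the initial polygon is returned, at t = 1 the final one, and intermediate values of t yield the in-between shapes used during scenario playback.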
Fig. 4.4 Equalization of points’ number for two polygons by some points spawning and further motion to final polygon
project DroolsTabScene-5.0 on the Web site http://oogis.ru demonstrates a “shuffling” of the main European countries, that is, a transformation of one country into another. Equalization does not lead to a loss of precision because it is performed by augmenting the number of points of the polygon that has fewer of them. Equalization carried out by spawning some points of the polygon with the lesser number of points is shown in Fig. 4.4. An additional refinement of this algorithm lies in the even distribution of spawned points along the polygon with simultaneous minimization of the spawn factors for distinct points. Such an “optimal” distribution is represented by a repeater, a vector of spawn factors that should be applied to the initial collection of points (polygon) to get the resulting polygon. The Clojure function for spawning points using the repeater is:
Here, the functions used, such as partition-all, count, repeat, and mapcat, are built-in functions of Clojure. An exemplary call of the function spawn-points is given below:
For comparison, let us take a look at the shortest recursive Java program accomplishing the same task:
The function for developing a repeater looks as follows:
Here, min is an argument representing the lesser number of polygon points and max is an argument representing the greater number of polygon points. This function uses the function evenly-mix-lists to evenly mix two lists; in its calculation, evenly-mix-lists alternately takes more elements from the longer list than from the shorter one:
Finally, the function mix-lists-by-n, used by the previous function, repeatedly takes n elements from list x and one element from list y in alternation while mixing the two lists:
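The original Clojure listings appear in the book as figures and are not reproduced here. As a rough, hedged illustration of the described behavior (the function names mirror the text, but the Python rendering and the even, Bresenham-style distribution of spawn factors are our assumptions, not the authors' code):

```python
def spawn_points(points, repeater):
    # Apply the repeater (a vector of spawn factors) to the initial
    # collection of points: point i is emitted repeater[i] times.
    return [p for p, n in zip(points, repeater) for _ in range(n)]

def make_repeater(lesser, greater):
    # Build spawn factors for a polygon of `lesser` points so that the
    # result has `greater` points, spreading the extra copies evenly:
    # each factor is the difference of consecutive rounded quotas.
    return [(greater * (i + 1)) // lesser - (greater * i) // lesser
            for i in range(lesser)]
```

With these definitions, make_repeater(3, 7) yields [2, 2, 3], and applying spawn_points with that repeater turns a 3-point polygon into an equivalent 7-point one, as in Fig. 4.4.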
This algorithm was embedded into the PolyMoving action implementation, which demonstrates how Clojure code can be integrated into specific actions in the scenario ontology; more details can be found in the source code of the DroolsTabScene-5.0 open source project presented at http://oogis.ru. Also, a new universal action CCalculus for including arbitrary Clojure code into scenarios was developed in addition to the existing action calculus.
4.5
Conclusion
Successful usage of IGIS for several years in research prototyping and in the development of commercial systems has clearly proved the correctness and soundness of the chosen approach to constructing the system based on the powerful and well-tested open source packages Protégé, Drools, and OpenMap. The system possesses sufficient flexibility and versatility to adopt new functionality based on the latest computer technologies smoothly and seamlessly. Most of the work done has come down to the visual design of a new specific ontology using Protégé and to writing small pieces of code (if any) in one of three languages, Java, Groovy, or Clojure, depending on the best applicability to the specific task. As a result, a solution requiring less code, and hence fewer errors and less debugging time, is obtained. The proposed approach dramatically reduces debugging time thanks to the usage of the Clojure REPL, which makes it possible to debug scenarios without interrupting their execution. The experience gained in the modernization of the IGIS-based visual simulation environment definitely shows certain trends for further improvements, and the integration of modern open source artificial intelligence frameworks such as OpenNLP (Apache Software Foundation 2010), WEKA (Holmes et al. 1994), and OpenCyc (Douglas Lenat 1984) is most likely.
References

Apache Software Foundation (2010) Open source natural language processing project OpenNLP. http://opennlp.sourceforge.net
Douglas Lenat (1984) Ontology and knowledge base of everyday common sense knowledge OpenCyc. http://www.opencyc.org
Greg Wilkins (1995) Embeddable and pluggable web server Jetty. http://jetty.codehaus.org/jetty
Holmes G, Donkin A, Witten IH (1994) Open source suite of machine learning software WEKA. http://www.cs.waikato.ac.nz/ml/weka
Keyhole Inc (2005) Virtual globe, map and geographic information program GoogleEarth. http://earth.google.com
Keyhole Markup Language KML (2004) Keyhole Inc. http://code.google.com/intl/ru/apis/kml
MetaCarta Labs (2006) Open source library for displaying map data in web browsers OpenLayers. http://openlayers.org
Popovich V (2009) Surveillance & Recognize Systems' Software for Information Support and Automation of Information Centers for Battleships, Navy Formations and Units, ONTOMAP Project. SPIIRAS, Russia. http://oogis.ru
Popovich V (2010) Software Complex for Information Support and Automation of Staff Personnel Functional Activity at Naval Bases' Combat Information Centers, ONTOMAP-V1 Project. SPIIRAS, Russia. http://oogis.ru
Rich Hickey (2007) General-purpose language supporting interactive development, functional programming and multithreaded programming Clojure. http://clojure.org
Robert Mc Cool (1995) Apache HTTP Server proxy pass module. http://httpd.apache.org/docs/1.3/mod/mod_proxy.html
Sorokin RP, Shaida SS, Smirnov AV (2003) Intelligent geo-information systems for modeling and simulation. In: Harbour, Maritime and Multimodal Logistics Modeling & Simulation, Riga, Latvia, September 18–20, 2003, pp 395–398
Sorokin RP (2007) Visual modeling of spatial processes. In: Popovich V, Shrenk M, Korolenko K (eds) Information Fusion and Geographic Information Systems, Proceedings of the Third International Workshop, St. Petersburg, Russia, 2007, pp 263–271
Sorokin RP (2009) Intelligent geo-information systems and harbour protection. In: Shahbazian E, Rogova G, de Weert MJ (eds) Harbour Protection Through Data Fusion Technologies, Proceedings of the NATO Advanced Research Workshop on Data Fusion Technologies for Harbour Protection, Tallinn, Estonia, 27 June – 1 July 2005, pp 191–198
Chapter 5
Combining of Scanning Protection Mechanisms in GIS and Corporate Information Systems Igor Kotenko, Andrey Chechulin, and Elena Doynikova
Abstract This chapter proposes an approach to combining different mechanisms of network scanning protection against malefactors' reconnaissance actions and network worms. The approach can be implemented as a part of the protection mechanisms in corporate information systems, including Geographical Information Systems (GIS). It allows substantially improving the scanning protection effectiveness by reducing the false positive rate and increasing the detection accuracy. Particular scanning techniques are outlined. The core combining principles and architectural enhancements of the common scanning detection model are considered. An approach to automatically adjusting the parameters of the used mechanisms based on statistical data about network traffic is also suggested.
5.1
Introduction
The development of effective scanning detection and prevention mechanisms (mainly detection and reaction) is a pressing problem of information security in GIS and corporate information systems. Network scanning can be used by malefactors to gather information about an attacked network and by computer worms in the process of spreading. A large proportion of network scanners have very aggressive behaviors: they attempt to identify victim computer networks in a short period of time. This type of scanner is actually easier to detect because its aggressiveness stands out from the background traffic. Fast scanners continuously try to connect to a large number of hosts in order to identify new victims. Hosts trying to quickly connect to a wide variety of other hosts are the main traffic feature of fast scanners. This feature (high connection intensity) and its manifestations (e.g., a high rate of failed connections) are the basic features for most scanners.
I. Kotenko (*), A. Chechulin, and E. Doynikova Saint Petersburg Institute for Informatics and Automation of Russian Academy of Sciences (SPIIRAS), 39, 14 Liniya, 199178 St. Petersburg, Russia e-mail:
[email protected];
[email protected];
[email protected] V. Popovich et al. (eds.), Information Fusion and Geographic Information Systems: Towards the Digital Ocean, Lecture Notes in Geoinformation and Cartography 5, DOI 10.1007/978-3-642-19766-6_5, © Springer-Verlag Berlin Heidelberg 2011
This chapter is directed primarily at the development of effective automatic detection techniques against fast network scanning, but in our investigations we also try to develop techniques applicable to the detection of less aggressive scanning and various other modes of scanning. The main contribution of this chapter is the development and investigation of a new approach to combining different scanning protection mechanisms and the development of a common combined protection approach. The main accent in this chapter (in comparison with our other papers on worm detection and response, for example, Kotenko 2009; Kotenko et al. 2009) is on the combining of protection mechanisms. The stated problem is solved in the chapter on the basis of the step-by-step implementation of the following tasks:

1. Definition of effective protection mechanisms
2. Selection of the most important traffic characteristics and traffic classification by these characteristics
3. Determination of detection effectiveness metrics
4. Development of the common approach to combining of protection mechanisms

The last task includes the development and training of the optimal feature selection algorithm for the protection mechanisms on each of the traffic classes, and the development of the algorithm for combining the protection mechanisms. The main requirements for automatic scanning protection are as follows:

1. High adequacy: low false positive and false negative probabilities
2. Timeliness: low protection latency
3. Automation: this requirement is tightly related to timeliness; a protection system must require minimal real-time administrator intervention
4. Protection against slow scanners: this is also important, since most threshold and window-based schemes can be defeated by slow scanners

This chapter is structured as follows. The second section is dedicated to related work. The third section analyzes different scanning detection mechanisms.
In the fourth section, we select the main network traffic characteristics. The fifth section defines effectiveness metrics for protection mechanisms. The sixth section specifies the essence of the suggested approach to combining protection mechanisms. The seventh section describes the environment used for experimental evaluation of the protection mechanisms. Results of experiments are outlined in the eighth section. The conclusion generalizes the main results of the chapter and indicates future research directions.
5.2
Related Work
There are many particular methods that allow detecting different types of scanning with varying effectiveness in different conditions. Many scanners send packets to unused or dark addresses. Detection systems monitoring these ranges, called network telescopes (Moore 2002; Moore et al. 2003),
have become a standard tool for Internet abuse monitoring and have been employed to track such worms as Slammer (Moore et al. 2004) at the global level. By installing nonproduction honeypots (Provos 2004), it is possible to detect malicious activity. Honeypots are valuable detection tools because any traffic associated with them is suspicious by definition. Provos (2004) characterizes honeypots as high interaction or low interaction: the former fully simulate a live operating system and can be fully compromised, while the latter simulate only limited services. Honeypots can be organized into networks called honeynets (Curran et al. 2005). With the use of honeynets, detection ability is enhanced via a large set of honeypots executing a diverse set of software configurations. Jung et al. (2004) and Jung (2006) propose an effective algorithm, called Threshold Random Walk (TRW), to detect scans by observing the success or failure of connection attempts to new destinations. The approach is based on behavioral characteristics, such as a scanner having a rate of unsuccessful connections higher than that of legitimate sources, and on modeling connection attempts to new addresses as a random walk using two stochastic processes. Scanning can be detected by observing the success and failure outcomes of connection attempts and classifying a source as legitimate or malicious; classification is based on the observed sequence of connection outcomes according to the theory of sequential hypothesis testing. Using trace data, Jung et al. demonstrated that TRW can detect malicious activity within a small number (four or five) of requests. Many other protection mechanisms based on limiting the intensity of network connection establishment (“rate limiting”) have been offered. Williamson (Sanchez 2007; Twycross and Williamson 2003; Williamson 2002) suggests limiting the number of distinct connections of a host. Chen and Tang (2004) and Schechter et al.
(2004) apply restrictions to hosts that show a lot of uncompleted or failed connections. DNS-based detection is based on the empirical observation that most legitimate connections are preceded by a DNS query (Whyte et al. 2005; Wong et al. 2006). We believe that the integrated and coordinated use of particular protection mechanisms is a promising approach to network attack protection in the case of large-scale GIS and corporate information systems (Kotenko 2009; Kotenko et al. 2009). This is due to the fact that particular protection methods are directed at particular traffic and attack types, whereas combined protection mechanisms allow combining the advantages of different methods and reducing their disadvantages. In this chapter, an approach to combining different network scanning protection mechanisms is suggested and investigated. This approach allows substantially improving the detection effectiveness. Although the problem of developing particular scanning protection mechanisms is researched well enough in the existing works [for example, in (Chen and Tang 2004; Curran et al. 2005; Jung et al. 2004; Jung 2006; Moore 2002; Moore et al. 2003, 2004; Provos 2004; Sanchez 2007; Schechter et al. 2004; Twycross and Williamson 2003; Weaver et al. 2004; Whyte et al. 2005; Williamson 2002; Wong et al. 2006; Zuev and Moore 2005)], it is very important to find ways to increase the effectiveness of these mechanisms and to develop automatic adjustment procedures to adapt them to the current
very changeable network environment (including used applications, traffic characteristics, etc.).
5.3
Scanning Detection Mechanisms
As the basis of the suggested combined approach to network scanning protection, we assume the usage of a few mechanisms based on the following techniques:

• The “Virus Throttling” technique and its modifications (VT-S and VT-C) (Twycross and Williamson 2003; Williamson 2002)
• Techniques based on Failed Connection (FC) analysis (Chen and Tang 2004)
• Techniques which use the TRW method (Jung et al. 2004; Jung 2006)
• Credit-Based Rate Limiting (CB) techniques (Schechter et al. 2004)
The main elements of some of the specified techniques are presented below. The “Virus Throttling” technique for realization on switches (VT-S) (Twycross and Williamson 2003; Williamson 2002) is based on the following model of network traffic processing. A hash table of length N is specified for each host; it registers the values of the hash function calculated for the last IP addresses to which the host sent connection requests. The values of the hash function range from 0 to N − 1, and “1” is recorded at the k-th position in the hash table when the value is k. A connection request is defined by a TCP SYN packet. A minimal interval (rate threshold) between the arrivals of two TCP SYN packets with new destinations is defined. If a request arrives in less time than the specified threshold, the receiver address is hashed and added to the hash table for this source; otherwise the hash list is cleared and information about the current request is added. If the hash table is full, all requests from the source are considered malicious for a time T. The TCP protocol is used and TCP SYN packets are analyzed; the “source IP” and “destination IP” fields are controlled in the packets. The suggested modification of the Virus Throttling technique on the basis of the CUSUM method (VT-C) (Chechulin and Kotenko 2008) includes the following procedures. A counter C exists for every host, and CMax is a threshold on the number of requests directed to new addresses successively. The counter increases if less than RMin seconds have passed since the last request with an unrecorded receiver address. A connection request is identified via a TCP SYN packet. Connection requests from the corresponding host are considered malicious when the counter C exceeds the specified threshold. The TCP protocol is used and TCP SYN packets are analyzed similarly to the “Virus Throttling” technique; the “source IP” and “destination IP” fields are controlled in the packets.
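As an illustrative sketch of the VT-S model just described (all parameter values, names, and the per-source bookkeeping are our assumptions, not the authors' implementation):

```python
class VirusThrottleS:
    """Hedged sketch of the switch-oriented "Virus Throttling" model:
    a per-source hash table of recent new destinations, a rate threshold
    between SYNs to new destinations, and a blocking time T."""

    def __init__(self, n=8, rate_threshold=1.0, block_time=60.0):
        self.n = n                        # hash table length N
        self.rate_threshold = rate_threshold
        self.block_time = block_time      # time T a source stays "malicious"
        self.slots = {}                   # source -> set of occupied positions
        self.last_syn = {}                # source -> time of last SYN seen
        self.blocked_until = {}

    def on_syn(self, src, dst, now):
        """Process a TCP SYN from src to dst; True = treated as malicious."""
        if self.blocked_until.get(src, float("-inf")) > now:
            return True
        table = self.slots.setdefault(src, set())
        if now - self.last_syn.get(src, float("-inf")) < self.rate_threshold:
            table.add(hash(dst) % self.n)   # arrived too fast: record destination
        else:
            table.clear()                   # acceptable rate: restart the table
            table.add(hash(dst) % self.n)
        self.last_syn[src] = now
        if len(table) >= self.n:            # table full: block the source for T
            self.blocked_until[src] = now + self.block_time
            table.clear()
            return True
        return False
```

Rapid SYN bursts to many new destinations fill the table and trip the block, while a source that pauses longer than the rate threshold keeps resetting its table and is never flagged.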
In the Failed Connection (FC) techniques (Chen and Tang 2004), based on failed connection analysis, the following information is used and kept: the address of the host which initiated the connection, the failure rate, the connection time, and a failed connections counter. A record for a particular host is created when the rate of submitting new records to the hash table exceeds its threshold. The failure rate (f) value for
this record is updated after the failed connection counter increases by a fixed value (for example, 100). When the failure rate value exceeds the specified threshold, connection requests from this host are considered malicious. In the process of implementing these techniques, the TCP protocol is used and TCP RESET packets are analyzed; the “source IP” field and the TCP RST flag are controlled in the packets. The TRW method (Jung et al. 2004; Jung 2006) is based on the following computation model. To analyze whether a host that shows high network activity performs scanning, the Sequential Hypothesis Testing method is used. Let H1 be the hypothesis that host r performs scanning, H0 the hypothesis that the host does not perform scanning, and Yi a variable describing the connection attempt to the i-th host. This variable can have the following values: 0 if the connection is established; 1 if the connection is not established. It is assumed that, under hypothesis Hj, the random variables Yi (i = 1, 2, …) are independent and identically distributed. In this case, we can express the distribution of the Bernoulli random variable Yi as:

Pr[Yi = 0 | H0] = θ0;  Pr[Yi = 1 | H0] = 1 − θ0;
Pr[Yi = 0 | H1] = θ1;  Pr[Yi = 1 | H1] = 1 − θ1.  (5.1)
The assumption that a connection attempt is more likely to succeed from a benign source than from a malicious one implies the condition θ0 > θ1. With these two hypotheses, there are four possible decisions (malicious or benign):

• “Scanning detection” – the algorithm selects H1 when scanning from the host really takes place
• “False negative” – the algorithm selects H0 when scanning from the host really takes place
• “No scanning” – the algorithm selects H0 when scanning from the host is really absent
• “False positive” – the algorithm selects H1 when scanning from the host is really absent
Let us use PD as the probability of detecting scanning from the host and PF as the false positive probability to define the performance conditions of the algorithm. It is supposed that the following conditions are satisfied for optimal efficiency of the algorithm: PD ≥ β, PF ≤ α, with α = 0.01 and β = 0.99. As each event is observed, we calculate the likelihood ratio:

L(Y) = Pr[Y | H1] / Pr[Y | H0] = ∏ᵢ₌₁ⁿ Pr[Yi | H1] / Pr[Yi | H0],  (5.2)

where Y is the vector of events observed so far, and Pr[Yi | Hj] represents the conditional probability of the event Yi under the hypothesis Hj.
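A hedged sketch of this sequential test (parameter values are examples; the thresholds use the standard TRW choice η1 = β/α and η0 = (1 − β)/(1 − α); this is not the authors' implementation):

```python
def trw_scan_detector(outcomes, theta0=0.8, theta1=0.2, alpha=0.01, beta=0.99):
    """Sequential hypothesis test over connection outcomes.

    outcomes: iterable of Yi values (0 = connection succeeded, 1 = failed).
    Returns "H1" (scanning), "H0" (benign), or "pending" (need more events).
    """
    eta1 = beta / alpha                    # upper threshold
    eta0 = (1 - beta) / (1 - alpha)        # lower threshold
    likelihood = 1.0
    for y in outcomes:
        if y == 0:                         # success: ratio theta1/theta0 < 1
            likelihood *= theta1 / theta0
        else:                              # failure: ratio (1-theta1)/(1-theta0) > 1
            likelihood *= (1 - theta1) / (1 - theta0)
        if likelihood >= eta1:
            return "H1"
        if likelihood <= eta0:
            return "H0"
    return "pending"
```

With θ0 = 0.8 and θ1 = 0.2, four consecutive failed connections already push L(Y) past η1, which matches the "four or five requests" observation cited in the related work.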
Then the likelihood ratio L is compared with an upper threshold η1 and a lower threshold η0. The results of the comparison show the presence or absence of scanning from the host:

L(Y) ≤ η0 ⇒ H0;  L(Y) ≥ η1 ⇒ H1.  (5.3)

If η0 < L(Y) < η1, then we should wait for additional events for a more accurate identification. The thresholds are selected as follows:

η1 = β / α;  η0 = (1 − β) / (1 − α).  (5.4)

5.4
Selection of Network Traffic Characteristics
In this chapter, we mark out nearly 30 characteristics for traffic classification. The following characteristics were selected as basic:

• Connection rate
• Percentage of TCP SYN requests in the overall packet number
• Connection time
• Average frequency of incoming packets
• Average number of packets per source host
• Ratio of the number of connection requests (TCP SYN) to the number of approved connections (TCP SYN-ACK), etc.
The training objective is to find the optimal configuration of the used particular protection mechanisms on the basis of traffic characteristics. This is complicated by the high correlation of these characteristics in real networks. The selected traffic characteristics (average frequency of incoming packets, percentage of TCP SYN requests, etc.) represent statistical data collected over some time period. There are a few ways to evaluate such traffic parameters. The simplest way is to store all packets received during some fixed period (for example, 1 h) and to evaluate the statistical parameters using this history. The main disadvantage of this approach is its high resource intensity, especially for high-speed networks. Another way is to store only the values of the statistical characteristics. For example, the average traffic frequency for a phase can be calculated if we have the quantity of packets received from the beginning of the phase. This way is considerably less resource intensive. However, after each reasonably small phase, the statistical values should be cleared, since statistical information about a week of traffic can be uninteresting; on the other hand, right after clearing we have no statistics at all. To overcome this situation, it is proposed to gather traffic statistics in several time “windows” simultaneously, with the windows shifted in time. The combined protection
uses traffic statistics from the oldest window. The number of windows could be one of the configuration parameters.
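The time-shifted windows described above can be sketched as follows (a minimal illustration; class and parameter names are ours, and only one counter per window is kept for brevity):

```python
from collections import deque

class ShiftedWindows:
    """Gather traffic statistics in several time-shifted windows;
    the combined protection reads the oldest (fullest) window."""

    def __init__(self, num_windows=4, window_len=60.0, start=0.0):
        self.window_len = window_len
        self.shift = window_len / num_windows
        # [start_time, packet_count] pairs, oldest window first
        self.windows = deque([start + i * self.shift, 0]
                             for i in range(num_windows))

    def on_packet(self, now):
        self._rotate(now)
        for w in self.windows:
            if w[0] <= now:              # count only in windows already opened
                w[1] += 1

    def avg_rate(self, now):
        """Average packet frequency taken from the oldest window."""
        self._rotate(now)
        start, count = self.windows[0]
        return count / max(now - start, 1e-9)

    def _rotate(self, now):
        # Once the oldest window spans a full period, drop it and open a new one.
        while now - self.windows[0][0] >= self.window_len:
            newest_start = self.windows[-1][0]
            self.windows.popleft()
            self.windows.append([newest_start + self.shift, 0])
```

Because the windows are shifted by window_len / num_windows, there is always one window with a reasonably long observation history, avoiding the "no statistics right after clearing" problem.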
5.5
Protection Mechanisms Effectiveness Metrics
Effectiveness metrics for detection of hosts with scanning traffic are based on the binary classification of hosts: “contains/does not contain” malicious traffic. These characteristics are based on the following classification matrix:

                          Really contains       Really does not contain
  Detected as malicious   a (true positive)     b (false positive)
  Detected as benign      c (false negative)    d (true negative)
Here a is the number of hosts whose traffic contains scanning traffic and which are identified properly; b is the number of hosts whose traffic does not contain scanning traffic but which are identified by the system as malicious; c is the number of hosts whose traffic contains scanning traffic but which are identified by the system as benign; d is the number of all other hosts. To analyze the scanning protection mechanisms, we use both the metrics of errors of the first and second kind (false positive rate and false negative rate) and integrated metrics calculated on their basis: completeness, precision, and accuracy. Completeness (r) is calculated as the ratio of true positives to the total number of malicious hosts, r = a / (a + c); this metric describes the ability of the system to recognize malicious hosts, but it does not take the false positive value into account. Precision (p) is defined as the ratio of true positives to the total number of hosts recognized as malicious, p = a / (a + b); it describes the ability of the system to recognize just the really malicious hosts. Accuracy (A) is calculated as the ratio of proper decisions to the total number of system decisions, A = (a + d) / (a + b + c + d).
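The three integrated metrics follow directly from the matrix entries; a small illustrative sketch (the function name is ours):

```python
def detection_metrics(a, b, c, d):
    """Completeness r, precision p and accuracy A from the classification
    matrix above (a, b, c, d as defined in the text)."""
    r = a / (a + c)                      # share of malicious hosts recognized
    p = a / (a + b)                      # share of alarms that are truly malicious
    acc = (a + d) / (a + b + c + d)      # share of correct decisions overall
    return r, p, acc
```

For example, with 8 true positives, 2 false positives, 2 false negatives and 88 true negatives, all of completeness, precision and accuracy can be read off at once.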
5.6
The Approach to Combining of Protection Mechanisms
Combining of detection and response mechanisms is based on using those mechanisms that proved to be the best on traffic similar to the current one. To resolve the combining problem, the following combining methods, i.e., methods of integrating particular techniques, were used:

1. Selection of the best (optimal) mechanism and disconnection of the others
2. Simple majority voting
3. Weighted majority voting
4. Combining with the use of data mining methods
Let us consider the suggested combining methods. All these methods suppose initial testing of every method and selection of the best method (with specified characteristics) for the corresponding traffic characteristics. It is supposed that these methods allow using the best of each mechanism for each traffic class. There is a complication in the realization of the first method: it is not always possible to perform a full ranking of the protection mechanisms according to their applicability, because of the variety of traffic and its characteristics. The second method uses data from all protection mechanisms; it supposes that the mechanisms work with the same accuracy and does not take into account the applicability of the protection mechanisms to different cases. Let us present more details about the suggested realization of the third method, which is based on the calculation of weight coefficients for each protection mechanism. Weight coefficients are defined on the basis of traffic parameters, which should be selected in advance. The function W(d, x1, …, xn) for the computation of weight coefficients takes as arguments the type (class) d of the protection mechanism and a point of the n-dimensional Cartesian superspace defined by the parameters (x1, …, xn). The superspace dimension is defined by the number of selected traffic parameters. On the basis of the results of initial testing (training), the dependence of the accuracy value of different mechanisms on the values of the specified parameters is built. Let us define the function F(x), which stands for the dependence of the accuracy on a traffic parameter x, as follows: on the basis of the data obtained at the training stage, the Lagrange interpolation polynomial is built. This polynomial allows getting hypothetical accuracy values for every value of the parameter x. Let Fi(d, xi) define the dependence of the accuracy value on the i-th parameter for the protection mechanism d.
Then the function of weight coefficients is defined as follows:

    W(d, x_1, ..., x_n) = (1/n) * sum_{i=1}^{n} F_i(d, x_i).   (5.5)
The lower the accuracy of the specified mechanism for the specified traffic characteristics, the lower the value of the function W. In the simplest case, the function W treats all traffic parameters as equivalent. Otherwise, it is possible to supply the traffic parameters with weights according to their importance for decision making; then the summands in the numerator should be multiplied by the appropriate weight coefficients. Let us define a function HF(f, d) that decides whether the host is malicious or not. It gets a value after a packet f is accepted by the defense mechanism d: it is −1 if the host is malicious and +1 if it is not. The voting function for m defense mechanisms used for deciding whether the host is malicious after receiving the packet f under the traffic characteristics x_1, ..., x_n is as follows:
5 Combining of Scanning Protection Mechanisms in GIS
    V(f, x_1, ..., x_n) = sum_{j=1}^{m} W(d_j, x_1, ..., x_n) * HF(f, d_j).   (5.6)
If the value of V(f, x_1, ..., x_n) is negative, the host is malicious; otherwise it is not. The fourth combining method uses data from all defense mechanisms. These data, together with the main traffic characteristics, are transferred to a previously trained classifier [for example, Naïve Bayes (Zuev and Moore 2005)]. As a result, the classifier decides whether the analyzed traffic is malicious or not on the basis of the input data and prior training. The same combining methods are applicable for the automatic selection of the optimal parameters of the protection mechanisms. The main idea of this approach is that it is reasonable to extend combining to different configurations of particular mechanisms by changing the setting parameters of a mechanism according to the current situation in the network. The main requirements for the combined protection mechanisms to be developed correspond on the whole to the functional requirements of the developed proactive approach to scanning detection (Kotenko 2009; Kotenko et al. 2009). These requirements are adequacy and efficiency of the detection, effective use of system resources, automatic execution, and the possibility to detect different scanning types.
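As an illustration only (not the authors' implementation), the weighted-majority combination of (5.5) and (5.6) can be sketched in Python; the `Mechanism` type, the mechanism names, and the accuracy functions below are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

@dataclass
class Mechanism:
    """A scanning protection mechanism d with its per-packet verdict HF(f, d)."""
    name: str
    verdict: Callable[[Dict], int]  # -1 if the host looks malicious, +1 otherwise

def weight(accuracy_fns: Sequence[Callable[[str, float], float]],
           d: str, params: Sequence[float]) -> float:
    """W(d, x1, ..., xn): mean of the per-parameter accuracy estimates F_i(d, x_i)."""
    return sum(f(d, x) for f, x in zip(accuracy_fns, params)) / len(params)

def is_malicious(mechs: List[Mechanism],
                 accuracy_fns: Sequence[Callable[[str, float], float]],
                 packet: Dict, params: Sequence[float]) -> bool:
    """V(f, x1, ..., xn) = sum_j W(d_j, x1, ..., xn) * HF(f, d_j); negative means malicious."""
    v = sum(weight(accuracy_fns, m.name, params) * m.verdict(packet) for m in mechs)
    return v < 0
```

With one traffic parameter (say, the connection request rate), a mechanism whose training accuracy is higher at that rate outvotes a less accurate one.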
5.7 Environment for Experiments
To model and evaluate the combined and particular protection mechanisms, as well as to select their optimal parameters, we have developed an automated modeling and analysis methodology and software tools. The practical evaluation of the combined and particular protection mechanisms consists in carrying out a multitude of different experiments. These experiments are performed with the software environment for different values of the input data and include measuring protection effectiveness metrics. The architecture used to develop the modeling and analysis software tools and the optimal parameter selection techniques includes the following elements:

- Traffic source models. These models are intended to provide the traffic for the protection mechanisms under investigation. They include both scanning and usual (legitimate) network traffic.
- Preprocessing and traffic source synchronization models. Preprocessing models translate the traffic from the source format into a format that is easy for the protection mechanisms to analyze. This model is also responsible for traffic synchronization when more than one traffic source is used at the same time.
- Scanning detection mechanism models. The input parameters of every detection mechanism are the fields of the packet received from the traffic source and processed by this mechanism. As control parameters, we use the internal parameters of each mechanism that influence its effectiveness.
The implemented modeling and analysis software tools allow evaluating the following main effectiveness metrics of the protection mechanisms and optimal parameter selection techniques:

- Rate of blocked and/or deferred legitimate traffic (false positives)
- Rate of passed malicious traffic (false negatives)
- Integrated metrics (completeness, precision, and accuracy)
- Attack response time
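These metrics follow directly from the per-packet confusion counts; a minimal sketch (the variable names are ours, not those of the described tools):

```python
def effectiveness_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Detection-effectiveness metrics from confusion counts:
    tp/fn = malicious traffic flagged/passed, fp/tn = legitimate traffic blocked/passed."""
    return {
        "false_positive_rate": fp / (fp + tn),      # blocked/deferred legitimate traffic
        "false_negative_rate": fn / (fn + tp),      # passed malicious traffic
        "completeness": tp / (tp + fn),             # share of malicious traffic caught
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```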
In the experiments, we applied a combined method of traffic modeling. It consists in using different records of real traffic with the addition of the traffic necessary for the research; in this case, the needed traffic is scanning traffic and the traffic of "fast" applications, such as NetBIOS/NS or P2P. The experiments were carried out using both real traffic traces and generated traffic (by plugging in various modules of traffic models). To compare the protection mechanisms, two traffic types were used:

- Enterprise-level network traffic with a large set of different applications (including vulnerability scanners), without prevalence of any traffic
- Local area network traffic with a prevalence of P2P applications
The selection of these traffic types is motivated by the fact that the mechanisms for detecting and responding to network scanning should precisely detect scanning both in the traffic of legitimate applications and in the traffic of applications that create a great number of connections with different hosts. The latter is especially important, because such traffic resembles the behavior of network scanning.
5.8 Results of Experiments
For the experiments, the specified traffic types were mixed with scanning traffic. The following scanning parameters were used:

- Connection request rate – 50 requests per second
- Probability of a successful connection – 30%
- Probability of receiving a TCP-RST packet – 30%
- IP address selection for scanning – random
Let us provide the testing results for the particular detection mechanisms VT-S, VT-C, FC, TRW, and CB (see Sect. 5.3 for a description of these mechanisms) for the different traffic types (Tables 5.1–5.4). The experiments with the particular protection mechanisms have shown that in many cases the Virus Throttling technique based on the CUSUM method (VT-C), which we suggested, handles P2P traffic better than the VT-S mechanism used on network switches. It is therefore reasonable to implement it on the switches in networks with P2P traffic instead of VT-S or together with VT-S.
Table 5.1 Average values of false positives (FP), false negatives (FN), memory (V), and accuracy (A) for particular mechanisms in case of usual traffic without scanning traffic

Protection mechanism   FP         FN
VT-S                   0.022600   0
VT-C                   0.023400   0
FC                     0.005950   0
TRW                    0.004300   0
CB                     0.021800   0
Table 5.2 Average values of false positives (FP), false negatives (FN), memory (V), and accuracy (A) for particular mechanisms in case of P2P traffic without scanning traffic

Protection mechanism   FP         FN
VT-S                   0.089913   0
VT-C                   0.061528   0
FC                     0.002946   0
TRW                    0.002311   0
CB                     0.080051   0
Table 5.3 Average values of false positives (FP), false negatives (FN), memory (V), and accuracy (A) for particular mechanisms in case of usual traffic mixed with scanning traffic

Protection mechanism   FP         FN         V        A
VT-S                   0.023300   0.000663   23,496   0.998929
VT-C                   0.024100   0.000530   36,224   0.998983
FC                     0.009680   0.054291   48,572   0.972348
TRW                    0.004980   0.002416   28,912   0.998216
CB                     0.025600   0.000344   9,608    0.987114

Table 5.4 Average values of false positives (FP), false negatives (FN), memory (V), and accuracy (A) for particular mechanisms in case of P2P traffic mixed with scanning traffic

Protection mechanism   FP          FN          V         A
VT-S                   0.301541    0.0178923   34,820    0.96833
VT-C                   0.101948    0.0317885   105,272   0.964804
FC                     0.103761    0.013333    74,108    0.98227
TRW                    0.0599373   0.0972275   127,656   0.905997
CB                     0.020021    0.001258    22,632    0.999393
Currently, we are in the process of testing different realizations of the combined protection mechanisms. The experiments performed show that the combined methods based on selection of the best (optimal) mechanism, weighted majority, and data mining methods allow improving the effectiveness metrics for different traffic types in most cases. These experiments used a mix of usual traffic, traffic with P2P connections, and scanner traffic, so that the main host parameters changed noticeably. For example, such experiments were realized by adding P2P traffic to the host traffic beginning from some moment or by removing P2P traffic from the host.
Table 5.5 Average values of false positives (FP), false negatives (FN), memory (V), and accuracy (A) for the combining method (1) – selection of the optimal mechanism in case of usual traffic mixed with scanning traffic

Protection mechanism   FP         FN
(1)                    0.024100   0.000530

Table 5.6 Average values of false positives (FP), false negatives (FN), memory (V), and accuracy (A) for the combining method (1) – selection of the optimal mechanism in case of P2P traffic mixed with scanning traffic

Protection mechanism   FP         FN
(1)                    0.020021   0.001258

Table 5.7 Average values of false positives (FP), false negatives (FN), memory (V), and accuracy (A) for the combining method (2) – simple majority in case of usual traffic mixed with scanning traffic

Protection mechanism   FP         FN
(2)                    0.092851   0.033488
When the combined protection method based on the selection of the optimal mechanism, designated as (1), was used, the VT-C protection mechanism was automatically selected for the usual traffic (Table 5.5) and the CB protection mechanism for the traffic with P2P connections (Table 5.6). Unfortunately, the implemented method based on the selection of the optimal mechanism did not take into account unexpected changes in the traffic parameters. This combined mechanism showed noticeably worse results than some particular scanning detection mechanisms when P2P traffic appeared in the usual traffic or when P2P traffic was removed. The combined protection method based on the simple majority, designated as (2), responded better to changes in the traffic parameters than the particular detection mechanisms, but in every specific situation its results were worse than the results of the particular methods that are optimal for these traffic types (Table 5.7). For the combined protection method based on the weighted majority, designated as (3), the following control parameters were selected: connection request rate, percentage of connection requests (TCP SYN packets) in the total packet count, connection time, and average number of packets per source host. As a result of the experiments, the following average values of false positives (FP) and false negatives (FN) were obtained (Table 5.8). The best results were obtained when using data mining methods for the combining (the Naïve Bayesian approach was used). As a result of the experiments, the following average values of false positives and false negatives were obtained (Table 5.9).
Table 5.8 Average values of false positives (FP), false negatives (FN), memory (V), and accuracy (A) for the combining method (3) – weighted majority in case of usual traffic mixed with scanning traffic

Protection mechanism   FP         FN
(3)                    0.009421   0.008596

Table 5.9 Average values of false positives (FP), false negatives (FN), memory (V), and accuracy (A) for the combining method (4) – use of data mining methods in case of usual traffic mixed with scanning traffic

Protection mechanism   FP         FN
(4)                    0.003529   0.001286
Note that the false positives are influenced by the following fact: after scanning from a host is detected, the host is blocked, and its normal (non-malicious) traffic is blocked as well.
5.9 Conclusions
In this chapter, an approach to combining scanning protection mechanisms and to selecting optimal parameters for these mechanisms in GIS and corporate networked information systems was suggested and investigated. We described the developed software tools, which are intended for carrying out experiments and for evaluating protection effectiveness on real and generated traffic records. On the basis of the experiments performed, we evaluated particular and combined scanning protection mechanisms. We can conclude that the suggested combined protection methods (particularly, the combined methods based on weighted majority and data mining) and the procedures for adjusting the parameters of particular protection mechanisms according to statistical traffic metrics allow for a significant increase in the effectiveness of scanning protection. Future research will be devoted to experimental analysis of the effectiveness of the combined methods for different traffic mixes, modification of the existing particular protection mechanisms, and development of new protection mechanisms. We are also planning to investigate other schemes of protection mechanism cooperation and to improve the modeling and analysis software tools.

Acknowledgments This research is supported by a grant from the Russian Foundation for Basic Research (project № 10-01-00826-a) and the program of fundamental research of the Department for Nanotechnologies and Informational Technologies of the Russian Academy of Sciences (contract № 3.2), and is partly funded by the EU as part of the SecFutur and MASSIF projects.
References

Chechulin AA, Kotenko IV (2008) Investigation of virus throttling defense mechanisms against network worms. Information Security. Inside 3:68–73 (in Russian)
Chen S, Tang Y (2004) Slowing down Internet worms. In: Proceedings of the 24th International Conference on Distributed Computing Systems
Curran K, Morrissey C, Fagan C, Murphy C, O’Donnel B, Fitzpatrick G, Condit S (2005) Monitoring hacker activity with a honeynet. International Journal of Network Management 15:123–134
Jung J, Paxson V, Berger AW, Balakrishnan H (2004) Fast portscan detection using sequential hypothesis testing. In: Proceedings of the 2004 IEEE Symposium on Security and Privacy, Oakland, California, pp 211–225
Jung J (2006) Real-time detection of malicious network activity using stochastic models. PhD thesis, MIT
Kotenko I (2009) Framework for integrated proactive network worm detection and response. In: Proceedings of the 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2009). IEEE Computer Society, pp 379–386
Kotenko IV, Vorontsov VV, Chechulin AA, Ulanov AV (2009) Proactive security mechanisms against network worms: approach, implementation and results of experiments. Information Technologies 1:37–42 (in Russian)
Moore D (2002) Network telescopes: observing small or distant security events. In: Proceedings of the 11th USENIX Security Symposium
Moore D, Paxson V, Savage S, Shannon C, Staniford S, Weaver N (2003) Inside the Slammer worm. IEEE Security and Privacy Magazine 1:33–39
Moore D, Shannon C, Voelker G, Savage S (2004) Network telescopes: technical report. CAIDA
Provos N (2004) A virtual honeypot framework. In: SSYM’04 Proceedings of the 13th Conference on USENIX Security Symposium, vol 13, San Diego, CA
Sanchez M (2007) Virus throttle as basis for proactive defense. In: Communications in Computer and Information Science (CCIS), vol 1. Springer
Schechter S, Jung J, Berger AW (2004) Fast detection of scanning worm infections. In: Proceedings of the Seventh International Symposium on Recent Advances in Intrusion Detection, French Riviera, France
Twycross J, Williamson MM (2003) Implementing and testing a virus throttle. In: Proceedings of the 12th USENIX Security Symposium, Washington, DC, pp 285–294
Weaver N, Staniford S, Paxson V (2004) Very fast containment of scanning worms. In: Proceedings of the 13th USENIX Security Symposium
Whyte D, Kranakis E, Oorschot PC (2005) DNS-based detection of scanning worms in an enterprise network. In: Proceedings of the Network and Distributed System Security Symposium
Williamson MM (2002) Throttling viruses: restricting propagation to defeat malicious mobile code. In: Proceedings of the 18th Annual Computer Security Applications Conference. IEEE Computer Society, Washington
Wong C, Bielski S, Studer A, Wang C (2006) Empirical analysis of rate limiting mechanisms. In: Lecture Notes in Computer Science, vol 3858. Springer
Zuev D, Moore AW (2005) Traffic classification using a statistical approach. In: Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Banff, Alberta, Canada
Part III
GIS Data Fusion
Chapter 6
On the Use of Three-Dimensional Self-Organizing Maps for Visualizing Clusters in Georeferenced Data

Jorge M.L. Gorricha and Victor J.A.S. Lobo
Abstract The self-organizing map (SOM) is an artificial neural network that is very effective for clustering via visualization. Ideally, so as to produce a good model, the output space dimension of the SOM should match the intrinsic dimension of the data. However, because it is very difficult or even impossible to visualize SOMs with more than two dimensions, the vast majority of applications use a SOM with a regular two-dimensional (2D) grid of nodes. For complex problems, this poses a limitation on the quality of the results obtained. There are no theoretical problems in generating SOMs with higher dimensional output spaces, but 3D SOMs have met limited success. In this chapter, we show that the 3D SOM can be used successfully for visualizing clusters in georeferenced data. To overcome the problem of visualizing the 3D grid of units, we start by assigning one primary color (of the RGB color scheme) to each of the three dimensions of the 3D SOM. We then use those colors when representing, on a geographic map, the georeferenced elements that are mapped to each SOM unit. Finally, we provide a comparison of 2D and 3D SOMs for a concrete problem. The results obtained point to a significant increase in clustering quality due to the use of 3D SOMs.
6.1 Introduction
There is a wide range of problems that need to be addressed in a geospatial perspective. Some of these problems are often associated with environmental and socioeconomic phenomena where the geographic position is a determinant element for the analysis (Openshaw 1995). In this kind of analysis, frequently based on georeferenced secondary data (Openshaw 1995), the focus is centered on the search for patterns and spatial relationships, without a priori defined hypotheses
J.M.L. Gorricha and V.J.A.S. Lobo (*) CINAV-Naval Research Center, Portuguese Naval Academy, and ISEGI-UNL, Portugal e-mail:
[email protected] V. Popovich et al. (eds.), Information Fusion and Geographic Information Systems: Towards the Digital Ocean, Lecture Notes in Geoinformation and Cartography 5, DOI 10.1007/978-3-642-19766-6_6, # Springer-Verlag Berlin Heidelberg 2011
(Miller and Han 2001). This is achieved through clustering, defined as the unsupervised classification of patterns into groups (Jain et al. 1999). It is also widely recognized that visualization is a potentially useful technique for exploratory pattern analysis and may, under certain circumstances, contribute to discovering new knowledge. Moreover, when applied to georeferenced data, visualization may allow the explanation of complex structures and phenomena in a spatial perspective (Koua 2003). Visualization is, in this perspective, defined as the use of visual representations of data obtained with interactive computer systems in order to amplify cognition (Card et al. 1999). It is in this context that unsupervised neural networks, such as the SOM (Kohonen 1990, 1998, 2001), have been proposed as tools for visualizing georeferenced data (Koua 2003). In fact, the SOM algorithm performs both vector quantization (the process of representing a given dataset by a reduced set of reference vectors; Gersho 1977, 1978; Buhmann and Kühnel 1992) and vector projection, making this artificial neural network (ANN) a very effective method for clustering via visualization (Flexer 2001). Among all the strategies for visualizing the SOM, we are particularly interested in those that allow dealing with spatial dependency. One of the methods used to visualize georeferenced data using the SOM consists in assigning different colors to the units of a SOM network defined only in two dimensions (2D SOM), so that each georeferenced element can be geographically represented with the color of its best matching unit (BMU) (Skupin and Agarwal 2008). Because it is very difficult or even impossible to visualize SOMs with more than two dimensions (Bação et al. 2005; Vesanto 1999), the output space of this ANN is generally defined by a regular two-dimensional grid of nodes (2D SOM).
This approach, supported by a nonlinear projection of data on a two-dimensional surface, performs a dimensionality reduction that generally leads to a loss of information, and for this reason there is a strong probability that some of the existing clusters will be undetectable (Flexer 2001). However, in the particular case of georeferenced data, it is possible to consider the use of a three-dimensional SOM for this purpose, thus adding one more dimension in the analysis, and consequently reducing information loss. As we shall see later, the inclusion of a third dimension in the analysis will allow us to identify some of the clusters that are undifferentiated in SOMs with the output space defined only in two dimensions. This chapter is divided into five sections as follows. Section 6.2 presents the theoretical framework of the problem under review, especially regarding the use of SOM as a tool for visualizing clusters in a spatial perspective; in Sect. 6.3 we propose a method for visualizing clusters in georeferenced data that uses the output space of a three-dimensional SOM; Sect. 6.4 presents the results and discusses practical applications of the presented method, including experiments with real and artificial data; finally, in Sect. 6.5, we present the general conclusions.
6.2 The Self-Organizing Map
The SOM is an ANN based on an unsupervised learning process that performs a nonlinear mapping of high-dimensional input data onto an ordered and structured array of nodes, generally of lower dimension (Kohonen 2001). As a result of this process, and by combining the properties of a vector quantization and a vector projection algorithm, the SOM compresses information and reduces dimensionality (Vesanto 1999; Vesanto et al. 2000). In its most usual form, the SOM algorithm performs a number of successive iterations until the reference vectors associated with the nodes of a bidimensional network represent, as far as possible, the input patterns that are closest to those nodes (vector quantization). In the end, every input pattern in the dataset is mapped to one of the network nodes (vector projection). After this optimization process, topological relations among input patterns are, whenever possible, preserved through the mapping, allowing the similarities and dissimilarities in the data to be represented in the output space (Kohonen 1998). Therefore, the SOM algorithm establishes a nonlinear mapping between the input data space and the map grid, which is called the output space. Because the SOM converts the nonlinear statistical relationships that exist in the data into geometric relationships that can be represented visually (Kohonen 1998, 2001), it can be considered a visualization method for multidimensional data specially adapted to display the clustering structure (Himberg 2000; Kaski et al. 1999), or, in other words, a diagram of clusters (Kohonen 1998). When compared with other clustering tools, the SOM is distinguished mainly by the fact that, during the learning process, the algorithm tries to guarantee the topological ordering of its units, thus allowing an analysis of proximity between the clusters and the visualization of their structure (Skupin and Agarwal 2008).
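The training process just described can be condensed into a few lines; the following is a didactic sketch of an online SOM (grid size, learning-rate and neighborhood schedules are illustrative choices of ours, not those of any cited implementation):

```python
import math
import random

def bmu(units, x):
    """Best matching unit: the node whose reference vector is closest to x."""
    return min(units, key=lambda u: sum((w - v) ** 2 for w, v in zip(units[u], x)))

def train_som(data, rows, cols, epochs=40, lr0=0.5, seed=0):
    """Online SOM: each input pulls its BMU and, more weakly, the BMU's grid
    neighbors; learning rate and neighborhood radius shrink over time."""
    rnd = random.Random(seed)
    dim = len(data[0])
    units = {(r, c): [rnd.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    t, t_max = 0, epochs * len(data)
    for _ in range(epochs):
        for x in data:
            frac = t / t_max
            lr = lr0 * (1.0 - frac)
            radius = max((max(rows, cols) / 2.0) * (1.0 - frac), 0.5)
            b = bmu(units, x)  # vector quantization step
            for u, w in units.items():
                d2 = (u[0] - b[0]) ** 2 + (u[1] - b[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # topological neighborhood
                units[u] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
            t += 1
    return units
```

After training, mapping each input to its BMU performs the vector projection step, and inputs from distinct clusters end up on distinct grid nodes.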
In order to make the SOM a better tool for exploratory data analysis, several methods have been developed that increase the capabilities of this algorithm for that purpose. These methods explore both perspectives of the SOM: vector projection (an output space perspective) and vector quantization (an input data space perspective). Usually, the visualization of the SOM is based on two-dimensional constructs such as the U-matrix (Ultsch and Siemon 1990; Ultsch 2003), component planes, hit maps, and other similar variants (Ultsch 2003; Kraaijveld et al. 1995), or on exploring the data topology (Tasdemir and Merenyi 2009). However, this chapter is not focused on all those strategies but only on those that allow visualizing clusters in a geospatial perspective. Typically, a clustering tool should ensure the representation of the existing data patterns, the definition of proximity between these patterns, the characterization of the clusters, and the final evaluation of the output (Jain et al. 1999). In the case of georeferenced data, the clustering tool should also ensure that the groups are formed in line with geographical closeness (Skupin and Agarwal 2008). Thus, the geospatial perspective is, in fact, a crucial point that distinguishes clustering georeferenced data from clustering other data.
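For instance, the U-matrix mentioned above can be obtained by averaging, for each unit, the distances between its reference vector and those of its grid neighbors; a small sketch (the `codebooks` layout, a dict keyed by grid coordinates, is an assumption of ours):

```python
import math

def u_matrix(codebooks, rows, cols):
    """U-matrix of a 2D SOM: mean distance from each unit's reference vector to
    its 4-connected grid neighbors; high values reveal borders between clusters."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    u = {}
    for r in range(rows):
        for c in range(cols):
            neigh = [(r + dr, c + dc)
                     for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            u[(r, c)] = sum(dist(codebooks[(r, c)], codebooks[n]) for n in neigh) / len(neigh)
    return u
```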
Recognizing that fact, and knowing that the visualization of the SOM can be approached by means other than the usually used methods, we will now look at one specific approach that has been proposed in order to deal with geospatial features. An alternative way to visualize the SOM takes advantage of the very nature of georeferenced data, coloring the geographic map with label colors obtained from the SOM units (Skupin and Agarwal 2008). This approach is proposed in the "Prototypically Exploratory Geovisualization Environment" (PEGE) (Koua and Kraak 2008). This software incorporates the possibility of linking the SOM to the geographic representation of the data by color, allowing its analysis in a geospatial perspective. One possible application of PEGE, which constitutes the bottom line of this chapter, consists in assigning colors to the map units of a 2D SOM with some kind of criterion (similarity, for example) and finally coloring the georeferenced elements with those colors. Figure 6.1 shows an example of clustering georeferenced data based on the application of this method. A color was assigned to each map unit of a 2D SOM defined with nine units (3 × 3). This map was trained with data related to the main causes of death in several European countries. As we can see from this example, the geospatial perspective seems to be essential to understand this particular phenomenon.
Fig. 6.1 The principal causes of death with a 2D SOM (the output space of the 2D SOM is shown alongside the geographic map). This example was obtained by training a 2D SOM with data related to the main causes of death in several European countries. Each country was painted with the color of its BMU in the SOM. Data source: EUROSTAT
6.3 Clustering Georeferenced Data with 3D SOM
In this section, we propose a clustering method for georeferenced data based on a visualization of the output space of a 3D SOM. The method is simply an association of each of the three orthogonal axes (x, y, and z) that define the SOM grid with one of the three primary colors: red, green, and blue (RGB scheme). As a result, each of the three dimensions of the 3D SOM is expressed by a change in tone of one particular primary color (RGB), and each SOM unit has a distinct color label. After that, we can paint each geographic element with the color of its BMU. Figure 6.2 represents schematically a SOM with 27 units (3 × 3 × 3) in RGB space, followed by the geographical representation of several georeferenced elements painted with the color labels of their BMUs. Formally, let us consider a 3D SOM defined with three dimensions [u × v × w] and a rectangular topology. The SOM grid, or output space (N), is a set of (u × v × w) units (nodes) defined in 23 kn and steering 2
antenna is from 15 to 30 r/min, which means that the rotation period of a marine radar antenna is about 3 s; its sampling period is stable and has a fast renewal rate, but it has a lower measurement accuracy and can sometimes lose information. AIS has a higher sampling accuracy, but its sampling period is unstable; its characteristics are as follows: when AIS works in automatic mode, the intervals at which it broadcasts dynamic information vary with the different states of the navigating vessel; the time intervals of reporting AIS dynamic information are shown in Table 9.1. The main time alignment algorithms for the information fusion of AIS and radar are the central clustering algorithm, the linear interpolation algorithm, and interpolation–extrapolation based on the vessel's course and speed provided by AIS. The central clustering algorithm clusters the time sampling sequence of AIS, T_a = {t_a1, t_a2, ..., t_af}, and the sampling time sequence of radar into a set {t'_1, t'_2, ..., t'_n}, where f = n + t; for this set, the central clustering algorithm based on the nearest-neighbor rule is characterized by computational complexity and many calculation periods, and is error prone (Wang et al. 2009). As for the linear interpolation algorithm, it assumes that the ships are in uniform linear motion, but a ship under sail is affected by the tideway, so this algorithm is not suitable for the actual navigation of a vessel. The interpolation and extrapolation algorithms based on the vessel's course and speed provided by AIS cannot be applied when some AIS sampling information is lost. This paper first analyzes the characteristics of the sampling periods of AIS and radar, considering the real navigation status, then adopts a time alignment algorithm based on an interpolation fitting method and normalizes the two sensors' sampling intervals, thus realizing a unified sampling time.
9.1 The Time Alignment Algorithm Based on Interpolation Fitting Method
From the above analysis, we know that the sampling frequencies of AIS and radar are different. The sampling intervals of radar are more uniform, while those of AIS in different states are nonuniform. Assume that the time alignment of AIS and radar has been conducted before they begin to work, and that since then both AIS and radar have obtained a set of ship information at each of their sampling times. In this paper, smooth curves are obtained by interpolation and data fitting of the sampling information; from the fitting curves, values can be calculated for any time (Liang and Pan 2006). For example,
9 A Time Calibration Algorithm Based on the Data Fitting and Its Application
the radar’s sampling information at AIS’ sampling times can be calculated from the fitting curves, so the two sensors’ sampling data can be obtained at AIS samplings times and would make good preparations for the following fusion. Spline function is a kind of piecewise function in essence. Considering real navigation status and the effect of waves, more smooth fitting curve is needed. Therefore, this paper adopted triple spline fitting algorithm based on least-square method. Suppose one sensor has n þ 1 measurements at some targets in a period of time [a, b], the whole period of time was divided into a x0 < x1 < < xn b Here, xi is sampling times (i ¼ 0; 1; . . . ; n). The sampling data are corresponding to the given time xi ; i.e., yi ¼ f ðxi Þ ði ¼ 0; 1; . . . ; nÞ, where yi is distance or speed. A triple spline function SðxÞ is constructed and is made to satisfy the following conditions (Li et al. 2001): 1. Sðxi Þ ¼ yi ði ¼ 0; 1; . . . ; nÞ 2. It is cubic polynomial in every interval ½xi ; xiþ1 ði ¼ 0; 1; . . . ; n 1Þ 3. It has second continuous derivative If we have known the derivative value of interpolation function at the nodes, as: S0 ðxi Þ ¼ mi ði ¼ 0; 1; . . . ; n 1Þ, according to (9.1), then Sðxi Þ ¼ yi , Sðxiþ1 Þ ¼ yiþ1 , S0 ðxi Þ ¼ mi , S0 ðxiþ1 Þ ¼ mi¼1 , ði ¼ 0; 1; . . . ; n 1Þ; then using Hermite interpolation formula to write the triple spline function in every interval ½xi ; xiþ1 ði ¼ 0; 1; . . . ; n 1Þ:
$S(x) = \left(1 + 2\,\frac{x - x_i}{x_{i+1} - x_i}\right)\left(\frac{x - x_{i+1}}{x_i - x_{i+1}}\right)^2 y_i + \left(1 + 2\,\frac{x - x_{i+1}}{x_i - x_{i+1}}\right)\left(\frac{x - x_i}{x_{i+1} - x_i}\right)^2 y_{i+1} + (x - x_i)\left(\frac{x - x_{i+1}}{x_i - x_{i+1}}\right)^2 m_i + (x - x_{i+1})\left(\frac{x - x_i}{x_{i+1} - x_i}\right)^2 m_{i+1}.$  (9.1)

Let $h_i = x_{i+1} - x_i$. Differentiating (9.1) twice yields the second derivative and its one-sided values at the nodes:

$S''(x) = \left(\frac{6}{h_i^2} - \frac{12}{h_i^3}(x_{i+1} - x)\right) y_i + \left(\frac{6}{h_i^2} - \frac{12}{h_i^3}(x - x_i)\right) y_{i+1} + \left(\frac{2}{h_i} - \frac{6}{h_i^2}(x_{i+1} - x)\right) m_i - \left(\frac{2}{h_i} - \frac{6}{h_i^2}(x - x_i)\right) m_{i+1},$  (9.2)

$S''(x_i^+) = -\frac{6}{h_i^2}\, y_i + \frac{6}{h_i^2}\, y_{i+1} - \frac{4}{h_i}\, m_i - \frac{2}{h_i}\, m_{i+1},$  (9.3)

$S''(x_i^-) = \frac{6}{h_{i-1}^2}\, y_{i-1} - \frac{6}{h_{i-1}^2}\, y_i + \frac{2}{h_{i-1}}\, m_{i-1} + \frac{4}{h_{i-1}}\, m_i.$  (9.4)
T.-z. Wang et al.
The second derivative of $S(x)$ is continuous, so $S''(x_i^+) = S''(x_i^-)$, i.e.,
$-\frac{6}{h_i^2}\, y_i + \frac{6}{h_i^2}\, y_{i+1} - \frac{4}{h_i}\, m_i - \frac{2}{h_i}\, m_{i+1} = \frac{6}{h_{i-1}^2}\, y_{i-1} - \frac{6}{h_{i-1}^2}\, y_i + \frac{2}{h_{i-1}}\, m_{i-1} + \frac{4}{h_{i-1}}\, m_i,$  (9.5)

which can be rearranged as

$(1 - \alpha_i)\, m_{i-1} + 2 m_i + \alpha_i\, m_{i+1} = \beta_i \quad (i = 1, 2, \ldots, n - 1),$  (9.6)

where

$\alpha_i = \frac{h_{i-1}}{h_{i-1} + h_i}, \qquad \beta_i = 3\left[\frac{1 - \alpha_i}{h_{i-1}}\,(y_i - y_{i-1}) + \frac{\alpha_i}{h_i}\,(y_{i+1} - y_i)\right] \quad (i = 1, 2, \ldots, n - 1).$
Using the natural boundary condition $S''(x_0) = S''(x_n) = 0$, we obtain n + 1 linear equations in the n + 1 unknowns $m_0, m_1, \ldots, m_n$:

$\begin{cases} 2 m_0 + \alpha_0 m_1 = \beta_0, \\ (1 - \alpha_i)\, m_{i-1} + 2 m_i + \alpha_i\, m_{i+1} = \beta_i \quad (i = 1, \ldots, n - 1), \\ (1 - \alpha_n)\, m_{n-1} + 2 m_n = \beta_n, \end{cases}$  (9.7)

where $\alpha_0 = 1$, $\alpha_n = 0$, $\beta_0 = (3/h_0)(y_1 - y_0)$, $\beta_n = (3/h_{n-1})(y_n - y_{n-1})$. The coefficient matrix of the equations is tridiagonal and its determinant is nonzero, so the equations have a unique solution $m_0, m_1, \ldots, m_n$. Substituting $x_i, y_i, m_i$ $(i = 0, 1, \ldots, n)$ into (9.1) yields the cubic spline interpolation function. Cubic spline interpolation functions $S(x)$ are constructed on the sensors' sampling points; assuming that F is the function set constructed from all of them, the least-squares cubic-spline interpolation-fitting algorithm is to find the $S^*(x)$ in F that satisfies $\sum_{i=0}^{n} [S^*(x_i) - y_i]^2 = \min$, where $f(x_i) = y_i$. From the above theoretical analysis, the least-squares cubic spline interpolation-fitting curves have good mathematical characteristics and are also close to the actual case.
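Because system (9.7) is tridiagonal, it can be solved efficiently with the Thomas algorithm. The following is a minimal Python sketch of the interpolation core described above (the function name and pure-Python structure are ours; the least-squares fitting layer that the chapter builds on top of the interpolation is not shown):

```python
from bisect import bisect_right

def cubic_spline(xs, ys):
    """Natural cubic spline in the slope (first-derivative) form of
    Sect. 9.1: solve the tridiagonal system (9.7) for the node slopes
    m_i, then evaluate the Hermite form (9.1) piecewise."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    lower, diag, upper = [0.0] * (n + 1), [2.0] * (n + 1), [0.0] * (n + 1)
    b = [0.0] * (n + 1)
    upper[0], b[0] = 1.0, 3.0 * (ys[1] - ys[0]) / h[0]          # alpha_0 = 1
    for i in range(1, n):
        a = h[i - 1] / (h[i - 1] + h[i])                        # alpha_i
        lower[i], upper[i] = 1.0 - a, a
        b[i] = 3.0 * ((1.0 - a) * (ys[i] - ys[i - 1]) / h[i - 1]
                      + a * (ys[i + 1] - ys[i]) / h[i])
    lower[n], b[n] = 1.0, 3.0 * (ys[n] - ys[n - 1]) / h[n - 1]  # alpha_n = 0
    # Thomas algorithm: forward elimination, then back substitution
    for i in range(1, n + 1):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        b[i] -= w * b[i - 1]
    m = [0.0] * (n + 1)
    m[n] = b[n] / diag[n]
    for i in range(n - 1, -1, -1):
        m[i] = (b[i] - upper[i] * m[i + 1]) / diag[i]

    def S(x):
        # locate the interval [x_i, x_{i+1}] containing x
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        t1 = (x - xs[i]) / h[i]        # normalised offset from x_i
        t0 = (xs[i + 1] - x) / h[i]    # normalised offset from x_{i+1}
        return ((1 + 2 * t1) * t0 ** 2 * ys[i]
                + (1 + 2 * t0) * t1 ** 2 * ys[i + 1]
                + (x - xs[i]) * t0 ** 2 * m[i]
                + (x - xs[i + 1]) * t1 ** 2 * m[i + 1])
    return S
```

A convenient sanity check is that the spline reproduces linear data exactly, since the slopes $m_i$ equal the line's slope in every equation of (9.7).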
9.2 Normalized Sampling Intervals
Because the two sensors have different sampling periods, the sampling period must be normalized before subsequent processing; i.e., a common sampling interval T is selected according to the characteristics of the systems. T is determined by the specific status of the target's navigation: comparing $x_{Ri} - x_{R(i-1)}$ (the time difference between consecutive radar samples) and $x_{Ai} - x_{A(i-1)}$ (the time difference between consecutive AIS samples), the sampling interval of the sensor with the smaller time difference is chosen as the normalized sampling interval T. This method automatically adapts the sampling interval to changes in the target vessel's speed and provides denser sampling time points, giving it good adaptability. Once T is determined, the above-mentioned algorithm can be employed to realize the time calibration of the sampling data of the two sensors. The flow chart is summarized in Fig. 9.1.

[Fig. 9.1 flowchart: AIS and radar sampling data each pass through interpolation computation and curve fitting; the sampling times are then normalized to an agreed sampling interval, and the normalized AIS and radar data are delivered to the fusion center.]

Fig. 9.1 The time alignment algorithm based on the interpolation fitting method
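The interval-selection rule can be sketched as follows (an illustrative reading of the rule, not verbatim from the chapter; the function name and grid-advancing details are ours):

```python
def normalized_times(t_radar, t_ais):
    """Build the normalized sampling instants: starting from the common
    first instant, repeatedly extend the grid by the smaller of the two
    sensors' current inter-sample gaps, so the grid densifies when the
    AIS update rate increases."""
    out = [min(t_radar[0], t_ais[0])]
    i = j = 1
    while i < len(t_radar) and j < len(t_ais):
        gap_r = t_radar[i] - t_radar[i - 1]   # current radar gap
        gap_a = t_ais[j] - t_ais[j - 1]       # current AIS gap
        out.append(out[-1] + min(gap_r, gap_a))
        # advance each sensor's pointer past the new grid instant
        while i < len(t_radar) and t_radar[i] <= out[-1]:
            i += 1
        while j < len(t_ais) and t_ais[j] <= out[-1]:
            j += 1
    return out
```

For the simulation of Sect. 9.3 (radar every 3 s; AIS every 6 s until 26 s, then every 2 s) this rule reproduces the standard sampling points {2, 5, 8, ..., 26, 28, 30, ..., 60} quoted there.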
9.3 Simulation and Results Analysis
Suppose our own vessel is equipped with radar and AIS. A Cartesian coordinate system is established with the vessel as the origin, true north as the Y-axis direction, and the perpendicular direction as the X-axis direction. Three target vessels sharing the sea area with our own vessel move relative to it. Suppose their tracks are as follows:
1. The track of vessel A is $X_a = R \cos \omega t$, $Y_a = R \sin \omega t$, with R = 1,500 m and $\omega$ = 10 rad/s.
2. Vessel B maintains uniform linear motion at a speed of 10 m/s and, after 29 s of navigating, performs uniformly accelerated rectilinear motion along the original direction with an acceleration of 2 m/s².
3. Vessel C maintains uniform linear motion at a speed of 8 m/s and, after 26 s of navigating, performs variably accelerated linear motion along the original direction.

The simulation time is set to 60 s and the rotation period of the radar antenna is 3 s; target B is the actual target vessel measured by AIS. In Table 9.1, its information update interval changes from 6 to 2 s according to the sailing speed of vessel B. Considering real ship maneuverability and the sensors' error levels (the radar error is tens of meters, while the AIS error is a few meters), a slight disturbance is added in the simulation. The track sequences over the sampling periods of the three vessels A, B, and C are thus obtained (Tables 9.2–9.5). The above-mentioned method is employed to conduct interpolation fitting on the sensor sampling values, and Fig. 9.2 is obtained. Fitting the data of the sensors yields a smooth curve $f_i(t)$ $(i = 1, 2, 3, 4)$ for each series, from which the sampling values at any sampling instant can be deduced. Considering the error of the sensors, the accuracy of the fitting
Table 9.2 Track sequences of vessel A detected by radar
Tra (s): 3     6     9     12    15    18    21    24    27    30
Lra (m): 1,592 1,641 1,720 1,656 1,592 1,514 1,458 1,480 1,543 1,493
Tra (s): 33    36    39    42    45    48    51    54    57    60
Lra (m): 1,542 1,601 1,672 1,614 1,555 1,543 1,520 1,506 1,488 1,498
Lra indicates the distance in meters between the origin and vessel A measured by radar at the sampling time; Tra indicates the radar sampling instants in seconds
Table 9.3 Track sequences of vessel B detected by radar
Trb (s): 2   5   8   11  14  17  20  23  26  29
Lrb (m): 132 179 215 249 290 315 332 373 415 460
Trb (s): 32  35  38  41  44    47    50    53    56    59
Lrb (m): 511 587 670 746 849   1,092 1,235 1,405 1,595 1,804
Lrb indicates the distance in meters between the origin and vessel B measured by radar at the sampling time; Trb indicates the radar sampling instants in seconds

Table 9.4 Track sequences of vessel B detected by AIS
Tab (s): 2   8   14  20  26  28  30  32  34  36  38
Lab (m): 125 176 206 283 362 383 434 446 492 552 615
Tab (s): 40  42  44  46    48    50    52    54    56    58    60
Lab (m): 678 775 990 1,090 1,170 1,280 1,490 1,617 1,742 1,820 1,943
Lab indicates the distance in meters between the origin and vessel B measured by AIS at the sampling time; Tab indicates the AIS sampling instants in seconds
Table 9.5 Track sequences of vessel C detected by radar
Trc (s): 1   4   7   10  13  16  19  22  25  28
Lrc (m): 317 439 520 539 620 678 745 823 914 1,086
Trc (s): 31    34    37    40    43    46    49    52    55    58
Lrc (m): 1,295 1,487 1,694 1,886 2,112 2,347 2,408 2,467 2,502 2,612
Lrc indicates the distance in meters between the origin and vessel C measured by radar at the sampling time; Trc indicates the radar sampling instants in seconds
[Fig. 9.2, upper panel: the interpolation-fitted track curves of the four series (A Radar, B Radar, B AIS, C Radar), distance from the own vessel plotted against sampling time T over 0–60 s. Lower panel: the corresponding fitting residuals over the same time axis.]

Fig. 9.2 Fitting curves
residual is within acceptable limits. Now the sampling intervals of the two sensors are to be normalized. Because the motion speed of vessel B changes after a period of navigating, the AIS update interval changes from 6 s at the beginning to 2 s. The radar sampling instants are taken at 3-s intervals and the AIS sampling instants at 2-s intervals; together they are regarded as the standard sampling points, i.e., {2, 5, 8, 11, 14, 17, 20, 23, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60}, which are substituted into the cubic spline function curves as variables. The data measured by radar and AIS at the same sampling instants are thereby obtained, as shown in Fig. 9.3.
[Fig. 9.3, upper panel: the sampling points of the three vessels before the time calibration (series A Radar, B Radar, B AIS, C Radar), distance from the own vessel against sampling time T over 0–60 s. Lower panel: the same series after the time calibration.]

Fig. 9.3 Sampling points of three vessels before and after the time calibration
9.4 Application in Fusion of Navigation Information
In this paper, the time calibration algorithm is applied to the fusion of navigation information. The sampling data come from the Navigation Information Simulator (made by Shanghai Honghao Science and Technology Co., Ltd). In this simulator, parameters such as the ship information, the weather, and the radar and AIS settings can be configured. Once the system is set up, the radar and AIS information can be saved in a database. After the time alignment algorithm based on the interpolation fitting method is applied, the fused information about the ship trajectory is displayed in the Google Earth software, as shown in Fig. 9.4. The sample data are set as follows: the ship travels in the Huangpu River, China; the trajectory runs from (latitude 31°14′55.89″, longitude 121°29′46.30″) to (latitude 31°14′33.70″, longitude 121°29′29.73″). The result of the navigation information fusion based on the time calibration algorithm is shown in Fig. 9.5. The fusion result shows that the fused ship trajectory lies in the middle of the area between the AIS and radar trajectories; furthermore, the fused ship trajectory is much smoother than the others.
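For display in Google Earth (e.g., as a KML track), the degree/minute/second trajectory endpoints above are typically converted to decimal degrees. A trivial helper (the function name is ours, for illustration only):

```python
def dms_to_deg(d, m, s):
    """Convert degrees/minutes/seconds to decimal degrees."""
    return d + m / 60 + s / 3600

# first trajectory endpoint's latitude, 31°14'55.89"
lat_start = dms_to_deg(31, 14, 55.89)
```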
[Fig. 9.4 block diagram: the ship's radar and AIS data pass through the time calibration algorithm, are fused, and the result is displayed in Google Earth.]

Fig. 9.4 Application in fusion of navigation information
Fig. 9.5 The result of navigation information fusion based on the time calibration algorithm
9.5 Conclusions
The time calibration of sensor data, a preprocessing step for the information fusion of AIS and radar, has a direct impact on the quality of track correlation and track fusion. In this paper, a time calibration algorithm for data measured by radar and AIS is presented based on interpolation fitting. The simulation results show that this algorithm can align data with different sampling periods, measured by two different sensors, to a normalized sampling period. Considering the real navigation status, the mathematical characteristics of the interpolation fitting algorithm can mitigate the problems arising from large fluctuations in, or loss of, sensor sampling data.
References
Li Qingyang, Wang Nengchao, Yi Dayi. Numerical Analysis, 3rd edn. Tsinghua University Press, 2001.
Li Weiyun. Discussing information fusion about radar and AIS in VTS. China Water Transport, 2006(7).
Lin Changchuan. Study on fusion method of object position information in radar with AIS. Navigation of China, 2002(1).
Liang Kai, Pan Quan. The research of multi-sensor time-registration method based on curve fitting. Fire Control and Command Control, 2006(12).
Luo Suyun, Tan Jian. Object track correlation algorithms of AIS and radar. Journal of Wuhan University of Technology, 2003(2).
Wang Chenxi, Wang Xiaobo, Zhu Jing, Wang Guohong. A survey of radar and AIS information fusion. Command Control & Simulation, 2009(2).
Part IV
GIS Algorithms and Computational Issues
Chapter 10
Remote Sensing Data Analysis Based on Hierarchical Image Approximation with Previously Computed Segments Philipp Galiano, Mikhail Kharinov, and Victor Kuzenny
Abstract In this report, a method for remote sensing data segmentation is proposed. For image approximation, the segments with minimal difference between adjusted threshold parameter and segment variance or entropy are selected. The generation of image hierarchical segmentation based on intensity gradient analysis and computation of object segments accounting for variance (entropy) are considered. For the algorithm generalization, the method of sequential image approximations within prescribed standard deviation values by means of image segments computed in merging/splitting technique is proposed. The peculiarities of data structures for optimization of computations are discussed. The comparison with known methods is outlined.
10.1 Introduction
GIS is an information system that provides means for the collection, storage, processing, access, display, and dissemination of spatially encoded data (Berlant and Coshcarev 1999). GIS can be divided according to territorial scope (global, regional, subregional, and local), by subject area (e.g., city, municipal, and environmental), etc. "The problem orientation of a GIS is determined by its objectives (scientific and applied), including an inventory of resources (for example, cadastre), analysis, evaluation, monitoring, management and planning, and decision support" (Berlant and Coshcarev 1999). Modern GIS need to handle large amounts of data, "a broad geographical distribution of objects, great variety of analyzed information, complex relationships and structures" (Potapichev 2005). These features make it necessary to use GIS that employ methods and means of artificial intelligence – intelligent GIS (IGIS). The most important way of obtaining information about spatially encoded data is remote sensing – the process of obtaining information about the Earth's surface
P. Galiano (*), M. Kharinov, and V. Kuzenny
SPIIRAS, 39, 14 Line, 199178 St. Petersburg, Russia
e-mail: [email protected]; [email protected]
V. Popovich et al. (eds.), Information Fusion and Geographic Information Systems: Towards the Digital Ocean, Lecture Notes in Geoinformation and Cartography 5, DOI 10.1007/978-3-642-19766-6_10, © Springer-Verlag Berlin Heidelberg 2011
P. Galiano et al.
(and other cosmic bodies) and of objects located on its surface or in its subsurface, by remote methods (Berlant and Coshcarev 1999). Systems for the analysis of remote sensing data (RSD) are a special case of machine vision systems. Their development requires the integration of methods from different subject domains, in particular image processing, pattern recognition, and some branches of physics. Analysis of remote sensing data (bitmap images) in general consists of three stages:

1. Segmentation of the bitmap, i.e., grouping the pixels into a set of connected regions – segments. The resulting segmented image should be an approximation of the original, and it should be possible to select the degree of closeness of the segmented image to the original.
2. Construction of a feature space, i.e., description of the properties of the segments. For each segment the following properties are defined:
   2.1. The properties of the segment itself.
   2.2. Properties of the context, i.e., the set of other segments of the analyzed remote sensing data, excluding the current segment. In practice, the analysis takes into account not the whole context, but only the subset that is relevant to the purposes of the analysis; this subset is determined according to object properties.
3. Classification of the elements of the feature space using different classification methods (artificial neural networks, Bayesian classifier, etc.).

Let us consider the contents of the first stages in detail. The range of problems solved with the help of remote sensing data is wide, and the complexity of the scenes is usually very large. The volume of processed remote sensing data shows an upward trend. For some subject domains (e.g., meteorology), a remote sensing analysis system must work in hard real time. In addition, the properties of a single object in remote sensing data are characterized by high variability, depending on the season, the angle of shooting, etc.
Thus, the segmentation algorithms must obey the following requirements:

– Computational efficiency. The time limit for processing makes it difficult or impossible to use segmentation algorithms developed for machine vision (Aksoy 2008).
– Consideration of both brightness and geometric properties. This requirement rules out purely histogram-based segmentation algorithms (Gonzalez and Woods 2006), despite their high processing speed.
– Robustness with respect to distortion caused by imperfections in both the receiving and transmitting equipment and by changes in shooting conditions. In particular, invariance under reversible affine transformations can be emphasized.
We note that currently there is no solid theory of bitmap image analysis, so a system for the analysis of remote sensing data should include a wide range of segmentation techniques that take into account various properties of objects. The set of segment properties is standard (ENVI Feature Extraction
Module User’s Guide 2008) and includes a number of obvious geometric characteristics (area, perimeter, etc.) and brightness properties of the segment's pixels [histogram moments (Gonzalez and Woods 2006), etc.]. Classification algorithms typically have greater flexibility and depend less on the chosen feature space. A more detailed review of algorithms for the analysis of remote sensing data is given in Galjano and Popovich (2007). This chapter proposes a new segmentation algorithm that satisfies the above requirements.
10.2 A Formal Description of Image Properties
RSD can be represented as a set of bitmap images. In this section, we introduce the basic terms (Gonzalez and Woods 2006) used in the description of the proposed method.

Definition 1. The set of all elements of the matrix of the bitmap ($P_i$) is denoted as S. Pixels (elements of $P_i$) are labeled by lowercase Latin letters: p, r, q; a pixel belonging to S is written $p \in S$. Each pixel has two coordinates, x and y; the labels "p" and "$(x_p, y_p)$" are equivalent. The numerical value (brightness) of pixel p is denoted $B(p)$, $B(p) \in [0, M_S]$, where $M_S$ is the maximal brightness of bitmap S.

Definition 2. The average brightness of the raster S is defined as the arithmetic mean of all pixel brightnesses of the raster.

Definition 3. For all $p(x_p, y_p), q(x_q, y_q) \in S$ the distance is defined as

$\|p, q\| = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2}.$  (10.1)

Definition 4. For each pixel $p(x_p, y_p)$ from the set S, the four nearest neighbors are defined as

$N_4(p) = \{ q(x_q, y_q) : \|q, p\| = 1 \}.$  (10.2)

Definition 5. For each pixel $p(x_p, y_p)$ from the set S, the four diagonal neighbors are defined as

$N_D(p) = \{ q(x_q, y_q) : \|q, p\| = \sqrt{2} \}.$  (10.3)

Definition 6. The eight neighbors $N_8(p)$ are defined as

$N_8(p) = N_4(p) \cup N_D(p).$  (10.4)
If the point lies on the border of the image, some of its neighbors are outside of the image.
Definition 7. Suppose d is a given threshold. Two pixels p and q are called contiguous with threshold d if $q \in N_8(p)$ and $|B(p) - B(q)| \le d$. In this and subsequent definitions, contiguity means contiguity with threshold d.

Definition 8. A discrete path (curve) with threshold d from a pixel p with coordinates (x, y) to a pixel q with coordinates (s, t) is a sequence of non-repeating pixels with coordinates

$K_d(p, q) = (x_1, y_1)(x_2, y_2) \cdots (x_n, y_n),$  (10.5)

where $(x_1, y_1) = (x, y)$, $(x_n, y_n) = (s, t)$, and the pixels $r_i(x_i, y_i)$ and $r_{i+1}(x_{i+1}, y_{i+1})$ are contiguous with threshold d for $1 \le i < n$. Here n is called the length of the path $K_d$.

Definition 9. Let W be a subset of image elements. Two of its elements, p and q, are connected in W if there is a path between them composed entirely of elements of W: $p \in W$, $q \in W$, and there exist $p_1, \ldots, p_n \in W$ such that $p_i \in K_d(p, q)$ for all $i = 1, \ldots, n$. For every pixel p from S, the set of all pixels connected with it in S with threshold d is called the connected component (or segment):

$R_d(p) = \{ q_i : \exists K_d(p, q_i) \}.$  (10.6)

Definition 10. The area of a connected component is the number of pixels that compose the component.

Definition 11. Two segments are called equal if they consist of pixels with the same coordinates.

Definition 12. Segment A consists of the segments $B_1, \ldots, B_r$ if the set of all pixels of segment A equals the union of the sets of all pixels of the segments $B_1, \ldots, B_r$. Note that r can be equal to 1.

Definition 13. Segmentation with threshold d is the selection of all image segments connected with threshold d. In the resulting segmentation, the connected regions are filled with a numeric value approximately describing the brightness of the pixels belonging to the segment; here and henceforth we use the arithmetic mean of the segment's pixel brightnesses.
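Definitions 7–13 amount to computing 8-connected components under a brightness-difference threshold, which can be sketched as a breadth-first flood fill (an illustrative implementation; the function name and data layout are ours, not from the chapter):

```python
from collections import deque

def segment(img, d):
    """Segmentation with threshold d (Definitions 7, 9 and 13): pixels
    in the same segment are linked by a path of 8-neighbours whose
    brightness difference is at most d at each step; each segment is
    then filled with the arithmetic mean of its pixels' brightness."""
    h, w = len(img), len(img[0])
    label = [[-1] * w for _ in range(h)]
    means = []
    for sy in range(h):
        for sx in range(w):
            if label[sy][sx] != -1:
                continue                      # already assigned to a segment
            lab, comp = len(means), []
            label[sy][sx] = lab
            queue = deque([(sx, sy)])
            while queue:
                x, y = queue.popleft()
                comp.append((x, y))
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and label[ny][nx] == -1
                                and abs(img[y][x] - img[ny][nx]) <= d):
                            label[ny][nx] = lab
                            queue.append((nx, ny))
            means.append(sum(img[y][x] for x, y in comp) / len(comp))
    return [[means[label[y][x]] for x in range(w)] for y in range(h)]
```

With d = 0 the output reproduces the original image, which matches the observation at the start of Sect. 10.3.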
10.3 Algorithm Description
Obviously, the result of segmentation with the threshold d = 0 is identical to the original image. As d increases, the average area of a segment increases monotonically, and at $d = d_{\max}$ it coincides with the size of the image. Experiments show that $d_{\max}$ is generally much smaller than $M_S$ (the maximum brightness of the image).
Let $d_1 \in [0, d_{\max}]$, $d_2 \in [0, d_{\max}]$, and $d_1 <$

$\begin{cases} \dfrac{\partial u}{\partial t} + u\,\dfrac{\partial u}{\partial x} + g\,\dfrac{\partial \eta}{\partial x} = 0, \\ \dfrac{\partial \eta}{\partial t} + \dfrac{\partial}{\partial x}\big[(h(x) + \eta)\,u\big] = 0, \end{cases}$  (13.1)

where $\eta$ – elevation of the water surface; u – horizontal velocity of the water flow; g – gravity acceleration; h(x) – undisturbed water depth.

Solving the system (13.1) via numerical methods for each ith variant of initial conditions yields $B_i(t)$ – the parameters of the ith wave crest on the coast in the areas of interest. These parameters include the wave amplitude and steepness, and the water level fluctuation at the coastline; for a particular case a single parameter may be sufficient, for instance the wave amplitude. Note that under the assumptions made in (13.1), the water viscosity and the bottom soil structure are not taken into account. Moreover, many aspects of solving this system of equations for a complex bottom topography have not yet been elaborated. To define the wave parameters more precisely, these factors could be taken into account by simulating the propagation of such waves with special geographic information technologies. The risk assessment of each tsunami wave variant at time instant T accounts for the wave's properties, as well as for the characteristics of the affected facilities, the protective structures used, and the mainland relief. For such an assessment, inclusive of the uncertainty of tsunami wave generation, the mean values $\bar{B}_i(T)$ of the parameters of the ith wave crest on the coast at time instant T are calculated as

$\bar{B}_i(T) = \int_0^T B_i(t)\, f_i(t)\, dt.$  (13.2)

Depending on these parameters $\bar{B}_i(T)$, the conditional risks $W_{ij}[\bar{B}_i(T)]$ of the ith tsunami wave for specific jth objects are determined. These partial
O.V. Smirnova
conditional risks can be, for instance, determined by experts. They can also be obtained as a result of modeling the processes with advanced geoinformation technologies and solving special analytical problems, among them the assessment of the area flood level, the occurrence of environmental threats, and others. The resultant predicted tsunami risk at time instant T, inclusive of the previous conditions and the probabilities $P_i$ of the ith wave's manifestation, is calculated by the following formula:

$W(T) = \sum_{i=1}^{n} \sum_{j=1}^{m} W_{ij}[\bar{B}_i(T)]\, P_i,$  (13.3)

where m – the number of prototypes studied for the coastal objects' stability against tsunami. The risk W(T) according to (13.3) is a current one; risk prediction for specified time intervals is possible through the derivation of integral estimates. Unlike known methods, the proposed approach allows predicting tsunami risks in the current situation under significant uncertainty about the emergence of tsunami sources. The new method can be successfully implemented in modern geoinformation systems; however, such use is associated with a number of features.
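Formulas (13.2) and (13.3) combine straightforwardly: average each wave variant's crest parameter over time, then weight the per-object conditional risks by the variant probabilities. A minimal numerical sketch (trapezoidal quadrature for (13.2); all names and data shapes are illustrative, and the conditional risk functions $W_{ij}$ would in practice come from experts or from modeling):

```python
def mean_crest_param(t, B, f):
    """Mean crest parameter (13.2) by trapezoidal quadrature:
    B-bar_i(T) = integral over [0, T] of B_i(t) f_i(t) dt,
    with t a grid of sample times covering [0, T]."""
    g = [Bi * fi for Bi, fi in zip(B, f)]
    return sum((t[k + 1] - t[k]) * (g[k] + g[k + 1]) / 2
               for k in range(len(t) - 1))

def total_risk(B_bar, W, P):
    """Resultant predicted risk (13.3):
    W(T) = sum_i sum_j W_ij[B-bar_i(T)] * P_i.
    W[i][j] is a callable giving the conditional risk of object j
    under wave variant i; P[i] is the variant probability."""
    return sum(P[i] * sum(Wij(B_bar[i]) for Wij in W[i])
               for i in range(len(W)))
```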
13.4 Some Features of the Method Application for Geographic Information Systems
The spatial analysis performed by GIS tools allows the results of risk prediction to be represented as a series of thematic maps. Risk maps, spatially bound to a specific area, display zones characterized by the specific indices $W_{ij}$ of the impact of the ith tsunami wave crest on the jth coastal objects. The proposed risk assessment method permits the creation of dynamic risk maps "on-the-fly" and the estimation of the detriment caused, as a function of the specific risk, the wave parameters, and the sources of hazard (in water and on land), while not excluding the possibility of developing static maps. Creation of the thematic maps assumes the development of a qualitative scale relating the risk level to the nature of the destructive effects. On the basis of the developed dynamic maps, decisions aimed at decreasing the risks and planning actions for protecting coasts against tsunami can be made.
13.5 Conclusion
This chapter proposes a new tsunami risk assessment method based on geoinformation technologies that takes into account the uncertainty in the emergence of tsunami sources and in the wave propagation conditions. The method can be used in modern geographic information systems for emergency analysis. Also, by analogy
Tsunami Risk Assessment Method Using Geoinformation Technology
with this method, approaches to the assessment of other risks could be developed, in particular the risks of other catastrophes, such as dam failures and floods.
Chapter 14
Real-Time 3D Monitoring of Marine Navigation Cyril Ray, Rafal Goralski, Christophe Claramunt, and Chris Gold
Abstract Maritime transportation has a huge impact on the world economy and our everyday lives. Nowadays, the vast majority of world goods are carried by sea by huge container carriers. The disasters and damages caused in the event of major sea collisions can pose serious threats to the environment. This stresses the urgent need for the development of maritime navigation systems whose objective should be to contribute to a safer sea. Among many technical and methodological issues, there is still a need for distributed computing architectures that will favour real-time monitoring of mobile objects at sea, and generation of computer and visual-based environments. The research presented in this chapter introduces a three-dimensional (3D) GIS applied to maritime navigation. The three-dimensional component facilitates the understanding of maritime behaviours and patterns at sea, thus favouring better decisions for either sailors at sea or maritime authorities in charge of traffic monitoring.
14.1 Introduction
Maps are among the oldest forms of graphical communication and are the most effective and efficient means for the transfer of spatial and geographical information (Kraak 2001). Over the years, different types of maps have been developed with different cultural and application backgrounds and used for many aspects of our everyday lives. More recently, maps have moved from paper to digital formats and are becoming more popular than ever. Despite this progress, usual two-dimensional (2D) maps are sometimes difficult to interpret from a cognitive point of view. Still, users have to generate a mental model of a map and in particular to
C. Ray (*) and C. Claramunt
Institut de Recherche de L’Ecole navale (EA 3634), 29240 Brest Cedex 9, France
e-mail: [email protected]; [email protected]
R. Goralski and C. Gold
GeoVS Ltd, CBTC, Cardiff CF24 4AY, UK
e-mail: [email protected]; [email protected]
V. Popovich et al. (eds.), Information Fusion and Geographic Information Systems: Towards the Digital Ocean, Lecture Notes in Geoinformation and Cartography 5, DOI 10.1007/978-3-642-19766-6_14, © Springer-Verlag Berlin Heidelberg 2011
C. Ray et al.
translate the symbols and map features towards some abstract concepts. This explains why so many people have difficulties with the interpretation and understanding of two-dimensional maps, often resulting in errors and sometimes leading to fatal mistakes. Three-dimensional maps commonly rely on 3D computer graphics in order to present geographical information that to a certain degree fits a real-world representation. The potential of topographic three-dimensional maps is based on their greater representational flexibility, as well as the inherent ease of their understanding and their relatively high intuitiveness. This is a consequence of the way we see the world and how three-dimensional representations appeal to our brains (Van Driel 1989). According to estimates, about 50% of brain neurons are used in the process of human vision. Three-dimensional views stimulate more neurons and hence are processed quicker (Musliman et al. 2006). The process of decoding is easier because three-dimensional maps resemble the real world to a greater degree than their traditional 2D counterparts and are more natural to the human brain (Schilling et al. 2003). However, the strength of three-dimensional maps lies not only in the perspective representation that mimics the way humans perceive the world, but also in the use of 3D symbols, which are much quicker and easier to comprehend than 2D equivalents. The process of perception in three dimensions has been perfected by thousands of years of evolution, because prompt recognition of potential dangers was, and still is, crucial for survival. One result of this evolutionary spatial training is that a large part of properly designed 3D symbols can be recognised without special training or referring to a legend. This is true for symbols that represent tangible objects or features, which have equivalents in the real world. For these objects, recognition can be based on the previous life experience of a user.
This extends the range of experience-based presentations used in classical cartography, as described by Giertsen and Lucas (1994). What is important is that symbols do not have to be realistic: simplified and abstracted representations are still easily recognisable. In fact, non-photorealistic computer graphics is known for its capability to provide vivid, expressive, and comprehensive visualisations with strong potential for use in cartography (Durand 2002; Dollner 2007). The use of 3D symbols also contributes to the increased usefulness of 3D maps for large regions, where the visualisation of terrain topography otherwise loses the effect of 3D and becomes visually flat, almost 2D-like. The time and mental effort required for the understanding of two-dimensional maps has severe consequences in areas where the time available for analysing the situation is crucial. This is the case, for example, in the navigation of high-speed marine craft, where not only does the situation change quickly and the available time for reaction is limited, but the tiredness of navigators further limits their cognitive capabilities. Porathe (2006) describes several cases of high-speed vessels crashing into rocks, among them one involving 16 deaths. All these collisions were caused by poor understanding of the situation or temporary loss of orientation by navigators. On the basis of experiments made with four different types of maps, Porathe (2006) argues that three-dimensional maps are not only quicker to understand, but also provide
14 Real-Time 3D Monitoring of Marine Navigation
improved situation awareness, and have a strong potential to help minimise marine accidents. Experiments conducted by Porathe showed that the use of three-dimensional maps leads to around an 80% reduction in the number of navigational mistakes, as well as a 50% reduction in the time required to complete the simulated navigational task. Three-dimensional maps were also indicated by the participants as the friendliest and easiest to understand. This is of special importance in the light of research on the causes of marine accidents, which indicates that between 89 and 96% of collisions at sea are attributed to human error (Rothblum 2006). On the basis of these findings, the research presented in this chapter introduces three-dimensional visualisation facilities to support real-time monitoring of maritime navigation. The system has been tested in the Brest coastal area in north-west France, where ship trajectories and behaviours are obtained using the Automatic Identification System (AIS) and a self-made tracking device. The remainder of this chapter is organised as follows. Section 14.2 introduces current trends in maritime navigation regarding nautical charts and motivates our three-dimensional visualisation approach. Section 14.3 presents the modular platform developed for the monitoring of mobile objects in maritime navigation, along with the data integration and visualisation principles retained. Section 14.4 introduces the application of the 3D visualisation system to maritime navigation monitoring in Brest Bay. Finally, Sect. 14.5 concludes this chapter and introduces further work.
14.2 Trends in Maritime Navigation and Motivation of Our Approach
Maritime transportation has a large impact on the economy and on our everyday lives, as the vast majority of world goods are carried by sea. The European Union strategy for maritime transport development includes plans to significantly increase sea shipping by establishing safe motorways of the sea (European Commission 2006). In order to coordinate maritime activities, the International Maritime Organization (IMO) was established in Geneva in 1948. IMO's main task has been to develop and maintain a comprehensive regulatory framework for shipping, and its remit today includes safety, environmental concerns, legal matters, technical cooperation, maritime security, and the efficiency of shipping. Other organisations involved in the safety of navigation are the International Hydrographic Organization (IHO) and many state regulators. A major development fully supported by the IMO and IHO has had a great impact on maritime navigation in recent decades, namely the Electronic Chart Display and Information System (ECDIS), whose objective is to replace paper charts on the bridge with automated electronic charts. These developments opened many opportunities regarding the range of functionalities offered to sailors at sea, as well as to authorities tracking and monitoring maritime navigation, but they are limited to two-dimensional visualisation principles.
C. Ray et al.
Fig. 14.1 Example of 2D Electronic Navigational Chart (Brest Bay). Courtesy of Service Hydrographique et Océanographique de la Marine
The current ECDIS standard is meant to support safe navigation and decision making by ship navigators. An ECDIS-compatible system comprises the official nautical chart data (International Hydrographic Bureau 2000) stored in the vector IMO/IHO Electronic Navigational Chart (ENC, cf. Fig. 14.1) format, a real-time two-dimensional display associated with the current position of a vessel obtained from GPS, and a user interface to perform basic navigational tasks. The idea of three-dimensional navigational charts was initially introduced by Ford (2002), with the conclusion that three-dimensional visualisation of chart data had the potential to be a decision support tool for reducing vessel navigational risks. Beyond safety matters, sailing events are another application where effective and attractive real-time visualisation might take advantage of three-dimensional real-time monitoring. Indeed, sailing championships and maritime festivals bring together a wide audience of sailors, sailing experts, and spectators. However, such events take place far from the coast (e.g. between 5 and 15 km), and the audience follows the regatta with difficulty. This also motivated the development of user-oriented architectures and prototypes for monitoring maritime navigation. The Share-Loc project was one of the early approaches to provide maritime data on mobile computers (Desvignes et al. 2002); it was later extended by a multi-agent reasoning mechanism in order to provide simulation functionalities to anticipate potential risks of collisions
(Fournier et al. 2003). A Web-based interface has also been introduced for tracking maritime navigation (Bertrand et al. 2007). The main assumption of our approach is that a three-dimensional representation of space in a navigation environment might offer a significant contribution to the problem of maritime safety and help to minimise the likelihood of human-error-induced marine accidents. We introduce a marine geographic information system that aims at providing navigational aid superior to today's electronic charts. It is based on kinetic data structures, a three-dimensional visualisation engine, a combination of static and real-time data structures, and intelligent navigation rules (Goralski and Gold 2008), and relies on a modular platform developed for the monitoring of mobile objects in maritime navigation (Bertrand et al. 2007). The system uses the official nautical chart data stored in the IHO S-57 format, including the standard catalogue of objects and attributes, to automatically derive a real-time three-dimensional display associated with the current positions of vessels.
14.3 Architecture and Information System Design
14.3.1 Location at Sea

The range of innovations required to reach real-time 3D monitoring of marine navigation is wide and includes, amongst others, engineering efforts such as chart production and data distribution, specialised software for maritime training and navigation, Search and Rescue (SAR) applications, automatic communication protocols, and the development of onboard monitoring equipment (e.g. sensors and ARPA radar). Location at sea has long relied on radar sensors. Recent advances concern new technologies based on optic sensors, camera-based visualisation with automatic movement detection, and location-based sensors broadcasting navigational information on a regular basis. One of the most successful systems used in maritime navigation and positioning is the AIS, whose objective is to identify and locate vessels at a distance. The International Maritime Organization has made the AIS a mandatory standard under the Safety of Life at Sea (SOLAS) convention. Passenger ships and international commercial ships of 300 gross tonnage and above are now equipped with AIS. An AIS generally integrates a transceiver system, a GPS receiver, and other navigational sensors on board, such as a gyrocompass and a rate of turn indicator. An AIS transponder runs in an autonomous and continuous mode, regardless of its location (e.g. open sea, coastal, or inland areas). The transmitted information is broadcast over a range of 35 miles through VHF communications to surrounding ships and to maritime Vessel Traffic Systems (i.e. maritime and port authorities). Bringing the AIS to sailing ships raises some problems, as it does not specifically respond to their energy consumption, weight, space, and environmental constraints (salinity, water, etc.). A lightweight positioning system embedded in each boat has therefore been designed. It includes a GPS receiver and an ISM
modem, and meets the constraints of nautical competitions, that is, weight (less than 250 g), range (up to 15 km), and autonomy (over 10 h of operation). In practice, the trajectory of a mobile object located by either device can be considered as a set of geographic locations, including moves and stops. Such geographic and temporal information is usually complemented by descriptive metadata providing either quantitative or qualitative information on the ship trajectory. It can include data on the ship's characteristics, route, speed, harbour of origin, destination, material transported, and estimated arrival time. Such trajectories should ideally be tracked and integrated into automated systems that provide the whole range of functionalities to the users, either at the local (i.e. vessel) or global (maritime authorities) level. The technological challenge is multiple: real-time integration of maritime trajectories (largely a sensor-based issue), integration of these trajectories into distributed databases, mining and knowledge discovery of location data, and the development of decision-support systems.
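As a concrete illustration of a trajectory viewed as a set of geographic locations with moves and stops, the sketch below segments a sequence of timestamped fixes with a simple speed threshold. It is illustrative only and is not the platform's actual algorithm: the `Fix` type, the threshold value, and the coordinates (made-up positions near Brest) are all assumptions, and a flat-earth approximation is used, which is adequate at harbour scale.

```python
import math
from dataclasses import dataclass

@dataclass
class Fix:
    t: float    # seconds since the start of the track
    lat: float  # latitude, decimal degrees
    lon: float  # longitude, decimal degrees

def segment(track, speed_threshold=0.5):
    """Label each interval between consecutive fixes 'move' or 'stop'.

    Speeds are approximated on a flat-earth basis using metres per
    degree of latitude/longitude; `speed_threshold` is in m/s.
    """
    labels = []
    for a, b in zip(track, track[1:]):
        dx = (b.lon - a.lon) * 111_320 * math.cos(math.radians(a.lat))
        dy = (b.lat - a.lat) * 110_540
        speed = math.hypot(dx, dy) / max(b.t - a.t, 1e-9)
        labels.append("move" if speed > speed_threshold else "stop")
    return labels

# Three fixes 4 s apart: the vessel holds position, then moves north.
track = [Fix(0, 48.38, -4.49), Fix(4, 48.38, -4.49), Fix(8, 48.381, -4.49)]
assert segment(track) == ["stop", "move"]
```

In a real system the threshold would be tuned per vessel class, and distances computed with a proper geodesic formula over longer ranges.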
14.3.2 Architecture and Information System Design

A generic and scalable architecture and information system have been designed to integrate, store, model, analyse, and visualise spatially related data over time. The objective of our framework is to provide an integrated and flexible system for the real-time monitoring and tracking of different types of mobile objects. This framework has been tested in the context of maritime navigation in the Brest coastal area. Ship behaviours and trajectories are obtained by combining, within the same database, data coming from the AIS and from the self-made tracking devices for sailing ships. The platform developed so far has been designed with a four-tier client–server architecture and organised through a distributed data and processing model (Fig. 14.2). This information system is a distributed platform based on three functions: (1) integration of positioning information, (2) data management and processing, and (3) visualisation. The platform is a computing system that includes a Java-based platform for mobile objects and a PostGIS spatial database for data manipulation and storage. The platform is able to collect raw data from remote sensor stations through Internet and wireless communications. As the range of receivers on shore is limited, this provides an efficient location-based information system for maritime authorities that need to access data collected far from their location. This data integration level constitutes the central component of the platform. Data volumes can be relatively large. For instance, the AIS station located in the port of Brest can receive up to 50,000 relevant AIS messages per day. A representative daily figure for the Brest harbour is 13,817 positions recorded from 26 moving vessels. A day of sailing race monitoring (approximately 10 h, 20 ships) brought approximately 230,000 positions (transmission frequency of about 4 s).
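These race-day figures give a back-of-envelope estimate of the sustained load on the data integration level. The check below is a sanity calculation only, assuming a uniform arrival rate over the monitoring window:

```python
positions = 230_000      # positions collected over one race day
duration_s = 10 * 3600   # approximately 10 h of monitoring
ships = 20               # equipped boats

inserts_per_second = positions / duration_s
positions_per_ship = positions / ships

assert round(inserts_per_second, 1) == 6.4  # a modest sustained database load
assert positions_per_ship == 11_500
```

Even allowing for bursts, a few inserts per second is well within the capacity of a single PostGIS instance, which supports the choice of a centralised spatial database for this component.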
Fig. 14.2 Architecture
This distributed platform can also be aggregated into a complete and autonomous package in order to provide the same tracking service onboard. In this case, ships need to carry the Java-based platform for mobile objects, a local PostGIS database, and an AIS receiver connected to a serial port of a computer. Such a configuration can provide onboard real-time views of maritime traffic. We designed several visualisation outputs corresponding to different kinds of users, suitable for navigational purposes as well as for maritime authorities. Based on existing standards and recommendations of the Open Geospatial Consortium (OGC) and the W3C for the design of Web services, a Web-oriented approach to following maritime traffic has been designed as a Web-based service that can report dynamic location-based information either in real time or on replay. It has been implemented with a PHP-based generator of Keyhole Markup Language (KML) files (KML is an OGC standard) to address the diversity of GeoWeb applications. This favours the integration of location-based information into GeoWeb-based applications (e.g. GeoServer, Google Maps, and mobile applications). The use of a structured language like KML is also a way to support the connection of existing GIS software (e.g. ESRI tools, uDig, Quantum GIS, and Google Earth). The 3D visualisation GIS relies on this approach. However, while being a worldwide standard, KML has the drawback of combining geographic data with visual representation within the same language. Such a graphical description might be inappropriate in certain cases. Regarding our 3D visualisation system, an additional
language has been designed (the semi-structured TRR format, cf. Fig. 14.2) to optimise data exchange and interpretation.
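To illustrate the kind of output the KML generator produces, here is a minimal equivalent sketch (the platform's actual generator is written in PHP; the vessel name and coordinates below are placeholders). Note that KML lists coordinates in lon,lat order, the reverse of the usual lat,lon convention:

```python
from xml.sax.saxutils import escape

def vessel_placemark(name: str, lat: float, lon: float) -> str:
    """Return a minimal KML Placemark for one vessel position."""
    return (
        "<Placemark>"
        f"<name>{escape(name)}</name>"
        f"<Point><coordinates>{lon},{lat},0</coordinates></Point>"
        "</Placemark>"
    )

def kml_document(placemarks: list) -> str:
    """Wrap Placemarks in a well-formed KML 2.2 document."""
    body = "".join(placemarks)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2">'
        f"<Document>{body}</Document></kml>"
    )

doc = kml_document([vessel_placemark("Enez Eussa 3", 48.38, -4.49)])
assert "<coordinates>-4.49,48.38,0</coordinates>" in doc
```

A file like this can be served periodically (or via a KML NetworkLink) so that Google Earth or any GeoWeb client refreshes vessel positions on its own.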
14.3.3 GIS 3D Graphical Engine

The 3D ECDIS prototype is capable of performing a fully automatic conversion of standard 2D ENC vector charts into fully featured interactive 3D models. The software development relies on an object-oriented design, and it supports typical GIS operations. The 3D visualisation system employed is based on a custom-developed graphical engine for the storage, management, dynamic visual optimisation, and drawing of the graphical objects. The GIS engine is a combination of object-oriented code in several different programming languages and the Open Graphics Library (OpenGL) API. The ENC reader is responsible for reading charts in the ENC format. It is based on GDAL (Geospatial Data Abstraction Library), an open-source project managed by the Open Source Geospatial Foundation, and more specifically on its OGR Simple Features Library. The reader performs several operations related to data input, such as reading navigational objects of a specified type, determining the area boundaries, and creating a terrain model based on contour lines and soundings, and generates the terrain model directly from the ENC chart. The automatic algorithms for 3D terrain generation from 2D vector data (S-57) are based on an implementation of the state-of-the-art algorithms available in the most advanced GIS systems. The effective reconstruction of a 3D terrain model from 2D data is problematic and is sensitive to the quality and detail of the original vector charts. This is typical, and very likely to be encountered by other researchers working on extending a 2D GIS into 3D. The problem was originally explained in Dakowicz and Gold (2003). The terrain model is constructed based on the underlying ENC chart. Several layers are used, and intelligent re-sampling algorithms are employed to produce results of the required density and quality.
Then the original model is enriched with slope and valley reconstruction algorithms, based on the skeleton and crust properties of the Voronoi mesh used for storage of the model. The final model is stored as a file of simple points. The points are stored as geographic locations with an associated height in metres. When the model is loaded, the points are read and added to the Voronoi mesh using an incremental local update algorithm. The final model is shaded to resemble a typical chart, given predefined colour values and minimal and maximal heights. The navigational objects, ships, landmarks, and buildings have been designed using an external 3D modelling tool and are kept in external files. The models are associated with navigational signs of given types and attributes based on a configuration file. This approach allows for simple modification and extension of the model library. For optimisation reasons, only one model of any single type is loaded (only when needed) and stored in memory as an OpenGL call list. Instances are then drawn for all objects associated with it, in all required positions.
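The engine's terrain reconstruction relies on the Voronoi-based crust-and-skeleton algorithms cited above. As a much simpler stand-in that conveys the underlying idea of interpolating a height field from scattered soundings, consider inverse-distance weighting. This sketch is not the algorithm used by the engine, and the soundings are made-up values:

```python
def idw_height(points, qx, qy, power=2.0):
    """Inverse-distance-weighted height at (qx, qy) from soundings.

    `points` is a list of (x, y, z) tuples, z being depth/height in
    metres. A deliberately simple stand-in for the Voronoi-based
    crust-and-skeleton reconstruction used by the actual engine.
    """
    num = den = 0.0
    for x, y, z in points:
        d2 = (x - qx) ** 2 + (y - qy) ** 2
        if d2 == 0.0:
            return z  # query point coincides with a sounding
        w = 1.0 / d2 ** (power / 2.0)
        num += w * z
        den += w
    return num / den

soundings = [(0, 0, -10.0), (1, 0, -20.0)]
mid = idw_height(soundings, 0.5, 0.0)
assert -20.0 < mid < -10.0      # interpolated between the two soundings
assert abs(mid + 15.0) < 1e-9   # equal weights at the midpoint
```

Unlike the crust-and-skeleton approach, plain IDW cannot recover ridges and valleys between contour lines, which is precisely the weakness the engine's slope and valley reconstruction addresses.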
14.4 Visualising Maritime Traffic in 3D
The system has been evaluated on the display of two different and representative maritime events. The objective, in both case studies, was (1) to provide a real-time and on-demand replay tracking service and (2) to estimate the impact of a three-dimensional visualisation on the perception users might have of maritime trajectories and behaviours. The first case study was set up with firemen (the end users to be convinced) and emphasises security and safety concerns. The second application was set up to provide the general (non-expert) public with an attractive visualisation of the regattas that take place every spring in Brest Bay.
14.4.1 Safety Concerns

The research and development department of the rescue services (DERT) regularly carries out simulations of operations for training and validation purposes. Such simulations should reproduce as closely as possible the exact conditions of real operations. The combination of our three-dimensional visualisation tool with the Java-based platform for mobile objects was tested for the real-time tracking of rescue simulations. The experiment was conducted in the Brest coastal area in March 2009. The objective was to simulate a rescue operation on a passenger ship from which a passenger had to be evacuated. Figure 14.3 shows the 3D representation of the terrain model generated automatically from ENC chart FR501130. The three-dimensional
Fig. 14.3 3D visualisation (Brest Bay)
Fig. 14.4 Immersive 3D visualisation from passenger ship (Brest Bay)
visualisation system favours a real-time and immersive view representing the actual situation perceived by marine officers on board (cf. Fig. 14.4). The operation involved the passenger ship Enez Eussa 3 of the Pen-Ar-Bed company, the firemen's high-speed ship, and a civil security helicopter (Dragon 29). The passenger ship was equipped with AIS, and the firemen's ship with a self-made portable location-based device. The movements of these vessels and the AIS traffic in Brest Bay were gathered and monitored in real time. During the operation, 6,798 positions from 17 vessels were integrated into the database (1,237 positions for Enez Eussa 3). Beyond the capacity of our system to ensure real-time tracking of the operation either in 2D or in 3D, the storage of position data also provides useful information to the firemen for debriefing and further training purposes.
14.4.2 Sailing Championship

The annual championship of sailing boats from the French Naval Academy brings together a wide audience of competitors, sailing experts, and visitors. The monitoring of this event relies on the combination of our 3D visualisation tool with the distributed platform to diffuse real-time information from the coastal maritime area to users located on the ground (http://navtrack.fr). The races take place far from the coast (between 7 and 12 km), and the audience follows the regatta with difficulty. It has been noticed during past events that the interest and expectations of competitors and sailing experts differ from those of visitors.
Fig. 14.5 3D visualisation of sailing championship (Brest Bay)
This application is mainly oriented towards visitors, in order to reach and satisfy the general public. The experiment was conducted in the Brest coastal area in May 2010. Figure 14.5 shows the 3D representation of the terrain model generated automatically from the ENC chart FR370660. The data collected by the system and stored in the spatial database combine AIS traffic data and location-based data coming from sensors embedded on two classes of ships: Seascape and Open 5.7. The monitored sailing race covered a period of approximately 10 h. The 20 equipped boats brought approximately 230,000 positions (transmission frequency of about 4–5 s). Figure 14.6 presents an immersive 3D view taken from one of the sailing ships.
14.4.3 Evaluation

The methodology developed for these case studies enables a comparison between a two-dimensional visualisation (KML files displayed in the Google Earth application) and the three-dimensional visualisation. Our goal at the current stage is to produce a system that is usable and convincing as a proof of concept. Several functionalities still have to be implemented in order to favour a better understanding of real-life maritime events: extending the physics with new motion types and inertia, automatic simulation of tides and weather, more UI customisation with predefined and custom user profiles, automatic loading of relevant ENC files on the move,
Fig. 14.6 Immersive 3D visualisation from a seascape ship (Brest Bay)
combination of several ENC files of different types for better quality, terrain display optimisation, incorporation of land and event-oriented data, and trajectory prediction. It is, however, already possible to provide some evaluation of a three-dimensional visualisation of such maritime events, whether monitored in real time or a posteriori. The evaluation was conducted with system designers, firemen, the public audience of the Naval Academy sailing race, and some sailors. Their comments on the different visualisation functionalities have been collected and summarised. Through these experiments, it was noticed that the orthogonal view provided by the two-dimensional visualisation offers a way to quickly understand the maritime situation, provided the visualisation is accompanied by additional information such as heading, speed vector, and the path of the past minutes. Three-dimensional visualisation performs similarly, in terms of understanding, to the orthogonal view. In both case studies, the main comment on such a view concerned the quick perception of the overall situation and of the distances between mobile objects. Good knowledge of the represented area and the visualisation of the coast in both views favour this perception. When using the immersive view only, comments differ according to the case study. Regarding maritime traffic and its security, a single immersive view (bound to a static viewpoint or to the ego-centric perspective of a single moving vessel) appeared to be insufficient to provide a full understanding of a given situation, due to several considerations such as the perception of distance and the visibility of objects masked by the coast or by other objects.
This may be resolved by combining 3D and 2D presentations, by the use of multiple linked immersive views, or by providing interactive mechanisms for manual modification of the perspective by the user. Another option, which frees the user from the need for constant manipulation and allows passive observation, is to use automatic view control mechanisms pre-programmed by an experienced user for accurate presentation of the displayed area. In our experiments, the possibility provided by our marine GIS of generating additional views taken from other ships was identified as a real added value for security. This allows a marine officer to visualise and perceive the environment from another ship, and to anticipate complex or unexpected behaviours, particularly when the number of ships increases. Another advantage of the immersive view concerns the training of marine officers. Many coastal areas are dangerous and sometimes unknown to marine officers. The immersive representation can be used for simulation purposes, allowing trainees to gain a prior understanding of a given environment and its navigation constraints. The experiments done for sailing races showed that the three-dimensional approach generates real interest from novice users and spectators on the ground alike. They appreciate the ability to virtually embark on sailing ships and to perceive what skippers can see from the deck. Overall, we can conclude that three-dimensional visualisations complement two-dimensional visualisations very well. The system developed allows users to manipulate different views, from immersive to orthogonal visualisations through all possible intermediate views, and from different points of view. Still, the understanding of sailing races by expert sailors through three-dimensional visualisation has not yet been evaluated, and this is left to further work.
Such users might expect a better understanding of ship behaviours and navigation strategies, which are closely related to environmental conditions, that is, wind speed and direction, sea currents, and tides. In particular, annotations, symbols, and race characteristics are relevant examples of additional information that might be integrated within three-dimensional immersive views. Last but not least, a three-dimensional visualisation might also incorporate analysis functions to analyse trajectories and automatically detect specific events.
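An ego-centric immersive view of the kind discussed above ultimately reduces to deriving an eye/target pair for the graphics engine from a ship's position and heading. A minimal sketch of this derivation follows; the function name, eye height, and scene units are illustrative assumptions, not taken from the engine:

```python
import math

def ego_camera(x, y, heading_deg, eye_height=3.0, look_ahead=100.0):
    """Eye and target points for an ego-centric view from a ship's deck.

    `heading_deg` follows the nautical convention: 0 = north (+y),
    90 = east (+x). Heights and distances are in scene units.
    """
    h = math.radians(heading_deg)
    dx, dy = math.sin(h), math.cos(h)
    eye = (x, y, eye_height)
    target = (x + look_ahead * dx, y + look_ahead * dy, eye_height)
    return eye, target

# A ship at the origin heading due east looks along the +x axis.
eye, target = ego_camera(0.0, 0.0, 90.0)
assert eye == (0.0, 0.0, 3.0)
assert abs(target[0] - 100.0) < 1e-9
assert abs(target[1]) < 1e-9
```

Re-evaluating such a pair each frame from live AIS positions, or pre-programming it along a scripted path, gives the automatic view control mentioned above without any manual camera manipulation.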
14.5 Conclusion
Although current maritime systems are relatively successful, thanks to the development of vessel traffic systems and automated information systems, there is still a need for additional interfaces and services that favour the diffusion and visualisation of maritime navigation data. The research presented in this chapter introduces a modular data integration platform and a 3D interface to monitor and visualise mobile objects in a maritime environment. The system developed complements current two-dimensional visualisation systems and provides realistic three-dimensional views that give a better sense of the environment in many
situations at sea, and is an asset for safer navigation by providing navigational aid and decision support to navigators, as well as training facilities. The architecture allows for the integration and filtering of various sensor-based data and provides several additional Web-based visualisation services, either to vessels considered as clients or to monitoring authorities. The interface developed can be used either in real time for navigation monitoring and control, or for the a posteriori analysis of maritime navigation behaviours. Further work concerns improvements to the distributed platform, further functional developments of the 3D visualisation system, and real-time experiments carried out onboard.
References

Bertrand F., Bouju A., Claramunt C., Devogele T. and Ray C. (2007), Web architectures for monitoring and visualizing mobile objects in maritime contexts, in Proceedings of the 7th International Symposium on Web and Wireless Geographical Information Systems (W2GIS 2007), Springer-Verlag LNCS 4857, pp. 94-105

Dakowicz M. and Gold C.M. (2003), Extracting meaningful slopes from terrain contours, International Journal of Computational Geometry and Applications, vol. 13, pp. 339-357

Desvignes G., Lucas de Couville G., Peytchev E., Devogele T., Fournier S. and Claramunt C. (2002), The Share-Loc project: A WAP-based maritime location system, in Proceedings of the 3rd International Conference on Web Information Systems Engineering, B. Huang et al. (eds.), IEEE Press, pp. 88-94

Dollner J. (2007), Non-photorealistic 3D geovisualization, in Multimedia Cartography (2nd edition), Springer-Verlag, pp. 229-239

Durand F. (2002), An invitation to discuss computer depiction, in Proceedings of the 2nd International Symposium on Non-Photorealistic Animation and Rendering (NPAR), Annecy, France

European Commission (2006), Towards a future Maritime Policy for the Union: A European vision for the oceans and seas, Green Paper on a future Maritime Policy for the EU, Brussels

Ford S.F. (2002), The first three-dimensional nautical chart, in Undersea with GIS, ESRI Press, Redlands, California, pp. 117-138

Fournier S., Devogele T. and Claramunt C. (2003), A role-based multi-agent model for concurrent navigation systems, in Proceedings of the 6th AGILE Conference on Geographic Information Science

Giertsen C. and Lucas A. (1994), 3D visualization for 2D GIS: An analysis of the users' needs and a review of techniques, Computer Graphics Forum, 13(3), pp. 1-12

Goralski R. and Gold C. (2008), Marine GIS: Progress in 3D visualization for dynamic GIS, in Proceedings of the 13th Conference on Spatial Data Handling (SDH), Montpellier, France

International Hydrographic Bureau (2000), IHO transfer standard for digital hydrographic data, edition 3.0, Special Publication No. 57

Kraak M.J. (2001), Cartographic principles, in Kraak M.J. and Brown A. (eds.), Web Cartography: Developments and Prospects, CRC Press, pp. 53-72

Musliman I.A., Abdul-Rahman A. and Coors V. (2006), 3D navigation for 3D-GIS – initial requirements, in Innovations in 3D Geo Information Systems, Springer-Verlag, pp. 259-268

Porathe T. (2006), 3-D Nautical Charts and Safe Navigation, Doctoral Dissertation, Mälardalen University Press

Rothblum A.M. (2006), Human error and marine safety, U.S. Coast Guard Risk-Based Decision-Making Guidelines, vol. 4, U.S. Coast Guard Research and Development Center, USA
Schilling A., Coors V., Giersich M. and Aasgaard R. (2003), Introducing 3D GIS for the mobile community – technical aspects in the case of TellMaris, IMC Workshop on Assistance, Mobility, Applications, Rostock, Stuttgart

Van Driel N.J. (1989), Three dimensional display of geologic data, in Three Dimensional Applications in Geographical Information Systems, CRC Press, pp. 1-9
Index

A
Alain, P., 145
Albertovich, I.J., 77

C
Chechulin, A., 45
Claramunt, C., 161

D
Doynikova, E., 45

G
Galiano, P., 119
Gold, C., 161
Goralski, R., 161
Gorricha, J.M.L., 61

H
Hamfelt, A., 93

J
Jiang, B., 13

K
Karlsson, M., 93
Kashevnik, A., 21
Kharinov, M., 119
Kotenko, I., 45
Kuzenny, V., 119

L
Leduc, T., 129
Lobo, V.J.A.S., 61

M
Miguet, F., 129

R
Ray, C., 161

S
Shilov, N., 21
Smirnov, A., 21
Smirnova, O.V., 155
Sorokin, R., 33

T
Tang, T.-H., 107
Thierfelder, T., 93
Thill, J.-C., 3
Tourre, V., 129

V
Valkovsky, V., 93

W
Wang, T.-Z., 107
Woloszyn, P., 129

Z
Zhang, S.-J., 107
V. Popovich et al. (eds.), Information Fusion and Geographic Information Systems: Towards the Digital Ocean, Lecture Notes in Geoinformation and Cartography 5, DOI 10.1007/978-3-642-19766-6, © Springer-Verlag Berlin Heidelberg 2011