Societal Challenges and Geoinformatics
edited by

A. Krishna Sinha
Department of Geological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA
and Adjunct Professor of Geology and Geoinformatics, Department of Geological Sciences, San Diego State University, San Diego, California 92182, USA

David Arctur
Open Geospatial Consortium, 35 Main Street, Suite 5, Wayland, Massachusetts 01778, USA

Ian Jackson
British Geological Survey, Keyworth, Nottinghamshire NG12 5GG, UK

Linda C. Gundersen
U.S. Geological Survey, MS 911 National Center, Reston, Virginia 20192, USA
Special Paper 482 3300 Penrose Place, P.O. Box 9140
Boulder, Colorado 80301-9140, USA
2011
Copyright © 2011, The Geological Society of America (GSA), Inc. All rights reserved. GSA grants permission to individual scientists to make unlimited photocopies of one or more items from this volume for noncommercial purposes advancing science or education, including classroom use. In addition, an author has the right to use his or her article or a portion of the article in a thesis or dissertation without requesting permission from GSA, provided the bibliographic citation and the GSA copyright credit line are given on the appropriate pages. For permission to make photocopies of any item in this volume for other noncommercial, nonprofit purposes, contact The Geological Society of America. Written permission is required from GSA for all other forms of capture or reproduction of any item in the volume including, but not limited to, all types of electronic or digital scanning or other digital or manual transformation of articles or any portion thereof, such as abstracts, into computer-readable and/or transmittable form for personal or corporate use, either noncommercial or commercial, for-profit or otherwise. Send permission requests to GSA Copyright Permissions, 3300 Penrose Place, P.O. Box 9140, Boulder, Colorado 80301-9140, USA. GSA provides this and other forums for the presentation of diverse opinions and positions by scientists worldwide, regardless of their race, citizenship, gender, religion, sexual orientation, or political viewpoint. Opinions presented in this publication do not reflect official positions of the Society. Copyright is not claimed on any material prepared wholly by government employees within the scope of their employment. Published by The Geological Society of America, Inc. 3300 Penrose Place, P.O. Box 9140, Boulder, Colorado 80301-9140, USA www.geosociety.org Printed in U.S.A. GSA Books Science Editors: Marion E. Bickford and Donald I. 
Siegel Library of Congress Cataloging-in-Publication Data Societal challenges and geoinformatics / edited by A. Krishna Sinha... [et al.]. p. cm. — (Special Paper; 482) Includes bibliographical references. ISBN 978-0-8137-2482-9 (pbk.) 1. Geographic information systems. 2. Geodatabases. I. Sinha, A. Krishna, 1941–. G70.212.S6295 2011 910.285—dc23 2011028777 Cover caption: The relevance of geoscience. Geological block diagram courtesy of Christopher Wardle, British Geological Survey.
10 9 8 7 6 5 4 3 2 1
Contents

Preface . . . v
1. Integrating sensor data and geospatial tools to enhance real-time disaster management capabilities: Wildfire observations . . . 1
   Vincent G. Ambrosia, Donald V. Sullivan, and Sally W. Buechel
2. Ontological relations and spatial reasoning in earth science ontologies . . . 13
   Hassan A. Babaie
3. Geoscience metadata—No pain, no gain . . . 29
   Jeremy R.A. Giles
4. Geoscience data and derived spatial information: Societal impacts and benefits, and relevance to geological surveys and agencies . . . 35
   R.A. Hughes
5. Strategic Sustainability Assessment . . . 41
   B. Deal, E. Jenicek, W. Goran, N. Myers, and J. Fittipaldi
6. Grid query optimization in the analysis of cone penetration testing data . . . 59
   Patrick M. Dudas, Hassan A. Karimi, and Abdelmounaam Rezgui
7. The role and development of a persistent interoperability test bed for geosciences research . . . 69
   M.J. Jackson, G. Hobona, L. Bernard, J. Brauner, and C. Higgins
8. GEONETCast: Global satellite data dissemination and the technical and social challenges . . . 77
   George Jungbluth, Richard Fulton, Linda Moodie, Paul Seymour, Mike Williams, Lothar Wolf, and Jiashen Zhang
9. Developing and implementing international geoscience standards—A domestic perspective . . . 87
   J.L. Laxton and T.R. Duffy
10. The need for ontologies: Bridging the barriers of terminology and data structure . . . 99
   Leo Obrst and Patrick Cassidy
11. Data provenance for preservation of digital geoscience data . . . 125
   Beth Plale, Bin Cao, Chathura Herath, and Yiming Sun
12. Theoretical foundations of the event bush method . . . 139
   Cyril A. Pshenichny and Oksana M. Kanzheleva
13. Infusing semantics into the knowledge discovery process for the new e-geoscience paradigm . . . 165
   A. Krishna Sinha
14. Global Map: International cooperation in the mapping sciences . . . 183
   D.R. Fraser Taylor
Preface
The fusion of informatics technologies with geoscience-based data and tools signals a necessary change in the way we manage the future of our science. It has become abundantly clear that for data to be useful, it must exist without borders and allow scientists, educators, and decision makers to use it freely and easily. Although the goal appears to be simple, it is very complex in detail, and this volume is dedicated to the broader community who wish to participate in translating data into knowledge. This transformation will enable all of us who are interested in geoscience-based solutions to address significant challenges facing society, such as sustainability of resources, urbanization, and climate change. In a recent article in The Wall Street Journal (January 8–9, 2011), the CEO of General Motors, Dan Akerson, was quoted as saying, “GM has to start acting like a consumer-driven, not engineering-driven company.” Geoscience is no different; we have to make our science societally relevant and user friendly, and not be driven solely by technology. Therefore, geoinformatics can be considered as an agent for making our data and products useful to the public at large. Contributors to this volume are recognized authorities in facilitating informatics-based solutions to global challenges, and are committed to expanding the role of geosciences by translating data into knowledge. The chapters in this volume cover a broad spectrum of research themes (presented alphabetically by primary author’s last name), and provide the latest thinking that will influence ongoing and future research in the emerging science of geoinformatics. Fourteen research papers, co-authored by thirty-eight researchers from both the geosciences and computer sciences, cover a spectrum of topics, ranging from integrating sensor and satellite data to the need for interoperability through test beds and semantics.
Other research topics addressed include strategic sustainability, international standards and collaborations, metadata, provenance, query optimization, and a discussion of the event bush method. This vast array of topics has one common theme—facilitating the use of data and tools for the geoinformatics community to act as first responders to societal challenges. This book follows an earlier publication, Geoinformatics: Data to Knowledge (Special Paper 397), published by the Geological Society of America in 2006, and is a testament to GSA’s leadership role in supporting geoinformatics. A. Krishna Sinha, Senior Volume Editor
The Geological Society of America Special Paper 482 2011
Integrating sensor data and geospatial tools to enhance real-time disaster management capabilities: Wildfire observations

Vincent G. Ambrosia
California State University–Monterey Bay, Seaside, California 93955, and National Aeronautics and Space Administration (NASA) Ames Research Center, Moffett Field, California 94035, USA

Donald V. Sullivan
Sally W. Buechel
NASA Ames Research Center, Moffett Field, California 94035, USA
ABSTRACT

The primary factors needed to manage disaster events are time-critical geospatial information on the event occurrence and presentation of that information in an easily manageable, collaborative/interactive geospatial decision-support and visualization environment. In this chapter, we describe the development, integration, and use of an unmanned airborne system (UAS), a multispectral sensor with autonomous onboard processing capabilities, a data distribution system, and geospatial processes to deliver real-time information to emergency incident management teams facing wildland fires. The unique integration of the described tools has contributed to an order of magnitude decrease in the delivery time of critical geospatial information to disaster managers. The UAS wildfire imaging campaigns in the western United States in 2007 and 2008 are briefly described in the context of real-world adaptation and utility of the resultant information improvements. These capabilities have far-reaching applications to other time-critical, disaster event management scenarios, and they are being expanded to further utilize various UAS platforms and other airborne sensor system data. This chapter will also describe the resultant integration issues faced and the solutions for ubiquitous adaptation of many of these processes in future UAS missions.
INTRODUCTION
Large-scale wildfires occur frequently throughout the United States every year. The fire season begins in late winter in the Southeast United States (initially in Florida) and transitions both northward into the Appalachian states and westward. The severe fire season then transitions into the Southwest, north through the Rockies, west to the Pacific Northwest, and then south through California, culminating in late fall–early winter fire events in Southern California. This extended fire season taxes the resources of local, state, and federal agencies mandated with monitoring and mitigating these destructive events. During the
Ambrosia, V.G., Sullivan, D.V., and Buechel, S.W., 2011, Integrating sensor data and geospatial tools to enhance real-time disaster management capabilities: Wildfire observations, in Sinha, A.K., Arctur, D., Jackson, I., and Gundersen, L., eds., Societal Challenges and Geoinformatics: Geological Society of America Special Paper 482, p. 1–12, doi:10.1130/2011.2482(01). For permission to copy, contact
[email protected]. © 2011 The Geological Society of America. All rights reserved.
extensive burning seasons throughout the United States, piloted aircraft employing thermal imaging systems are deployed and/or contracted by some state agencies and by the National Interagency Fire Center (NIFC; Boise, Idaho) to collect large volumes of data over fires spread from Canada to Mexico, and from the Rocky Mountain Front to the Pacific Ocean. Large fires such as the Yellowstone conflagration in 1988; the Cerro Grande, New Mexico, fire in 2000; the western Montana fires of 2000; the Colorado fires of 2002; the San Diego, California, fires of 2003; the Southern California fires of 2007; and the Northern California fires of 2008 burdened these remote-sensing data gathering crews and taxed both the resources and the stamina of these personnel.

Goals

Currently, geospatial information derived from airborne systems can require a few hours for processing, georectification, use, and integration by field personnel. Those processes can be streamlined and automated to provide near-real-time contextual geospatial information to mitigation managers on disaster events. The goals for integrating the unmanned airborne system (UAS) platform, payload, data telemetry, and georectification capabilities were to significantly improve the timeliness of the data stream for utility in fire mapping and disaster mitigation, and to provide improved and more accurate information on fire conditions than is currently realized. This chapter describes the processes that allow geospatial data development and use within 15 min of acquisition by the Autonomous Modular Scanner (AMS)–Wildfire sensor on the National Aeronautics and Space Administration (NASA) UAS platform.

SYSTEMS AND TECHNOLOGY INTEGRATION

Between 2006 and 2008, NASA and the U.S. Forest Service (USFS) managed a series of unmanned airborne sensor missions that showcased five major integrated technologies for improving the timeliness and utility of wildfire geospatial information.
Those five technologies were: (1) use of a long-duration UAS as a sensor platform; (2) development of improved wildfire sensor systems for UAS platform operations; (3) autonomous onboard data processing capabilities; (4) enhanced geospatial tools; and (5) real-time data distribution capabilities. These five technologies, assets, and tools are described in the following sections. A schematic showing the data collection, information processing, and data distribution process is shown in Figure 1. Each of the elements shown in that schematic will be further detailed in this chapter.

Unmanned Airborne System—NASA Ikhana

The NASA Ikhana UAS is a modified General Atomics–ASI, Inc., Predator-B (MQ-9) unmanned aerial vehicle (UAV), and
it entered NASA service in January 2007 to support earth science and aeronautics research (Fig. 2). “Ikhana” is a Native American Choctaw word meaning intelligence, conscious, or aware. The name is descriptive of the research goals NASA has established for the aircraft and its related systems. The Ikhana UAS consists of the Ikhana aircraft, a ground control station, ground support equipment, and ground communications systems. The Ikhana is remotely controlled by a pilot on the ground seated at a console located in the ground control station. The sensor system operator, seated at a console located in the ground control station, can remotely control the AMS-Wildfire sensor payload carried aloft by the Ikhana. The Ikhana home base is the NASA Dryden Flight Research Center (DFRC) at Edwards Air Force Base (EAFB), California. The Ikhana is capable of ~24 h duration, ~13,720 m (45,000 feet) altitude, and flight legs of over 7408 km (4000 nautical miles) (Table 1). The Ikhana flew its first science missions in support of wildfire observations in August 2007. Payloads can be flown in the nose compartment of the Ikhana or in pods at various wing-mount locations on the aircraft. The imaging payload instrument used for the wildfire imaging missions was mounted in an instrument payload pod that was attached under the wing. This configuration allowed quick instrument access and the ability to “swap-out” the sensor pod rapidly for necessary maintenance or mission reconfiguration. All necessary electronic cabling was designed to provide power from the Ikhana to the payload in the wing-pod, as well as to provide the necessary cabling to connect the payload instrument, processors, and interface to the aircraft data telemetry system, described in the next section.

Aircraft and Data Telemetry on UAS

There are two kinds of ground communications to the aircraft: line-of-sight (LOS) and satellite over-the-horizon (OTH) systems.
A portable ground data terminal provides command and control and payload uplink/downlink when the aircraft is within radio line-of-sight (~130 km or 70 nautical miles). The satellite communications system provides the over-the-horizon uplink and downlink to the ground control station. Aircraft and telemetry data are downlinked to the ground control station for display on the payload operator and user consoles. The OTH data telemetry system in the NASA Ikhana UAS is used for bidirectional command and control of the UAS as well as for bidirectional control of the AMS-Wildfire sensor system. The Ikhana telemetry is accomplished through a Ku-band frequency, commercial service provider, geosynchronous satellite platform. The system has a data bandwidth capacity of 3.0 megabits per second (Mbs), where 1.0 Mbs is used for data transmission and 2.0 Mbs is used for video data transmission. This telemetry link allows imagery and level II data products from the AMS sensor, developed on the payload processors, to be sent from the UAS to the ground control station and then redistributed through the Internet to the community. The sensor payload system, which provides the imagery and processed information, is described in the following section.
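As a rough illustration of how this bandwidth split supports near-real-time delivery, the sketch below estimates the downlink time for one product frame over the 1.0 Mbs data share. The link figures come from the text; the ~2 MB frame size is a representative assumption (the chapter later cites GeoTIFF frames of 1–3 Mb).

```python
# Rough downlink time for one product frame over the Ku-band link.
# The 3.0 Mbs total link with a 1.0 Mbs (non-video) data share is from
# the chapter; the ~2 MB frame size is an illustrative assumption.
frame_bytes = 2 * 1024 * 1024        # assumed ~2 MB GeoTIFF frame
data_rate_bps = 1.0e6                # 1.0 Mbs share for imagery/data
seconds = frame_bytes * 8 / data_rate_bps
print(f"~{seconds:.0f} s per frame")  # → ~17 s per frame
```

Even with this conservative frame size, per-frame transmission takes on the order of seconds, which is consistent with the 15 min end-to-end delivery metric discussed in the Goals section.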
[Figure 1 diagram: the airborne element comprises the AMS scanner and POS/AV with IMU/DGPS, feeding full-resolution 200 Hz data (RS-232) to a digitizer (automated data capture and NAV data multiplexing) and on to the payload computer (automated image geocorrection, level-2 algorithm processing, on-demand image subsetting), supported by an onboard DEM database and shared storage. A 3 Mbs Ku-band Sat Com link carries full-resolution imagery, and a 9.6 kbs channel carries sensor C&C and IMM traffic, to the ground element: a ground computer (IMM interface, query handling, sensor C&C), product QA/QC, shared storage, and a web server (collaborative decision environment; image server for level-2 GEOTIFFs, science products, and track maps) serving Internet end users.]
Figure 1. Autonomous Modular Scanner (AMS)–Wildfire sensor image data collection/distribution architecture. The Applanix Position and Orientation System for Airborne Vehicles (POS/AV) combines data with the onboard inertial measurement unit (IMU)/differential global positioning system (DGPS) to provide sensor/platform geo-location information, combined with the AMS data to the digitizer to generate an automated data capture (ADC) set and a navigation (NAV) data multiplexer (MUX). Data are then transferred to the payload computer and transmitted through a satellite communications (Sat Com) telemetry link to a ground computer where sensor command and control (C&C) capabilities reside (to monitor sensor functionality). At the ground station an intelligent mission management (IMM) process agent resides, allowing the real-time information sharing of sensor system performance characteristics to dictate flight/data collection modifications to be performed in real time. Data are then transferred to a server, allowing access via web services to the image data, which are formatted into public domain metadata standard georeferencing information embedded within a TIFF image file (GEOTIF), for ease of ingestion into GIS systems. QA/QC—quality assurance/quality control.
Figure 2. The NASA Ikhana UAS platform. The sensor is carried in a pod located under the wing, as can be seen in this image. The aircraft size and specifications can be derived from Table 1. The payload pod housing the AMS instrument is approximately 2.5 m long.
TABLE 1. NATIONAL AERONAUTICS AND SPACE ADMINISTRATION IKHANA UNMANNED AIRBORNE SYSTEM SPECIFICATIONS
Length: 11 m (36 ft)
Wing span: 20.11 m (66 ft)
Maximum take-off weight: 4773 kg (10,500 lb)
Payload: 1089 kg (2400 lb) of instruments
Range: >7408 km (4000 nautical miles)
Speed: 170–200 knots
Operational altitudes: up to 13,720 m (45,000 ft)
Maximum endurance: ~24 h
Internal payload bay: yes
External payload mounts: yes
Electrical power: 4.9 kW @ sea level; 2.8 kW @ altitude
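The endurance, speed, and range entries in Table 1 can be cross-checked with simple arithmetic; this sketch only recombines the table's own figures.

```python
# Consistency check of Table 1: endurance x cruise speed vs. range.
# All input figures (24 h endurance, 170-200 knots, >4000 nmi range)
# are from Table 1; only the unit conversion is added here.
NM_TO_KM = 1.852
endurance_h = 24
for speed_kt in (170, 200):
    range_nm = speed_kt * endurance_h
    print(f"{speed_kt} kt x {endurance_h} h = {range_nm} nmi "
          f"({range_nm * NM_TO_KM:.0f} km)")
```

At the low end of the cruise-speed range, 170 knots over 24 h yields ~4080 nmi (~7556 km), consistent with the ">7408 km (4000 nautical miles)" range entry.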
Autonomous Modular Scanner (AMS) Sensor

The AMS-Wildfire sensor, developed at the NASA Ames Research Center, is an airborne multispectral imaging line scanner capable of high-altitude autonomous operations on both manned and unmanned aircraft. The sensor is a highly modified Daedalus AADS-1268 scanning system that has interchangeable optical components (primary apertures), and it can support pixel size resolutions of 1.25 mrad and 2.5 mrad. The swath width is always 716 pixels across, giving total angular widths of roughly 43° or 86°, respectively, and scan rates are continuously adjustable from 2 scans/s to 33 scans/s, which allow operations through a wide range of altitudes and aircraft speeds. Spatial resolution is determined by altitude and the primary aperture size (1.25 mrad or 2.5 mrad). For the wildfire missions flown in 2007 and 2008, the Ikhana operated at a nominal altitude of 7011 m (23,000 ft) above mean sea level (amsl) (~20,000 feet above ground level [AGL]), while the AMS-Wildfire instrument was configured with an aperture size of 2.5 mrad, which provided a pixel spatial resolution of ~15 m (50 ft). The system is configured with sixteen (16) discrete spectral channels, ranging from the visible through shortwave-, mid-, and thermal-infrared (VIS-IR-TIR) (Table 2). The TIR channels are calibrated for accurate (~0.5 °C) temperature discrimination of hot targets, up to ~850 K. The TIR channels simulate those found on the proposed National Polar-Orbiting Operational Environmental Satellite System (NPOESS) visible/infrared imager/radiometer suite (VIIRS) instrument (channels M12 and M15). Because the AMS line scanner collects a series of scan lines over a wildfire event, the raw spectral data are sent to a computer processor onboard the platform to further process the data into useful information data sets for delivery to a telemetry system and distribution to receiving nodes on the ground.
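The relationship between IFOV, altitude, and pixel size described above can be verified with a small-angle calculation; the input figures (2.5 mrad IFOV, ~20,000 ft AGL, 716-pixel swath) come from the text, and the computation itself is just the standard line-scanner geometry.

```python
# Ground sample distance (GSD) from scanner IFOV and altitude AGL,
# using the mission configuration described in the text.
ifov_rad = 2.5e-3              # selected IFOV, radians (2.5 mrad)
agl_m = 20000 * 0.3048         # ~20,000 ft above ground level, in m
gsd_m = ifov_rad * agl_m       # small-angle approximation
swath_km = 716 * gsd_m / 1000  # 716 cross-track pixels
print(f"GSD = {gsd_m:.1f} m, swath = {swath_km:.1f} km")
```

This reproduces the ~15 m pixel resolution quoted above and implies a cross-track swath of roughly 11 km, consistent with the scene sizes shown later in the chapter's figures.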
The onboard autonomous data processing is described in the following section.

TABLE 2. AUTONOMOUS MODULAR SCANNER–WILDFIRE 16-CHANNEL SCANNER SPECIFICATIONS
Spectral band | Wavelength (µm)
1 | 0.42–0.45
2 | 0.45–0.52 (TM1)
3 | 0.52–0.60 (TM2)
4 | 0.60–0.62
5 | 0.63–0.69 (TM3)
6 | 0.69–0.75
7 | 0.76–0.90 (TM4)
8 | 0.91–1.05
9 | 1.55–1.75 (TM5) high gain
10 | 2.08–2.35 (TM7) high gain
11 | 3.60–3.79 (VIIRS* M12) high gain
12 | 10.26–11.26 (VIIRS M15) high gain
13 | 1.55–1.75 (TM5) low gain
14 | 2.08–2.35 (TM7) low gain
15 | 3.60–3.79 (VIIRS M12) low gain
16 | 10.26–11.26 (VIIRS M15) low gain
Note: Total field of view is 42.5° or 85.9° (selectable). Instantaneous field of view is 1.25 mrad or 2.5 mrad (selectable). Spatial resolution is 3–50 m (variable based on altitude). *VIIRS—visible/infrared imager/radiometer suite.

Onboard Autonomous Data Processing

The onboard data processing system was designed to complete the acquisition, preprocessing, information extraction, and output product generation from the raw spectral data collected by the AMS-Wildfire sensor system (Fig. 3). The processing chain is initiated based upon a single-step “acquisition request” from the data system operator. The remaining processing steps are autonomous, based on that single request for a data acquisition. This request includes, among other options, the band selection for the three-band visual product, the desired output resolution (optional), and the algorithm to apply to the data stream (fire detection, burn severity index, etc.). Based on the request received, the processing system selects the necessary AMS-Wildfire spectral channels from the continuous full-resolution sensor data stream, performs the conversion to temperature/radiance as appropriate, applies the requested algorithm, extracts the resulting information in vector form if appropriate, and creates a georectified visual raster product. The vector and raster products are produced on the link-module computer processor and transmitted via the Ikhana telemetry antenna
[Figure 3 diagram: AMS scan lines and POS-AV/LN-100 navigation data pass through an A-to-D converter and sensor control, then via UDP into a circular buffer (all channels) on the Link Module, where acquisition requests trigger algorithm processing, shapefile extraction, and optional JPEG2000 compression; a 9.6 kbs ppp link exchanges commands with the ground station, and the Ku-band link delivers products to the ground station, map server, and collaborative decision environment (CDE).]
Figure 3. Schematic of the on-board autonomous processing architecture. The Applanix Position and Orientation System for Airborne Vehicles (POS/AV), and the AMS data are sent through an analog-to-digital (A to D) converter and then to the sensor controller, then via a user datagram transport protocol (UDP) to the Link Module computer for on-board algorithm and geo-processing of data. Operations processing command exchanges between the on-board Link Module and the ground station are sent via a 9.6 kbs ppp (point-to-point protocol) data link protocol. The processed data products are sent to the same ground station via a wide-band Ku frequency satellite communications link, then are available on the ground for ingestion into mapping or web mapping services.
through the commercial Ku-band SatCom system to the ground station. The steps of the autonomous processing are detailed in the following subsections.

Preprocessing

Eighteen channels (16 spectral and 2 metadata channels) of the AMS-Wildfire sensor are received continuously on the link module. The selected raw digital data counts are converted to apparent radiance for visible and near-infrared wavelength channels, and brightness temperature for the thermal channels. The most recent sensor calibration information (from the NASA Ames sensor spectral calibration facility) is employed to derive apparent radiance, in addition to solar elevation angle and relative solar/sensor azimuth information, determined from knowledge of the current time and position of the sensor platform during acquisition. Two on-sensor black-body calibration reference source temperature readings provide a linear digital count-to-radiance conversion, which is then used in an approximate inverse Planck’s equation to produce a brightness temperature for each pixel in the thermal channels. This onboard preprocessing calibration step allows data to be spectrally and thermally consistent from mission to mission.

Extracting Information—Autonomous Algorithm Processing

Fire hot-spot detection algorithm. The onboard processing system for the AMS sensor was designed to support a variety of data manipulation algorithms to utilize the multispectral range (16 channels) of the sensor. For fire hot-spot detection, a multichannel temperature threshold algorithm, based on that developed by the Canadian Center for Remote Sensing (CCRS), was implemented (Li et al., 2000a, 2000b; Flasse and Ceccato, 1996; Cahoon et al., 1992). The CCRS algorithm is similar to other fire-detection algorithms used on various satellite-derived thermal data, including the Moderate-Resolution
Imaging Spectroradiometer (MODIS) fire-detection algorithm. The CCRS algorithm was originally developed for use with satellite (AVHRR) imagery (Li et al., 2000b), but it has been adapted for use on various airborne sensor systems, including the AMS-Wildfire sensor. The fire hot-spot detection algorithm uses the 3.6 μm channel of the AMS-Wildfire sensor to define fire via a temperature threshold, and two or more additional channels to further refine this classification. Multichannel thresholds take advantage of particular fire and nonfire target characteristics to remove fire commission errors encountered when using a single midwave thermal-infrared channel–derived temperature value alone. For example, restricting the reflectance value of a selected near-infrared channel helps eliminate sun glint, a common cause of fire commission error. The threshold values used in the algorithm (AMS channels 11 and 12 and, for daytime missions, channel 7; see Table 2) are parameters that can be variably set by the operator during a mission. The current fire hot-spot detection algorithm is calculated as:

If Band 11 (3.60–3.79 µm) > Band 11 minimum temperature (e.g., 360 K),
and Band 12 (10.26–11.26 µm) > Band 12 minimum temperature (e.g., 290 K),
and Band 11 − Band 12 > difference minimum (e.g., 14 K),
and (if available) Band 7 (0.76–0.90 µm) < reflectance maximum (e.g., 0.4) (to screen high-reflectance commission errors),
then the pixel is classified as a fire hot spot.

A vector data set outlining the boundaries of adjacent “fire hotspot–detected” pixels is provided as an alternative or additional data product. This vector file, in ESRI shapefile format, is processed through the same telemetry system and delivered along with the raster product for display in geographic information system (GIS) packages or other Web-map services. An example of the hot-spot detection vector file data is shown in Figure 4.

Normalized burn ratio (NBR).
A normalized burn ratio (NBR) index algorithm option was also implemented for use as a postfire vegetation rehabilitation assessment tool. The NBR utilizes two spectral channels from the AMS-Wildfire sensor: band 7 (0.76–0.90 µm) and band 10 (2.08–2.35 µm). These two bands provide the best contrast between photosynthetically healthy and burned vegetation (Howard et al., 2002). The normalized burn ratio index algorithm is calculated as:

NBR = (Band 7 − Band 10) / (Band 7 + Band 10),

where band numbers refer to the AMS-Wildfire channels (see Table 2 for band characteristics). The NBR is usually determined for both prefire (satellite-provided) and postfire scenes, and it is useful for accomplishing postfire assessments of burn severity conditions to assist in remediation activities.

Figure 4. AMS-Wildfire scanner data collected 19 July 2008 over the Canyon Complex fire, northern California. The fire hot-spot detection algorithm data (polygons in yellow) overlain on Google Earth terrain indicate the fire extends beyond the most recent defined fire perimeters (dark area polygons). The blue line running NE to SW is the real-time flight track of the Ikhana UAS during the data collection effort. This scene covers an approximate area of 11 km × 11 km and is centered at approximately 39°44′50″ N, 121°12′12″ W.

Since the AMS is calibrated, the AMS-acquired postfire NBR values can theoretically be compared to preburn satellite data (with the same spectral channels, such as Landsat Thematic Mapper) to derive a differenced normalized burn ratio (dNBR), where the difference is determined by subtraction (dNBR = pre-NBR − post-NBR). Currently, any number of arithmetic algorithms requiring up to five channels of the AMS-Wildfire data set can be easily added to the system to provide autonomous level II product generation for distribution through the telemetry link to the user community on the ground. The system is dynamic and can accommodate additional processes as warranted.

Georectification

Inertial measurement unit (IMU). The fully automated georectification processing utilizes metadata from an Applanix Position and Orientation System for Airborne Vehicles (POS AV) model 310 system. The POS AV-310 integrates precision global positioning satellite (GPS) data with inertial technology to provide real-time and postprocessed (POSPac) measurements of the position, roll, pitch, and heading of airborne sensors. The Applanix POS AV-310 system, with the inertial measurement unit (IMU) mounted on the sensor scan head, combines differential GPS output (via the Omnistar wide-area differential GPS service) with the IMU data to provide accurate platform position/attitude information at the time a scan line is collected. The POS AV-310 roll, pitch, heading, altitude,
and platform position are recorded with each scan line from the full 100 Hz Applanix output and embedded in the sensor data stream as they are collected during a mission. Photogrammetric projective transformation equations are used to determine the position of each pixel in the scan line as projected to the ground, with “ground” being determined by the onboard digital elevation model (DEM) data for the area being overflown. The AMS data pixels are resampled to a requestor-specified map resolution and requested band order (for visual products) to produce a particular output product. For efficiency, data are processed in “frames” of acquired flight-line transects (1200 lines of scanner data), which, as GeoTIFF formatted files, combine to produce fire mosaics when displayed in various GIS. An example of a real-time developed and processed AMS-Wildfire image frame mosaic for the 2008 Basin (California) wildfire is shown in Figure 5.

Digital elevation model data (DEM). A digital elevation model (DEM) is served on the onboard processor and consists of a composite data set of 1 arc-s Shuttle Radar Topographic Mission (SRTM) elevation “tiles,” which are turned into a mosaic in real time as needed, creating a seamless DEM for the entire western United States, where the majority of the missions are flown. The SRTM data are at 30 m postings (spatial resolution) (U.S. Geological Survey, 2008). The SRTM DEM data are used to define the geospatial context (latitude, longitude, elevation) reference for georectification of the sensor line-scanner data. Each of the AMS-Wildfire data pixels is georeferenced based upon the relationship between the location and attitude of the sensor platform (which defines the pointing vector of the line-scanner pixel at acquisition time) and the latitude, longitude, and elevation of the terrain (from the SRTM data).
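The fire hot-spot thresholds and the NBR formula described in the preceding sections can be sketched as a per-pixel test. This is a minimal illustration using the example threshold values quoted in the text; the function and parameter names are hypothetical and this is not the operational onboard implementation.

```python
# Per-pixel sketch of the CCRS-style hot-spot test and the NBR index,
# using the example thresholds from the text (360 K, 290 K, 14 K, 0.4).
# Names are illustrative; not the operational AMS-Wildfire software.
def is_fire_hotspot(t11_k, t12_k, refl7=None,
                    t11_min=360.0, t12_min=290.0,
                    diff_min=14.0, refl7_max=0.4):
    """t11_k/t12_k: brightness temperatures (K) for AMS bands 11/12;
    refl7: optional band 7 reflectance for daytime glint screening."""
    if t11_k <= t11_min or t12_k <= t12_min:
        return False
    if (t11_k - t12_k) <= diff_min:
        return False
    if refl7 is not None and refl7 >= refl7_max:
        return False  # screen high-reflectance commission errors
    return True

def nbr(band7, band10):
    """Normalized burn ratio: (b7 - b10) / (b7 + b10)."""
    return (band7 - band10) / (band7 + band10)

print(is_fire_hotspot(410.0, 300.0, refl7=0.1))  # True: hot target
print(is_fire_hotspot(410.0, 300.0, refl7=0.6))  # False: sun glint
```

In the operational system, adjacent pixels passing this test are grouped and their boundaries written to the shapefile product, while the NBR output feeds the postfire severity assessment described above.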
The geometric accuracy of the real-time georeferenced final products has not yet been rigorously assessed. The position and attitude data when used with only the “forward” solution are inherently less accurate than when postprocessed. In eight flight lines of repetitive overflights from a 2006 wildfire mission data set, a single overflight in 2007 and 2008, and two calibration flights, the real-time (postprocessed) georeferencing accuracies had an average root-mean-square (RMS) error of 2.5 pixels (R. Dominguez, 2007, personal commun.). These errors include those related to any fixed misalignment between the sensor and the Applanix navigation system. The authors found similar georeferencing positional RMS errors on onboard real-time UAS AMS-Wildfire–acquired data sets collected in 2007 as well. Improvements to these results can be made. The relative alignments of the sensor/IMU and GPS antennae must be reassessed with every new system reinstallation, since very small orientation discrepancies will reduce the precision of the georeferencing. Improvements may also be achieved with an increase in the DEM resolution. The 30 m posting SRTM DEM product used for this study is coarser than desirable for georectification with our higher-spatial-resolution AMS-Wildfire instrument data. The onboard product generation, algorithm processes, and georectification processes were developed with automation and
Figure 5. AMS-Wildfire sensor real-time processed image frame mosaic for the Basin Fire, Big Sur, California, collected on 8 July 2008. Data were processed and mosaicked in "real time" from 5 flight lines of frame data. The fire hot-spot detection algorithm shape-file data are shown draped on the AMS-Wildfire sensor 3-channel composite image. This scene covers an approximate area of 8122 sq. km (56 × 56 mi) and is centered at approximately 36°13′40″ N, 121°30′00″ W, immediately south of Monterey, California.
near-real-time delivery of information as critical objectives. The complete image processing time onboard the UAS takes ~10 s per image-file frame (1200 lines of AMS-Wildfire spectral data). With the additional data transmission time (via satellite telemetry) and ground-based quality control assessment, the total process time (to final delivery to a server for Internet distribution) still falls well within the 15 min defined as a metric for near-real-time data delivery. Automation can be extended by increasing the system's usable knowledge base (i.e., day/night awareness to autonomously employ the correct detection algorithm, spatial concept of a fire, etc.) to further remove the sensor data engineer from the process. The resultant geospatially registered imagery is transmitted from the aircraft through a satellite communications system down to the ground control station and is then made available to the community for visualization in a multitude of GIS or data visualization software packages. Those capabilities are defined in the following sections.
Real-Time Data Distribution and Geospatial Tools
The level II georectified data sets and imagery are sent from the onboard link module to the Ku-band telemetry satellite communications system (described in the "Aircraft and Data Telemetry on UAS" section). The GeoTIFF files have moderate file sizes (1–3 Mb per frame), allowing for minimal transmittance time
Ambrosia et al.
through the telemetry link to the ground control station, where they are then sent to servers at NASA Ames for redistribution through the Internet. The geospatial processing services for real-time AMS-Wildfire–derived data products were implemented utilizing open standards promulgated primarily by the Open Geospatial Consortium (OGC). The services utilized included: (1) sensor planning service (SPS), (2) Web notification service (WNS), (3) Web map service (WMS), (4) Web coverage service (WCS), and (5) Web feature service (WFS). Operationally, an incident responder (fire management team) would register an image acquisition request with the SPS and request a notification of acquisition once it has occurred, via a range of Web notification services. WNS notification mechanisms include e-mail and instant messaging (IM). This notification service allows interested requestors to remain current on new acquisitions and to be notified when those data elements are available for viewing or download. The common alerting protocol (CAP), a format codeveloped by the Federal Emergency Management Agency (FEMA) and the State of California Governor's Office of Emergency Services (CA-OES), describes the message content. Items included in the CAP message are network pointers to the various access mechanisms of the requested image. There are five types of pointers to the image data: (1) a pointer to the original spectral data available via anonymous file transfer protocol (FTP); (2) a pointer to the data via an OGC-compliant WMS, used by GIS clients (such as ESRI ArcGIS users); (3) a pointer to the data via an OGC-compliant WCS, used primarily by other processing services, including fire and smoke modeling teams; (4) a pointer to a Keyhole Markup Language (KML) file, used primarily by Google Earth clients; and (5) a pointer to a thumbnail-sized version of the file for quick-look viewing of the data.
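The five pointer types can be pictured as a small mapping. The host name, paths, and function name below are invented placeholders for illustration, not the actual CAP message schema:

```python
def build_image_pointers(host, image_id):
    """Assemble the five image-access pointers that a CAP notification
    message would carry for one requested image (illustrative only)."""
    return {
        "ftp": f"ftp://{host}/ams/{image_id}.tif",            # (1) original spectral data
        "wms": f"http://{host}/wms?layers={image_id}",        # (2) OGC WMS for GIS clients
        "wcs": f"http://{host}/wcs?coverage={image_id}",      # (3) OGC WCS for modeling services
        "kml": f"http://{host}/kml/{image_id}.kml",           # (4) Google Earth clients
        "thumbnail": f"http://{host}/thumbs/{image_id}.jpg",  # (5) quick-look view
    }
```

A notification consumer would pick whichever pointer matches its client software (GIS, modeling service, or browser).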
Web services utilized include an SPS where tasking requests can be submitted, a sensor alert service (SAS) to announce the availability of new data, a Web map service, and a Web coverage service, the latter two of which are automatically updated. After the Web services are updated, an SAS alert is sent to the requestor in the format requested (e-mail or IM). The requestor then can retrieve the original data directly for ingest into a desktop GIS, access the WMS or WCS via a desktop GIS, or visualize the data using any standard Web browser or Google Earth. The project team developed a visualization capability utilizing Google Earth as a "front-end" for real-time image viewing of the KML files as they were automatically generated. A Google Earth tool, the Collaborative Decision Environment (CDE), was developed to simplify three-dimensional visualization of the wildfire imagery and to allow the additional visualization of pertinent fire information data layers such as fire weather (Fig. 6). Incident command teams were provided linkages to the
CDE (served at NASA Ames Research Center) through a network link to the data "mash-up" service. The CDE was used extensively, as was access to the various WMS data-formatted holdings. The team also developed an enhanced browser-based viewer that allowed fire images and flight tracks to be displayed on a user-selectable base-map layer, for those who preferred non-KML-formatted data viewers. Therefore, in addition to the CDE KML-formatted data, the AMS imagery and shape files were available as layers in road, satellite, and hybrid maps from Yahoo, Google, OpenStreetMap, and Microsoft's "Bing" base maps (Fig. 7).
CAPABILITY DEMONSTRATIONS, 2006–2008
During the Western States Fire Missions (WSFM) in 2006–2008, the aforementioned tools were matured and integrated to deliver real-time (under 15 min) geospatial wildfire imagery and information for use by incident command teams. Next, the 2006–2008 Western States Fire Missions are briefly summarized to provide context for employment of the integrated tools during operational missions.
2006 Mission Series
In the fall of 2006, the Western States Fire Mission series was initiated following receipt of the Federal Aviation Administration (FAA) Certificate of Authorization (COA) for limited flight opportunities in the National Airspace System (NAS). On 24 October, the Ikhana UAS predecessor (Altair), with the AMS-Wildfire instrument onboard, was allowed to fly into the National Airspace System over a small series of controlled burns on the eastern flanks of the Sierra Nevada. The mission demonstrated long-duration flight profiles and was the first NAS operation for the project. The Altair UAS subsequently supported emergency fire imaging on 28 October over the Esperanza Fire in Southern California, providing real-time information on fire location and progression to the on-site incident command team.
The emergency COA provided unprecedented access for the UAS to the highly populated Los Angeles Basin NAS to support fire data-collection activities. The Esperanza Fire had the California governor's "State Emergency Declaration" status, allowing the FAA to respond quickly to modify the area of operations and flight conditions for the UAS. During that 16 h mission, the AMS-Wildfire scanner system provided multiple image data sets of the fire progression. Following the Esperanza Fire emergency support mission, the 2006 mission series ended. In 2006, a total of ~40 h of operations was flown, with AMS-Wildfire scanner data collected over two burn complexes.
Western States Fire Mission, 2007
The 2007 mission series comprised the first flights of the new NASA Ikhana UAS, which was delivered in January 2007. The 2007 Western States Fire Mission series was initiated in August
Integrating sensor data and geospatial tools for disaster management
Figure 6. Components of the Collaborative Decision Environment (CDE). The visualization element of the CDE employs Google Earth. The critical fire data elements (left side) that compose the additional visualization components are a “mash-up” of data from various web-served data locations, including those from the National Weather Service (NWS), Naval Research Lab (NRL), Massachusetts Institute of Technology (MIT), U.S. Forest Service (USFS), University of Maryland (UMD), the National Interagency Fire Center (NIFC), the U.S. Geological Survey (USGS), the Federal Aviation Administration (FAA), and others. The CDE also allows integration of instant messaging (IM) and provision of streaming video data from the acquiring UAS platform, in addition to the 3-D visualization of the AMS-Wildfire sensor–acquired data.
following FAA approval of the Ikhana COA. In total, eight fire data-collection missions occurred during the fire season in 2007. The first four missions demonstrated long-duration and long-range capabilities, with data collection over fires located in eight western states. Mission operations lasted between 10 and 22 h, with mission ranges of 2593–5926 km (1400–3200 nautical miles). During those first four flights, 27 fires were overflown and imaged, with real-time geospatial fire data relayed to incident command centers (ICCs). To assist in information integration, Wildfire Research and Applications Partnership (WRAP) team members were embedded at various ICCs.
Southern California Firestorm Support Missions, October 2007
In late October 2007, a series of Santa Ana wind–driven fires erupted in the Los Angeles and San Diego regions of Southern California. Over 11 major fires were burning in the
Southern California area. The NASA Ikhana AMS sensor supported the fire management teams on those wildfires with provision of real-time hot-spot detection and postfire assessment imagery. On 23 October, the team requested and received an emergency COA modification from the FAA to facilitate operations in the affected regions. On 24 October, the Ikhana UAS with the AMS-Wildfire sensor flew the first of four missions over 11 major wildfire complexes in the region. Flight endurance each day was between 7 and 9 h, with ~2500 km (1350 nm) mission ranges. Many of the fires were imaged twice a day to provide up-to-date fire progression information. Team members were again embedded in various ICCs and county-level Emergency Operations Centers (EOCs). Fire information from the AMS-Wildfire was delivered to the ICCs and EOCs as well as to national operations centers, including NIFC and the Department of Homeland Security (DHS). A summary of the 2007 missions is shown in Table 3.
Figure 7. Data collection and distribution of sensor data, illustrating the web services employed in the generation and distribution of the AMS sensor data sets. The AMS-derived products are generated onboard the Ikhana UAS, transmitted to the Payload Ground Control System (GCS) and then to the NASA FTP (file transfer protocol) server, and processed into a variety of product formats, including Keyhole Markup Language (KML) Google Earth–enabled formats, Web Map Service (WMS) format, and a common alerting protocol (CAP) notification. Data investigators can also access AMS-derived product data (fire hot-spot shape files) from the NASA FTP server (or a mirrored site at the U.S. Forest Service) using geographic information systems (GIS) software, such as ESRI's ArcGIS package.
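The WMS access path shown in Figure 7 boils down to a parameterized HTTP request. A hedged sketch of building a standard OGC WMS 1.3.0 GetMap URL (the endpoint, layer name, and bounding box are invented placeholders, not the project's actual server):

```python
from urllib.parse import urlencode

def wms_getmap_url(endpoint, layer, bbox, width=512, height=512,
                   crs="EPSG:4326", fmt="image/png"):
    """Build an OGC WMS 1.3.0 GetMap request URL.
    bbox is (min_lat, min_lon, max_lat, max_lon); WMS 1.3.0 uses
    latitude-first axis order for EPSG:4326."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return endpoint + "?" + urlencode(params)

url = wms_getmap_url("http://example.gov/wms", "ams_fire_mosaic",
                     (36.0, -122.0, 36.5, -121.5))
```

Any OGC-compliant GIS client (such as ArcGIS) issues requests of this shape under the hood when it consumes a WMS layer.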
TABLE 3. 2007 WESTERN STATES FIRE MISSION SUMMARY

Flight date    Duration (h)    Fires flown    Mileage (km)
16 Aug.        10.0            4              2593 (1400 nm)
29 Aug.        16.1            7              4630 (2500 nm)
7 Sept.        20.0            12             5926 (3200 nm)
27 Sept.       9.9             4              3333 (1800 nm)
24 Oct.        9.0             11             2500 (1350 nm)
25 Oct.        8.7             11             2500 (1350 nm)
26 Oct.        7.8             11             2500 (1350 nm)
28 Oct.        7.1             11             2500 (1350 nm)
Western States Fire Mission, 2008
In late June 2008, lightning ignited hundreds of fires in Northern California. When the California governor declared a State of Emergency, the WRAP team was requested to support state and federal firefighting efforts. The WRAP team requested an emergency COA amendment from the FAA to allow flight operations over a vast region of the state, which was granted on 25 June. Since the Ikhana and the AMS-Wildfire sensor had been undergoing routine maintenance and preparation for a much later seasonal mission, the first flights over Northern California did not occur until 8 July. Subsequently, four total missions were
TABLE 4. 2008 WESTERN STATES FIRE MISSION SUMMARY

Flight date    Duration (h)    Fires flown    Mileage (km)
8 July         9.5             9              2593 (1400 nm)
19 July        5.0             4              1852 (1000 nm)
17 Sept.       3.0             1              1482 (800 nm)
19 Sept.       3.5             2              1482 (800 nm)
flown in 2008 through the conclusion of the fire season (Table 4). Each of the mission data-collection efforts focused on providing real-time fire information to the various ICCs as well as to the State Operations Center and the Multi-Agency Coordination Center, where data were integrated into the wildfire management decision process. The 2008 missions showcased new imaging and software enhancements, including the delivery of postfire NBR data sets to incident teams. Following the Northern California firestorms, the remainder of the western U.S. experienced a "light" fire season, which allowed the WRAP project team to "stand down" capabilities for the majority of August, September, and October. In October 2008, the WRAP team discontinued demonstration and development missions, due to the official "close" of the western U.S. fire season.
FINDINGS AND INTEGRATION ISSUES
The demonstration and emergency support missions flown by the NASA-USFS research team demonstrated autonomous geospatial data collection, processing, and delivery within 15 min of acquisition by the AMS sensor on the NASA UAS (Ikhana) platform. This time period was well within the metrics established at the initiation of our project flights in 2006 (1 h delivery metric). The information and data product delivery time was also an order of magnitude faster than the methods previously employed for fire detection and mapping. Most of the major integration issues with adaptation of these capabilities for operational utility are related to component costs. UAS platforms, such as the NASA Ikhana, are expensive compared to manned aircraft platforms, but in some cases they operate at much greater ranges, altitudes, and durations, and with improved flight characteristics. One of the major UAS operational integration issues is related to the current regulatory framework for operations of UAS in the NAS and the restrictions imposed on those operations. Routine UAS flights in the NAS are probably over 5 yr away from reality. The national disaster management communities may see some relief from those restrictions, now that it has been shown that UAS capabilities are essential for supporting observation and monitoring strategies over rapidly evolving events, such as wildfires. The sensor system described in this chapter (AMS-Wildfire) is a "one-of-a-kind" scientific instrument development. All the components of the instrument are commercial off-the-shelf technologies, and there is interest in private industry (sensor development industry) in developing duplicate and "next-generation" sensor systems. The components that compose the satellite telemetry system are ubiquitous and are being integrated on various manned and unmanned platforms, although their costs are still high.
The autonomous real-time georectification processing is modeled after various photogrammetric processes used to correct aerial photography after a mission, and it is being evolved further as a result of a Small Business Innovative Research grant to Terra-Mar, Inc., to develop fast emergency-response georectification procedures. During our research, the project team matured and evolved those processes to operate autonomously on data sets collected from a remotely operated sensor system. The real-time fire hot-spot detection and burn assessment algorithms are based on calculations developed for use with various satellite imaging systems, including those developed for the Advanced Very High Resolution Radiometer and the Moderate Resolution Imaging Spectroradiometer. We have shown that those same algorithms can be operated in the onboard processing chain, allowing level II data products to be produced in near real time. This process is more time efficient than delivering level I sensor data to the ground and then manually applying those algorithms to produce level II products. The capacity for providing custom-developed data products directly from an acquiring sensor is virtually limitless. This will have a significant impact on time savings and data utility, especially for the disaster management community, who rely on rapid provision of decision-making information. The "sensor webs" developed and enhanced during the period of these missions have demonstrated the utility of designing Web services with open standard interfaces. The Google Earth visualization tool, the messaging and communication environment, and the real-time video broadcasting were all developed around open standards and allowed access to users of all data-serving platforms, including personal digital assistant devices.
CONCLUDING REMARKS
We have demonstrated that various platform, sensor, communications, and geospatial technologies can be integrated to provide near-real-time intelligence in support of disaster management entities. In our work with the U.S. wildfire management agencies, we designed a functional system for meeting the disaster response metric of "geospatial data delivery in under an hour from the collection event." We have shown that geospatial data can be provided within a 5–15 min time period (from collection), which represents a significant advance over current capabilities. Large-capacity, long-duration, high-altitude UAS platforms can play a significant role in providing repetitive, loitering capabilities over disaster events, especially dynamic, evolving events like wildfires. The OTH satellite data and voice communications telemetry systems on these platforms can be employed to command and control acquisition by the imaging payload as well as to provide sensor data to ground team members through the same telemetry linkages. This allows rapidly updateable information to be in the hands of incident management teams when needed. Imaging sensor systems can be designed to collect critical spectral and thermal wavelength channel information specifically "tuned" to the phenomenon being observed. The use of multispectral data in analysis of wildfire behavior is critical to ascertaining fire location, movement, and behavior.
The spectral channels defined in this chapter are essential for wildfire observations. Multichannel capabilities offer clear advantages over single-channel fire detection systems, as we have shown in this chapter. Image-processing capabilities to derive level II data from acquiring sensor systems can be automated and included as part of the payload processing package on an airborne (or satellite) sensing platform, such as the UAS described in this chapter. Complex algorithms can be integrated into the processing scheme to further reduce labor-intensive and time-consuming postmission analysis tasks. By integrating sensor and/or platform IMU and positioning information with terrain DEM data, a fully georectified image product can be developed autonomously onboard an aircraft, further reducing the critical labor and time requirements for delivery of accurate geospatial data. Web-enabled GIS tools and systems such as Google Earth or ESRI products provide a user-friendly "platform" for display of georectified imagery and information. Our goals were to ensure that the information products developed autonomously from the UAS-acquiring sensor would integrate seamlessly into a
multitude of geospatial visualization packages. We achieved that objective by providing autonomously generated data products in Open Geospatial Consortium (OGC) standard formats. Following 3 yr of system development and emergency support missions in the western United States, we have demonstrated that current off-the-shelf technologies can be integrated to provide the disaster management community with the data and "intelligence" that they require in real time. We anticipate that the civilian use of UAS will increase dramatically, especially in support of disaster management and disaster relief efforts. The processes and technologies described here for the use of UAS platforms and enabling sensors and technologies should form the foundation for designing future disaster monitoring and observation capabilities. These integrated technologies have obvious cross-disciplinary application to other disaster events in the United States and the world.
ACKNOWLEDGMENTS
We acknowledge the support of the National Aeronautics and Space Administration (NASA) through a grant (REASoN-0109-0172) awarded to support this work. We are also grateful for the support of S. Wegener (Bay Area Environmental Research Institute, BAERI), B. Lobitz (California State University–Monterey Bay), F. Enomoto (NASA), S. Johan (NASA), S. Schoenung (BAERI), T. Zajkowski (U.S. Forest Service–Remote Sensing Applications Center, USFS-RSAC), E. Hinkley (USFS-RSAC), S. Ambrose (NASA), T. Fryberger (NASA), T. Rigney (NASA), B. Cobleigh (NASA), G. Buoni
(NASA), J. Myers (University of California–Santa Cruz, UCSC), T. Hildum (UCSC), M. Cooper (General Atomics–Aeronautical Systems Inc.), and J. Brass (NASA). We would also like to acknowledge the wildfire management community members who engaged us in defining observation criteria and metrics that allowed us to help improve their wildfire/disaster mitigation capabilities.
REFERENCES CITED
Cahoon, D.R., Jr., Stocks, B.J., Levine, J.S., Cofer, W.R., III, and Chung, C.C., 1992, Evaluation of a technique for satellite-derived area estimation of forest fires: Journal of Geophysical Research, v. 97, p. 3805–3814.
Flasse, S.P., and Ceccato, P.S., 1996, A contextual algorithm for AVHRR fire detection: International Journal of Remote Sensing, v. 17, p. 419–424, doi:10.1080/01431169608949018.
Howard, S.M., Ohlen, D.O., McKinley, R.A., Zhu, Z., and Kitchen, J., 2002, Historical fire severity mapping from Landsat data: Pecora 15/Land Satellite Information IV/ISPRS (International Society of Photogrammetry and Remote Sensing) Commission I/FIEOS (Future Intelligent Earth Observation Satellites) 2002 Conference Proceedings: Bethesda, Maryland, American Society of Photogrammetry and Remote Sensing (CD-ROM).
Li, Z., Nadon, S., Cihlar, J., and Stocks, B., 2000a, Satellite mapping of Canadian boreal forest fires: Evaluation and comparison of algorithms: International Journal of Remote Sensing, v. 21, p. 3071–3082, doi:10.1080/01431160050144965.
Li, Z., Nadon, S., and Cihlar, J., 2000b, Satellite detection of Canadian boreal forest fires: Development and application of an algorithm: International Journal of Remote Sensing, v. 21, p. 3057–3069, doi:10.1080/01431160050144956.
U.S. Geological Survey, 2008, Shuttle Radar Topography Mission: http://srtm.usgs.gov/index.php, site updated 23 June 2008 (site accessed 1 June 2009).
MANUSCRIPT ACCEPTED BY THE SOCIETY 17 FEBRUARY 2011
Printed in the USA
The Geological Society of America Special Paper 482 2011
Ontological relations and spatial reasoning in earth science ontologies Hassan A. Babaie Department of Geosciences, Georgia State University, Atlanta, Georgia 30302-4105, USA
ABSTRACT Several types of fundamental ontological relations connect the endurant (continuant) and perdurant (occurrent) entities in every domain. These include: instantiation, parthood, location, and connection relations, and those that are derived from them, such as adjacency, overlap, containment, and coincidence. Some of these types of relations, and their subtypes, are formally defined in the context of the Web Ontology Language (OWL) for a variety of endurant geological examples, mostly from the Nankai Trough in southwest Japan and the San Andreas fault in California. Here, the foundational ontological relations are discussed to show their application in building useful earth science ontologies. These relations, defined as properties in OWL, are given in the context of the Resource Description Framework (RDF) triples and their relationship to relational databases. The role of properties in providing semantics, reasoning, and knowledge structuring and representation is discussed for various ontological relations. The semantics of classes are provided by the metaproperty and restrictions of the properties that use these classes as domain and range. Types of properties are described from different perspectives and for different purposes. Property subclassing, through OWL’s subproperty construct, is used to restrict properties. The formal definitions of the foundational taxonomic (isA), partonomic (partOf), location (locatedIn), containment (containedIn, componentOf), and topologic (overlap, adjacentTo) relations, at the class and instance levels, are given in first-order logic for continuant geological entities. Geologic examples for several other basic relations such as derivesFrom, transformationOf, and absorb are also given.
Babaie, H.A., 2011, Ontological relations and spatial reasoning in earth science ontologies, in Sinha, A.K., Arctur, D., Jackson, I., and Gundersen, L., eds., Societal Challenges and Geoinformatics: Geological Society of America Special Paper 482, p. 13–27, doi:10.1130/2011.2482(02). For permission to copy, contact [email protected]. © 2011 The Geological Society of America. All rights reserved.
INTRODUCTION
Depiction of the structure and dynamics of reality, to enable spatial and temporal reasoning about simple and composite components of Earth's natural systems, requires construction of ontologies that are based on formalized representation of the static and dynamic entities and their complex relationships. Formal domain ontologies (Kashyap et al., 2008; Donnelly et al., 2005; Smith, 2003, 2004) represent reality by depicting the taxonomic, partonomic, and other types of hierarchical structure of objects and events in the real world (Lambe, 2007). Application of ontologies to support information management, knowledge discovery, and spatial and temporal reasoning has been progressively appreciated by earth scientists in recent years, as is apparent by
a significant increase in the number of projects devoted to building top-level and domain ontologies (e.g., Raskin and Pan, 2005; Sinha, 2006). Construction of useful ontologies requires a thorough understanding of the formal meaning of the foundational ontological relations such as instantiation, parthood, location, and connection, and those that are derived from them, such as adjacency, overlap, containment, and coincidence (e.g., Randell et al., 1992; Cohn and Varzi, 2003; Donnelly, 2004b; Schulz et al., 2005, 2006). Despite the spatial and spatio-temporal nature of most components in Earth systems, and the fact that most natural objects are complex composite entities, the existing taxonomic structures in current earth science ontologies are mainly based on the primitive isA (i.e., subclass, instantiation, or subsumption) relation, and they under-represent the other types of ontological relations. The hierarchies in these ontologies generally do not include the mereological part-whole (i.e., partitive) and topological (e.g., connection) relations that are needed to depict composite, spatially and spatio-temporally related entities. Moreover, there seems to be confusion in the earth science community as to the difference between formal relations that hold at the universal level and those that exist among instances (individuals) in reality. In this paper, I focus on introducing the major fundamental relations (both Web Ontology Language [OWL] qualified names and user-defined) and the difference between those that hold between universal types and those that exist among instances in reality. The main objective of this paper is to introduce the formal ontological relations so that they can be used more consistently in designing better and more reliable earth science ontologies.
Due to the scarcity of ontologies in the earth sciences, most of the material used in this paper is based on work in artificial intelligence, medical informatics, and analytical philosophy, where significant progress has been made in ontological engineering. The formal relations in this paper are given in first-order logic notation (Levesque and Lakemeyer, 2000; Brachman and Levesque, 2004), and the symbols used in this paper include: ∧ (and), ∨ (or), → (then), ¬ (not), ∃ (there exists), ⊆ (is a subclass of), ∩ (intersection of), and ∀ (for all, if any). All variables are given in Helvetica font, and those for types are given in uppercase letters, e.g., X, C, R. Variables representing particulars or instances of universal types are given in lowercase letters, e.g., x, c, r. Universal types and particulars are also given in Helvetica font, with the first letter of every word capitalized, e.g., Ocean, Rock, Fault, and the first letter of each word in compound names capitalized in camel case, e.g., IgneousRock, AccretionaryPrism. Relation names start with a lowercase letter in Helvetica font, and the first letter of each subsequent word in composite names is in uppercase, i.e., camel case (e.g., partOf, isA, connectedTo). Concepts, i.e., term definitions, are given in italic font, e.g., mineral, water. Although the geological examples given to elucidate the semantics of each type of relation are varied, they mostly relate to the spatial relations in the Nankai Trough accretionary prism in southwest Japan (Tobin and Kinoshita, 2006) and the San Andreas fault in California.
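As a small worked example in this notation (a hedged sketch, not a definition quoted from the formal treatment later in the chapter), the transitivity of instance-level parthood and a common class-level reading of partOf can be written as:

```latex
% Instance-level parthood is transitive:
\forall x\,\forall y\,\forall z\;
  \bigl(\mathit{partOf}(x,y) \wedge \mathit{partOf}(y,z)
    \rightarrow \mathit{partOf}(x,z)\bigr)

% Class-level partOf: every instance of C_1 is part of some instance
% of C_2 (e.g., Mineral partOf Rock):
C_1\ \mathit{partOf}\ C_2 \;\equiv\;
  \forall x\,\bigl(\mathit{instanceOf}(x,C_1) \rightarrow
    \exists y\,(\mathit{instanceOf}(y,C_2) \wedge \mathit{partOf}(x,y))\bigr)
```

Note the asymmetry the class-level reading introduces: every mineral instance is part of some rock, but a given rock need not contain every kind of mineral.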
PROPERTIES AND SEMANTIC LANGUAGES
Semantic models (e.g., ontologies) have a graph-theoretical structure, consisting of nodes representing terms, and edges representing relations that link the nodes (Smith et al., 2005). The relata for the relations are classes (i.e., terms) that represent entities in reality. Ontology captures the formal relations and their universal relata, based on the real relationships and constraints between the instances in a specific domain of discourse, i.e., a discipline or field (e.g., Smith, 2003, 2004; Bains, 2006). This way, ontologies can be used to represent domain knowledge and theories, and to support inquiries in the domain. Information models depict reality by using classes to represent the universals and instances of these classes to represent the individuals. Knowledge in any domain is a collection of numerous true statements (propositions) (Carrara and Sacchi, 2006). For example, the statement, "Rock is an aggregate of one or more minerals," is a piece of petrology knowledge. The following are two other examples of tectonics and structural geology knowledge statements: "Temperature (T) increases with depth (z)," denoted by the geothermal gradient ∂T/∂z ≈ 30 °C/km, and "stress (σ) and strain (e) are linearly related (at low temperatures/pressures and shallow depth)," expressed by Hooke's law of elasticity, σ = Ee, where E (Young's modulus) is the proportionality constant. These knowledge statements are composed of terms (e.g., stress and strain in the last statement) that represent universal classes that evoke domain concepts, and the relationships among them (e.g., the "linearly related to" relation between stress and strain).
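The two knowledge statements above can be made concrete as executable relations. A sketch under stated assumptions: the surface temperature and Young's modulus values below are illustrative placeholders, not values from the text; only the ~30 °C/km gradient is quoted from the statement above:

```python
GEOTHERMAL_GRADIENT = 30.0  # degC per km (the approximate value quoted above)
E_ILLUSTRATIVE = 50e9       # Pa; an assumed Young's modulus for a crustal rock

def temperature_at_depth(z_km, surface_temp_c=15.0):
    """Knowledge statement 1: temperature increases with depth,
    T(z) = T_surface + (dT/dz) * z."""
    return surface_temp_c + GEOTHERMAL_GRADIENT * z_km

def hooke_stress(strain, youngs_modulus=E_ILLUSTRATIVE):
    """Knowledge statement 2: stress and strain are linearly related,
    sigma = E * e (valid only at low T/P and shallow depth)."""
    return youngs_modulus * strain
```

At 2 km depth the first statement predicts roughly 75 °C; an ontology would capture the *relations* (increasesWith, linearlyRelatedTo) rather than these particular numeric forms.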
Knowledge statements are explicitly asserted in ontologies in the form of Resource Description Framework (RDF) triples, using Semantic Web languages such as OWL (Web Ontology Language) and its underlying RDF and RDF Schema (RDFS) languages (Breitman et al., 2007; Allemang and Hendler, 2008; Antoniou and van Harmelen, 2008; Kashyap et al., 2008). Every statement in RDF is like a value in a single cell of a database table, which requires three components for its complete representation (Allemang and Hendler, 2008): a row identifier (subject, s), a column identifier (predicate, P), and the value in the cell (object, o) (Fig. 1). The subject is the thing (individual) about which we make the statement, the predicate is the property (relation) for the subject, and the object is the value for the property. We refer to the subject-predicate-object statement as an RDF triple. Figure 1 shows the concept of the RDF triple and its relationship to the relational database table structure, and Figure 2 gives an example of the conversion of a relational database table into RDF triples. A knowledge base built on a domain ontology is a large set of such RDF triples about the individuals in that domain. Because each row (i.e., record) in a relational database table has multiple columns (fields, attributes), several triples in an ontology often relate to one subject (Figs. 1B and 2B). In other words, a single subject in a knowledge base may relate to many objects through many predicates, and hence RDF triples. For example, a specific, individual sedimentary bed (subject) has specific values (objects) for its age, orientation, and composition predicates.
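The triple structure described above can be sketched in a few lines of Python (not from the paper); the bed name and attribute values are hypothetical illustration data, mirroring the one-subject, many-predicates pattern of Figures 1B and 2B:

```python
# A knowledge base as a set of (subject, predicate, object) triples.
# "Bed42" and its values are hypothetical, for illustration only.
triples = {
    ("Bed42", "age", "Cretaceous"),
    ("Bed42", "orientation", "N45E, 30SE"),
    ("Bed42", "composition", "quartz arenite"),
}

def objects_of(kb, subject, predicate):
    """Return every object asserted for a subject via a predicate."""
    return {o for (s, p, o) in kb if s == subject and p == predicate}

# One subject relates to many objects through many predicates.
print(objects_of(triples, "Bed42", "age"))  # {'Cretaceous'}
```

Each triple plays the role of one table cell: the subject is the row identifier, the predicate the column, and the object the cell value.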
Ontological relations and spatial reasoning in earth science ontologies
Figure 1. The correspondence between the Resource Description Framework triple (subject-predicate-object) and a relational database table. (A) Each row (record) in a relational database table represents a subject (s). Each column (field or attribute) in the table is a predicate (property, Pi). The cell value at the intersection of a row and column is the object (Oi). (B) Each subject (S) corresponds with many objects (Oi) through many properties (Pi) (Allemang and Hendler, 2008).
In Semantic Web jargon, subjects are said to be instances of classes, predicates are properties (relations), and objects, which are either instances of other classes or are data types, provide values for the properties. There are two types of properties in OWL: data type and object. A data type property is a binary relation between a set of individuals (subjects) and a set of instances of a typed literal (e.g., XSD, XML Schema Definition data types) (Breitman et al., 2007). An object property is a binary relation between sets of individuals of two classes, i.e., the subject and object of an RDF triple are both individuals. Properties are restricted by their domain and range. Domain relates a property (predicate) to the subject class of a triple that uses the property. In this way, the domain namespace imposes a restriction on the type of the subject instances that use the property (Antoniou and van Harmelen, 2008); the domain is the set of values for which the property is defined, i.e., it is the collection of types that use the property. For example, the domain for the composition property in the triple Mineral composition MineralGroup is the Mineral class. Range relates a property to a target object class or data type, and therefore it puts a restriction on the set of values a property can take. For example, the range for the composition property in the previous triple can be silicate, phosphate, sulfide, hydroxide, etc., which are all of type MineralGroup. Domain and range are constructs that give information on how a property may be used, and they should be declared with care. The direction of roles should be defined correctly when using the domain and range. Table 1 shows some statements in the structural geology domain (namespace prefix: struc), given in the N3 serialization format (Allemang and Hendler, 2008).
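The way domain and range constrain a property can be sketched in Python (a hypothetical illustration, following the Mineral-composition-MineralGroup example above; strictly speaking, RDFS semantics use domain and range to infer types rather than to reject triples, but this sketch treats them as checks for clarity):

```python
# Hypothetical domain/range declarations for the composition property.
domain = {"composition": "Mineral"}       # rdfs:domain
range_ = {"composition": "MineralGroup"}  # rdfs:range
rdf_type = {"Quartz": "Mineral", "Silicate": "MineralGroup"}

def violates(s, p, o):
    """True if a triple conflicts with the property's declared domain or range."""
    if p in domain and rdf_type.get(s) != domain[p]:
        return True
    if p in range_ and rdf_type.get(o) != range_[p]:
        return True
    return False

print(violates("Quartz", "composition", "Silicate"))  # False: correct direction
print(violates("Silicate", "composition", "Quartz"))  # True: roles reversed
```

The second call shows why the direction of roles matters: swapping subject and object of the same property produces a triple that no longer fits the declared domain and range.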
In these statements, the Fold class has the foldDescription and tightness data type properties of string type, and a foldAxis object property for which the range is of the Line type.

RELATIONS AND REASONING
Figure 2. A relational database table (A) converted into the Resource Description Framework triples (B) for the first row (record) of the table.
As conceptual units, relations constitute a major component of knowledge structuring, representation, and reasoning by providing semantics, i.e., meaning (Jouis, 2002; Lambe, 2007). Because semantic data are focused mainly on the relationship
TABLE 1. ABBREVIATED N3 SERIALIZATION OF THE STRUCTURAL GEOLOGY DOMAIN’S FOLD CLASS

N3 statements:
struc:foldDescription rdf:type owl:DatatypeProperty.
struc:foldDescription rdfs:domain struc:Fold.
struc:foldDescription rdfs:range xsd:string.
struc:foldAxis rdf:type owl:ObjectProperty.
struc:foldAxis rdfs:domain struc:Fold.
struc:foldAxis rdfs:range struc:Line.
struc:tightness rdf:type owl:DatatypeProperty.
struc:tightness rdfs:domain struc:Fold.
struc:tightness rdfs:range xsd:string.

Resource Description Framework triples:
Fold foldDescription “Harmonic fold in bedding”
Fold foldAxis Line1
Fold tightness “open”
H.A. Babaie
between individuals, ontologies are property-oriented, in contrast to object-oriented models (Sagaran et al., 2009). Properties in ontology languages such as OWL are first-class citizens, meaning that, like classes, they can subsume subproperties (Antoniou and van Harmelen, 2008). Because properties are defined globally, they can be used anywhere by any class, i.e., unlike methods in an OO (object-oriented) language, a property in RDF does not belong to one class! This provides a great deal of flexibility in OWL and its sublanguages because there is no need to modify classes when we introduce new properties. Relations provide a framework to meaningfully connect categories of entities defined in classes that stand for our domain types. Properties stand for relations or constitute the arguments for relations. Classes are defined in ontologies based on properties or values of the properties; for example, we can define classes of minerals that are silicate, or classes of deformed rock that are mylonitic. Ontologies are often built in one domain and reused in another through importation. The reuse requires declaration of a namespace for each ontology. The XML namespace standard, used by RDF, provides a global scope for domain classes and properties, allowing reuse of existing types in new ones without naming conflicts. The XML standard allows different communities of scientists (e.g., oceanography and atmospheric science) to independently develop their own ontologies and markup languages. There is often a need to integrate these autonomously developed vocabularies into other applications, and this is where the namespace becomes very useful. It is common for two domain vocabularies to contain classes or properties that have the same name but that are structured in different ways, and thus provide meaning differently. If the two vocabularies are shared by an application, there will be a name conflict, which would lead to an error during processing.
The namespace prevents this kind of name collision by assigning the similarly named terms, which belong to different domains, to different uniform resource identifiers (URI) that reference these communities. Declaration of a namespace is done by the xmlns (XML namespace) attribute, which allows both an optional prefix and a URI to be chosen for the namespace. The prefix, which references the URI, qualifies each term within a vocabulary to a specific community, e.g., struc:Fold, ocean:Floor, which are qualified names for the structural geology and oceanography domains, respectively.
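The prefix-to-URI mechanism can be sketched in Python (a hypothetical illustration; the example URIs are invented, while the qualified names struc:Fold and ocean:Floor come from the text):

```python
# Hypothetical namespace declarations, as xmlns would bind them.
namespaces = {
    "struc": "http://example.org/structural#",
    "ocean": "http://example.org/oceanography#",
}

def expand(qname):
    """Expand a prefix-qualified name into its full URI."""
    prefix, local = qname.split(":")
    return namespaces[prefix] + local

# Same local name, different vocabularies: no collision after expansion.
print(expand("struc:Fold"))   # http://example.org/structural#Fold
print(expand("ocean:Floor"))  # http://example.org/oceanography#Floor
```

Because every term resolves to a full URI, two communities can both define a term with the same local name without any clash when their vocabularies are used in one application.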
The fact that rocks have textures and textures have textual descriptions can be stated as: Rock texture Texture and Texture textureType xsd:string, respectively. Let’s assume that Rock and Texture are defined in the Petrology ontology (with the “petr” namespace prefix, which is a URIref), and that they are then imported into another ontology. The imported Rock and Texture classes are referred to, in the ontology that uses the Petrology ontology, by the qualified names petr:Rock and petr:Texture, respectively. In practice, the petr prefix references a URI, which is a unique identifier for the Petrology ontology. Figure 3 shows the graphical and textual presentation of the texture property and its domain (Rock) and range (Texture) classes, and the textureType property and its domain (Texture) and range (xsd:string). Two instances (individuals) of the Rock and Texture classes are also shown. There are two general types of relation: static and dynamic (Jouis, 2002). While static relations connect the spatial or static aspects of entities (i.e., no change in state), dynamic relations deal with the temporal and spatio-temporal concepts (e.g., process, event) that involve change of state. Examples of static relations are: IdahoBatholith locatedIn Idaho; Rock isA Solid. An example of a dynamic relation is: Faulting displace Rock. Relations are used to structure knowledge and provide semantics for reasoning through the following constructs: (1) hyponymy (Cruse, 2002), to specialize general spatial entities through the “isA” and “is a kind of” relations (e.g., NormalFault isA Fault); (2) troponymy (Fellbaum, 2002), to subclass verbs that represent the effects of processes (e.g., strain isA deform, slip isA move); and (3) meronymy (Pribbenow, 2002; Koslicki, 2008), to structure complex wholes by using the “part of” relation (e.g., Mineral partOf Rock).
A statement such as “NormalFault isA Fault” or “strain isA deform” implies inclusion of meaning, that is, the meaning of the NormalFault type or the strain relation includes the meaning of the Fault type or deform relation, respectively. In other words, the hyperonym-hyponym pair (i.e., general-specialized pair, e.g., Fault and NormalFault, or strain and deform) are of the same semantic type. In a troponymy in which elongate and distort are subproperties of the strain relation, the subordinate (i.e., specialized) relations (elongate, distort) contain the meaning of the superordinate (general) relation (strain), but they add extra semantics to it, that is, whereas strain is change in something (volume, length, and
Figure 3. (A) Directed graph of two Resource Description Framework (RDF) triples: Rock texture Texture and Texture textureType xsd:string. “andesite” and “porphyritic” are instances of the Rock and Texture classes. (B) The N3 serialization of the two RDF triples, defining the Rock class and the domain and range for the texture and textureType properties.
angle), the subordinate elongate or distort relation deals specifically with the change in length or shape, respectively (Fellbaum, 2002). The difference between the verbal relations elongate and distort is in the “manner” or “way” in which the processes that these relations represent occur. The elongate or distort relation, as in “Extension elongate Pebble” or “Shearing distort Fossil,” denotes the manner or way in which a pebble or fossil strains by elongation or change in angle, respectively.

SUBSUMPTION OF PROPERTIES AND CLASSES

Properties relate classes in the hierarchy. Importing and equating properties from different sources can be done with the rdfs:subPropertyOf construct. When an ontology is imported into another one, we may need to make two properties equivalent. For example, assume that two groups of geologists (e.g., geochemistry and tectonics groups) have knowledge bases in which one domain uses the property study and the other uses investigate for the scientific work done by its geologists. Let’s assume that the namespace prefixes for these two domains are geochem and tect, respectively. Assuming that the investigate and study verbal properties mean the same thing, we make them equivalent by letting each property be a subproperty of the other (Allemang and Hendler, 2008), i.e., geochem:study rdfs:subPropertyOf tect:investigate and tect:investigate rdfs:subPropertyOf geochem:study. Alternatively, we can use owl:equivalentProperty for this purpose. As another example, if the hydrogeology ontology calls flow what the structural geology ontology calls transport, we can use rdfs:subPropertyOf as long as the domain and range of these properties are of the same type. To state that all uses of the transport and move properties are the same, we assert: hydro:transport rdfs:subPropertyOf struc:move.
If we have the triple x transport y in the hydrogeology domain (hydro namespace prefix), we can infer x move y in the structural geology domain (struc namespace prefix), as long as x and y are of related types (e.g., water and ion). For example, since transport is a subproperty of the move property, the explicit assertion hydro:Water hydro:transport geochem:Ion allows us to infer struc:Fluid struc:move geochem:Ion.
If class (type, set) C is a subclass of A and a subclass of B (i.e., C rdfs:subClassOf A, and C rdfs:subClassOf B), then C is in the intersection of sets A and B (i.e., C ⊆ A ∩ B). In this case, if individual x is in C, then x is also in both A and B. For example, pyroclastic rocks (e.g., tuff) have the properties of both volcanic and clastic sedimentary rocks. In a default namespace (where no prefix precedes the colon in the qualified name), this fact is asserted as :PyroclasticRock rdfs:subClassOf :VolcanicRock, and :PyroclasticRock rdfs:subClassOf :DepositionalRock. If we now assert that :HuckleberryTuff rdf:type :PyroclasticRock, we infer (i.e., reason through the inference rules) that the Huckleberry tuff (in Wyoming) is both depositional (a kind of sedimentary) and volcanic, i.e., we derive the following two inferred statements: :HuckleberryTuff rdf:type :VolcanicRock, and :HuckleberryTuff rdf:type :DepositionalRock. Notice that the inference is unidirectional, i.e., a pyroclastic rock is both volcanic and depositional, but not every depositional sedimentary or volcanic rock is pyroclastic! Each class defines essential and accidental properties (Colomb, 2007). The essential properties are necessary for individuals to have if they are members of the class. Individuals may or may not have the accidental properties. For example, one essential property of the silicate family of minerals is to have a composition made of silicon, oxygen, and certain metals. If a mineral does not have Si or O (essential properties), then it does not belong to the Silicate class. Thus, essential values for all instances of a class must necessarily be the same, and they must always be present. However, being glassy, blue, smoky, purple, or milky for quartz (a silicate) is accidental. Notice that what is essential for a subclass may be accidental for its superclass. For example, in the IgneousRock isA Rock taxonomy, formation from magma, which is essential for a member of the IgneousRock subclass, is accidental for a member of the Rock superclass. An essential whole is a complex individual with essential unifying properties (relating all the essential parts) that are required by the whole. For example, the Fold essential whole must have limb and hinge line as essential parts. If a property is both essential and sufficient to identify the class, it is called rigid. For example, the physical, optical, and chemical properties of minerals are rigid. All instances of a given mineral (rigid class) have the same set of values drawn from its rigid properties. Properties represent binary relations (predicates) between resources or individuals, referenced by the subjects and objects in RDF triples (Hitzler et al., 2009). Properties are defined in RDF as instances of the rdf:Property class (e.g., solidify rdf:type rdf:Property).
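The unidirectional subclass inference described above can be sketched in Python (an illustration, not the paper's code; the class names follow the PyroclasticRock example in the text):

```python
# Hypothetical subclass assertions: a pyroclastic rock is both
# volcanic and depositional (i.e., PyroclasticRock lies in the
# intersection of the two superclasses).
sub_class_of = {
    "PyroclasticRock": {"VolcanicRock", "DepositionalRock"},
}

def all_types(asserted_type):
    """The asserted type plus every superclass reachable from it."""
    types, stack = set(), [asserted_type]
    while stack:
        t = stack.pop()
        if t not in types:
            types.add(t)
            stack.extend(sub_class_of.get(t, ()))
    return types

# An individual typed PyroclasticRock is inferred to be volcanic AND
# depositional; the reverse inference is never made (unidirectional).
print(all_types("PyroclasticRock"))
print(all_types("VolcanicRock"))
```

Traversal runs only upward, from subclass to superclass, which is why asserting that a rock is volcanic never entails that it is pyroclastic.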
The meaning of the recrystallize property, relating metamorphism and rocks (Metamorphism recrystallize Rock), is the set of all individuals that are recrystallized. As sets, properties exhibit a similarity to classes rather than to individuals. Properties are subclassed in OWL by applying the rdfs:subPropertyOf construct (e.g., grainBoundaryMigrate rdfs:subPropertyOf recrystallize). The rdfs:subPropertyOf provides a mechanism to extend properties, from more general to more specific, i.e., it allows a hierarchy of properties. “P is said to be a subproperty of P′ if every instance of P is also an instance of P′” (Colomb, 2007), or, stated differently, “P is a subproperty of P′ if P′(x,y) whenever P(x,y)” (Fig. 4). In general, P rdfs:subPropertyOf P′ means: if x P y, then x P′ y, i.e., if x and y are related by P, then they are also related by P′. To clarify this, think of the shear and extend properties, which are more specific types of the displace property (i.e., shear rdfs:subPropertyOf displace and extend rdfs:subPropertyOf displace) (Fig. 4). In this case, if Fault shear Grain, then Fault displace Grain. Similarly, the brittlyDeform and rotate properties are more specific than deform, i.e., rotate rdfs:subPropertyOf deform. This means that if a fault rotates a fold, it deforms it (if Fault rotate Fold, then Fault deform Fold). The crystallize property is a subproperty of solidify (crystallize rdfs:subPropertyOf solidify), and strain, rotate, and translate are subproperties of the deform property (e.g., strain rdfs:subPropertyOf
Figure 5. A property inheriting from two other properties, where x and y are instances of classes A and B, which are related by property P. The relation x P y implies the two top x R y and x S y relations (properties).
Figure 4. Subclassing of properties, where x and y are instances of the A and B classes, respectively. (A) P is a subproperty of P′ if x p y infers x p′ y, i.e., the relation of two instances (resources) x and y by the subproperty infers the relation by the superproperty.
deform). So crystallize rdfs:subPropertyOf solidify means that if a magma x crystallizes into a mineral y (i.e., if x P y), then x also solidifies into mineral y (i.e., x P′ y). Also, grainBoundaryMigrate rdfs:subPropertyOf recrystallize means that
grain boundary migration during metamorphism brings about recrystallization. Notice that subsumption is transitive, i.e., material that is strained is also deformed; material that is crystallized is also solidified. The converse is not necessarily true; that is, something that solidifies does not necessarily do so through crystallization. There are cases where a property (P) inherits meaning from two (R and S) (or more) other properties. This can be done by defining property P to be in the logical intersection of R and S (i.e., P ⊆ R ∩ S), so that if two resources x and y are related by property P, i.e., x P y, then x R y and x S y (Fig. 5). For example, recrystallization (defined in the structural geology ontology with the struc namespace) implies both crystal plastic deformation and strain softening. These relations are asserted as follows: struc:recrystallize rdfs:subPropertyOf struc:crystalPlasticallyDeform and struc:recrystallize rdfs:subPropertyOf struc:strainSoften. This is a unidirectional inference, i.e., when recrystallization occurs, it strain softens the rock, but not all strain softening is achieved via recrystallization (it can also occur, for example, via recovery).
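The subproperty inferences of this section, including a property that inherits from two superproperties, can be sketched in Python (an illustration with property names taken from the text; the mylonite assertion is hypothetical):

```python
# Hypothetical rdfs:subPropertyOf hierarchy. recrystallize has TWO
# superproperties, so it inherits meaning from both (P ⊆ R ∩ S).
super_props = {
    "shear": {"displace"},
    "recrystallize": {"crystalPlasticallyDeform", "strainSoften"},
}

def entailed(s, p, o):
    """All triples entailed from one assertion via subPropertyOf."""
    out, stack = set(), [p]
    while stack:
        q = stack.pop()
        if (s, q, o) not in out:
            out.add((s, q, o))
            stack.extend(super_props.get(q, ()))
    return out

# Fault shear Grain entails Fault displace Grain.
print(entailed("Fault", "shear", "Grain"))
# One recrystallize assertion entails BOTH superproperty triples.
print(entailed("Mylonite", "recrystallize", "Quartz"))
```

As in the text, the inference is unidirectional: a strainSoften assertion alone would entail nothing about recrystallization.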
Optional properties can subsume mandatory subproperties, but mandatory properties can only have mandatory subproperties (Colomb, 2007). Instances of the domain classes for an optional property are not required to participate in it. For example, ductilelyDeform may be optional for a Rock, but it can subsume mylonitize, which can subsume recrystallize. If the mylonitize property is mandatory for a crystal-plastically deformed fault rock, then its recrystallize and recover subproperties must be mandatory too. Mandatory properties are expressed with the existential quantifier (∃). For example, Folding (a process) must involve either a planar object (e.g., bedding, foliation) or a linear object (e.g., flute cast, lineation). So, if there is an instance of Folding, there also exists an instance of a Planar or Linear class through the deform property. For example, if the Folding and Bedding classes are defined in the structural geology ontology (namespace prefix: struc), then the domain and range for the deform property are given by: struc:deform rdfs:domain struc:Folding, and struc:deform rdfs:range struc:Bedding. A class can “supply” or “carry” a property. It supplies the property if the property only holds for the class and not its superclass, i.e., it is defined in the subclass. It carries the property if it is inherited from the superclass (Guarino and Welty, 2002; Colomb, 2007). Properties can carry essentiality (+E), unity (+U), identity (+I), and rigidity (+R). These metaproperties are defined in the OntoClean method (Guarino and Welty, 2002), which is only one of several possible ontology evaluation methods. Essentiality means that all instances must have the same value for their property. Unity relates to the parts that are needed to make the whole; it is concerned with how the parts of a complex object are put together to make the whole object. Identity refers to the properties that are needed to identify instances of a class.
Rigidity is provided by the necessary and sufficient properties that identify the class of an individual. If any of these metaproperties is annotated with a negative prefix (read: “not”), e.g., –E, –U, –I, and –R, it means that the metaproperty does not necessarily hold (it may hold by accident) for all instances of the class for which the property is defined (Guarino and Welty, 2002). Thus, –E means that the property is not essential, although it could be accidental. The same is true for –U, –I,
and –R, which read “not unity,” “not identity,” and “not rigidity,” respectively. A metaproperty annotated with ~ (read “anti”), e.g., ~E (anti-essential), ~U (anti-unity), ~I (anti-identity), or ~R (anti-rigid), means that it necessarily does not hold for any instance of the class. An anti-identity or anti-unity property cannot be used as the basis of an identity or unifying relation for any instance, respectively. An anti-essential property can be updated (i.e., the value may change) in all instances (Colomb, 2007). The +R and +E properties must be mandatory, whereas –R, ~R, –E, and ~E can be optional. Subclasses cannot weaken the strength of the metaproperty for the superclass, i.e., a subclass cannot have ~E, ~I, or ~U if the superclass has the property with +E, +I, or +U. However, the opposite is possible, i.e., if a superclass has a property with metaproperties ~E, ~I, or ~U, the subclass can have that property with +E, +I, or +U, respectively (Colomb, 2007).

RELATIONS THAT HOLD BETWEEN FUNDAMENTAL TYPES OF ENTITIES

The entities in a domain (e.g., subduction zone, fold-and-thrust belt) fall into two broad, disjoint (i.e., nonoverlapping) categories (e.g., Smith and Grenon, 2004; Bittner et al., 2004): (1) continuants (endurants), and (2) occurrents (perdurants). The continuants include material and immaterial substances, parts (both fiat and bona fide parts; Smith, 2001), boundaries, aggregates, qualities, roles, functions, and spatial regions (Smith and Grenon, 2004). The continuant objects, such as fault, lake, accretionary prism, rock, and porosity, exist in their entirety (i.e., as a mereological whole) at any temporal slice (i.e., at a time instant, ti) of their four-dimensional (spatio-temporal) life history.
Despite the continuous change in the object (attribute) and relational properties (e.g., partitive and connection relations) of the continuants, these entities maintain their identity through time and space, as long as they exist. For example, the continuous qualitative changes in the type, thickness, and spatial location of sediments in the Kumano forearc basin, and its underlying accretionary prism, do not change the identity of these components of the Nankai Trough (just like the change in the color of your hair does not change you). While continuants represent the static part of reality, the occurrents correspond to the dynamics of the world. The occurrents include events that signify the instantaneous beginning and end of state change in objects (e.g., rock, fault) through homogeneous processes that bring qualitative change to the continuants. For example, the accretionary prism (a continuant) grows through processes of offscraping, underplating, and sedimentation, which modify the structure of the prism over time. In this paper, I focus on the formal spatial relations and do not cover the temporal and spatio-temporal relations, which require discussions of occurrents, such as processes and events, that are not in the scope of the paper. These relations can be found in Babaie (2011). Formal, in this case, means that the relations, which are defined in first-order logic, apply to any domain of
reality, for example, to a subduction zone, strike-slip fault, experimental rock deformation, or atmospheric science. Notice that the formal relations, such as partOf, are not necessarily part of the OWL language. The formal relations (e.g., partOf, locatedIn) may hold between: (1) continuant objects, e.g., SeismogenicZone partOf PlateBoundaryFaultZone; SplayFault locatedIn AccretionaryPrism; UnderplatedSediment externallyConnectedTo AccretionaryPrism; (2) occurrent objects, e.g., Comminution partOf Cataclasis; Fracturing partOf Faulting; or (3) objects of the two disjoint types, e.g., Folding involves Discontinuity; Ions participateIn Mineralization. Notice that the relations between two continuants
are defined at the time instants at which the two objects are related to each other, i.e., xi partOf yi, or xi properPartOf yi, for the time index i. The related objects can have different granularities, for example, Microfracture partOf DamageZone; TwinBanding locatedIn Mineral. A damage zone, a fractal entity, can exist over a large range of scale, microscopic to regional, compared to twin banding, which is microscopic. The universal term entity refers to objects, processes, events, functions, times, and places (Smith and Rosse, 2004). Entities are represented in information systems by classes (universals, types, kinds) and their instances (individuals, particulars, tokens), which exist in a specific region of space and time. Examples of entities include Ocean (a continuant type) and its instances, e.g., IndianOcean and PacificOcean, and Subduction (an occurrent type). Smith (2004) and Klein and Smith (2006) defined concept to refer to the meaning of a consensual, general term that has been agreed upon by domain experts and has been used in making scientific assertions. For example, the San Andreas fault in California is an individual of the Fault type (class). The string “San Andreas fault” is a term (symbol) that refers to the actual, individual San Andreas fault, and it evokes the abstract fault concept, which has a well-defined meaning to domain experts. The concept fault means “a planar discontinuity in rock along which there has been some displacement.” The concept ocean refers to the universal Ocean type, the largest body of water on Earth. The fault and ocean concepts refer to (i.e., stand for) the universal types Fault and Ocean that have instances (particulars, individuals) in reality, such as the San Andreas fault and the Pacific Ocean. Concepts do not exist; they are used to represent the universal types that can be instantiated in reality.
Ontologies are not about concepts; they are models of individual entities that instantiate the universals in space and time, e.g., SanAndreasFault, NankaiTroughAccretionaryPrism. These two examples are instances of the Fault and AccretionaryPrism type, respectively. Universal ontological types are represented in information models as artifacts, such as classes in UML (Unified Modeling Language) diagrams, entities in entity relationship diagrams, elements in XML schema, and tables in databases. Thus, UML classes represent ontological types that have instances in reality, and they are given specific terms that refer to our concepts. It is imperative that we not think of ontologies as hierarchies
of concepts, but of types and instances and the ways in which they are related. For example, the rock and water concepts are not related in dictionaries. In reality, however, rock can contain water in its pores, i.e., Pore partOf Rock and Pore contains Water (inverse of Water containedIn Pore). Notice that the types Rock, Water, and Pore are related at the universal level. At the instance level in reality, however, there are some real rocks (i.e., instances of the Rock type) that do not have pores and therefore do not contain water. Ontologies are depictions of both the universal types and the real relations that may exist among instances of these types, based on domain theories and knowledge.

METAPROPERTIES

Complex entities can be partitioned either through decomposition into subclasses (taxonomy) or into parts (partonomy) (Tversky, 1990). Class inclusion through taxonomy is based on similarity. Whereas the meronomic relations are between concepts, allowing no inheritance by the subclass, the taxonomic relations are within concepts, making it possible for a class to inherit properties from its superclass (Pribbenow, 2002). In contrast to the downward inheritance in a taxonomy, a partonomy may allow an upward inheritance, whereby a whole inherits from its parts. For example, an ultramafic rock inherits its dark color and high density from its mineral parts. A fold inherits its shape from its constituent layers; a molecule inherits its composition from its elements. Relations can be unary, binary, ternary, or n-ary (e.g., Smith, 2003; Smith et al., 2005). There are three general types of relations (Smith et al., 2005) that hold between: (1) classes, e.g., isA; (2) a class and its instances, e.g., instanceOf; and (3) instances, e.g., partOf. In the following, some of the formal, primitive, foundational relations that obtain between different classes are described.
Ontological relations are those that obtain between instances in reality independent of the ways in which we gain knowledge about them or represent them (Smith et al., 2005). Formal means that the relations are general and domain-neutral, and primitive means that they are atomic and that other relations can be derived from them. The metaproperties, which define the properties of properties, are defined next (e.g., Breitman et al., 2007). If the property that relates two classes is the same in both directions, we declare the property to be symmetric. P is symmetric if and only if, for any x and y, P(x,y) holds if and only if P(y,x) holds. For example, twinsWith is a symmetric property (if x twinsWith y, then y twinsWith x). Symmetric properties must be declared as such (P rdf:type owl:SymmetricProperty). The inference for a symmetric property is as follows: P owl:inverseOf P; that is, a symmetric property is its own inverse. Property R is said to be the inverse of property P if, for any x and y, P(x,y) if and only if R(y,x). Many properties in one direction have an inverse property in the opposite direction but are named differently. For example, the first property in each of the following pairs reverses the direction of the second property: analyzes and analyzedBy, investigates and investigatedBy, hasSample and sampleOf, wrote and writtenBy, and locatedIn and locationOf. The partOf property has such an inverse. This means that if a fold has a limb as part, then the limb is part of the fold (struc:Fold struc:hasPart struc:Limb; struc:partOf owl:inverseOf struc:hasPart). In mathematics, a relation P is said to be transitive if, for any x, y, and z, P(x,y) and P(y,z) imply P(x,z). This is represented by owl:TransitiveProperty and may be declared as follows: P rdf:type owl:TransitiveProperty. The inference for this property is as follows: if x P y and y P z, then x P z. For example, if C partOf B and B partOf A, then C partOf A. For example, by being fractal, faults have segments that have smaller fault segments, which have even smaller segments, which are themselves faults, such that struc:FaultSegment struc:partOf struc:FaultSegment; struc:partOf rdf:type owl:TransitiveProperty; struc:FaultSegment rdfs:subClassOf struc:Fault. The locatedIn property is also transitive: geo:locatedIn rdf:type owl:TransitiveProperty. For example, if tect:SanAndreasFault geo:locatedIn geo:California, and geo:California geo:locatedIn geo:UnitedStates, then tect:SanAndreasFault geo:locatedIn geo:UnitedStates. Property P is functional if, for any x, y, and z, P(x,y) and P(x,z) imply y = z, i.e., it is a property for which there exists only one value. This is in analogy with a mathematical function (y = 3x), which for any single input value (e.g., x = 2) returns one unique value (y = 6). For example, the location of a sample given by its longitude and latitude is a functional property. The owl:FunctionalProperty and owl:InverseFunctionalProperty constructs allow merging of data for the same individual from different sources. An owl:FunctionalProperty can only take one value for any individual, allowing sameness to be inferred.
The inference rule for this construct is as follows: if P rdf:type owl:FunctionalProperty, X P A, and X P B, then A owl:sameAs B. Property P is inverse functional if, for any x, y, and z, P(y,x) and P(z,x) imply y = z. The owl:InverseFunctionalProperty, which is the inverse of the owl:FunctionalProperty, is very useful for merging data from different sources. This property is equivalent to a key in relational databases, such as a social security number or driver's license number. The inference rule for this construct is as follows: if P rdf:type owl:InverseFunctionalProperty, A P X, and B P X, then A owl:sameAs B. For example, assume that the location of any measurement is uniquely identified by its longitude and latitude (defined by the location class), i.e., no two samples can be taken at the same exact spot. Given :Station134 :locatedAt :LocationA and :Station346 :locatedAt :LocationA, we infer that :Station134 owl:sameAs :Station346. For a one-to-one relationship, we use a combination of the owl:FunctionalProperty and owl:InverseFunctionalProperty. For example, we can define the sample ID to be unique: :sampleId rdfs:domain :Sample; :sampleId rdfs:range xsd:integer; :sampleId rdf:type owl:FunctionalProperty; :sampleId rdf:type owl:InverseFunctionalProperty. So, any two samples with the same ID must
be the same sample!
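The inference rules above (symmetric, transitive, functional, and inverse functional) can be made concrete with a short sketch. The following is a minimal illustration in plain Python, not an OWL reasoner; the property names follow the chapter's examples, while Sample7, Loc1, and Loc1dup are invented identifiers for illustration.

```python
# Minimal sketch (not an OWL reasoner) of the four inference rules,
# forward-chained over (subject, property, object) triples.
# Property/instance names reuse the chapter's examples; Sample7, Loc1,
# and Loc1dup are invented for illustration.

SYMMETRIC = {"twinsWith"}
TRANSITIVE = {"partOf", "locatedIn"}
FUNCTIONAL = {"hasLocation"}        # at most one value per subject
INVERSE_FUNCTIONAL = {"locatedAt"}  # the value identifies its subject

def saturate(triples):
    """Apply the rules until no new triple or sameAs pair appears."""
    facts = set(triples)
    same = []  # owl:sameAs conclusions, as unordered pairs
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(facts):
            if p in SYMMETRIC and (o, p, s) not in facts:
                facts.add((o, p, s)); changed = True
            if p in TRANSITIVE:
                for (s2, p2, o2) in list(facts):
                    if p2 == p and s2 == o and (s, p, o2) not in facts:
                        facts.add((s, p, o2)); changed = True
            if p in FUNCTIONAL:
                for (s2, p2, o2) in list(facts):
                    if p2 == p and s2 == s and o2 != o and {o, o2} not in same:
                        same.append({o, o2})
            if p in INVERSE_FUNCTIONAL:
                for (s2, p2, o2) in list(facts):
                    if p2 == p and o2 == o and s2 != s and {s, s2} not in same:
                        same.append({s, s2})
    return facts, same

facts, same = saturate({
    ("Xavier", "twinsWith", "Yuri"),
    ("FaultSegmentC", "partOf", "FaultSegmentB"),
    ("FaultSegmentB", "partOf", "FaultA"),
    ("SanAndreasFault", "locatedIn", "California"),
    ("California", "locatedIn", "UnitedStates"),
    ("Sample7", "hasLocation", "Loc1"),
    ("Sample7", "hasLocation", "Loc1dup"),
    ("Station134", "locatedAt", "LocationA"),
    ("Station346", "locatedAt", "LocationA"),
})
```

Saturation derives, for example, FaultSegmentC partOf FaultA and SanAndreasFault locatedIn UnitedStates, and concludes that Station134 is the same individual as Station346 from the inverse-functional locatedAt property.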
Ontological relations and spatial reasoning in earth science ontologies

Notice that not every functional property can also be an inverse functional property. For example, mineral composition can be functional only, because every mineral has a unique composition, but many individuals can share that same composition if they belong to the same class (compare with a hasFather property). Some properties can only be inverse functional, but not functional. For example, the single-author relation between a publication and its author (or between a description of an outcrop or a thin section and the one person who wrote it) may be inverse functional, because each such publication belongs to only one person, while that person can have several such publications or descriptions.

INSTANTIATION RELATION

Individuals (instances) are related to universals (types) with the primitive isA relation, e.g., SanAndreasFault isA Fault. The isA relation, as a mechanism for subtyping in reality, provides for the specialization of a class in an information model. In scientific investigations, we deal with individuals, not universals. These instances can be of the continuant or occurrent types (Klein and Smith, 2006). As scientists, we study individuals such as the SanAndreasFault, a specific specimen of a rock, or a water sample from a particular river (i.e., an instance of the River type). The instances stand in different relations to each other in the real world. At the class level, we may have: Mylonite isA Rock; Pore properPartOf Rock; Recrystallization potentialPartOf Deformation; Basin contains Water; and AccretionaryPrism tangentiallyConnectedTo SubductingPlate. Notice that in reality a particular deformation (e.g., brittle deformation), somewhere in space and time, may not include recrystallization. The assertion Mylonite isA Rock implies that all instances of the type Mylonite are also instances of the type Rock. However, notice that only some instances of the type Rock are also instances of the Mylonite type.
The assertion ForearcBasin potentiallyAdjacentTo AccretionaryPrism implies that all instances of the type ForearcBasin may be adjacent to some instance of the type AccretionaryPrism. Two conditions are needed for a type to be a subtype of another type (Smith, 2004): C isA D if C and D are universals, and, for all times t, if anything instantiates universal C at t, then that same thing must also instantiate the universal D at t. A universal is anything that can be instantiated by an individual (particular). For example, if contraction in a subduction complex leads to the creation of an instance of a SplayFault in the accretionary prism, it also forms an instance of the Fault supertype at the same time. Instantiation is represented, at the instance level, by the instanceOf relation, for example, c instanceOf C at t, which means that the continuant particular c instantiates universal C at time t (Smith et al., 2005), e.g., BereaSandstone instanceOf SedimentaryRock. The binary instanceOf relation is written as Inst(c, C), or Cct, where the particular c is an instanceOf the universal C. Every universal has a particular (i.e., ∃c Inst[c, C]), and every particular is an instance of a universal (i.e., ∃C Inst[c, C]) (Donnelly et al., 2005). The symbol ∃ is the existential quantifier, which
reads: there exists some (at least one). The instance relation can be used to define subsumption at the universal level: isA(C, D) = ∀c (Inst[c, C] → Inst[c, D]), which says that C is subsumed by D (i.e., C isA D) if every instance c of C is also an instance of D. The ∀ symbol is the universal quantifier, which reads: for all. For example, isA(Pseudotachylyte, FaultRock), or isA(Silicate, Mineral). These can also be written as: Pseudotachylyte isA FaultRock, and Silicate isA Mineral. The universal-level assertion C isA D means: for all c and t, if c instanceOf C at t, then c instanceOf D at t (Smith et al., 2005). For example, ThrustFault isA Fault is true only if instantiation of a structure of type ThrustFault, say in an accretionary prism, leads to the instantiation of a structure of type Fault at the same time in the prism. Thus, in the examples FaultSegment isA Fault and Mylonite isA FaultRock, the instantiation of the FaultSegment or Mylonite leads to simultaneous instantiation of the Fault or FaultRock, respectively. Notice that the isA relation does not hold between the concepts (i.e., the meanings of the terms); it holds between universals with actual instances in reality. When we assert that ThrustFault isA Fault, we are not saying that the meaning of the term thrust fault is the same as the meaning of the term fault. The meanings of these two terms are of course different. The assertion means that the universal ThrustFault is a subtype of the universal Fault. Thus, the isA relation is used here to mean subtyping between universals and between their corresponding instances. The isA relation represents a necessary, but not a sufficient, condition for an entity. For example, the universal type Mylonite is necessarily a FaultRock, but it has other properties that the more general FaultRock type does not have.
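The all-some subsumption test isA(C, D) = ∀c (Inst[c, C] → Inst[c, D]) can be sketched directly as code, with the time index dropped. The instance data below are illustrative assumptions, not a published ontology.

```python
# Sketch of the subsumption test isA(C, D): every instance of C must
# also instantiate D (time index dropped). Instance data are
# illustrative assumptions.

inst = {  # particular -> the universals it instantiates
    "SanAndreasFault": {"StrikeSlipFault", "Fault"},
    "CascadiaMegathrust": {"ThrustFault", "Fault"},
    "BereaSandstone": {"SedimentaryRock", "Rock"},
}

def is_a(C, D, inst):
    """C isA D iff C has instances and every instance of C instantiates D."""
    cs = [c for c, types in inst.items() if C in types]
    return bool(cs) and all(D in inst[c] for c in cs)
```

With these data, is_a("ThrustFault", "Fault", inst) holds, while the converse is_a("Fault", "ThrustFault", inst) fails, mirroring the point that only some instances of Fault are thrust faults.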
A ThrustFault is necessarily a Fault, but a thrust fault has some properties that are unique to itself, which a general Fault may lack. In other words, it is not sufficient to say that a ThrustFault isA Fault or that a Mylonite isA Rock. Even though a human being is necessarily a mammal, it is not sufficient to say that a human being isA mammal (dogs are also mammals); there is a difference between dogs and humans even though both are necessarily mammals. The difference is represented by additional and unique object and relational properties of the subclasses in the ontology model. For example, an instance of the type Mylonite may have foliation, lineation, and zones of recrystallization that an instance of Rock may not have.

PARTONOMIC RELATION

Although most ontologies apply the isA relation for class hierarchies, the mereological (partOf, part-whole, partitive) relation is probably of equal value for building the hierarchical structure of ontologies (Pribbenow, 2002; Schulz and Hahn, 2005; Koslicki, 2008). Notice that, although OWL uses the rdfs:subClassOf property to construct the isA relation, it does not have a qualified name to construct the partOf or hasPart relation. Many entities of interest to Earth scientists are composite, i.e., aggregates made of several parts, which have complex spatial or spatio-temporal structures. The following is a discussion of
the formal semantics of the non-OWL constructs such as the partOf and hasPart properties. Composite entities can be separated into parts, which may be spatial objects, temporal entities (events), or spatio-temporal entities such as processes. For example, a subduction complex above the plate-boundary fault is a composite whole made of several parts that include the accretionary prism, forearc basin, plate-boundary fault zone, and inner and outer slope basins. A strike-slip fault is a composite of many segments, steps, and bends. A ductile deformation (process) along a fault may contain several spatio-temporal parts, which may include subprocesses of recrystallization, recovery, or cataclastic flow. Each of these parts may have its own parts; e.g., the prism includes the offscraped and underplated thrust sheets of sediment, and a segment is made of many other segments, steps, and bends (because faults are fractal). The offscraped thrust sheets are bounded by thrust faults that are members (i.e., parts) of a collection of faults (a whole). The plate-boundary fault zone, at the base of the prism, may have the following parts: seismogenic zone, mylonite zone, cataclastic zone, aseismic zone, and boundaries that include the décollement. Notice that even though the seismogenic zone may be part of some plate-boundary fault zone in a subduction zone, not all such fault zones have a seismogenic zone. Facts like this need to be included in the ontology. The recrystallization may include dynamic recrystallization by grain-boundary migration or rotation, or static recrystallization. The partitive partOf relation, like the isA relation, only holds between universals or instances, not between concepts or classes in information models. The formal definition of the partOf relation at the instance level is given by (Smith et al., 2005): c partOf d at t, which means that the particular, continuant c is an instance-level partOf the particular, continuant d at time t.
The equivalent universal-level assertion C partOf D means that for all c and t, if c instanceOf C at t, then there is some d such that d instanceOf D at t, and c partOf d at t. Notice the all-some structure in these definitions (Smith et al., 2005). For example, Xenolith partOf IgneousRock means that all xenoliths, if any exist at any time anywhere, should be part of some igneous rock. This reflects the domain fact that only some igneous rocks have xenoliths, and that xenoliths do not make any sense if they are not part of an igneous rock. Smith (2004) defined the partitive partOf relation as a combination of the partFor and hasPart relations, which are defined as follows. The universal assertion X partFor Y provides information mainly about X; it asserts that if x (an instance of X) exists at time t, then y (an instance of Y) also exists at t, and that x exists only as a partOf y (i.e., at the instance level). For example, UnderplatedSediment partFor AccretionaryPrism means that if there is an instance of underplated sediment at time t, then it is a part of an instance of an accretionary prism that exists at the same time. This means that underplated sediment does not exist (or mean anything) if there is no prism. It does not deny the fact that accretionary prisms may have no underplated sediment. The assertion Y hasPart X, on the other hand, provides information about Y, and asserts that if y (an instance of Y) exists at time t, then x (an instance
of X) exists at the same time as a partOf y (i.e., at the instance level). This means that y does not exist unless it has an instance of X as part. For example, NankaiTroughAccretionaryPrism hasPart AccretedSediment means that the accretionary prism cannot exist at time t (e.g., today) if it does not have accreted sediment as a part. Thus, X partOf Y holds, by combining the two assertions, if and only if for any instance x of X existing at time t, there is some simultaneously existing instance y of Y, such that x is an instance-level part of y, and y has x as part. For example, SubductingPlate partOf SubductionZone means that if the PhilippinePlate is an instance of the SubductingPlate and the NankaiTrough is an instance of the SubductionZone at the present time, then the subducting Philippine plate can only exist today as a part of the present-day Nankai Trough subduction zone. The relationship between parts and the whole that composes or aggregates the parts is the subject of the formal theory of part-whole structure called mereology (e.g., Simons, 1987; Varzi, 2003). The mereological partOf relation (denoted as P) and its inverse, hasPart (denoted as P⁻¹), which obtain between two individuals (a part and its whole), constitute the fundamental relations for composite entities (e.g., Varzi, 1996; Casati and Varzi, 1999; Pontow, 2004; Schulz and Hahn, 2005; Koslicki, 2008). These relations may hold in the following cases (e.g., Schulz et al., 2006): between material and nonmaterial objects, parthood over time, parthood and spatial location, and parthood between occurrents (Schulz et al., 2005, 2006). For example, Mineral partOf Rock (or Rock hasPart Mineral) signifies that an instance of the class Mineral (which represents individual minerals in reality in a domain model) is a part of an instance of the class Rock at a specific instant of time, t.
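Smith's (2004) decomposition of the universal-level partOf into partFor and hasPart can be checked mechanically against instance-level parthood facts. The sketch below drops the time index; all instance names (Prism1, Sheet1, etc.) are invented for illustration.

```python
# Sketch of Smith's (2004) decomposition of universal-level partOf into
# partFor and hasPart, checked against instance-level parthood facts
# (time index dropped; instance names are illustrative).

inst = {  # particular -> its universal
    "Prism1": "AccretionaryPrism", "Prism2": "AccretionaryPrism",
    "Sheet1": "UnderplatedSediment",
    "Plate1": "SubductingPlate",
    "Zone1": "SubductionZone",
}
part_of = {("Sheet1", "Prism1"), ("Plate1", "Zone1")}  # instance-level p(x, y)

def instances(T):
    return [x for x, t in inst.items() if t == T]

def part_for(X, Y):
    """Every instance of X is part of some instance of Y."""
    return all(any((x, y) in part_of for y in instances(Y)) for x in instances(X))

def has_part(Y, X):
    """Every instance of Y has some instance of X as part."""
    return all(any((x, y) in part_of for x in instances(X)) for y in instances(Y))

def universal_part_of(X, Y):
    """X partOf Y = (X partFor Y) and (Y hasPart X)."""
    return part_for(X, Y) and has_part(Y, X)
```

With these data, part_for("UnderplatedSediment", "AccretionaryPrism") holds while has_part("AccretionaryPrism", "UnderplatedSediment") fails (Prism2 has no underplated sediment), mirroring the example in the text; the combined universal_part_of holds for SubductingPlate and SubductionZone.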
In UML, the isA and partOf relations are represented by subclassing and by composition (filled diamond) or aggregation (open diamond), respectively. The partitive partOf relation is needed, along with other semantic relations such as attribution (e.g., thickness of the forearc sediments), class-subclass (e.g., ThrustFault isA Fault), spatial inclusion (e.g., NankaiTrough locatedIn Japan), and connection (e.g., ForearcBasin externallyConnectedTo Prism), for a better representation of reality (e.g., Donnelly et al., 2005). Partitive relations in ontologies, designed to portray the reality in natural systems, hold between universals and are then applied to individuals using constraints based on domain knowledge. We need to make a distinction between parthood at the universal level, i.e., PartOf(A, B), and parthood at the instance or individual level, i.e., partOf(x, y) (Schulz et al., 2006). For example, compare PartOf(PlateBoundaryFault, SubductionZone) with partOf(NankaiTroughPlateBoundaryFault, NankaiSubductionZone). The class-level PartOf(A, B) means that the universal A is part of universal B if every instance of B has some instance of A as part, and every instance of A is part of some instance of B. Schulz et al. (2006) introduced the ternary relation Inst, which relates an individual to a universal at time t, i.e., Inst(x, A, t). The formal definition of the class-level PartOf(A, B) is then given, based on Inst(x, A, t), as follows: ∀x,t Inst(x, A, t) → ∃y Inst(y, B, t) ∧ p(x, y, t), which reads: for all x and t, if x is an instanceOf A at time t, then there exists a y such that y is an instanceOf B at t, and x is a partOf y at t. For example, Inst(SanAndreasFault, StrikeSlipFault, t) means that the San Andreas fault is a strike-slip
fault at a given time; it may become a thrust fault at another geological time if the stress field changes due to plate reconfiguration. The definition of the universal-level HasPart(A, B) relation (the inverse of the universal PartOf[A, B] relation) is as follows: ∀y,t Inst(y, B, t) → ∃x Inst(x, A, t) ∧ p(x, y, t). The class-level PartOf(A, B) can be interpreted in the following different ways (Schulz et al., 2006), which need to be clarified using constraints such as cardinality in the ontology model. (1) All instances of A are partOf some instance of B, for example, PartOf(UnderplatedSediment, AccretionaryPrism); there are some prisms in which there is no underplated sediment as part. (2) All instances of B have some instance of A as part, for example, PartOf(Bed, Formation). (3) All instances of A are partOf some instance of B, and all instances of B have some instance of A as part, for example, PartOf(OuterSlopeBasin, AccretionaryPrism). (4) There is at least one instance of A that is partOf some instance of B, for example, PartOf(SplayFault, AccretionaryPrism); notice, however, that not all splay faults are part of accretionary prisms, and not all prisms have splay faults. The relationship between parts and the whole may be functional, structural, or spatial, or based on whether or not parts are separable from the whole, or are homeomerous (Lambrix, 2000). An example of a functional partOf relation is that between cement or matrix (parts) and the sediment (a whole); the function of the cement is to hold the grains together. The function of pores or open fractures (parts) in a rock (a whole) is to store or transmit fluids, respectively. Parts are separable if the whole survives after the parts are separated from it, for example, when dissolved ions (parts) are removed from water (the whole).
Parts are homeomerous if they are the same kind of thing as their whole, e.g., a calcite or quartz crystal (part) in a monomineralic marble or quartzite (whole), respectively. Specimens taken from a sample of a granite core are homeomerous with the sample, and with the core itself. Gerstl and Pribbenow (1996) divided composite wholes into three end-member types: heterogeneous complex, uniform collection, and homogeneous mass. The mereological partOf relation is further extended (i.e., specialized) by the following relations (Lambrix, 2000). The componentOf relation implies the existence of a complex, heterogeneous structure in the whole, and functional, structural, spatial, temporal, and other relations among the components (Gerstl and Pribbenow, 1996), and between the components and the whole. The parts (i.e., components) are separable and nonhomeomerous, and they have spatial and temporal relations to the whole. Examples: UnderplatedSediment componentOf AccretionaryPrism; ForearcBasin componentOf SubductionComplex; PlateBoundaryFaultZone componentOf SubductionComplex. Both the prism and the subduction complex, in these cases, are heterogeneous, complex wholes, with several
internal components that have spatial and temporal relations to each other. The memberOf (elementOf) relation relates all members (parts) to a compositionally uniform collection (a whole), in the same way, without any functional or subclassing implication. The parts in this case have membership based on spatial proximity, not type (i.e., they are not subclasses of the whole), and they are separable and nonhomeomerous. Examples: RiserCore memberOf CoreCollection, for the boreholes in the NanTroSEIZE project in the Nankai Trough; SplayFault memberOf SplayFaultCollection, for a splay fault propagating from the plate-boundary fault; FractureSet memberOf FractureSystem, for the sets of fractures forming in shear zones; SiteNT3-01A memberOf NanTroSEIZESiteCollection. The portionOf relation relates a homeomerous, separable, and nonfunctional part to a homogeneous whole. For example, WorkingCoreSplit portionOf Core represents the portion of the core, other than the archive core split, that is set aside for different analyses. Other examples include: SpotCoreSample portionOf RoundCoreSample; DrillingMudSample portionOf DrillingMud. The nonfunctional, separable, and nonhomeomerous stuffOf relation holds between a whole and the material that it is partly or wholly made of. Examples: Rock stuffOf Core (in addition to other stuff, such as air, water, and mud); Mineral stuffOf Vein; IgneousRock stuffOf Dike. There are several criteria that can be used to identify genuine parthood of entities (Smith et al., 2005). The main criterion is sortality, which means that parts need to be of the right type to enable instantiation of the whole. For example, a drilling pipe stuck in a borehole is not part of the rock, and monitoring equipment in a borehole is not part of the borehole; it is containedIn the borehole. Metal and plastic cannot be part of rock.
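The metaproperties that distinguish these specialized part-whole relations can be collected into a small lookup table. The encoding below follows the text's reading of Lambrix (2000); marking componentOf as functional (because components stand in functional and structural relations to the whole) is an interpretive sketch, not a normative formalization, and the example pairs are the ones given in this section.

```python
# Metaproperties of the specialized part-whole relations discussed in
# the text, as a lookup table (interpretive sketch after Lambrix, 2000).

PART_RELATIONS = {
    "componentOf": dict(separable=True, homeomerous=False, functional=True),
    "memberOf":    dict(separable=True, homeomerous=False, functional=False),
    "portionOf":   dict(separable=True, homeomerous=True,  functional=False),
    "stuffOf":     dict(separable=True, homeomerous=False, functional=False),
}

EXAMPLES = {  # (part, whole) pairs from the section
    "componentOf": ("ForearcBasin", "SubductionComplex"),
    "memberOf":    ("RiserCore", "CoreCollection"),
    "portionOf":   ("WorkingCoreSplit", "Core"),
    "stuffOf":     ("Rock", "Core"),
}

def homeomerous_relations():
    """Relations whose parts are the same kind of thing as their whole."""
    return [r for r, meta in PART_RELATIONS.items() if meta["homeomerous"]]
```

A query such as homeomerous_relations() then picks out portionOf as the only specialization whose parts are of the same kind as the whole, matching the core-split example.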
A question arises as to whether a piece of meteorite embedded in a clastic sedimentary rock (e.g., mudstone) is part of the rock, even though the meteorite and the mudstone are not of the same type. In this case, the meteorite is a clast, and clasts are parts of a clastic rock. Provenance can also help here: the extraterrestrial origin of the meteorite does not fit the terrestrial origin of the other clasts in the mudstone. The function of a part may be essential to the functioning of the whole. For example, the cement of a conglomerate is a part of the conglomerate because it holds the clasts together; if there were no cement, the conglomerate would be another entity (a loose gravel aggregate). A part may be a structural element of the whole. For example, one or more limbs of a fold are needed for the fold to exist. In most cases, the life cycles of the part and the whole correspond to each other. For example, quartz, feldspar, and mica are parts of a granitic rock. However, the feldspar may be altered into clay at a time when the other components (i.e., mica and quartz) are still parts of the granite. There may be exceptions to the life-cycle rule if temporary parthood occurs. For example, the seismogenic zone in the plate-boundary fault zone under the accretionary prism may migrate into the prism and become part of the prism at a later time.
H.A. Babaie
The relata of the part-whole relations, in addition to objects, can be activities and processes. For example, the phaseOf or stageOf relation holds between different (spatio-temporal) parts of activities or processes. The phaseOf and stageOf relations are functional but not separable or homeomerous. An activity in this case involves human or machine agents, and it may be sampling, examination, simulation, or drilling. Examples: Stage3 stageOf NanTroSEIZEDrilling; PaleomagneticMeasurement phaseOf NonDestructivePhysicalPropertyMeasurement; MicroscopicMeasurement phaseOf OnBoardAnalysis. All nonabstract objects occupy spatial and temporal regions. The placeOf relation holds
between an area (a whole) and the places that are located in it. Example: Japan placeOf NankaiTrough.

LOCATION AND CONTAINMENT RELATIONS

The locatedIn relation (inverse of the placeOf relation) holds between a continuant and a spatial region r at time t (i.e., c locatedIn r at t); i.e., it depends on a function (Galton, 2000, 2004; Donnelly, 2004a) that assigns the region r(c, t) that any continuant instance c exactly occupies at time t. Thus, c locatedIn d at t means that r(c, t) partOf r(d, t) at t (Smith et al., 2005); that is, c is locatedIn d if c's region is part of d's region (Donnelly, 2004a). Examples: Vein locatedIn Fracture at time t; Mylonite locatedIn ShearZone at time t. In all cases, scientists measure instances of the continuants at the present time, and therefore the present time is implied in the assertions (i.e., t is dropped). However, the present time becomes part of the past as new measurements are made in future "present times." Many locatedIn relations actually represent instantaneous parthood; for example, SeismogenicZone partOf PlateBoundaryFault only applies at a specific time, and it may not be true at other times because the zone may migrate with time. Symbolically, the location relation is given as locatedIn(c, d) = P(r[c], r[d]), which reads: c is locatedIn d if c's region is partOf d's region. Thus, if x is partOf y, then x is locatedIn y. Examples: if Mylonite locatedIn ShearZone at time t, then the region of the Mylonite is partOf the region of the ShearZone at time t; if VolcaniclasticSediment locatedIn ForearcBasin at t, then the region of the VolcaniclasticSediment is partOf the region of the ForearcBasin at t. At the universal level, the assertion C locatedIn D means: for all c and t, if c instanceOf C at time t, then there is some d such that d instanceOf D at t, and c locatedIn d at t.
For example, at the present time: SanAndreasFault partiallyLocatedIn California, or YellowstoneHotSpot partiallyLocatedIn Wyoming. We can derive the following transitive relations from the locatedIn relation (Donnelly, 2004c). If c is locatedIn d and d is locatedIn z, then c is locatedIn z; for example, an Mg2+ ion located in a water sample that occupies a fracture is located in the fracture. If c is partOf d and d is locatedIn z, then c is locatedIn z; for example, a fracture that is part of a thrust sheet located in the accretionary prism is located in the prism. If c is locatedIn d and d is partOf z, then c is locatedIn z; for example, if VolcaniclasticSediment is locatedIn
the Kumano forearc basin sequence, and the forearc basin is partOf the Nankai Trough, then the volcaniclastic sediment is locatedIn the trough. If two objects coincide partially or wholly without being part of one another, then we use the coincidence relation, which is another kind of location relation. In other words, a continuant may exist in a spatial region occupied by another continuant of which it is not a part. In this case, the first entity may coincideWith, but is not locatedIn, the larger entity, which means that it is not part of the second entity. Examples are: water in an open fracture, and ions or contaminants in the pores of sediment or an aquifer. In all of these cases, there is no parthood relation, just spatial coincidence. It is not necessary for a fracture or pore to have water or a contaminant in it. Object x is said to overlap object y if x and y share a common part z (Pontow, 2004); in other words, Oxy = ∃z (Pzx ∧ Pzy). Example: NankaiAccretionaryPrism overlaps NankaiTrough. Object x is discreteFrom object y if x does not overlap y, i.e., Dxy = ¬Oxy. Examples: SubductingPlate discreteFrom ForearcBasin; SlopeBasin discreteFrom ForearcBasin. In the partial coincidence case, PCoin(x, y), the regions of the two objects x and y overlap without a part-whole relation (i.e., ¬Oxy); thus, PCoin(x, y) = O(r[x], r[y]). Partial coincidence is reflexive, PCoin(x, x), which means that any object partially coincides with itself, and it is symmetric, PCoin(x, y) = PCoin(y, x), which means that if x partially coincides with y, then y partially coincides with x. For example, ForearcBasin partiallyCoincidesWith AccretionaryPrism means that even though the spatial regions of these two entities overlap, the forearc basin is not part of the prism. Partial coincidence is more common than total (whole) coincidence. The coincidesWith and locatedIn relations are related: locatedIn(x, y) → PCoin(x, y), i.e., if x is locatedIn y, then x coincides with y.
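The three derived location rules (Donnelly, 2004c) can be sketched as a fixpoint computation. Because "if x is partOf y, then x is locatedIn y" holds, the partOf facts can be folded into the initial locatedIn set, after which all three rules reduce to transitive closure. The instance names below are illustrative.

```python
# Sketch of the derived location rules as a fixpoint computation over
# partOf and locatedIn facts (instance names are illustrative).

part_of = {("Fracture1", "ThrustSheet1"), ("ForearcBasin1", "NankaiTrough")}
located_in = {("Mg_ion1", "WaterSample1"), ("WaterSample1", "Fracture1"),
              ("ThrustSheet1", "Prism1"), ("Sediment1", "ForearcBasin1")}

def derive_locations(part_of, located_in):
    # If x partOf y, then x locatedIn y, so fold the partOf facts in;
    # the three derived rules then reduce to transitive closure.
    loc = set(located_in) | set(part_of)
    changed = True
    while changed:
        changed = False
        for (c, d) in list(loc):
            for (d2, z) in list(loc):
                if d2 == d and (c, z) not in loc:
                    loc.add((c, z)); changed = True
    return loc

loc = derive_locations(part_of, located_in)
```

The closure derives, for example, that the Mg2+ ion is located in the fracture (via the water sample), that the fracture is located in the prism (via the thrust sheet), and that the sediment is located in the Nankai Trough (via the forearc basin), matching the three worked examples above.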
Other relations include containedIn, which obtains between a material continuant and a site, i.e., an empty space that can contain the object. For example: ZeoliteCrystal containedIn Vug; Water containedIn Fracture; Contaminant containedIn Pore. Again, like the partOf relation, the containedIn relation holds at a certain time index, t. Some composite spatial entities are derived from other entities through the Boolean-like operations of sum, product, and complement, which are equivalent to set theory's union, intersection, and complement, respectively (Galton, 2004). For example, a fracture system in the plate-boundary fault zone is a sum of several variably oriented fracture sets of different types (e.g., Riedel shear, Y-shear, and P-shear fractures). This is true for one-dimensional fracture traces and two-dimensional fracture planes. An example in three dimensions is the accretionary prism, which is the sum of all thrust sheets of accreted and underplated sediments and rocks. The derivesFrom relation obtains between two nonidentical individuals. The assertion c derivesFrom c1 means that continuant object c is derived from continuant object c1 at some time t, and that c1 no longer exists. At the universal level, we
can have Gneiss derivesFrom Granite. There are three types of derivesFrom relations: (1) Continuation, where an instance of a continuant succeeds another at time t. (2) Fusion, where two or more continuants fuse into a new continuant, and the earlier continuants cease to exist; examples include water and ions crystallizing into a new mineral at t, two magmas mixing at instant t, and P- and Y-slip surfaces merging and becoming a new slip surface. Unification (Smith and Brogaard, 2003), a close relative of the fusion relation, occurs when two or more continuants join in a new complex continuant but continue to exist, for example, pebbles of many older rocks unifying in a conglomerate, or porphyroclasts of older minerals unifying in a mylonite. (3) Fission, where a part of a continuant breaks into two or more other continuants at time t, which then exist on their own; an example is metamorphism of a rock leading to the formation of a new mineral at the expense of several older minerals (parts) that existed in the rock. The adjacentTo relation (Donnelly et al., 2005) is a proximity (topology) relation that holds between two disjoint continuants. The transformationOf relation represents change in continuants over time; it obtains between an instance of a class C at time t that used to be an instance of another, disjoint class C1 at an earlier time t1. Examples: QuartzofeldspathicMylonite transformationOf Granite; Soil transformationOf Rock at t. The absorb relation obtains when a continuant continues to exist but absorbs another continuant, which ceases to exist on its own; for example, Mineral absorb Water means that a mineral may absorb a water molecule (H2O, incorporated as the hydroxyl ion OH−) or another ion into its crystalline structure. Rector et al.
(2006) introduced the notion of collectives and granular parts, which relates closely to the formalization of entities such as rocks and sediments that are made of many parts. Such collective wholes (e.g., rock, sediment) have "emergent" properties that do not exist in the individual parts (minerals, grains), for example, the emergent property of the silicate chain structure that appears when silicon-oxygen tetrahedra connect to each other to make a whole molecule. The emergent properties of a silicate mineral (a collective) are not the same as those of the individual atoms (silicon, oxygen). A collective of grains in a sedimentary rock has emergent properties such as porosity, hydraulic conductivity, texture, and fabric that do not make sense for individual grains. Collectives are themselves parts of larger collectives; for example, minerals, which are collectives of atoms, are parts of larger rock collectives, which are themselves parts of even larger collectives organized in different packages such as lithostratigraphic rock units, members, formations, groups, and sequences. The emergent properties of a lithostratigraphic rock unit may include such things as its anisotropy and homogeneity in all sorts of physical properties. Rector et al. (2006) distinguished two types of subrelations under the parthood relation related to collectivity: granular parthood and determinate parthood. Collectives are aggregates of grains that play the same role in the whole, and they do not depend on the number of grains in the collective. For example, the relation of grains to a layer of sediment is a granular parthood: there is an indeterminate number of grains in the layer, and removal of one part (i.e., a grain) does not necessarily diminish the whole (the sedimentary layer). Compare this with the relation of a collective of several tectonostratigraphic units of different ages separated by unconformities or faults (e.g., in an accretionary prism). In this case, which represents determinate parthood, the removal of any of the parts (e.g., a thrust sheet), which are limited in number, will necessarily diminish the whole, and the integrity of the collective will be lost.

DISCUSSION AND SUMMARY

Continuant entities that exist in any domain of the natural world stand in a variety of ontological relations to each other, including instantiation, parthood (P), overlap (O), location (LocIn), containment, and adjacency. These types of binary relations, denoted by R (Smith and Rosse, 2004; Donnelly et al., 2005), can hold between universals or particulars of any earth science discipline. We use consensual, agreed-upon terms to define the universals in our domain, and we represent them as classes in information systems. The relations that exist among the instances of a universal may not exist among the concepts (terms) that represent them. While only one instance of a material universal type can occupy a unique spatial region in a specific temporal region, many instances of a given universal type can synchronously exist in different spatial regions. The spatial regions of these simultaneously existing instances of the same universal type may or may not overlap. Smith and Rosse (2004) and Donnelly et al. (2005) introduced a refinement of each of the binary relations (R) of parthood (P), overlap (O), instantiation, and location (LocIn). In the following, R can stand for P, O, LocIn, or other types of binary relations. Donnelly et al.
(2005) defined the relations R1(A, B), R2(A, B), and R12(A, B) among universal types A and B, depending on whether the restriction is put on the first argument (A) or the second (B). These R-structures are defined next (notice the all-some structure in all three cases). R1(A, B) = ∀x [Inst(x, A) → ∃y (Inst(y, B) ∧ Rxy)], i.e., A is related to B (e.g., by instantiation, parthood, location) if all instances of A are related to some instance of B. This means that each A stands in the R relation (e.g., proper parthood) to some B. Notice that the emphasis is on A, i.e., something is true of every A; hence the subscript 1 in R1, emphasizing the first argument. For example, the assertion PP1(ForearcBasin, SubductionComplex) means that each forearc basin is a proper part of some subduction complex. This does not say that each subduction complex must have a forearc basin as a proper part. The assertion O1(SeismogenicZone, PlateBoundaryFaultZone) means that every seismogenic zone overlaps (i.e., shares a common part with) some plate-boundary fault zone under the prism. This assertion does not say that each plate-boundary fault zone must overlap a seismogenic zone. O1(SplayFault, Décollement) means that every splay fault overlaps some décollement, but not the other way around. LocIn1(Vein, Fracture) asserts that every vein is located in some fracture. It does not mean that every fracture
has a vein in it. PCoin1(AccretionaryPrism, SubductingPlate) asserts that every accretionary prism partially coincides with some subducting plate, but not every subducting plate has to partially coincide with an accretionary prism. Notice that we can assert PP1(AccretedSediment, AccretionaryPrism), but not PP1(UnderplatedSediment, AccretionaryPrism). We can assert PCoin1(Trench, SubductingPlate) but not PCoin1(SubductingPlate, Trench). R2(A, B) = ∀y [Inst(y, B) → ∃x (Inst(x, A) ∧ Rxy)], i.e., the relation between A and B holds if all instances of B are related to some instance of A. Here, the restriction is on the second argument (B); i.e., it says that for each B, there is some A that stands in the R relation to it, for example, PP2(Sediment, ForearcBasin), which states that each forearc basin has some sediment as a proper part. Notice that the statement does not say that each instance of sediment is a proper part of a forearc basin. The assertion O2(AccretionaryPrism, SlopeBasin) states that every slope basin overlaps some accretionary prism. However, it does not assert that every accretionary prism overlaps some slope basin. LocIn2(Sediment, ForearcBasin) states that every forearc basin contains sediment. However, not all sediments are located in forearc basins. PCoin2(SubductionZone, UnderplatedSediment) states that every underplated sediment partially coincides with some subduction zone. It does not say that every subduction zone partially coincides with underplated sediment. R12(A, B) = R1(A, B) ∧ R2(A, B) conjoins the two cases above and states that each instance of A stands in the R relation to some B, and each instance of B stands in the R relation to some instance of A. In this case, the restriction is on all instances of both A and B. For example, PP12(SubductingPlate, SubductionZone) says that each subducting plate is a proper part of some subduction zone, and each subduction zone has some subducting plate as a proper part.
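Assuming toy instance-level data (the instance identifiers and parthood facts below are invented for illustration), the all-some R-structures described above can be sketched as simple instance-level checks:

```python
# Sketch of the R1, R2, and R12 structures of Donnelly et al. (2005),
# evaluated over explicit instance data. All instances and parthood
# facts below are hypothetical.

def r1(insts_a, insts_b, rel):
    """R1(A, B): every instance of A stands in rel to some instance of B."""
    return all(any(rel(x, y) for y in insts_b) for x in insts_a)

def r2(insts_a, insts_b, rel):
    """R2(A, B): every instance of B has some instance of A in rel to it."""
    return all(any(rel(x, y) for x in insts_a) for y in insts_b)

def r12(insts_a, insts_b, rel):
    """R12(A, B): both restrictions hold."""
    return r1(insts_a, insts_b, rel) and r2(insts_a, insts_b, rel)

# Toy proper-parthood facts as (part, whole) pairs.
proper_part_of = {("fb1", "sc1"), ("fb2", "sc2")}
pp = lambda x, y: (x, y) in proper_part_of

forearc_basins = {"fb1", "fb2"}
subduction_complexes = {"sc1", "sc2", "sc3"}

# PP1 holds: every forearc basin is a proper part of some subduction complex.
print(r1(forearc_basins, subduction_complexes, pp))   # True
# PP2 fails: sc3 has no forearc basin as a proper part.
print(r2(forearc_basins, subduction_complexes, pp))   # False
# Hence the conjunction PP12 fails as well.
print(r12(forearc_basins, subduction_complexes, pp))  # False
```

The asymmetry in the output mirrors the prose: restricting only the first argument (R1) can succeed while restricting the second (R2) fails, which is exactly why ontology axioms must distinguish the two.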
O12(SubductingSediment, PlateBoundaryFault) asserts that every subducting sediment overlaps some plate-boundary fault, and every plate-boundary fault overlaps some subducting sediment. LocIn12(SubductionZone, Ocean) states that every subduction zone is located in some ocean, and every ocean has some subduction zone. PCoin12(TransformFault, MidOceanRidge) asserts that every transform fault partially coincides with some mid-ocean ridge, and every mid-ocean ridge partially coincides with some transform fault. These R-structures, defined by Donnelly et al. (2005), provide a powerful means for spatial reasoning. A complete and comprehensive representation of knowledge in a specific earth science domain requires their application, and the axioms of the ontologies in such domains need to differentiate among the R1, R2, and R12 structures to enable effective spatial reasoning.

REFERENCES CITED

Allemang, D., and Hendler, J., 2008, Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL: Amsterdam, Morgan Kaufmann Publishers, 330 p.
Antoniou, G., and van Harmelen, F., 2008, A Semantic Web Primer: Cambridge, Massachusetts Institute of Technology Press, 264 p.
Babaie, H., 2011, Modeling geodynamic processes with ontologies, in Keller, R., and Baru, C., eds., Geoinformatics: Cyberinfrastructure for the Solid Earth Sciences: Cambridge, UK, Cambridge University Press, p. 166–189.
Bains, P., 2006, The Primacy of Semiosis: An Ontology of Relations: Toronto, University of Toronto Press, 186 p.
Bittner, T., Donnelly, M., and Smith, B., 2004, Endurants and perdurants in directly depicting ontologies: AI Communications, v. 17, no. 4, p. 247–258.
Brachman, R.J., and Levesque, H.J., 2004, Knowledge Representation and Reasoning: Amsterdam, Morgan Kaufmann Publishers, 381 p.
Breitman, K.K., Casanova, M.A., and Truszkowski, W., 2007, Semantic Web: Concepts, Technologies and Applications: Berlin, Springer-Verlag, 327 p.
Carrara, M., and Sacchi, E., 2006, Propositions: An introduction, in Carrara, M., and Sacchi, E., eds., Propositions: Semantic and Ontological Issues: Amsterdam, Rodopi B.V. Publishing, p. 1–27.
Casati, R., and Varzi, A.C., 1999, Parts and Places: The Structures of Spatial Representation: Cambridge, Massachusetts Institute of Technology Press, 238 p.
Cohn, A.G., and Varzi, A.C., 2003, Mereological connection: Journal of Philosophical Logic, v. 32, p. 357–390, doi:10.1023/A:1024895012224.
Colomb, R.M., 2007, Ontology and the Semantic Web: Amsterdam, IOS Press, 258 p.
Cruse, D.A., 2002, Hyponymy and its varieties, in Green, R., Bean, C.A., and Myaeng, S.H., eds., The Semantics of Relationships: An Interdisciplinary Perspective: Information Science and Knowledge Management Series: Dordrecht, the Netherlands, Kluwer Academic Publishers, p. 3–21.
Donnelly, M., 2004a, Layered mereology, in Gottlob, G., and Walsh, T., eds., Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI 2003): San Francisco, Morgan Kaufmann, p. 1269–1274.
Donnelly, M., 2004b, A formal theory for reasoning about parthood, connection, and location: Artificial Intelligence, v. 160, p. 145–172, doi:10.1016/j.artint.2004.06.003.
Donnelly, M., 2004c, On parts and holes: The spatial structure of the human body, in Fieschi, M., et al., eds., MEDINFO 2004: Amsterdam, IOS Press, p. 351–355.
Donnelly, M., Bittner, T., and Rosse, C., 2005, A formal theory for spatial representation and reasoning in biomedical ontologies: Artificial Intelligence in Medicine, v. 36, no. 1, p. 1–27, doi:10.1016/j.artmed.2005.07.004.
Fellbaum, C., 2002, On the semantics of troponymy, in Green, R., Bean, C.A., and Myaeng, S.H., eds., The Semantics of Relationships: An Interdisciplinary Perspective: Information Science and Knowledge Management Series: Dordrecht, the Netherlands, Kluwer Academic Publishers, p. 22–34.
Galton, A., 2000, Qualitative Spatial Change: New York, Oxford University Press, 409 p.
Galton, A., 2004, Multidimensional mereotopology, in Dubois, D., Welty, C., and Williams, M.-A., eds., Proceedings of the International Conference on the Principles of Knowledge Representation (KR'04): Menlo Park, California, AAAI Press, p. 45–54.
Gerstl, P., and Pribbenow, S., 1996, A conceptual theory of part-whole relations and its applications: Data and Knowledge Engineering, v. 20, p. 305–322, doi:10.1016/S0169-023X(96)00014-6.
Guarino, N., and Welty, C., 2002, Identity and subsumption, in Green, R., Bean, C.A., and Myaeng, S.H., eds., The Semantics of Relationships: An Interdisciplinary Perspective: Information Science and Knowledge Management Series: Dordrecht, the Netherlands, Kluwer Academic Publishers, p. 111–125.
Hitzler, P., Krötzsch, M., and Rudolph, S., 2009, Foundations of Semantic Web Technologies: Boca Raton, Florida, CRC Press, 427 p.
Jouis, C., 2002, Logic of relationships, in Green, R., Bean, C.A., and Myaeng, S.H., eds., The Semantics of Relationships: An Interdisciplinary Perspective: Information Science and Knowledge Management Series: Dordrecht, the Netherlands, Kluwer Academic Publishers, p. 127–140.
Kashyap, V., Bussler, C., and Moran, M., 2008, The Semantic Web: Berlin, Springer-Verlag, 404 p.
Klein, G., and Smith, B., 2006, Concept systems and ontologies: Journal of Biomedical Informatics, v. 39, no. 3, p. 274–287.
Koslicki, K., 2008, The Structure of Objects: New York, Oxford University Press, 288 p.
Lambe, P., 2007, Organizing Knowledge: Taxonomies, Knowledge and Organizational Effectiveness: Oxford, UK, Chandos Publishing, 277 p.
Lambrix, P., 2000, Composite objects, in Part-Whole Reasoning: Lecture Notes in Artificial Intelligence, v. 1771: New York, Springer, p. 21–30.
Levesque, H.J., and Lakemeyer, G., 2000, The Logic of Knowledge Bases: Cambridge, Massachusetts Institute of Technology Press, 282 p.
Pontow, C., 2004, A note on the axiomatics of theories of parthood: Data and Knowledge Engineering, v. 50, p. 195–213, doi:10.1016/j.datak.2003.12.002.
Pribbenow, S., 2002, Meronomic relationships: From classical mereology to complex part-whole relations, in Green, R., Bean, C., and Myaeng, S.H., eds., The Semantics of Relationships: An Interdisciplinary Perspective: Information Science and Knowledge Management Series: Dordrecht, the Netherlands, Kluwer Academic Publishers, p. 35–50.
Raskin, R., and Pan, M.J., 2005, Knowledge representation in the semantic web for Earth and environmental terminology (SWEET), in Babaie, H.A., and Ramachandran, R., eds., Applications in Geosciences: Computers and Geosciences, v. 31, p. 1119–1125.
Rector, A., Rogers, J., and Bittner, T., 2006, Granularity scale and collectivity: When size does and does not matter: Journal of Biomedical Informatics, v. 39, p. 333–349, doi:10.1016/j.jbi.2005.08.010.
Schulz, S., and Hahn, U., 2005, Part-whole representation and reasoning in biomedical ontologies: Artificial Intelligence in Medicine, v. 34, no. 3, p. 179–200, doi:10.1016/j.artmed.2004.11.005.
Schulz, S., Daumke, P., Smith, B., and Hahn, U., 2005, How to distinguish parthood from location in bio-ontologies, in Friedman, C.P., ed., American Medical Informatics Association (AMIA) Annual Symposium Proceedings 2005, p. 669–673.
Schulz, S., Kumar, A., and Bittner, T., 2006, Biomedical ontologies: What partOf is and isn't: Journal of Biomedical Informatics, v. 39, p. 350–361, doi:10.1016/j.jbi.2005.11.003.
Segaran, T., Evans, C., and Taylor, J., 2009, Programming the Semantic Web: Sebastopol, California, O'Reilly Media Inc., 282 p.
Shanks, G., Tansley, E., and Weber, R., 2004, Representing composites in conceptual modeling: Communications of the Association for Computing Machinery, v. 47, no. 7, p. 77–80, doi:10.1145/1005817.1005826.
Sider, T., 2001, Four-Dimensionalism: An Ontology of Persistence and Time: Oxford, UK, Clarendon Press, 255 p.
Simons, P., 1987, Parts: A Study in Ontology: Oxford, UK, Clarendon Press, 390 p.
Sinha, A.K., ed., 2006, Geoinformatics: Data to Knowledge: Geological Society of America Special Paper 397, 288 p.
Smith, B., 2001, Fiat objects: Topoi, v. 20, no. 2, p. 131–148, doi:10.1023/A:1017948522031.
Smith, B., 2003, Ontology, in Floridi, L., ed., Blackwell Guide to the Philosophy of Computing and Information: Oxford, UK, Blackwell, p. 155–166.
Smith, B., 2004, Beyond concepts: Ontology as reality representation, in Varzi, A., and Vieu, L., eds., Formal Ontology and Information Systems: Amsterdam, IOS Press, p. 73–84.
Smith, B., and Brogaard, B., 2003, Sixteen days: The Journal of Medicine and Philosophy, v. 28, p. 45–78, doi:10.1076/jmep.28.1.45.14172.
Smith, B., and Grenon, P., 2004, The cornucopia of formal-ontological relations: Dialectica, v. 58, p. 279–296, doi:10.1111/j.1746-8361.2004.tb00305.x.
Smith, B., and Rosse, C., 2004, The role of foundational relations in the alignment of biomedical ontologies, in Fieschi, M., et al., eds., Proceedings MEDINFO 2004: Amsterdam, IOS Press, p. 444–448.
Smith, B., Ceusters, W., Klagges, B., Köhler, J., Kumar, A., Lomax, J., Mungall, C., Neuhaus, F., Rector, A.L., and Rosse, C., 2005, Relations in biomedical ontologies: Genome Biology, v. 6, p. R46, doi:10.1186/gb-2005-6-5-r46.
Tobin, H.J., and Kinoshita, M., 2006, NanTroSEIZE: The IODP Nankai Trough Seismogenic Zone Experiment: Scientific Drilling, v. 2, p. 23–27.
Tversky, B., 1990, Where partonomies and taxonomies meet, in Tsohatzidis, S.L., ed., Meanings and Prototypes: Studies in Linguistics and Categorization: New York, Routledge, p. 334–344.
Varzi, A.C., 1996, Parts, wholes, and part-whole relations: The prospect of mereotopology: Data and Knowledge Engineering, v. 20, p. 259–286, doi:10.1016/S0169-023X(96)00017-1.
Varzi, A.C., 2003, Mereology, in Zalta, E.N., ed., Stanford Encyclopedia of Philosophy: CSLI (Center for the Study of Language and Information) Internet publication: http://plato.stanford.edu/contents.html (accessed May 2011).

MANUSCRIPT ACCEPTED BY THE SOCIETY 17 FEBRUARY 2011
Printed in the USA
The Geological Society of America Special Paper 482 2011
Geoscience metadata—No pain, no gain Jeremy R.A. Giles British Geological Survey, Keyworth, Nottingham NG12 5GG, UK
ABSTRACT Metadata are an essential tool for use in the management of geoscientific information. Well-managed metadata provide a number of key information management functions, including facilitating data discovery and providing a robust framework for information asset management. The realization of these and other benefits is predicated on the existence of well-maintained metadata. Sadly, metadata are commonly absent, incomplete, inaccurate, inarticulate, or obsolete. Some of the benefits and opportunities that arise from well-managed metadata collections are discussed here. The rapid development of spatial data infrastructures means that maintenance of metadata for geoscience information is becoming increasingly important.
Giles, J.R.A., 2011, Geoscience metadata—No pain, no gain, in Sinha, A.K., Arctur, D., Jackson, I., and Gundersen, L., eds., Societal Challenges and Geoinformatics: Geological Society of America Special Paper 482, p. 29–33, doi:10.1130/2011.2482(03). For permission to copy, contact [email protected]. © 2011 The Geological Society of America. All rights reserved.

INTRODUCTION

The phrase "information entropy" was coined by Michener et al. (1997) to describe the tendency for stored information to become more disordered over time. Metadata provide a tool that has been developed to slow this inexorable deterioration of the value of the information content of a data set. Metadata's high-level purpose is to maintain the continuing usefulness of information so that it can be understood, exploited, reused, and repurposed. In practice, this high-level purpose is fulfilled through metadata's implementation in three principal roles or functions. The first role is described as "discovery metadata." As the name suggests, this is primarily a tool that supports identification of an information object that might be suitable for an intended purpose. The second role involves the use of metadata to describe the information object in such a way that the potential user comes to an understanding of both the context in which the underlying data set was created and processed, and its potential and limitations. The third role of metadata is as an asset management tool for use by the data set manager responsible for their long-term curation. A good metadata record enables the potential user of a data set to understand the contents of the resource, its suitability for a proposed purpose, and its limitations. On the simplest level, metadata should give the reader of the record a clear understanding of: (1) the information content of the data set; (2) the reason(s) for data collection; (3) the location(s) where the data were collected; (4) the time at which, or interval during which, the data were collected; (5) the person, team, or organization that collected the data; and (6) the methods of data collection. The term "metadata" is relatively new—it was first used in print in 1973. However, the recording process that the term encompasses has a long history. Terms such as registry, directory, and catalogue describe records that fulfill a similar function. Numerous data catalogues and data directories have been, and continue to be, published physically by organizations wishing to make potential users aware of their data holdings. With the advent of the Internet, many of these resources have been migrated progressively into an Internet-compatible form. For example, the National Aeronautics and Space Administration Global Change Master Directory (http://gcmd.nasa.gov), which
was created in 1987 to enable users to locate and obtain access to earth science data sets, is a sophisticated metadata collection. By the last decade of the twentieth century, numerous metadata collections were available. However, many of them followed organizational, community, national, or regional standards, and, as a result, they lacked interoperability. Geoscience information is commonly difficult and thus expensive to acquire. As a result, geoscientists have become adept at reusing and repurposing previously acquired information (Wood and Curtis, 2004). In many countries, a legislative framework exists to sustain reuse of commercially acquired geoscientific information to support the work of regulators and geological survey organizations. For example, in Great Britain, the Water Resources Act of 1991 requires information collected during the drilling of water wells to be supplied to the Natural Environment Research Council, the parent body of the British Geological Survey. In some cases, existing legal frameworks are supplemented by informal agreements for data sharing that include data reuse in academic research. However, anyone who has attempted such reuse of information might have found it to be a frustrating and time-consuming activity. Peebler (1996) estimated that the average exploration and production professional spent from 20% to 30% of their total project time searching for, loading, and formatting data. Other studies have suggested that this figure can be much higher for exploration in new areas. In addition, without a clear understanding of the context in which the data sets were originally created, their value and usefulness can be questioned. Generally, the older the data set, the more difficult it is to understand in the absence of appropriate metadata. This problem was well understood by Michener et al. (1997), who described the typical degradation of information content associated with data over time.
They identified a number of potential inflection points within the general curve that represents the decay of the information content value. Examples include the retirement, career change, or death of the data set creator(s) and advances in storage technology.

BENEFITS OF METADATA

A well-formed metadata record provides a wide range of benefits (IGGI Working Group on Metadata Implementation, 2004). It enables faster and easier discovery of information, which facilitates communication and allows development of a comprehensive understanding of existing data sets. Such knowledge will in turn prevent the re-creation of a data set that has previously been compiled, and avoid the additional costs that such re-creation would incur. After discovery of one or more data sets, well-formed metadata records allow accurate judgments to be made about the potential for reuse and repurposing in ongoing research or in the development of new products and services. A clear understanding of the context of the collection of a raw data set and its subsequent processing contributes to the long-term preservation of the data set. Metadata are an essential tool for those who manage data sets as an organizational asset because they provide the key
information that enables managers to make and justify meaningful decisions. The Global Spatial Data Infrastructure Association has defined spatial data infrastructure as “the relevant base collection of technologies, policies, and institutional arrangements that facilitate the availability of and access to spatial data” (GSDI, 2004, p. 8). Metadata represent a fundamental element of such a spatial data infrastructure. An example of a spatial data infrastructure initiative is the Infrastructure for Spatial Information in the European Community, or INSPIRE (Directive 2007/2/EC). The European Union has recognized that the general situation relating to spatial information in Europe is one of fragmentation of data sets and sources, gaps in availability, lack of harmonization between data sets at different geographical scales, and duplication of information collection. These problems make it difficult to identify, access, and use the data that are available. There is a clearly recognized need for quality georeferenced information to support the development and understanding of the complexity and interactions among human activities and environmental pressures and the associated impacts. Geology, characterized according to composition and structure, including bedrock, aquifers, and geomorphology, is named specifically in the INSPIRE Directive, along with soils, natural risks (such as landslides, earthquakes, and volcanoes), energy resources, and mineral resources. The requirements of INSPIRE are some of the drivers underlying the OneGeology initiative (www.onegeology.org), which aims to create dynamic geological map data of the world that will be made available via the Web. Other examples of spatial data infrastructures include the Australian Spatial Data Infrastructure (ASDI), United Nations Spatial Data Infrastructure (UNSDI), and the U.S. National Spatial Data Infrastructure (NSDI). 
The primary element of a spatial data infrastructure is good-quality geospatial metadata. A standards framework was put in place by the International Organization for Standardization (ISO) with the development of standards for geographic information metadata (ISO 19115:2003 [ISO, 2003]) and an extensible markup language (XML) schema implementation for the metadata (ISO/TS 19139:2007 [ISO, 2007]). The primary importance of metadata is also reflected in INSPIRE, where metadata are the first element of the implementation road map. Carefully targeted metadata collections can contribute to the generation of real economic benefit. In 2000, the UK government identified the need to stimulate oil and gas investment on the UK continental shelf by attracting new niche companies that had the skills necessary to exploit previously undeveloped discoveries by utilizing technically innovative solutions. The aim was to ensure that indigenous oil and gas production remained at significant levels so that it would continue to contribute about $70 billion a year to the UK economy. A key element of the strategy was the development of metadata-based information resources that would provide details about the information that was available and where to obtain it. The result was DEAL (www.ukdeal.co.uk), which is sponsored by the UK Department of Energy and Climate Change and the industry body Oil and Gas UK. DEAL
publishes information on (1) oil and gas wells; (2) seismic surveys; (3) infrastructure, such as pipelines; (4) licenses; (5) fields; and (6) strategic environmental assessments. One of the principal benefits of metadata for public-sector bodies, such as geological survey organizations, is the reduction of legislative compliance risk. Many countries have established freedom of information legislation under which public-sector bodies are required to respond to requests for information within short time scales. Such legislation is predicated on the existence of sufficient metadata to enable resource discovery. The European Union is in the process of implementing a long-term strategy on improving access to, and the sharing of, environmental information, including many aspects of the geosciences. This includes improving public access to environmental information (Directive 2003/4/EC), establishing an infrastructure for spatial information in the European Union (INSPIRE), and drafting a directive for a Shared Environmental Information System (SEIS). Each of these directives is being, or will be, transposed into national legislation in the European Union member states, where individual geological survey organizations will be required to create, disseminate, and maintain metadata to comply with their legal obligations.

THE PROBLEM WITH METADATA

In a perfect world, every data set would be fully and clearly described by a complete metadata record. The record would be maintained regularly, so that the information content remained up-to-date and accurate. Individual metadata records would take into account similar records already in existence and would ensure that the two records reflected their close relationship while clearly describing any distinctive features. This would guarantee that appropriate database searches would be able to recall apposite information with precision.
“Recall” and “precision” are terms originally used in this context within the library community. Recall describes the capability to discover relevant records; a search that misses a lot of relevant information is described as having poor recall. A search that recalls relevant records along with numerous irrelevant ones is said to have poor precision. The user finds it difficult to identify the valuable records amongst the numerous returns. Many simple searches using Internet search engines have poor precision. In the real world, individual metadata records fall far short of the ideal. Poor-quality metadata can lead to misleading conclusions and costly mistakes, yet few people understand the nature of the errors associated with their own metadata. The principal obstacle to the creation and maintenance of a well-formed metadata record is poor management control. This shortfall expresses itself in the form of incomplete, inaccurate, internally inconsistent, inarticulate, and obsolete records. To expect data creators to compile complex and accurate metadata records routinely and willingly is optimistic. An appropriate management framework must exist to promulgate policies that will motivate data creators to ensure that the metadata records are created and maintained.
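The recall and precision measures described above can be made concrete with a short sketch; the record identifiers and relevance judgments below are invented for illustration:

```python
# Recall and precision in the library/information-retrieval sense used above.

def recall(retrieved, relevant):
    """Fraction of all relevant records that the search actually found."""
    return len(retrieved & relevant) / len(relevant)

def precision(retrieved, relevant):
    """Fraction of returned records that are actually relevant."""
    return len(retrieved & relevant) / len(retrieved)

relevant = {"r1", "r2", "r3", "r4"}          # records the user needs
retrieved = {"r1", "r2", "x1", "x2", "x3"}   # records the search returned

print(recall(retrieved, relevant))     # 0.5 -- half the relevant records missed
print(precision(retrieved, relevant))  # 0.4 -- many irrelevant returns
```

A search with high recall but low precision buries the apposite records among irrelevant returns, which is the failure mode noted above for many simple Internet searches.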
Another persistent problem occurs when metadata aggregators harvest third-party collections to compile more extensive thematic or regional metadata collections. Generally, these records are copied to the aggregators' metadata collections, and, if for some reason the link with the original record is broken, the copy commonly continues to be published in its obsolete form. It is not uncommon to find such obsolete records, many over a decade old, still being published by metadata aggregators.

REALIZING THE BENEFITS OF METADATA

For an organization to be able to create, maintain, and disseminate its metadata in a consistent and reliable manner, there must be an appropriate management framework in place to ensure success. Without the subjective pain of an investment of time and resources, there will be no gains; the benefits will not be realized. The steps required are straightforward: (1) establish a metadata policy; (2) adopt an appropriate standards framework; (3) initiate and support metadata collection; (4) disseminate the metadata; and (5) maintain the metadata. The first and most significant step for any organization wishing to implement systematic, organization-wide metadata is to define an achievable metadata policy. This is a set of broad, high-level principles that form the guiding framework within which the metadata management can operate. This policy must be embraced formally by senior management and supported by appropriate resources and authority. The metadata policy would normally be a subsidiary part of the organizational data or information policy, and as such should cross-refer to it. Experience has shown that a successful metadata policy implementation needs committed managerial backing at the highest level within an organization. The compilation of an individual well-formed metadata record by someone who has knowledge and experience of the data set is a trivial task.
However, the management overhead of convincing many busy staff across an entire organization that they should each provide information individually is considerable. A strong and unambiguous high-level directive is absolutely essential. Staff members must be left with no doubt regarding the benefits of undertaking the required actions and the penalties for not doing so. Thus, it is also important that policy statements have a champion who will own the policy at an executive level within an organization. The policy champion will be accountable for ensuring compliance within the organization and identifying the resources required to create, maintain, disseminate, and exploit the metadata resource. In recent years, the International Organization for Standardization (ISO) has established the currently definitive standards framework for metadata. Its adoption and implementation will eventually remove the previously complex situation presented by numerous independent organizational, community, national, or regional standards. Many organizations are still migrating their metadata collections to the appropriate ISO standards, and
it will still be some years before interoperability becomes normal. However, any organization starting to create a new metadata collection now, or reengineering an existing collection, should adopt the relevant ISO standards as the core of its metadata policy. The primary metadata standard for nonspatial data is ISO 15836:2009 (ISO, 2009), and for spatial data, the metadata standard is ISO 19115:2003 (ISO, 2003). Because the ISO standards do not yet meet all requirements, there is a continuing need for community standards to extend them. The first major task following the adoption of a metadata policy and a standard is to initiate compilation of metadata across the organization. This should be managed as a single project tasked with identifying the organization's data sets and, for each data set, one or more individuals who understand the specific resource. The quickest way to ensure that the metadata are compiled in a consistent and complete form is to have each of the data set experts interviewed by a metadata steward who is responsible for creating the metadata record. Dissemination of an organization's metadata can be arranged in a number of ways. At its simplest, the metadata inventory could be published regularly as a searchable document on an internal server or intranet. It could be published regularly as sets of static Web pages or as a query-form Web page linking to a metadata database. A better way is to publish, as a Web service, the ISO/TS 19139:2007 (ISO, 2007) XML implementation of ISO 19115:2003 (ISO, 2003). This approach gives users the greatest flexibility in how they select, manipulate, and present metadata from one or more organizations. Once the metadata for an organization have been compiled, it is essential that they be maintained actively, or their currency and value will depreciate over time. Maintenance is required on several levels, and each of these needs to be considered.
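As an illustration of the Web-service dissemination route, the sketch below serializes a single element of a discovery record in the spirit of the ISO/TS 19139 XML encoding of ISO 19115. The element names follow the gmd/gco namespace convention, but the record is deliberately skeletal and not schema-valid, and the identifier is invented:

```python
# Minimal, illustrative ISO 19115-style metadata fragment serialized as XML.
# Not a schema-valid ISO/TS 19139 document; the element choice is a sketch only.
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)

def char_string(parent, tag, value):
    # ISO/TS 19139 wraps free text in a gco:CharacterString element.
    wrapper = ET.SubElement(parent, f"{{{GMD}}}{tag}")
    ET.SubElement(wrapper, f"{{{GCO}}}CharacterString").text = value
    return wrapper

record = ET.Element(f"{{{GMD}}}MD_Metadata")
char_string(record, "fileIdentifier", "example-borehole-dataset-001")

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Publishing such records through a service, rather than as static pages, lets consumers harvest and recombine metadata from many organizations, which is the flexibility argued for above.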
To realize the full potential of a metadata collection, it is commonly necessary to maintain it over long periods. Even with investment in technically sophisticated search tools, such systems will find little user acceptance if the data are incomplete or not up-to-date. Hence, following initial metadata compilation, the subsequent metadata maintenance is of major importance. One role of the metadata steward is to ensure that maintenance is carried out consistently over a long period. The shelf life of a metadata record is surprisingly short, and records must be reviewed and updated on a regular cycle. Contact details within metadata can only be described as volatile, and they need reviewing more frequently than other elements. Other aspects of the metadata record can be reviewed on different cycles, depending upon whether the data set described is static or dynamic. Such reviews should not be left to the individual data set experts because their exit from the organization might be the cause of the records' obsolescence. Review should be undertaken by the metadata steward who is responsible for maintaining the metadata records. In addition to reviewing the metadata content, the steward should also check the currency of the underlying metadata standard, in case it has become outdated following revision of the ISO standards.
KEY ROLES IN METADATA IMPLEMENTATION

There are three key roles in the process of managing metadata. A successful metadata policy implementation requires authority from the highest level within the organization. The metadata policy champion is the owner of the policy within the organization and is responsible for ensuring that appropriate authority and resources are available to implement and maintain the policy. A metadata steward is responsible for day-to-day management of the organization's metadata and for ensuring that the metadata record is comprehensive and meets the standards adopted by the organization. The steward reports to the policy champion and can call upon authority and resources to fulfill the role. The steward works with the data set experts to compile new metadata records and maintain existing ones, and when data set experts change roles or leave the organization, ensures that a replacement is identified. Finally, the steward checks periodically whether the ISO standard used by the organization has been updated. The data set expert is responsible for providing the metadata steward with sufficient information to create and maintain a well-formed metadata record.

CONCLUSION

The whole geoscience community would benefit if high-quality metadata were available. It would enable the rapid discovery, accurate assessment, and easier management of information assets. Its absence adds considerable hidden costs to the whole sector in terms of lost time and lost opportunities. Metadata have a poor reputation as being ineffectual and expensive. This reputation is rightly deserved, because many organizations fail to put the management framework in place to ensure that well-formed metadata records are created, maintained, and disseminated. However, metadata are a powerful tool that can provide a wide range of benefits to organizations that have the discipline to manage metadata activity effectively.
One way to improve the situation is to recognize and criticize existing published records that do not meet the needs of the users. Take time to e-mail the record owner and explain metadata shortfalls or why the records are inaccurate or out of date. Use every opportunity to peer-review existing metadata records.

REFERENCES CITED

Directive 2003/4/EC, 2003, On public access to environmental information: Official Journal of the European Union, L&C, v. L41, p. 26.
Directive 2007/2/EC, 2007, On establishing an Infrastructure for Spatial Information in the European Community (INSPIRE): Official Journal of the European Union, L&C, v. L108, p. 1.
Global Spatial Data Infrastructure Association (GSDI), 2004, Developing Spatial Data Infrastructures: The SDI Cookbook, Version 2.0: Global Spatial Data Infrastructure Association, www.gsdi.org/docs2004/Cookbook/cookbookV2.0.pdf (accessed 20 April 2011).
IGGI (Intragovernmental Group on Geographic Information) Working Group on Metadata Implementation, 2004, The Principles of Good Metadata Management (2nd ed.): London, Office of the Deputy Prime Minister, 30 p.
International Organization for Standards (ISO), 2003, ISO 19115:2003, Geographic Information—Metadata: Geneva, Switzerland, International Organization for Standards, 140 p.
International Organization for Standards (ISO), 2007, ISO/TS 19139:2007, Geographic Information—Metadata: XML Schema Implementation: Geneva, Switzerland, International Organization for Standards, 111 p.
International Organization for Standards (ISO), 2009, ISO 15836:2009, Information and Documentation: The Dublin Core Metadata Element Set: Geneva, Switzerland, International Organization for Standards, 5 p.
Michener, W.K., Brunt, J.W., Helly, J., Kirchner, T.B., and Stafford, S.G., 1997, Non-geospatial metadata for the ecological sciences: Ecological Applications, v. 7, p. 330–342, doi:10.1890/1051-0761(1997)007[0330:NMFTES]2.0.CO;2.
Peebler, R., 1996, Extended integration: The key to future productivity leap: Oil & Gas Journal, v. 94, no. 21, p. 57–61.
Wood, R., and Curtis, A., 2004, Geological prior information and its application to geoscientific problems, in Curtis, A., and Wood, R., eds., Geological Prior Information: Informing Science and Engineering: Geological Society of London Special Publication 239, p. 1–14.

MANUSCRIPT ACCEPTED BY THE SOCIETY 17 FEBRUARY 2011
Printed in the USA
The Geological Society of America Special Paper 482 2011
Geoscience data and derived spatial information: Societal impacts and benefits, and relevance to geological surveys and agencies R.A. Hughes British Geological Survey, Sir Kingsley Dunham Centre, Keyworth, Nottingham NG12 5GG, UK
ABSTRACT Low levels of geospatial literacy and geoscientific understanding mean that basic geological map data are meaningful to, and can therefore be interpreted by, a remarkably small number of people with specialist knowledge and training. Nevertheless, geological maps continue to underpin the exploration, exploitation, and management of natural resources such as fossil fuels, minerals, and groundwater. Geological maps can, however, be the essential basis for derived, spatial geoscience information with which complex science relating to societally relevant issues such as geohazards can be communicated meaningfully to the layperson. Such derived spatial geoscience information offers opportunities for geological surveys and agencies to demonstrate societal relevance by creating social and economic benefits. Production and delivery of such information from complex geoscientific data should therefore be central to the mission of geological surveys and agencies. This pathway is traced from data to information and knowledge of use in decision making. Societal benefits and impacts are described and quantified using case studies and independent economic impact analysis data.
WHO USES GEOLOGICAL MAPS?
Geospatial Literacy

The ability to understand even simple topographical mapping requires of the user knowledge of fundamental concepts such as scale, azimuthal orientation, elevation, contours, and map symbology. Nevertheless, the proportion of the population able to use such topographical mapping is on the rise, at least in the UK. This is due to many factors, including the ready availability of online mapping from established providers, the increasingly routine availability and use of global positioning system (GPS) satellite positioning and navigation systems, and new initiatives enabled by these technologies (e.g., OpenStreetMap [www.openstreetmap.org] and OpenCycleMap [www.opencyclemap.org]). Innovative outreach programs from state topographical mapping agencies such as the UK's Ordnance Survey (OS) also play a part. The OS, for example, provides paper topographical maps free of charge to secondary school pupils (the 11–16 age group) in the UK. With links into the school curriculum, it is clear that this initiative is increasing the level of understanding of topographical mapping by children in the UK from all socioeconomic backgrounds.
Hughes, R.A., 2011, Geoscience data and derived spatial information: Societal impacts and benefits, and relevance to geological surveys and agencies, in Sinha, A.K., Arctur, D., Jackson, I., and Gundersen, L., eds., Societal Challenges and Geoinformatics: Geological Society of America Special Paper 482, p. 35–40, doi:10.1130/2011.2482(04). For permission to copy, contact
[email protected]. © 2011 The Geological Society of America. All rights reserved.
Geoscientific Literacy

To interpret a geological map, not only do the fundamentals of scale, azimuthal orientation, and elevation need to be understood, but there must also be a familiarity with highly specialized concepts including stratigraphy and stratigraphical relationships, geological structure, and complex symbology, and an ability to visualize structural, stratigraphic, and lithodemic relationships in three dimensions. So, what proportion of a population has the necessary skills and knowledge to both produce and interpret geological map data to a professional standard? In trying to answer this question, it is useful to consider a few national examples, and it is necessary also to accept the basic assumption that these types of interpretations (and productions) require graduate-level geoscience knowledge and skills. Natural resource exploration and production represent a major contribution to the Canadian economy. Between May 2008 and May 2009, oil, gas, and mineral production were valued at CN$49.9 billion, or 4.2% of gross domestic product (GDP) (Statistics Canada [www.statcan.gc.ca]), while during the same period, exploration investment was estimated at CN$2.8 billion (The Northern Miner, 2009). Canada's population is ~33 million, and in 2008, Canadian universities produced around 1200 geoscience graduates. In 2008, therefore, a paltry additional 0.004% of the Canadian population developed the expertise to interpret a geological map to acceptable professional standards. In the UK, which has an economy that is less dependent upon resource exploration and production, but to which geoscience is equally important for other environmental management reasons, ~1300 earth sciences (geology, environmental sciences, etc.) students graduated in 2008. With a national population of ~60 million, this means that in 2008, an extra 0.002% of the UK population developed the expertise to interpret a geological map to professional standards.
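The crude percentages quoted above follow from simple division; a quick check (graduate counts from the text, populations approximate):

```python
# Back-of-envelope check of the graduate-to-population ratios quoted above.
def share_pct(graduates, population):
    """Percentage of a population added as geoscience graduates in one year."""
    return 100 * graduates / population

canada = share_pct(1200, 33e6)  # Canada, 2008
uk = share_pct(1300, 60e6)      # UK, 2008
print(f"Canada: {canada:.4f}%  UK: {uk:.4f}%")
# Canada: 0.0036%  UK: 0.0022%
```

Rounded to one significant figure, these reproduce the ~0.004% and ~0.002% cited in the text.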
Whatever the errors in these crude figures, it is beyond dispute that the proportions of the populations of these countries able to interpret geological maps are extremely low. A further relevant trend lies in the numbers of geoscience graduates who do not pursue careers in geoscience, and whose essential skills in producing and interpreting geological maps are effectively lost to their national economies. The American Geological Institute (2009), for example, reported that less than 13% of the ~6000 new U.S. geoscience bachelor's majors graduating in the fall of 2008 would ever work professionally in geoscience. Similarly (but slightly less pessimistically), of the ~1300 earth science graduates produced annually in the UK, only half will find their first employment in the earth sciences. We can conclude, therefore, that although levels of topographical mapping literacy and use are increasing (at least in the UK), the proportion of the population of major nations that can produce and interpret basic geological mapping to professional standards is extraordinarily small. Furthermore, in the cases of the UK and United States in 2008, at least half (and in the United
States significantly more) of those graduating from universities with the appropriate skills are lost to the profession, either through deliberate career choice decisions or because of the unavailability of geoscience employment. The taxes and royalty contributions of the extractive industries to national economies can be huge. Since these industries are reliant on basic geological map data, it follows that the underpinning contribution of such data to the development and productivity of resource-dependent economies is also very great. However, taking into account the extremely low levels of geospatial and geoscientific literacy described here, it is evident that basic geological map data are largely meaningless to the overwhelming proportion of the population, who lack the skills and knowledge required to interpret such data. There are, however, very real and quantifiable social and economic impacts and benefits in high-resolution spatial geoscience data sets that can be derived from basic geological map data.

DERIVED SPATIAL GEOSCIENCE INFORMATION

Geological surveys and environmental agencies in tectonically and volcanically active parts of the globe have for many years been using primary geological mapping as the baseline data set from which to derive geohazard maps for use in civil planning, emergency planning and response, and engineering design. The need to understand and mitigate the potential hazards of earthquakes in particular has focused the efforts of agencies on producing geospatial information that can be used by nonspecialists. Examples include seismicity, ground shaking, induced landslide potential, and both probabilistic and deterministic liquefaction potential maps developed specifically for urban planning and development, disaster mitigation, and response planning (see, for example, U.S. Geological Survey, 2009).
While tsunami hazard potential maps for some parts of the Pacific coast of North America have existed for many years, one consequence of the devastating Indian Ocean tsunami of 26 December 2004 was the focusing of research efforts on mapping zones of tsunami hazard along previously poorly understood populous coastlines. As a result, tsunami hazard maps and related information (at least at low resolutions) now exist for most of the world's populous coastlines, including those areas known to be at high risk such as Japan (e.g., Government of Japan Cabinet Office, 2004) and those areas where tsunami hazard potential is generally regarded as low (e.g., Schmidt-Thomé, 2006). Similarly, pyroclastic flow and mud-flow hazard potential maps are available for many of the world's volcanoes that are located in populous areas (see, for example, Orsi et al., 2004). While large parts of Earth's surface are seismically relatively inactive and distant from active volcanic sources, they may still be vulnerable to more insidious but nevertheless potentially damaging geological hazards. As custodians of unique geoscience data holdings and owners of sometimes unique in-house expertise, there are real opportunities for geological surveys and agencies to demonstrate societal relevance and generate economic benefits
by producing geospatial information characterizing vulnerability to these hazards.

From Data to Information and Knowledge for Decision Making—The Knowledge Transfer Dimension

In 2001, the British Geological Survey (BGS) launched the world's first national digital vectorized and attributed geological map at the 1:50,000 scale (DiGMapGB; see British Geological Survey, 2009a). In 2003, the BGS initiated a program with the specific objective of using the DiGMapGB digital geological map data as the basis from which to produce societally useful, derivative national geohazard potential information. Within this program, the BGS has since 2003 produced, and annually updated, national geohazard potential data sets, including ground stability, natural radon potential, flooding (groundwater, coastal, and riverine), permeability, non–coal mining, and potentially harmful elements in soils. The BGS national ground stability data set (GeoSure; Fig. 1; see also British Geological Survey, 2009b), for example, consists of multilayer geospatial information (i.e., a "geographic information system [GIS] data set") that gives indicative hazard potential values for six shallow geohazards, namely, swelling and shrinking clays, soluble rocks, landslides, compressible rocks, collapsible rocks, and unconsolidated running sands. First launched in 2003, it is now widely used by both the public and commercial sectors in the UK for property-specific searches as a preliminary step to property sale/purchase transactions, and by operators of major infrastructure such as highway and railway networks. GeoSure is also now used by major UK insurers in setting premium charges for buildings insurance. Effective knowledge transfer is integral to the success of such geoscience spatial data sets, enabling them to be used by laypersons with no specialist geoscience knowledge or understanding.
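GeoSure itself is a GIS data set delivered through BGS services, not a programming interface. Purely to illustrate the kind of postcode-keyed, plain-English lookup described above, here is a toy sketch; the layer keys, postcode, and ratings are all invented for illustration:

```python
# Hypothetical sketch of a multilayer hazard lookup. Layer names mirror the
# six shallow geohazards listed in the text; data values are made up.
HAZARD_LAYERS = (
    "shrink_swell_clay", "soluble_rocks", "landslides",
    "compressible_rocks", "collapsible_rocks", "running_sands",
)

# Toy ratings table keyed by postal code; values are indicative
# hazard-potential classes of the kind delivered to lay users.
RATINGS = {
    "NG12 5GG": {
        "shrink_swell_clay": "moderate", "soluble_rocks": "low to nil",
        "landslides": "low to nil", "compressible_rocks": "low to nil",
        "collapsible_rocks": "low to nil", "running_sands": "moderate",
    },
}

def site_report(postcode):
    """Return plain-English hazard classes for one location, or None if unknown."""
    ratings = RATINGS.get(postcode)
    if ratings is None:
        return None
    return {layer: ratings.get(layer, "not assessed") for layer in HAZARD_LAYERS}

print(site_report("NG12 5GG")["shrink_swell_clay"])  # moderate
```

The design point is the one made in the text: the scientific parameters stay behind the interface, and the user sees only a location-specific, plain-English classification.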
The GeoSure landslide hazard potential layer, for example, is produced by multiparameter analysis with expert knowledge and validation (Table 1). However, those parameters, data, and expertise are translated into information expressed in plain English, so that complex science can be used as a basis for decision making by unqualified persons, therefore increasing the reach, value, and impact of the science. Significantly for the layperson, this high-resolution geospatial information can be interrogated on a location- or propertyspecific basis, or by postal code (ZIP code). The BGS offers its own such site-specific reporting information service (GeoReports; British Geological Survey, 2009b), or the user can purchase the same BGS information supplied by one of many private-sector “value-added resellers.” Economic Benefits of Derived Geoscience Spatial Information In 2006, the BGS GeoSure natural ground stability hazard potential information was used as a case study by the economists
at PricewaterhouseCoopers (PwC) in an investigation into the economic benefit of environmental research, commissioned by the BGS's parent body, the Natural Environment Research Council (NERC) (Natural Environment Research Council, 2006). PwC concluded the following:
(1) By using this information, decision makers are empowered to make better-informed decisions, and they can avoid future costs and prevent loss of investment by avoiding or mitigating subsidence incidents.
(2) BGS information on subsidence risk, provided at postal code and household level, is "accurate and relevant to user needs," responsive to climate change impacts, and "meets the needs of the information age."
(3) By using this information, financial and social costs can be avoided by not investing in areas at risk of subsidence, or by taking preemptive action and mitigating subsidence.
(4) Using this information, wider societal benefits can be created, such as avoidance of the stress, injury, and disruption associated with loss of property.
Noting that the annual cost of subsidence to the UK insurance industry is ~£300 million, PwC concluded that use of the BGS ground stability information could save UK insurers between £70 million and £270 million in reduced payouts between 2006 and 2030. The BGS–Health Protection Agency (HPA) natural radon hazard potential information for England and Wales was launched in 2007 (see Miles et al., 2007). This high-resolution spatial data set (see also British Geological Survey, 2007) was produced using a methodology that combines empirical radon measurements with digital geology (Miles and Appleton, 2005), and it can be interrogated on a property- or location-specific basis. Property owners can use the information to find out indicative natural radon potential levels at their properties or locations (see Fig. 2), and to decide on remedial action if necessary.
Similarly, builders and developers of new homes and commercial premises can factor the same information into the designs of new buildings to mitigate the effects of natural radon. Exposure to indoor radon is the largest contributor to the radiation exposure of the population (Miles and Appleton, 2005). It is estimated that about 1100 persons die each year in the UK from lung cancers caused by natural radon (Independent Advisory Group on Ionising Radiation, 2009). The risk increases significantly when radon exposure is combined with other carcinogens such as tobacco smoke. In many parts of the world, natural radon almost certainly causes more deaths than any other environmental hazard. Unlike the BGS ground stability information, the BGS-HPA natural radon potential information has not yet been subject to independent economic impact analysis. However, it seems highly likely that such an analysis would conclude that use of the information could lead to the avoidance of a significant number of potentially fatal lung cancers each year.
Figure 1. (Left) GeoSure landslide hazard potential derivative layer. (Right) GeoSure swell-shrink clay hazard potential derivative layer. Copyright © British Geological Survey (Natural Environment Research Council); used with permission. [Map legends: landslide potential and swell-shrink potential classed as Significant, Moderate, or Low to nil; scale bars 0–100 km.]
TABLE 1. FROM DATA TO INFORMATION: THE BRITISH GEOLOGICAL SURVEY GeoSure LANDSLIDE HAZARD POTENTIAL DATA SET SOURCE DATA AND INFORMATION PROVIDED TO USERS

Data sources and parameters:
• Lithology
• Structure
• Geotechnical properties
• Porosity, permeability
• Groundwater, natural springs
• Digital terrain models
• Slope angle and class
• BGS national landslide database
• Expert and geographic information system validation processes

Information provided for decision making:
"High hazard potential: Slope instability problems almost certainly present and may be active. Significant constraint on land use."
"Low hazard potential: Slope instability problems are not believed to occur, but consideration should be given to potential problems of adjacent areas impacting on the site."
Figure 2. Natural radon hazard potential data set for part of southwest England. Copyright © British Geological Survey (Natural Environment Research Council) and Health Protection Agency; used with permission.
SOCIETALLY RELEVANT GEOSCIENCE DATA AND THE FUTURE OF GEOLOGICAL SURVEYS AND AGENCIES

State budgets for many geological surveys and agencies are under severe downward pressure. While basic geological map data continue to underpin the exploitation and management of natural resources, beyond these essential applications, they offer little value to wider society, in which there are extremely low levels of geoscientific literacy. There are, however, great—and largely untapped—opportunities for all geological surveys and agencies to demonstrate their relevance to governments and their wider populations by generating geoscience information that has broad societal reach and that can yield real and quantifiable social and economic benefits. The UK government has introduced policies (Research Councils UK, 2007) that put a clear expectation upon public-sector research to generate significant increases in the social and economic impacts and benefits of their research programs. It is not known if similar far-sighted policies exist in other countries. However, by virtue of their unique assets of expertise and national data holdings, geological surveys and agencies are extremely well placed to deliver socioeconomic benefits through relevant geoscience information that can be used to support informed decision making across all sectors of society, and so serve the objectives and needs of governments, society, and agencies alike. Looking to the future, geological surveys and agencies should respond to societal challenges by making the provision of such information central to their missions.

ACKNOWLEDGMENTS

Topographical data in the figures are reproduced with the permission of the Ordnance Survey of Great Britain.
REFERENCES CITED

American Geological Institute, 2009, Status of the Geoscience Workforce 2009: http://www.agiweb.org/workforce/reports/2009-StatusReportSummary.pdf (accessed 24 June 2011).
British Geological Survey, 2007, HPA (Health Protection Agency)-BGS Radon Potential Dataset: http://www.bgs.ac.uk/radon/hpa-bgs.html (accessed 24 June 2011).
British Geological Survey, 2009a, DiGMapGB-50 digital geological map of Great Britain: http://www.bgs.ac.uk/products/digitalmaps/digmapgb_50.html (accessed 24 June 2011).
British Geological Survey, 2009b, GeoReports: http://shop.bgs.ac.uk/GeoReports/.
Government of Japan Cabinet Office, 2004, Tsunami and Storm Surge Hazard Manual (English edition): 112 p.
Independent Advisory Group on Ionising Radiation, 2009, Radon and Public Health: Chilton, Oxfordshire, UK, Health Protection Agency, 240 p.
Miles, J.H.C., and Appleton, J.D., 2005, Mapping variation in radon potential both between and within geological units: Journal of Radiological Protection, v. 25, p. 257–276, doi:10.1088/0952-4746/25/3/003.
Miles, J.H.C., Appleton, J.D., Rees, D.M., Green, B.M.R., Adlam, K.A.M., and Myers, A.H., 2007, Indicative Atlas of Radon in England and Wales: Didcot, Oxfordshire, UK, Health Protection Agency, 36 p.
Natural Environment Research Council, 2006, Economic Benefit of Environmental Science: www.nerc.ac.uk/publications/corporate/economic.asp (accessed 24 June 2011).
The Northern Miner, 2009, 1–7 June 2009, v. 95, no. 15, p. 1–2.
Orsi, G., Vito, M., and Isaia, R., 2004, Volcanic hazard assessment at the restless Campi Flegrei caldera: Bulletin of Volcanology, v. 66, p. 514–530, doi:10.1007/s00445-003-0336-4.
Research Councils UK, 2007, Increasing the Economic Impact of the Research Councils: http://www.rcuk.ac.uk/documents/publications/ktactionplan.pdf (accessed 24 June 2011).
Schmidt-Thomé, P., 2006, The Spatial Effects and Management of Natural and Technological Hazards in Europe: European Spatial Planning Observation Network, 197 p.
Statistics Canada, 2009: www.statcan.gc.ca/daily-quotidien/090731/t090731a1-eng.htm.
U.S. Geological Survey, 2009, Reducing Hazards in the Central and Eastern U.S.: Reports and maps: http://earthquake.usgs.gov/regional/ceus/index.php.
MANUSCRIPT ACCEPTED BY THE SOCIETY 17 FEBRUARY 2011
Strategic Sustainability Assessment B. Deal Department of Urban and Regional Planning, University of Illinois, Champaign, Illinois 61820, USA E. Jenicek W. Goran N. Myers Engineer Research and Development Center, Construction Engineering Research Laboratory, Champaign, Illinois 61822, USA J. Fittipaldi U.S. Army Environmental Policy Institute, Arlington, Virginia 22202, USA
ABSTRACT

New strategies for sustainability within the Department of Defense are focused on addressing present and future needs while strengthening community partnerships that improve operational abilities. This "across-the-fence line" strategic thinking requires innovative tools that can engage a broad segment of the community and a variety of military interest groups. These tools must provide a platform for understanding the challenges and realizing the goals of both private- and public-sector interests. They must tangibly represent many different potential futures, their implications, and policies that can help mobilize solutions quickly and easily in a uniform, consistent, and democratic manner. The Strategic Sustainability Assessment (SSA) consists of a series of complementary tools for forecasting and backcasting that provide regional stakeholders a unique perspective on potential sustainable regional policy and investment choices. Forecasting approaches use dynamic spatial modeling techniques to project potential future urban transformations and their implications for the social, environmental, and economic fabric of the region. Backcasting is used to determine critical sets of strategic interventions designed to offset the simulated future impacts. The results of the analysis are managed through the use of a Web-based GeoPortal. This helps to democratize the information by providing it to local stakeholders in a useable and accessible way. The hope is that greater and more direct access to models and the information they generate will help lead to better, more sustainable planning decisions on our military bases and in our communities.
Deal, B., Jenicek, E., Goran, W., Myers, N., and Fittipaldi, J., 2011, Strategic Sustainability Assessment, in Sinha, A.K., Arctur, D., Jackson, I., and Gundersen, L., eds., Societal Challenges and Geoinformatics: Geological Society of America Special Paper 482, p. 41–57, doi:10.1130/2011.2482(05). For permission to copy, contact
[email protected]. © 2011 The Geological Society of America. All rights reserved.
INTRODUCTION

Sustainability has become an important issue of broad public concern. Clean air and water, renewable energy, open space, and pollution prevention are not only essential for improving the livability of our communities, they are imperative for the successful long-term operation of our military installations. However, demographic and lifestyle shifts have increased our communal demand for land and other limited resources. As these demands grow, they place pressures on the infrastructure, resources, and long-term sustainability of our regions and the installations they support. As the embodiment of enormous capital investment in infrastructure, land, and personnel, military installations are critical to local and state economies and to the sustainability of defense, security, and military readiness. Some military installations' economic and environmental contributions to the local community, however, have become overshadowed by perceived incompatibilities, such as noise, dust, resource competition, land use, land values, and land availability. These points of contention arise as the local community expands and available resources become scarcer. Eventually, the installation's benefit to the community may be outweighed by the community's requirement for resources, and the military may be perceived as a barrier to local growth and development. These potentially "unsustainable" installations face a number of risks, including downsizing, realignment, and even closure. The U.S. Army Strategy for the Environment, "Sustain the Mission—Secure the Future," establishes a long-range vision for meeting the Army's mission while addressing issues of regional sustainability (Assistant Secretary of the Army for Installations and Environment, 2004). The foundation of the strategy focuses Army thinking on addressing present and future needs while strengthening community partnerships that can help improve operational abilities.
This across-the-fence line strategic thinking requires innovative geoinformatic planning support tools that can engage a broad segment of the community and various military interest groups. These tools must provide a platform for understanding the challenges and realizing the goals of both private- and public-sector interests. They must tangibly represent many different potential futures, their implications, and policies that can help mobilize solutions quickly and easily in a uniform, consistent, and democratic manner. Such planning support innovations are not only timely and important, but they are also critical if we are to effectively communicate the great challenges to improving the built environment and sustaining our defense infrastructure that lie ahead. Better access to regional growth and other models and the information they generate will improve public participation, feedback, and support in the planning process, ultimately leading to better, more sustainable decisions for our communities, institutions, and publicly supported defense infrastructure. This chapter describes an approach for engaging stakeholders and the broader defense community in regional planning and
sustainability issues through the development of an innovative planning support system (PSS). The PSS was constructed to inform planning decisions as part of the Strategic Sustainability Assessment (SSA) project. The SSA is a long-term project sponsored by the Army Environmental Policy Institute that seeks to provide quantitative and visually accessible information on issues critical to the sustainability of public and private communities. The SSA uses a variety of models, tools, and research techniques to provide strategic analyses for the Army and its surrounding communities in their journey toward increasing sustainability. The product is a series of ongoing, regular studies and reports that focus on specific regions or issues that enable the development of implementation plans and concepts for the Army Strategy for the Environment (ASE). The following briefly describes the role that geoinformatics played in the development of a PSS used in an Army SSA application for the Fort Bragg region of North Carolina. We describe the basis for the system and the complex, large-scale models it deploys, along with a case study example of its application in North Carolina.

DEPARTMENT OF DEFENSE COMMUNITIES, GEOINFORMATIC TECHNOLOGIES, AND PLANNING SUPPORT

Department of Defense Communities

By design, military installations were historically located in geographic regions with plentiful and inexpensive land. These are often areas with soils too poor for agriculture and few competing employment centers. Testing and training ranges are sited far from existing populations in an attempt to ameliorate complaints related to noise and dust. Over time, the civilian population has crept closer to installation boundaries, both in response to the availability of secure employment and to provide services to the service members and their families living on or near the installation.
The previously isolated bases are now, at times, a catalyst for development activity in the surrounding communities (i.e., outside the fence line[1]). These “encroachment” events—one or more of the many pressures that limit the military’s use of land, air, and sea space—can affect an installation’s ability to sustain its operations by seriously restricting, and in some cases shutting down, its ability to train and test soldiers for conflict. U.S. Army installations face additional pressures from both inside and outside the military complex. For example, Army installations must comply with the same environmental regulations as their civilian neighbors across the fence, although in greater depth and with greater penalty for noncompliance. Noncompliance outside of federal lands leads to monetary fines; inside the fence, noncompliance can lead to greater fines and a loss of funding and/or jobs, and it can project an image of the
[1] Outside and inside the fence line are terms used by the military to denote activities that take place “outside” or “inside” military jurisdictional boundaries.
installation as an inconsiderate neighbor or an unresponsive federal entity. In a broad context, installations must also compete with each other for their very existence as the Army of the future trims real estate. Consequently, if the installation has not proactively engaged its neighbors in planning practices, its perceived value to the community might be outweighed by its value as a source of developable land (or other resources) or by its image as an environmental polluter. The Department of Defense’s (DoD) interest in and ability to reach outside its spheres of control has been lacking for several reasons:

1. Resources/priorities—DoD installations prioritize mission-related activities over all other facility and facility-planning activities. They treat factors that impact mission activities, such as those arising from nearby community growth, as secondary issues. Therefore, the funding, personnel, and time needed to engage in community planning processes are typically under-resourced.

2. Data—U.S. military installations have traditionally managed data within their fence line with great acumen, but not beyond. The initial geographic information systems developed by military installations generally contained data only for regions inside their boundary. Other agencies such as the U.S. Department of Agriculture have followed suit by mapping only outside military installation boundaries, creating a hodge-podge of data and making it difficult to find cohesion among approaches and availability.

3. Short-term focus—Military installation commanders have the authority to interact with their local communities, but they frequently rotate assignments, and this rotation, plus the pressing and urgent short-term issues they face, has tended to limit their focus on community planning and coordination. Planning issues typically have a long time line and are often put aside while short-term issues are addressed.

4.
Authority—There are some formal mechanisms for resource sharing and coordinated planning between installations and communities in the DoD (the Office of Economic Adjustment offers programs to facilitate these interactions), but utilization is dependent on local adaptation and limited funding.

Geoinformatic Technologies

The tools needed to plan these complex, DoD-dominated regions within a twenty-first-century society are now on the horizon. With these tools, citizens and stakeholders are no longer bound by an incomprehensible future; they are able to work with tangible representations of many different potential futures, shifting their thinking about the future region from an abstract idea to potential realities. How will our region change if a new investment is made? What types of lands are off limits to urban development? How do the changes affect our schools? Important regional decisions based on data and information about the future can be viewed and shared in a uniform, consistent, and democratic manner. Planners have easy access to this information
and can use it to assemble a set of actions (policies, regulations, investments, or designs) based on public input. Emerging geoinformatic technologies make this new paradigm for planning possible. These technologies have the ability to transform planning from a slow and costly, paper-driven, black-box process of producing reports to a transparent, democratized way of delivering information relevant to policy and investment decisions quickly and easily. Regions will no longer create a plan just for the sake of creating a plan, and the chasm between making and using plans will gradually shrink. Making, using, and evaluating plans will be increasingly consistent across space, time, and functional area (land use, schools, infrastructure, etc.). Decisions can be made on the basis of tangible information available and viewable long into the future by a wide range of constituent groups. These models link physical changes in a community to their relevant consequences, enhancing the individual and collective decision-making processes inside and outside the fence line.

Planning Support Systems

A growing set of planning support tools that utilize geoinformatic technologies has become available in recent years. Brail and Klosterman (2001) outlined the state of the art and described new approaches in PSS development due in some measure to increased computational capabilities and availability of digital data. Geertman and Stillwell (2003) added a review of the role of PSS in planning with the intention of documenting best practices and promoting the use of planning tools. They described the evolution of PSS-like tools and identified spatially explicit decision-support systems as an important subcategory. They discussed tools used in three types of planning: participatory, strategic and land use, and environmental.
Klosterman described categorization of PSS along two dimensions: by the planning task that the model addresses and the technique or approach it utilizes (Brail and Klosterman, 2001). This recent evolution of planning support tools toward dynamic spatial simulation systems contrasts somewhat with earlier work in spatial (and aspatial) reasoning systems (Kim et al., 1990). Knowledge-based reasoning systems were loosely founded on a philosophical ideal of capturing the manner in which expert knowledge is applied to address complex planning problems. These systems were characterized by the use of multiple types of domain knowledge and complex domain models to support reasoning processes. This approach is important in considering the development of PSSs in the context of participatory planning exercises. Active participation of stakeholders is an important component of the planning process. Participatory planning has traditionally been led by public agencies and relies on face-to-face contacts in venues such as town-hall meetings. Information limitations in various forms often restrict the public’s access to or interest in participating in the process, making many such processes “expert”-driven with intermittent public feedback. Technological innovations in policy analysis and computing have started
to break down these barriers in many aspects of governance by providing more information and services through the Web. This technology has created an opportunity to develop applications that extend interactivity by providing analytical tools that allow the public to perform on-the-fly policy evaluations and provide immediate feedback. More specifically, these tools allow users to compare outcomes of policy choices in relation to location, type, and intensity related to spatial growth and development of places (cities, regions, etc.) on the basis of their impacts on various quality-of-life indicators—and their potential implications for adjacent military operations. A regionally based PSS utilizing geoinformatic technologies and planning processes was constructed to inform planning decisions as part of an SSA project, and it was applied to a DoD Army installation–dominated region.

STRATEGIC SUSTAINABILITY ASSESSMENT

The SSA is described as an integrated planning approach that optimizes regional resources and begins to develop sustainable management plans for regions with a strong DoD presence using geoinformatics, dynamic planning support tools, and participatory planning methods. Generally, the SSA initiates a regional journey toward sustainability. Together, regional stakeholders identify assets and resources and define a vision for their future. SSA modeling tools then simulate a range of likely futures that use the current assets and compare the outcomes with the participants’ vision in order to identify the changes needed to achieve common goals. The SSA process does not prescribe a future or finalize a plan for getting there. Rather, it provides information to empower regional decision makers working toward a sustainable future. It is the assembly of data, tools, and plans in a participatory environment that fosters the identification of regional resources and priorities, potential future realities, and emerging coalitions.
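The comparison step just described, in which simulated futures are measured against the participants' vision, can be sketched as a simple indicator gap analysis. All scenario names, indicators, and numbers below are hypothetical illustrations, not values from an actual SSA engagement.

```python
# Hedged sketch: compare simulated scenario outcomes against a shared
# regional vision expressed as indicator targets. All numbers are
# invented for illustration; a real SSA draws them from regional models.

vision_targets = {                    # stakeholder-defined goals (illustrative)
    "forest_acres_lost": 8000,        # lose no more than this
    "water_use_mgal_yr": 55000,       # stay under this
}

scenarios = {                         # simulated outcomes (illustrative)
    "business_as_usual": {"forest_acres_lost": 10533, "water_use_mgal_yr": 60000},
    "low_impact_dev":    {"forest_acres_lost": 9800,  "water_use_mgal_yr": 52000},
}

def gap_report(scenario_outcomes, targets):
    """For each scenario, report how far each indicator overshoots its
    target (0.0 means the target is met)."""
    report = {}
    for name, outcome in scenario_outcomes.items():
        report[name] = {k: max(0.0, outcome[k] - t) for k, t in targets.items()}
    return report

print(gap_report(scenarios, vision_targets))
```

The gaps, not the raw outcomes, are what stakeholders deliberate over: they identify which changes are needed to reach the common goals.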
Necessary tools in forming an SSA engagement include models that characterize current conditions and threats, models that project future conditions and policy scenarios, sustainability impact models, and visualization tools. In any given region, entities will already employ many of these models and tools. They will likely have comprehensive or capital improvement plans that characterize the current condition and evolution of resources. They have likely set future goals and objectives. Specialized impact models are probably used by municipalities to monitor transportation, by school districts to monitor enrollments, and by utility providers to monitor water quantity and quality. The strength (and transferability) of the SSA is in bringing these tools together with supplemental resources to fill in the gaps to shape geoinformatic, dynamic, participatory planning support.

Initiating a Set of Tools and Resources

The SSA pilot study initiated a toolbox of resources to assist regions. These tools included the Sustainable Installations
Regional Resource Assessment (SIRRA), Land-Use Change Assessment (LUCA), and land-use projection, impact assessment, and GeoPortal visualization tools of the Land-Use Evolution and Impact Assessment Model (LEAM). These tools complement common regional efforts. SIRRA and LUCA are DoD-developed tools for comparing stresses and investments across regions. Each uniquely informs national and broad regional stakeholders about the ways in which a local situation fits within a larger context. LEAM is a privately developed land-use simulation approach that is growing in popularity with DoD installations for its useful role in local public policy and planning deliberations (Deal and Pallathucheril, 2007). SIRRA is a screening tool for assessing relative vulnerability in 10 sustainability issue areas: (1) air quality, (2) airspace, (3) energy, (4) urban development, (5) threatened and endangered species, (6) locational sustainability, (7) water, (8) economic issues, (9) quality of life, and (10) transportation. The results of SIRRA analyses are used to identify regions and sustainability issues that require further study using additional data sources. SIRRA was developed under the Strategic Environmental Research and Development Program (SERDP) and was recognized as the 2006 SERDP Project of the Year. SIRRA, which provided auditable data for the Army stationing analysis for Base Realignment and Closure (BRAC) 2005, has been used to evaluate an existing installation’s ability to absorb additional forces and a region’s capability of supporting a new installation. It is also continually used at installation sustainability planning workshops. Incorporation of SIRRA into the SSA brings regional awareness to individual stakeholders. The assessment provides valuable screenings for which additional studies, planning, and actions may be recommended to ensure continued viability (Jenicek et al., 2004). 
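SIRRA's screening role can be illustrated as a simple threshold test over the 10 issue areas listed above. The scores and threshold below are invented for illustration and do not reflect SIRRA's actual scoring methodology.

```python
# Hedged sketch of a SIRRA-like vulnerability screen: each issue area
# carries a 0-1 vulnerability score; areas above a threshold are flagged
# for further study. Scores and threshold are illustrative assumptions.

issue_scores = {
    "air quality": 0.42, "airspace": 0.15, "energy": 0.55,
    "urban development": 0.81, "threatened and endangered species": 0.73,
    "locational sustainability": 0.30, "water": 0.88,
    "economic issues": 0.25, "quality of life": 0.40, "transportation": 0.66,
}

def screen(scores, threshold=0.6):
    """Return issue areas whose vulnerability exceeds the threshold,
    most vulnerable first: candidates for additional study."""
    flagged = [(area, s) for area, s in scores.items() if s > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

for area, score in screen(issue_scores):
    print(f"{area}: {score:.2f}")
```

In an actual assessment, the flagged areas would be examined with additional data sources, as the text notes.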
Also developed under SERDP, LUCA examines local and regional land-use trends and impacts on military installation missions. It can provide technologies and data to help installations and units proactively plan to protect the mission sustainability of DoD’s current and future capabilities (Lozar et al., 2003). A LUCA analysis shows a series of landscape changes over 30 years and covers a variety of regional scales. Trends are drawn from the analysis of historic land-use and land cover maps, satellite images, and other sources. Several studies have utilized differing data sources to tailor graphical presentations and comparative analyses of changes over time. Land-use change can significantly and permanently affect opportunities to test and train, but the decades-long process of change is easy to overlook in installation planning. LUCA analyses bring the recognition of land-use change trends to the SSA. LEAM—a dynamic urban modeling and simulation tool that forecasts the outcomes of urban investment and policy initiatives—is at the core of the SSA efforts to build an interactive PSS modeling environment. LEAM technology and framework have been described elsewhere in detail (Deal and Pallathucheril, 2007; Deal, 2008) and will only be summarized here. The LEAM framework consists of two major parts: (1) a land-use
change (LUC) model defined by multiple drivers that describe the local causal mechanisms and allow easy addition and removal of variables for playing out alternative scenarios, and (2) impact assessment models that support rapid analysis and interpretation of land-use changes depending on local interest and applicability. In other words, the LEAM framework is intended to help users find answers to the complex questions of “What if?” and “So what?” The LEAM LUC model uses a structured, gridded surface with state-change conditions that evolve over time, similar to other change assessment technologies. The LEAM grid surface is not flat, however, but gains a “hilly” topography on the basis of both biophysical and socioeconomic constraining factors. It incorporates state-change techniques to calculate a probability that represents the potential of each cell to change from one land-use category to another. Unlike other state-change approaches, however, the probability of (state) change is influenced by local interactions (such as the accessibility of the cell to a given attractor), global interactions (the state of the regional economy, for example), and other causal mechanisms, producing suitability scores that contribute to the grid surface relief and affect subsequent allocation. Similar to other large-scale urban models, LEAM works at the regional scale and incorporates regional macro-socioeconomic models combined within the modeling framework to help determine demand for land. Unlike other large-scale models, however, LEAM aggregates to the regional scale from a fine-scale (30 m × 30 m) resolution that includes cell-based micromodels. This architecture enables loosely and tightly coupled linking with other models that may operate at different spatial scales (transportation models, for example) and the capability to quickly link to models that describe the potential implications of the simulated changes.
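The cell-level state-change idea described above can be sketched minimally: each cell's development probability combines a local driver (accessibility to an attractor) with a global driver (regional economic pressure). The exponential-decay form and all coefficients here are assumptions for illustration; LEAM's actual drivers are calibrated per region.

```python
import math

# Hedged sketch of a LEAM-like state-change calculation on a gridded
# surface. Each cell's probability of converting to urban use combines
# a local driver (distance to an attractor such as a highway ramp) with
# a global driver (region-wide demand pressure). The decay form and the
# 1/2000 coefficient are illustrative assumptions, not LEAM's equations.

def development_probability(dist_to_attractor_m, regional_pressure,
                            decay_per_m=1 / 2000):
    """Probability in [0, 1): decays with distance to the attractor and
    scales with regional demand pressure in [0, 1]."""
    local = math.exp(-decay_per_m * dist_to_attractor_m)
    return local * regional_pressure

# 3x3 grid of distances (metres) to the nearest attractor (illustrative)
grid = [[500, 1500, 4000],
        [200, 1000, 3500],
        [800, 2500, 6000]]
pressure = 0.5   # a strong regional economy pushes conversion

probs = [[development_probability(d, pressure) for d in row] for row in grid]
```

Allocation would then convert the highest-probability cells first until the regional demand target (supplied by LEAMecon, described next) is met.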
One submodel, LEAMecon, is an econometric, input-output model that determines the regional demand for residential, commercial, and open-space land. Unlike other approaches that use regional constraints on demand to determine spatial allocation (see Wu and Martin, 2002), in the LEAM approach, households and jobs are established by LEAMecon and converted into a demand for land using sector-based economic and demographic analyses that are quantitative, defensible, and allow for economically based what-if scenario questions to be tested. Within each scenario tested, the estimated demand for land serves as a target for regional land allocation. Market variables increase or decrease development rates on the basis of how well the regional demand targets are met (or not met).

SSA Engagement

Once model simulations are established, scenario descriptions of alternative land-use policies, investment decisions, growth trends, and unexpected events (among others) can be simulated, analyzed, and compared for regional importance. Simulated outcomes are described in graph, chart, text, and map form and are used in engaging in local dialogue and analyzing the potential
implications of the changes described (Deal and Schunk, 2004). The assessment of probable impacts is important for understanding the “so what” of scenario simulations. A visual and quantitative representation of each scenario’s outcome provides both an intuitive means of understanding and a basis for analyzing the implications of potential decisions. These representations act as a catalyst for discussion and communal decision making. The importance of effective visualization devices for dealing with dynamic spatial data sets has long been recognized, especially in the field of natural resource research (Ekstrom, 1984; Rosenfeld and Kak, 1982). Natural resource scientists have been using visualization tools to better understand their science, while social scientists have sought to better understand human behaviors vis-à-vis those resources (Cox, 1990; Malm et al., 1981). While the case for supporting visualization at detailed as well as regional scales was made over a decade ago, not much progress has been made in that regard (Orland, 1992). Geographic information system (GIS) software has boosted dynamic spatial models in that it eases management and manipulation of the large amounts of spatial data that go into them. At the same time, GIS software may limit our ability to effectively deal with the data sets produced because the software does not extend beyond conventional cartographic representations. As a result, the visualization devices used on these data sets support only very basic inferences across space. Even “map movies” (animated maps) have limited value beyond some very preliminary inferences, since each frame is displayed for a relatively short period, and detailed comparisons cannot be made across time. Budthimedhee et al. (2002) rendered some key insights into the characteristics of visualization devices that can effectively and efficiently support inferences from dynamic spatial data sets. 
First, because the speed at which inferences can be made is critically important with large data sets, they draw on the idea that we must pay attention to the ease and accuracy with which the preattentive visual system can assess relative magnitudes. Second, because of the amount of data needed to make inferences, they draw on the idea that graphic attributes of a visualization device may be more important than its efficiency in using ink to represent the data (as is the conventional wisdom). Third, they note that the more closely pieces of information must be compared in order to make inferences, the more proximate those pieces need to be when visualized (Wickens, 1992). These ideas were critical in the development of the SSA Web-based GeoPortal (described later herein). The SSA GeoPortal development process explored the potential for building visualization devices for multiscale dynamic spatial data sets by focusing on the elementary perceptual task, number and type of graphic attributes involved, and proximity of compatible components.

Fort Bragg SSA Example

At the time of the SSA project, the Fort Bragg region already had sustainability planning efforts and regional partnerships
under way, although for reasons described previously, they had been tenuous and cumbersome. Fort Bragg was one of the pioneering Army installations for the (inside-the-fence) Installation Sustainability Planning process, which was begun there in 2003. A nonprofit entity, the Sustainable Sandhills group, emerged from that process as a regional sustainability entity and a potential catalyst for outside-the-fence interactions. In addition, Joint Land-Use Studies had been completed in the region, and a BRAC Regional Task Force had begun to analyze the potential implications of significant increases in the number of soldiers housed at Fort Bragg. Building on these efforts, participants in the Fort Bragg SSA pilot study began by identifying current assets and resources and defining a vision for their future.

Building the SSA Framework

Existing Conditions

The Fort Bragg region is undergoing significant environmental, social, economic, and mission changes. In addition to major Army initiatives—Army to Modular Force Conversion and “Grow the Army”—the BRAC Commission recommended the realignment of U.S. Army Forces Command and U.S. Army Reserve Command to Pope Air Force Base and Fort Bragg. Together, these activities are expected to bring tens of thousands of new residents to the region by 2013, an influx exacerbated by underlying population growth. Within the Fort Bragg region, Fayetteville grew 59.9% between 1990 and 2000, a time frame in which the overall U.S. population grew just over 13%. Within a 5-mile buffer of the Fort Bragg fence line, urban areas grew 22% between 1992 and 2001, compared with an Army average
of 26% across the 98 Army installations analyzed (Lozar et al., 2005).

Economic Drivers

A LEAMecon model was developed for the Fort Bragg–Fayetteville region to forecast changes in output, employment, population, and income over time on the basis of changes in the market, technology, productivity, and other exogenous factors. The core model consists of nine economic sectors and nine components of final demand. The output from each sector is consumed by other sectors (interindustry flows) and by components of final demand (which characterizes value added in the economy). Various shocks, such as investments in specific sectors, increases in public spending, or changes in household consumption, can be applied to the regional economic system. The employment model tracks changes in productivity over time to determine regional employment levels. The total population is subdivided into different age cohorts, each of which has a specific role to play in regional land-use change. The resulting economic trend is used as an input to a dynamic housing market simulation, which then feeds into LEAM as residential land-use change. LEAMecon forecasts employment in the Fort Bragg region to increase by ~216,000 to ~1.7 million jobs by 2030. It forecasts a corresponding population increase of ~343,000 to a total regional population of 2.8 million people over the same time period. Figure 1 describes by-sector changes to 2030. It shows a sharp increase in service sector jobs and a slight decrease in manufacturing jobs over the next 20 years, corresponding roughly with national trends. Government jobs and retail trade employment figures are also expected to increase significantly in this region, generally due to the DoD presence. The financial, insurance, and
Figure 1. Changes in economic structure in the Fort Bragg region of North Carolina to 2030.
real estate sectors (FIRE) are also expected to increase, but at a slower rate.

Land-Use Changes

Several scenarios of potential land-use change were tested as a result of stakeholder discussions:

Business-as-usual (BAU). The BAU or reference scenario represents an estimate of the spatial extents of future regional development should current trends continue. Figure 2 shows that residential development in the BAU scenario occurs overwhelmingly in Cumberland County, and more specifically in and around Fayetteville and along the southeastern boundary of Fort Bragg. Conversion of more than 10,000 acres (4047 ha) of forest, 5000 acres (2023 ha) of grassland, and almost 6000 acres (2428 ha) of agricultural land will provide the
space for new residential development. Harnett, Lee, and Moore Counties are expected to lose the highest proportion of agricultural land to development by 2035.

Low-impact development (LID). This scenario illustrates the generalized benefits of encouraging various LID policies and their potential impacts on the region. Conversion of urban open space slows overall, resulting from policies aimed at maintaining natural or predevelopment hydrology and infiltration rates. This scenario shows little difference from the baseline in land use, but it has much broader, positive implications when water quality and quantity impact models are applied.

Transit-oriented development (TOD). Results from the TOD scenario indicate that almost 72% of new residential development will occur in Cumberland County (61% occurs in Cumberland in the BAU scenario). Interestingly, a greater amount
BAU land-use change in acres by county for the Fort Bragg region (2001–2035)

County       Water   Residential   Commercial   Barren   Forest    Grasslands   Agricultural   Urban open space   Wetlands
Cumberland    –6       22,399         399         –33     –10,533    –4964         –5882             –551            –20
Harnett       –4        9614           79          –4      –2415     –2155         –4733             –161             –8
Hoke           0        1777          120          –1       –599      –412          –788              –52             –1
Lee           –5        6096          367         –46      –2903      –797         –2442             –185             –1
Montgomery     0        1322           77          –2       –622      –204          –458              –61             –1
Moore         –5        8900          533           0      –4915     –1581         –2295             –381             –6
Richmond      –2        5432          220        –118      –2139     –1205         –1854             –144             –2
Scotland      –1        3529          146           0      –1202      –691         –1565              –90             –1

Figure 2. Business-as-usual (BAU) growth scenario for the Fort Bragg region.
of natural areas would be converted to urban land uses in the TOD scenario than in either the BAU or LID scenarios.

Closure of Bragg Boulevard. This scenario presents a possible land-use outcome of closing Bragg Boulevard to through traffic. In this scenario, conversion to urban land uses from natural and agricultural lands continues to roughly the same degree as in the BAU scenario.

Regional zoning. This scenario uses the normalized regional zoning map produced by the Sustainable Sandhills project, showing development patterns if the current zoning were strictly enforced. In Cumberland County, conversion to residential land use remains high, but commercial land use increases by 1.4% above the Closure of Bragg Boulevard scenario. Conversion of natural and agricultural lands is similar for all scenarios: ~40% forest, 18% grassland, 35% agricultural, 3% urban open space, and the remainder, ~2%, split between water and wetlands.

Typically, the results of a 30 to 50 yr run are evaluated with a summary map that indicates where new development is projected in a region during this period. Summary maps can also be developed that zoom in on a particular part of a region to assist in local planning efforts. Results are also summarized in spreadsheets and graphs that indicate growth over time in the region and the land uses that decline as urbanization increases.

Assessing the Implications of SSA Scenarios

Housing

Households within the Fort Bragg region grew by over 122,000 between 1970 and 2000 and will likely increase by 210,000 by 2035. A primary driver for this growth is the presence of Fort Bragg and several other proximal DoD installations and the BRAC process. As part of that process, Fort Bragg and Pope Air Force Base are set to receive 62,775 active-duty military and civilian personnel by 2013 (U.S. Department of Defense, 2005). This does not include the soldiers’ families and the supporting services that inevitably follow.
Adding to housing pressures, key demographic shifts are taking place on a regional and national scale, such as falling household size, single-worker households, an aging population, increasing divorce rates, and growth in foreign-born populations. As previously noted, Cumberland County will likely experience the largest growth in households (Fig. 3). Over 50% of the additional 120,608 households expected between 2000 and 2035 will likely be located near the city of Fayetteville, as inbound residents are likely to reside near existing population centers. Harnett County has a rural character and is attractive for development due to its convenient location relative to both Fort Bragg and the North Carolina Research Triangle. Households in Harnett will likely migrate to the southwest and northern areas of the county near the communities of Sprout Springs, Lillington, Erwin, and Dunn. The majority of residents enjoy unincorporated, large-lot living, so zoning will likely limit residential development. Moore County is attractive as a resort
Figure 3. Potential household change by county in the Fort Bragg region to 2035.
and retirement community; its 43 golf courses are especially valued. The communities of Southern Pines, Pinehurst, and Carthage have a distinctive character that the residents wish to maintain. Zoning again will be an issue for residential development in Moore County. New sprawling residential growth patterns in the region can have significant environmental consequences. On average, densities in the region are decreasing, and total developed land is projected to increase 64% (50,000 acres or 20,234 ha) by 2035, causing a loss of 3.25% of existing grasslands, 2.88% of current agriculture land uses, and 1.99% of existing forested areas. This translates into a loss of more than 15,000 acres (6070 ha) of agriculture and 33,000 acres (13,355 ha) of plant and animal habitat. The region is home to several unique plant and animal species—particularly, the endangered red-cockaded woodpecker—and this loss of habitat could threaten the survival of this species. With the in-migration of soldiers comes an increase in housing demand, which eventually translates to rising housing costs. Historically, home prices in the Fall Line region have been favorably low—93% of current households spend less than 30% of their household income on housing rents or mortgages (U.S. Census Bureau, 1990). Moreover, current fair market rents are affordable to the majority of pay grades for inbound military personnel. New construction (2007) costs an average of $138,000, which requires a minimum income of $42,720, given current lending practices.[2] This is just below a warrant officer (W-4) pay grade. The current median household income in the Fall Line region is $42,000—ideally affording a $137,000 home. Housing affordability is a key indicator of economic health and of a community’s mobility. When affordable housing is
[2] Loan amount at 80% loan-to-value, a 6.61% effective interest rate, and income required to qualify at a 25% qualifying ratio.
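The qualifying-income arithmetic in footnote 2 can be sketched with the standard fixed-rate amortization formula. The 30-year term below is an assumption the footnote does not state, so the result approximates rather than reproduces the published $42,720 figure, which may also fold in taxes, insurance, or other costs.

```python
# Hedged sketch of the footnote-2 affordability arithmetic using the
# standard fixed-rate amortization formula. ASSUMPTION: a 30-year term
# (not stated in the text), so this approximates rather than reproduces
# the published qualifying-income figure.

def monthly_payment(loan, annual_rate, years=30):
    """Level monthly payment on a fully amortizing fixed-rate loan."""
    r = annual_rate / 12                 # monthly interest rate
    n = years * 12                       # number of payments
    return loan * r / (1 - (1 + r) ** -n)

price = 138_000                          # average new-construction cost (2007)
loan = price * 0.80                      # 80% loan-to-value
payment = monthly_payment(loan, 0.0661)  # 6.61% effective interest rate
required_income = payment / 0.25 * 12    # 25% qualifying ratio, annualized
```

Under these assumptions the principal-and-interest payment is roughly $706 a month; folding escrowed taxes and insurance into the qualifying payment would raise the required income toward the figure cited in the text.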
unavailable to low-income households, family resources needed for food, medical and dental care, and other necessities are diverted to housing costs. Residential instability results as families are forced to move frequently, live with other families in overcrowded conditions, or experience periods of homelessness. Moreover, when home prices are high, development is pushed outward, and sprawl increases. Finally, economic diversity, income distribution, and social integration are all connected to this indicator as well.

Housing-Job Balance

The Fort Bragg region expects 39% of its inbound residents to be employed in the service sector. Overall, an estimated 295,735 of year 2035 total households (65%) are expected to have a working-wage income, and the remaining 158,703 will likely have higher professional wages. A large share of working-wage residents earn ~$24,000 annually. This means that developments in the region should feature houses in the $100,000 to $160,000 price range (Bureau of Labor Statistics, 2008). A ratio of jobs to households is commonly used to express the concept of jobs-housing balance. The most basic measure divides the number of jobs in an area by the number of households. The recommended target range for jobs-housing ratios is 1.3–1.7 (Ewing et al., 1996). The models suggest that the Fort Bragg region will maintain a jobs-housing ratio of 1.8 through 2035. This suggests that many residents may not need to commute outside of the community for employment purposes. The exception is Harnett County.

LEAM Training Opportunities Model

Considerations of the incompatibilities between military installations and surrounding communities have traditionally been approached by assessing the areas of the region that might be affected by specific military activity.
For example, noise contours can be produced to simulate the spread of training noise over space in the vicinity of a military installation (Fig. 4). This map can be used to identify where potential residential complaints might occur given the training activity studied. This essentially treats the situation as a neighborhood problem; that is, don't move to the noisy places! An alternative approach might view the problem as a military training problem. For example, the same contour might be generated around every new household in the region. As in the previous example, the spatial extent of each contour might represent the probability of complaint. The resulting pattern, however, shows the areas on the installation to which specific training exercises might be limited in order to diffuse complaint probability (Fig. 5). This changes the community-centered problem to a regionally centered one. If the military is too constrained by the encroachment of urban uses, for example, its ability to carry out its mission effectively suffers, and the installation may close, something neither the community nor the military would like to see.
Figure 4. Simulated noise contours of potential training activity on Fort Bragg.
Figure 5. Areas (in yellow) considered to be low-complaint probability zones from artillery training noise.
Water Usage

Water availability is an issue of increasing concern nationally and locally. In the Fort Bragg SSA study, each scenario was analyzed for its potential water-availability implications. We also tested various water policy interventions for their efficacy in reducing regional water consumption and demand. The U.S. Geological Survey report Estimated Water Use in the United States in 2000 was used as the basis for the analysis, with some modification and local improvement (Hutson et al., 2004). Figure 6 shows a sector-by-sector forecast of projected water consumption from 2000 to 2035 in the Fort Bragg region. Consumption is expected to increase from ~48 billion gallons (182 million m3) a year to ~60 billion gallons (227 million m3) a year, a 25% increase. Residential and commercial water consumption are each projected to increase 39%, and agricultural water usage is expected to drop ~4% owing to the loss of agricultural lands to development.
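The consumption figures above are internally consistent, as a quick check shows. The gallon-to-cubic-meter conversion factor is the standard one (1 US gallon is approximately 0.0037854 m3); the usage totals are those quoted in the text.

```python
GAL_TO_M3 = 0.0037854  # cubic meters per US gallon

use_2000_gal = 48e9    # ~48 billion gallons/yr (from the text)
use_2035_gal = 60e9    # ~60 billion gallons/yr (from the text)

pct_increase = (use_2035_gal - use_2000_gal) / use_2000_gal * 100
m3_2000 = use_2000_gal * GAL_TO_M3 / 1e6   # million m3
m3_2035 = use_2035_gal * GAL_TO_M3 / 1e6

print(f"{pct_increase:.0f}% increase")                # 25% increase
print(f"{m3_2000:.0f} -> {m3_2035:.0f} million m3")   # 182 -> 227 million m3
```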
Deal et al.
Water needs can be met in the future by increasing the efficiency of water usage while reducing leaks and waste through cost-effective water-saving technologies, revised economic policies, and appropriate state and local regulations. Strategic intervention initiatives used in the water analysis include a Public System Loss Management Program planned for 2010, a Commercial/Industrial Water Conservation Program in 2012, a Residential Water Conservation Program in 2015, and an Agricultural Water Conservation Program in 2018. Residential rainwater harvesting will also be introduced in 2025, along with commercial/industrial rain water–gray water programs. Water reuse was not considered as an option for this analysis because that would require the development of a separate distribution system. Figure 7 shows the Fort Bragg region’s water usage projection when the various strategic interventions are applied.
Water Quality

LEAM water quality (LEAMwq) is a method for quickly assessing the impacts of urbanization on surface runoff and non-point-source (NPS) pollutant loading, providing a quick screening of those impacts and identifying the need for more advanced modeling. A simple export coefficient modeling approach can predict pollutant loading as a function of the export of pollutants from each source in the study area (Johnes, 1996). LEAMwq integrates LEAM with the Long-Term Hydrologic Impact Assessment (L-THIA) model. L-THIA is a GIS-based export coefficient model developed at Purdue University with the support of the U.S. Environmental Protection Agency. It calculates mean surface runoff and NPS pollutant loading for a given region and period using a daily precipitation series, a land-use map, and a hydrological soil group map. The pollutants
Figure 6. A sector-by-sector forecast of the projected water consumption from 2000 to 2035 in the Fort Bragg region. MGY—million gallons per year.
Figure 7. The Fort Bragg region’s water usage projection when the various strategic interventions are applied. MGY—million gallons per year.
selected for this analysis include total nitrogen, suspended particles, and phosphorus. Average storm-water runoff and average total sediments are expected to increase by ~1.6%, and average total nitrogen is expected to decrease by ~1.7% across the region between 2000 and 2035 (Fig. 8). The decrease in regional nitrogen loading in local streams is related to a general decline in agricultural land uses, since industrial farming practices are the main cause of seasonal nitrogen loading. The simulated rapid increase in storm-water runoff is directly correlated with an increase in total regional impervious surfaces in the form of new development. Roadways, sidewalks, parking lots, rooftops, driveways, patios, and recreational uses such as basketball courts and bike paths all contribute to a regional rise in imperviousness.
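The export coefficient approach cited above (Johnes, 1996) estimates total loading as the sum, over land-use classes, of a per-area export coefficient times the area in that class. The sketch below illustrates the arithmetic only; the coefficients and areas are hypothetical placeholders, not values from LEAMwq or L-THIA.

```python
# Export-coefficient model: total load L = sum_i E_i * A_i, where E_i is
# the export coefficient (kg/ha/yr) for land use i and A_i its area (ha).
# All values below are hypothetical illustrations.
export_coeff_kg_per_ha = {"urban": 5.0, "agriculture": 14.0, "forest": 1.8}
area_ha = {"urban": 12_000, "agriculture": 30_000, "forest": 55_000}

total_load_kg = sum(export_coeff_kg_per_ha[lu] * area_ha[lu] for lu in area_ha)
print(total_load_kg)  # 579000.0
```

Converting agricultural land to urban use swaps a high coefficient for a lower one, which is the mechanism behind the regional nitrogen decline described above, even as runoff rises with imperviousness.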
Figure 8. Regional nitrogen loading and runoff for the Fort Bragg region. Land-use change scenarios: BAU—business as usual; LID— low-impact development; TOD—transit-oriented development; No Bragg—closure of Bragg Boulevard; Suit All—residential, commercial, and industrial sustainability; Suit Nat—natural area sustainability; RegZone—regional zoning.
Energy

A military installation is just one of the regional users of energy. If the installation is large and the regional population is small, it may be both the largest single user and the dominant user. In the past five decades, growth around installations has been significant, and although an installation may be the largest single user of energy, it may not account for a large part of the aggregate energy demand in the region. The energy consumption patterns of the installation and its energy reduction program may not be significant compared with regional energy demands and general consumption patterns. Therefore, from an energy perspective, the sustainability of an installation in the regional context depends on the region as a whole moving, along with the installation, to a more efficient consumption pattern and a more resilient energy supply mix.

The energy gap is the difference between projected energy use and energy use with sustainable programs and goals in place and met. The goals of the recent initiative in the Sandhills for the energy team are to "promote energy efficiency and conservation, renewable energy utilization, and sustainable building design." Taken to their full extent, these would set a goal for regional fuels to be of domestic origin and have a high renewable content. The goal for 2035 could be ~30% renewable. A further goal would be to reduce the usage of natural gas and petroleum by the percentage currently imported: ~18% of natural gas and ~70% of petroleum are imported. Reducing petroleum usage by 70% may not be attainable and is largely outside the regional government's control, but it should be attempted by encouraging alternatives to automobile usage. This was not included in the model, although improvements in corporate average fuel economy standards were factored in. Figure 9 shows the expected increases in energy consumption related to increased population and related land development.
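The energy gap defined above is a simple difference between two projections. The sketch below makes the definition concrete; all quantities (and the trillion-Btu unit) are hypothetical, since the chapter reports its projections only in figures.

```python
# Energy gap = projected energy use minus energy use with sustainability
# programs in place and met. All numbers are hypothetical illustrations.
projected_use_tbtu = 120.0   # business-as-usual projection for 2035
with_programs_tbtu = 85.0    # projection with interventions met

energy_gap = projected_use_tbtu - with_programs_tbtu
renewable_share = 0.30       # ~30% renewable goal for 2035 (from the text)
renewable_tbtu = with_programs_tbtu * renewable_share
print(energy_gap, round(renewable_tbtu, 1))  # 35.0 25.5
```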
The significant growth over time shows the imperative to infuse better technologies. Figure 10 illustrates the potential effects of strategic policy interventions within the energy sector, consisting of efficiency improvements for new and existing buildings, a transition to zero-net-energy buildings in the residential and commercial sectors, an efficiency program for the agricultural sector, and an emphasis on efficiency and a transition to hybrid automobiles in the transportation sector.

Emissions

Criteria air pollutant emissions stem predominantly from energy usage. Carbon dioxide emissions come from energy usage and other sources such as deforestation, land-use change, and agriculture. The military installation is just one of the regional users of energy and one of the generators of air emissions. If the installation is large and the regional population is small, the installation may be both the largest single energy user and the dominant emitter. Historically, this has been the pattern, but in the past five decades, growth around installations has been significant. Although an installation may still be the largest single user of energy and producer of emissions, it may not account for a large part of the aggregate emissions in the region. This is especially
true when large urban centers and industrial areas are near the installation. Therefore, the emission patterns of the installation operations may not be significant compared with regional emissions due to energy demands and general consumption patterns. From an overall emissions and energy perspective, the sustainability of an installation in the regional context depends on the region as a whole moving, along with the installation, to a more benign emissions footprint on the basis of efficient energy use and consumption patterns and the use of low-carbon energy sources. Table 1 provides the air emissions projections based on the modeled energy projections, including the strategic policy interventions defined in the energy section.

Infrastructure Costs

Infrastructure demand and costs (transportation, utilities, other urban infrastructure, and air pollution) were also estimated
as part of the SSA analysis. This involved updating, improving, and implementing the Social Costs of Alternative Land Development System (SCALDS), originally developed for the U.S. Department of Transportation, Federal Highway Administration, by Parsons Brinckerhoff Quade and Douglas, Inc., of Portland, Oregon (Delucchi, 1998). The SCALDS model has three general calculation paths. The physical development path models the consumption of land, projected mixture of new housing units, local infrastructure cost, and annual operating costs of sewer, water, and storm-water services. Projections of the average amount of nonresidential building space needed to support new development are also possible. The total travel cost path models the annual operating cost of peak and nonpeak travel on a passenger-miles-traveled basis. The third path models the air pollution produced by transport mode, the energy consumed by transportation, and the energy
Figure 9. Fort Bragg region energy projection.
Figure 10. Fort Bragg region energy projection with policy interventions. inc—including, wd— wood, lpg—liquid petroleum gas, kero—kerosene.
consumed by residential land use in nondollar units. The residential energy consumption contains a factor that approximates nonresidential energy consumption. This path also estimates the cost of the energy consumed by transport and residential land use. The final outputs from the model are regional projections of the demand and costs for transportation, local infrastructure, and residential and commercial energy over the next 30 yr.

Infrastructure costs were analyzed for three scenarios for the Fort Bragg region: BAU, LID, and TOD. The results indicate a potential net savings of $751,208,503 in local infrastructure costs between 2000 and 2035 should the region follow LID strategies (Table 2). Most of the savings would come from nontransportation-related energy savings and from reducing the need for engineered sanitary and storm-sewer infrastructure.

PULLING IT TOGETHER

LEAM GeoPortals

Our SSA PSS (the SSA GeoPortal) uses an open-source spatial data viewer (MapViewer) in conjunction with a Web-based map service (Google Maps API [application programming interface]) to render and view map images, impact analyses, and other data. Users can easily pan, zoom, and move around the information. Using the Google Maps API also enables the user to view satellite images of the data (Figs. 11 and 12). Because of the ability to zoom in closely, users can get a real sense of the ways in which land-use changes might affect their community. They can place themselves in the image and locate issues of primary importance to them relatively quickly and easily.

Gaining access to the visual data, however, is not enough to effectively influence planning decisions. A critical step involves the system from which the information is both derived and managed. An ideal PSS conveys the complexity of regional dynamics in a setting accessible to those of elementary technological savvy. Our experience suggests that the entire PSS architecture must reside in a simple and easy-to-use content management system that requires no specialized skills to upload, create, manage, or view information. Our GeoPortal-based PSS is built on the open-source content management system Plone (http://www.plone.org). We customize Plone for the SSA application through the design and implementation of Plone objects for storing aspatial data, plans, LEAM scenarios, and other spatial data. This system, although still in the exploratory stage, allows content of different types (from text to images to GIS layers) to be managed in a consistent and uniform manner, effectively simplifying the reporting and analysis process. Full-text searches (see Discussion about ISoP

TABLE 1. FORT BRAGG REGIONAL AIR EMISSIONS PROJECTIONS WITH POLICY INTERVENTIONS BY YEAR
(All values in tons. Columns 2000-2035: regional air emission projections related to increased population and related land development. Columns 2015-2035 at right: projections with strategic policy interventions within the energy sector.)

Pollutant      2000        2015        2025        2035        2015        2025        2035
SO2            37,200      43,625      48,716      53,879      41,720      32,791      19,425
NOx            23,489      25,351      31,927      35,531      27,255      24,175      21,422
CO             5352        6480        7264        8056        6273        5959        6201
CO2            9,854,773   11,761,663  13,253,675  14,761,922  11,291,910  9,555,083   7,039,789
Particulates   1364        1566        1726        1887        1189        935         556
Hg             0.99        1.19        1.35        1.51        1.13        0.89        0.52

TABLE 2. INFRASTRUCTURE COST COMPARISON OF BUSINESS-AS-USUAL (BAU) TO LOW-IMPACT DEVELOPMENT (LID) SCENARIOS
Social costs comparison between BAU and LID scenarios

Social cost                                               BAU value (US$)   LID value (US$)   LID to BAU (%)
Increase in annual private cost for water                 136,677,328       133,679,852       97.8
Increase in annual private cost for sewer                 138,538,718       136,489,870       98.5
Increase in annual private cost for stormwater systems    13,853,872        13,648,987        98.5
Increase in annual nontransportation energy costs         898,233,935       806,054,717       89.7
Local infrastructure costs                                6,618,014,035     5,866,805,532     88.6
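The comparison percentages and the net-savings figure quoted in the text can be reproduced directly from Table 2's local infrastructure cost values:

```python
# BAU and LID local infrastructure costs from Table 2.
bau_local_infra = 6_618_014_035
lid_local_infra = 5_866_805_532

net_savings = bau_local_infra - lid_local_infra   # savings under LID
pct = lid_local_infra / bau_local_infra * 100     # LID to BAU comparison

print(net_savings)     # 751208503  (the $751,208,503 quoted in the text)
print(round(pct, 1))   # 88.6
```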
Figure 11. The Strategic Sustainability Assessment (SSA) GeoPortal, http://www.leam.uiuc.edu/ssa/.
below) can be combined with spatial queries to help locate data of interest. A built-in workflow capability supports a create-review-publish cycle for new content. Map exploration allows the usual selection of raster and vector map overlays, the ability to zoom and pan, and an added ability to dial back and forth in time within LEAM dynamic simulation results. This allows SSA users to experience potential changes to complex urban (and nonurban) systems in a systematic manner. There are opportunities for additional information, metadata, and comment on each piece of information presented. The inclusion of GIS layers and legends in the system illustrates the potential of the PSS for making information useable, accessible, and manageable. The SSA GeoPortal PSS helps regional stakeholders integrate activities and data that lead to plan formulation and policy making by making the results of model building and scenario planning viewable and accessible. This creates real opportunities to engage decision makers, stakeholders, and the general public in the process of making both plans and planning decisions. With
advances in open-source Internet technology, we believe there is untested potential for the efficient and democratic creation of plans without the institutional largesse and conflict of twentieth-century planning processes.

DISCUSSION

In its journey to increase sustainability, the Fort Bragg region has embarked on a series of ongoing, regular studies and reports that focus on specific issues and that enable the development of implementation plans and concepts. This journey has several byproducts, for example, beneficial partnerships and opportunities for further dialogue with internal and external stakeholders, recommendations for policy development and new initiatives as the Army works to achieve the goals of the ASE, and identification of short-term actions needed to ensure that long-term goals are met. Many regions are looking for cooperative planning opportunities between local and regional agencies in pursuit of sustainable
Figure 12. A view of the Land-Use Evolution and Impact Assessment Model (LEAM) results in the Fort Bragg region using the Strategic Sustainability Assessment (SSA) GeoPortal. Yellow and red areas are currently urbanized areas or areas that will become urbanized by the year 2035; these overlie a section scaled aggregate analysis, where darker blue is more intense development.
growth solutions. However, individual plans often overlap and may conflict with one another, causing partnerships and actions to fail. A key component of the SSA in helping regions succeed is the delivery of plan content. Recent literature refers to this as an information system of plans (ISoP). Currently, considerable effort is expended on making plans, but little attention is paid to making them usable by related projects and planning agencies. An ISoP is an interactive, centralized database designed to help local, county, and regional decision makers form policy in a multiplan environment and to allow plan content to be retrieved more easily and intelligently. It facilitates analysis of plan overlap and conflict. Ideally, the GeoPortal records the spatial and temporal scope of the plans in the ISoP, along with an abstract description and a link to the plan in its entirety.

Using the SSA GeoPortal, stakeholders and decision makers have access not only to the data and analyses on which the original plan was based, but also to urban development decisions and plans that have been made since that time. New planning activities no longer start from scratch, but instead take into account plans already made by other agencies and entities working on the issue at hand. They quickly assess which actions in the original plan are relevant and which are not; they develop a revised set of actions that reflects a current understanding of the region. Furthermore, analyses by consultants and the data used in these plans and analyses are mandatorily stored in a publicly owned information system.

CONCLUSIONS

Contemporary urban society faces increasing challenges (burgeoning populations, climate change, increasing resource demands, and constantly evolving political environments) in balancing the needs of economic development with the goals of sustainable development, especially relating to the environment and equity. Government agencies, when they are able to acknowledge these goals, find it difficult to communicate the long-term
benefits of difficult policy choices designed to address these challenges. Citizens, on the other hand, are limited by their capacity to access and understand the complexity and range of information needed to participate effectively in a public process. In the field of land-use planning and policy, this information relates to a wide range of decisions and investment choices made by public entities, including zoning, infrastructure development, housing location and type, green infrastructure questions, social capital support, and environmental preservation. Increasing public accessibility to the information (and the models that produce it) will affect such decisions. A link between the potential policy and investment questions and their relevant consequences has several advantages. First, making the information available and visually accessible will enable the public to play a greater role in individual and collective decision-making processes. Second, access to models that provide better information about the future should help produce better decisions. Third, development of an innovative cyber PSS will help enhance communal goals of regional sustainability by facilitating alternative forms of policy making. Active participation of stakeholders is an important component of the planning process. Participatory planning has traditionally been led by public agencies that rely on face-to-face contacts in venues such as town-hall meetings, but information limitations in various forms often restrict the public’s access or interest in participating in the processes, making them “expert”-driven with intermittent public feedback. Technological innovations in policy analysis and computing have started to break down these barriers in many aspects of governance by providing more information and services through the Web. 
This has created an opportunity to develop applications that extend interactivity by providing analytical tools through the Web that allow the public to perform on-the-fly policy evaluations and provide immediate feedback. More specifically, these tools allow users to compare outcomes of policy choices in relation to the location, type, and intensity of spatial growth and development of places (cities, regions, etc.) on the basis of their impacts on various quality-of-life indicators and their potential implications for adjacent military operations.

In this chapter, we have shown how geoinformatic technologies are helping to change the way we think about planning for sustainability. Future work in this arena involves the development of planning support systems that can evolve with an understanding of how lay people actually use and process information, especially with respect to decision making and risk. Such systems would be informed by an understanding of the ways in which communities of people (as opposed to the individuals in communities) think about issues and risks, and the ways in which communities come to understand the quality of information pertinent to planning from a variety of sources. These systems might be considered "sentient," in that they respond to the individuals that are using them and to the evolving data that inform them. The ultimate objective in this and related work, however, is to create an easily accessible support system that con-
verts raw data into usable information that communities would find helpful in reaching better-informed decisions.

ACKNOWLEDGMENTS

We acknowledge the support provided by Michael C. Cain, director of the Army Environmental Policy Institute, and Tad Davis, Deputy Assistant Secretary of the Army. The real-world experiences gained at Fort Bragg and the Sandhills region would not have been possible without active engagement from a broad array of planning professionals, especially Paul Wirt, Fort Bragg, and Jon Parsons, Sustainable Sandhills.

REFERENCES CITED

Assistant Secretary of the Army for Installations & Environment (ASAIE), 2004, The Army Strategy for the Environment: Washington, D.C., Office of the Assistant Secretary of the Army for Installations and Environment, 12 p.
Brail, R.K., and Klosterman, R.E., eds., 2001, Planning Support Systems: Integrating Geographic Information Systems, Models, and Visualization Tools: Redlands, California, Environmental Systems Research Institute Press, 428 p.
Budthimedhee, K., Li, J., and George, R.V., 2002, ePlanning: A snapshot of the literature on using the World Wide Web in urban planning: Journal of Planning Literature, v. 17, no. 2, p. 227–246.
Bureau of Labor Statistics, 2008, Occupational Outlook Handbook, 2008–09 Edition: Washington, D.C., U.S. Department of Labor, 720 p.
Cox, D.J., 1990, The art of scientific visualization: Academic Computing, v. 4, no. 6, p. 20–56.
Deal, B., 2008, Sustainable Land-Use Planning: The Integration of Process and Technology: Saarbrücken, Germany, Verlag Dr. Müller, 128 p.
Deal, B., and Pallathucheril, V.G., 2007, Developing and using scenarios, in Hopkins, L.D., and Zapata, M., eds., Engaging the Future: Forecasts, Scenarios, Plans, and Projects: Cambridge, Massachusetts, Lincoln Institute of Land Policy, 374 p.
Deal, B., and Schunk, D., 2004, Spatial dynamic modeling and urban land use transformation: A simulation approach to assessing the costs of urban sprawl: Ecological Economics, v. 51, no. 1–2, p. 79–95.
Delucchi, M.A., 1998, The National Social Cost of Motor Vehicle Use: Federal Highway Administration Report FHWA-PD-99-001, http://www.fhwa.dot.gov/scalds/socialcost.htm (accessed June 1998).
Ekstrom, M.P., 1984, Digital Image Processing Techniques: Orlando, Florida, Academic Press, Inc., 372 p.
Ewing, R., Porter, D.R., Heflin, C.C., and DeAnna, M.B., 1996, Best Development Practices: Doing the Right Thing and Making Money at the Same Time: Chicago, American Planning Association, 180 p.
Geertman, S., and Stillwell, J., eds., 2003, Planning Support Systems in Practice: Berlin, Springer, 578 p.
Hutson, S.S., Barber, N.L., Kenny, J.F., Linsey, K.S., Lumia, D.S., and Maupin, M.A., 2004, Estimated Water Use in the United States in 2000: U.S. Geological Survey Circular 1268, 46 p.
Jenicek, P., Svehla, P., Zabranska, J., and Dohanyos, M., 2004, Factors affecting nitrogen removal by nitritation/denitritation: Water Science and Technology, v. 49, no. 5–6, p. 73–79.
Johnes, P.J., 1996, Evaluation and management of the impact of land use change on the nitrogen and phosphorus load delivered to surface waters: The export coefficient modeling approach: Journal of Hydrology (Amsterdam), v. 183, no. 3–4, p. 323–349, doi:10.1016/0022-1694(95)02951-6.
Kim, T.J., Wiggins, L.L., and Wright, J.R., eds., 1990, Expert Systems: Applications to Urban Planning: New York, Springer-Verlag, 268 p.
Lozar, R.C., Ehlschlaeger, C.R., and Cox, J., 2003, A Geographic Information Systems (GIS) and Imagery Approach to Historical Urban Growth Trends around Military Installations: Champaign, Illinois, Engineer Research and Development Center Report TR-03-9, 47 p.
Lozar, R.C., Meyer, W.D., Schlagel, J.D., Melton, R.H., MacAllister, B.A., Rank, J.S., MacDonald, D.P., Cedfeldt, P.T., Kirby, P.M., and Goran, W.D., 2005, Characterizing Land-Use Change Trends around the Perimeter of Military Installations: Champaign, Illinois, Engineer Research and Development Center Report TR-05-4, 106 p.
Malm, W., Kelley, K., Molenar, J., and Daniel, T., 1981, Human perception of visual air quality (uniform haze): Atmospheric Environment, v. 15, no. 10–11, p. 1875–1890.
Orland, B., 1992, Evaluating regional changes on the basis of local expectations: A visualization dilemma: Landscape and Urban Planning, v. 21, no. 4, p. 257–259, doi:10.1016/0169-2046(92)90035-X.
Rosenfeld, A., and Kak, A.C., 1982, Digital Picture Processing: Orlando, Florida, Academic Press.
U.S. Census Bureau, 1990, Census of Population and Housing: Washington, D.C., U.S. Department of Commerce.
U.S. Department of Defense (DoD), 2005, Department of Defense Report to the Defense Base Closure and Realignment Commission: Department of the Army Analysis and Recommendations, BRAC 2005 Volume III: Washington, D.C., U.S. Department of Defense, 760 p., http://www.brac.gov/finalreport.html (accessed 6 July 2011).
Wickens, C.D., 1992, Engineering Psychology and Human Performance (2nd ed.): Glenview, Illinois, Scott, Foresman, and Co., 560 p.
Wu, F., and Martin, D., 2002, Urban expansion simulation of southeast England using population surface modelling and cellular automata: Environment & Planning A, v. 34, no. 10, p. 1855–1876, doi:10.1068/a3520.

MANUSCRIPT ACCEPTED BY THE SOCIETY 17 FEBRUARY 2011
Printed in the USA
The Geological Society of America, Special Paper 482, 2011

6. Grid query optimization in the analysis of cone penetration testing data

Patrick M. Dudas and Hassan A. Karimi, Geoinformatics Laboratory, School of Information Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, USA
Abdelmounaam Rezgui, Center for Intelligent Spatial Computing, George Mason University, Fairfax, Virginia 22030, USA
ABSTRACT

Soil liquefaction takes place during and/or after the occurrence of an earthquake and is a major contributor to urban seismic risk. Geologists use a technique called the cone penetration test (CPT) to determine the properties of soils, including liquefaction potential; the test yields large amounts of soil data. The analysis of such massive amounts of data requires high-performance computing resources. In this paper, we present GQO (Grid Query Optimizer), a distributed algorithm that enables efficient analysis of large CPT data sets on a grid cyberinfrastructure.
INTRODUCTION

For data-intensive problems, most solutions can be found using a sequential program on a single central processing unit (CPU) within a reasonable amount of time. This requires little overhead to complete the task at hand, and scientists and engineers with limited programming skills can produce solutions within a practical time frame and with minimal effort. However, even in these cases, programs may still run for a long period of time. Additionally, there are cases where programs may require large amounts of storage, on the order of terabytes or even petabytes. Often, such problems are compounded to a point that traditional computing resources are no longer feasible, and they require the application of dedicated high-performance computing resources with large storage capacities. For these high-performance computing resources, costs include not only the initial startup but also maintenance and upgrades.

Research projects such as weather simulations, astrophysics, earthquake simulations, electrical circuits, and human genome mapping (Barney, 2009) are just a few of the many examples of computationally intensive and data-intensive problems requiring high-performance computing solutions. These types of problems involve retrieval of massive data sets from multiple data sources (e.g., university laboratories, research laboratories) and require complex computations to answer a single scientific question. Finding efficient means for managing and sharing these data sets is a challenge not only to scientists and engineers, but to any discipline that employs very large data sets. The geographical distribution and computational complexity of many scientific problems require efficient alternatives for distributed data retrieval and query processing. Grid computing has been recognized as a potential platform for distributed data processing. A grid cyberinfrastructure can be formed by connecting several computers via a network and using common management software to schedule and distribute jobs over these computers. Typically, the computers of a grid belong to different entities and are physically in different locations. TeraGrid
Dudas, P.M., Karimi, H.A., and Rezgui, A., 2011, Grid query optimization in the analysis of cone penetration testing data, in Sinha, A.K., Arctur, D., Jackson, I., and Gundersen, L., eds., Societal Challenges and Geoinformatics: Geological Society of America Special Paper 482, p. 59–67, doi:10.1130/2011.2482(06). For permission to copy, contact [email protected]. © 2011 The Geological Society of America. All rights reserved.
(National Science Foundation TeraGrid, 2010) is an example of a national grid that connects computers from around the country; many of its nodes are designated supercomputers, and it boasts roughly two petaflops of computing resources and 50 petabytes of stored data and records (National Science Foundation TeraGrid, 2010). In a grid environment, each computer is considered its own independent node, and scalability is the ultimate objective. New computers can be added to a grid without much effort. Most grid environments require a batching or queuing hierarchy whereby users can request nodes and an amount of time on each node. Other variables can be added to a request, including a specific operating system, hardware, memory allocation, or programming language support. Grids can vary in the ways in which they allocate time and resources, either allocating nodes when the system is idle or partitioning small amounts of resources at all times. By utilizing existing resources without influencing their local operations, parallelization of computational jobs is possible and results in improved execution times, with an upper bound given by Amdahl's law, which predicts the maximum speedup attainable when using multiple processors (Amdahl, 1967).

Despite the potential of grid computing for solving many complex scientific problems, it has yet to be applied to a large number of challenging problems. Many scientists are still unable to access large amounts of data at different locations, bring them together, and perform computations on them in a controllable environment. Query processing optimization remains a challenge when the data set is very large and the data are geographically distributed. In such cases, users may have to manually partition the query to obtain the data from different locations and then manually join the resulting data sets.
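Amdahl's law, cited above, bounds the speedup of a parallelized job: if a fraction p of the work is parallelizable, the speedup on n processors is 1 / ((1 - p) + p/n). A small illustration (with arbitrary example values of p and n) shows how the serial fraction limits the payoff of adding nodes:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: maximum speedup on n processors when a fraction p
    of the work is parallelizable (Amdahl, 1967)."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelizable, speedup saturates well
# below the processor count as n grows:
print(round(amdahl_speedup(0.95, 16), 2))    # 9.14
print(round(amdahl_speedup(0.95, 1024), 2))  # 19.64
```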
To automate query processing optimization, the Grid Query Optimizer (GQO) was developed (Liu and Karimi, 2008). GQO optimizes query processing over distributed heterogeneous databases in grid environments. In this paper, we discuss the use of GQO for cone penetration test (CPT) data on PittGrid, the University of Pittsburgh's campus grid infrastructure.
CPT data are usually collected with CPT trucks (Fig. 1) that are positioned at locations of interest. Most data sets come from coastal areas of California and some areas of Indiana and are used for the exploration of shallow subsurfaces.

Furcation Connective

Furcation describes production of multiple events (Ei+1, Ei+2, …, En, n > 1) from one (Ei), where the resulting events Ei+1, Ei+2, …, En are mutually incompatible and together exhaust the whole set of possible outcomes of Ei (Fig. 10). Two modi are possible for this connective: (1) the subject remains the same but changes the predicate(s) it had in the initial event, and (2) the subject changes but the predicate(s) it had in the initial event remain the same. They are shown below for two resulting events.

1. Attributing incompatible combinations of predicates to a similar subject, modus 1 performs division of predicates: ii Si Pj ~Pk ~Pl Furcation Modus 1 ii Si Pj ~Pk Pl, ii Si Pj Pk ~Pl (Fig. 10A). This represents a situation of natural choice for a process to take this or that scenario, and it can be illustrated by the following case: "Lava (Si) extrudes (Pj) yet neither piles up (~Pk) nor slides down (~Pl) Furcation Modus 1 Lava (Si) that has extruded (Pj) does not pile up (~Pk) but slides down (Pl), Lava (Si)
Theoretical foundations of the event bush method

Figure 9. Subjects and predicates in the influx connective on the multiflow structure background.

Figure 10. Subjects and predicates in the furcation connective on the multiflow structure background: (A) modus 1 and (B) modus 2.
Pshenichny and Kanzheleva
that has extruded (Pj) piles up (Pk) and does not slide down (~Pl)." Here, "has extruded" obviously means "does not extrude anymore." This is true if we are referring to a particular portion of lava (certainly not the entire erupted lava, which is free to behave in a variety of ways simultaneously). It should be kept in mind, though, that the peculiarity of "having acquired" a property may be formally interpreted not as having but, on the contrary, as no longer having this property: "lava has extruded and does something else on the surface" actually means that lava does not extrude now. Still, it has been accepted in the event bush that in such a case the property must relate to the subject, contrary to the case in which a similar subject does not have this property at all (e.g., lava was spilled out and, hence, never extruded). With deeper formalization, this shortcoming must be fixed.

2. If the subjects of the resulting events are mutually exclusive and together exhaust the class that is the subject of the initial event, then the following can be put forth (shown for two resulting events for simplicity): ii Si Pl Furcation Modus 2 ii Sj Pl, ii Sk Pl (Fig. 10B). As in flux modus 1, the process described here leads to a complete change of subject, but unlike that modus, here there is an option. For instance, "Water (Si) is in the gully on the slope of the volcano (Pl) Furcation Modus 2 Stream (Sj) is in the gully on the slope of the volcano (Pl), Lahar (Sk) is in the gully on the slope of the volcano (Pl)." This is not just a classification of events but exactly expresses the cause-effect relationship, because first water accumulates in a gully, and then this gives birth either to a big stream or to a lahar. Similar predicate(s) are necessary here to demonstrate that the natural objects involved in both the initial and the resulting events belong to a similar superclass characterized by similar predicate(s) and together exhaust it.

Both modi of furcation involve secondary events only.
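As an illustration only (our own sketch, not the authors' notation), modus 1 can be mechanized as a function that, given a subject, the predicates it keeps, and a set of mutually exclusive alternatives, emits the incompatible resulting events:

```python
# Hypothetical sketch of furcation modus 1: each resulting event keeps
# the inherited predicates and asserts exactly one alternative,
# negating (~) all the others, so the outcomes are mutually incompatible.

def furcation_modus_1(subject, kept, alternatives):
    """Return one resulting event per alternative predicate."""
    outcomes = []
    for chosen in alternatives:
        preds = list(kept) + [
            p if p == chosen else "~" + p for p in alternatives
        ]
        outcomes.append((subject, tuple(preds)))
    return outcomes

# The lava example from the text: extruded lava either slides down
# without piling up, or piles up without sliding down.
for event in furcation_modus_1("lava", ["has extruded"], ["slides down", "piles up"]):
    print(event)
```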
It is assumed that the geoentity denoted by the subject of the initial event does not exist anymore, and the geoentities denoted by the subjects of the resulting events appear as a result of the process denoted by furcation. By definition, the resulting events are mutually incompatible and together exhaust the whole variety of possible outcomes of the initial one; hence, no other consequences of the said event are possible. Then, if such an event is included in both furcation and flux connectives (regardless of modus), not more than one of these may be true.

Conflux Connective

Conflux describes production of one event (En) by multiple events (Ei, Ei+1, Ei+2, …, En–1) (see Fig. 11). There are three modi of this connective.

1. The events Ei, Ei+1, Ei+2, …, En–1 have at least one common predicate, the subject of En comes out of this predicate, and the predicates of En come from the subjects of Ei, Ei+1, Ei+2, …, En–1. Both transformations are governed by semantic relations between the predicate of the initial events and the subject of the resulting one, defined similarly to the semantic relation in the influx connective. In terms of subjects and predicates, for two initial events with one predicate each, this can be shown as ii Si Pk, ii Sj Pk Conflux Modus 1 ii SPk PSi PSj (Fig. 11A), e.g., "Droplets of magmatic melt (Si) flow in gas envelope downslope (Pk), Fragments of crystals (Sj) flow in gas envelope downslope (Pk) Conflux Modus 1 Downslope flow in gas envelope (SPk) involves droplets of magmatic melt (PSi) and fragments of crystals (PSj)." This modus portrays a case in which several events coinciding in some properties or circumstances (e.g., those of space, time, composition, or others) produce a new geoentity (denoted by the subject of the resulting event), and this geoentity is determined exactly by the common properties or circumstances that unite them.

2. The events Ei, Ei+1, Ei+2, …, En–1 have at least one common predicate, the subject of En comes out of their subjects, and the predicates are the common predicates of Ei, Ei+1, Ei+2, …, En–1. For two initial events, it looks as follows: ii Si Pi, ii Sj Pi Conflux Modus 2 ii SSiSj Pi (Fig. 11B). An example of this modus could be the following: "Sericite (Si) alters granite (Pi), chlorite (Sj) alters granite (Pi) Conflux Modus 2 Sericitic-chloritic aggregate (SSiSj) alters granite (Pi)." The meaning of this modus is close to that of the previous one, but the new entity is formed not by coinciding circumstances or properties but by "merging" the subjects of the initial events.

3. The events Ei, Ei+1, Ei+2, …, En–1 all have similar subjects and similar predicates, and these are the subjects and predicates of the event En (shown for two initial events): ii some Si Pj, ii some Si Pj Conflux Modus 3 ii any Si Pj. This modus is used only with modus 6 of flux, described next.

In any modus of conflux, the geoentities denoted by the subjects of the initial events do not exist anymore, and the geoentity denoted by the subject of the resulting event appears.
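The transformation rules of the first two modi can also be sketched in code. The snippet below is our illustrative reading only; the Event type and the S&lt;…&gt;/P&lt;…&gt; naming are our assumptions, not the paper's notation.

```python
# Events as (subject, set of predicates). Conflux modus 1 builds the new
# subject from a common predicate and new predicates from the old
# subjects; modus 2 merges the old subjects and keeps common predicates.

from typing import NamedTuple, FrozenSet, Sequence

class Event(NamedTuple):
    subject: str
    predicates: FrozenSet[str]

def _common(events: Sequence[Event]) -> set:
    preds = set(events[0].predicates)
    for e in events[1:]:
        preds &= e.predicates
    if not preds:
        raise ValueError("conflux requires at least one common predicate")
    return preds

def conflux_modus_1(events: Sequence[Event]) -> Event:
    # New subject from a common predicate, new predicates from old subjects.
    new_subject = "S<" + sorted(_common(events))[0] + ">"
    return Event(new_subject, frozenset("P<" + e.subject + ">" for e in events))

def conflux_modus_2(events: Sequence[Event]) -> Event:
    # Merged subject, common predicates retained.
    merged = "S<" + "+".join(e.subject for e in events) + ">"
    return Event(merged, frozenset(_common(events)))

# The sericite/chlorite example from the text, under modus 2:
e1 = Event("sericite", frozenset({"alters granite"}))
e2 = Event("chlorite", frozenset({"alters granite"}))
print(conflux_modus_2([e1, e2]))
```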
All events united by conflux are secondary.

Flux Connective Modus 6 and Conflux Modus 3: Explication

These two modi have been designed specially to account for the important case of self-similarity of subjects that often occurs in geoscience. As was mentioned previously, any succession of rocks is a rock, any class of co-occurring ground shakings is ground shaking, and, ultimately, any class of related geoenvironments is a geoenvironment itself. Hence, very often, the subjects in the event bush have to be dual, being simultaneously an integral entity and a class divisible into subclasses, each
Figure 11. Subjects and predicates in conflux connective: (A) modus 1 and (B) modus 2.
of the latter being a similar, though not the same, geoentity. The following pattern is suggested to represent this formally (Fig. 12). For instance, a single layer of sandstone is being eroded by rivers in river valleys and by abundant vegetation at watersheds. The results of these destruction processes are virtually diverse, mechanical removal in one case and chemical transformation in situ in the other. Hence, as soon as the layer may become exposed to some kind of denudation, semantically the corresponding subject loses its generality, and structurally the modus 6 of flux takes place. The state of the geoentity in this case is quiescence, but with an opportunity of disruption, which occurs by influx connectives (see Fig. 12). Any flux connective of modus 6 may be used not alone but in an ensemble with

• other connectives of said modus with similar events to the left from it,
• flux connectives of modus 5, the events at the left parts of which are those to the right from flux modus 6 connectives,
• influx connectives modifying events to the left from flux modus 5, and
• a conflux connective (see following) uniting unmodified events of flux modus 5 (Fig. 12).

However, those parts of the sandstone layer that occur at valley sides and at watersheds but that remain intact continue to behave as an integer body that is expressed by a general subject. Therefore, they are confluxed back, and this is the virtue of conflux modus 3. This structural feature of the event bush may appear especially useful for physical modeling of complex influences on various subjects. Quiescence expressed by repeated flux modus 5 is also numerically important here because it acquires temporal, spatial, or other meanings.

An Instructive Analogy

The suggested kit of connectives complies well with the cases of behavior of geoentities in directed alternative change environments described in the "Properties of Geoenvironments…" subsection:

1. Case 1 is represented by modus 5 of flux,
2. Case 2 is represented by modus 4 of flux,
3. Case 3 is represented by modus 1 or 2 of flux,
4. Case 4 is represented by modus 4 (if results are of the same class as the initial geoentity) or 2 (if of different classes) of flux,
5. Case 5 is represented by furcation modus 2 (if results are of similar class as the initial geoentity) or 1 (if not); sometimes the latter can also be expressed as a succession of furcation modus 2 and flux modus 1 or 2,
6. Case 6 is represented by modus 2 of conflux,
7. Case 7 is represented by modus 1 of conflux, and
8. Case 8 is represented by influx.
This gives us reason to consider the choice of these connectives optimal for describing the processes in environments of directed alternative changes. In addition, one may notice a parallelism between these connectives and the mechanisms of production of new organisms known in biology. Indeed, flux is an analog of vegetative reproduction, in which one event is the root and the other is the offspring. Influx can be paralleled with sexual reproduction. A modified event is the "mother," the modifier is the "father," and the result is the
Figure 12. Flux modus 6 and conflux modus 3.
"child" (or, again, though in a slightly different sense, "offspring"). The subject of the "child" is naturally the same as the subject of the "mother," which is very similar to live birth, while the contribution of the "father" is purely "genetic information" (the predicate). However, with time, the child may come to closely resemble the father; this option is expressed by modus 3 of flux (though, from the point of view of this analogy, it is a root-offspring relation). Furcation is nothing other than cell division, and therefore the subject is inherited by all newly formed "cells" as a nucleus would be. However, the analogy here is not complete, because furcation implies that only one of the resulting events indicates a geoentity that actually takes place; the others remain merely possible but nonexistent (contrary to a dividing cell, which produces coexisting cells). Conflux represents one more mechanism of producing new organisms, colonial growth. The resulting event is a "colony" formed by some law, either the unity of subject or of predicate of the uniting "primitive organisms," which certainly are "imprinted" in its structure (e.g., a subject formed from a common predicate).
Definition of the Event Bush

The conceptual framework put forth here brings us to a definition of the event bush method. The event bush is a geoinformatic method of construction of scenarios in environments of directed alternative change that is based on the multiflow structure, must include the connectives of flux and influx, and may include the connectives of conflux and furcation.

This definition, still being rather far from formal, fixes the event bush in a state in which it can be, and has been, applied to various geoscientific tasks. The avenue for further research is a totally formal definition of the event bush, which could be taken independently of a geoscientific context and exported into the field of general informatics for "external testing." This is expected to be achieved through the investigation of class-subclass relations within an event bush and between bushes, exploring the behavior of subjects and predicates throughout the bush and throughout a network of bushes. At present, however, we deem it pertinent to summarize the relations established within the bush between the various types of events (ia, ib, ii, and iii) by its connectives:

ia Ei Flux ii Ej,
ia Ei Flux iii Ej,
ii Ei Flux ii Ej,
ii Ei Flux iii Ej,
ia Ei ib Ej Influx ii Ek,
ia Ei ii Ej Influx ii Ek,
ii Ei ib Ej Influx ii Ek,
ii Ei ii Ej Influx ii Ek,
ii Ei Furcation ii Ej,
ii Ei, ii Ei+1..., ii Ei+k Conflux ii Ek+1.

Some remarks will be made in the following section before we proceed to application issues of the event bush method.

Concomitant Definitions, Rules, and Corollaries

The corollaries and ultimate consequences of the definitions of the event bush and the connectives need to be explored further to optimize their use and avoid possible ambiguity. However, some observations can be made right now. For instance, the possible ways to generate a new subject in the event bush are modi 1 and 3 of flux and modus 1 of conflux. Events of type ii can form cyclic patterns, but with an entrance from ia or another ii and an exit to iii or another ii. Also, for the obvious reason that the resulting events of furcation are incompatible, neither these nor their consequences, provided they have no other causes, may be united by conflux.

Another point stated in the definitions of all the connectives can be summarized as follows. If a predicate P appears for the first time in relation to subject S in event E, this automatically means that in all events throughout the bush that have the same subject but that do not follow directly or indirectly from E, there must be the negation of P, if not explicitly put otherwise.

Some more definitions would enable reasoning and inference in the event bush framework. Because the environment the event bush aims to describe is that of directed alternative changes, it seems useful to interpret the notion of change in terms of the event bush. In the event bush, a change is a correctly built expression including one (and only one) connective of any type. General formulae for these expressions were listed in the subsection "Connectives of the Event Bush." A change has one causal part (which includes the events, one or more, left of the connective) and one effect part (the events to the right of the connective).
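The admissible combinations summarized above lend themselves to a simple lookup. The following sketch is ours (a hypothetical helper, not part of the published method) showing how a bush-building tool could reject ill-typed changes:

```python
# Event-type signatures of the connectives, per the summary in the text:
# flux: ia->ii, ia->iii, ii->ii, ii->iii; influx: (ia|ii, ib|ii)->ii;
# furcation: ii->ii (checked per resulting event);
# conflux: two or more ii events -> ii.

ALLOWED = {
    "flux": {(("ia",), "ii"), (("ia",), "iii"),
             (("ii",), "ii"), (("ii",), "iii")},
    "influx": {(("ia", "ib"), "ii"), (("ia", "ii"), "ii"),
               (("ii", "ib"), "ii"), (("ii", "ii"), "ii")},
    "furcation": {(("ii",), "ii")},
}

def change_is_well_typed(connective, cause_types, effect_type):
    """Check one change against the admissible event-type signatures."""
    if connective == "conflux":
        return (len(cause_types) >= 2
                and all(t == "ii" for t in cause_types)
                and effect_type == "ii")
    return (tuple(cause_types), effect_type) in ALLOWED.get(connective, set())

print(change_is_well_typed("flux", ["ia"], "iii"))       # True
print(change_is_well_typed("furcation", ["ia"], "ii"))   # False
print(change_is_well_typed("conflux", ["ii", "ii", "ii"], "ii"))  # True
```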
The two changes in which an ia event may be involved as a cause alone are flux modus 5 to ii or iii events and flux modus 6 to ii events. Also, it should be postulated that the left (causal) part of each change is unique, except for flux modi 4 and 6, and that, given the left part and the connective (regardless of modus), the right part is also singular. This means that

1. a given pair of events may produce either solely one influx, or one conflux, or nothing at all;
2. any three or more events may produce either one conflux or nothing; and
3. if one (secondary) event is the causal part of a flux, except for flux modus 2 in the meaning of influence, and simultaneously is the causal part of a furcation, only one of these relations may take place in reality; if one (secondary) event is the causal part
of flux modus 2, in the meaning of influence of one event on another (so that the geoentity denoted by the subject of the event in the causal part is not meant to wane), and is also the left part of a furcation, both relations may coexist in reality.

Then, a flow is a finite totally ordered set (Schröder, 2003) of changes in which the left part of the first change is a type ia event, the right part of the last change is a type iii event, and the right part of every preceding change is the left part of the following change. Flow describes such geoentities as process or quiescence. A flow is always followed from type ia to iii. If there is a change with an influx in it, the modifier of this influx (ib or ii) is included, but not the events that caused it, unless there are successive influx changes. If there is a flow in which at least one event participates in the causal parts of two changes (e.g., one being flux and the other furcation), there is another flow, which overlaps with the former one in the interval from the beginning (ia) to this very event, and then ramifies. Furthermore, if there is a flow without type ii events, first, it may only consist of one change, namely, the flux modus 5 from type ia to iii, and then, there also must be a flow that overlaps with this one in the beginning (i.e., at the ia event) and includes at least one influx. This influx of ia and another event, ib or ii, will produce a type ii event, which will then route this flow further. There must not be events in a bush that are not included in a flow.

Ramifying (furcating or influxing), confluxing, or coming to a common effect by different fluxes, flows form a multiflow, which is, by definition, the structure of the event bush.

A graphic notation has been adopted for the event bush. It is based on the multiflow structure (Fig. 6), placing events ia on the left, ib on the top, ii in the center, and iii on the right. Events are plotted as rectangular boxes.
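The rule that no event may lie outside a flow can be checked mechanically. Below is our simplified sketch; it treats the bush as a plain directed graph of cause-effect edges and ignores the special handling of influx modifiers described above.

```python
# An event lies on a flow iff it is reachable from some type ia event
# and can itself reach some type iii event along cause->effect edges.

from collections import defaultdict

def events_on_flows(edges, types):
    """edges: (cause, effect) pairs; types: event name -> ia/ib/ii/iii."""
    fwd, back = defaultdict(set), defaultdict(set)
    for cause, effect in edges:
        fwd[cause].add(effect)
        back[effect].add(cause)

    def reachable(starts, adjacency):
        seen, stack = set(starts), list(starts)
        while stack:
            for nxt in adjacency[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    from_ia = reachable([e for e, t in types.items() if t == "ia"], fwd)
    to_iii = reachable([e for e, t in types.items() if t == "iii"], back)
    return from_ia & to_iii

# Toy bush: D is a dead end that reaches no tertiary event,
# so it violates the "every event belongs to a flow" rule.
types = {"A": "ia", "B": "ii", "C": "iii", "D": "ii"}
edges = [("A", "B"), ("B", "C"), ("B", "D")]
print(sorted(events_on_flows(edges, types)))  # D is flagged out
```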
The connectives are marked as follows: flux, as an ordinary arrow from cause to effect (Fig. 7A); influx, as an arrow from the modifier to the effect with a "right turn" from the modified event (Fig. 7B); conflux, as a "double right turn" (Fig. 7C); and furcation, as a circle with ramification to effects (Fig. 7D). This visualizes the bush and makes it possible to "drive" along the flows following the "road painting" of connectives. Now, the applications of the method can be considered.

STRATEGIES OF APPLICATION

Individual Bushes

Existing applications of the event bush (Pshenichny et al., 2005, 2009; Pshenichny and Fedukov, 2007; Behncke and Pshenichny, 2009; Diviacco and Pshenichny, 2010), however tentative, already appear complicated enough to be difficult for a human to handle. This is not only because of the natural complexity of the modeled environments but largely because of a lack of the semantic structure that was introduced in the Basic Assumptions section of the present paper. This structure and the
rules of composition of the event bush are still evolving toward a state of maturity. Next, we will demonstrate one of the latest event bushes, which addresses this weakness. A spectacular view of a slope of an active polygenic volcano with lava cones and flows and active faults is shown in Figure 13A. Figure 13B presents an event bush that explains how the observed bodies were formed and how other bodies had to form cogenetically below the surface or could alternatively form on and below the surface. This event bush can be recorded as follows (cause and effect parts of changes are shown in brackets for better readability).

([ia] Host rocks exist in volcano without fissures and dislocations) Flux Modus 5 ([iii] Host rocks exist in volcano without fissures and dislocations).

([ia] Host rocks exist in volcano without fissures and dislocations, [ib] Fissures develop) Influx ([ii] Host rocks in volcano are dissected by fissures yet not dislocated).

([ii] Host rocks in volcano are dissected by fissures yet not dislocated) Flux Modus 5 ([iii] Host rocks in volcano are dissected by fissures yet not dislocated).

([ii] Host rocks in volcano are dissected by fissures yet not dislocated) Flux Modus 2 ([ii] Fissures develop in nondislocated host rocks of volcano but are not filled with magma).

([ii] Fissures develop in nondislocated host rocks of volcano but are not filled with magma) Flux Modus 5 ([iii] Fissures develop in nondislocated host rocks of volcano but are not filled with magma).

([ii] Fissures develop in nondislocated host rocks of volcano but are not filled with magma, [ib] Magma ascends) Influx ([ii] Fissures that develop in nondislocated host rocks of volcano are filled with magma).

([ii] Fissures that develop in nondislocated host rocks of volcano are filled with magma) Flux Modus 3 ([ii] Magma fills the fissures in nondislocated host rocks of volcano).
([ii] Magma fills the fissures in nondislocated host rocks of volcano, neither stopping in the fissures in nondislocated host rocks of volcano and solidifying nor ascending through fissures in nondislocated host rocks of volcano) Furcation Modus 1 ([ii] Magma that filled the fissures in nondislocated host rocks of volcano stops in the fissures in nondislocated host rocks of volcano yet does not solidify and does not ascend through fissures in nondislocated host rocks of volcano, [ii] Magma that filled the fissures in nondislocated host rocks of volcano does not stop in the fissures and does not solidify in nondislocated host rocks of volcano but ascends through fissures in nondislocated host rocks of volcano).

([ii] Magma stops in the fissures in nondislocated host rocks of volcano and yet does not solidify) Flux Modus 4 ([ii] Magma that stopped in the fissures in nondislocated host rocks of volcano solidifies in the fissures in nondislocated host rocks of volcano).

([ii] Magma that stopped in the fissures in nondislocated host rocks of volcano solidifies in the fissures in nondislocated
Figure 13 (on this and facing page). Example of interpretation of volcanic environment by the event bush: (A) the environment proper (Mount Etna, Sicily, near Rifugio Sapienza), and (B) the event bush explaining the way in which the observed bodies were formed and the other bodies had to cogenetically form below the surface or could alternatively form on and below the surface. The lava cones in the foreground are a few dozen meters high.
host rocks of volcano) Flux Modus 5 ([iii] Magma that stopped in the fissures in nondislocated host rocks of volcano solidifies in the fissures in nondislocated host rocks of volcano).

([ii] Fissures develop in nondislocated host rocks of volcano but are not filled with magma) Flux Modus 3 ([ii] Host rocks of volcano are dislocated along fissures not filled with magma).

([ii] Host rocks of volcano are dislocated along fissures not filled with magma) Flux Modus 5 ([iii] Host rocks of volcano are dislocated along fissures not filled with magma).

([ia] Undeformed slope without fissures, uncovered by lava exists on volcano) Flux Modus 5 ([iii] Undeformed slope without fissures, uncovered by lava exists on volcano).

([ia] Undeformed slope without fissures, uncovered by lava exists on volcano, [ii] Fissures develop in nondislocated host rocks of volcano but are not filled with magma) Influx ([ii] Undeformed slope of volcano uncovered by lava is dissected by fissures).

([ia] Undeformed slope without fissures, uncovered by lava exists on volcano, [ii] Host rocks of volcano are dislocated along fissures not filled with magma) Influx ([ii] Slope of volcano uncovered by lava is deformed by host rocks of volcano dislocated along fissures not filled with magma).

([ii] Slope of volcano uncovered by lava is deformed by host rocks of volcano dislocated along fissures not filled with magma) Flux Modus 5 ([iii] Slope of volcano uncovered by lava is deformed by host rocks of volcano dislocated along fissures not filled with magma).
Figure 13 (Continued).
([ii] Fissures develop in nondislocated host rocks of volcano but are not filled with magma, [ii] Magma ascends through fissures in nondislocated host rocks of volcano) Influx ([ii] Fissures that develop at slope of volcano are filled with magma).

([ii] Fissures that develop at slope of volcano are filled with magma) Flux Modus 5 ([iii] Fissures at slope of volcano are filled with magma).

([ii] Fissures that develop at slope of volcano are filled with magma) Flux Modus 2 ([ii] Lava reaches the slope of volcano).

([ii] Lava reaches the slope along fissures on slope of volcano yet neither flows from fissures nor forms lava cones along fissures) Furcation Modus 1 ([ii] Lava that reached the slope along fissures on slope of volcano flows along fissures on slope of volcano from fissures and does not form lava cones on slope of volcano, [ii] Lava that reached the slope along fissures on slope of volcano does not flow along fissures on slope of volcano but forms lava cones along fissures on slope of volcano).

([ii] Lava forms lava cones along fissures on slope of volcano) Flux Modus 3 ([ii] Lava cones form and erupt lava along fissures on slope of volcano).

([ii] Lava cones form and erupt lava along fissures on slope of volcano) Flux Modus 5 ([iii] Lava cones form and erupt lava along fissures on slope of volcano).

([ii] Lava cones form and erupt lava along fissures on slope of volcano) Flux Modus 2 ([ii] Lava flows along fissures on slope of volcano from fissures).

([ii] Slope of volcano uncovered by lava is deformed by host rocks of volcano dislocated along fissures not filled with magma, [ii] Lava forms lava cones along fissures on slope of volcano) Influx ([ii] Slope of volcano deformed by host rocks of volcano dislocated along fissures not filled with magma is covered by lava erupted from lava cones along fissures).
([ii] Slope of volcano deformed by host rocks of volcano dislocated along fissures not filled with magma is covered by lava erupted from lava cones along fissures) Flux Modus 5 ([iii] Slope of volcano deformed by host rocks of volcano dislocated along fissures not filled with magma is covered by lava erupted from lava cones along fissures).

Structural and semantic rules of the event bush make it possible, first, to construct a scenario describing some particular (e.g., actually observed) case involving the necessary and sufficient set of events (ia), (ib), (ii), and (iii), and then to abstract from this particular case and generate a complete set of possible scenarios based on these very events, their subjects, and predicates. For example,

([ii] Magma stops in the fissures in nondislocated host rocks of volcano yet does not solidify) Flux Modus 4 ([ii] Magma that stopped in the fissures in nondislocated host rocks of
volcano solidifies in the fissures in nondislocated host rocks of volcano).

([ii] Magma that stopped in the fissures in nondislocated host rocks of volcano solidifies in the fissures in nondislocated host rocks of volcano) Flux Modus 5 ([iii] Magma that stopped in the fissures in nondislocated host rocks of volcano solidifies in the fissures in nondislocated host rocks of volcano).

Thus, "missing parts" of a picture can be added, requiring a minimum of additional geoentities (i.e., in line with the Occam's razor rule). Formulations in boxes in Figure 13B may differ from those above; this is due to subtler relationships between the predicates ("becomes obvious," "becomes irrelevant," "is repeated," etc.) or even to peculiarities of natural language. These need to be addressed in further investigations. To describe more complex environments, one may, depending on his or her vision of the environment, either develop one bush or appeal to multibush constructions.

Multibush Constructions

One-Level Interrelated Bushes

A formalization of different geoenvironments by event bush poses a natural question: How can different bushes be related to each other? They can be united "on plane" and "in space." "On plane," this can be done "in line" (tertiary events of one bush become primary internal or external events of another), in a simple network (tertiary events of one bush become primary internal events, and tertiary events of another become primary external events, of the same third bush), and in a complex network (tertiary events of one bush become primary internal or external events of more than one bush; or, vice versa, one bush takes primary internal or external events from more than one bush). "In space," primary internal, primary external, secondary, or tertiary events of one bush can become secondary events of another.
These interrelations, which still need to be explored and understood, should give us a better view of interrelated geoenvironments and, hopefully, arm us with an efficient tool for creating sophisticated information structures simulating the evolutionary scenarios of Earth.

Nested Self-Similar Bushes

To view a single geoenvironment at different scales, as shown in Figure 5, a series of bushes can be constructed, from the most general down to as detailed as desired. This opportunity was demonstrated by Behncke and Pshenichny (2009). The theoretical foundation for building self-similar bushes may come from class-subclass or part-whole relations between the subjects of events of more general and less general bushes. However, these relations within and between the event bushes need to be studied thoroughly, and then the theory of self-similar event bushes will be reported.
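The "on plane, in line" uniting of bushes described above lends itself to a simple data-structure sketch. The following Python fragment is a hypothetical illustration, not the authors' implementation; the class names, event-kind codes, and the connective label are ours.

```python
# Hypothetical sketch of an event bush as a directed graph of typed events,
# and of the "in line" multibush linkage in which tertiary events of one
# bush become primary events of another. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Event:
    text: str   # natural-language formulation of the event
    kind: str   # "ia"/"ib" (primary internal/external), "ii", or "iii"

@dataclass
class EventBush:
    events: list = field(default_factory=list)
    edges: list = field(default_factory=list)   # (source, connective, target)

    def add(self, text, kind):
        e = Event(text, kind)
        self.events.append(e)
        return e

    def connect(self, src, connective, dst):
        self.edges.append((src, connective, dst))

    def tertiary(self):
        return [e for e in self.events if e.kind == "iii"]

def link_in_line(upstream, downstream, as_kind="ia"):
    """Unite two bushes 'on plane, in line': tertiary events of `upstream`
    become primary internal (or external) events of `downstream`."""
    for e in upstream.tertiary():
        downstream.add(e.text, as_kind)

a = EventBush()
rise = a.add("magma rises in conduit", "ia")
erupt = a.add("lava erupts from cone", "iii")
a.connect(rise, "flux modus 2", erupt)

b = EventBush()
link_in_line(a, b)   # the tertiary event of `a` is now a primary event of `b`
```

A simple network or complex network, in the same spirit, would call `link_in_line` with several upstream or downstream bushes and with differing `as_kind` values.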
Theoretical foundations of the event bush method

Families of Event Bushes

Variations in the behavior of a similar geoenvironment, or of geoenvironments of a similar class, under the same conditions, which may seem contradictory, can be modeled by a set of event bushes with similar semantics, that is, a family of event bushes. This can also be used to cope with the incompleteness of the geological record or other data. All bushes in a family have similar sets of (ia), (ib), (ii), and (iii) events but differ in structure, i.e., in the set of connectives uniting these events. While some of the connectives will certainly repeat from bush to bush (e.g., flux modus 2 [ia–iii]), others will not. Thus, one may figure out the relations maintained, or asserted, throughout the entire family, in some of its bushes, or in none. This strongly resembles the concepts of tautology and satisfiable formula in logic (Smullyan, 1968). Meanwhile, this also models the diversity of expert opinions and may serve for their elucidation and reconciliation (Wood and Curtis, 2004).

DISCUSSION: THE EVENT BUSH IN THE GEOINFORMATIC AND INFORMATIC FRAMEWORK

Interrelationships between objects of similar range can be found in many domains (e.g., history, biology, business, etc.). Hence, the approach to identification and classification of methods of information modeling based on the environments they address seems to be extendable from geoinformatics onto other thematic fields. This vision may be a valuable contribution of geoinformatics to general informatics, if it passes the "external" testing. At the same time, it is interesting to consider the evolutionary aspect of the toolkit of geoinformatics. Indeed, the main intellectual challenge is the extraction of a strict, "solid" vision from a hot mess of professional intuition, personal feelings, beliefs, impressions, and pieces of other knowledge, for which a metaphor
would be the emplacement of lava or intrusive magma and its further cooling, solidification, and crystallization of "minerals" (less evolved ones, the subjects and predicates, and more complex ones, the interrelated events) in a particular order. Let us compare the environment-based classification with another classification of methods of geoinformatics, suggested earlier by Sinha et al. (2008), based on progressively strengthening semantics (Fig. 14). The transition from weaker to stronger semantics involves, according to Sinha and coauthors, a passage from taxonomy (expressed as a relational model written, e.g., in extensible markup language [XML]), to thesaurus, to conceptual model, and, finally, to logical theory formulated in terms of first-order logic (Smullyan, 1968). This is a natural pathway of increasing maturity of information modeling, which can be seen in general as progressing from description of discoveries to integration of information. Stronger semantics means greater expressive power, a higher level of interoperability, and, as a result, the feasibility of more complex queries. It allows, according to Sinha et al. (2008), "inferences" (not necessarily in the exact logical sense—see, e.g., Gentzen, 1934) from heterogeneous data sets. At the same time, it requires community agreement on conceptual relationships in the information domain. Such agreement, complete or partial, can be fairly well expressed by an event bush or a family of bushes. In comparing the two schemes of methods of geoinformatics, our classification (Fig. 2) and the evolutionary scheme of Sinha et al. (Fig. 14), one may suppose that the early stage of evolution of the methodology of geoinformatics (the taxonomy/thesaurus stage) refers solely to the subject-based methods of our classification, while the mature stage (the conceptual model/logical theory stage) refers to both subject-based and event-based ones.
This seems reasonable because the primitive relations between individual subjects and predicates (class-subclass, part-whole, etc.), which are the basis at the early stage, are not enough to set the relationships between the events
Figure 14. Conceptual evolution of methods of geoinformatics (by Sinha et al., 2008). XML—extensible markup language, OWL—Web Ontology Language, UML—unified modeling language, RDF—resource description framework, and DB—database.
Pshenichny and Kanzheleva
unless these relations are supported by much more evolved structural rules, like the connectives of the event bush, just as a volcanic rock cannot transform into an intrusive one in the process of crystallization. The event bush, hence, must be regarded as a conceptual model in the classification of Sinha et al. (2008). The tight interrelationship between structure and semantics is the virtue of this approach. Clarification of the class-subclass relations between the subjects and predicates of the bush will bridge the gap between the event-based and subject-based methods and allow us to carefully track all the properties and their influence on further products generated in an environment (also marking permitted omissions of properties, as done in Fig. 13B). Such a "property survey" can be easily performed throughout the bush or even across interrelated bushes. This is the oncoming task of our research. The event bush also has good prospects of becoming a formal theory (logical or alike), which occupies the highest position in the succession plotted by Sinha et al. (2008; see Fig. 14). There is an intriguing analogy between the connectives of the event bush and logical connectives (though no direct parallels between particular connectives seem plausible), between changes and propositional formulae, between flows present in a family of bushes and tautologies/satisfiable formulae, and some others. Taking the left and right parts of any change as "meta-events," one perhaps may think of nested bushes, similar to logical connectives that may unite not only variables but also formulae of any length. The issue of truth values, or their analogs, for the event bush needs to be discussed; however, the interrelationships between flows based on the definitions of connectives may open a theoretical possibility of inference nearly sensu Gentzen (1934). Certainly, this will not be a logical interpretation but a new formal theory built in the same way as existing logical systems (Hilbert and Bernays, 1934).
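The analogy between family-wide flows and tautologies can be made concrete: if each bush of a family is represented as a set of (source, connective, target) edges over the shared events, the relations asserted throughout the family are simply the intersection of these sets. A minimal sketch, with event and connective labels invented for illustration:

```python
# Sketch: bushes in a family share events but differ in connectives.
# Each bush is modeled as a set of (source_event, connective, target_event)
# triples; edges held in every bush are family-wide invariants, and edges
# held in only some reflect diverging expert opinions. Labels are invented.

def family_relations(bushes):
    """Split the family's edges into those asserted in all bushes
    and those asserted only in some of them."""
    in_all = set.intersection(*bushes)
    in_some = set.union(*bushes) - in_all
    return in_all, in_some

bush1 = {("ia: magma rises", "flux modus 2", "iii: lava erupts"),
         ("ii: fissure opens", "flux modus 5", "iii: lava erupts")}
bush2 = {("ia: magma rises", "flux modus 2", "iii: lava erupts")}

always, sometimes = family_relations([bush1, bush2])
# `always` plays a role analogous to a tautology within this two-bush family
```

Edges in `sometimes` are the natural starting point for eliciting and reconciling expert disagreement, in the spirit of Wood and Curtis (2004).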
More research is needed to find out whether the rules and definitions adopted herein (in the "Basic Assumptions…" section) may result in duplication of events, in the presence of incompatible (contradictory) inferences, or in situations in which it is unclear whether a predicate P relates to the subject S in a particular event, and hence which events may and may not follow from this event by this or that connective. To our knowledge, this is the first attempt to create a more or less strict conceptual model in the geosciences. Existing event-based methods (e.g., Bayesian belief networks—see, e.g., Jensen, 2001; Aspinall et al., 2003; event trees—see, e.g., Newhall and Hoblitt, 2002; or Bayesian trees—see, e.g., Marzocchi et al., 2008) have not, in fact, entered the field of geoinformatics, because they have no semantic rules governing the formulation and behavior of events. Nevertheless, the event bush can also be used for their optimization, being convertible to both, though with some loss of relevant information (Pshenichny et al., 2009). The well-known method of neural networks, in our opinion, may have semantic constraints, at least at the level of architecture (input layer, hidden layers, output layer), and recent publications show promising intersections of neural networks with the
subject-based methods (i.e., ontology design—see Abolhassani et al., 2006). The issues of conceptual interrelation and the possibility of co-applying the event bush and neural networks will be addressed in future studies. Importantly, in constructing conceptual models, the event bush neither provides nor is based on definitions of the involved events, or their subjects, or predicates; i.e., it allows geoscientists to reason exactly as they do when standing at an outcrop and pointing to actual rock features. Ironically, the absence of a common natural language (English, Spanish, French, Russian, or any other) may help specialists better understand each other, just by pointing to different objects in a rock wall, drawing primitive sketches in the sand, making gestures, and expressing their (dis)agreement. Thus, they manage to exchange complicated ideas that would otherwise require a few research papers to express and, still more important, to reason collectively, avoiding concepts diversely defined in their scientific schools. They appeal not to concepts but directly to geoentities, implicitly naming them as "…what we see here and anything like this anywhere else." Meanwhile, the factor that is left completely to everyone's discretion is what to consider "like this." This appears to be an efficient way to discard one of the most painful issues of formalization of the geosciences, the disagreement on definitions. Instead, what is said is, in fact, the following: "Whatever we consider LIKE THIS MUST behave so and so depending on this, this, and that circumstance." In many cases, this seems to be a way to perform a working formalization of geoscientific contexts. It should also be stressed that the event bush operates with geoentities initially not accompanied by physical parameters or mathematical variables, but it enables reasoning in terms of geoentities ("things" or even "words") almost as strict as reasoning in terms of parameters and numerical values.
Thus, it draws a clear distinction between the qualitative and the quantitative and proves that geoentities do not need to be interpreted in terms of physics or mathematics to be treated strictly and formally. However, this does not undermine quantitative modeling; rather, it allows one to better focus the models and root them more firmly in the geoentities. The theoretical opportunity of binding geophysical models to the event bushes describing the corresponding phenomena qualitatively, as well as the practical benefits of this venture, have been shown by Carniel et al. (2011). Also, Pshenichny et al. (2009) show how time (and, further, space) values can be incorporated in the event bush. This is seen as a methodologically better option for addressing the studies of matter and energy of Earth. Information modeling of geoentities is even more urgent because, like many other dominantly descriptive disciplines, the geosciences show a kind of internal resistance to formalization, to application of artificial intelligence and design methods, and to the corresponding theoretical rethinking. This is due to the character of descriptive knowledge, which is genuinely intuitive and tied to particular processes and objects, as well as to personal apprehension, instead of focusing on general properties and conditions and ubiquitous laws. On the one hand, this opens another, so far largely unused, opportunity for formalization, taken precisely
by the event bush—mimicking the way the objects behave in nature, a formalism is sought that best suits our "feeling" of the object as well as our way of reasoning about it. On the other hand, any theoretical rethinking of an information domain, being a desired consequence of application of information technologies (Loudon, 2000), draws a picture different from that the community is used to, regardless of whether the latter is adequate or even correct. Since the very first steps, the way of thinking that underlies the event bush has differed from virtually all existing pathways of thought "officially" accepted in the earth sciences, be it the essentially inductive and nonstrict traditional geological consideration (for example, reconstruction of geological history), modeling in terms of physical parameters, building of a single-root event/probability tree (Newhall and Hoblitt, 2002), or compilation of a Bayesian belief network based on the expert's knowledge and intuition. Implementation of a formalism, then, will proceed much faster if it can find direct analogies in the "physical" world, as do trees, neural network shapes, mindmaps, or loop diagrams. The event bush has at least two such obvious common-sense analogies: one with the patterns of flow, and another with the modes of production of new organisms in biology. Also, one may draw a parallel between its connectives and a "driving code" (one "drives" different ways from the left and top to the right) or other examples from everyday life. This gives hope that the event bush will successfully pass the "external testing" and become an instrument of general informatics. At this stage, its expressive and communicative power will be employed to the fullest, and it may become an efficient tool for communication within the scientific community, and between this community and society as a whole, at the optimal balance between clarity and strictness.
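One of the open questions noted earlier, whether the rules of composition may result in duplication of events, hints at what automatic reasoning checks could look like: a scan of a bush for events sharing subject and predicate. The following is a hedged sketch; the (subject, predicate) encoding of events is our own, not part of the event bush formalism.

```python
# Hypothetical check for duplicated events, i.e., two events with the same
# subject S and predicate P. The pairwise encoding is illustrative only.

def find_duplicates(events):
    """events: iterable of (subject, predicate) pairs; return pairs that
    occur more than once, compared case-insensitively."""
    seen, dups = set(), set()
    for subject, predicate in events:
        key = (subject.strip().lower(), predicate.strip().lower())
        if key in seen:
            dups.add(key)
        seen.add(key)
    return dups

events = [("magma", "rises in conduit"),
          ("magma", "stops in fissures"),
          ("Magma", "rises in conduit")]   # duplicate up to letter case
dup = find_duplicates(events)             # {('magma', 'rises in conduit')}
```

Analogous scans over connective definitions could flag incompatible inferences, though a real implementation would need the full semantic rules of the bush.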
CONCLUDING REMARKS

A peculiar feature of the behavior of information in the earth science domain is the ubiquitous interrelationship between entities of similar range. This provides the groundwork to consider geoinformatics as the study of environments that include geoentities (objects and/or processes) of similar class, which are also bound by some other relation. On this basis, a classification of geoenvironments can be suggested, which, first, seems to sufficiently organize the set of methods that are or can be used by geoinformatics, and, second, appears transferable to other similar disciplines (some life sciences, business activities, history, and others). In this classification, the complexity ascends from subject-based to event-based methods (which model the no-change and changing environments, respectively). Among the event-based methods, the complexity generally grows from nondirected to directed change environments, and, among the directed change ones, from those with invariant to those with alternative changes. These kinds of environments may be addressed by a variety of methods, but none of them except the event bush suggests semantic and structural rules that would allow us to consider it a conceptual model and a method of information modeling. However, existing methods (at least Bayesian belief networks) may benefit from the event bush, which seems to be reducible to these, and thus it may become a mature means of information modeling. Basic assumptions, definitions, and rules of composition of the event bush in their present form allow us to apply the event bush widely in a variety of geoscientific tasks. An example of a volcanological application was shown in this paper. So far, the event bush enables us to construct interrelated flows of events, including those missing from observation, requiring a minimum of additional geoentities. Nevertheless, further theoretical research toward complete formalization is needed to examine and develop the theoretical basis of the event bush method, to ensure the absence of flaws and ambiguities, and to formulate the guidelines for automatic reasoning in it.

ACKNOWLEDGMENTS

We are deeply obliged to Sergey Nikolenko for constructive criticism of the first part of this work, as well as to Victoria Shterkhun, Zina Khrabrykh, Alexander Rezyapkin, and many others, who have worked side by side with us for years, and who encourage and support us. The research was carried out in the framework of the Marie Curie Action "International Research Staff Exchange Scheme" (FP7-PEOPLE-IRSES-2008) Cross-Disciplinary Knowledge Transfer for Improved Natural Hazard Assessment (CRODINAS; 2009–2011), EC Framework Programme 7, grant no. 230826.

REFERENCES CITED

Abolhassani, H., Bagheri-Hariri, B., and Haeri, S.H., 2006, On ontology alignment experiments: Webology, v. 3, no. 3, article 28, http://www.webology.ir/2006/v3n3/a28.html (accessed 4 June 2011).
Aspinall, W.P., Woo, G., Voight, B., and Baxter, P.J., 2003, Evidence-based volcanology: Application to eruption crises: Journal of Volcanology and Geothermal Research, v. 128, p. 273–285, doi:10.1016/S0377-0273(03)00260-9.
Aspinall, W.P., Carniel, R., Jaquet, O., Woo, G., and Hincks, T., 2006, Using hidden multi-state Markov models with multi-parameter volcanic data to provide empirical evidence for alert level decision-support: Journal of Volcanology and Geothermal Research, v. 153, p. 112–124, doi:10.1016/j.jvolgeores.2005.08.010.
Behncke, B., and Pshenichny, C.A., 2009, Modeling unusual eruptive behavior of Mt. Etna, Italy, by means of event bush: Journal of Volcanology and Geothermal Research, v. 185, p. 157–171, doi:10.1016/j.jvolgeores.2009.04.020.
Bloomfield, L., 1914, An Introduction to the Study of Language: New York, Henry Holt and Company, 335 p.
Bogdanov, A., 1926, Allgemeine Organisationslehre: Tektologie, v. I, p. II.
Carniel, R., Pshenichny, C.A., Khrabrykh, Z., Shterkhun, V., and Pascolo, P., 2011, Modeling models: Understanding structure of geophysical knowledge by means of the event bush method, in Marschallinger, R., and Zobl, F., eds., Mathematical Geosciences at the Crossroads of Theory and Practice: Proceedings of the Conference of the International Association of Mathematical Geosciences, 5–9 September, Salzburg, Austria.
De Saussure, F., 1983 [1913], Course in General Linguistics (Harris, R., translator): La Salle, Illinois, Open Court.
Diviacco, P., and Pshenichny, C.A., 2010, Concept-referenced spaces in computer-supported collaborative work, in Proceedings of the European Geosciences Union (EGU) 5th General Assembly, Vienna (Austria): Geophysical Research Abstracts, v. 12, EGU201-EGU6258 (CD-ROM).
Feigenbaum, E.A., 1984, Knowledge engineering: The applied side of artificial intelligence, in Hagel, H.P., ed., Proceedings of a Symposium on Computer Culture: The Scientific, Intellectual, and Social Impact of the Computer: New York, New York Academy of Sciences, p. 91–107. Gentzen, G., 1934, Untersuchungen über das logische Schliessen, I–II. “Math. Z.,” bd. 39, h. 2, 3 (English translation: Gentzen, G., 1969, Investigations into logical deduction, in Szabo, M., ed., The Collected Papers of Gerhard Gentzen: Amsterdam, North-Holland, p. 68–128). Hilbert, D., and Bernays, P., 1934, Grundlagen der Mathematik (Fundamentals of Mathematics), Bd. I: Heidelberg, Springer-Verlag, 2 vols. Jakeman, A.J., Voinov, A., Rizzoli, A.E., and Chen, S., 2008, Environmental Modelling, Software and Decision Support: State of the Art and New Perspectives: Amsterdam, Elsevier, 384 p. Jensen, F.V., 2001, Bayesian Networks and Decision Graphs: New York, Springer, 268 p. Loudon, T.V., 2000, Geoscience after IT: Computers & Geosciences, v. 26, 13 p. Mandelbrot, B.B., 1982, The Fractal Geometry of Nature: New York, W.H. Freeman and Company, 468 p. Marzocchi, W., Sandri, L., and Selva, J., 2008, BET_EF: A probabilistic tool for long- and short-term eruption forecasting: Bulletin of Volcanology, v. 70, p. 623–632, doi:10.1007/s00445-007-0157-y. Newhall, C.G., and Hoblitt, R.P., 2002, Constructing event trees for volcanic crises: Bulletin of Volcanology, v. 64, p. 3–20, doi:10.1007/s004450100173. Oliveros, A.Q., Carniel, R., Tárraga, M., and Aspinall, W., 2008, On the application of hidden Markov model and Bayesian belief network to seismic noise at Las Cañadas caldera, Tenerife, Spain: Chaos, Solitons, and Fractals, v. 37, p. 849–857, doi:10.1016/j.chaos.2006.09.073. 
Pshenichny, C.A., and Fedukov, R.A., 2007, Formal treatment of knowledge in water science by means of event bush, in Proceedings of the European Geosciences Union (EGU) 2nd General Assembly, Vienna (Austria): Geophysical Research Abstracts, v. 9, EGU2007-A-01016 (CD-ROM). Pshenichny, C.A., and Khrabrykh, Z.V., 2002, Knowledge base of formation of subaerial eruption unit, in Leroy, S., and Stuart, I., eds., Environmental Catastrophes and Recovery in the Holocene: London, Brunel University, http://atlas-conferences.com/cgi-bin/abstract/caiq-22 (accessed 4 June 2011). Pshenichny, C.A., Carniel, R., and Akimova, V.L., 2005, Decreasing the uncertainty of BBN technique by means of complex formal approach to volcanological information treatment, in Proceedings of the European Geosciences Union (EGU) 2nd General Assembly, Vienna (Austria): Geophysical Research Abstracts, v. 7, EGU05-A-01016 (CD-ROM). Pshenichny, C.A., Nikolenko, S.I., Carniel, R., Sobissevitch, A.L., Vaganov, P.A., Khrabrykh, Z.V., Moukhachov, V.P., Shterkhun, V.L., Rezyapkin,
A.A., Yakovlev, A.V., Fedukov, R.A., and Gusev, E.A., 2008, The event bush as a potential complex methodology of conceptual modelling in the geosciences, in Sanchez-Marre, M., Bejar, J., Comas, J., Rizzoli, A., and Guariso, G., eds., Proceedings, iEMSs—International Congress on Environmental Modelling and Software: Barcelona, International Environmental Modelling and Software Society, v. 2, p. 900–912. Pshenichny, C.A., Nikolenko, S.I., Carniel, R., Vaganov, P.A., Khrabrykh, Z.V., Moukhachov, V.P., Akimova-Shterkhun, V.L., and Rezyapkin, A.A., 2009, The event bush as a semantic-based numerical approach to natural hazard assessment (exemplified by volcanology): Computers & Geosciences, v. 35, p. 1017–1034, doi:10.1016/j.cageo.2008.01.009. Schröder, B.S.W., 2003, Ordered Sets: An Introduction: Boston, Birkhäuser Boston, Inc., 391 p. Sinha, K., Malik, Z., Raskin, R., Barnes, C., Fox, P., McGuinness, D., and Lin, K., 2008, Semantics-based interoperability framework for geosciences: Eos (Transactions, American Geophysical Union), Fall Meeting supplement, abstract IN31D-11. Smullyan, R.M., 1968, First-Order Logic: Berlin, Springer-Verlag, 158 p. Sowa, J.F., 2000, Knowledge Representation: Logical, Philosophical, and Computational Foundations: Pacific Grove, California, Brooks/Cole Publishing Co., 594 p. Tulupyev, A.L., and Nikolenko, S.I., 2005, Directed cycles in Bayesian belief networks: Probabilistic semantics and consistency checking complexity, in Gelbukh, A., de Albornoz, A., and Terashima, H., eds., Proceedings of the Mexican International Conference on Artificial Intelligence 2005: Berlin, Springer-Verlag, Lecture Notes in Artificial Intelligence, v. 3789, p. 214–223. Tutte, W.T., 1998, Graph Theory As I Have Known It: New York, Oxford University Press, 156 p. Umpleby, S.A., 2007, Physical relationships among matter, energy and information: Systems Research and Behavioral Science, v. 24, p. 369–372, doi:10.1002/sres.761. 
Uspenskii, V.A., and Semenov, A.L., 1993, Algorithms: Main Ideas and Applications: Dordrecht, the Netherlands, Kluwer Academic Publishers, 269 p. Von Bertalanffy, L., 1968, General System Theory: Foundations, Development, Applications: New York, George Braziller, 289 p. Wood, R., and Curtis, A., 2004, Geological prior information and its application to geoscientific problems, in Curtis, A., and Wood, R., eds., Geological Prior Information: Geological Society of London Special Publication 239, p. 1–14.
MANUSCRIPT ACCEPTED BY THE SOCIETY 17 FEBRUARY 2011
Printed in the USA
The Geological Society of America Special Paper 482 2011
Infusing semantics into the knowledge discovery process for the new e-geoscience paradigm

A. Krishna Sinha
Department of Geological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA
ABSTRACT

The need to develop a geoscience cyberinfrastructure framework for both the discovery and semantic integration of disciplinary databases in geosciences is abundantly clear as we seek to unravel both the evolutionary history of Earth and address significant societal challenges. Although geoscientists have produced large amounts of data, the ability to find, access, and properly interpret these large data resources has been very limited. The main reason for the difficulties associated with both discovery and integration of heterogeneous and distributed data sets is perhaps related to the adoption of various acronyms, notations, conventions, units, etc., by different research groups. This makes it difficult for other scientists to correctly understand the semantics associated with data, and it makes the interpretation and integration of data simply infeasible. This paper presents the scientific rationale for developing new capabilities for semantic integration of data across geoscience disciplines. In order to enable the sharing and integration of geosciences data on a global scale, ontology-based data registration and discovery are required. Hence, this paper describes the need to develop foundation-level ontologies for efficient, reliable, and accurate data sharing among geoscientists. Ontologically registered data can be modeled through the use of geoscientific tools to answer complex user queries. This paper emphasizes the need to share tools such as Web services that are registered to a service ontology and made accessible to the scientific community at large. Future development would include an ontology of concepts associated with processes, enabling users to conduct both forward and reverse modeling toward a more robust understanding of complex geoscience phenomena. This paper presents two use cases for a semantic infrastructure model registering data and services, including processes for analysis of complex geoscience queries.
INTRODUCTION

Communities of scientists around the world are working toward the goal of discovering new knowledge through a better understanding of the fundamental principles that lie behind complex and heterogeneous databases (Sinha et al., 2010). There is common consensus that access and integration (e.g., layering of data) of data are prerequisites for creating an information infrastructure, but, arguably, this cannot be the ultimate goal. We need to add transformative capabilities to data and related information
Sinha, A.K., 2011, Infusing semantics into the knowledge discovery process for the new e-geoscience paradigm, in Sinha, A.K., Arctur, D., Jackson, I., and Gundersen, L., eds., Societal Challenges and Geoinformatics: Geological Society of America Special Paper 482, p. 165–181, doi:10.1130/2011.2482(13). For permission to copy, contact
[email protected]. © 2011 The Geological Society of America. All rights reserved.
A.K. Sinha
through integrating processes that capture the full significance of the data leading to knowledge discovery. Figure 1 (Fleming, 1996) emphasizes the stages required to move from data to principles of science, which can be readily referred to as transforming data to knowledge.
Figure 1. The pathway from data to knowledge and beyond is a three-step activity: first seeking information as it relates to description, definition, or perspective (what, when, where); then knowledge constituting strategy, practice, method, or approach (how); which leads to new insight and principles (why).
While information entails an understanding of the relations between data, it generally does not provide a foundation for why the data are what they are, nor an indication as to how the data would change over time through physical, chemical, and biological processes. When a pattern of relationships exists amidst the data and information, the pattern has the potential to represent knowledge that can be further explored through a better understanding of principles. I support the premise that semantic capabilities at all levels of logic are required to follow the path from syntactic to semantic interoperability (Fig. 2), as well as to infer unknown relationships between data through reasoning (Sinha et al., 2010). Ontologies have long been used to describe all entities within an area of reality and all relationships between such entities. Therefore, an ontology constitutes a set of well-defined terms with well-defined relationships (Gruber, 1993), and it can be a vital tool enabling researchers to turn data into knowledge. Computer scientists have made significant contributions to linguistic formalisms and computational tools for developing complex vocabulary systems using reason-based structures, and I suggest that a three-tiered ontology framework will be required to provide researchers with the pathway from data to examination of the fundamental principles that govern the sciences. There is common consensus that scientific disciplines have to deal with (1) large data volumes, (2) complexity of data, (3) volatility of data, (4) heterogeneity of data, (5) broad distribution of data resources, and (6) access to tools and services that can appropriately render and represent data and data products. A knowledge-creation infrastructure that enables access to and integration of heterogeneous data is required to meet these challenges, and it is often referred to as e-science. The semantic capabilities needed
[Figure 2 depicts a ladder of increasing expressivity from weak to strong semantics: taxonomy (relational model, XML), supporting syntactic interoperability; thesaurus (ER, extended ER, DB schemas, XML Schema), supporting structural interoperability; conceptual model (UML, RDF/S, XTM), supporting semantic interoperability; and logical theory (description logic, DAML+OIL, OWL, first-order logic, modal logic), at which knowledge discovery becomes possible.]
Figure 2. There are multiple levels of semantics and associated interoperability capabilities (Obrst, 2003). Increasing interoperability services requires increasing community agreement on conceptual relationships across participating geoscience disciplines. Strong semantics allow inferences and reasoning from data set contents (Sinha et al., 2010). DAML+OIL—DARPA Agent Markup Language + Ontology Interchange Language; DB—database; ER—entity-relationship; OWL—Web Ontology Language; RDF/S—Resource Description Framework/Schema; UML—Unified Modeling Language; XML—extensible markup language; XTM—syntax for topic maps.
to integrate complex and heterogeneous data within such an infrastructure is the focus of this paper. Specifically, I emphasize that this infrastructure will require a combination of upper-level, midlevel, and foundation-level ontologies, coupled with data-level domain ontologies, as well as semantically enabled tools and services, to achieve the goal of transforming data into knowledge. In the new e-science paradigm, geoscientists have moved toward using the Web as a medium to exchange and discover vast amounts of data. The current practice is dominated by establishing methods to access data, with little emphasis on capturing the meaning of the data, which would facilitate interoperability and integration. Some common current methods for integration include schema integration, leading to the use of mediated schemas that provide a uniform query interface for multiple resources (Halevy et al., 2006). Methods using peer data management (Aberer, 2003) can allow participating data sources to retrieve data directly from each other, and this is likely to extend data integration to the Internet scale. However, such query capabilities require syntactic and semantic mapping across resources in order to be effective. Clearly, the availability of ontologies will become a prerequisite for semantic integration. In this paper, I adopt the definition of an ontology as a set of knowledge terms, including the vocabulary, the semantic interconnections, and the rules of inference and logic for some particular topic (Gruber, 1993; Noy, 2004).

BUILDING A SEMANTIC FRAMEWORK FOR THE SCIENCES AND EARTH SCIENCE IN PARTICULAR

Here, three types of ontologic frameworks are identified for the discovery of data and its integration: object (e.g., materials), process (e.g., chemical reactions), and service (e.g., simulation models or geochemical filters) (Sinha et al., 2006a; Malik et al., 2007a).
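Ontology-based registration and discovery, as envisioned here, can be caricatured in a few lines of code: data sets are registered against concepts of an object ontology, and a query retrieves them by concept subsumption rather than by column or file name. The taxonomy, file names, and function names below are invented placeholders, not part of any actual registry.

```python
# Hedged sketch of ontology-based data registration and discovery.
# The tiny class hierarchy and resource names are invented for illustration.

subclass_of = {                 # object-ontology fragment (child -> parent)
    "Granite": "IgneousRock",
    "Basalt": "IgneousRock",
    "IgneousRock": "Rock",
}

registry = []                   # (resource_name, concept) registrations

def register(resource, concept):
    registry.append((resource, concept))

def ancestors(concept):
    """Walk upward through the class hierarchy."""
    while concept in subclass_of:
        concept = subclass_of[concept]
        yield concept

def discover(concept):
    """Find resources registered to `concept` or to any of its subclasses."""
    return [r for r, c in registry
            if c == concept or concept in ancestors(c)]

register("virginia_geochem.csv", "Granite")
register("etna_lavas.csv", "Basalt")
found = discover("IgneousRock")   # both data sets, reached via subsumption
```

A service ontology would work the same way, with analysis tools registered to conceptual tasks instead of data sets to material classes.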
Objects represent our understanding of the state of the system when the data were acquired, while processes capture the physical and chemical actions on objects that may lead to changes in state and condition over time. Services provide tools (e.g., simulation models and analysis algorithms) to assess multiple hypotheses, including inference or prediction. Object ontology characterizes the semantics of the data: it maps the metadata in the databases to different concepts essential for data search and integration. The service ontology maps instances of services to conceptual tasks to permit semantic searches and automatic linkages to types of data. The process ontology captures broad domain knowledge, including information such as understanding of the data set, relationships among the different variables, normal ranges of the variables, and known causal relationships (e.g., Reitsma and Albrecht, 2005; Sinha et al., 2006a; Barnes, 2006). These three classes of ontologies within the semantic layer of e-geoscience are thus required to enable automated discovery, analysis, utilization, and understanding of data through both induction and deduction. Although this paper emphasizes primarily the current status of object and service ontologies, I recognize the need to expand this capability to a point where scientists can examine the relationships between data and external factors, such as processes that may influence our understanding of the reasons certain events happen. However, the development of object ontologies is a prerequisite for semantic interoperability across process, object, and service ontologies (Sinha et al., 2006b; Rezgui et al., 2007, 2008). It is important to note that the semantic framework presented in this paper is conceptual rather than explicitly formulated to meet all known rules for building formal ontologies; it is, however, the underpinning for formalizing its semantic content. I also briefly describe the status of two software-engineered prototypes that enable data to be registered to known ontologies, leading to a new method for discovery, integration, and analysis of heterogeneous data.

OBJECT ONTOLOGY: SEMANTIC FRAMEWORK FOR DATA

Object ontologies can be represented at four levels of abstraction: upper-level (Semy et al., 2004), midlevel (Raskin and Pan, 2005), foundation-level, and discipline-specific (i.e., earth science) ontologies (Fig. 3). The latter two are the subject of this paper. Upper-level ontologies, e.g., the Suggested Upper Ontology (SUO) (Phytila, 2002; Niles and Pease, 2001) and the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE) (Masolo et al., 2002), are domain independent and provide universal concepts applicable to multiple domains, while midlevel ontologies, e.g., SWEET (Semantic Web for Earth and Environmental Terminology; sweet.jpl.nasa.gov/ontology/), constitute a concept space that organizes knowledge of earth system science across its multiple, overlapping subdisciplines. SWEET also includes concepts of data representations, services, and legacy.
Foundation-level ontologies capture relationships among conceptual organizations of types of data, including their measurements, while domain-level ontologies are discipline specific and can be used for efficient, reliable, and accurate data sharing among earth scientists (Sinha et al., 2006a, 2006b). These ontologies seek to utilize existing community-accepted high-level ontologies such as SUO (http://suo.ieee.org/SUO/Ontology-refs.html, Institute of Electrical and Electronics Engineers [IEEE] endorsed) and SWEET (http://sweet.jpl.nasa.gov). In particular, the SWEET ontology contains formal definitions for terms used in earth and space sciences and encodes a structure that recognizes the spatial distribution of earth environments (earth realm) and the interfaces between different realms (Raskin and Pan, 2005; Raskin, 2006). Thus, SWEET provides an extensible midlevel terminology that can be readily utilized by both foundation-level and domain-specific ontologies (Malik et al., 2010). In this paper, I describe an approach for building both foundation- and domainlevel ontologies for interdisciplinary integration within the earth sciences. Specifically, I describe why community-supported foundation ontology development is a prerequisite for developing domain ontologies. I then use two case histories to emphasize the complexity of this endeavor and suggest that the semantic
A.K. Sinha
Figure 3. Conceptual organization of object ontologies (Unified Modeling Language [UML] diagram) at various levels of granularity is necessary for transformation of data to knowledge. Both SUO (http://suo.ieee.org/SUO/Ontology-refs.html) and SWEET (http://sweet.jpl.nasa.gov) ontologies can be used to provide connectivity to existing and future ontologies related to all science disciplines. The ontology for elements and isotopes under the concept of Material (Matter) discussed in this paper is considered a foundation ontology because it is common to all science domains. Such high-level UML diagrams show that Materials have properties, age, structure, and location, while Measurements include all analytical tools, including human observations, used for gathering data associated with any object. Domain-specific ontologies utilize these concepts for semantic integration across disciplines. SUMO—Suggested Upper Merged Ontology, SWEET—Semantic Web for Earth and Environmental Terminology.
platforms presented in this paper can act as the basis for deeper semantic structures in the near future.

SEMANTIC FRAMEWORK FOR OBJECT ONTOLOGY AT THE FOUNDATION LEVEL

Foundation ontologies for all sciences (Fig. 3) can be viewed as a representation of formal declarative specifications of all objects, phenomena, and their interrelationships. I emphasize that the concept of Matter (labeled as “Material” in Fig. 4; Helvetica font is used here for concepts and classes), including all thermodynamic states of matter, is the most fundamental of all ontologies. Clearly, without matter, there can be no semantic concept of location, time, and structure, or of physical properties of matter and the instruments that measure these properties. These foundation ontologies may then be used to capture discipline-specific terms, such as those for minerals, rocks, the geologic time scale, geologic structures, or geologic phenomena. This approach also readily accepts geoscience terms being developed through GeoSciML (http://www.geosciml.org/), a markup language designed to promote syntactic integration of heterogeneous resources (Boisvert et al., 2003; Simons et al., 2006; Malik et al., 2010). In the following section, I first describe individual foundation-level ontology packages through high-level concept maps. These concept maps are organized to explore relationships
Figure 4. Foundation-level representation of the concepts as packages and their connectivity with midlevel SWEET ontologies. I use the term “packages” to emphasize that multiple ontologies are contained within each package. For example, the Material package contains ontologies for elements and isotopes, which are readily linked to domain-specific concepts such as minerals, rocks, water, or magma and can be readily linked to GeoSciML-endorsed terms. The semantic relationship between all packages and their subclasses is the foundation for semantic interoperability.
without the full attribution of class structures and associations. When possible, earth-science domain-level terms and concepts have been added to foundation-level ontologies to show their extensibility and thus enable semantic registration of data and terms for integration.

Material: The elements, constituents, or substances of which something is composed or can be made; matter that has qualities which give it individuality and by which it can be categorized (Webster’s New Collegiate Dictionary, http://www.merriam-webster.com/dictionary/material, accessed 9 May 2011). Materials include classes of elements and isotopes, as well as states of matter, such as solids, liquids, fluids, and gases, and all associated properties. These fundamental classes are linked to earth science–specific concepts of minerals and rocks (Fig. 5).
Element Ontology

Figure 6 is a class diagram representing the different concepts related to elements and the explicit relationships between them. The Element class (e.g., DeLaeter et al., 2003) contains a list of properties, including: name, nominal atomic weight, symbol, atomic number, color, atomic radius, melting point, boiling point, density, specific heat, ionization potential, electron affinity, heat of fusion, heat of vaporization, bonding radius, and electronegativity. In addition to other properties, such as classifications of elements as metallic, nonmetallic, or semimetallic, Goldschmidt’s classification (atmophile, chalcophile, siderophile, or lithophile), the rare earth group, platinum group elements, etc., it also contains properties representing the arrangement of the element in the periodic table (http://www.webelements.com/): group, period, and block (s-block, p-block, d-block, or f-block). The following resources were used to gather information about elements and their properties: http://chemlab.pc.maricopa.edu/periodic/periodic.html, http://www.webelements.com/, as well as the IUPAC (International Union of Pure and Applied Chemistry) technical report on atomic weights of the elements (DeLaeter et al., 2003). The major difference between the element ontology described in this paper and the periodic table ontology developed under the DARPA Agent Markup Language (DAML) program is the explicit packaging of elements of interest to geoscientists, e.g., large ion lithophile elements (LILEs). I also represent each element as a subclass of the Element class, so that each element inherits all common properties of the class Element. I also use the term “nominal atomic weight” to represent atomic weights of both mono-isotopic and polyisotopic elements, and to distinguish it from isotope atomic weights.

Isotope Ontology

Figure 7 is a class diagram that shows relationships within the isotope ontology. It shows that an Isotope has the following properties: symbol, number of neutrons, and isotope atomic weight (Rosman and Taylor, 1998). I separate the concept of atomic weight for elements from that for isotopes to prevent overwriting of the assigned atomic weight of the parent element. Every isotope (such as uranium-238) is a separate class and inherits the properties of the parent element (uranium) isotope as well as the abstract Isotope class. For instance, U238 and U235 are subclasses of the U_Isotope class, which in turn is a subclass of the Isotope class. Another concept represented in the isotope ontology is the classification of isotopes into two subclasses: nonradionuclide and radionuclide. I use this system to include decay series, where a radionuclide can be an intermediate parent, a primary parent, or an intermediate daughter (Fig. 7). Each of these subclasses has its corresponding properties. Primary or secondary parent isotopes have a half-life, decay mode, abundance, and a final daughter or a decay series including intermediate daughters.
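The class structure described here (each element a subclass of Element rather than an instance, each isotope inheriting from both its parent element's isotope class and an abstract Isotope class) can be prototyped directly. The sketch below is illustrative only; property names and numeric values are approximations, not the published ontology:

```python
# Sketch of the element/isotope class structure described in the text:
# each element is a subclass of Element (not an instance), so its isotopes
# can inherit the shared properties, while "nominal atomic weight"
# (element) stays distinct from "isotope atomic weight". Values are
# illustrative approximations.

class Element:
    goldschmidt = None            # atmophile | chalcophile | siderophile | lithophile
    nominal_atomic_weight = None

class Isotope:
    isotope_atomic_weight = None
    number_of_neutrons = None

class Uranium(Element):
    symbol = "U"
    atomic_number = 92
    goldschmidt = "lithophile"
    nominal_atomic_weight = 238.029   # approximate

class U_Isotope(Uranium, Isotope):
    """Abstract parent for uranium isotopes; inherits the element's properties."""

class U238(U_Isotope):
    number_of_neutrons = 146
    isotope_atomic_weight = 238.051   # approximate
    half_life_years = 4.468e9         # radionuclide: a primary parent

class U235(U_Isotope):
    number_of_neutrons = 143
    isotope_atomic_weight = 235.044   # approximate

# An isotope inherits the parent element's properties, and its own atomic
# weight does not overwrite the element's nominal atomic weight.
print(U238.goldschmidt)   # "lithophile", inherited from Uranium
print(U238.nominal_atomic_weight, U238.isotope_atomic_weight)
```

Keeping the two atomic-weight properties on separate classes is exactly the overwriting safeguard the text describes.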
Figure 5. Unified Modeling Language (UML) representation of Materials package, emphasizing linkages to earth science concepts. Elements and isotopes constitute the fundamental attributes of Materials and can be readily utilized in other sciences, e.g., in biochemistry, where the concept of cells and their properties could replace minerals or rocks.
Figure 6. Element ontology Unified Modeling Language (UML) diagram emphasizing the details required to discover and integrate semantically registered databases. See relationship of the Class Element/Isotope to other classes in Figure 5.
Figure 7. Unified Modeling Language (UML) diagram for isotope ontology. Registration of isotopic measurements to such conceptual relationships will allow for integration between elements and isotopes.
Final daughter has an abundance and a primary parent. For the isotope properties, the following resources were used to gather the information: http://chemlab.pc.maricopa.edu/periodic/periodic.html, http://ie.lbl.gov/toi/perchart.htm, as well as the IUPAC technical report on isotopic composition of the elements (Rosman and Taylor, 1998).

HOW ARE ELEMENT AND ISOTOPE ONTOLOGIES LINKED TOGETHER?

Every element in the element ontology is connected to its corresponding isotopes (Fig. 8). For instance, an instance of
the Uranium (U) class has one to many instances of U_Isotope. On the other hand, every U_Isotope has a reference to its parent element, Uranium. In order for isotopes to inherit all properties of elements, I have treated each element as a subclass of the Element class, rather than considering each element as an instance. Discovery and integration between elements and isotopes are more easily accomplished through this structure.

Structure: Arrangement of particles or parts in a substance or body; aggregate of elements of an entity in their relationship to each other (Webster’s New Collegiate Dictionary, http://www.merriam-webster.com/dictionary/structure, accessed 9 May 2011). It includes the organization of all geologic structures and their
relationship to each other within concepts that range from zero dimensions (0-D) to three dimensions (3-D). An ontologic framework for structures (Fig. 9) has been established that represents concepts of 0-D, 1-D, 2-D, and 3-D geometric forms, which can capture all known geometrical forms and their relationships to each other. It should be emphasized that a Contain relationship links 0-D, 1-D, and 2-D to 3-D (volume concept). For example, in earth science, discontinuities such as seismic, lithologic, planar, spherical, lattice, fault, and fracture,
Figure 8. An example illustrating how the uranium element is linked to its isotopes through both “has” and “is a” conceptual relationships.
as well as chemical or isotopic discontinuities, have a 2-D geometric form. In order to extend these concepts to data gathered by geologists, I use lineation as an example having 1-D geometric form with field measurements of trend and plunge. Specific types of lineation can then be represented as subclasses with properties inherited from the concept of lineation. I have also established a class relationship that allows the meaning of data situated in x-y-z space to be captured. For example, the concept of Relative Sequence contains concepts of Sequence, which in turn contain X, Y, and Z Sequences. I have further established links between Time Sequence and Sequence, so that geologic units above or below a datum horizon can be explicitly related to be older or younger than the datum. This is a very useful mechanism for portrayal of stratigraphic horizons, regardless of the rock type involved. Extended to its larger spatial concept, it also allows us to recognize layers of Earth such as the crust and mantle (also see SWEET for Earth layers). Location: A position or site occupied or marked by some distinguishing feature (Webster’s New Collegiate Dictionary, http:// www.merriam-webster.com/dictionary/location, accessed 9 May 2011). All materials and structures have a location within a 3-D volume. Although many coordinate reference systems are recognized (Fig. 10), geoscientists commonly use a geodetic reference frame
Figure 9. Unified Modeling Language (UML) diagram representing concepts contained in Structure package. Note the earth science concepts that are readily linked to foundation-level concepts of all dimensions, 0-D, 1-D, 2-D, and 3-D. Concepts of earth layers contained in SWEET are also linked in this package.
for absolute location of samples or points of observation. Relative location is a more difficult concept to classify as data, but concepts of near or adjacent have been successfully used to identify polygons in geographic information systems (Schuurman, 2004).

Time: The measured or measurable period during which an action, process, or condition exists or continues (Webster’s New Collegiate Dictionary, http://www.merriam-webster.com/dictionary/time, accessed 9 May 2011). This also includes concepts of duration and geologic events. This package (Fig. 11) deals with relationships of classes within the concept of geologic time. I utilized OWL-Time (http://www.w3.org/TR/2006/WD-owl-time-20060927/) as a starting foundation-level ontology and added geologic concepts associated with absolute ages, intervals, or durations of geologic events. I extended calendar time (required for querying present-day events as recorded through a calendar clock) to ranges that include geologic time scales. The capability to query data across these concepts uniquely provides the temporal perspective for the better-characterized spatial queries. The organization of the concepts also allows the user to access multiple geologic time scales because geologic time is a temporal entity with attributes such as beginning, end, and duration.
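The temporal-entity structure just described (beginning, end, and duration) supports simple interval queries. A minimal sketch follows; the interval name and boundary ages are approximate illustrative values, not an endorsed time scale:

```python
# Sketch of geologic time as a temporal entity with beginning, end, and
# duration, supporting point-in-interval queries across a time scale.
# Boundary ages are in Ma (millions of years before present) and are
# approximate, for illustration only.

class GeologicInterval:
    def __init__(self, name, begin_ma, end_ma):
        self.name = name
        self.begin_ma = begin_ma   # older bound
        self.end_ma = end_ma       # younger bound

    @property
    def duration_ma(self):
        return self.begin_ma - self.end_ma

    def contains(self, age_ma):
        """Does a dated sample (age in Ma) fall within this interval?"""
        return self.end_ma <= age_ma <= self.begin_ma

jurassic = GeologicInterval("Jurassic", 201.3, 145.0)
print(jurassic.duration_ma)       # ~56.3 m.y.
print(jurassic.contains(150.0))   # a 150 Ma sample falls in this interval
```

The same contains/overlaps logic extends naturally to the relative-sequence concepts (older/younger than a datum) discussed under Structure.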
Measurements: Instrumental measurements and images (chemical, physical, and morphological) of matter. This package, when fully developed, would contain an ontology for instruments, including their operating conditions. Many of these instruments and their operating parameters are already available in SWEET.

Phenomenon: A fact or event of scientific interest susceptible to scientific description and explanation (Webster’s New Collegiate Dictionary, http://www.merriam-webster.com/dictionary/phenomenon, accessed 9 May 2011). Events such as earthquakes or volcanism are considered phenomena. This package is fully covered in the SWEET ontology.

Physical Properties: An attribute common to all members of a class (Webster’s New Collegiate Dictionary, http://www.merriam-webster.com/dictionary/properties, accessed 9 May 2011). All planetary materials have physical properties. This package is fully covered in the SWEET ontology.

Mathematical and Statistical Functions: A set of commonly used notations that describe numerical data and the attributes associated with them. Computations such as standard error, regression, and standard deviation are considered functions. This package is fully covered in the SWEET ontology.
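As a concrete instance of the computations named in the functions package (standard deviation, standard error), a minimal sketch using only Python's standard library; the sample values are invented for illustration:

```python
# Illustration of the Mathematical and Statistical Functions package:
# standard deviation and standard error of the mean for a set of
# repeated analyses. The values below are invented.
import math
import statistics

values = [50.7, 51.2, 49.8, 50.3, 50.9]   # e.g., repeated wt% analyses

mean = statistics.mean(values)
stdev = statistics.stdev(values)              # sample standard deviation
std_error = stdev / math.sqrt(len(values))    # standard error of the mean

print(round(mean, 2), round(std_error, 3))
```

Registering such a function as a service (rather than re-implementing it per data set) is precisely what the service ontology, discussed next, is for.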
Figure 10. Global Geographic Location system adapted from http://www.colorado.edu/geography/gcraft/notes/co and http://www.iki.rssi.ru/ vprokhor/coords.htm (Dana, 1995; Russell, 1971; Hapgood, 1992).
Figure 11. Unified Modeling Language (UML) diagram representing the Geologic Time class and its properties (adapted from Hobbs and Pan, 2004; Allen, 1991).
SERVICE ONTOLOGY: SEMANTIC ORGANIZATION OF TOOLS AND SERVICES

These ontologies are designed to provide a semantic framework for classes of tools and services that provide computational or classification capabilities for data sets. As an example, a classification tool such as the alumina saturation index can utilize rock geochemical data to classify a rock as peraluminous or metaluminous. In a later section (scenario 1), A-type igneous rocks are recognized through discriminant diagrams such as those given by Eby (1990). More complex modeling codes, such as reactive transport, can be used to model the behavior of elements in a mine waste-disposal facility. Within this framework, codes can be organized within a service ontology and be made available to a user based on the query (Malik et al., 2007b; Rezgui et al., 2008). Such
tools and services should be wrapped as Web services to facilitate discovery in a Web environment. I foresee these services as applications developed, shared, and registered to the service ontology by geoscientists across the world. Availability of a semantic framework will allow individuals to register their applications to specific concepts, e.g., A-type classification of igneous rocks.

PROCESS ONTOLOGY: INTERPRETING “WHY” FROM DATA THROUGH REVERSE AND FORWARD PROCESS MODELS

Because of the emerging state of ontologic research in the earth sciences, no foundation- or domain-level ontologies for geologic processes are available for refinement. However, Sinha et al. (2006a) and Barnes (2006) have presented high-level conceptual
schemas for geologic processes that produce or modify igneous rocks. For example, as shown in an overview diagram (Fig. 12), it is easy to recognize that many interrelated concepts are woven into the logic of interpreting igneous rocks. It is clear that, in the category of presolidification events, the process of partial melting is the key prerequisite for the formation of igneous rocks. Associated with this process are issues of sources, depth of melting, and the tectonic setting of the melting event. Additional processes represented in the presolidification hierarchy are segregation, transport, and emplacement. Syn- to postsolidification processes include phenomena such as the rate of cooling or alteration. It is clear that many phenomena act as operators on any given process, and the cumulative effect of such actions leads to the final product, i.e., the igneous rock. The recognition of the characteristic contribution of each process to the final physical form, i.e., the shape of a pluton, or to its chemical signature is often difficult to assess, and this may lead to multiple hypotheses, as discussed for scenario 2. The development of process ontologies requires the integration of phenomena and events associated with the entire hierarchy as represented in Figure 12. As such, a link between processes and concepts within a semantic framework would provide the relationships that exist during formation of an igneous rock, from the partial melting event to the solidification event; this is similar in concept to the feedback loop applied to granitic plutons (Petford et al., 1997).

DISCOVERY, INTEGRATION, AND ANALYSIS OF RESOURCES

A prototype software environment for a discovery, integration, and analysis (DIA) engine of semantically registered data
Figure 12. A graphic view of multiple processes that are linked together through process, time, tectonic, and observational relationships (from Sinha et al., 2006a).
and services (Fig. 13) has been described in detail by Rezgui et al. (2007, 2008) and Malik et al. (2010). The primary objective of constructing the DIA engine is to build a service-oriented computational infrastructure that enables scientists to discover, integrate, and analyze earth science data. Data discovery enables users to retrieve the distributed data sets, i.e., those located at multiple sites, that are pertinent to the research task at hand. Data integration enables users to query various data sets along some common attributes to extract previously unknown information, called data products. The data products that are generated can either be used in their delivered form or used as input to the data analysis phase. Data analysis may be used to verify certain hypotheses, or it may refine the data product through further data discovery and integration. All data used by DIA are referenced to the original data provider, even through multiple iterations, e.g., controlling provenance of data (Simmhan et al., 2005), and this is a prerequisite for building trust for online conduct of science (Zaihrayeu et al., 2005).

SEMANTICS-ENABLED REGISTRATION OF DATA AND SERVICES: SIMPLIFYING DISCOVERY PRIOR TO INTEGRATION

I suggest that two types of semantic registration are necessary for both discovery and integration of data: (1) data registration, e.g., SEDRE, the semantically enabled data registration engine described by Malik et al. (2010); and (2) service registration, e.g., SESRE, a semantically enabled services (including process-oriented codes) registration engine. These concepts are shown in Figure 13. Ontology-aided data integration, accomplished by registering databases to ontologies, systematically resolves both syntactic and semantic heterogeneity, allowing scientists to focus on the content of the database rather than its schema (Lin and Ludäscher, 2003). SEDRE facilitates discovery through resource registration at three levels:

1. Keyword-based registration: Discovery of data resources (e.g., gravity, geologic maps, etc.) requires registration through the use of high-level index terms. For instance, the popular AGI index terms (American Geological Institute GeoRef Thesaurus; http://www.agiweb.org/news/spot_nov8_thesaurus.html) can be used. If necessary, other index terms, such as those provided by AGU (American Geophysical Union, http://www.agu.org/pubs/authors/manuscript_tools/journals/index_terms/), can be used as well and eventually be cross-indexed to each other.

2. Ontological class-level registration: Discovery of the semantic content of databases requires registration of the database to a class-level ontology, such as rock geochemistry, gravity database, etc.

3. Item detail–level registration: This consists of associating a column in a database with a specific concept or attribute of an ontology, thus allowing the resource to be queried using concepts instead of actual values. This mode of registration is most suitable for data sets built on top of relational databases. However, item detail–level registration can be extended
Figure 13. Conceptual organization of classes of ontologies within the extensible discovery, integration, and analysis (DIA) engine, where the class of service ontologies is inclusive of both tools and process ontologies. SEDRE—semantically enabled data registration engine; SESRE—semantically enabled services registration engine.
to cover Excel spreadsheets and maps in ESRI Shapefile format by internally mapping such data sets to PostgreSQL tables. For example, a column in a geochemical database may be specified as representing a SiO2 measurement. Ontological data registration at the item detail level uses the concepts of Subject, Object, Value, and Unit. Figure 14 shows the relationship between these concepts and the method with which it is possible to map columns of data sets to these concepts. In an example utilizing geochemical data, Rock represents the Subject (sample 1758), which contains the element compound SiO2 as one of its Objects. The Object SiO2 has a Value of 50.72 and is measured in wt% Unit. To facilitate such registration, one can envision a graphical user interface (Fig. 15) for geochemical data. Similar interfaces can be readily created for all major disciplines, thus making it easy for data providers to semantically tag their data for ontology-based discovery. For instance, SO2 columns in the data sets are mapped to terms adopted from element ontology (see Figs. 5 and 6), while units of measurement are made available through the SWEET unit ontology. SEDRE allows the data owners to maintain control over their data and only store the data–ontology term mappings. The mappings also include concepts of longitude/latitude coordinates (from the location ontology) to enable efficient access to spatial data. I recognize that data registration through ontologies is a time-consuming process, and that data owners may not be able or willing to register their data sets in one go. Therefore, SEDRE is developed as a downloadable service, where data owners can download SEDRE (along with all the required ontology terms) on their personal machines and connect to SEDRE’s online repository only to upload the data–ontology mappings. This allows data owners to register their data at their own convenience, while keeping ownership of the data.
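The Subject/Object/Value/Unit pattern can be sketched as a simple mapping layer. This is not the actual SEDRE implementation; the column names, concept identifiers, and rows below are hypothetical:

```python
# Sketch of item detail-level registration: database columns are mapped
# to ontology concepts (Subject, Object, Value, Unit), so the data set
# can be queried by concept rather than by column name. All column
# names, concept identifiers, and rows are hypothetical.

rows = [
    {"SampleID": "1758", "SiO2_wtpct": 50.72},
    {"SampleID": "1759", "SiO2_wtpct": 63.10},
]

# The data-ontology mapping kept by the data owner; only this mapping
# (not the data) would be uploaded to the registry.
mapping = {
    "SampleID":   {"role": "Subject", "concept": "material:RockSample"},
    "SiO2_wtpct": {"role": "Object",  "concept": "element:SiO2",
                   "unit": "unit:wtPercent"},
}

def query_by_concept(rows, mapping, concept):
    """Return (subject, value, unit) triples for a requested concept."""
    subj_col = next(c for c, m in mapping.items() if m["role"] == "Subject")
    obj_col = next(c for c, m in mapping.items() if m.get("concept") == concept)
    unit = mapping[obj_col].get("unit")
    return [(r[subj_col], r[obj_col], unit) for r in rows]

# A query names the concept, never the column: schemas with different
# column names but the same mapping answer the same query.
print(query_by_concept(rows, mapping, "element:SiO2"))
```

Because only the mapping travels to the registry, two providers with differently named columns become queryable through the same concept, which is the heterogeneity-resolving step the text attributes to ontology-aided integration.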
DIA uses different “registry servers” (RSs), which could be distributed worldwide, to provide directory functionalities (registration of data and tools, indexing, search, etc.). The providers of resources
advertise their resources on registry servers, which may then be (manually or automatically) discovered and used.

SCENARIO 1

To illustrate the different DIA components, consider the following query: What is the distribution, U/Pb zircon ages, and the gravity expression of A-type plutons in Virginia?

Query Specification

In the DIA engine, the user query can be expressed in one of two ways: either a text-based format or a menu-based format. The text-based format allows a user to query the entire database, while the menu-based format (Fig. 16) lets the user select only specific items, which in turn queries only a subset of the data. The user does not need extensive knowledge of the querying techniques, models, or keywords (which may be required in a text-based format). The task at hand can be completed with the help of a few mouse clicks, and query results are always produced as long as the data required to answer the query are present, i.e., empty result sets are only returned in the case of missing data. The user clicks through the different menus to “build” an exact query.

Filtering and Integration

Data filtering is a process in which the DIA engine transforms a raw data set into a data product. Data filtering may also take a data product as its input. Examples of data products include a map showing the A-type bodies in the Mid-Atlantic region, an Excel file giving the ages of those A-type bodies, a gravity database table spatially related to A-type bodies saved as a contoured gravity map, etc. Data products used in data integration may be
Figure 14. The ontologic registration of data to concepts is the key to semantic interoperability and integration. Note the representation of a rock sample as a Subject, with an Object defining the data itself. The application of the ontology-based concepts of Subject and Object facilitates registration and discovery. For example, the column SampleID represents the concept of Subject (in this instance, a rock sample), while the concept of Object contains the concept of analytical oxide with value and unit. Based on such deep ontologies, it is possible to easily register and query for data associated with any subject.
Figure 15. Schematic representation of registration of data through SEDRE (semantically enabled data registration engine). This user interface is specifically designed to register data from atmospheric studies, and, as such, commonly measured compounds are made available under the section Major elements. As shown in the inset labeled New Mapping, we readily register subject (e.g., SO2) and capture its value and units (in Dobson units). Such templates can be readily made for all subdisciplines and will lead to easy registration of data to known semantics. SESRE—semantically enabled services registration engine.
178
A.K. Sinha
Figure 16. Query specification through menus. The discriminant diagrams are made available as Web services, and they use a point-in-polygon algorithm to recognize whether a sample has A-type affinities. The menu also provides a link history that enables a user to follow the steps involved in the classification of samples.
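The point-in-polygon step mentioned in the caption is a standard computation; a minimal ray-casting sketch follows. The discriminant-field vertices are invented for illustration and do not reproduce a published A-type granite boundary.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: cast a ray from (x, y) to the right and count
    edge crossings; an odd count means the point is inside.
    polygon is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y-level
            # x-coordinate where the edge crosses that level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Invented rectangular "A-type field" on a two-axis discriminant diagram;
# a classification service would test each sample's coordinates against it.
a_type_field = [(2.6, 350.0), (2.6, 1200.0), (9.0, 1200.0), (9.0, 350.0)]

is_a_type = point_in_polygon(4.0, 600.0, a_type_field)
```

A Web service wrapping this test needs only the sample's diagram coordinates and the field's vertices, which is why the classification can be exposed as a simple stateless service.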
of two types: prepackaged or created dynamically. Querying prepackaged data is usually faster but is not flexible and provides little support for complex scientific discovery. Dynamically created data products may require on-the-fly integration and extensive query processing, but they enable far richer possibilities for scientific discovery.

DIA’s Service-Oriented Approach for Facilitating Semantic Integration of Heterogeneous Data

The DIA engine is a Web-based, service-oriented system developed using a variety of technologies, including ESRI’s ArcGIS Server 9.1, Microsoft’s .NET framework, Web services, Java, and JNBridge 3 (Rezgui et al., 2008; Malik et al., 2010). Users submit queries through DIA’s Web-accessible graphical interface. The engine translates these queries into a sequence of tasks, such as accessing map servers, discovering and accessing data sources, invoking Web services (e.g., Cardoso and Sheth, 2006), filtering features, joining layers, and graphically rendering query results for visualization. The DIA engine also enables users to save their query history and export data products for future reference. Since the DIA engine is built along service-oriented lines, key code modules are wrapped as Web services. This approach has two advantages. First, it makes the system readily extensible: as the geoscience community introduces new services, these can be integrated into the DIA engine as new functionalities. Second, services developed for DIA may be used as building blocks to produce other systems. The DIA engine supports several querying modes (geological map-based queries, region-based queries, etc.). To answer the example query about A-type plutons, the user first selects the option “Geological map-based queries” in DIA’s main menu (Fig. 17B). The system then accesses a geological map server, gets a (default) geological map, and displays it to the user (Fig. 17).
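The query-to-task translation described above can be sketched as a short service pipeline that threads a shared context through each task. The task functions below are illustrative stand-ins, not the engine's actual modules.

```python
# Sketch of query orchestration as an ordered task pipeline. Each task
# reads and extends a shared context dict; names are illustrative only.
def access_map_server(ctx):
    ctx["map"] = "default geological map"
    return ctx

def discover_sources(ctx):
    ctx["sources"] = ["geochemistry_db", "geochronology_db", "gravity_db"]
    return ctx

def apply_filter(ctx):
    ctx["filtered"] = "A-type samples from %d sources" % len(ctx["sources"])
    return ctx

def render_results(ctx):
    ctx["product"] = "rendered: %s on %s" % (ctx["filtered"], ctx["map"])
    return ctx

PIPELINE = [access_map_server, discover_sources, apply_filter, render_results]

def run_query(pipeline):
    """Run tasks in order; the final context holds the data product."""
    ctx = {}
    for task in pipeline:
        ctx = task(ctx)
    return ctx

result = run_query(PIPELINE)
```

Because each stage has the same call signature, a new community-contributed service can be spliced into the list without changing the runner, which is the extensibility benefit the text describes.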
This map enables the user to select the area of interest (i.e., Virginia). This may be done by selecting a bounding box or by selecting the entire state. In the latter case, the DIA engine accesses a gazetteer to determine the selected state’s latitude-longitude coordinates. The user then uses DIA’s drop-down menus (Fig. 17B; also Fig. 16) to identify a computational filter (A-type magma class
filter in this case) to be applied to the data samples located in the selected area. The DIA system is designed to search all semantically registered data sets that have samples located in the area of interest. The user then selects all the A-type bodies and requests U/Pb ages. If ages have been registered to these bodies within a geochronologic database, the information is retrieved and an age is displayed for each body (Fig. 17C). Similarly, access to semantically registered point-source gravity data from a site such as http://paces.geo.utep.edu provides the capability to plot the distribution of individual stations and to use a kriging tool to construct contour maps at various scales (Fig. 17C). The user is ultimately presented with an integrative view (Fig. 17D) that can be used to discover relationships between the occurrence of certain types of igneous rocks and their gravity signature.

SCENARIO 2

Why are sulfur contents associated with volcanic activity similar for volcanoes from different plate-tectonic settings? (This illustrates the need to eventually link object-service and process ontologies.) Volcanism and its bearing on climate change have been the subject of many studies (e.g., Blong, 1984; Robock, 2000), and this area of research provides scientists with the opportunity to study the scales at which climate can be influenced. For example, a global temperature decrease of 0.1 °C to 10 °C correlates positively with sulfur yield in grams (Fischer et al., 1997). In larger eruptions, such as that of Mount Pinatubo in 1991, very substantial amounts of material (e.g., sulfur dioxide, which reacts with water to form sulfuric acid) reach the stratosphere above ~15 km, where their effects are felt on a global scale and can persist for years.
Although existing ontologies such as those presented here and formalized in SWEET can support smart search and integration, they are unable to explain the patterns in data or the mechanisms (processes) responsible for the abundance of sulfur in volcanic eruptions. The concept of plate tectonics provides a framework for relating the general characteristics of volcanoes to their geologic settings (Sinha et al., 2007), but many of the details of volcanic activity and its products are difficult to explain through this paradigm. For example, many volcanoes
Infusing semantics into the knowledge discovery process for the new e-geoscience paradigm
179
Figure 17. A layered bottom-up depiction of discovery, integration, and analysis (DIA) phases from data to data product. Panels show the steps required to integrate geospatial, geotemporal, and geophysical data through the use of tools and services that have been registered to known ontologies. [Panel labels, bottom to top: (A) Ontologically registered data and services (structure, physical properties, location, material, functions, time, geochronology, geochemistry, measurements, geophysics, phenomenon, geospatial); (B) Tool selection (igneous area of interest, geochemical magma class, A-type discriminant diagram); (C) Geospatial and geotemporal: A-type; gravity gridding; (D) Integration and analysis; data product: plutons and gravity field.]
share common characteristics (high sulfur emissions) despite being located in different tectonic settings, and I have chosen Mount Pinatubo (convergent-margin setting in the Philippines) and Nyamuragira (continental divergent setting in East Africa) to show similar SO2 loading of the atmosphere. Based on new ontologies for both volcanoes and plate tectonics (Sinha et al., 2007), one can suggest that an integrated conceptual understanding of processes associated with volcanic systems, from magma generation to eruption, is required to link characteristics of volcanic eruptions to plate tectonics. The ultimate objective of understanding which particular volcanoes are capable of influencing climate change requires this integrative approach. For domain experts, investigation of geologic processes typically involves data-driven inference, specifically, the use of detailed data sets involving objects, time, and space to derive testable genetic hypotheses (“reverse” models). For volcanic rocks, reverse process models require the ability to use field, geochemical, and mineralogical data to infer magma source(s), storage
site(s), and causes for compositional diversity. This is generally done by thermodynamic calculations to determine pressure and temperature, and by matching patterns of compositional arrays to infer differentiation/assimilation processes. By a combination of logic and trial-and-error comparison, researchers eliminate as many genetic scenarios as possible. If coded for hands-off computation, this approach borders on that of artificial intelligence. In contrast, predictive models require the ability to visualize and calculate a process from its inception to completion (“forward” models). Forward modeling of magmatic processes associated with volcanism, and extensible to other geologic phenomena, is similar to workflow calculations. In the simplest approach, magmatism is a bottom-to-top phenomenon, from a mantle heat source to final pluton emplacement or eruption. However, although adding complexity to forward models is relatively straightforward, the usefulness of such complex models may suffer due to lack of general applicability in complex volcanic environments.
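The bottom-to-top forward model described here maps naturally onto a workflow of ordered stages. The sketch below illustrates that structure only; the temperatures, crystal fractions, and the eruption threshold are invented placeholders, not petrologic constants.

```python
# A forward model as a workflow, per the bottom-to-top description in the
# text: mantle source -> ascent -> storage -> eruption/emplacement. All
# numeric values are invented placeholders for illustration.
def mantle_melting(state):
    state.update(temperature_c=1300.0, crystallinity=0.0, depth_km=60.0)
    return state

def ascent(state):
    state["depth_km"] = 10.0
    state["temperature_c"] -= 100.0      # invented cooling during ascent
    return state

def storage_differentiation(state):
    state["temperature_c"] -= 200.0      # invented cooling in storage
    state["crystallinity"] = 0.4         # invented crystal fraction
    return state

def emplacement_or_eruption(state):
    # Invented threshold: a largely molten magma erupts, a crystal-rich
    # one stalls as a pluton.
    state["outcome"] = "eruption" if state["crystallinity"] < 0.5 else "pluton"
    return state

def forward_model():
    """Run the stages in order, from inception to completion ("forward" model)."""
    state = {}
    for stage in (mantle_melting, ascent, storage_differentiation,
                  emplacement_or_eruption):
        state = stage(state)
    return state

final = forward_model()
```

Adding complexity means adding or refining stages in the chain, which is straightforward; the text's caveat is that each added stage narrows the range of volcanic environments to which the whole chain applies.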
SUMMARY AND FUTURE RESEARCH DIRECTIONS

This paper presents an overview of the development of foundation-level ontologies that enable semantic integration of data and tools. The recently developed prototypes of the DIA and SEDRE engines demonstrate semantic capabilities that utilize ontologies and Web services to organize, annotate, and define data sets and tools. However, for geoscientists to advance their scientific goals, an active presence in the emerging “Semantic Web” (Berners-Lee et al., 2001) will likely be required. This would enable data and applications to be automatically understood and processed without geographical or organizational boundaries (Alonso-Jiménez et al., 2006), and thus lead to efficiency through precise “information hunting,” e.g., smart search. Other advantages, suggested by Sinha et al. (2010), of the geoscience community’s participation in Semantic Web technologies include: facilitated knowledge management (the processes of capturing, extracting, processing, and storing knowledge) (Alonso et al., 2004), integration across heterogeneous domains through ontologies (Fox et al., 2008), efficient information filtering (sending selective data to the right clients), formation of virtual communities (Reitsma and Albrecht, 2005), legacy capture for long-term archiving, serendipity (finding unexpected collaborators), and Web-based education (Ramamurthy, 2006). Capabilities based on semantic integration of data, services, and processes will become the new paradigm in scientific endeavors and provide a significant boost to the visibility of geoscience research and education in a competitive world (Sinha et al., 2010).
ACKNOWLEDGMENTS

The author acknowledges the decade-long interaction with many geoscience and computer science colleagues and extends special thanks to Kai Lin, Abdelmounaam Rezgui, Zaki Malik, Robert Raskin, Calvin Barnes, Boyan Brodaric, Peter Fox, and Deborah McGuinness for their support in developing semantic capabilities for the geoscience community. This research was supported by National Science Foundation award EAR022558. All concept maps were prepared using free software provided by the Institute for Human and Machine Cognition, http://cmap.ihmc.us/.

REFERENCES CITED

Aberer, K., 2003, Special issue on peer to peer data management: Special Interest Group on the Management of Data (SIGMOD) Record, v. 32, p. 69–72.
Allen, J.F., 1991, Time and time again: The many ways to represent time: International Journal of Intelligent Systems, v. 6, p. 341–355, doi:10.1002/int.4550060403.
Alonso, G., Casati, F., Kuno, H., and Machiraju, V., 2004, Web Services: Concepts, Architecture, and Applications: Berlin, Springer Verlag, 354 p.
Alonso-Jiménez, J.A., Borrego-Díaz, J., Chávez-González, A.M., and Martín-Mateos, F.J., 2006, Foundational challenges in automated semantic Web data and ontology cleaning: IEEE Intelligent Systems, v. 21, no. 1, p. 42–52.
Barnes, C., 2006, From Object to Process Ontology: U.S. Geological Survey Scientific Investigations Report 2006-5201, p. 40–41.
Berners-Lee, T., Hendler, J., and Lassila, O., 2001, The semantic web: Scientific American, v. 284, p. 34–43, doi:10.1038/scientificamerican0501-34.
Blong, R.J., 1984, Volcanic Hazards: Sourcebook on the Effects of Eruptions: Orlando, Academic Press, 424 p.
Boisvert, E., Johnson, B.R., Schweitzer, P.N., and Anctil, M., 2003, XML Encoding of the North American Data Model: U.S. Geological Survey Open-File Report 03-471, http://pubs.usgs.gov/of/2003/of03-471/boisvert/index.html (accessed 4 May 2011).
Cardoso, J., and Sheth, A., 2006, The semantic web and its applications, in Cardoso, J., and Sheth, A.P., eds., Semantic Web Services, Processes and Applications, Volume 3: New York, Springer Verlag, p. 3–33.
Dana, P.H., 1995, Co-ordinate System Overview: http://www.colorado.edu/geography/gcraft/notes/coordsys/coordsys.html (accessed 8 May 2011).
DeLaeter, J.R., Bohlke, J.K., DeBievre, P., Hidaka, H., Peiser, H.S., Rosman, K.J.R., and Taylor, P.D., 2003, Atomic weights of the elements: Review 2000: Pure and Applied Chemistry, v. 75, p. 683–800, doi:10.1351/pac200375060683.
Eby, G.N., 1990, The A-type granitoids: A review of their occurrence and chemical characteristics and speculation on their petrogenesis: Lithos, v. 26, p. 115–134, doi:10.1016/0024-4937(90)90043-Z.
Fischer, R.V., Heiken, G., and Hulen, J.B., 1997, Volcanoes: Crucibles of Change: Princeton, New Jersey, Princeton University Press, 317 p.
Fleming, N., 1996, Coping with a Revolution: Will the Internet Change Learning?: Canterbury, New Zealand, Lincoln University, http://www.vark-learn.com/documents/information_and_knowle.pdf (accessed 9 May 2011).
Fox, P., Sinha, A.K., McGuinness, D., Raskin, R.G., and Rezgui, A., 2008, A Volcano Erupts—Semantic Data Registration and Integration: U.S. Geological Survey Scientific Investigations Report 2008-5172, p. 72–75.
Gruber, T.R., 1993, A translation approach to portable ontologies: Knowledge Acquisition, v. 5, p. 199–220, doi:10.1006/knac.1993.1008.
Halevy, A., Rajaraman, A., and Ordille, J., 2006, Data integration: The teenage years, in Dayal, U., Whang, K., Lomet, D., Alonso, G., Lohman, G., Kersten, M., Cha, S.K., and Kim, Y., eds., Proceedings of the 32nd International Conference on Very Large Data Bases (Seoul, Korea): Very Large Data Bases, VLDB Endowment: New York, Association of Computing Machinery, p. 9–16.
Hapgood, M.A., 1992, Space physics coordinate transformations: A user guide: Planetary and Space Science, v. 40, p. 711–717, doi:10.1016/0032-0633(92)90012-D.
Hobbs, J.R., and Pan, J., 2004, An ontology of time for the semantic web: ACM (Association for Computing Machinery) Transactions on Asian Language Information Processing, v. 3, p. 66–85, doi:10.1145/1017068.1017073.
Lin, K., and Ludäscher, B., 2003, A system for semantic integration of geologic maps via ontologies, in Ashish, N., and Goble, C., eds., Semantic Web Technologies for Searching and Retrieving Scientific Data (SCIS), ISWC 2003 Workshop, v. 83: Aachen, Germany, Aachen University, Sun Site Central Europe.
Malik, Z., Rezgui, A., and Sinha, A.K., 2007a, Ontologic Integration of Geoscience Data on the Semantic Web: U.S. Geological Survey Scientific Investigations Report 2007-5199, p. 41–43.
Malik, Z., Rezgui, A., Sinha, A.K., Lin, K., and Bouguettaya, A., 2007b, DIA: A web services–based infrastructure for semantic integration in geoinformatics, in Proceedings of the IEEE (Institute of Electrical and Electronics Engineers) International Conference on Web Services (ICWS 2007): New York, IEEE, p. 1016–1023.
Malik, Z., Rezgui, A., Medjahed, B., Ouzzani, M., and Sinha, A.K., 2010, Semantic integration on geosciences: International Journal of Semantic Computing, v. 4, no. 3, p. 1–30, doi:10.1142/S1793351X10001036.
Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A., and Schneider, L., 2002, The Wonderweb Library of Foundational Ontologies and the DOLCE Ontology: Laboratorio di Ontologia Applicata Technical Report D 17: Padova, Italy, 37 p.
Niles, I., and Pease, A., 2001, Towards a standard upper ontology, in Welty, C., and Smith, B., eds., Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001): New York, Association of Computing Machinery, p. 2–9.
Noy, N.F., 2004, Semantic integration: A survey of ontology-based approaches: Special Interest Group on the Management of Data (SIGMOD) Record, v. 33, p. 65–70, doi:10.1145/1041410.1041421.
Obrst, L., 2003, Ontologies for semantically interoperable systems, in Proceedings of the Twelfth International Conference on Information and Knowledge Management (CIKM 03): New York, Association of Computing Machinery, p. 366–369.
Petford, N., Clemens, J.D., and Vigneresse, J., 1997, Application of information theory to the formation of granitic rocks, in Bochez, J., Hutton, D.H.W., and Stephens, W.E., eds., Granite: From Segregation of Melt to Emplacement Fabrics: Dordrecht, the Netherlands, Kluwer Academic Press, p. 3–10.
Phytila, C., 2002, An Analysis of the SUMO and Description in Unified Modeling Language, Phytila-SUMO.htm, http://suo.ieee.org/SUO/SUMO/index.html (accessed 8 May 2011).
Ramamurthy, M.K., 2006, A new generation of cyberinfrastructure and data services for earth system science education and research: Advances in Geosciences, v. 8, p. 69–78, doi:10.5194/adgeo-8-69-2006.
Raskin, R.G., 2006, Development of ontologies for earth system science, in Sinha, A.K., ed., Geoinformatics: Data to Knowledge: Geological Society of America Special Paper 397, p. 195–200.
Raskin, R.G., and Pan, M.J., 2005, Knowledge representation in the semantic web for earth and environmental terminology (SWEET): Computers & Geosciences, v. 31, p. 1119–1125, doi:10.1016/j.cageo.2004.12.004.
Reitsma, F., and Albrecht, J., 2005, Modeling with the semantic web in the geosciences: Institute of Electrical and Electronics Engineers Intelligent Systems, v. 20, p. 86–88, doi:10.1109/MIS.2005.32.
Rezgui, A., Malik, Z., and Sinha, A.K., 2007, DIA Engine: Semantic Discovery, Integration, and Analysis of Earth Science Data: U.S. Geological Survey Scientific Investigations Report 2007-5199, p. 15–18.
Rezgui, A., Malik, Z., and Sinha, A.K., 2008, Semantically Enabled Registration and Integration Engines (SEDRE and DIA) for the Earth Sciences: U.S.
Geological Survey Scientific Investigations Report 2008-5172, p. 47–52.
Robock, A., 2000, Volcanic eruptions and climate: Reviews of Geophysics, v. 38, p. 191–219.
Rosman, K.J.R., and Taylor, P.D.P., 1998, Isotopic compositions of the elements 1997: Pure and Applied Chemistry, v. 70, p. 217–235, doi:10.1351/pac199870010217.
Russell, C.T., 1971, Geophysical coordinate transformation: Cosmic Electrodynamics, v. 2, p. 184–196.
Schuurman, N., 2004, GIS: A Short Introduction: Malden, Massachusetts, Blackwell Publishing, 171 p.
Semy, S., Pulvermacher, M., and Obrst, L., 2004, Towards the Use of an Upper Ontology for U.S. Government and Military Domains: An Evaluation: The MITRE Corporation (04-0603), http://handle.dtic.mil/100.2/ADA459575 (accessed 8 May 2011).
Simmhan, Y.L., Plale, B., and Gannon, D., 2005, Survey of data provenance in e-science: Special Interest Group on the Management of Data (SIGMOD) Record, v. 34, no. 3, p. 31–36, doi:10.1145/1084805.1084812.
Simons, B., Boisvert, E., Brodaric, B., Cox, S., Duffy, T., Johnson, B., Laxton, J., and Richard, S., 2006, GeoSciML: Enabling the exchange of geological map data, in Australian Society of Exploration Geophysicists Extended Abstracts: Australia, Commonwealth Scientific and Industrial Research Organization, p. 1–4.
Sinha, A.K., Zendel, A., Brodaric, B., Barnes, C., and Najdi, J., 2006a, Schema to ontology for igneous rocks, in Sinha, A.K., ed., Geoinformatics: Data to Knowledge: Geological Society of America Special Paper 397, p. 169–182.
Sinha, A.K., Lin, K., Raskin, R., and Barnes, C., 2006b, Cyberinfrastructure for the Geosciences-Ontology Based Discovery and Integration: U.S. Geological Survey Scientific Investigation Report 2006-5201, p. 1–2.
Sinha, A.K., McGuinness, D., Fox, P., Raskin, R., Condie, K., Stern, R., Hanan, B., and Seber, D., 2007, Towards a Reference Plate Tectonics and Volcano Ontology for Semantic Scientific Data Integration: U.S. Geological Survey Scientific Investigations Report 2007-5199, p. 43–46.
Sinha, A.K., Malik, Z., Rezgui, A., Zimmerman, H., Barnes, C.G., Thomas, W.A., Jackson, I., Gundersen, L.C., Heiken, G., Raskin, R., Fox, P., McGuinness, D.L., and Seber, D., 2010, Geoinformatics: Transforming data to knowledge for geosciences: GSA Today, v. 20, no. 12, p. 4–10, doi:10.1130/GSATG85A.1.
Zaihrayeu, I., da Silva, P., and McGuinness, D.L., 2005, IWTrust: Improving user trust in answers from the Web, in Proceedings of 3rd International Conference on Trust Management (iTrust2005): Rocquencourt, France, Springer, p. 384–392.
MANUSCRIPT ACCEPTED BY THE SOCIETY 17 FEBRUARY 2011
The Geological Society of America Special Paper 482 2011
Global Map: International cooperation in the mapping sciences D.R. Fraser Taylor Geomatics and Cartographic Research Centre, Carleton University, Ottawa, Ontario, K1S 5B6, Canada, and Chair, International Steering Committee for Global Mapping, Geospatial Information Authority of Japan, Tsukuba, Japan
ABSTRACT

This chapter discusses the origins and purpose of Global Map, the current situation of the initiative, and the challenges it faces in the future. A major societal challenge facing the world today involves finding a way to deal more effectively with growing environmental problems. Reliable geographic information at a global scale is an indispensable element in formulating policy responses to global environmental challenges. The main purpose of Global Map is to describe the status of the global environment to aid in decision-making processes. Global Map provides digital maps of the terrestrial surface of Earth at a resolution of 1 km, with consistent and comparable specifications for every country. It is produced in cooperation with the national mapping organization in each country. Global Map was initiated by the government of Japan as a contribution to the action plan of the United Nations Agenda 21 program. There are four vector and four raster layers. Version 1 of Global Map was released in June 2008 and includes coverage of Antarctica. It also includes two global maps with complete high-quality coverage, one on land cover and the other on percentage tree cover. New uses of Global Map include disaster adaptation, mitigation, and management, and educational applications. Although Global Map as a product is important, the cooperative process by which Global Map is produced is equally important. This ongoing cooperation will help to ensure the future of Global Map as it enters a new phase in its development and make a substantial contribution to capacity building in the application of geoinformation to sustainable development.
ANTECEDENTS OF GLOBAL MAP

The idea of international cooperation in mapping is not a new one. The idea of an international 1:1,000,000 scale map of the world produced through a cooperative international effort was suggested at the end of the nineteenth century. At that time, the world was undergoing an unprecedented period of international cooperation in a number of fields, especially international trade and communications. Although the term was not used at that time, what is now called “globalization” was certainly a topic
of discussion (Pearson et al., 2006). Cartography at that time was primarily a tool of the nation state and, in many cases, of the colonial ambitions of these states. It was argued that this situation should change in order to respond to the international realities and possibilities of the new century. Each nation state was producing its own maps in a variety of scales and formats, and, as a result, there was no cartographic coverage at the global scale that could be used to respond to the emerging global challenges of the time. Clearly, there were problems relating to “interoperability” between and among national map series.
Taylor, D.R.F., 2011, Global Map: International cooperation in the mapping sciences, in Sinha, A.K., Arctur, D., Jackson, I., and Gundersen, L., eds., Societal Challenges and Geoinformatics: Geological Society of America Special Paper 482, p. 183–191, doi:10.1130/2011.2482(14). For permission to copy, contact
[email protected]. © 2011 The Geological Society of America. All rights reserved.
The idea of an internationally agreed-upon map series at the 1:1,000,000 scale had been proposed as early as 1860 by the British cartographer Sir Henry James (James, 1860), but the most significant and systematic outline of this idea was made by Penck in 1891 (Penck, 1893) at the International Geographical Congress in Switzerland. Although the idea was widely accepted and supported in principle at the time, implementation was very slow, and the international cooperation required to produce it proved difficult to achieve. There were protracted and often rancorous discussions, and initial progress was very slow. It was not until 1909 that agreement was finally reached at an international conference held in London, and work began in a number of countries to create the International Map of the World. A major setback took place in 1913, when the United States withdrew from the project, partially because of its impatience with the slow progress but also because of the isolationist policies of the government of the time (Pearson et al., 2006). The United States decided to produce its own 1:1,000,000 scale map of Latin America, which it considered to be in its “sphere of influence.” Despite this, at a second international map of the world conference held in Paris in 1913, final agreement on the map specifications was reached among the 34 nations represented. A coordinating bureau for the initiative was established at the British Ordnance Survey. The outbreak of World War I destroyed the cooperative process that had created the map specifications, but the impetus did not die, and the Royal Geographical Society of the UK produced a series of maps at the 1:1,000,000 scale, which used a simplified version of the international map of the world specifications (Heffernan, 1996), and the Ordnance Survey produced eight sheets of India according to those same specifications between 1914 and 1918 (Heffernan, 1996).
Somewhat ironically, the Royal Geographical Society’s 1:1,000,000 maps were used at the Peace Conference in Paris in 1919, and the existence and utility of these maps provided a boost to the concept after the war. Slow progress on the creation of the international map of the world continued in the 1920s and 1930s, but this was again interrupted by the outbreak of World War II in 1939. During that war, the value of a 1:1,000,000 map series was recognized by many of the participants on both sides of the conflict, and several nations, including Japan, produced their own map series based on international map of the world specifications. In 1949, the International Geographical Union established a Commission on the International Map of the World, which suggested that the responsibility for the international map of the world be given to the cartographic unit of the newly established United Nations (UN), and this took place in 1951. There was, however, considerable skepticism in professional cartographic circles over the need for an international map of the world in the postwar era, and, despite UN interest and support, relatively few new sheets were produced. The influential American cartographer Arthur Robinson went as far as to argue that the international map of the world was no more than “cartographic wallpaper” (Robinson, 1965). The project continued to limp along, and in 1989, a UNESCO (United Nations Educational Scientific and Cultural Organization) report concluded that the international map of the
world was no longer feasible, and the project came to a formal end with less than half the map sheets required ever having been produced. Analysis of the international map of the world experience suggests that there were a number of reasons why this ambitious experiment did not succeed. There were no “…clear, consistent and manageable objectives” (Pearson et al., 2006, p. 24), and those promoting it over the years failed to create and implement a “clear and consistent vision for their project” (Pearson et al., 2006, p. 24). This experience provides valuable lessons for subsequent attempts to utilize geoinformatics, especially mapping, to respond to societal challenges, which is the unifying theme of this volume. The historical experience of the failure to fully implement the International Map of the World project, despite many decades of effort, has special significance for those attempting to implement Global Map (Pearson et al., 2006), which is the focus of this chapter.

THE VISION, ORIGIN, AND PURPOSE OF GLOBAL MAP

There are direct and interesting parallels between the vision and plans to create Global Map and the earlier attempts to create the international map of the world (Pearson et al., 2006). Almost exactly a century after the international map of the world was proposed, the government of Japan, with support from the United States, proposed a new initiative to create a 1:1,000,000 digital map of the world to aid in environmental and sustainable development decision making. Japan had a special interest in environmental issues at the international level and saw the creation of Global Map as one specific response to the challenges posed by the United Nations Conference on Environment and Development held in Brazil in 1992. The action plan of Agenda 21, which came out of that meeting, included a specific call for the creation of global environmental data as an aid to decision making (Pearson et al., 2006). Japan had earlier created international world maps.
There is clear evidence in the report “An Image Survey Watching the Earth,” produced by the Geographical Survey Institute of Japan (GSI, 1991), that the proponents of Global Map, who were based in the GSI of Japan, had carefully considered the international map of the world experience in creating their action plan to establish and create Global Map (GSI, 1991). A deliberate attempt was made to avoid the major weaknesses that underlay the lack of success of the international map of the world project, which have been outlined by a number of authors (Winchester, 1995; Rhind, 2000; Heffernan, 2002; Pearson et al., 2006). This is evident both in the substantive content of Global Map and in the ongoing processes by which it is being created. International recognition and involvement have been keys to the creation of Global Map as outlined by Maruyama (1998), Masaharu and Akiyama (2003), and Okatani et al. (2006). Continuing international endorsement and political support, especially from the United Nations, have been critical factors for success. Global Map requires the active participation of national
and regional mapping organizations, and great care has been taken to ensure that each member of the Global Map family can effectively make a contribution to Global Map. This “bottom-up” participative process is a key element of the Global Map initiative. For many nations, this involves an ongoing capacity-building process in geoinformation, which again is an important element in the creation of Global Map. The First International Workshop on Global Mapping was held in Japan in 1994, which set a target date for a first version of Global Map by the year 2000. The International Steering Committee for Global Mapping (ISCGM, which consists of representatives of national mapping organizations) was established at a second workshop held in 1996. Professor John Estes of the United States was elected as the first chair of ISCGM, and the director general of the Geographical Survey Institute of Japan, Kunio Nonmura, outlined the proposals for the creation of Global Map. It was to be a digital map of the world at 1 km resolution (~1:1,000,000 scale) with eight layers, four vector and four raster, and common specifications. It was to be made freely available for public use in the international arena. Global Map was formally proposed by Japan and the United States and accepted as part of the implementation plan for Agenda 21 in 1997 at the 19th Special Session of the United Nations Economic and Social Council. The first Global Map Forum was held later that year, and the specifications of Global Map were finalized after an extensive consultation process. In January 1998, ISCGM sent out a letter with endorsement from the United Nations, inviting all of the national mapping organizations of the world to participate in Global Map. It is interesting to note the differences between the launch of Global Map and that of the international map of the world project discussed earlier.
A great deal of preparatory work in the international arena was done to establish the concept for Global Map. It had clear and measurable objectives and a clear purpose. It had wide international support and was endorsed as an integral part of a United Nations initiative. Agreement was reached on the major elements of the initial specifications before nations were formally invited to participate, and these specifications were created in a pragmatic manner, taking into account the capabilities and wishes of participating nations. Global Map had a well-funded and well-organized secretariat to coordinate its activities. Although Japan and the United States played a large role in creating the vision for Global Map, from the outset, great efforts were made to ensure that it was not seen as the initiative of any one national mapping organization but was a truly international initiative coordinated by an international steering committee of national mapping organizations with a neutral chair who was not the director of any one of these organizations. Global Map was designed to meet global needs, but the way in which it was constructed meant that each country’s national needs were also met. In many developing nations, for example, the Global Map training programs and workshops funded by the government of Japan helped to build much-needed human and institutional capacity in geospatial information management and to create national data sets. This
support has been continuous for over two decades, and it both encourages and facilitates national participation. Global Map thus became both a national and international endeavor. Many nations lacked the initial capacity to create their own digital map coverage. This challenge was met in two ways. First, to create version 0 of Global Map, existing digital data sets such as GTOPO30, Vector Map Level 0, and 1 km Advanced Very High Resolution Radiometer (AVHRR) data were freely provided to each participating organization, which then updated and verified that coverage according to Global Map specifications. The provision of these data sets was facilitated by the U.S. Geological Survey. Without this support, the release of version 0 in 2000, which was largely based on these three data sets, would not have been possible. The quick release of version 0 with global coverage was important to demonstrate the viability of the Global Map vision. American isolationism seriously damaged the creation of the International Map of the World, but the involvement of the United States as a global player was a great advantage for the creation of Global Map. A second factor was the capacity-building program built in as an integral part of the Global Map initiative. From the outset, nations were asked to identify the level of involvement they wished to undertake. A country choosing level A involvement agreed not only to process its own data but also to help one or more other countries to do so. Countries requiring assistance to complete their coverage would choose the level C designation, and those choosing level B would agree only to process data for their own country. Japan has been the most active of the countries choosing the level A designation.
This is largely because of the partnership established among the Geographical Survey Institute (the national mapping organization, in 2010 renamed the Geospatial Information Authority of Japan), the Japan International Cooperation Agency (JICA), and the Ministry of Land, Infrastructure, and Transport (MLIT). JICA is one of the few aid agencies specifically giving assistance to mapping agencies in the developing world (JICA, 2003). Part of that assistance involves support for the creation of Global Map. For a number of years, MLIT has supported a Global Map workshop in Africa, and it has also helped to facilitate an annual scholarship program funded by JICA since 1994, which brings trainees from national mapping organizations to the Geospatial Information Authority of Japan for extended periods to receive training in the production of Global Map. These efforts have been supplemented by the private sector, and both ESRI and Intergraph have provided grant support to aid national mapping organizations, especially in the acquisition of software. As a result of these capacity-building efforts, many of the first nations to complete their Global Map coverage were developing nations, including Kenya, Myanmar, and Mongolia, among others. A major event in the history of Global Map was the involvement of the Global Map secretariat in the World Summit for Sustainable Development held in Johannesburg in 2002 (Masaharu
and Akiyama, 2003). The secretariat participated in all four workshops leading up to the summit and in the summit itself. Largely as a result of this input, the implementation document that came out of the summit contains paragraphs 132 and 133 as follows:
132. Promote the development and wider use of Earth observation technologies, including satellite remote sensing, global mapping and geographic information systems, to collect quality data on environmental impacts, land use and land-use changes, including through urgent actions at all levels to: (a) Strengthen cooperation and coordination among global observing systems and research programmes for integrated global observations, taking into account the need for building capacity and sharing of data from ground-based observations, satellite remote sensing and other sources among all countries; (b) Develop information systems that make the sharing of valuable data possible, including the active exchange of Earth observation data; (c) Encourage initiatives and partnerships for global mapping.

133. Support countries, particularly developing countries, in their national efforts to: (a) Collect data that are accurate, long-term, consistent and reliable; (b) Use satellite and remote-sensing technologies for data collection and further improvement of ground-based observations; (c) Access, explore and use geographic information by utilizing the technologies of satellite remote sensing, satellite global positioning, mapping and geographic information systems. (Capitalization as in original text; United Nations, 2002, p. 64)

The explicit recognition of Global Map and the identification of ISCGM as an implementing agency were important to reaffirm United Nations support for Global Map. All of the nations present at the summit endorsed the document, and at the national level, this provides each national mapping agency with strong arguments for the allocation of national resources for the production of Global Map. It is important to note that, as outlined earlier, Global Map is part of a much wider program by the government of Japan to support initiatives leading to improvements in environment and sustainable development at the global level. In 2000, Japan committed almost one third of its large Official Development Assistance budget for this purpose (Okada, 2003), and at the Johannesburg summit, it announced a new program, the Environmental Conservation Initiative for Sustainable Development (Okada, 2003), further strengthening Japan’s international support for environmental initiatives, of which Global Map is a part. Geomatics initiatives are much more likely to be effective if they are part of a much wider commitment to a specific goal. In the case of Global Map, this is the commitment of Japan to provide long-term funding and support to policies and programs aimed at improving global environmental conditions and furthering sustainable development at the global scale. It can be argued that no geomatics initiative can be fully effective if it is an isolated program. The International Map of the World project was isolated from other international initiatives of the time, which may have contributed to its lack of success. Global Map plays a very specific role in a much wider effort. It is also an effort that has been sustained over time with adequate funding and continuing support. This support has been provided since 1992, and the government of Japan has indicated that support will continue for a new phase of Global Map after the release of version 1 of Global Map in June 2008.

THE PRESENT SITUATION

Table 1 shows the progress of Global Map over time, and Figure 1 shows the coverage as of July 2010 in map form. The initial progress of Global Map in terms of actual production of data was slower than expected, but as can be seen from Table 1, there was a very rapid acceleration in 2007 and 2008. As of July 2010, 164 countries and 16 regions, including Antarctica, were participating in Global Map. This represents over 96% of Earth’s territorial surface. Data have been released for almost 60% of Earth’s surface by area and 52% by population, and data for many other countries are undergoing verification. Sample coverage of the eight layers of Global Map for Kenya is shown in Figure 2. Version 1 of Global Map was released on 5 June 2008. In addition to the coverage mentioned already, a Global Percentage Tree Cover Map and a Global Land Cover Map were launched (Akatsuka, 2008). The map of global percentage tree cover is shown in Figure 3. Global Map has made remarkable progress, but many challenges remain in addition to achieving complete coverage.

THE CHALLENGES AND THE FUTURE

Plans for Global Map Phase III were approved by ISCGM at the 15th meeting held in Tokyo in January 2008 (Secretariat of the ISCGM, 2008a). Global Mapping Forum 2008 was also held at the same time, giving a much wider audience the opportunity to discuss and demonstrate the use of Global Map. Of particular note in this respect was the opportunity to observe “Global Map School” in action. Global Map School uses online Global Map coverage of two nations to facilitate online discussions on environmental issues between schools in each of the countries. In this case, the schools involved were Keio Futsubu and Chubotu Junior High School in Japan and Princess Chulaborn’s College
TABLE 1. PARTICIPATION IN GLOBAL MAP

Year    Number of countries and    Number of countries and regions for
        regions participating      which data have been released
1998    12                         0
1999    59                         0
2000    10                         Global Map version 0 using existing global data sets released
2001    8                          4
2002    32                         2
2003    9                          6
2004    5                          2
2005    18                         1
2006    19                         6
2007    6                          20
2008    2                          24
Total   180                        75
Global Map: International cooperation in the mapping sciences
Figure 1. Progress of the Global Mapping project.
Nakhon Si Thammarat of Thailand. This was the third such Global Map School session, and it illustrated the utility of Global Map for educational purposes. The educational uses of Global Map were not specifically envisioned when Global Map was first introduced, and this is an important and growing application area for Global Map. Several nations, including the United States, are considering the use of their national coverage for educational purposes, and this will increase the value of Global Map to societies worldwide. A second important new application of Global Map is in the area of disaster adaptation, mitigation, and management. Immediately after the disastrous tsunami in the Indian Ocean in 2004, the Geographical Survey Institute of Japan developed a map of the disaster area using Global Map data (Ubukawa et al., 2008). The institute developed both land-use maps and elevation maps and models from the Global Map data and released these online. Since 2004, the institute has released maps of eight major disasters, and since 2007, these have also been posted on ReliefWeb, the website of the United Nations Office for the Coordination of Humanitarian Affairs, the United Nations agency that coordinates information on disaster areas. The Global Map data are posted within 48 hours of the disaster. Global Map data were of particular value during the floods in Myanmar in 2008. Since Myanmar had completed its Global Map coverage, this was one of the few comprehensive base map coverages available. Although small-scale mapping has limitations in disaster situations, it can be useful in comparative and contextual terms, as was the case in Myanmar.
Figure 2. The eight data layers of Global Map for Kenya.
D.R.F. Taylor
Figure 3. Global Map of percentage tree cover.
Global Map has also been used at the global scale to model projected change in maximum daily rainfall and regions at risk from sea-level change (www.iscgm.org). When Global Map was first conceived, the concept of national spatial data infrastructures was in its infancy. A national spatial data infrastructure can be defined as “the technology, policies, standards, human resources, and related activities necessary to acquire, process, distribute, use, maintain, and preserve spatial data” (United States Office of Management and Budget, 2002, p. 2). This concept has grown in importance in recent years, and a new Global Spatial Data Infrastructure Association has emerged. Many nations are creating national spatial data infrastructures. Although not specifically conceived as such, Global Map is an operational global spatial data infrastructure (Taylor, 2005). In addition, many nations, such as Kenya (Mweru, 2005) and Brazil, have used their Global Map coverage as the framework to create their national spatial data infrastructure by adding additional data layers and creating larger-scale coverages. Three hundred and forty-six participants from 24 countries attended Global Map Forum 2008 and unanimously adopted the Global Map Tokyo Declaration, which reads: Recalling that during the Earth Summit in 1992, the nations of the world addressed global environmental problems and adopted Agenda 21, that ISCGM was established to promote the development of Global Map, and that in 2002 global environment was further discussed in Johannesburg at WSSD, where the goal of the development of Global Map was further supported, [we] express our gratitude to 179 countries and regions of the world that have participated in the project for
their efforts to bring Global Map to this stage. At the same time, we work to further expand the use of Global Map. We also call on all those countries not yet committed to Global Map to join and work to ensure coverage of the terrestrial surface of the Earth. We further recognize that global environmental problems such as climate change, deforestation and desertification have become serious problems for humanity, that issues on climate change will be a major topic at the G8 Hokkaido Toyako Summit in July, which brings together world leaders. All people, including those dedicated to mapping the Earth should make a contribution to solving these problems. The users and producers of Global Map call for the strengthening and coordination required to make Global Map, which has been developed with common specifications and internationally agreed standards, and which accurately describes the status and impact of human activities. Global Map gives a common understanding to people who live on the Earth. Global Map should be more usable and easily available to assist in decision making to help solve the common environmental problems facing humanity. Of particular importance in this respect is capacity building activities for and with developing nations. (Secretariat of the ISCGM, 2008a, p. 5)
The declaration gave a new sense of vision for Global Map. In the declaration, mention is made of the G8 Summit in Japan, at which Global Map was prominently featured. Global Map was also featured at the meetings of the ministers of the environment of the G8 nations and at the Fourth Tokyo International Conference on African Development (TICAD IV), which was held in Yokohama in late May 2008. TICAD has been held every 5 yr since 1993, and it is an international conference on Africa jointly organized by the government of Japan, the World Bank, and the United Nations, focusing on development problems in Africa. Fifty-one
African heads of state participated in the conference in 2008, as did over 3000 participants. A major product of the meeting was the Yokohama Action Plan, which outlines concrete support actions. Under the section dealing with Environmental/Climate Change Issues, Global Map is specifically mentioned: “2. Adaptation. Promote technical assistance such as establishing and updating the Global Map data for the entire Africa, describing the status of its environment in five years” (Nakamura, 2008, p. 7). Such international recognition is important for the future of the initiative. It makes the work of ISCGM more visible at both national and international levels, and the legitimacy of Global Map is enhanced. A specific goal for the creation and use of Global Map in Africa has been established. Although the use of Global Map is increasing and numerous examples of such use are given on the ISCGM Website, it is clear that additional efforts are required to further inform the world of the existence of Global Map and to promote and increase its use. This was a key message in the discussions and resolutions of the 15th meeting of ISCGM in June 2008 (Secretariat of the ISCGM, 2008b). Among the resolutions dealing with this topic, the following appears: “ISCGM anticipates that the above-mentioned outreach activities will lead to a broad and effective use of Global Map for research and policy formulation for environmental protection, mitigation of natural disasters and the achievement of sustainable development and for education and other purposes” (Secretariat of the ISCGM, 2008b, p. 2). Increasing the effective use of Global Map is, perhaps, the major challenge facing Global Map as it enters the third phase of its existence.
Experience has shown that regardless of the perceived value of an initiative such as Global Map from the perspective of those producing and supporting it, its value to society will depend upon its use, and encouraging such use must be an integral part of the planning process for all endeavors of this type. Producing Global Map is a necessary, but not sufficient, step. Global Map must be easy to use and readily accessible to users at low or no cost. This is a central principle of Global Map, but there are a number of barriers to be overcome if that goal is to be achieved. Some of these are technical, but others are administrative and political. On the technical side, Global Map has recently updated its specifications. The original specifications created by ISCGM are over a decade old, and although they were the best available at the time, much has changed over the years (Taylor, 2008; Nagayama, 2008). Global Map reviewed its technical specifications, and a workshop to discuss and approve the new specifications was held in September 2009. ISCGM is a Class A liaison member of ISO (the International Organization for Standardization), and its new specifications meet the latest ISO TC 211 standards (Secretariat of the ISCGM, 2009). The major changes include a change to the official format of vector data from Vector Product Format to Geography Markup Language (GML) 3.2.1 (ISO 19136) and the adoption of a metadata profile based on ISO 19115. The tile size is determined by each nation. Increased efforts are being made to
make technical access to data easier for users, and discussions are under way to create a Global Map Data portal. In 2008, the average number of monthly downloads of Global Map data increased by over 100% between January and December. There is an obvious linkage between Global Map and national spatial data infrastructures, as mentioned earlier. Scale is also an issue because for many small states, especially small island states, the 1:1,000,000 scale is too coarse to be of real utility, and Global Map has decided to accept data at larger scales, such as 1:250,000, to help address this problem. It is also important that Global Map data be regularly updated. This is the responsibility of each participating nation, and a 5 yr update cycle is anticipated. The adoption of the new standards and specifications poses updating challenges. Existing coverage must be updated to reflect these changes. All new Global Map coverage will be in the new format. To date, only one nation, Bulgaria, has updated its maps using the new specifications. A technical challenge facing Global Map is to create seamless regional coverage. The existing tile structure is based on a nation-by-nation approach. Euroglobal Map has resolved this problem, and similar efforts are under way to create a seamless Global Map for Asia, Latin America (Barriga, 2005), and North America. These efforts are not being led by ISCGM but by a variety of other organizations building on and expanding Global Map coverage. These include the Pan American Institute of Geography and History and the Permanent Committee for Geographic Information Processing for Asia and the Pacific. A seamless regional approach also brings with it the political issues of disputed border regions, but in contentious issues of this type, Global Map follows the United Nations’ practice.
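The shift from Vector Product Format to GML means that Global Map vector features are now encoded as XML and can be read with standard tools. The sketch below parses a minimal, hypothetical GML 3.2.1 fragment: the feature and property names (gm:Building, gm:location) and their namespace URI are invented for illustration and are not taken from the actual Global Map schema, but the GML namespace and the point encoding follow the ISO 19136 standard.

```python
# Parse a minimal, hypothetical GML 3.2.1 fragment of the kind a GML-encoded
# vector layer might contain. Only the gml:* elements follow the real standard;
# gm:Building and gm:location are illustrative placeholders.
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"  # official GML 3.2.1 namespace (ISO 19136)

fragment = """
<gm:Building xmlns:gm="http://example.org/globalmap-sketch"
             xmlns:gml="http://www.opengis.net/gml/3.2"
             gml:id="b1">
  <gm:location>
    <gml:Point gml:id="p1" srsName="urn:ogc:def:crs:EPSG::4326">
      <gml:pos>-1.286389 36.817223</gml:pos>
    </gml:Point>
  </gm:location>
</gm:Building>
"""

root = ET.fromstring(fragment)
# Locate the gml:pos element (coordinates of the point, here lat lon per EPSG:4326)
pos = root.find(f".//{{{GML_NS}}}Point/{{{GML_NS}}}pos")
lat, lon = (float(v) for v in pos.text.split())
print(lat, lon)  # -1.286389 36.817223
```

In practice a Global Map data file would carry many such features plus ISO 19115-based metadata; the point here is only that a GML encoding is plain, namespaced XML that any conformant parser can consume.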
Because existing Global Map coverage is created by individual nations or regions, each participant uses its own definition of its borders, which the existing tile structure allows it to do. This ability to define their own borders has been a factor in the acceptance of Global Map specifications, and the retention of the tile structure in the new specifications reflects the need to respond to the interests of ISCGM members. The utility of Global Map in technical terms will be greatly enhanced by ensuring interoperability with other geospatial data sets, and in creating the new specifications for Global Map, special attention has been given to this issue. Despite the considerable efforts of the Open Geospatial Consortium (OGC) and others, many technical problems of interoperability between geospatial data sets, such as semantic interoperability, still remain. The new Global Map standards and specifications are fully consistent with ISO TC 211 and utilize both the OGC Web Map Service (WMS) specifications and GML. The difficult and complex administrative and political issues surrounding access to geospatial data in general were outlined in a Committee on Data for Science and Technology (CODATA) White Paper on this topic released in 2008 (CODATA, 2008). This paper was produced to help implement the data-sharing principles of GEOSS, the Global Earth Observation System of Systems, in which ISCGM is playing an active role. GEOSS
established a data-sharing task force in 2004, and this group produced the GEOSS Data Sharing Principles action plan for consideration at the GEOSS ministerial summit in November 2010. This major international initiative is described in another chapter of this volume, but the issues discussed apply to any initiative to use geoinformation for societal benefit, including all of those described in this book. If geospatial data are not easily accessible, then no initiative designed to meet societal needs will succeed in achieving its objective. The acceptance of principles on data access is clearly much easier than their implementation, as the Global Map experience indicates. In signing the agreement to participate in Global Map, each participant agrees to make its data available at no, or low, cost, although a distinction is made between data for public or commercial use. In the latter case, nations are free to follow their own business models relating to charging for data. Despite this formal agreement, problems have arisen. For example, excellent Global Map data have been available for Europe in the form of Euroglobal Map for some time, but the business model of Eurogeographics has made the implementation of the free or low-cost access to data principle of Global Map very difficult to achieve. Eurogeographics as an organization did not exist when many individual European nations agreed to participate in Global Map. Discussions on this matter have continued over the years, and a partial solution was reached in late 2008, when 11 of the member nations of Eurogeographics agreed in principle to allow free access to their Euroglobal Map data. This will require new licensing agreements, which are currently under discussion. In late 2009, the business model of the Ordnance Survey, a key member of Eurogeographics, underwent substantial revision to make more map data readily available at minimal cost.
This will affect the business model of Eurogeographics, and constructive discussions continue between ISCGM and Eurogeographics to further resolve the data-access issues. Financial return is also a major factor in the creation and release of data from Russia. Russia is a participant in Global Map but argues that it requires substantial payment for its data before it is prepared to provide them to ISCGM. There have been political and organizational changes in Russia since its initial agreement to participate in Global Map, and, again, discussions are under way between ISCGM and the new organization responsible for national mapping in Russia. In times of organizational change, resolution of these issues is a difficult and complex process. Security issues are slowing down the release of data for some members, such as Israel, although it is interesting to note that the Global Map data for Palestine (West Bank and Gaza Strip) have already been released. Initially, security concerns also affected the release of Global Map data for both India and Pakistan, despite the small scale of the data involved. These were satisfactorily resolved and helped to bring about more general availability of geospatial data, especially in India, where discussions took place at the cabinet level, resulting in a much improved situation. Despite these difficulties, access and availability of Global Map data are rapidly increasing, as indicated by the growing
number of downloads from the Global Map Website described earlier. Global Map also has a very close working relationship with the United Nations Second Administrative Level Boundaries (SALB) project. For example, SALB boundary codes are now an integral part of the new Global Map specifications. Global Map was a model for the OneGeology project described elsewhere in this volume, and the structure and organization of OneGeology reflect much of the Global Map experience. The Global Land Cover Network of the Food and Agriculture Organization (FAO) is also a Global Map partner organization, and the Global Land Cover Map uses the FAO land-cover categories.

CONCLUSION

The empirical evidence supports the conclusion that Global Map has been remarkably successful, although many challenges remain. The technical challenges will be much easier to overcome than the administrative and political ones. The new specifications developed in late 2009 are a positive response to the technical challenges. Global Map is working closely with GEOSS to help to address the nine societal benefit areas. GEOSS is discussed in more detail elsewhere in this volume. The major administrative challenges involve the more active participation of key partner nations, especially Russia. Global Map depends on voluntary cooperation, and the approach used to encourage such participation includes incentives, capacity building, and technical support working through the national mapping organizations. Every effort is made to facilitate national needs and priorities through participation in this international project. Many factors have contributed to the ongoing success of Global Map, which is an excellent example of the utility of geospatial information to society, but one stands out above all of the others: the cooperative process by which Global Map is being created.
Many nations and international organizations have the technical ability to produce global data sets, and there are numerous excellent examples of this. What sets Global Map apart is the role of the national and regional organizations participating in Global Map in creating and/or checking and verifying the digital coverage required. The Global Map coverage for countries such as Kenya, for example, is a Kenyan creation of which the nation can be justifiably proud (Mweru, 2005). Kenya is making use of Global Map coverage to meet a wide variety of societal needs. In creating Global Map, national and regional needs are being met, and, at the same time, both human and social capital is being built in the application of geospatial technologies to the development of those societies. In developing nations in particular, it is important for indigenous scientists to be involved in the application of geospatial technologies (Taylor, 2004). To help with environmental and sustainable development decision making, Global Map as a product is making an important contribution to society, but it is perhaps the process by which it is produced and the capacity building in the creation and use of geospatial data that will make the greatest contribution to sustainable development.
REFERENCES CITED

Akatsuka, F., 2008, Release of land cover and percentage tree cover data of the Global Map, version 1 (global version): Global Mapping Newsletter, v. 51, p. 3.
Barriga, R., 2005, America’s global map: Presentation to the 12th ISCGM (International Steering Committee for Global Mapping): Tsukuba, Japan, ISCGM, 15 p.
Committee on Data for Science and Technology (CODATA), 2008, White Paper on GEOSS Data Sharing Principles: Paris, CODATA, 93 p., http://www.earthobservations.org/documents/dsp/Draft%20White%20Paper%20for%20GEOSS%20Data%20Sharing%20Policies_27Sept08.pdf (accessed January 2009).
Geographical Survey Institute of Japan, 1991, An Image Survey Watching the Earth: Tokyo, Printing Bureau, Ministry of Japan, 263 p. (in Japanese).
Heffernan, M., 1996, Geography, cartography and military intelligence: Transactions of the Institute of British Geographers, v. 21, p. 504–533, doi:10.2307/622594.
Heffernan, M., 2002, The politics of the map in the early twentieth century: Cartography and Geographic Information Science, v. 29, p. 207–226, doi:10.1559/152304002782008512.
James, H., 1860, Description of the projection used in the topographical section of the War Office for maps embracing large portions of the Earth’s surface: Journal of the Royal Geographical Society, v. 30, p. 106–111, doi:10.2307/1798292.
Japan International Cooperation Agency, 2003, Contributions to National Mapping Progress: Tokyo, JICA Social Development Study Department.
Maruyama, H., 1998, History of activities for international agreement on the development of the Global Map: Bulletin of the Geographical Survey Institute of Japan, v. 44, p. 63–90.
Masaharu, M., and Akiyama, M., 2003, Publicity activities of Global Mapping at Johannesburg summit and outcomes of the summit: Bulletin of the Geographical Survey Institute of Japan, v. 49, p. 56–69.
Mweru, K., 2005, Kenya’s experience with Global Map: Presentation to the 12th ISCGM (International Steering Committee for Global Mapping): Tsukuba, Japan, ISCGM Secretariat, 12 p.
Nagayama, T., 2008, Global Map Standards and Specifications: Presentation to the International Standards Organization Standards in Action Workshop: Tsukuba, Japan, International Steering Committee for Global Mapping Secretariat, http://www.isotc211.org/WorkshopTsukuba/Workshop-Tsukuba.htm (accessed January 2009).
Nakamura, T., 2008, TICAD IV and Global Map: Global Mapping Newsletter, v. 50, p. 7.
Okada, S., 2003, Towards a green future: Asian Pacific Perspectives: Japan+1, v. 1, p. 26–29.
Okatani, T., Maruyama, H., Sasaki, M., Yaguguchi, T., Magayama, S., Kayaba, M., Abe, M., and Kishimoto, N., 2006, Progress of Global Mapping Project: The Johannesburg summit in 2002: Bulletin of the Geographical Survey Institute of Japan, v. 53, p. 7–16.
Pearson, A., Taylor, D.R.F., Kline, K.D., and Heffernan, M., 2006, Cartographic ideals and geopolitical realities: International Map of the World from the 1890s to the present: The Canadian Geographer, v. 50, no. 2, p. 149–176, doi:10.1111/j.0008-3658.2006.00133.x.
Penck, A., 1893, Construction of a map of the world on a scale of 1:1,000,000: Geographical Journal, v. 1, p. 253–261.
Rhind, D.W., 2000, Current shortcomings of global mapping and the creation of a new geographical framework for the world: The Geographical Journal, v. 166, p. 295–305, doi:10.1111/j.1475-4959.2000.tb00031.x.
Robinson, A.H., 1965, The future of the international map: The Cartographic Journal, v. 1, p. 1–4.
Secretariat of the International Steering Committee for Global Mapping (ISCGM), 2008a, Global Mapping Newsletter, v. 50, June, 6 p.
Secretariat of the International Steering Committee for Global Mapping (ISCGM), 2008b, Global Mapping Newsletter, v. 51, September, 4 p.
Secretariat of the International Steering Committee for Global Mapping (ISCGM), 2009, Global map specifications version 2, in Report of the Sixteenth Meeting of the International Steering Committee for Global Mapping: Bangkok, Thailand, International Steering Committee for Global Mapping, October, 75 p.
Taylor, D.R.F., 2004, Capacity building and geographic information technologies in African development, in Brunn, S.D., Cutter, S.L., and Harrington, J.W., eds., Geography and Technology: Dordrecht, the Netherlands, Kluwer, p. 521–546.
Taylor, D.R.F., 2005, The history and development of Global Map as a global spatial data infrastructure, in Proceedings of the International Federation of Surveyors Working Week and the 8th International Conference of the Global Spatial Data Infrastructure (GSDI) Association: Cairo, GSDI (CD-ROM).
Taylor, D.R.F., 2008, Global Map standards and specifications, in Proceedings of GSDI-10 Conference, Standards Workshop: Trinidad (CD-ROM).
Ubukawa, T., Kisanuki, J., and Akatsuka, F., 2008, Global Map—An international project: Geographische Rundschau: International Edition, v. 4, p. 62–65.
United Nations, 2002, Report of the World Summit on Sustainable Development: http://www.johannesburgsummit.org/html/documents/summit-docs.html (accessed December 2008).
United States Office of Management and Budget, 2002, National Spatial Data Infrastructure of the United States Circular A-16: http://whitehouse.gov/OMB/circulars/a016_REV.htm#background (accessed December 2009).
Winchester, S., 1995, Taking the world’s measure: Cartography’s greatest undertaking survived wars and bureaucratic snarls only to die when it was nearly done: Civilization, v. 2, p. 56–59.

MANUSCRIPT ACCEPTED BY THE SOCIETY 17 FEBRUARY 2011