EFFECTIVE GROUNDWATER MODEL CALIBRATION With Analysis of Data, Sensitivities, Predictions, and Uncertainty
MARY C. HILL CLAIRE R. TIEDEMAN
Published 2007 by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, 201-748-6011, fax 201-748-6008, or online at http://www.wiley.com/go/permission. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at 877-762-2974, outside the United States at 317-572-3993 or fax 317-572-4002. Wiley also publishes its books in a variety of electronic formats. 
Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Hill, Mary C. (Mary Catherine)
Effective groundwater model calibration: with analysis of data, sensitivities, predictions, and uncertainty / Mary C. Hill, Claire R. Tiedeman.
p. cm.
Includes index.
ISBN-13: 978-0-471-77636-9 (cloth)
ISBN-10: 0-471-77636-X (cloth)
1. Groundwater--Mathematical models. 2. Hydrologic models. I. Tiedeman, Claire R. II. Title.
GB1001.72.M35H55 2006
551.49010 5118--dc22
2005036657
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
We dedicate this book to the groundwater modelers and software developers of the U.S. Geological Survey. These men and women devote their careers to providing sound scientific analyses for policy makers and to enabling others in the government and the private sector to do the same. We are honored to be their colleagues. We also dedicate this book to the United States taxpayers, to whom we are ultimately accountable. They have supported our educations, salaries, field work and students. We hope our efforts have improved the understanding and management of their groundwater resources. With love, we also dedicate this book to our husbands and families.
CONTENTS
Preface
1 Introduction
1.1 Book and Associated Contributions: Methods, Guidelines, Exercises, Answers, Software, and PowerPoint Files
1.2 Model Calibration with Inverse Modeling
1.2.1 Parameterization
1.2.2 Objective Function
1.2.3 Utility of Inverse Modeling and Associated Methods
1.2.4 Using the Model to Quantitatively Connect Parameters, Observations, and Predictions
1.3 Relation of this Book to Other Ideas and Previous Works
1.3.1 Predictive Versus Calibrated Models
1.3.2 Previous Work
1.4 A Few Definitions
1.4.1 Linear and Nonlinear
1.4.2 Precision, Accuracy, Reliability, and Uncertainty
1.5 Advantageous Expertise and Suggested Readings
1.6 Overview of Chapters 2 Through 15
2 Computer Software and Groundwater Management Problem Used in the Exercises
2.1 Computer Programs MODFLOW-2000, UCODE_2005, and PEST
2.2 Groundwater Management Problem Used for the Exercises
2.2.1 Purpose and Strategy
2.2.2 Flow System Characteristics
2.3 Exercises
Exercise 2.1: Simulate Steady-State Heads and Perform Preparatory Steps
3 Comparing Observed and Simulated Values Using Objective Functions
3.1 Weighted Least-Squares Objective Function
3.1.1 With a Diagonal Weight Matrix
3.1.2 With a Full Weight Matrix
3.2 Alternative Objective Functions
3.2.1 Maximum-Likelihood Objective Function
3.2.2 L1 Norm Objective Function
3.2.3 Multiobjective Function
3.3 Requirements for Accurate Simulated Results
3.3.1 Accurate Model
3.3.2 Unbiased Observations and Prior Information
3.3.3 Weighting Reflects Errors
3.4 Additional Issues
3.4.1 Prior Information
3.4.2 Weighting
3.4.3 Residuals and Weighted Residuals
3.5 Least-Squares Objective-Function Surfaces
3.6 Exercises
Exercise 3.1: Steady-State Parameter Definition
Exercise 3.2: Observations for the Steady-State Problem
Exercise 3.3: Evaluate Model Fit Using Starting Parameter Values
4 Determining the Information that Observations Provide on Parameter Values using Fit-Independent Statistics
4.1 Using Observations
4.1.1 Model Construction and Parameter Definition
4.1.2 Parameter Values
4.2 When to Determine the Information that Observations Provide About Parameter Values
4.3 Fit-Independent Statistics for Sensitivity Analysis
4.3.1 Sensitivities
4.3.2 Scaling
4.3.3 Dimensionless Scaled Sensitivities (dss)
4.3.4 Composite Scaled Sensitivities (css)
4.3.5 Parameter Correlation Coefficients (pcc)
4.3.6 Leverage Statistics
4.3.7 One-Percent Scaled Sensitivities
4.4 Advantages and Limitations of Fit-Independent Statistics for Sensitivity Analysis
4.4.1 Scaled Sensitivities
4.4.2 Parameter Correlation Coefficients
4.4.3 Leverage Statistics
4.5 Exercises
Exercise 4.1: Sensitivity Analysis for the Steady-State Model with Starting Parameter Values
5 Estimating Parameter Values
5.1 The Modified Gauss-Newton Gradient Method
5.1.1 Normal Equations
5.1.2 An Example
5.1.3 Convergence Criteria
5.2 Alternative Optimization Methods
5.3 Multiobjective Optimization
5.4 Log-Transformed Parameters
5.5 Use of Limits on Estimated Parameter Values
5.6 Exercises
Exercise 5.1: Modified Gauss-Newton Method and Application to a Two-Parameter Problem
Exercise 5.2: Estimate the Parameters of the Steady-State Model
6 Evaluating Model Fit
6.1 Magnitude of Residuals and Weighted Residuals
6.2 Identify Systematic Misfit
6.3 Measures of Overall Model Fit
6.3.1 Objective-Function Value
6.3.2 Calculated Error Variance and Standard Error
6.3.3 AIC, AICc, and BIC Statistics
6.4 Analyzing Model Fit Graphically and Related Statistics
6.4.1 Using Graphical Analysis of Weighted Residuals to Detect Model Error
6.4.2 Weighted Residuals Versus Weighted or Unweighted Simulated Values and Minimum, Maximum, and Average Weighted Residuals
6.4.3 Weighted or Unweighted Observations Versus Simulated Values and Correlation Coefficient R
6.4.4 Graphs and Maps Using Independent Variables and the Runs Statistic
6.4.5 Normal Probability Graphs and Correlation Coefficient R²N
6.4.6 Acceptable Deviations from Random, Normally Distributed Weighted Residuals
6.5 Exercises
Exercise 6.1: Statistical Measures of Overall Fit
Exercise 6.2: Evaluate Graph Model Fit and Related Statistics
7 Evaluating Estimated Parameter Values and Parameter Uncertainty
7.1 Reevaluating Composite Scaled Sensitivities
7.2 Using Statistics from the Parameter Variance-Covariance Matrix
7.2.1 Five Versions of the Variance-Covariance Matrix
7.2.2 Parameter Variances, Covariances, Standard Deviations, Coefficients of Variation, and Correlation Coefficients
7.2.3 Relation Between Sample and Regression Statistics
7.2.4 Statistics for Log-Transformed Parameters
7.2.5 When to Use the Five Versions of the Parameter Variance-Covariance Matrix
7.2.6 Some Alternate Methods: Eigenvectors, Eigenvalues, and Singular Value Decomposition
7.3 Identifying Observations Important to Estimated Parameter Values
7.3.1 Leverage Statistics
7.3.2 Influence Statistics
7.4 Uniqueness and Optimality of the Estimated Parameter Values
7.5 Quantifying Parameter Value Uncertainty
7.5.1 Inferential Statistics
7.5.2 Monte Carlo Methods
7.6 Checking Parameter Estimates Against Reasonable Values
7.7 Testing Linearity
7.8 Exercises
Exercise 7.1: Parameter Statistics
Exercise 7.2: Consider All the Different Correlation Coefficients Presented
Exercise 7.3: Test for Linearity
8 Evaluating Model Predictions, Data Needs, and Prediction Uncertainty
8.1 Simulating Predictions and Prediction Sensitivities and Standard Deviations
8.2 Using Predictions to Guide Collection of Data that Directly Characterize System Properties
8.2.1 Prediction Scaled Sensitivities (pss)
8.2.2 Prediction Scaled Sensitivities Used in Conjunction with Composite Scaled Sensitivities
8.2.3 Parameter Correlation Coefficients without and with Predictions
8.2.4 Composite and Prediction Scaled Sensitivities Used with Parameter Correlation Coefficients
8.2.5 Parameter-Prediction (ppr) Statistic
8.3 Using Predictions to Guide Collection of Observation Data
8.3.1 Use of Prediction, Composite, and Dimensionless Scaled Sensitivities and Parameter Correlation Coefficients
8.3.2 Observation-Prediction (opr) Statistic
8.3.3 Insights About the opr Statistic from Other Fit-Independent Statistics
8.3.4 Implications for Monitoring Network Design
8.4 Quantifying Prediction Uncertainty Using Inferential Statistics
8.4.1 Definitions
8.4.2 Linear Confidence and Prediction Intervals on Predictions
8.4.3 Nonlinear Confidence and Prediction Intervals
8.4.4 Using the Theis Example to Understand Linear and Nonlinear Confidence Intervals
8.4.5 Differences and Their Standard Deviations, Confidence Intervals, and Prediction Intervals
8.4.6 Using Confidence Intervals to Serve the Purposes of Traditional Sensitivity Analysis
8.5 Quantifying Prediction Uncertainty Using Monte Carlo Analysis
8.5.1 Elements of a Monte Carlo Analysis
8.5.2 Relation Between Monte Carlo Analysis and Linear and Nonlinear Confidence Intervals
8.5.3 Using the Theis Example to Understand Monte Carlo Methods
8.6 Quantifying Prediction Uncertainty Using Alternative Models
8.7 Testing Model Nonlinearity with Respect to the Predictions
8.8 Exercises
Exercise 8.1: Predict Advective Transport and Perform Sensitivity Analysis
Exercise 8.2: Prediction Uncertainty Measured Using Inferential Statistics
9 Calibrating Transient and Transport Models and Recalibrating Existing Models
9.1 Strategies for Calibrating Transient Models
9.1.1 Initial Conditions
9.1.2 Transient Observations
9.1.3 Additional Model Inputs
9.2 Strategies for Calibrating Transport Models
9.2.1 Selecting Processes to Include
9.2.2 Defining Source Geometry and Concentrations
9.2.3 Scale Issues
9.2.4 Numerical Issues: Model Accuracy and Execution Time
9.2.5 Transport Observations
9.2.6 Additional Model Inputs
9.2.7 Examples of Obtaining a Tractable, Useful Model
9.3 Strategies for Recalibrating Existing Models
9.4 Exercises (optional)
Exercises 9.1 and 9.2: Simulate Transient Hydraulic Heads and Perform Preparatory Steps
Exercise 9.3: Transient Parameter Definition
Exercise 9.4: Observations for the Transient Problem
Exercise 9.5: Evaluate Transient Model Fit Using Starting Parameter Values
Exercise 9.6: Sensitivity Analysis for the Initial Model
Exercise 9.7: Estimate Parameters for the Transient System by Nonlinear Regression
Exercise 9.8: Evaluate Measures of Model Fit
Exercise 9.9: Perform Graphical Analyses of Model Fit and Evaluate Related Statistics
Exercise 9.10: Evaluate Estimated Parameters
Exercise 9.11: Test for Linearity
Exercise 9.12: Predictions
10 Guidelines for Effective Modeling
10.1 Purpose of the Guidelines
10.2 Relation to Previous Work
10.3 Suggestions for Effective Implementation
11 Guidelines 1 Through 8: Model Development
Guideline 1: Apply the Principle of Parsimony
G1.1 Problem
G1.2 Constructive Approaches
Guideline 2: Use a Broad Range of System Information to Constrain the Problem
G2.1 Data Assimilation
G2.2 Using System Information
G2.3 Data Management
G2.4 Application: Characterizing a Fractured Dolomite Aquifer
Guideline 3: Maintain a Well-Posed, Comprehensive Regression Problem
G3.1 Examples
G3.2 Effects of Nonlinearity on the css and pcc
Guideline 4: Include Many Kinds of Data as Observations in the Regression
G4.1 Interpolated “Observations”
G4.2 Clustered Observations
G4.3 Observations that Are Inconsistent with Model Construction
G4.4 Applications: Using Different Types of Observations to Calibrate Groundwater Flow and Transport Models
Guideline 5: Use Prior Information Carefully
G5.1 Use of Prior Information Compared with Observations
G5.2 Highly Parameterized Models
G5.3 Applications: Geophysical Data
Guideline 6: Assign Weights that Reflect Errors
G6.1 Determine Weights
G6.2 Issues of Weighting in Nonlinear Regression
Guideline 7: Encourage Convergence by Making the Model More Accurate and Evaluating the Observations
Guideline 8: Consider Alternative Models
G8.1 Develop Alternative Models
G8.2 Discriminate Between Models
G8.3 Simulate Predictions with Alternative Models
G8.4 Application
12 Guidelines 9 and 10: Model Testing
Guideline 9: Evaluate Model Fit
G9.1 Determine Model Fit
G9.2 Examine Fit for Existing Observations Important to the Purpose of the Model
G9.3 Diagnose the Cause of Poor Model Fit
Guideline 10: Evaluate Optimized Parameter Values
G10.1 Quantify Parameter-Value Uncertainty
G10.2 Use Parameter Estimates to Detect Model Error
G10.3 Diagnose the Cause of Unreasonable Optimal Parameter Estimates
G10.4 Identify Observations Important to the Parameter Estimates
G10.5 Reduce or Increase the Number of Parameters
13 Guidelines 11 and 12: Potential New Data
Guideline 11: Identify New Data to Improve Simulated Processes, Features, and Properties
Guideline 12: Identify New Data to Improve Predictions
G12.1 Potential New Data to Improve Features and Properties Governing System Dynamics
G12.2 Potential New Data to Support Observations
14 Guidelines 13 and 14: Prediction Uncertainty
Guideline 13: Evaluate Prediction Uncertainty and Accuracy Using Deterministic Methods
G13.1 Use Regression to Determine Whether Predicted Values Are Contradicted by the Calibrated Model
G13.2 Use Omitted Data and Postaudits
Guideline 14: Quantify Prediction Uncertainty Using Statistical Methods
G14.1 Inferential Statistics
G14.2 Monte Carlo Methods
15 Using and Testing the Methods and Guidelines
15.1 Execution Time Issues
15.2 Field Applications and Synthetic Test Cases
15.2.1 The Death Valley Regional Flow System, California and Nevada, USA
15.2.2 Grindsted Landfill, Denmark
Appendix A: Objective Function Issues
A.1 Derivation of the Maximum-Likelihood Objective Function
A.2 Relation of the Maximum-Likelihood and Least-Squares Objective Functions
A.3 Assumptions Required for Diagonal Weighting to be Correct
A.4 References
Appendix B: Calculation Details of the Modified Gauss-Newton Method
B.1 Vectors and Matrices for Nonlinear Regression
B.2 Quasi-Newton Updating of the Normal Equations
B.3 Calculating the Damping Parameter
B.4 Solving the Normal Equations
B.5 References
Appendix C: Two Important Properties of Linear Regression and the Effects of Nonlinearity
C.1 Identities Needed for the Proofs
C.1.1 True Linear Model
C.1.2 True Nonlinear Model
C.1.3 Linearized True Nonlinear Model
C.1.4 Approximate Linear Model
C.1.5 Approximate Nonlinear Model
C.1.6 Linearized Approximate Nonlinear Model
C.1.7 The Importance of X and X
C.1.8 Considering Many Observations
C.1.9 Normal Equations
C.1.10 Random Variables
C.1.11 Expected Value
C.1.12 Variance-Covariance Matrix of a Vector
C.2 Proof of Property 1: Parameters Estimated by Linear Regression are Unbiased
C.3 Proof of Property 2: The Weight Matrix Needs to be Defined in a Particular Way for Eq. (7.1) to Apply and for the Parameter Estimates to have the Smallest Variance
C.4 References
Appendix D: Selected Statistical Tables
D.1 References
References
Index
PREFACE
This book is intended for use in undergraduate and graduate classes, and is also appropriate as a reference book and for self-study. Minimal expertise in statistics and mathematics is required for all except a few advanced, optional topics. Knowledge of groundwater principles is needed to understand some parts of the exercises and some of the examples, but students from other fields of science have found classes based on drafts of the book to be very useful.
This book has been more than 12 years in the making. Progressively more mature versions have been used to teach short courses most years since 1991. The short courses have been held at the U.S. Geological Survey National Training Center in Denver, Colorado; the International Ground Water Modeling Center at the Colorado School of Mines in Golden, Colorado; the South Florida Water Management District in West Palm Beach, Florida; the University of Minnesota in Minneapolis, Minnesota; the Delft University of Technology, The Netherlands; Charles University in Prague, the Czech Republic; the University of the Western Cape in Bellville, South Africa; and Utrecht University, The Netherlands. A version also was used to teach a semester course at the University of Colorado in Boulder, Colorado, in the fall of 2000.
Much of what the book has become results from our many wonderful students. We thank them for their interest, enthusiasm, good humor, and encouragement as we struggled to develop many of the ideas presented in this book. We also are deeply indebted to the following colleagues for insightful discussions and fruitful collaborations: Richard L. Cooley, Richard M. Yager, Frank A. D’Agnese, Claudia C. Faunt, Arlen W. Harbaugh, Edward R. Banta, Marshall W. Gannett, and D. Matthew Ely of the U.S. Geological Survey, Eileen P. Poeter of the Colorado School of Mines, Evan R. Anderman formerly of Calibra Consultants and McDonald-Morrissey Associates, Inc., Heidi Christiansen Barlebo of the Geological Survey of Denmark and Greenland, John Doherty of Watermark Numerical Computing and the University of Queensland (Australia), Karel Kovar of MNP (The Netherlands), Steen Christensen of Aarhus University (Denmark), Theo Olsthoorn of Amsterdam Water Supply (The Netherlands), Richard Waddel of HSI-Geotrans, Inc., Frank Smits formerly of Witteveen+Bos, James Rumbaugh of ESI, Inc., Norm Jones of Utah State University, and Jeff Davis of EMS. In addition, thought-provoking questions from users of MODFLOWP, MODFLOW-2000, PEST, UCODE, and UCODE_2005 throughout the years have been invaluable.
The book benefited from the careful reviews provided by Peter Kitanidis of Stanford University, Eileen Poeter of the Colorado School of Mines and the International Ground Water Modeling Center (USA), Steen Christensen of the University of Aarhus (Denmark), Roseanna Neupauer of the University of Virginia (USA) (now at the University of Colorado, USA), Luc Lebbe of Ghent University (Belgium), David Lerner of the University of Sheffield (England), Chunmiao Zheng of the University of Alabama (USA), and Howard Reeves and Marshall Gannett of the U.S. Geological Survey. It also benefited from the kind, professional editors and copyeditor at Wiley: Jonathan Rose, Rosalyn Farkas, and Christina Della Bartolomea. All errors and omissions are the sole responsibility of the authors.
MARY C. HILL
CLAIRE R. TIEDEMAN
1 INTRODUCTION
In many fields of science and engineering, mathematical models are used to represent complex processes, and the results are used for system management and risk analysis. The methods commonly used to develop and apply such models often do not take full advantage of either the data available for model construction and calibration or the developed model. This book presents a set of methods and guidelines that, it is hoped, will improve how data and models are used. This introductory chapter first describes the contributions of the book, including what is available on the associated web site. Sections 1.2 and 1.3 provide some context for the book by reviewing inverse modeling and considering the methods covered by the book relative to other paradigms for integrating data and models. After providing a few definitions, Chapter 1 concludes with a discussion of the expertise readers are expected to possess, some suggested readings, and an overview of Chapters 2 through 15.
1.1 BOOK AND ASSOCIATED CONTRIBUTIONS: METHODS, GUIDELINES, EXERCISES, ANSWERS, SOFTWARE, AND POWERPOINT FILES
The methods presented in the book include (1) sensitivity analysis for evaluating the information content of data, (2) data assessment strategies for identifying (a) existing measurements that dominate model development and predictions and (b) potential measurements likely to improve the reliability of predictions, (3) calibration techniques for developing models that are consistent with the data in some optimal manner, and (4) uncertainty evaluation for quantifying and communicating the potential error in simulated results (e.g., predictions) that often are used to make important societal decisions.
The fourteen guidelines presented in the book focus on practical application of the methods and are organized into four categories: (1) model development guidelines, (2) model testing guidelines, (3) potential new data guidelines, and (4) prediction uncertainty guidelines. Most of the methods presented and referred to in the guidelines are based on linear or nonlinear regression theory. While this body of knowledge has its limits, it is very useful in many circumstances. The strengths and limitations of the methods presented are discussed throughout the book. In practice, linear and nonlinear regression are best thought of as imperfect, insightful tools. Whether regression methods prove to be beneficial in a given situation depends on how they are used. Here, the term beneficial refers to increasing the chance of achieving one or more useful models given the available data and a reasonable model development effort. The methods, guidelines, and related exercises presented in this book illustrate how to improve the chances of achieving useful models, and how to address problems that commonly are encountered along the way.
Besides the methods and guidelines, the book emphasizes the importance of how results are presented. In this sense, the book emphasizes two criteria: valid statistical concepts and effective communication with resource managers. The most advanced, complex mathematics and statistics are worth very little if they cannot be used to address the societal needs related to the modeling objectives.
The methods and guidelines in this book have wide applicability for mathematical models of many types of systems and are presented in a general manner. The expertise of the authors is in the simulation of groundwater systems, and most of the examples are from this field. There are also some surface-water examples and a few references to other fields such as geophysics and biology. The methods and guidelines are most advantageous for systems whose fundamental characteristics are typical of groundwater systems and shared by many other natural systems. In particular, groundwater systems commonly involve (1) solutions in up to three spatial dimensions and time, (2) system characteristics that can vary dramatically in space and time, (3) knowledge about system variability in addition to the data used directly in regression methods, (4) available data sets that are typically sparse, and (5) nonlinearities that are often significant but not extreme.
Four important additional aspects of the book are the exercises, answers, software, and PowerPoint files available for teaching. The exercises focus on a groundwater flow system and management problem to which students apply all the methods presented in the book. The system is simple, which allows basic principles to be clearly demonstrated, and is designed to have aspects that are directly relevant to typical systems. The exercises can be conducted using the material provided in the book, or as hands-on computer exercises using instructions and files available on the web site http://water.usgs.gov/lookup/get?crresearch/hill_tiedeman_book. The web site includes instructions for doing the exercises using the files directly and/or using public-domain interface and visualization capabilities. It may also include instructions for using selected versions of commercial interfaces. The instructions are designed so that students can maximize the time spent understanding the ideas and capabilities discussed in the book. Answers to selected exercises are provided on the web site.
The software used for the exercises is freely available, open source, well documented, and widely used. The groundwater flow system is simulated using the Ground-Water Flow Process of MODFLOW-2000 (Harbaugh et al., 2000; Hill et al., 2000). The sensitivity analysis, calibration, and uncertainty aspects of the exercises can be accomplished using MODFLOW-2000’s Observation, Sensitivity, and Parameter-Estimation Processes or UCODE_2005 (Poeter et al., 2005). Most of the sensitivity analysis, calibration, and uncertainty aspects of the exercises also can be conducted using PEST (Doherty, 1994, 2005). Relevant capabilities of MODFLOW-2000 and UCODE_2005 are noted as methods and guidelines are presented; relevant capabilities of PEST are noted in some cases. The public-domain programs for interface and visualization are MFI2K (Harbaugh, 2002), GWChart (Winston, 2000), and ModelViewer (Hsieh and Winston, 2002). The web sites from which these programs can be downloaded are listed with the references and on the book web site listed above. Although the methods and guidelines presented in this book are broadly applicable, throughout the book they are presented in the context of the capabilities of the computer codes mentioned above to provide concrete examples and encourage use.
PowerPoint files designed for teaching the material in the book are provided on the web site. The authors invite those who use the PowerPoint files to share their additions and changes with others, in the same spirit with which we share these files with you. The use of trade, firm, or product names in this book is for descriptive purposes only and does not imply endorsement by the U.S. Government. The rest of this introductory chapter provides a brief overview of how regression methods fit into model calibration (Section 1.2), some perspective on how the ideas presented here relate to other ideas and past work (Section 1.3), some definitions (Section 1.4), a description of expertise that would assist readers and how to obtain that expertise (Section 1.5), and an overview of Chapters 2 through 15 (Section 1.6).
1.2 MODEL CALIBRATION WITH INVERSE MODELING
During calibration, model inputs such as system geometry and properties, initial and boundary conditions, and stresses are changed so that the model outputs match related measured values. Many of the model inputs that are changed can be characterized using what are called “parameters” in this work. The measured values related to model outputs often are called “observations” or “observed values,” which are equivalent terms and are used interchangeably in this book.
The basic steps of model calibration are shown in Figure 1.1. In the context of the entire modeling process, effectively using system information and observations to constrain the model is more likely than a less effective use of these data to produce a model that accurately represents the simulated system and yields accurate predictions. The ideas, methods, and guidelines presented in this book are aimed at helping to achieve more effective use of data. The difficulties faced in simulating natural systems are demonstrated by the complex variability shown in Figure 1.2, as discussed by Zhang et al. (2006).
Four issues fundamental to model calibration are discussed in the next four sections: parameter definition or parameterization, which is the mechanism used to obtain a tractable and hopefully meaningful representation of systems such as that shown in Figure 1.2; the objective function mentioned in Figure 1.1; the utility of inverse modeling, which is also called parameter estimation in this book; and using the model to quantitatively connect observations, parameters, and predictions.
FIGURE 1.1 Flowchart showing the major steps of calibrating a model and using it to make predictions. Bold, italicized terms indicate the steps that are directly affected by nonlinear regression, including the use of an objective function to quantify the comparison between simulated and observed values. Predictions can be used during calibration as described in Chapter 8. (Adapted from Herb Buxton, U.S. Geological Survey, written communication, 1990.)
FIGURE 1.2 Experimental results from a subsiding tank, showing the kind of complexity characteristic of deltaic deposits in a subsiding basin. (Reproduced with permission from Paola et al. 2001.)
1.2.1 Parameterization
The model inputs that need to be estimated are often distributed spatially and/or temporally, so that the number of parameter values could be infinite. The observations, however, generally are limited in number and support the estimation of relatively few parameters. Addressing this discrepancy is one of the greatest challenges faced by modelers in many fields. Typically, so-called parameterization is introduced that allows a limited number of parameter values to define model inputs throughout the spatial domain and time of interest. In this book, the term “parameter” is reserved for the values used to define model inputs. Consider the parameters defined in three groundwater model examples. Example 1: One parameter represents the hydraulic conductivity of a hydrogeologic unit that occupies a prescribed volume of the model domain and is hydraulically distinctive and relatively uniform. Example 2: One parameter represents a scalar multiplier of spatially varying recharge rates initially specified by the modeler for a given geographic area on the basis of precipitation, vegetation, elevation, and topography. Example 3: One parameter represents the hydraulic head at a constant-head boundary that is used to simulate the water level in a lake.
INTRODUCTION
This book focuses primarily on models for which a limited number of parameters are defined. Alternative methods are discussed in Section 1.3.2. Historically, observed and simulated values, such as hydraulic heads, flows, and concentrations for groundwater systems, often were compared subjectively, so that it was difficult to determine how well one model was calibrated relative to another. In addition, in modeling of groundwater and other types of systems, adjustments of parameter values and other model characteristics were accomplished mostly by trial and error, which is time consuming, subjective, and inconclusive. Formal methods have been developed that attempt to estimate parameter values given a mathematical model of system processes and a set of relevant observations. These are called inverse methods, and generally they are limited to the estimation of parameters as defined above. Thus, the terms “inverse modeling” and “parameter estimation” commonly are synonymous, as in this book. For some models, the inverse problem is linear, in that the observed quantities are linear functions of the parameters. In many circumstances of practical interest, however, the inverse problem is nonlinear, and its solution is not as straightforward as for linear problems. This book discusses methods for nonlinear inverse problems. One method of solving such problems is nonlinear regression, which is the primary solution method discussed in this book. The complexity of many real systems and the scarcity of available data sets result in inversions that are often plagued by problems of insensitivity, nonuniqueness, and instability, regardless of how model calibration is achieved. Insensitivity occurs when the observations do not contain enough information to support estimation of the parameters. Nonuniqueness occurs when different combinations of parameter values match the observations equally well. 
Instability occurs when slight changes in, for example, parameter values or observations radically change simulated results. All these problems are exacerbated when the system is nonlinear. These problems are usually more easily detected when using formal inverse modeling and associated methods than when using trial-and-error methods for calibration. Detecting these problems is important to understanding the value of the resulting model.
1.2.2 Objective Function
In inverse modeling, the comparison of simulated and observed values is accomplished quantitatively using an objective function (Figure 1.1). The simulated and observed values include system-dependent variables (e.g., hydraulic head for the groundwater flow equation or concentration for the groundwater transport equation) and other system characteristics as represented by prior information on parameters. Parameter values that produce the “best fit” are defined as those that produce the smallest value of the objective function.
1.2.3 Utility of Inverse Modeling and Associated Methods
Recent work has clearly demonstrated that inverse modeling and associated sensitivity analysis, data needs assessment, and uncertainty evaluation methods provide
capabilities that help modelers take greater advantage of their models and data, even for simulated systems that are very complex (e.g., Poeter and Hill, 1997; Faunt et al., 2004). The benefits include the following:
1. Clear determination of parameter values that produce the best possible fit to the available observations.
2. Graphical analyses and diagnostic statistics that quantify the quality of calibration and data shortcomings and needs, including analyses of model fit, model bias, parameter estimates, and model predictions.
3. Inferential statistics that quantify the reliability of parameter estimates and predictions.
4. Other evaluations of uncertainty, including deterministic and Monte Carlo methods.
5. Identification of issues that are easily overlooked when calibration is conducted using trial-and-error methods alone.
Quantifying the quality of calibration, data shortcomings and needs, and uncertainty of parameter estimates and predictions is important to model defensibility and transparency and to communicating the results of modeling studies to managers, regulators, lawyers, concerned citizens, and the modelers themselves. Despite their apparent utility, in many fields, such as groundwater hydrology, the methods described in this book are not routinely used, and calibration using only trial-and-error methods is more common. This is due in part to lack of familiarity with the methods and the perception that they require more time than trial-and-error methods. It is also because inverse modeling and related sensitivity-analysis methods clearly reveal problems such as insensitivity and nonuniqueness, and thereby expose inconvenient model weaknesses. Yet once they are revealed, such weaknesses often can be reduced or eliminated, because knowledge of the weaknesses can be used to determine the data collection and model development effort needed to strengthen the model.
We hope this text will encourage modelers to use, and resource managers to demand, the more transparent and defensible models that result from using the types of methods and ideas described in this book.
1.2.4 Using the Model to Quantitatively Connect Parameters, Observations, and Predictions
The model quantitatively connects the system information and the observations to the predictions and their uncertainty. The entities Parameters, Observations, and Predictions are in bold type in Figure 1.1 because these entities are directly used by or produced by the model, whereas the system information often is indirectly used to create model input. Many of the methods presented in this book take advantage of the quantitative links the model provides between what is referred to in this book as the triad of the observations, parameters, and predictions.
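One way the model links the triad is through sensitivities: sensitivities of simulated observations determine how well parameters can be estimated, and sensitivities of a prediction carry that parameter uncertainty into the prediction. The sketch below assumes the standard first-order (linear) approximation from linear regression theory, with parameter variance–covariance matrix s²(XᵀωX)⁻¹ and prediction variance zᵀVz. The sensitivity matrix `X`, the weights, s², and the prediction sensitivities `z` are made-up values for a two-parameter problem; related methods are the subject of Chapters 7 and 8.

```python
import math

# Hypothetical sensitivities of 4 simulated observations to 2 parameters
# (rows: observations, columns: parameters), i.e., the matrix X.
X = [
    [1.0, 0.5],
    [0.8, 0.9],
    [0.3, 1.2],
    [0.1, 0.4],
]
weights = [1.0, 1.0, 4.0, 4.0]  # diagonal weights (inverse error variances), assumed
s2 = 1.0                        # calculated error variance, assumed
z = [0.7, -0.2]                 # sensitivities of one prediction to the 2 parameters


def param_covariance(X, weights, s2):
    """Return s2 * (X^T w X)^-1 for a two-parameter problem."""
    a = sum(w * r[0] * r[0] for r, w in zip(X, weights))
    b = sum(w * r[0] * r[1] for r, w in zip(X, weights))
    d = sum(w * r[1] * r[1] for r, w in zip(X, weights))
    det = a * d - b * b
    if det == 0.0:
        raise ValueError("X^T w X is singular: parameters are not uniquely estimable")
    return [[s2 * d / det, -s2 * b / det],
            [-s2 * b / det, s2 * a / det]]


V = param_covariance(X, weights, s2)
# First-order prediction variance z^T V z.
var_pred = sum(z[i] * V[i][j] * z[j] for i in range(2) for j in range(2))
print("prediction standard deviation:", math.sqrt(var_pred))
```

In this approximation, observations that are sensitive to the parameters a prediction depends on enlarge XᵀωX and shrink the prediction variance, which is the kind of observation, parameter, and prediction connection the text describes.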
The depiction of model calibration shown in Figure 1.1 is unusual in that it suggests simulating predictions and prediction uncertainty as model calibration proceeds. When execution times allow, it is often useful to include predictive analyses during model calibration so that the dynamics affecting model predictions can be better understood. Care must be taken, of course, not to use such simulations to bias model predictions.
1.3 RELATION OF THIS BOOK TO OTHER IDEAS AND PREVIOUS WORKS
This section relates the ideas of this book to predictive models and other literature.
1.3.1 Predictive Versus Calibrated Models
When simulating natural systems, the objective is often to produce a model that can predict the consequences of introducing something new into the system accurately enough to be useful. In groundwater systems, this may entail new pumpage or the transport of recently introduced or potential contamination. Ideally, model inputs would be determined accurately and completely enough from directly related field data to produce useful model results. This is advantageous because the resulting model is likely to be able to predict results in a wide range of circumstances; for this reason, such models are called predictive models (e.g., see Wilcock and Iverson, 2003; National Research Council, 2002). Commonly, however, quantities simulated by the model can be measured more readily than model inputs, and the best possible determination of model inputs based on directly related field data can produce model outputs that match the measured equivalents poorly. If the fit is poor enough that the utility of model predictions is questionable, a decision needs to be made about how to proceed. The choices are to use the predictive model, which has been shown to perform poorly in the circumstances for which testing is possible, or to modify the model so that, at the very least, it matches the available measured equivalents of model results. A model modified in this way is called a calibrated model. There is significant and important debate about the utility of predictive and calibrated models, and we hope that the debate will lead to better methods of measuring quantities directly related to model inputs. We would rejoice with all others in the natural sciences to be able always to use predictive models. Until then, however, it is our opinion that methods and guidelines that promote the best possible use of models and data in the development of calibrated models are critical.
It is also our belief that such methods and guidelines can play a role in informing and focusing the efforts of developing field methods that may ultimately allow predictive models to be used in more circumstances.
1.3.2 Previous Work
For the most part, comments in this introductory chapter are limited to the history, evolution, and status of nonlinear regression and modeling as related to groundwater systems. Comments about how specific methods or ideas relate to previous
publications appear elsewhere in the book. This section contains the broadest discussion of parameterization methods presented in the book. The topics covered by this book have been addressed by others using a variety of methods, and have been developed for and applied to many different fields of science and engineering. We do not attempt to provide a full review of all work on these topics. Selected textbooks are as follows. Parker (1994), Sun (1994), Lebbe (1999), and Aster et al. (2005) discuss nonlinear regression in the field of geophysics. More general references for nonlinear regression and associated analyses include Bard (1974), Beck and Arnold (1977), Belsley et al. (1980), Seber and Wild (1989), Dennis and Schnabel (1996), and Tarantola (2005). Saltelli et al. (2000, 2004) provide comprehensive overviews of sensitivity-analysis methods. This book focuses on what Saltelli et al. describe as local sensitivity methods, and includes new sensitivity-analysis methods not included in the previous books. The pioneers of using regression methods in groundwater modeling were Cooley (1977) and Yeh and Yoon (1981). Some of the material in this book was first published in U.S. Geological Survey reports (Cooley and Naff, 1990; Hill, 1992; Hill, 1994; Hill, 1998). Cooley and Naff (1990) presented a modified Gauss–Newton method of nonlinear regression that, with some modification, is used in Chapter 5, and residual-analysis ideas derived from early editions of Draper and Smith (1998) that are used in Chapter 6. Hill (1992) presented sensitivity-analysis and residual-analysis methods used in Chapters 4 and 6. Cooley and Naff (1990), Hill (1992), and Hill (1994) presented methods of residual analysis and linear uncertainty analysis that are used in Chapters 6 and 8. Hill (1998) enhanced the methods presented in the previous works and presented the first version of the guidelines that are described in Chapters 10 through 14.
Various aspects of the guidelines have a long history, and relevant references are cited in later chapters. To the authors’ knowledge, these guidelines provide a more comprehensive foundation for the calibration and use of models of complex systems than any similar set of published guidelines. In general, the book expands the previously presented material, presents some new methods, and includes an extensive set of exercises.
Achieving Tractable Problems
Regression is a powerful tool for using data to test hypothesized physical relations and to calibrate models in many fields (Seber and Wild, 1989; Draper and Smith, 1998). Despite its introduction into the groundwater literature in the 1970s (reviewed by McLaughlin and Townley, 1996), regression is only starting to be used with any regularity to develop numerical models of complicated groundwater systems. The scarcity of data, nonlinearity of the regression, and complexity of the physical systems cause substantial difficulties. Obtaining tractable models that represent the true system well enough to yield useful results is arguably the most important problem in the field. The only options are (1) improving the data, (2) ignoring the nonlinearity, and/or (3) carefully ignoring some of the system complexity. Scarcity of data is a perpetual problem not likely to be alleviated at most field sites despite recent impressive advances in geophysical data collection and analysis (e.g., Eppstein and Dougherty, 1996; Hyndman and Gorelick, 1996; Lebbe, 1999; Dam and Christensen, 2003). Methods that ignore nonlinearity are presented by, for example, Kitanidis (1997) and Sun (1994, p. 182). The large
changes in parameter values that occur after the first iteration in the nonlinear regressions of many problems, however, indicate that linearized methods are unlikely to produce satisfactory results in many circumstances. This leaves option 3, which is discussed in the following paragraphs. Defining a tractable and useful level of parameterization for groundwater inverse problems has been an intensely sought goal, focused mostly on the representation of hydraulic conductivity or transmissivity. Suggested approaches vary considerably. The most complex parameterizations are cell- or pixel-based methods in which hydraulic conductivity or transmissivity parameters are defined for each model cell, element, or other basic model entity, and prior information or regularization is used to stabilize the solution (e.g., see Tikhonov and Arsenin, 1977; Clifton and Neuman, 1982; Backus, 1988; McLaughlin and Townley, 1996). The simplest parameterizations require homogeneity, such that, at the extreme, one parameter specifies hydraulic conductivity throughout the model. As more parameters are defined and the information contained in the observations is overwhelmed, prior information on parameters and/or regularization on observations and/or parameters become necessary to attain a tractable problem. In this book, we use definitions of prior information and regularization derived from Backus (1988). When applied to parameters, prior information and regularization produce similar penalty-function terms in the objective function. For prior information, the weighting used approximates the reliability of the prior information based on either classical or Bayesian statistical arguments. Essentially, classical statistical arguments are based on sampling methods; Bayesian statistical arguments are, at least in part, based on belief (Bolstad, 2004).
In contrast, for regularization the weighting generally is determined as required to produce a tractable problem, as represented by a unique set of estimated parameter values. The resulting weights generally are much larger than can be justified based on what could possibly be known or theorized about the parameter values and distribution. For both prior information and regularization, the values used in the penalty function need to be unbiased (see the definition in Section 1.4.2). Between the two extreme parameterizations mentioned previously, there is a wide array of designs ranging from interpolation methods such as pilot points (RamaRao et al., 1995; Doherty, 2003; Moore and Doherty, 2005, 2006) to zones of constant value designed using geologic information (see Chapter 15 for examples). For example, the Regularization Capability of the computer code PEST (Doherty, 1994, 2005) typically allows many parameters to be estimated. Indeed, the number of parameters may exceed the number of observations. Parameter estimation is made possible by requiring that the parameter values satisfy additional considerations. Most commonly, the parameter distribution is required to be smooth. This and other considerations are discussed by Tikhonov and Arsenin (1977) and Menke (1989). More recent approaches include the superparameters of Tonkin and Doherty (2006) and the representer method of Valstar et al. (2004). The former uses singular value decomposition to identify a few major eigenvectors from sensitivity matrices; only the “superparameters” defined by the eigenvectors are estimated by regression.
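The penalty-function idea can be sketched with a weighted least-squares objective function: a sum of weighted squared differences between observed and simulated values, plus a similar sum comparing parameter values with prior values. In this minimal sketch the function names, heads, weights, and prior values are hypothetical; the only difference between the prior-information and regularization cases is how the penalty weight is chosen, as described above. The objective functions actually used in the book are defined in Chapter 3.

```python
def weighted_sse(observed, simulated, weights):
    """Sum of weighted squared differences, sum of w_i * (y_i - y'_i)**2."""
    return sum(w * (y - ys) ** 2 for y, ys, w in zip(observed, simulated, weights))


def objective(observed, simulated, obs_weights, params, prior_values, prior_weights):
    """Observation term plus a penalty term on parameters.

    With prior_weights reflecting the reliability of the prior values, the
    penalty is a prior-information term; with prior_weights chosen only to
    force a unique solution (typically much larger), it acts as regularization.
    """
    data_term = weighted_sse(observed, simulated, obs_weights)
    penalty_term = weighted_sse(prior_values, params, prior_weights)
    return data_term + penalty_term


# Hypothetical heads [m]; weights are 1 / (error standard deviation)**2.
observed = [101.2, 99.8, 97.5]
simulated = [101.0, 100.1, 97.4]
obs_weights = [1 / 0.1 ** 2] * 3

params = [2.0e-4]                              # current hydraulic conductivity [m/s]
prior_values = [1.0e-4]                        # prior estimate of that conductivity
prior_info_weights = [1 / (1.0e-4) ** 2]       # prior standard deviation 1e-4 m/s
regularization_weights = [1 / (1.0e-6) ** 2]   # far larger weight

print(objective(observed, simulated, obs_weights, params, prior_values, prior_info_weights))
print(objective(observed, simulated, obs_weights, params, prior_values, regularization_weights))
```

With the large regularization weight, the penalty term dominates the objective function, so minimizing it pulls the parameter toward the prior value far more strongly than the data alone would justify.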
Parameterizations with many parameters are advantageous in that they minimize user-imposed simplifications, but they have the following problems: (1) they do not eliminate the scale problem if heterogeneities smaller than the grid or parameter scale are important, as they often are in transport problems, for example; (2) they generally require either more and better hydraulic-conductivity or transmissivity data than are available in most circumstances or unsupportable assumptions about smoothness; and (3) they can easily lead to overfitting the observations and a resulting decline in predictive accuracy. Historically, parameterization methods that resulted in many parameters also could not easily accommodate knowledge about geologic structure. Gradually, the ability to apply geologic constraints within the context of many defined parameters is being developed and provides exciting possibilities. Simpler parameterizations (simpler in that there are fewer defined parameters) can be achieved using zonation, interpolation, or eigenvectors of the variance–covariance matrix of grid-scale parameters (e.g., Jacobson, 1985; Sun and Yeh, 1985; Cooley et al., 1986; RamaRao et al., 1995; Eppstein and Dougherty, 1996; Reid, 1996; D’Agnese et al., 1999; Tonkin and Doherty, 2006). Stochastic methods (e.g., Gelhar, 1993; Kitanidis, 1995; Yeh et al., 1995; Carle et al., 1998) also generally fall into this category, although they share some of the characteristics of the grid-based methods. These simpler parameterizations produce a more tractable problem, but it is not clear what level of simplicity diminishes utility. The principle of parsimony (Box et al., 1994; Parker, 1994) suggests that simple models should be considered, but the perception remains that many complex systems cannot be adequately represented using parsimonious models. For example, Gelhar (1993, p.
341) claims that for groundwater systems “there is no clear evidence that [nonlinear regression] methods [using simple parameterizations] actually work under field conditions.” Indeed, Beven and Binley (1992) even suggest that for some problems it may be best to abandon the concept of parameterizations simple enough to produce an optimal set of parameter values. A concept as useful as parsimony should not be given up lightly, yet there have been few conclusive evaluations of the parameter complexity needed to produce useful results for groundwater models (Hill et al., 1998). In this book we proceed from the point of view that it is best to introduce complexity slowly and carefully, which is taken to mean increasing the number of parameters slowly and carefully. One reason for this approach is that models with a few parameters can be used to learn things about a system that are true for all parameterizations but are more difficult to determine when many parameters are defined. As related to the famous quote by George E. P. Box, “All models are wrong, but some are useful,” the idea is that parsimony is likely to play an important role in achieving useful models. We suggest that simpler parameterizations are useful for many models and for the initial phases of development of all models.
Direct and Indirect Inverse Modeling
In groundwater inverse modeling, methods have been classified as indirect and direct (Neuman, 1973; Yeh, 1986; Sun, 1994). This book considers indirect inverse modeling, which uses available observation data and optimization techniques to estimate model input values.
Direct inverse modeling is dramatically different: available, usually sparse observations are interpolated or extrapolated everywhere in the model domain to create “observations” throughout the system. Using these “observations,” the differential equations describing the simulated processes (such as groundwater flow or transport) are used to calculate the model input values (parameters) directly. The direct inverse modeling methods have been in existence longer than the indirect methods but have been shown consistently to be unstable in the presence of common measurement errors (Yeh, 1986). The direct methods do not use sensitivities and rarely calculate them, so these methods cannot be used to compute many of the statistics used for model evaluation that are presented in this book.
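The instability of direct methods can be illustrated with a deliberately simple caricature based on the one-dimensional Darcy problem of Section 1.4.1. A direct method computes K at each location from the head gradient, K = −Q/(A ∂h/∂X), with the gradient taken from differences between closely spaced “observations” that carry small random errors. The values of Q, A, K, the spacing, and the error level are hypothetical. For contrast, the sketch also fits Eq. (1.2) to all heads at once by least squares, in the spirit of the indirect approach.

```python
import random

# Hypothetical 1-D Darcy problem: the true head gradient (0.01) is chosen to be
# comparable to the head measurement error.
Q, A, K_TRUE = 1.0e-4, 1.0, 1.0e-2   # flow [m^3/s], area [m^2], conductivity [m/s]
H0, DX = 100.0, 1.0                  # head at X = 0 [m], point spacing [m]
NOISE_SD = 0.01                      # assumed head measurement error [m]

random.seed(1)
xs = [i * DX for i in range(11)]
# "Observed" heads: Eq. (1.2) plus random measurement error.
heads = [H0 - Q * x / (K_TRUE * A) + random.gauss(0.0, NOISE_SD) for x in xs]

# Direct caricature: K = -Q / (A dh/dX) for each interval, with dh/dX taken
# from differences of the noisy heads.
k_direct = []
for i in range(len(xs) - 1):
    gradient = (heads[i + 1] - heads[i]) / DX
    k_direct.append(-Q / (A * gradient))

# Least-squares fit of a straight line (h0 and slope) to all heads; because the
# slope equals -Q/(K A), the fitted slope gives the least-squares estimate of K.
n = len(xs)
x_mean = sum(xs) / n
h_mean = sum(heads) / n
slope = sum((x - x_mean) * (h - h_mean) for x, h in zip(xs, heads)) / sum(
    (x - x_mean) ** 2 for x in xs
)
k_fit = -Q / (A * slope)

print("direct K range:", min(k_direct), "to", max(k_direct))
print("fitted K:", k_fit)
```

Head errors of about a centimeter are enough to make some directly computed K values far too large or even negative, while the fit to all heads together stays much closer to the true value.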
1.4 A FEW DEFINITIONS
This section defines what is meant by a linear and a nonlinear model in the context of parameter estimation. It also defines four terms that are often confusing and states how the terms are used in this book.
1.4.1 Linear and Nonlinear
As discussed in Section 1.1, this book focuses on models for which parameter estimation is nonlinear. In this context, nonlinearity results when simulated equivalents to observations are nonlinearly related to parameters. For example, consider groundwater flow. In a confined groundwater flow system, hydraulic head is a linear function of space and time, which is why superposition can be used (Reilly et al., 1987). In contrast, for the same circumstances, head is a nonlinear function of many parameter values of interest, such as hydraulic conductivity. The simplest form of the groundwater flow equation, Darcy’s Law, can be used to demonstrate both linearity with respect to the spatial dimension and nonlinearity with respect to hydraulic conductivity. This was shown by Hill et al. (2000, pp. 16–18) and is presented here in a modified form. Darcy’s Law relates the hydraulic head along the length of a cylinder packed with saturated porous media to the flow through the cylinder. Darcy’s Law can be expressed as
$$Q = -KA\,\frac{\partial h}{\partial X} \qquad (1.1)$$
where
Q = flow produced by imposing different hydraulic heads at opposite ends of a cylinder containing homogeneous, saturated, porous media [L³/T];
K = hydraulic conductivity of the saturated porous media [L/T];
A = cross-sectional area of the cylinder [L²];
X = distance along an axis parallel to the length of the cylinder and, therefore, parallel to the direction of flow [L];
h = hydraulic head at any distance X along the cylinder [L].
Equation (1.1) can be solved for the hydraulic head at any distance, X, to obtain
$$h = h_0 - \frac{QX}{KA} \qquad (1.2)$$
where h₀ is the hydraulic head at X = 0. The derivatives ∂h/∂Q and ∂h/∂K are sensitivities in a parameter-estimation problem in which Q and K are estimated. Using partial-derivative notation, the derivatives of Eq. (1.2) with respect to X, Q, and K are
$$\frac{\partial h}{\partial X} = -\frac{Q}{KA} \qquad (1.3)$$
$$\frac{\partial h}{\partial Q} = -\frac{X}{KA} \qquad (1.4)$$
$$\frac{\partial h}{\partial K} = \frac{QX}{K^{2}A} \qquad (1.5)$$
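The derivatives in Eqs. (1.3) to (1.5) can be checked numerically. This sketch, using hypothetical values for h₀, A, Q, K, and X, compares the analytical sensitivities of Eqs. (1.4) and (1.5) with central finite-difference approximations, then shows that ∂h/∂Q does not change when Q changes whereas ∂h/∂K does change when K changes. It also shows that, at every X, ∂h/∂K = −(Q/K) ∂h/∂Q, so with head observations alone the two sensitivities are exactly proportional and Q and K cannot be estimated separately; this is an instance of the nonuniqueness described in Section 1.2.1.

```python
H0, A = 100.0, 2.0   # hypothetical head at X = 0 [m] and area [m^2]


def head(x, q, k):
    """Eq. (1.2): h = h0 - Q X / (K A)."""
    return H0 - q * x / (k * A)


def dh_dq(x, q, k):
    """Eq. (1.4): contains no Q, so h is linear in Q."""
    return -x / (k * A)


def dh_dk(x, q, k):
    """Eq. (1.5): contains K, so h is nonlinear in K."""
    return q * x / (k ** 2 * A)


def central_difference(f, value, rel_step=1e-6):
    """Central finite-difference approximation of df/dvalue."""
    step = rel_step * abs(value)
    return (f(value + step) - f(value - step)) / (2.0 * step)


x, q, k = 5.0, 0.01, 1.0e-3
fd_q = central_difference(lambda qq: head(x, qq, k), q)
fd_k = central_difference(lambda kk: head(x, q, kk), k)
print("dh/dQ:", fd_q, "vs", dh_dq(x, q, k))
print("dh/dK:", fd_k, "vs", dh_dk(x, q, k))

# Doubling Q leaves dh/dQ unchanged; doubling K multiplies dh/dK by 1/4.
print(dh_dq(x, 2 * q, k) == dh_dq(x, q, k))  # True
print(dh_dk(x, q, 2 * k) / dh_dk(x, q, k))   # 0.25, up to rounding

# The two sensitivities are proportional at every X: dh/dK = -(Q/K) dh/dQ.
print(dh_dk(x, q, k) / dh_dq(x, q, k), -q / k)
```

The finite-difference check is a useful habit in general, because sensitivities computed by a model can be compared with perturbation results in the same way.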
The hydraulic head is considered to be a linear function of X because ∂h/∂X is independent of X. Hydraulic head also is a linear function of Q because ∂h/∂Q is independent of Q. However, hydraulic head is a nonlinear function of K because ∂h/∂K is a function of K. As in this simple example, sensitivities with respect to flows, such as Q, are nearly always functions of aquifer properties; sensitivities with respect to aquifer properties, such as K, are nearly always functions of both the aquifer properties and the flows. If Q and K are both estimated, both of these dependencies make the regression nonlinear.
1.4.2 Precision, Accuracy, Reliability, and Uncertainty
The terms precision, accuracy, reliability, and uncertainty are used in this book and by many others and can cause confusion. Formal definitions of these terms as related to estimated parameters and predictions are described here using an archery analogy and by relating them to the statistical terms bias and variance or standard error of the regression. (The archery analogy was suggested by Richard L. Cooley, retired from the U.S. Geological Survey, oral communication, 1988.)
Precision: In archery, a set of shots is precise if the shots fall within a narrow range, regardless of whether they are near the bull’s eye. A parameter estimate or prediction is more precise if associated coefficients of variation or confidence intervals are smaller. A model fits the observations more closely if the objective function is smaller, and this may indicate a more precise model depending on the measure used (see Chapter 6). More precise estimates or predictions are said to have lower variance. A precise parameter estimate results when the observations provide abundant information about the parameter, given the model construction. A precise prediction results when the parameters important to the prediction are precisely estimated.
Accuracy: In archery, a set of shots is accurate if the shots are distributed evenly about the bull’s eye, though they may fall within a large radius around the bull’s eye. Accurate estimates and predictions are, on average, close to the true, unknown value, but the range of values may be large. An accurate parameter estimate results when (1) the model is accurate and (2) the observations are unbiased. The observations may or may not provide abundant information about the parameter; abundant information would result in a parameter estimate that is both accurate and precise if points 1 and 2 were satisfied. An accurate prediction results when (1) the model is accurate and (2) parameter values important to the prediction are accurate. The observations may or may not provide much information about the parameters important to predictions. Accurate estimates and predictions are sometimes referred to as unbiased; inaccurate estimates and predictions are biased.
Reliability: In archery, a set of shots is reliable if the shots are distributed in a narrow range about the bull’s eye. Reliable parameter estimates and predictions are both accurate and precisely determined. Reliable parameter estimates and predictions result when (1) the model accurately represents processes of importance to the observations and the predictions, and (2) the observations contain much information relevant to the predictions, so that the parameters important to the predictions are reliably estimated. From a probabilistic perspective, reliability is often defined as 1.0 minus the probability of failure.
Uncertainty: The direct inverse of reliability; uncertainty is therefore often defined as the probability of failure.
While these terms have distinct meanings, in practice, “accurate” often is used when “precise” is more applicable.
In this book, we had to choose between always using these terms as defined here, or recognizing that many readers would proceed without having these definitions firmly in mind and would possibly be confused by proper usage. In some circumstances we chose more common usage to create what we thought would be an easier learning experience.
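The archery analogy maps directly onto bias and variance. This sketch treats each set of “shots” as repeated estimates of a quantity whose true value is known, which is possible only in a synthetic test; the shot values and the true value are hypothetical. Bias (mean minus true value) measures inaccuracy, the standard deviation measures imprecision, and the root-mean-square error (RMSE) combines both, so a reliable set needs both to be small.

```python
import statistics

TRUE_VALUE = 10.0  # the bull's eye; known only in a synthetic test

shot_sets = {
    "precise but inaccurate": [12.1, 11.9, 12.0, 12.2, 11.8],
    "accurate but imprecise": [7.0, 13.0, 9.0, 11.5, 9.5],
    "reliable": [10.1, 9.9, 10.0, 10.2, 9.8],
}


def summarize(values, true_value):
    """Return bias, population standard deviation, and RMSE of a set of estimates."""
    bias = statistics.fmean(values) - true_value
    spread = statistics.pstdev(values)
    rmse = (sum((v - true_value) ** 2 for v in values) / len(values)) ** 0.5
    return bias, spread, rmse


for label, values in shot_sets.items():
    bias, spread, rmse = summarize(values, TRUE_VALUE)
    # rmse**2 equals bias**2 + spread**2, up to rounding.
    print(f"{label}: bias={bias:.2f}, std={spread:.2f}, rmse={rmse:.2f}")
```

The identity RMSE² = bias² + variance (with the population variance) is why neither precision nor accuracy alone guarantees reliability.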
1.5 ADVANTAGEOUS EXPERTISE AND SUGGESTED READINGS
Most of this book requires little expertise in statistics and mathematics. Familiarity with basic statistics is useful, including definitions of the following terms: samples and populations; mean, standard deviation, variance, and coefficient of variation of samples and populations; normal probability distribution; log-normal probability distribution; confidence interval; and significance level. Familiarity with simple linear regression also is helpful. Good elementary references for these topics include Benjamin and Cornell (1970), Ott (1993), Davis (2002), and Helsel and Hirsch (2002). Useful advanced texts include Cook and Weisberg (1982), Seber and Wild (1989), and Draper and Smith (1998).
To use the exercises to learn the principles of sensitivity analysis, nonlinear regression, and associated evaluation of the regression, students will benefit from understanding groundwater flow problems well enough to follow the discussions of the physical problem considered. To perform the optional simulations of the groundwater model used in many of the exercises that accompany the methods, students will benefit from familiarity with the computer program MODFLOW-2000 (McDonald and Harbaugh, 1988; Harbaugh et al., 2000; Hill et al., 2000; Anderman and Hill, 2001). When this book is used to teach a semester- or quarter-long academic course, it may be desirable to start with two to four weeks of instruction on statistics and linear regression. Recommended topics include graphical data analysis, hypothesis testing, simple linear regression, and multiple linear regression. If, for example, Helsel and Hirsch (2002) is used, the readings and exercises in Table 1.1 address the suggested material. If Davis (2002) is used to learn basic statistics, the topics in Table 1.2 are suggested.
TABLE 1.1 Suggested Reading Assignments and Exercises in Helsel and Hirsch (2002)
Chapter | Topic | Reading Assignment | Exercise
2 | Graphical data analysis | Introduction; Section 2.1.5 | None
3 | Uncertainty | Sections 3.1, 3.2, 3.4 | 3.1 (parametric interval)
4 | Hypothesis testing | Introduction; Sections 4.1, 4.2, and 4.4 | 4.1 (for untransformed data)
5 | t-Tests | Introduction; Section 5.2 | 5.2
8 | Correlation coefficients | Introduction; Sections 8.1 and 8.4 | None
9 | Simple linear regression | All except Section 9.6 | 9.1. Use data subsets to show the effect of small data sets.
11 | Multiple regression | All except Section 11.8 | 11.1

TABLE 1.2 Suggested Reading Assignments and Exercises in Davis (2002)
Chapter | Topic | Reading Assignment
2 | Summary statistics | pp. 34–39
2 | Joint variation of two variables | pp. 40–46
2 | Comparing normal populations | pp. 55–58
2 | Testing the mean, P-values, significance | pp. 60–66
2 | Confidence limits, t-distribution | pp. 66–75
4 | Runs tests | pp. 185–191
4 | Simple linear regression | pp. 191–204, 227–228
6 | Multiple regression | pp. 462–470
1.6 OVERVIEW OF CHAPTERS 2 THROUGH 15
The primary topics of this book are (1) methods for sensitivity analysis, data assessment, model calibration, and uncertainty analysis developed on the basis of inverse modeling theory; and (2) guidelines for the effective application of these methods. The methods are presented in Chapters 3 to 9 and the guidelines are presented in Chapters 10 to 14. Field applications and tests of the methods and guidelines are presented in Chapter 15. Chapter 2 presents an overview of the exercises and the computer programs used in this work. Three appendixes go into greater depth concerning several aspects of the nonlinear regression method used and one appendix presents selected statistical tables. Chapters 2 through 15 are described in more detail in the following paragraphs. Chapter 2 presents an overview of (1) three computer codes for inverse modeling that are used throughout the book, (2) a hypothetical groundwater management problem to which the methods are applied, and (3) exercises that use this groundwater management problem to clearly demonstrate the methods. Chapters 3 to 5 present methods for measuring model fit, initial model sensitivity analysis, and parameter estimation. Chapter 3 discusses how observations of the simulated system are compared to equivalent simulated values using objective functions. Terms of the objective functions are defined, and least-squares objective-function surfaces are introduced. Chapter 4 discusses sensitivity analysis methods for evaluating the information that the observations provide toward estimating a set of parameters and using such an analysis to design parameterizations and decide what parameters to estimate. Several statistics are presented that are independent of model fit and thus can be applied prior to having achieved a successful inversion. These are called fit-independent statistics.
Chapter 5 presents the modified Gauss–Newton gradient method for estimating parameter values that produce the best fit to the observations by minimizing the least-squares objective function. Chapters 6 to 8 present methods for evaluating model fit, parameter estimates, data needs, and prediction sensitivity and uncertainty. Most of these methods involve calculating and evaluating diagnostic and inferential statistics and conducting graphical analyses. Chapter 6 discusses methods for evaluating model fit, including using residuals (differences between observed and simulated values) and weighted residuals to calculate statistical measures of fit, and graphs that can be used to help detect model error and assess normality of weighted residuals. Chapter 7 presents methods for evaluating estimated parameters and their uncertainty, including confidence intervals and measures of the support that the observations provide for the estimated parameter values. Methods for assessing model linearity are also discussed. Chapter 8 discusses evaluation of model predictions and their sensitivity and uncertainty, and methods for identifying data that would improve model predictions. Topics include measures for assessing the importance of observations and prior information on parameters to predictions and to confidence intervals on predictions. Monte Carlo methods of evaluating uncertainty are discussed briefly. Chapter 9 presents methods for calibrating transient and transport models, and for recalibrating and reevaluating existing models when new data become available.
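The core of a Gauss–Newton iteration can be sketched for a one-parameter version of the Darcy problem in Eq. (1.2): estimating K from heads with Q, A, and h₀ treated as known. This is a plain, unweighted, undamped Gauss–Newton loop, not the modified method presented in Chapter 5; the heads, starting value, and convergence tolerance are hypothetical. Each iteration linearizes the simulated heads about the current K using the sensitivity of Eq. (1.5) and solves for the change in K that minimizes the linearized sum of squared residuals.

```python
H0, Q, A = 100.0, 0.01, 2.0                     # known h0 [m], flow [m^3/s], area [m^2] (hypothetical)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]                  # observation locations [m]
observed = [95.02, 89.97, 85.01, 79.98, 75.03]  # hypothetical observed heads [m]


def simulate(k):
    """Eq. (1.2) evaluated at each observation location."""
    return [H0 - Q * x / (k * A) for x in xs]


def sensitivities(k):
    """Eq. (1.5): dh/dK = Q X / (K^2 A) at each observation location."""
    return [Q * x / (k ** 2 * A) for x in xs]


def gauss_newton(k, max_iter=20, rel_tol=1e-8):
    """Unweighted, undamped Gauss-Newton iterations for the single parameter K."""
    for iteration in range(1, max_iter + 1):
        residuals = [y - h for y, h in zip(observed, simulate(k))]
        jac = sensitivities(k)
        # Normal equation for one parameter: (J^T J) dk = J^T r.
        dk = sum(j * r for j, r in zip(jac, residuals)) / sum(j * j for j in jac)
        k += dk
        if abs(dk) <= rel_tol * abs(k):
            return k, iteration
    raise RuntimeError("Gauss-Newton did not converge")


k_hat, n_iter = gauss_newton(5.0e-4)
sse = sum((y - h) ** 2 for y, h in zip(observed, simulate(k_hat)))
print(f"K = {k_hat:.4e} m/s after {n_iter} iterations, SSE = {sse:.4f}")
```

Starting below the solution, the iterations approach it quickly. Starting above it (for example, at 2.0e-3 with these data), the first full step drives K to nearly zero, which illustrates why practical nonlinear regression methods modify or damp the basic Gauss–Newton step.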
TABLE 1.3 Guidelines for Effective Model Calibration

Model Development (Chapter 11)
1. Apply the principle of parsimony (start very simple; build complexity slowly)
2. Use a broad range of system information (soft data) to constrain the problem
3. Maintain a well-posed, comprehensive regression problem
4. Include many kinds of observations (hard data) in the regression
5. Use prior information carefully
6. Assign weights that reflect errors
7. Encourage convergence by making the model more accurate and by evaluating the observations
8. Consider alternative models

Model Testing (Chapter 12)
9. Evaluate model fit
10. Evaluate optimized parameter values

Potential New Data (Chapter 13)
11. Identify new data to improve simulated processes, features, and properties
12. Identify new data to improve predictions

Prediction Uncertainty (Chapter 14)
13. Evaluate prediction uncertainty and accuracy using deterministic methods
14. Quantify prediction uncertainty using statistical methods
Exercises at the ends of Chapters 3 to 9 demonstrate the methods. Most of the exercises involve the simple hypothetical groundwater management problem mentioned in the beginning of this chapter. Chapters 10 to 14 present fourteen guidelines that address using the methods presented in Chapters 3 to 9 to analyze, simulate, calibrate, and evaluate models of complex systems. The guidelines are grouped into four topics: (1) model development, (2) model testing, (3) potential new data, and (4) prediction uncertainty. Chapter 10 introduces the guidelines and Chapters 11 to 14 each focus on the guidelines that address one of the four topics. Table 1.3 lists the guidelines to introduce the reader to the basic ideas they promote. For example, a fundamental aspect of the approach is to start simple and to build complexity slowly. Chapter 15 addresses the use and testing of the methods and guidelines. First, issues of computer execution time, which are nearly always of concern when calibrating models, are discussed. Then, selected publications describing tests of the guidelines using synthetic test cases and use of the guidelines in field applications are listed. The remainder of Chapter 15 discusses a few aspects of two field cases to illustrate some of the methods and guidelines presented in the book.
2 COMPUTER SOFTWARE AND GROUNDWATER MANAGEMENT PROBLEM USED IN THE EXERCISES
This chapter briefly describes the computer programs and the groundwater management problem on which exercises presented in Chapters 2 through 9 are based. The exercises can be completed using results provided in figures and tables of the book, or hands-on computer exercises can be pursued. The groundwater system can be simulated using the Ground-Water Flow Process of MODFLOW-2000 or MODFLOW-2005. Sensitivity analysis, parameter estimation, data needs assessment, predictions, and uncertainty evaluation can be performed using the Observation, Sensitivity, and Parameter-Estimation Processes of MODFLOW-2000 or the capabilities of UCODE_2005 or PEST. Explicit instructions for MODFLOW-2000 and UCODE_2005, and possibly for other codes and graphical interfaces, are provided on the web site listed in Chapter 1, Section 1.1 of this book. Performing the exercises using the computer programs, or reviewing the instructions for doing so, is expected to facilitate use of the methods in the simulation of other systems.
2.1 COMPUTER PROGRAMS MODFLOW-2000, UCODE_2005, AND PEST

The computer software used for the exercises was listed in Chapter 1, Section 1.1, and access through web sites is described there. The discussion here refers to MODFLOW-2000 Version 1.15, UCODE_2005 Version 1.0, and PEST Version 9.0. Later versions of these codes may have capabilities not discussed here.

Effective Groundwater Model Calibration: With Analysis of Data, Sensitivities, Predictions, and Uncertainty. By Mary C. Hill and Claire R. Tiedeman. Published 2007 by John Wiley & Sons, Inc.
MODFLOW-2000 is applicable only to solution of the transient, three-dimensional groundwater flow equation represented using a control-volume finite-difference numerical method. In MODFLOW-2000, groundwater flow is simulated using the Ground-Water Flow Process, and inverse modeling calculations are performed using the Observation, Sensitivity, and Parameter-Estimation Processes. UCODE_2005 and PEST are universal inverse codes with broad applicability. They can be used with any simulation model that has ASCII input and output files and can be executed from a command prompt. Both programs use very similar template and instruction files to interact with the simulation model. MODFLOW-2000, UCODE_2005, and PEST have many capabilities in common and have a few key differences. Table 2.1 lists and compares selected capabilities of each program for defining observations and parameters and lists some graphical user interfaces that support the programs. All of these codes perform inverse modeling, posed as a parameter-estimation problem, by calculating parameter values that minimize a weighted least-squares objective function using nonlinear regression. The methods shared by MODFLOW-2000 and UCODE_2005 are described in Chapters 3 and 5. In addition, UCODE_2005 has a trust region option that is mentioned briefly in Chapter 5. The trust region approach can reduce the number of iterations required for difficult problems by as much as 50 percent (Mehl and Hill, 2002). For the problems considered in the exercises, PEST differs from the methods described in this book mostly in its definition of the Marquardt parameter and its use of a line search capability that improves regression performance in some circumstances. See Chapter 5, Section 5.1.1 for comments about the Marquardt parameter. The method for calculating sensitivities in MODFLOW-2000 differs substantially from that in UCODE_2005 and PEST.
In MODFLOW-2000, the Sensitivity Process calculates sensitivities using the sensitivity-equation method, which is the most accurate method available. Implementing the sensitivity-equation method requires extensive custom programming, which can easily double the size of a code. Any subsequent change to the capabilities of the forward simulation generally requires additional coding to accommodate sensitivity-equation sensitivities. The required substantial investment means that sensitivity-equation sensitivities probably will be available for only a very few codes and will rarely be available for all possible parameters, observations, or simulated dynamics. Codes that calculate sensitivity-equation sensitivities can be produced using a program called ADIFOR (http://www.unix.mcs.anl.gov/autodiff/ADIFOR/). The resulting code tends to be difficult to develop further because the alterations created by the program are not modularly constructed and not clearly coded, but this option can be very useful. UCODE_2005 and PEST can use sensitivities generated by programs such as MODFLOW-2000, or sensitivities can be calculated using perturbation methods. Perturbation sensitivities tend to be less accurate than sensitivity-equation sensitivities but require no custom programming, which is what allows these codes to be used with any process model. The less accurate sensitivities can affect performance in two main ways: (1) convergence of the nonlinear regression can be less stable for poorly conditioned problems, as demonstrated by Mehl and Hill (2002), so that UCODE_2005 (without the trust region option) and PEST may not converge when MODFLOW-2000 does converge; and (2) parameter correlation coefficients calculated from the variance–covariance matrix on the parameter estimates can be inaccurate enough to be misleading (Hill and Østerby, 2003). Consequently, parameter correlation coefficients calculated by UCODE_2005 and PEST cannot be used as reliably as those calculated by MODFLOW-2000 to determine the existence of extreme parameter correlation. This issue is discussed in more detail in Chapter 4, Section 4.3; Exercise 4.1 clearly demonstrates this problem.

TABLE 2.1 Capabilities of MODFLOW-2000 Version 1.15, UCODE_2005 Version 1.0, and PEST Version 9.0

Capability | MODFLOW-2000 (a) | UCODE_2005 and PEST (a)
OBSERVATION DEFINITION
Heads and temporal changes in head, not necessarily at cell centers | Yes | Yes
Flows at head-dependent boundaries, not necessarily ending at a cell boundary | Yes | Yes
Flows at constant-head boundaries | Yes | Yes
Heads at constant-head boundaries | Yes | Yes
Advective transport | Yes, with the ADV Package | Yes
Any other observation | No | Yes
PARAMETER DEFINITION
Zone arrays | Yes | Difficult (b)
Multiplication arrays | Yes | Difficult (b)
Pilot points interpolation method | Yes, using multiplication arrays | UCODE_2005, difficult (b); PEST, efficient through regularization capability
Additive parameters (c) | Easy | Difficult (b)
Association of a parameter with more than one model characteristic (e.g., layer and riverbed hydraulic conductivity) | No | Yes
REGRESSION CAPABILITIES
Trust region | No | UCODE_2005 only (d)
Line search | No | PEST only (d)

(a) MODFLOW-2000 (Harbaugh et al., 2000; Hill et al., 2000), UCODE_2005 (Poeter et al., 2005), and PEST (Doherty, 2005) are public domain, open source programs. For websites, see Section 1.1 or the reference list.
(b) Difficult if achieved using the capabilities of the listed code(s). If the process model, such as the Ground-Water Flow Process of MODFLOW-2000, performs these functions easily, these codes can take advantage of that.
(c) The additive parameter capability of MODFLOW-2000 is very general, allowing most interpolation methods to be applied to any characteristic that can be represented using parameters. This includes variations in streambed characteristics along the length of a river and hydraulic-conductivity variations caused by depositional processes.
(d) The trust region approach in UCODE_2005 can reduce iterations for difficult problems by as much as 50 percent (Mehl and Hill, 2002). Performance of the line-search method has not been documented.

MODFLOW-2000 and UCODE_2005 or PEST can be used together to simplify the processes that UCODE_2005 and PEST use to define parameters and simulated equivalents of observations. This is advantageous when the MODFLOW-2000 Ground-Water Flow and Observation Process capabilities apply, but some other aspect of the problem, such as the estimation of a parameter of interest, is not supported by MODFLOW-2000. In this situation, the MODFLOW-2000 Ground-Water Flow Process capabilities are used as the process model for UCODE_2005 or PEST, and the MODFLOW-2000 Parameter and Observation capabilities are used to simplify the parameter substitution and extraction of simulated values in UCODE_2005 or PEST. As mentioned earlier, both UCODE_2005 and PEST can use sensitivities calculated by the process model; they can calculate other needed sensitivities using the perturbation method.
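To illustrate why perturbation sensitivities tend to be less accurate than sensitivity-equation sensitivities, the sketch below compares two common perturbation schemes (forward and central differences) with an exact derivative for a hypothetical one-parameter model. The function `simulated_head`, its parameter, and the step size are invented for illustration, and the analytic derivative merely stands in for a sensitivity-equation result; this is not how MODFLOW-2000, UCODE_2005, or PEST are implemented internally.

```python
import math


def simulated_head(log_k: float) -> float:
    """Hypothetical stand-in for a process model: a simulated head that is
    a nonlinear function of one parameter (log10 of hydraulic conductivity)."""
    k = 10.0 ** log_k
    return 100.0 + 50.0 / (1.0 + k)


def exact_sensitivity(log_k: float) -> float:
    """Analytic d(head)/d(log_k); plays the role of a sensitivity-equation result."""
    k = 10.0 ** log_k
    return -50.0 * k * math.log(10.0) / (1.0 + k) ** 2


def perturbation_step(b: float, rel_step: float) -> float:
    """Perturbation proportional to the parameter value (absolute if b == 0)."""
    return rel_step * abs(b) if b != 0.0 else rel_step


def forward_difference(f, b: float, rel_step: float = 0.01) -> float:
    """One extra model run per parameter; truncation error is O(step)."""
    h = perturbation_step(b, rel_step)
    return (f(b + h) - f(b)) / h


def central_difference(f, b: float, rel_step: float = 0.01) -> float:
    """Two extra model runs per parameter; truncation error is O(step**2)."""
    h = perturbation_step(b, rel_step)
    return (f(b + h) - f(b - h)) / (2.0 * h)


b = -0.5  # hypothetical parameter value
exact = exact_sensitivity(b)
fwd = forward_difference(simulated_head, b)
ctr = central_difference(simulated_head, b)
print(f"exact={exact:.6f} forward={fwd:.6f} central={ctr:.6f}")
print(f"forward error={abs(fwd - exact):.2e} central error={abs(ctr - exact):.2e}")
```

Central differences usually reduce the truncation error, but at the cost of one more process-model run per parameter, which matters when each run is expensive.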
2.2 GROUNDWATER MANAGEMENT PROBLEM USED FOR THE EXERCISES

The exercises in this book focus on a groundwater management problem within the hypothetical geographic area depicted in Figure 2.1a. The groundwater system is of interest because pumping wells are being completed in aquifers 1 and 2 to supply local domestic and industrial water needs. In addition, a proposal has been submitted to local authorities for construction of a landfill (Figure 2.1a). The developers claim that the landfill is outside the capture zone of the proposed wells, and that any effluent from the landfill will reach the river sufficiently diluted to meet regulatory standards. Local authorities would like to investigate this claim. Data on the flow system without pumpage are available for model development and are from a period of time that is consistent with long-term average conditions. Seasonal variations appear to be small. Upon completion of the water-supply wells, transient and steady-state data can be collected under pumping conditions. A key issue is whether the decision on the proposed landfill should be delayed until after the transient data are collected. The developers have requested a quick decision. The flow system is complicated enough to require a numerical model for its simulation, but lacks some complexities typical of many field problems. Most notably, the subsurface material lacks local heterogeneity. The upper aquifer and confining bed are homogeneous, and the lower aquifer has a mild degree of regional heterogeneity. This is advantageous because the system is simple enough to clearly demonstrate the methods in the book. Also, using a synthetic test case means that results can be compared to "truth." Extension of the methods to realistic problems is discussed throughout the book, and especially in the guidelines and examples presented in Chapters 10 through 15.

FIGURE 2.1 System and model used in exercises: (a) flow system; (b) finite-difference grid, boundary conditions, and locations of observation wells, proposed pumping wells and the landfill; (c) flows through a cross section; and (d) hydraulic heads. Parts (c) and (d) are produced using the true parameter values and no pumping.

2.2.1 Purpose and Strategy
A numerical model is needed to address this groundwater management problem because there are multiple aquifer layers and spatial variations in hydraulic conductivity and boundary conditions that are not conducive to analytic solution. To coordinate with data availability, a steady-state model without pumpage is developed first and used to produce a preliminary evaluation of effluent transport from the landfill when pumpage is applied. The effluent transport is simulated using particle tracking methods. The concern here is whether the effluent goes to the well; other issues, such as first arrival time, are not of concern. If the particle goes to the well, there is no reason to use more computationally demanding transport simulations. The developers of the landfill raise important questions about the steady-state model, so additional data are needed. We use the steady-state model to evaluate potential new data that can be collected once the supply wells are completed, and we use the analysis in combination with field considerations to design a monitoring network. The data are collected, a transient model is produced that includes the pumping wells, and the model is recalibrated using the data from both steady-state and stressed conditions. Finally, the effluent transport issue is reevaluated using the recalibrated model.

2.2.2 Flow System Characteristics
The groundwater flow system used for most of the exercises is shown in Figure 2.1a. The flow system consists of two confined aquifers separated by a confining unit. Inflow occurs as areal recharge and as flow across the boundary with the adjoining hillside. At steady state without pumping, outflow occurs only as discharge to the river. In transient simulations and for steady-state simulations with pumpage, pumping discharge is simulated from both model layers at the location shown in Figure 2.1a. MODFLOW-2000 is used to simulate groundwater flow, and its ADV Package (Anderman and Hill, 2001) is used to simulate advective transport of effluent from the landfill. The domain is divided laterally into 18 rows and 18 columns (Figure 2.1b). Each confined aquifer is represented by one model layer. In layer 1, hydraulic conductivity is uniform. In layer 2, hydraulic conductivity increases linearly in steps with distance from the river, with each step consisting of a pair of model columns. The confining bed is not represented as a separate model layer, but as a vertical hydraulic conductivity that controls flow between the two model layers. Boundary conditions include two zones of areal recharge applied to model layer 1, one zone coincident with the 9 columns closest to the river, and the other coincident with the 9 columns closest to the hillside (Figure 2.1a). Inflow from the hillside to layers 1 and 2 and outflow from layer 1 to the river are simulated as head-dependent boundaries (Figure 2.1b). No-flow boundaries are specified on the bottom of the model domain and on all model sides except the side adjacent to the hillside. True steady-state simulated volumetric flows in the system without pumping are illustrated in Figure 2.1c, and simulated hydraulic heads are shown in Figure 2.1d. Without pumping, the flow system is actually two-dimensional because all stresses, boundary conditions, and subsurface properties and, therefore, all hydraulic heads and flows are the same for any cross section perpendicular to the river.
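The layer-2 hydraulic-conductivity pattern and the two recharge zones described above can be expressed as simple 18 × 18 arrays. The sketch below is illustrative only: the conductivity values are hypothetical (the true values are not given here), the helper names are invented, and it assumes that column index 0 is the column adjacent to the river. Every row is identical, which mirrors the statement that, without pumping, all cross sections perpendicular to the river behave the same.

```python
from __future__ import annotations

NROW, NCOL = 18, 18  # grid dimensions given in the text


def layer2_conductivity(k_river: float, k_increment: float) -> list[list[float]]:
    """Hypothetical helper: layer-2 hydraulic conductivity that increases
    linearly in steps with distance from the river, each step spanning a
    pair of columns. Assumes column 0 is adjacent to the river."""
    row = [k_river + k_increment * (col // 2) for col in range(NCOL)]
    return [list(row) for _ in range(NROW)]


def recharge_zones() -> list[list[int]]:
    """Hypothetical helper: zone 1 = the 9 columns closest to the river,
    zone 2 = the 9 columns closest to the hillside (applied to layer 1)."""
    row = [1 if col < NCOL // 2 else 2 for col in range(NCOL)]
    return [list(row) for _ in range(NROW)]


# Hypothetical values in units of length/time; not the book's true values
k2 = layer2_conductivity(k_river=10.0, k_increment=5.0)
zones = recharge_zones()
print(k2[0])
print(zones[0])
```

With 18 columns and two columns per step, layer 2 has nine distinct conductivity values from the river to the hillside.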
2.3 EXERCISES

Exercises are presented at the end of Chapters 2–9 and cover all of the methods and ideas included in those chapters. Most of the exercises involve the simple groundwater management problem described in Section 2.2. Through the development, calibration, and analysis of the steady-state and transient models that address this problem, the following steps are accomplished:

Steady-State Model
- Simulate steady-state hydraulic heads (Exercises 2.1 and 2.2)
- Define steady-state parameters and observations (Exercises 3.1 and 3.2)
- Evaluate the initial steady-state model (Exercise 3.3)
- Perform sensitivity analysis (Exercise 4.1)
- Calibrate the steady-state model (Exercise 5.2)
- Evaluate model fit to observations and prior information (Exercises 6.1 and 6.2)
- Evaluate estimated parameter values (Exercises 7.1–7.3)
- Make predictions using the calibrated steady-state model, perform sensitivity analysis, and evaluate potential new data (Exercise 8.1)
- Evaluate prediction uncertainty (Exercise 8.2)
Transient Model
- Simulate hydraulic heads in the transient model (Exercises 9.1 and 9.2)
- Define transient parameters and observations (Exercises 9.3 and 9.4)
- Evaluate the initial transient model (Exercise 9.5)
- Perform sensitivity analysis (Exercise 9.6)
- Recalibrate the model using original steady-state observations and new transient observations (Exercise 9.7)
- Evaluate the calibrated transient model (Exercises 9.8–9.11)
- Make predictions using the recalibrated model (Exercise 9.12)

In groundwater model development, defining parameters often is difficult, as discussed in Chapter 1, Section 1.2.1. The exercises do not address this phase of model construction. Rather, the hypothetical flow system is designed so that its hydrogeologic and hydrologic characteristics can be accurately represented using only a few model parameters. Accurate representation of these aspects of the system allows the methods presented in the book to be illustrated more clearly. A more complicated problem might lead students to attribute difficulties to inaccurate parameterization when other issues are actually involved.

The exercises contain an explanation, followed by questions to be answered or issues to be explained. These questions and issues are listed under the heading Problem and usually involve examination and evaluation of results. The results are obtained as follows. The Ground-Water Flow Process capabilities of MODFLOW-2000 are used to simulate groundwater flow. The sensitivity and inverse modeling exercises can be performed using UCODE_2005, the Sensitivity and Parameter-Estimation Processes of MODFLOW-2000, or, in most situations, PEST. The results of these simulations are contained in figures and tables included in this book. Students can complete all exercises in this book and thoroughly learn all methods presented without performing model simulations. To perform the model simulations, download the instructions, data, and codes as described in Chapter 1, Section 1.1.
Exercises marked Optional can be skipped without disturbing continuity. The exercises in Chapter 9 are all marked optional because they provide additional experience with methods already used in previous exercises. It can be advantageous to replace the Chapter 9 exercises with application of the methods to models related to other student investigations, possibly with class presentation of results.

Exercise 2.1: Simulate Steady-State Heads and Perform Preparatory Steps

In these exercises, MODFLOW-2000 is used to simulate steady-state hydraulic heads for the flow system described in Section 2.2. Initial and final computer files, and instructions for modifying files, creating new files, and performing the simulations are available from the web site for this book; see Chapter 1, Section 1.1 for information about obtaining these files and instructions. Students who are not performing the simulations may skip these exercises.
3 COMPARING OBSERVED AND SIMULATED VALUES USING OBJECTIVE FUNCTIONS
The match of observed to simulated values is one of the most important indicators of how well a model represents an actual system. Objective functions measure this fit. Model calibration efforts largely involve attempting to construct a model that produces a good fit. Here, good fit means the objective function is as small as possible. Methods such as regression can determine parameter values that are optimal, meaning that they produce the best fit given the constructed model. The resulting parameter values are said to be optimal, optimized, or estimated by the regression. In later chapters of this book, we will see that a close fit is not the only goal of model calibration. However, methods that optimize parameter values are an important component of model calibration and can be used advantageously. This chapter presents the objective functions used in this book to quantify the match between observed and simulated values and discusses alternative objective functions. It also lists the conditions needed for model results to be accurate when produced using regression methods, discusses quantities used in the objective functions, and introduces objective-function surfaces.
3.1 WEIGHTED LEAST-SQUARES OBJECTIVE FUNCTION

The weighted least-squares objective function is first presented with a commonly used diagonal weight matrix. This allows use of summations, which are easier for many readers to understand than matrix and vector notation. The objective function is then presented with a full weight matrix. Often the term weighted regression is applied to regression with a diagonal weight matrix and generalized regression is applied to regression with a full weight matrix (Draper and Smith, 1998, p. 223). In this book we refer to both as weighted regression. Regression without weighting is called ordinary regression.
3.1.1 With a Diagonal Weight Matrix

The objective function is first defined in the context of a groundwater model. Using hydraulic heads and flows as the observations, the weighted least-squares objective function, $S(\mathbf{b})$, can be expressed as

$$S(\mathbf{b}) = \sum_{i=1}^{NH} \omega_{h_i}\,[y_{h_i} - y'_{h_i}(\mathbf{b})]^2 + \sum_{j=1}^{NQ} \omega_{q_j}\,[y_{q_j} - y'_{q_j}(\mathbf{b})]^2 + \sum_{k=1}^{NPR} \omega_{p_k}\,[y_{p_k} - y'_{p_k}(\mathbf{b})]^2 \tag{3.1a}$$
where
$\mathbf{b}$ = a vector (which can be thought of as a list) containing values of each of the NP parameters being estimated;
NP = the number of estimated parameters;
NH = the number of hydraulic-head observations;
NQ = the number of flow observations;
NPR = the number of prior information values;
$y_{h_i}$ = the ith observed hydraulic head being matched by the regression;
$y'_{h_i}(\mathbf{b})$ = the simulated hydraulic head that corresponds to the ith observed hydraulic head (a function of $\mathbf{b}$);
$y_{q_j}$ = the jth observed flow being matched by the regression;
$y'_{q_j}(\mathbf{b})$ = the simulated flow that corresponds to the jth observed flow (a function of $\mathbf{b}$);
$y_{p_k}$ = the kth prior estimate included in the regression;
$y'_{p_k}(\mathbf{b})$ = the kth simulated value (restricted to linear functions of $\mathbf{b}$ in UCODE_2005 and MODFLOW-2000);
$\omega_{h_i}$ = the weight for the ith head observation;
$\omega_{q_j}$ = the weight for the jth flow observation;
$\omega_{p_k}$ = the weight for the kth prior estimate.

For NH and NQ, multiple observations at the same location or reach are each counted. Using $y$ to indicate a generic contribution of any kind and $\omega$ to indicate its weight, the objective function is more commonly expressed as

$$S(\mathbf{b}) = \sum_{i=1}^{ND+NPR} \omega_i\,[y_i - y'_i(\mathbf{b})]^2 = \sum_{i=1}^{ND+NPR} \omega_i\, e_i^2 \tag{3.1b}$$

where
ND = the number of observations;
$y_i$ = the ith observation or prior information value being matched by the regression;
$y'_i(\mathbf{b})$ = the simulated equivalent, defined as the simulated value (a function of $\mathbf{b}$) that corresponds to $y_i$;
$\omega_i$ = the weight for the ith contribution to the objective function;
$e_i$ = the ith residual, equal to $[y_i - y'_i(\mathbf{b})]$.

Some of these terms are discussed further in Section 3.4.
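As a minimal sketch, the code below evaluates Eq. (3.1a) group by group (heads, flows, prior information) and Eq. (3.1b) as a single sum, showing that the two forms give the same value. All observed, simulated, and standard-deviation values are hypothetical, and the weights are assumed to be inverse error variances, consistent with the weighting requirement discussed in Section 3.3. Because each weight carries the inverse of its group's squared units, every weighted term is dimensionless, so head, flow, and prior-information terms can be summed.

```python
import math


def group_contribution(observed, simulated, weights):
    """Sum of w_i * (y_i - y'_i)**2 over one group of Eq. (3.1a)."""
    if not (len(observed) == len(simulated) == len(weights)):
        raise ValueError("observed, simulated, and weights must have equal length")
    return sum(w * (y - ys) ** 2 for y, ys, w in zip(observed, simulated, weights))


# Hypothetical values; weights are 1 / (standard deviation)**2
heads_obs, heads_sim = [101.2, 98.7, 95.3], [100.9, 99.1, 95.0]
heads_w = [1.0 / 0.5**2] * 3          # head std. dev. 0.5 (length units)
flows_obs, flows_sim = [-1.10], [-1.00]
flows_w = [1.0 / 0.1**2]              # flow std. dev. 0.1 (flow units)
prior_obs, prior_sim = [2.0], [1.8]
prior_w = [1.0 / 0.5**2]              # prior std. dev. 0.5 (parameter units)

# Eq. (3.1a): separate sums for heads, flows, and prior information
s_a = (group_contribution(heads_obs, heads_sim, heads_w)
       + group_contribution(flows_obs, flows_sim, flows_w)
       + group_contribution(prior_obs, prior_sim, prior_w))

# Eq. (3.1b): one sum over all ND + NPR contributions, using residuals e_i
y = heads_obs + flows_obs + prior_obs
y_sim = heads_sim + flows_sim + prior_sim
w = heads_w + flows_w + prior_w
e = [yi - ysi for yi, ysi in zip(y, y_sim)]
s_b = sum(wi * ei**2 for wi, ei in zip(w, e))

print(round(s_a, 4), round(s_b, 4), math.isclose(s_a, s_b))  # 2.52 2.52 True
```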
3.1.2 With a Full Weight Matrix

In the simple diagonal weight matrix assumed in Eq. (3.1b), the diagonal entries are nonzero, and the off-diagonal terms equal zero. Each entry on the diagonal is the weight for a single observation or piece of prior information. More generally, the weighting requires a full weight matrix, in which one or more of the off-diagonal matrix entries are nonzero. These off-diagonal entries are needed to represent correlated observation errors. For a full weight matrix, the least-squares objective function of Eq. (3.1b) is written using vector and matrix notation as

$$S(\mathbf{b}) = [\mathbf{y} - \mathbf{y}'(\mathbf{b})]^T \boldsymbol{\omega}\, [\mathbf{y} - \mathbf{y}'(\mathbf{b})] = \mathbf{e}^T \boldsymbol{\omega}\, \mathbf{e} \tag{3.2}$$
where $\boldsymbol{\omega}$ is the weight matrix, $\mathbf{y}$ is a vector of observations and prior information, $\mathbf{y}'(\mathbf{b})$ is a vector of simulated values, and $\mathbf{e}$ is a vector of residuals. The dimensions of the matrix and vectors are as follows: $\boldsymbol{\omega}$ is a square matrix dimensioned (ND + NPR) by (ND + NPR); all three vectors have (ND + NPR) elements. ND and NPR were defined for Eq. (3.1). In Eq. (3.2), data for both observations and prior information are included in the weight matrix and in the vectors $\mathbf{y}$, $\mathbf{y}'(\mathbf{b})$, and $\mathbf{e}$. The structures of the weight matrix and the vectors are displayed in Appendix B, Eqs. (B.1) and (B.2). MODFLOW-2000 supports full weight matrices for all types of observations (except hydraulic head) and for prior information. MODFLOW-2000 can accommodate some common temporal correlations in the errors of hydraulic-head observations by differencing, as discussed in Section 9.1.2 and in Hill et al. (2000, pp. 33–34). UCODE_2005 supports full weight matrices for all types of observations and for prior information and can accommodate any type of differencing. Full weight matrices are discussed further in Guideline 6, Section G6.2, in Chapter 11.
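A minimal sketch of Eq. (3.2), assuming a hypothetical error variance-covariance matrix in which the errors of the first two observations are correlated. The full weight matrix is taken as the inverse of that matrix; dropping the off-diagonal covariances gives a diagonal weight matrix and the summation form of Eq. (3.1b). All numbers are invented for illustration.

```python
import numpy as np


def objective_full(y: np.ndarray, y_sim: np.ndarray, weight: np.ndarray) -> float:
    """Eq. (3.2): S(b) = [y - y'(b)]^T w [y - y'(b)] = e^T w e."""
    e = y - y_sim
    return float(e @ weight @ e)


# Hypothetical observations and simulated equivalents
y = np.array([10.0, 10.4, 11.1])
y_sim = np.array([10.2, 10.5, 10.8])

# Hypothetical error variance-covariance matrix: errors of the first two
# observations are correlated (covariance 0.02); the third is independent.
cov = np.array([[0.04, 0.02, 0.00],
                [0.02, 0.04, 0.00],
                [0.00, 0.00, 0.09]])

weight_full = np.linalg.inv(cov)              # full weight matrix
weight_diag = np.diag(1.0 / np.diag(cov))     # off-diagonal terms ignored

s_full = objective_full(y, y_sim, weight_full)
s_diag = objective_full(y, y_sim, weight_diag)

# With a diagonal weight matrix, Eq. (3.2) reduces to the summation of Eq. (3.1b)
e = y - y_sim
s_sum = float(np.sum(np.diag(weight_diag) * e**2))

print(f"full: {s_full:.3f}  diagonal: {s_diag:.3f}  summation: {s_sum:.3f}")
```

Here the full weight matrix gives 2.0 and the diagonal approximation gives 2.25; ignoring the covariance changes the objective function and, in a regression, could change the parameter estimates.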
3.2 ALTERNATIVE OBJECTIVE FUNCTIONS

Alternatives to the least-squares objective function described in this work are the maximum-likelihood objective function, the L1 norm, and multiobjective optimization.

3.2.1 Maximum-Likelihood Objective Function
The maximum-likelihood objective function reduces to the least-squares objective function in most applications (as shown in Appendix A). The maximum-likelihood objective function is presented here, and its value is calculated and printed by UCODE_2005 and MODFLOW-2000, because it can be used for model discrimination by itself or in the calculation of other statistics (e.g., see Carrera and Neuman, 1986; Loaiciga and Marino, 1986; Burnham and Anderson, 2002). The maximum-likelihood objective function is calculated as

$$S'(\mathbf{b}) = (ND + NPR)\ln 2\pi - \ln|\boldsymbol{\omega}| + \mathbf{e}^T \boldsymbol{\omega}\, \mathbf{e} \tag{3.3}$$
where $|\boldsymbol{\omega}|$ is the determinant of the weight matrix, and, without loss of generality, it is assumed that the weight matrix is defined such that the common error variance $\sigma^2$ described in Appendixes A and C equals 1.0. Unlike the least-squares objective function, Eq. (3.3) can be negative. Appendix A presents the derivation of Eq. (3.3) and the assumptions required for the derivation and explains the equivalence of using Eq. (3.2) or (3.3) in practice. An alternative but, in practice, equivalent version is derived by Burnham and Anderson (2002, p. 12).

3.2.2 L1 Norm Objective Function
The L1 norm equals the sum of the absolute values of weighted residuals (Xiang et al., 1993; Menke, 1989). Minimizing the L1 norm is often accomplished using the simplex method and does not require sensitivities or derivatives of the objective function. Inferential and diagnostic statistics that are derived from sensitivities and used for the sensitivity analysis, data assessment, and uncertainty evaluation described in this book can be obtained if the sensitivities are calculated separately. In this situation these statistics could be used as described here. L1 norms are rarely used for nonlinear systems because they do not perform as well as nonlinear regression and provide less information.

3.2.3 Multiobjective Function
Multiobjective optimization uses multiple objective functions. The objective functions may be least-squares objective functions such as those considered in this work, or, for example, the objective function may be defined as the sum of costs for well installation and sampling that are to be kept as small as possible (e.g., see Deb, 2001; Reed et al., 2003; Vrugt et al., 2003). Objective functions also can include terms related to the smoothness of the estimated parameters (e.g., see Vasco et al., 1997). Defining what to include in the different objective functions is an important part of multiobjective optimization. Some situations are clear, such as when one objective function represents the violation of established criteria, and another represents well installation and sampling costs. Other situations are not as clear, such as when one objective function represents the fit to some of the observations used in model calibration, and another represents the fit to other observations. For example, is it best to include all hydraulic heads in one objective function and all streamflow gain and loss observations in another? Should the data in different subbasins be included as separate objective functions, or as one combined objective function? In the former example, if the heads are all combined, that objective function is more likely to suffer from extreme parameter correlation. Is that advantageous to understanding system dynamics? Users who consider multiobjective optimization are encouraged to consider such questions and design their multiobjective optimizations carefully.
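To compare the single-objective alternatives of this section on common ground, the sketch below evaluates the weighted least-squares objective function (Eq. 3.2), the maximum-likelihood objective function (Eq. 3.3), and an L1 norm for one set of hypothetical residuals and diagonal weights. The weights are assumed to be scaled so that the common error variance equals 1.0, as Eq. (3.3) requires, and the weighted residuals used in the L1 norm are assumed to be the square root of each weight times its residual. The function names are illustrative, not taken from any of the codes discussed above.

```python
import numpy as np


def least_squares(e: np.ndarray, w: np.ndarray) -> float:
    """Weighted least-squares objective function, Eq. (3.2): e^T w e."""
    return float(e @ w @ e)


def max_likelihood(e: np.ndarray, w: np.ndarray) -> float:
    """Maximum-likelihood objective function, Eq. (3.3):
    (ND + NPR) ln(2 pi) - ln|w| + e^T w e.
    Assumes w is scaled so that the common error variance equals 1.0."""
    sign, logdet = np.linalg.slogdet(w)
    if sign <= 0:
        raise ValueError("weight matrix determinant must be positive")
    return float(e.size * np.log(2.0 * np.pi) - logdet + least_squares(e, w))


def l1_norm(e: np.ndarray, w_diag: np.ndarray) -> float:
    """Sum of absolute weighted residuals sqrt(w_i) * e_i (diagonal weights only)."""
    return float(np.sum(np.abs(np.sqrt(w_diag) * e)))


# Hypothetical residuals y - y'(b) and error standard deviations
e = np.array([-0.2, -0.1, 0.3])
sd = np.array([0.2, 0.2, 0.3])
w = np.diag(1.0 / sd**2)              # diagonal weights, 1 / sigma_i**2

ls = least_squares(e, w)
ml = max_likelihood(e, w)
l1 = l1_norm(e, np.diag(w))
print(f"S = {ls:.3f}, S' = {ml:.3f}, L1 = {l1:.3f}")

# For fixed weights, S' - S depends only on w, not on the residuals,
# so minimizing S' or S yields the same parameter estimates.
e_other = np.array([0.5, -0.3, 0.1])
print(np.isclose(ml - ls, max_likelihood(e_other, w) - least_squares(e_other, w)))
```

For these values Eq. (3.3) is negative (about -1.08), which illustrates the remark in Section 3.2.1 that, unlike the least-squares objective function, it can take negative values.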
3.3 REQUIREMENTS FOR ACCURATE SIMULATED RESULTS

Theoretically, the least-squares objective function can be used to produce a model that accurately represents a system and provides accurate measures of model uncertainty only if three conditions are met. Two of these conditions relate to true errors, which equal the unknown amounts by which an observation or prior information equation differs from the value in the actual system. The conditions are: (1) relevant processes, system geometry, and so on are adequately represented and simulated; (2) true errors of the observations and prior information are random and have a mean of zero; and (3) weighted true errors are independent, which means that the weighting needs to be proportional to the inverse of the variance-covariance matrix on the true observation errors (Draper and Smith, 1998, pp. 34, 222). Because the true errors are unknown, they cannot be analyzed directly; instead, weighted residuals are investigated and the characteristics of the true errors are inferred. Tests for weighted residuals are described in Chapter 6. To estimate parameter values with the least-squares objective function, there is no requirement about the statistical distribution of the true errors (Helsel and Hirsch, 2002, Table 9.1). However, normality is often assumed, which allows calculation of observation error variances and covariances from field data and construction of linear confidence intervals. Calculation of error variances and covariances is discussed in Guideline 6 in Chapter 11; linear confidence intervals are discussed in Sections 7.5.1 and 8.4.2. Tests for normality are presented in Section 6.4.5. Model linearity can be tested using measures discussed in Section 7.7.

3.3.1 Accurate Model
Many aspects of requirement 1 are application specific, but some methods of sensitivity analysis and of comparing observed and simulated values can help achieve the requirement and can be used to test and demonstrate the degree to which it is achieved. Much of this book presents such methods and shows how to use them.
3.3.2 Unbiased Observations and Prior Information
Requirement 2 is important because if an observation or prior information equation is biased (that is, the difference between observed and simulated values is expected to be consistently negative or positive), the model is likely to be biased. For example, consider streamflow observations that are affected by a process that makes them higher than would be expected given the simulated processes (e.g., if baseflow contributes to streamflow but is not included in a rainfall-runoff model). Optimized parameter values may produce a good fit to the observations, but the system may not be simulated correctly. If the model is then used to simulate other circumstances, the predictions are likely to be inaccurate. One consequence of requirement 2 is that bias cannot be accommodated by weighting. Instead, every effort needs to be made to eliminate bias in the observations. For the streamflow example, the noted bias is commonly eliminated by subtracting estimates of baseflow to create the observations used in the regression. The importance of requirement 2 to the validity of regression methods is explained in Appendix C.
3.3.3 Weighting Reflects Errors
To understand requirement 3, consider that weighting performs two related functions. First, weighting needs to produce weighted residuals that have the same units so that they can be squared and summed using Eq. (3.1) or (3.2); summing numbers with different units produces nonsense. Second, weighting needs to reduce the influence of observations and prior information that are less accurate relative to those that are more accurate. These two functions relate directly to the theoretical requirement that the weight matrix be proportional to the inverse of the variance-covariance matrix of the true errors (requirement 3), which is derived in Appendix C. Errors are discussed in Chapter 11 under Guideline 6; examples are provided in Chapter 15. The assumptions implied by using a diagonal weight matrix are discussed in Appendix A. Mathematically, requirements 2 and 3 can be expressed as:
$$E(\boldsymbol{\varepsilon}) = 0; \qquad \omega_i \propto 1/\sigma_i^2 \ \text{for a diagonal weight matrix (Eq. 3.1)}; \qquad \boldsymbol{\omega} \propto \mathbf{V}(\boldsymbol{\varepsilon})^{-1} \ \text{for a full weight matrix (Eq. 3.2)} \qquad (3.4)$$
where $\propto$ means "proportional to," $\boldsymbol{\varepsilon}$ is a vector of true errors, $\sigma_i^2$ is the variance of the true error of observation i, and $\mathbf{V}(\boldsymbol{\varepsilon})$ is the variance-covariance matrix of the true errors, with variances along the diagonal and covariances off the diagonal. The true errors in vector $\boldsymbol{\varepsilon}$ relate observed or prior information values, $y_i$, to true, unknown values, $y_i^{\text{true}}$, through the expressions
$$y_i = y_i^{\text{true}} + \varepsilon_i, \quad i = 1, \ldots, ND + NPR \qquad \text{or, equivalently,} \qquad \mathbf{y} = \mathbf{y}^{\text{true}} + \boldsymbol{\varepsilon} \qquad (3.5)$$
Additive errors are assumed in Eq. (3.5). This is not a very restrictive assumption because errors often are additive or can be converted to being additive, as discussed in the following paragraphs.
COMPARING OBSERVED AND SIMULATED VALUES
For many observations, and especially groundwater flow and concentration observations, errors are typically thought to be proportional to the true value, so that
$$y = y^{\text{true}}(1 + \varepsilon) = y^{\text{true}} + y^{\text{true}}\varepsilon \qquad (3.6)$$
An appropriate weighting strategy can be achieved by specifying the coefficient of variation as the statistic from which the weight is calculated and by using observed or simulated values to estimate $y^{\text{true}}$ (e.g., see Keidser and Rosbjerg, 1991). The variance is then calculated as $[(\text{c.v.}) \times a]^2$, where c.v. is the coefficient of variation and a is the observed or simulated value; the standard deviation equals $(\text{c.v.}) \times a$. Anderman and Hill (1999) show that using simulated, rather than observed, concentrations is needed to obtain unbiased parameter estimates in transport problems, and this conclusion is likely to be generally applicable. See Section 9.2 for more discussion of weighting concentrations. MODFLOW-2000 supports using observations to calculate weights; UCODE_2005 supports using either observations or simulated values. Errors that can be made additive through a transformation include, for example, multiplicative errors for which $y_i = (y_i^{\text{true}})(\varepsilon_i)$. This error model can be log-transformed to produce $\ln(y_i) = \ln(y_i^{\text{true}}) + \ln(\varepsilon_i)$, in which the errors are additive as in Eq. (3.5). Margulis et al. (2002) present a study in which errors are multiplicative. Observation transformations that convert multiplicative errors into additive errors can be easily implemented in UCODE_2005 or PEST, though doing so can make model results harder to communicate to resource managers.
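A minimal Python/NumPy sketch of the coefficient-of-variation weighting just described follows, together with a check that a log transformation turns multiplicative errors into additive ones. The function name `cv_weights` and the lognormal error model are illustrative assumptions, not part of MODFLOW-2000, UCODE_2005, or PEST.

```python
import numpy as np

def cv_weights(cv, values):
    """Weights 1 / [(c.v.) * a]**2 for errors proportional to the true value.

    `values` stands for a in the text: observed values or, as Anderman and
    Hill (1999) recommend for concentrations, simulated values. The absolute
    value keeps the standard deviation positive for negative (e.g., outflow)
    values; values must be nonzero.
    """
    a = np.abs(np.asarray(values, dtype=float))
    sd = cv * a                 # standard deviation = (c.v.) * a
    return 1.0 / sd**2          # weight = 1 / variance

# Multiplicative errors, y = y_true * eps, become additive after taking logs:
# ln(y) = ln(y_true) + ln(eps).
rng = np.random.default_rng(seed=0)
y_true = np.array([10.0, 100.0, 1000.0])
eps = rng.lognormal(mean=0.0, sigma=0.1, size=y_true.size)  # assumed error model
y = y_true * eps
additive_errors = np.log(y) - np.log(y_true)  # equals ln(eps) for every value
```

Because the log-space error ln(eps) does not depend on the magnitude of y_true, a single variance can describe all transformed observations.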
3.4 ADDITIONAL ISSUES
Issues related to prior information, weighting, and weighted residuals are discussed in this section.
3.4.1 Prior Information
The linear prior information equations supported by MODFLOW-2000 and UCODE_2005 have the form
$$P'_p(\mathbf{b}) = \sum_{j=1}^{NP} a_{p,j} b_j = a_{p,1} b_1 + a_{p,2} b_2 + \cdots + a_{p,NP} b_{NP} \qquad (3.7)$$
where p indicates the pth prior information equation, $a_{p,j}$ are coefficients, and $b_j$ is the jth parameter value. In this book, the subscript p is sometimes replaced by a prior information name, and j is replaced by the parameter name instead of a parameter number. Often, prior information equations have one nonzero coefficient $a_{p,j}$ equal to 1.0, so they are of the form $P'_p = b_j$. In this case, the contribution to the objective function (Eq. (3.1) or (3.2)) is simply the weighted squared difference between the prior value of a parameter, $P_p$, and $b_j$.
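Eq. (3.7) and the term one prior information equation adds to a diagonal-weight objective function can be sketched in Python as follows. The function names and numbers are hypothetical illustrations; `weight` plays the role of the weight assigned to that prior information equation.

```python
import numpy as np

def prior_simulated(a_p, b):
    """P'_p(b) = sum over j of a_{p,j} * b_j  (Eq. 3.7)."""
    return float(np.dot(a_p, b))

def prior_contribution(P_p, a_p, b, weight):
    """Weighted squared difference between the prior value P_p and P'_p(b),
    i.e., the term one prior information equation adds to a least-squares
    objective function with a diagonal weight matrix."""
    residual = P_p - prior_simulated(a_p, b)
    return weight * residual**2

# One nonzero coefficient equal to 1.0 reduces Eq. (3.7) to P'_p = b_j.
b = np.array([3.0, 5.0, 7.0])        # hypothetical parameter values
a_p = np.array([0.0, 1.0, 0.0])      # selects the second parameter
print(prior_simulated(a_p, b))                      # 5.0
print(prior_contribution(4.0, a_p, b, weight=2.0))  # 2.0 * (4 - 5)**2 = 2.0
```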
More than one term is needed on the right side of Eq. (3.7) when the prior information relates to a linear function that includes more than one parameter value. Consider the following two groundwater examples.
Example 1: In each of two confined model layers, specific storage values are defined as parameters and are estimated by regression. The names of these parameters are SS1 and SS2. The combined storage coefficient of both layers has been measured from aquifer-test drawdown data that are not being used as observations in the calibration of a regional-scale model. In this situation, the prior estimate, $P_p$, equals the combined storage coefficient from the aquifer test; the simulated value, $P'_p$, equals the simulated combined storage coefficient; and the two parameters involved, $b_{SS1}$ and $b_{SS2}$, are the specific storage values for each model layer. There are therefore two nonzero coefficients in Eq. (3.7): $a_{p,SS1}$, the coefficient for $b_{SS1}$, equals the thickness of layer 1; $a_{p,SS2}$, the coefficient for $b_{SS2}$, equals the thickness of layer 2.
Example 2: The distribution of hydraulic conductivity is expected to be smooth on the basis of an evaluation of depositional environment and hydraulic gradient. This smooth distribution is simulated by interpolating from a number of locations at which parameters are defined. Smoothness is imposed by introducing prior estimates, $P_p$, that equal zero; simulated values, $P'_p$, that equal the difference between parameter values at neighboring locations; and two parameters, $b_{j1}$ and $b_{j2}$, that are involved in each prior information equation. Each equation has two nonzero values of $a_{p,j}$, one equal to 1.0 and one equal to −1.0. The prior information equations are therefore of the form $P'_p = b_{j1} - b_{j2}$. Contributions to the objective function (Eq. (3.1) or (3.2)) are weighted squared differences between $P_p$, which equals 0.0, and $P'_p$.
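The smoothness prior information of Example 2 can be sketched as a coefficient matrix with one row per pair of neighboring interpolation points. This is a hypothetical Python illustration: the point layout, parameter values, and unit weights are assumptions (the text notes that how to derive these variances has not been investigated).

```python
import numpy as np

def smoothness_rows(neighbor_pairs, n_params):
    """Coefficient matrix for prior equations P'_p = b_j1 - b_j2 (one row per
    pair of neighboring interpolation points): +1.0 on b_j1, -1.0 on b_j2."""
    A = np.zeros((len(neighbor_pairs), n_params))
    for p, (j1, j2) in enumerate(neighbor_pairs):
        A[p, j1] = 1.0
        A[p, j2] = -1.0
    return A

def smoothness_penalty(A, b, weights):
    """Sum of weighted squared differences between P_p = 0 and P'_p(b)."""
    simulated = A @ np.asarray(b, dtype=float)   # P'_p(b) for every equation
    return float(np.sum(np.asarray(weights) * (0.0 - simulated) ** 2))

# Hypothetical: four interpolation points on a line, neighbors (0,1), (1,2), (2,3).
A = smoothness_rows([(0, 1), (1, 2), (2, 3)], n_params=4)
b = [1.0, 2.0, 2.0, 4.0]          # hypothetical parameter values
print(smoothness_penalty(A, b, weights=np.ones(3)))  # (-1)^2 + 0^2 + (-2)^2 = 5.0
```

The same row structure covers Example 1: a single row whose two nonzero coefficients are the layer thicknesses, with $P_p$ set to the measured combined storage coefficient.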
The variance of the error could be derived from geostatistical arguments, but to the authors' knowledge this has not been investigated.
Prior information must be used carefully. Two issues related to the use of prior information are discussed briefly here and are further discussed in Section 5.5 and in Chapter 11 under Guideline 5. First, prior information on sensitive parameters can obscure important information available from the regression. This occurs when prior information is used to keep the parameter estimate from becoming unreasonable during regression, whereas unreasonable parameter estimates can lead to important insight about problems with the model or with the observations. Second, for insensitive parameters in models with long forward execution times, it can be advantageous to set the parameter value equal to its prior estimate during regression, rather than estimating the parameter. This can significantly reduce execution times without substantially affecting the results. For final model runs, including the prior information and estimating the parameter allows the modeler to (1) assess whether the parameter value remains close to the prior value as expected, and (2) include the uncertainty of the parameter in the calculation of diagnostic statistics used to evaluate the regression and uncertainty in predictions. As for observations, the model can be used to identify new prior information for which the cost of measurement would likely be a good or bad investment. See Section 8.2 and Guideline 11 in Chapter 13.
3.4.2 Weighting
The purpose of weighting is described in Section 3.3.3. For a diagonal weight matrix, Eq. (3.4) presents the requirement that weights of Eq. (3.1) need to be proportional to 1.0 divided by the variance of the data measurement error; that is, $\omega_{ii} \propto 1/\sigma_i^2$. Specifying the weights on the basis of the inverse of the error variance emphasizes observations and prior information that are thought to be accurate relative to those that are thought to be inaccurate. It is always important to analyze data error, and weighting provides a way for that analysis to be formally included in model development. An approach that is consistent with $\omega_{ii} \propto 1/\sigma_i^2$ is to define the weighting in an attempt to achieve the stricter requirement that
$$\omega_{ii} = 1/\sigma_i^2 \qquad (3.8)$$
For a full weight matrix, the equivalent expression is
$$\boldsymbol{\omega} = \mathbf{V}(\boldsymbol{\varepsilon})^{-1} \qquad (3.9)$$
where $\mathbf{V}(\boldsymbol{\varepsilon})$ is the variance-covariance matrix of the observation errors, with variances along the diagonal and covariances off the diagonal. Setting the weights to be equal to, rather than proportional to, the right-hand sides results in some very useful properties, as described in Section 6.3.2 of Chapter 6 and in Guideline 6 in Chapter 11. Eqs. (3.8) and (3.9) are used extensively in this book. Most modelers can envision standard deviations or coefficients of variation more easily than variances, and MODFLOW-2000 and UCODE_2005 allow the user to specify these statistics to characterize error; the codes then calculate the variance internally. Examples of converting judgments about errors to standard deviations and coefficients of variation are discussed under Guideline 6. As noted there, if more than one source of error exists, the variance of each source needs to be determined and the variances need to be summed to obtain the final variance of the observation or prior information. If the statistic (e.g., variance, standard deviation, or coefficient of variation) used to weight observations and prior information accurately reflects the uncertainty in the estimate, as suggested above, then (1) the observation or prior information can be viewed in a Bayesian sense and (2) measures of uncertainty produced by the model may reflect the actual uncertainty of the observations and prior information. For prior information, this issue was mentioned in Sections 1.3.2 and 3.4.1 and is discussed in more detail in Guideline 5 (Chapter 11).
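The conversion from standard deviations or coefficients of variation to a single weight, with the variances of several error sources summed before Eq. (3.8) is applied, can be sketched in Python. The helper names and the example standard deviations are hypothetical, and summing variances assumes the error sources are independent.

```python
import math

def variance_from(sd=None, cv=None, value=None):
    """Variance from a standard deviation, or from a coefficient of variation
    applied to a value: sd**2 or (cv * value)**2."""
    if sd is not None:
        return sd ** 2
    if cv is not None and value is not None:
        return (cv * value) ** 2
    raise ValueError("supply sd, or cv together with value")

def weight_from_components(variances):
    """Eq. (3.8): weight = 1 / total variance, where the total is the sum of
    the variances of the individual (assumed independent) error sources."""
    total = sum(variances)
    if not (total > 0.0 and math.isfinite(total)):
        raise ValueError("total variance must be positive and finite")
    return 1.0 / total

# Hypothetical head observation: survey error sd = 0.5 m, measurement sd = 0.1 m.
w = weight_from_components([variance_from(sd=0.5), variance_from(sd=0.1)])
print(round(w, 4), round(math.sqrt(w), 4))  # weight and its square root
```

Printing the square root of the weight as well is a convenience, because some output files (for example, the WEIGHT**.5 column of MODFLOW-2000 LIST output) report that quantity rather than the weight itself.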
In some situations, errors in observations are not independent. For example, errors in streamflow gain and loss observations calculated from streamflow measurements can be correlated, as discussed in Chapter 11, Section G6.1, under the heading "Determine Covariances for Weight Matrices." These correlations indicate that the information present in the observations is redundant. Correlations close to 0.0 indicate little redundancy; correlations close to 1.0 or −1.0 indicate extreme redundancy. Although experience to date indicates that including the correlations in the weight matrix often has a minor effect on estimated parameter values, using the full weight matrix may be important to calculated uncertainties (see Chapter 11, Section G6.1).
3.4.3 Residuals and Weighted Residuals
Residuals are calculated as
$$[y_i - y'_i(\mathbf{b})] \qquad (3.10)$$
and represent the match of the simulated values to the observations or prior estimates. For a diagonal weight matrix, weighted residuals are calculated as
$$\omega_i^{1/2}\,[y_i - y'_i(\mathbf{b})] \qquad (3.11)$$
and represent the fit of the regression relative to the weights. For a full weight matrix, weighted residuals are calculated as
$$\boldsymbol{\omega}^{1/2}\,[\mathbf{y} - \mathbf{y}'(\mathbf{b})] \qquad (3.12)$$
The square root of the weight matrix is calculated such that $\boldsymbol{\omega}^{1/2}$ is symmetric (S. Christensen, Univ. of Aarhus, Denmark, written commun., 1996). For weighting as suggested by Eqs. (3.8) and (3.9) and discussed in Chapter 11 under Guideline 6, weighted residuals represent the fit of the regression in the context of the expected accuracy of the observations or prior estimates. Those expected to be less accurate are de-emphasized when weighted residuals are considered; those expected to be more accurate are emphasized.
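For a full weight matrix, a hypothetical Python/NumPy sketch builds ω = V(ε)⁻¹ from standard deviations and a correlation matrix, takes a symmetric square root, and forms the weighted residuals of Eq. (3.12). The text does not say how the codes compute the symmetric square root; the eigendecomposition used here is one standard way and assumes ω is symmetric positive definite. The two correlated observations are invented for illustration.

```python
import numpy as np

def full_weight_matrix(sd, corr):
    """Eq. (3.9): omega = V(eps)^-1, with V = D R D built from standard
    deviations (D = diag(sd)) and a correlation matrix R."""
    D = np.diag(np.asarray(sd, dtype=float))
    V = D @ np.asarray(corr, dtype=float) @ D
    return np.linalg.inv(V)

def symmetric_sqrt(omega):
    """Symmetric square root of a symmetric positive-definite matrix from its
    eigendecomposition: omega^(1/2) = Q diag(sqrt(lambda)) Q^T."""
    lam, Q = np.linalg.eigh(omega)
    return Q @ np.diag(np.sqrt(lam)) @ Q.T

def weighted_residuals(y, y_sim, omega):
    """Eq. (3.12): omega^(1/2) [y - y'(b)]."""
    r = np.asarray(y, dtype=float) - np.asarray(y_sim, dtype=float)
    return symmetric_sqrt(omega) @ r

# Hypothetical pair of correlated streamflow-gain observations.
omega = full_weight_matrix(sd=[0.5, 0.5], corr=[[1.0, 0.6], [0.6, 1.0]])
wr = weighted_residuals([4.4, 3.1], [4.0, 3.3], omega)
# The sum of squared weighted residuals equals r^T omega r.
r = np.array([4.4, 3.1]) - np.array([4.0, 3.3])
assert np.isclose(wr @ wr, r @ omega @ r)
```

Because $\boldsymbol{\omega}^{1/2}$ is symmetric, squaring and summing the weighted residuals reproduces the full-weight-matrix objective-function term $\mathbf{r}^T\boldsymbol{\omega}\mathbf{r}$, as the final check confirms.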
3.5 LEAST-SQUARES OBJECTIVE-FUNCTION SURFACES
For one or two parameters, it is possible to plot the objective function and to easily diagnose any problems with its minimization. Objective-function surfaces for two parameters can be constructed through the following steps: (1) vary the values of the two parameters over selected ranges, (2) calculate the simulated equivalents of the observations for each set of parameters, (3) calculate the sum of weighted squared residuals (Eq. (3.1) or (3.2)) for each set of parameters, (4) plot these objective-function values against the two parameter values, and (5) contour the plotted values. The objective-function surfaces resemble topographic maps except that, instead of elevation above sea level, the topography is created by areas with lower and higher values of the objective function. Also, instead of coordinate direction, the "location" is characterized by the values of the parameters. The goal of regression is to identify the parameter values for which the objective function is smallest, which is analogous to finding the location of the lowest point in the landscape.
Figure 3.1a shows a simple two-parameter model and the distribution of the hydraulic heads calculated using the true values of transmissivity parameters T1 and T2. Figure 3.1b shows its weighted least-squares objective-function surface (for plotting convenience, the logarithm of this surface is shown) plotted against the logarithms of T1 and T2. For a linear problem, the objective-function contours would be concentric ellipses or parallel straight lines symmetrically spaced about a trough. The nonlinearity of Darcy's Law with respect to hydraulic conductivity results in the much different shape shown in Figure 3.1b. In practice, most models have more than two parameters, and it is not possible to visualize the entire objective function. However, objective-function surfaces can be useful in two ways:
1. The model can be redesigned to be represented with only two parameters. For example, for a groundwater model, one parameter can be defined that multiplies all the hydraulic-conductivity values in the system and a second parameter can be defined that multiplies all the recharge values in the system. The resulting objective-function surface can reveal extreme parameter correlation or other problems with multiple minima that exist but are difficult to detect when the system is represented using more parameters. This procedure is illustrated in Exercise 5.1.
2. For a problem with many defined parameters, objective-function surfaces can be used to evaluate pairs of parameters that are difficult to estimate. With UCODE_2005, it is easy to create the data sets for such plots through the Investigate Objective Function mode.
Similar data sets can be produced using PEST with SENSAN. There is no simple method to produce such data sets with MODFLOW-2000. Objective functions for three or four dimensions can be represented using more sophisticated methods, but this is not considered here.
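Steps (1) through (3) can be sketched in Python for a hypothetical system loosely patterned on Figure 3.1a: three zones in series between constant heads, with zones 1 and 3 sharing T1. The geometry, boundary heads, observation locations, "true" values, and weights are all assumptions, and an analytic series-flow solution stands in for a numerical model. In this setup steady heads depend only on the ratio T1/T2, so with heads alone every (T1, T2) pair with the same ratio fits equally well (a trough of extreme parameter correlation); the flow observation removes that trough.

```python
import numpy as np

# Hypothetical setup loosely patterned on Figure 3.1a: three zones in series
# between constant heads; zones 1 and 3 have transmissivity T1, zone 2 has T2.
ZONE_LENGTHS = np.array([1000.0, 1000.0, 1000.0])   # m (assumed)
EDGES = np.concatenate([[0.0], np.cumsum(ZONE_LENGTHS)])
H_LEFT, H_RIGHT = 100.0, 0.0                         # constant heads, m (assumed)

def simulate(t1, t2, x_obs):
    """Steady one-dimensional heads at x_obs and flow per unit width."""
    trans = np.array([t1, t2, t1])
    q = (H_LEFT - H_RIGHT) / np.sum(ZONE_LENGTHS / trans)  # zones in series
    heads = []
    for x in x_obs:
        seg = np.clip(x - EDGES[:-1], 0.0, ZONE_LENGTHS)  # length of each zone left of x
        heads.append(H_LEFT - q * np.sum(seg / trans))
    return np.array(heads), q

def sswr(t1, t2, x_obs, h_obs, q_obs, w_h, w_q):
    """Sum of squared weighted residuals for the heads and one flow."""
    h_sim, q_sim = simulate(t1, t2, x_obs)
    return float(np.sum(w_h * (h_obs - h_sim) ** 2) + w_q * (q_obs - q_sim) ** 2)

def surface(log_t1, log_t2, *obs):
    """Steps (1)-(3): objective function on a grid; rows follow log_t2."""
    return np.array([[sswr(10.0 ** a, 10.0 ** c, *obs) for a in log_t1]
                     for c in log_t2])

# Error-free "observations" from assumed true values T1 = 1e-3, T2 = 1e-4 m^2/s.
x_obs = np.array([250.0, 750.0, 1250.0, 1750.0, 2250.0, 2750.0])
h_obs, q_obs = simulate(1e-3, 1e-4, x_obs)
w_h = np.ones(x_obs.size)                 # assumed head weights, 1/m^2
w_q = 1.0 / (0.1 * q_obs) ** 2            # 10 percent c.v. on the flow
log_t1 = np.linspace(-4.0, -2.0, 21)
log_t2 = np.linspace(-5.0, -3.0, 21)
S = surface(log_t1, log_t2, x_obs, h_obs, q_obs, w_h, w_q)
i, j = np.unravel_index(np.argmin(S), S.shape)
print(log_t1[j], log_t2[i])               # grid location of the minimum
```

Steps (4) and (5) then amount to contouring the logarithm of S against log_t1 and log_t2 with any plotting library; setting w_q to zero shows the head-only trough.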
3.6 EXERCISES
Exercise 3.1: Steady-State Parameter Definition
This exercise stresses the importance of checking the simulated values resulting from defined parameter values and correcting any errors in how the parameters are defined. It involves defining and checking parameters of the steady-state flow system described in Section 2.2. The flow system properties, parameter names, and starting parameter values are shown in Table 3.1. The conductance of the head-dependent boundary adjacent to the hillside (see Figures 2.1a and 2.1b) is not estimated because this property has a minor effect on the flow system, as shown by the small amount of flow that enters this model boundary (Figure 2.1c).
FIGURE 3.1 Objective-function surfaces for a simple model. (a) One-dimensional porous-media flow field bounded by constant heads on the left and right and consisting of three transmissivity zones and two transmissivity values T1 and T2. Hydraulic heads calculated using the true parameter values are shown. (b) Logarithm of the weighted least-squares objective function that includes observations of hydraulic heads h1 through h6, in meters, and flow q1, in cubic meters per second. The observations contain no error. (c) Logarithm of the weighted least-squares objective function using observations with error, and a three-dimensional portrayal of the objective-function surface. Sets of parameter values produced by modified Gauss–Newton nonlinear regression iterations are identified (+), starting from two sets of starting values and progressing as shown by the arrows. (From Poeter and Hill, 1997.)
TABLE 3.1 Parameter Name and Starting Value for Properties of the Steady-State Flow System for Which Parameters Are Estimated in Subsequent Exercises
Flow System Property | Parameter Name | Starting Value (a)
Horizontal hydraulic conductivity of layer 1, in m/s | HK_1 | 3.0 × 10⁻⁴
Hydraulic conductivity of the riverbed, in m/s | K_RB | 1.2 × 10⁻³
Vertical hydraulic conductivity of confining bed, in m/s | VK_CB | 1.0 × 10⁻⁷
Horizontal hydraulic conductivity of layer 2 in columns 1 and 2, in m/s | HK_2 | 4.0 × 10⁻⁵
Recharge in recharge zone 1, in cm/yr | RCH_1 | 63.072
Recharge in recharge zone 2, in cm/yr | RCH_2 | 31.536
(a) Five significant digits are used for recharge because of a units conversion.
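The units conversion mentioned in the table footnote can be checked with a few lines of arithmetic. Assuming a 365-day year (31,536,000 s), the tabulated recharge values correspond exactly to 2.0 × 10⁻⁸ and 1.0 × 10⁻⁸ m/s. These SI values are an inference from the tabulated numbers, not something the table states, but they would explain why five significant digits appear in cm/yr.

```python
SECONDS_PER_YEAR = 365 * 86_400   # assumes a 365-day year: 31,536,000 s

def m_per_s_to_cm_per_yr(rate_m_per_s):
    """Convert a recharge rate from m/s to cm/yr."""
    return rate_m_per_s * SECONDS_PER_YEAR * 100.0

def cm_per_yr_to_m_per_s(rate_cm_per_yr):
    """Convert a recharge rate from cm/yr to m/s."""
    return rate_cm_per_yr / 100.0 / SECONDS_PER_YEAR

for name, rate in [("RCH_1", 2.0e-8), ("RCH_2", 1.0e-8)]:   # m/s (inferred)
    print(name, round(m_per_s_to_cm_per_yr(rate), 3))
# RCH_1 63.072
# RCH_2 31.536
```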
All work for Exercise 3.1 involves modifying computer files and simulating the system. Instructions are available from the web site for this book described in Section 1.1. Students who are not performing the simulations may skip Exercise 3.1.
Exercise 3.2: Observations for the Steady-State Problem
In this exercise, observations of the steady-state flow system described in Section 2.2 are defined and checked, and weights on the observations are defined and calculated. The hydraulic-head observations used for the steady-state system are listed in Table 3.2, and their locations are shown in Figure 2.1b. All head observations are from wells located at the centers of model cells. In addition, there is one flow observation equal to the groundwater discharge to a river reach. The reach extends along the entire length of the river, and the gain in streamflow is 4.4 m³/s.
(a–b) Define observations in model input files. Exercises 3.2a–b involve modifying computer files and simulating the system. Instructions are available from the web site for this book described in Section 1.1. Students who are not performing the simulations may skip Exercises 3.2a–b.
TABLE 3.2 Hydraulic-Head Observations
Well Identifier | Observation Name | Layer | Row | Column | Observed Head (m) | Variance of Well Elevation Measurement Error (m²) | Variance of Water-Level Measurement Error (m²) | Variance of the Observation Error (m²)
1 | hd01.ss | 1 | 3 | 1 | 101.804 | 1.00 | 0.0025 | 1.0025
2 | hd02.ss | 1 | 4 | 4 | 128.117 | 1.00 | 0.0025 | 1.0025
3 | hd03.ss | 1 | 10 | 9 | 156.678 | 1.00 | 0.0025 | 1.0025
4 | hd04.ss | 1 | 13 | 4 | 124.893 | 1.00 | 0.0025 | 1.0025
5 | hd05.ss | 1 | 14 | 6 | 140.961 | 1.00 | 0.0025 | 1.0025
6 | hd06.ss | 2 | 4 | 4 | 126.537 | 1.00 | 0.0025 | 1.0025
7 | hd07.ss | 2 | 10 | 1 | 101.112 | 1.00 | 0.0025 | 1.0025
8 | hd08.ss | 2 | 10 | 9 | 158.135 | 1.00 | 0.0025 | 1.0025
9 | hd09.ss | 2 | 10 | 18 | 176.374 | 1.00 | 0.0025 | 1.0025
10 | hd10.ss | 2 | 18 | 6 | 142.020 | 1.00 | 0.0025 | 1.0025

DATA AT HEAD LOCATIONS
OBS#  OBSERVATION NAME  OBSERVATION  SIMUL. EQUIV.  RESIDUAL  WEIGHT**.5  WEIGHTED RESIDUAL
   1  hd01.ss                  102.           100.      1.58       0.999               1.58
   2  hd02.ss                  128.           139.     -11.2       0.999              -11.2
   3  hd03.ss                  157.           174.     -17.7       0.999              -17.7
   4  hd04.ss                  125.           139.     -14.4       0.999              -14.4
   5  hd05.ss                  141.           157.     -16.2       0.999              -16.2
   6  hd06.ss                  127.           140.     -13.1       0.999              -13.1
   7  hd07.ss                  101.           103.     -1.76       0.999              -1.75
   8  hd08.ss                  158.           174.     -15.8       0.999              -15.8
   9  hd09.ss                  176.           190.     -13.9       0.999              -13.9
  10  hd10.ss                  142.           157.     -15.0       0.999              -15.0
------------------
DATA FOR FLOWS REPRESENTED USING THE RIVER PACKAGE
OBS#  OBSERVATION NAME  MEAS. FLOW  CALC. FLOW  RESIDUAL  WEIGHT**.5  WEIGHTED RESIDUAL
  11  flow01.ss              -4.40       -4.86     0.461        2.27               1.05
FIGURE 3.2 Part of MODFLOW-2000 LIST output file showing initial model fit and weights for the head and flow observations.

(c) Check simulated values. Observed hydraulic heads and flows and their simulated equivalents in the initially constructed model are shown in Figure 3.2.
Problem: Using Figure 3.2, does the model fit suggest data input error?
(d) Calculate weights on hydraulic-head and flow observations. As discussed in Sections 3.3 and 3.4.2 and in Chapter 11 under Guideline 6, assignment of weights requires an analysis of the likely accuracy of the observations. In the simple model used for these exercises, this assignment is easier than usual because any deviation from the accurate simulated values has been added intentionally. More realistic situations are discussed elsewhere in this book, including in Guideline 6 and Chapter 15. The observed heads of Table 3.2 were generated by simulating hydraulic head using the true model and adding randomly generated noise with known variance. The added noise has the following characteristics:
1. The elevation of each observation well has a mean error of 0.0 and a variance of 1.0 m², as shown in Table 3.2.
2. In addition, each head measurement has an error associated with the water-level measurement method. This error has a mean of zero and a variance of 0.0025 m², as shown in Table 3.2.
3. The flow has an error with a mean of zero and a coefficient of variation of 10 percent.
Problem
• Compute the weights for each observation from the values specified for the head observations in Table 3.2 and for the flow observation in the text above. Calculate the weights as the inverse of the observation variance; the final variance equals the sum of the variances of the components.
• Check your calculations against the weights printed in output files from Exercise 3.2b or in the output shown in Figure 3.2 (which lists the square root of each weight, WEIGHT**.5).
Exercise 3.3: Evaluate Model Fit Using Starting Parameter Values
This exercise involves assessing initial model fit. If the evaluation from Exercise 3.2 indicates no problems, the model fit resulting from the starting parameter values is worth evaluating. Use the tables of observed and simulated hydraulic heads and flows in the output files from Exercise 3.2, which are produced by students performing the exercises and are shown in Figure 3.2.
Problem
• Comment on the model fit achieved with the starting parameter values.
• How do the residuals compare to the weighted residuals?
For students performing the model simulations, do the following parts of this exercise.
• Attempt to achieve a better model fit by changing the parameter values manually, using your knowledge about the behavior of the groundwater flow system. Make changes three to twelve times. Each time, document the lack of fit being addressed, the reason the attempted change was expected to address that lack of fit, whether or not the change produced the expected results, and whether there were any unexpected, welcome, or unwelcome consequences.
• When finished, restore the starting parameter values.
4 DETERMINING THE INFORMATION THAT OBSERVATIONS PROVIDE ON PARAMETER VALUES USING FIT-INDEPENDENT STATISTICS
This chapter focuses on selected sensitivity analysis methods that measure the information that observations provide for defining parameters and estimating parameter values. The sensitivity analysis described in this chapter uses what are herein called fit-independent statistics. The statistics are fit independent in that residuals (Eq. (3.10)) are not used to calculate them; only sensitivities and the weighting are used. Sensitivities are defined in Section 4.3.1. Sensitivity analysis is a very broad field. This book includes some sensitivity methods that are not common in other textbooks, such as the fit-independent statistics, and there are many methods that are not presented in this book. Other sensitivity analysis methods are presented by, for example, Saltelli et al. (2000, 2004). The methods presented in this book are generally classified as local methods because they use sensitivities calculated for one set of parameter values. They are most useful if the model is not too nonlinear with respect to the parameter values. In most circumstances the methods presented have been found to be useful; models apparently have to be extremely nonlinear for the methods to fail completely. The methods of this chapter are discussed again in Chapter 7 along with fit-dependent statistics that serve the same basic purpose. Chapter 8 introduces fit-independent statistics for evaluating the information observations provide on predictions and the importance of parameter values to predictions. Thus, fit-independent statistics can be used to evaluate each link of the observation-parameter-prediction sequence connected quantitatively by the model, as discussed in Chapters 1 and 10.
This chapter begins by discussing how observations provide information about model construction and parameter definition, as well as about parameter values. Then, sensitivities are defined mathematically and conceptually, the importance of scaling the sensitivities is discussed, and fit-independent statistics are presented. Finally, advantages and limitations of fit-independent statistics are discussed.
Effective Groundwater Model Calibration: With Analysis of Data, Sensitivities, Predictions, and Uncertainty. By Mary C. Hill and Claire R. Tiedeman. Published 2007 by John Wiley & Sons, Inc.
4.1 USING OBSERVATIONS
Observations are used to construct models, define parameters, and estimate parameter values. These roles are discussed briefly here and in more detail in Chapter 11.
4.1.1 Model Construction and Parameter Definition
Observations provide information about model construction and parameter definition (also called parameterization) as well as about the value of model parameters. Observations provide information about what dynamics and features of a system are important. For example, consider the following circumstances.
1. Hydraulic-head observations indicate smooth spatial changes in hydraulic gradient in a groundwater system. Given the geologic history and hydraulic conditions of the system, it is suspected that the gradual changes in hydraulic gradient reflect a hydraulic-conductivity distribution that varies gradually. Such a distribution might be well represented in a model using an interpolation method in which the hydraulic-conductivity values at the interpolation points are defined as parameters.
2. Hydraulic-head observations indicate abrupt spatial changes in hydraulic gradient under natural conditions. Given the geologic history and hydraulic conditions of the system, it is suspected that the abrupt changes in hydraulic gradient reflect a hydraulic-conductivity distribution that varies abruptly. Such a distribution might be well represented using a zonation method in which the hydraulic-conductivity values in each of several hydrogeologic units are defined as parameters.
3. Concentration observations suggest that near its source, a plume in a groundwater system sinks significantly in a short distance and then sinks very slowly as it spreads and moves downstream toward the northeast. The mass of the plume appears to diminish with time. This situation is likely to result from density effects at high concentrations, the effects of areal recharge over time creating small downward vertical velocities, and the effects of advection, dispersion, and decay. The groundwater flow model boundary conditions, hydraulic-conductivity field, and so on need to reproduce these conditions as the plume moves.
The process by which observations are used to construct a model is dominated by professional judgment and often by trying different options in an ad hoc fashion. When the choices are discrete they do not lend themselves to gradient-based optimization methods. To the extent that the options involve parameters, the methods described in this chapter can be used as measures of the information provided by the observations on the parameter value. The information provided by the observations on, for example, system processes often can be inferred from the information provided on the parameter value. A complementary way of expressing this is that the importance indicated by a sensitivity analysis related to the parameters often can be used to infer the importance of a related process.
4.1.2 Parameter Values
For any given calibration, most of the effort generally is spent trying to use the information provided by the observations to adjust parameter values. To make this task more meaningful and manageable, this book suggests that nonlinear regression be used to estimate parameter values given a set of observations. The ability of the regression to precisely estimate a set of parameters is related to the information the observations provide on the parameter values. The statistics typically used in nonlinear regression to determine how well parameters are estimated are defined in linear regression textbooks such as Draper and Smith (1998). These include p-values, t-statistics, and so on. These statistics depend on model fit being optimized and are calculated after regression is successfully completed. For models with lengthy execution times, methods that do not require completion of regression can be very helpful, and the fit-independent statistics presented in this chapter are designed to serve this purpose. The fit-independent statistics are closely related to standard statistics, as noted in the subsequent discussion. The information that the observations provide about model construction and parameter definition is difficult to quantify. One option is to construct and estimate parameters for a variety of plausible alternative models, as discussed under Guideline 8 in Chapter 11 and in the context of prediction uncertainty under Guideline 14 in Chapter 14. This approach has the advantage of accounting for model nonlinearity and the disadvantage of sometimes requiring unattainable computer resources. An approximate approach to evaluating the information observations provide about model construction is to assume that if observations provide a large amount of information about a parameter value, then they also are likely to provide a large amount of information about the model construction related to that parameter, including how the parameter is defined. 
Often such a conclusion is valid, and therefore focusing attention on model construction and parameter definition in areas of high parameter value sensitivity can help improve model fit to the observations. In nonlinear models, however, exceptions will occur. For example, if the material blocking groundwater flow in one part of the simulated system has an extremely small value of simulated hydraulic conductivity, most of the measures of importance discussed in this chapter will tend to be small. As more moderate
INFORMATION THAT OBSERVATIONS PROVIDE
values of hydraulic conductivity are simulated, the sensitivities can increase if the blockage is important to reproducing the dynamics represented by the observations. Sometimes such exceptions can be identified through understanding of the flow system. Thus, a way of assessing the information provided by observations about different aspects of model construction and parameter definition is to define parameters that control those aspects. For example, parameters can be defined to control the thickness of hydrogeologic units, to position points used in interpolation, or to position zone boundaries. This has been done, for example, by Zheng and Wang (1996) and Tung and Chou (2002). The sensitivity analysis methods presented in this work are, therefore, limited in their generality only by the parameters the user chooses to define. While parameters can be defined to represent any aspect of a simulated system, there are advantages to defining parameters frugally and carefully, as discussed in Guidelines 1 and 3 in Chapter 11. Generally, some types of parameters are not defined because these aspects of the system are better supported by independent data and/or are less important to fitting observations than other types of parameters. For example, in groundwater systems the product of hydraulic conductivity and hydrogeologic-unit thickness is important. However, it is more common to define parameters to represent hydraulic conductivity, which can vary over many orders of magnitude, than hydrogeologic-unit thickness, which is often known within 50 percent or less. Focusing on parameters that represent the least known and most important aspects of a system is a good strategy in most circumstances.
4.2 WHEN TO DETERMINE THE INFORMATION THAT OBSERVATIONS PROVIDE ABOUT PARAMETER VALUES

Determining the information that observations provide toward estimating parameter values is valuable throughout model development. This analysis can help make the most of every model run and, therefore, becomes increasingly important as models require greater execution time. Determining observation information with respect to parameter values is most commonly used to:

• Decide what observations to include
• Design the defined parameters
• Decide which of the defined parameters to estimate
• Evaluate which potential new observations are important to the parameters
• Evaluate how the analysis is affected by model nonlinearity
These issues are discussed briefly here, to motivate the rest of the chapter and provide perspective, and in more detail in Chapter 11 under Guidelines 3 and 4. Examples of these issues are presented in Chapter 15.
Three issues need to be considered when determining which observations to include in a regression. The first is that some observations may be affected by processes that are not simulated. For example, hydraulic heads may reflect perched conditions, which are typically not simulated by saturated groundwater flow models. Omission of this category of observations needs to be based on the relevance of the observations to the simulated processes and the importance of the omitted processes to the predictions of interest. This question can often be addressed using the sensitivity analysis methods described in this book. The second issue is that observations are often clustered; culling clustered observations is one way to address this. The effects of clustering can be evaluated in part with sensitivity analysis, as discussed in Chapter 11 under Guideline 4. The third issue occurs when there is the opportunity to conduct additional field work and the information provided by a potential observation is important to prioritizing the field effort. The importance of the potential observation to parameter estimates can be evaluated using the methods described in this chapter; its importance to predictions can be evaluated using the methods described in Chapter 8. When designing the defined parameters, it is useful to recognize that observations commonly are more reliable indicators of system dynamics than other types of data. In many groundwater systems, for example, observations of hydraulic heads, flows, concentrations, and so on are more reliable indicators of system properties than are direct measurements of those properties. This is mostly because of problems with accessibility and scale. For example, these problems make it difficult to obtain accurate measurements of hydraulic conductivity and produce inconsistencies between the scale of most hydraulic-conductivity measurements and the scale of the model (e.g., see Barth et al., 2001; Barlebo et al., 2004).
Thus, although hydraulic-conductivity measurements are valuable, it is important to consider them in the context of their likely errors and the errors of other available data. The fit-independent statistics help in this evaluation. The fit-independent statistics described in this chapter can be used to determine the parameters that are well supported by the observations, which is important when designing defined parameters. For example, a groundwater modeler may be interested in the detail supported by the observations for the hydraulic-conductivity distribution of a system. In parts of the system where observations provide abundant information, more parameters generally can be supported; where observations provide little information, fewer parameters generally can be supported. When deciding whether to define additional parameters, it is important to know how much the new parameters depend on the observation data, accounting for the effects of observation error. Deciding which of the defined parameters to estimate is important because for models with long execution times, regression runs can be lengthy. Execution times can be reduced by excluding from the regression insensitive parameters (those for which the observations provide very little information) or selected correlated parameters (those for which the observations do not provide unique information for each value). Exclusion of such parameters improves the performance of the regression, rarely affects regression results, and reduces execution
times. When this strategy is followed for nonlinear models, it is important to recalculate the fit-independent statistics occasionally using updated parameter values, because, as noted at the end of this section, the values of the statistics will change as parameter values change. Evaluating which potential new observations are important to the parameters is a valuable step for guiding collection of additional field data and can be done using the sensitivity statistics presented in this chapter. These statistics produce fit-independent measures of the information that individual potential observations provide about individual parameters or sets of parameters. A thorough discussion of using the statistics in this context is given in Guideline 11 of Chapter 13. Alternative methods for selecting new observations that improve the parameter estimates use criteria related to minimizing parameter uncertainty (e.g., Knopman and Voss, 1988, 1989; Nishikawa and Yeh, 1989). These methods often involve the design of observation networks and thus generally focus on identifying sets of observations, rather than on examining the information provided by individual observations. These methods use many of the same measures of parameter uncertainty that are used by the statistics presented here, so results are expected to be similar. Recent work on monitoring network design methodologies has tended to focus on minimizing prediction, rather than parameter, uncertainty. This topic is discussed in Chapter 8. For nonlinear models, different sensitivities are calculated for different parameter values, as discussed in Section 1.4. If a model is too nonlinear, the sensitivities vary so much that fit-independent statistics calculated from them become useless for the purposes discussed here. However, experience to date indicates that for most nonlinear models of groundwater systems, the statistics presented here are very useful.
Some examples are discussed in Guideline 3 (Chapter 11), in Guideline 11 (Chapter 13), and in Chapter 15.
4.3 FIT-INDEPENDENT STATISTICS FOR SENSITIVITY ANALYSIS

Fit-independent statistics are calculated using sensitivities, which are defined in the following section. Subsequent sections define scaling and five fit-independent statistics. Fit-independent statistics are measures of leverage: the potential for an observation to make a difference based on the observation sensitivities. In contrast, influence statistics measure the actual difference. The actual effect depends on the observed value, and, therefore, influence statistics depend on model fit. Leverage and influence statistics are discussed in Chapter 7, Section 7.3. Many of the fit-independent statistics described here are compared to influence statistics and the results of cross-validation in Foglia et al. (in press). The issue of observation importance to parameters spans the first two components of the observation-parameter-prediction triad composed of entities that are directly connected by the model, as discussed in Chapters 1 and 10.
4.3.1 Sensitivities
Sensitivities are calculated as the derivatives of simulated equivalents to observations (such as simulated hydraulic heads and flows) with respect to the model parameters. That is,
$$\left.\frac{\partial y'_i}{\partial b_j}\right|_{\mathbf{b}} \tag{4.1}$$
where $y'_i$ is defined after Eq. (3.1b) as the simulated value that corresponds to an observation or item of prior information, $b_j$ is the jth parameter, and the notation indicates that the sensitivities are calculated for the parameter values listed in vector $\mathbf{b}$. The latter is important because for nonlinear problems the sensitivities are different when calculated for different parameter values. For this reason, the sensitivities of Eq. (4.1) are called local sensitivities by Saltelli et al. (2000). This issue is discussed in Section 4.4. Some models, such as MODFLOW-2000, calculate the derivatives using sensitivity-equation sensitivities (Hill et al., 2000, pp. 67–71) or adjoint states (Thomas Clemo, Boise State University, written communication, 2004), which produce the most accurate sensitivities; other models, such as UCODE_2005 and PEST, approximate sensitivities using forward, backward, or central differences. For example, the forward-difference approximation to Eq. (4.1) is
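$$\left.\frac{\partial y'_i}{\partial b_j}\right|_{\mathbf{b}} \approx \frac{y'_i(\mathbf{b} + \Delta\mathbf{b}) - y'_i(\mathbf{b})}{\Delta b_j} \tag{4.2}$$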
where $\Delta\mathbf{b}$ is a vector of zeros except that the jth element equals $\Delta b_j$. Equation (4.2) is calculated by running the model once using the parameter values in $\mathbf{b}$ to obtain $y'_i(\mathbf{b})$, then again after changing the jth parameter value to obtain $y'_i(\mathbf{b} + \Delta\mathbf{b})$, and finally taking the difference and dividing by the change in the jth parameter value. Execution time issues for calculating sensitivities are discussed in Chapter 15, Section 15.1. Accuracy issues are discussed in Sections 2.1, 4.4, and 7.4, and by Yager (2004). The sensitivities indicate the slope of a plot of a simulated value $y'_i$ relative to one parameter or, approximately, how much a simulated value would change if a parameter value were changed, divided by the change in the parameter value. The parameters are considered individually; sensitivities do not account for changes in multiple parameters. Sensitivities can be used to indicate the importance of the observations to the estimation of parameter values. Observations are likely to be very valuable in estimating a parameter value if their simulated equivalents change substantially given a small change in the parameter value; observations contribute very little to estimating a parameter if their simulated equivalents change very little even with a large change in the parameter value.
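A minimal sketch of the forward-difference procedure of Eq. (4.2) follows. It is illustrative only and assumes a hypothetical stand-in for a model run: a one-dimensional, steady-state confined aquifer with a fixed head $h_0$ at $x = 0$, a no-flow boundary at $x = L$, uniform recharge $R$, and transmissivity $T$, whose analytic solution is $h(x) = h_0 + (R/T)(Lx - x^2/2)$. The function names, parameter values, observation locations, and one-percent perturbation are hypothetical choices, not the implementation used in UCODE_2005 or PEST.

```python
"""Forward-difference sensitivities, Eq. (4.2), for a hypothetical toy aquifer model."""

H0 = 100.0                               # fixed head at x = 0, m (assumed)
LENGTH = 1000.0                          # distance to the no-flow boundary, m (assumed)
OBS_X = [200.0, 500.0, 800.0, 1000.0]    # hypothetical head-observation locations, m


def simulate(b):
    """Stand-in for one model run: heads y'(b) at OBS_X for b = [R, T]."""
    recharge, transmissivity = b
    return [H0 + (recharge / transmissivity) * (LENGTH * x - x * x / 2.0)
            for x in OBS_X]


def forward_difference_sensitivities(model, b, rel_step=0.01):
    """Approximate dy'_i/db_j at b; rows = observations, columns = parameters."""
    base = model(b)                      # one run at the current parameter values
    columns = []
    for j, bj in enumerate(b):
        delta = rel_step * bj            # Delta b_j (assumes b_j is nonzero)
        perturbed = list(b)
        perturbed[j] = bj + delta        # b + Delta b: only the jth element changes
        shifted = model(perturbed)       # one extra run per parameter
        columns.append([(s - y) / delta for s, y in zip(shifted, base)])
    return [list(row) for row in zip(*columns)]


b = [0.001, 100.0]                       # R (m/d) and T (m^2/d), illustrative values
X = forward_difference_sensitivities(simulate, b)
for x, row in zip(OBS_X, X):
    print(f"x = {x:6.0f} m   dh/dR = {row[0]:10.2f}   dh/dT = {row[1]:.5f}")
```

Because the simulated heads are linear in $R$, the $R$ column reproduces the analytic derivative $(Lx - x^2/2)/T$ to rounding error; the $T$ column is about one percent smaller in magnitude than the analytic value $-R(Lx - x^2/2)/T^2$ because the heads are nonlinear in $T$. Central differences would reduce this error at the cost of one more model run per parameter.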
4.3.2 Scaling
Generally, it is useful to compare the relative importance of different observations. A problem with making this comparison using sensitivities is that sensitivities are in the units of the simulated value divided by the units of the parameter, both of which can vary considerably. For example, for groundwater models the simulated values might be hydraulic heads measured in meters, flows measured in cubic meters per day, and concentrations measured in milligrams per liter; parameters might be hydraulic conductivity measured in meters per day and recharge measured in millimeters per year. The solution pursued here and by others is to scale the sensitivities to achieve quantities with the same units. The scaling used depends on the intended purpose of the resulting scaled sensitivities. In both MODFLOW-2000 and UCODE_2005, scalings are used to produce dimensionless scaled sensitivities (dss) that are accumulated for each parameter to produce composite scaled sensitivities (css). Composite scaled sensitivities provide information about individual parameters, but cannot be used to evaluate whether a set of observations can estimate each parameter uniquely. Problems with uniqueness occur when coordinated changes in parameter values produce the same fit to the observations. Parameter correlation coefficients (pcc) indicate whether observations provide information for estimating parameters uniquely. Leverage statistics reflect the importance of observations on the basis of the effects measured by both css and pcc. Finally, one-percent scaled sensitivities (1ss) can be used to produce sensitivity maps, but there are difficulties with this scaling. These statistics are discussed below.

4.3.3 Dimensionless Scaled Sensitivities (dss)
When a diagonal weight matrix is used, dimensionless scaled sensitivities, $dss_{ij}$, are calculated as (Hill, 1992; Hill et al., 1998)

$$dss_{ij} = \left.\frac{\partial y'_i}{\partial b_j}\right|_{\mathbf{b}} |b_j|\, \omega_{ii}^{1/2} \tag{4.3a}$$

where

$y'_i$ = a simulated value. Here the notation indicates that the simulated value is associated with an observation (the ith observation), but similar scaling can be used with sensitivities of other quantities, such as potential observations. This is discussed in Guideline 12 in Chapter 13.
$b_j$ = the jth estimated parameter.
$\left.\partial y'_i / \partial b_j\right|_{\mathbf{b}}$ = the derivative, or sensitivity, of the simulated value associated with the ith observation with respect to the jth parameter, evaluated at the set of parameter values in $\mathbf{b}$.
$\mathbf{b}$ = a vector that contains the parameter values at which the sensitivities are evaluated; for nonlinear models, sensitivities will be different for different values in $\mathbf{b}$.
$\omega_{ii}$ = the weight of the ith observation.

Similar scaling was used by Cooley et al. (1986) and Harvey et al. (1996).
For log-transformed parameters, Eq. (4.3b) can be used to reflect the improved regression performance produced by log transformation. However, use of Eq. (4.3b) means that dss and css can vary considerably between model runs if the transformed parameters change. MODFLOW-2000 and UCODE_2005 use Eq. (4.3b).

$$dss_{ij} = \left.\frac{\partial y'_i}{\partial (\ln b_j)}\right|_{\mathbf{b}} |\ln(b_j)|\, \omega_{ii}^{1/2} = \left.\frac{\partial y'_i}{\partial b_j}\right|_{\mathbf{b}} b_j\, |\ln(b_j)|\, \omega_{ii}^{1/2} \tag{4.3b}$$
To better understand the dimensionless scaled sensitivity, consider Eq. (4.3a) with the square root of the weight replaced by $1/\sigma$, where $\sigma$ is the standard deviation of the observation error (as discussed in Chapter 3, Section 3.4.4 and in Guideline 6 in Chapter 11, it is advantageous to define the observation weights as $\omega_{ii} = 1/\sigma^2$). Also, divide and multiply the equation by 100, to achieve

$$dss_{ij} = \left.\frac{\partial y'_i}{\partial b_j}\right|_{\mathbf{b}} \frac{b_j}{100}\, \frac{100}{\sigma} \tag{4.4}$$
By Eq. (4.4), the dimensionless scaled sensitivity indicates the amount the simulated value would change, expressed as a percent of the observation error standard deviation, given a one-percent increase in the parameter value. If $dss_{ij} = 1$, a one-percent change in the parameter value, $b_j$, would produce a change in the simulated value, $y'_i$, equivalent to one percent of the standard deviation of measurement error, $\sigma$. If $dss_{ij} = 10$, a one-percent change in the parameter value would produce a change in the simulated value equivalent to 10 percent of $\sigma$. Thus, dimensionless scaled sensitivities include the effects of sensitivity and of observation error. This is discussed further in Section 4.3.4 on composite scaled sensitivities. The dimensionless scaled sensitivities can be used in two ways. First, they can be used to compare the importance of different observations to the estimation of a single parameter $b_j$. Observations with large $dss_{ij}$ are likely to provide more information about parameter $b_j$ compared to observations associated with small $dss_{ij}$ (large and small in absolute value). Also, observations with large $dss_{ij}$ can be considered more important to the estimation of parameter $b_j$. Second, the dimensionless scaled sensitivities can be used to compare the importance of different parameters to the calculation of a single simulated value $y'_i$. Parameters that are more important to the simulated value have $dss_{ij}$ that are larger in absolute value. An example of using dimensionless scaled sensitivities is provided in Exercise 4.1b. For a full weight matrix, dimensionless scaled sensitivities are calculated as

$$dss_{ij} = \sum_{k=1}^{ND} \left.\frac{\partial y'_k}{\partial b_j}\right|_{\mathbf{b}} b_j \left(\omega^{1/2}\right)_{ki} \tag{4.5}$$

where ND is the number of observations used in the regression, $\omega^{1/2}$ is the square root of the weight matrix, determined such that $\omega^{1/2}$ is a symmetric matrix, and $(\omega^{1/2})_{ki}$ is the matrix element in row k and column i.
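The scalings of Eqs. (4.3a), (4.3b), and (4.5) can be applied directly to a table of sensitivities. The sketch below is illustrative: it assumes the analytic sensitivities of a hypothetical one-dimensional confined aquifer with $h(x) = h_0 + (R/T)(Lx - x^2/2)$ (fixed head at $x = 0$, no-flow boundary at $x = L$), a head-error standard deviation of 0.1 m for every observation (so $\omega_{ii} = 1/\sigma^2 = 100$), and hypothetical function names. For Eq. (4.5) the caller must supply the symmetric square root $\omega^{1/2}$: for uncorrelated errors it is $\mathrm{diag}(1/\sigma_i)$, which the sketch uses to check that Eq. (4.5) reduces to Eq. (4.3a) for positive $b_j$; for correlated errors it could be formed from an eigendecomposition of $\omega$, for example with `numpy.linalg.eigh`.

```python
"""Dimensionless scaled sensitivities, Eqs. (4.3a), (4.3b), (4.5), for a toy model."""
import math

LENGTH = 1000.0                          # distance to the no-flow boundary, m (assumed)
OBS_X = [200.0, 500.0, 800.0, 1000.0]    # hypothetical head-observation locations, m
SIGMA = [0.1, 0.1, 0.1, 0.1]             # assumed head-error standard deviations, m


def analytic_sensitivities(b):
    """dh/dR and dh/dT for h(x) = h0 + (R/T)(Lx - x^2/2); rows = observations."""
    recharge, transmissivity = b
    rows = []
    for x in OBS_X:
        q = LENGTH * x - x * x / 2.0
        rows.append([q / transmissivity, -recharge * q / transmissivity ** 2])
    return rows


def dss_diagonal(X, b, sigma):
    """Eq. (4.3a) with w_ii^(1/2) = 1/sigma_i."""
    return [[X[i][j] * abs(b[j]) / sigma[i] for j in range(len(b))]
            for i in range(len(X))]


def dss_log(X, b, sigma):
    """Eq. (4.3b) for log-transformed parameters (requires b_j > 0)."""
    return [[X[i][j] * b[j] * abs(math.log(b[j])) / sigma[i] for j in range(len(b))]
            for i in range(len(X))]


def dss_full(X, b, w_half):
    """Eq. (4.5); w_half must be the symmetric square root of the weight matrix."""
    n = len(X)
    return [[sum(X[k][j] * w_half[k][i] for k in range(n)) * b[j]
             for j in range(len(b))] for i in range(n)]


b = [0.001, 100.0]                        # R (m/d), T (m^2/d), illustrative values
X = analytic_sensitivities(b)
dss = dss_diagonal(X, b, SIGMA)
# Diagonal weights: the symmetric square root is diag(1/sigma_i)
w_half = [[1.0 / SIGMA[i] if i == k else 0.0 for k in range(len(SIGMA))]
          for i in range(len(SIGMA))]
dss_f = dss_full(X, b, w_half)            # should reproduce dss for positive b
for x, row in zip(OBS_X, dss):
    print(f"x = {x:6.0f} m   dss_R = {row[0]:7.2f}   dss_T = {row[1]:7.2f}")
```

At $x = 1000$ m both values have magnitude 50: by Eq. (4.4), a one-percent increase in either $R$ or $T$ changes the simulated head there by about 50 percent of $\sigma$ (0.05 m), in opposite directions. The equal magnitudes at every location follow from the heads depending only on $R/T$, an early hint of the correlation problem addressed by the pcc in Section 4.3.5. With $R = 0.001$, the factor $|\ln R| \approx 6.9$ in Eq. (4.3b) makes `dss_log` for $R$ about seven times the Eq. (4.3a) value, which illustrates why dss computed with Eq. (4.3b) can shift as parameter values change.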
In Eq. (4.5), the dimensionless scaled sensitivity $dss_{ij}$ for simulated value $y'_i$ is a function of the sensitivity of $y'_i$ with respect to $b_j$ as well as of the sensitivities of other simulated values $y'_k$, $k \ne i$, with respect to $b_j$. For observations with errors that are correlated with the error of observation $y_i$, $(\omega^{1/2})_{ki} \ne 0.0$. For observations with errors that are not correlated with the error of observation $y_i$, $(\omega^{1/2})_{ki} = 0.0$. In practice, the off-diagonal terms of $\omega^{1/2}$ are likely to be smaller than the diagonal terms, so the contribution of the sensitivity term $\partial y'_i/\partial b_j$ to $dss_{ij}$ is likely to be greater than the contribution of $\partial y'_k/\partial b_j$, $k \ne i$. The fit-dependent equivalent to the dimensionless scaled sensitivity is the statistic DFBETAS presented in Chapter 7, Section 7.5.2.

4.3.4 Composite Scaled Sensitivities (css)
Composite scaled sensitivities reflect the total amount of information provided by the observations for the estimation of one parameter. They are calculated for each parameter using dimensionless scaled sensitivities and can be calculated for some or all observations. The composite scaled sensitivity, $css_j$, for the jth parameter calculated for ND observations is (Hill, 1992; Anderman et al., 1996; Hill et al., 1998)

$$css_j = \left[ \sum_{i=1}^{ND} \frac{\left( dss_{ij}\big|_{\mathbf{b}} \right)^2}{ND} \right]^{1/2} \tag{4.6}$$
where the quantity in parentheses equals a dimensionless scaled sensitivity of Eq. (4.3) or (4.5). The composite scaled sensitivity is equal to a scaled version of the square root of the diagonal of $X^{T}\omega X$, which is the regression variance times the Fisher information matrix (Burnham and Anderson, 2002); $X$ is a matrix of the sensitivities defined in detail following Eq. (5.2). Statistics that perform a similar function are the L1 norm of sensitivities used by R. L. Cooley (U.S. Geological Survey, written communication, 1988) and the CTB statistic of Sun and Yeh (1990a) and Sun (1994). The CTB statistic is scaled using the weight on prior information for parameter $b_j$ instead of the parameter value $b_j$ as in Eq. (4.3) or (4.5). Often composite scaled sensitivities are used in a comparative manner, whereby larger values indicate parameters for which the observations provide more information. If some composite scaled sensitivities are less than one percent of the largest value, regression often will have trouble converging. In this situation, parameters with small composite scaled sensitivities may need to be assigned prior information or be specified rather than estimated by the regression (see Guideline 5 in Chapter 11). Composite scaled sensitivities are also meaningful individually. By using Eq. (4.4), they can be interpreted as the average (root-mean-square) amount that the simulated values change, expressed as a percent of the standard deviation of the observation error, given a one-percent change in the parameter value. This interpretation of the
composite scaled sensitivity shows clearly that a parameter can be estimated only if the information provided by the observations, as expressed through their sensitivities, dominates the effects of observation error (noise in the data). The information provided by the sensitivities is related to the observation types, locations, measurement times, and system conditions. If a $css_j$ value is too small, the observation data may be too noisy relative to the sensitivity information provided, and the regression may not be able to estimate a value of $b_j$. An example of using composite scaled sensitivities is provided in Exercise 4.1b. Linear regression can be used to illustrate the interaction between the noise in the data and the sensitivity information that the observations provide by virtue of their type, location, and time. In linear regression, values of the independent variable, X, play a role similar to the sensitivities in nonlinear regression. The effect of the sampled range of X values on linear regression behavior is analogous to the effect of the range of observation types, locations, times, and system conditions on nonlinear regression behavior. In linear regression, the amount of noise in the data that can occur while still enabling accurate estimation of the parameters depends on the range of X values sampled. As this range increases, the regression can detect a trend in the data (and estimate parameters of the linear model) in the presence of a greater amount of data error. Figure 4.1 illustrates this concept by showing a linear model plotted with three different sets of observation data. In Figure 4.1a, the noise in the data overwhelms the information provided by the data locations (the range of X values), and consequently it is difficult to discern a trend in the data. In Figure 4.1b, the data locations are the same, but there is less noise in the data, and the trend in the data is thus much more discernible.
In Figure 4.1c, the noise level is the same as in Figure 4.1b, but the information content of the observations is reduced by reducing the range of X, and the noise level again overwhelms the information that the observations provide. The interaction between the information content of the observations, as reflected in their sensitivities, and the noise in the observations suggests that there is some $css_j$ value below which the observations provide insufficient information to estimate parameter $b_j$. Although experience to date has not clearly identified this critical value, we suggest a value of 1.0. A $css_j$ value of 1.0 means that a one-percent change in the parameter value produces, on average, a change in simulated values that is equivalent to one percent of the measurement error standard deviation. Parameters with composite scaled sensitivities less than 1.0 are more likely to be poorly estimated, in that confidence intervals are large and regression convergence problems are persistent.

4.3.5 Parameter Correlation Coefficients (pcc)
Parameter correlation coefficients (pcc) used in conjunction with composite scaled sensitivities produce a useful sensitivity analysis. Parameter correlation coefficients are calculated as $\mathrm{Cov}(\mathbf{b})_{jk} / [\mathrm{Var}(\mathbf{b})_{jj}\, \mathrm{Var}(\mathbf{b})_{kk}]^{1/2}$, where $\mathrm{Cov}(\mathbf{b})_{jk}$ is the covariance between two parameters and $\mathrm{Var}(\mathbf{b})_{jj}$ and $\mathrm{Var}(\mathbf{b})_{kk}$ are the variances of each of the parameters. Further discussion of pcc is presented in Chapter 7, Section 7.2.1, because it is closely associated with the parameter statistics discussed there. Limitations of pcc are discussed in Section 4.4.2. Here we present comments needed to support wise use of parameter correlation coefficients in sensitivity analysis.

FIGURE 4.1 A single linear model, $y = b_0 + b_1 X$, with three sets of data. The true parameter values are $b_0 = 3$ and $b_1 = 0.5$. (a) The noise in the data has a standard deviation of $\sigma = 15$. (b) The data are at the same X values, and $\sigma = 5$. (c) The noise level is the same as in (b), but the range of X values is reduced.

The pcc are calculated for each possible pair of model parameters. They indicate whether parameter values can be estimated uniquely by regression, given the constructed model and the observations and prior information provided. The pcc values can vary from −1.00 to +1.00. The pcc for a parameter with itself is always 1.00. If the pcc for a pair of parameters is equal to or very close to −1.00 or +1.00, the two parameters generally cannot be estimated uniquely. Extreme correlation between more than two parameters is indicated if pcc values for all pairs of the parameters involved are near −1.00 or +1.00; in that case the parameters involved generally cannot be estimated uniquely. If the absolute values of all pcc are less than about 0.95, then it is likely that all parameter values can be estimated uniquely. However, this is a rule of thumb; experience has shown that unique estimates sometimes can be obtained even with absolute values of pcc that are very close to 1.00. Correlation coefficients are typically displayed as a matrix. This matrix is always symmetric, with diagonal elements equal to 1.00. For example,

          PAR1    PAR2    PAR3
PAR1      1.00    0.96    0.05
PAR2      0.96    1.00    0.98
PAR3      0.05    0.98    1.00
Here, PAR1, PAR2, and PAR3 are parameter names. Alternatively, the large values can be listed in a table, such as

Parameter Pair    Correlation Coefficient
PAR1–PAR2         0.96
PAR2–PAR3         0.98

This table lists all pcc values greater than 0.95. In the global output file for MODFLOW-2000 and the main output file for UCODE_2005, the full parameter correlation coefficient matrix is printed, followed by a list of values larger in absolute value than 0.85. Moderate correlations between 0.85 and 0.95 are included because parameter correlation coefficients can change substantially during calibration and it is useful to know whether a previously high correlation has become just moderately high or low. In addition, moderate parameter correlations can contribute to large confidence intervals in some circumstances. An example of evaluating pcc as part of a sensitivity analysis is provided in Exercise 4.1c. Additional exercises on parameter correlation include
Exercise 5.1a, which uses objective-function surfaces for a two-parameter problem to investigate the performance of regression in the presence of extreme parameter correlation, and Exercise 7.1f, which uses regression from different starting parameter values to test the uniqueness of parameter estimates with correlation coefficients very close to 1.00. Parameter correlation can also be evaluated using eigenanalysis and singular value decomposition. These alternatives are discussed in Chapter 7, Section 7.2.5. In this book, we focus on parameter correlation coefficients because in most cases they are easier for most modelers and resource managers to understand, and critical values for identifying nonunique estimates are clearer. The alternative methods do not offer enough advantage to overcome these considerations. It is important to assess the pcc in conjunction with the scaled sensitivities and leverage statistics before proceeding with regression. Limitations of pcc are discussed in Section 4.4.2. If the pcc for one or more parameter pairs is very close to +1.00 or −1.00, the regression may be unable to uniquely estimate the extremely correlated parameters. Options for addressing this situation are discussed in Chapter 7, Section 7.4.

4.3.6 Leverage Statistics
Leverage statistics combine the information provided by the dss, css, and pcc to identify observations able to dominate the regression. Leverage statistics are calculated using only sensitivities and weights and so are independent of model fit. They are introduced here because of their utility when used in conjunction with dss, css, and pcc. The equation for the leverage statistic is presented in Chapter 7, Section 7.3 because it is closely associated with the parameter statistics discussed there. One leverage statistic is calculated for each observation. Observations with large values of leverage could dramatically affect one or more of the estimated parameter values, depending on the value of the observation. Often, but not always, observations with greater leverage have large absolute values of dss for one or more parameters and large css. An observation with small dss and css values can attain a large value of leverage if it is instrumental in reducing the correlation between two or more parameters.

4.3.7 One-Percent Scaled Sensitivities
A final scaling considered here produces one-percent scaled sensitivities, denoted $1ss_{ij}$, which are calculated as

$$1ss_{ij} = \left.\frac{\partial y'_i}{\partial b_j}\right|_{\mathbf{b}} \frac{b_j}{100} \tag{4.7}$$

Commonly, one-percent scaled sensitivities are calculated for simulated values at every node of a model grid instead of just at observation locations, so the subscript i would be used to identify every grid node. Sensitivities at every node of a grid are readily available when sensitivities are calculated with the sensitivity-equation method, as in MODFLOW-2000, for example. One-percent scaled sensitivities maintain the units of the simulated values. They approximately equal the amount that the simulated value would change if the parameter value increased by one percent. When calculated for simulated values of the same type, larger absolute values of $1ss_{ij}$ indicate greater sensitivity to $b_j$, which indicates that an observed equivalent to simulated value i could be important to the estimation of the parameter value. Because they have dimensions, one-percent scaled sensitivities cannot be used to compare the importance of different types of observations to the estimation of parameter values and generally cannot be used to form a composite statistic. However, retaining units of the simulated values allows the one-percent scaled sensitivities to more effectively communicate results in some circumstances. The omission of weights from Eq. (4.7) means that the one-percent scaled sensitivities do not reflect the importance of the observations to the regression as effectively as do the dss or leverage statistics. The omission of the weighting does, however, make the statistic easier to calculate. One-percent scaled sensitivities can be used to create contoured sensitivity maps the same way heads at every node are used to create contoured head maps. Similar maps can be produced using UCODE_2005 or PEST by defining enough model locations as observations so that accurate maps are created, but this is an arduous undertaking. Maps of one-percent scaled sensitivities can be used to identify where additional observations would be most important to the estimation of different parameters.
For example, if composite scaled sensitivities show that existing observations do not provide ample information to estimate a parameter, large absolute values on maps of one-percent scaled sensitivities for this parameter can help show where in the model domain a new observation would provide the most information about the parameter. However, three disadvantages limit the use of one-percent scaled sensitivity maps in practice. First, there are potentially a large number of maps to evaluate. For each parameter, there is a map for each model layer and, in transient models, for each time step. Searching these maps for the largest values of one-percent scaled sensitivities can be cumbersome. Furthermore, the largest values of one-percent scaled sensitivity often occur at different locations and times for each of the parameters. Additional criteria are needed to determine important locations, such as the potential effect of the observation on simulated predictions and their uncertainty. Second, conclusions drawn from one-percent sensitivity maps can be difficult to justify to resource managers because the many maps can be overwhelming, and different conclusions might be drawn from different maps. Third, the maps can only be produced for an observation type for which the simulated equivalent can be calculated over the entire model domain, such as hydraulic heads in groundwater systems. The maps do not provide information about other types of observations, such as flows or advective-transport observations in groundwater systems. The opr statistic presented in Chapter 8 generally is a better method of identifying important new observations.
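For a simple system, one-percent scaled sensitivities can be computed at every node and their pattern explained from the physics. The sketch below assumes a hypothetical one-dimensional, steady-state confined aquifer (fixed head at $x = 0$, no-flow boundary at $x = L = 1000$ m, uniform recharge $R$, transmissivity $T$), for which $h(x) = h_0 + (R/T)(Lx - x^2/2)$, and applies Eq. (4.7) to its analytic sensitivities; the grid, names, and values are illustrative assumptions.

```python
"""One-percent scaled sensitivities, Eq. (4.7), at every node of a toy 1-D grid."""

LENGTH = 1000.0                                   # m (assumed)
NODES = [100.0 * n for n in range(11)]            # hypothetical grid, x = 0..1000 m


def one_percent_scaled(b):
    """Return (1ss_R, 1ss_T) in metres at each node for b = [R, T].

    Uses the analytic sensitivities of h(x) = h0 + (R/T)(Lx - x^2/2):
    dh/dR = q/T and dh/dT = -R q / T^2, with q = Lx - x^2/2.
    """
    recharge, transmissivity = b
    ss_r, ss_t = [], []
    for x in NODES:
        q = LENGTH * x - x * x / 2.0
        ss_r.append((q / transmissivity) * recharge / 100.0)
        ss_t.append((-recharge * q / transmissivity ** 2) * transmissivity / 100.0)
    return ss_r, ss_t


ss_r, ss_t = one_percent_scaled([0.001, 100.0])   # R (m/d), T (m^2/d), illustrative
for x, r, t in zip(NODES, ss_r, ss_t):
    print(f"x = {x:6.0f} m   1ss_R = {r:+.4f} m   1ss_T = {t:+.4f} m")
```

The values are zero at the fixed-head boundary, where the head is specified, and largest in magnitude (0.05 m) at the no-flow boundary, farthest from the fixed head, where the head change accumulates the altered gradient along the whole flow path. The map for $T$ is the mirror image of the map for $R$: to first order, a one-percent increase in $T$ changes every head by the same amount as a one-percent increase in $R$, but in the opposite direction. Heads alone therefore cannot separate the two parameters in this hypothetical system, which neither map reveals on its own.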
INFORMATION THAT OBSERVATIONS PROVIDE
Despite their practical limitations, one-percent scaled sensitivities are excellent for instructional purposes. For relatively simple problems, using knowledge of the physics and other processes that control the simulated system to explain the patterns and magnitudes of one-percent scaled sensitivities helps modelers understand what sensitivities mean and what they contribute to regression calculations. This is illustrated in Exercise 4.1d. For more complex problems, one-percent scaled sensitivity maps can help the modeler better understand the processes controlling the simulated system.
4.4 ADVANTAGES AND LIMITATIONS OF FIT-INDEPENDENT STATISTICS FOR SENSITIVITY ANALYSIS
The fit-independent statistics presented in this chapter have the advantage that in many circumstances they provide a good evaluation of the information provided by the observations for estimating parameters without first having to complete a successful regression. For models with long execution times, using fit-independent statistics to design the parameterization and decide which parameters to estimate in a given regression run can be advantageous. Limitations of fit-independent statistics generally are related to the scaling, inaccurate sensitivities, or the nonlinearity of the sensitivities, as discussed in the following sections. Additional comments and guidance for addressing difficulties are provided in Guideline 3 of Chapter 11, Section G3.2.
4.4.1 Scaled Sensitivities
Three issues related to scaled sensitivities are discussed: (1) they do not account for parameter correlations, (2) the scaling by parameter values defined in Eqs. (4.3) to (4.7) works well in some circumstances but not in others, and (3) although they are relatively robust in the presence of inaccurate sensitivities and model nonlinearity, they perform poorly if the model is extremely nonlinear. Scaled sensitivities are limited in that they do not account for the possibility that, while the observations may provide substantial information about individual parameters, coordinated changes in the parameter values may produce the same model fit. Thus, scaled sensitivities alone cannot determine whether the observations can be used to estimate each parameter uniquely. This occurs when parameters are highly correlated, and it can be detected by calculating the parameter correlation coefficients and leverage statistics defined in Sections 4.3.5 and 4.3.6 and discussed further in Sections 4.4.2 and 4.4.3.
The scaling by the parameter value used in the definitions of dimensionless, composite, and one-percent scaled sensitivities is useful when the effect of changing parameter values by a multiplicative factor is of interest. For example, in groundwater models it is common to think of errors in flow parameters, such as recharge, as some percentage of the flow (such as 5 or 10 percent), rather than as plus or minus a particular flow value. Similarly, potential changes in hydraulic conductivity commonly are thought of as a multiplicative factor, such as plus and minus an order of magnitude (multiplying and dividing by 10), rather than plus and minus a particular hydraulic-conductivity value. The utility of the scaling results from the underlying physics.
For some types of parameters, scaling by the parameter value can produce misleading results. For example, in groundwater models, parameters that represent hydraulic head at constant-head boundaries pose a special problem. Using Eqs. (4.3) to (4.5) and (4.7), a flow system at sea level would have different dimensionless and one-percent scaled sensitivities than the identical system at 100 meters above sea level, which indicates that these scaled sensitivities provide misleading results. In other types of models, checking whether a change of datum would change the dss values can be used to determine when scaling by the parameter value is problematic. In MODFLOW-2000, the scaling difficulty affects Constant-Head Boundary (CHD) parameters (Harbaugh et al., 2000, pp. 78–79). For CHD parameters, modified versions of Eqs. (4.3) to (4.5) and (4.7) are used in which the scaled sensitivities are not multiplied by the parameter value. Thus, the values printed in the table of dimensionless scaled sensitivities (in the MODFLOW-2000 output file) for these parameters are not dimensionless; they have units of 1.0 divided by length. They can be thought of as the amount that the dependent variable would change if the CHD parameter changed by 1.0 unit, where the unit depends on how the parameter is defined and commonly is a foot or a meter. UCODE_2005 is generally applicable, so presentation of scaled sensitivities cannot be tailored to particular types of parameters. Modelers need to be aware that scaled sensitivities of parameters for which a change of datum would change the dss values are misleading and should not be used.
An alternative scaling that may be useful in some circumstances was proposed by Tiedeman et al. (2003); it replaces the parameter value with the parameter standard deviation (s_bj). This scaling can be achieved by multiplying one-percent scaled sensitivities by 100 s_bj/b_j. As discussed in Chapter 7, Section 7.2.2, s_bj depends on model fit, making such scaled sensitivities fit-dependent.
Finally, the effects of both nonlinearity and scaling by the parameter value cause scaled sensitivities to differ for different sets of parameter values. If the differences that occur for a reasonable range of parameter values are too extreme, such that different parameters are rated as important when calculated at one set of parameter values and not important when calculated at another set, the scaled sensitivities are inadequate for the purposes they serve in the guidelines discussed in Chapters 10–14. Their utility can be tested by calculating values for several sets of parameter values. In practice, the sensitivity analysis suggested in this book has proved to be useful even for highly nonlinear problems, as gauged by the modified Beale's measure discussed in Chapter 7, Section 7.7. For example, its utility is demonstrated in many groundwater flow and transport problems (Anderman et al., 1996; Barlebo et al., 1996; D'Agnese et al., 1997, 1999, 2002; Poeter and Hill, 1997; Hill et al., 1998). Problems that are too nonlinear for the sensitivity analysis to be useful also may be too nonlinear for the gradient optimization methods described in Chapter 5 to be useful, but this has not been tested. Problems that are too nonlinear for gradient optimization methods need to be addressed using global search methods such as simulated annealing and genetic algorithms (see Chapter 5, Section 5.2), which are much more computationally intensive than gradient methods.
4.4.2 Parameter Correlation Coefficients
Parameter correlation coefficients (pcc) have two advantages. First, they are easier to understand than alternatives such as eigenvector analysis or the closely related singular value decomposition, as noted in Section 4.3.5. Second, except for problems related to the accuracy of the sensitivities, the degree of correlation can be determined easily by comparing the absolute value of the pcc to 1.00.
There are three limitations associated with the pcc. First, the nonlinearity of inverse problems can cause correlation coefficients to be quite different for different sets of parameter values, as shown in Figure 4.2. In Figure 4.2, the objective-function surface has a distinct minimum, indicating that the parameters can be uniquely estimated. The absolute values of pcc calculated at many of the parameter values are significantly less than 1.00, correctly indicating the existence of a unique minimum. However, pcc with absolute values very close to 1.00 are calculated for some sets of parameter values. These large pcc values could lead to the incorrect conclusion that a unique minimum does not exist. In practice, nonuniqueness can be clearly concluded only if it is supported by an analysis of the simulated processes and available data, by pcc values calculated for a range of parameter values, or by using regression to investigate uniqueness as discussed in Chapter 7.
FIGURE 4.2 Correlation of parameters T1 and T2 of the simple model shown in Figure 3.1a. Correlation coefficients are calculated at different parameter values and are plotted on the log10 weighted least-squares objective-function surface shown in Figure 3.1c. T1 and T2 are in square meters per day. (From Poeter and Hill, 1997.)
A second concern about pcc is that they can be inaccurate when calculated using sensitivities with an inadequate number of correct significant digits (Hill and Østerby, 2003). The accuracy of perturbation sensitivities suffers if the perturbation amount is too large for nonlinear parameters or too small for insensitive parameters, or if simulated values have an insufficient number of significant digits. The latter can occur because the process model lacks numerical precision or does not print numbers with sufficient precision, or because the template or instruction input files for codes such as UCODE_2005 or PEST have not been set up to include enough significant digits. See Poeter et al. (2005) for further discussion. The accuracy of sensitivity-equation sensitivities can suffer if the convergence criteria for the solver are too large or the numerics of the model are inadequate. Following suggestions to enhance sensitivity precision is always important when calculating pcc values.
The third issue related to pcc is that as parameter sensitivity decreases, greater sensitivity precision is required for the pcc to be accurate (Hill and Østerby, 2003). In general, as more parameters are defined, parameter sensitivity is reduced. Composite scaled sensitivities (css) can be used to identify insensitive parameters. Combining parameters can be used to identify existing correlation that is obscured by having many defined parameters. For example, consider that the hydraulic conductivity Krock of a fractured-rock aquifer in an initial groundwater model has been divided into parameters Kgranite and Kschist, corresponding to different rock types within the aquifer, in a subsequent model. Commonly, the css_j for Kgranite and for Kschist will be smaller than that for Krock. Even if the sensitivities in the two models are precise to the same number of significant digits, the pcc in the first model may be more accurate than those in the second model because the parameter sensitivity has decreased.
As the number of defined parameters increases, a point will be reached in any problem at which the pcc are no longer reliable indicators of parameter correlation. The reverse also is true: as parameter sensitivity increases, less precision is required for the pcc to be accurate. This characteristic can be used to advantage. For example, in groundwater models, the correlation that commonly occurs between recharge and hydraulic conductivity may not be revealed by pcc when many parameters are defined. By combining all the recharge and all the hydraulic-conductivity parameters into a few parameters (using multiplication arrays to preserve the original spatial distribution of values), a more definitive test of parameter correlation can be achieved. Any extreme correlation that occurs for the few parameters also is present in the set of many parameters; it just cannot be identified using the pcc values calculated for the large set of parameters.
4.4.3 Leverage Statistics
Leverage statistics have the advantage of reflecting the importance of observations produced by the effects measured by both scaled sensitivities and correlation coefficients. They also have the advantage of not needing to be scaled and therefore do not inherit the difficulties of scaling discussed in Section 4.4.1. One difficulty of leverage statistics is that they do not reveal why observations are important; scaled sensitivities and parameter correlation coefficients can be used to gain that insight. In addition, nonlinearity is likely to produce the same types of changes in leverage values that occur for pcc values, although, to the authors' knowledge, this has not been tested.
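Because pcc and leverage depend only on the sensitivities and weights, they can be computed before any regression is run. The Python sketch below assumes diagonal weights, pcc_jk = V_jk/(V_jj V_kk)^(1/2) with V = (X^T ω X)^(-1) (the calculated error variance that multiplies this matrix cancels in the ratio), and leverage h_ii = ω_i x_i (X^T ω X)^(-1) x_i^T, where x_i is row i of X. These are the standard weighted least-squares expressions; compare them with the definitions in Sections 4.3.5 and 4.3.6 before relying on them. The sensitivities come from a hypothetical one-dimensional aquifer (length L = 10, transmissivity K = 2, recharge R = 1, consistent units) with three head observations, whose sensitivities all depend on R/K, and one flow observation Q = RL.

```python
import math

# Hypothetical sketch: fit-independent pcc and leverage from a sensitivity
# matrix X and diagonal weights w, for a toy aquifer where heads depend only on R/K.


def invert(m, rel_tol=1e-12):
    """Gauss-Jordan inverse with partial pivoting; ValueError if (near) singular."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    scale = max(abs(v) for row in m for v in row)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) <= rel_tol * scale:
            raise ValueError("X^T w X is singular: parameters are perfectly correlated")
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def xtwx(x, w):
    """X^T w X for diagonal weights w."""
    n_par = len(x[0])
    return [[sum(wi * row[j] * row[k] for wi, row in zip(w, x)) for k in range(n_par)]
            for j in range(n_par)]


def pcc(x, w):
    v = invert(xtwx(x, w))
    n = len(v)
    return [[v[j][k] / math.sqrt(v[j][j] * v[k][k]) for k in range(n)] for j in range(n)]


def leverage(x, w):
    v = invert(xtwx(x, w))
    n = len(v)
    return [wi * sum(row[j] * v[j][k] * row[k] for j in range(n) for k in range(n))
            for wi, row in zip(w, x)]


# Toy model: h(x) = (R/K)(L x - x^2/2) and Q = R L; parameter order is (R, K)
L, K, R = 10.0, 2.0, 1.0
heads_x = []
for xo in (2.0, 5.0, 8.0):
    g = L * xo - xo * xo / 2.0
    heads_x.append([g / K, -R * g / K ** 2])     # dh/dR, dh/dK
flow_x = [[L, 0.0]]                              # dQ/dR, dQ/dK
x_all, w_all = heads_x + flow_x, [1.0, 1.0, 1.0, 1.0]

try:
    pcc(heads_x, [1.0, 1.0, 1.0])
except ValueError as err:
    print("heads only:", err)
print("heads + flow: pcc(R, K) =", round(pcc(x_all, w_all)[0][1], 3))
print("leverage:", [round(h, 3) for h in leverage(x_all, w_all)])
```

In this toy problem the flow observation is the only one that can separate R from K, so its leverage equals 1 and removing it makes X^T ω X singular. The leverages sum to the number of parameters, as they do for any full-rank weighted least-squares problem.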
4.5 EXERCISES
Exercise 4.1: Sensitivity Analysis for the Steady-State Model with Starting Parameter Values
In this exercise, sensitivities, parameter correlations, and leverage statistics for the steady-state flow system described in Chapter 2, Section 2.2 are calculated and evaluated.
(a) Calculate sensitivities for the steady-state flow system. This exercise involves modifying computer files and simulating the system. Instructions are available from the web site for this book described in Chapter 1, Section 1.1. Students who are not performing the simulations may skip this exercise.
(b) Use dimensionless and composite scaled sensitivities (dss and css) to evaluate observations and defined parameters. Dimensionless and composite scaled sensitivities are presented in Table 4.1. These statistics are discussed in Sections 4.3.3 and 4.3.4, Guideline 3 in Chapter 11, and Guideline 11 in Chapter 13. Plotting the composite scaled sensitivities on a bar graph as shown in Figure 4.3 is an effective method for showing how much information the observations likely provide for each parameter.
Problem:
• Use the dimensionless scaled sensitivities of Table 4.1 and the discussion in Section 4.3 to identify which observations are most important to estimation of parameter HK_1. Use information about the flow system to explain why the dss for observations hd01.ss, hd07.ss, and flow01.ss are much smaller than the dss for the other observations.
• Use the composite scaled sensitivities of Table 4.1 and Figure 4.3 and the discussion in Section 4.3 to assess whether it is likely that all of the parameters for this model can be estimated with the available head and flow observations.
(c) Evaluate parameter correlation coefficients (pcc) to assess parameter uniqueness. Use the parameter correlation coefficients shown in Tables 4.2 and 4.3 and the criterion presented in Section 4.3.5 to identify parameter values that might be difficult to estimate uniquely with the 10 head and one flow observations.
In these tables, results calculated by (a) MODFLOW-2000, with the more accurate sensitivity-equation sensitivities, and (b) UCODE_2005, using less accurate perturbation sensitivities, are presented to show the effects of sensitivity inaccuracy.
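The effect of sensitivity inaccuracy can be illustrated without a groundwater model. The Python sketch below compares forward- and central-difference perturbation sensitivities with the exact derivative of a hypothetical simulated value y(T) = a/T, which is nonlinear in T like the Thiem equation used in Chapter 5. The perturbation is 1 percent of the parameter value, and the optional rounding mimics a process model or instruction file that passes on only a few significant digits. The perturbation size and digit count are illustrative assumptions, not UCODE_2005 defaults.

```python
# Hypothetical sketch: accuracy of perturbation sensitivities for y(T) = a / T.


def simulated(t, a=1.0, digits=None):
    """Simulated value; optionally rounded to `digits` significant digits."""
    y = a / t
    if digits is not None:
        y = float(f"{y:.{digits}g}")
    return y


def forward_difference(t, frac, digits=None):
    dt = frac * t
    return (simulated(t + dt, digits=digits) - simulated(t, digits=digits)) / dt


def central_difference(t, frac, digits=None):
    dt = frac * t
    return (simulated(t + dt, digits=digits) - simulated(t - dt, digits=digits)) / (2.0 * dt)


t, frac = 0.01, 0.01
exact = -1.0 / t ** 2        # analytical dy/dT, the role played by sensitivity-equation results
err_fwd = abs(forward_difference(t, frac) - exact)
err_cen = abs(central_difference(t, frac) - exact)
err_cen_4 = abs(central_difference(t, frac, digits=4) - exact)
print(f"exact dy/dT = {exact:.1f}")
print(f"forward-difference error: {err_fwd:.2f}")
print(f"central-difference error: {err_cen:.2f}")
print(f"central-difference error, 4-digit output: {err_cen_4:.2f}")
```

Under these assumptions the central difference is roughly 100 times more accurate than the forward difference, but rounding the simulated values to four significant digits erases most of that advantage. This is the kind of precision loss that, as discussed in Section 4.4.2, can make pcc inaccurate.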
TABLE 4.1 Dimensionless and Composite Scaled Sensitivities Calculated for Exercise 4.1a Using MODFLOW-2000 (Values Calculated by UCODE_2005 Are Similar)
                                  Parameter Labels^a
Number  Observation ID  HK_1          K_RB          VK_CB         HK_2          RCH_1   RCH_2
1       hd01.ss         0.110×10^-4   -0.225        0.105×10^-6   0.383×10^-5   0.150   0.0749
2       hd02.ss         -33.3         -0.225        -0.284        -5.47         24.0    15.3
3       hd03.ss         -57.9         -0.225        -0.493        -15.7         38.3    35.9
4       hd04.ss         -33.3         -0.225        -0.284        -5.47         24.0    15.3
5       hd05.ss         -46.5         -0.225        -0.394        -9.95         32.9    24.1
6       hd06.ss         -33.4         -0.225        -0.635        -5.35         24.0    15.6
7       hd07.ss         -2.34         -0.225        -2.38         2.08          1.82    1.04
8       hd08.ss         -57.5         -0.225        -0.133        -16.0         37.8    36.1
9       hd09.ss         -66.6         -0.225        -0.580×10^-1  -23.3         38.1    52.1
10      hd10.ss         -46.3         -0.225        -0.330        -10.1         32.6    24.4
11      flow01.ss       -0.547×10^-3  -0.663×10^-4  -0.260×10^-5  -0.190×10^-3  -7.36   -3.68
Composite scaled sensitivity
                        41.3          0.214         0.783         11.0          27.4    25.6
^a See Table 3.1 for parameter label definitions.
FIGURE 4.3 Composite scaled sensitivities (css) for the steady-state simulation calculated using starting parameter values.
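As a check on Table 4.1, the composite scaled sensitivities can be recomputed from the tabulated dss. The Python sketch below assumes the definition css_j = [Σ_i (dss_ij)^2 / ND]^(1/2) from Section 4.3.4, with ND = 11 and no prior information, and prints a crude text version of the bar graph in Figure 4.3. Because the tabulated dss are rounded to three significant digits, the recomputed values agree with the table only to within a fraction of a percent.

```python
import math

# dss from Table 4.1, in observation order hd01.ss ... hd10.ss, flow01.ss
DSS = {
    "HK_1":  [0.110e-4, -33.3, -57.9, -33.3, -46.5, -33.4, -2.34, -57.5, -66.6, -46.3, -0.547e-3],
    "K_RB":  [-0.225] * 10 + [-0.663e-4],
    "VK_CB": [0.105e-6, -0.284, -0.493, -0.284, -0.394, -0.635, -2.38, -0.133, -0.580e-1,
              -0.330, -0.260e-5],
    "HK_2":  [0.383e-5, -5.47, -15.7, -5.47, -9.95, -5.35, 2.08, -16.0, -23.3, -10.1, -0.190e-3],
    "RCH_1": [0.150, 24.0, 38.3, 24.0, 32.9, 24.0, 1.82, 37.8, 38.1, 32.6, -7.36],
    "RCH_2": [0.0749, 15.3, 35.9, 15.3, 24.1, 15.6, 1.04, 36.1, 52.1, 24.4, -3.68],
}
TABLE_CSS = {"HK_1": 41.3, "K_RB": 0.214, "VK_CB": 0.783, "HK_2": 11.0, "RCH_1": 27.4, "RCH_2": 25.6}


def css(dss_column):
    """Composite scaled sensitivity: root-mean-square of the dss for one parameter."""
    return math.sqrt(sum(d * d for d in dss_column) / len(dss_column))


computed = {name: css(col) for name, col in DSS.items()}
largest = max(computed.values())
for name, value in sorted(computed.items(), key=lambda item: -item[1]):
    bar = "#" * max(1, round(40 * value / largest))      # text stand-in for Figure 4.3
    print(f"{name:6s} {value:8.3f} (table {TABLE_CSS[name]:6.3f}) {bar}")
```

Squaring removes the signs, so the css ranking is unaffected by whether a parameter raises or lowers the simulated values.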
TABLE 4.2 Parameter Correlation Coefficient (pcc) Matrix^a Calculated by (a) MODFLOW-2000 and (b) UCODE_2005 with Central-Difference Perturbation, Using the Starting Parameter Values for the Steady-State Problem with 10 Hydraulic-Head Observations and One Streamflow Gain Observation
(a) MODFLOW-2000
        HK_1    K_RB    VK_CB   HK_2    RCH_1   RCH_2
HK_1    1.00                            Symmetric
K_RB    -0.37   1.00
VK_CB   -0.57   -0.11   1.00
HK_2    -0.75   0.31    0.82    1.00
RCH_1   0.95    -0.22   -0.68   -0.83   1.00
RCH_2   -0.63   0.25    0.81    0.98    -0.76   1.00
(b) UCODE_2005
        HK_1    K_RB    VK_CB   HK_2    RCH_1   RCH_2
HK_1    1.00                            Symmetric
K_RB    -0.39   1.00
VK_CB   -0.57   -0.10   1.00
HK_2    -0.76   0.32    0.82    1.00
RCH_1   0.95    -0.24   -0.68   -0.83   1.00
RCH_2   -0.63   0.25    0.81    0.98    -0.76   1.00
^a Cells are in bold type for pcc ≥ 0.95.
TABLE 4.3 Parameter Correlation Coefficient (pcc) Matrix^a Calculated by (a) MODFLOW-2000 and (b) UCODE_2005 for Starting Parameter Values for the Steady-State Problem Using Only the 10 Hydraulic-Head Observations
(a) MODFLOW-2000^b
        HK_1    K_RB    VK_CB   HK_2    RCH_1   RCH_2
HK_1    1.00                            Symmetric
K_RB    1.00    1.00
VK_CB   1.00    1.00    1.00
HK_2    1.00    1.00    1.00    1.00
RCH_1   1.00    1.00    1.00    1.00    1.00
RCH_2   1.00    1.00    1.00    1.00    1.00    1.00
(b) UCODE_2005^c
        HK_1    K_RB    VK_CB   HK_2    RCH_1   RCH_2
HK_1    1.00                            Symmetric
K_RB    0.97    1.00
VK_CB   1.00    0.97    1.00
HK_2    1.00    0.97    1.00    1.00
RCH_1   1.00    0.97    1.00    1.00    1.00
RCH_2   1.00    0.97    1.00    1.00    1.00    1.00
^a Cells are in bold type for pcc ≥ 0.95.
^b The correct values of 1.00 calculated by MODFLOW-2000 use the more accurate sensitivity-equation sensitivities.
^c The incorrect values calculated by UCODE_2005 are caused by the less accurate central-difference perturbation sensitivities.
Problem:
• When the flow observation is included, are any of the pcc calculated by MODFLOW-2000 (Table 4.2a) above 0.90? Above 0.95? Although parameters with pcc values up to nearly 1.00 can probably be estimated uniquely, it is useful to be aware of which parameters have these relatively high correlations.
• Based on the comments above, what do the pcc values indicate about the likelihood of being able to estimate all of the parameters independently using the head and flow data?
• When only hydraulic-head observations are included, why are all the parameters extremely correlated, as indicated by the results from MODFLOW-2000 (Table 4.3a)?
• Why are the correlation coefficients calculated by UCODE_2005 unable to capture fully the extreme correlation of all parameters when using only hydraulic-head observations (Table 4.3b)?
(d) Use contour maps of one-percent scaled sensitivities for the steady-state flow system. Contour maps of one-percent scaled sensitivities for the steady-state system, calculated for the starting parameter values, are shown in Figure 4.4. Each map is related to one parameter and can be used to identify areas with relatively large and small absolute values of scaled sensitivity. Areas with large absolute values indicate where hydraulic-head measurements are likely to be most important for estimating the parameter. Because the sensitivities are scaled by the parameter values and all sensitivities are for the same observation type, the one-percent scaled sensitivity maps also can be compared with each other.
FIGURE 4.4 Contour maps of one-percent scaled sensitivities of hydraulic head for the steady-state model, calculated using Eq. (4.7), where y' is hydraulic head evaluated at each cell in the model grid and b is one of the six model parameters: (a) HK_1, (b) HK_2, (c) K_RB, (d) VK_CB, (e) RCH_1, and (f) RCH_2. The sensitivities are calculated using the starting parameter values. Contour labels apply to sensitivities in both layers for all maps except that for VK_CB.
The simple system considered here provides the opportunity to (1) identify how sensitivities reflect system dynamics and (2) demonstrate the utility of sensitivity analysis and the role sensitivities play in regression. This exercise can form a frame of reference for considering sensitivities calculated in more complicated systems.
Problem: Explain the one-percent sensitivity maps from the steady-state system (Figure 4.4), basing your analysis on characteristics of fluxes into, out of, and within the flow system. The cell-by-cell fluxes and the boundary fluxes of the steady-state flow system along a cross section perpendicular to the river (along a row) are shown in Figure 4.5. The steady-state flow system is two-dimensional because all features are the same for any row; thus, all information about the system is portrayed in a cross section along any row of the model. The total fluxes through the entire model are obtained by multiplying the values in Figure 4.5 by 18, which is the number of rows. In explaining the sensitivities, answer the following questions:
• Why are the one-percent scaled sensitivities negative for hydraulic-conductivity parameters HK_1 and HK_2, and positive for the recharge parameters RCH_1 and RCH_2?
• Why are the magnitudes of the one-percent scaled sensitivities larger for HK_1 than for HK_2?
• Why do the one-percent scaled sensitivities for RCH_1 vary only over the left half of the system, whereas those for RCH_2 vary over the entire domain?
• Why is VK_CB the only parameter for which there are substantial differences in the one-percent scaled sensitivities for model layers 1 and 2?
• Why are the one-percent scaled sensitivities for K_RB the same throughout the system?
FIGURE 4.5 Cell-by-cell fluxes, in m³/s, along any model row of the steady-state flow system with the true parameter values.
(e) Evaluate leverage statistics. For the initial model, four observations have leverage statistics that are larger than 0.90:
flow01.ss   1.00
hd01.ss     0.99
hd07.ss     0.97
hd09.ss     0.94
Problem: Leverage statistics reflect the combined effects of sensitivity and correlation. Use the leverage statistics, the discussions of Sections 4.3 and 4.4, and Tables 4.1 and 4.2 to address the following questions.
• For each of the high-leverage observations, which parameters have the largest dimensionless scaled sensitivities?
• Evaluate whether the high-leverage observations are dominated by sensitivity or correlation considerations. Consider the system dynamics that contribute to the importance of each observation.
5 ESTIMATING PARAMETER VALUES
As part of model calibration, it is often useful to determine parameter values that produce the smallest possible value of the objective function in Eq. (3.1) or (3.2). The process of calculating such parameter values is called optimization. If more than one set of parameter values produces the same smallest objective-function value, the problem has multiple minima; if only one set of parameter values produces the smallest objective-function value, the resulting parameter values define a unique minimum. If the optimization problem has a unique minimum and the objective function is smooth enough, as in Figure 3.1b, c, optimization methods that use calculated sensitivities are very advantageous because they are computationally efficient. These are called gradient methods because they generally use the gradient of the objective-function surface to determine how to proceed toward the minimum. They are also called regression methods. Nonlinear regression, instead of the simpler linear regression, is needed when simulated values are nonlinear functions of the parameters being estimated. This is common in groundwater models, as discussed in Chapter 1, Section 1.4.1. Model nonlinearity produces important complications to regression and has been the topic of considerable investigation in several fields. Seber and Wild (1989) and Dennis and Schnabel (1996) are excellent upper-level texts on nonlinear regression. The discussion in this book is the most accessible nonlinear regression presentation known to the authors.
This book uses a modified Gauss–Newton nonlinear regression method. The method uses an iterative form of standard linear-regression equations and works well only with modifications. This chapter describes the difficulties of the method and most of the modifications used by MODFLOW-2000 and UCODE_2005. Appendix B describes additional modifications, including quasi-Newton updating. The modified Gauss–Newton method presented here is an extension of the method presented by Cooley and Naff (1990, Chap. 3), which is similar to methods presented by Seber and Wild (1989), Sun (1994), Tarantola (2005), Dennis and Schnabel (1996), and other texts on nonlinear regression. The modified Gauss–Newton method presented in this book also can be categorized as a Marquardt or Levenberg–Marquardt method. The approach forms the basis of most multicriteria gradient optimization methods (Ehrgott, 2000).
The modified Gauss–Newton method presented here has performed well relative to alternatives in that fewer or an equivalent number of total model evaluations are required and it is at least as robust as the alternatives. Cooley (1985), Hill (1990), and Cooley and Hill (1992) compare the modified Gauss–Newton method to quasi-linearization, quasi-Newton, Fletcher–Reeves, a combined Fletcher–Reeves/quasi-Newton, a modified Gauss–Newton/full-Newton hybrid, and the modified Gauss–Newton method with the quasi-Newton updating described in Appendix B. They considered problems of steady-state and transient groundwater flow in which relatively few parameters are estimated. Results presented by Mehl and Hill (2003) suggest that the double-dogleg trust region approach of Dennis and Schnabel (1996) can substantially reduce execution times for difficult problems. This method is available in UCODE_2005; it is not described in this book.
Effective Groundwater Model Calibration: With Analysis of Data, Sensitivities, Predictions, and Uncertainty. By Mary C. Hill and Claire R. Tiedeman Published 2007 by John Wiley & Sons, Inc.
5.1 THE MODIFIED GAUSS–NEWTON GRADIENT METHOD
Parameter values that minimize the least-squares objective function (Eqs. (3.1) and (3.2) in Chapter 3) are calculated using normal equations. Section 5.1.1 presents the normal equations for the modified Gauss–Newton method used in this work and uses a one-parameter problem to illustrate aspects of the method. Section 5.1.2 presents a two-parameter example problem that demonstrates the iterations required to solve nonlinear regression using the normal equations. Finally, Section 5.1.3 discusses the convergence criteria that govern when to stop the iterative process.
5.1.1 Normal Equations
Normal equations are derived by taking the derivative of the objective function with respect to the parameters and setting the derivative equal to zero. By using Eq. (3.2), this becomes
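$$\frac{\partial}{\partial \mathbf{b}} \left\{ [\mathbf{y} - \mathbf{y}'(\mathbf{b})]^{T} \boldsymbol{\omega} [\mathbf{y} - \mathbf{y}'(\mathbf{b})] \right\} = \mathbf{0} \qquad (5.1)$$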
where 0 is a vector of NP values that all equal zero, and NP is the number of estimated parameters.
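For a model that is linear in its parameters, y'(b) = Xb with constant X, the derivative in Eq. (5.1) is −2X^T ω [y − Xb], so setting it to zero gives (X^T ω X) b = X^T ω y, which is solved once. The Python sketch below uses a hypothetical straight-line model, four hypothetical observations, and diagonal weights; it solves these equations and confirms that X^T ω [y − Xb] vanishes at the solution. It is included only to show what Eq. (5.1) yields in the linear case; the nonlinear case is developed in the following paragraphs.

```python
# Hypothetical sketch: normal equations for a model linear in its parameters,
# y'(b) = X b, with diagonal weights w: (X^T w X) b = X^T w y.


def solve(a, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(a)
    m = [list(row) + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    b = [0.0] * n
    for r in reversed(range(n)):
        b[r] = (m[r][n] - sum(m[r][k] * b[k] for k in range(r + 1, n))) / m[r][r]
    return b


def weighted_normal_equations(x, w, y):
    """Return X^T w X and X^T w y."""
    n_obs, n_par = len(x), len(x[0])
    xtwx = [[sum(w[i] * x[i][j] * x[i][k] for i in range(n_obs)) for k in range(n_par)]
            for j in range(n_par)]
    xtwy = [sum(w[i] * x[i][j] * y[i] for i in range(n_obs)) for j in range(n_par)]
    return xtwx, xtwy


def gradient_terms(x, w, y, b):
    """X^T w (y - X b); zero at the weighted least-squares solution."""
    resid = [yi - sum(xij * bj for xij, bj in zip(row, b)) for row, yi in zip(x, y)]
    return [sum(w[i] * x[i][j] * resid[i] for i in range(len(x))) for j in range(len(b))]


# Hypothetical straight-line model y' = b1 + b2 * t observed at four times
t_obs = [0.0, 1.0, 2.0, 3.0]
x = [[1.0, ti] for ti in t_obs]     # sensitivities dy'_i/db_j (constant: the model is linear)
w = [1.0, 4.0, 1.0, 0.25]           # diagonal of the weight matrix
y = [2.1, 2.4, 3.05, 3.45]          # hypothetical observations
b = solve(*weighted_normal_equations(x, w, y))
print("estimates:", b)
print("gradient terms at the estimates:", gradient_terms(x, w, y, b))
```

Because X does not depend on b here, one solution of the normal equations is final; the Gauss–Newton method described next repeats an analogous solve with X recalculated at each new set of parameter values.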
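When y'(b) is nonlinear, Eq. (5.1) is solved by approximating y'(b) as a linear function using two terms of a Taylor series expansion, so that
$$\mathbf{y}'(\mathbf{b}) \cong \mathbf{y}^{\ell}(\mathbf{b}) = \mathbf{y}'(\mathbf{b}_0) + \left.\frac{\partial \mathbf{y}'(\mathbf{b})}{\partial \mathbf{b}}\right|_{\mathbf{b}=\mathbf{b}_0} (\mathbf{b} - \mathbf{b}_0) \qquad (5.2a)$$
where
$\mathbf{y}^{\ell}(\mathbf{b})$ = the linearized form of $\mathbf{y}'(\mathbf{b})$;
$\mathbf{b}_0$ = the vector of parameter values about which $\mathbf{y}'(\mathbf{b})$ is linearized;
$\left.\partial \mathbf{y}'(\mathbf{b})/\partial \mathbf{b}\right|_{\mathbf{b}=\mathbf{b}_0}$ = the sensitivity matrix calculated using the parameter values listed in vector $\mathbf{b}_0$.
The vector y'(b) has ND + NPR elements, where ND is the number of observations and NPR is the number of prior information equations; the vector b has NP elements, where NP is the number of estimated parameters. If the sensitivities are expressed as the matrix X, Eq. (5.2a) can be written
$$\mathbf{y}^{\ell}(\mathbf{b}) = \mathbf{y}'(\mathbf{b}_0) + \mathbf{X}\big|_{\mathbf{b}=\mathbf{b}_0} (\mathbf{b} - \mathbf{b}_0) \qquad (5.2b)$$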
where X is the sensitivity matrix (also called the Jacobian matrix), with elements equal to $\partial y'_i/\partial b_j$. X has ND + NPR rows and NP columns, so i = 1, ND + NPR and j = 1, NP, as shown in Appendix B. To understand what linearizing y'(b) means, it is useful to consider a model that has only one parameter. Here we consider the Thiem equation, which describes the shape of a steady-state cone of depression around a pumping well given a homogeneous groundwater system and a constant head at radial distance r_0. Figure 5.1a shows the nonlinear function linearized about b_0 = 0.005, which is a starting guess for the value of the transmissivity parameter (T). At b_0, the linearized approximation equals the nonlinear function; away from b_0 the linearized approximation generally differs from the nonlinear function. The function can be represented by the notation y'(b), where b is not bold because there is only one parameter and y is not bold because it represents the function, not a vector of values simulated for a set of observations. The Gauss–Newton normal equations are developed by substituting the linearized approximation of y'(b) into the objective function. Using the expression of the least-squares objective function from Eq. (3.2) gives
$$S^{\ell}(\mathbf{b}) = [\mathbf{y} - \mathbf{y}^{\ell}(\mathbf{b})]^{T} \boldsymbol{\omega} [\mathbf{y} - \mathbf{y}^{\ell}(\mathbf{b})] \qquad (5.3)$$
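Repeatedly minimizing this linearized objective function can be sketched for the one-parameter Thiem problem of Figure 5.1. For one parameter, the minimum of Eq. (5.3) has a closed form: the parameter change is d = Σ ω_i x_i [y_i − y'_i(b)] / Σ ω_i x_i^2, the one-parameter form of the Gauss–Newton normal equations, Eq. (5.4). The Python sketch uses the form of the Thiem equation given in the caption of Figure 5.1 and assumes unit weights, a hypothetical true transmissivity of 0.01 used to generate noise-free observations (the value behind Figure 5.1 is not stated in this section), and none of the scaling, Marquardt, or damping modifications described later in this chapter.

```python
import math

# Hypothetical sketch: basic (unmodified) Gauss-Newton iterations for the
# one-parameter Thiem problem, s = [Q / (2 pi T)] ln(r / r0).

Q, R0 = 1.0, 1000.0
RADII = [1, 2, 4, 6, 10, 40, 80, 120, 200, 300, 450, 600]
T_TRUE = 0.01          # assumption: used only to generate noise-free "observations"


def thiem(t, r):
    return Q / (2.0 * math.pi * t) * math.log(r / R0)


def sensitivity(t, r):
    """ds/dT: the single column of the sensitivity matrix X."""
    return -Q / (2.0 * math.pi * t ** 2) * math.log(r / R0)


def gauss_newton(t0, obs, iterations=6):
    """Return the starting value followed by the value after each update."""
    history = [t0]
    t = t0
    for _ in range(iterations):
        x = [sensitivity(t, r) for r in RADII]
        resid = [y - thiem(t, r) for y, r in zip(obs, RADII)]
        # One-parameter Eq. (5.4) with unit weights: (x^T x) d = x^T (y - y'(b))
        d = sum(xi * ei for xi, ei in zip(x, resid)) / sum(xi * xi for xi in x)
        t += d
        history.append(t)
    return history


obs = [thiem(T_TRUE, r) for r in RADII]
for k, t in enumerate(gauss_newton(0.005, obs)):
    print(f"after {k} updates: T = {t:.7f}")
```

Under these assumptions each update equals 2T − T^2/T_TRUE, so the first update moves T from 0.005 to 0.0075 (which need not match Figure 5.1) and later updates converge quickly. The same expression shows that a starting value at or above twice T_TRUE produces a zero or negative transmissivity, an example of the overshooting that motivates the modifications discussed later in this chapter.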
Again using the one-parameter problem to understand what this equation represents, Figure 5.1b shows the shapes of the least-squares objective function calculated using the nonlinear model and the model linearized about b_0. The figure shows that the linearized objective function reaches a minimum value at T = 0.007, which is closer to the minimum of the nonlinear objective function than is the starting guess of T = 0.005. Starting at b_0, the Gauss–Newton method uses the objective function formed using the linear model to determine how the parameter value should be changed. Ideally, moving from b_0 to the minimum of the linearized objective function will result in an estimated parameter value that is closer to the minimum of the nonlinear objective function. This is indeed the case for the objective functions shown in Figure 5.1b.
FIGURE 5.1 (a) Nonlinear model y'(b) and linearized approximation of y'(b), linearized about point T = b = b_0 = 0.005. (b) The objective function calculated using the nonlinear function y'(b) and the linearized objective function calculated using the linear approximation of y'(b). The nonlinear model is the Thiem equation, s = [Q/(2πT)] ln(r/r_0), with Q (pumpage) = 1, r_0 (distance of zero drawdown) = 1000, and generated "observations" at r (distance to the observation well) = 1, 2, 4, 6, 10, 40, 80, 120, 200, 300, 450, 600. No noise was added to the "observations".
One difference between linear and nonlinear regression is that in linear regression, parameter values are estimated by solving the normal equations once. In contrast, nonlinear regression is iterative: a sequence of parameter updates is calculated, solving linearized normal equations once for each update. Thus, in nonlinear regression there are parameter-estimation iterations. The iterative form of the normal equations needed to solve nonlinear regression problems is produced by minimizing the objective function of Eq. (5.3). This is accomplished by taking the derivative with respect to the parameter values and setting it to zero. In addition, the subscript 0 (shown in Eq. (5.2b)) is replaced by r, which identifies the parameter-estimation iteration. The resulting Gauss–Newton nonlinear regression normal equations are
$$(\mathbf{X}_r^{T} \boldsymbol{\omega} \mathbf{X}_r)\, \mathbf{d}_r = \mathbf{X}_r^{T} \boldsymbol{\omega} [\mathbf{y} - \mathbf{y}'(\mathbf{b}_r)] \qquad (5.4)$$
where r ¼ the parameter-estimation iteration number; Xr ¼ the sensitivity matrix calculated for the parameter values in br;
5.1 THE MODIFIED GAUSS–NEWTON GRADIENT METHOD
71
$\omega$ = the weight matrix; $(X_r^T \omega X_r)$ = a symmetric, square matrix of dimension NP by NP (as noted in Chapter 4, Section 4.3.4, $X^T \omega X$ is related to the Fisher information matrix); $d_r$ = an NP-dimensional vector used to update the parameter estimates (called the parameter change vector in this book); and $b_r$ = the vector of parameter estimates at the start of iteration r.

The sensitivity matrix $X_r$ appears in Eq. (5.4) because taking the derivative of Eq. (5.3) produces the sensitivities $\partial y'/\partial b$. When calculated at parameter values $b_r$, these sensitivities can be expressed as the matrix $X_r$. For the first parameter-estimation iteration, the model is linearized about starting parameter values defined by the modeler. In each subsequent iteration, the model is linearized about the parameter values estimated in the previous iteration. For each parameter-estimation iteration, Eq. (5.4) is solved for $d_r$, and $d_r$ is then used to update the parameter values for the start of iteration r + 1 using the equation $b_{r+1} = b_r + d_r$. In practice, a modified form of this equation is used, as described later in this chapter.

Figure 5.2 shows how Eq. (5.4) relates to the geometry of a linearized objective-function surface for a hypothetical two-parameter problem. The right side of Eq. (5.4) is proportional to the gradient of the linearized objective function. Without the $(X_r^T \omega X_r)$ term on the left side of Eq. (5.4), the parameter change vector $d_r$ would point directly down the gradient of the linearized objective-function surface, as shown by arrow A in Figure 5.2. This is called the steepest-descent direction. The $(X_r^T \omega X_r)$ term modifies the direction of $d_r$ to point toward the minimum of the linearized objective-function surface, as shown by arrow B in Figure 5.2.

FIGURE 5.2 A linearized objective-function surface for a hypothetical two-parameter problem, illustrating the geometry of the normal equations. The arrows represent the direction relevant to the parameter change vector $d_r$. Arrow A points down gradient in a direction defined by the right-hand side of Eq. (5.4). Arrow B points in the direction of $d_r$ solved for using Eq. (5.4) or (5.5). Arrow C shows that the direction of $d_r$, solved for using a nonzero Marquardt parameter in Eq. (5.6), is between arrows A and B.

The basic Gauss–Newton method presented in Eq. (5.4) is prone to difficulties such as oscillations caused by overshooting the optimal parameter values. It works well only when modified. Three important modifications are scaling, the Marquardt parameter, and damping. These are discussed in the following paragraphs.

Scaling
Often the parameter values, and thus the sensitivities, differ by many orders of magnitude. This can make it very difficult to obtain an accurate solution of Eq. (5.4). The accuracy of the parameter change vector $d_r$ can be improved by scaling Eq. (5.4). The scaling is implemented as

$$(C^T X_r^T \omega X_r C)\, C^{-1} d_r = C^T X_r^T \omega \,\big(y - y'(b_r)\big) \tag{5.5a}$$
where C is a diagonal scaling matrix with elements $c_{jj} = [(X^T \omega X)_{jj}]^{-1/2}$. The resulting scaled matrix has the smallest possible condition number (Forsythe and Strauss, 1955; Hill, 1990). Scaling with C changes the magnitude but not the direction of $d_r$. Therefore, in Figure 5.2 the parameter change vector $d_r$ still points in the direction of arrow B after scaling has been implemented.

Marquardt Parameter
In some circumstances, the direction of the change vector $d_r$ is nearly parallel to the contours of the objective-function surface, and changing the parameter values using $d_r$ yields little progress toward estimating optimal parameter values. In this case, changing the direction of $d_r$ can be advantageous. The second modification involves introducing a term that moves the direction of $d_r$ toward the steepest-descent direction. The term is called the Marquardt parameter (Marquardt, 1963; Theil, 1963; Seber and Wild, 1989; Cooley and Naff, 1990). In Figure 5.2 a nonzero Marquardt parameter moves the direction of $d_r$ from the direction of arrow B to the direction of arrow C. The Marquardt parameter is included in the scaled normal equations of Eq. (5.5a) as

$$(C^T X_r^T \omega X_r C + I \mu_r)\, C^{-1} d_r = C^T X_r^T \omega \,\big(y - y'(b_r)\big) \tag{5.5b}$$
where I is an NP × NP identity matrix and $\mu_r$ is the Marquardt parameter. The procedure for determining the Marquardt parameter is discussed in the next section.

Damping
Overshoot is a common problem with the Gauss–Newton method, so damping is introduced. Overshoot occurs when the parameter change vector points toward locations on the objective-function surface that are closer to the minimum of the nonlinear objective-function surface, but then extends beyond these locations to larger objective-function values. Damping helps prevent overshoot by allowing the parameters to change less than the full amount calculated by $d_r$, which can significantly improve regression performance. Damping is applied when updating the parameter values using the parameter change vector $d_r$. Including damping in Eq. (5.5b) produces

$$(C^T X_r^T \omega X_r C + I \mu_r)\, C^{-1} d_r = C^T X_r^T \omega \,\big(y - y'(b_r)\big) \tag{5.6a}$$

$$b_{r+1} = \rho_r d_r + b_r \tag{5.6b}$$

where $\rho_r$ is the damping parameter. Together, Eqs. (5.6a) and (5.6b) nearly express the normal equations and the iterative process for the modified Gauss–Newton optimization method used in UCODE_2005 and MODFLOW-2000. Additional modifications include an iteration control mechanism and a quasi-Newton modification, which are discussed in Appendix B. In addition, UCODE_2005 provides a trust-region approach not described here. Calculation of the Marquardt and damping parameters is discussed next.

Calculate the Marquardt Parameter
The Marquardt parameter is used to change the direction of $d_r$ and to shorten it. These modifications improve regression performance for ill-posed problems (Marquardt, 1963). In Eq. (5.6a), $\mu_r$ initially equals 0.0 for each parameter-estimation iteration r. If $d_r$ is nearly orthogonal to the steepest-descent direction, the resulting parameter changes are unlikely to reduce the value of the objective function, and $\mu_r$ is changed to a nonzero value. The modification to the direction and length of $d_r$ caused by $\mu_r > 0.0$ is illustrated in Figure 5.2 as the change from arrow B to arrow C.

In MODFLOW-2000 and UCODE_2005, the value of the Marquardt parameter is determined as suggested by Cooley and Naff (1990, pp. 71–72). If the cosine of the angle between the vector $d_r$ and the steepest-descent direction is less than a threshold value (commonly 0.08, which corresponds to an angle of about 85 degrees), $\mu_r$ is increased using the relation $\mu_r^{\text{new}} = a\,\mu_r^{\text{old}} + b$. Commonly, a = 1.5 and b = 0.001. The user can specify the threshold value for the cosine of the angle, as well as a and b.

PEST handles the Marquardt parameter somewhat differently in that it is applied to the unscaled matrix. In tests conducted by the authors, the results obtained by PEST have been similar to those achieved by MODFLOW-2000 and UCODE_2005. John Doherty (oral communication, 2003), author of PEST, suggested that PEST converged in one fewer parameter-estimation iteration in some circumstances, but the specifics of his numerical experiments are unknown.

Calculate the Damping Parameter
The damping parameter, $\rho_r$, shortens $d_r$ and can vary in value from 0.0 to 1.0.
This parameter modifies all values in the parameter change vector $d_r$ by the same factor. Thus, in vector terminology, the direction of $d_r$ is preserved. For each parameter-estimation iteration, the damping parameter initially equals 1.0 but is changed to a smaller value for either of two reasons:

1. To ensure that the absolute values of fractional parameter value changes are all less than a value specified by the user. This value is the input variable MaxChange of UCODE_2005 and MAX-CHANGE of MODFLOW-2000. In this book, this value is referred to as max-allowed-change.
2. To damp oscillations that occur when elements in $d_r$ and $d_{r-1}$ define opposite directions (Cooley, 1993), implemented as described in Appendix B.

To evaluate whether damping needs to be implemented for reason 1, fractional parameter value changes are calculated for each native parameter value as

$$\frac{\left. b_j^{r+1} \right|_{\rho_r = 1.0} - b_j^r}{|b_j^r|} = \frac{d_j^r}{|b_j^r|}, \qquad j = 1, \ldots, NP \tag{5.7}$$
where $b_j^r$ is the jth element of vector $b_r$, that is, the value of the jth parameter at parameter-estimation iteration r. If $b_j^r$ equals 0.0, 1.0 is used in the denominator. The value of $b_j^{r+1}$ in Eq. (5.7) is calculated using Eq. (5.6b) with $\rho_r = 1.0$; that is, the value is calculated assuming no damping. In this book, the absolute value of the largest fractional parameter value change calculated using Eq. (5.7) is referred to as max-calculated-change. If max-calculated-change is greater than max-allowed-change, $\rho_r$ is calculated as follows unless oscillation concerns (reason 2 above) result in an even smaller value:

$$\rho_r = \frac{\text{max-allowed-change}}{\text{max-calculated-change}} \tag{5.8}$$
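The update described by Eqs. (5.5a) through (5.8) can be pulled together in a short Python sketch using NumPy. It is a minimal illustration under stated assumptions, not the UCODE_2005 or MODFLOW-2000 code: all parameters are untransformed and have nonzero sensitivities (so every $c_{jj}$ exists), the cosine test for the Marquardt parameter is evaluated in the scaled space (an assumption), and the oscillation damping of reason 2, log-transformed parameters, and the other refinements of Appendix B are omitted. The function name `modified_gauss_newton_step` and the small data set are hypothetical.

```python
import numpy as np


def modified_gauss_newton_step(X, w, resid, b, max_allowed_change,
                               cos_threshold=0.08, mu_factor=1.5, mu_increment=0.001):
    """One modified Gauss-Newton update following Eqs. (5.5a)-(5.8).

    X     : sensitivity matrix X_r (NOBS x NP) evaluated at b
    w     : weight matrix omega (NOBS x NOBS)
    resid : residual vector y - y'(b_r)
    b     : current native (untransformed) parameter values b_r
    Returns (b_next, d_r, mu_r, rho_r).
    """
    A = X.T @ w @ X                           # X_r^T omega X_r
    g = X.T @ w @ resid                       # right-hand side of Eq. (5.4)
    C = np.diag(1.0 / np.sqrt(np.diag(A)))    # c_jj = [(X^T omega X)_jj]^(-1/2)
    As, gs = C.T @ A @ C, C.T @ g             # scaled matrix and right-hand side
    if np.linalg.norm(gs) == 0.0:             # already at a stationary point
        return b.copy(), np.zeros_like(b), 0.0, 1.0
    mu = 0.0                                  # mu_r starts at 0.0 each iteration
    for _ in range(1000):                     # safety cap on Marquardt increases
        z = np.linalg.solve(As + mu * np.eye(len(b)), gs)   # z = C^-1 d_r, Eq. (5.5b)
        cos = (z @ gs) / (np.linalg.norm(z) * np.linalg.norm(gs))
        if cos >= cos_threshold:              # d_r not nearly orthogonal to descent
            break
        mu = mu_factor * mu + mu_increment    # mu_new = a * mu_old + b
    d = C @ z                                 # unscaled parameter change vector d_r
    denom = np.where(b == 0.0, 1.0, np.abs(b))   # Eq. (5.7): use 1.0 if b_j = 0
    max_calc = np.max(np.abs(d) / denom)         # max-calculated-change
    rho = 1.0 if max_calc <= max_allowed_change else max_allowed_change / max_calc  # Eq. (5.8)
    return b + rho * d, d, mu, rho            # Eq. (5.6b)


# Hypothetical example: three observations, two parameters.
X = np.array([[1.0, 2.0], [3.0, 1.0], [2.0, 2.0]])
w = np.eye(3)
resid = np.array([0.5, -0.2, 0.3])
b = np.array([1.0, 2.0])
b_next, d, mu, rho = modified_gauss_newton_step(X, w, resid, b, max_allowed_change=10.0)
print(d, mu, rho)   # mu stays 0.0 and rho stays 1.0 for this well-conditioned case
```

For this small, well-conditioned example, multiplying the scaled solution by C reproduces the solution of the unscaled Eq. (5.4); lowering `max_allowed_change` below max-calculated-change shortens the step without changing its direction.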
Following computation of $\rho_r$ by Eq. (5.8), $b_{r+1}$ is calculated by Eq. (5.6b) and contains the parameter values for starting the next parameter-estimation iteration. A somewhat different procedure is used to calculate the damping parameter for model parameters that are log-transformed in the regression. This procedure is described in Section 5.4 and Appendix B.

Typically, max-allowed-change has been the same for all parameters. UCODE_2005 and PEST, however, allow different values of max-allowed-change to be assigned to different parameters. This option is likely to be used to allow insensitive parameters to change more than sensitive parameters, so that the insensitive parameters do not produce tiny damping parameters that restrict updates of sensitive parameters to the point where no progress can be made.

5.1.2
An Example
To understand more clearly how the modified Gauss–Newton method works, consider its performance for the two-parameter model shown in Figure 5.3. The data shown in Figure 5.3a are transient groundwater-level drawdowns caused by pumpage from a single well. The model used is the Theis equation, in which drawdown is a nonlinear function of two parameters: the transmissivity (T) and the storage coefficient (S). Both parameters are estimated. The observations are the drawdowns listed in Figure 5.3a. The nonlinear objective-function surface is shown in Figure 5.3b. Conceptually, this is analogous to the objective function in Figure 5.1b produced using the nonlinear function of Figure 5.1b.

Figure 5.3c and Figure 5.3d show approximations of the objective-function surface produced by linearizing the Theis equation about the parameter values marked by X1 and X2. The problem is linearized by replacing the Theis equation with the first two terms of a Taylor series expansion (Eq. (5.2)) in which $b_0$ contains the parameter values at X1 or X2, and by using this linearized model to replace $y'(b)$ in Eq. (3.2) to obtain Eq. (5.3). As in Figure 5.1b, the linearized objective-function surfaces approximate the nonlinear surface well near $b_0$ and less well farther away.

In Figure 5.3c, the objective function is linearized about a point (X1) far from the minimum (†) of the nonlinear objective function. Moving from this point all the way to the minimum of the linearized objective-function surface (a point to the left of the plot) would overshoot the nonlinear objective-function minimum. As mentioned previously, this is a common problem with unmodified Gauss–Newton methods. Here, proceeding to the minimum of the linearized surface would produce a negative value of transmissivity, which is computationally infeasible. This is a situation in which more advantageous results can be obtained by limiting the parameter value changes using the damping parameter $\rho_r$ of Eq. (5.6b). With damping, the regression moves only part of the way from X1 toward the minimum of the linearized surface.

FIGURE 5.3 Model equation, data, and objective-function surfaces for a nonlinear model. (Example from Cooley and Naff, 1990, p. 66.)
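The linearization described above can be illustrated with a short Python sketch. It evaluates the Theis drawdown using the series expansion of the well function W(u) (adequate only for the small u values used here), approximates the sensitivities to T and S by forward differences (one choice among several), and builds the first-order Taylor model about a point $b_0$. The pumping rate, radius, times, and parameter values are hypothetical SI values chosen for illustration; they are not the Cooley and Naff (1990) data plotted in Figure 5.3, and the function names are likewise hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def well_function(u, tol=1e-16, max_terms=200):
    """Theis well function W(u) from its series expansion (adequate for small u)."""
    total = -EULER_GAMMA - math.log(u)    # raises ValueError if u <= 0
    a = 1.0
    for n in range(1, max_terms + 1):
        a *= u / n                        # a = u**n / n!
        term = (-1) ** (n + 1) * a / n    # (-1)**(n+1) * u**n / (n * n!)
        total += term
        if abs(term) < tol:
            break
    return total


def theis_drawdown(T, S, Q, r, t):
    """Theis drawdown s = Q / (4 pi T) * W(r^2 S / (4 T t))."""
    return Q / (4.0 * math.pi * T) * well_function(r * r * S / (4.0 * T * t))


def linearize(b0, Q, r, times, rel_step=1e-6):
    """First-order Taylor model y(b0) + X (b - b0), with forward-difference sensitivities."""
    T0, S0 = b0
    dT, dS = rel_step * T0, rel_step * S0
    y0 = [theis_drawdown(T0, S0, Q, r, t) for t in times]
    X = [((theis_drawdown(T0 + dT, S0, Q, r, t) - y) / dT,
          (theis_drawdown(T0, S0 + dS, Q, r, t) - y) / dS)
         for t, y in zip(times, y0)]

    def model(b):
        return [y + xT * (b[0] - T0) + xS * (b[1] - S0)
                for y, (xT, xS) in zip(y0, X)]
    return model


# Hypothetical values: Q in m^3/s, r in m, times in s, T in m^2/s, S dimensionless.
Q, r, times = 0.01, 50.0, [600.0, 1800.0, 3600.0, 7200.0]
b0 = (0.005, 1.0e-4)
linear_model = linearize(b0, Q, r, times)


def max_linearization_error(b):
    """Largest difference between the Theis and linearized drawdowns at b."""
    exact = [theis_drawdown(b[0], b[1], Q, r, t) for t in times]
    return max(abs(e - l) for e, l in zip(exact, linear_model(b)))


near, far = (0.00525, 1.05e-4), (0.015, 1.0e-4)
print(max_linearization_error(near), max_linearization_error(far))
print(linear_model((-0.001, 1.0e-4)))  # the linear model still returns numbers for T < 0
```

The linearized model returns drawdowns even for a negative transmissivity, whereas the Theis equation itself is undefined there (u becomes negative and its logarithm does not exist); this mirrors why an undamped step from X1 can produce an infeasible parameter value.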
In Figure 5.3d, the objective function is linearized about a point near the minimum of the nonlinear objective function. In this case, moving to the minimum of the linearized objective function involves small changes in the parameter values, and damping is not needed. Moving to this minimum produces parameter values near the minimum of the nonlinear objective function, which is the goal of the regression. Figure 5.3d also shows that the linearized model closely replicates the objective-function surface near the minimum. This has consequences for the applicability of linear inferential statistics, such as linear confidence intervals, as discussed in Chapter 7, Section 7.5.1 and Chapter 8, Section 8.4.2. The figures of the objective-function surfaces also can be used to better understand nonlinear confidence intervals, as discussed in Chapter 7, Section 7.5.1 and Chapter 8, Section 8.4.3.

5.1.3
Convergence Criteria
Convergence criteria are needed to determine when to stop the modified Gauss–Newton iterative process. In UCODE_2005 and MODFLOW-2000, parameter estimation converges if either of two convergence criteria is satisfied.

By the first criterion, convergence is achieved when the parameter values change only a small amount from one parameter-estimation iteration to the next. This indicates that at the current regression iteration, the parameter values lie in a relatively flat area that is a minimum in the objective-function space. For untransformed parameters, this condition is satisfied if max-calculated-change in Eq. (5.8) is less than a user-specified tolerance (variable TolPar of UCODE_2005 and TOL of MODFLOW-2000). That is, using the UCODE_2005 variable name,

$$\text{max-calculated-change} < \text{TolPar} \qquad \text{for all } j = 1, \ldots, NP \tag{5.9}$$
Preferably, this convergence is achieved in the final calibrated model with a criterion value no larger than 0.01. For log-transformed parameters, a modified form of Eq. (5.9) is used, as described in Section 5.4 and Appendix B. TolPar typically is 0.01 or 0.001 for final regressions, indicating that convergence is reached when parameter values change by no more than 1 or 0.1 percent between parameter-estimation iterations. There are situations in which it is advantageous to specify larger values of TolPar, especially for preliminary regressions. Typically, TolPar has been the same for all parameters. UCODE_2005 and PEST, however, allow different values of TolPar to be assigned to different parameters. This option is likely to be used to allow inclusion of parameters that are too insensitive to achieve the small convergence criteria imposed on most parameters, but not so insensitive that the instabilities are very large. There has been little experience so far with this option.

By the second convergence criterion, the nonlinear regression converges if the model fit changes little over the course of two parameter-estimation iterations. If three consecutive values of the least-squares objective function (Eq. (3.1) or (3.2)) change less than a user-defined amount (TolSOSC of UCODE_2005 and SOSC of MODFLOW-2000), nonlinear regression converges. The model-fit criterion often is useful early in the calibration process to avoid lengthy simulations that fail to improve model fit. However, satisfying this criterion does not provide as strong an indication that a minimum has been reached as satisfying the parameter-value criterion. Therefore, for final regression runs, it is preferable that the parameter-value criterion be satisfied.

As discussed by Cooley and Naff (1990, p. 70), modified Gauss–Newton optimization typically converges within “a number of iterations equal to five or twice the number of parameters, whichever is greater.” Well-conditioned problems (commonly those with large css values and little correlation) tend to converge in fewer iterations than poorly conditioned problems. It is rarely fruitful to increase the number of iterations to more than twice the number of parameters, and the resulting runs can take large amounts of computer time. It is generally more productive to consider alternative models (see Guideline 8, Chapter 11).
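The two convergence tests and the iteration rule of thumb can be written as small Python helpers. This is a sketch with hypothetical function names. One assumption needs flagging: the text does not say whether the TolSOSC-type amount is absolute or fractional, so the model-fit check below treats it as a fractional change relative to the most recent objective-function value.

```python
def parameter_criterion_met(max_calculated_change, tol_par):
    """First criterion, Eq. (5.9): largest fractional parameter change below TolPar."""
    return max_calculated_change < tol_par


def model_fit_criterion_met(sos_history, tol_sosc):
    """Second criterion: three consecutive objective-function values differ by less
    than tol_sosc. ASSUMPTION: the difference is taken relative to the latest value."""
    if len(sos_history) < 3:
        return False
    last_three = sos_history[-3:]
    spread = max(last_three) - min(last_three)
    scale = abs(last_three[-1]) or 1.0     # avoid dividing by zero
    return spread / scale < tol_sosc


def suggested_iteration_limit(num_parameters):
    """Cooley and Naff (1990): five or twice the number of parameters, whichever is greater."""
    return max(5, 2 * num_parameters)


print(parameter_criterion_met(0.004, tol_par=0.01))                 # True
print(model_fit_criterion_met([120.0, 80.3, 80.1, 80.0], 0.01))     # True
print(suggested_iteration_limit(2), suggested_iteration_limit(6))   # 5 12
```

Keeping the two tests separate makes it easy to report which one ended a run, which matters because, as noted above, the parameter-value criterion is the stronger indication that a minimum has been reached.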
5.2
ALTERNATIVE OPTIMIZATION METHODS
Alternative algorithms for minimizing the least-squares objective function with respect to parameter values include methods that use the gradient of the objective function and not the full sensitivity matrix (as used by, e.g., Carrera and Neuman, 1986; Hill, 1992; Xiang et al., 1993; Tarantola, 2005), and global optimization methods such as simulated annealing, genetic algorithms, tabu search, and shuffled complex evolution (SCE) (e.g., Zheng and Wang, 1996; Solomatine et al., 1999; Tsai et al., 2003b; Vrugt et al., 2003; Fazal et al., 2005). For the first set of methods, the steepest-descent direction, which is the negative of the derivative of the objective function with respect to the parameter values, generally is calculated efficiently using adjoint states (Hill, 1992; Townley and Wilson, 1985). Scaled derivatives of the objective function might be able to replace the composite scaled sensitivities in the guidelines, but this has not been tested. There are no replacements for the one-percent and dimensionless scaled sensitivities, the parameter correlations, and the leverage statistics. However, adjoint states themselves can be useful, as discussed by Sykes et al. (1985). In addition, adjoint-state algorithms are often programmed to calculate the sensitivities and the parameter variance–covariance matrix after convergence is reached, to provide analyses that need them. In this case, the methods suggested in this book could be used. Global-search methods operate quite differently from gradient methods such as modified Gauss–Newton. Global-search methods do not use sensitivities. Instead, they proceed to the next set of parameters using a long history of the model fit produced by previous sets of parameters. The methods differ in how the previous sets are used.
The advantage of global-search methods is their ability to identify parameter values that produce the best fit to observed values and prior information regardless of the degree of model nonlinearity and the presence of local minima. The disadvantage is that they are much more computationally intensive, often requiring execution times that are tens or hundreds of times as long as the execution times required by gradient-search methods.
Global-search methods are most useful for problems with very irregular objective-function surfaces that are not amenable to the much more numerically efficient gradient-search methods. For problems with such irregular objective functions, scaled sensitivities, parameter correlation coefficients, and leverage statistics are likely to change dramatically as parameter values change, and thus they are not useful. If the irregularity is local, the methods presented in this book may be useful in part of the solution space. For example, biological processes can be very nonlinear with regard to pH because outside some range of pH the organism dies. Within a certain range, however, and often the range of most interest, the process may be linear enough for the methods presented in this book to be useful. Public-domain programs are available for implementing some common global-search methods. For example, MGO (Zheng and Wang, 2003) provides global-search capabilities using genetic algorithms, simulated annealing, and tabu search.
5.3
MULTIOBJECTIVE OPTIMIZATION
When developing models of many natural systems, data are scarce, and it often is useful for all data relevant to model outputs to be considered simultaneously using a single objective function. Weighting is used to include many kinds of data. This book focuses on this approach to model calibration. Alternatively, as mentioned in Chapter 3, Section 3.2.3, regression can be performed using subsets of the observations and prior information, whereby each subset is used to define a different objective function. This is called multiobjective optimization. A short description of multiobjective optimization can be found at http://www.fp.mcs.anl.gov/otc/Guide/OptWeb/multiobj/. Recent books on these methods include Statnikov and Matusov (1995) and Ehrgott (2000). In multiobjective optimization, trade-offs between the different objective functions are an integral part of the evaluation. The trade-offs are obtained by weighting different objective functions differently. This has consequences for the implied relative accuracy of the data contained in each of the objective functions, and this issue needs to be considered when determining feasible solutions using multiobjective optimization. For example, solutions with weights that result in one set of data dominating or being ignored may be of interest as part of the analysis but generally are not viable solutions. While the regression methods discussed here could be used for any one combination of weights, in recent applications multiobjective optimization has been accomplished with Shuffled Complex Evolution (SCE) (Nunoo and Mrawira, 2004).
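One simple way to see the trade-offs described above is to combine two objective functions with weights w and 1 − w, solve the weighted regression for several values of w, and record each objective function at the optimum. The Python sketch below does this for a hypothetical one-parameter linear model y = b·x and two made-up data groups that imply different slopes; it is a weighted-sum illustration, not SCE and not a method taken from the cited references. Because the model is linear, each weighted regression has a closed-form solution.

```python
# Made-up data for a hypothetical one-parameter model y = b * x; the two groups
# imply different slopes, so no single b fits both perfectly.
x1, y1 = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
x2, y2 = [1.0, 2.0, 3.0], [2.9, 6.1, 9.0]


def sse(b, xs, ys):
    """Sum of squared residuals for y = b * x (unit weights within each group)."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys))


def weighted_optimum(w):
    """b minimizing w * S1(b) + (1 - w) * S2(b); closed form because the model is linear."""
    sxy1 = sum(x * y for x, y in zip(x1, y1))
    sxy2 = sum(x * y for x, y in zip(x2, y2))
    sxx1 = sum(x * x for x in x1)
    sxx2 = sum(x * x for x in x2)
    return (w * sxy1 + (1.0 - w) * sxy2) / (w * sxx1 + (1.0 - w) * sxx2)


tradeoff = []
for w in [0.1, 0.3, 0.5, 0.7, 0.9]:
    b = weighted_optimum(w)
    tradeoff.append((w, b, sse(b, x1, y1), sse(b, x2, y2)))

for w, b, s1, s2 in tradeoff:
    print(f"w = {w:.1f}  b = {b:.3f}  S1 = {s1:.3f}  S2 = {s2:.3f}")
```

As w increases, S1 at the optimum falls and S2 rises, tracing the trade-off between the two data groups.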
5.4
LOG-TRANSFORMED PARAMETERS
Log-transformed parameters are often useful because the uncertainty of many parameters is best represented by a log-normal probability distribution. When the parameter is log-transformed, the uncertainty is then best represented by a normal distribution, which is convenient to use. Log-transforming dependent variables was discussed in Chapter 3, Section 3.3.3, and is not addressed further here.

Log-transformation involves taking the logarithm of selected parameters. Thus, the parameters in vector b of Eq. (3.1) or (3.2) can be either native values or the log-transform of the native values. Log-transforming parameters can produce an inverse problem that converges more easily, and it prevents the native parameter values from becoming negative (Carrera and Neuman, 1986). Log-transformation can be defined using base e or base 10, where base e is also called the natural logarithm. Base 10 is easier for most modelers to use because a log-transformed value of 1 indicates a native value of 10, a log-transformed value of 2 indicates a native value of 100, and so on. Conversion between natural and base 10 logarithms involves a factor of about 2.3 (ln x ≈ 2.303 log10 x). In UCODE_2005 and MODFLOW-2000, the log-transform is implemented internally using natural logarithms (log_e); the input and output use base 10 logarithms as much as possible.

Even when some parameters are log-transformed, allowing modelers to consider native values has the advantage of emphasizing the connection between model results and field data. For example, even for log-transformed parameters, it is useful to define starting parameter values for regression runs as native values and to report final estimates as native values. UCODE_2005 and MODFLOW-2000 are constructed so that the user can consider native values as much as possible. There are four special circumstances, one related to model input and three related to model output, in which the modeler has to deal more directly with log-transformed values.

The one model input situation occurs when there is prior information on the log-transformed parameter. In this case, only one parameter can be included in the prior information equation (one term in the summation presented after Eq. (3.2)).
For UCODE_2005, the specified statistic needs to be related to the base 10 log of the parameter. MODFLOW-2000 can read statistics related to the native value and calculate the statistic related to the log-transformed parameter value. The value of the statistic specified can be determined using the methods described under Guideline 6 in Chapter 11.

The first model output situation is fairly subtle and will not be noticed by most modelers. It involves calculation of the damping parameter and the convergence criteria, which are used to control or measure the change in the parameter values. For native parameter values, Eqs. (5.8) and (5.9) are used. Calculation of these quantities is different for log-transformed parameters and is described in Appendix B.

The second model output situation is that log-transformed parameter estimates, standard deviations, coefficients of variation, and confidence interval limits appear in the MODFLOW-2000 and UCODE_2005 output files along with analogous statistics applicable to the native parameter values. In most circumstances, the user can ignore the statistics related to the log-transformed parameter values and instead use the statistics related to the native values. Related issues are discussed in Chapter 7, Section 7.2.4.

The third model output situation occurs when there is prior information defined for a log-transformed parameter. In this situation, the associated residual, weight, and weighted residual are reported as the natural logarithms of the actual values, because the regression calculations use these values.
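The conversions mentioned in this section are easy to check numerically. The Python sketch below converts between natural and base-10 logarithms, applies a change in ln-space to show that the native value stays positive, and uses the standard chain rule $\partial y/\partial(\ln b) = b\,\partial y/\partial b$ to convert a sensitivity. The helper names are hypothetical, and the sketch does not reproduce the damping and convergence calculations for log-transformed parameters, which are described in Appendix B.

```python
import math

LN10 = math.log(10.0)   # about 2.303, so ln(x) = LN10 * log10(x)


def log10_to_ln(value_log10):
    """Convert a base-10 logarithm to a natural logarithm."""
    return value_log10 * LN10


def ln_to_log10(value_ln):
    """Convert a natural logarithm to a base-10 logarithm."""
    return value_ln / LN10


def update_in_ln_space(native_value, change_ln):
    """Apply a parameter change to ln(b); the returned native value is always positive."""
    return math.exp(math.log(native_value) + change_ln)


def sensitivity_wrt_ln(dy_db, native_value):
    """Chain rule for beta = ln(b): dy/d(beta) = b * dy/db."""
    return native_value * dy_db


print(ln_to_log10(math.log(1000.0)))        # close to 3.0
print(update_in_ln_space(2.0e-4, -5.0))     # small but still positive
print(sensitivity_wrt_ln(3.0, 2.0))         # 6.0
```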
5.5
USE OF LIMITS ON ESTIMATED PARAMETER VALUES
Upper and lower limits on parameters that constrain possible estimated values are commonly available in inverse models and are suggested, for example, by Sun (1994, p. 35). Such limiting constraints on parameter values may appear to be necessary given the unrealistic parameter values that can be estimated through inverse modeling. However, this practice can disguise more fundamental modeling errors, as demonstrated by Poeter and Hill (1997) using a simple synthetic test case and Hill et al. (1998) using a complex synthetic test case. Anderman et al. (1996) show how unrealistic optimized values of recharge in a field problem revealed important model construction inaccuracies. As discussed in Guideline 5 in Chapter 11, unrealistic estimated parameter values are likely to indicate either that (1) the data do not contain enough information to estimate the parameters or (2) there is a more fundamental model error. In the first circumstance, the best response is to use prior information or regularization on the parameter value, which tends to produce an estimate that is close to the most likely value, instead of at the less likely values that generally constitute the imposed upper and lower limits. In the second circumstance, the best response is to find and resolve the error. In the authors’ opinions, the only circumstance in which it is advantageous to use limits on parameter estimates is to prohibit values that would make the process model fail. To prevent the regression from calculating parameter values that would cause the process model to fail, UCODE_2005 supports limits. MODFLOW-2000 does not support limits because the required limits are imposed internally. For example, if a negative value of hydraulic conductivity is calculated by the regression, the value is changed to two orders of magnitude smaller than its starting value.
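The MODFLOW-2000 behavior described in the last sentence amounts to a simple guard on each updated value. The Python sketch below mirrors that description only; the function name is hypothetical, it is not MODFLOW-2000 source code, and it says nothing about how UCODE_2005 applies user-specified limits.

```python
def guard_negative_estimate(new_value, starting_value):
    """Replace a negative estimate with a value two orders of magnitude
    smaller than the parameter's starting value (rule as described in the text)."""
    if new_value < 0.0:
        return starting_value / 100.0
    return new_value


print(guard_negative_estimate(-3.0e-5, 1.0e-4))   # about 1e-06
print(guard_negative_estimate(2.0e-4, 1.0e-4))    # unchanged: 0.0002
```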
5.6
EXERCISES
Exercise 5.1 uses a two-parameter version of the test case to demonstrate the effects of extreme parameter correlation and the performance of the modified Gauss–Newton method, and ends with an exercise asking students to derive the Gauss–Newton equation. In Exercise 5.2 the modified Gauss–Newton method is used to estimate the six parameters of the steady-state model.

Exercise 5.1: Modified Gauss–Newton Method and Application to a Two-Parameter Problem
This exercise involves objective-function surfaces for a two-parameter version of the steady-state model described in Chapter 2, Section 2.2. Objective-function surfaces were discussed in Chapter 3, Section 3.5, and examples were shown in Figure 3.1. Objective-function surfaces constructed for the two combined parameters are used to show the effects of the different types of observations (hydraulic heads and flows) on objective-function surfaces and on nonlinear regression.

The two-parameter version of the model is developed by combining the six defined parameters into two parameters. One of the combined parameters multiplies the hydraulic conductivities of the system and the other multiplies the recharge rates. The combined hydraulic-conductivity parameter is defined so that if the parameter value equals 1.0, all the hydraulic-conductivity values equal their starting values. As the combined parameter value changes, all the hydraulic-conductivity values change proportionately. In UCODE_2005 or PEST, defining the combined hydraulic-conductivity parameter is straightforward. In MODFLOW-2000, the hydraulic conductivities controlled by the K_RB parameter (Table 3.1) cannot be combined with the other hydraulic conductivities, so its value is fixed in the simulations. This does not compromise the analysis, because the observations are much less sensitive to K_RB than to most other parameters. The combined recharge parameter is defined so that if its value equals 1.0, both recharge parameters equal their starting values. As its value changes, the recharge values of zones 1 and 2 change proportionately. Combining the parameters in MODFLOW-2000 and UCODE_2005 is described in more detail in the computer instructions available from the web site described in Chapter 1, Section 1.1.

Once a two-parameter model is constructed, UCODE_2005 or PEST can easily be used to produce data sets for constructing objective-function surfaces. There is no simple method of constructing such data sets with MODFLOW-2000. The objective-function values resulting from the UCODE_2005 and MODFLOW-2000 simulations are nearly identical.
Objective-function surfaces constructed using only hydraulic-head observations, and using hydraulic heads together with the flow observation under different weights, are shown in Figure 5.4.

(a) Assess the relation of objective-function surfaces to parameter correlation coefficients.
The objective-function surfaces from the two-parameter model are used in this exercise to investigate parameter correlations. With hydraulic-head observations alone (Figure 5.4a), the objective-function surface is composed of parallel lines, and no unique minimum exists. In this situation, the two parameters are completely correlated, meaning that the correlation coefficients for all parameter pairs equal positive or negative 1.00 (here, +1.00). Thus, the hydraulic-head data cannot be used to estimate both parameters uniquely. Nonuniqueness would occur for any weighting, any combination of hydraulic-head observations in this system, and any number and configuration of hydraulic-conductivity and recharge parameters, as long as all parameters are estimated. Table 4.3a shows that with six parameters all correlation coefficients equal 1.00.

With the addition of the flow data weighted using a coefficient of variation of 10 percent (Figure 5.4b), which is a reasonable level of precision for such a measurement, the objective-function surface indicates that a minimum exists, but that it covers a fairly broad area. When the flow is weighted using a coefficient of variation of 1 percent (Figure 5.4c), which indicates a generally unachievable level of precision, a clear minimum is apparent in the objective-function surface. Objective-function surfaces that contain a minimum indicate that the parameters are not completely correlated, and at the minimum the parameter correlation coefficients lie between the extreme values of –1.0 and 1.0. As shown in Figure 4.2, the correlation coefficient can be different for different sets of parameter values in nonlinear problems.

FIGURE 5.4 Objective-function surfaces for the two-parameter version of the simple test case, using (a) only head data, (b) head data and flow data weighted using a reasonable coefficient of variation of 10 percent, and (c) head data and flow data weighted using an unrealistically small coefficient of variation of 1 percent. In the regression, parameter K_MULT is log-transformed and parameter RCH_MULT is not. Logarithmic scales are used for both parameters so that the objective-function values can be shown for a wide range of parameter values.

Problem
• Use Darcy’s Law (Eq. (1.1)) to explain why the parameters are completely correlated when only hydraulic-head observations are used. In the equation, equate the recharge parameter, RCH_MULT, to q and the hydraulic-conductivity parameter, K_MULT, to K.
• Why does adding the flow measurement make such a difference in the objective-function surface?
• If adding one observation prevents the parameters from being completely correlated, what effect do you expect any error in that observation to have on the regression results?
• If the lines were all parallel to one of the axes, would the problem be correlation or insensitivity?

(b) Examine the performance of the modified Gauss–Newton method.
Parts of Exercise 5.1b involve modifying computer files and simulating the system. Instructions are available from the web site for this book described in Chapter 1, Section 1.1. Selected results are provided for students not performing the simulations.

First, perform nonlinear regression using the problem with two combined parameters for the situation in which only hydraulic-head observations are used. Perform regression in the four situations listed below. Check whether parameter estimation converged. Plot the progression of the parameter values produced by the modified Gauss–Newton method on the objective-function surface in Figure 5.4a. The parameter values are listed in Table 5.1.

1. Set MAX-CHANGE (see the definition of this variable in Section 5.1.1, preceding Eq. (5.7)) to a large number, such as 10,000, and set the starting parameters to values near those that produce the best fit. The large value of MAX-CHANGE would never be used in practice; here it causes the regression to perform as if there were no damping in the modified Gauss–Newton method.
2. Keep MAX-CHANGE large, and set the starting parameter values to values located in the lower right corner of the objective-function surface of Figure 5.4a, where the surface is relatively flat.
3. Keep the starting values as in run 2, but decrease MAX-CHANGE to 0.5.
4. Keep MAX-CHANGE small, but set the starting parameter values to values near the upper central part of the objective-function surface. Compare estimates achieved in this run to those from run 3.

Second, perform nonlinear regression in the same four situations as described above, but include the flow observation weighted using a coefficient of variation of 10 percent. Plot the progression of the parameter values produced by the modified Gauss–Newton method on the objective-function surface in Figure 5.4b. The parameter values are listed in Table 5.2.

Third, perform nonlinear regression in the same four situations as described above, but include the flow observation and increase its weight by decreasing its coefficient of variation to 1 percent.
Plot the progression of the parameter values produced by the modified Gauss –Newton method on the objective-function surface in Figure 5.4c. The parameter values are listed in Table 5.3. Discuss the following questions related to the regression runs. Problem . Do the regression runs converge to optimal parameter values? How do the estimated parameter values compare among the different regression runs? Explain these results. Explain the difference in the progression of parameter values during these regression runs. . Based on the results shown in Table 5.1, how can parameter correlation be detected if the correlation coefficients are not reliable? Is success of the modified Gauss – Newton method a reliable indicator?
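A small numerical example can show how one flow observation changes the correlation between a hydraulic-conductivity multiplier and a recharge multiplier. The Python sketch below is a hypothetical example and not the model used in this exercise. It assumes a one-dimensional confined aquifer with uniform recharge, a no-flow boundary at x = 0, and a fixed head at x = LENGTH. For that setup, Darcy's Law and continuity give heads that depend only on the ratio RCH_MULT/K_MULT.

The constants LENGTH, R0, and T0, the observation locations, the weights, and the function names are all illustrative assumptions. Correlation coefficients are computed from (X^T w X)^-1, in which the factor s² cancels.

```python
import numpy as np

# Hypothetical 1-D confined aquifer (illustration only, not the Chapter 2 model):
# recharge R = rch_mult * R0, transmissivity T = k_mult * T0, no-flow boundary at
# x = 0, fixed head at x = LENGTH. Darcy's Law plus continuity gives
# h(x) = h0 + (R / (2T)) * (LENGTH**2 - x**2), so heads depend only on R / T,
# whereas the outflow Q = R * LENGTH depends on R alone.
LENGTH, R0, T0 = 1000.0, 1e-8, 1e-3
X_OBS = np.array([0.0, 250.0, 500.0, 750.0])  # hypothetical head-observation locations


def sensitivity_matrix(k_mult, rch_mult, include_flow):
    """Analytic sensitivities of the simulated values to (K_MULT, RCH_MULT)."""
    c = R0 * (LENGTH**2 - X_OBS**2) / (2.0 * T0)
    rows = [[-(rch_mult / k_mult**2) * ci, ci / k_mult] for ci in c]
    if include_flow:
        rows.append([0.0, R0 * LENGTH])  # dQ/dK_MULT = 0, dQ/dRCH_MULT = R0 * LENGTH
    return np.array(rows)


def correlation_matrix(X, weights):
    """Parameter correlation coefficients from (X^T w X)^-1 (s^2 cancels)."""
    cov = np.linalg.inv(X.T @ np.diag(weights) @ X)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


head_weights = np.full(X_OBS.size, 1.0 / 0.1**2)  # assumed head standard deviation 0.1 m

# Heads only: the sensitivity columns are proportional, so X^T w X is singular
# and correlation_matrix cannot be used; the rank shows the problem instead.
X_heads = sensitivity_matrix(1.0, 1.0, include_flow=False)
print("rank with heads only:", np.linalg.matrix_rank(X_heads))

# Add the flow observation, weighted with a coefficient of variation (CV).
flow = R0 * LENGTH
X_flow = sensitivity_matrix(1.0, 1.0, include_flow=True)
for cv in (0.10, 0.01):
    w = np.append(head_weights, 1.0 / (cv * flow) ** 2)
    r = correlation_matrix(X_flow, w)[0, 1]
    print(f"CV = {cv:.0%}: correlation(K_MULT, RCH_MULT) = {r:.3f}")
```

In this made-up case, the correlation with a 10-percent flow observation stays close to 1.0, while a 1-percent flow observation reduces it markedly. This mirrors the broad and the sharp minima described for Figures 5.4b and 5.4c.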
TABLE 5.1 Parameter Values Obtained for the Four Runs Using Only Hydraulic-Head Observations

             Run 1              Run 2                  Run 3              Run 4
Iteration    K_MULT  RCH_MULT   K_MULT     RCH_MULT    K_MULT  RCH_MULT   K_MULT  RCH_MULT
1            1.00    1.00       9.0        0.20        9.0     0.20       9.0     1.0
2            1.09    0.86       1×10^−14   −12         4.5     0.11       4.5     0.74
3            1.06    0.81       2×10^−14   −6.0        2.4     0.056      2.25    0.51
4            1.05    0.81       3×10^−14   −3.0        1.2     0.079      1.3     0.76
5                               4×10^−14   −1.5        0.60    0.12       0.99    0.94
6                               7×10^−14   −0.75       0.32    0.18       1.06    0.82
7                               1×10^−13   −0.37       0.26    0.21       1.03    0.79
8                               2×10^−13   −0.19       0.26    0.20       1.02    0.78
9                               3×10^−13   −0.09       0.26    0.20
10                              8×10^−13   −0.02
Result       Converged          Did not converge       Converged          Converged
TABLE 5.2 Parameter Values Obtained for the Four Runs Using Hydraulic-Head Observations and a Flow Observation Weighted Using a Reasonable Coefficient of Variation Value of 10 percent

             Run 1              Run 2                  Run 3              Run 4
Iteration    K_MULT  RCH_MULT   K_MULT     RCH_MULT    K_MULT  RCH_MULT   K_MULT  RCH_MULT
1            1.0     1.0        9.0        0.20        9.0     0.20       9.0     1.0
2            1.1     0.89       8×10^−13   0.89        4.5     0.22       4.5     1.0
3            1.2     0.89       1×10^−12   0.45        2.25    0.26       2.25    1.0
4            1.2     0.89       2×10^−12   0.22        1.2     0.38       1.1     0.89
5                               4×10^−12   0.11        1.2     0.57       1.2     0.89
6                               6×10^−12   0.056       1.2     0.86       1.2     0.89
7                               1×10^−11   0.028       1.2     0.89
8                               2×10^−11   0.014       1.2     0.89
9                               3×10^−11   0.007
10                              4×10^−11   −0.0002
Result       Converged          Did not converge       Converged          Converged
TABLE 5.3 Parameter Values Obtained for the Four Runs Using Hydraulic-Head Observations and a Flow Observation Weighted Using an Unreasonable Coefficient of Variation Value of 1 percent

             Run 1              Run 2                  Run 3              Run 4
Iteration    K_MULT  RCH_MULT   K_MULT     RCH_MULT    K_MULT  RCH_MULT   K_MULT  RCH_MULT
1            1.0     1.0        9.0        0.20        9.0     0.20       9.0     1.0
2            1.1     0.91       9×10^−13   0.90        4.5     0.22       4.5     1.0
3            1.2     0.91       1×10^−12   0.45        2.25    0.26       2.25    1.0
4            1.2     0.91       2×10^−12   0.23        1.2     0.39       1.1     0.91
5                               4×10^−12   0.11        1.2     0.58       1.2     0.91
6                               7×10^−12   0.057       1.2     0.87       1.2     0.91
7                               1×10^−11   0.028       1.2     0.91
8                               2×10^−11   0.014       1.2     0.91
9                               3×10^−11   0.007
10                              5×10^−11   −0.0002
Result       Converged          Did not converge       Converged          Converged
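(c) Derive the Gauss–Newton normal equations (optional).
Problem: As shown in Eq. (5.4), the unmodified Gauss–Newton equations can be expressed as

$$(\mathbf{X}_r^{T}\boldsymbol{\omega}\mathbf{X}_r)\,\mathbf{d}_r = \mathbf{X}_r^{T}\boldsymbol{\omega}\,\big(\mathbf{y} - \mathbf{y}'(\mathbf{b}_r)\big) \qquad (5.10)$$

Derive this equation by minimizing the objective function of Eq. (3.2) after substituting in the linearized version of y′(b), which is

$$\mathbf{y}'(\mathbf{b}) \approx \mathbf{y}'(\mathbf{b}_r) + \mathbf{X}_r(\mathbf{b} - \mathbf{b}_r) \qquad (5.11)$$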
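To check a derivation numerically, the following Python sketch (assuming NumPy) solves Eq. (5.10) for one unmodified Gauss–Newton step. It omits the refinements of the modified method, such as damping. The function name and the small linear test problem are hypothetical.

For a model that is linear in its parameters, the linearization of Eq. (5.11) is exact. One step from any starting point should therefore land on the weighted least-squares estimate.

```python
import numpy as np


def gauss_newton_step(X, w, y, y_sim):
    """Solve (X^T w X) d = X^T w (y - y_sim) for the parameter-change vector d.

    X     : sensitivity matrix evaluated at the current parameter values b_r
    w     : weight matrix (square, one row per observation)
    y     : observed values
    y_sim : simulated values y'(b_r)
    """
    lhs = X.T @ w @ X
    rhs = X.T @ w @ (y - y_sim)
    return np.linalg.solve(lhs, rhs)


# Hypothetical linear model y' = X b with error-free observations.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
w = np.diag([1.0, 4.0, 1.0])
b_true = np.array([2.0, 0.5])
y = X @ b_true

b_start = np.zeros(2)
b_new = b_start + gauss_newton_step(X, w, y, X @ b_start)
print(b_new)
```

For a nonlinear model, X and y′(b_r) would be recomputed at each new b_r and the step repeated until the changes become small.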
Exercise 5.2: Estimate the Parameters of the Steady-State Model
In this exercise, a range of reasonable values is assigned to each of the six parameters of the steady-state flow system model described in Chapter 2, Section 2.2, and nonlinear regression is used to estimate the parameter values. Nonlinear regression is attempted without and then with prior information on two of the parameters. Parts of this exercise involve modifying computer files and simulating the system. Instructions are available from the web site for this book described in Chapter 1, Section 1.1. Students not performing the simulations can skip those parts of the exercise.
(a) Define a range of reasonable values for each parameter. In MODFLOW-2000 and UCODE_2005, a reasonable range specified by the user is compared with each parameter estimate, as discussed in Section 5.5. This approach allows for a powerful check on likely model accuracy. In this exercise, ranges of reasonable values are defined for each steady-state model parameter. For students performing the simulations, files and instructions on the web site for this book can be used to complete the exercise.
(b) First attempts at estimating parameters by nonlinear regression. In this exercise, first attempt to estimate the native values of all parameters, and then attempt to estimate the native values of the recharge parameters and the log-transformed values of the hydraulic-conductivity parameters. As discussed in Section 5.4, advantages of estimating the log of some parameter values instead of the native values are that (1) convergence problems can sometimes be alleviated, and (2) estimating the log-transform of a parameter prevents its native value from becoming negative. In this exercise, however, similar results are obtained whether or not the parameters are log-transformed. After the runs have been completed, consider the following questions.
Selected results from the run without log-transformed parameters are presented in Tables 5.4 and 5.5 and Figure 5.5. For students performing the computer exercises, these results can be found in the model output files.

TABLE 5.4 Parameter Values for Each Parameter-Estimation Iteration of the Regression Run Without Log-Transformed Parameters in Exercise 5.2b

Iteration   HK_1         K_RB          VK_CB         HK_2          RCH_1   RCH_2
Start       3.00×10^−4   1.20×10^−3    1.00×10^−7    4.00×10^−5    63.07   31.54
1           3.40×10^−4   1.20×10^−5ᵃ   9.79×10^−8    2.82×10^−5    61.28   30.94
2           4.91×10^−4   2.16×10^−5    1.73×10^−8    4.00×10^−7ᵃ   63.90   22.10
3           4.92×10^−4   2.17×10^−5    1.00×10^−9ᵃ   4.00×10^−7ᵃ   64.06   21.94
4           4.92×10^−4   2.28×10^−5    3.00×10^−9    1.16×10^−6    62.84   23.13
5           4.94×10^−4   2.32×10^−5    1.00×10^−9ᵃ   4.00×10^−7ᵃ   63.20   22.76
6           4.94×10^−4   2.44×10^−5    3.00×10^−9    1.16×10^−6    62.03   23.92
7           4.97×10^−4   2.50×10^−5    1.00×10^−9ᵃ   4.00×10^−7ᵃ   62.40   23.54
8           4.97×10^−4   2.62×10^−5    3.00×10^−9    1.16×10^−6    61.27   24.65
9           4.99×10^−4   2.69×10^−5    1.00×10^−9ᵃ   4.00×10^−7ᵃ   61.63   24.28
10          4.99×10^−4   2.82×10^−5    3.00×10^−9    1.16×10^−6    60.54   25.36
Did not converge
ᵃ If hydraulic conductivities are assigned negative values by the regression, MODFLOW-2000 assigns them to be equal to the starting parameter value divided by 100.

TABLE 5.5 Composite Scaled Sensitivities for Each Parameter-Estimation Iteration of the Regression Run Without Log-Transformed Parameters in Exercise 5.2b

Iteration   HK_1   K_RB    VK_CB   HK_2    RCH_1   RCH_2
Start       41.3   0.214   0.783   11.0    27.4    25.6
1           41.6   20.9    0.515   7.45    37.9    30.4
2           35.2   10.8    0.033   0.085   28.2    17.2
3           35.0   10.8    0.473   0.499   28.1    17.0
4           35.1   10.3    0.464   0.543   27.2    17.8
5           35.1   10.1    0.472   0.499   27.2    17.4
6           35.2   9.58    0.464   0.543   26.3    18.1
7           35.2   9.35    0.471   0.499   26.2    17.8
8           35.3   8.92    0.463   0.544   25.4    18.4
9           35.2   8.68    0.471   0.499   25.4    18.1
10          35.4   8.28    0.464   0.545   24.6    18.8

Problem
• In Table 5.4 and Figure 5.5, examine the changing values of the parameters and of max-calculated-change (column 3 in the top part of Figure 5.5) to diagnose why the regressions did not converge. Max-calculated-change is the largest fractional parameter change that would occur if the damping parameter were equal to 1.0 (see Eq. (5.7)). A value of 0.50 indicates that the largest change (in absolute value) is a 50-percent increase in the parameter value, and a value of −1.00 indicates that the largest change is a 100-percent decrease. Also examine the sums of squared weighted residuals in Figure 5.5.
• In diagnosing why the regressions did not converge, also consider the composite scaled sensitivities (css) calculated for the starting parameter values (shown in Table 4.1 and Figure 4.3) and for the parameter values calculated at each iteration of the regression (shown in Table 5.5). What are the differences in the magnitudes of the css calculated at the starting parameter values? What might this indicate about the likelihood of estimating all six parameter values? How do the css calculated at iterations 2 through 10 differ from those calculated at the starting parameter values? How does this additional css information help explain the results shown in Table 5.4 and Figure 5.5?
SELECTED STATISTICS FROM MODIFIED GAUSS-NEWTON ITERATIONS

        MAX. CHANGE   MAX. CALC.    MAX. CHANGE   DAMPING
ITER.   PARNAM        CHANGE        ALLOWED       PARAMETER
1       K_RB          -7.53194      2.00000       0.26554
2       HK_2          -2.11493      2.00000       0.94566
3       HK_2          -377.849      2.00000       0.52931E-02
4       VK_CB         33.7467       2.00000       0.59265E-01
5       HK_2          -79.4868      2.00000       0.25161E-01
6       VK_CB         33.8172       2.00000       0.59141E-01
7       HK_2          -71.9800      2.00000       0.27785E-01
8       VK_CB         33.8676       2.00000       0.59054E-01
9       HK_2          -62.8221      2.00000       0.31836E-01
10      VK_CB         33.9119       2.00000       0.58976E-01

SUMS OF SQUARED WEIGHTED RESIDUALS FOR EACH ITERATION

ITER.   OBSERVATIONS   PRIOR INFO.   TOTAL
1       1752.2         0.0000        1752.2
2       9286.4         0.0000        9286.4
3       650.03         0.0000        650.03
4       674.36         0.0000        674.36
5       603.16         0.0000        603.16
6       563.63         0.0000        563.63
7       504.43         0.0000        504.43
8       469.75         0.0000        469.75
9       420.64         0.0000        420.64
10      389.73         0.0000        389.73

PARAMETER ESTIMATION DID NOT CONVERGE IN THE ALLOTTED NUMBER OF ITERATIONS

FIGURE 5.5 Selected statistics from the modified Gauss–Newton iterations of the regression run without log-transformed parameters in Exercise 5.2b. This is a fragment from the global output file of MODFLOW-2000.
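The damping column in Figure 5.5 can be related to max-calculated-change and MAX-CHANGE with a short calculation. The Python sketch below is a simplification. It assumes the damping parameter is chosen so that the largest fractional change equals MAX-CHANGE whenever max-calculated-change exceeds it. It also assumes the fractional change of parameter j is d_j/|b_j|. The function names are hypothetical, and the additional damping rules used by MODFLOW-2000 and UCODE_2005 are not reproduced.

```python
def max_calculated_change(names, values, changes):
    """Return (name, fractional change) for the largest |d_j / |b_j||, damping = 1.0.

    Assumes the fractional change is d_j / |b_j|, so 0.5 means a 50-percent
    increase and -1.0 means a 100-percent decrease.
    """
    fractions = [d / abs(b) for b, d in zip(values, changes)]
    j = max(range(len(fractions)), key=lambda i: abs(fractions[i]))
    return names[j], fractions[j]


def damping_parameter(max_calc_change, max_change_allowed):
    """Simplified damping: scale the step so the largest change equals MAX-CHANGE."""
    return min(1.0, max_change_allowed / abs(max_calc_change))


# Max-calculated-change values for iterations 1-4 of Figure 5.5 (MAX-CHANGE = 2.0).
for calc in (-7.53194, -2.11493, -377.849, 33.7467):
    print(f"{calc:>10}: damping = {damping_parameter(calc, 2.0):.5f}")
```

For these four iterations the computed values agree with the damping column of Figure 5.5 to the printed precision. Iteration 3 of Figure 5.6 behaves differently. Its max-calculated-change is 0.106790, so this rule would give a damping parameter of 1.0 rather than the printed 0.81707. That difference shows the programs apply damping rules beyond the MAX-CHANGE limit sketched here.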
(c) Assign prior information on parameters. The analysis from Exercise 4.1 and the performance of the regression in Exercise 5.2b suggested that prior information on parameters VK_CB and K_RB may be needed for the regression to converge. In this exercise, define the starting values of these two parameters as prior estimates. The prior information needs to be weighted in the same manner as observations need to be weighted. For both VK_CB and K_RB, assign a coefficient of variation of 0.3 to the prior estimates. Then, perform nonlinear regression.
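The weights on the two prior estimates follow from the coefficient of variation in the same way as observation weights. The Python sketch below assumes each weight is the inverse of the error variance and that the standard deviation equals the coefficient of variation times the prior value. The function names are hypothetical. The parameter values are the starting (prior) and true values listed in Table 5.7.

```python
def prior_weight(prior_value, cv):
    """Weight = 1 / variance, with standard deviation = cv * |prior value| (assumed)."""
    sd = cv * abs(prior_value)
    return 1.0 / sd**2


def prior_contribution(priors, values, cv):
    """Sum of squared weighted prior-information residuals (prior minus parameter value)."""
    return sum(prior_weight(p, cv) * (p - values[name]) ** 2 for name, p in priors.items())


priors = {"VK_CB": 1.0e-7, "K_RB": 1.2e-3}        # starting values used as prior estimates
true_values = {"VK_CB": 2.0e-7, "K_RB": 1.0e-3}   # true values from Table 5.7

for name, p in priors.items():
    print(f"{name}: weight = {prior_weight(p, 0.3):.4e}")
print(f"prior contribution at true values = {prior_contribution(priors, true_values, 0.3):.2f}")
```

The printed contribution, about 11.42, matches the difference between the two objective-function values computed with the true parameters in Table 5.8 (23.13 − 11.71). That agreement is consistent with this form of weighting.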
TABLE 5.6 Parameter Values for Each Parameter-Estimation Iteration for Exercise 5.2c

Iteration   HK_1         K_RB         VK_CB        HK_2         RCH_1   RCH_2
Start       3.00×10^−4   1.20×10^−3   1.00×10^−7   4.00×10^−5   63.07   31.54
1           4.14×10^−4   1.16×10^−3   9.77×10^−8   2.12×10^−5   49.36   36.77
2           4.61×10^−4   1.17×10^−3   9.80×10^−8   1.37×10^−5   48.45   37.65
3           4.62×10^−4   1.17×10^−3   9.88×10^−8   1.49×10^−5   47.71   38.28
4           4.62×10^−4   1.17×10^−3   9.90×10^−8   1.53×10^−5   47.47   38.50
5           4.62×10^−4   1.17×10^−3   9.90×10^−8   1.54×10^−5   47.45   38.53
Converged
SELECTED STATISTICS FROM MODIFIED GAUSS-NEWTON ITERATIONS

        MAX. CHANGE   MAX. CALC.     MAX. CHANGE   DAMPING
ITER.   PARNAM        CHANGE         ALLOWED       PARAMETER
1       HK_2          -0.470616      2.00000       1.0000
2       HK_2          -0.353595      2.00000       1.0000
3       HK_2          0.106790       2.00000       0.81707
4       HK_2          0.302890E-01   2.00000       1.0000
5       HK_2          0.330029E-02   2.00000       1.0000

SUMS OF SQUARED WEIGHTED RESIDUALS FOR EACH ITERATION

ITER.   OBSERVATIONS   PRIOR INFO.    TOTAL
1       1752.2         0.0000         1752.2
2       81.454         0.19343E-01    81.473
3       10.954         0.13860E-01    10.968
4       10.562         0.93029E-02    10.571
5       10.548         0.84770E-02    10.556
FINAL   10.548         0.84769E-02    10.556

*** PARAMETER ESTIMATION CONVERGED BY SATISFYING THE TOL CRITERION ***

FIGURE 5.6 Selected statistics from the modified Gauss–Newton iterations from Exercise 5.2c. This is a fragment from the global output file of MODFLOW-2000.
Problem
• Compare the regression performance (Figure 5.6) with the results of Exercise 5.2b (Figure 5.5). Consider the values of max-calculated-change (column 3 in the top parts of Figures 5.5 and 5.6), the sums of squared weighted residuals (bottom parts of Figures 5.5 and 5.6), and the parameter values in Tables 5.4 and 5.6. For students who perform the computer exercises, these values are listed in the model output files.
• The two parameters with prior information have estimates that are nearly identical to their respective prior values. Why? If execution times are long, under what circumstances would you suggest including prior information and estimating these parameters? Explain.
• The statistic used to determine the weighting determines whether the prior information can really be regarded as prior information or should instead be regarded as regularization. For this problem, what would you conclude from the weighting used?
(d) Parameter estimates and objective-function values. The starting, estimated, and true parameter values are shown in Table 5.7, and values of the objective function calculated for each of these parameter sets are shown in Table 5.8.
Problem
• Why do the estimated parameter values differ from the true parameter values?
• Comment on the objective-function values for the different parameter sets.
TABLE 5.7 Starting, Estimated, and True Values of the Parameters of the Steady-State Flow-System

Parameter Name   Starting Value   Estimated Value   True Value
HK_1             3.0×10^−4        4.62×10^−4        4.0×10^−4
K_RB             1.2×10^−3        1.17×10^−3        1.0×10^−3
VK_CB            1.0×10^−7        9.90×10^−8        2.0×10^−7
HK_2             4.0×10^−5        1.54×10^−5        4.4×10^−5
RCH_1            63.072           47.45             31.536
RCH_2            31.536           38.53             47.304
TABLE 5.8 Objective-Function Values Calculated Using the Starting, Estimated, and True Parameters

                                                      Starting     Estimated    True
                                                      Parameters   Parameters   Parameters
Objective-function value (heads and flows only)       1752.2       10.55        11.71
Objective-function value (heads, flows, and prior)    1752.2       10.56        23.13
(e) Using objective-function surfaces to explore regression performance (Optional). As discussed in Chapter 3, Section 3.5, objective-function surfaces can be plotted for any two parameters, by systematically changing the values of only those two model parameters. In this exercise, use objective-function surfaces for selected parameter pairs to investigate the performance of the regression in Exercises 5.2b, 5.2c, and 5.2d. This is easily accomplished with UCODE_2005, as discussed in the instructions for this exercise on the web site described in Chapter 1, Section 1.1. Problem: What insight is gained beyond what was provided by the sensitivity analysis using composite scaled sensitivities and parameter correlation coefficients?
6 EVALUATING MODEL FIT
The fit of model-simulated values to the observations and prior information tests the ability of the model to realistically represent the simulated system. This chapter presents methods of evaluating model fit in the order in which they generally are used in a modeling project.
6.1 MAGNITUDE OF RESIDUALS AND WEIGHTED RESIDUALS
The first step in evaluating model fit generally involves identifying the largest (in absolute value) residuals and weighted residuals, which were defined in Chapter 3, Section 3.4.3. Weighted residuals have the advantage of accounting for the expected magnitude of observation and prior-information errors, as reflected in the weighting. Large absolute values of weighted residuals therefore indicate unexpectedly poor model fit more reliably than do large absolute values of unweighted residuals. In initial model runs, the largest weighted residuals often indicate gross errors in the model, the observation data, the simulated equivalents of the observations, and/or the weighting. For example, some observations might be misrepresented in the simulation, suffer from incorrect data interpretation, or simply have been entered incorrectly in model input files. To help detect such problems, UCODE_2005 and MODFLOW-2000 output lists the five largest positive weighted residuals and the five largest negative weighted residuals (Hill et al., 2000, Table 14; Poeter et al., 2005, Table 28). In addition, the programs print the percent contribution of these individual weighted residuals to the objective function. Weighted residuals that individually account for a large percent of the objective-function value are suspect and should be checked carefully. In subsequent model runs, after the major problems contributing to very large weighted residuals have been corrected, analysis of systematic misfit and statistics that measure overall model fit become increasingly important, as noted in Exercise 3.2.
Effective Groundwater Model Calibration: With Analysis of Data, Sensitivities, Predictions, and Uncertainty. By Mary C. Hill and Claire R. Tiedeman. Published 2007 by John Wiley & Sons, Inc.
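A listing like the one described for UCODE_2005 and MODFLOW-2000 can be sketched in a few lines of Python. This is a hypothetical illustration, not the programs' code. It assumes a diagonal weight matrix, so each weighted residual is sqrt(w_i)·(y_i − y′_i). Percent contributions are computed relative to the sum of squared weighted residuals of the values passed in. The observation names and values are made up.

```python
import math


def largest_weighted_residuals(names, observed, simulated, weights, n=5):
    """Return the n largest positive and n largest negative weighted residuals.

    Each entry is (name, weighted residual, percent contribution to the sum
    of squared weighted residuals). A diagonal weight matrix is assumed.
    """
    wres = [math.sqrt(w) * (y - ys) for y, ys, w in zip(observed, simulated, weights)]
    total = sum(e * e for e in wres)
    rows = [(name, e, 100.0 * e * e / total) for name, e in zip(names, wres)]
    positive = sorted((r for r in rows if r[1] > 0), key=lambda r: r[1], reverse=True)[:n]
    negative = sorted((r for r in rows if r[1] < 0), key=lambda r: r[1])[:n]
    return positive, negative


# Hypothetical heads (h1-h3) and one flow (q1).
pos, neg = largest_weighted_residuals(
    ["h1", "h2", "h3", "q1"], [10.0, 12.0, 9.0, 5.0], [10.5, 11.0, 9.0, 4.0], [4.0, 4.0, 4.0, 1.0]
)
for name, e, pct in pos + neg:
    print(f"{name}: weighted residual = {e:5.2f}, contribution = {pct:5.1f} percent")
```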
6.2 IDENTIFY SYSTEMATIC MISFIT
Systematic model misfit can reveal problems with the model and/or the data. For example, a groundwater model may provide a good match to hydraulic heads but a very poor match to streamflow gains and losses. This indicates that the model poorly represents the dynamics of the simulated flow system. Alternatively, the model may fit all observations well, but only with parameter values that differ substantially and systematically from prior information. For example, in groundwater models, estimated hydraulic conductivities often are smaller than hydraulic conductivities measured by aquifer tests. One possible explanation is that aquifer tests commonly use wells screened in the subsurface materials with the highest hydraulic conductivities, and the volumes of these materials are small relative to the volumes represented by model hydraulic conductivities. In this situation, the problem is that the prior information is defined using data that are not representative of much of the subsurface included in the model.
Systematic misfit can be detected through application of the methods presented in this chapter. Often it is useful to apply the methods to subsets of the residuals and weighted residuals. Subsets generally are defined on the basis of observation or prior-information type, location, time, and so on. Subset definition is problem dependent, and useful subsets often are determined only after some experimentation. For example, in a groundwater model it may be important for subsets to be defined based on well depth, model layer, distance from some types of boundaries, and so on. It is important to calculate the overall measures of model fit presented in Section 6.3 both for the entire data set and for subsets of the observations and prior information. For the graphical analyses of model fit presented in Section 6.4, it is important to use different symbols to represent different sets of observations and prior information.
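A simple way to look for systematic misfit by subset is to summarize the weighted residuals group by group. The Python sketch below reports the count, mean, sum of squares, and root-mean-square weighted residual for each subset label. The labels and values are hypothetical. The root-mean-square value is used here only as a convenient per-subset summary. It does not account for the number of estimated parameters, so it is not the calculated error variance of Section 6.3.

```python
import math
from collections import defaultdict


def fit_by_subset(groups, weighted_residuals):
    """Summarize weighted residuals for each subset label (type, layer, time, ...)."""
    members = defaultdict(list)
    for group, e in zip(groups, weighted_residuals):
        members[group].append(e)
    summary = {}
    for group, es in members.items():
        sum_sq = sum(e * e for e in es)
        summary[group] = {
            "count": len(es),
            "mean": sum(es) / len(es),
            "sum_sq": sum_sq,
            "rms": math.sqrt(sum_sq / len(es)),
        }
    return summary


# Hypothetical case: heads fit well, flows are systematically underestimated.
summary = fit_by_subset(["head", "head", "head", "flow", "flow"], [0.5, -0.5, 1.0, 3.0, 2.0])
for group, stats in summary.items():
    print(group, stats)
```

In this made-up case, the flow subset has weighted residuals that are all positive and large while the head residuals scatter around zero. That is the kind of pattern the example of a good head match with a poor streamflow match describes.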
6.3 MEASURES OF OVERALL MODEL FIT
Measures of overall model fit are single values that provide a quick evaluation of how well a model matches all or subsets of the observations and prior information. Measures calculated for alternative models of the same system often are used to judge how well the different models perform. The measures described here can be used in this way as long as the number of observations and prior information and their weighting do not vary between the models being compared.
The measures can be thought of as representing two competing goals: obtaining as good a fit as possible to observations and prior information, and using as few parameters as possible. A better fit can always be obtained by increasing the number of parameters, but, as discussed in Chapter 11, Guideline 1, too many parameters can degrade the predictive ability of the model. Thus, all but the first measure presented here include a penalty for additional parameters. For a statistic to have a smaller value when a parameter is added, the model fit needs to improve enough to overwhelm the increase in the penalty. We do not provide an extensive list of overall measures of model fit. Measures not mentioned here include Kashyap's measure (Kashyap, 1982) and GCV (Craven and Wahba, 1979). These measures can be calculated using, for example, MMA (Poeter and Hill, in press).
6.3.1 Objective-Function Value
The value of the weighted least-squares or maximum-likelihood objective function (Eqs. (3.1) to (3.3)) often is used informally to indicate model fit. Objective functions are rarely used for more formal comparisons because their values nearly always decrease as additional parameters are defined in the model and included in parameter estimation.
6.3.2 Calculated Error Variance and Standard Error
A common indicator of the overall magnitude of the weighted residuals is the calculated error variance, s², which equals (Cooley and Naff, 1990, p. 166; Ott, 1993)

$$s^2 = \frac{S(\mathbf{b})}{ND + NPR - NP} \qquad (6.1)$$

where S(b) is the weighted least-squares objective-function value (Eq. (3.1) or (3.2)) and the other variables are defined for Eq. (3.1). s² is dimensionless if the weighting is defined as suggested in this book. The square root of the calculated error variance, s, is called the standard error of the regression. Smaller values of both the calculated error variance and the standard error indicate a closer overall fit to the observations, and smaller values are preferred as long as the weighted residuals do not indicate model error (discussed in Section 6.4).
Overall Fitted Error Statistics
A disadvantage of using s² and s directly as measures of model fit is that they have little intuitive appeal because they are dimensionless. To obtain dimensional values that more effectively reflect the fit, s can be used to multiply the standard deviations and coefficients of variation used to calculate the weights for any group of observations. The resulting statistics are defined by Hill (1998, pp. 19, 53) as fitted error statistics, of which the fitted standard deviation and the fitted coefficient of variation are examples. These statistics express the average fit to different types of observations. For example, if a standard deviation of measurement error equal to 0.3 m is used to calculate the weights for most of the hydraulic-head observations and the calculated standard error is 3.0, the fitted standard deviation of 0.3 m × 3.0 = 0.9 m represents the overall fit achieved for these hydraulic heads. If a coefficient of variation of 0.25 (25 percent) is used to calculate weights for a set of spring-flow observations and the calculated standard error is 2.0, the fitted coefficient of variation of 0.25 × 2.0 = 0.50 (50 percent) represents the overall fit achieved to these spring flows. The standard deviation or coefficient of variation used to calculate the weighting reflects knowledge about observation or prior-information error. The fitted standard deviation or fitted coefficient of variation, by contrast, reflects both model fit and the knowledge about error represented in the weighting. Although fitted error statistic is not standard statistical terminology, in the authors' experience it provides a meaningful way of communicating model fit. Generally, this approach applies only if the fitted error statistic summarizes the fit to a fairly large number of observations or prior information. One or a few values can be evaluated more effectively by considering their residuals and weighted residuals directly.
Interpret the Calculated Error Variance
The interpretation of the calculated error variance, s², or standard error, s, is related to the weighting used in the regression. If the weight matrix is defined as suggested in Eq. (3.8) or (3.9) and if the fit achieved by regression is consistent with the data accuracy as reflected in the weighting, the expected value of both the calculated error variance and the standard error is 1.0. This can be proved by substituting Eq. (3.2) into Eq. (6.1) and taking the expected value. It can be demonstrated using generated random numbers instead of residuals, as described in Exercise 6.1b.
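The calculations in this section can be sketched in Python. The sketch covers s² and s from Eq. (6.1), the fitted error statistics from the examples above, and the confidence interval on s² given as Eq. (6.2) in the discussion that follows. It uses only the standard library. As a labeled substitution for Table D.5, the chi-square quantiles use the Wilson–Hilferty approximation, which is poor for very few degrees of freedom. Exact quantiles are available from, for example, scipy.stats.chi2.ppf. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def error_variance(weighted_residuals, n_params):
    """Eq. (6.1): s^2 = S(b) / (ND + NPR - NP).

    weighted_residuals holds all ND observation and NPR prior-information
    weighted residuals; n_params is NP.
    """
    dof = len(weighted_residuals) - n_params
    s2 = sum(e * e for e in weighted_residuals) / dof
    return s2, math.sqrt(s2), dof


def fitted_statistic(weighting_statistic, s):
    """Fitted standard deviation or fitted coefficient of variation."""
    return weighting_statistic * s


def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation: chi-square value with area p to its left."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3


def s2_interval(s2, dof, alpha=0.05):
    """Eq. (6.2): (n s^2 / chi2_U, n s^2 / chi2_L) for a 100(1 - alpha)-percent interval."""
    chi2_upper = chi2_quantile(1.0 - alpha / 2.0, dof)
    chi2_lower = chi2_quantile(alpha / 2.0, dof)
    return dof * s2 / chi2_upper, dof * s2 / chi2_lower


# Fitted error statistics for the head and spring-flow examples in the text.
print(round(fitted_statistic(0.3, 3.0), 2))   # fitted standard deviation, in meters
print(round(fitted_statistic(0.25, 2.0), 2))  # fitted coefficient of variation

# 95-percent interval on s^2 for a hypothetical regression with 30 degrees of freedom.
print(s2_interval(1.0, 30))
```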
If the calculated error variance or the standard error is significantly different from a value of 1.0, this indicates that the model fit is inconsistent with the weighting. A value of s or s² that is significantly greater than 1.0 indicates that the residuals are larger, on average, than is consistent with the statistics used to calculate the weighting. That is, the model fit is worse than would be expected based on the analysis of error used to determine the weighting. A value of s or s² that is significantly less than 1.0 indicates that the residuals are smaller, on average, than is consistent with the statistics used to calculate the weights. That is, the model fits the observations better than would be expected based on the analysis of observation error used to determine the weights. For the calculated error variance, significant deviations from 1.0 can be evaluated by constructing a confidence interval. The confidence interval limits can be calculated as (Ott, 1993, p. 332)

$$\left[\frac{n\,s^2}{\chi^2_U},\ \frac{n\,s^2}{\chi^2_L}\right] \qquad (6.2)$$

where n is the degrees of freedom, here equal to ND + NPR − NP (see Eq. (3.1) for definitions). $\chi^2_U$ is the upper tail value of a chi-square distribution (Appendix D, Table D.5) with n degrees of freedom, with the area to the right equal to one-half the significance level of the confidence interval; the significance level, α, is 0.05 for a 95-percent interval. $\chi^2_L$ is the lower tail value of a chi-square distribution with n degrees of freedom, with the area to the left equal to one-half the significance level. Significant deviations from 1.0 also can be evaluated using a χ² test statistic (Ott, 1993, p. 334). To consider the standard error instead of the calculated error variance, take the square root of each limit in Eq. (6.2). The confidence intervals are used to evaluate significant deviations of s or s² from 1.0 as described below.
If the confidence interval on s² includes the value 1.0, α = 0.05, and the weighted residuals are random, then s² does not significantly deviate from 1.0 at a 5-percent significance level. In that case, model fit is consistent with the statistics used to calculate the weights on the observations and prior information. Expressed in terms of probability, there is only a 5-percent chance that the model fits the data in a way that contradicts the following assumptions: (1) the model is reasonably accurate and (2) the statistics used to calculate the weights correctly reflect the observation and prior information errors. If the confidence interval does not include 1.0, the model fit is inconsistent with the statistics used to calculate the weighting. Of interest is whether statistics that are consistent with the model fit are realistic measures of error in the observations and prior information. For example, if the standard error of the regression is 2.0, statistics that would be consistent with the model fit would be 2.0 times the standard deviations or coefficients of variation used to determine the weighting. If a streamflow observation was thought to have a 5-percent coefficient of variation, would an increase by a factor of 2.0 to 10 percent be unreasonable?
If so, unaccounted-for observation error could not explain the large standard error, and model error would be suspected. Here, we refer to the adjusted statistics as individual fit-consistent statistics. They differ from overall fitted error statistics in that individual observations and prior information can be considered and are often important. For individual fitted error statistics, the variances, standard deviations, and coefficients of variation used to calculate the weights are adjusted. New weights that are consistent with the model fit are obtained by multiplying the variances by s² and the standard deviations and coefficients of variation by s. If the regression were carried out with these new weights, the same parameter estimates and the same residuals would be obtained, but the weighted residuals would be different and s² would equal 1.0.
If the entire confidence interval on s² is less than 1.0 and the weighted residuals are randomly distributed, the model fit is better than anticipated based on the statistics used to calculate the weights. This is not necessarily an indication of the overfitting discussed in Guideline 1, but the possibility should be considered. Hill et al. (1998) obtained a small s² value because the actual observation error for a synthetic test case was smaller than expected. In this unusual case, the individual fitted error statistics were much smaller than the statistics used to determine the weighting and more accurately reflected the observation error.
If the entire confidence interval on s² is greater than 1.0, which is common, then the model fit is worse than anticipated based on the statistics used to calculate the
weights. In this situation, the resulting interpretation depends on two things: (1) whether the weighted residuals are randomly distributed, and (2) whether the individual fitted error statistics are so large that they could not reasonably be caused by observation and prior information errors. Randomness of the weighted residuals can be evaluated as discussed in Section 6.4 and in Exercise 6.2. After the randomness of the weighted residuals has been evaluated and individual fitted error statistics have been calculated, the analysis depends on which of the following three situations applies.
1. The weighted residuals are randomly distributed and individual fitted error statistics can be justified (meaning that the observation and prior information errors actually could be sufficiently larger than originally assumed). In this case, the analysis indicates that the model fit is consistent with the model being a reasonably accurate representation of the true system.
2. The weighted residuals are randomly distributed, but individual fitted error statistics reflect unreasonable levels of observation and prior information error. In this case, the results of Hill et al. (1998) suggest that model error is significant but that many sources of model error probably contribute to the lack of model fit, with no single source or small group of sources dominating. This situation is not uncommon, and if the results of Hill et al. (1998) are valid, model predictions and measures of uncertainty can be accurate. Future studies are needed to test this conclusion.
3. The weighted residuals are not randomly distributed. In this case, the analysis suggests that there may be substantial and problematic model error. The best approach is to evaluate the model to determine the cause of the nonrandom residuals, and to evaluate the cause of any very large weighted residuals.
6.3.3 AIC, AICc, and BIC Statistics
The calculated error variance and standard error are sometimes criticized for not sufficiently representing the drawbacks associated with increasing the number of estimated parameters. The AIC, AICc, and BIC statistics were developed in the time-series literature to address this criticism (Brockwell and Davis, 1987; Burnham and Anderson, 2002). These statistics are calculated as the sum of two parts. The first is the maximum-likelihood objective function (Eq. (3.3)) evaluated at the optimal parameter values, S′(b′). The second is a set of terms that become large as more parameters are added. Although these statistics were developed for time-series problems, Carrera and Neuman (1986) successfully used them to discriminate between different parameterizations of a groundwater flow model. The references cited below for these statistics provide derivations and additional discussion. The AIC statistic was developed by Akaike (1973, 1974) and was corrected by Sugiura (1978) to obtain AICc, as described by Burnham and Anderson (2002, p. 66).
AICc and AIC are calculated as:

$$\mathrm{AICc}(\mathbf{b}') = S'(\mathbf{b}') + 2\,NP + \frac{2\,NP\,(NP + 1)}{NOBS + NPR - NP - 1} \qquad (6.3a)$$

$$\mathrm{AIC}(\mathbf{b}') = S'(\mathbf{b}') + 2\,NP \qquad (6.3b)$$

where S′(b′) is the maximum-likelihood objective function of Eq. (3.3), NP is the number of estimated parameters, NOBS is the number of observations used in the regression, and NPR is the number of prior estimate equations used in the regression. Often, S′(b′) is replaced in these equations by n log(S(b′)/n), where S(b′) is defined in Eq. (3.2). AICc is needed if NOBS/NP < 40 for any model considered. The statistic BIC was developed by Akaike (1978) in response to concern that AIC sometimes promoted use of more parameters than was required. The version of this statistic used by Carrera and Neuman (1986) is:

$$\mathrm{BIC}(\mathbf{b}') = S'(\mathbf{b}') + NP\,\ln(NOBS + NPR) \qquad (6.4)$$
For these statistics, smaller values generally indicate a more accurate model. However, suppose the statistics for a model with fewer parameters are only slightly larger than those of another model with more parameters. In that case it may be preferable to select the model with fewer parameters, unless the investigator has other information indicating the validity of the more complicated model. Burnham and Anderson (2002) suggest that, of the three statistics, AICc has distinct advantages. These statistics can be cited in addition to s² or s; it is common to present all of these values for the models considered in a table and/or graphically. MODFLOW-2000 prints AIC and BIC; UCODE_2005 prints AIC, AICc, and BIC.
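Equations (6.3a), (6.3b), and (6.4) translate directly into code. In the Python sketch below, the caller supplies S′(b′), the maximum-likelihood objective function at the optimal parameter values. The n log(S(b′)/n) substitution mentioned above is not included. The function name and the two alternative models are hypothetical.

```python
import math


def information_criteria(s_ml, n_params, n_obs, n_prior=0):
    """AICc, AIC, and BIC of Eqs. (6.3a), (6.3b), and (6.4).

    s_ml is S'(b'), the maximum-likelihood objective function at the optimal
    parameter values; n_params = NP, n_obs = NOBS, n_prior = NPR.
    """
    aic = s_ml + 2 * n_params
    aicc = aic + 2 * n_params * (n_params + 1) / (n_obs + n_prior - n_params - 1)
    bic = s_ml + n_params * math.log(n_obs + n_prior)
    return {"AICc": aicc, "AIC": aic, "BIC": bic}


# Two hypothetical alternative models of the same system, same observations and weighting.
simple = information_criteria(s_ml=120.0, n_params=4, n_obs=40, n_prior=2)
complex_ = information_criteria(s_ml=116.0, n_params=7, n_obs=40, n_prior=2)
for key in ("AICc", "AIC", "BIC"):
    print(f"{key}: simple = {simple[key]:.2f}, complex = {complex_[key]:.2f}")
```

In this made-up comparison, the four-parameter model has the smaller value of all three statistics even though its objective function is larger. The three additional parameters do not improve the fit enough to offset the penalty terms.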
6.4 ANALYZING MODEL FIT GRAPHICALLY AND RELATED STATISTICS
In addition to overall measures of model fit, several graphical analyses and related statistics can be used to assess whether the match of simulated values to observed values contradicts the requirements of Section 3.3 and thus indicates that the regression is not valid. The graphical methods were developed for groundwater inverse modeling by Cooley and Naff (1990), using the work of Draper and Smith (1981, 1998), and were slightly modified by Hill (1992, 1994). Required data files are produced by UCODE_2005 and MODFLOW-2000. The graphical methods are described in the following sections. Examples are presented here and in Exercise 6.2. In Chapter 10, Table 10.2 lists these graphs with the questions they are likely to address and the guidelines they are likely to support.
EVALUATING MODEL FIT
6.4.1 Using Graphical Analysis of Weighted Residuals to Detect Model Error
The graphical analyses of model fit presented here focus on weighted residuals. Regression results are valid (basically, model error is not indicated) only if (1) the weighted residuals from all types of observations and prior information appear to be statistically consistent (they all look like they have the same variance and a mean of zero) or (2) any statistical inconsistency can be explained by the correlation of weighted residuals expected through the fitting process imposed by the regression. The statistical consistency is evaluated using graphs of the weighted residuals with respect to: weighted or unweighted simulated values, independent variables such as space and time, and normal order statistics. This chapter focuses on using graphical analyses to detect model error. If model error is detected, see Guideline 9 in Chapter 12 for a discussion of how to proceed.
6.4.2 Weighted Residuals Versus Weighted or Unweighted Simulated Values and Minimum, Maximum, and Average Weighted Residuals
Graphs of weighted residuals can be plotted against either weighted or unweighted simulated values. The need to plot weighted instead of unweighted residuals and the advantages and disadvantages of weighting the simulated values are discussed here. Intuitively, a model that fits the data well should not fit similar observations distinctly differently. In groundwater models, for example, simulated hydraulic heads would be expected to match observed hydraulic heads equally well in areas of high and low hydraulic head, all else being equal. Consider two areas in which the hydraulic heads in one area are, on average, five meters above those in the other. Residuals that are all negative in one area and all positive in the other would indicate model error. Yet not all observations are similar. For the analysis to detect model error, weighted residuals need to be considered instead of unweighted residuals when errors associated with observations or prior information have different variances and/or are correlated. In the groundwater example, if the average depth to water in the observation wells and/or the methods used to determine the elevations of the wells differed between the two areas, larger residuals might be expected in one area. Using weighted residuals eliminates the effects of this expected difference in model fit to observations in the two areas, so the graphs can be used more easily to detect model error. Draper and Smith (1998, pp. 63–64) suggest weighted simulated values for these graphs because, in most circumstances, weighted residuals and weighted simulated values are statistically independent. However, Hill (1994) shows that three problems can occur, and in some situations plotting against unweighted simulated values is advantageous.
Thus, graphs of weighted residuals versus both weighted and unweighted simulated values are considered here. The problems are discussed after describing how the graphs are constructed.
Example graphs are shown in Figure 6.1. Ideally, weighted residuals are scattered evenly about 0.0 for the entire range of values on the horizontal axis, as in Figure 6.1. Figure 6.2 shows examples of graphs for which the weighted residuals are not random with respect to the weighted simulated values. When using MODFLOW-2000, the data needed to produce graphs of weighted residuals against weighted simulated values are listed in the output file with filename extension _ws. For UCODE_2005, the file with extension _ws includes weighted residuals and unweighted simulated values; weighted simulated values are listed in the output file with extension _ww. The importance of testing for systematic misfit to subsets of the observations and prior information was discussed in Section 6.2. To identify systematic misfit, plot the weighted residuals for different subsets with different symbols. For example, in the exercises at the end of this chapter, hydraulic heads, flows, and prior information are plotted using different symbols. MODFLOW-2000 and UCODE_2005 facilitate this by allowing the user to specify a plot-symbol variable for each observation and piece of prior information. The plot-symbol variables are integers; plotting routines can use the integers to control the symbols used in graphs. The statistics that summarize the distribution of the weighted residuals are the minimum, maximum, and average weighted residuals. The minimum and maximum weighted residuals display the range of weighted residuals at a glance. In practice, especially in the initial stages of calibration, the minimum and maximum weighted residuals often identify problems with the model or the observation data, as discussed in Section 6.1. The average weighted residual is a simple arithmetic average of the weighted residuals and ideally equals zero.
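A minimal sketch of these summary statistics, assuming a diagonal weight matrix so that each weighted residual is $\omega_i^{1/2}(y_i - y'_i)$. The `plot_symbols` list stands in for the integer plot-symbol variable described above; the function name, the symbol codes, and the data are hypothetical.

```python
import math
from collections import defaultdict


def weighted_residual_summary(observed, simulated, weights, plot_symbols):
    """Weighted residuals w_i**0.5 * (y_i - y'_i) for diagonal weights, with
    minimum, maximum, and average values overall and for each plot symbol."""
    wres = [math.sqrt(w) * (y - ys)
            for y, ys, w in zip(observed, simulated, weights)]

    def summarize(values):
        return {"min": min(values), "max": max(values),
                "avg": sum(values) / len(values)}

    by_symbol = defaultdict(list)
    for symbol, r in zip(plot_symbols, wres):
        by_symbol[symbol].append(r)   # e.g., 1 = heads, 2 = flows (hypothetical codes)
    return wres, summarize(wres), {s: summarize(v) for s, v in by_symbol.items()}


observed = [10.0, 12.0, 5.0, 0.5]
simulated = [9.0, 12.5, 5.25, 0.25]
weights = [1.0, 4.0, 16.0, 16.0]
symbols = [1, 1, 1, 2]
wres, overall, groups = weighted_residual_summary(observed, simulated, weights, symbols)
print(wres, overall, groups)
```

In this small example the overall average weighted residual is zero, yet the values coded 1 average −1/3 and the single value coded 2 is +1, the kind of subset-level misfit that plotting each subset with its own symbol is meant to expose.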
In linear regression, the average always equals zero for the optimized parameter values; in nonlinear regression, the value of the average weighted residual generally approaches zero as calibration proceeds. In MODFLOW-2000, these statistics are printed in the LIST file, which is defined in the Name File; in UCODE_2005 they are printed in the main output file for the Forward, Sensitivity-Analysis, and Parameter-Estimation modes. That output file has filename extension #uout. The three problems that can occur with graphs of weighted residuals versus weighted or unweighted simulated values, and solutions for each, are described next. The first problem occurs when the weighted or unweighted simulated values extend over a wide range, so that it is not possible to scale the associated axis to obtain a useful graph. The problem is illustrated in Exercise 6.2a. This problem can sometimes be resolved by using weighted simulated values or by log-transforming the axis for the weighted or unweighted simulated values. Another possibility is to multiply the weighted or unweighted simulated values of extreme points by a factor so that they plot closer to the other values. This adjustment needs to be applied carefully to ensure that the graph can still be used to test whether the weighted residuals vary systematically for any one type of data. It is usually a good idea to apply the same factor to all weighted or unweighted simulated values for a given data type. The second problem occurs when weights are calculated using coefficients of variation, as suggested after Eq. (3.5) and discussed in Chapter 11 under Guideline 6. In this case, the weight for an observed value $y_i$ equals $1/[(\mathrm{c.v.})_i\,y_i]^2$, where $(\mathrm{c.v.})_i$
FIGURE 6.1 Example graphs of weighted residuals and weighted simulated values with no model bias. The values of weighted residuals plotted here are three different realizations of 100 generated normally distributed numbers with mean 0.0 and standard deviation 1.5. The standard deviation is used to define grid lines for the weighted residuals.
FIGURE 6.2 Example graphs of weighted residuals and weighted simulated values showing evidence of model bias for two different data sets. (a) The weighted residuals associated with smaller weighted simulated values vary less (have smaller variance). (b) The weighted residuals increase with increasing weighted simulated value. The standard deviations of the weighted residuals (1.23 and 2.01) are used to define grid lines for the weighted residuals.
is the coefficient of variation. Then the weighted residual and the associated weighted simulated value are calculated as
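$$\omega_i^{1/2}(y_i - y'_i) = \frac{1}{(\mathrm{c.v.})_i\,y_i}\,(y_i - y'_i) = \frac{1}{(\mathrm{c.v.})_i} - \frac{y'_i}{(\mathrm{c.v.})_i\,y_i} \tag{6.5a}$$

$$\omega_i^{1/2}\,y'_i = \frac{y'_i}{(\mathrm{c.v.})_i\,y_i} \tag{6.5b}$$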
If the weight is calculated using the simulated value, as can be done using UCODE_2005, the weighted residual and associated weighted simulated value are calculated as
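$$\omega_i^{1/2}(y_i - y'_i) = \frac{1}{(\mathrm{c.v.})_i\,y'_i}\,(y_i - y'_i) = \frac{y_i}{(\mathrm{c.v.})_i\,y'_i} - \frac{1}{(\mathrm{c.v.})_i} \tag{6.6a}$$

$$\omega_i^{1/2}\,y'_i = \frac{1}{(\mathrm{c.v.})_i} \tag{6.6b}$$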
Apart from its sign, the second term on the right-hand side of Eq. (6.5a) equals Eq. (6.5b). If $(\mathrm{c.v.})_i$ is the same for multiple observations, then the weighted residuals for these observations, plotted against the weighted simulated values, fall on a straight line with a slope of −1 and an intercept of $1/(\mathrm{c.v.})_i$. Equation (6.6b) is the same for all simulated values with the same coefficient of variation, so the placement on the horizontal axis completely ignores the simulated value. In both circumstances, the vertical distribution of the weighted residuals still can be used to test their independence, but the purpose of the graph described in this section has largely been circumvented.
The most straightforward way to rectify this situation is to use unweighted instead of weighted simulated values. Alternatively, Hill (1994) multiplies the weighted simulated values of Eq. (6.5b) by the observed values, $y_i$. Then, the modified weighted simulated values are
$$\omega_i^{1/2}\,y'_i\,y_i = \frac{y'_i}{(\mathrm{c.v.})_i} \tag{6.7a}$$
Similarly, multiplying Eq. (6.6b) by the simulated value yields
$$\omega_i^{1/2}\,y'_i\,y'_i = \frac{y'_i}{(\mathrm{c.v.})_i} \tag{6.7b}$$
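The following sketch evaluates Eqs. (6.5) through (6.7) for observations weighted by coefficients of variation. It shows numerically why the graph degenerates: when the weight is based on the observed value, each weighted residual plus its weighted simulated value equals $1/(\mathrm{c.v.})_i$ (a line of slope −1), and when the weight is based on the simulated value, every weighted simulated value equals $1/(\mathrm{c.v.})_i$. The function name and the data are hypothetical.

```python
def cv_weighted_values(observed, simulated, cv, weight_on="observed"):
    """Return (weighted residual, weighted simulated value, modified value)
    for each observation when omega_i**0.5 = 1 / (c.v._i * value).

    weight_on="observed"  -> Eqs. (6.5a), (6.5b), (6.7a)
    weight_on="simulated" -> Eqs. (6.6a), (6.6b), (6.7b)
    """
    rows = []
    for y, ys, c in zip(observed, simulated, cv):
        base = y if weight_on == "observed" else ys  # value used to set the weight
        sqrt_w = 1.0 / (c * base)                    # omega_i**0.5
        wres = sqrt_w * (y - ys)                     # weighted residual
        wsim = sqrt_w * ys                           # weighted simulated value
        modified = wsim * base                       # Hill (1994) modification, equals ys / c
        rows.append((wres, wsim, modified))
    return rows


obs = [100.0, 40.0, 8.0]
sim = [90.0, 44.0, 8.5]
for wres, wsim, mod in cv_weighted_values(obs, sim, [0.1, 0.1, 0.1]):
    print(f"wres={wres:+.3f}  wsim={wsim:.3f}  wres+wsim={wres + wsim:.3f}  modified={mod:.1f}")
```

The modified values spread out in proportion to the simulated values, which restores the horizontal information lost in Eqs. (6.5b) and (6.6b), at the cost of a wider axis range.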
Both modifications resolve the second problem but can worsen the first problem discussed earlier. The first option, plotting against unweighted simulated values, is probably the most useful in many circumstances. The third problem occurs when estimated parameters that have prior information are scaled using the value of the prior information. Such scaling is sometimes convenient because it produces prior information values that equal 1.0. Then, during regression, the percent change between the estimate and the prior value is obvious. For example, with this type of scaling, a parameter estimate of 1.5 indicates that the estimate is 50 percent larger than the prior value. In this circumstance, the weighted residuals for the prior information are calculated as

$$\omega_p^{1/2}\,(1.0 - P'_p) \tag{6.8}$$

and the weighted simulated values are calculated as

$$\omega_p^{1/2}\,P'_p \tag{6.9}$$
If $\omega_p^{1/2}$ equals 1 divided by the coefficient of variation, which is common, and the coefficient of variation is the same for multiple prior parameters, which also is common, a graph of the weighted residuals against the weighted simulated values forms a straight line with a slope of −1.0. As for the second problem, the graph would not indicate whether the weighted residuals vary systematically with the size of the simulated value. A meaningful graph can be obtained by plotting against unweighted simulated values or by calculating a modified weighted simulated value as

$$\omega_p^{1/2}\,P'_p\,P'_p \tag{6.10}$$
Caution should be taken when altering the values used to create the graphs discussed in this section to ensure that the resulting graph serves the intended purpose of testing whether weighted residuals show systematic patterns of model fit when compared to simulated values.
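A short numerical illustration of Eqs. (6.8) through (6.10), assuming, as in the text, that $\omega_p^{1/2}$ is 1 divided by a coefficient of variation shared by the parameters and that the scaled prior value is 1.0. The function name and the estimates are hypothetical.

```python
def prior_weighted_values(scaled_estimates, cv):
    """For parameters scaled so that the prior value is 1.0, return
    (weighted residual, weighted simulated value, modified value) per parameter."""
    rows = []
    for p, c in zip(scaled_estimates, cv):
        sqrt_w = 1.0 / c            # omega_p**0.5, assumed = 1 / coefficient of variation
        wres = sqrt_w * (1.0 - p)   # Eq. (6.8)
        wsim = sqrt_w * p           # Eq. (6.9)
        modified = wsim * p         # Eq. (6.10)
        rows.append((wres, wsim, modified))
    return rows


estimates = [0.8, 1.5, 1.1]
for p, (wres, wsim, mod) in zip(estimates, prior_weighted_values(estimates, [0.5] * 3)):
    # wres + wsim is always 1 / c.v. = 2.0, so (wsim, wres) lies on a line of slope -1;
    # the modified value grows with the square of the scaled estimate instead.
    print(f"P'={p}: wres={wres:+.2f} wsim={wsim:.2f} modified={mod:.2f}")
```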
6.4.3 Weighted or Unweighted Observations Versus Simulated Values and Correlation Coefficient R
Ideally, simulated values are close to observed values, so that graphs of observed against simulated values fall along a straight line with a slope of 1.0 and an intercept of zero. Correspondingly, graphs constructed using weighted observations and weighted simulated values would have the same characteristics; these can be useful when weighting condenses the range of the values. They also have the advantage that variations in expected error variance are already accounted for, making it easier to detect model error using the graph. Comparing Figure 6.3 with Figure 6.2 shows that, all else being equal, plotting weighted residuals provides a better test of model bias than plotting weighted or unweighted observations. This is because the typically large range in magnitudes of the weighted or unweighted observations and simulated values can obscure trends in the differences between them: the greater the range, the smaller the same difference looks. This limitation is eliminated when weighted residuals are considered instead. Graphs of weighted residuals are less commonly used. Perhaps some modelers prefer to disguise model error. When using MODFLOW-2000 or UCODE_2005, graphs of observed versus simulated values can be produced using data from the output file with filename extension _os; graphs of weighted observed values versus weighted simulated values can be produced using data listed in the output file with filename extension _ww. The correlation coefficient between the weighted observations and the weighted simulated values measures how well the trends in the weighted simulated values match those of the weighted observed values and, therefore, how closely the
FIGURE 6.3 Example graphs of weighted observed and simulated values. The data plotted are the same data shown in the graphs of Figure 6.2. This display of the data does not reveal problems as clearly as do the graphs of weighted residuals and weighted simulated values in Figure 6.2.
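points on a graph such as that shown in Figure 6.3 fall on the line. This correlation coefficient, R, can be calculated for a diagonal weight matrix as (Cooley and Naff, 1990, p. 166)

$$R = \frac{\sum_{i=1}^{ND}\left(\omega_i^{1/2}y_i - m_y\right)\left(\omega_i^{1/2}y'_i - m_{y'}\right)}{\left[\sum_{i=1}^{ND}\left(\omega_i^{1/2}y_i - m_y\right)^2\right]^{1/2}\left[\sum_{i=1}^{ND}\left(\omega_i^{1/2}y'_i - m_{y'}\right)^2\right]^{1/2}} \tag{6.11a}$$

where $y_i$ and $y'_i$ are observed and simulated values, $\omega_i$ is the weight for the $i$th observation, and $m_y$ and $m_{y'}$ are the means of the weighted observations and simulated values. For a full weight matrix the equation is

$$R = \frac{(\boldsymbol{\omega}^{1/2}\mathbf{y} - \mathbf{m}_y)^T(\boldsymbol{\omega}^{1/2}\mathbf{y}' - \mathbf{m}_{y'})}{\left[(\boldsymbol{\omega}^{1/2}\mathbf{y} - \mathbf{m}_y)^T(\boldsymbol{\omega}^{1/2}\mathbf{y} - \mathbf{m}_y)\right]^{1/2}\left[(\boldsymbol{\omega}^{1/2}\mathbf{y}' - \mathbf{m}_{y'})^T(\boldsymbol{\omega}^{1/2}\mathbf{y}' - \mathbf{m}_{y'})\right]^{1/2}} \tag{6.11b}$$

where $\mathbf{y}$, $\mathbf{y}'$, and $\boldsymbol{\omega}$ were defined for Eq. (3.2), and $\mathbf{m}_y$ and $\mathbf{m}_{y'}$ are vectors with all ND elements equal to

$$m_y = \sum_{q=1}^{ND}\left(\boldsymbol{\omega}^{1/2}\mathbf{y}\right)_q \Big/ ND \tag{6.12}$$

$$m_{y'} = \sum_{q=1}^{ND}\left(\boldsymbol{\omega}^{1/2}\mathbf{y}'(\mathbf{b})\right)_q \Big/ ND \tag{6.13}$$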
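Equations (6.11a) through (6.13) can be sketched with NumPy as follows. For a full weight matrix the sketch assumes that $\boldsymbol{\omega}^{1/2}$ is the upper-triangular Cholesky factor $\mathbf{U}$ with $\mathbf{U}^T\mathbf{U} = \boldsymbol{\omega}$; other square roots are possible, and this choice is an assumption, not necessarily the one used by MODFLOW-2000 or UCODE_2005. The function name and the head values are hypothetical.

```python
import numpy as np


def weighted_correlation_r(y_obs, y_sim, weight):
    """Correlation coefficient R between weighted observed and weighted simulated values.

    weight -- 1-D array of diagonal weights (Eq. 6.11a) or a full symmetric,
              positive-definite weight matrix (Eq. 6.11b).
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_sim = np.asarray(y_sim, dtype=float)
    weight = np.asarray(weight, dtype=float)
    if weight.ndim == 1:
        sqrt_w = np.diag(np.sqrt(weight))
    else:
        # Assumed square root: upper-triangular Cholesky factor U with U.T @ U == weight.
        sqrt_w = np.linalg.cholesky(weight).T
    wy = sqrt_w @ y_obs
    wys = sqrt_w @ y_sim
    dy = wy - wy.mean()     # omega^(1/2) y  - m_y,  with m_y from Eq. (6.12)
    dys = wys - wys.mean()  # omega^(1/2) y' - m_y', with m_y' from Eq. (6.13)
    return float(dy @ dys / np.sqrt((dy @ dy) * (dys @ dys)))


heads_obs = [101.2, 98.7, 95.3, 90.1, 88.4]
heads_sim = [100.8, 99.1, 94.6, 90.9, 87.9]
w = [4.0, 4.0, 1.0, 1.0, 0.25]
print(weighted_correlation_r(heads_obs, heads_sim, w))
```

For diagonal weights, passing the weights as a vector or as `np.diag(w)` gives the same R, because the Cholesky factor of a diagonal matrix is the diagonal of square roots.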
Thus, $\mathbf{m}_y$ is a vector with each component equal to the average of the weighted dependent-variable observations, and $\mathbf{m}_{y'}$ is an analogous vector using the weighted simulated values. Generally, a value of R greater than 0.90 indicates that the trends in the weighted simulated values closely match those of the weighted observations. However, R depends on the range of values, and wide ranges are common when different types of data are used, so R needs to be interpreted with care. When there is prior information, R also is calculated with $\mathbf{y}$, $\mathbf{y}'(\mathbf{b})$, and $\boldsymbol{\omega}$ augmented as in Appendix A, in which case ND + NPR replaces ND when calculating $\mathbf{m}_y$ and $\mathbf{m}_{y'}$. In MODFLOW-2000, these statistics are printed in the LIST file; in UCODE_2005, they are printed in the main output file (filename extension #uout).
6.4.4 Graphs and Maps Using Independent Variables and the Runs Statistic
It is very important to evaluate weighted and unweighted residuals, observations, and simulated values with respect to the independent variables of a problem, such as space and time. Ideally, the signs and magnitudes of the weighted residuals plotted spatially on maps or temporally on graphs such as hydrographs show no discernible patterns and appear random. Distinct patterns, such as the presence of
only positive weighted residuals in a particular model layer or region, can indicate substantial model error that may cause simulated predictions to be incorrect and misleading. Distinct patterns often are present, however, especially in temporal graphs. It is crucial for the modeler to understand the cause of such patterns, and analysis of these problems can lead to changes in model construction that increase model accuracy. Examples of maps used to evaluate weighted and unweighted residuals are shown in Exercise 6.2 and in Chapter 15. The runs test (Cooley, 1979; Draper and Smith, 1998, pp. 192–198) takes into account the order of the residuals, which is ignored by all the other summary statistics. The runs test produces a summary statistic that checks for the randomness of weighted residuals with respect to the order in which they are listed. A sequence of residuals of the same sign is called a run; the number of runs is counted and assigned to the variable u. For example, the sequence of numbers −5, −2, 4, 3, 6, −4, 2, −3, −9 has the five runs (−5, −2), (4, 3, 6), (−4), (2), (−3, −9), so that u = 5. Given the total number of positive residuals ($n_1$) and the total number of negative residuals ($n_2$), u can be treated as a random variable. If $n_1 > 10$ and $n_2 > 10$, u is approximately normally distributed with mean $\mu$ and variance $\sigma^2$ equal to

$$\mu = \frac{2 n_1 n_2}{n_1 + n_2} + 1.0 \tag{6.14}$$

$$\sigma^2 = \frac{2 n_1 n_2\,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2\,(n_1 + n_2 - 1)} \tag{6.15}$$
The actual number of runs in a data set is compared with the expected value using test statistics. The test statistic for too few runs is

$$z_f = (u - \mu + 0.5)/\sigma \tag{6.16}$$

The test statistic for too many runs is

$$z_m = (u - \mu - 0.5)/\sigma \tag{6.17}$$
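The runs test of Eqs. (6.14) through (6.17) can be sketched as follows. Zero residuals are counted with the positive ones, an assumption that mirrors the "# RESIDUALS >=0." count in the program output of Figure 6.4; the function name is hypothetical.

```python
import math


def runs_test(weighted_residuals):
    """Runs test on weighted residuals in the order listed (Eqs. 6.14-6.17).

    Returns (u, n1, n2, z_few, z_many). The normal approximation is intended
    for n1 > 10 and n2 > 10; smaller samples call for a table such as Table D.4.
    """
    signs = [r >= 0.0 for r in weighted_residuals]
    n1 = sum(signs)                  # number of positive (nonnegative) residuals
    n2 = len(signs) - n1             # number of negative residuals
    n = n1 + n2
    if n1 == 0 or n2 == 0 or n < 3:
        raise ValueError("runs test needs at least 3 values of both signs combined")
    # A new run starts wherever the sign changes from one residual to the next.
    u = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mu = 2.0 * n1 * n2 / n + 1.0                                             # Eq. (6.14)
    sigma = math.sqrt(2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n**2 * (n - 1)))  # Eq. (6.15)
    z_few = (u - mu + 0.5) / sigma    # Eq. (6.16): strongly negative suggests too few runs
    z_many = (u - mu - 0.5) / sigma   # Eq. (6.17): strongly positive suggests too many runs
    return u, n1, n2, z_few, z_many


print(runs_test([-5, -2, 4, 3, 6, -4, 2, -3, -9]))
```

The example sequence from the text gives u = 5 with four positive and five negative values; because both counts are 10 or less, the z statistics are shown only to exercise the formulas and would not be interpreted with the normal critical values.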
Critical values for $z_f$ and $z_m$ are printed by UCODE_2005 and MODFLOW-2000. The critical values indicate the likelihood that the weighted residuals are in a random order. The critical values apply only when there are more than 10 positive residuals and more than 10 negative residuals. For smaller numbers of positive and negative residuals, Table D.4 (Appendix D) can be used to assess the randomness of the ordered weighted residuals. This table is applicable when $n_1$ and $n_2$ are each greater than or equal to 3 and less than or equal to 10, and $10 \le n_1 + n_2 \le 20$. The table gives the lower-tail and upper-tail cumulative probabilities that a particular number of runs would occur, given the values of $n_1$ and $n_2$. Smaller probabilities indicate that it is less likely that the signs of the ordered weighted residuals are random. In UCODE_2005 and MODFLOW-2000, the weighted residuals are analyzed in the order in which the observations are listed in the input file. The runs statistic can be made more meaningful by considering the ordering of the observations. For example,
STATISTICS FOR ALL RESIDUALS:
AVERAGE WEIGHTED RESIDUAL: 0.691E+00
# RESIDUALS >=0.: 50
# RESISUALS = 0. IS GREATER THAN 10 AND # RESIDUALS < 0. IS GREATER THAN 10
THE NEGATIVE VALUE MAY INDICATE TOO FEW RUNS:
IF THE VALUE IS LESS THAN -1.28, THERE IS LESS THAN A 10 PERCENT CHANCE THE VALUES ARE RANDOM,
IF THE VALUE IS LESS THAN -1.645, THERE IS LESS THAN A 5 PERCENT CHANCE THE VALUES ARE RANDOM,
IF THE VALUE IS LESS THAN -1.96, THERE IS LESS THAN A 2.5 PERCENT CHANCE THE VALUES ARE RANDOM.
FIGURE 6.4 Example runs test result printed by MODFLOW-2000 and UCODE_2005 (from the study described by Tiedeman et al., 1997).
in a groundwater model with pump-test data, listing the drawdowns at each location by increasing time produces a situation in which the runs statistic can be used to test whether the observed drawdowns are consistently greater than or less than the simulated values over time at each observation well. As the data are matched more randomly, the runs test will move away from indicating too few runs; in this situation it will rarely show too many runs. If spatial data are considered and observations are listed predominantly from north to south, the runs statistic can provide a quick indication of whether spatial trends are diminishing as the regression proceeds. Even if the runs statistic is used to evaluate trends in this manner, it is also necessary to conduct more thorough examinations using the graphical analyses of residuals described in this chapter. The runs statistic information printed by MODFLOW-2000 is displayed in Figure 6.4. A two-tailed test is used, but the critical values from only one tail are printed. The information printed by UCODE_2005 is similar. The negative runs test statistic shown in Figure 6.4 indicates that, using the order in which they are listed in the input file, there are fewer runs than would be expected given 35 values consisting of 18 positive and 17 negative values. However, the −0.339 runs statistic is closer to zero than even −1.28, the critical value with the smallest absolute value. Thus, the hypothesis that the residuals are random is not rejected. This is one indication that the weighted residuals are sufficiently randomly distributed. An example of using the runs test to evaluate weighted residuals along selected transects through a model area is shown in the discussion of Guideline 9 in Chapter 12.
6.4.5 Normal Probability Graphs and Correlation Coefficient $R_N^2$
The requirements for accurate simulated results are discussed in Chapter 3, Section 3.3. If the conditions listed in Section 3.3 are met, weighted residuals are expected to
either (1) be random, normally distributed, and independent or (2) be random, normally distributed, and correlated in a way that is consistent with the fitting process of the regression. Possibility (1) is easiest to check and so is considered first. If the data do not satisfy the criteria, further testing is conducted to determine whether the violations are consistent with the expected correlations produced by the fitting process. The test for independent, normal weighted residuals is conducted using normal probability graphs of weighted residuals. If the weighted residuals are independent and normally distributed, they will fall on an approximately straight line in a normal probability graph (Cooley and Naff, 1990; Helsel and Hirsch, 2002, pp. 30–33). Normal probability graphs can be constructed by ordering the weighted residuals from smallest to largest and plotting them against the cumulative probability that would be expected for each value if they were independent and normally distributed. The expected cumulative probabilities depend on the number of weighted residuals considered and can be calculated in a number of ways (Looney and Gulledge, 1985a,b; Draper and Smith, 1998, p. 71). For the results presented in this work they are calculated as (k − 0.5)/n (Hazen, 1914), where n equals the number of weighted residuals and k equals 1 for the smallest weighted residual, 2 for the next largest, and so on; for the largest weighted residual, k equals n. Calculating the cumulative probabilities in this way makes the normal probability graphs consistent with how the statistic $R_N^2$ is calculated, as discussed later in this section. To obtain a graph on which random, normally distributed data are expected to lie on a straight line, the axis on which the probabilities are plotted must be scaled for a normal probability distribution, as shown in Helsel and Hirsch (2002, Figures 2.7 and 2.9). This is called a normal probability axis.
Many common plotting programs, such as Microsoft Excel, do not support normal probability axes. Fortunately, as shown by Helsel and Hirsch (2002, Figure 2.8), an alternative arithmetic scale can be used. The arithmetic scale requires that the probabilities be converted into what are called “standard normal statistics,” “normal quantiles,” or “normal scores.” The cumulative probability can be calculated from the standard normal statistics using, for example, the function NORMDIST in Excel. Common values printed on the axis of standard normal statistics and the associated cumulative probabilities are as follows:
Standard Normal Statistic    Cumulative Probability
−4.0000                      0.0000
−3.0000                      0.0013
−2.0000                      0.0228
−1.0000                      0.1587
 0.0000                      0.5000
 1.0000                      0.8413
 2.0000                      0.9772
 3.0000                      0.9987
 4.0000                      1.0000
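The Hazen plotting positions and the statistic $R_N^2$ defined by Eq. (6.18) later in this section can be sketched with Python's standard-library `statistics.NormalDist`, whose `cdf` plays the role of Excel's NORMDIST and whose `inv_cdf` converts a cumulative probability into a standard normal statistic. The function names are hypothetical, and the sketch does not reproduce the critical values of Table D.3.

```python
from statistics import NormalDist


def normal_scores(n):
    """Standard normal statistics for the Hazen plotting positions (k - 0.5)/n."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]


def r2n(weighted_residuals):
    """R2N of Eq. (6.18): squared correlation between the ordered weighted
    residuals and the standard normal statistics."""
    e = sorted(weighted_residuals)        # e', smallest to largest
    t = normal_scores(len(e))
    m = sum(e) / len(e)                   # average weighted residual
    d = [ei - m for ei in e]
    numerator = sum(di * ti for di, ti in zip(d, t)) ** 2
    denominator = sum(di * di for di in d) * sum(ti * ti for ti in t)
    return numerator / denominator


# Cumulative probabilities for the axis labels listed above.
for z in (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
    print(f"{z:+.4f}  {NormalDist().cdf(z):.4f}")
```

With 101 values, the 51st plotting position is (51 − 0.5)/101 = 0.5, so `normal_scores(101)[50]` is 0.0, the middle value discussed in the text. Residuals that are exactly normal scores give $R_N^2 = 1$, while a strongly skewed set such as nine zeros and one large value gives a much smaller value.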
Based on the analysis above, given 101 ordered weighted residuals, the 51st largest value would have a cumulative probability of (51 − 0.5)/101 = 0.5, and the standard normal statistic would be 0.0000. That is, the middle value would be expected, on average, to equal the mean of the standard normal distribution. Helsel and Hirsch (2002, Figures 2.10–2.13) show and discuss normal probability graphs characterized by several common problems. In their graphs the term “normal quantile” is used instead of “standard normal statistic,” and it is plotted on the horizontal axis instead of the vertical axis. Regression problems commonly have small numbers of observations. To illustrate the variation that would be expected given a small sample size, Figure 6.5 shows normal probability plots generated with sample sizes of 10 and 40. As sample size increases, minor deviations from a straight line become more indicative of nonnormality. The associated summary statistic, $R_N^2$, is the correlation coefficient between the weighted residuals ordered from smallest to largest and the normal order statistics (Brockwell and Davis, 1987, p. 304). $R_N^2$ is nearly equivalent to the PPCC statistic of Helsel and Hirsch (2002, Chap. 4.4). $R_N^2$ can be used to test for independent, normally distributed weighted residuals and was chosen instead of other statistics, such as chi-squared and Kolmogorov–Smirnov, because it is more powerful for commonly used sample sizes (Shapiro and Francia, 1972). The correlation coefficient is calculated as

$$R_N^2 = \frac{\left[(\mathbf{e}' - \mathbf{m})^T\mathbf{t}\right]^2}{\left[(\mathbf{e}' - \mathbf{m})^T(\mathbf{e}' - \mathbf{m})\right]\left(\mathbf{t}^T\mathbf{t}\right)} \tag{6.18}$$
where all vectors are of length ND when $R_N^2$ is evaluated only for the ND observation weighted residuals, and of length ND + NPR when $R_N^2$ is evaluated for the ND + NPR observation and prior-information weighted residuals; $\mathbf{m}$ is a vector
FIGURE 6.5 Normal probability graphs constructed using (a) 10 and (b) 40 data points generated from a normal probability distribution.
with all components equal to the average of the weighted residuals, $\mathbf{e}'$ is a vector of weighted residuals ordered from smallest to largest, and $\mathbf{t}$ is a vector with the $i$th element equal to the standard normal statistic for a cumulative probability equal to $u_i = (i - 0.5)/ND$. Values of $R_N^2$ close to 1.0 indicate that the weighted residuals are independent and normally distributed. If $R_N^2$ is too far below the ideal value of 1.0, the weighted residuals are not likely to be independent and normally distributed. To test whether $R_N^2$ is close enough to 1.0, it can be compared with critical values for $R_N^2$ at significance levels of 0.05 and 0.10. These critical values are shown in Table D.3 of Appendix D.
6.4.6 Acceptable Deviations from Random, Normally Distributed Weighted Residuals
Weighted residuals may appear to be nonrandom when evaluated using the methods described in Sections 6.4.1 to 6.4.5 because of (1) model inadequacy, (2) correlations induced by the fitting process of the regression (Cooley and Naff, 1990, p. 168; Draper and Smith, 1998, p. 206), or (3) too few residuals. Methods presented in this section can be used to test for the latter two causes. Otherwise unexplained deviations from expected attributes are likely due to model inadequacy. Problems could occur with the model construction, including the parameterization, the observation data, and/or the weights. If the model appears to be inadequate, then every attempt needs to be made to identify and resolve problems with the model, so that weighted residuals that are more random and normally distributed are achieved. Possible ways of dealing with an inadequate model are discussed in Guideline 9 in Chapter 12. The correlations produced by the regression fitting process are most severe when there are few observations relative to the number of parameters. An extreme example occurs when only two data points are used to determine the slope and intercept of a simple linear model.
In this situation, a perfect fit is achieved for both points, and the error is completely accommodated by the fitting process. As more points are added the situation becomes less dramatic, but the fit achieved by the regression always accommodates the error to some degree, and this can cause the weighted residuals to be correlated rather than independent. Too few residuals can cause normal probability graphs to appear nonnormal and can cause graphs of weighted residuals versus weighted simulated values to appear nonrandom, simply because of the small sample size. This problem was illustrated in Figure 6.5a for a normal probability graph. Residuals that appear nonrandom and/or nonnormal can be tested by generating sets of values that have the expected correlations between the weighted residuals. The expected correlations can be calculated from the variance–covariance matrix of the weighted residuals, which equals (Bard, 1974, p. 194; similar to Cooley and Naff, 1990, p. 176)

$$V(\boldsymbol{\omega}^{1/2}\mathbf{e}) = \left(\mathbf{I} - \mathbf{X}(\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\omega}\right)s^2 \tag{6.19}$$
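A minimal NumPy sketch of the idea behind the test described next. It assumes a linear (or linearized) model with a diagonal weight matrix and writes the covariance in the symmetric form $\left(\mathbf{I} - \boldsymbol{\omega}^{1/2}\mathbf{X}(\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\omega}^{1/2}\right)s^2$, which is the variance–covariance matrix of $\boldsymbol{\omega}^{1/2}\mathbf{e}$ in that case. Because this projection matrix is symmetric and idempotent, multiplying independent values (d’s) by it yields correlated values (g’s) with that covariance. This is one convenient construction, not necessarily the one used by RESAN-2000 or RESIDUAL_ANALYSIS. The last function forms the mean and plus-or-minus two standard deviation envelope of the alternative test of Cooley (2004) described later in this section. All function names, the scaling of the d’s by s, and the example model are assumptions.

```python
import numpy as np


def weighted_residual_projector(X, w):
    """I - Xw (Xw^T Xw)^-1 Xw^T with Xw = w^(1/2) X (diagonal weights, linear model).

    Assumed symmetric form of Eq. (6.19); multiplying by s2 gives the
    variance-covariance matrix of the weighted residuals.
    """
    Xw = np.sqrt(w)[:, None] * X
    H = Xw @ np.linalg.solve(Xw.T @ Xw, Xw.T)
    return np.eye(X.shape[0]) - H


def generate_d_and_g(X, w, s2, n_sets, rng):
    """Independent (d) and regression-correlated (g) normal random numbers."""
    P = weighted_residual_projector(X, w)    # symmetric and idempotent: P @ P == P
    d = rng.standard_normal((n_sets, X.shape[0])) * np.sqrt(s2)
    g = d @ P                                # each row has covariance P * s2
    return d, g


def normal_plot_envelope(g):
    """Mean and mean +/- 2 standard deviations of the sorted g's at each plotting position."""
    g_sorted = np.sort(g, axis=1)            # order each set from smallest to largest
    mean = g_sorted.mean(axis=0)
    sd = g_sorted.std(axis=0, ddof=1)
    return mean - 2.0 * sd, mean, mean + 2.0 * sd


rng = np.random.default_rng(seed=1)
x = np.linspace(0.0, 10.0, 30)
X = np.column_stack([np.ones_like(x), x])    # hypothetical intercept-and-slope model
w = rng.uniform(0.5, 2.0, size=x.size)        # hypothetical diagonal weights
d, g = generate_d_and_g(X, w, s2=1.0, n_sets=1000, rng=rng)
low, mean, high = normal_plot_envelope(g)
```

Each set of g’s is orthogonal to the weighted sensitivities, just as the weighted residuals of a linear regression are, which is the fitting-induced correlation the test is designed to reproduce. The trace of the projection equals ND − NP, reflecting the degrees of freedom used by the fit.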
The steps of the test are as follows (Cooley and Naff, 1990, p. 176).
1. Generate sets of independent, normally distributed random numbers, which do not have the regression-induced correlations (called d’s by Cooley and Naff, 1990). Generate sets of correlated, normally distributed random numbers, which do have the regression-induced correlations (called g’s by Cooley and Naff, 1990). Within each set, associate each generated number with one of the ND + NPR observations or prior information values used in the regression using Eq. (6.19).
2. Compare graphs of the weighted residuals with graphs of the independent random numbers (d’s) as follows:
   a. Evaluate graphs of weighted residuals and d’s versus weighted or unweighted simulated values (as in Figure 6.1). If the graphs of the independent random numbers and of the weighted residuals have similar deviations from a random distribution about the zero line, the nonrandom distribution of the weighted residuals could result from the small number of observations.
   b. Evaluate normal probability graphs (as in Figure 6.5). If the graphs of the independent random numbers and of the weighted residuals have similar deviations from a straight line, the nonlinear shape of the weighted residuals graph could result from the small number of observations.
3. Compare graphs of the weighted residuals with graphs of the correlated random numbers (g’s) as follows:
   a. Evaluate graphs of weighted residuals and g’s versus weighted or unweighted simulated values (as in Figure 6.1). If the graphs of the correlated random numbers and of the weighted residuals have similar deviations from a random distribution about the zero line, the nonrandom distribution of the weighted residuals could result from the fitting process of the regression.
   b. Evaluate normal probability graphs (as in Figure 6.5). If the graphs of the correlated random numbers and of the weighted residuals have similar deviations from a straight line, the nonlinear shape of the weighted residuals graphs could result from the fitting process of the regression.
The d’s and g’s can be produced by MODFLOW-2000 and RESAN-2000 (Hill et al., 2000) or by UCODE_2005 and RESIDUAL_ANALYSIS (Poeter et al., 2005). Examples of graphs produced using data sets generated with RESAN-2000 are shown in Exercise 6.2e. An alternative test is described by Cooley (2004) and Christensen and Cooley (2005). It involves generating hundreds or thousands of sets of correlated normal random numbers, calculating the mean and plus and minus two standard deviations for each normal probability plotting position, and plotting them with the weighted residuals on a normal probability graph. An example graph is presented by Christensen and Cooley (2005, p. 44). Data sets for these graphs can be produced
by MODFLOW-2000’s UNC Process (Christensen and Cooley, 2005) or UCODE_2005 and RESIDUAL_ANALYSIS_ADV (Poeter et al., 2005). Graphs can be produced using, for example, GWChart (Winston, 2000).

6.5 EXERCISES
These exercises consider the fit of the calibrated steady-state model of the flow system described in Chapter 2, Section 2.2. Transport predictions will not be credible if the model cannot produce heads and flows that are reasonably similar to the observations. Predictions will also be suspect if the match to observations is so close that it appears observation error is being fit. To investigate model fit, Exercise 6.1 considers the overall statistical measures of model fit and Exercise 6.2 considers the graphical analyses and associated statistics.

Exercise 6.1: Statistical Measures of Overall Fit

In this exercise, overall fit to the head, flow, and prior information data is evaluated. This evaluation uses statistics located in output files produced by the MODFLOW-2000 or UCODE_2005 regression run of Exercise 5.2c. For students who have not performed the simulations, this output file is available from the web site for this book; see Chapter 1, Section 1.1 for information about obtaining this file. The statistics also are included in the figures accompanying the exercises.

(a) Examine objective-function values. The values of the least-squares (Eq. (3.1)) and maximum-likelihood (Eq. (3.3)) objective functions for the final parameter values are shown in Figure 6.6.
Problem
• Use Eq. (3.3) to verify the value of the maximum-likelihood objective function.
• Explain why the objective-function values may not be the best indicators of model fit.

(b) Demonstrate the circumstance in which the expected value of both the calculated error variance and the standard error is 1.0 (optional). In Section 6.3.2, it is claimed that if the fit achieved by regression is consistent with the data accuracy as reflected in the weighting, the expected value of both the calculated error variance and the standard error is 1.0. In this exercise, demonstrate this using generated random numbers instead of residuals. A diagonal weight matrix will be used, but the results are applicable to a full weight matrix as well. Proceed through the following steps:
1. Use a software package to generate n = 100 random numbers with a mean of zero, using any distribution (such as normal or uniform). These are equivalent to the residuals of Eq. (3.1) or (3.2).
2. Square each random number.
EVALUATING MODEL FIT
LEAST-SQUARES OBJ FUNC (DEP.VAR. ONLY)  = 10.548
LEAST-SQUARES OBJ FUNC (W/PARAMETERS)   = 10.556
CALCULATED ERROR VARIANCE               = 1.5080
STANDARD ERROR OF THE REGRESSION        = 1.2280
CORRELATION COEFFICIENT                 = 0.99979
     W/PARAMETERS                       = 0.99989
ITERATIONS                              = 5

MAX LIKE OBJ FUNC                       = -17.671
AIC STATISTIC                           = -5.6713
BIC STATISTIC                           = -2.2816

FIGURE 6.6 Selected statistics related to overall model fit, from the modified Gauss–Newton iterations of the regression run in Exercise 5.2c. This is a fragment from the global output file of MODFLOW-2000. “DEP.VAR. ONLY” means that only observations are included in the calculation. “W/PARAMETERS” means that prior information, if defined, is also included.
3. Divide each squared number by the variance of the distribution used. If weights are defined to be one divided by the variances, the resulting numbers are equivalent to squared, weighted residuals.
4. Sum the numbers from step 3 and divide by n.
5. Compare this value to 1.0. As n increases, the value should approach 1.0.
6. Repeat the analysis with two sets of n random numbers (total sample size is 2n) generated with very different variances, dividing each squared number by the variance of the distribution from which it was drawn.
Problem: Discuss the results obtained.

(c) Evaluate calculated error variance, standard error, and fitted error statistics. The values of the calculated error variance, $s^2$ (Eq. (6.1)), and its square root, the standard error of the regression, $s$, are shown in Figure 6.6.
Problem
• How does $s$ compare to the expected value of 1.0? In the analysis, consider the confidence interval on the standard error of the regression. Use the $\chi^2$ distribution in Table D.5 of Appendix D to obtain the critical values needed to calculate the confidence interval. Here, $\chi^2_{(13-6),0.975} = 1.690$ and $\chi^2_{(13-6),0.025} = 16.01$. Does 1.0 fall within the confidence interval?
• Using $s$ and the standard deviation of measurement error used to calculate the weights for hydraulic-head observations (see Exercise 3.2d), calculate the fitted standard deviation for heads. Compare the fitted standard deviation to the total head loss across the flow system (i.e., the difference between the maximum and minimum head, derived from the contour map of heads in Figure 2.1), and use this to judge the model fit.
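Steps 1 to 6 of part (b) can be sketched in Python using only the standard library. The sketch assumes zero-mean random numbers, because residuals from a well-specified model have an expected value of zero; with a nonzero mean (for example, a uniform distribution on [0, 1]) the average would not approach 1.0. The standard deviations and function names are illustrative choices, not values from the text.

```python
import random


def mean_weighted_square(values, variances):
    """Steps 2-4: square each number, divide it by the variance of the
    distribution it came from (weight = 1/variance), sum, and divide by n."""
    return sum(v * v / var for v, var in zip(values, variances)) / len(values)


def zero_mean_sample(rng, n, dist, sd):
    """Step 1: n zero-mean random numbers with standard deviation sd.
    A uniform distribution on [-a, a] has variance a**2 / 3, so a = sd * sqrt(3)."""
    if dist == "normal":
        return [rng.gauss(0.0, sd) for _ in range(n)]
    a = sd * 3 ** 0.5
    return [rng.uniform(-a, a) for _ in range(n)]


def demo(n=100, dist="normal", seed=0):
    rng = random.Random(seed)
    # Steps 1-5: one set of n numbers from a single distribution.
    sd = 2.5
    one_set = mean_weighted_square(zero_mean_sample(rng, n, dist, sd), [sd**2] * n)
    # Step 6: two sets (2n numbers in total) with very different variances;
    # each number is divided by the variance of its own distribution.
    sd_small, sd_large = 0.01, 100.0
    values = zero_mean_sample(rng, n, dist, sd_small) + zero_mean_sample(rng, n, dist, sd_large)
    variances = [sd_small**2] * n + [sd_large**2] * n
    two_sets = mean_weighted_square(values, variances)
    return one_set, two_sets


for n in (100, 10_000):
    print(n, demo(n, "normal"), demo(n, "uniform"))
```

If the division by each distribution's own variance were dropped in step 6, the average would be dominated by the large-variance set; weighting by one over the variance is what keeps the expected value at 1.0.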
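The arithmetic behind parts (c) and (d) of Exercise 6.1 can be checked with a short script. It assumes $s^2 = S(\mathbf{b}')/(ND + NPR - NP)$, AIC $= S'(\mathbf{b}') + 2NP$, BIC $= S'(\mathbf{b}') + NP \ln(ND + NPR)$, and the usual small-sample correction for AICc; with NP = 6 and ND + NPR = 13 these reproduce the values printed in Figure 6.6 to within rounding, which supports those assumptions. The confidence interval uses the $\chi^2$ critical values quoted in part (c).

```python
import math

# Values printed in Figure 6.6 and quoted in parts (c) and (d).
S_WITH_PRIOR = 10.556   # least-squares objective function, observations + prior
MLOF = -17.671          # maximum-likelihood objective function
N_TOTAL = 13            # ND + NPR (observations plus prior information)
NP = 6                  # number of estimated parameters (from the 13 - 6 in part (c))
CHI2_975, CHI2_025 = 1.690, 16.01   # chi-square critical values, 7 degrees of freedom

dof = N_TOTAL - NP
s2 = S_WITH_PRIOR / dof                   # calculated error variance
s = math.sqrt(s2)                         # standard error of the regression
s_low = math.sqrt(dof * s2 / CHI2_025)    # lower limit of the 95% interval on s
s_high = math.sqrt(dof * s2 / CHI2_975)   # upper limit of the 95% interval on s

aic = MLOF + 2 * NP                                   # AIC
bic = MLOF + NP * math.log(N_TOTAL)                   # BIC
aicc = aic + 2 * NP * (NP + 1) / (N_TOTAL - NP - 1)   # small-sample AICc

print(f"s2={s2:.4f} s={s:.4f} interval on s=[{s_low:.3f}, {s_high:.3f}]")
print(f"AIC={aic:.4f} BIC={bic:.4f} AICc={aicc:.4f}")
```

With only 13 observations and prior information items, the AICc correction term (14.0 here) is large relative to the AIC penalty, which is relevant to the question in part (d) about which statistic to use.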
(d) Examine the AIC, AICc, and BIC statistics. The values of the AIC (Eq. (6.3)) and BIC (Eq. (6.4)) statistics are shown in Figure 6.6. As discussed in Section 6.2.4, these statistics can be useful when comparing different models.
Problem
• Using Eqs. (6.3) and (6.4) and the values listed in the top part of Figure 6.6, verify the values of AIC and BIC shown in Figure 6.6. Calculate AICc. Should AIC or AICc be used?
• Suppose that parameters are added to the steady-state test-case model to better represent some feature of the true system. For each additional parameter added, how much does the model fit, as represented by the weighted least-squares objective function, need to improve to result in a reduced value of the AIC, AICc, and BIC statistics?

Exercise 6.2: Evaluate Graphs of Model Fit and Related Statistics

In this exercise, the fit of the steady-state model calibrated in Exercise 5.2c to the head, flow, and prior observation data is evaluated using graphical methods and associated statistics. This evaluation uses residuals and statistics produced by the regression run of Exercise 5.2c. Students who have performed the simulations can create the graphs from model output files; see Chapter 1, Section 1.1 for the web site where instructions are provided.

(a) Graph of weighted residuals versus weighted simulated values and the minimum, maximum, and average weighted residuals. The graph of weighted residuals versus weighted simulated values is shown in Figure 6.7a. Ideally, the weighted residuals show no pattern relative to the simulated values.
Problem
• Comment on the graph in Figure 6.7a. Do the weighted residuals appear to be randomly distributed about zero? The very small residuals for the flows and prior information are discussed in subsequent exercises.
• Comment on the values of the maximum, minimum, and average weighted residuals shown in Figure 6.8.

(b) Graphs of observations versus simulated values. Examine the correlation coefficient R.
A graph of weighted observations versus weighted simulated values is shown in Figure 6.7b, and a graph of observed versus simulated values is shown in Figure 6.7c. The correlation coefficient between the weighted observed and simulated values, R (Eq. (6.11a)), equals 0.99979 for the head and flow
FIGURE 6.7 Plots for analyzing model fit for Exercise 5.2c. (a) Weighted residuals versus weighted simulated values (unweighted simulated values also could be used on the horizontal axis). The vertical gridlines are placed at increments of the standard error of the regression (1.2). (b) Plot of weighted observed values versus weighted simulated values. (c) Plot of observed versus simulated values.
observations, and 0.99989 when the prior information on K_RB and VK_CB is also included. These values are shown in Figure 6.6. Ideally, the values plotted on the types of graphs shown in Figures 6.7b and 6.7c fall on a line with a slope of 1.0.
Problem
• Comment on the utility of the three different graphs shown in Figure 6.7. Which graph is likely to be most useful for diagnosing problems with the model fit to the observation data?
SMALLEST AND LARGEST WEIGHTED RESIDUALS

SMALLEST WEIGHTED RESIDUALS              LARGEST WEIGHTED RESIDUALS
NAME    WEIGHTED    PERCENT OF           NAME    WEIGHTED    PERCENT OF
        RESIDUAL    OBJ FUNC                     RESIDUAL    OBJ FUNC
1.ss    -2.05       39.68                4.ss    1.59        24.01
6.ss    -0.552       2.89                2.ss    1.17        13.05
3.ss    -0.506       2.43                8.ss    0.993        9.34
9.ss    -0.275       0.72                10.ss   0.882        7.37
5.ss    -0.114       0.12                7.ss    0.178        0.30

STATISTICS FOR ALL RESIDUALS:
AVERAGE WEIGHTED RESIDUAL: 0.114E+00

FIGURE 6.8 Smallest, largest, and average weighted residuals from the regression run in Exercise 5.2c. This is a fragment from the global output file of MODFLOW-2000.

• Does the value of R indicate a good match between the trends in the weighted simulated and weighted observed values? Is R a useful diagnostic statistic in this situation? Why?
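To see why R can be close to 1.0 even when part of the fit is poor, the following sketch computes the correlation between weighted observed and weighted simulated values. It assumes Eq. (6.11a) is the ordinary (Pearson) correlation of the weighted values with a diagonal weight matrix; all data values and the function name are hypothetical, chosen only to mimic a few heads plus one heavily weighted flow.

```python
import math


def weighted_R(observed, simulated, weights):
    """Correlation coefficient between weighted observed and weighted simulated
    values (diagonal weights, each weight = 1/variance)."""
    wo = [math.sqrt(w) * y for y, w in zip(observed, weights)]
    ws = [math.sqrt(w) * y for y, w in zip(simulated, weights)]
    mo, ms = sum(wo) / len(wo), sum(ws) / len(ws)
    num = sum((a - mo) * (b - ms) for a, b in zip(wo, ws))
    den = math.sqrt(sum((a - mo) ** 2 for a in wo) * sum((b - ms) ** 2 for b in ws))
    return num / den


# Hypothetical example: four heads (weight 1) whose simulated values are
# poorly ordered, plus one flow whose large weighted value dominates the sums.
heads_obs, heads_sim = [100.0, 101.0, 102.0, 103.0], [101.5, 100.0, 103.5, 102.0]
flow_obs, flow_sim, flow_weight = -5.0, -5.0, 1.0e4

r_heads = weighted_R(heads_obs, heads_sim, [1.0] * 4)
r_all = weighted_R(heads_obs + [flow_obs], heads_sim + [flow_sim], [1.0] * 4 + [flow_weight])
print(f"R for heads only: {r_heads:.3f}; R with the flow added: {r_all:.5f}")
```

A single observation whose weighted value lies far from the others controls R, so a value near 1.0 says little about the misfit among the remaining observations.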
(c) Graphs of weighted residuals against independent variables. Evaluate the runs statistic. The weighted residuals from the regression of Exercise 5.2c are plotted on maps of the model layers in Figure 6.9.
Problem
• Do the weighted residuals shown in Figure 6.9 appear to be randomly distributed in space?
FIGURE 6.9 Weighted residuals for the steady-state regression plotted on maps of the two model layers.
# RESIDUALS >= 0. :   8
# RESIDUALS < 0.  :   5
NUMBER OF RUNS    :   5  IN   13 OBSERVATIONS

INTERPRETING THE CALCULATED RUNS STATISTIC VALUE OF -1.02
NOTE: THE FOLLOWING APPLIES ONLY IF
  # RESIDUALS >= 0. IS GREATER THAN 10 AND
  # RESIDUALS < 0.  IS GREATER THAN 10
THE NEGATIVE VALUE MAY INDICATE TOO FEW RUNS:
  IF THE VALUE IS LESS THAN -1.28, THERE IS LESS THAN A 10 PERCENT CHANCE THE VALUES ARE RANDOM,
  IF THE VALUE IS LESS THAN -1.645, THERE IS LESS THAN A 5 PERCENT CHANCE THE VALUES ARE RANDOM,
  IF THE VALUE IS LESS THAN -1.96, THERE IS LESS THAN A 2.5 PERCENT CHANCE THE VALUES ARE RANDOM.

FIGURE 6.10 Runs statistic and critical values from the regression run in Exercise 5.2c. This is a fragment from the global output file of MODFLOW-2000.

• Comment on the physical reasons for the three large weighted residuals in model layer 1. It may be helpful to consider the dimensionless scaled sensitivities of Table 4.1.
The runs statistic and critical values for this problem are shown in Figure 6.10. For the steady-state model regression, there are fewer than 10 positive residuals and fewer than 10 negative residuals, so the printed critical values for the runs statistic are not applicable. In most situations, there will be enough positive and negative residuals for the critical values to apply. For cases where the critical values are applicable, understanding the runs statistic can be facilitated by locating the runs statistic and critical values on a normal probability distribution.
Problem
• Draw a normal probability distribution and locate the value of the test statistic and the critical values. Remember that this is a two-tailed test, so include the critical values printed in the file and also the critical values of the other tail of the distribution.
• Given the runs test statistic value and the critical values, what do you conclude about the randomness of the weighted residuals with respect to their order in the MODFLOW-2000 or UCODE_2005 input files? When answering this question, ignore the problem that the steady-state regression has too few negative and positive residuals.
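The runs statistic can be sketched as a standard runs (Wald–Wolfowitz) test on the signs of the weighted residuals. We assume the normal approximation uses a continuity correction of 0.5 toward the expected number of runs; with that assumption, 8 nonnegative residuals, 5 negative residuals, and 5 runs give the value of -1.02 printed in Figure 6.10. The residual values in the example are hypothetical; only their sign pattern matches the printed counts.

```python
import math


def runs_statistic(weighted_residuals):
    """Runs test on the signs of weighted residuals taken in input-file order.
    Zero residuals are counted with the nonnegative ones. Returns the counts,
    the number of runs, and the normal-approximation statistic with a
    continuity correction of 0.5 toward the expected number of runs."""
    signs = [r >= 0.0 for r in weighted_residuals]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("runs test needs both nonnegative and negative residuals")
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n_pos + n_neg
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    if runs < mean:
        z = (runs - mean + 0.5) / math.sqrt(var)
    elif runs > mean:
        z = (runs - mean - 0.5) / math.sqrt(var)
    else:
        z = 0.0
    return n_pos, n_neg, runs, z


# Hypothetical sign pattern with the same counts as Figure 6.10:
# 8 nonnegative, 5 negative, and 5 runs.
example = [1.0, 0.5, 0.2, -0.3, -0.1, 0.4, 0.9, 1.1, -0.2, -0.6, -0.5, 0.3, 0.7]
print(runs_statistic(example))
```

Too few runs (residuals of the same sign clustered together) produce a negative statistic; too many runs (signs alternating more than expected) produce a positive one, which is why both tails matter.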
FIGURE 6.11 Normal probability graph of the weighted residuals from Exercise 5.2c.
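Exercise 6.2d, which follows, asks for $R^2_N$ values computed from sets of 13 normally distributed random numbers. The sketch below assumes $R^2_N$ is the squared correlation between the ordered weighted residuals and normal order statistics, with the order statistics approximated by standard normal quantiles at plotting positions (i − 0.5)/n; both choices are assumptions to check against Eq. (6.18) before comparing results with tabulated critical values.

```python
import random
from statistics import NormalDist


def r2n(weighted_residuals):
    """Squared correlation between the ordered weighted residuals and
    approximate normal order statistics. The order statistics are taken as
    standard normal quantiles at plotting positions (i - 0.5)/n (an assumption)."""
    e = sorted(weighted_residuals)
    n = len(e)
    u = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    m = sum(e) / n
    num = sum((ei - m) * ui for ei, ui in zip(e, u)) ** 2
    den = sum((ei - m) ** 2 for ei in e) * sum(ui * ui for ui in u)
    return num / den


# 10 sets of 13 independent standard normal numbers, as in part (d).
rng = random.Random(2007)
values = [r2n([rng.gauss(0.0, 1.0) for _ in range(13)]) for _ in range(10)]
print([round(v, 3) for v in values])
```

Because the plotting positions are symmetric about 0.5, the quantiles sum to zero, so this expression equals the squared Pearson correlation; values well below 1.0 indicate curvature on the normal probability graph.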
(d) Evaluate normal probability graphs and the correlation coefficient $R^2_N$. A normal probability graph of the weighted residuals from Exercise 5.2c is shown in Figure 6.11. $R^2_N$ (Eq. (6.18)) is shown in Figure 6.12. Use this graph and the associated statistic to test the independence and normality of the weighted residuals.
Problem
• Do the weighted residuals appear to be normally distributed in Figure 6.11? Compare the results of this analysis with the calculated value of $R^2_N$ shown in Figure 6.12.
• Generate 10 sets of 13 normally distributed random numbers and calculate the $R^2_N$ statistic for each. Compare these values to the critical value for the 5-percent significance level. Compare how many of the 10 $R^2_N$ values are less than the critical value to how many are expected to be less than the critical value.

(e) Determine acceptable deviations from random, independent, and normal weighted residuals. Graphs of independent and correlated random numbers versus weighted simulated values from Exercise 5.2c are shown in Figure 6.13, and normal probability graphs of independent and correlated random numbers are shown in Figure 6.14. These graphs are used to test expected correlation between the
CORRELATION BETWEEN ORDERED WEIGHTED RESIDUALS AND
NORMAL ORDER STATISTICS FOR OBSERVATIONS = 0.941

CORRELATION BETWEEN ORDERED WEIGHTED RESIDUALS AND
NORMAL ORDER STATISTICS FOR OBSERVATIONS AND PRIOR INFORMATION = 0.926

COMMENTS ON THE INTERPRETATION OF THE CORRELATION BETWEEN
WEIGHTED RESIDUALS AND NORMAL ORDER STATISTICS:

Generally, IF the reported CORRELATION is LESS than the critical value, at the selected significance level (usually 5 or 10%), the hypothesis that the weighted residuals are INDEPENDENT AND NORMALLY DISTRIBUTED would be REJECTED. HOWEVER, in this case, conditions are outside of the range of published critical values as discussed below.

The sum of the number of observations and prior information items is 13 which is less than 35, the minimum value for which critical values are published. Therefore, the critical values for the 5 and 10% significance levels are less than 0.943 and 0.952, respectively.

CORRELATIONS GREATER than these critical values indicate that, probably, the weighted residuals ARE INDEPENDENT AND NORMALLY DISTRIBUTED.

Correlations LESS than these critical values MAY BE ACCEPTABLE, and rejection of the hypothesis is not necessarily warranted.

The Kolmogorov-Smirnov test can be used to further evaluate the residuals.

FIGURE 6.12 $R^2_N$ statistic and critical values from the regression run in Exercise 5.2c. This is a fragment from the global output file of MODFLOW-2000.
FIGURE 6.13 Graphs of four sets of normally distributed (a) independent and (b) correlated random numbers versus weighted simulated values from Exercise 5.2c, as needed in Exercise 6.2e.
FIGURE 6.14 Normal probability graphs of four sets of normally distributed (a) independent and (b) correlated random numbers related to Exercise 5.2c, as needed in Exercise 6.2e.
weighted residuals. Instructions for producing the data sets needed for these graphs are available from the web site for this book as described in Chapter 1, Section 1.1.
Problem
• Is the behavior of the weighted residuals more similar to that of the generated independent or correlated random numbers? To answer this question, compare Figure 6.7a with Figure 6.13, and compare Figure 6.11 with Figure 6.14. Explain your answer.
• What conclusion can be drawn about the reason for the nonrandomness of the flow and prior weighted residuals in Figure 6.7a, and for the deviation of the weighted residuals from a straight line in Figure 6.11? Use your knowledge of the model construction and the observations and prior information.
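The independent (d) and correlated (g) random numbers used in Exercise 6.2e can be sketched with NumPy. This assumes the regression-induced correlation is the linear-theory one, in which weighted residuals have covariance proportional to I − R, with R = ω^{1/2} X (Xᵀ ω X)⁻¹ Xᵀ ω^{1/2}; the g's are then obtained as (I − R) d. Eq. (6.19) in the text may express this differently, and the function name, sensitivity matrix, and weights below are hypothetical.

```python
import numpy as np


def generate_d_and_g(X, weights, n_sets=4, seed=0):
    """Independent (d) and regression-correlated (g) standard normal numbers.

    X       : (n, NP) sensitivity matrix at the optimal parameter values,
              n = ND + NPR rows (observations and prior information)
    weights : length-n diagonal of the weight matrix
    Assumes g = (I - R) d with R = w^1/2 X (X^T w X)^-1 X^T w^1/2.
    """
    rng = np.random.default_rng(seed)
    Xw = np.sqrt(np.asarray(weights, dtype=float))[:, None] * np.asarray(X, dtype=float)
    R = Xw @ np.linalg.solve(Xw.T @ Xw, Xw.T)   # symmetric projection (hat) matrix
    n = Xw.shape[0]
    d = rng.standard_normal((n_sets, n))          # each row is one set of d's
    g = d @ (np.eye(n) - R)                       # row form of (I - R) d; I - R is symmetric
    return d, g


# Hypothetical 13 x 6 sensitivity matrix and weights, for illustration only.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(13, 6))
w_demo = np.full(13, 0.25)
d, g = generate_d_and_g(X_demo, w_demo)
print(d.shape, g.shape)
```

Each row of d or g can be plotted against the weighted simulated values, or sorted for a normal probability graph, next to the weighted residuals. Because each g_i has variance 1 − R_ii, the g's are least variable where an observation strongly controls the fit, which is the same pattern that regression imposes on the weighted residuals.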
7 EVALUATING ESTIMATED PARAMETER VALUES AND PARAMETER UNCERTAINTY
Once parameter values are estimated, they need to be evaluated for a number of reasons. In this chapter we present methods for analyzing estimated parameters with brief explanations of how the analyses are used. Additional discussions and examples are provided in Guideline 10 of Chapter 12. The methods described in this chapter start with reevaluation of composite scaled sensitivities. Next, five variations of the parameter variance-covariance matrix are introduced and statistics derived from the parameter variance-covariance matrix are defined. After comments about log-transformed parameters and when to use the five variations of the parameter variance-covariance matrix, five issues are discussed: (1) identification of individual observations that dominate the parameter estimates, (2) uniqueness and optimality of the estimates, (3) quantifying parameter uncertainty, (4) comparing parameter estimates against reasonable ranges, and (5) testing for model nonlinearity. The issues considered in this chapter span the first two components of the observation-parameter-prediction triad composed of entities that are directly connected by the model, as discussed in Chapters 1 and 10.
7.1
REEVALUATING COMPOSITE SCALED SENSITIVITIES
Composite scaled sensitivities (css) are a measure of the total information provided by the regression observations about a parameter value. These sensitivities were presented in Chapter 4, Section 4.3.4, which focused on using them to determine which parameters to estimate by regression, and which to exclude because of insensitivity. At the optimal parameter values it is important to calculate css for all defined model parameters (those included in the regression as well as those excluded from the regression). These sensitivities will likely be different from those calculated for initial regression runs, because of model nonlinearity and because of scaling by $b_j$ in Eq. (4.3). If any parameters that were initially excluded from the regression appear to have increased in importance, additional regression runs should be considered with these parameters included. Although the css are useful measures of the information the data contain for a single parameter, they do not account for the many parameters being estimated simultaneously, and they do not measure the precision of the parameter estimates. Other statistics discussed in this chapter fill these roles.

Effective Groundwater Model Calibration: With Analysis of Data, Sensitivities, Predictions, and Uncertainty. By Mary C. Hill and Claire R. Tiedeman. Published 2007 by John Wiley & Sons, Inc.
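Recomputing css at the optimal values can be sketched as follows. The sketch assumes the Chapter 4 definitions for a diagonal weight matrix, dss_ij = (∂y′_i/∂b_j)|b_j|ω_ii^{1/2} and css_j = [Σ_i dss_ij² / ND]^{1/2}; the function name and example numbers are hypothetical. The example shows how the scaling by b_j can reorder parameters, one reason css change between the initial and final regression runs.

```python
import math


def composite_scaled_sensitivities(jacobian, params, weights):
    """css_j = sqrt(sum_i dss_ij**2 / ND), with
    dss_ij = (dy'_i/db_j) * |b_j| * sqrt(w_ii) for a diagonal weight matrix.

    jacobian : ND rows, one column per defined parameter (dy'_i/db_j)
    params   : parameter values b_j at which the jacobian was calculated
    weights  : ND diagonal weights w_ii
    """
    nd = len(jacobian)
    css = []
    for j, b in enumerate(params):
        total = sum((row[j] * abs(b) * math.sqrt(w)) ** 2
                    for row, w in zip(jacobian, weights))
        css.append(math.sqrt(total / nd))
    return css


# Hypothetical: the first parameter has the larger raw sensitivity, but the
# second becomes more important once each sensitivity is scaled by |b_j|.
print(composite_scaled_sensitivities([[2.0, 0.0], [0.0, 1.0]], [1.0, 4.0], [1.0, 1.0]))
```

In practice the jacobian would include columns for every defined parameter, including those excluded from the regression, evaluated at the optimal values.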
7.2 USING STATISTICS FROM THE PARAMETER VARIANCE–COVARIANCE MATRIX

The variance–covariance matrix on the parameters contains important information about parameter uncertainty and correlation, and about the support that the observations offer to the estimated parameters, given the model as constructed. This section presents five alternate versions of the variance–covariance matrix, statistics derived from the variance–covariance matrix, and the circumstances in which these statistics are used for each of the five versions of the matrix. Finally, the section discusses alternate statistics that are commonly suggested and notes that we believe they are more complicated without providing much additional insight for the purposes of evaluating parameter uncertainty and correlation.
7.2.1
Five Versions of the Variance–Covariance Matrix
The parameter variance–covariance matrix is calculated using an equation of the form

$$V(\mathbf{b}) = s^2 \left(X^T \omega X\right)^{-1} \qquad (7.1)$$

where $V(\mathbf{b})$ is an NP by NP matrix, $s^2$ is the calculated error variance (Eq. (6.1)), $X$ is a matrix of sensitivities defined after Eq. (5.2b) and calculated for the parameters listed in the vector $\mathbf{b}$, and $\omega$ is a weight matrix defined after Eq. (3.2) and in Appendix A. It can be very useful to define $X$, $\mathbf{b}$, and $\omega$ differently to investigate different aspects of the model, the data, and the predictions. Five versions are presented here and discussed further in Section 7.2.5.

1. Variance–Covariance Matrix with Optimized Parameter Values. Only optimized parameters are included. Sensitivities are calculated for the optimized
parameter values, $\mathbf{b} = \mathbf{b}'$, and $X$ is a matrix of sensitivities for the parameters estimated and the observations and prior information used in the regression (Bard, 1974, p. 59; Draper and Smith, 1998, p. 223).
2. Variance–Covariance Matrix with All Defined Parameters. This is the same as option 1 except that any defined parameters for which values were set and not estimated in the regression are included. This means that there are additional columns in the sensitivity matrix, $X$. In addition, in some circumstances this variation of the variance–covariance matrix includes realistic weighting. This means that (1) any weights that were altered to obtain parameter estimates need to be returned to values representative of realistic levels of error in the observations and prior information, and (2) if available, prior information and associated realistic weighting need to be included; this is most important for parameters not estimated by the regression.

For options 3 to 5, the useful statistics that are derived from the variance–covariance matrix do not depend on $s^2$.

3. Variance–Covariance Matrix with Nonoptimal Parameter Values. This is the same as option 1 or 2 except that any set of parameter values can be used.
4. Variance–Covariance Matrix with Alternate Observation Sets. This is the same as option 1 or 2 except that different observations are included. Existing observations may be omitted or information on new observations may be added. This requires changes in the weight matrix, $\omega$, and, when adding observations, the sensitivity matrix, $X$.
5. Variance–Covariance Matrix with Predictions. This is the same as option 1 or 2 except that predictions are added. This requires changes in the weight matrix, $\omega$, and the sensitivity matrix, $X$.
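Eq. (7.1) for a diagonal weight matrix can be sketched in NumPy. The helper for $s^2$ assumes Eq. (6.1) divides the weighted sum of squared residuals by ND + NPR − NP, which is consistent with Figure 6.6, where 10.556/(13 − 6) = 1.508. Function names and example numbers are hypothetical. The rows and columns of X determine which of the five versions is obtained; for options 3 to 5 the useful statistics do not depend on $s^2$, so any positive value can be passed.

```python
import numpy as np


def calculated_error_variance(weighted_residuals, n_params):
    """s^2 = (sum of squared weighted residuals) / (ND + NPR - NP)."""
    r = np.asarray(weighted_residuals, dtype=float)
    return float(r @ r) / (r.size - n_params)


def parameter_covariance(X, weights, s2):
    """Eq. (7.1) with a diagonal weight matrix: V(b) = s2 * (X^T w X)^-1.
    X has one row per observation or prior-information item and one column per
    parameter included in the chosen version of the matrix."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    XtWX = X.T @ (w[:, None] * X)
    return s2 * np.linalg.inv(XtWX)


# Hypothetical three observations and two parameters.
V = parameter_covariance([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]], [1.0, 1.0, 4.0], s2=1.5)
print(V)
```

For a full (nondiagonal) weight matrix, the product would use the matrix ω directly, as `X.T @ W @ X`.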
7.2.2 Parameter Variances, Covariances, Standard Deviations, Coefficients of Variation, and Correlation Coefficients

The precision (see definition in Chapter 1, Section 1.4.2) and correlation of parameter estimates can be analyzed by using the parameter variance–covariance matrix. The diagonal elements equal the parameter variances; the off-diagonal elements equal the parameter covariances. For a problem with three estimated parameters, the matrix would appear as

$$\begin{bmatrix} \mathrm{Var}(1) & \mathrm{Cov}(1,2) & \mathrm{Cov}(1,3) \\ \mathrm{Cov}(2,1) & \mathrm{Var}(2) & \mathrm{Cov}(2,3) \\ \mathrm{Cov}(3,1) & \mathrm{Cov}(3,2) & \mathrm{Var}(3) \end{bmatrix} \qquad (7.2)$$
where Var(1) is the variance ($s^2_{b_1}$) of parameter 1, Cov(1, 2) is the covariance between parameters 1 and 2, and so on. The variance–covariance matrix is always symmetric, so that, for example, Cov(1, 2) = Cov(2, 1). Equation (7.1) is most useful if the model is nearly linear in the vicinity of $\mathbf{b}'$ (see Chapter 5, Section 5.1.2) and if the weight matrix is appropriately defined (see Chapter 3, Sections 3.3.3
and 3.4.2). A method of testing for model linearity is presented in Section 7.7. For nonlinear problems the variance–covariance matrix only approximates parameter uncertainty. Variances and covariances commonly are not intuitively understood, but they can be used to calculate informative statistics. The first of these is the parameter standard deviation, which equals the square root of the parameter variance. That is,

$$s_{b_j} = \left(\mathrm{Var}(j)\right)^{1/2} \qquad (7.3)$$
where Var(j) is the jth diagonal element of the variance–covariance matrix. Parameter standard deviations have the same units as the parameter values and are more easily understood measures of parameter uncertainty. However, parameter standard deviations are perhaps most useful when processed further to calculate three other statistics: confidence intervals for parameter values (presented in Section 7.5.1), coefficients of variation, and the t-statistic. The coefficient of variation for each parameter equals the standard deviation divided by the parameter value:

$$\mathrm{c.v.} = s_{b_j} / b_j \qquad (7.4)$$
The coefficient of variation is a dimensionless number with which the relative precision of different parameter estimates can be compared. The t-statistic serves the same purpose and equals 1/c.v., or $b_j / s_{b_j}$. The coefficient of variation is used instead of the t-statistic in this book. Correlation coefficients are calculated as the covariance between two parameters divided by the product of their standard deviations. Using the notation of Eq. (7.2), the correlation between the jth and kth parameters is

$$\mathrm{pcc}(j,k) = \frac{\mathrm{Cov}(j,k)}{\mathrm{Var}(j)^{1/2}\,\mathrm{Var}(k)^{1/2}} \qquad (7.5)$$
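Eqs. (7.3) to (7.5) can be applied to any variance–covariance matrix with a few lines of code. This is a minimal sketch in plain Python; the function name, matrix, and parameter values are hypothetical.

```python
import math


def parameter_statistics(V, b):
    """Standard deviations (Eq. 7.3), coefficients of variation (Eq. 7.4), and
    the correlation matrix (Eq. 7.5) from a variance-covariance matrix V
    (list of rows) and the parameter values b."""
    npar = len(b)
    sd = [math.sqrt(V[j][j]) for j in range(npar)]
    cv = [sd[j] / b[j] for j in range(npar)]   # undefined if b_j == 0
    pcc = [[V[j][k] / (sd[j] * sd[k]) for k in range(npar)] for j in range(npar)]
    return sd, cv, pcc


# Hypothetical 2 x 2 matrix and parameter values.
sd, cv, pcc = parameter_statistics([[0.5, -0.25], [-0.25, 0.3125]], [2.0, 1.25])
print(sd, cv, pcc[0][1])
```

The diagonal of the correlation matrix is always 1.0, and pcc(j, k) = pcc(k, j), mirroring the symmetry of Eq. (7.2).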
Characteristics of parameter correlation coefficients were discussed in Chapter 4, Sections 4.3.5 and 4.4.2. Briefly, unique values are nearly always assured if the absolute values of all pcc are less than about 0.95. However, unique estimates can be obtained with larger absolute values. Suspected problems with uniqueness can be tested as discussed in Section 7.4.

7.2.3
Relation Between Sample and Regression Statistics
For students unfamiliar with means, variances, covariances, standard deviations, coefficients of variation, and correlation coefficients, it can be beneficial to compare how they are calculated for sample data with how they are calculated in regression. The two situations are similar in that both attempt to use data to estimate some quantity and express the precision of the estimate. Sample data used in a comparison are shown in Figure 7.1. The equations for calculating the sample
FIGURE 7.1 Values and graph of x and two sets of y variables used to investigate sample variances, covariances, and correlation coefficients. The values of y1 equal the x values plus and minus small deviations; the values of y2 were generated from a random normal distribution with a mean of 4.5 and a standard deviation of 3.0.
statistics are shown in Table 7.1, with the calculated values of the sample statistics for the two data sets shown in Figure 7.1. These equations differ from those used to calculate the analogous regression statistics, which are presented in Eq. (5.6) and Section 7.2.1. The sample variance is a measure of the spread of the data. Table 7.1 shows that the sample variance for data set y1 is less than that for data set y2 (4.2 versus 11); this is expected given the wider range of values in data set y2 compared to data set y1. The sample covariance indicates whether x and y vary in a coordinated way, and the correlation coefficient is a scaled measure of the sample covariance. The covariance and the absolute value of the correlation coefficient for data set y1 are greater than the corresponding statistics for data set y2 (covariances of 4.4 versus −1.9; correlation coefficients of 0.98 versus −0.26). This is because the y1 values vary in a systematic way with x, whereas the y2 values are more random with respect to the x values, as shown in Figure 7.1. In regression, the parameter values are not estimated by direct sampling. Instead, they are estimated indirectly using observations of the state of the simulated system. This can be accomplished because the simulation model used in the regression is based on equations that relate the observations and the parameter values. Because of this indirect way of estimating parameter values, parameter variances and covariances are calculated in a different manner from the sample equations of Table 7.1, as indicated in the lower half of this table. Interpretation of the variance and correlation of parameters estimated by regression is similar to that for the sample statistics but is not completely analogous. In regression, the variance indicates the range over which a parameter value could extend without affecting model fit too adversely, and the parameter correlation
TABLE 7.1 Equations for Sample Mean, Variance, Standard Deviation, Coefficient of Variation, Covariance, and Correlation Coefficients; the Values Calculated for the Data Sets Shown in Figure 7.1; and the Analogous Relations When Quantities Are Estimated by Regression Instead of Directly from Sample Data

Sample Statistics

Statistic | Equation (Davis, 2002) | Data set y1 | Data set y2
Mean | $x' = (1/n)\sum_i x_i$; $y' = (1/n)\sum_i y_i$ | $x' = 4$; $y' = 6.1$ | $x' = 4$; $y' = 4.38$
Variance | $s_x^2 = \frac{1}{n-1}\sum_i (x_i - x')^2$; $s_y^2 = \frac{1}{n-1}\sum_i (y_i - y')^2$ | $s_x^2 = 4.7$; $s_y^2 = 4.2$ | $s_x^2 = 4.7$; $s_y^2 = 11$
Standard deviation | $s_x = (s_x^2)^{1/2}$; $s_y = (s_y^2)^{1/2}$ | $s_x = 2.2$; $s_y = 2.0$ | $s_x = 2.2$; $s_y = 3.3$
Standard deviation of the mean | $s_{x'} = s_x / n^{1/2}$; $s_{y'} = s_y / n^{1/2}$ | $s_{x'} = 0.83$; $s_{y'} = 0.76$ | $s_{x'} = 0.83$; $s_{y'} = 1.25$
Coefficient of variation | $\mathrm{c.v.}_x = s_x / x'$; $\mathrm{c.v.}_y = s_y / y'$ | $\mathrm{c.v.}_x = 0.55$; $\mathrm{c.v.}_y = 0.33$ | $\mathrm{c.v.}_x = 0.55$; $\mathrm{c.v.}_y = 0.75$
t-statistic | $t_x = x' / s_x$; $t_y = y' / s_y$ | $t_x = 1.82$; $t_y = 3.03$ | $t_x = 1.82$; $t_y = 1.33$
Covariance | $\mathrm{Cov} = \frac{1}{n-1}\sum_i (x_i - x')(y_i - y')$ | 4.4 | −1.9
Correlation coefficient | $r = \dfrac{\sum_i (x_i - x')(y_i - y')}{\left[\sum_i (x_i - x')^2\right]^{1/2} \left[\sum_j (y_j - y')^2\right]^{1/2}} = \mathrm{Cov}/(s_x s_y)$ | 0.98 | −0.26

When Parameters Are Estimated by Regression

Statistic | Description
Mean (parameter estimate) | Symbol: $b_j$. Estimated using the observations, the model, and the modified Gauss–Newton normal equations (Eq. (5.6)).
Parameter variance | Symbol: $s^2_{b_j}$; diagonals of Eqs. (7.1) and (7.2). This equation uses the following quantities: (1) the sensitivities, as measures of the information provided for the parameter; (2) the weights, as measures of the error in the observations; and (3) the calculated error variance of the regression, as a measure of model fit to the observations.
Parameter standard deviation | Symbol: $s_{b_j}$; Eq. (7.3). Equals the square root of the parameter variance. Analogous to the sample standard deviation of the mean instead of the standard deviation of the population.
Parameter coefficient of variation | Symbol: c.v.; Eq. (7.4). Equals (parameter standard deviation)/(parameter estimate).
Parameter t-statistic | Equals 1/c.v. = (parameter estimate)/(parameter standard deviation).
Parameter correlation coefficient | Symbol: pcc(j, k); Eq. (7.5). Instead of measuring how closely x tracks y, it measures whether coordinated changes in two parameters would result in the same simulated values and, therefore, the same model fit to the observations and the same objective-function value.
coefficients indicate whether coordinated changes in the parameter values could produce the same simulated values and, therefore, the same model fit.
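The sample statistics in the upper half of Table 7.1 can be computed directly from paired data. This is a minimal sketch in plain Python; the function name is hypothetical, and the example data are illustrative rather than the values plotted in Figure 7.1.

```python
import math


def sample_statistics(x, y):
    """Sample statistics of Table 7.1, using n - 1 in the variance and covariance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x, var_y = sxx / (n - 1), syy / (n - 1)
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    return {
        "mean": (mx, my),
        "variance": (var_x, var_y),
        "std_dev": (sd_x, sd_y),
        "std_dev_of_mean": (sd_x / math.sqrt(n), sd_y / math.sqrt(n)),
        "cv": (sd_x / mx, sd_y / my),
        "t": (mx / sd_x, my / sd_y),
        "covariance": sxy / (n - 1),
        "correlation": sxy / math.sqrt(sxx * syy),
    }


# Illustrative data: y is exactly linear in x, so the correlation is 1.0.
print(sample_statistics([1, 2, 3, 4, 5, 6, 7], [2, 4, 6, 8, 10, 12, 14]))
```

By contrast, the regression analogues in the lower half of the table come from Eq. (7.1) rather than from repeated samples of the parameter itself.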
7.2.4
Statistics for Log-Transformed Parameters
For log-transformed parameters, the parameter estimates, coefficients of variation (Eq. (7.4)), and confidence intervals (discussed in Section 7.5.1) can be difficult to interpret. It is advantageous to present statistics related to native parameter values to encourage comparison with field data. The native estimate is calculated as the exponential of the log-transformed estimate obtained by regression. The nature of the log-normal distribution means that the native value reported is the median instead of the mean of the log-normal distribution. Using the median as the measure of central tendency of the distribution has the advantage of producing a native parameter value that produces the regression results when used in the model input files, and this consideration overrides the need to use the mean of the log-normal distribution. Confidence intervals on the native equivalent of log-transformed parameters are reported as the exponential of the confidence interval limits calculated for the log-transformed parameter. For log-transformed parameters, the linear confidence intervals for the true, unknown native parameters are symmetric when plotted on a log scale, but are not symmetric when plotted on an arithmetic scale. Despite this asymmetry of the intervals on an arithmetic plot, it is often easier for modelers to interpret and communicate to others the ranges for the native parameters than the ranges for the log-transformed parameters. Standard deviations and coefficients of variation (the standard deviation divided by the estimate) for the native parameter estimates are obtained by converting the variance for the log-transformed parameter, $(s_{\log b})^2$, using the expression

$$s_b^2 = \exp\left[2.3\,(s_{\log b})^2 + 2.0\,\log b\right]\left[\exp\left(2.3\,(s_{\log b})^2\right) - 1.0\right] \qquad (7.6)$$

where the exponentials and logarithms are in base 10 (so that exp(x) here denotes $10^x$, and the factor 2.3 approximates ln 10), b is the value of the native parameter, and log b is the estimated log-transformed parameter. The coefficient of variation of the native parameter is calculated by dividing the square root of its variance by the native parameter value.
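Eq. (7.6) can be evaluated as follows. This minimal sketch uses the exact factor ln 10 where the text rounds it to 2.3; the function name and example values are hypothetical. Written this way, the result equals the familiar log-normal variance exp(2μ + σ²)(exp(σ²) − 1) with μ = ln(10) log b and σ = ln(10) s_log b.

```python
import math

LN10 = math.log(10.0)  # the text rounds this factor to 2.3


def native_statistics(log_b, s_log_b):
    """Native estimate, variance (Eq. 7.6), standard deviation, and coefficient
    of variation for a parameter estimated as log10(b) with standard deviation s_log_b."""
    b = 10.0 ** log_b                                   # native value reported
    var_b = 10.0 ** (LN10 * s_log_b**2 + 2.0 * log_b) * (10.0 ** (LN10 * s_log_b**2) - 1.0)
    sd_b = math.sqrt(var_b)
    return b, var_b, sd_b, sd_b / b


# Hypothetical log10 estimate of -3.0 (b = 0.001) with s_log_b = 0.2.
print(native_statistics(-3.0, 0.2))
```

Dividing Eq. (7.6) by $b^2$ shows that the native coefficient of variation depends only on $s_{\log b}$, not on the estimate itself.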
7.2.5 When to Use the Five Versions of the Parameter Variance–Covariance Matrix
This section presents the circumstances in which each of the five versions of the variance–covariance matrix (Section 7.2.1) is used, and the statistics calculated from the matrix that are most useful in each circumstance.
Matrix with Optimized Parameter Values
This version of the variance–covariance matrix is routinely calculated if regression is used for model calibration. Useful statistics include the parameter coefficients of variation and the parameter correlation coefficients.
Matrix with All Defined Parameters
In Eq. (7.1), the sensitivity and weight matrices usually contain entries only for the parameters estimated by regression. In many situations, additional defined model parameters are excluded from the regression because of insensitivity and/or nonuniqueness detected using the sensitivity analysis discussed in Chapter 4, or for other reasons. It is important to periodically calculate sensitivities and the variance–covariance and correlation matrices for all defined model parameters, for two reasons. First, it is important to determine whether updated parameter values or other modifications to the model have changed conclusions about insensitivity and nonuniqueness, and to evaluate observations and parameters from the perspective of predictions. Including all defined parameters can be accomplished easily using UCODE_2005 and MODFLOW-2000 by activating unestimated parameters. Second, when evaluating the uncertainty of predictions or performing other related analyses, it is important to include all defined parameters to obtain realistic results. Parameters that may not have been important to observations may be important to predictions, and this can be determined only if all defined parameters are included in the analysis.
When activating all parameters to evaluate model predictions, it is important to include prior information (Chapter 3, Section 3.4.3) and associated weighting for the parameters that were not estimated by regression. This allows a realistic degree of uncertainty in these parameters to be reflected in analyses of prediction uncertainty. If prior information on these parameters is not included, the contribution of parameter uncertainty will be unrealistically large. The prior value specified needs to equal the parameter value, so that the added prior information contributes nothing to the sum of squared weighted residuals in the numerator of the s² term in Eq. (7.1).
The denominator of s², ND + NPR − NP, will not be affected if one item of prior information is included for each added parameter, because NPR and NP then increase by the same amount. The weights on the prior information need to reflect the uncertainty in the independent information about the parameter values. Weighting strategies are discussed in more detail in Guideline 6 in Chapter 11.
Matrix with Nonoptimal Parameter Values
Equation (7.1) can be calculated for any set of parameter values, and some of the resulting statistics are very useful for diagnosing problems with the regression (Anderman et al., 1996; Poeter and Hill, 1997; Hill et al., 1998; Hill and Østerby, 2003). For example, parameter correlation coefficients calculated with the starting parameter values are a very important part of the sensitivity analysis performed at the initial stages of the regression, as discussed in Chapter 4, Section 4.2.3.
Matrix with Alternate Observation Sets
The fourth version of Eq. (7.1) involves observation sets that differ from the set used to calibrate the model. Two such alternate observation sets are used in this book. Both are used to calculate the observation–prediction (opr) statistic for evaluating the importance of observations to model predictions, discussed in Chapter 8, Section 8.3.2. The first set consists of the calibration observations with one or more of them omitted; it is used to evaluate existing observations in the context of model predictions. The second set consists of the calibration observations with one or more observations added; it is used to evaluate potential new observations in the context of model predictions. Usually these analyses with alternate observation sets are conducted using the variance–covariance matrix with all defined parameters represented and with weighting that reflects realistic errors in observations and prior information.
Matrix with Predictions
A version of Eq. (7.1) can be used to determine whether parameters that are highly correlated given the observations used in the regression are problematic for predictions of interest. This version is discussed in Chapter 8, Section 8.2.4 and generally is calculated using all defined model parameters.
7.2.6 Some Alternate Methods: Eigenvectors, Eigenvalues, and Singular Value Decomposition
Alternate methods available for evaluating parameter uncertainty and correlation include calculation of the eigenvectors and eigenvalues of the parameter variance–covariance matrix and singular value decomposition (SVD) of the weighted sensitivity matrix. Both produce eigenvectors and eigenvalues. Large eigenvalues identify important eigenvectors. Each eigenvector identifies a linear combination of parameters, and parameters with larger coefficients dominate the eigenvector. Dominant parameters in important eigenvectors are important parameters. Hill and Østerby (2003) compared the ability of parameter correlation coefficients and the SVD method to detect extreme parameter correlation. They found that both methods performed similarly for a simple hypothetical groundwater flow model similar to that used in the exercises of this book. Parameter correlation coefficients are emphasized in this work because they are easy to interpret, as discussed in Section 4.3.5.
Parameter correlations are sometimes criticized because they identify extreme correlation only between pairs of parameters. However, as noted in Section 4.3.5, if more than two parameters are correlated, all pairs will have correlation coefficients with absolute values close to 1.00, so the fact that pcc are calculated only for parameter pairs is rarely a meaningful limitation. MODFLOW-2000 and UCODE_2005 can calculate the eigenvectors and eigenvalues of the parameter variance–covariance matrix, so modelers who prefer to use these measures can do so with these computer programs.
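To make the eigenvalue approach concrete, the sketch below computes parameter correlation coefficients and the eigenvalues and eigenvectors of a small, hypothetical parameter variance–covariance matrix. It uses the cyclic Jacobi rotation method in plain Python; this is an illustrative stand-in, not the routine used inside MODFLOW-2000 or UCODE_2005. Because V = s²(XᵀωX)⁻¹, the variance–covariance matrix and XᵀωX share eigenvectors, and each eigenvalue of V equals s² divided by the corresponding eigenvalue of XᵀωX (the square of a singular value of ω^{1/2}X), so "large" and "small" swap between the two matrices.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def correlation_matrix(V):
    """Parameter correlation coefficients: pcc(i, j) = V[i][j] / sqrt(V[i][i] * V[j][j])."""
    n = len(V)
    return [[V[i][j] / math.sqrt(V[i][i] * V[j][j]) for j in range(n)] for i in range(n)]


def jacobi_eigen(A, tol=1e-22, max_sweeps=50):
    """Eigenvalues/eigenvectors of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (values, vectors) sorted by decreasing eigenvalue; vectors[k] pairs with values[k].
    """
    n = len(A)
    a = [[float(x) for x in row] for row in A]
    vec = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry of J^T a J
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                J = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
                J[p][p], J[q][q], J[p][q], J[q][p] = c, c, s, -s
                a = matmul(transpose(J), matmul(a, J))
                vec = matmul(vec, J)       # columns accumulate the eigenvectors
    pairs = sorted(((a[k][k], [vec[i][k] for i in range(n)]) for k in range(n)),
                   key=lambda pair: -pair[0])
    return [pair[0] for pair in pairs], [pair[1] for pair in pairs]


# Hypothetical variance-covariance matrix: parameters 1 and 2 are highly correlated
V = [[1.0, 0.98, 0.1],
     [0.98, 1.0, 0.1],
     [0.1, 0.1, 0.5]]
pcc = correlation_matrix(V)
values, vectors = jacobi_eigen(V)
print("pcc(1,2) =", round(pcc[0][1], 3))
for lam, v in zip(values, vectors):
    print(f"eigenvalue {lam:.4f}  eigenvector {[round(x, 3) for x in v]}")
```

In this example both the largest and the smallest eigenvectors are dominated by parameters 1 and 2 (with equal signs and opposite signs, respectively), which is how the eigen-analysis expresses the same strong pairing that pcc(1,2) = 0.98 reports directly.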
7.3 IDENTIFYING OBSERVATIONS IMPORTANT TO ESTIMATED PARAMETER VALUES
Different observations can play different roles in the regression. Even in regressions with hundreds of observations, one or two can profoundly affect parameter estimates. Important observations are not consistently associated with either very large or very small weighted residuals, or with large scaled sensitivities. As discussed in Chapter 4, dimensionless and composite scaled sensitivities can be used to identify observations important to individual parameters, but they cannot identify observations that reduce parameter correlation. Though parameter correlation coefficients can be used to address this concern, statistics that integrate the effects of sensitivity and correlation are needed. The regression equations presented in Chapter 5 and the parameter variance–covariance matrix of this chapter can be used to create statistics that integrate effects measured separately by scaled sensitivities and parameter correlation coefficients. These statistics are similar to the scaled sensitivities and parameter correlation coefficients in that they take advantage of the model as a quantitative connection between the simulated equivalents of the observations and the model parameters. The scaled sensitivities and parameter correlation coefficients continue to be useful, in part because they can be used to understand why different observations are important. As with the previous analyses, the statistics presented here depend on how the model is constructed.
Three statistics are presented: one is a leverage statistic and two are influence statistics. Leverage and influence are two important measures of the role observations play in regression. Leverage statistics were mentioned in Chapter 4, Section 4.3.6, and depend only on the independent variables associated with an observation, such as its type, location, and time. Influence depends on the observed value as well. The concepts of leverage and influence are illustrated in Figure 7.2 for a simple linear regression problem. In Figure 7.2a, the outlier data point has high leverage because its x location is very different from that of all other observations.
However, it does not have high influence, because its presence does not cause the regression results to differ significantly from the results obtained in its absence. In contrast, in Figure 7.2b, the outlier has both high leverage and high influence. It has the same x location as the outlier in Figure 7.2a, but its y value causes the regression line to be significantly different from the line obtained in its absence. In general, whether an observation actually dominates the regression depends on how consistent its observed value is with the simulated equivalent calculated using all other observations. If it does dominate, it has high influence as well as high leverage. In linear models, influential observations are a subset of the observations with substantial leverage; this relation applies approximately to nonlinear models. Often it is useful to calculate leverage and influence statistics using all defined parameters. Dominant observations deserve extra attention to ensure that their simulated equivalents are appropriate and simulated correctly, that the observation is correctly determined from field data, and that the observation errors are fully considered in the weighting.
FIGURE 7.2 The effect on a simple linear regression of an observation with (a) high leverage and (b) high leverage and high influence. (From Helsel and Hirsch, 2002, Figure 9.19.)
7.3.1 Leverage Statistics
Leverage statistics identify observations that are sensitive in a way that gives their observed values the potential to have a profound effect on the regression results. In general, an observation is more likely to have high leverage if its location, time, circumstance, or type provides unusual information to the regression, as for the outlier points in Figure 7.2. Leverage is calculated as (Helsel and Hirsch, 2002, p. 246):

$$ h_{ii} = (\boldsymbol{\omega}^{1/2}\mathbf{X})_i\,(\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\,(\mathbf{X}^T\boldsymbol{\omega}^{1/2})_i \qquad (7.7) $$

where $(\boldsymbol{\omega}^{1/2}\mathbf{X})_i$ is a row vector of the weighted sensitivities associated with the ith observation, $(\mathbf{X}^T\boldsymbol{\omega}^{1/2})_i$ is a column vector equal to the transpose of $(\boldsymbol{\omega}^{1/2}\mathbf{X})_i$, $(\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}$ is from Eq. (7.1), and $h_{ii}$ is the leverage of the ith observation. Values of $h_{ii}$ range from 0.0 to less than 1.0; values close to 1.0 identify observations with high leverage. The leverage $h_{ii}$ calculated in Eq. (7.7) is the ith diagonal element of the “hat” matrix $\boldsymbol{\omega}^{1/2}\mathbf{X}(\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\omega}^{1/2}$ (Belsley et al., 1980, p. 16; Draper and Smith, 1998, p. 205). The full matrix is used for advanced regression analyses that are not discussed in this book (see, e.g., Cook and Weisberg, 1982).
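Equation (7.7) can be evaluated directly once the sensitivities and weights are available. The sketch below does so in plain Python for a hypothetical straight-line problem patterned after Figure 7.2, in which the sensitivity row for observation i is [1, x_i] and all weights equal 1.0. The function names are illustrative only, and the small Gauss–Jordan helper stands in for a proper linear-algebra library.

```python
import math


def invert(A):
    """Inverse of a small, well-conditioned matrix by Gauss-Jordan elimination."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))   # partial pivoting
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def normal_matrix(X, w):
    """X^T omega X for a diagonal weight matrix omega = diag(w)."""
    npar = len(X[0])
    return [[sum(wi * xi[a] * xi[c] for xi, wi in zip(X, w)) for c in range(npar)]
            for a in range(npar)]


def leverage(X, w):
    """Eq. (7.7): h_ii = (w^1/2 X)_i (X^T w X)^-1 (X^T w^1/2)_i for every observation."""
    Ainv = invert(normal_matrix(X, w))
    npar = len(X[0])
    h = []
    for xi, wi in zip(X, w):
        row = [math.sqrt(wi) * v for v in xi]      # weighted sensitivities of observation i
        h.append(sum(row[a] * Ainv[a][c] * row[c] for a in range(npar) for c in range(npar)))
    return h


# Hypothetical data: the last observation has an unusual x location (compare Figure 7.2)
x = [1.0, 2.0, 3.0, 4.0, 10.0]
X = [[1.0, xi] for xi in x]
w = [1.0] * len(x)
h = leverage(X, w)
print([round(v, 3) for v in h])
```

The leverages sum to NP (here 2), a standard property of the hat matrix, and the observation at x = 10 stands out with a leverage of about 0.92 even though its observed value plays no role in the calculation.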
7.3.2 Influence Statistics
Whereas leverage statistics indicate the potential importance of an observation to the estimation of a parameter, the actual effect of the observation in the regression also depends on the observed values, as illustrated in Figure 7.2. The Cook’s D and DFBETAS influence statistics incorporate this effect. Cook’s D is calculated for each observation and measures the influence of that observation on the estimation of the set of parameters as a whole. DFBETAS are calculated for each parameter $b_j$ and each observation $y_i$ and measure an observation’s effect on a single parameter value. Both statistics were first applied to groundwater models by Yager (1998).
Cook’s D
Cook’s D is a measure of how a set of parameter estimates would change with omission of an observation, relative to how well the parameters are estimated given the entire set of observations. Cook’s D is defined as follows, where the first expression explicitly shows the components of the statistic and the second is used for calculation (Cook and Weisberg, 1982, p. 116; Draper and Smith, 1998, p. 211; Cook and Weisberg, 1999, pp. 357–360; Helsel and Hirsch, 2002, p. 248):
$$ D_i = \frac{1}{NP}\,(\mathbf{b}_{(i)} - \mathbf{b}')^T\left[\sigma^2(\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\right]^{-1}(\mathbf{b}_{(i)} - \mathbf{b}') = \frac{1}{NP}\,r_i^2\,\frac{h_{ii}}{1 - h_{ii}} \qquad (7.8) $$

where $\mathbf{b}'$ = the set of parameter values optimized using all observations; $\mathbf{b}_{(i)}$ = the linear estimate of the set of parameter values that would be estimated if the ith observation were omitted; $\mathbf{X}$ = a matrix of sensitivities, as defined before Eq. (5.2b); $\boldsymbol{\omega}$ = the weight matrix of Eqs. (3.2) and (5.1); NP = the number of estimated parameters; $\sigma^2$ = the variance of the regression; $r_i$ = the ith weighted residual divided by its standard error, calculated as $f_i/[\sigma(1 - h_{ii})^{1/2}]$; $f_i$ = the ith weighted residual of the regression with all observations; $h_{ii}$ = the leverage of the ith observation, calculated by Eq. (7.7). The variance of the regression, $\sigma^2$, and its square root, σ, are estimated using $s^2$ (Eq. (6.1)) and s, the calculated error variance and standard error of the regression, respectively. For Cook’s D to be large, the misfit needs to be large relative to the expected accuracy of the observation ($r_i$ is large) and/or the leverage term needs to be large.
Ross (1987) investigated the effect of nonlinearity on influence measures that serve the same purpose as Cook’s D. The measures were based on likelihood distances; one performs like Cook’s D, and the others were suggested by Cook and Weisberg (1982). The measure that performs like Cook’s D performed well in the presence of high parameter-effects curvature (the terminology of Bates and Watts, 1980), which Christensen and Cooley (2005) call nonintrinsic nonlinearity (see Section 7.7 for a discussion of linearity measures). This means that Cook’s D is more robust for many nonlinear models than, for example, the sensitivity methods discussed in Chapter 4. It is suspected, however, that the fit-independence of the methods discussed in Chapter 4 is advantageous for initial models; more testing is needed, and the advantage may depend on model nonlinearity and the misfit of the initial model. Cook’s D can be calculated very quickly.
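The second form of Eq. (7.8) needs only the weighted residuals, the leverages, and s². The sketch below computes it for a hypothetical straight-line problem whose last observation sits at an unusual x location, and flags observations using the Rawlings (1988) critical value 4/(ND + NPR) adopted in this book. It assumes a model that is linear in its parameters, so the weighted least-squares fit is solved directly; rows of X may include prior information, in which case len(X) plays the role of ND + NPR. Names and data are hypothetical.

```python
import math


def invert(A):
    """Inverse of a small, well-conditioned matrix by Gauss-Jordan elimination."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def normal_matrix(X, w):
    """X^T omega X for a diagonal weight matrix omega = diag(w)."""
    npar = len(X[0])
    return [[sum(wi * xi[a] * xi[c] for xi, wi in zip(X, w)) for c in range(npar)]
            for a in range(npar)]


def fit(X, w, y):
    """Weighted least-squares estimates b' = (X^T w X)^-1 X^T w y for a linear model."""
    Ainv = invert(normal_matrix(X, w))
    npar = len(X[0])
    rhs = [sum(wi * xi[a] * yi for xi, wi, yi in zip(X, w, y)) for a in range(npar)]
    return [sum(Ainv[a][c] * rhs[c] for c in range(npar)) for a in range(npar)]


def cooks_d(X, w, y):
    """Second form of Eq. (7.8) for every observation, plus the 4/(ND + NPR) critical value."""
    n, npar = len(X), len(X[0])
    Ainv = invert(normal_matrix(X, w))
    b = fit(X, w, y)
    f = [math.sqrt(wi) * (yi - sum(v * bj for v, bj in zip(xi, b)))
         for xi, wi, yi in zip(X, w, y)]                   # weighted residuals
    s2 = sum(fi * fi for fi in f) / (n - npar)             # calculated error variance
    D = []
    for xi, wi, fi in zip(X, w, f):
        row = [math.sqrt(wi) * v for v in xi]
        h = sum(row[a] * Ainv[a][c] * row[c] for a in range(npar) for c in range(npar))
        r = fi / math.sqrt(s2 * (1.0 - h))                 # residual / its standard error
        D.append(r * r * h / (npar * (1.0 - h)))
    return D, 4.0 / n


x = [1.0, 2.0, 3.0, 4.0, 10.0]
X = [[1.0, xi] for xi in x]
w = [1.0] * len(x)
y = [1.1, 1.9, 3.2, 3.9, 6.0]        # hypothetical observed values
D, critical = cooks_d(X, w, y)
print([round(d, 3) for d in D], "critical =", critical)
print("influential observations:", [i + 1 for i, d in enumerate(D) if d > critical])
```

With these numbers only the fifth observation exceeds the critical value of 0.8, even though its residual is not the largest; its high leverage is what makes it influential. For a linear model, refitting without each observation and evaluating the first form of Eq. (7.8) gives the same values.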
Two distinctly different critical values for Cook’s D have been suggested. On the basis of comments by Cook (1977a), Helsel and Hirsch (2002, p. 248) suggest using $F_{\alpha=0.1}(NP + 1, ND + NPR - NP)$, where $F_\alpha$ is the value of the F distribution (Table D.7 in Appendix D) with a significance level of 0.1 and with NP + 1 and ND + NPR − NP degrees of freedom. Cook (1977b) and Cook and Weisberg (1982, p. 116) note, however, that Cook’s D is not distributed as F. Rawlings (1988) suggests using 4/(ND + NPR), which results in a much smaller critical value and more observations being identified as influential. The Rawlings (1988) critical value is used in this book. The critical value for Cook’s D lacks an associated significance level; that is, unlike confidence intervals, no probability level is associated with the critical value of Cook’s D. It simply identifies observations that are more influential than the other observations.
DFBETAS
The DFBETAS statistic (pronounced d-f-beta-s) measures the influence of one observation, $y_i$, on one parameter, $b_j$. $\mathrm{DFBETAS}_{ij}$ is calculated as follows, where the first expression again explicitly shows the components of the statistic and the second is used for calculation (Belsley et al., 1980):
$$ \mathrm{DFBETAS}_{ij} = \frac{b'_j - b'_{j(i)}}{s(i)\left[(\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\right]_{jj}^{1/2}} = \frac{c_{ji}}{\left(\sum_{k=1}^{ND} c_{jk}^2\right)^{1/2}}\;\frac{f_i}{s(i)\,(1 - h_{ii})} \qquad (7.9) $$

where $b'_j$ = the optimized value of the jth parameter using all observations; $b'_{j(i)}$ = the optimized value of the jth parameter omitting only the ith observation; $s(i)$ = an alternative to s as an estimate of σ, chosen to make the denominator statistically independent of the numerator under normal theory, and calculated as (Belsley et al., 1980, p. 14)

$$ s(i) = \left\{\frac{1}{ND + NPR - NP - 1}\left[(ND + NPR - NP)\,s^2 - \frac{f_i^2}{1 - h_{ii}}\right]\right\}^{1/2} $$

and $c_{ji}$ = an entry of the matrix product $\mathbf{C} = (\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\omega}^{1/2}$. All other symbols are defined after Eq. (7.8).
An absolute value of $\mathrm{DFBETAS}_{ij}$ greater than the critical value of $2/(ND + NPR)^{1/2}$ (Belsley et al., 1980, p. 28) indicates that the ith observation is influential in the estimation of the jth parameter. The likelihood of a single observation being influential to a single parameter generally decreases as the number of regression observations increases. The critical value for $\mathrm{DFBETAS}_{ij}$ takes this into account, because it decreases with increasing ND + NPR. As a result, roughly the same proportion of influential observations is identified regardless of the size of ND + NPR. DFBETAS can be calculated very quickly.
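The second form of Eq. (7.9) can be sketched in the same plain-Python style. As before, this assumes a hypothetical model that is linear in its parameters, with the straight-line data used for the Cook’s D sketch; rows of X may include prior information, so len(X) plays the role of ND + NPR. The function names are illustrative only.

```python
import math


def invert(A):
    """Inverse of a small, well-conditioned matrix by Gauss-Jordan elimination."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def normal_matrix(X, w):
    """X^T omega X for a diagonal weight matrix omega = diag(w)."""
    npar = len(X[0])
    return [[sum(wi * xi[a] * xi[c] for xi, wi in zip(X, w)) for c in range(npar)]
            for a in range(npar)]


def fit(X, w, y):
    """Weighted least-squares estimates for a linear model."""
    Ainv = invert(normal_matrix(X, w))
    npar = len(X[0])
    rhs = [sum(wi * xi[a] * yi for xi, wi, yi in zip(X, w, y)) for a in range(npar)]
    return [sum(Ainv[a][c] * rhs[c] for c in range(npar)) for a in range(npar)]


def dfbetas(X, w, y):
    """Second form of Eq. (7.9); returns out[i][j] = DFBETAS_ij and the critical value."""
    n, npar = len(X), len(X[0])
    Ainv = invert(normal_matrix(X, w))
    b = fit(X, w, y)
    f = [math.sqrt(wi) * (yi - sum(v * bj for v, bj in zip(xi, b)))
         for xi, wi, yi in zip(X, w, y)]
    s2 = sum(fi * fi for fi in f) / (n - npar)
    # C = (X^T w X)^-1 X^T w^1/2, an NP x n matrix
    C = [[math.sqrt(w[i]) * sum(Ainv[j][a] * X[i][a] for a in range(npar)) for i in range(n)]
         for j in range(npar)]
    norms = [math.sqrt(sum(c * c for c in C[j])) for j in range(npar)]
    out = []
    for i in range(n):
        h = sum(math.sqrt(w[i]) * X[i][j] * C[j][i] for j in range(npar))   # leverage h_ii
        s_i = math.sqrt(((n - npar) * s2 - f[i] ** 2 / (1.0 - h)) / (n - npar - 1))
        out.append([C[j][i] / norms[j] * f[i] / (s_i * (1.0 - h)) for j in range(npar)])
    return out, 2.0 / math.sqrt(n)


x = [1.0, 2.0, 3.0, 4.0, 10.0]
X = [[1.0, xi] for xi in x]            # parameters: intercept, slope
w = [1.0] * len(x)
y = [1.1, 1.9, 3.2, 3.9, 6.0]          # hypothetical observed values
out, critical = dfbetas(X, w, y)
for i, row in enumerate(out):
    flags = ["*" if abs(v) > critical else " " for v in row]
    print(i + 1, [f"{v:8.3f}{m}" for v, m in zip(row, flags)])
```

Here the fifth observation far exceeds the critical value 2/√5 for the slope parameter. Because the model is linear, the values match the first form of Eq. (7.9) computed by actually refitting without each observation.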
7.4 UNIQUENESS AND OPTIMALITY OF THE ESTIMATED PARAMETER VALUES
Two important questions are (1) given the constructed model, are the observations and prior information sufficient to have estimated the one and only set of parameter values that provides the best fit? and (2) does a set of parameter values exist that produces a better fit than that achieved? The first question primarily requires investigation of uniqueness; the second primarily involves optimality. Uniqueness of the estimated parameter values can be investigated by (a) evaluating parameter correlation coefficients and (b) repeating the regression using different starting values. These methods are the focus of Sections 4.3.5, 4.4.2, and 7.2.2, and Exercises 4.1c, 5.1a, and 7.1f. The method in (a) is considered to be a local method because parameter correlation coefficients apply locally in the objective-function surface (see Figure 4.2). Optimality cannot be investigated using local methods; it requires more computationally demanding global methods such as that described in (b). As mentioned previously, nonunique parameter estimates may be indicated if parameter correlation coefficients calculated at the optimal parameter estimates are greater than about 0.95 in absolute value, or if pcc accuracy is suspect because of inaccurate sensitivities and/or insensitive parameters. In these situations, or to test optimality, additional regression runs can be useful, as shown in Exercise 7.1f. If the regression runs with different starting values produce significantly different parameter estimates with nearly identical values of the objective function (Eq. (3.1) or (3.2)), the parameter estimates are not unique. If smaller objective-function values are encountered, the original solution is not optimal. In the case of nonuniqueness, the identical objective-function values generally are produced because coordinated changes in parameter values produce identical simulated equivalents.
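The effect of repeating the regression from different starting values can be shown with a toy problem. The sketch below assumes a hypothetical model whose simulated values depend only on the sum b1 + b2, so the two parameters are perfectly correlated, and it uses plain gradient descent on the sum of squared residuals as a simple stand-in for the modified Gauss–Newton method used by the programs discussed in this book. Two runs from different starting values end at different parameter estimates with essentially identical objective-function values, the signature of nonuniqueness described above.

```python
def simulate(b1, b2, x):
    # Hypothetical model: only the sum b1 + b2 affects the simulated values
    return [(b1 + b2) * xi for xi in x]


def objective(b, x, y):
    """Unweighted sum of squared residuals."""
    return sum((yi - si) ** 2 for yi, si in zip(y, simulate(b[0], b[1], x)))


def regress(start, x, y, step=0.005, iterations=200):
    """Gradient descent; the gradient is identical for b1 and b2 in this model."""
    b1, b2 = start
    for _ in range(iterations):
        resid = [yi - si for yi, si in zip(y, simulate(b1, b2, x))]
        g = -2.0 * sum(xi * ri for xi, ri in zip(x, resid))   # dS/db1 == dS/db2
        b1, b2 = b1 - step * g, b2 - step * g
    return (b1, b2)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0 * xi for xi in x]          # synthetic observations consistent with b1 + b2 = 2
b_a = regress((0.0, 0.0), x, y)
b_b = regress((3.0, -2.0), x, y)
for start, b in [((0.0, 0.0), b_a), ((3.0, -2.0), b_b)]:
    print(start, "->", tuple(round(v, 4) for v in b), "objective =", objective(b, x, y))
```

Because each step changes b1 and b2 by the same amount, the difference b1 − b2 set by the starting values never changes; only the sum is determined by the observations.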
Nonuniqueness of this kind indicates that the available observation data are insufficient to uniquely estimate each parameter value. To reduce correlation and improve the likelihood of obtaining a unique solution, and to address nonoptimality, parameters can be redefined, observations can be added to the regression, or prior information on the correlated parameters can be added to the regression. Redefining parameters and adding prior information are discussed in Guidelines 3, 5, and 10 in Chapters 11 and 12.
7.5 QUANTIFYING PARAMETER VALUE UNCERTAINTY
Two methods of quantifying parameter value uncertainty are discussed: inferential statistics and Monte Carlo methods.
7.5.1 Inferential Statistics
Linear inferential statistical methods are used here to calculate confidence intervals on estimated parameter values. Nonlinear confidence intervals also are discussed briefly and are calculated in Exercise 7.1g, but details of calculating nonlinear intervals are presented in Chapter 8, which focuses on evaluating predictions. The most common use of nonlinear intervals is to assess prediction uncertainty.
Confidence intervals on parameter values are intervals that, with a specified likelihood, contain the true, unknown parameters, if the model is correct. Here we consider individual confidence intervals because they are most often used for parameters. Other types of intervals are discussed in Chapter 8 because they are often used to quantify the uncertainty of predictions; those types of intervals also can be calculated for parameters using the ideas and methods presented for intervals on predictions. Confidence intervals are discussed in many texts, such as Miller (1981), Seber and Wild (1989), Cooley and Naff (1990), Davis (2002), and Helsel and Hirsch (2002).
Linear Individual Confidence Intervals
An individual confidence interval on a quantity, such as a parameter estimate, has a specified probability of including the true value of the quantity, regardless of whether confidence intervals on other quantities, such as other parameter estimates, include their true values. Usually individual intervals are used to evaluate uncertainty in parameter estimates. An individual linear confidence interval for the true, unknown jth parameter $\beta_j$ is calculated as

$$ b'_j \pm t(n, 1.0 - \alpha/2)\, s_{b_j} \qquad (7.10) $$

where $t(n, 1.0 - \alpha/2)$ is the Student t-statistic (Appendix D, Table D.2) for n degrees of freedom and a significance level of α; n is the degrees of freedom, here equal to ND + NPR − NP; and $s_{b_j}$ is the standard deviation of the jth parameter. Because a confidence interval is a range that has a stated probability of containing the true value, it is stated in terms of the true, unknown value being estimated. Thus, Eq. (7.10) is said to be the confidence interval for the true value of the jth parameter, $\beta_j$, and the width of the confidence interval is a measure of the likely precision of the estimate: narrower intervals indicate greater precision. If the model correctly represents the system, the interval also can be thought of as a measure of the likely accuracy of the estimate. Definitions of precision and accuracy relevant to parameter estimates are given in Chapter 1, Section 1.4.2.
Linear confidence intervals truly represent uncertainty at the given significance level only to the extent that the assumptions underlying the calculation of Eq. (7.10) are satisfied. These requirements are discussed in Chapter 3, Section 3.3. Normality of the parameter estimates is required because the Student t-statistic is used in Eq. (7.10). The Student t-distribution is similar to a normal distribution but accounts for small sample sizes. Because the probability distribution of the true errors is unknown, the normality assumption is tested using methods for assessing the normality of weighted residuals (Chapter 6, Sections 6.4.5 and 6.4.6). Linear confidence intervals require trivial amounts of execution time, and individual linear 95-percent confidence intervals are calculated and printed by UCODE_2005, MODFLOW-2000, and PEST. However, in many natural systems the assumptions discussed above are not met, and calculated linear confidence intervals are not accurate.
More accurate nonlinear intervals developed by Vecchia and Cooley (1987) can be calculated, as discussed next and in Chapter 8, Section 8.4.3, but require substantial execution time.
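Equation (7.10) is simple to evaluate once $s_{b_j}$ and the degrees of freedom are known. The sketch below assumes α = 0.05 and uses a small table of standard two-sided Student t values rather than computing the t distribution (the Python standard library has no t quantile function); the table subset, the function name, and the example numbers are hypothetical choices for illustration. As stressed above, the result is a valid 95-percent interval only to the extent that the model is effectively linear and the other assumptions hold.

```python
# Two-sided Student t values t(n, 0.975) for 95-percent intervals (subset of a standard table)
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365,
         8: 2.306, 9: 2.262, 10: 2.228, 15: 2.131, 20: 2.086, 30: 2.042}


def linear_ci(b_j, s_bj, nd, npr, npar):
    """Eq. (7.10): b'_j +/- t(n, 1.0 - alpha/2) * s_bj, with alpha = 0.05."""
    n = nd + npr - npar                      # degrees of freedom
    if n not in T_975:
        raise KeyError(f"no tabulated t value for n = {n}; consult a full t table")
    half_width = T_975[n] * s_bj
    return b_j - half_width, b_j + half_width


# Hypothetical example: 10 observations, 2 prior-information items, 2 estimated parameters
lower, upper = linear_ci(0.5, 0.1, nd=10, npr=2, npar=2)
print(f"95-percent linear interval: ({lower:.4f}, {upper:.4f})")
```

For a log-transformed parameter, the same limits would be computed for log b and then back-transformed, which produces the asymmetric native-value interval discussed in Section 7.2.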
Nonlinear Individual Confidence Intervals
Nonlinearity of simulated values with respect to parameters is discussed in Chapter 1, Section 1.4.1. The nonlinearity of a model with respect to its parameters can be evaluated using the methods described in Section 7.7. For nonlinear models, linear confidence intervals on parameters calculated using Eq. (7.10) can be inaccurate. More accurate nonlinear confidence intervals can be calculated using inferential statistics, as described briefly here, or using Monte Carlo methods, as described in Section 7.5.2. Vecchia and Cooley (1987) developed inferential methods to compute nonlinear confidence intervals on any function of the model parameters. For nonlinear intervals on a parameter, the function is specified to be the value of a single parameter. Calculating a nonlinear confidence interval involves finding the smallest and largest parameter values on a confidence region for the model parameters, as illustrated in Figure 7.3 for parameter $b_1$ of a hypothetical two-parameter model. Unlike a linear confidence interval, the nonlinear confidence interval generally is not symmetric about the optimal value of $b_1$: in Figure 7.3, the upper limit of the interval is much farther from $b'_1$ than is the lower limit. The method for calculating nonlinear intervals is substantially more complicated and more computationally intensive than the method for calculating linear intervals, as discussed in Chapter 8, Section 8.4.3. MODFLOW-2000’s UNC Process (Christensen and Cooley, 2005), UCODE_2005, and PEST support calculation of nonlinear confidence intervals for parameter values. Because they are computationally expensive, these intervals usually are calculated only for selected quantities of interest. If computation time is a limiting factor, it is likely that the intervals will be calculated for model predictions instead of for model parameters.
Nonlinear intervals for parameters are presented here largely to introduce students to nonlinear intervals in as simple a context as possible.
FIGURE 7.3 Confidence region (shaded area) and upper (b1,U) and lower (b1,L) limits of a nonlinear confidence interval on parameter b1, for a hypothetical two-parameter model. (Adapted from Christensen and Cooley, 1999, Figure 9.)
7.5.2 Monte Carlo Methods
Random sampling by Monte Carlo analysis can be used in two ways to evaluate the uncertainty of estimated parameter values. The first approach involves investigating the variation in estimated parameter values that would result if the observations had a different realization of error. These methods are generally called bootstrap methods (Efron and Tibshirani, 1993; Chernick, 1999) and can produce measures of uncertainty that are consistent with confidence intervals. The calibrated model is used to generate sets of observations, which are then contaminated with noise. Prior information can be generated on the basis of the estimated parameter values. The generated observations and prior information are then used to estimate parameter values. This type of Monte Carlo analysis essentially addresses the question of how much the estimated parameter values would vary for different realizations of error in the observations and prior information. The second approach uses Monte Carlo methods to investigate how different model construction alternatives would affect the estimated parameter values (see Poeter and McKenna, 1995). These Monte Carlo runs could be combined with the first type, or confidence intervals could be calculated for the alternative model constructions using inferential methods. The Monte Carlo approach is discussed in Chapter 8, Section 8.5, in the context of predictions.
Beven and Binley (1992) and Binley and Beven (2003) present an interesting method of portraying Monte Carlo results called dottie plots. In their Monte Carlo analyses, many forward model runs are conducted using different parameter values, and for each run a function of the sum of squared residuals is calculated such that larger values indicate a better fit. The dottie plots consist of x–y graphs with these statistics plotted against each parameter value. Optimal parameter values exist if there is a peak in the dottie plot.
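The first (bootstrap-type) approach can be sketched for a deliberately simple case. The sketch assumes a hypothetical one-parameter model y = b·x that is linear in b, known Gaussian observation errors with a standard deviation of 0.5, weights equal to the inverse error variances, and no prior information; none of these choices come from the text. Each replicate adds a new error realization to the simulated equivalents of the calibrated model and re-estimates b, and the spread of the estimates is compared with the linear standard deviation.

```python
import math
import random
import statistics


def estimate_slope(x, y, w):
    """Weighted least-squares estimate for the one-parameter model y = b * x."""
    return (sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
            / sum(wi * xi * xi for xi, wi in zip(x, w)))


def parametric_bootstrap(b_cal, x, sd, n_rep=2000, seed=1):
    """Re-estimate b for n_rep synthetic observation sets generated from the calibrated model."""
    rng = random.Random(seed)
    w = [1.0 / s ** 2 for s in sd]                   # weights = 1 / error variance
    y_cal = [b_cal * xi for xi in x]                 # simulated equivalents, calibrated model
    estimates = []
    for _ in range(n_rep):
        y_noisy = [yi + rng.gauss(0.0, s) for yi, s in zip(y_cal, sd)]   # new error realization
        estimates.append(estimate_slope(x, y_noisy, w))
    return estimates


x = [float(i) for i in range(1, 11)]
sd = [0.5] * len(x)
est = parametric_bootstrap(2.0, x, sd)
est_sorted = sorted(est)
lo, hi = est_sorted[int(0.025 * len(est))], est_sorted[int(0.975 * len(est)) - 1]
linear_sd = 1.0 / math.sqrt(sum(xi * xi / s ** 2 for xi, s in zip(x, sd)))
print(f"bootstrap sd = {statistics.stdev(est):.4f}, linear sd = {linear_sd:.4f}")
print(f"central 95 percent of estimates: ({lo:.4f}, {hi:.4f})")
```

For this linear, Gaussian case the bootstrap spread agrees closely with the linear standard deviation; the approach becomes more informative when the model is nonlinear and linear theory is doubtful.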
7.6 CHECKING PARAMETER ESTIMATES AGAINST REASONABLE VALUES
When plotted on graphs with the related estimated values, linear confidence intervals can provide a vivid image of the approximate precision with which parameters are estimated using the data included as observations in the regression, given the constructed model. It often is useful to compare these intervals to ranges of reasonable parameter values. This comparison can be a powerful tool for diagnosing error in data interpretation and model construction. As discussed in Chapter 5, Section 5.5, avoiding limits that constrain the estimated parameters allows the regression to estimate unreasonable parameter values, and thus makes this comparison possible. In Figure 7.4, the estimates and confidence intervals for three hypothetical parameters are plotted with reasonable ranges for each parameter. This figure shows three situations that might result from considering reasonable parameter ranges.
FIGURE 7.4 Graph illustrating the comparison of parameter estimates and confidence intervals with the reasonable range of parameter values. Closed circles are parameter estimates, black bars are confidence intervals, and grey bars represent the range of reasonable values for each parameter.
1. For parameter A, the parameter estimate and most of the confidence interval lie within the reasonable range of values. This suggests that the estimate is consistent with independent information about the parameter.
2. For parameter B, the estimate and entire confidence interval lie outside the range of reasonable values, meaning that the regression data together with the given model construction produce a parameter estimate that is inconsistent with independent information about the parameter. This indicates the existence of model bias and that the data, as represented by the observations and the reasonable range of parameter values, are sufficient to detect it. In this situation, the interpretation of the observations, prior information, model construction, and reasonable range needs to be carefully scrutinized.
3. For parameter C, the estimate is unreasonable, but the confidence interval lies partly in the reasonable range of values. This result indicates that there may or may not be model bias; the data are insufficient to support either conclusion. In this last situation, the modeler needs to consider both (a) the possibility of model error and (b) additional data that could provide information toward estimating the parameter value or the reasonable range more precisely.
Linear confidence intervals often are sufficient for this analysis, though the analysis also can be performed using nonlinear intervals. The analysis described above does not evaluate one very important characteristic of reasonable parameter values. In some situations the reasonable ranges of two parameters may overlap, but it is known that the value of one should be greater or less than the value of the other. That is, in addition to the requirement that parameters lie in their respective reasonable ranges, the relative magnitudes of two or more different parameter values are important.
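The three cases in the list, together with a relative-magnitude check, lend themselves to a simple screening routine. The sketch below is a minimal illustration that adds one simplification not stated in the text: a parameter is labeled consistent whenever its estimate lies inside the reasonable range, rather than requiring most of the interval to lie there. The parameter names and values are hypothetical log10 hydraulic conductivities.

```python
def classify(estimate, interval, reasonable):
    """Compare an estimate and its confidence interval with a reasonable range."""
    lo, hi = interval
    r_lo, r_hi = reasonable
    if r_lo <= estimate <= r_hi:
        return "consistent"        # like parameter A
    if hi < r_lo or lo > r_hi:
        return "bias detected"     # like parameter B: the whole interval is unreasonable
    return "inconclusive"          # like parameter C: unreasonable estimate, interval overlaps range


def order_violations(estimates, expected_order):
    """List (larger, smaller) name pairs whose expected relative magnitude is violated."""
    return [(big, small) for big, small in expected_order
            if not estimates[big] > estimates[small]]


# Hypothetical log10 hydraulic conductivities: (estimate, interval, reasonable range)
params = {
    "A": (-3.0, (-3.4, -2.6), (-4.0, -2.0)),
    "B": (1.0, (0.5, 1.5), (-4.0, -2.0)),
    "C": (-1.5, (-2.5, -0.5), (-4.0, -2.0)),
}
for name, (est, ci, reasonable) in params.items():
    print(name, classify(est, ci, reasonable))

# Hypothetical relative-magnitude check: sand is expected to be more conductive than silt
print(order_violations({"K_sand": -4.5, "K_silt": -4.0}, [("K_sand", "K_silt")]))
```

A flagged ordering violation is a prompt to examine the model and data, not an automatic verdict; the ranges and orderings themselves come from independent information about the system.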
This comparison of relative magnitudes is a valuable test that needs to be considered when evaluating estimated parameter values (Poeter and McKenna, 1995; Poeter and Anderson, 2005). If the parameters cannot be estimated uniquely, a parameter that equals the ratio of the parameters could be estimated, and the ratio could be evaluated for its consistency with known relative properties. This manipulation of parameters can be accomplished with UCODE_2005 and PEST and, in some circumstances, with MODFLOW-2000.
7.7 TESTING LINEARITY
The application and utility of some of the methods presented in this chapter depend on the linearity of the model with respect to the parameter values. Although the modified Gauss–Newton optimization method and many of the statistical methods discussed are useful even for problems that are quite nonlinear, more stringent requirements on linearity are needed for linear confidence intervals to represent parameter uncertainty adequately. Linearity can be tested using the modified Beale's measure (also called Linssen's measure) described by Cooley and Naff (1990, pp. 187–189). The original Beale's measure is described by Beale (1960). The modified Beale's measure indicates nonlinearity of the parameter confidence region and does not directly measure nonlinearity of confidence intervals. It is, however, a good measure of the nonlinearity that is likely to affect confidence intervals on parameters.

The modified Beale's measure tests model linearity with respect to the regression observations. Use of this measure as an indicator of model linearity with respect to predictions becomes increasingly problematic as the predictive quantities or situations differ more from the calibration observations and situations. For more information, see Chapter 8, Section 8.7.

The modified Beale's measure is calculated by the following four steps.

Step 1. Sets of parameter values are generated that lie on the edge of the linear confidence region for the parameters. This is accomplished using the following equation:

$$\tilde{b} = b' \pm \sqrt{NP\,F_{\alpha}(NP,\,ND+NPR-NP)}\;\frac{[V(b')]_j}{s_{b_j}}, \qquad j = 1, NP \qquad (7.11)$$

where $\tilde{b}$ is a vector of generated parameter values; $b'$ is a vector of optimal parameter estimates; $F_{\alpha}(NP,\,ND+NPR-NP)$ is the value from the F distribution with significance level $\alpha$ (equal to 0.05 for the calculations in this book) and with $NP$ and $ND+NPR-NP$ degrees of freedom; $[V(b')]_j$ is a vector equivalent to the $j$th column of $V(b')$; and $s_{b_j}$ is the standard deviation of the $j$th parameter, defined in Eq. (7.3).

Because of the $\pm$, Eq. (7.11) yields two parameter vectors for each $j$, for a total of $2NP$ generated parameter vectors. In the generated vectors, values of parameters with small variances generally are relatively near the optimal parameter estimates, and values of parameters with large variances generally are relatively far from the optimal parameter estimates. The process by which Beale's measure generates parameter values is statistical, and sometimes one or more of the generated parameter sets do not yield valid or accurate solutions. Resolution of such problems is discussed later. For the $j$th pair of generated parameter vectors, the $j$th parameter value usually varies the most, but parameter correlation causes other parameter values to vary as well.

Step 2. Simulated equivalents of the calibration observations are computed by executing a forward model run for each generated set of parameter values. These simulated values are $\tilde{y}_{ik}$, where $i$ refers to the $i$th observation and $k$ refers to the $k$th generated parameter vector.

Step 3. Linearized estimates of the simulated values are calculated using the generated parameter sets as follows:

$$\tilde{y}^{o}_{ik} = y'_i + \sum_{j=1}^{NP} \left(\frac{\partial y'_i}{\partial b_j}\right)_{b'} \left(\tilde{b}_{jk} - b'_j\right) \qquad (7.12)$$
where $\tilde{y}^{o}_{ik}$ is the linearized simulated equivalent of the $i$th observation calculated using the $k$th parameter set; $y'_i$ is the simulated equivalent of the $i$th observation calculated using the original model and optimal parameter estimates; $b'_j$ is the $j$th optimal parameter estimate; and $\tilde{b}_{jk}$ is the $j$th parameter value from the $k$th generated parameter set.

Step 4. The modified Beale's measure, $\hat{N}_{\beta}$, is calculated as a measure of the difference between the model-computed and the linearized estimates of the simulated values (Cooley and Naff, 1990, p. 188):

$$\hat{N}_{\beta} = NP\,s^2\,\frac{\displaystyle\sum_{k=1}^{2NP}\sum_{i=1}^{ND}\sum_{q=1}^{ND}\left(\tilde{y}_{ik}-\tilde{y}^{o}_{ik}\right)\omega_{iq}\left(\tilde{y}_{qk}-\tilde{y}^{o}_{qk}\right)}{\displaystyle\sum_{k=1}^{2NP}\left[\sum_{i=1}^{ND}\sum_{q=1}^{ND}\left(\tilde{y}^{o}_{ik}-y'_i\right)\omega_{iq}\left(\tilde{y}^{o}_{qk}-y'_q\right)\right]^{2}} \qquad (7.13)$$
where $\tilde{y}_{ik}$ is the simulated equivalent of the $i$th observation calculated using the original model and the $k$th parameter set.

The program BEALE-2000 (Hill et al., 2000) can be used to calculate the modified Beale's measure in conjunction with regression performed using MODFLOW-2000. The program MODEL_LINEARITY can be used to calculate the measure for regressions performed using UCODE_2005.

In Step 2 above, problems can occur in obtaining a forward model solution using the parameter values generated in Step 1. The most common problems are that physically impossible negative parameter values are generated (such as for
hydraulic conductivity in groundwater models), or that a solution satisfying the specified solver convergence criteria cannot be obtained within the allowed number of solver iterations. The problem of negative parameters can be solved by log-transforming the parameter(s) involved, possibly repeating the regression to account for any resulting change in optimized parameter values, and regenerating the parameter values for Beale's measure. For any parameter with prior information, the log-transformation may require alteration of the statistic used to calculate the prior information weights, as discussed in Guideline 6 in Chapter 11.

In some situations the solver will not converge for some of the generated parameter sets. As long as the final solution obtained is not too inaccurate, the resulting value of Beale's measure will still reflect model nonlinearity adequately. If the lack of convergence is accompanied by a very inaccurate solution, this is, of course, problematic. Sometimes better results can be obtained using a different solver.

To assess the degree of linearity of the model, the modified Beale's measure calculated by Eq. (7.13) is compared with two critical values (Cooley and Naff, 1990, p. 189), as follows. If $\hat{N}_{\beta} < 0.09/F_{\alpha}(NP,\,ND+NPR-NP)$, the model is effectively linear, in that linear confidence intervals closely approximate the exact nonlinear confidence intervals of Vecchia and Cooley (1987). If $\hat{N}_{\beta} > 1.0/F_{\alpha}(NP,\,ND+NPR-NP)$, the model is highly nonlinear. If the modified Beale's measure lies between these two critical values, the model can be considered moderately nonlinear.

Alternative methods for evaluating model nonlinearity with respect to parameters, described by Cooley (2004) and Christensen and Cooley (2005), are derived from Beale (1960) and Linssen (1975). Following the terminology of Poeter et al. (2005), the statistics are called total model nonlinearity and intrinsic model nonlinearity.
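The four steps and the critical-value test can be traced in a short Python/NumPy sketch. It is an illustration under stated assumptions, not the code of BEALE-2000 or MODEL_LINEARITY: `simulate(b)` is a hypothetical forward model returning the ND simulated equivalents, `X` is the ND × NP sensitivity matrix evaluated at b′, `omega` is the weight matrix, and there is no prior information (NPR = 0), so that V(b′) = s²(XᵀωX)⁻¹. The F value is supplied by the caller; the toy run simply reuses FSTAT = 3.87 from the BEALE-2000 output of Figure 7.9 as an example number.

```python
import numpy as np


def generate_parameter_sets(b_opt, V, scale):
    """Return the 2*NP vectors b' +/- scale * [V]_j / s_bj, j = 1..NP.

    scale = sqrt(NP * F) gives Eq. (7.11); scale = sqrt(NP) gives Eq. (7.14).
    """
    s_b = np.sqrt(np.diag(V))                   # parameter standard deviations s_bj
    sets = []
    for j in range(len(b_opt)):
        step = scale * V[:, j] / s_b[j]         # jth column of V divided by s_bj
        sets.extend([b_opt + step, b_opt - step])
    return np.array(sets)


def modified_beale(simulate, b_opt, X, omega, s2, F):
    """Modified Beale's measure, Steps 1-4 (Eqs. (7.11)-(7.13)).

    Assumes no prior information, so V(b') = s2 * (X^T omega X)^-1.
    """
    NP = len(b_opt)
    V = s2 * np.linalg.inv(X.T @ omega @ X)
    y_opt = simulate(b_opt)                     # y'_i at the optimal estimates
    num = 0.0
    den = 0.0
    for b_k in generate_parameter_sets(b_opt, V, np.sqrt(NP * F)):  # Step 1
        y_k = simulate(b_k)                     # Step 2: forward model run
        y_lin = y_opt + X @ (b_k - b_opt)       # Step 3: Eq. (7.12)
        d_nl = y_k - y_lin                      # model-computed minus linearized
        d_lin = y_lin - y_opt                   # linearized minus optimal
        num += d_nl @ omega @ d_nl
        den += (d_lin @ omega @ d_lin) ** 2     # bracket squared for each k
    return NP * s2 * num / den                  # Step 4: Eq. (7.13)


def classify_beale(n_beta, F):
    """Compare the measure with the critical values 0.09/F and 1.0/F."""
    if n_beta < 0.09 / F:
        return "effectively linear"
    if n_beta > 1.0 / F:
        return "highly nonlinear"
    return "moderately nonlinear"


# Hypothetical two-parameter test model y = b0 * exp(-b1 * x), unit weights.
x = np.linspace(0.0, 4.0, 8)
b_opt = np.array([10.0, 0.5])
X = np.column_stack([np.exp(-b_opt[1] * x),                    # dy/db0
                     -b_opt[0] * x * np.exp(-b_opt[1] * x)])   # dy/db1
F = 3.87   # example value only (FSTAT printed in Figure 7.9)
nb = modified_beale(lambda b: b[0] * np.exp(-b[1] * x), b_opt, X,
                    np.eye(len(x)), s2=0.25, F=F)
print(f"modified Beale's measure = {nb:.4g}: {classify_beale(nb, F)}")
```

For a model that is linear in its parameters, the measure is zero to round-off, because Eq. (7.12) then reproduces the forward runs exactly. With F = 3.87, the critical values 0.09/F ≈ 0.023 and 1.0/F ≈ 0.26 match the thresholds printed by BEALE-2000 in Figure 7.9.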
Intrinsic model nonlinearity is the nonlinearity that cannot be removed by a parameter transformation of any kind. In many circumstances it is the intrinsic nonlinearity that is problematic. The following equations apply if the weighting satisfies $\omega = V(\varepsilon)^{-1}$ (see Chapter 3, Section 3.4.2).

Total model nonlinearity can be calculated as follows. The sets of parameter values are calculated as

$$\tilde{b} = b' \pm \sqrt{NP}\;\frac{[V(b')]_j}{s_{b_j}}, \qquad j = 1, NP \qquad (7.14)$$
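The total nonlinearity statistic is calculated as

$$\hat{N} = \frac{1}{NP\,s^2}\;\frac{\displaystyle\sum_{k=1}^{2NP}\sum_{i=1}^{ND}\sum_{q=1}^{ND}\left(\tilde{y}_{ik}-\tilde{y}^{o}_{ik}\right)\omega_{iq}\left(\tilde{y}_{qk}-\tilde{y}^{o}_{qk}\right)}{2NP} \qquad (7.15)$$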
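Eqs. (7.14) and (7.15) differ from the modified Beale's measure only in dropping F from the generated displacements and in the normalization, as a NumPy sketch makes plain. The same assumptions apply as for any illustration here: `simulate`, `X`, `omega`, and `s2` are hypothetical inputs, the weights are assumed to satisfy ω = V(ε)⁻¹, and prior information is ignored. The label uses the critical values of 0.09 and 1.0 that this section gives for total model nonlinearity.

```python
import numpy as np


def total_nonlinearity(simulate, b_opt, X, omega, s2):
    """Total model nonlinearity following Eqs. (7.14) and (7.15).

    Assumes omega = V(eps)^-1 and no prior information,
    so V(b') = s2 * (X^T omega X)^-1.
    """
    NP = len(b_opt)
    V = s2 * np.linalg.inv(X.T @ omega @ X)
    s_b = np.sqrt(np.diag(V))                      # s_bj
    y_opt = simulate(b_opt)                        # y' at the optimal estimates
    acc = 0.0
    for j in range(NP):
        step = np.sqrt(NP) * V[:, j] / s_b[j]      # Eq. (7.14): no F factor
        for b_k in (b_opt + step, b_opt - step):
            y_lin = y_opt + X @ (b_k - b_opt)      # linearized values, as in Eq. (7.12)
            d = simulate(b_k) - y_lin              # model-computed minus linearized
            acc += d @ omega @ d
    return acc / (NP * s2) / (2 * NP)              # Eq. (7.15)


# Hypothetical test model y = b0 * exp(-b1 * x), unit weights.
x = np.linspace(0.0, 4.0, 8)
b_opt = np.array([10.0, 0.5])
X = np.column_stack([np.exp(-b_opt[1] * x), -b_opt[0] * x * np.exp(-b_opt[1] * x)])
n_total = total_nonlinearity(lambda b: b[0] * np.exp(-b[1] * x), b_opt, X,
                             np.eye(len(x)), s2=0.25)
label = ("linear" if n_total < 0.09 else
         "highly nonlinear" if n_total > 1.0 else "moderately nonlinear")
print(f"total model nonlinearity = {n_total:.4g} ({label})")
```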
Total model nonlinearity performs like the modified Beale’s measure (Eq. (7.13)) but is scaled differently. Values of total model nonlinearity that are less than 0.09
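indicate a linear model, and values greater than 1.0 indicate a highly nonlinear model.

The intrinsic model nonlinearity uses the sets of parameters from Eq. (7.14). The intrinsic model nonlinearity statistic is calculated as follows, using matrix notation instead of the summations of Eqs. (7.13) and (7.15):

$$\hat{N}_{\min} = \frac{1}{NP\,s^2}\;\frac{\displaystyle\sum_{k=1}^{2NP}\left(\tilde{y}_k-\tilde{y}^{o}_k-Xc\right)^{T}\omega\left(\tilde{y}_k-\tilde{y}^{o}_k-Xc\right)}{2NP} \qquad (7.16)$$

where

$$c = \left(X^{T}\omega X\right)^{-1}X^{T}\omega\left(\tilde{y}_k-\tilde{y}^{o}_k\right) \qquad (7.17)$$

and $\tilde{y}_k$ and $\tilde{y}^{o}_k$ are the vectors of model-computed and linearized simulated values for the $k$th parameter set, so $c$ is evaluated separately for each $k$.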
As for total model nonlinearity, values of intrinsic model nonlinearity that are less than 0.09 indicate a linear model, and values greater than 1.0 indicate a highly nonlinear model. The total and intrinsic model nonlinearity measures are calculated using UCODE_2005 and the associated code MODEL_LINEARITY_ADV, or with the UNC Process of MODFLOW-2000. Calculation of the statistics is illustrated in Exercise 7.3.
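Eq. (7.17) is a weighted least-squares projection: c fits the nonlinear departure ỹ_k − ỹ°_k with the columns of X, and Eq. (7.16) keeps only the part that this tangent-plane fit cannot absorb. The sketch below uses the same hypothetical inputs and assumptions as the total-nonlinearity illustration (ω = V(ε)⁻¹, no prior information) and a made-up model y = A exp(θ), which is nonlinear in θ but linear in the transformed parameters exp(θ). Its total nonlinearity is positive while its intrinsic nonlinearity is zero to round-off, illustrating nonlinearity that a parameter transformation can remove.

```python
import numpy as np


def total_and_intrinsic(simulate, b_opt, X, omega, s2):
    """Return (total, intrinsic) model nonlinearity, Eqs. (7.14)-(7.17).

    Assumes omega = V(eps)^-1 and no prior information.
    """
    NP = len(b_opt)
    XtWX = X.T @ omega @ X
    V = s2 * np.linalg.inv(XtWX)
    s_b = np.sqrt(np.diag(V))
    y_opt = simulate(b_opt)
    total = intrinsic = 0.0
    for j in range(NP):
        step = np.sqrt(NP) * V[:, j] / s_b[j]                # Eq. (7.14)
        for b_k in (b_opt + step, b_opt - step):
            d = simulate(b_k) - (y_opt + X @ (b_k - b_opt))  # y~_k - y~o_k
            c = np.linalg.solve(XtWX, X.T @ omega @ d)       # Eq. (7.17)
            r = d - X @ c   # part of d orthogonal (omega metric) to the columns of X
            total += d @ omega @ d
            intrinsic += r @ omega @ r
    scale = 1.0 / (NP * s2 * 2 * NP)                         # Eqs. (7.15) and (7.16)
    return total * scale, intrinsic * scale


# Hypothetical model y = A @ exp(theta): nonlinear in theta, but linear in
# the transformed parameters exp(theta), so its nonlinearity is removable.
A = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
theta_opt = np.array([0.3, -0.2])
X = A * np.exp(theta_opt)   # column j scaled by exp(theta_j), i.e., dy/dtheta_j
tot, intr = total_and_intrinsic(lambda t: A @ np.exp(t), theta_opt, X,
                                np.eye(len(A)), s2=0.5)
print(f"total = {tot:.3g}, intrinsic = {intr:.3g}")   # intrinsic at round-off level
```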
7.8 EXERCISES
Exercise 7.1: Parameter Statistics
This exercise uses parameter statistics to evaluate the optimal parameter estimates from the steady-state model regression run of Exercise 5.2c. These statistics are used to reevaluate the importance of the observations to the parameter estimates and to evaluate parameter uncertainty and correlation.

(a) Evaluate composite scaled sensitivities. For nonlinear models, the composite scaled sensitivities calculated for the final estimated parameters are likely to differ from those for the starting parameter values. In the regression of Exercise 5.2c, the initial css considered in Exercise 4.1b (shown in Figures 4.3 and 7.5a) suggested that prior information (actually, regularization) was needed for parameters K_RB and VK_CB. It is important to examine the final css, shown in Figure 7.5b, to assess whether the relative css values of the model parameters are similar to the initial ones. If the final css for the parameters with regularization have become larger (relative to the css for other parameters), then it is important to try to estimate these parameters without the regularization imposed.

Problem:
• Discuss the differences between the initial (Figure 7.5a) and final (Figure 7.5b) css values in terms of model nonlinearity and scaling.
FIGURE 7.5 Composite scaled sensitivities from the (a) starting and (b) final steady-state model.
• How do nonlinearity and the scaling used affect the utility of the css?
• Do the css in Figure 7.5b suggest that the regression should be attempted without prior information specified for parameters K_RB or VK_CB?
• Use the css to explain why the weighted residuals for the prior information are so small (Figure 6.7a).
(b) Evaluate leverage statistics.
Problem: Compare the leverage statistics of Exercise 4.1e and Table 7.2 and comment on any differences. Refer to system dynamics and the added prior information. To help explain the leverage of hd07.ss, consider Table 7.4, which shows the parameter correlation coefficients calculated using the final parameter estimates from Exercise 5.2e and all the regression data except observation hd07.ss.

(c) Evaluate importance using influence statistics. In this exercise, the importance of individual observations and prior information to the estimation of the model parameters is assessed using the Cook's D and DFBETAS measures. Table 7.2 shows the Cook's D values for the steady-state regression; the critical value of Cook's D is 0.308. Table 7.3 shows the DFBETAS statistics for the steady-state regression; the critical value is 0.555.

Problem:
• Which observations have values of Cook's D and of DFBETAS that exceed the critical values? Why would these observations be most influential to the
TABLE 7.2 Leverage Statistics(a) and Cook's D Values(b) for the Steady-State Regression

Observation Name   Leverage Statistic   Cook's D
hd01.ss            0.00                 0.0013
hd02.ss            0.14                 0.0283
hd03.ss            0.18                 0.0079
hd04.ss            0.14                 0.0860
hd05.ss            0.22                 0.0005
hd06.ss            0.19                 0.0093
hd07.ss            0.96                 0.5934
hd08.ss            0.18                 0.0300
hd09.ss            0.84                 0.2879
hd10.ss            0.19                 0.0247
flow01.ss          1.00                 589.1
K_RB prior         1.00                 35.94
VK_CB prior        1.00                 72.16

(a) The values in bold type are larger than 0.90.
(b) The values in bold type are larger than the critical value of 0.308.
TABLE 7.3 DFBETAS Values(a) for the Steady-State Regression

                                          DFBETAS
Observation Name   HK_1     K_RB     VK_CB    HK_2     RCH_1    RCH_2
hd01.ss            0.030    -0.089   0.000    -0.006   0.018    0.005
hd02.ss            -0.074   -0.013   0.018    -0.001   0.069    -0.076
hd03.ss            0.020    -0.006   0.003    -0.005   0.001    0.000
hd04.ss            0.162    0.029    -0.040   0.001    -0.149   0.167
hd05.ss            0.0012   0.0003   0.002    0.016    -0.021   0.022
hd06.ss            0.075    -0.001   -0.007   -0.114   0.066    -0.070
hd07.ss            -0.586   -0.001   -0.012   1.74     -1.46    1.605
hd08.ss            -0.030   0.009    -0.001   -0.019   0.020    -0.024
hd09.ss            -0.098   0.011    -0.009   -0.032   0.223    -0.245
hd10.ss            -0.046   0.003    -0.013   -0.037   0.084    -0.089
flow01.ss          -59.4    -0.088   -0.018   1.14     -32.8    -21.1
K_RB prior         -0.611   14.7     0.007    0.929    -0.607   0.685
VK_CB prior        -1.60    0.010    20.1     4.05     -3.10    3.41

(a) The values in bold type are larger than the critical value of 0.555.
estimation of the model parameters? As for leverage, to help explain the influence of observation hd07.ss, consider Table 7.4.
• Compare the DFBETAS values in Table 7.3 to the dimensionless scaled sensitivities shown in Table 7.5. Explain why observation–parameter combinations with the largest DFBETAS values can have very small dimensionless scaled sensitivities. What is the implication for using dimensionless scaled
TABLE 7.4 Parameter Correlation Coefficient Matrix Calculated by MODFLOW-2000 for the Final Parameter Estimates, Using All Hydraulic-Head and Flow Observations and Prior Information Except Observation hd07.ss(a)

         HK_1     K_RB     VK_CB    HK_2     RCH_1    RCH_2
HK_1     1.00                                          (Symmetric)
K_RB     -0.025   1.00
VK_CB    -0.035   0.0006   1.00
HK_2     -0.79    0.015    0.034    1.00
RCH_1    0.87     -0.011   -0.029   -0.98    1.00
RCH_2    -0.72    0.011    0.029    0.99     -0.97    1.00

(a) Bold values have absolute value greater than 0.95.
TABLE 7.5 Dimensionless Scaled Sensitivities Calculated for the Final Parameter Values Estimated for the Steady-State Regression

                                      Dimensionless Scaled Sensitivities
Observation Name   HK_1        K_RB        VK_CB       HK_2        RCH_1   RCH_2
hd01.ss            1.18E-05    -0.210      -8.23E-09   1.29E-06    0.116   0.094
hd02.ss            -25.5       -0.210      -0.020      -1.16       13.2    13.7
hd03.ss            -52.7       -0.210      -0.041      -4.13       22.1    35.1
hd04.ss            -25.5       -0.210      -0.020      -1.16       13.2    13.7
hd05.ss            -38.5       -0.210      -0.028      -2.30       18.6    22.4
hd06.ss            -25.6       -0.210      -0.181      -1.02       13.2    13.9
hd07.ss            -0.699      -0.210      -0.677      0.657       0.490   0.440
hd08.ss            -52.7       -0.210      0.003       -4.19       21.9    35.2
hd09.ss            -68.9       -0.210      0.184       -7.67       22.0    54.5
hd10.ss            -38.5       -0.210      -0.096      -2.25       18.5    22.6
flow01.ss          -5.65E-04   -2.38E-05   4.10E-07    -6.20E-05   -5.54   -4.50
sensitivities and composite scaled sensitivities to determine which observations are most important to estimating the parameters?

(d) Evaluate the uniqueness of the parameter estimates using correlation coefficients. Parameter correlation coefficients were introduced in Exercise 4.1a, to assess likely parameter uniqueness using the starting parameter values, and in Exercise 5.1a, to demonstrate the relation between these coefficients and objective-function surfaces and to illustrate the necessity of flow observations in preventing complete correlation between groundwater flow model parameters. Here, the correlation coefficients are used to evaluate uniqueness of the parameter estimates from Exercise 5.2c. The correlation coefficients calculated by MODFLOW-2000 are shown in Table 7.6a, and those calculated by UCODE_2005 are shown in Table 7.6b.
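Eq. (7.5) is not reproduced in this section, but the parameter correlation coefficients in Tables 7.4 and 7.6 are assumed here to follow the standard definition pcc(i, j) = Cov(b_i, b_j)/[Var(b_i)^{1/2} Var(b_j)^{1/2}], computed from the variance–covariance matrix V(b′). A minimal NumPy sketch under that assumption; the matrix and the parameter names P1 to P3 are hypothetical, and the 0.95 flag mirrors the footnote of Table 7.4.

```python
import numpy as np


def correlation_matrix(V):
    """pcc(i, j) = Cov(b_i, b_j) / (sqrt(Var b_i) * sqrt(Var b_j))."""
    sd = np.sqrt(np.diag(V))
    return V / np.outer(sd, sd)


def highly_correlated_pairs(pcc, names, threshold=0.95):
    """Return (name_j, name_i, pcc) for pairs whose |pcc| exceeds threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i):                      # lower triangle, as in Table 7.4
            if abs(pcc[i, j]) > threshold:
                pairs.append((names[j], names[i], round(float(pcc[i, j]), 3)))
    return pairs


# Hypothetical variance-covariance matrix for three parameters P1-P3.
V = np.array([[4.0, 5.8, 0.2],
              [5.8, 9.0, 0.0],
              [0.2, 0.0, 1.0]])
pcc = correlation_matrix(V)
print(np.round(pcc, 3))
print(highly_correlated_pairs(pcc, ["P1", "P2", "P3"]))
```

Because the diagonal of V holds the parameter variances, the diagonal of the result is always 1.00, which is why Tables 7.4 and 7.6 show only the lower triangle plus a unit diagonal.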
TABLE 7.6 Parameter Correlation Coefficient Matrix for Final Parameter Values Using the Hydraulic-Head Observations, the Streamflow Observation, and Prior Information Calculated for the Steady-State Problem by MODFLOW-2000 and UCODE_2005(a)

(a) MODFLOW-2000

         HK_1     K_RB     VK_CB    HK_2     RCH_1    RCH_2
HK_1     1.00                                          (Symmetric)
K_RB     -0.042   1.00
VK_CB    -0.080   0.0005   1.00
HK_2     -0.36    0.063    0.20     1.00
RCH_1    0.72     -0.041   -0.15    -0.85    1.00
RCH_2    0.025    0.047    0.17     0.91     -0.65    1.00

(b) UCODE_2005

         HK_1     K_RB     VK_CB    HK_2     RCH_1    RCH_2
HK_1     1.00                                          (Symmetric)
K_RB     -0.039   1.00
VK_CB    -0.077   0.0005   1.00
HK_2     -0.36    0.059    0.19     1.00
RCH_1    0.72     -0.038   -0.15    -0.85    1.00
RCH_2    0.024    0.043    0.16     0.91     -0.65    1.00
Problem:
• Which parameter pairs are most highly correlated? What physical arguments can be used to explain why these parameters are correlated?
• Do the parameter correlations calculated by MODFLOW-2000 at the final parameter values (Table 7.6a) differ from those calculated at the starting parameter values (Table 4.2a)? Is this expected?
• Are there any significant differences between the correlations calculated by MODFLOW-2000 (Table 7.6a) and by UCODE_2005 (Table 7.6b)? What would produce the differences?

(e) Detecting nonunique parameter estimates. The next part of this exercise repeats some of the types of regression runs performed in Exercise 5.1b for the two-parameter combined model, to demonstrate further the effects of parameter correlation and methods for detecting it. Recall that the data available for estimating the six parameters of the steady-state model consist of 10 hydraulic heads (five in each model layer) and the gain in streamflow. When the streamflow gain observation is omitted, no prior information on parameters is specified, and only the 10 head observations are used, all parameter correlation coefficients equal 1.0. This result is a direct consequence of Darcy's Law, as discussed in the answer to Exercise 5.1a (available on the web site described in Chapter 1, Section 1.1). When the absolute values of any correlations are 1.00 or very close to 1.00, it may be that no single set of parameter values will produce the smallest value of the sum of squared, weighted residuals, and the nonlinear
regression may have trouble converging, or the solution may be nonunique in that different solutions would result from using different initial parameter values. Instructions for these simulations are available from the web site for this book described in Chapter 1, Section 1.1.

(1) Perform a regression run in which the flow observation is omitted from the calibration data set and there is no prior information on parameters. Statistics from the modified Gauss–Newton iterations of this run are shown in Figure 7.6.

Problem:
• What happened in this regression run? Discuss the calculated value of the maximum change (column 3 in the top of the figure).

SELECTED STATISTICS FROM MODIFIED GAUSS-NEWTON ITERATIONS

           MAX. PARAMETER   CALC. CHANGE   MAX. CHANGE   DAMPING
 ITER.     PARNAM           MAX. CHANGE    ALLOWED       PARAMETER
 -----     --------------   ------------   -----------   -----------
   1       K_RB             -6.09916       2.00000       0.32791
   2       HK_2             -0.902458      2.00000       1.0000
   3       K_RB             0.849936       2.00000       1.0000
   4       HK_2             -4.83286       2.00000       0.41383
   5       VK_CB            39.8648        2.00000       0.50170E-01
   6       VK_CB            -329.378       2.00000       0.30360E-02
   7       VK_CB            13.8278        2.00000       0.36159E-01
   8       VK_CB            12.3276        2.00000       0.16224
   9       VK_CB            1.96435        2.00000       1.0000
  10       HK_2             -1.84155       2.00000       1.0000

SUMS OF SQUARED WEIGHTED RESIDUALS FOR EACH ITERATION

           SUMS OF SQUARED WEIGHTED RESIDUALS
 ITER.     OBSERVATIONS   PRIOR INFO.   TOTAL
   1       1751.1         0.0000        1751.1
   2       7941.6         0.0000        7941.6
   3       695.77         0.0000        695.77
   4       148.52         0.0000        148.52
   5       63.538         0.0000        63.538
   6       58.476         0.0000        58.476
   7       62.818         0.0000        62.818
   8       59.196         0.0000        59.196
   9       46.333         0.0000        46.333
  10       16.519         0.0000        16.519

PARAMETER ESTIMATION DID NOT CONVERGE IN THE ALLOTTED NUMBER OF ITERATIONS
FIGURE 7.6 Selected statistics from the modified Gauss–Newton iterations of the regression run with only hydraulic-head observations in Exercise 7.1e. This is a fragment from the global output file of MODFLOW-2000.
• Explain the parameter correlations resulting from this run. These correlations are shown in Table 4.3 in Exercise 4.1c; in that exercise, a regression run was not performed, but correlations for the six-parameter steady-state model were calculated using only the head observations.
(2) Test for nonunique parameter estimates by starting the regression from the different sets of initial parameter values listed in Table 7.7 and comparing the resulting estimates, as suggested in Section 7.5.1. Include the flow observation and prior information. The results are shown in Table 7.8.

Problem: How much do the estimated parameter values differ from those produced using the original initial values? Are any differences large when compared to the associated parameter standard deviations? What are the strengths and weaknesses of this test?

(f) Evaluate the precision of the estimates using standard deviations, linear confidence intervals, and coefficients of variation. Table 7.9 and Figure 7.7 show the starting and estimated (optimal) parameter values for the steady-state regression of Exercise 5.2c, and the approximate linear, individual, 95-percent confidence intervals on the estimated parameter values.

Problem:
• Which estimated parameters have the largest individual, linear, 95-percent confidence intervals as a percentage of the estimated value? Do these same parameters have the largest coefficients of variation? Explain.
• What conclusions can be drawn about the relative uncertainty among the six parameters?
• Theoretically, 95-percent confidence intervals should include the true value 95 percent of the time. Use the last column in Table 7.9 to note how many of these
TABLE 7.7 New Sets of Starting Parameter Values for Exercise 7.1e

           HK_1          K_RB          VK_CB         HK_2        RCH_1     RCH_2
Original   3 × 10⁻⁴      1.2 × 10⁻³    1 × 10⁻⁷      4 × 10⁻⁵    63.072    31.536
Set 1      1.5 × 10⁻⁴    0.6 × 10⁻³    0.5 × 10⁻⁷    2 × 10⁻⁵    31.536    15.768
Set 2      6 × 10⁻⁴      2.4 × 10⁻³    2 × 10⁻⁷      8 × 10⁻⁵    126.144   63.072

TABLE 7.8 Estimated Parameter Values for Exercise 7.1e

           HK_1          K_RB          VK_CB         HK_2          RCH_1   RCH_2
Original   4.62 × 10⁻⁴   1.17 × 10⁻³   9.90 × 10⁻⁸   1.54 × 10⁻⁵   47.45   38.53
Set 1      4.62 × 10⁻⁴   1.17 × 10⁻³   9.90 × 10⁻⁸   1.54 × 10⁻⁵   47.45   38.53
Set 2      4.62 × 10⁻⁴   1.17 × 10⁻³   9.90 × 10⁻⁸   1.54 × 10⁻⁵   47.43   38.54
TABLE 7.9 Estimated Values of the Steady-State Flow System Parameters; Coefficients of Variation; Individual, Linear, 95-percent Confidence Intervals; and True Parameter Values

Parameter   Estimated     Coefficient    Individual, Linear, 95-percent   True          Interval Includes
Name        Value         of Variation   Confidence Interval              Value         True Value?
HK_1        4.62 × 10⁻⁴   0.14           3.11 × 10⁻⁴; 6.13 × 10⁻⁴         4.0 × 10⁻⁴
K_RB        1.17 × 10⁻³   0.38           1.26 × 10⁻⁴; 2.21 × 10⁻³         1.0 × 10⁻³
VK_CB       9.90 × 10⁻⁸   0.37           1.19 × 10⁻⁸; 1.86 × 10⁻⁷         2.0 × 10⁻⁷
HK_2        1.54 × 10⁻⁵   1.77           -4.91 × 10⁻⁵; 7.99 × 10⁻⁵        4.4 × 10⁻⁵
RCH_1       47.45         0.27           16.6; 78.2                       31.536
RCH_2       38.53         0.31           10.5; 66.6                       47.304
FIGURE 7.7 Starting and true parameter values, limits of approximate, individual, linear, 95-percent confidence intervals (black bars), and limits of reasonable ranges of parameter values, expressed as percentage of the estimated values, for the steady-state regression run. Note that linear confidence intervals can have a negative lower limit, even if physically implausible, when parameters are not log-transformed in the regression. Here, none of the parameters are log-transformed.
linear 95-percent confidence intervals include the true value. If the percent is significantly smaller than 95 percent, explain why. In your answer, consider the prior information imposed, whether it constitutes regularization, and its effect on measures of uncertainty such as confidence intervals. (g) Compare estimated parameter values with reasonable ranges. Figure 7.7 shows the estimated parameter values and individual linear confidence intervals in relation to the reasonable ranges of parameter values. Problem: Are the estimated parameter values reasonable on the basis of the specified reasonable ranges? Are parameter confidence intervals needed to answer this question for this problem?
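The intervals in Table 7.9 can be reproduced, to within the rounding of the printed coefficients of variation, from the estimates and CVs alone. The sketch below assumes the individual linear interval has the form b_j ± t·s_bj, with s_bj = CV × |b_j| and t ≈ 2.365, the 0.975 quantile of Student's t for ND + NPR − NP = 11 + 2 − 6 = 7 degrees of freedom. These are assumptions checked against the table, not formulas quoted from this section. The sketch also shows why HK_2, with a coefficient of variation of 1.77, gets a negative lower limit, as noted in the caption of Figure 7.7.

```python
def linear_interval(estimate, cv, t_value):
    """Individual linear interval b_j +/- t * s_bj, taking s_bj = CV * |b_j|."""
    s_b = cv * abs(estimate)
    half_width = t_value * s_b
    return estimate - half_width, estimate + half_width


# Assumed: t quantile for 0.975 with n = ND + NPR - NP = 11 + 2 - 6 = 7.
T_7 = 2.365

# Estimated values and coefficients of variation copied from Table 7.9.
table_7_9 = {
    "HK_1": (4.62e-4, 0.14), "K_RB": (1.17e-3, 0.38), "VK_CB": (9.90e-8, 0.37),
    "HK_2": (1.54e-5, 1.77), "RCH_1": (47.45, 0.27), "RCH_2": (38.53, 0.31),
}
for name, (b, cv) in table_7_9.items():
    lower, upper = linear_interval(b, cv, T_7)
    print(f"{name:6s} {lower:11.3g} {upper:11.3g}")
```

Small mismatches (for example, about 0.5 for RCH_1) come from the two-digit rounding of the coefficients of variation, not from the interval formula.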
(h) Evaluate the precision of the estimates using nonlinear confidence intervals. Figure 7.8 shows the nonlinear 95-percent confidence intervals calculated on parameter values for the steady-state regression of Exercise 5.2c, together with the approximate, individual, linear 95-percent confidence intervals and reasonable ranges of parameter values from Figure 7.7. The nonlinear confidence intervals were computed using the UNC Process (Christensen and Cooley, 2005) of MODFLOW-2000; UCODE_2005 produced the same results. The web site for this book (see Chapter 1, Section 1.1) provides instructions for calculating the nonlinear intervals.

Problem:
• Compare the individual linear and nonlinear intervals in terms of their size and symmetry.
• How many of the nonlinear 95-percent confidence intervals include the true parameter value? How does this analysis compare to that performed in Exercise 7.1f for the linear intervals?
FIGURE 7.8 Limits of individual, linear, 95-percent confidence intervals (thin error bars); individual, nonlinear 95-percent confidence intervals (thick error bars); and reasonable ranges of parameter values, for the steady-state regression run. All values are expressed as percentage of the estimated parameter values.
• Using the nonlinear intervals, assess the relative uncertainty among the six parameter values. How do the conclusions about relative parameter uncertainty compare to the conclusions drawn in Exercise 7.1f using the linear intervals?
Exercise 7.2: Consider All the Different Correlation Coefficients Presented
Three different statistics referred to as correlation coefficients have been presented: R (Eq. (6.11)), the correlation between weighted simulated and observed values; $R^2_N$ (Eq. (6.18)), the correlation coefficient between weighted residuals ordered from smallest to largest and the order statistics from a N(0, 1) probability distribution; and pcc, parameter correlation coefficients (Eq. (7.5)), which measure whether coordinated changes in parameter values would produce the same simulated values and, therefore, the same value of the objective function. For all of these correlation coefficients, values range between −1.0 and 1.0, and values close to these extremes indicate high correlation. For R and $R^2_N$, values close to 1.0 are good: for R, this means the simulated values are in some ways similar to the observed values; for $R^2_N$, this means the weighted residuals are normally distributed. For pcc, values close to −1.0 or 1.0 are bad: they mean that the available data are insufficient to estimate the parameter values uniquely.

Problem: Consider the equations for these three statistics. Note how they are similar and different, and use the equations to explain why extreme values of $R^2_N$ and of R are good, whereas extreme values of the parameter correlation coefficients are problematic.

Exercise 7.3: Test for Linearity
(a) Use the modified Beale's measure. In this exercise, the linearity of the steady-state model is tested using the modified Beale's measure. First, calculate the measure using the weights on the prior values for K_RB and VK_CB that were used in the regression. Then, recalculate the measure using weights that more realistically reflect likely uncertainty in these hydraulic-conductivity parameters.
Instructions for model and postprocessor simulations needed to calculate the modified Beale's measure are available from the web site for this book described in Chapter 1, Section 1.1. For students not performing the simulations, the information shown in Figures 7.9 and 7.10 can be used to complete the exercise.

Problem:
• Does the modified Beale's measure indicate that the model is effectively linear, so that linear confidence intervals accurately display the uncertainty in the parameters?
• Would using nonlinear instead of linear confidence intervals change the conclusions reached in Exercise 7.1g?
USING FSTAT = 3.8700, BEALES MEASURE = 35.564
IF BEALES MEASURE IS GREATER THAN 0.26, THE MODEL IS NONLINEAR.
IF BEALES MEASURE IS LESS THAN 0.23E-01, THE MODEL IS EFFECTIVELY LINEAR, AND LINEAR CONFIDENCE INTERVALS ARE FAIRLY ACCURATE IF THE RESIDUALS ARE NORMALLY DISTRIBUTED.

FIGURE 7.9 Part of BEALE-2000 output file showing Beale's measure calculated with the prior weights used in the regression for Exercise 5.2c.

USING FSTAT = 3.8700, BEALES MEASURE = 61.107
IF BEALES MEASURE IS GREATER THAN 0.26, THE MODEL IS NONLINEAR.
IF BEALES MEASURE IS LESS THAN 0.23E-01, THE MODEL IS EFFECTIVELY LINEAR, AND LINEAR CONFIDENCE INTERVALS ARE FAIRLY ACCURATE IF THE RESIDUALS ARE NORMALLY DISTRIBUTED.

FIGURE 7.10 Part of BEALE-2000 output file showing Beale's measure calculated with a more realistic coefficient of variation of 1.0 used to compute the weights on prior values for both K_RB and VK_CB.
• How does the modified Beale's measure change when the weights on the prior values for K_RB and VK_CB are changed? Which calculated measure more realistically reflects the nonlinearity of the steady-state model?
(b) Use total and intrinsic model nonlinearity. In this exercise, the linearity of the steady-state model is tested using total and intrinsic model nonlinearity measures mentioned at the end of Section 7.7. As in Exercise 7.3a, first calculate these measures using the weights on the prior values for K_RB and VK_CB that were used in the regression. Then, recalculate the measures using weights that more realistically reflect the uncertainty in these two parameters. Instructions for the simulations needed to calculate total and intrinsic nonlinearity are available from the web site for this book described in Chapter 1, Section 1.1. For students not performing the simulations, the results are as follows. With the weights used in the regression, total model nonlinearity is 223.7 and intrinsic model nonlinearity is 0.142, and with a more realistic coefficient of variation of 1.0, total model nonlinearity is 359.0 and intrinsic model nonlinearity is 0.138. See Section 7.7 for critical values against which to compare the total model nonlinearity measures.
Problem:
• Does the total model nonlinearity statistic indicate that the model is effectively linear, so that linear confidence intervals accurately display the uncertainty in the parameters? Is this result consistent with the analysis of the modified Beale's measure?
• Does the steady-state model have a large degree of intrinsic model nonlinearity?
• How do the statistics change when the weights on the prior values for K_RB and VK_CB are changed? Which calculated measure more realistically reflects the nonlinearity of the steady-state model?
8 EVALUATING MODEL PREDICTIONS, DATA NEEDS, AND PREDICTION UNCERTAINTY
This chapter presents methods for evaluating model predictions and focuses on three broad topics: (1) defining the predictions of interest and calculating the predictions, their sensitivities, and their standard deviations; (2) using simulated predictions to assess future data needs, which involves statistics that indicate which parameters and which existing or potential observations are important to the predictions; and (3) quantifying prediction uncertainty using linear and nonlinear inferential statistics and Monte Carlo methods.
Effective Groundwater Model Calibration: With Analysis of Data, Sensitivities, Predictions, and Uncertainty. By Mary C. Hill and Claire R. Tiedeman Published 2007 by John Wiley & Sons, Inc.

8.1 SIMULATING PREDICTIONS AND PREDICTION SENSITIVITIES AND STANDARD DEVIATIONS

Model predictions typically are made to investigate the simulated system at a past or future time, under stress conditions that may differ from those used to calibrate the model, and/or at spatial locations where no observations exist. For example, in a groundwater transport model, the predictions might be future solute concentrations resulting from evolution of a contaminant plume under the steady-state flow conditions for which the model was calibrated. Or, in a groundwater flow model, the predictions might be simulated hydraulic heads under future pumping conditions that are substantially different from those for which the model was calibrated. Or, both future transport and changes in pumpage might be of interest. Simulating predictions involves imposing the appropriate stresses and conditions and then
calculating the predicted quantity. After the predictions have been simulated, their sensitivities can be calculated by the same methods used to calculate sensitivities for the simulated equivalents of the observations. In MODFLOW-2000, UCODE_2005, and PEST, any type of quantity that can be treated as an observation also can be treated as a prediction. The quantities that can be used as observations are listed in Table 2.1.

After the predictions have been defined and simulated, and their sensitivities have been calculated, the prediction standard deviations can be calculated as

$$s_{z'_\ell} = \left[\sum_{i=1}^{NP}\sum_{j=1}^{NP} \frac{\partial z'_\ell}{\partial b_j}\,[V(b)]_{ji}\,\frac{\partial z'_\ell}{\partial b_i}\right]^{1/2} \qquad (8.1a)$$

where $z'_\ell$ is the $\ell$th prediction; $s_{z'_\ell}$ is the standard deviation of the $\ell$th prediction; $\partial z'_\ell/\partial b_j$ is the sensitivity of the $\ell$th prediction with respect to $b_j$, the $j$th parameter; and $V(b)$ is the parameter variance–covariance matrix (Eq. (7.1)), often calculated for all parameters, as described in Sections 7.2.1 and 7.2.5. Expressing the sensitivities of prediction $z'_\ell$ as the row vector $x_{z_\ell}$ and expanding $V(b)$ using Eq. (7.1) yields

$$s_{z'_\ell} = \left[s^2\, x_{z_\ell}\left(X^{T}\omega X\right)^{-1} x_{z_\ell}^{T}\right]^{1/2} \qquad (8.1b)$$

Prediction sensitivities and standard deviations are not often used directly; rather, they are scaled or used to derive measures for evaluating prediction uncertainty and assessing data needs, as discussed later in this chapter. They can reveal and communicate substantial insight about the system information, parameters, and observations that are most important to the calculated predictions and, to the extent that the model is accurate, to the actual predictions. Prediction standard deviations also can be used to quantify the uncertainty of the predictions. This chapter describes methods for accomplishing these tasks. Use of measures derived from prediction sensitivities and from standard deviations computed using Eq. (8.1) assumes that the model is linear.
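Eqs. (8.1a) and (8.1b) express the same quantity in two ways, which a short NumPy check confirms numerically. The sensitivity matrix, weights, s², and prediction sensitivities below are hypothetical, and prior information is ignored so that V(b) = s²(XᵀωX)⁻¹ in the sense of Eq. (7.1).

```python
import numpy as np


def prediction_sd_81a(x_z, V):
    """Eq. (8.1a): double sum of sensitivity * V[j, i] * sensitivity, then sqrt."""
    NP = len(x_z)
    total = 0.0
    for i in range(NP):
        for j in range(NP):
            total += x_z[j] * V[j, i] * x_z[i]
    return np.sqrt(total)


def prediction_sd_81b(x_z, X, omega, s2):
    """Eq. (8.1b): [s^2 * x_z (X^T omega X)^-1 x_z^T]^(1/2)."""
    return np.sqrt(s2 * x_z @ np.linalg.solve(X.T @ omega @ X, x_z))


# Hypothetical calibration sensitivities (3 observations, 2 parameters),
# weights, error variance, and prediction sensitivities.
X = np.array([[1.0, 0.2], [0.5, 1.0], [0.3, 0.4]])
omega = np.diag([1.0, 2.0, 0.5])
s2 = 1.3
x_z = np.array([0.7, -1.1])

V = s2 * np.linalg.inv(X.T @ omega @ X)     # V(b) as expanded for Eq. (8.1b)
sd_a = prediction_sd_81a(x_z, V)
sd_b = prediction_sd_81b(x_z, X, omega, s2)
print(sd_a, sd_b)                           # the two forms agree
```

The solve call avoids forming the inverse explicitly in Eq. (8.1b); the explicit inverse is used only to build V(b) for the summation form.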
When calculating these measures for nonlinear models, it is important to conduct the analyses with likely sets of parameter values that differ from the optimal parameter estimates. This tests the robustness of the conclusions drawn from these measures when applying them to nonlinear models.

8.2 USING PREDICTIONS TO GUIDE COLLECTION OF DATA THAT DIRECTLY CHARACTERIZE SYSTEM PROPERTIES

The expense of data collection and the inaccessibility of many natural systems typically limit the amount of information that can be obtained about the properties and state of a simulated system. It is, therefore, important to design data
MODEL PREDICTIONS, DATA NEEDS, AND PREDICTION UNCERTAINTY
collection strategies that provide as much information as possible about aspects of the system that are important to the predictions. Here, we consider collection of data that directly characterize the properties of the simulated system. For a groundwater system, these data might include information about stratigraphy, hydrogeologic unit geometries, hydraulic conductivities, and areal recharge values. In Section 8.3, we consider collection of data related to quantities that can be used as observations in model calibration, such as hydraulic heads, streamflow gains and losses, and concentrations.

One way to use a calibrated model to identify the system properties that are most important to the predictions is to identify the model parameters that are most important to the predictions. The issue of parameter importance to predictions spans the last two components of the observation–parameter–prediction triad, composed of entities that are directly connected by the model, as discussed in Chapters 1 and 10. Using predictions to design and evaluate strategies for collecting data related to the model parameters requires statistics that measure the importance of the parameters to the predictions. In nonlinear regression, predictions may be quantities of a distinctly different kind from the observations. For example, in groundwater models, the observations might be heads and flows, while the predictions might be solute concentrations. Any analysis needs to accommodate this.
The methods discussed in this book for identifying parameter importance to predictions include (1) prediction scaled sensitivities (pss); (2) combined use of prediction, composite, and dimensionless scaled sensitivities (pss, css, and dss), to identify parameters important to predictions that are not well supported by the observations; (3) parameter correlation coefficients (pcc) that include both observations and predictions, to evaluate whether parameters that are highly correlated in the calibrated model are individually important to the predictions; and (4) the parameter–prediction (ppr) statistic, which includes the effects of parameter uncertainty and correlation in addition to prediction sensitivities. The pss, css, dss, and pcc statistics can help reveal why different parameters are important; the ppr statistic is best at identifying the parameters for which additional data would be most advantageous. Foglia et al. (in press) compare these statistics to the results of cross-validation.

The broad field of model sensitivity analysis offers many additional measures for evaluating the importance of model inputs to model predictions (e.g., Saltelli et al., 2000, 2004). These range from simple measures like pss to computationally intensive measures that account for model nonlinearity. In this book, we focus on a set of methods that are conceptually intuitive and fairly simple to calculate.

8.2.1 Prediction Scaled Sensitivities (pss)
Prediction sensitivities ($\partial z'_\ell/\partial b_j$) indicate the importance of the parameter values to the predictions, but they need to be scaled when used to compare the relative importance of different parameters. The scaled values are called prediction scaled sensitivities regardless of the exact scaling used. When calculating and presenting these measures, it is important to state clearly the scaling used.
8.2 COLLECTION OF DATA THAT CHARACTERIZE SYSTEM PROPERTIES
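Generally, prediction scaled sensitivities are calculated in one of four ways:

$$ pss_{\ell j} = \frac{\partial z'_\ell}{\partial b_j} \left( \frac{b_j}{100} \right) \left( \frac{100}{r'_\ell} \right) \qquad (8.2a) $$

$$ pss_{\ell j} = \frac{\partial z'_\ell}{\partial b_j} \left( \frac{s_{b_j}}{100} \right) \left( \frac{100}{r'_\ell} \right) \qquad (8.2b) $$

$$ pss_{\ell j} = \frac{\partial z'_\ell}{\partial b_j} \left( \frac{b_j}{100} \right) \left( \frac{100}{z'_\ell} \right) \qquad (8.2c) $$

$$ pss_{\ell j} = \frac{\partial z'_\ell}{\partial b_j} \left( \frac{s_{b_j}}{100} \right) \left( \frac{100}{z'_\ell} \right) \qquad (8.2d) $$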
where $pss_{\ell j}$ is the scaled sensitivity of prediction $z'_\ell$ to parameter $b_j$; $r'_\ell$ is a reference value defined by the modeler, as described below; $s_{b_j}$ is the standard deviation of parameter $b_j$ calculated in Eq. (7.3); and $z'_\ell$ is the simulated value of the prediction. As noted, the first term is the sensitivity; the following terms are the applied scaling. UCODE_2005 produces data-exchange files with these prediction scaled sensitivities (Poeter et al., 2005, Table 20).

The multiplication by $b_j/100$ in the scaling of Eq. (8.2a) is equivalent to the scaling for the one-percent scaled sensitivity of Eq. (4.7). When the scaling by $100/r'_\ell$ or $100/z'_\ell$ also is included, the resulting statistic is the change in the predicted value, expressed as a percentage of $r'_\ell$ or $z'_\ell$, caused by a one-percent change in the parameter value. This scaling can be produced, rather awkwardly, with MODFLOW-2000 by setting the statistic for the weighting of the predictions to $r'_\ell$ or $z'_\ell$ and specifying the STAT-FLAG as 1 (Hill et al., 2000, p. 53). Prediction scaled sensitivities with this scaling will then be listed in the table of dimensionless scaled sensitivities printed by MODFLOW-2000.

The multiplication by $s_{b_j}/100$ in Eq. (8.2b) produces scaled sensitivities that equal the change in the predicted value, expressed as a percentage of $r'_\ell$ or $z'_\ell$, caused by changing the parameter value by one percent of the parameter standard deviation. This scaling expresses prediction sensitivity in the context of parameter uncertainty and has two advantages and one disadvantage. The first advantage is that, unlike parameter values, $s_{b_j}$ almost never equals zero. The second advantage is that it is valid for parameters that are affected by the datum of the model, such as groundwater model parameters representing the head at constant-head boundaries. Its disadvantage is that it is not fit-independent, because the calculated error variance, which depends on the value of the objective function, is a term of the variance–covariance matrix (Eq. (7.1)). For all pss calculated using results from a single regression run, this disadvantage affects all parameters proportionately, so the relative importance of parameters in a single run can still be evaluated. This scaling can be accomplished, awkwardly, with MODFLOW-2000 by printing unscaled sensitivities and then applying the scaling using spreadsheet software.

Possible alternatives for $r'_\ell$ in Eq. (8.2) include a regulatory limit or another quantity relevant to a given modeling situation. A value of 0.0 cannot be used for $r'_\ell$ or $z'_\ell$ in Eq. (8.2), because it would require division by zero. In some circumstances the prediction is the difference between two simulations, as discussed in Section 8.4.5. For example, in groundwater models, a common prediction is the drawdown or the change in flow to a stream caused by pumpage.
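The four scalings of Eq. (8.2) differ only in the factors applied to the same sensitivity. The Python/NumPy sketch below applies those factors directly; the function name, argument names, and all numerical values are hypothetical, and the parameter standard deviations are assumed to come from Eq. (7.3).

```python
import numpy as np


def prediction_scaled_sensitivities(dz_db, b, s_b, z, r=None):
    """Return pss of one prediction for the scalings of Eq. (8.2).

    dz_db : sensitivities dz'/db_j of the prediction (length NP)
    b     : parameter values b_j
    s_b   : parameter standard deviations s_bj (Eq. 7.3)
    z     : simulated value of the prediction, z'
    r     : optional modeler-defined reference value, r'
    """
    dz_db, b, s_b = (np.asarray(a, dtype=float) for a in (dz_db, b, s_b))
    if z == 0.0 or (r is not None and r == 0.0):
        raise ValueError("z' and r' must be nonzero (Eq. 8.2 divides by them)")
    pss = {
        # percent change in z' caused by a one-percent change in b_j
        "8.2c": dz_db * (b / 100.0) * (100.0 / z),
        # percent change in z' caused by a change of one percent of s_bj
        "8.2d": dz_db * (s_b / 100.0) * (100.0 / z),
    }
    if r is not None:
        pss["8.2a"] = dz_db * (b / 100.0) * (100.0 / r)
        pss["8.2b"] = dz_db * (s_b / 100.0) * (100.0 / r)
    return pss


# Hypothetical values for two parameters and one prediction.
pss = prediction_scaled_sensitivities(dz_db=[2.0, -50.0], b=[10.0, 0.02],
                                      s_b=[1.0, 0.005], z=40.0, r=25.0)
ranking = np.argsort(-np.abs(pss["8.2c"]))  # most important parameter first
print(pss["8.2c"], ranking)
```

Ranking by absolute value reflects that a large negative pss is as important as a large positive one. Whichever scaling is chosen, it should be reported with the results, as recommended above.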
UCODE_2005 and MODFLOW-2000 are designed to calculate sensitivities related to such differences, and thus pss also can be computed for them. For UCODE_2005, this is accomplished using the Predictions mode and derived predictions; for MODFLOW-2000, it is achieved using the computer program YCINT2000 (Hill et al., 2000, pp. 87–91).

8.2.2 Prediction Scaled Sensitivities Used in Conjunction with Composite Scaled Sensitivities

In Figure 8.1, selected pss values calculated using Eq. (8.2c) for the model discussed in Chapter 15, Section 15.2.1 are compared to the css of Eq. (4.6). In the example, the predictions are the Cartesian components of advective travel simulated by particle tracking using the ADV Package of Anderman and Hill (2001). The model grid is oriented with the north compass direction, so the predictions are the particle travel distances in the north–south, east–west, and vertical directions. Figure 8.1b shows results for the north–south component of travel: the mean and range of the pss values for five transported particles. Here, the pss values are defined to equal the percent change in the advective transport caused by a one-percent change in parameter value, and the simulated value equals the simulated length of advective transport in the north–south coordinate direction.

In Figure 8.1b, the pss show that HK2, HK3, and RCH2 (the hydraulic conductivity of two rock types and the recharge potential of one area) are the most important parameters to the determination of advective transport. In Figure 8.1a, the css show that the observations used in the regression provide more information for parameters HK2 and HK3 than for RCH2. This suggests that, of these three parameters, it is probably most important to collect additional information about RCH2 to improve the transport predictions.
This parameter was estimated by the regression, as shown by its black bar, but collecting additional information about its characteristics, or additional observation data that support it, could improve its representation in the model and its estimated value, and thereby probably also improve the predictions. This analysis can be taken one step further by evaluating the dss shown in Figure 8.1c. Although the css for parameter HK4 is large, the dss show that the support comes primarily from just four observations, suggesting that these observations should be closely investigated. This type of analysis can be used to understand and communicate model strengths and weaknesses and to justify and plan additional model development and data collection efforts. It can also be used to better understand more sophisticated statistics such as the parameter–prediction (ppr) statistic described in Section 8.2.5.

8.2.3 Parameter Correlation Coefficients without and with Predictions
FIGURE 8.1 (a) Composite scaled sensitivities for selected parameters, (b) one-percent prediction scaled sensitivities for the north–south component of predicted advective transport, and (c) dimensionless scaled sensitivities showing the support provided by the observations for parameter HK4. In (a), the composite scaled sensitivities for parameters estimated in the regression are shown using black bars; those not estimated in the regression are shown using gray bars. In (b), the prediction scaled sensitivities are defined as the percent change in the prediction given a one-percent change in the parameter value. These are selected results from simulations of the model discussed in Chapter 15, Section 15.2.1.

To determine whether parameters that are highly correlated for the calibrated model are individually important to predictions of interest, two different sets of parameter correlation coefficients are compared: those calculated using one of the first two versions of the parameter variance–covariance matrix (Eq. (7.1)) described in Chapter 7, Sections 7.2.1 and 7.2.5, and those calculated with predictions as well (the fifth version of the parameter variance–covariance matrix described in Chapter 7, Sections 7.2.1 and 7.2.5). The pcc calculated with the predictions as well as the observations used in the regression are produced by augmenting the terms of Eq. (7.1) to include information related to the predictions. This produces an alternate parameter variance–covariance matrix that can be represented as

$$ V_\ell(\mathbf{b}') = s^2 \left( X_\ell^T \omega_\ell X_\ell \right)^{-1} \qquad (8.3) $$
where the sensitivity matrix, $X_\ell$, and the weight matrix, $\omega_\ell$, are augmented to include the predictions. Predictions can be included individually or in groups, as appropriate for the particular problem to be addressed. These augmentations can be implemented easily with MODFLOW-2000 or UCODE_2005 by adding the predictions to the list of observations and executing the sensitivity analysis mode of either computer program. The value specified as the "observed value" of a prediction does not affect the calculated parameter correlation coefficients with predictions, because the $s^2$ term in Eq. (8.3) cancels out in the calculation of Eq. (7.5). However, the specified weight does affect the calculation. One implication is that, in MODFLOW-2000 and UCODE_2005, it generally is not desirable to specify the statistic used to calculate the prediction weight as a coefficient of variation, because in that situation the "observed value" is used to calculate the weight. Additional comments about determining "weights" for predictions are provided at the end of this section.

The resulting pcc with predictions are compared with pcc calculated only with the observations used in the regression. If adding the predictions causes some highly correlated parameter pairs to become much less correlated, the predictions are likely to depend on individual parameter values that the regression could not estimate uniquely. This identifies a weakness in the calibrated model.

The utility of pcc with predictions is illustrated by a groundwater modeling example. Consider a groundwater flow model calibrated by estimating parameter values using observations of hydraulic head and streamflow gain or loss. The calibrated model is used to predict (a) hydraulic head at a location where no measurement can be obtained and (b) advective transport from the site of a contaminant spill.
Parameter correlation coefficients are first calculated using the calibrated model, all calibration observations, and all defined parameters. Two sets of pcc with predictions are then obtained, by including first the hydraulic-head prediction and then the advective-transport prediction. Using an analysis with calculations similar to these, Anderman et al. (1996) show that prediction of the hydraulic head using the calibrated model did not require uncorrelated parameter estimates, and thus this prediction could be used with some confidence. Prediction of advective transport did require uncorrelated estimates, and thus the transport prediction was highly suspect.
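The comparison of pcc without and with predictions can be sketched with a toy two-parameter problem. In the Python/NumPy sketch below, all sensitivities are hypothetical: the observations constrain essentially only the sum of the two parameters, prediction A depends on that sum, and prediction B depends on their difference (that is, on the individual values). The prediction weight of 1.0 is an arbitrary assumption; the function names are illustrative only.

```python
import numpy as np


def pcc_matrix(X, omega, s2=1.0):
    """Parameter correlation coefficients from V = s^2 (X^T omega X)^{-1}."""
    V = s2 * np.linalg.inv(X.T @ omega @ X)
    sd = np.sqrt(np.diag(V))
    return V / np.outer(sd, sd)  # s^2 cancels in this ratio


def pcc_with_predictions(X, omega, X_pred, omega_pred, s2=1.0):
    """Eq. (8.3): augment X and omega with prediction rows and weights."""
    n_obs, n_pred = X.shape[0], X_pred.shape[0]
    X_l = np.vstack([X, X_pred])
    omega_l = np.zeros((n_obs + n_pred, n_obs + n_pred))
    omega_l[:n_obs, :n_obs] = omega
    omega_l[n_obs:, n_obs:] = omega_pred
    return pcc_matrix(X_l, omega_l, s2)


# Hypothetical calibration in which the observations constrain only b1 + b2.
X = np.array([[1.0, 1.0],
              [2.0, 2.01],
              [0.5, 0.5]])
omega = np.eye(3)
before = pcc_matrix(X, omega)

# Prediction A is sensitive to b1 + b2; prediction B is sensitive to b1 - b2.
after_A = pcc_with_predictions(X, omega, np.array([[1.0, 1.0]]), np.eye(1))
after_B = pcc_with_predictions(X, omega, np.array([[1.0, -1.0]]), np.eye(1))
print(before[0, 1], after_A[0, 1], after_B[0, 1])
```

With these numbers the pcc stays near -1.0 when prediction A is added but drops in magnitude to roughly 0.7 when prediction B is added, which is the signal described above that a prediction depends on individual parameter values the regression could not estimate uniquely.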
The weights for the predictions can be established using either of the following two approaches:

1. The weight can be established using a statistic (standard deviation or variance; see Chapter 3, Section 3.4.4 and Guideline 6 in Chapter 11) that reflects an acceptable range of uncertainty in the prediction. Compared to approach 2, this approach is more consistent with the scaling of the CTB statistic of Sun and Yeh (1990a) and Sun (1994).
2. The weight determined using approach 1 can be increased (by decreasing the value of the statistic used to calculate the weight) so that the results clearly indicate whether unique parameter values are important to predictions.

By approach 1, predictions for which a larger amount of uncertainty is acceptable have smaller weights. Predictions that are desired to be more certain have larger weights, which increases the absolute values of the pcc for parameters to which these predictions are sensitive. Approach 2 allows weights for certain predictions to be subjectively increased. This option ensures that, if individual parameter values are important to predictions, this will be revealed by the pcc.
8.2.4 Composite and Prediction Scaled Sensitivities Used with Parameter Correlation Coefficients

Composite and prediction scaled sensitivities and parameter correlation coefficients (css, pss, and pcc) can be used together to assess whether an improved estimate of a parameter is needed. A classification system for this analysis is shown in Figure 8.2. The upper portion of this figure classifies the precision of a parameter estimate in combination with the importance of the parameter to the predictions. The lower portion classifies the uniqueness of the estimates for a parameter pair in combination with the importance to the predictions of having unique estimates of the two parameters.

If the analysis of a parameter indicates a classification in box IV of Figure 8.2a or 8.2b, improved estimation of this parameter and/or improved representation of the system features with which it is associated are likely to improve prediction accuracy. If the analysis indicates a classification in box I, II, or III, the term "acceptable" in these boxes means that a parameter is estimated well, is unimportant to the predictions, or both. Improved estimation of the parameter and improved representation of the associated system features are likely to be less beneficial to prediction accuracy than for parameters classified in box IV.

The pcc are used in Figure 8.2b as measures of both the uniqueness of the parameter estimates and the importance of unique parameter values to the predictions. To measure the uniqueness of the parameter estimates, pcc are calculated with only the observations and prior information used in the calibration. To measure whether unique parameter estimates are important to the predictions, pcc are calculated with the predictions as well as the calibration observations and prior information. These two different types of pcc are discussed in Section 8.2.3. The classification system illustrated in Figure 8.2b addresses only nonuniqueness caused by parameter correlation. Methods for detecting nonuniqueness caused by multiple minima are discussed in Chapter 7, Section 7.4. This method of using css, pss, and pcc can be revealing but is awkward. Fortunately, the ppr statistic described next incorporates the effects of parameter correlation as well as observation and prediction sensitivity.

FIGURE 8.2 Classification of the need for improved estimation of a parameter and, perhaps, associated system features. The classification is based on statistics that indicate (a) the precision and importance to predictions of a single parameter and (b) the uniqueness and importance to predictions of a pair of parameters. See text for additional explanation.

8.2.5 Parameter–Prediction (ppr) Statistic
Unlike prediction scaled sensitivities (pss), the parameter–prediction (ppr) statistic assesses the importance of parameters to predictions in a way that accounts for parameter correlations. The ppr statistic also takes advantage of the connection between parameter uncertainty, parameter correlation, and prediction uncertainty provided by Eq. (8.1) for the prediction standard deviation. The drawback of the ppr statistic is that the equation and procedure for obtaining it are more complicated than for pss, which can make the results less clear. Evaluating the dss, css, pss, and pcc statistics as described above can help explain the ppr results.

The ppr statistic was developed by Tiedeman et al. (2003) and was called the value of improved information (voii) statistic in that work. In this book, the statistic has been renamed to better reflect its purpose. The equation for the ppr statistic is derived using Eq. (8.1), which calculates the prediction standard deviation using a calibrated model with existing independent information about parameter values included as prior information. In this calculation, it is important that the parameter variance–covariance matrix for all parameters (defined in Chapter 7, Sections 7.2.1 and 7.2.5) be used. Then, the standard deviation is recomputed under the assumption of increased certainty in one or more parameter values. The difference in prediction standard deviation is used to calculate the ppr statistic. The parameter variance–covariance matrix in Eq. (8.1) is calculated as

$$ V(\mathbf{b}) = s^2 \left( X^T \omega X \right)^{-1} \qquad (8.4) $$
where $X$ and $\omega$ include sensitivities and weights for prior information as well as for observations used in the regression. To explain the method for calculating the ppr statistic, it is convenient to express $X$ and $\omega$ as

$$ X = \begin{bmatrix} X_{Y,PRI} \\ I \end{bmatrix} \qquad (8.5) $$

$$ \omega = \begin{bmatrix} \omega_{Y,PRI} & 0 \\ 0 & \omega_{ppr} \end{bmatrix} \qquad (8.6) $$

where $X_{Y,PRI}$ = the (ND + NPR) by NP matrix of sensitivities of the ND calibration observations and the NPR prior equations with respect to the NP model parameters, with elements equal to $\partial y'_i/\partial b_j$ (Eq. (4.1)); NP = the total number of defined model parameters, which may be greater than the number of estimated model parameters; $I$ = the NP by NP identity matrix (diagonal elements equal 1.0; all other elements equal 0.0); $\omega$ = the weight matrix, expressed here as in Appendix B; $\omega_{Y,PRI}$ = the (ND + NPR) by (ND + NPR) matrix of weights on observations and prior equations; $\omega_{ppr}$ = the NP by NP matrix used to calculate ppr statistics, defined after Eq. (8.7).

In calculating the variance–covariance matrix for all parameters, there is usually no prior information on parameters for which the calibration observations supply abundant information. For parameters supported better by independent information than by the calibration observations (commonly, these parameters are not estimated by the regression), it is important that prior information and associated weighting be specified, as discussed in Chapter 7, Section 7.2.5. By specifying prior weights in this manner, the parameter variance–covariance matrix calculated using Eq. (8.4) reflects actual levels of uncertainty, and the prediction uncertainty calculated using Eq. (8.1) reflects these realistic parameter uncertainties. The prediction uncertainty produced with improved information on one parameter is calculated using a modified form of Eq. (8.1):

$$ s_{z'_\ell}(j) = \left[ \sum_{k=1}^{NP} \sum_{m=1}^{NP} \frac{\partial z'_\ell}{\partial b_k}\, V(\mathbf{b})(j)_{km}\, \frac{\partial z'_\ell}{\partial b_m} \right]^{1/2} \qquad (8.7) $$
where $s_{z'_\ell}(j)$ = the standard deviation of the ℓth predicted value, $z'_\ell$, calculated with improved information on the jth parameter, $b_j$; NP = the total number of defined parameters; $V(\mathbf{b})(j)$ = the symmetric, square NP by NP parameter variance–covariance matrix for all parameters, calculated with improved information on the jth parameter, expressed as $V(\mathbf{b})(j) = s^2 (X^T \omega(j) X)^{-1}$; $s^2$ = identical to $s^2$ in Eq. (8.1), because the model has not been recalibrated and $s^2$ is still considered the best estimate of the true error variance $\sigma^2$; $\omega(j)$ = the weight matrix in which the jth parameter has improved information, expressed as

$$ \omega(j) = \begin{bmatrix} \omega_{Y,PRI} & 0 \\ 0 & \omega_{ppr}(j) \end{bmatrix}; $$
$\omega_{ppr}(j)$ = an NP by NP matrix in which all entries are zero except for the diagonal entry related to the jth parameter.

The matrix $\omega_{ppr}(j)$ is central to calculating the ppr statistic. Improved information on the jth parameter is implemented in this matrix by specifying a positive value on its jth diagonal. Conceptually, this positive value represents the increased certainty in the prior value that might result from collection of additional field data; that is, from improved information about the parameter. The consequence of including $\omega_{ppr}(j)$ is that the variance of the jth parameter, which has improved information, will be smaller in $V(\mathbf{b})(j)$ than in $V(\mathbf{b})$. Parameters that do not have improved information, but that are correlated with the jth parameter, also tend to have smaller variances in $V(\mathbf{b})(j)$ than in $V(\mathbf{b})$. Primarily because of the reductions in parameter variances, the prediction standard deviation calculated with improved information ($s_{z'_\ell}(j)$ of Eq. (8.7)) generally is smaller than the prediction standard deviation calculated without improved information ($s_{z'_\ell}$ of Eq. (8.1)). The scaled difference between $s_{z'_\ell}(j)$ and $s_{z'_\ell}$ measures the value of the improved information on the jth parameter with respect to prediction $z'_\ell$ and is calculated as

$$ ppr_\ell(j) = 100 \left( \frac{s_{z'_\ell} - s_{z'_\ell}(j)}{s_{z'_\ell}} \right) = 100 \left( 1 - \frac{s_{z'_\ell}(j)}{s_{z'_\ell}} \right) \qquad (8.8) $$
where $ppr_\ell(j)$ is the parameter–prediction statistic and equals the percent reduction in the standard deviation of prediction $z'_\ell$ that results from improved information on the jth parameter. To rank the importance of individual parameters to prediction $z'_\ell$, $ppr_\ell(j)$ is calculated NP times, each time with improved information on one parameter. The parameter associated with the largest value of $ppr_\ell(j)$ ranks as most important to prediction $z'_\ell$.

To implement improved information on each model parameter in a consistent manner, Tiedeman et al. (2003) suggest increasing the positive value on the diagonal of the $\omega_{ppr}(j)$ matrix until the standard deviation of the parameter estimate is decreased by a specified percent. This requires an iterative procedure. Tiedeman et al. (2003) specified a 10 percent decrease, which represents the situation in which improved, but not perfect, information is collected about a parameter. For diagonal $\omega_{ppr}(j)$ matrices, this specified percent decrease is implemented by increasing the value on the diagonal associated with the parameter in question; for full weight matrices it is less clear how to proceed, and this issue has not been investigated.

The method presented above is easily extended to the case of evaluating improved information on more than one parameter. When evaluating multiple parameters, parameter correlations can strongly influence which parameters are important to a prediction. As a result, the set of parameters with the highest individual $ppr_\ell(j)$ values may differ from the set of parameters that are most important when improved information on multiple parameters is considered. Tiedeman et al. (2003) present the method for the general case of improved information on any number of parameters. The computer program OPR-PPR (Tonkin et al., in press) can be used to calculate the ppr statistic.
OPR-PPR can easily calculate the ppr statistic for models developed and calibrated using UCODE_2005 or MODFLOW-2000, because it is designed to use their output files directly. OPR-PPR also can be used with other models if appropriate files are produced. Other methods for evaluating the importance of model parameters to model predictions include those developed for hydrologic models by Walker (1982), Melching et al. (1990), Indelman et al. (1996), Høybye (1998), Levy et al. (1998), and Levy and Ludy (2000). These are similar to the ppr statistic in that they incorporate parameter uncertainty and prediction sensitivity, but unlike the ppr statistic, most of these methods do not include the effects of parameter correlations.
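The single-parameter ppr procedure (Eqs. (8.5) through (8.8) with a 10 percent target reduction in parameter standard deviation) can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions: a linear model, a diagonal $\omega_{ppr}(j)$, made-up matrices, and simple bisection as the iterative procedure. It is not the algorithm implemented in OPR-PPR, and all function names are hypothetical.

```python
import numpy as np


def cov_all_parameters(X_ypri, omega_ypri, omega_ppr, s2):
    """Eqs. (8.4)-(8.6): V(b) = s^2 (X^T omega X)^{-1}, X = [X_Y,PRI; I]."""
    n, NP = X_ypri.shape
    X = np.vstack([X_ypri, np.eye(NP)])
    omega = np.zeros((n + NP, n + NP))
    omega[:n, :n] = omega_ypri
    omega[n:, n:] = omega_ppr
    return s2 * np.linalg.inv(X.T @ omega @ X)


def prediction_sd(dz_db, V):
    """Eq. (8.1) or Eq. (8.7) in vector form."""
    return float(np.sqrt(dz_db @ V @ dz_db))


def improved_weight(X_ypri, omega_ypri, s2, j, reduction=0.10, rtol=1e-12):
    """Bisection for the jth diagonal of omega_ppr(j) that cuts sd(b_j) by `reduction`."""
    NP = X_ypri.shape[1]

    def sd_bj(w):
        om = np.zeros((NP, NP))
        om[j, j] = w
        return np.sqrt(cov_all_parameters(X_ypri, omega_ypri, om, s2)[j, j])

    target = (1.0 - reduction) * sd_bj(0.0)
    lo, hi = 0.0, 1.0
    while sd_bj(hi) > target:   # sd(b_j) falls as w grows; expand the bracket
        hi *= 2.0
    while hi - lo > rtol * hi:  # bisect until the bracket is tiny
        mid = 0.5 * (lo + hi)
        if sd_bj(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ppr_statistics(dz_db, X_ypri, omega_ypri, s2, reduction=0.10):
    """Eq. (8.8) for each parameter, improving one parameter at a time."""
    NP = X_ypri.shape[1]
    V0 = cov_all_parameters(X_ypri, omega_ypri, np.zeros((NP, NP)), s2)
    s_base = prediction_sd(dz_db, V0)
    ppr = np.empty(NP)
    for j in range(NP):
        om = np.zeros((NP, NP))
        om[j, j] = improved_weight(X_ypri, omega_ypri, s2, j, reduction)
        s_j = prediction_sd(dz_db, cov_all_parameters(X_ypri, omega_ypri, om, s2))
        ppr[j] = 100.0 * (1.0 - s_j / s_base)
    return ppr


# Hypothetical system: 3 observations plus 1 prior equation, 3 parameters.
X_ypri = np.array([[1.0, 0.2, 0.0],
                   [0.5, 1.0, 0.3],
                   [0.0, 0.4, 1.0],
                   [0.9, 0.9, 0.1]])
omega_ypri = np.diag([1.0, 2.0, 1.0, 0.5])
s2 = 1.2
dz_db = np.array([0.3, 2.0, -0.5])

ppr = ppr_statistics(dz_db, X_ypri, omega_ypri, s2)
print(ppr, np.argsort(-ppr))  # largest ppr = most important parameter
```

For a single diagonal entry, the bisection result can be cross-checked with the Sherman–Morrison update, which gives $w = \left[ 1/(1-p)^2 - 1 \right] / C_{jj}$ with $C = (X_{Y,PRI}^T \omega_{Y,PRI} X_{Y,PRI})^{-1}$ and $p$ the target fractional reduction. Because $s^2$ multiplies both variance–covariance matrices, it cancels in Eq. (8.8).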
The calculation of the ppr statistic assumes that the model is linear with respect to the parameter values (see Chapter 1, Section 1.4.1). However, Tiedeman et al. (2003) found that the method is fairly robust for a mildly nonlinear groundwater model, in an application that is summarized in Chapter 15. Methods that account for model nonlinearity are presented by Sulieman et al. (2001) for models calibrated by regression, and by Saltelli et al. (2000) for the general case in which the model may not have been calibrated by regression. These methods are more complex than the method for calculating the ppr statistic and are substantially more computationally intensive. The basic concept embodied by the ppr statistic is the use of the first-order second-moment equation for prediction uncertainty (Eq. (8.1)) as a basis for assessing parameter importance. This concept has been used by other researchers as the basis for designing sampling networks for collecting data about the properties or parameters of groundwater systems. McLaughlin and Wood (1988) were among the first to investigate aquifer property sampling strategies in this context. McKinney and Loucks (1992), Sun and Yeh (1992), and Wagner (1995, 1999) incorporated this type of analysis into an optimization framework and developed methods for designing aquifer property sampling networks that minimize prediction uncertainty. The ppr statistic differs in that it is used as a tool for ranking the importance of all model parameters to any individual prediction.
8.3 USING PREDICTIONS TO GUIDE COLLECTION OF OBSERVATION DATA

Evaluating the importance of observations to predictions spans the entire observation–parameter–prediction triad discussed in Chapters 1 and 10. We present two methods for evaluating the importance of observations to predictions, both of which use sensitivity statistics and are computationally fast. The first, which uses scaled sensitivities and parameter correlation coefficients (pss, css, dss, and pcc), is more awkward than the second, which uses the observation–prediction (opr) statistic. The statistics available in classical regression methods to evaluate observation importance to predictions include jackknife and bootstrap methods, both of which require many regressions and therefore often require prohibitive amounts of computer execution time for models of environmental systems. Foglia et al. (in press) demonstrate that opr statistics perform comparably to leave-one-out cross-validation in the evaluation of a groundwater model.

8.3.1 Use of Prediction, Composite, and Dimensionless Scaled Sensitivities and Parameter Correlation Coefficients

Using pss, css, and dss together as illustrated in Figure 8.1 is one method for spanning the observation–parameter–prediction sequence to identify existing and potential observations important to the predictions. This figure was presented and discussed in Section 8.2.2 in the context of identifying parameters that are important to the predictions but are not well supported by the existing observations. A similar approach is also suggested by Merry et al. (2003).

The pss, css, and dss also can be evaluated with the primary objective of identifying important existing and potential observations. Identification of the most important existing observations involves first identifying the parameters most important to the predictions using the methods illustrated in Figure 8.1, then identifying observations with large dss for these parameters. In Figure 8.1, this analysis revealed the potentially problematic situation of only four observations providing information for parameter HK4. This same type of analysis can be used to identify observations most important to the predictions; in this case, observation types and locations with large dss are likely to be important, for example, to continue monitoring in the future (see Section 8.3.4). This type of analysis also can be used to identify important potential new observation types and locations, by calculating the dss for potential observations instead of for existing observations.

The importance of potential new observations to the predictions also can be evaluated with respect to parameter correlations, because pcc do not depend on the value of the observation. If the analysis of pcc without and with predictions (Section 8.2.3) shows that the predictions are likely to depend on parameter values that the regression could not estimate uniquely, then potential new observations could improve this situation if they enable unique estimation of the parameters. This can be evaluated before actually collecting the observations, by calculating the pcc with both the existing and the potential observations.
In this calculation, the simulated conditions might differ for the existing and potential observations, and both sets of conditions need to be properly represented. The pcc calculated with the existing and potential observations are then compared to those calculated with only the existing observations. If adding the potential observations reduces the absolute values of pcc that are very large when only the existing observations are included, then the potential observations probably are important to predictions that depend on the individual parameter values. A drawback of using the pss, css, dss, and pcc together in this manner is that the procedure is awkward: it can produce many graphs from which it can be difficult to extract the key results. However, these methods can be quite useful in providing insight about values of observation–prediction (opr) statistics.

8.3.2 Observation–Prediction (opr) Statistic
The opr statistic integrates the information contained in the fit-independent statistics dimensionless and composite scaled sensitivities (dss and css of Chapter 4), parameter correlation coefficients (pcc of Chapter 7), and prediction scaled sensitivities (pss of this chapter). As indicated in Sections 8.3.1 and 8.3.3, it often is useful to investigate those statistics to better understand opr results. The methodology for the opr statistic assumes that the model is linear with respect to the model parameters. Tests of this assumption are presented in Chapter 7, Section 7.7 and Section 8.7.
MODEL PREDICTIONS, DATA NEEDS, AND PREDICTION UNCERTAINTY
The observation-prediction (opr) statistic assesses the effect on the prediction standard deviation of either removing one or more existing observations or adding one or more new observations. This evaluation can address issues related to monitoring the state of the simulated system, as discussed in Section 8.3.4. Calculating the opr statistic does not involve recalibrating the model with these observations added or removed. Thus, the leverage of the observations, rather than their influence, is determined (leverage and influence are defined in Chapter 7, Section 7.3). The opr statistic requires trivial computational effort. In contrast, identifying existing observations that are influential with respect to the predictions requires jackknifing or similar methods, which repeat the nonlinear regression with one or more observations omitted (Efron, 1982; Good, 2001). In addition, influence cannot be determined for potential observations, because assessing influence requires the observed value in addition to other information associated with the potential observation, such as its type, location, and time. A modified version of Eq. (8.1) is used to evaluate the effect on prediction uncertainty of omitting or adding one observation (Hill et al., 2000; Tiedeman et al., 2004):
$$ s_{z'_\ell}(\pm i) = \left[ \sum_{i=1}^{NP} \sum_{j=1}^{NP} \frac{\partial z'_\ell}{\partial b_i}\, V(b)(\pm i)_{ij}\, \frac{\partial z'_\ell}{\partial b_j} \right]^{1/2} \qquad (8.9) $$
where $s_{z'_\ell}(\pm i)$ = the standard deviation of the $\ell$th predicted value, $z'_\ell$, calculated with the $i$th observation either added (+) or removed (-); NP = the number of defined parameters, which may exceed the number of estimated parameters; and $V(b)(\pm i)$ = the symmetric, square NP by NP parameter variance-covariance matrix for all parameters, with the $i$th observation either added or removed, calculated as
$$ V(b)(\pm i) = s^2 \left( X_{(\pm i)}^{T}\, \omega_{(\pm i)}\, X_{(\pm i)} \right)^{-1} \qquad (8.10) $$
where $s^2$ = identical to $s^2$ in Eq. (8.1), because the model has not been recalibrated and $s^2$ is still considered the best estimate of the true error variance $\sigma^2$; $X_{(\pm i)}$ = a sensitivity matrix formed by either adding (+) or removing (-) the sensitivities of the simulated equivalent of the $i$th observation; and $\omega_{(\pm i)}$ = the weight matrix formed by modifying matrix $\omega$ (defined after Eq. (8.6)), either by adding (+) or by removing (-) the weight associated with the $i$th observation. As for the ppr statistic, it is important that the parameter variance-covariance matrix for all parameters (Chapter 7, Sections 7.2.1 and 7.2.5) be calculated in Eq. (8.10), and that prior information and associated weighting be specified for parameters that are better supported by independent information than by the calibration observations.
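Eqs. (8.9) and (8.10) are simple enough to sketch directly, together with the percent change of Eq. (8.11). The Python below is a minimal illustration, not the OPR-PPR program: it assumes a diagonal weight matrix (one weight per observation), no prior information, small dense matrices held as lists of lists, and hypothetical function names (`mat_inv`, `param_covariance`, `prediction_sd`, `opr`). Following the practice described in this section, an observation is removed by zeroing its weight while leaving X unchanged, and $s^2$ is held fixed because the model is not recalibrated.

```python
import math


def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def param_covariance(s2, X, w):
    """V(b) = s^2 (X^T w X)^-1 with a diagonal weight matrix w (Eq. 8.10)."""
    npar = len(X[0])
    xtwx = [[sum(w[k] * X[k][i] * X[k][j] for k in range(len(X)))
             for j in range(npar)] for i in range(npar)]
    return [[s2 * v for v in row] for row in mat_inv(xtwx)]


def prediction_sd(dz, V):
    """[sum_i sum_j dz_i V_ij dz_j]^(1/2), the form of Eqs. (8.1) and (8.9)."""
    n = len(dz)
    return math.sqrt(sum(dz[i] * V[i][j] * dz[j] for i in range(n) for j in range(n)))


def opr(s2, X, w, dz, X_new=None, w_new=None, remove=None):
    """Percent change in prediction standard deviation, Eq. (8.11).

    remove: indices of existing observations to omit (weight set to zero,
    X unchanged). X_new / w_new: sensitivity rows and weights of potential
    observations to add. s2 stays fixed because nothing is recalibrated.
    """
    base = prediction_sd(dz, param_covariance(s2, X, w))
    X2, w2 = [row[:] for row in X], list(w)
    for i in remove or []:
        w2[i] = 0.0
    if X_new:
        X2 += [row[:] for row in X_new]
        w2 += list(w_new)
    changed = prediction_sd(dz, param_covariance(s2, X2, w2))
    return 100.0 * abs(1.0 - changed / base)
```

For example, with hypothetical sensitivity rows X = [[1, 0], [0, 1], [1, 1]], unit weights, $s^2$ = 1, and prediction sensitivities [1, 0], the prediction standard deviation is (2/3)^(1/2), about 0.816. Removing the first observation (`remove=[0]`) raises it to 2^(1/2), so opr is about 73.2 percent; adding a second observation with sensitivities [1, 0] and weight 1 lowers it to 0.4^(1/2), so opr is about 22.5 percent.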
8.3 USING PREDICTIONS TO GUIDE COLLECTION OF OBSERVATION DATA
In practice, the $i$th observation is removed by setting its weight equal to zero and leaving the sensitivity matrix X unchanged. An observation is added by calculating the related sensitivities and assigning a weight on the basis of an analysis of the errors expected for the potential observed value. The observed value itself does not affect the opr statistic, because $s^2$ from the regression is used in Eq. (8.10). The percent change in prediction uncertainty that results from removing or adding the $i$th observation is used as the measure of its importance to prediction $z'_\ell$:
$$ \mathrm{opr}_\ell(\pm i) = 100\, \frac{\left| s_{z'_\ell} - s_{z'_\ell}(\pm i) \right|}{s_{z'_\ell}} = 100\, \left| 1 - \frac{s_{z'_\ell}(\pm i)}{s_{z'_\ell}} \right| \qquad (8.11) $$
where $\mathrm{opr}_\ell(\pm i)$ is the observation-prediction statistic, and the vertical lines indicate absolute value. This method can easily be extended to evaluate adding or omitting any combination of existing or potential observations (Tiedeman et al., 2004). The computer program OPR-PPR (Tonkin et al., in press) can be used to calculate the opr statistic. As discussed in Section 8.2.5, this program can calculate the statistic for UCODE_2005 and MODFLOW-2000 models, as well as for any other model that produces the needed output files.
Both a strength and a weakness of the opr statistic, and indeed of all statistics calculated using a model, is that they reflect the model's simplifications and approximations. Generally, the model is the best available representation of the system in question, and as such it is important to consider model-calculated statistics. Close evaluation of results that do not make sense can help improve the model, as discussed in the next section.
8.3.3 Insights About the opr Statistic from Other Fit-Independent Statistics
The reasons that certain observations rank as important to the predictions by the opr statistic can determine what action is advisable on the basis of the opr results. For example, large values of opr might be caused by aspects of model construction that are unrealistic.
In that case the appropriate response is to fix the model, which is likely to have the added benefit of allowing other, more accurately simulated observations to have greater influence on the simulated results. In other circumstances the opr analysis may reveal plausible improvements in data-collection strategies. Several of the fit-independent statistics discussed in previous chapters can help reveal why particular observations have large opr statistics. The contribution of the prediction sensitivities in Eq. (8.9) to the opr statistic can be investigated using the pss of Eq. (8.2). The contribution of the variance-covariance matrix can be investigated by first noting that Eq. (8.11) is designed so that the $s^2$ term of Eq. (7.1) cancels out, causing the opr statistic to be fit-independent. Thus, only the term $(X^T \omega X)^{-1}$ from Eq. (7.1) remains. The contribution of this term to the opr statistic can be investigated by considering the dimensionless and composite scaled sensitivities (dss of Eq. (4.3) or (4.5) and css of Eq. (4.6)) and the parameter correlation coefficients (pcc of Eq. (7.5)). Generally, if (1) observation $y_i$ provides substantial information about parameter $b_j$, or about a parameter $b_k$ correlated with $b_j$ ($\mathrm{dss}_{ij}$ is large), and (2) $b_j$ is important to prediction $z'_\ell$ ($\mathrm{pss}_{\ell j}$ is large), then it is likely that $\mathrm{opr}_\ell(\pm i)$ will be large.
It is important to be aware that, in some situations, values of dss can be large because of simplifications made during model construction rather than because of actual hydrogeologic conditions. For example, consider a groundwater system in which a pumping well draws water from a localized zone of high hydraulic conductivity, but in a regional model of this system, the cell containing the well has a much lower hydraulic conductivity. Because the pumping well is simulated in a cell of low conductivity, hydraulic head in this cell will have a relatively large sensitivity to the hydraulic conductivity of the cell. In this case, an existing or potential hydraulic-head observation in this cell may not actually be as important to a prediction as the opr statistic indicates. Additional insight into why certain observations can rank as important using opr is provided in Exercises 8.1d and 8.1f. Also, Tiedeman et al. (2004) apply the opr statistic to a groundwater flow model with advective-transport predictions. This application is summarized in Chapter 15.
8.3.4 Implications for Monitoring Network Design
Minsker (2003) summarizes methods and applications for groundwater monitoring network design. The prediction standard deviation of Eq. (8.1) has been used by many authors for monitoring network design. For example, Sun and Yeh (1990b) and Wagner (1995, 1999) used Eq. (8.1) together with optimization methods to determine an optimal set of groundwater observations for minimizing prediction uncertainty. Reeves et al. (2000) used it to identify new data locations most beneficial to groundwater remediation designs. Valstar and Minnema (2003) used a Bayesian method that considers prediction uncertainty. A strength of these methods and the opr statistic of Eq. (8.11) is that they can be used to evaluate and rank individual and user-defined groups of observations by their importance to predictions.
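To make the ranking idea concrete, the sketch below greedily selects potential observations one at a time, each time taking the candidate whose addition most reduces the prediction standard deviation, and reports the cumulative opr of Eq. (8.11). It is a simple heuristic for illustration, not the optimization methods of Sun and Yeh (1990b) or Wagner (1995, 1999). Instead of re-inverting $X^T \omega X$, it applies the Sherman-Morrison rank-one update (a standard matrix identity, swapped in here for convenience) with $s^2$ held fixed as in Eq. (8.10): adding an observation with sensitivity row x and weight w changes V to V - w (Vx)(Vx)^T / (s^2 + w x^T V x). The function names and the candidate format are hypothetical.

```python
import math


def pred_sd(c, V):
    """Prediction standard deviation [c^T V c]^(1/2), the form of Eq. (8.1)."""
    n = len(c)
    return math.sqrt(sum(c[i] * V[i][j] * c[j] for i in range(n) for j in range(n)))


def add_obs_update(V, x, w, s2):
    """V(b) after adding one observation with sensitivity row x and weight w.

    Sherman-Morrison update of s2 * (X^T w X)^-1 with s2 held fixed:
    V_new = V - w (V x)(V x)^T / (s2 + w x^T V x). V must be symmetric.
    """
    n = len(x)
    Vx = [sum(V[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = s2 + w * sum(x[i] * Vx[i] for i in range(n))
    return [[V[i][j] - w * Vx[i] * Vx[j] / denom for j in range(n)] for i in range(n)]


def greedy_network(V, c, candidates, s2, n_pick):
    """Select up to n_pick candidates one at a time.

    candidates maps a label to (sensitivity row, weight). Returns a list of
    (label, cumulative opr in percent relative to the existing network).
    """
    base = pred_sd(c, V)
    remaining = dict(candidates)
    V_cur = [row[:] for row in V]
    chosen = []
    while remaining and len(chosen) < n_pick:
        def sd_after(label):
            x, w = remaining[label]
            return pred_sd(c, add_obs_update(V_cur, x, w, s2))
        best = min(remaining, key=sd_after)
        x, w = remaining.pop(best)
        V_cur = add_obs_update(V_cur, x, w, s2)
        chosen.append((best, 100.0 * abs(1.0 - pred_sd(c, V_cur) / base)))
    return chosen
```

Because each step conditions on the observations already chosen, a greedy ranking of this kind can differ from a ranking of individual observations, which is one reason user-defined groups of observations are also worth evaluating.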
8.4 QUANTIFYING PREDICTION UNCERTAINTY USING INFERENTIAL STATISTICS
Prediction uncertainty can be evaluated and quantified using inferential statistics and/or Monte Carlo analysis. The Monte Carlo method is discussed in Section 8.5. In both techniques, the magnitude of prediction uncertainty is related to the uncertainty in the model parameters and the sensitivity of the predicted quantities to the model parameters. The inferential methods discussed here produce intervals on predictions; larger intervals indicate greater uncertainty. The methods are sometimes called first-order, second-moment (FOSM) methods: first order because they are linear, and second moment because they use standard deviations, which are second-moment statistics.
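The first-order, second-moment idea can be seen by comparing the linear propagation $[g^T V g]^{1/2}$ (the form of Eq. (8.1)) with a brute-force sample. The sketch below is illustrative only: it assumes the parameters are multivariate normal with mean b' and variance-covariance matrix V, which is a simplification and not the Monte Carlo procedure of Section 8.5; all names and numbers are hypothetical. For a prediction that is linear in the parameters the two estimates agree to within sampling error; for a nonlinear prediction they can differ.

```python
import math
import random


def fosm_sd(grad, V):
    """First-order, second-moment standard deviation [g^T V g]^(1/2)."""
    n = len(grad)
    return math.sqrt(sum(grad[i] * V[i][j] * grad[j] for i in range(n) for j in range(n)))


def cholesky(V):
    """Lower-triangular L with L L^T = V (V symmetric positive definite)."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def monte_carlo_sd(f, b_opt, V, n_samples=20000, seed=1):
    """Sample standard deviation of f(b) for b ~ N(b_opt, V) (normality assumed)."""
    rng = random.Random(seed)
    L = cholesky(V)
    n = len(b_opt)
    values = []
    for _ in range(n_samples):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [b_opt[i] + sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(n)]
        values.append(f(b))
    mean = sum(values) / n_samples
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n_samples - 1))


if __name__ == "__main__":
    V = [[0.04, 0.01], [0.01, 0.09]]   # hypothetical V(b')
    b_opt = [1.0, 3.0]                 # hypothetical b'
    # Linear prediction 2*b1 - b2: the two estimates should agree closely.
    print(fosm_sd([2.0, -1.0], V), monte_carlo_sd(lambda b: 2.0 * b[0] - b[1], b_opt, V))
    # Nonlinear prediction exp(b1*b2/3), gradient evaluated at b_opt.
    g = [b_opt[1] / 3.0 * math.exp(1.0), b_opt[0] / 3.0 * math.exp(1.0)]
    print(fosm_sd(g, V), monte_carlo_sd(lambda b: math.exp(b[0] * b[1] / 3.0), b_opt, V))
```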
Elementary texts that discuss inferential methods include Ott (1993, pp. 201-204) and Davis (2002, pp. 200-204). More advanced references include Seber and Wild (1989), Cooley and Naff (1990), Hill (1994), Helsel and Hirsch (2002), Glasgow et al. (2003), and Stauffer et al. (2004).
8.4.1 Definitions
The intervals discussed in this book can be individual or simultaneous intervals, and they can be confidence or prediction intervals. Thus, four types of intervals are possible: individual confidence intervals, individual prediction intervals, simultaneous confidence intervals, and simultaneous prediction intervals. The four terms are described in the following sections.
Individual Intervals
An individual confidence or prediction interval is said to have a $(1 - \alpha)$ probability of including the true value of one predicted quantity, where $\alpha$ is the significance level; $\alpha$ = 0.05 produces 95-percent intervals.
Simultaneous Intervals
Simultaneous intervals have the specified probability of containing their respective true predicted values simultaneously. Because they account for uncertainty in more than one quantity at once, simultaneous intervals are always at least as large as the equivalent individual intervals. To understand this, consider 95-percent intervals on a set of predictions. If calculated using Monte Carlo methods, individual intervals would need to be set so that each interval, considered by itself, contains the predictions produced by 950 of 1000 randomly generated sets of parameter values. Simultaneous intervals, on the other hand, would need to be set so that, for 950 of the 1000 parameter sets, all of the intervals contain their respective predictions at the same time. As more intervals are considered, the intervals tend to become larger. The size of linear simultaneous intervals increases until the number of intervals equals the number of parameters included in the uncertainty analysis; additional intervals do not increase their size further. Nonlinear simultaneous intervals generally behave similarly, but there may be exceptions.
Confidence Intervals
Confidence intervals on predictions are intervals that, with a specified probability, contain the true, unknown predictions, if the model is correct. Confidence intervals reflect the uncertainty with which the parameters are estimated, as represented by the variance-covariance matrix on the parameters, projected using prediction sensitivities (Eq. (8.1)).
Prediction Intervals
Prediction intervals account for the same uncertainty in the parameter values reflected in confidence intervals, but also account for the random error incurred when the predicted quantity is measured. A prediction interval is needed if the interval is to be compared with a measurement of the prediction. Prediction intervals are most often calculated for predictions and rarely for parameters. The use of the term "prediction" to describe both a type of interval and the quantity for which the interval is constructed is confusing, but well established in the statistical literature.
8.4.2 Linear Confidence and Prediction Intervals on Predictions
All linear confidence intervals have the form
$$ z'_\ell \pm [\text{critical value}]\, s_{z'_\ell} \qquad (8.12) $$
where $z'_\ell$ is the $\ell$th simulated value; $s_{z'_\ell}$ is the standard deviation of the prediction, calculated as shown in Eq. (8.1); and critical value is a critical value from a statistical distribution. Critical values for four types of intervals are defined in Table 8.1. All linear prediction intervals have the form
$$ z'_\ell \pm [\text{critical value}] \left( s_{z'_\ell}^2 + s_a^2 \right)^{1/2} \qquad (8.13) $$
where $s_a$ is the product of (1) the standard error of the regression, $s$ (defined after Eq. (6.1) in Chapter 6, Section 6.3.2), and (2) the standard deviation of the error associated with a measured equivalent of the prediction (Hill, 1994, p. 32; Miller, 1981). Thus, to calculate prediction intervals the modeler needs to estimate the likely uncertainty in a measurement of the predicted value. Strategies for estimating this uncertainty are similar to those for observations discussed in Chapter 3, Sections 3.3.3 and 3.4.2, and in Guideline 6 in Chapter 11. The calculation of linear confidence and prediction intervals can (and often should) include more parameters than were included in the regressions performed for model calibration. Thus, when calculating $s_{z'_\ell}$ by Eq. (8.1), the parameter variance-covariance matrix of Eq. (7.1) often will be the parameter variance-covariance matrix with all parameters and with realistic weighting, as defined in Chapter 7, Sections 7.2.1 and 7.2.5, and discussed in Sections 8.2.5 and 8.3.2.
TABLE 8.1 Critical Values Required in Eqs. (8.12) and (8.13) to Calculate the Linear Confidence and Prediction Intervals Used in This Book
| Type of Interval | Critical Value | Table in Appendix D |
|---|---|---|
| Individual | $t_s(n, 1.0 - \alpha/2)$ | D.2, Student-t distribution |
| Bonferroni simultaneous | $t_B(n, 1.0 - \alpha/(2k))$ | D.6, Bonferroni t statistic |
| Scheffé d = k simultaneous | $[d\, F_\alpha(d, n)]^{1/2} = [k\, F_\alpha(k, n)]^{1/2}$ | D.7, F distribution |
| Scheffé d = NP simultaneous | $[d\, F_\alpha(d, n)]^{1/2} = [NP\, F_\alpha(NP, n)]^{1/2}$ | D.7, F distribution |
Note: $\alpha$ is the significance level and is commonly 0.05 or 0.10 (5 or 10 percent), which results in 95- or 90-percent intervals, respectively; n is the degrees of freedom, here equal to ND + NPR - NP; k is the number of simultaneous intervals or NP, whichever is smaller; NP is the number of parameters for which sensitivities are used in Eq. (8.1). NP commonly equals either the number of estimated parameters or the number of defined parameters.
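The critical values of Table 8.1 and the intervals of Eqs. (8.12) and (8.13) can also be computed without printed tables. The sketch below is a minimal pure-Python illustration under stated assumptions: Student-t and F quantiles are obtained from the regularized incomplete beta function (evaluated with the continued fraction popularized by Numerical Recipes) followed by bisection, whereas in practice the tables of Appendix D or a statistics library would normally be used. The simultaneous critical value follows the rule given in this section: for k no larger than NP, the smaller of the Bonferroni and Scheffé d = k values (at k = NP the Scheffé d = k and d = NP values coincide); otherwise, including infinite k, Scheffé d = NP. All function names are hypothetical, and `s_a` is the product of s and the measurement-error standard deviation defined after Eq. (8.13).

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def _invert(cdf, p):
    """Bisection for x >= 0 with cdf(x) = p, for an increasing cdf."""
    lo, hi = 0.0, 1.0
    while cdf(hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return 0.5 * (lo + hi)


def t_quantile(p, n):
    """Student-t quantile t_s(n, p) for p >= 0.5."""
    return _invert(lambda t: 1.0 - 0.5 * _betai(n / 2.0, 0.5, n / (n + t * t)), p)


def f_upper(alpha, d1, d2):
    """F_alpha(d1, d2): the value exceeded with probability alpha."""
    return _invert(lambda f: _betai(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2)), 1.0 - alpha)


def individual_cv(alpha, n):
    return t_quantile(1.0 - alpha / 2.0, n)


def simultaneous_cv(alpha, n, k, npar):
    """Smaller of Bonferroni and Scheffe d = k if k <= NP, else Scheffe d = NP."""
    if k <= npar:
        bonferroni = t_quantile(1.0 - alpha / (2.0 * k), n)
        scheffe_k = math.sqrt(k * f_upper(alpha, k, n))
        return min(bonferroni, scheffe_k)
    return math.sqrt(npar * f_upper(alpha, npar, n))


def linear_interval(z, s_z, cv, s_a=None):
    """Confidence interval, Eq. (8.12), or prediction interval, Eq. (8.13), if s_a is given."""
    spread = s_z if s_a is None else math.sqrt(s_z ** 2 + s_a ** 2)
    return z - cv * spread, z + cv * spread
```

For instance, with n = 10 degrees of freedom and $\alpha$ = 0.05, `individual_cv(0.05, 10)` is about 2.228, the tabulated Student-t value, and `simultaneous_cv(0.05, 10, math.inf, NP)` gives the Scheffé d = NP value needed when the number of intervals is not finite.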
Linear individual and simultaneous confidence and prediction intervals for predictions are listed in output files produced by MODFLOW-2000 and the computer program YCINT-2000 (Hill et al., 2000), or by UCODE_2005 and the computer program LINEAR_UNCERTAINTY (Poeter et al., 2005). Calculation of linear intervals requires only the sensitivities calculated for the optimized parameter values and, therefore, takes very little computer execution time. Individual confidence intervals constructed using the critical value of Table 8.1 and Eq. (8.12) are exact if the model is linear and satisfies the requirements of Chapter 3, Section 3.3, as tested for using the methods of Chapters 6, 7, and 8. "Exact" means that the intervals have the stated probability of including the true value. Exact critical values for linear simultaneous intervals are difficult to calculate, but can be approximated using the Bonferroni, Scheffé d = k, and Scheffé d = NP critical values of Table 8.1, as discussed by Miller (1981). These approximate critical values tend to be too large. For example, an interval calculated for a 5-percent significance level (a 95-percent interval) may be large enough to satisfy a smaller significance level such as 3 percent (a 97-percent interval). Linear simultaneous intervals therefore tend to indicate that the uncertainty is greater than it really is. If k is less than NP, either the Bonferroni or the Scheffé d = k critical value could be used. Both tend to be too large, so using the smaller of the two reduces the error. If k is larger than NP, Scheffé d = NP critical values are needed. In some cases, k is not finite. For example, if a prediction of interest is the largest simulated value over a defined area, the predicted quantity cannot be exactly specified before performing a model simulation, so an infinite number of simultaneous predictions need to be considered. In this circumstance, the Scheffé d = NP critical value is needed. These intervals are denoted Scheffé d = NP intervals in Table 8.1 and throughout this book; in other publications the term Scheffé interval almost always refers to these d = NP intervals. For all linear intervals, as the model becomes nonlinear and violates the requirements of Chapter 3, Section 3.3, the calculated intervals become less accurate. This means that the actual significance level can differ substantially from the intended value, which is a serious concern for nonlinear models (Donaldson and Schnabel, 1987). In some non-ideal situations linear intervals may be accurate enough to be useful, as discussed in Section 8.4.3. It is hoped that accumulating experience will provide additional guidance on when the computationally expensive nonlinear intervals are needed.
8.4.3 Nonlinear Confidence and Prediction Intervals
For nonlinear models, nonlinear intervals are sometimes much more accurate than linear intervals. Nonlinear intervals can be calculated using the methods of Vecchia and Cooley (1987). These methods compute individual or simultaneous intervals on any function of the model parameters, g(b). The intervals can be individual or simultaneous Scheffé d = NP confidence intervals or individual prediction intervals. To obtain a nonlinear interval on a parameter, the function g(b) is specified to represent the parameter; this situation was discussed in Chapter 7, Section 7.5.1. Here, we consider that g(b) represents a prediction.
Calculating nonlinear intervals involves determining the minimum and maximum values of g(b) over a confidence region on the parameter set listed in vector b. The confidence region is defined in NP-dimensional parameter space and has a specified probability of containing the true set of parameter values. Vecchia and Cooley (1987) present methods for calculating intervals using exact confidence regions and using approximate likelihood confidence regions. The method that uses the likelihood confidence region is presented here for three reasons: (1) determining the exact confidence region is mathematically more difficult than determining the likelihood region; (2) it has been shown that the probability that b lies in the likelihood region is very close to the true probability determined using the exact confidence region (Donaldson and Schnabel, 1987); and (3) this method is used in MODFLOW-2000's UNC Process (Christensen and Cooley, 2005), UCODE_2005 (Poeter et al., 2005), and PEST (Doherty, 2005). The method for computing nonlinear confidence intervals involves first defining the $(1 - \alpha)100$-percent likelihood parameter confidence region. This region is defined as the set of parameter values for which the objective-function values, S(b), satisfy the following condition (modified from Vecchia and Cooley, 1987, Eq. (10), and Christensen and Cooley, 2005, Eqs. (8) and (19)):
$$ S(b) \le S(b') + s^2\, [\text{critical value}] + a \qquad (8.14) $$
where $S(b')$ = the objective function for the optimal parameter values $b'$; $s^2$ = the calculated error variance defined in Eq. (6.1); critical value = a critical value from a statistical distribution, with the critical values required for different types of intervals defined in Table 8.2; and $a$ = 0.0 for confidence intervals and, for prediction intervals, reflects the accuracy of a measured equivalent of the prediction.
TABLE 8.2 Critical Values for Eq. (8.14) Required to Calculate the Nonlinear Confidence and Prediction Intervals Used in This Book
| Type of Interval | Critical Value | Table in Appendix D |
|---|---|---|
| Individual confidence | $c_c\, [t_s(n, 1.0 - \alpha/2)]^2$ | D.2, Student-t distribution |
| Scheffé d = NP simultaneous confidence | $c_r\, [NP\, F_\alpha(NP, n)]$ | D.7, F distribution |
| Individual prediction | $c_p\, [t_s(n, 1.0 - \alpha/2)]^2$ | D.2, Student-t distribution |
Note: $\alpha$ is the significance level and is commonly 0.05 or 0.10 (5 or 10 percent), which results, respectively, in 95- or 90-percent parameter confidence regions and intervals. n is the degrees of freedom, here equal to ND + NPR - NP. NP is the dimension of the parameter space, which commonly equals the number of estimated parameters. ND is the number of observations and NPR is the number of prior information equations. $c_c$, $c_r$, and $c_p$ are correction factors defined by Christensen and Cooley (2005), as discussed by Poeter et al. (2005, Table 38). The correction factors are set to 1.0 for the results presented in this book.
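The geometry of Eq. (8.14) and Figure 8.3 can be illustrated with a brute-force search. The sketch below is not the method of Vecchia and Cooley (1987), which solves an optimization problem for each interval limit; instead it assumes a two-parameter model, evaluates S(b) on a rectangular grid that must enclose the whole confidence region, approximates S(b') by the smallest grid value, uses a correction factor of 1.0 and $a$ = 0 (a confidence interval), and reports the smallest and largest g(b) among grid points inside the region. The exponential-decay model, its data, and the function names are hypothetical, and the value 2.776 is the Student-t value for 4 degrees of freedom and $\alpha$ = 0.05. Grid search quickly becomes impractical beyond two or three parameters.

```python
import math


def frange(lo, hi, n):
    """n evenly spaced values from lo to hi, inclusive."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def nonlinear_interval(S, g, grid1, grid2, s2, crit, a=0.0):
    """Approximate limits of g(b) over the region S(b) <= S(b') + s2 * crit + a.

    S(b') is approximated by the smallest objective-function value on the
    grid, so the grid must bracket both the optimum and the whole region.
    """
    points = [(b1, b2) for b1 in grid1 for b2 in grid2]
    s_values = [S(b) for b in points]
    bound = min(s_values) + s2 * crit + a
    inside = [g(b) for b, s_b in zip(points, s_values) if s_b <= bound]
    return min(inside), max(inside)


if __name__ == "__main__":
    # Hypothetical data for y = b1 * exp(-b2 * t), unit weights.
    t_obs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    y_obs = [10.2, 6.0, 3.8, 2.1, 1.4, 0.7]

    def S(b):  # weighted least-squares objective function
        return sum((y - b[0] * math.exp(-b[1] * t)) ** 2 for t, y in zip(t_obs, y_obs))

    def g(b):  # prediction: extrapolated value at t = 8
        return b[0] * math.exp(-b[1] * 8.0)

    grid1, grid2 = frange(9.0, 11.5, 201), frange(0.4, 0.65, 201)
    s_min = min(S((b1, b2)) for b1 in grid1 for b2 in grid2)
    n = len(t_obs) - 2            # degrees of freedom, ND - NP
    s2 = s_min / n                # calculated error variance
    crit = 2.776 ** 2             # [t_s(4, 0.975)]^2, individual confidence
    print(nonlinear_interval(S, g, grid1, grid2, s2, crit))
```

For a model that is linear in its parameters, the region is an ellipse and the grid result approaches the linear individual interval of Eq. (8.12) as the grid is refined; for a nonlinear model the limits may be asymmetric about g(b'), as in Figure 8.3.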
The quantity on the right-hand side of Eq. (8.14) defines the bounding surface of the parameter confidence region. The term $a$ is not discussed further in this book; for additional information see Christensen and Cooley (2005, pp. 11-12). Some characteristics of nonlinear intervals can be investigated using Eq. (8.14) and the critical values from Table 8.2 with ND + NPR - NP substituted for n. All of the critical values generally increase as NP increases or as ND + NPR - NP decreases. The product $NP\, F_\alpha(NP, ND + NPR - NP)$ generally increases with NP. Thus, the parameter confidence region and the nonlinear intervals are larger for poorer model fits (larger values of $s^2$), more parameters (larger NP), and fewer observations (smaller ND).
After defining the $(1 - \alpha)100$-percent parameter confidence region, the method finds the minimum and maximum values of the prediction g(b) on the boundary of this region. These extreme values are the lower and upper limits of the $(1 - \alpha)100$-percent nonlinear confidence interval on g(b). Figure 8.3 illustrates the confidence region and the limits of a nonlinear confidence interval on a simple prediction g(b) made with a hypothetical two-parameter model. Unlike a linear interval, the nonlinear confidence interval is not symmetric about the value calculated using the optimized parameter values, $g(b')$: the upper limit of the interval ($g(b) = c_4$) is much farther from $g(b')$ than is the lower limit ($g(b) = c_2$).
FIGURE 8.3 The geometry of a nonlinear confidence interval on prediction $g(b')$. The parameter confidence region (shaded area), contours of constant g(b) (dashed lines), and locations of the minimum ($g(b) = c_2$, with $b = b_L$) and maximum ($g(b) = c_4$, with $b = b_U$) values of the prediction on the confidence region are shown. The lower and upper limits of the nonlinear confidence interval on prediction g(b) are thus $c_2$ and $c_4$, respectively. (Adapted from Christensen and Cooley, 1999, Figure 9.)
Nonlinear confidence intervals can be larger or smaller than the corresponding linear confidence intervals and, as shown in Figure 8.3, can be asymmetric about the estimated value. These characteristics are illustrated in Figure 8.4, which shows linear and nonlinear intervals calculated by Christensen and Cooley (1999) for hydraulic-head predictions in a groundwater flow model of an aquifer in Denmark. For these predictions, the differences between the linear and nonlinear intervals tend to increase as the size of the intervals increases. For relatively small intervals, the linear and nonlinear intervals are roughly the same size. For some of the larger intervals, such as intervals 9, 13, 17, and 20, the linear confidence interval is larger than the nonlinear confidence interval. For others, such as intervals 4, 7, and 8, the nonlinear confidence interval is larger. Some of the nonlinear intervals, such as 4, 7, 8, and 13, are highly asymmetric.
FIGURE 8.4 Linear and nonlinear Scheffé simultaneous confidence intervals on hydraulic heads predicted by a steady-state groundwater flow model of an aquifer in Denmark. (From Christensen and Cooley, 1999, Figure 5a.)
A significant difference in the evaluation of linear and nonlinear confidence intervals involves the assumptions that apply to the calculation of the different intervals. Recall that three important assumptions apply for linear confidence intervals to be accurate: (1) the model is correct, (2) the model is linear, and (3) the true errors are normally distributed. For nonlinear confidence intervals, only the first assumption is needed. To the extent that model nonlinearity and deviations from normality of the weighted residuals are problematic, nonlinear intervals are likely to be more accurate than the associated linear intervals.
Calculating nonlinear intervals is computationally intensive because of the difficulty of determining the extreme values of g(b) over the confidence region. Calculating each limit of each nonlinear confidence interval requires computational effort approximately equivalent to a full nonlinear regression. Furthermore, it is good practice to calculate each limit using a few different starting parameter values, because the results can depend on these values. It has been stressed in this book that it is important to include, in evaluations of prediction uncertainty, defined parameters that were not estimated in the regression.
Though including such parameters in the calculation of nonlinear intervals has not been investigated, it is likely that their inclusion will cause difficulties in determining the interval limits. If that proves to be the case, an advantage of the linear intervals would be their ability to include the effects of these parameters. This difference can be significant if the added parameters are important to predictions. The importance of parameters to predictions can be evaluated using the methods described earlier in this chapter, in Section 8.2. The substantial effort required to compute nonlinear confidence and prediction intervals suggests that a practical approach is first to calculate linear intervals, and then to calculate nonlinear intervals for selected predictions. Additional nonlinear intervals may be needed depending on the discrepancies between the linear and nonlinear intervals and the requirements of the uncertainty evaluation.
8.4.4 Using the Theis Example to Understand Linear and Nonlinear Confidence Intervals
The nonlinear and linearized objective-function surfaces shown in Figure 8.5 (modified from Figure 5.3) can be used to better understand linear and nonlinear confidence intervals on predictions. Each contour of the nonlinear objective-function surface (Figure 8.5a) can be related to a significance level for inferential statistics. Contours that are closer to the minimum of the nonlinear surface relate to larger significance levels. Consider a situation like that in Figure 8.5b, in which the objective-function surface has been linearized around a point close to or equal to the minimum of the objective function. If the designated significance level is large enough, then the contour of this linearized surface will be close to the associated contour on the nonlinear surface. This is illustrated by comparing the contours close to the large dot in Figures 8.5a and 8.5b. In this case, the inferential statistics calculated using linear theory are likely to be accurate if the other required assumptions hold (that the model is correct and the weighted residuals are normally distributed). As the significance level declines, a contour more distant from the optimal parameter values is needed, a broader range of parameter values is included in the interval of interest, and more nonlinear parts of the objective-function surface become important. This is illustrated by comparing contours distant from the large dot in Figures 8.5a and 8.5b. In this circumstance, the stated significance level for the critical value used to calculate linear confidence intervals ($\alpha$ in Table 8.1) becomes less reliable. Thus, while a 90-percent confidence interval (10-percent significance level) might be well estimated using linear theory in a certain situation, a 99-percent confidence interval (1-percent significance level) might not.
FIGURE 8.5 Objective functions for the Theis problem defined in Figure 5.3. (a) Objective function for the nonlinear model, with the minima (†) and the linearization () points near to and far from the minima. (b) Objective function for model linearized about the S and T values at . The point † is the minimum for the nonlinear model.
8.4.5 Differences and Their Standard Deviations, Confidence Intervals, and Prediction Intervals
In many management situations, the prediction is the change in a simulated quantity under certain conditions. For example, in a groundwater model, the relevant predictions might be drawdowns or changes in flow to a stream caused by changes in pumping rates. Such changes are called differences here and are calculated by subtracting values produced by a base simulation from values produced by a predictive simulation. Thus, the difference, $u'_\ell$, is calculated as
$$ u'_\ell = z'_{p\ell} - z'_{q\ell} \qquad (8.15) $$
where p represents the predictive conditions and q represents the base conditions. Figure 8.6 illustrates the concept of differences. In the simple groundwater flow system shown in Figure 8.6a, hydraulic heads and flow to a river were simulated for steady-state calibration conditions and for steady-state conditions representing two pumping scenarios. If the management criterion is that hydraulic heads cannot decline by more than 2 meters, simulated drawdown is of interest. This difference is calculated by subtracting simulated hydraulic heads for pumping scenario 1 or 2 from simulated hydraulic heads for the calibration conditions. If the management criterion is that the streamflow gain along a reach must not decrease by more than 20 percent of an observed flow, the simulated change in streamflow is of interest. This difference is calculated by subtracting the streamflow gain simulated for pumping scenario 1 or 2 from the streamflow gain simulated for the calibration conditions. To compute the standard deviation of a difference, $z'_\ell$ of Eq. (8.1a) is replaced with the difference, $u'_\ell$, yielding
$$ s_{u'_\ell} = \left[ \sum_{i=1}^{NP} \sum_{j=1}^{NP} \left( \frac{\partial z'_{p\ell}}{\partial b_i} - \frac{\partial z'_{q\ell}}{\partial b_i} \right) V(b')_{ij} \left( \frac{\partial z'_{p\ell}}{\partial b_j} - \frac{\partial z'_{q\ell}}{\partial b_j} \right) \right]^{1/2} \qquad (8.16) $$
where $s_{u'_\ell}$ is the standard deviation of the difference $u'_\ell$. To compute confidence or prediction intervals on differences, $z'_\ell$ of Eqs. (8.12) and (8.13) is replaced with $u'_\ell$, and $s_{z'_\ell}$ of these equations is replaced with $s_{u'_\ell}$. In Eq. (8.13), $s_a^2$ is calculated as
$$ s_a^2 = s_{a_p}^2 + s_{a_q}^2 \qquad (8.17) $$
FIGURE 8.6 (a) Cross section through a simple groundwater system at steady state, showing simulated hydraulic head for calibration conditions and two predictive scenarios with pumping. (b) Hydrograph of a transient groundwater system showing simulated hydraulic head during a calibration period and during two predictive scenarios with pumping. Quantities $u_1$, $u_2$, and $u_3$ are differences that may be of interest.
The calculation of confidence and prediction intervals on differences involves differences in the sensitivities, as shown in Eq. (8.16). If the sensitivities to each of the parameters are the same for the two subtracted values, $s_{u'_\ell}$ equals zero and the limits of the calculated confidence intervals on differences each equal the simulated difference $u'_\ell$. As a result, the width of these confidence intervals equals zero, and prediction intervals reflect only $s_a^2$. An unrealistic but informative example illustrates this circumstance. If all conditions, including stresses, are the same in the two simulations for which differences are calculated, then all differences equal zero and the confidence interval limits on differences all equal zero. This result expresses the certainty that if simulated conditions do not change, the simulated values do not change. A more realistic situation is that the differences between two simulations are small, in which case the confidence intervals on the differences also will tend to be small.
For some parameters, sensitivities might be identical for the two predictions being subtracted, so the difference in sensitivity will equal zero. For example, consider a groundwater flow simulation with areal recharge and pumpage in which all model layers are confined and all boundary conditions are linear. Predictions are changes in head or flow to a stream. In such a simulation, a specified increase in areal recharge produces the same increase in hydraulic head or flow at any location in the system regardless of the simulated pumpage. Thus, the sensitivities related to areal recharge are independent of the pumpage that causes a difference in heads or flows, and uncertainty in the recharge rate does not affect the uncertainty of the predicted changes in head and flow.
Differences need not be between calibration conditions and alternative conditions. For example, in Figure 8.6a, the relevant predictions might be the differences between hydraulic heads simulated under pumping scenarios 1 and 2. Differences also can be calculated for transient models. Figure 8.6b shows the hydrograph for a simulated well in a transient model. Differences that might be useful are (1) the decline in hydraulic head since the end of the calibration period (differences u1 and u2 of Figure 8.6b) or (2) the additional decline in hydraulic head that would occur under pumping scenario 1 compared to scenario 2 (difference u3 of Figure 8.6b). Differences could also be spatial. For example, in a groundwater system the predicted head loss across a confining layer might be of interest. With MODFLOW-2000, YCINT-2000 (Hill et al., 2000, pp. 87–91) can be used to calculate differences and their linear intervals. In UCODE_2005, differences can be defined using derived predictions (Poeter et al., 2005). Their linear intervals can be calculated using LINEAR_UNCERTAINTY, and nonlinear intervals can be calculated as described by Poeter et al. (2005, Chapter 17).
8.4.6 Using Confidence Intervals to Serve the Purposes of Traditional Sensitivity Analysis
Confidence intervals on simulated values can be used in place of the traditional procedure for performing sensitivity analyses. According to Anderson and Woessner (1992, p. 246), “the purpose of a sensitivity analysis is to quantify the uncertainty in the calibrated model caused by uncertainty in the estimated parameter values.” In the procedure traditionally followed to fulfill this purpose, “calibrated values for hydraulic conductivity, storage parameters, and recharge and boundary conditions are systematically changed within the previously established plausible range.” The results of several traditional sensitivity analyses are shown in Anderson and Woessner (1992, pp. 247–254). The major weaknesses of the traditional procedure are as follows:
1. The “plausible range” usually is determined subjectively prior to model calibration (Anderson and Woessner, 1992, p. 231). Thus, this range does not reflect the possibly substantial information on the parameter values that model calibration provides. As a result, many sets of parameter values used in a traditional sensitivity analysis may produce a much poorer
match to the observations than was achieved in model calibration. Yet if the model reasonably represents the system, the poor fit produced by these parameter values suggests that the values are unlikely. Thus, by using the previously defined plausible ranges, traditional sensitivity analysis tends to produce unrealistically large measures of uncertainty for the results of calibrated models.
2. Coordinated changes in two or more parameter values are rarely considered, though they are often important, and some attempts to consider coordinated parameter changes can actually be detrimental. For example, Anderson and Woessner (1992, p. 248) suggest that, in traditional sensitivity analysis of groundwater models, hydraulic conductivity and recharge values be changed in opposite directions because such parameters are often positively correlated. In some cases this can be useful. In others it can exacerbate the problem noted in weakness 1 and produce an even more exaggerated impression of model uncertainty.
Because of these weaknesses, confidence intervals often can fulfill the purpose of sensitivity analysis more effectively than the traditional approach.
8.5 QUANTIFYING PREDICTION UNCERTAINTY USING MONTE CARLO ANALYSIS
As discussed in Chapter 7, Section 7.5.2, Monte Carlo analysis changes uncertain aspects of the model input data. For each change or set of changes, a model run is conducted and the resulting changes in selected simulated results are evaluated. The changed model input data are often parameter values but can be any other model attributes. Monte Carlo analysis often is used to evaluate prediction uncertainty and can be used to test the significance level of confidence intervals (e.g., Hill, 1989). The present work seeks only to introduce the reader to Monte Carlo analysis. More detailed descriptions of Monte Carlo methods can be found in a number of texts, including Skinner (1999), Vose (2000), and Bedford and Cook (2001).
8.5.1 Elements of a Monte Carlo Analysis
The elements that define a Monte Carlo analysis include:
1. What model inputs are changed.
2. How changed model inputs are generated.
3. What constitutes a model run.
4. How many Monte Carlo runs are conducted.
5. What simulated values, or quantities calculated using the simulated values, are saved.
6. How the saved results are analyzed.
Each of these elements is discussed in more detail below.
1. Commonly, parameter values are changed during Monte Carlo analysis, but other entities, such as aspects of the conceptual model, also could be changed. For example, in groundwater models hydrogeologic interpretations might be changed. In this situation, the modified model inputs could be the configuration of hydraulic-conductivity zones that represent different hydrogeologic units. They could also be the variation of hydraulic conductivity within such zones, represented using, perhaps, pilot points. A very different type of analysis involves changing the errors on the observations. These analyses generally are called bootstrap methods and are discussed briefly in Chapter 7, Section 7.5.2.
2. If parameter values are changed, commonly they are assumed to be normally or log-normally distributed. Their means are set equal to the parameter estimates from model calibration, and their variation is characterized by the parameter variance–covariance matrix. Parameter values that honor these means and variation can be generated randomly. Alternatively, more frugal methods can produce an equally accurate evaluation with fewer Monte Carlo runs. One such method is Latin hypercube sampling (Gwo et al., 1996; Zhang and Pinder, 2003). Another method, Markov chain Monte Carlo, has received substantial attention recently, as noted by the contributions posted at the web site http://www.statslab.cam.ac.uk/mcmc/. For other types of changed quantities, analogous options can be used. UCODE_2005 supports a very simple method in which parameter values are sampled at equal intervals within a stated range. If other aspects of a model are changed, often it is useful to consider a discrete number of changes that produce deterministically derived alternative models. For example, in groundwater modeling, a limited number of alternative interpretations of the hydrogeologic framework could be tested.
3. The model run may be a forward simulation or a more complicated run such as an inverse simulation. Poeter and McKenna (1995) provide an example of using inverse simulations. This application is summarized in Guideline 8 (Chapter 11, Section G8.3) and Guideline 14 (Chapter 14, Section G14.2).
4. The number of Monte Carlo runs required depends on many factors. These include the number of model inputs changed, how they are changed, and whether they are continuous or discrete variables. The desired results of the Monte Carlo analysis also are important. For example, estimating the mean of a predicted value to a given accuracy generally takes far fewer Monte Carlo runs than determining 90-percent confidence intervals. Determining 90-percent intervals in turn takes fewer runs than determining 95-percent confidence intervals. In addition, accounting for model nonlinearity can require more Monte Carlo runs. The number of runs required often is determined by calculating the desired result after some number of Monte Carlo runs and then examining whether this result changes after an additional set of runs. When the result becomes stable, sufficient runs have been conducted. The number of runs can be very large in some circumstances, commonly exceeding 1000. Even if the model run takes only 1 minute of execution time, a 1000-run Monte Carlo analysis requires almost 17 hours of computer time. This is problematic in fields such as groundwater, in which model runs can take 30 minutes or more even on modern computers. For a 30-minute run, conducting 1000 runs requires about 21 days of execution time. Where parallel processors are available, they often can be used to great advantage, as long as each processor has enough random access memory to run the model. Additional guidance on determining the required number of Monte Carlo runs, with application to groundwater models, is presented by Ballio and Guadagnini (2004).
5. Important results to save from Monte Carlo runs can include any model output or any quantity calculated from the output. The obvious items to save are the values of the defined model predictions. Less obvious items include measures of the numerical accuracy of the solution, so that any solutions that did not satisfy the convergence criteria can be identified. They also include statistics that measure model fit to the observations, such as the objective function or the standard error. Often it also is useful to save information about what was changed for each solution, such as parameter values, so that unusual results can easily be evaluated. Choose carefully which results to save from each Monte Carlo run. This avoids (a) repeating the analysis to obtain model results that were not saved and (b) saving so much output that it becomes unwieldy and difficult to process.
6. There are many ways to analyze and display Monte Carlo results, including calculation of histograms and confidence limits and presentation on maps or cross sections. In many circumstances the main criterion motivating the presentation of results is to convey the essence of the results to resource managers.
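The elements listed above can be tied together in a short Python sketch. Everything in it is an illustrative assumption rather than a method from the text:

- The "model" is a hypothetical one-dimensional strip aquifer with an analytical head solution.
- Parameters are sampled from a multivariate normal distribution with mean b' and covariance V(b') (element 2). Because log10 T is sampled, transmissivity itself is log-normal.
- Each sample is a forward run (element 3).
- The stopping rule (element 4) adds batches of runs until the 95th percentile of the saved predictions changes by less than an arbitrary 1 percent.
- Each run saves its parameters, prediction, and weighted least-squares objective function (element 5).

```python
import numpy as np

def toy_model(b):
    """HYPOTHETICAL stand-in for one model run (not a model from the text).

    1-D strip aquifer of length L = 1000 m with zero head at both ends,
    transmissivity T = 10**b[0] (m2/d) and uniform recharge R = b[1] (m/d);
    steady head is h(x) = R x (L - x) / (2 T).
    Returns heads at three observation points and one prediction (x = 800 m).
    """
    length = 1000.0
    T, R = 10.0 ** b[0], b[1]
    x_obs = np.array([100.0, 300.0, 500.0])
    heads = R * x_obs * (length - x_obs) / (2.0 * T)
    prediction = R * 800.0 * (length - 800.0) / (2.0 * T)
    return heads, prediction

def monte_carlo(b_opt, cov_b, obs, weights, batch=200, max_runs=5000,
                rel_tol=0.01, seed=0):
    """Sample parameter sets from N(b', V(b')), run the model for each, and
    save the parameters, the prediction, and the weighted least-squares
    objective function. Batches are added until the 95th percentile of the
    saved predictions changes by less than rel_tol between batches."""
    rng = np.random.default_rng(seed)
    params, preds, objs = [], [], []
    previous = None
    while len(preds) < max_runs:
        for b in rng.multivariate_normal(b_opt, cov_b, size=batch):
            heads, pred = toy_model(b)
            params.append(b)
            preds.append(pred)
            objs.append(float(np.sum(weights * (obs - heads) ** 2)))
        current = np.percentile(preds, 95)
        if previous is not None and abs(current - previous) <= rel_tol * abs(previous):
            break                                  # result has stabilized
        previous = current
    return np.array(params), np.array(preds), np.array(objs)

b_opt = np.array([1.0, 1.0e-3])                  # [log10 T, R], hypothetical
cov_b = np.array([[1.0e-2, 5.0e-6],
                  [5.0e-6, 1.0e-8]])             # hypothetical V(b')
obs, _ = toy_model(b_opt)                        # synthetic observations
weights = np.ones_like(obs)
params, preds, objs = monte_carlo(b_opt, cov_b, obs, weights)
```

With a real model, the cost of this loop is dominated by the forward runs rather than by the sampling code. At the 30-minute run time quoted in item 4, 1000 runs take 30,000 minutes, or about 20.8 days.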
Many authors have suggested that (a) for any type of change considered, simulated values need to be compared to observations, and (b) including simulated values that produce a poor match to the observations can result in an overstatement of model uncertainty (Beven and Binley, 1992; Brooks et al., 1994; Evers and Lerner, 1998; Binley and Beven, 2003; Morse et al., 2003). To account for this, a measure of model fit needs to be saved for each run, and runs with poor matches need to be omitted from the analysis and display of the Monte Carlo results.
8.5.2 Relation Between Monte Carlo Analysis and Linear and Nonlinear Confidence Intervals
Some Monte Carlo analyses produce results that approximate those produced by inferential statistics. Differences occur because of approximations made in the inferential methods or inadequate sampling by the Monte Carlo analysis. For example, nonlinear intervals are routinely estimated using an approximate likelihood-function approach (e.g., Cooley, 2004; Christensen and Cooley, 2005). Considering the approximate equivalence of Monte Carlo and inferential results helps to understand both methods. For example, if the following conditions are met, the results of a Monte Carlo analysis approximate nonlinear confidence intervals (described in Section 8.4.3).
a. Only parameter values are changed for the Monte Carlo runs, and the parameters changed are the same as those that are active for the nonlinear confidence interval calculations.
b. The same observations, prior information, and weighting are used for the Monte Carlo analysis and to calculate the nonlinear intervals. In the Monte Carlo runs, the least-squares objective function of Eq. (3.1) is used to compare the simulated values to the observations. Runs are omitted if the result exceeds a specified value (denoted the “fit criterion” here), as suggested by Beven and Binley (1992) and Binley and Beven (2003). This objective-function fit criterion is the quantity on the right-hand side of Eq. (8.14).
c. No local minima exist in the objective-function surface, and predictions are continuous and monotonic with respect to the parameter values in the range of interest.
d. The ranges of parameter values used for the Monte Carlo analysis are large enough to cover the entire parameter confidence region, defined as the region in which the objective function is less than the fit criterion.
e. The Monte Carlo interval is constructed from the maximum and minimum predicted values that occur for objective-function values that do not exceed the fit criterion.
Advantages of the Monte Carlo method are that it is possible to consider (1) changes in aspects of the system other than parameter values, and (2) highly nonlinear models that violate condition (c) above. In the first situation, inferential statistical methods currently have no equivalent capability. In the second situation, inferential methods would be unable to produce meaningful limits on the predictions.
8.5.3 Using the Theis Example to Understand Monte Carlo Methods
The Theis example of Figure 8.5 can be used to understand Monte Carlo analysis, in the same way it was used for linear and nonlinear confidence intervals (Section 8.4.4). In this example, suppose a Monte Carlo analysis is conducted that involves changing parameter values, and that an objective-function value of 1.0 is chosen as the fit criterion defined in Section 8.5.2. Figure 8.5a illustrates the importance of choosing an appropriate range of sampled parameter values. For example, to ensure that the analysis samples the entire parameter space within the objective-function contour of 1.0, transmissivity values greater than 0.28 ft²/s and storage coefficients larger than 0.001 need to be included. Objective-function surfaces such as that in Figure 8.5 are almost never available to guide selection of the range of parameter values. In Monte Carlo analysis it therefore often is difficult to verify that the ranges are sufficiently large. After the ranges of parameter values are chosen, it also is important to select carefully the sampled parameter values within these ranges. Suppose prediction uncertainty is being evaluated and the prediction consistently increases or decreases with the parameter values, as for the example in Figure 8.3. Then the extreme predictions will occur for an objective-function value equal to the chosen fit criterion. In this case, it is most important to sample parameter sets that produce an objective-function
value near the fit criterion. However, for some types of nonlinearity, the extreme predictions may occur for smaller values of the objective function. In this case, Monte Carlo methods are needed to identify extreme values, and it is important to thoroughly sample sets of parameters that produce objective-function values smaller than the fit criterion. Clearly, it can be difficult to determine the extreme predictions with Monte Carlo methods and, when applicable, the nonlinear confidence interval calculation is much more efficient for evaluating prediction uncertainty.
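The fit-criterion procedure of Section 8.5.2 can be tried on a Theis problem with a short Python sketch. The problem setup is hypothetical and does not reproduce the data of Figure 8.5:

- The pumping rate, radius, observation times, "true" parameters, fixed measurement errors, assumed error standard deviation of 0.05 m, and fit criterion of 10.0 are all illustrative. In the text, the fit criterion is the right-hand side of Eq. (8.14), which is not reproduced here.
- Transmissivity and storage coefficient are sampled at equal intervals in log space, loosely mirroring the simple equal-interval option the text attributes to UCODE_2005. This code is not UCODE.
- Runs whose weighted least-squares objective exceeds the criterion are omitted. The smallest and largest remaining predictions form the interval of condition (e).
- The well function W(u) = E1(u) is evaluated from its power series to avoid a SciPy dependency. `scipy.special.exp1` would serve equally well.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329

def well_function(u, terms=40):
    """Theis well function W(u) = E1(u) from its convergent power series:
    W(u) = -gamma - ln(u) - sum_{k>=1} (-u)^k / (k * k!).
    Adequate for the moderate u values used below (u < ~5)."""
    total = -EULER_GAMMA - math.log(u)
    term = 1.0
    for k in range(1, terms + 1):
        term *= -u / k                      # term is now (-u)^k / k!
        total -= term / k
    return total

def theis_drawdown(T, S, Q, r, t):
    """Drawdown s = Q / (4 pi T) * W(r^2 S / (4 T t))."""
    return Q / (4.0 * math.pi * T) * well_function(r * r * S / (4.0 * T * t))

def monte_carlo_interval(obs_times, obs, weights, pred_time, fit_criterion,
                         log_t_range, log_s_range, n=61, Q=0.01, r=50.0):
    """Equal-interval sampling of (log10 T, log10 S); keep runs whose
    weighted least-squares objective is <= fit_criterion and return the
    smallest and largest predicted drawdown among them."""
    kept = []
    for log_t in np.linspace(log_t_range[0], log_t_range[1], n):
        for log_s in np.linspace(log_s_range[0], log_s_range[1], n):
            T, S = 10.0 ** log_t, 10.0 ** log_s
            sim = np.array([theis_drawdown(T, S, Q, r, t) for t in obs_times])
            objective = float(np.sum(weights * (obs - sim) ** 2))
            if objective <= fit_criterion:          # omit poorly fitting runs
                kept.append(theis_drawdown(T, S, Q, r, pred_time))
    if not kept:
        raise ValueError("no sampled run meets the fit criterion; widen the ranges")
    return min(kept), max(kept), len(kept)

# Hypothetical synthetic problem in SI units (not the data of Figure 8.5).
true_T, true_S = 1.0e-3, 1.0e-4
obs_times = np.array([600.0, 1800.0, 3600.0, 7200.0])
errors = np.array([0.02, -0.03, 0.015, -0.01])        # fixed "measurement errors"
obs = np.array([theis_drawdown(true_T, true_S, 0.01, 50.0, t) for t in obs_times]) + errors
weights = np.full(obs.size, 1.0 / 0.05 ** 2)           # assumed error sd of 0.05 m
lo, hi, n_kept = monte_carlo_interval(obs_times, obs, weights, pred_time=86400.0,
                                      fit_criterion=10.0,
                                      log_t_range=(-3.5, -2.5),
                                      log_s_range=(-5.0, -3.0))
```

Because this prediction varies monotonically with T and S, the extreme values among the kept runs come from parameter sets near the edge of the region allowed by the fit criterion. This is why, as the text notes, sampling near the criterion matters most in such cases.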
8.6 QUANTIFYING PREDICTION UNCERTAINTY USING ALTERNATIVE MODELS
Recent work has proposed a number of methods for quantifying prediction uncertainty in a way that accounts for alternative models. The methods involve three steps: calculating confidence intervals for each alternative model, weighting the intervals to reflect the validity of the related model, and producing composite intervals. Methods have been presented that calculate the weighting based on the AICc statistic (Eq. (6.3)) (Burnham and Anderson, 2004) and Bayes factors (Kass and Raftery, 1995; Neuman, 2003; Meyer et al., 2004). Poeter and Anderson (2005) compared these two methods using the Multi-Model Analysis (MMA) computer code (Poeter and Hill, in press). They found the intervals produced using the weights based on the AICc statistic to be more useful. Additional testing in a variety of circumstances is needed, however, to determine the applicability of that conclusion.
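The weighting step can be illustrated with a Python sketch. It computes weights of the Akaike form used by Burnham and Anderson, w_k = exp(−Δ_k/2) / Σ_j exp(−Δ_j/2), where Δ_k is a model's AICc minus the smallest AICc. The AICc values are inputs here; Eq. (6.3) is not reproduced. The sketch then combines the model predictions into a weighted mean. It also computes a composite standard deviation using the "unconditional" form given by Burnham and Anderson, which adds the between-model spread to each model's own standard deviation. This is one possible way to form composite results and is not necessarily what the MMA code does. The AICc values, predictions, and standard deviations are hypothetical.

```python
import numpy as np

def aicc_weights(aicc):
    """w_k = exp(-D_k / 2) / sum_j exp(-D_j / 2), with D_k = AICc_k - min(AICc)."""
    aicc = np.asarray(aicc, dtype=float)
    rel = np.exp(-0.5 * (aicc - aicc.min()))
    return rel / rel.sum()

def model_averaged_prediction(aicc, preds, sds):
    """Weighted mean prediction and a composite standard deviation:
    sd = sum_k w_k * sqrt(sd_k**2 + (pred_k - mean)**2)."""
    w = aicc_weights(aicc)
    preds = np.asarray(preds, dtype=float)
    sds = np.asarray(sds, dtype=float)
    mean = float(np.sum(w * preds))
    sd = float(np.sum(w * np.sqrt(sds ** 2 + (preds - mean) ** 2)))
    return w, mean, sd

# Three hypothetical alternative models of the same system.
w, mean, sd = model_averaged_prediction(aicc=[210.3, 212.1, 219.8],
                                        preds=[4.2, 3.6, 5.0],
                                        sds=[0.5, 0.4, 0.7])
```

A model whose AICc is 10 units above the best model receives a weight below 1 percent, so it contributes little to the composite result.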
8.7 TESTING MODEL NONLINEARITY WITH RESPECT TO THE PREDICTIONS
The test for linearity described in Chapter 7, Section 7.7 uses the modified Beale’s measure to evaluate the linearity of the model with respect to observed quantities. If the predictions are similar in type, location, and time to the observations, and if the predictive conditions are similar to the calibration conditions, then that test is sufficient. In many circumstances, however, we are in the unenviable position of trying to make predictions that are in some way very different from the observations. The prediction conditions may be very different, or the predicted quantity could be very different. In this circumstance, a model that is linear on the basis of the analysis presented in Chapter 7, Section 7.7 may be nonlinear with respect to the predictions. In this situation, the judgment of linearity could lead to the incorrect conclusion that linear intervals on predictions are accurate measures of prediction uncertainty.
Methods for testing model nonlinearity with respect to predictions are just being developed. As an introduction to these methods, consider first the following statistic, which is a direct extension of the modified Beale’s measure.
1. Use the same sets of parameters produced in step 1 of Chapter 7, Section 7.7, using Eq. (7.11). These parameter values are on the edge of the linearized parameter confidence region.
2. Compute the predictions by executing a forward model run for each generated set of parameter values. These simulated values are $\tilde{z}_{\ell k}$, where k refers to the kth generated parameter vector and ℓ refers to the ℓth prediction. The tilde (~) designates values associated with the generated parameter values.
3. Calculate linearized estimates of the predictions using the generated parameter sets as follows:

$$ \tilde{z}^{o}_{\ell k} = z'_\ell + \sum_{j=1}^{NP} \left. \frac{\partial z'_\ell}{\partial b_j} \right|_{\mathbf{b}'} \left( \tilde{b}_{kj} - b'_j \right) \tag{8.18} $$

where $\tilde{z}^{o}_{\ell k}$ is the linearized estimate of the ℓth prediction for the kth generated parameter set; $z'_\ell$ is the ℓth prediction simulated using the optimal parameter estimates; $b'_j$ is the jth optimal parameter estimate; and $\tilde{b}_{kj}$ is the jth parameter value from the kth generated parameter set.
4. Calculate the proposed modified Beale’s measure for predictions, $\hat{N}_{bz}$, which measures the difference between the model-computed and the linearized estimates of the predictions:

$$ \hat{N}_{bz} = \sum_{k=1}^{2NP} \frac{\left| \tilde{z}_{\ell k} - \tilde{z}^{o}_{\ell k} \right|}{\left| \tilde{z}^{o}_{\ell k} - z'_\ell \right|} \tag{8.19} $$
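Steps 2 through 4 can be sketched for a single prediction in Python. The generated parameter sets of step 1 are taken as an input because Eq. (7.11) is not reproduced here. In the usage example they are replaced by a hypothetical ±0.1 perturbation of each parameter, solely to exercise the code. The prediction function is also hypothetical and stands in for a forward model run. If a generated set leaves the linearized prediction unchanged, so that the denominator of Eq. (8.19) is zero, the ratio is undefined; this sketch does not guard against that case.

```python
import numpy as np

def beale_measure_for_prediction(predict, b_opt, sens, generated):
    """Eqs. (8.18)-(8.19) for one prediction.

    predict   : function b -> simulated prediction (one forward model run)
    b_opt     : optimal parameter estimates b' (length NP)
    sens      : sensitivities dz'/db evaluated at b' (length NP)
    generated : array of shape (2*NP, NP) with the parameter sets of step 1
    """
    b_opt = np.asarray(b_opt, dtype=float)
    sens = np.asarray(sens, dtype=float)
    z_opt = predict(b_opt)
    total = 0.0
    for b_k in np.asarray(generated, dtype=float):
        z_model = predict(b_k)                           # step 2: forward run
        z_lin = z_opt + sens @ (b_k - b_opt)             # step 3: Eq. (8.18)
        total += abs(z_model - z_lin) / abs(z_lin - z_opt)   # Eq. (8.19)
    return total

def curved_prediction(b):
    """Hypothetical nonlinear prediction used only for illustration."""
    return np.exp(b[0]) + b[1] ** 2

b_opt = np.array([0.0, 1.0])
sens = np.array([1.0, 2.0])                        # exact derivatives at b'
# Stand-in for the 2NP sets of step 1 (NOT Eq. 7.11): b' +/- 0.1 along each axis.
generated = b_opt + 0.1 * np.vstack([np.eye(2), -np.eye(2)])
n_bz = beale_measure_for_prediction(curved_prediction, b_opt, sens, generated)
```

For a prediction that is linear in the parameters, every term of the sum is zero up to rounding. The curved prediction above gives a positive value of about 0.2.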
Equation (8.19) produces a unique measure of linearity for each prediction. For a truly linear model, the numerator of Eq. (8.19) equals zero. Thus, values of $\hat{N}_{bz}$ that are close to zero generally indicate that the model is close to being linear. As model nonlinearity increases, the magnitude of the numerator increases. Testing of Eq. (8.19) would be needed to develop critical values of $\hat{N}_{bz}$ that can be used as objective criteria against which to evaluate model linearity.
A deficiency of Eq. (8.19) is that it measures only the nonlinearity of the predictions with respect to the parameters. Suppose Eq. (8.19) is close to zero but the measures discussed in Chapter 7, Section 7.7 indicate nonlinearity of the observations with respect to the parameters. In that case, linear intervals on predictions may be in error. The combined nonlinearity can be measured using the combined intrinsic model nonlinearity measures of Cooley (2004). The method is described by Cooley (2004) and Christensen and Cooley (2005, pp. 20–24). It is available in MODFLOW-2000’s UNC Process with BEALE2-2K (Christensen and Cooley, 2005) and in UCODE_2005 with MODEL_LINEARITY_ADV (Poeter et al., 2005). The complete equations are not repeated here. Instead, the following analysis, applicable to individual confidence intervals on predictions, provides an introduction.
Two issues of concern in Christensen and Cooley (2005) are (1) the validity of linear confidence intervals and (2) the validity of correction factors that account for unrepresented heterogeneity (the $c_c$, $c_r$, and $c_p$ of Table 8.2). The equations are presented below, along with many of the steps required to go from the more
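general equations presented in Christensen and Cooley (2005) to the equations presented here. The equations presented here apply if the weight matrix used in the regression is defined as in Chapter 3, Section 3.4.2 (i.e., $\boldsymbol{\omega} = \mathbf{V}(\boldsymbol{\varepsilon})^{-1}$). If intrinsic model nonlinearity is small, the first issue is addressed by what we call the combined intrinsic model nonlinearity measure for confidence intervals. Using Christensen and Cooley (2005, Eq. (48)), the notation of this book, and rearranging terms, the measure is calculated for each prediction as

$$ \hat{M}_{\min} = \frac{1}{2s^2} \sum_{p=1}^{2} \left( \tilde{\mathbf{y}}_p - \tilde{\mathbf{y}}^{o}_p - \mathbf{X}\mathbf{w}_p \right)^T \boldsymbol{\omega} \left( \tilde{\mathbf{y}}_p - \tilde{\mathbf{y}}^{o}_p - \mathbf{X}\mathbf{w}_p \right) \tag{8.20} $$

X is defined for Eq. (5.2b) and is calculated using the optimal parameter values, b'. Terms with a tilde (~) and a p subscript are calculated for parameter values generated as (Christensen and Cooley, 2005, p. 21; Cooley, 2004, pp. 86–87)

$$ \tilde{\mathbf{b}} = \mathbf{b}' \pm \frac{1}{s_{z'_\ell}} \left[ \mathbf{V}(\mathbf{b}') \right] \frac{\partial z_\ell}{\partial \mathbf{b}} \tag{8.21} $$

where:
- addition is used for p = 1 and subtraction for p = 2;
- $\tilde{\mathbf{b}}$ is a vector of generated parameter values;
- b' is a vector of optimal parameter estimates;
- $s_{z'_\ell}$ is the standard deviation of the ℓth prediction, defined in Eq. (8.1);
- $[\mathbf{V}(\mathbf{b}')]$ is the parameter variance–covariance matrix of Eq. (7.1);
- $\partial z_\ell / \partial \mathbf{b}$ is the sensitivity of the ℓth prediction with respect to the parameters, evaluated at the optimal parameter values. This is a vector with NP elements.

Superscript o indicates values calculated using a model linearized about the optimized parameter values; that is, $\tilde{\mathbf{y}}^{o}_p = \mathbf{y}'(\mathbf{b}') + \mathbf{X}(\tilde{\mathbf{b}} - \mathbf{b}')$. The remaining term, $\mathbf{w}_p$, accounts for prediction nonlinearity and is calculated as

$$ \mathbf{w}_p = \mathbf{w}'_\ell + \frac{1}{s^2_{z'_\ell}} \left( \tilde{z}_p - \tilde{z}^{o}_p \right) \left[ \mathbf{V}(\mathbf{b}') \right] \frac{\partial z_\ell}{\partial \mathbf{b}} \tag{8.22} $$

where

$$ \mathbf{w}'_\ell = \left[ (\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\omega}^{1/2} - \frac{1}{s^2_{z'_\ell}} \left[ \mathbf{V}(\mathbf{b}') \right] \frac{\partial z_\ell}{\partial \mathbf{b}} \left( \frac{\partial z_\ell}{\partial \mathbf{b}} \right)^T (\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\omega}^{1/2} \right] \boldsymbol{\omega}^{1/2} \left( \tilde{\mathbf{y}}_p - \tilde{\mathbf{y}}^{o}_p \right) \tag{8.23} $$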
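To obtain Eqs. (8.22) and (8.23) from the results presented by Christensen and Cooley (2005, p. 21, Eqs. (50) and (51)), their term $\mathbf{Q}^T\mathbf{Q}$ needs to be expanded using the definition of Q on their page 8. The definition of $s^2_{z'_\ell}$ using the square of Eq. (8.1b) also is used. This proof is valid for all types of intervals.

$$ \begin{aligned} \mathbf{Q}^T\mathbf{Q} &= \left(\frac{\partial z_\ell}{\partial \mathbf{b}}\right)^T (\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\omega}^{1/2}\,\boldsymbol{\omega}^{1/2}\mathbf{X}(\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\frac{\partial z_\ell}{\partial \mathbf{b}} \\ &= \left(\frac{\partial z_\ell}{\partial \mathbf{b}}\right)^T (\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\omega}\mathbf{X}(\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\frac{\partial z_\ell}{\partial \mathbf{b}} \\ &= \left(\frac{\partial z_\ell}{\partial \mathbf{b}}\right)^T (\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\frac{\partial z_\ell}{\partial \mathbf{b}} \\ &= \frac{1}{s^2}\left(\frac{\partial z_\ell}{\partial \mathbf{b}}\right)^T s^2(\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\frac{\partial z_\ell}{\partial \mathbf{b}} \\ &= \frac{s^2_{z'_\ell}}{s^2} \end{aligned} \tag{8.24} $$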
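It is assumed that $\mathbf{V}(\mathbf{b}) = s^2(\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}$, which is valid for linear systems with the correct model (see Appendix C) and approximate otherwise.
One value of $\hat{M}_{\min}$ is calculated for each prediction. If intrinsic nonlinearity is small, values less than 0.01 indicate that a standard linear individual confidence interval, calculated using the equations in Section 8.4, should not be affected significantly by nonlinearity.
If intrinsic model nonlinearity is small, the second issue is addressed by what we call the combined intrinsic model nonlinearity measure for correction factors. The measure is calculated as the larger of two values: $\hat{M}_{\min} + 2\hat{B}_U$ or $|\hat{M}_{\min} - 2\hat{B}_L|$. $\hat{M}_{\min}$ would equal $\hat{B}_U$ if y were linear, in which case only the last term in the parentheses of Eq. (8.20) is nonzero. Thus,

$$ \begin{aligned} \hat{B}_U &= \frac{1}{2s^2}\sum_{p=1}^{2} (\mathbf{X}\mathbf{w}_p)^T\boldsymbol{\omega}(\mathbf{X}\mathbf{w}_p) \\ &= \frac{1}{2s^2}\sum_{p=1}^{2} \left(\frac{\tilde{z}_p-\tilde{z}^o_p}{s^2_{z'_\ell}}\,\mathbf{X}[\mathbf{V}(\mathbf{b}')]\frac{\partial z_\ell}{\partial \mathbf{b}}\right)^T \boldsymbol{\omega} \left(\frac{\tilde{z}_p-\tilde{z}^o_p}{s^2_{z'_\ell}}\,\mathbf{X}[\mathbf{V}(\mathbf{b}')]\frac{\partial z_\ell}{\partial \mathbf{b}}\right) \\ &= \frac{1}{2s^2}\sum_{p=1}^{2}\frac{1}{s^2_{z'_\ell}s^2_{z'_\ell}}(\tilde{z}_p-\tilde{z}^o_p)^T\left(\frac{\partial z_\ell}{\partial \mathbf{b}}\right)^T[\mathbf{V}(\mathbf{b}')]\mathbf{X}^T\boldsymbol{\omega}\mathbf{X}[\mathbf{V}(\mathbf{b}')]\frac{\partial z_\ell}{\partial \mathbf{b}}(\tilde{z}_p-\tilde{z}^o_p) \\ &= \frac{1}{2s^2}\sum_{p=1}^{2}\frac{1}{s^2_{z'_\ell}s^2_{z'_\ell}}(\tilde{z}_p-\tilde{z}^o_p)^T\left(\frac{\partial z_\ell}{\partial \mathbf{b}}\right)^T s^2(\mathbf{X}^T\boldsymbol{\omega}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\omega}\mathbf{X}[\mathbf{V}(\mathbf{b}')]\frac{\partial z_\ell}{\partial \mathbf{b}}(\tilde{z}_p-\tilde{z}^o_p) \\ &= \frac{s^2}{2s^2}\sum_{p=1}^{2}\frac{1}{s^2_{z'_\ell}s^2_{z'_\ell}}(\tilde{z}_p-\tilde{z}^o_p)^T\left(\frac{\partial z_\ell}{\partial \mathbf{b}}\right)^T[\mathbf{V}(\mathbf{b}')]\frac{\partial z_\ell}{\partial \mathbf{b}}(\tilde{z}_p-\tilde{z}^o_p) \\ &= \frac{s^2}{2s^2}\sum_{p=1}^{2}\frac{1}{s^2_{z'_\ell}s^2_{z'_\ell}}(\tilde{z}_p-\tilde{z}^o_p)^T s^2_{z'_\ell}(\tilde{z}_p-\tilde{z}^o_p) \\ &= \frac{1}{2s^2_{z'_\ell}}\sum_{p=1}^{2}(\tilde{z}_p-\tilde{z}^o_p)^T(\tilde{z}_p-\tilde{z}^o_p) \end{aligned} \tag{8.25} $$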
This can be compared to Eq. (53) of Christensen and Cooley (2005, p. 21). $\hat{M}_{\min}$ would equal $\hat{B}_L$ if z were linear. In this circumstance, the last term in the parentheses of Eq. (8.20) is zero. Thus,

$$ \hat{B}_L = \frac{1}{2s^2}\sum_{p=1}^{2}\left(\tilde{\mathbf{y}}_p - \tilde{\mathbf{y}}^{o}_p - \mathbf{X}\mathbf{w}'_p\right)^T \boldsymbol{\omega}\left(\tilde{\mathbf{y}}_p - \tilde{\mathbf{y}}^{o}_p - \mathbf{X}\mathbf{w}'_p\right) \tag{8.26} $$

This can be compared with Eq. (54) of Christensen and Cooley (2005, p. 21). One value of $\hat{B}_U$ and one value of $\hat{B}_L$ are calculated for each prediction. Two values are considered: $\hat{M}_{\min} + 2\hat{B}_U$ and $|\hat{M}_{\min} - 2\hat{B}_L|$. If the larger of these values is less than 0.09, the correction factors used to determine the confidence intervals are valid. For individual confidence intervals on predictions, the correction factor is 1.0. Satisfying the linearity requirements therefore means that, from the perspective of correction factors, nonlinearity does not adversely affect the individual confidence intervals of Section 8.4.
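Parts of the derivation above can be checked numerically with a Python sketch. It does not implement BEALE2-2K or MODEL_LINEARITY_ADV; it only codes four pieces of the equations as written:

- Eq. (8.21), which generates the two parameter vectors.
- The identity of Eq. (8.24), which holds when V(b') = s²(XᵀωX)⁻¹.
- The final form of Eq. (8.25) for $\hat{B}_U$.
- The 0.09 decision rule for correction factors.

The sensitivity matrix X, the diagonal weights, s², and the prediction functions are hypothetical. A useful property to confirm is that each generated vector of Eq. (8.21) shifts the linearized prediction by exactly ± $s_{z'_\ell}$.

```python
import numpy as np

def generated_parameters(b_opt, cov_b, sens):
    """Eq. (8.21): b~ = b' +/- V(b') (dz/db) / s_z, with s_z^2 = (dz/db)^T V(b') (dz/db)."""
    s_z = float(np.sqrt(sens @ cov_b @ sens))
    step = cov_b @ sens / s_z
    return b_opt + step, b_opt - step, s_z        # p = 1 (add), p = 2 (subtract)

def qtq(X, omega_diag, sens):
    """Q^T Q with Q = omega^(1/2) X (X^T omega X)^(-1) dz/db (left side of Eq. 8.24)."""
    xtwx = X.T @ (omega_diag[:, None] * X)
    Q = (np.sqrt(omega_diag)[:, None] * X) @ np.linalg.solve(xtwx, sens)
    return float(Q @ Q)

def b_upper(predict, b_opt, cov_b, sens):
    """Last line of Eq. (8.25): B_U = sum_p (z~_p - z~o_p)^2 / (2 s_z^2)."""
    b1, b2, s_z = generated_parameters(b_opt, cov_b, sens)
    z_opt = predict(b_opt)
    total = 0.0
    for b_p in (b1, b2):
        z_lin = z_opt + sens @ (b_p - b_opt)      # linearized prediction z~o_p
        total += (predict(b_p) - z_lin) ** 2
    return total / (2.0 * s_z ** 2)

def correction_factor_measure(m_min, b_u, b_l):
    """Larger of M_min + 2 B_U and |M_min - 2 B_L|; < 0.09 means the correction factors are valid."""
    return max(m_min + 2.0 * b_u, abs(m_min - 2.0 * b_l))

# Eq. (8.24) check with hypothetical X, diagonal weights, and s^2.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
omega = np.array([1.0, 2.0, 4.0])
s2 = 1.5
cov_b = s2 * np.linalg.inv(X.T @ (omega[:, None] * X))   # V(b') as assumed above
sens = np.array([0.5, 2.0])
s_z2 = float(sens @ cov_b @ sens)

# B_U for a hypothetical prediction that is quadratic in the first parameter.
def quadratic_prediction(b):
    return b[0] ** 2 + b[1]

b_opt = np.array([1.0, 1.0])
cov_demo = np.diag([0.01, 0.04])
sens_demo = np.array([2.0, 1.0])
b_u = b_upper(quadratic_prediction, b_opt, cov_demo, sens_demo)
```

For a prediction that is linear in the parameters, `b_upper` returns zero to rounding error. Curvature in the prediction shows up directly as a positive $\hat{B}_U$.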
8.8 EXERCISES
As discussed in Chapter 2, Section 2.2, water-supply wells are being completed and a landfill has been proposed for the groundwater flow system considered in the exercises of this book. The wells are located in the center of the area, at row 9, column 10. One well would pump from model layer 1 and one from layer 2, and each is expected to pump, on average, 1.1 m³/s. The proposed site for the landfill is near the center of row 2, column 16 (Figure 2.1a,b). The landfill developers claim that if the landfill liner leaks, effluent from the landfill would flow toward the river, not toward the supply wells. The landfill developers, who are knowledgeable about regression, also make two arguments. First, it is inappropriate to use this model to evaluate potential advective travel from the landfill because it is calibrated using hydraulic-head and streamflow-gain observations and no transport observations. Second, the need for prior information indicates clearly that the data used to calibrate the model are insufficient. They claim, therefore, that there is no reason to believe the model predictions related to transport. In Exercises 8.1 and 8.2, we evaluate these claims. The county wants to use the steady-state model to evaluate the potential transport and the likely utility of additional data in addressing the complaints of the developer. It would be possible to collect hydraulic-head and streamflow-interaction observations
under pumping conditions and to use these data to further calibrate the model, but obtaining these model results would substantially delay the landfill permitting process. County officials would like to know whether the additional information is likely to be important enough to warrant such a delay. As an initial evaluation, all parties agree to use the steady-state model to evaluate two potential observations using simulated steady-state pumping conditions: (1) the streamflow gain or loss and (2) hydraulic head at one location far from the river (row 9, column 18).
A thorough analysis of the potential transport requires an advective–dispersive transport model. A preliminary analysis, however, can be conducted by simulating advective transport alone, using the Advective-Transport Observation Package (ADV) (Anderman and Hill, 2001) of MODFLOW-2000. If the advective-travel path goes to the well, advective–dispersive modeling is not necessary. If the advective path does not go to the well, landfill effluent could still reach the well by dispersive transport, and an analysis using a model that includes dispersive processes will be needed. The analysis of advective transport from the potential landfill site will address the following questions:
Question 1. When the supply wells are pumped, where does an advective path from the landfill travel? Does it go to the well or the river? If it goes to the well, how long does it take to get there?
Question 2. Which parameter values are most important to the predicted advective transport, and how does this compare to the information provided for their estimation by the observation data?
Question 3. Of the existing head and flow observations used for calibration of the steady-state model, which are most important to the advective-transport predictions?
Question 4. Is collection of the streamflow and hydraulic-head data under pumping conditions likely to contribute additional information that is important to the simulation of advective transport? If so, which of these potential observations would contribute the most information?
Question 5. What is the uncertainty with which the predictions are simulated using the steady-state calibrated model?
Question 1 is addressed in Exercise 8.1a with a forward model run that includes calculation of advective travel from the landfill site using the ADV Package. Question 2 is partly addressed in Exercise 8.1b using two comparisons. First, prediction scaled sensitivities (pss) are compared with the composite scaled sensitivities (css) of Exercise 5.2c. Second, parameter correlation coefficients (pcc) calculated with observations and predictions are compared to those calculated with only observations. These pcc are each calculated with prior information omitted, to address the developer’s concern about the use of prior information. Question 2 is further addressed in Exercise 8.1c using the parameter–prediction (ppr) statistic.
Question 3 is addressed in Exercise 8.1d using the observation–prediction (opr) statistic for evaluating existing observations. Question 4 is first addressed in Exercise 8.1e by considering dimensionless scaled sensitivities (dss) for the possible new observations and pcc calculated including the new observations. It is further addressed in Exercise 8.1f by using the observation–prediction (opr) statistic. Question 5 is addressed in Exercise 8.2 using confidence intervals.
Exercise 8.1: Predict Advective Transport and Perform Sensitivity Analysis
This exercise addresses Questions 1–4. Parts of this exercise involve simulations using either MODFLOW-2000 or UCODE_2005, and in Exercises 8.1c,d and 8.1f calculations are performed using the computer program OPR-PPR (Tonkin et al., in press). For students performing the simulations and calculations, instructions are available from the web site for this book listed in Chapter 1, Section 1.1.
(a) Predict advective transport. This exercise addresses Question 1 using a forward MODFLOW-2000 run with the ADV Package and with steady-state pumping imposed. The run predicts the advective-transport path originating at the proposed landfill location. The ADV Package uses particle-tracking methods comparable to those of Pollock (1994) to determine advective-transport paths. To compute a particle path, total particle movement is decomposed into displacements in the three spatial grid dimensions, resulting in three advective-transport predictions at every location of interest along a path. An additional system property, effective porosity, is needed to simulate advective transport. Effective porosity does not affect the path trajectory but does affect the particle travel time. In the model used for the exercises, the spatial grid dimensions are the x, y, and z directions. Predictions are defined for 10, 50, and 100 years of advective transport.
Thus, the ADV Package calculates nine advective-transport predictions: the transport distances in the x, y, and z directions at 10, 50, and 100 years. For this run an observation also is defined for 200 years so that the full path is simulated. Two effective-porosity parameters are defined: POR_1&2 is the porosity of the aquifers (layers 1 and 2 of the model) and POR_CB is the porosity of the confining bed. The values of POR_1&2 and POR_CB are set to 0.33 and 0.10, respectively. Output from the MODFLOW-2000 simulation describing the movement of a particle originating at the proposed landfill location is shown in Figure 8.7.
Problem: Use the information about the particle path to answer all parts of Question 1. Plot the particle path on the model grid shown in Figure 8.7b. The X-position is measured along model grid rows: X = 0 at the left and increases to the right. The Y-position is measured along model grid columns: Y = 0 at the northern boundary and increases toward the south. The Z-position is measured vertically. In this application, Z = 210 at the bottom of layer 2 and increases in the upward direction. (MODFLOW users will note that in the ADV output the Z-axis definition is opposite to that used in MODFLOW, in which the model layer numbers increase
MODEL PREDICTIONS, DATA NEEDS, AND PREDICTION UNCERTAINTY
(a) ADVECTIVE-TRANSPORT OBSERVATION NUMBER 1
PARTICLE TRACKING LOCATIONS AND TIMES:
  LAYER  ROW  COL  X-POSITION  Y-POSITION  Z-POSITION         TIME
  ----------------------------------------------------------------
    1     2    16     15500.      1500.0      100.00       0.0000
  ................................................................
  OBS # 12-14   OBS NAME: AD10x
    1     2    16     15156.      1609.3      89.366  0.31500E+09
  ................................................................
    1     2    15     15000.      1657.2      85.481  0.44658E+09
    1     3    15     14085.      2000.0      69.953  0.11164E+10
    1     3    14     14000.      2028.4      69.024  0.11668E+10
  ................................................................
  OBS # 15-17   OBS NAME: AD50x
    1     3    14     13269.      2341.2      62.686  0.15700E+10
  ................................................................
    1     3    13     13000.      2457.4      60.867  0.17072E+10
    1     4    13     12076.      3000.0      56.119  0.21508E+10
    1     4    12     12000.      3041.5      55.844  0.21813E+10
    1     4    11     11000.      3817.0      52.850  0.25811E+10
    1     5    11     10834.      4000.0      52.431  0.26481E+10
    1     6    11     10022.      5000.0      50.679  0.29476E+10
    1     6    10     10000.      5028.3      50.627  0.29548E+10
    2     6    10     9804.5      5363.8      50.000  0.30232E+10
  PARTICLE ENTERING CONFINING UNIT
  ................................................................
  OBS # 18-20   OBS NAME: A100x
    2     6    10     9804.5      5363.8      46.239  0.31500E+10
  ................................................................
    2     6    10     9804.5      5363.8      40.000  0.33604E+10
  PARTICLE EXITING CONFINING UNIT
    2     7    10     9552.7      6000.0      35.216  0.39200E+10
    2     8    10     9375.4      7000.0      22.891  0.44052E+10
    2     9    10     9379.0      8000.0      8.4270  0.45677E+10
FIGURE 8.7 (a) Part of MODFLOW-2000 List output file describing the movement of a particle that originates at the top of the cell (row 2, column 16, layer 1) containing the proposed landfill. Predictions AD10, AD50, and A100 are defined for times of 10, 50, and 100 years, respectively. (b) Diagram of the model grid for plotting the particle path (see Figure 2.1 for explanation of symbols).
with depth. The Z-axis is defined in this way so that the particle elevations are more intuitive and consistent with plotting routines.)
(b) Determine the parameters that are important to the predictions using prediction scaled sensitivities and parameter correlation coefficients. This exercise addresses Question 2 by making the two comparisons that were presented in Figure 8.2 and discussed in Section 8.2.4. First, prediction and
8.8 EXERCISES
composite scaled sensitivities are compared. Second, parameter correlation coefficients calculated using only the calibration observations are compared with those calculated with the addition of the predictions. Both sets of pcc are calculated with prior information omitted. For this problem, calculate pss that equal the percent change in the predicted quantity produced by a one-percent change in the parameter value (Eq. (8.2c)). Prediction scaled sensitivities calculated by UCODE_2005 are listed in data-exchange files. For MODFLOW-2000, the pss calculated by Eq. (8.2c) are listed in the tables of dimensionless scaled sensitivities if the statistic for calculating prediction weights is set to the predicted value (the transport distance at a given time in a given direction) and STAT-FLAG is specified as 1. Composite scaled sensitivities calculated without the advective-transport predictions are listed in the _sc output file produced by MODFLOW-2000 and UCODE_2005 and are shown in Figure 7.5b of Exercise 7.1a for the optimal parameter estimates. To obtain pcc that include only the calibration observations, the prior information needs to be omitted, which addresses one of the developer's concerns. These correlation coefficients can be calculated by starting from the optimized parameter values and completing a model run that produces the pcc without changing the parameter values. To calculate pcc for the calibration observations plus the predictions, note first that the hydrologic conditions for the calibration differ from those for the predictions. In this model, the difference is the addition of pumpage. Therefore, correctly producing the pcc requires simulating two conditions: one without pumpage and one with pumpage. The hydraulic-head and flow observations used for calibration occur during conditions without pumpage, and the advective-transport predictions occur during conditions with pumpage. Sensitivities related to
TABLE 8.3 Standard Deviations (in meters) Used to Calculate Weightsᵃ for the Advective-Transport Predictions
  Time of Advective Travel   X Direction   Y Direction   Z Direction
  10 years                   200           200           10
  50 years                   600           600           15
  100 years                  1000          1000          25
ᵃ The weights are needed to calculate parameter correlation coefficients and are determined using criterion 1 of Section 8.2.3.
calibration observations are calculated without pumpage, and those related to the predictions are calculated with pumpage. Both sets of sensitivities are used to compute the pcc. As discussed in Section 8.2.3, the weighting specified for a prediction affects the calculated pcc (Eq. (8.3)). The standard deviations used to calculate the weights for the advective travel at specified times in the three coordinate directions are listed in Table 8.3. The pss are plotted together with the css in Figure 8.8. In this figure, the scales are different for the css and pss, but this is not problematic because the analysis involves evaluating the relative values of each measure. The pss for effective porosity also are included in Figure 8.8. These parameters are not relevant to the model calibration with head and flow observations, but could be important in the calculation of advective transport times. The pss for effective porosity are computed with UCODE_2005 using central-difference perturbation. The pcc without and with predictions are shown in Tables 8.4 and 8.5, respectively.
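This is a minimal sketch of how calibration and prediction sensitivities can be combined into one set of pcc. It assumes the usual linear approximation, in which the parameter variance-covariance matrix is proportional to $(X^T \omega X)^{-1}$. The sensitivity arrays are hypothetical stand-ins for the no-pumping (calibration) and pumping (prediction) model runs. Only the 200 m standard deviation is taken from Table 8.3. The calculated error variance is left out because it cancels in the correlation coefficients.

```python
import numpy as np


def parameter_correlation(sens, weights):
    """Parameter correlation coefficients from weighted sensitivities.

    sens    -- (n_obs, n_par) sensitivities of simulated values to parameters
    weights -- (n_obs,) diagonal weights, for example 1 / sd**2
    """
    info = sens.T @ (weights[:, None] * sens)  # X^T w X
    cov = np.linalg.inv(info)                  # proportional to V(b); s**2 cancels below
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# Hypothetical sensitivities of 4 calibration observations (no pumping) to 3 parameters.
# Columns 1 and 2 are nearly proportional, so those parameters are highly correlated.
x_cal = np.array([[1.0, 2.0, 0.5],
                  [2.0, 4.1, 0.4],
                  [0.5, 1.0, 1.2],
                  [1.5, 3.0, 0.1]])
w_cal = np.ones(4)

# Hypothetical sensitivities of one advective-transport prediction (with pumping),
# weighted with the 10-year x-direction standard deviation of 200 m (Table 8.3).
x_pred = np.array([[300.0, -50.0, 800.0]])
w_pred = np.array([1.0 / 200.0**2])

pcc_cal = parameter_correlation(x_cal, w_cal)
pcc_all = parameter_correlation(np.vstack([x_cal, x_pred]),
                                np.concatenate([w_cal, w_pred]))
print(np.round(pcc_cal, 3))
print(np.round(pcc_all, 3))
```

The prediction enters with weight $1/\sigma^2$. A smaller standard deviation therefore gives the prediction more influence on the pcc, which is why the weighting choice matters.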
FIGURE 8.8 Composite and prediction scaled sensitivities for the calibrated parameters and for effective porosity. The pss for the x, y, and z grid directions are shown as the left, middle, and right columns, respectively, for each advective-travel time. The pss are defined as the percent change in advective travel caused by a one-percent change in the parameter value (Eq. (8.2c)).
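The definitions behind Figure 8.8 can be sketched in a few lines.

The pss follows the stated meaning of Eq. (8.2c): the percent change in a prediction per one-percent change in a parameter, which reduces to $(\partial z/\partial b_j)(b_j/z)$. Here it is approximated by central differences, and the 1-percent perturbation size is an assumption.

The css uses the form $css_j = [\sum_i dss_{ij}^2 / ND]^{1/2}$ with $dss_{ij} = (\partial y'_i/\partial b_j)\, b_j\, \omega_{ii}^{1/2}$.

The toy travel-distance model (uniform flow, distance $= q\,t/n_e$) is a hypothetical stand-in for the ADV Package. Its pss are +1 for the flux and −1 for effective porosity.

```python
import numpy as np


def central_difference(f, b, j, rel_step=0.01):
    """Central-difference derivative of the scalar function f at b with respect to b[j]."""
    h = rel_step * b[j]               # perturbation size is an assumption (1 percent)
    up, down = b.copy(), b.copy()
    up[j] += h
    down[j] -= h
    return (f(up) - f(down)) / (2.0 * h)


def prediction_scaled_sensitivity(f, b, j):
    """Percent change in prediction f(b) per one-percent change in b[j]."""
    return central_difference(f, b, j) * b[j] / f(b)


def composite_scaled_sensitivity(sens, b, weights):
    """css_j from an (n_obs, n_par) sensitivity matrix, parameter values, and weights."""
    dss = sens * b[None, :] * np.sqrt(weights)[:, None]
    return np.sqrt(np.mean(dss**2, axis=0))


def travel_distance(b, t_years=100.0):
    """Hypothetical uniform-flow model: b = [Darcy flux (m/yr), effective porosity]."""
    return b[0] * t_years / b[1]


b = np.array([10.0, 0.33])
pss = [prediction_scaled_sensitivity(travel_distance, b, j) for j in range(2)]
print(np.round(pss, 3))  # flux close to +1, porosity close to -1

css = composite_scaled_sensitivity(np.array([[1.0, 0.0], [0.0, 2.0]]),
                                   np.array([1.0, 1.0]), np.array([1.0, 1.0]))
```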
TABLE 8.4 Parameter Correlation Matrix Using Only the Hydraulic-Head and Flow Observations, with Prior Information Omitted, Using Final Parameter Values, and Calculated by MODFLOW-2000ᵃ
           HK_1     K_RB     VK_CB    HK_2     RCH_1    RCH_2
  HK_1     1.00                                          Symmetric
  K_RB     −0.40    1.00
  VK_CB    −0.90    0.20     1.00
  HK_2     −0.93    0.34     0.97*    1.00
  RCH_1    0.96*    −0.32    −0.97*   −0.99*   1.00
  RCH_2    −0.90    0.32     0.97*    0.996*   −0.98*   1.00
ᵃ Correlation coefficients with absolute values greater than 0.95 are marked with an asterisk (*).
TABLE 8.5 Parameter Correlation Matrix Using the Hydraulic-Head and Flow Observations and the Advective-Transport Predictions, with Prior Information Omitted, Using Final Parameter Values, and Calculated by MODFLOW-2000ᵃ
           HK_1     K_RB     VK_CB    HK_2     RCH_1    RCH_2
  HK_1     1.00                                          Symmetric
  K_RB     −0.15    1.00
  VK_CB    0.097    −0.62    1.00
  HK_2     −0.15    −0.009   0.32     1.00
  RCH_1    0.70     0.27     −0.17    −0.46    1.00
  RCH_2    0.36     −0.090   0.26     0.81     −0.23    1.00
ᵃ No correlation coefficients are greater than 0.95 in absolute value.
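Screening a correlation table for near-perfect parameter correlation is easy to automate. This sketch transcribes the lower triangles of Tables 8.4 and 8.5 and lists the parameter pairs whose pcc exceeds 0.95 in absolute value. The helper names are hypothetical. The 0.95 threshold is the one used in the table footnotes.

```python
import numpy as np

NAMES = ["HK_1", "K_RB", "VK_CB", "HK_2", "RCH_1", "RCH_2"]

# Lower triangles of Tables 8.4 and 8.5, rows in NAMES order, diagonal included.
TABLE_8_4 = [[1.00],
             [-0.40, 1.00],
             [-0.90, 0.20, 1.00],
             [-0.93, 0.34, 0.97, 1.00],
             [0.96, -0.32, -0.97, -0.99, 1.00],
             [-0.90, 0.32, 0.97, 0.996, -0.98, 1.00]]
TABLE_8_5 = [[1.00],
             [-0.15, 1.00],
             [0.097, -0.62, 1.00],
             [-0.15, -0.009, 0.32, 1.00],
             [0.70, 0.27, -0.17, -0.46, 1.00],
             [0.36, -0.090, 0.26, 0.81, -0.23, 1.00]]


def full_matrix(lower):
    """Symmetric matrix from a lower triangle given row by row (diagonal included)."""
    n = len(lower)
    m = np.zeros((n, n))
    for i, row in enumerate(lower):
        m[i, :len(row)] = row
    return m + np.tril(m, -1).T


def extreme_pairs(lower, threshold=0.95):
    """Parameter pairs whose correlation exceeds threshold in absolute value."""
    m = full_matrix(lower)
    rows, cols = np.tril_indices(len(lower), -1)
    return [(NAMES[i], NAMES[j], float(m[i, j]))
            for i, j in zip(rows, cols) if abs(m[i, j]) > threshold]


for label, table in [("Table 8.4", TABLE_8_4), ("Table 8.5", TABLE_8_5)]:
    pairs = extreme_pairs(table)
    print(label, len(pairs), pairs)
```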
Problem
• Answer Question 2 above using the css and pss in Figure 8.8 and the pcc without and with predictions in Tables 8.4 and 8.5.
• Why are the pss in Figure 8.8 equal to or very close to zero for parameters K_RB, VK_CB, and RCH_1?
• Why is the pss for POR_CB equal to zero for all predictions except A100z? Consider the location of the particle at 100 years, shown in Figure 8.7a.
• Which of the effective porosity parameters should be included in analyses of prediction uncertainty conducted to further answer Question 2, and to answer Questions 3–5? Why?
(c) Determine the parameters that are important to the predictions using the parameter–prediction statistic. This exercise addresses Question 2 using the parameter–prediction (ppr) statistic, calculated with the omission of existing prior information on parameters K_RB and VK_CB. As discussed in Section 8.2.5, the ppr statistic measures parameter importance to predictions in a manner that includes the effects of parameter uncertainty and correlation as well as prediction sensitivities. The pss shown in Figure 8.8 include only the effects of prediction sensitivities.
The program OPR-PPR (Tonkin et al., in press) is used to calculate the ppr statistic for individual parameters. For these calculations, a 10-percent reduction in parameter standard deviation is specified. Thus, the ppr statistic represents the percent decrease in the standard deviation of a prediction that is produced by a 10-percent decrease in the standard deviation of one parameter. The effective porosity of the aquifer is also included in the ppr calculation, and prior information and weighting are used to represent its uncertainty realistically. The sensitivities of the heads and flows to POR_1&2 are zero, so its uncertainty would be infinite if prior information were not imposed. The weighting used for the prior information is calculated by forming a 95-percent confidence interval of 0.27 to 0.39 for the true effective porosity value (see Guideline 6 in Chapter 11). Average ppr statistics for all advective-transport predictions are shown in Figure 8.9a. Figure 8.9b presents ppr statistics for predictions at 100 years, and Figure 8.9c shows the corresponding decreases in the prediction standard deviation. This intermediate result of the ppr statistic calculation is useful because it is important to evaluate whether a large percent reduction in a prediction standard deviation is associated with a very small absolute change in the standard deviation. If so, it might not be beneficial to use that particular ppr result to guide field data collection, despite the large value of the statistic.
Problem
• Compare the results of Figure 8.9a,b with those in Figure 8.8. What are the differences in terms of which parameters rank as most important to predicted advective travel? For future data collection, which parameters would be most beneficial to investigate further according to the ppr results?
• Explain the different rankings of parameter importance by the pss and ppr results. Consider the parameter correlations shown in Table 8.4 in answering this question. Recall that because POR_1&2 is not applicable to the model calibration, its correlation with all other parameters is zero.
• Figure 8.9c shows the standard deviation decreases associated with the ppr values in Figure 8.9b. Are the standard deviation decreases for any of the predictions small enough to suggest that it might not be beneficial to collect additional data aimed at improving that prediction? Use distances traveled derived from Figure 8.7 and system dimensions of Figure 2.1.
OPR-PPR also is used to calculate the ppr statistic for all possible groups of two parameters. This analysis is applicable if field data collection will involve simultaneously obtaining information about two parameters. A 10-percent reduction in the standard deviation of each parameter is specified, so the ppr statistic for a parameter pair represents the percent decrease in the prediction standard deviation that is produced by a 10-percent decrease in the standard deviation of each parameter in the pair. The results for the advective-transport predictions at 100 years are shown in Figure 8.9d. The ppr statistics are similar for all pairs that include K_RB or POR_1&2 and are similar for all other pairs, so average values are shown for each of these groups of parameter pairs.
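This is a minimal sketch of the idea behind the single-parameter ppr statistic. It is not the OPR-PPR code, and it rests on stated assumptions:

- The prediction standard deviation is the linear estimate $s_z = (z^T V z)^{1/2}$.
- The 10-percent reduction in one parameter's standard deviation is imposed by adding a prior-information weight $\omega_p$ to that parameter's diagonal term of $X^T \omega X$.

Adding $\omega_p$ changes the parameter variance from $V_{jj}$ to $V_{jj}/(1+\omega_p V_{jj})$. Choosing $\omega_p = (1/0.9^2 - 1)/V_{jj}$ therefore gives exactly a 10-percent reduction.

The information matrix and prediction sensitivities are hypothetical. Only the 0.27 to 0.39 porosity interval comes from the text. It is converted to a standard deviation on the assumption of a normal distribution (half-width divided by 1.96).

```python
import numpy as np


def prediction_sd(cov, z_sens):
    """Linear prediction standard deviation sqrt(z^T V z) (error variance omitted)."""
    return float(np.sqrt(z_sens @ cov @ z_sens))


def ppr_single(info, z_sens, j, sd_reduction=0.10):
    """Percent decrease in prediction sd when parameter j's sd drops by sd_reduction."""
    cov = np.linalg.inv(info)
    variance_ratio = (1.0 - sd_reduction) ** 2
    w_prior = (1.0 / variance_ratio - 1.0) / cov[j, j]  # extra prior weight on parameter j
    info_new = info.copy()
    info_new[j, j] += w_prior
    cov_new = np.linalg.inv(info_new)
    s_base = prediction_sd(cov, z_sens)
    s_new = prediction_sd(cov_new, z_sens)
    return 100.0 * (s_base - s_new) / s_base, cov, cov_new


# Prior information on effective porosity from a 95-percent interval of 0.27 to 0.39
# (normal distribution assumed).
por_sd = (0.39 - 0.27) / (2.0 * 1.96)
por_weight = 1.0 / por_sd**2

# Hypothetical information matrix for [HK_1, HK_2, POR_1&2]. Heads and flows are
# insensitive to porosity, so its only information is the prior weight.
info = np.array([[50.0, -45.0, 0.0],
                 [-45.0, 60.0, 0.0],
                 [0.0, 0.0, por_weight]])
z_sens = np.array([200.0, 150.0, -30000.0])  # hypothetical prediction sensitivities

for j, name in enumerate(["HK_1", "HK_2", "POR_1&2"]):
    ppr, _, _ = ppr_single(info, z_sens, j)
    print(f"{name}: ppr = {ppr:.1f} percent")
```

In this linear sketch the calculated error variance multiplies both standard deviations, so it does not affect the percent decrease.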
FIGURE 8.9 Parameter–prediction (ppr) statistic and intermediate calculations for evaluating the importance of model parameters to predicted advective transport. The statistic for each prediction is computed as the percent decrease in prediction standard deviation produced by a 10-percent reduction in the standard deviation of a parameter. (a) For each parameter, average ppr statistic for all predictions. (b) The ppr statistics for predicted transport in the x, y, and z directions at 100 years. The statistics for K_RB and POR_1&2 are similar, as are the statistics for all other parameters. Thus, the average value for each of these groups is shown. (c) Decreases in prediction standard deviation ($s_{z'_\ell} - s_{z'_\ell(j)}$ in Eq. (8.8)) corresponding to the ppr results in (b). (d) Average ppr statistics for evaluating the importance of pairs of parameters to predicted advective transport at 100 years.
Problem: Which parameter pairs would be most beneficial to investigate simultaneously, according to the ppr results shown in Figure 8.9d?
(d) Assess the importance of existing observations to the predictions using the observation–prediction (opr) statistic. This exercise addresses Question 3 by evaluating the relative contribution that the existing head and flow observations make toward reliably simulating the advective-transport predictions. This analysis is useful after initial model calibration. It guides further field investigation of the existing observations that rank as most important to the predictions, with the goal of ensuring that their representation in the model is as accurate as possible. For example, for the most important head observations, field work might involve more accurately measuring the screened-interval depth and areal location of the corresponding monitoring wells. This information then could be used to update the observation location in the model. The program OPR-PPR (Tonkin et al., in press) is used to calculate the opr statistic for omitting individual observations. The opr statistic values equal the percent increase in the standard deviation of a prediction that is produced by omitting one observation. This analysis includes uncertainty in parameter POR_1&2, as described in Exercise 8.1c. The results for the advective-transport predictions at 100 years are displayed in Figure 8.10a, which shows the opr statistics, and Figure 8.10b, which shows the corresponding increases in the prediction standard deviation. As discussed for the ppr statistic in Exercise 8.1c, this intermediate result of the opr statistic calculation is useful because it is important to evaluate whether a large percent increase in a prediction standard deviation is associated with a small absolute increase in the standard deviation.
Problem
• Using the opr results presented in Figure 8.10a, identify the observations that rank as most important to the predictions.
• Observations can rank as important by the opr statistic if they are sensitive to parameters to which the predictions are sensitive, or if they are sensitive to parameters that are correlated with parameters to which the predictions are sensitive. Examine the dimensionless scaled sensitivities for the observations shown in Table 7.5 and the prediction scaled sensitivities shown in Figure 8.8. Do these sensitivities help explain the importance of observations head01.ss and flow01.ss?
• Observations also can rank as important if their removal substantially increases parameter correlations, because prediction uncertainty tends to increase as these correlations increase. In the base case for the opr statistic calculations, prior information on K_RB and VK_CB is omitted, and several parameter correlations are very large in absolute value, as shown in Table 8.4. Table 8.6 summarizes the increases in these parameter correlations that occur when individual observations are omitted in the opr calculations. Use the information in this table, along with knowledge of how omission of the
FIGURE 8.10 (a) Observation–prediction (opr) statistic calculated to evaluate the importance of existing head and flow observations to predicted advective transport in the x, y, and z directions at 100 years. The statistic is computed as the percent increase in prediction standard deviation produced by omitting an observation individually. The opr statistics for observation flow01.ss and predictions A100x and A100z equal 280,000 and 5404, respectively. (b) Increase in prediction standard deviation (the difference between $s_{z'_\ell}$ and $s_{z'_\ell(+i)}$ in Eq. (8.11)) produced by omitting an observation. The increase in standard deviation for observation flow01.ss and prediction A100x equals $6 \times 10^{6}$ m. Note that this figure uses a logarithmic scale on the vertical axis.
flow observation affects parameter correlations (discussed in Exercises 4.1c, 5.1a, and 7.1f), to help explain the importance of observations head01.ss and flow01.ss by the opr analysis.
• Figure 8.10b shows the standard deviation increase associated with each opr statistic value in Figure 8.10a. Are the standard deviation increases for any
TABLE 8.6 Summary of Increases in Parameter Correlation Coefficients (pcc) Caused by Omitting Individual Observationsᵃ
  Observation   Maximum Percent Increase   Number of pcc that Increase
  Name          in Any pcc                 by More Than 1 Percent
  hd01.ss       5.8                        3
  hd02.ss       2.1                        3
  hd03.ss       0.0                        0
  hd04.ss       2.1                        3
  hd05.ss       2.6                        3
  hd06.ss       2.3                        3
  hd07.ss       0.0                        0
  hd08.ss       0.0                        0
  hd09.ss       0.5                        0
  hd10.ss       1.7                        3
  flow01.ss     7.5                        6
ᵃ The percent increases and numbers of pcc shown in the two data columns all occur for pcc that are greater than 0.90 in the base case calculation, in which no observations are omitted.
of the predictions small enough to suggest that further investigating observations for purposes of improving a particular prediction might not be warranted, despite relatively large opr statistics for the prediction?
(e) Assess the likely importance of potential new observations to the predictions using dimensionless and composite scaled sensitivities and parameter correlation coefficients. This exercise addresses Question 4 by evaluating the likely importance to the predictions of potential new observations that would be collected under pumping conditions. The potential new observations include a hydraulic head in layer 1, row 9, column 18, and a streamflow gain or loss over all river cells in column 1 of the flow model. The potential new hydraulic-head observation location was chosen because the county has access to this property and county modelers believe that a location far from the river would provide substantial information about the model parameters. The two potential observations have different units of measurement. Their relative importance can be evaluated using dimensionless scaled sensitivities, which are scaled using weights. As discussed in Chapter 3, Section 3.3.3, the statistics commonly used to determine the weights are variances, standard deviations, and coefficients of variation that reflect observation error. For potential measurements, it is preferable to use variances or standard deviations. Coefficients of variation can be used to calculate the weights only if a reasonable guess is specified for the anticipated observed value, because when coefficients of variation are used the weight is calculated as $\omega_{ii} = 1/[cv_i \, y]^2$, where y is the specified observed value. For this problem the variance of the potential head observation error is specified as 1.0025,
FIGURE 8.11 Composite scaled sensitivities (css) for the observations used in model calibration and dimensionless scaled sensitivities (dss) for two potential new observations. All sensitivities are calculated using the parameter values estimated by regression.
to be consistent with the variances of head observation errors used in the calibration. The standard deviation of measurement error for the flow is set to 0.44 m³/s. Parameter correlation coefficients with and without the potential new observations also need to be included in the analysis. They are used to evaluate whether adding potential observations reduces the values of any problematic correlations, as discussed in Section 8.3.1. The dss associated with the potential head and flow observations are plotted in Figure 8.11. Parameter correlation coefficient matrices with the potential observations are shown in Table 8.7.
Problem: Answer Question 4 by comparing the dss to the css for the existing observations and by comparing the pcc for this exercise with those evaluated in Exercise 8.1b.
(f) Assess the likely importance of potential new observations to the predictions using the observation–prediction (opr) statistic. This exercise addresses Question 4 by using the opr statistic to calculate the decrease in prediction uncertainty caused by adding potential new observations collected under pumping conditions. First, opr statistics are calculated for the potential head and flow data described in Exercise 8.1e. Second, the opr statistic is calculated for the case of individually adding a new head observation in each cell of the model domain. This latter analysis identifies all areas of the domain that would be good candidates for new head observation locations, in terms of improving the advective-transport predictions. Both of these analyses can be completed using the program OPR-PPR (Tonkin et al., in press). Both analyses include uncertainty in parameter POR_1&2, as described in Exercise 8.1c.
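The weighting rules for the potential observations can be written as small functions. This sketch does three things:

- It computes weights for the two potential observations from the values given in the text: the head-error variance of 1.0025 (units assumed to be m²) and the flow-error standard deviation of 0.44 m³/s.
- It shows why a coefficient of variation needs a guessed observed value.
- It forms dimensionless scaled sensitivities, $dss = (\partial y'/\partial b)\, b\, \omega^{1/2}$, so the head and flow can be compared on one scale.

The parameter value and the two sensitivities are hypothetical.

```python
import math


def weight_from_variance(variance):
    return 1.0 / variance


def weight_from_sd(sd):
    return 1.0 / sd**2


def weight_from_cv(cv, y_guess):
    """The weight depends on the guessed observed value y: 1 / (cv * y)**2."""
    return 1.0 / (cv * y_guess) ** 2


def dss(sensitivity, param_value, weight):
    """Dimensionless scaled sensitivity (dy/db) * b * sqrt(weight)."""
    return sensitivity * param_value * math.sqrt(weight)


w_head = weight_from_variance(1.0025)  # head error variance from the text (units assumed m^2)
w_flow = weight_from_sd(0.44)          # flow error standard deviation, m^3/s

# Hypothetical: one parameter with value 1.0e-4 and the sensitivity of each
# potential observation to it (units of observation per unit parameter).
b = 1.0e-4
dss_head = dss(-2.0e4, b, w_head)
dss_flow = dss(150.0, b, w_flow)
print(f"dss head = {dss_head:.3f}, dss flow = {dss_flow:.4f}")

# A coefficient of variation gives different weights for different guesses of y.
print(weight_from_cv(0.1, 2.0), weight_from_cv(0.1, 4.0))
```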
TABLE 8.7 Parameter Correlation Matrices for Final Parameter Values Calculated by MODFLOW-2000 Using the Existing Hydraulic-Head and Flow Observations Under Conditions of No Pumping Together with, Under Pumping Conditions, (a) Only the Potential Flow Observation, (b) Only the Potential Hydraulic-Head Observation, and (c) Both Potential Observationsᵃ
(a) Only the Potential Flow Observation
           HK_1     K_RB     VK_CB    HK_2     RCH_1    RCH_2
  HK_1     1.00                                          Symmetric
  K_RB     −0.42    1.00
  VK_CB    −0.93    0.20     1.00
  HK_2     −0.96*   0.34     0.97*    1.00
  RCH_1    0.97*    −0.32    −0.97*   −0.996*  1.00
  RCH_2    −0.94    0.32     0.97*    0.998*   −0.99*   1.00
(b) Only the Potential Hydraulic-Head Observation
           HK_1     K_RB     VK_CB    HK_2     RCH_1    RCH_2
  HK_1     1.00                                          Symmetric
  K_RB     −0.27    1.00
  VK_CB    −0.56    −0.38    1.00
  HK_2     −0.98*   0.18     0.64     1.00
  RCH_1    0.96*    −0.062   −0.61    −0.95    1.00
  RCH_2    −0.93    0.093    0.63     0.97*    −0.97*   1.00
(c) Both Potential Observations
           HK_1     K_RB     VK_CB    HK_2     RCH_1    RCH_2
  HK_1     1.00                                          Symmetric
  K_RB     −0.33    1.00
  VK_CB    −0.43    −0.42    1.00
  HK_2     −0.96*   0.21     0.54     1.00
  RCH_1    0.93     −0.060   −0.50    −0.91    1.00
  RCH_2    −0.89    0.10     0.53     0.95     −0.95    1.00
ᵃ Correlation coefficients with absolute values greater than 0.95 are marked with an asterisk (*).
Results of the first analysis are presented in Figure 8.12, which shows the opr statistic calculated for each potential observation and each of the nine advective-transport predictions.
Problem
• Answer Question 4 by evaluating the opr statistics shown in Figure 8.12.
• Why does the potential head observation have larger values of the opr statistic for all predictions than does the potential flow observation? Consider the parameter correlation coefficients shown in Table 8.7.
• Why is the potential flow observation relatively unimportant to the predictions, in contrast to the very large importance of the existing flow observation shown in Figure 8.10a?
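The opr statistics in Figures 8.10 and 8.12 compare a linear prediction standard deviation with and without one observation. This is a minimal sketch of that comparison. It assumes $s_z = (z^T (X^T \omega X)^{-1} z)^{1/2}$ with the error variance held fixed, so that it cancels. It uses hypothetical sensitivities rather than the OPR-PPR program or its exact equations, and the function names are illustrative.

```python
import numpy as np


def prediction_sd(sens, weights, z_sens):
    """Linear prediction sd sqrt(z^T (X^T w X)^-1 z), with the error variance omitted."""
    info = sens.T @ (weights[:, None] * sens)
    return float(np.sqrt(z_sens @ np.linalg.solve(info, z_sens)))


def opr_add(sens, weights, new_row, new_weight, z_sens):
    """Percent decrease in prediction sd caused by adding one observation."""
    base = prediction_sd(sens, weights, z_sens)
    added = prediction_sd(np.vstack([sens, new_row]),
                          np.append(weights, new_weight), z_sens)
    return 100.0 * (base - added) / base


def opr_omit(sens, weights, i, z_sens):
    """Percent increase in prediction sd caused by omitting observation i."""
    base = prediction_sd(sens, weights, z_sens)
    keep = np.arange(len(weights)) != i
    omitted = prediction_sd(sens[keep], weights[keep], z_sens)
    return 100.0 * (omitted - base) / base


# Hypothetical sensitivities of 4 existing observations to 2 parameters, unit weights,
# and a prediction whose sensitivities are z_sens.
x = np.array([[1.0, 0.2], [0.8, 0.1], [0.3, 1.0], [0.5, 0.4]])
w = np.ones(4)
z_sens = np.array([2.0, -1.0])

print("add:", round(opr_add(x, w, np.array([1.0, -1.0]), 1.0, z_sens), 1))
print("omit:", [round(opr_omit(x, w, i, z_sens), 1) for i in range(len(w))])
```

A map like Figure 8.13 could be built the same way. Call `opr_add` once per model cell, using the sensitivities of a hypothetical head observation at that cell.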
FIGURE 8.12 Observation – prediction (opr) statistic calculated to evaluate the importance of the potential head and flow observations to predicted advective transport in the x, y, and z directions at 10, 50, and 100 years. The statistic is computed as the percent decrease in prediction standard deviation produced by adding either the head or the flow observation.
Results of the second analysis are presented in Figure 8.13. To understand these results, examine the reduction in parameter correlation coefficients (pcc) caused by adding a potential hydraulic-head observation at any cell throughout the model domain. The reductions in pcc are summarized in Figure 8.14, which shows the maximum percent reduction in any pcc when a hydraulic-head observation is added at a particular cell center.
Problem
• Examine Figure 8.13 and identify the best locations for collecting additional hydraulic-head data, from the perspective of reducing the uncertainty in predicted advective transport at 100 years.
• How do the percent reductions in parameter correlations shown in Figure 8.14 help explain the results in Figure 8.13? In Figure 8.14, the maximum percent reduction in correlation is almost always associated with parameter VK_CB. Use your knowledge about the flow system to help explain why adding a hydraulic head most reduces the correlations for this parameter.
Exercise 8.2: Prediction Uncertainty Measured Using Inferential Statistics
This exercise involves computing both linear and nonlinear confidence intervals on the predicted advective transport at 10, 50, and 100 years. Uncertainty in parameter POR_1&2 is included in these analyses, through prior information and weighting, as explained in Exercise 8.1c.
FIGURE 8.13 Observation-prediction (opr) statistics (Eq. (8.9)), showing the percent decrease in prediction standard deviation likely to be produced by collecting one new hydraulic-head observation under pumping conditions. New observations are located at each node in model layer 1. For each node, the new head is added to the existing 11 observations and the opr statistic is calculated. This procedure is repeated for all nodes and the resulting values are contoured. The contours range from 10 to 80 percent. The opr values plotted are averaged over the three advective transport directions for which the ADV Package produces results.
(a) Calculate linear confidence intervals on the components of advective transport. This exercise addresses Question 5, using linear confidence intervals on the simulated advective-transport predictions to quantify their uncertainty. For students performing the simulations, instructions for calculating the intervals are available from the web site for this book listed in Chapter 1, Section 1.1. Linear individual 95-percent confidence intervals and linear simultaneous (Bonferroni) 95-percent confidence intervals in the x and y directions are shown on a map of model layer 1 in Figure 8.15a,b. Bonferroni simultaneous intervals are used instead of Scheffé simultaneous intervals because, for the finite number of intervals of interest here, the Bonferroni intervals are smaller (see discussion in Section 8.4.1). The intervals in the z direction are shown in Figure 8.16.
Problem
• Explain conceptually why the linear simultaneous intervals are larger than the linear individual intervals. Of these two linear confidence intervals, which might be the preferred representation of uncertainty? Why?
• Answer Question 5 above using the linear confidence intervals on the advective-transport predictions.
FIGURE 8.14 Maximum percent reduction in any parameter correlation coefficient (pcc), produced by adding one new hydraulic-head observation under pumping conditions. New observations are located at each node in model layer 1. For each node, the new head is added to the existing 11 observations and the pcc is calculated, and the percent decrease from the base case pcc (with only the existing 11 observations) is calculated. This procedure is repeated for all nodes and the resulting values are contoured. The contours range from 10 to 40 percent. The maximum percent reduction for each node is determined using only base case pcc that are greater than 0.90.
• Do the confidence intervals make sense? Is there anything surprising about them? Note that because they are linear, these intervals sometimes do not account for the physics of the problem. They can include values that are physically implausible, such as predicted advective-transport values that lie outside the model domain. This is typical of linear intervals.
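The linear intervals in Exercise 8.2a have the standard form $z \pm t\, s_z$, where $t$ is a Student-t critical value. A Bonferroni simultaneous interval for $k$ predictions uses the significance level $\alpha/k$ in place of $\alpha$.

This sketch uses only the Python standard library, with a numerically integrated t distribution, so it is self-contained. Scheffé intervals would need an F critical value and are not shown. The prediction value 9804.5 m is the A100x position listed in Figure 8.7a. The standard deviation and degrees of freedom are hypothetical.

```python
import math


def t_cdf(x, nu, n=2000):
    """Student t CDF by Simpson's rule on [0, |x|]; n must be even."""
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)

    def pdf(u):
        return c * (1 + u * u / nu) ** (-(nu + 1) / 2)

    b = abs(x)
    h = b / n
    total = pdf(0.0) + pdf(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, nu):
    """Quantile for 0.5 < p < 1 by bisection on t_cdf."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def linear_intervals(z, sd, nu, k, alpha=0.05):
    """Individual and Bonferroni (k simultaneous intervals) linear confidence intervals."""
    t_ind = t_quantile(1 - alpha / 2, nu)
    t_bon = t_quantile(1 - alpha / (2 * k), nu)
    return (z - t_ind * sd, z + t_ind * sd), (z - t_bon * sd, z + t_bon * sd)


# A100x from Figure 8.7a; sd = 2000 m and nu = 10 are hypothetical; k = 9 predictions.
individual, bonferroni = linear_intervals(9804.5, 2000.0, nu=10, k=9)
print("individual:", [round(v) for v in individual])
print("Bonferroni:", [round(v) for v in bonferroni])
```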
(b) Calculate nonlinear confidence intervals on the components of advective transport. This exercise revisits Question 5, using nonlinear individual and simultaneous (Scheffé d = NP) confidence intervals on the components of simulated advective transport. The nonlinear 95-percent confidence intervals in the x and y directions are shown on a map of model layer 1 in Figure 8.15c,d. The intervals in the z direction are shown in Figure 8.16. The nonlinear intervals were calculated using UCODE_2005. In Figures 8.15c and 8.15d, dashed lines are drawn on the confidence intervals calculated for the particle position at 100 years. The interval limits involved are simulated using parameter values that cause the advective-transport path to reach the well
FIGURE 8.15 Plan view of the model grid showing the predicted advective-transport path from the proposed landfill and 95-percent confidence intervals in the x and y directions at simulated travel times of 10, 50, and 100 years (locations labeled on each map). The true path also is shown. (a) Linear individual confidence intervals. (b) Linear simultaneous (Bonferroni) confidence intervals. (c) Nonlinear individual confidence intervals. (d) Nonlinear simultaneous (Scheffé d = NP) intervals. (a) and (b) At 100 years, the linear intervals in the x and y directions extend outside the model domain. In (a), the upper limit in the x direction is 18,930 m and the limits in the y direction are −17,680 m and 28,390 m. In (b), the limits in the x direction are −5,430 m and 25,040 m and those in the y direction are −33,110 m and 43,820 m. (c) and (d) The dashed lines at 100 years reflect projections simulated by the ADV Package when a particle exits the model. See discussion in text about this and about the intervals in (d) at 50 and 100 years that extend to the river.
FIGURE 8.16 Predicted z locations of the advective-transport path, linear individual and simultaneous (Bonferroni) 95-percent confidence intervals, and nonlinear individual and simultaneous 95-percent confidence intervals at simulated travel times of 10, 50, and 100 years. True z locations also are shown. The bottom of the model domain lies at an elevation of −10 m. For prediction A100z, the limits of the linear individual intervals are −66 m and 152 m. The limits of the linear simultaneous intervals are −141 m and 234 m.
prior to 100 years. When a particle reaches a flow-model boundary before the prediction time, the ADV Package projects the particle until the prediction time is reached, using the particle velocity when the particle exits the model (Anderman and Hill, 2001, p. 12). As they explain, this procedure is very useful when estimating parameter values because it keeps the sensitivities informative. In the model considered here, the large velocity at the pumping well causes the particle to be projected a considerable distance in the y direction after entering the well, and the resulting interval limit is not meaningful. A better approximation of the confidence interval limit in that direction is the location where the particle left the system. Here, that location is the well, as indicated by the solid lines in Figure 8.15c,d. Other projected particles are not apparent in Figure 8.15d, but they affect the output files used to construct the figure. In Figure 8.15d, the confidence intervals in the x direction at 50 and 100 years extend to the river. These limits are simulated using parameter values that cause the advective-transport path to reach the river before the prediction time of 50 or 100 years. The ADV Package projects the particle
until the prediction time is reached. The interval limit printed in the output file falls outside the model domain, which is not meaningful. In this case, the appropriate limit at both 50 and 100 years is the location of the river, as shown in Figure 8.15d. Dashed lines are not needed because interval limits outside the model domain are not plotted. Problem: Reevaluate the answer to Question 4 considering the nonlinear confidence intervals on the components of advective transport. Is the answer based on the nonlinear intervals different from that based on linear intervals?
9 CALIBRATING TRANSIENT AND TRANSPORT MODELS AND RECALIBRATING EXISTING MODELS
The methods presented in Chapters 3 to 8 are applicable to models of any system. However, there are special considerations when applying the methods to certain types of models. This chapter discusses three types of models that are of special interest to many scientific and engineering fields: transient models, transport models, and existing models that may need to be recalibrated.
9.1 STRATEGIES FOR CALIBRATING TRANSIENT MODELS
In many natural and engineered systems, conditions change with time. When simulating these transient systems, it is important to carefully consider (1) initial conditions, (2) representation and weighting of transient observations used for model calibration, and (3) definitions of model inputs and parameters added for the transient simulation.
9.1.1 Initial Conditions
For transient models, initial conditions define the system state at the beginning of the simulation. In some situations, such as for atmospheric systems, conditions change very quickly and there is no choice but to begin the simulation with initial conditions derived directly from measurements. In this case, the specified initial state of the system generally is inconsistent to some degree with the conservation equations and properties of the model. That is, the initial simulated processes and properties result in a simulated system state that differs from the initially specified state. When the simulation is started, the simulated state changes from the initially specified state to become consistent with the simulated initial processes and properties and ongoing stresses. For a system in which conditions change rapidly, the inconsistencies between the initially specified state and the simulated initial processes and properties generally are not problematic because the ongoing stresses soon dominate the solution. As a result, comparing observed and simulated values becomes meaningful after a relatively short simulated time. Other systems, like most groundwater flow systems, change slowly with time. For these systems it is preferable, and often possible, to begin the transient simulation with initial conditions produced by a steady-state simulation. This ensures initial conditions that are consistent with all simulated processes and properties except for the imposed transient stresses (Franke et al., 1987). This consistency is important, because inconsistencies in initial conditions can endure for long periods of simulated time in slowly changing systems. From the perspective of model calibration, enduring inconsistencies can affect model fit and, therefore, estimated parameter values and model design. If initial conditions for slowly changing systems are derived directly from measurements, it is critical to evaluate how any enduring inconsistencies affect the model, the estimated parameters, and predictions of interest. Often, the effect of imposed initial conditions can be evaluated using a simulation in which transient stresses and processes are omitted from the simulation.
Effective Groundwater Model Calibration: With Analysis of Data, Sensitivities, Predictions, and Uncertainty. By Mary C. Hill and Claire R. Tiedeman. Published 2007 by John Wiley & Sons, Inc.
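The check described above can be illustrated with a hypothetical single-cell aquifer. The cell receives recharge R, is pumped at rate Q, and drains to a river through a linear resistance c, so that S dh/dt = R − Q − (h − h_river)/c. The model, its function name, and every parameter value are illustrative assumptions and do not come from the text. The measured initial head is set 0.6 m above the model's own steady state. The run without pumping therefore shows how much the heads change only because the model is adjusting to the imposed initial condition. That change is then compared with the drawdown caused by pumping alone.

```python
import numpy as np


def simulate_heads(h_init, recharge, pumping, storage=0.1, resistance=1000.0,
                   h_river=100.0, dt=1.0, n_steps=365):
    """Explicit-Euler heads (m) for a hypothetical single-cell aquifer.

    Solves S * dh/dt = R - Q - (h - h_river) / c, with recharge R and pumping Q
    in m/d, drainage resistance c in days, and storage coefficient S.
    The scheme is stable here because dt / (S * c) = 0.01 < 1.
    """
    heads = np.empty(n_steps + 1)
    heads[0] = h_init
    for k in range(n_steps):
        outflow = (heads[k] - h_river) / resistance
        heads[k + 1] = heads[k] + dt * (recharge - pumping - outflow) / storage
    return heads


RECHARGE, PUMPING = 0.001, 0.0005       # m/d, illustrative values
H_STEADY = 100.0 + 1000.0 * RECHARGE    # model's own steady state without pumping
H_MEASURED = 101.6                      # hypothetical measured initial head (m)

# Transient stress omitted: every head change is the model adjusting
# to the imposed (measured) initial condition.
drift = simulate_heads(H_MEASURED, RECHARGE, 0.0)
drift_change = drift[0] - drift[-1]

# Drawdown caused by pumping alone, starting from consistent steady-state heads.
pumped = simulate_heads(H_STEADY, RECHARGE, PUMPING)
unpumped = simulate_heads(H_STEADY, RECHARGE, 0.0)
pumping_drawdown = unpumped[-1] - pumped[-1]

print(f"head change from initial-condition drift after 1 year: {drift_change:.3f} m")
print(f"drawdown caused by pumping after 1 year:                {pumping_drawdown:.3f} m")
```

After one year, the head change caused only by the inconsistent initial condition is about 0.58 m. This exceeds the pumping drawdown of about 0.49 m. In this toy setting, head changes simulated from the measured initial state therefore could not be compared meaningfully with observed drawdowns.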
For example, in a groundwater flow simulation, if the transient stress is pumpage and the predictions of interest are changes in head (drawdowns), it is important to simulate the system without any transient changes in pumpage to determine the head changes simulated as the model adapts to the imposed initial conditions. If the head changes are large relative to the head changes simulated with pumpage, comparisons of observed and simulated drawdowns can be meaningless, and attempts to calibrate the model using such comparisons can lead to a badly flawed model. Finally, for slowly changing systems it is important to reevaluate the effects of imposed initial conditions as regression proceeds and model construction and parameter values change.
9.1.2 Transient Observations
Four issues important to using transient observations are discussed in this book: the effects of imposed initial conditions, inconsistencies in observed and simulated temporal effects, weighting of transient observations, and temporal differencing. The first issue was discussed in Section 9.1.1. The other three are introduced here and are also addressed in Guidelines 4 and 6 of Chapter 11 and the field examples presented in Chapter 15.
Observed and Simulated Temporal Effects
When temporally varying observations are used to calibrate models of transient systems, it is important to consider whether the transient effects contained in the observations are consistent with the
simulated transient processes. For example, consider water-level data collected in wells on a daily or weekly basis by an automated recording device and used in a model for which simulated transient effects are limited to pumpage and recharge that vary seasonally. The data are likely to be affected by hydrologic processes that are not simulated in the model, such as water-table fluctuations due to short-term hydrologic events like storms or daily variations in pumping rates. Using the numerous high-frequency data as observations can require substantial data-preparation effort and make it difficult to evaluate the aspects of model fit most important to the designed purpose of the model. It can also result in (a) optimized parameter estimates that are unreasonable because they are making up for unrepresented processes, (b) poor fit to observations, with simulated values that are often consistently above or below the observations, and/or (c) problems with convergence of the parameter-estimation iterations. Using observations that are consistent with the simulated processes avoids these difficulties. We call these time-consistent observations. While it can be difficult to obtain completely time-consistent observations, the attempt itself yields substantial understanding of the data and the model, and sufficient consistency generally is achievable. In the example above, for which the data are far more frequent than the simulated processes, time-consistent observations can be obtained by temporally averaging observations or by selectively using observations that are thought to be representative for each season. Clearly describing the criteria used to obtain the observations from the data is critical to the integrity of any model. If the data are affected by unrepresented, high-frequency transient processes but the data are infrequently collected, other possibilities need to be considered.
In some cases, the time of year or time of day can be used to determine if the observation is likely to be different from the desired time-average quantity, and time-consistent observations can be obtained by making appropriate adjustments. More frequent data in nearby locations or similar circumstances sometimes can be used to support adjustments. If there is no way to know whether a value is too high or too low, no bias can readily be identified, and the data can be used directly as an observation. In all cases, the weighting of the observations needs to reflect the analysis of errors conducted, as discussed in the next section. A common situation is that data affected by transient processes are used to calibrate a steady-state model. This situation is discussed in Chapter 15, Section 15.2.1. An example of using transient data to define the time discretization of simulated processes to obtain time-consistent observations is presented by Bravo et al. (2002). They used this approach to calibrate a model of steady-state groundwater flow and transient heat transport in a wetland system. Frequency-domain analysis of water-level and temperature data revealed time periods when heat transport showed transient variation yet groundwater flow could be assumed to be at steady state. Calibration of the model over these time periods using the water-level and temperature observations allowed estimation of average rates of flow from the wetland to the groundwater system.
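When data are far more frequent than the simulated processes, temporal averaging is one way to obtain time-consistent seasonal observations. Below is a minimal sketch that assumes daily water levels. It defines seasons as December to February, March to May, June to August, and September to November, with December assigned to the following year's winter. The season definition, the function name, and the synthetic record are all hypothetical. Each seasonal mean is returned together with the number of days it summarizes, which helps in documenting how the observations were derived from the data.

```python
from datetime import date, timedelta

import numpy as np

# Assumed season definition (month number -> season label).
SEASONS = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
           6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def seasonal_means(start, daily_levels):
    """Average daily water levels into (year, season) observations.

    December is assigned to the winter (DJF) of the following year.
    Returns {(year, season): (mean level, number of daily values)}.
    """
    groups = {}
    for offset, level in enumerate(daily_levels):
        day = start + timedelta(days=offset)
        year = day.year + 1 if day.month == 12 else day.year
        groups.setdefault((year, SEASONS[day.month]), []).append(level)
    return {key: (float(np.mean(values)), len(values))
            for key, values in groups.items()}


# Hypothetical record: a seasonal signal plus short-lived storm responses
# that the model does not simulate.
start = date(2001, 3, 1)
t = np.arange(365)
levels = 50.0 + 0.5 * np.sin(2.0 * np.pi * t / 365.0)
levels[::30] += 0.8                      # one storm response every 30 days

for (year, season), (mean, n_days) in seasonal_means(start, levels).items():
    print(f"{season} {year}: {mean:.3f} m from {n_days} daily values")
```

Averaging dilutes the unsimulated storm responses but does not remove them. Whether the remaining bias matters is a judgment that should be made and documented as the text describes.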
Weighting Transient Observations
When determining weights for temporal observations, it is important to assess whether observation errors are correlated or are independent over time. Observation errors can be classified as presented in Appendix A, Section A.3, in the discussion of the fifth assumption required for diagonal weights to be correct. Briefly, the classes are (1) constant over time (perfectly correlated), (2) correlated over time (autocorrelated), and (3) random over time (uncorrelated). For measurements of hydraulic head over time in a single monitoring well, examples of these types of error are as follows. Well elevation measurement error is constant in time, because the elevation is typically measured once and used to calculate hydraulic head whenever water levels in the well are measured. Temporally correlated errors include the transient drift of the reference point of a pressure transducer and the lag due to mechanical friction of a float-type measuring device (Rosenberry, 1990). Random errors can be generated by procedures and devices used to measure the depth to water in a well. Determining each of these types of error for a given observation requires careful consideration of the measurements that contribute to each observation and the errors associated with each measurement. As discussed in Guideline 6 in Chapter 11, variances of these individual measurement errors can be added to obtain the total variance for the observation.
Temporal Differencing
Differencing is a useful method for addressing errors that are constant over time. In groundwater modeling, differencing is commonly used because often the drawdown of hydraulic head caused by pumping is the observation of greatest interest. Drawdown equals the difference in hydraulic heads measured at two different times. In other scientific fields the system stress may be different, but often the change caused by some stress is of interest.
In these circumstances it is desired not only to match the measured values, but also to match the changes in these values. Differences can be used to emphasize these changes and to achieve a simpler diagonal weight matrix. Differencing can produce a much more effective set of calibration observations in some circumstances. See Assumption 5 in Appendix A for more information.
9.1.3 Additional Model Inputs
A final consequence of a transient model is that, in most systems, the transition from steady state to transient simulation requires additional model inputs. In groundwater systems, the primary additional inputs include (1) storage coefficients; (2) stresses that vary over time, such as recharge and pumping rates; and (3) boundary conditions that vary over time, such as lake or river levels. These properties can be parameterized and estimated in the same manner as for system properties that are invariant in time, as suggested by the methods and guidelines presented in this book. As for time-invariant model inputs, a common calibration issue for transient properties is that the potential number of parameters generally is large. For example, for a transient groundwater flow model with a total simulation time of 10 years and with recharge from precipitation that varies on a monthly basis, there are potentially
120 different recharge parameters even before spatial variability is considered. It is not likely that the observation data available for calibration would support independent estimation of all of these parameters. One strategy is to estimate an initial distribution of recharge over time, using site data on climate, geology, vegetation, and so on, and then define one or more multiplicative parameters that scale this initial distribution of recharge (e.g., Tiedeman and Gorelick, 1993). The parameters have initial values of 1.0, and regression is used to estimate their optimal values. In some models of transient systems, only a subset of the simulated stresses vary in time. For example, in a transient model of a regional groundwater system in the arid environment of the Albuquerque Basin, Tiedeman et al. (1998b) simulated historical pumping rates that varied by season and used transient observations for calibration. Recharge and regional groundwater inflow were simulated as temporally constant, representing average annual values over the calibration period. This approach was reasonable because the typically large depth to the water table resulted in small seasonal recharge effects. A similar method was employed by Faunt et al. (2004).
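Below is a minimal sketch of the multiplicative-parameter strategy for the 10-year, monthly-recharge example. An initial distribution of 120 monthly recharge values is scaled by a small number of multipliers, and only the multipliers would be estimated by regression. The synthetic initial distribution and the grouping into a wet-season multiplier (November to April) and a dry-season multiplier (May to October) are hypothetical. Neither is the grouping used in the studies cited above.

```python
import numpy as np

N_YEARS = 10
months = np.arange(N_YEARS * 12)             # 120 monthly stress periods
month_of_year = months % 12                   # 0 = January

# Hypothetical initial recharge estimate (m/d), standing in for one derived
# from site data on climate, geology, vegetation, and so on.
initial_recharge = 0.001 * (1.0 + 0.5 * np.cos(2.0 * np.pi * month_of_year / 12.0))

# Assumed grouping: November-April share a "wet" multiplier,
# May-October a "dry" multiplier.
WET, DRY = 0, 1
group = np.where((month_of_year >= 10) | (month_of_year <= 3), WET, DRY)


def recharge_from_multipliers(multipliers):
    """Scale the initial recharge distribution by one multiplier per group."""
    return initial_recharge * np.asarray(multipliers)[group]


start = np.array([1.0, 1.0])                  # multipliers start at 1.0
print(f"{months.size} recharge values controlled by {start.size} parameters")
print("starting values reproduce the initial distribution:",
      np.allclose(recharge_from_multipliers(start), initial_recharge))
```

How many groups the observations can support is the kind of question the sensitivity analysis methods in this book are intended to help answer.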
9.2 STRATEGIES FOR CALIBRATING TRANSPORT MODELS
Predicting transport of introduced or naturally occurring constituents is of interest in many systems. Preferably, transport observations, such as concentrations or concentrations summarized as advective-travel observations, moment observations, and so on, are available for model calibration. However, sometimes only observations related to the flow system are available for model calibration, and the model is used to predict some aspect of transport. This circumstance is difficult in that predictive capability can be compromised. However, models are still the only way to introduce conservation principles into the analysis of complicated natural environments, and, as discussed in Chapter 1, calibration is an important option in many circumstances. The difficulties involved in simulating concentration with models developed using only flow-system observations make the sensitivity analysis and uncertainty evaluation methods described in this book especially critical to pursue in these circumstances. When calibrating transport models, it is important to carefully consider (1) what transport processes to include, (2) definition of the source, (3) scale issues, (4) numerical accuracy and execution time, (5) representation and weighting of transport observations used for model calibration, and (6) additional model inputs. These issues are addressed in the following sections. We also give some examples of how to obtain a tractable, useful model.
9.2.1 Selecting Processes to Include
The transport of constituents in natural and engineered systems is the result of many processes, generally including advection, transient changes in advection, dispersion, retardation, and chemical reaction with other dissolved or particulate transported
constituents and the surrounding material (e.g., rocks and sediments in groundwater systems). In addition, variations in fluid density and temperature can produce feedback effects on the flow field. Also, multiple phases such as nonaqueous phase liquids (NAPLs), alternate transport mechanisms such as colloids, and living particles such as viruses can be simulated. Execution times generally increase dramatically as more processes are included. The importance of thinking carefully about execution times is discussed in Chapter 15, Section 15.1. The options for simulating transport are generally as follows:
Option 1. Use advection to approximate transport travel time and direction.
Option 2. Use advection and dispersion to simulate the arrival of low concentrations at the plume front, the concentration and time at the plume peak, and/or enduring low concentrations at the plume tail.
Option 3. Use advection, dispersion, reactions, and other mechanisms to account for additional processes that can affect arrival times, peak concentration times, and duration of low concentrations.
A model calibration effort need not use just one of the options mentioned. As suggested by Anderman et al. (1996), it can be advantageous to begin with just advective transport (option 1) to obtain the correct transport direction and timing. Progressively more processes can then be developed and tested against the available data and can be evaluated on the basis of the importance of the additional processes to predictions. This progression is consistent with Guideline 1 of Chapter 11. Decisions about which transport processes to simulate depend on many things (e.g., see Zheng and Bennett, 2002, Chapter 7). Here we are concerned with how observations and parameters can be defined to achieve a tractable model calibration problem and a useful model. Relevant issues are discussed and examples are presented in Sections 9.2.2–9.2.7.
9.2.2 Defining Source Geometry and Concentrations
When simulating the transport of contaminants from a disposal source, the location at which the disposal occurred and the release history of contamination are important to simulated concentrations. To the extent that simulated source characteristics are incorrect, parameter estimates that produce a good match to measured concentrations might be unattainable or, even if a good match is achieved, may be unable to produce good predictions for other circumstances. In some situations, the source characteristics are well known and can be used directly in the model. Even in these situations, however, difficulties can occur because local flow characteristics associated with source emplacement may not be simulated and/or the extreme concentration gradients at the source cause unrealistic numerical spreading of the plume at early time (e.g., LeBlanc and Celia, 1991; Zhang et al., 1998; Barlebo et al., 2004).
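Below is a hypothetical one-dimensional example of how a sharp front, such as one at a source, can spread numerically. The front is solved with explicit first-order upwind differencing, one of the simplest finite-difference schemes. This is a swapped-in illustration, not the scheme of any code cited in this chapter. No physical dispersion is simulated. With a Courant number Cr = vΔt/Δx of exactly 1, the front moves without spreading. With Cr < 1, it is smeared over several cells. For this scheme the numerical dispersion coefficient is approximately vΔx(1 − Cr)/2. The grid size and step counts are arbitrary choices.

```python
import numpy as np


def upwind_advection(courant, n_cells=60, n_steps=40):
    """Advect a unit step entering at the left boundary with explicit upwind
    differencing; no physical dispersion is included."""
    c = np.zeros(n_cells + 1)
    c[0] = 1.0                               # constant-concentration source cell
    for _ in range(n_steps):
        # c_i(new) = c_i - Cr * (c_i - c_{i-1}); the right-hand side uses old values
        c[1:] = c[1:] - courant * (c[1:] - c[:-1])
    return c


def smeared_cells(c, low=0.05, high=0.95):
    """Count cells whose concentration lies strictly between low and high."""
    return int(np.sum((c > low) & (c < high)))


for courant in (1.0, 0.5, 0.25):
    n_steps = int(round(20 / courant))       # same front travel distance: 20 cells
    c = upwind_advection(courant, n_steps=n_steps)
    print(f"Cr = {courant:4}: {smeared_cells(c)} cells with 0.05 < c < 0.95")
```

In this setup, the Cr = 1 run keeps a perfectly sharp front. The smaller Courant numbers spread it over roughly ten cells, even though the simulated physical dispersion is zero.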
In other situations, the source location and history are poorly known. An alternative to specifying the source characteristics is to simultaneously estimate its location and release history along with model parameter values. Wagner (1992) and Mahar and Datta (2001) present methods for this simultaneous estimation problem. Applications of their methods to synthetic examples show that the approach has promise, but that nonuniqueness can be problematic. Simultaneously estimated, unique values could only be achieved with constraints imposed on the parameter values and the source characteristics. When the source location is well known, but the source history is not, nonuniqueness can be less problematic: Sonnenborg et al. (1996) and Medina and Carrera (1996) successfully estimated source concentrations along with flow and transport parameters. An advantage of defining parameters that represent the source characteristics is that sensitivity analysis methods can be used to investigate the importance of the parameters to model fit to observations and to model predictions.
9.2.3 Scale Issues
In addition to carefully considering which transport processes to simulate and which parameters to estimate, it also is important to recognize that features that are inconsequential to flow may be important to transport. This has been demonstrated for groundwater systems by a number of authors, including Poeter and Gaylord (1990), Zheng and Gorelick (2003), and De Marsily et al. (2005). Both unresolved smaller-scale features and misrepresented larger-scale features are of concern. Calibration of transport models with unresolved and misrepresented features can cause problems with the regression and its results. For instance, the regression may estimate unreasonable parameter values to compensate for the inaccurate representation of system features, produce a poor fit to transport observations, and/or have difficulty converging to optimal parameter estimates. Scale issues are problematic for groundwater models because (1) the variability involves subsurface materials whose properties can vary by many orders of magnitude, (2) only a small portion of the subsurface material of any system is measured or measurable using current technology, and (3) the variability is important to the quantity and the quality of the groundwater needed by human and ecological communities. This book does not address scale issues comprehensively, though the methods and guidelines provide important tools and ideas for addressing them. Interesting methods that have been developed to deal with scale issues in transport problems include, for example, zonation (discussed further in Guideline 2, Chapter 11), the transition probability method of Carle et al. (1998), the superparameter method of Tonkin and Doherty (2005), the constrained minimization method of Doherty (2003) and Moore and Doherty (2005, 2006), the representer method of Valstar et al. (2004), and the sequential indicator simulation method of Deutsch and Journel (1992, pp. 123–125, 148) and Gomez-Hernandez (2006), some of which were mentioned in Chapter 1 of this book. Scale issues are also addressed
in many of the guidelines presented in Chapters 10 to 14. This is an area of active research, and it has not yet matured to the point of achieving thorough comparison of methods for realistic problems. One can imagine that there might be a continuum between the solute spreading best represented by explicit representation of subsurface heterogeneity and solute spreading best represented by a simple representation of the heterogeneity combined with volume-averaged processes. An intermediate model would include some explicit representation of heterogeneity and some dispersion. It is not known where models along this continuum would be of most use in different practical problems. This seems to the authors of this book to be a rich area for future research.
9.2.4 Numerical Issues: Model Accuracy and Execution Time
Forward execution times of less than about 30 minutes are important for exploring the meaning of data and model processes, regardless of the method used for model calibration (see Chapter 15, Section 15.1). Numerical accuracy is important to obtaining accurate sensitivities and parameter estimates. For transport problems, there is often a trade-off between accuracy and execution time: greater accuracy requires longer execution times. Also, as mentioned in Section 9.2.1, added processes can require much greater execution times. The three options for simulating transport defined in Section 9.2.1 each have issues related to model execution time and accuracy of the numerical simulation.
Option 1
Advective transport is simulated with particle-tracking methods that require only slightly more execution time and computer storage than a flow model without any transport. A problem with applying regression methods to models with advective-transport observations is that if a particle exits the grid when the model is solved for a particular set of parameters, its sensitivities cannot be calculated. Anderman and Hill (2001) address this problem by calculating a projected position for any particle that leaves the grid, using the particle velocity at the point where it exits the model. Sensitivities are then calculated using the projected position. Advective-transport simulation accuracy can deteriorate as the model grid becomes coarse. This can be addressed by testing different grid refinements and possibly locally refining grids in selected parts of the model (e.g., Mehl and Hill, 2006). Zheng (1994) discusses two situations that can cause inaccurate particle-tracking results when the model grid discretization becomes coarse: the presence of a weak sink or source, and a vertically distorted grid.
He presents computational methods for minimizing the particle-tracking errors that arise from these problems, which can be used when it is impractical to more finely discretize the grid. These methods are now used in many particle-tracking codes.
Options 2 and 3
Simulating advection and dispersion requires substantially more execution time than does simulating advection only, and including reactions and other transport mechanisms generally requires even greater execution times. For
these types of transport simulations, numerical accuracy issues are much more problematic than for simulation of advective transport. Concerns discussed here are time-step size coordination, problems with perturbation sensitivities when using Lagrangian methods, and the consequences of numerical dispersion. Slight changes in time-step size can cause slight changes in the concentration simulated at a particular location and time. When calculating sensitivities by perturbation methods, these small changes in concentrations can cause large changes in sensitivities. This situation occurs commonly when the transport-step size is automatically calculated by the modeling software to satisfy, for example, the Courant number criterion (e.g., Zheng and Bennett, 2002, p. 187; Barth and Hill, 2005a). It is very likely that the transport-step size will be different in the two simulations needed to compute the perturbation sensitivity for each parameter, which adversely affects the sensitivities. This problem can be eliminated by executing the transport code for the starting parameter values, noting the model-calculated transport time step, and defining a somewhat smaller time step to calculate sensitivities and perform parameter estimation. In regression runs, the parameter values and simulated flow field change from one regression iteration to the next. To ensure that the imposed transport-step size remains valid, occasionally use the updated parameter values in a forward run for which the model calculates the step size. In Lagrangian methods of solute transport simulation, concentrations in a model cell or element are calculated from masses or concentrations associated with the particles present in the cell. Examples are the method of characteristics and the random-walk method. Simulated concentrations at one location tend to be accurate on average but are not smooth over time. Instead, they tend to oscillate about a smooth curve as particles leave and enter the cell.
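The transport-step coordination described in the first half of this paragraph can be sketched as follows. The sketch assumes uniform one-dimensional flow with seepage velocity v = K·i/n and a maximum Courant number of 1. It also assumes a transport code that divides the total simulated time into equal steps, each no longer than the Courant-limited step. All function names and numbers are hypothetical. A 1-percent perturbation of hydraulic conductivity K changes the automatically computed number of transport steps. A fixed, somewhat smaller step chosen from the starting parameter values keeps the time discretization identical in the base and perturbed runs.

```python
import math

DX = 10.0             # cell length (m), hypothetical
GRADIENT = 0.005      # hydraulic gradient, hypothetical
POROSITY = 0.25       # effective porosity, hypothetical
TOTAL_TIME = 1049.0   # simulated transport time (d), hypothetical


def seepage_velocity(k):
    """Seepage velocity (m/d) for hydraulic conductivity k (m/d)."""
    return k * GRADIENT / POROSITY


def courant_step(velocity, max_courant=1.0):
    """Longest transport step (d) that satisfies the Courant number criterion."""
    return max_courant * DX / velocity


def n_steps(step):
    """Number of equal transport steps, each no longer than `step`."""
    return math.ceil(TOTAL_TIME / step)


k_base = 10.0
k_perturbed = 1.01 * k_base            # forward-difference perturbation of K

# Automatic step size: each run recomputes the step from its own velocity.
auto_base = n_steps(courant_step(seepage_velocity(k_base)))
auto_perturbed = n_steps(courant_step(seepage_velocity(k_perturbed)))

# Fixed step size: somewhat smaller than the step computed for the starting
# parameter values, then imposed on both runs.
fixed_step = 0.8 * courant_step(seepage_velocity(k_base))
fixed_base = n_steps(fixed_step)
fixed_perturbed = n_steps(fixed_step)

# Check that the imposed step still satisfies the criterion for the perturbed run.
courant_perturbed = seepage_velocity(k_perturbed) * (TOTAL_TIME / fixed_perturbed) / DX

print(f"automatic: {auto_base} steps (base) vs {auto_perturbed} steps (perturbed)")
print(f"fixed:     {fixed_base} steps (base) vs {fixed_perturbed} steps (perturbed), "
      f"Courant number {courant_perturbed:.2f}")
```

With these values, the automatic step gives 21 steps for the base run and 22 for the perturbed run. The fixed step gives 27 steps in both, with a Courant number of about 0.78 for the perturbed run. The final check mirrors the advice to confirm occasionally that an imposed step remains valid as parameter values change.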
The oscillations are reduced as more particles are used, but using enough particles to obtain a reasonably smooth curve often results in long execution times for practical problems. It is dangerous to use perturbation methods to calculate sensitivities based on oscillating concentrations. Depending on the oscillation captured in the two runs required to obtain perturbation sensitivities, the sensitivity can range from being much too small to much too large. Resulting sensitivities commonly do not vary smoothly from one time step to the next, or from one parameter-estimation iteration to the next, which can cause substantial problems with convergence of the nonlinear regression procedure. Sonnenborg et al. (1996) minimized this problem by calculating the sensitivities using concentrations averaged over time periods that are longer than the model time steps. This approach can easily be implemented using universal inverse models such as UCODE_2005 and PEST. Numerical dispersion is a common problem, and its presence affects estimated parameters and model predictions. Mehl and Hill (2001) illustrated this by simulating a two-dimensional laboratory experiment constructed of discrete, randomly distributed, homogeneous blocks of five sands. They first demonstrated that when laboratory measurements of hydraulic conductivity and dispersivity values are used directly in the transport model, a poor fit to the measured breakthrough curve (BTC) is obtained (Figure 9.1a). Results of these simulations also show
FIGURE 9.1 Measured and simulated breakthrough curves (BTCs) for a two-dimensional laboratory experiment of transport through saturated sands. For the measured concentration values, 95 percent confidence intervals are shown and reflect measurement error. (a) BTCs using measured hydraulic conductivities and dispersivities. Computation times are listed in brackets and are from a LINUX workstation, Pentium II-333, 64 Mb RAM. (b) BTCs using optimized hydraulic conductivities and measured dispersivities. The solution labeled P-C(2) uses dispersivity values increased to approximate the numerical dispersion common to the FD and MMOC methods of MT3DMS. (From Mehl and Hill, 2001.)
that, as coded in MT3DMS (Zheng and Wang, 1999; Zheng, 2005), the finite-difference (FD) method and the modified method of characteristics (MMOC) exhibit more numerical dispersion than the method of characteristics (MOC) and the total variation diminishing (TVD) method. A predictor-corrector (P-C) method added to MT3DMS for the study also had little numerical dispersion. Sensitivities found using the different solution methods produced similar conclusions about what parameters were important, but regression estimates of dispersivity and hydraulic-conductivity parameters were strongly affected by numerical dispersion. When dispersivity was set to laboratory-measured values, regression using FD and MMOC produced substantially different hydraulic-conductivity estimates than did MOC, TVD, and P-C. Better fits to measured BTCs were achieved for FD and MMOC (Figure 9.1b), which have more numerical dispersion. This suggests that the measured dispersivities were consistently too small and the estimated hydraulic conductivities were compensating for the bias in the measured dispersivities. When a single multiplicative dispersivity parameter and the five hydraulic-conductivity parameters were estimated, similar hydraulic-conductivity estimates and a similar fit were attained for all solution methods, and dispersivity estimates were larger for methods with little numerical dispersion.
9.2.5 Transport Observations
Three issues important to using transport observations are discussed: (1) simultaneous use of transport and flow-system observations, (2) weighting concentration observations, and (3) using point concentrations to determine other types of observations.
Simultaneous Use of Transport and Flow-System Observations
Concentration observations are important to the estimation of both flow and transport parameters, typically providing substantial information about transmissive properties such as hydraulic conductivity and transmissivity, because (1) concentrations are sensitive to velocities and (2) in process-based models, velocity magnitude and direction depend on these properties. Simultaneous use of concentrations and other types of data is likely to be more successful than a sequential estimation strategy, in which, for example, head and flow observations are used to estimate flow model parameter values, then these values are fixed and concentration data are used to estimate the transport parameters. Wagner and Gorelick (1987) were the first to develop a coupled estimation methodology and applied it to a synthetic example. Several later studies have shown that coupled estimation of flow and transport parameters produces parameter estimates that are more reasonable and have reduced uncertainty, compared to a sequential estimation strategy or a procedure whereby subsets of the observations (e.g., only heads or only concentrations) are used to estimate both flow and transport parameters (e.g., Gailey et al., 1991; Sonnenborg et al., 1996; Barlebo et al., 1998; Anderman and Hill, 1999). In some cases, a sequential estimation strategy might produce the same results as those from a coupled inverse procedure (e.g., Jacques
et al., 2002). However, there is no guarantee of this, and thus simultaneous estimation of flow and transport parameters is encouraged.
Weighting Concentration Observations
As discussed in the context of Eq. (3.6) in Chapter 3, Section 3.3.3, the standard deviation of errors in concentrations often can be thought of as being proportional to the concentration. Under this circumstance, the standard deviation equals the product of a coefficient of variation and a concentration. Both observed and simulated concentrations have been used for this purpose in the literature, though there is some indication that simulated values are needed for unbiased parameter estimates (Anderman and Hill, 1999). Valstar et al. (2004) provide an example of using errors that are proportional to concentrations. Figure 9.2b shows how weighted residuals can vary depending on whether simulated or observed values are used to calculate the weights.
When applying parameter-estimation methods to transport models, a numerical problem can occur when the range of concentration observations spans more than about four orders of magnitude (Barth and Hill, 2005a,b). Such a large range occurs for many constituents, such as dissolved aqueous species or pathogens. The difficulty arises when observation uncertainty is represented as being proportional
FIGURE 9.2 (a) Simulated breakthrough curves trailing and not overlapping the observed breakthrough curve, and (b) the resulting weighted residuals employing weights calculated using coefficients of variation and either observed or simulated concentrations (referred to as observed- and simulated-value weights in the legend). Observed- and simulated-value weights produce weighted residuals of similar magnitude. However, decreasing the transport rate so that the simulated breakthrough curve shifts to the right and beyond the period of observations decreases the sum of squared weighted residuals with observed-value weighting while having no significant effect on weighted residuals with simulated-value weighting. (From Barth and Hill, 2005a.)
to the concentration. Applying a constant coefficient of variation to all observations can result in enormous weighted residuals for small concentrations. This can occur even if the concentrations are log-transformed. A solution is to place a lower bound on the statistic used to calculate the weight (for example, the standard deviation), and thus an upper bound on the weight, as suggested by, for example, Keidser and Rosbjerg (1991). Barth and Hill (2005a) show that it can be important to approach the upper bound gradually.
Alternatives to Using Point Concentration Measurements as Observations
When calibrating transport models, using point concentrations as measures of goodness of fit often is problematic, because (1) concentration measurements can vary over many orders of magnitude, (2) concentration measurements are often spatially scarce, and (3) simulated point concentrations depend on the particular representation of heterogeneity in the model, whereas the predictions of interest may be averaged quantities that do not have this dependence. Thus, alternative measures of goodness of fit might be preferable when calibrating transport models. Here, we cite four alternatives to using point concentration measurements:
1. To calibrate a model of natural-gradient tracer transport in an extremely heterogeneous aquifer, Feehley et al. (2000) divide the model domain into six zones along the flow direction and compare simulated and observed masses within each zone.
2. For calibrating a model of a different natural-gradient tracer test in the same aquifer studied by Feehley et al. (2000), Julian et al. (2001) compare the maximum simulated concentration from all model layers at a given areal location with the maximum observed value from all vertical sampling points at that location.
3. Barth and Hill (2005a,b) use moments of the concentration distribution.
4. Anderman et al. (1996) use concentration measurements to derive advective-transport observations.
9.2.6 Additional Model Inputs
Simulation of transport often brings additional observations to the model calibration effort, and also brings additional system characteristics that typically are determined at least in part using additional estimated parameters. If only advection is considered, effective porosity is the sole additional system characteristic required (as was the case for simulation of advective-transport predictions in Exercise 8.1). If dispersive processes are included, dispersivity in up to three spatial directions is needed. If multicomponent reactive transport is considered, there are potentially a large number of additional system characteristics (see, e.g., Parkhurst, 1995; Prommer et al., 2003). All of the new system characteristics can vary spatially. As for transient models, the additional model inputs can be parameterized, and the parameters estimated, using the methods and guidelines presented in this book.
Problems with insensitivity and correlation, as discussed in Chapter 4, can be troublesome in transport models. When only advection is simulated and the flow field is steady state, hydraulic conductivity, recharge, and effective porosity tend to be intercorrelated. Insensitivity and correlation can be severe for multicomponent reactive transport models, because each component is potentially characterized by several separate transport properties.
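The intercorrelation among hydraulic conductivity, recharge, and effective porosity can be seen in an idealized case. The following is a minimal sketch, not a method from this book: it assumes a hypothetical one-dimensional confined aquifer of thickness b with uniform recharge R, a no-flow divide at x = 0, and a fixed-head river at x = L, and all numerical values are illustrative. Steady heads depend only on the ratio R/K, and advective travel times to the river depend only on n/R, so multiplying K, R, and n by the same factor leaves both kinds of observations unchanged.

```python
import math

# Hypothetical 1D confined aquifer: no-flow divide at x = 0, fixed-head river at x = L.
# Uniform recharge R (m/s) gives a discharge per unit width q(x) = R * x (m^2/s).


def head(x, K, R, b, L, h_river):
    """Steady head h(x) = h_river + R (L^2 - x^2) / (2 K b); depends only on R/K."""
    return h_river + R * (L**2 - x**2) / (2.0 * K * b)


def travel_time(x0, R, n, b, L):
    """Advective travel time (s) from x0 to the river.

    Seepage velocity v(x) = R x / (b n) does not involve K, so
    t = (b n / R) ln(L / x0), which depends only on n/R.
    """
    return (b * n / R) * math.log(L / x0)


if __name__ == "__main__":
    K, R, n = 1.0e-4, 1.0e-8, 0.25       # illustrative values (m/s, m/s, dimensionless)
    b, L, h_river = 50.0, 5000.0, 100.0  # thickness, length, river stage (m)
    for c in (1.0, 3.0):                 # scale K, R, and n by the same factor
        h = head(1000.0, c * K, c * R, b, L, h_river)
        t = travel_time(500.0, c * R, c * n, b, L) / (365.25 * 86400.0)
        print(f"factor {c}: head = {h:.3f} m, travel time = {t:.2f} yr")
```

Both printed lines should show the same head and travel time. In this idealized case an observation of total discharge to the river, R·L per unit width, depends on R alone and would break the correlation, which is one reason flow observations are valuable.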
9.2.7 Examples of Obtaining a Tractable, Useful Model
First, we mention two examples presented in more detail in Chapter 15 with very different scales and modeling objectives. In a regional model of the Death Valley groundwater system with 1500-meter grid spacing, advective transport was chosen to simulate transport predictions. In a site-scale model of the Grindsted landfill in Denmark with grid spacing as small as 1 meter, advection and dispersion were included to calibrate with concentration observations. See Chapter 15 for additional information. Three recent applications of nonlinear regression to multicomponent reactive transport models achieved tractable problems by carefully selecting which parameters to specify rather than estimate, or by simplifying the simulated processes without sacrificing the ability of the model to reasonably represent the simulated system.
1. To calibrate a model of vapor-phase hydrocarbon transport, Gaganis et al. (2002) grouped sets of individual hydrocarbon constituents with similar thermodynamic properties into composite constituents. They then defined thermodynamic parameters associated with these composite constituents, and thus substantially reduced the total number of model parameters. These transport parameters were estimated using concentrations of the composite constituents as the calibration observations.
2. Ghandi et al. (2002b) calibrated a model of cometabolic trichloroethylene biodegradation by retaining the full model complexity and large number of parameters, and fixing several insensitive parameters at values obtained from laboratory experiments or previous modeling studies.
3. Essaid et al. (2003) made two simplifications when applying regression to a model of hydrocarbon dissolution and biodegradation. First, they represented biodegradation by using first-order reactions rather than Monod kinetics, because of high correlations between the Monod kinetics parameters. Second, they defined a single dissolution rate parameter for all hydrocarbon components, because of high correlation between the dissolution rate parameter and the biodegradation rate parameter for each component. This approach was supported by independent experiments. Even with these simplifications, the model retained a considerable amount of complexity, and the observation data supported estimation of a large number of
parameters, including individual first-order anaerobic biodegradation rates for all of the hydrocarbon components.
9.3 STRATEGIES FOR RECALIBRATING EXISTING MODELS
Models frequently are recalibrated as additional observations become available or as other new information is obtained. For groundwater systems, examples of new information include new observations, possibly affected by different stresses such as pumpage or drought; new geologic interpretations; and new information on the distribution of areal recharge. Recalibrated models can be developed and evaluated using the methods described in previous chapters of this book. Methods that build on previous regression results, such as the Kalman filter (Drécourt and Madsen, 2002), could be used; for nonlinear models, however, it is more straightforward to repeat the nonlinear regression with the new information included. Drawbacks such as greater execution time often are not serious enough to warrant using the more complicated methods.
Model results can help determine when the possibly considerable additional investment in model recalibration is needed. While such decisions are often based largely on data and policy criteria, the model (or alternative models) generally provides the best available representation of system processes and can provide important insight. Table 9.1 presents the issues likely to be of concern and methods useful in addressing each issue. The guidelines presented in this book are also likely to be useful. If a model is recalibrated, the following issues need to be considered:
• Does the recalibrated model produce predictions that differ significantly from those produced using previous models?
To address this issue, compare predictions simulated using the recalibrated model to those from previous models.
• Is the uncertainty of the predictions greater or smaller than previously calculated?
To address this issue, compare linear and possibly nonlinear confidence intervals for predictions produced by the recalibrated model to intervals produced by previous models. Generally, model uncertainty is expected to decrease when data are added, given the same number of parameters, but because of model nonlinearity this may not always be the case. Even if the model construction is unchanged, the new estimated parameter values will change the sensitivities, and the effect of this change on the calculated uncertainty may be greater than the effect of the information provided by the new observations. Often new observations motivate modifications to the model construction, such as changes in where parameters apply, what processes are included, and perhaps how many parameters are defined and/or estimated.
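Comparing linear confidence intervals before and after recalibration requires the standard deviation of each prediction. The sketch below is a minimal NumPy illustration under stated assumptions: the usual linear form s_z = [(∂z/∂b)ᵀ V(b) (∂z/∂b)]^(1/2) with V(b) = s²(XᵀωX)⁻¹, hypothetical sensitivities for two parameters, and a hypothetical critical value of 2.0 standing in for a Student-t value. Holding s² and the existing sensitivities fixed isolates the effect of added observations, which cannot widen the linear interval; in a nonlinear model, recalibration also changes s² and X, which is why the calculated uncertainty can nonetheless increase.

```python
import numpy as np


def prediction_sd(X, weights, s2, dz):
    """Linear standard deviation of a prediction z'.

    X       -- (ND, NP) sensitivities of simulated values to the parameters
    weights -- (ND,) observation weights (inverse error variances)
    s2      -- calculated error variance from the regression
    dz      -- (NP,) sensitivities of the prediction to the parameters
    """
    V = s2 * np.linalg.inv(X.T @ (weights[:, None] * X))  # parameter variance-covariance
    return float(np.sqrt(dz @ V @ dz))


def linear_interval(z, sd, crit):
    """Interval z +/- crit * sd; crit would normally be a Student-t critical value."""
    return z - crit * sd, z + crit * sd


if __name__ == "__main__":
    # Hypothetical sensitivities for two parameters
    X_old = np.array([[1.0, 0.5], [0.8, 0.9], [0.3, 1.2], [1.1, 0.2]])
    X_new = np.vstack([X_old, [[0.5, 1.5], [2.0, 0.1]]])  # two added observations
    dz = np.array([0.7, -0.4])
    sd_old = prediction_sd(X_old, np.ones(4), 1.0, dz)
    sd_new = prediction_sd(X_new, np.ones(6), 1.0, dz)
    print(f"prediction sd before: {sd_old:.4f}, after: {sd_new:.4f}")
    print("interval after:", linear_interval(10.0, sd_new, 2.0))
```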
TABLE 9.1 Issues to Consider When Deciding if Model Recalibration Is Needed(a)
| Issue | Method | Section or Comment | Guideline |
|---|---|---|---|
| Do new observations suggest the model is incorrect or suggest an alternative model? | Observation residuals and weighted residuals | 6.2.1 | 4, 6, 9 |
| | Use graphical analysis to compare model fit to new observations with the fit to observations used in model calibration | 6.4.1 to 6.4.4 | |
| Do new system data suggest the model is incorrect or suggest an alternative model? | Compare new system data with model input and parameter values. Framing new data as prior information and calculating prior residuals and weighted residuals is sometimes useful | Usually requires GIS or 3D visualization; 6.2.1, 6.4.1 to 6.4.4 | 2, 6, 10 |
| Is the information provided by new observations likely to affect estimated parameter values or parameter uncertainty? | dss, css, pcc | 4.3 | 3, 11 |
| | Leverage statistics | 4.3.6, 7.5.2 | |
| | Influence statistics: Cook's D | 7.5.2 | |
| | Influence statistics: DFBETAS | 7.5.2 | |
| Is the information provided by new observations likely to affect predictions of interest? | opr | 8.2.2 | 12 |
| Is new information about parameter values or model construction likely to affect predictions of interest? | ppr | 8.2.1 | 12 |
(a) Validity of results depends on the accuracy of the model; consider analyzing results from the perspective of simplifications and approximations made in model construction.
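The leverage and Cook's D statistics listed in Table 9.1 can be computed from weighted sensitivities and weighted residuals. The sketch below is a minimal illustration under stated assumptions: a model that is linear in its two parameters, fitted by weighted least squares with NumPy, and the common textbook forms h_ii = diag[X_w(X_wᵀX_w)⁻¹X_wᵀ] and D_i = e_wi² h_ii / [NP s² (1 − h_ii)²], where X_w and e_w are the weighted sensitivities and weighted residuals. Section 7.5.2 gives the forms used in this book; the data here are hypothetical.

```python
import numpy as np


def leverage_and_cooks_d(X, weights, y):
    """Fit a weighted linear model; return (leverage, Cook's D) for each observation."""
    w_half = np.sqrt(weights)
    Xw = X * w_half[:, None]                 # weighted sensitivities
    yw = y * w_half                          # weighted observations
    b, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    ew = yw - Xw @ b                         # weighted residuals
    nd, npar = X.shape
    A = np.linalg.inv(Xw.T @ Xw)
    h = np.einsum("ij,jk,ik->i", Xw, A, Xw)  # diagonal of the hat matrix
    s2 = ew @ ew / (nd - npar)               # calculated error variance
    cooks_d = ew**2 * h / (npar * s2 * (1.0 - h) ** 2)
    return h, cooks_d


if __name__ == "__main__":
    # Hypothetical: an intercept plus one sensitivity; the last observation is isolated
    x = np.array([0.1, 0.2, 0.3, 0.4, 3.0])
    X = np.column_stack([np.ones_like(x), x])
    y = np.array([1.0, 1.3, 1.4, 1.9, 4.0])
    h, d = leverage_and_cooks_d(X, np.ones(5), y)
    print("leverage:", np.round(h, 3))
    print("Cook's D:", np.round(d, 3))
```

The isolated fifth observation should have the largest leverage, and the leverages always sum to the number of estimated parameters.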
9.4 EXERCISES (OPTIONAL)
In Exercises 9.1–9.8, the model that was developed, calibrated, and evaluated in Exercises 2.1 through 8.2 is recalibrated using new hydraulic-head and streamflow
gain observations obtained during a long-term transient aquifer test. Water-supply wells have been completed in both aquifers at the areal location shown in Figure 2.1a, and the aquifer test was conducted using these wells. During the test, groundwater was withdrawn for 283 days at a rate of about 1.0 m³/s from each of the two aquifers (total pumping rate of about 2.0 m³/s). Because of fluctuations in the pumping rate during the aquifer test, the true average pumping rate is uncertain. The model used in the recalibration simulates steady-state groundwater flow without pumping (the conditions used in the previous exercises), and then uses that solution as initial conditions for a transient simulation of the aquifer test. Observations used for the recalibration include the observations used in the previous exercises and the observations of hydraulic head and river discharge collected during the aquifer test. In these exercises, the term “transient model” refers to the combined simulation that includes steady-state flow without pumping and transient flow during the aquifer test. The term “steady-state model” refers to the model without pumping that was used in previous exercises. Figure 9.3 shows the volumetric budget for the transient flow system with pumping, calculated using the true parameter values. Simulated heads at selected times are shown in Figure 9.4a–d.
Exercises 9.1 and 9.2: Simulate Transient Hydraulic Heads and Perform Preparatory Steps
Exercises 9.1 and 9.2 involve initial MODFLOW-2000 simulations of the transient model. Instructions for performing these simulations are available from the web site for this book listed in Chapter 1, Section 1.1. Students who are not performing the simulations may skip these exercises.
FIGURE 9.3 Transient budget showing simulated flows in the true system with pumping. Inflows are positive in sign; outflows are negative.
FIGURE 9.4 Simulated hydraulic heads for the true system with pumping, after (a) 4 days, (b) 58 days, (c) 283 days, and (d) at steady state.
Exercise 9.3: Transient Parameter Definition
Parameters needed for the transient model that were not applicable to the steady-state model are the pumping rate and the specific storage for each model layer. The pumping rate is treated as a potentially estimated parameter because of fluctuations in the pumping rate during the aquifer test. The regression is used here to estimate the constant rate that is most consistent with the observed drawdown. The names and starting values of storage and pumping parameters are given in Table 9.2. In addition, parameters HK_1, VK_CB, HK_2, K_RB, RCH_1, and RCH_2 are defined as for the steady-state model (see Exercise 3.1).
TABLE 9.2 Parameter Names and Starting Values for Properties that Are Only Applicable in the Transient System
| Flow-System Property | Parameter Name | Starting Value |
|---|---|---|
| Specific storage in layer 1 | SS_1 | 2.6 × 10⁻⁵ |
| Specific storage in layer 2 | SS_2 | 4.0 × 10⁻⁶ |
| Pumping rate in each of model layers 1 and 2, in m³/s | Q_1&2 | -1.1 |
All work for this exercise involves modifying computer files, as described in the instructions on the web site for this book listed in Chapter 1, Section 1.1. Students who are not performing the simulations may skip this exercise.
Exercise 9.4: Observations for the Transient Problem
This exercise involves defining observations and their weights for the transient model. The observations are listed in Tables 9.3 and 9.4 and include steady-state hydraulic heads, drawdowns during pumping, and steady-state and transient discharge to the river. The head and drawdown observations are in the same locations used for the steady-state model (Figure 2.1b).
As discussed for the steady-state model, the hydraulic-head observations were generated by including random error. The error in the elevation of each observation well has a mean of 0.0 and a variance of 1.0 m², and each water-level measurement error has a mean of 0.0 and a variance of 0.0025 m². The total variance of the error in each hydraulic-head observation is therefore 1.0 + 0.0025 = 1.0025 m².
There are three observations of groundwater discharge to the river: one for the steady-state conditions without pumping and two during the aquifer test (Table 9.4). As in the steady-state model, the reach over which flow is measured extends the entire length of the river.
(a) Define observations of hydraulic head, drawdown, and flow. Use the information in Tables 9.3 and 9.4 to define the observations of hydraulic head, drawdown, and flow in the appropriate input files, and simulate a forward model run. Instructions for performing this simulation are available from the web site for this book listed in Chapter 1, Section 1.1. Students who are not performing the simulations may skip this exercise.
(b) Calculate weights on observations for the transient model.
Problem
• Use the information provided on head observation error, the discussion of transient observations in Section 9.1.2, and the discussion of weighting in Guideline 6 and Appendix A to explain the variance on drawdowns listed in Table 9.3.
TABLE 9.3 Transient Hydraulic-Head Observations(a)
| Observation Name | Lay | Row | Column | Reference Stress Period in MODFLOW | Time of Observation, After Beginning of Simulation (days) | Time of Observation, After Beginning of Reference Stress Period (days) | Observed Head (m) | Observed Drawdown (m) (Calculated by MODFLOW) | Variance of Head Observation Error (m²) | Variance of Drawdown Observation Error (m²) |
|---|---|---|---|---|---|---|---|---|---|---|
| hd01.ss | 1 | 3 | 1 | 1 | 0.0 | 0.0 | 101.804 | 0.000 | 1.0025 | — |
| hd01.1 | 1 | 3 | 1 | 1 | 1.0088 | 1.0088 | 101.775 | 0.029 | 1.0025 | 0.005 |
| hd01.283 | 1 | 3 | 1 | 1 | 282.8595 | 282.8595 | 101.675 | 0.129 | 1.0025 | 0.005 |
| hd02.ss | 1 | 4 | 4 | 1 | 0.0 | 0.0 | 128.117 | 0.000 | 1.0025 | — |
| hd02.1 | 1 | 4 | 4 | 1 | 1.0088 | 1.0088 | 128.076 | 0.041 | 1.0025 | 0.005 |
| hd02.4 | 1 | 4 | 4 | 1 | 4.0353 | 4.0353 | 127.560 | 0.557 | 1.0025 | 0.005 |
| hd02.108 | 1 | 4 | 4 | 1 | 107.6825 | 107.6825 | 116.586 | 11.531 | 1.0025 | 0.005 |
| hd02.283 | 1 | 4 | 4 | 1 | 282.8595 | 282.8595 | 113.933 | 14.184 | 1.0025 | 0.005 |
| hd03.ss | 1 | 10 | 9 | 1 | 0.0 | 0.0 | 156.678 | 0.000 | 1.0025 | — |
| hd03.1 | 1 | 10 | 9 | 3 | 1.0088 | 0.0 | 152.297 | 4.381 | 1.0025 | 0.005 |
| hd03.283 | 1 | 10 | 9 | 5 | 282.8595 | 272.7713 | 114.138 | 42.540 | 1.0025 | 0.005 |
| hd04.ss | 1 | 13 | 4 | 1 | 0.0 | 0.0 | 124.893 | 0.000 | 1.0025 | — |
| hd04.1 | 1 | 13 | 4 | 3 | 1.0088 | 0.0 | 124.826 | 0.067 | 1.0025 | 0.005 |
| hd04.283 | 1 | 13 | 4 | 5 | 282.8595 | 272.7713 | 110.589 | 14.304 | 1.0025 | 0.005 |
| hd05.ss | 1 | 14 | 6 | 1 | 0.0 | 0.0 | 140.961 | 0.000 | 1.0025 | — |
| hd05.1 | 1 | 14 | 6 | 3 | 1.0088 | 0.0 | 140.901 | 0.060 | 1.0025 | 0.005 |
| hd05.283 | 1 | 14 | 6 | 5 | 282.8595 | 272.7713 | 119.285 | 21.676 | 1.0025 | 0.005 |
| hd06.ss | 2 | 4 | 4 | 1 | 0.0 | 0.0 | 126.537 | 0.000 | 1.0025 | — |
| hd06.1 | 2 | 4 | 4 | 3 | 1.0088 | 0.0 | 126.542 | -0.005 | 1.0025 | 0.005 |
| hd06.283 | 2 | 4 | 4 | 5 | 282.8595 | 272.7713 | 112.172 | 14.365 | 1.0025 | 0.005 |
| hd07.ss | 2 | 10 | 1 | 1 | 0.0 | 0.0 | 101.112 | 0.000 | 1.0025 | — |
| hd07.1 | 2 | 10 | 1 | 3 | 1.0088 | 0.0 | 101.160 | -0.048 | 1.0025 | 0.005 |
| hd07.283 | 2 | 10 | 1 | 5 | 282.8595 | 272.7713 | 100.544 | 0.568 | 1.0025 | 0.005 |
| hd08.ss | 2 | 10 | 9 | 1 | 0.0 | 0.0 | 158.135 | 0.000 | 1.0025 | — |
| hd08.1 | 2 | 10 | 9 | 3 | 1.0088 | 0.0 | 152.602 | 5.533 | 1.0025 | 0.005 |
| hd08.283 | 2 | 10 | 9 | 5 | 282.8595 | 272.7713 | 114.918 | 43.217 | 1.0025 | 0.005 |
| hd09.ss | 2 | 10 | 18 | 1 | 0.0 | 0.0 | 176.374 | 0.000 | 1.0025 | — |
| hd09.1 | 2 | 10 | 18 | 3 | 1.0088 | 0.0 | 176.373 | 0.001 | 1.0025 | 0.005 |
| hd09.283 | 2 | 10 | 18 | 5 | 282.8595 | 272.7713 | 138.132 | 38.242 | 1.0025 | 0.005 |
| hd10.ss | 2 | 18 | 6 | 1 | 0.0 | 0.0 | 142.020 | 0.000 | 1.0025 | — |
| hd10.1 | 2 | 18 | 6 | 3 | 1.0088 | 0.0 | 142.007 | 0.013 | 1.0025 | 0.005 |
| hd10.283 | 2 | 18 | 6 | 5 | 282.8595 | 272.7713 | 122.099 | 19.921 | 1.0025 | 0.005 |
(a) All observations are located at cell centers.
TABLE 9.4 Data for Transient Flow Observations
| Observation Name | Reference Stress Period | Time of Observation, After Beginning of Simulation (days) | Time of Observation, After Beginning of Reference Stress Period (days) | Observed Gain in River Flow (m³/s) | Coefficient of Variation of River Gain | Standard Deviation of River Gain (m³/s) |
|---|---|---|---|---|---|---|
| flow01.ss | 1 | 0.0 | 0.0 | 4.4 | 0.10 | — |
| flow01.10 | 5 | 10.0882 | 0.0 | 4.1 | — | 0.38 |
| flow01.283 | 5 | 282.8595 | 272.7713 | 2.2 | — | 0.21 |
• Calculate the weights on the hydraulic-head, drawdown, and flow observations for the transient model using information in Tables 9.3 and 9.4. Compare your results with the square roots of the weights shown in Figure 9.5.
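The weights follow directly from the error statistics in Tables 9.3 and 9.4, with each weight equal to the inverse of the error variance. The following is a minimal sketch, assuming, as the values in Figure 9.5 imply, that the coefficient of variation for flow01.ss is applied to the observed gain; the function and variable names are illustrative.

```python
import math

HEAD_VARIANCE = 1.0025      # m^2, Table 9.3
DRAWDOWN_VARIANCE = 0.005   # m^2, Table 9.3

# (observation name, observed gain in m^3/s, coefficient of variation, standard deviation in m^3/s)
FLOW_OBS = [
    ("flow01.ss", 4.4, 0.10, None),
    ("flow01.10", 4.1, None, 0.38),
    ("flow01.283", 2.2, None, 0.21),
]


def weight(variance):
    """Weight = 1 / variance of the observation error."""
    return 1.0 / variance


def flow_sd(observed, cv, sd):
    """Use the stated standard deviation, or else CV times the observed value."""
    return sd if sd is not None else cv * observed


def sqrt_weights():
    """Square roots of the weights (the WEIGHT**.5 column listed by MODFLOW-2000)."""
    out = {
        "head": math.sqrt(weight(HEAD_VARIANCE)),
        "drawdown": math.sqrt(weight(DRAWDOWN_VARIANCE)),
    }
    for name, observed, cv, sd in FLOW_OBS:
        out[name] = math.sqrt(weight(flow_sd(observed, cv, sd) ** 2))
    return out


if __name__ == "__main__":
    w = sqrt_weights()
    for name, value in w.items():
        print(f"{name:<12}{value:8.3f}")
    # Weighted residual = residual * sqrt(weight); the residuals are from Figure 9.5.
    for (name, *_), residual in zip(FLOW_OBS, [0.461, 0.618, 0.663]):
        print(f"{name:<12}weighted residual {residual * w[name]:.2f}")
```

The printed square roots of the weights are 0.999 (heads), 14.142 (drawdowns), and 2.273, 2.632, and 4.762 (river gains), which agree with the WEIGHT**.5 column of Figure 9.5 after rounding; the flow weighted residuals, 1.05, 1.63, and 3.16, also agree.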
Exercise 9.5: Evaluate Transient Model Fit Using Starting Parameter Values
In this exercise, the initial fit is evaluated for the forward transient model run. Tables of observed and simulated hydraulic heads and flows are shown in Figure 9.5.
Problem: Comment on the model fit achieved with the starting parameter values. Are there any residuals that are clearly outliers? How do the residuals compare to the weighted residuals?
Exercise 9.6: Sensitivity Analysis for the Initial Model
This exercise involves evaluating sensitivities for the transient model. For students performing the simulations, instructions for calculating the sensitivities are available from the web site for this book listed in Chapter 1, Section 1.1.
(a) Evaluate contour maps of one-percent scaled sensitivities for the transient flow system. Contour maps of one-percent scaled sensitivities after 4, 58, and 283 days of pumpage for parameters HK_1, HK_2, VK_CB, K_RB, SS_1, and SS_2 are shown in Figures 9.6–9.8. One-percent scaled sensitivities for parameters RCH_1 and RCH_2 at all times are the same as those for the steady-state flow system without pumpage (shown in Figure 4.4). These maps reflect the flow-system dynamics and are useful for understanding the effect of each parameter on the simulated hydraulic heads. Thus, they could be used to help guide collection of additional head observations that would provide information about individual parameters. Note, however, that for the transient model there are a large number of maps, and that a location or time important to one parameter might not be important to another parameter. It is difficult to use these maps to clearly identify the locations that would be most beneficial, for example, for improving a set of parameter estimates, and they do not address the issue of improving predictions. Limitations on the use of the one-percent scaled sensitivity maps are discussed further in Chapter 4, Section 4.3.7.
DATA AT THE HEAD LOCATIONS

OBS#  OBSERVATION   OBSERVATION       SIMUL.     RESIDUAL   WEIGHT**.5     WEIGHTED
      NAME                            EQUIV.                               RESIDUAL
   1  hd01.ss              102.         100.         1.58        0.999         1.58
   2  dd01.1         -0.290E-01   -0.153E-04   -0.290E-01         14.1       -0.410
   3  dd01.tr2           -0.129   -0.906E-01   -0.384E-01         14.1       -0.543
   4  hd02.ss              128.         139.        -11.2        0.999        -11.2
   5  dd02.tr1       -0.410E-01   -0.949E-02   -0.315E-01         14.1       -0.446
   6  dd02.tr2           -0.557       -0.276       -0.281         14.1        -3.97
   7  dd03.tr3            -11.5        -13.0         1.43         14.1         20.3
   8  dd02.tr4            -14.2        -18.8         4.59         14.1         64.9
   9  hd03.ss              157.         174.        -17.7        0.999        -17.7
  10  dd03.tr1            -4.38        -3.67       -0.715         14.1        -10.1
  11  dd03.tr2            -42.5        -56.2         13.7         14.1         194.
  12  hd04.ss              125.         139.        -14.4        0.999        -14.4
  13  dd04.tr1       -0.670E-01   -0.163E-01   -0.507E-01         14.1       -0.718
  14  dd04.tr2            -14.3        -18.8         4.54         14.1         64.3
  15  hd05.ss              141.         157.        -16.2        0.999        -16.2
  16  dd05.tr1       -0.600E-01   -0.368E-01   -0.232E-01         14.1       -0.328
  17  dd05.tr2            -21.7        -28.5         6.79         14.1         96.0
  18  hd06.ss              127.         140.        -13.1        0.999        -13.1
  19  dd06.tr1        0.500E-02   -0.125E-01    0.175E-01         14.1        0.247
  20  dd06.tr2            -14.4        -19.2         4.82         14.1         68.2
  21  hd07.ss              101.         103.        -1.76        0.999        -1.75
  22  dd07.tr1        0.480E-01   -0.109E-02    0.491E-01         14.1        0.694
  23  dd07.tr2           -0.568        -1.38        0.813         14.1         11.5
  24  hd08.ss              158.         174.        -15.8        0.999        -15.8
  25  dd08.tr1            -5.53        -5.81        0.277         14.1         3.91
  26  dd08.tr2            -43.2        -57.3         14.0         14.1         199.
  27  hd09.ss              176.         190.        -13.9        0.999        -13.9
  28  dd09.tr1       -0.992E-03   -0.506E-01    0.496E-01         14.1        0.702
  29  dd09.tr2            -38.2        -49.5         11.3         14.1         159.
  30  hd10.ss              142.         157.        -15.0        0.999        -15.0
  31  dd10.tr1       -0.130E-01   -0.436E-02   -0.864E-02         14.1       -0.122
  32  dd10.tr2            -19.9        -26.1         6.21         14.1         87.8
...
DATA FOR FLOWS REPRESENTED USING THE RIVER PACKAGE

OBS#  OBSERVATION         MEAS.        CALC.     RESIDUAL   WEIGHT**.5     WEIGHTED
      NAME                 FLOW         FLOW                               RESIDUAL
  33  flow01.ss           -4.40        -4.86        0.461         2.27         1.05
  34  flow01.10           -4.10        -4.72        0.618         2.63         1.63
  35  flow01.283          -2.20        -2.86        0.663         4.76         3.16

FIGURE 9.5 Part of MODFLOW-2000 LIST output file showing initial model fit and weights for the head, drawdown, and flow observations.
FIGURE 9.6 Contour maps of one-percent scaled sensitivity of hydraulic head to (a)–(c) parameter HK_1 [(∂h/∂HK_1)(HK_1/100)] and (d)–(f) parameter HK_2 [(∂h/∂HK_2)(HK_2/100)] after 4, 58, and 283 days of pumping in the transient flow model, calculated using the starting parameter values. Contour labels apply to contours for both model layers.
FIGURE 9.7 Contour maps of one-percent scaled sensitivity of hydraulic head to (a)–(c) parameter K_RB [(∂h/∂K_RB)(K_RB/100)] and (d)–(f) parameter VK_CB [(∂h/∂VK_CB)(VK_CB/100)] after 4, 58, and 283 days of pumping in the transient flow model, calculated using the starting parameter values. In (a)–(c), contour labels apply to contours for both model layers; in (d)–(f), bold contour labels apply to model layer 1 and italic contour labels apply to model layer 2.
FIGURE 9.8 Contour maps of one-percent scaled sensitivity of hydraulic head to (a)–(c) parameter SS_1 [(∂h/∂SS_1)(SS_1/100)] and (d)–(f) parameter SS_2 [(∂h/∂SS_2)(SS_2/100)] after 4, 58, and 283 days of pumping in the transient flow model, calculated using the starting parameter values. In (a)–(c), (e), and (f), contour labels apply to contours for both model layers; in (d), bold contour labels apply to model layer 1 and italic contour labels apply to model layer 2.
This exercise focuses on using the physics of the groundwater flow system to understand the one-percent scaled sensitivities. In a confined flow system like the one considered here, the principle of superposition applies. This occurs because hydraulic head is a linear function of the applied fluxes, spatial dimension, and time, as discussed in Chapter 1, Section 1.4.1. As a result of these linear relationships, the hydraulic head calculated for the transient flow system, with applied fluxes of areal recharge and pumpage, equals the hydraulic head calculated for the flow system with areal recharge only minus the drawdown calculated for the flow system with pumpage only. Because taking the derivative is a linear operation, the principle of superposition also applies to sensitivities. The one-percent scaled sensitivities of hydraulic head for the flow system with areal recharge only are those calculated for the steady-state model and are shown in Figure 4.4. The one-percent scaled sensitivities of hydraulic head to the hydraulic-conductivity parameters HK_1, HK_2, K_RB, and VK_CB for the transient flow system without areal recharge (with pumpage only) are shown in Figures 9.9 and 9.10.
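The superposition argument for sensitivities can be checked numerically. The following is a minimal sketch under stated assumptions: a hypothetical steady, one-dimensional finite-difference model (not the transient two-layer model of these exercises) in which node 0 is a fixed-head river, the far end is a no-flow boundary, and transmissivity T is the only parameter. One-percent scaled sensitivities, (∂h/∂T)(T/100), are approximated by forward differences. The sensitivities for the full system (recharge plus pumpage, with the river stage) should equal the sum of those for the recharge-only system (with the river stage) and the pumpage-only system (river stage set to zero).

```python
def solve_heads(T, recharge, pumping, h_river, n=20, dx=100.0):
    """Steady 1D confined flow on n nodes spaced dx apart.

    Node 0 is a fixed-head river; the last node is a no-flow boundary.
    recharge -- areal recharge rate (m/s), applied over each cell width
    pumping  -- {node: withdrawal per unit aquifer width (m^2/s)}
    Returns the heads at nodes 0..n-1.
    """
    C = T / dx                        # conductance between adjacent nodes
    m = n - 1                         # unknown heads at nodes 1..n-1
    lower, diag, upper, rhs = [0.0] * m, [0.0] * m, [0.0] * m, [0.0] * m
    for k in range(m):
        i = k + 1
        last = i == n - 1
        lower[k] = C if k > 0 else 0.0
        upper[k] = 0.0 if last else C
        diag[k] = -C if last else -2.0 * C
        rhs[k] = -(recharge * dx - pumping.get(i, 0.0))
    rhs[0] -= C * h_river             # move the known river head to the right side
    # Thomas algorithm for the tridiagonal system
    for k in range(1, m):
        factor = lower[k] / diag[k - 1]
        diag[k] -= factor * upper[k - 1]
        rhs[k] -= factor * rhs[k - 1]
    h = [0.0] * m
    h[-1] = rhs[-1] / diag[-1]
    for k in range(m - 2, -1, -1):
        h[k] = (rhs[k] - upper[k] * h[k + 1]) / diag[k]
    return [h_river] + h


def one_percent_sensitivity(T, recharge, pumping, h_river, rel_step=1e-5):
    """Forward-difference approximation of (dh/dT) * (T / 100) at every node."""
    base = solve_heads(T, recharge, pumping, h_river)
    dT = rel_step * T
    pert = solve_heads(T + dT, recharge, pumping, h_river)
    return [(hp - hb) / dT * T / 100.0 for hp, hb in zip(pert, base)]


if __name__ == "__main__":
    T, R, h0 = 1.0e-2, 1.0e-8, 100.0                     # hypothetical values
    well = {10: 1.0e-4}                                  # withdrawal at node 10
    s_full = one_percent_sensitivity(T, R, well, h0)
    s_rech = one_percent_sensitivity(T, R, {}, h0)       # areal recharge only
    s_pump = one_percent_sensitivity(T, 0.0, well, 0.0)  # pumpage only
    for i in (1, 5, 10, 19):
        print(i, round(s_full[i], 6), round(s_rech[i] + s_pump[i], 6))
```

The two printed columns should agree at every node, apart from floating-point rounding.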
Problem: Explain the one-percent scaled sensitivity maps for the transient system by answering the following questions using your knowledge of the flow system and the principle of superposition:
• Why are the one-percent scaled sensitivities of hydraulic head to HK_1 and HK_2 positive for the flow system with pumpage only?
• Explain the distribution of the one-percent scaled sensitivities for HK_1 at 4 days and at 283 days.
• Use the sensitivities for HK_1 in Figure 4.4a, Figure 9.6a–c, and Figure 9.9a–c to convince yourself that the principle of superposition can be used to calculate sensitivities for this model.
• In the steady-state model, the one-percent scaled sensitivities for K_RB are the same throughout the model domain (Figure 4.4b). Why do the sensitivities for K_RB in the transient system vary over the model domain (Figure 9.7a–c and Figure 9.10a–c)?
• Why are the one-percent scaled sensitivities for SS_1 and SS_2 (Figure 9.8) concentric around the pumping wells at early time and nearly parallel to the river at late time?
(b) Use composite scaled sensitivities to evaluate the information observations provide about the defined parameters. In preparation for performing nonlinear regression, examine the composite scaled sensitivities for the parameters of the transient model, shown in Figure 9.11. Use these css to help decide which parameters to estimate by regression.
Problem
• Which parameters have the smallest and largest composite scaled sensitivities?
• Using suggestions from Chapter 4, Section 4.3.4 about evaluating relative and individual css values, determine which parameters are likely to be estimated by the regression, given the information provided by the observations.
FIGURE 9.9 Contour maps of one-percent scaled sensitivity of hydraulic head to (a)–(c) parameter HK_1 [(∂h/∂HK_1)(HK_1/100)] and (d)–(f) parameter HK_2 [(∂h/∂HK_2)(HK_2/100)] after 4, 58, and 283 days of pumping in the transient model without areal recharge (with pumpage only), calculated using the starting parameter values. Contour labels apply to contours for both model layers.
FIGURE 9.10 Contour maps of one-percent scaled sensitivity of hydraulic head to (a)–(c) parameter K_RB [(∂h/∂K_RB)(K_RB/100)] and (d)–(f) parameter VK_CB [(∂h/∂VK_CB)(VK_CB/100)] after 4, 58, and 283 days of pumping in the transient model without areal recharge (with pumpage only), calculated using the starting parameter values. In (a)–(c), contour labels apply to contours for both model layers; in (d)–(f), bold contour labels apply to model layer 1 and italic contour labels apply to model layer 2.
FIGURE 9.11 Composite scaled sensitivities calculated at the starting parameter values for the transient model.
(c) Evaluate parameter correlation coefficients. As discussed for the steady-state regression, it is important to use parameter correlation coefficients for the initial model to assess the likelihood of uniquely estimating all flow-system parameters given the available observation data. The correlation coefficients calculated by MODFLOW-2000 are shown in Table 9.5.
Problem
• Are any of the correlation coefficients greater than 0.95 in absolute value? Are any greater than 0.90 in absolute value?
• What do the correlation coefficients indicate about the likelihood of estimating all of the parameters independently using the head, drawdown, and flow data?
Exercise 9.7: Estimate Parameters for the Transient System by Nonlinear Regression
In most applications, the problems of sensitivity and uniqueness identified by the analysis above would lead to first trying to estimate the more sensitive parameters and then, using the updated values, attempting the regression with the less sensitive parameters as well. Here, however, model execution times are relatively short, so it is feasible to try estimating all the parameters in the first regression run. In general, execution times for parameter estimation can be long, and they tend to be longer when using UCODE_2005 than when using MODFLOW-2000 because of the perturbation sensitivity calculations performed by UCODE_2005. Approximate execution times can be calculated as described in Chapter 15, Section 15.1.
TABLE 9.5 Correlation Coefficient Matrix for Starting Parameter Values Using the Hydraulic-Head, Drawdown, and Flow Observations, Calculated for the Transient Problem by MODFLOW-2000(a,b)
| | Q_1&2 | SS_1 | HK_1 | K_RB | VK_CB | SS_2 | HK_2 | RCH_1 | RCH_2 |
|---|---|---|---|---|---|---|---|---|---|
| Q_1&2 | 1.00 | -0.91 | -0.99* | -0.057 | -0.67 | -0.41 | -0.96* | -0.66 | -0.83 |
| SS_1 | | 1.00 | 0.88 | -0.078 | 0.80 | 0.043 | 0.89 | 0.58 | 0.75 |
| HK_1 | | | 1.00 | -0.029 | 0.68 | 0.41 | 0.92 | 0.67 | 0.82 |
| K_RB | | | | 1.00 | -0.36 | 0.38 | 0.22 | 0.051 | 0.055 |
| VK_CB | | | | | 1.00 | -0.23 | 0.61 | 0.43 | 0.55 |
| SS_2 | Symmetric | | | | | 1.00 | 0.41 | 0.30 | 0.35 |
| HK_2 | | | | | | | 1.00 | 0.62 | 0.81 |
| RCH_1 | | | | | | | | 1.00 | 0.16 |
| RCH_2 | | | | | | | | | 1.00 |
(a) The matrix produced using UCODE_2005 is nearly identical.
(b) Correlation coefficients greater than 0.95 in absolute value are marked with an asterisk (*).
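The coefficients in Table 9.5 are derived from the parameter variance-covariance matrix, V = s²(XᵀωX)⁻¹, as pcc(j,k) = V(j,k)/[V(j,j)V(k,k)]^(1/2). The following is a minimal NumPy sketch using a hypothetical pair of sensitivity columns that are nearly proportional and of the same sign; such columns yield a correlation near −1, similar to the Q_1&2–HK_1 entry in Table 9.5.

```python
import numpy as np


def parameter_correlation(X, weights):
    """Parameter correlation coefficients from sensitivities and weights.

    V = s^2 (X^T w X)^-1; the factor s^2 cancels in
    pcc[j, k] = V[j, k] / sqrt(V[j, j] * V[k, k]), so it is omitted.
    """
    V = np.linalg.inv(X.T @ (weights[:, None] * X))
    sd = np.sqrt(np.diag(V))
    return V / np.outer(sd, sd)


if __name__ == "__main__":
    # Hypothetical scaled sensitivities of four observations to two parameters
    X = np.array([[1.0, 1.02], [2.0, 1.98], [3.0, 3.05], [4.0, 3.97]])
    pcc = parameter_correlation(X, np.ones(4))
    print(np.round(pcc, 4))
```

The off-diagonal coefficient is close to −1. Because s² cancels, the coefficients can be evaluated at the starting parameter values, before any regression is performed, which is how Table 9.5 is used here.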
Performing nonlinear regression for the transient model involves modifying computer files and running either MODFLOW-2000 or UCODE_2005. For students performing the computer simulations, instructions are available from the web site for this book listed in Chapter 1, Section 1.1. For students not performing the simulations, Figure 9.12 summarizes the results of the regression run.
Problem
• Examine the results shown in Figure 9.12a. What happened during this regression run?
• The parameter values calculated for all regression iterations are shown in Figure 9.12b. On the basis of this figure, which parameter does the regression have the most difficulty estimating? Is the answer consistent with the information about the regression behavior shown in Figure 9.12a?
• The starting, estimated, and true parameter values are shown in Table 9.6. Why do the estimated values differ from the true values?
• In this problem the starting parameter values are close to the final parameter values. Given the information provided about objective functions in Chapters 4 and 5, what problems might be expected given starting parameter values that are progressively further from the optimal values?
Exercise 9.8: Evaluate Measures of Model Fit
This exercise evaluates model fit for the transient model regression performed in Exercise 9.7, using statistics shown in Figure 9.13.
Problem
• What conclusion about model fit might be drawn from the result that s² (the calculated error variance of Figure 9.13) is less than 1.0?
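The calculated error variance is the sum of squared weighted residuals divided by the degrees of freedom. The following is a minimal sketch, assuming ND = 35 observations (32 head and drawdown, 3 flow), no prior information, NP = 9 estimated parameters (all nine listed in Table 9.6), and the definition s² = SSWR/(ND + NPR − NP). As a consistency check, it also sums the squared weighted residuals listed in Figure 9.5.

```python
import math

# Weighted residuals at the starting parameter values, as listed in Figure 9.5
# (32 head and drawdown observations followed by 3 river-gain observations).
WEIGHTED_RESIDUALS = [
    1.58, -0.410, -0.543, -11.2, -0.446, -3.97, 20.3, 64.9, -17.7, -10.1,
    194.0, -14.4, -0.718, 64.3, -16.2, -0.328, 96.0, -13.1, 0.247, 68.2,
    -1.75, 0.694, 11.5, -15.8, 3.91, 199.0, -13.9, 0.702, 159.0, -15.0,
    -0.122, 87.8, 1.05, 1.63, 3.16,
]


def sswr(weighted_residuals):
    """Sum of squared weighted residuals."""
    return sum(r * r for r in weighted_residuals)


def error_variance(sswr_value, n_obs, n_prior, n_par):
    """Calculated error variance s^2 = SSWR / (ND + NPR - NP)."""
    return sswr_value / (n_obs + n_prior - n_par)


if __name__ == "__main__":
    print(f"SSWR at starting values: {sswr(WEIGHTED_RESIDUALS):.4g}")
    s2 = error_variance(23.841, n_obs=35, n_prior=0, n_par=9)  # final SSWR, Figure 9.12a
    print(f"s^2 = {s2:.3f}, s = {math.sqrt(s2):.3f}")
```

From the rounded values in Figure 9.5 the sum is about 1.349 × 10⁵, close to the iteration-1 value of 0.13469E+06 in Figure 9.12a. With the final value of 23.841, s² ≈ 0.917 and s ≈ 0.958.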
9.4 EXERCISES

(a)
SELECTED STATISTICS FROM MODIFIED GAUSS-NEWTON ITERATIONS

       MAX. PARAMETER   CALC. CHANGE    MAX. CHANGE    DAMPING
ITER.  PARNAM           MAX. CHANGE     ALLOWED        PARAMETER
-----  --------------   ------------    -----------    ---------
  1    VK_CB            0.868519        2.00000        1.0000
  2    K_RB             1.43887         2.00000        1.0000
  3    K_RB             0.894564        2.00000        1.0000
  4    K_RB             0.346358        2.00000        1.0000
  5    K_RB             0.559047E-01    2.00000        1.0000
  6    K_RB             0.182992E-02    2.00000        1.0000

SUMS OF SQUARED WEIGHTED RESIDUALS FOR EACH ITERATION

       SUMS OF SQUARED WEIGHTED RESIDUALS
ITER.  OBSERVATIONS     PRIOR INFO.     TOTAL
  1    0.13469E+06      0.0000          0.13469E+06
  2    625.19           0.0000          625.19
  3    38.638           0.0000          38.638
  4    26.242           0.0000          26.242
  5    23.892           0.0000          23.892
  6    23.846           0.0000          23.846
FINAL  23.841           0.0000          23.841

*** PARAMETER ESTIMATION CONVERGED BY SATISFYING THE TOL CRITERION ***

FIGURE 9.12 (a) Selected statistics from the modified Gauss-Newton iterations from Exercise 9.7. This is a fragment from the global output file of MODFLOW-2000. (b) Normalized parameter values at the end of each iteration of the transient regression. For each parameter, the graphed values are normalized by the starting value of the parameter (see Table 9.6).
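The columns of Figure 9.12a can be read as a damping and convergence check. The sketch below assumes that the calculated change is the largest fractional parameter change in each iteration, that a step is scaled back only when that change exceeds the allowed maximum, and that TOL = 0.01. The damping rule is a simplification (the modified Gauss-Newton method also damps to control oscillation), and the TOL value used in this exercise is not shown in the fragment, so both are assumptions rather than a description of the MODFLOW-2000 code.

```python
# Values transcribed from Figure 9.12a: (iteration, PARNAM, calculated max. change).
ITERATIONS = [
    (1, "VK_CB", 0.868519),
    (2, "K_RB", 1.43887),
    (3, "K_RB", 0.894564),
    (4, "K_RB", 0.346358),
    (5, "K_RB", 0.559047e-01),
    (6, "K_RB", 0.182992e-02),
]
# Total sums of squared weighted residuals for iterations 1-6.
SSWR = [0.13469e06, 625.19, 38.638, 26.242, 23.892, 23.846]

MAX_CHANGE = 2.0  # "MAX. CHANGE ALLOWED" column
TOL = 0.01        # assumption: not stated in this section


def simple_damping(calc_change, max_change=MAX_CHANGE):
    """Simplified damping: shrink the step so the largest fractional change
    does not exceed max_change (oscillation control is omitted)."""
    return min(1.0, max_change / abs(calc_change))


def first_converged_iteration(iterations, tol=TOL):
    """First iteration whose largest fractional parameter change is below tol."""
    for it, _name, change in iterations:
        if abs(change) < tol:
            return it
    return None


def relative_decrease(values):
    """Fractional decrease of the objective function between iterations."""
    return [(a - b) / a for a, b in zip(values, values[1:])]


print([simple_damping(c) for _, _, c in ITERATIONS])
print(first_converged_iteration(ITERATIONS))
print([round(d, 4) for d in relative_decrease(SSWR)])
```

Because every calculated change in Figure 9.12a is below the allowed 2.0, this simplified rule leaves the damping parameter at 1.0 throughout, matching the printed column.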
CALIBRATING TRANSIENT AND TRANSPORT MODELS
TABLE 9.6 Starting, Estimated, and True Parameter Values for the Transient Model

Parameter Starting   Estimated Value in       Estimated Value in     True
Name      Value      Steady-State Regression  Transient Regression   Value
Q_1&2     -1.10      --                       -1.07                  -1.00
SS_1      2.6E-05    --                       2.3E-05                2.0E-05
HK_1      3.0E-04    4.6E-04                  4.3E-04                4.0E-04
K_RB      1.2E-03    1.2E-03                  1.3E-03                1.0E-03
VK_CB     1.0E-07    9.9E-08                  2.2E-07                2.0E-07
SS_2      4.0E-06    --                       1.2E-06                2.0E-06
HK_2      4.0E-05    1.5E-05                  4.8E-05                4.4E-05
RCH_1     63.072     47.45                    34.10                  31.536
RCH_2     31.536     38.53                    50.44                  47.304
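Figure 9.12b plots each parameter divided by its starting value, and Exercise 9.7 asks how the estimates compare with the true values. The sketch below, using the transient-regression columns of Table 9.6 transcribed by hand, computes both ratios; it illustrates the arithmetic only and is not output from either program. The helper names are hypothetical.

```python
# Table 9.6, transient model: name -> (starting, estimated, true).
TABLE_9_6 = {
    "Q_1&2": (-1.10, -1.07, -1.00),
    "SS_1": (2.6e-5, 2.3e-5, 2.0e-5),
    "HK_1": (3.0e-4, 4.3e-4, 4.0e-4),
    "K_RB": (1.2e-3, 1.3e-3, 1.0e-3),
    "VK_CB": (1.0e-7, 2.2e-7, 2.0e-7),
    "SS_2": (4.0e-6, 1.2e-6, 2.0e-6),
    "HK_2": (4.0e-5, 4.8e-5, 4.4e-5),
    "RCH_1": (63.072, 34.10, 31.536),
    "RCH_2": (31.536, 50.44, 47.304),
}


def normalized(value, start):
    """Value divided by the starting value, the quantity graphed in Figure 9.12b."""
    return value / start


def relative_error(estimate, true):
    """Signed fractional difference of an estimate from the true value."""
    return (estimate - true) / true


for name, (start, est, true) in TABLE_9_6.items():
    print(f"{name:6s} est/start = {normalized(est, start):6.3f}   "
          f"(est - true)/true = {relative_error(est, true):+.3f}")
```

Dividing by the starting value puts parameters whose magnitudes differ by many orders (for example, VK_CB and RCH_1) on one axis, which is why Figure 9.12b uses normalized values.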
LEAST-SQUARES OBJ FUNC (DEP.VAR. ONLY)   = 23.841
LEAST-SQUARES OBJ FUNC (W/PARAMETERS)    = 23.841
CALCULATED ERROR VARIANCE                = 0.91697
STANDARD ERROR OF THE REGRESSION         = 0.95758
CORRELATION COEFFICIENT                  = 0.99999
  W/PARAMETERS                           = 0.99999
ITERATIONS                               = 6

MAX LIKE OBJ FUNC = -35.070
AIC STATISTIC     = -17.070
BIC STATISTIC     = -3.0715
FIGURE 9.13 Selected statistics related to overall model fit, from the modified Gauss-Newton iterations of the regression run in Exercise 9.7. This is a fragment from the global output file of MODFLOW-2000.

• Construct a confidence interval for the true error variance (Eq. (6.2)) to determine if the deviation of s^2 from a value of 1.0 is significant. How does this result affect the answer to the question in the previous bullet?
• Calculate the fitted standard deviation for heads and for drawdowns. Do the fitted standard deviations suggest that the model provides a good fit to these data?
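The statistics in Figure 9.13 are linked by simple relations. The sketch below assumes that the calculated error variance is s^2 = S(b)/(ND + NPR - NP), that AIC = MLOF + 2NP and BIC = MLOF + NP ln(ND + NPR), that the interval on the true error variance has the usual chi-square form [(n - p)s^2/chi2_U, (n - p)s^2/chi2_L] with n - p degrees of freedom (taken here to be the form of Eq. (6.2)), and that a fitted standard deviation is s times the standard deviation used to calculate the weights. With NP = 9 (Table 9.6), the printed S(b) and s^2 imply ND + NPR = 35. Chi-square quantiles come from the Wilson-Hilferty approximation instead of a table, so interval limits may differ slightly in the third decimal. The weighting standard deviations for heads and drawdowns are not given in this section, so `fitted_std` takes them as an input.

```python
import math
from statistics import NormalDist

# Values printed in Figure 9.13; NP is the number of parameters in Table 9.6.
SSWR = 23.841          # least-squares objective function S(b)
S2_PRINTED = 0.91697   # calculated error variance
MLOF = -35.070         # maximum-likelihood objective function
NP = 9                 # number of estimated parameters


def implied_n(sswr, s2, n_par):
    """Back out ND + NPR from s^2 = S(b) / (ND + NPR - NP)."""
    return round(sswr / s2) + n_par


def error_variance(sswr, n, n_par):
    """Calculated error variance s^2."""
    return sswr / (n - n_par)


def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation to a chi-square quantile (assumption:
    used instead of an exact table so only the stdlib is needed)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def error_variance_interval(s2, dof, alpha=0.05):
    """Two-sided (1 - alpha) interval on the true error variance."""
    chi2_upper = chi2_quantile(1.0 - alpha / 2.0, dof)
    chi2_lower = chi2_quantile(alpha / 2.0, dof)
    return dof * s2 / chi2_upper, dof * s2 / chi2_lower


def fitted_std(s, weighting_std):
    """s times the standard deviation used to weight an observation type."""
    return s * weighting_std


n = implied_n(SSWR, S2_PRINTED, NP)
dof = n - NP
s2 = error_variance(SSWR, n, NP)
s = math.sqrt(s2)
aic = MLOF + 2 * NP
bic = MLOF + NP * math.log(n)
low, high = error_variance_interval(s2, dof)
print(f"ND+NPR = {n}, s^2 = {s2:.5f}, s = {s:.5f}")
print(f"AIC = {aic:.3f}, BIC = {bic:.4f}")
print(f"95% interval on the true error variance: [{low:.3f}, {high:.3f}]")
```

Reproducing the printed s, AIC, and BIC from S(b), MLOF, and NP is a quick consistency check on how these statistics were read from the output file.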
Exercise 9.9: Perform Graphical Analyses of Model Fit and Evaluate Related Statistics
In this exercise, the fit of the transient model to the head, drawdown, and flow data is evaluated using graphical methods and associated statistics.
FIGURE 9.14 (a) Weighted residuals versus simulated values, (b) weighted observed values versus weighted simulated values, and (c) observed versus simulated values for the transient regression.
(a) Evaluate graphs of weighted residuals and weighted and unweighted simulated and observed values.
Graphs for analyzing model fit are shown in Figure 9.14.

Problem
• Do the weighted residuals appear to be randomly distributed with respect to the simulated values?
• Comment on the utility of the graphs in Figure 9.14b,c for analyzing the model fit to the data.
• Does the correlation R between the weighted simulated values and weighted observed values, shown in Figure 9.13, provide evidence that there is a good fit of the model to the data?
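The weighted residuals in Figure 9.14a are w_i^(1/2)(y_i - y'_i), and R in Figure 9.13 is a correlation between weighted observed and weighted simulated values. The stdlib Python sketch below assumes a diagonal weight matrix and the usual Pearson form of the correlation; the data are made up for illustration and are not the exercise observations.

```python
import math


def weighted_values(values, weights):
    """Multiply each value by the square root of its weight (diagonal weights)."""
    return [math.sqrt(w) * v for v, w in zip(values, weights)]


def weighted_residuals(obs, sim, weights):
    """w**0.5 * (observed - simulated), the quantity plotted against simulated values."""
    return [math.sqrt(w) * (y - ys) for y, ys, w in zip(obs, sim, weights)]


def correlation_r(obs, sim, weights):
    """Pearson correlation between weighted observed and weighted simulated values."""
    yw = weighted_values(obs, weights)
    sw = weighted_values(sim, weights)
    my, ms = sum(yw) / len(yw), sum(sw) / len(sw)
    cross = sum((a - my) * (b - ms) for a, b in zip(yw, sw))
    ssy = sum((a - my) ** 2 for a in yw)
    sss = sum((b - ms) ** 2 for b in sw)
    return cross / math.sqrt(ssy * sss)


# Made-up data: three small values all over-simulated by 0.5, one large value fit exactly.
obs = [1.0, 2.0, 3.0, 100.0]
sim = [1.5, 2.5, 3.5, 100.0]
weights = [1.0, 1.0, 1.0, 1.0]
print(weighted_residuals(obs, sim, weights))
print(correlation_r(obs, sim, weights))
```

In this made-up case the three small observations share the same nonzero residual, yet R is essentially 1 because the single large value dominates the spread. A correlation computed over values that span a wide range can look excellent even when the residuals are clearly not random.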
FIGURE 9.15 Weighted residuals for the transient regression plotted on maps of the two model layers.
(b) Evaluate graphs of weighted residuals against independent variables and the runs statistic.
In this exercise, the randomness of the hydraulic head and drawdown weighted residuals is evaluated (a) graphically with respect to their spatial location in the model, and (b) by applying the runs statistic to residuals as ordered in the input files containing the observations. The graphical analysis is conducted using
# RESIDUALS >=0.: 18 # RESIDUALS =0. IS GREATER THAN 10 AND #RESIDUALS