Statistical Methods for Fuzzy Data
Reinhard Viertl, Vienna University of Technology, Austria
Statistical analysis methods have to be adapted for the analysis of fuzzy data. In this book the foundations of the description of fuzzy data are explained, including methods on how to obtain the characterizing function of fuzzy measurement results. Furthermore, statistical methods are then generalized to the analysis of fuzzy data and fuzzy a-priori information. Key features:
• Provides basic methods for the mathematical description of fuzzy data, as well as statistical methods that can be used to analyze fuzzy data.
• Describes methods of increasing importance with applications in areas such as environmental statistics and social science.
• Complements the theory with exercises and solutions and is illustrated throughout with diagrams and examples.
• Explores areas such as quantitative description of data uncertainty and mathematical description of fuzzy data.

This book is aimed at statisticians working with fuzzy logic, engineering statisticians, finance researchers, and environmental statisticians. The book is written for readers who are familiar with elementary stochastic models and basic statistical methods.
Statistical data are not always precise numbers, vectors, or categories. Real data are frequently what is called fuzzy. Examples where this fuzziness is obvious are quality of life data and environmental, biological, medical, sociological, and economic data. Also, the results of measurements can best be described by fuzzy numbers and fuzzy vectors, respectively.
P1: SBT fm JWST020-Viertl
November 30, 2010
13:58
Printer Name: Yet to Come
Statistical Methods for Fuzzy Data
Reinhard Viertl
Vienna University of Technology, Austria
A John Wiley and Sons, Ltd., Publication
This edition first published 2011
© 2011 John Wiley & Sons, Ltd

Registered office: John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging-in-Publication Data

Viertl, R. (Reinhard)
Statistical methods for fuzzy data / Reinhard Viertl.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-69945-4 (cloth)
1. Fuzzy measure theory. 2. Fuzzy sets. 3. Mathematical statistics. I. Title.
QA312.5.V54 2010
515.42–dc22
2010031105

A catalogue record for this book is available from the British Library.

Print ISBN: 978-0-470-69945-4
ePDF ISBN: 978-0-470-97442-1
oBook ISBN: 978-0-470-97441-4
ePub ISBN: 978-0-470-97456-8

Typeset in 10/12pt Times by Aptara Inc., New Delhi, India
Contents

Preface

Part I FUZZY INFORMATION

1 Fuzzy data
1.1 One-dimensional fuzzy data
1.2 Vector-valued fuzzy data
1.3 Fuzziness and variability
1.4 Fuzziness and errors
1.5 Problems

2 Fuzzy numbers and fuzzy vectors
2.1 Fuzzy numbers and characterizing functions
2.2 Vectors of fuzzy numbers and fuzzy vectors
2.3 Triangular norms
2.4 Problems

3 Mathematical operations for fuzzy quantities
3.1 Functions of fuzzy variables
3.2 Addition of fuzzy numbers
3.3 Multiplication of fuzzy numbers
3.4 Mean value of fuzzy numbers
3.5 Differences and quotients
3.6 Fuzzy valued functions
3.7 Problems

Part II DESCRIPTIVE STATISTICS FOR FUZZY DATA

4 Fuzzy samples
4.1 Minimum of fuzzy data
4.2 Maximum of fuzzy data
4.3 Cumulative sum for fuzzy data
4.4 Problems

5 Histograms for fuzzy data
5.1 Fuzzy frequency of a fixed class
5.2 Fuzzy frequency distributions
5.3 Axonometric diagram of the fuzzy histogram
5.4 Problems

6 Empirical distribution functions
6.1 Fuzzy valued empirical distribution function
6.2 Fuzzy empirical fractiles
6.3 Smoothed empirical distribution function
6.4 Problems

7 Empirical correlation for fuzzy data
7.1 Fuzzy empirical correlation coefficient
7.2 Problems

Part III FOUNDATIONS OF STATISTICAL INFERENCE WITH FUZZY DATA

8 Fuzzy probability distributions
8.1 Fuzzy probability densities
8.2 Probabilities based on fuzzy probability densities
8.3 General fuzzy probability distributions
8.4 Problems

9 A law of large numbers
9.1 Fuzzy random variables
9.2 Fuzzy probability distributions induced by fuzzy random variables
9.3 Sequences of fuzzy random variables
9.4 Law of large numbers for fuzzy random variables
9.5 Problems

10 Combined fuzzy samples
10.1 Observation space and sample space
10.2 Combination of fuzzy samples
10.3 Statistics of fuzzy data
10.4 Problems

Part IV CLASSICAL STATISTICAL INFERENCE FOR FUZZY DATA

11 Generalized point estimators
11.1 Estimators based on fuzzy samples
11.2 Sample moments
11.3 Problems

12 Generalized confidence regions
12.1 Confidence functions
12.2 Fuzzy confidence regions
12.3 Problems

13 Statistical tests for fuzzy data
13.1 Test statistics and fuzzy data
13.2 Fuzzy p-values
13.3 Problems

Part V BAYESIAN INFERENCE AND FUZZY INFORMATION

14 Bayes' theorem and fuzzy information
14.1 Fuzzy a priori distributions
14.2 Updating fuzzy a priori distributions
14.3 Problems

15 Generalized Bayes' theorem
15.1 Likelihood function for fuzzy data
15.2 Bayes' theorem for fuzzy a priori distribution and fuzzy data
15.3 Problems

16 Bayesian confidence regions
16.1 Bayesian confidence regions based on fuzzy data
16.2 Fuzzy HPD-regions
16.3 Problems

17 Fuzzy predictive distributions
17.1 Discrete case
17.2 Discrete models with continuous parameter space
17.3 Continuous case
17.4 Problems

18 Bayesian decisions and fuzzy information
18.1 Bayesian decisions
18.2 Fuzzy utility
18.3 Discrete state space
18.4 Continuous state space
18.5 Problems

Part VI REGRESSION ANALYSIS AND FUZZY INFORMATION

19 Classical regression analysis
19.1 Regression models
19.2 Linear regression models with Gaussian dependent variables
19.3 General linear models
19.4 Nonidentical variances
19.5 Problems

20 Regression models and fuzzy data
20.1 Generalized estimators for linear regression models based on the extension principle
20.2 Generalized confidence regions for parameters
20.3 Prediction in fuzzy regression models
20.4 Problems

21 Bayesian regression analysis
21.1 Calculation of a posteriori distributions
21.2 Bayesian confidence regions
21.3 Probabilities of hypotheses
21.4 Predictive distributions
21.5 A posteriori Bayes estimators for regression parameters
21.6 Bayesian regression with Gaussian distributions
21.7 Problems

22 Bayesian regression analysis and fuzzy information
22.1 Fuzzy estimators of regression parameters
22.2 Generalized Bayesian confidence regions
22.3 Fuzzy predictive distributions
22.4 Problems

Part VII FUZZY TIME SERIES

23 Mathematical concepts
23.1 Support functions of fuzzy quantities
23.2 Distances of fuzzy quantities
23.3 Generalized Hukuhara difference

24 Descriptive methods for fuzzy time series
24.1 Moving averages
24.2 Filtering
24.2.1 Linear filtering
24.2.2 Nonlinear filters
24.3 Exponential smoothing
24.4 Components model
24.4.1 Model without seasonal component
24.4.2 Model with seasonal component
24.5 Difference filters
24.6 Generalized Holt–Winter method
24.7 Presentation in the frequency domain

25 More on fuzzy random variables and fuzzy random vectors
25.1 Basics
25.2 Expectation and variance of fuzzy random variables
25.3 Covariance and correlation
25.4 Further results

26 Stochastic methods in fuzzy time series analysis
26.1 Linear approximation and prediction
26.2 Remarks concerning Kalman filtering

Part VIII APPENDICES

A1 List of symbols and abbreviations
A2 Solutions to the problems
A3 Glossary
A4 Related literature

References
Index
Preface

Statistics is concerned with the analysis of data and the estimation of probability distributions and stochastic models. Therefore the quantitative description of data is essential for statistics. In standard statistics data are assumed to be numbers, vectors or classical functions. But in applications real data are frequently not precise numbers or vectors; they are more or less imprecise, also called fuzzy. It is important to note that this kind of uncertainty is different from errors; it is the imprecision of individual observations or measurements. Whereas counting data can be precise, possibly biased by errors, measurement data of continuous quantities like length, time, volume, concentrations of poisons, amounts of chemicals released to the environment and others are never precise real numbers but always carry imprecision.

In measurement analysis, statistical models are usually used to describe data uncertainty. But statistical models describe variability, not the imprecision of individual measurement results. Therefore other models are necessary to quantify the imprecision of measurement results. For a special kind of data, e.g. data from digital instruments, interval arithmetic can be used to describe the propagation of data imprecision in statistical inference. But there are data of a more general form than intervals, e.g. data obtained from analog instruments or from oscillographs, or graphical data like color intensity pictures. Therefore it is necessary to have a more general numerical model to describe measurement data. The most up-to-date concept for this is special fuzzy subsets of the set R of real numbers, or special fuzzy subsets of the k-dimensional Euclidean space R^k in the case of vector quantities. These special fuzzy subsets of R are called nonprecise numbers in the one-dimensional case and nonprecise vectors in the k-dimensional case for k > 1.

Nonprecise numbers are defined by so-called characterizing functions and nonprecise vectors by so-called vector-characterizing functions. These are generalizations of indicator functions of classical sets in standard set theory. The concept of fuzzy numbers from fuzzy set theory is too restrictive to describe real data; therefore nonprecise numbers are introduced. The need for a quantitative description of fuzzy data makes it necessary to adapt statistical methods to the situation of fuzzy data. This is possible, and generalized statistical procedures for fuzzy data are described in this book.
There are also other approaches for the analysis of fuzzy data. Here an approach from the viewpoint of applications is used. Other approaches are mentioned in Appendix A4. Besides fuzziness of data there is also fuzziness of a priori distributions in Bayesian statistics. So-called fuzzy probability distributions can be used to model nonprecise a priori knowledge concerning parameters in statistical models.

In the text the necessary foundations of fuzzy models are explained and basic statistical analysis methods for fuzzy samples are described. These include generalized classical statistical procedures as well as generalized Bayesian inference procedures. A software system for the statistical analysis of fuzzy data (AFD) is under development. Some procedures are already available, and others are in progress. The available software can be obtained from the author.

Last but not least I want to thank all persons who contributed to this work: Dr D. Hareter, Mr H. Schwarz, Mrs D. Vater, Dr I. Meliconi, H. Kay, P. Sinha-Sahay and B. Kaur from Wiley for the excellent cooperation, and my wife Dorothea for preparing the files for the last two parts of this book. I hope the readers will enjoy the text.

Reinhard Viertl
Vienna, Austria
July 2010
Part I

FUZZY INFORMATION

Fuzzy information is a special kind of information, and information is an omnipresent word in our society. But in general there is no precise definition of information. However, in the context of statistics, which is concerned with uncertainty, a possible definition of information is the following: Information is everything which has influence on the assessment of uncertainty by an analyst. This uncertainty can be of different types: data uncertainty, nondeterministic quantities, model uncertainty, and uncertainty of a priori information.

Measurement results and observational data are special forms of information. Such data are frequently not precise numbers but more or less nonprecise, also called fuzzy. Such data will be considered in the first chapter. Another kind of information is probabilities. Standard probability theory considers probabilities to be numbers. Often this is not realistic, and in a more general approach probabilities are considered to be so-called fuzzy numbers. The idea of generalized sets was originally published in Menger (1951), and the term 'fuzzy set' was coined in Zadeh (1965).
1

Fuzzy data

All kinds of data which cannot be presented as precise numbers or cannot be precisely classified are called nonprecise or fuzzy. Examples are data in the form of linguistic descriptions like high temperature, low flexibility, and high blood pressure. Also, the results of precision measurements of continuous variables are not precise numbers but always more or less fuzzy.
1.1
One-dimensional fuzzy data
Measurement results of one-dimensional continuous quantities are frequently idealized to be numbers times a measurement unit. However, real measurement results of continuous quantities are never precise numbers but always connected with uncertainty. Usually this uncertainty is considered to be statistical in nature, but this is not appropriate, since statistical models describe variability. For a single measurement result there is no variability; therefore another method is necessary to model the measurement uncertainty of individual measurement results. The best up-to-date mathematical model for that is so-called fuzzy numbers, which are described in Section 2.1 [cf. Viertl (2002)]. Examples of one-dimensional fuzzy data are lifetimes of biological units, length measurements, volume measurements, height of a tree, water levels in lakes and rivers, speed measurements, mass measurements, concentrations of dangerous substances in environmental media, and so on. A special kind of one-dimensional fuzzy data are data in the form of intervals [a; b] ⊆ R. Such data are generated by digital measurement equipment, because digital instruments have only a finite number of digits.
Figure 1.1 Variability and fuzziness.
1.2
Vector-valued fuzzy data
Many statistical data are multivariate, i.e. ideally the corresponding measurement results are real vectors (x1, ..., xk) ∈ R^k. In applications such data are frequently not precise vectors but to some degree fuzzy. A mathematical model for this kind of data is so-called fuzzy vectors, which are formalized in Section 2.2. Examples of vector-valued fuzzy data are locations of objects in space like positions of ships on radar screens, space–time data, and multivariate nonprecise data in the form of vectors (x1*, ..., xn*) of fuzzy numbers xi*.
1.3
Fuzziness and variability
In statistics, so-called stochastic quantities (also called random variables) are frequently observed, where the observed results are fuzzy. In this situation two kinds of uncertainty are present: variability, which can be modeled by probability distributions, also called stochastic models, and fuzziness, which can be modeled by fuzzy numbers and fuzzy vectors, respectively. It is important to note that these are two different kinds of uncertainty. Moreover, it is necessary to describe the fuzziness of data in order to obtain realistic results from statistical analysis. In Figure 1.1 the situation is graphically outlined. Real data are also subject to a third kind of uncertainty: errors. These are the subject of Section 1.4.
1.4
Fuzziness and errors
In standard statistics errors are modeled in the following way. The observation y of a stochastic quantity is not its true value x, but superimposed by a quantity e, called
error, i.e. y = x + e. The error is considered as the realization of another stochastic quantity. These kinds of errors are called random errors. For one-dimensional quantities, all three quantities x, y, and e are, after the experiment, real numbers. But for continuous variables this is not realistic, because the observed values y are fuzzy. It is important to note that all three kinds of uncertainty are present in real data. Therefore it is necessary to generalize the mathematical operations for real numbers to the situation of fuzzy numbers.
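For contrast, the classical additive error model y = x + e can be sketched in a few lines (the numeric values below are illustrative, not from the book); note that it captures variability across repeated observations, not the imprecision of a single fuzzy observation y.

```python
import random

random.seed(0)  # reproducible illustration

x_true = 10.0                                       # true value x
errors = [random.gauss(0, 0.1) for _ in range(5)]   # realizations of the random error e
observations = [x_true + e for e in errors]         # classical error model: y = x + e

print(observations)
```

Each observation is a precise real number after the experiment; the fuzzy-data model generalizes exactly this point.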
1.5
Problems
(a) Find examples of fuzzy numerical data which are not given in Section 1.1 and Section 1.2.
(b) Work out the difference between stochastic uncertainty and fuzziness of individual observations.
(c) Make clear how data in the form of intervals are obtained by digital measurement devices.
(d) What do X-ray pictures and data from satellite photographs have in common?
2
Fuzzy numbers and fuzzy vectors

To take care of the fuzziness of data described in Chapter 1, a mathematical model is needed that describes such data quantitatively. This is the subject of the present chapter.
2.1
Fuzzy numbers and characterizing functions
In order to model one-dimensional fuzzy data the best up-to-date mathematical model is so-called fuzzy numbers.

Definition 2.1: A fuzzy number x* is determined by its so-called characterizing function ξ(·), which is a real function of one real variable x obeying the following:

(1) ξ : R → [0; 1].
(2) ∀δ ∈ (0; 1] the so-called δ-cut Cδ(x*) := {x ∈ R : ξ(x) ≥ δ} is a finite union of compact intervals [a_{δ,j}; b_{δ,j}], i.e. Cδ(x*) = ∪_{j=1}^{kδ} [a_{δ,j}; b_{δ,j}] ≠ ∅.
(3) The support of ξ(·), defined by supp[ξ(·)] := {x ∈ R : ξ(x) > 0}, is bounded.

The set of all fuzzy numbers is denoted by F(R). For the following and for applications it is important that characterizing functions can be reconstructed from the family (Cδ(x*); δ ∈ (0; 1]), in the way described in Lemma 2.1.
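As a numerical illustration (not part of the book's text), this reconstruction of ξ(·) from a finite family of δ-cuts can be sketched as follows; the triangular fuzzy number used here is a hypothetical example, and the formula applied is the one stated in Lemma 2.1 below.

```python
def xi_from_cuts(x, cuts):
    """Recover xi(x) = max{ delta * I_{C_delta(x*)}(x) : delta } from a finite
    family of delta-cuts, given as a dict {delta: list of (a, b) intervals}."""
    levels = [delta for delta, intervals in cuts.items()
              if any(a <= x <= b for (a, b) in intervals)]
    return max(levels, default=0.0)

# Hypothetical triangular fuzzy number with xi(x) = max(0, 1 - |x - 1|);
# its delta-cut is the single compact interval [delta, 2 - delta].
cuts = {d / 10: [(d / 10, 2 - d / 10)] for d in range(1, 11)}

print(xi_from_cuts(1.0, cuts))  # the vertex is contained in every cut
print(xi_from_cuts(0.5, cuts))  # contained only in cuts up to level 0.5
```

With finitely many δ-levels the result is a step-function approximation of ξ(·), which is exactly how fuzzy numbers are represented in applications (cf. Remark 2.1).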
Lemma 2.1: For the characterizing function ξ(·) of a fuzzy number x* the following holds true:

ξ(x) = max{δ · I_{Cδ(x*)}(x) : δ ∈ [0; 1]}  ∀x ∈ R.

Proof: For fixed x0 ∈ R we have

δ · I_{Cδ(x*)}(x0) = δ · I_{{x : ξ(x) ≥ δ}}(x0) = δ for ξ(x0) ≥ δ, and 0 for ξ(x0) < δ.

Therefore we have for every δ ∈ [0; 1]: δ · I_{Cδ(x*)}(x0) ≤ ξ(x0), and further

sup{δ · I_{Cδ(x*)}(x0) : δ ∈ [0; 1]} ≤ ξ(x0).

On the other hand we have for δ0 = ξ(x0): δ0 · I_{Cδ0(x*)}(x0) = δ0, and therefore

sup{δ · I_{Cδ(x*)}(x0) : δ ∈ [0; 1]} ≥ δ0,

which implies

sup{δ · I_{Cδ(x*)}(x0) : δ ∈ [0; 1]} = max{δ · I_{Cδ(x*)}(x0) : δ ∈ [0; 1]} = δ0.

Remark 2.1: In applications fuzzy numbers are represented by a finite number of δ-cuts. Special types of fuzzy numbers are useful to define so-called fuzzy probability distributions. These kinds of fuzzy numbers are denoted as fuzzy intervals.

Definition 2.2: A fuzzy number is called a fuzzy interval if all its δ-cuts are non-empty closed bounded intervals.

In Figure 2.1 examples of fuzzy intervals are depicted. The set of all fuzzy intervals is denoted by F_I(R).

Remark 2.2: Precise numbers x0 ∈ R are represented by their characterizing function I_{x0}(·), i.e. the one-point indicator function of the set {x0}. For this characterizing function the δ-cuts are the degenerate closed interval [x0; x0] = {x0}. Therefore precise data are special fuzzy numbers. In Figure 2.2 the δ-cut of a characterizing function is explained.

Special types of fuzzy intervals are so-called LR-fuzzy numbers, which are defined by two functions L : [0; ∞) → [0; 1] and R : [0; ∞) → [0; 1] obeying the following:

(1) L(·) and R(·) are left-continuous.
(2) L(·) and R(·) have finite support.
(3) L(·) and R(·) are monotonic nonincreasing.
Figure 2.1 Characterizing functions of fuzzy intervals.

Using these functions the characterizing function ξ(·) of an LR-fuzzy interval is defined by:

ξ(x) = L((m − s − x)/l)   for x < m − s
ξ(x) = 1                  for m − s ≤ x ≤ m + s
ξ(x) = R((x − m − s)/r)   for x > m + s,

where m, s, l, r are real numbers obeying s ≥ 0, l > 0, r > 0. Such fuzzy numbers are denoted by ⟨m, s, l, r⟩_LR. A special type of LR-fuzzy numbers are the so-called trapezoidal fuzzy numbers, denoted by t*(m, s, l, r), with

L(x) = R(x) = max{0, 1 − x}  ∀x ∈ [0; ∞).
Figure 2.2 Characterizing function and a δ-cut.

The corresponding characterizing function of t*(m, s, l, r) is given by

ξ(x) = (x − m + s + l)/l   for m − s − l ≤ x < m − s
ξ(x) = 1                   for m − s ≤ x ≤ m + s
ξ(x) = (m + s + r − x)/r   for m + s < x ≤ m + s + r
ξ(x) = 0                   otherwise.

In Figure 2.3 the shape of a trapezoidal fuzzy number is depicted. The δ-cuts of trapezoidal fuzzy numbers can be calculated easily using the so-called pseudo-inverse functions L⁻¹(·) and R⁻¹(·), which are given by

L⁻¹(δ) = max{x ∈ R : L(x) ≥ δ}
R⁻¹(δ) = max{x ∈ R : R(x) ≥ δ}.

Lemma 2.2: The δ-cuts Cδ(x*) of an LR-fuzzy number x* are given by

Cδ(x*) = [m − s − l·L⁻¹(δ); m + s + r·R⁻¹(δ)]  ∀δ ∈ (0; 1].
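A small numerical check of Lemma 2.2 for a trapezoidal fuzzy number (the parameter values m = 5, s = 1, l = 2, r = 2 are hypothetical): for t*(m, s, l, r) the pseudo-inverses are L⁻¹(δ) = R⁻¹(δ) = 1 − δ, so the δ-cut endpoints can be compared directly with the characterizing function.

```python
def trapezoidal_xi(x, m, s, l, r):
    """Characterizing function of the trapezoidal fuzzy number t*(m, s, l, r)."""
    if m - s - l <= x < m - s:
        return (x - m + s + l) / l
    if m - s <= x <= m + s:
        return 1.0
    if m + s < x <= m + s + r:
        return (m + s + r - x) / r
    return 0.0

def trapezoidal_cut(delta, m, s, l, r):
    """delta-cut from Lemma 2.2, using L^{-1}(delta) = R^{-1}(delta) = 1 - delta."""
    return (m - s - l * (1 - delta), m + s + r * (1 - delta))

a, b = trapezoidal_cut(0.5, 5, 1, 2, 2)   # delta-cut [3.0; 7.0]
print(a, b)
print(trapezoidal_xi(a, 5, 1, 2, 2))      # xi at the left endpoint equals delta = 0.5
```

The left endpoint of each δ-cut is exactly the point where ξ(·) first reaches the level δ, as the proof below shows.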
Figure 2.3 Trapezoidal fuzzy number.

Proof: The left boundary of Cδ(x*) is determined by min{x : ξ(x) ≥ δ}. By the definition of LR-fuzzy numbers, for l > 0 we obtain

min{x : ξ(x) ≥ δ} = min{x : L((m − s − x)/l) ≥ δ and x < m − s}
                  = min{x : (m − s − x)/l ≤ L⁻¹(δ) and x < m − s}
                  = min{x : x ≥ m − s − l·L⁻¹(δ) and x < m − s}
                  = m − s − l·L⁻¹(δ).

The proof for the right boundary is analogous.

An important topic is how to obtain the characterizing function of fuzzy data. There is no general rule for that, but for different important measurement situations procedures are available. For analog measurement equipment often the result is obtained as a light point on a screen. In this situation the light intensity on the screen is used to obtain the characterizing function. For one-dimensional quantities the light intensity h(·) is normalized, i.e.

ξ(x) := h(x) / max{h(x) : x ∈ R}  ∀x ∈ R,

and ξ(·) is the characterizing function of the fuzzy observation. For light points on a computer screen the function h(·) is given on finitely many pixels x1, ..., xN with intensities h(xi), i = 1(1)N. In order to obtain the characterizing function ξ(·) we consider the discrete function h(·) defined on the finite set {x1, ..., xN}.
Let the distance between the points x1 < x2 < ... < xN be constant and equal to Δx. Defining a function η(·) on the set {x1, ..., xN} by

η(xi) := h(xi) / max{h(xi) : i = 1(1)N}  for i = 1(1)N,

the characterizing function ξ(·) is obtained in the following way: Based on the function η(·) the values ξ(x) are defined for all x ∈ R by

ξ(x) := 0                        for x < x1 − Δx/2
ξ(x) := η(x1)                    for x ∈ [x1 − Δx/2; x1 + Δx/2)
ξ(x) := max{η(x1), η(x2)}        for x = x1 + Δx/2
  ⋮
ξ(x) := max{η(x_{i−1}), η(xi)}   for x = xi − Δx/2
ξ(x) := η(xi)                    for x ∈ (xi − Δx/2; xi + Δx/2)
ξ(x) := max{η(xi), η(x_{i+1})}   for x = xi + Δx/2
  ⋮
ξ(x) := max{η(x_{N−1}), η(xN)}   for x = xN − Δx/2
ξ(x) := η(xN)                    for x ∈ (xN − Δx/2; xN + Δx/2]
ξ(x) := 0                        for x > xN + Δx/2.
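This pixel construction can be sketched in a few lines (the intensity values are made up). Evaluating with closed intervals and taking the maximum over all intervals containing x reproduces the boundary rule ξ(x) = max{η(xi), η(x_{i+1})} at the joint points xi + Δx/2.

```python
def eta_from_intensities(intensities):
    """Normalized discrete intensities eta(x_i) = h(x_i) / max_i h(x_i)."""
    hmax = max(intensities)
    return [h / hmax for h in intensities]

def xi_step(x, xs, eta, dx):
    """Step-function characterizing function built from eta(.): eta(x_i) inside
    (x_i - dx/2, x_i + dx/2), the maximum of the neighbouring values where two
    intervals meet, and 0 outside [x_1 - dx/2, x_N + dx/2]."""
    return max((e for x_i, e in zip(xs, eta)
                if x_i - dx / 2 <= x <= x_i + dx / 2), default=0.0)

xs = [0.0, 1.0, 2.0]
eta = eta_from_intensities([1.0, 4.0, 2.0])   # gives [0.25, 1.0, 0.5]
print(xi_step(0.5, xs, eta, 1.0))             # joint point: max(0.25, 1.0) = 1.0
print(xi_step(1.7, xs, eta, 1.0))             # inside the third interval: 0.5
```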
Remark 2.3: ξ(·) is a characterizing function of a fuzzy number.

For digital measurement displays the results are decimal numbers with a finite number of digits. Let the resulting number be y; then the remaining (infinitely many) decimals are unknown. The numerical information contained in the result is an interval [x̲; x̄], where x̲ is the real number obtained from y by taking the remaining decimals all to be 0, and x̄ is the real number obtained from y by taking the remaining decimals all to be 9. The corresponding characterizing function of this fuzzy number is the indicator function I_[x̲;x̄](·) of the closed interval [x̲; x̄]. Therefore the characterizing function ξ(·) is given by its values

ξ(x) := 0 for x < x̲,  1 for x̲ ≤ x ≤ x̄,  0 for x > x̄,  for all x ∈ R.
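As an illustration (the helper name is ours, not the book's), the interval [x̲; x̄] for a truncated decimal reading can be computed exactly with rationals; appending infinitely many nines to the display value yields exactly x̲ + 10^(−k) for k displayed decimals.

```python
from fractions import Fraction

def display_interval(reading):
    """Interval [x_lower; x_upper] of real numbers consistent with a truncated
    decimal display reading, given as a string like '3.14'."""
    k = len(reading.split(".")[1]) if "." in reading else 0
    x_lower = Fraction(reading)              # unseen decimals all 0
    x_upper = x_lower + Fraction(1, 10**k)   # unseen decimals all 9 (0.999... = 1)
    return x_lower, x_upper

lo, hi = display_interval("3.14")
print(lo, hi)   # the interval [3.14; 3.15]
```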
If the measurement result is given by a color intensity picture, for example diameters of particles, the color intensity is used to obtain the characterizing function ξ(·). Let h(·) be the color intensity describing the boundary; then the derivative h′(·) of h(·) is used, i.e.

ξ(x) := |h′(x)| / max{|h′(x)| : x ∈ R}  ∀x ∈ R.

An example is given in Figure 2.4. For color intensity pictures on digital screens a discrete version for step functions can be applied. Let {x1, ..., xN} be the discrete values of the variable and h(xi) the color intensities at position xi as before, and Δx the constant distance between the points xi. Then the discrete analog of the derivative h′(·) is given by the step function η(·)
Figure 2.4 Construction of a characterizing function.
which is constant in the intervals [xi − Δx/2; xi + Δx/2]:

η(x) := |h(xi − ε) − h(xi + ε)|  with 0 < ε
Cδ(p*) = [2·Pr{T ≥ t2(δ)}; min{1, 2·Pr{T ≥ t1(δ)}}] if A1 ≤ A2,  ∀δ ∈ (0; 1].
Proposition 13.1: The intervals Cδ(p*) defined above are δ-cuts of a characterizing function.

Proof: Since t* is a fuzzy number with characterizing function η(·), the intervals [t1(δ); t2(δ)] are δ-cuts for all δ ∈ (0; 1]. If t* is a fuzzy interval, then the δ-cuts Cδ(p*) are closed finite intervals [p1(δ); p2(δ)]. Therefore in this case p* is a fuzzy interval.

Note that p1(δ) ≥ 0 and p2(δ) ≤ 1 for all δ ∈ (0; 1]. Therefore the δ-cuts of p* can be interpreted in terms of probabilities and compared with the significance level α of the test. Now we have the following decision rule for fuzzy values t* of the test statistic T, where p1(δ) ≤ p2(δ):

 p2(δ) < α for all δ ∈ (0; 1] : reject H
 p1(δ) > α for all δ ∈ (0; 1] : accept H
 α ∈ [p1(δ); p2(δ)] for some δ : take more observations.

Remark 13.2: In the third case the uncertainty of making a decision is expressed.
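Once the δ-cut endpoints p1(δ) and p2(δ) are available on a finite grid of δ-levels, the three-valued decision rule can be coded directly (a sketch under the reconstruction of the rule given above; the function name is ours):

```python
def fuzzy_p_decision(p1, p2, alpha):
    """Three-valued decision rule for a fuzzy p-value with delta-cuts
    [p1(delta); p2(delta)] given as lists over a grid of delta-levels:
    reject H if the whole fuzzy p-value lies below alpha, accept H if it
    lies above alpha, otherwise take more observations."""
    if max(p2) < alpha:
        return "reject H"
    if min(p1) > alpha:
        return "accept H"
    return "take more observations"

print(fuzzy_p_decision([0.001, 0.004], [0.01, 0.02], 0.05))  # reject H
print(fuzzy_p_decision([0.06, 0.08], [0.20, 0.10], 0.05))    # accept H
print(fuzzy_p_decision([0.01, 0.02], [0.10, 0.06], 0.05))    # take more observations
```

The third branch corresponds to Remark 13.2: the significance level falls inside the support of p*, so the data do not support a clear decision.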
Figure 13.3 Construction of the characterizing function of the fuzzy p-value.

Example 13.1 Let T have a standard normal distribution, and η(·) a triangular characterizing function of the fuzzy value t* of the test statistic with center 0.7. Figure 13.3(a) shows the density f(·) of the standard normal distribution and the characterizing function η(·). In order to test the hypothesis H0: θ ≤ θ0 against the alternative H1: θ > θ0, taking θ0 = 0, we have a one-sided test. The fuzzy p-value p* is determined as above. The characterizing function of p* is presented in Figure 13.3(b).

Figure 13.3 shows in detail how the δ-cut of p* is computed for δ = 0.5. In Figure 13.3(a) the δ-cut Cδ(t*) = [t1(δ); t2(δ)] for δ = 0.5 is indicated by two vertical lines. The exact p-value for t2(0.5) is shown by the dark area, and for t1(0.5) by the dark and gray area under the function f(·). These p-values form the δ-cut Cδ(p*) for δ = 0.5, which is presented in Figure 13.3(b) at the intersection of the horizontal line with the vertical lines through the exact p-values. Using this procedure for all δ ∈ (0; 1], the characterizing function ξ(·) of p* is obtained.

Finally, the resulting fuzzy p-value is compared with the significance level α, which has to be fixed in advance. If supp[p*] does not contain α a decision is possible. Otherwise more data have to be taken in order to make a well-based decision.

Example 13.2 Continuing Example 13.1 and assuming the characterizing function η(·) of t* to be of triangular shape, we consider four cases as depicted in Figure 13.4(a). For the four characterizing functions η1(·), ..., η4(·) the characterizing functions ξ1(·), ..., ξ4(·) of the corresponding fuzzy p-values are given in Figure 13.4(b).
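For the one-sided test of Example 13.1 the δ-cuts of p* can be computed directly: with rejection for large values of T, the p-value is Pr{T ≥ t} = 1 − Φ(t), so Cδ(p*) = [1 − Φ(t2(δ)); 1 − Φ(t1(δ))]. The sketch below uses center 0.7 as in the example; the half-width 0.5 of the triangular t* is a hypothetical choice, since the book gives no numeric spread.

```python
from statistics import NormalDist

def fuzzy_p_cut(delta, center=0.7, halfwidth=0.5):
    """delta-cut of the fuzzy p-value for the one-sided test H0: theta <= 0 with
    T standard normal and triangular fuzzy test value t*, whose delta-cut is
    C_delta(t*) = [center - (1 - delta)*halfwidth; center + (1 - delta)*halfwidth]."""
    t1 = center - (1 - delta) * halfwidth
    t2 = center + (1 - delta) * halfwidth

    def sf(t):
        return 1.0 - NormalDist().cdf(t)   # Pr{T >= t}

    return (sf(t2), sf(t1))                # the larger t gives the smaller p-value

print(fuzzy_p_cut(1.0))   # degenerate cut: the precise p-value at t = 0.7
print(fuzzy_p_cut(0.5))   # a genuine interval of p-values
```

At δ = 1 the cut collapses to the classical p-value of the central test value; lower δ-levels widen the interval, mirroring the construction in Figure 13.3.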
f(·) is the standard normal density. H0 is rejected for η1(·), not rejected for η3(·) and η4(·). No decision is possible for η2(·), all at significance level α = 0.5.
Figure 13.4 Fuzzy p-values for different fuzzy values t* of a test statistic.

Remark 13.3: For fuzzy p-values in the case of fuzzy hypotheses, see Parchami et al. (2010). Further details are given in Filzmoser and Viertl (2004).
13.3 Problems

(a) Take a fuzzy sample of a normal distribution N(µ, σ²) whose characterizing functions are trapezoidal. How is the p-value for a hypothesis H: µ = µ0 with given µ0 calculated?
(b) Work out the details of the construction of the characterizing function of p* in Figure 13.3.
(c) Continuing from (a), how is the fuzzy p-value p* calculated?
Part V

BAYESIAN INFERENCE AND FUZZY INFORMATION

In standard Bayesian statistics all unknown quantities are described by stochastic quantities and their probability distributions. Consequently, so are the parameters θ of stochastic models X ∼ f(·|θ); θ ∈ Θ in the continuous case, and X ∼ p(·|θ); θ ∈ Θ in the discrete case. The a priori knowledge concerning θ is expressed by a probability distribution π(·) on the parameter space Θ, called the a priori distribution. If the parameter θ is continuous, the a priori distribution has a density function, called the a priori density. For a discrete parameter space Θ = {θ1, …, θm} the a priori distribution is a discrete probability distribution with point probabilities π(θj), j = 1(1)m.

If the observed stochastic quantity X ∼ p(·|θ); θ ∈ {θ1, …, θm} is also discrete, i.e. the observation space M_X of X is countable with M_X = {x1, x2, …}, and observations x1, …, xn of X are given [sample data D = (x1, …, xn)], then the conditional distribution of the parameter, given D, is obtained via Bayes' formula in the following way. The so-called a posteriori probabilities π(θj|D) of the parameter values θj, given the data D = (x1, …, xn), are obtained by

π(θj | x1, …, xn) = Pr{θ̃ = θj | X1 = x1, …, Xn = xn}
= Pr{X1 = x1, …, Xn = xn | θ̃ = θj} · π(θj) / Pr{X1 = x1, …, Xn = xn}
= p(x1, …, xn | θj) · π(θj) / Σ_{j'=1}^m p(x1, …, xn | θj') · π(θj')   ∀θj ∈ Θ.

In the case of independent observations x1, …, xn the joint probability p(x1, …, xn | θj) is given by

p(x1, …, xn | θj) = ∏_{i=1}^n p(xi | θj).
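The discrete Bayes formula above is easy to evaluate numerically. The following sketch uses a made-up two-state example (a Bernoulli-type model with assumed parameter values and data, not taken from the book):

```python
def posterior(prior, likelihoods):
    # prior: {theta_j: pi(theta_j)}; likelihoods: {theta_j: p(data | theta_j)}
    unnorm = {th: likelihoods[th] * prior[th] for th in prior}
    total = sum(unnorm.values())
    return {th: v / total for th, v in unnorm.items()}

def lik(theta, xs):
    # joint probability of independent 0/1 observations given theta
    p = 1.0
    for x in xs:
        p *= theta if x == 1 else 1.0 - theta
    return p

# assumed two-state parameter space and data
prior = {0.3: 0.5, 0.7: 0.5}
data = [1, 1, 0, 1]

post = posterior(prior, {th: lik(th, data) for th in prior})
print(post)
```

Since three of the four observations are ones, the a posteriori probability concentrates on the state θ = 0.7.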
This probability distribution on the parameter space Θ = {θ1, …, θm} is called the a posteriori distribution of θ, where θ̃ denotes the stochastic quantity describing the uncertainty concerning the parameter.

For continuous stochastic quantities X with parametric probability density f(·|θ) and continuous parameter space Θ ⊆ ℝᵏ, i.e. stochastic model X ∼ f(·|θ); θ ∈ Θ, the a priori distribution of θ is given by a probability density π(·) on the parameter space Θ. This probability density is called the a priori density, and it obeys

∫_Θ π(θ)dθ = 1.
For an observed sample x1, …, xn of X the so-called a posteriori density π(·|x1, …, xn) is the conditional density of the parameter, given the data x1, …, xn, i.e.

π(θ | x1, …, xn) = g(x1, …, xn, θ) / h(x1, …, xn)   ∀θ ∈ Θ
where g(·, …, ·) is the joint probability density of the stochastic vector (X1, …, Xn, θ̃) and h(x1, …, xn) is the marginal density at (X1 = x1, …, Xn = xn). In the case of independent observations x1, …, xn the joint density g(x1, …, xn, θ) is given by the following product:

g(x1, …, xn, θ) = ∏_{i=1}^n f(xi | θ) · π(θ).

The function ∏_{i=1}^n f(xi | θ) = l(θ; x1, …, xn) is the likelihood function.
The marginal density h(x1, …, xn) is given by

h(x1, …, xn) = ∫_Θ l(θ; x1, …, xn) · π(θ)dθ.

Therefore the a posteriori density is given by its values

π(θ | x1, …, xn) = π(θ) · l(θ; x1, …, xn) / ∫_Θ π(θ) · l(θ; x1, …, xn)dθ   ∀θ ∈ Θ.
This is called Bayes' theorem.

Remark V.1: After the sample x1, …, xn is observed these values are constants. Therefore h(x1, …, xn) is a normalizing constant, and Bayes' theorem can be written in its short form

π(θ | x1, …, xn) ∝ π(θ) · l(θ; x1, …, xn)

where ∝ stands for 'proportional to π(θ) · l(θ; x1, …, xn) up to a normalizing constant'. For more general data D – for example for censored data – the likelihood function has to be adapted. The general form of Bayes' theorem is given by

π(θ | D) ∝ π(θ) · l(θ; D)   ∀θ ∈ Θ.
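A common way to apply Bayes' theorem in practice is to evaluate the numerator and the normalizing integral on a grid over Θ. The sketch below assumes a toy model (normal observations with σ = 1 and a standard normal a priori density, not an example from the book); for this conjugate setup the a posteriori mode is n·x̄/(n + 1), which the grid reproduces.

```python
from math import exp, pi, sqrt

def norm_pdf(x, mu, sigma):
    # normal density
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def grid_posterior(data, thetas):
    # pointwise pi(theta) * l(theta; x1..xn), normalized by a Riemann sum
    vals = []
    for th in thetas:
        lik = 1.0
        for x in data:
            lik *= norm_pdf(x, th, 1.0)   # observation model, sigma = 1
        vals.append(norm_pdf(th, 0.0, 1.0) * lik)   # standard normal prior
    step = thetas[1] - thetas[0]
    total = sum(vals) * step
    return [v / total for v in vals]

thetas = [i * 0.01 - 5.0 for i in range(1001)]
post = grid_posterior([0.8, 1.2, 1.0], thetas)
print(sum(post) * 0.01)   # the a posteriori density integrates to one
```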
14
Bayes' theorem and fuzzy information

Looking at real data and realistic a priori distributions, two kinds of fuzzy information appear: first fuzzy samples, and second fuzzy a priori distributions. The necessity of generalizing Bayesian inference was pointed out in Viertl (1987). The description of fuzzy data is given in earlier parts of this book. Fuzzy a priori information can be expressed by fuzzy probability distributions in the sense of Chapter 8.
14.1 Fuzzy a priori distributions

In the case of a discrete stochastic model p(·|θ); θ ∈ Θ and discrete parameter space Θ = {θ1, …, θk}, fuzzy a priori information concerning the parameter can be expressed by k fuzzy intervals π*(θ1), …, π*(θk), where π*(θj) is the fuzzy probability of {θj}, for which π*(θ1) ⊕ … ⊕ π*(θk) is a fuzzy interval whose characterizing function η(·) fulfills 1 ∈ C1[η(·)], and the characterizing functions ξj(·) of π*(θj) fulfill the following: for each j ∈ {1, …, k} there exists a number pj ∈ C1[π*(θj)] such that

Σ_{j=1}^k pj = 1.
Statistical Methods for Fuzzy Data © 2011 John Wiley & Sons, Ltd
Reinhard Viertl
91
Since the Cδ[π*(θj)] are closed intervals for all δ ∈ (0; 1], the fuzzy probability of a subset Θ1 ⊂ Θ with Θ1 = {θ_{j1}, …, θ_{jm}} is denoted by P*(Θ1). The δ-cuts of P*(Θ1) are defined to be the set of all possible sums Σ_{l=1}^m x_{jl} of numbers xj ∈ Cδ[π*(θj)], j = 1(1)k, obeying

Σ_{j=1}^k xj = 1.
Remark 14.1: From the above definition it follows that P*(∅) = 0 and P*(Θ) = 1, and the inequalities from Remark 8.2 in Chapter 8 hold. For a continuous parameter space Θ a fuzzy a priori distribution P*(·) on Θ is given by a fuzzy valued density function π*(·) on Θ as described in Section 8.1.
14.2 Updating fuzzy a priori distributions

For stochastic quantities X with parametric probability distribution p(·|θ) or f(·|θ), given observed data x1, …, xn, the a priori distribution has to be updated in order to obtain up-to-date information concerning the parameter θ. This updated information is the conditional distribution P*(·|x1, …, xn) of θ, conditional on the sample x1, …, xn. For a discrete stochastic model X ∼ p(·|θ); θ ∈ Θ with discrete parameter space Θ = {θ1, …, θk} the a posteriori distribution P*(·|x1, …, xn) is given by its conditional point probabilities p*(θj|x1, …, xn), j = 1(1)k. These fuzzy point probabilities are obtained as conditional probabilities
π*(θj | x1, …, xn) = Pr{x1, …, xn | θj} · π*(θj) / Σ_{j'=1}^k Pr{x1, …, xn | θj'} · π*(θj')
= ∏_{i=1}^n p(xi | θj) · π*(θj) / Σ_{j'=1}^k ∏_{i=1}^n p(xi | θj') · π*(θj')   for j = 1(1)k.
This can be written using the likelihood function as

π*(θj | x1, …, xn) = π*(θj) · l(θj; x1, …, xn) / Σ_{j'=1}^k π*(θj') · l(θj'; x1, …, xn).
The value π*(θj) · l(θj; x1, …, xn) is the product of a fuzzy interval π*(θj) and a real number l(θj; x1, …, xn). Therefore π*(θj) · l(θj; x1, …, xn) is also a fuzzy interval. Using the generalized arithmetic for fuzzy numbers (cf. Chapter 3) the fuzzy a posteriori probabilities π*(θj | x1, …, xn) can be calculated.

In order to make the notation compact we use the abbreviation x = (x1, …, xn). Using this, and the δ-cuts Cδ[π*(θj)] = [π_δ(θj); π̄_δ(θj)] of the a priori probabilities, where π_δ and π̄_δ denote the lower and upper endpoints, the δ-cuts Cδ[π*(θj | x)] of the a posteriori probabilities are given by

π_δ(θj | x) = π_δ(θj) · l(θj; x) / Σ_{j'=1}^k π̄_δ(θj') · l(θj'; x)

and

π̄_δ(θj | x) = π̄_δ(θj) · l(θj; x) / Σ_{j'=1}^k π_δ(θj') · l(θj'; x)

for all δ ∈ (0; 1].
Recall the sequential property of the Bayesian updating procedure: splitting the data x1, …, xn into two parts x1 = (x1, …, xm) and x2 = (xm+1, …, xn), calculating π*(·|x1), and then using this as a priori distribution for the next sample x2, the resulting a posteriori distribution should be the same as if the whole sample x1, …, xn were used in one step to calculate the a posteriori distribution. Unfortunately the above generalization of Bayes' theorem fails to keep this property. This is explained below for the continuous case X ∼ f(·|θ); θ ∈ Θ with fuzzy a priori density π*(·) on the continuous parameter space Θ. Considering a sample (x1, …, xn) = x of X, the δ-level curves would be given by

π_δ(θ | x) = π_δ(θ) · l(θ; x) / ∫_Θ π̄_δ(θ) · l(θ; x)dθ

and

π̄_δ(θ | x) = π̄_δ(θ) · l(θ; x) / ∫_Θ π_δ(θ) · l(θ; x)dθ

for all δ ∈ (0; 1].
Using the splitting of the sample x1, …, xn above into x1 = (x1, …, xm) and x2 = (xm+1, …, xn) we obtain

π_δ(θ | x1) = π_δ(θ) · l(θ; x1) / ∫_Θ π̄_δ(θ) · l(θ; x1)dθ

and

π̄_δ(θ | x1) = π̄_δ(θ) · l(θ; x1) / ∫_Θ π_δ(θ) · l(θ; x1)dθ.
Using this fuzzy density π*(·|x1) as a priori distribution for the second updating, based on the second sample (xm+1, …, xn) = x2, we obtain for the lower δ-level curve:

π_δ(θ | x1, x2) = π_δ(θ | x1) · l(θ; x2) / ∫_Θ π̄_δ(θ | x1) · l(θ; x2)dθ

= [π_δ(θ) · l(θ; x1) / ∫_Θ π̄_δ(θ) · l(θ; x1)dθ] · l(θ; x2) / ∫_Θ [π̄_δ(θ) · l(θ; x1) / ∫_Θ π_δ(θ) · l(θ; x1)dθ] · l(θ; x2)dθ

= [∫_Θ π_δ(θ) · l(θ; x1)dθ / ∫_Θ π̄_δ(θ) · l(θ; x1)dθ] · π_δ(θ) · l(θ; x1) · l(θ; x2) / ∫_Θ π̄_δ(θ) · l(θ; x1) · l(θ; x2)dθ.

Observing l(θ; x1) · l(θ; x2) = l(θ; x) and

∫_Θ π_δ(θ) · l(θ; x1)dθ / ∫_Θ π̄_δ(θ) · l(θ; x1)dθ = C ≤ 1

we obtain

π_δ(θ | x1, x2) = C · π_δ(θ | x).
Therefore the results differ by a constant C. In order to keep the sequential nature of the updating procedure in Bayes' theorem it is necessary to use a common denominator for π_δ(·|x) and π̄_δ(·|x). The natural way to do this is to use the mean value

[π_δ(θ) + π̄_δ(θ)] / 2.

Using this, the following form of the Bayes' updating procedure is obtained:

π_δ(θ | x) := π_δ(θ) · l(θ; x) / ∫_Θ ([π_δ(θ) + π̄_δ(θ)] / 2) · l(θ; x)dθ

and

π̄_δ(θ | x) := π̄_δ(θ) · l(θ; x) / ∫_Θ ([π_δ(θ) + π̄_δ(θ)] / 2) · l(θ; x)dθ.
Using this generalization of Bayes' theorem, the two-step calculated fuzzy a posteriori density π*(·|x1, x2) equals the one-step calculated a posteriori density π*(·|x). In order to prove this we use the abbreviation

N(x) := ∫_Θ ([π_δ(θ) + π̄_δ(θ)] / 2) · l(θ; x)dθ.
For the lower δ-level curve of π*(·|x1, x2) with actual a priori density π*(·|x1) we obtain

π_δ(θ | x1, x2) = π_δ(θ | x1) · l(θ; x2) / ∫_Θ ([π_δ(θ | x1) + π̄_δ(θ | x1)] / 2) · l(θ; x2)dθ

= N(x1)⁻¹ · π_δ(θ) · l(θ; x1) · l(θ; x2) / (N(x1)⁻¹ · ∫_Θ ([π_δ(θ) · l(θ; x1) + π̄_δ(θ) · l(θ; x1)] / 2) · l(θ; x2)dθ)

= π_δ(θ) · l(θ; x1) · l(θ; x2) / ∫_Θ ([π_δ(θ) + π̄_δ(θ)] / 2) · l(θ; x1) · l(θ; x2)dθ

= π_δ(θ | x).
The upper δ-level curve π̄_δ(·|x1, x2) is proved to be equal to π̄_δ(·|x) in an analogous way. So for classical samples x = (x1, …, xn) a suitable generalization of Bayes' theorem to the case of fuzzy a priori distributions is given. The situation of a fuzzy a priori distribution together with a fuzzy sample x1*, …, xn* is solved in Chapter 15.
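The sequential property of this averaged-denominator rule can be checked numerically by discretizing Θ so that the integrals become sums. All level curves and likelihood factors below are assumed toy functions, not taken from the book:

```python
from math import exp

def update(lo, hi, lik, step):
    # averaged-denominator updating of the lower/upper delta-level curves,
    # sampled on a theta-grid; lik holds l(theta; x) on the same grid
    denom = sum(0.5 * (l + h) * lk for l, h, lk in zip(lo, hi, lik)) * step
    new_lo = [l * lk / denom for l, lk in zip(lo, lik)]
    new_hi = [h * lk / denom for h, lk in zip(hi, lik)]
    return new_lo, new_hi

step = 0.01
grid = [i * step for i in range(1, 1000)]          # theta in (0, 10)
lo = [0.8 * exp(-t) for t in grid]                 # assumed lower level curve
hi = [1.2 * exp(-t) for t in grid]                 # assumed upper level curve
lik1 = [exp(-2 * t) for t in grid]                 # l(theta; x1)
lik2 = [exp(-3 * t) for t in grid]                 # l(theta; x2)
lik = [a * b for a, b in zip(lik1, lik2)]          # l(theta; x) = l1 * l2

one_lo, one_hi = update(lo, hi, lik, step)         # one-step update
l1, h1 = update(lo, hi, lik1, step)                # first step
two_lo, two_hi = update(l1, h1, lik2, step)        # second step

print(max(abs(a - b) for a, b in zip(one_lo, two_lo)))
```

Up to floating-point error the one-step and two-step level curves coincide, as proved above.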
14.3 Problems
(a) Looking at a fuzzy discrete probability distribution P* on the finite set {x1, …, xk} with fuzzy point probabilities p*(xj), j = 1(1)k, explain the validity of the rules for a fuzzy probability distribution given in Section 8.3.
(b) Prove the validity of the sequential nature for the upper δ-level curves of the fuzzy a posteriori density π*(·|x).
(c) Show that the above procedure for fuzzy a posteriori densities gives the classical a posteriori density if the a priori density is a classical density.
15
Generalized Bayes' theorem

For continuous stochastic models X ∼ f(·|θ); θ ∈ Θ with continuous parameter space Θ, in general a priori distributions as well as observations are fuzzy. Therefore it is necessary to generalize Bayes' theorem to this situation.
15.1 Likelihood function for fuzzy data

In the case of fuzzy data x1*, …, xn* the likelihood function l(θ; x1, …, xn) has to be generalized to the situation of fuzzy variables x1*, …, xn*. The basis for that is the combined fuzzy sample element x* from Chapter 10. Then the generalized likelihood function l*(θ; x*) is represented by its δ-level functions l_δ(·; x*) and l̄_δ(·; x*) for all δ ∈ (0; 1]. For the δ-cuts of the fuzzy value l*(θ; x*) we have

Cδ(l*(θ; x*)) = [l_δ(θ; x*); l̄_δ(θ; x*)].

Using this and the construction from Chapter 14 in order to keep the sequential property of the updating procedure in Bayes' theorem, the generalization of Bayes' theorem to the situation of fuzzy a priori distribution and fuzzy data is possible.

Remark 15.1: The generalized likelihood function l*(θ; x*) is a fuzzy valued function in the sense of Section 3.6, i.e. l*: Θ → F_I([0; ∞)).
15.2 Bayes' theorem for fuzzy a priori distribution and fuzzy data

Using the averaging procedure for the δ-level curves of the a priori density from Section 14.2 and combining it with the generalized likelihood function from Section 15.1, the generalization of Bayes' theorem is possible. The construction is based on δ-level functions. Based on a fuzzy a priori density π*(·) on Θ with δ-level functions π_δ(·) and π̄_δ(·), and a fuzzy sample x1*, …, xn* with combined fuzzy sample x* whose vector-characterizing function is ξ(·, …, ·), the characterizing function ψ_{l*(θ;x*)}(·) of l*(θ; x*) is obtained by the extension principle from Section 3.1, i.e.

ψ_{l*(θ;x*)}(y) = sup{ξ(x) : l(θ; x) = y}   if ∃ x : l(θ; x) = y
ψ_{l*(θ;x*)}(y) = 0   if no such x exists

for all y ∈ ℝ.
The δ-level curves of the fuzzy a posteriori density π*(·|x1*, …, xn*) = π*(·|x*) are defined in the following way:

π_δ(θ | x*) := π_δ(θ) · l_δ(θ; x*) / ∫_Θ ½ [π_δ(θ) · l_δ(θ; x*) + π̄_δ(θ) · l̄_δ(θ; x*)] dθ

and

π̄_δ(θ | x*) := π̄_δ(θ) · l̄_δ(θ; x*) / ∫_Θ ½ [π_δ(θ) · l_δ(θ; x*) + π̄_δ(θ) · l̄_δ(θ; x*)] dθ

for all δ ∈ (0; 1].

Remark 15.2: For this definition of the δ-level curves of the fuzzy a posteriori density the sequentially calculated a posteriori density π*(·|x1*, x2*) is the same as π*(·|x*), where (x1*, …, xn*) = (x1*, …, xm*, x*m+1, …, xn*) and x1* = (x1*, …, xm*) and x2* = (x*m+1, …, xn*). To see this we calculate the δ-cut

Cδ[π*(θ | x1*, x2*)] = [π_δ(θ | x1*, x2*); π̄_δ(θ | x1*, x2*)]

using π*(·|x1*) and x2*, and the abbreviation

N(x*) = ½ [∫_Θ π_δ(θ) · l_δ(θ; x*)dθ + ∫_Θ π̄_δ(θ) · l̄_δ(θ; x*)dθ] = ∫_Θ ½ [π_δ(θ) · l_δ(θ; x*) + π̄_δ(θ) · l̄_δ(θ; x*)] dθ.
For the lower δ-level curve we obtain

π_δ(θ | x1*, x2*) = π_δ(θ | x1*) · l_δ(θ; x2*) / ∫_Θ ½ [π_δ(θ | x1*) · l_δ(θ; x2*) + π̄_δ(θ | x1*) · l̄_δ(θ; x2*)] dθ

= N(x1*)⁻¹ · π_δ(θ) · l_δ(θ; x1*) · l_δ(θ; x2*) / (N(x1*)⁻¹ · ∫_Θ ½ [π_δ(θ) · l_δ(θ; x1*) · l_δ(θ; x2*) + π̄_δ(θ) · l̄_δ(θ; x1*) · l̄_δ(θ; x2*)] dθ)

= π_δ(θ) · l_δ(θ; x*) / ∫_Θ ½ [π_δ(θ) · l_δ(θ; x*) + π̄_δ(θ) · l̄_δ(θ; x*)] dθ

= π_δ(θ | x*),

using l_δ(θ; x1*) · l_δ(θ; x2*) = l_δ(θ; x*) and l̄_δ(θ; x1*) · l̄_δ(θ; x2*) = l̄_δ(θ; x*).
The proof for the upper δ-level curves is analogous and left to the reader as a problem.

Remark 15.3: For standard a priori densities and precise data this concept reduces to the classical Bayes' theorem.

Example 15.1 For X ∼ Ex_θ; θ ∈ Θ = (0; ∞), i.e. an exponential distribution with density

f(x|θ) = (1/θ) · e^{−x/θ} · I_{(0;∞)}(x),

a fuzzy a priori density π*(·) on Θ is given, for which some δ-level curves are depicted in Figure 15.1.
π_{0+}(θ) and π̄_{0+}(θ) are the boundary points of {x ∈ ℝ : ξ_{π*(θ)}(x) > 0}, where ξ_{π*(θ)}(·) is the characterizing function of the fuzzy interval π*(θ).

Figure 15.1 Some δ-level curves of π*(·).
ξi(·) is the characterizing function of xi*.

Figure 15.2 Fuzzy sample.

In Figure 15.2 the characterizing functions of eight fuzzy observations x1*, …, x8* of X are given. Applying the generalized Bayes' theorem, the δ-level curves of the a posteriori density π*(·|x1*, …, x8*) are obtained. For the present example some δ-level curves π_δ(·|x1*, …, x8*) and π̄_δ(·|x1*, …, x8*) are depicted in Figure 15.3.

Figure 15.3 Fuzzy a posteriori density.
15.3 Problems
(a) Prove the validity of the sequential updating in the generalized Bayes' theorem for the upper δ-level curves of the fuzzy a posteriori density.
(b) Prove that in case of precise data and classical a priori density the generalized Bayes' theorem coincides with the classical Bayes' theorem.
16

Bayesian confidence regions

In contrast to classical confidence regions in objectivist statistics, Bayesian confidence regions have an immediate probabilistic interpretation also after the sample is observed. This is due to the availability of the a posteriori distribution π(·|D). A Bayesian confidence region for the parameter θ of a stochastic model X ∼ f(·|θ); θ ∈ Θ and confidence level 1 − α is a subset Θ_{1−α} of Θ for which

Pr{θ̃ ∈ Θ_{1−α} | D} = 1 − α.

In the case of a continuous parameter space the confidence region Θ_{1−α} is defined by

∫_{Θ_{1−α}} π(θ | D)dθ = 1 − α.

For fuzzy data D* the concept of Bayesian confidence regions has to be generalized.
16.1 Bayesian confidence regions based on fuzzy data

In the case of classical a priori distributions the generalization to the situation of fuzzy samples x1*, …, xn* is possible using the combined fuzzy sample x*, whose vector-characterizing function is denoted by ξ(·, …, ·). The generalization is similar to that in Section 12.2. Let Θ_{x,1−α} be a standard Bayesian confidence region for θ based on a classical sample x = (x1, …, xn). The Bayesian confidence region for θ based on a fuzzy sample, whose combined fuzzy sample has vector-characterizing function ξ: M_Xⁿ → [0; 1], is a fuzzy subset Θ*_{1−α} of Θ whose membership function ϕ(·) is given by its values ϕ(θ) in the following
way:

ϕ(θ) = sup{ξ(x) : θ ∈ Θ_{x,1−α}}   if ∃ x : θ ∈ Θ_{x,1−α}
ϕ(θ) = 0   if no such x exists

for all θ ∈ Θ.
Remark 16.1: For classical samples x = (x1, …, xn) the membership function ϕ(·) reduces to the indicator function of Θ_{x,1−α}, i.e. ϕ(θ) = I_{Θ_{x,1−α}}(θ) ∀θ ∈ Θ. The proof of this assertion is left to the reader as a problem.
16.2 Fuzzy HPD-regions

Classical HPD-regions Θ_{H,1−α} are defined for standard a posteriori densities π(·|x) in the following way: Θ_{H,1−α} is a subset of Θ for which

∫_{Θ_{H,1−α}} π(θ | x)dθ = 1 − α      (16.1)

such that

π(θ | x) ≥ C   ∀θ ∈ Θ_{H,1−α}      (16.2)

where C is the maximal possible constant such that (16.1) is fulfilled.

In the case of a classical a priori distribution and fuzzy data x1*, …, xn* with combined fuzzy sample x*, represented by its vector-characterizing function ξ(·, …, ·), the concept of highest posterior density regions can be generalized in the following way: for confidence level 1 − α and all θ ∈ Θ and x ∈ supp(x*) = supp[ξ(·, …, ·)], a classical subset B(θ, x) of Θ is defined by

B(θ, x) := {θ1 ∈ Θ : π(θ1 | x) ≥ π(θ | x)}.

The generalized HPD-region for θ based on the fuzzy sample x* is the fuzzy subset Θ*_{H,1−α} whose membership function ψ(·) is determined by its values
ψ(θ) = sup{ξ(x) : ∫_{B(θ,x)} π(θ1 | x)dθ1 ≤ 1 − α}   if ∃ x : ∫_{B(θ,x)} π(θ1 | x)dθ1 ≤ 1 − α
ψ(θ) = 0   otherwise

for all θ ∈ Θ, where x ∈ supp[ξ(·, …, ·)].
Figure 16.1 Fuzzy sample x1*, …, x8*.

Remark 16.2: For this construction in the case of classical data x = (x1, …, xn) the resulting membership function ψ(·) is the indicator function of the classical HPD-region, i.e. ψ(·) = I_{Θ_{H,1−α}}(·). The proof of this is left to the reader as a problem.

Example 16.1 For X ∼ f(x|θ) = θ · e^{−θx} · I_{(0;∞)}(x), Θ = (0; ∞), and a priori density π(·) a gamma density Gam(1, 1), a fuzzy sample x1*, …, x8* with corresponding characterizing functions ξ1(·), …, ξ8(·) is depicted in Figure 16.1. The membership functions of fuzzy HPD-intervals for θ are depicted in Figure 16.2. The different membership functions correspond to different confidence levels 1 − α.

Remark 16.3: The construction of generalized confidence sets based on fuzzy a posteriori distributions in the general case is an open research topic.
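For a classical sample the a posteriori density in Example 16.1 is available in closed form: with the Gam(1, 1) a priori density, i.e. π(θ) ∝ e^{−θ}, and likelihood ∏ θe^{−θxi}, the a posteriori density is proportional to θⁿ · e^{−θ(1+Σxi)}, a gamma density. The following grid-based sketch (the data summary n = 8 and Σxi = 6 is assumed for illustration) computes a classical HPD interval by lowering the threshold C of (16.2):

```python
from math import exp

def gamma_post_pdf(theta, n, sx):
    # unnormalized a posteriori density theta^n * exp(-theta * (1 + sum(x)))
    return theta ** n * exp(-theta * (1.0 + sx))

def hpd_interval(n, sx, level=0.9, n_grid=20000, tmax=10.0):
    step = tmax / n_grid
    grid = [(i + 0.5) * step for i in range(n_grid)]
    dens = [gamma_post_pdf(t, n, sx) for t in grid]
    norm = sum(dens) * step
    dens = [d / norm for d in dens]
    # add grid points in order of decreasing density until mass >= level;
    # for a unimodal density the collected points form an interval
    order = sorted(range(n_grid), key=lambda i: -dens[i])
    mass, region = 0.0, []
    for i in order:
        region.append(i)
        mass += dens[i] * step
        if mass >= level:
            break
    return grid[min(region)], grid[max(region)]

lo, hi = hpd_interval(n=8, sx=6.0, level=0.9)
print(lo, hi)
```

Sweeping x over the supports of the fuzzy observations and taking the supremum of ξ(x), as in the definition of ψ(·), then yields the fuzzy HPD-interval.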
Figure 16.2 Fuzzy HPD-intervals for θ.
16.3 Problems

(a) Prove the statement in Remark 16.1.
(b) Prove the statement in Remark 16.2.
17
Fuzzy predictive distributions

Let X ∼ Pθ; θ ∈ Θ be a stochastic model with a priori distribution π(·). Then in standard Bayesian inference the predictive distribution of X, given observed data D, is denoted by X|D. There are three practically important situations for predictions:

• discrete X and discrete Θ;
• discrete X and continuous Θ;
• continuous X and continuous Θ.

For the standard situation predictive distributions conditional on observed samples D = (x1, …, xn) are available. In the case of fuzzy a priori distributions and fuzzy samples (x1*, …, xn*) of X the concept of predictive distributions has to be generalized.
17.1 Discrete case

For discrete stochastic quantities X ∼ p(·|θ), θ ∈ Θ = {θ1, …, θm}, and a sample x1, …, xn of X, the a posteriori probabilities π(θj | x1, …, xn) = Pr{θ̃ = θj | x1, …, xn} are obtained as explained at the beginning of Part V. The predictive distribution of X | x1, …, xn is given by its predictive probabilities

p(x | x1, …, xn) = Σ_{θj ∈ Θ} p(x | θj) · π(θj | x1, …, xn)   for all x ∈ M_X

where M_X is the observation space of X, i.e. the set of all possible values of X.
In the case of a fuzzy a posteriori distribution π*(θj | x1, …, xn) the so-called fuzzy predictive distribution of X | x1, …, xn is given by its fuzzy probabilities

p*(x | x1, …, xn) = Σ_{θj ∈ Θ} p(x | θj) · π*(θj | x1, …, xn).

The fuzzy a posteriori probabilities π*(θj | x1, …, xn) are obtained as explained in Section 14.2.

Remark 17.1: Observations from discrete stochastic quantities can be exact values xi, whereas observations from continuous quantities are usually fuzzy.
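Since the model probabilities p(x|θj) are nonnegative constants, each δ-cut of the fuzzy predictive probability follows by interval arithmetic: scale each a posteriori interval by p(x|θj) and add the resulting intervals. A minimal sketch with assumed numbers at one fixed δ-level:

```python
def predictive_cut(model_probs, post_cuts):
    # model_probs: p(x | theta_j) for the fixed value x
    # post_cuts: (lower, upper) delta-cut endpoints of pi*(theta_j | data)
    lo = sum(p * c[0] for p, c in zip(model_probs, post_cuts))
    hi = sum(p * c[1] for p, c in zip(model_probs, post_cuts))
    return lo, hi

# assumed delta-cuts of two fuzzy a posteriori point probabilities
post_cuts = [(0.25, 0.35), (0.65, 0.75)]
p_x = [0.1, 0.6]   # assumed p(x | theta_1), p(x | theta_2)
print(predictive_cut(p_x, post_cuts))
```

Repeating this for every δ ∈ (0; 1] yields the characterizing function of p*(x | x1, …, xn) via the construction lemma.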
17.2 Discrete models with continuous parameter space

For a discrete model X ∼ p(·|θ); θ ∈ Θ, where Θ is continuous, with a priori density π(·) on Θ and data D, Bayes' theorem has the form

π(θ | D) ∝ π(θ) · l(θ; D)

where l(·; D) is the likelihood function and ∝ stands for 'proportional', which means equal up to a multiplicative constant, i.e.

π(θ | D) = π(θ) · l(θ; D) / ∫_Θ π(θ) · l(θ; D)dθ   ∀θ ∈ Θ.
Example 17.1 Let X ∼ Aθ; θ ∈ [0; 1] have a Bernoulli distribution with point probabilities p(1|θ) = θ and p(0|θ) = 1 − θ. For a priori density π(·) = I_{[0;1]}(·) on Θ = [0; 1] and observed sample x1, …, xn with xi ∈ {0, 1} the a posteriori density is obtained using the likelihood function

l(θ; x1, …, xn) = ∏_{i=1}^n θ^{xi} · (1 − θ)^{1−xi},   xi ∈ {0, 1}, θ ∈ [0; 1].

Therefore the a posteriori density π(·|x1, …, xn) fulfills

π(θ | x1, …, xn) ∝ π(θ) · l(θ; x1, …, xn)   ∀θ ∈ [0; 1]

or

π(θ | x1, …, xn) = π(θ) · ∏_{i=1}^n θ^{xi} (1 − θ)^{1−xi} / ∫_0^1 π(θ) · ∏_{i=1}^n θ^{xi} (1 − θ)^{1−xi} dθ.
Figure 17.1 A priori and a posteriori density of θ̃.

By π(·) = I_{[0;1]}(·) we obtain
π(θ | x1, …, xn) ∝ θ^{Σ_{i=1}^n xi} · (1 − θ)^{n − Σ_{i=1}^n xi} · I_{[0;1]}(θ),

which is proportional to the density of a beta distribution Be(Σ_{i=1}^n xi + 1, n − Σ_{i=1}^n xi + 1).
In Figure 17.1 the uniform a priori density on [0; 1] and a corresponding a posteriori density are depicted. From this the information gain, i.e. the reduced uncertainty concerning θ, can be seen. In the case of a continuous parameter space Θ the predictive distribution of X is given by the point probabilities

p(x | D) = ∫_Θ p(x | θ) · π(θ | D)dθ.
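For the uniform a priori density of Example 17.1 this integral has a closed form: the a posteriori density is the beta density Be(s + 1, n − s + 1) with s = Σxi, so p(1 | x1, …, xn) equals the a posteriori mean (s + 1)/(n + 2), Laplace's rule of succession. A grid check of this identity (illustration only, with assumed data):

```python
def predictive_one(data, n_grid=100000):
    # p(1 | data) = integral of theta * pi(theta | data) over [0, 1],
    # evaluated with the midpoint rule on an unnormalized beta posterior
    s, n = sum(data), len(data)
    step = 1.0 / n_grid
    thetas = [(i + 0.5) * step for i in range(n_grid)]
    post = [t ** s * (1 - t) ** (n - s) for t in thetas]
    norm = sum(post) * step
    return sum(t * p for t, p in zip(thetas, post)) * step / norm

data = [1, 0, 1, 1]   # assumed sample, s = 3, n = 4
print(predictive_one(data), (sum(data) + 1) / (len(data) + 2))
```

Both numbers agree up to the grid error, confirming the closed-form predictive probability.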
For a fuzzy a posteriori density π*(·|D*) of θ̃, the generalized point probabilities p*(x|D*) are obtained by application of the generalized integration from Section 3.6, Definition 3.3:

p*(x | D*) := ∫_Θ p(x | θ) · π*(θ | D*)dθ   ∀x ∈ M_X,

where the integral is the generalized integral.

Remark 17.2: By this definition the fuzzy probabilities p*(x|D*) define a fuzzy probability distribution on the observation space M_X of X.
17.3 Continuous case

For continuous stochastic models X ∼ f(·|θ), θ ∈ Θ with continuous parameter θ and fuzzy a posteriori density π*(·|D*) on the parameter space Θ, the fuzzy predictive density f*(·|D*) of X is defined by generalizing the standard predictive density:

f*(x | D*) := ∫_Θ f(x | θ) · π*(θ | D*)dθ

where the integral is the generalized integral defined in Section 3.6.

Remark 17.3: f*(·|D*) is a fuzzy density in the sense of Definition 8.1 from Section 8.1.
Figure 17.2 δ-level curves of f ∗ (·|x ∗ ).
Example 17.2 In continuation of Example 16.1 the fuzzy predictive density f*(·|x*) can be calculated. In Figure 17.2 some δ-level curves of f*(·|x*) are depicted for the fuzzy sample x1*, …, x8* from Figure 16.1. The combined fuzzy sample x* is constructed by the minimum combination rule.
17.4 Problems

(a) Prove the assertion in Remark 17.2.
(b) Prove the assertion in Remark 17.3.
18
Bayesian decisions and fuzzy information

Decisions are frequently connected with utility or loss. The quantification of the utility U(d) of a decision d can be a difficult task. Therefore a suitable model is to describe the utility by a fuzzy number U*(d). Moreover, the utility of a decision depends on the state of the considered system. Therefore the utility depends on both the state θ and the decision d, i.e. U(θ, d).
18.1 Bayesian decisions

In Bayesian decision analysis it is possible to include a priori information in the form of a priori distributions, and also a posteriori distributions, on the space of possible states of the considered decision situation. Let Θ = {θ : θ possible state} be the state space, D = {d : d possible decision} the set of possible decisions, and P a probability distribution on the state space Θ. Moreover let U(·, ·) be a utility function, i.e. U: Θ × D → [0; ∞), where U(θ, d) denotes the utility of the decision d if the system is in state θ. Assuming that the corresponding sums or integrals exist, the basis for optimal decisions is the expected utility

E_P U(θ̃, d).

Here θ̃ is the stochastic quantity describing the random character of the system.
Definition 18.1: Based on the context above, the Bayesian decision dB is the solution of the following optimization problem:

E_P U(θ̃, dB) = max{E_P U(θ̃, d) : d ∈ D}.

Therefore the Bayesian decision dB maximizes the expected utility.

Remark 18.1: If the approach by loss considerations is used, instead of utility functions so-called loss functions L(θ, d) are considered. In this case Bayesian decisions are defined by minimizing the expected loss E_P L(θ̃, d).

Remark 18.2: Expected utility and expected loss are calculated using the probability distribution P in the following way:

E_P U(θ̃, d) = ∫_Θ U(θ, d) dP(θ)

and

E_P L(θ̃, d) = ∫_Θ L(θ, d) dP(θ),

respectively.
18.2 Fuzzy utility

In reality it is frequently not possible to find exact utility values. Therefore a more realistic approach to describe utilities is to use fuzzy utility values U*(θ, d), which are fuzzy intervals. Moreover, in general the uncertainty concerning the state of the system can be fuzzy too. This can be expressed by a fuzzy probability distribution P* on Θ, for example a fuzzy a posteriori distribution π*(·|D*). In this situation, in order to find Bayesian decisions, generalized expected utilities have to be calculated, i.e.

E_{P*} U*(θ̃, d).

For classical probability distributions P on Θ the generalized expected utility can be calculated using the generalized integral from Section 3.6:

E_P U*(θ̃, d) = ∫_Θ U*(θ, d) dP(θ)
which is calculated using the δ-level functions U_δ(·, d) and Ū_δ(·, d) of U*(·, d).

Remark 18.3: The calculation of the above generalized Stieltjes integral is explained separately for discrete and continuous Θ in the next two sections.
18.3 Discrete state space

Let Θ = {θ1, …, θm} be discrete, and P a discrete probability distribution on Θ with point probabilities pj = P(θj) for j = 1(1)m. Then the expected utility of a decision d for the fuzzy utility function U*(·, d) is calculated using δ-level functions in the following way. The δ-cuts of the generalized expected utility E_P U*(θ̃, d),

Cδ[E_P U*(θ̃, d)] = [E_δ U*(θ̃, d); Ē_δ U*(θ̃, d)],

are calculated by

E_δ U*(θ̃, d) = Σ_{j=1}^m U_δ(θj, d) · pj

and

Ē_δ U*(θ̃, d) = Σ_{j=1}^m Ū_δ(θj, d) · pj

for all δ ∈ (0; 1], where U_δ(·, d) and Ū_δ(·, d) denote the lower and upper δ-level functions of U*(·, d). The characterizing function χ(·) of E_P U*(θ̃, d) is obtained from the construction lemma:

χ(x) = sup{δ · I_{[E_δ U*(θ̃,d); Ē_δ U*(θ̃,d)]}(x) : δ ∈ [0; 1]}   ∀x ∈ ℝ

with C_0[E_P U*(θ̃, d)] = ℝ.

In the case of a fuzzy probability distribution P* on Θ the generalized expected utility E_{P*} U*(θ̃, d) is calculated in the following way. Let p*_j = P*(θj), j = 1(1)m, be the fuzzy point probabilities (cf. Section 14.1). The fuzzy expected utility is constructed using the δ-level functions of the utility function and the δ-cuts Cδ[p*_j] of p*_j, ∀δ ∈ (0; 1]:

Cδ[p*_j] = [p_{j,δ}; p̄_{j,δ}]

Cδ[E_{P*} U*(θ̃, d)] = [E_δ U*(θ̃, d); Ē_δ U*(θ̃, d)]
with
$$\left.\begin{aligned} \underline{E}_\delta U^*(\tilde{\theta}, d) &= \sum_{j=1}^m \underline{U}_\delta(\theta_j, d) \cdot \underline{p}_{j,\delta} \\ \overline{E}_\delta U^*(\tilde{\theta}, d) &= \sum_{j=1}^m \overline{U}_\delta(\theta_j, d) \cdot \overline{p}_{j,\delta} \end{aligned}\right\} \quad \forall \delta \in (0; 1].$$
From the δ-cuts the characterizing function χ (·) of the fuzzy expected utility is obtained by the construction lemma as above.
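The discrete computation above can be sketched numerically. The following is a minimal illustration (all numbers are invented for the example) for a two-element state space with crisp point probabilities and trapezoidal fuzzy utilities, whose $\delta$-cuts are evaluated directly:

```python
import numpy as np

# Illustrative sketch (numbers invented): state space {theta_1, theta_2}
# with crisp point probabilities p_j, and fuzzy utilities U*(theta_j, d)
# given as trapezoids <a, b, c, d> whose delta-cut is
# [a + delta*(b - a), d - delta*(d - c)]  (delta = 0 gives the support).
def trapezoid_cut(a, b, c, d, delta):
    return a + delta * (b - a), d - delta * (d - c)

p = np.array([0.3, 0.7])                  # crisp point probabilities p_j
trapezoids = [(1.0, 2.0, 2.5, 3.0),       # fuzzy utility U*(theta_1, d)
              (4.0, 5.0, 5.0, 6.0)]       # fuzzy utility U*(theta_2, d)

def expected_utility_cut(delta):
    """delta-cut of the fuzzy expected utility: weighted sums of the
    lower and upper delta-level values, as in Section 18.3."""
    lows, highs = zip(*(trapezoid_cut(*t, delta) for t in trapezoids))
    return float(np.dot(p, lows)), float(np.dot(p, highs))

lo1, hi1 = expected_utility_cut(1.0)   # core of the fuzzy expected utility
lo0, hi0 = expected_utility_cut(0.0)   # support of the fuzzy expected utility
```

Evaluating the cuts on a grid of $\delta$-values and applying the construction lemma then yields the characterizing function $\chi(\cdot)$.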
18.4 Continuous state space
For continuous state space $\Theta$ the fuzzy probability distribution $P^*$ on $\Theta$ is given by a fuzzy probability density $f^*$ on $\Theta$ (cf. Definition 8.1 in Section 8.1). In general, for a fuzzy utility $U^*(\cdot, d)$ with $\delta$-level functions $\underline{U}_\delta(\cdot, d)$ and $\overline{U}_\delta(\cdot, d)$, the fuzzy expected utility
$$E_{P^*} U^*(\tilde{\theta}, d) = -\!\!\!\int_\Theta U^*(\theta, d)\, f^*(\theta)\, d\theta$$
is calculated in the following way: Denoting the $\delta$-cuts of $E_{P^*} U^*(\tilde{\theta}, d)$ by
$$\left[ \underline{E}_\delta U^*(\tilde{\theta}, d);\ \overline{E}_\delta U^*(\tilde{\theta}, d) \right],$$
the endpoints are obtained by
$$\left.\begin{aligned} \underline{E}_\delta U^*(\tilde{\theta}, d) &= \int_\Theta \underline{U}_\delta(\theta, d)\, \underline{f}_\delta(\theta)\, d\theta \\ \overline{E}_\delta U^*(\tilde{\theta}, d) &= \int_\Theta \overline{U}_\delta(\theta, d)\, \overline{f}_\delta(\theta)\, d\theta \end{aligned}\right\} \quad \forall \delta \in (0; 1].$$
By the construction lemma the characterizing function of $E_{P^*} U^*(\tilde{\theta}, d)$ is obtained.

Remark 18.4: In general the resulting generalized expected utility is a fuzzy interval. Contrary to the classical situation, where expected utilities are real numbers, it is not always simple to come to an optimal decision. However, the characterizing functions of fuzzy expected utilities provide valuable information concerning the suitability of a decision.
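The integrals for the endpoints can be approximated by standard quadrature. The sketch below uses invented inputs: a crisp uniform density on $[0, 1]$ (so its lower and upper $\delta$-level densities coincide with the density itself) and a fuzzy utility with assumed $\delta$-level functions $\theta \mp 0.5(1-\delta)$:

```python
import numpy as np

# Numerical sketch of Section 18.4 under invented inputs: the density f is
# uniform on [0, 1], and the fuzzy utility has delta-level functions
# U_lower(theta) = theta - 0.5*(1 - delta), U_upper(theta) = theta + 0.5*(1 - delta).
theta = np.linspace(0.0, 1.0, 2001)
f = np.ones_like(theta)                     # uniform density on [0, 1]

def trapezoid_rule(y, x):
    # simple trapezoidal quadrature (exact for piecewise linear integrands)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def expected_utility_cut(delta):
    # endpoints of the delta-cut of the fuzzy expected utility
    u_low = theta - 0.5 * (1.0 - delta)
    u_high = theta + 0.5 * (1.0 - delta)
    return (trapezoid_rule(u_low * f, theta), trapezoid_rule(u_high * f, theta))

lo_half, hi_half = expected_utility_cut(0.5)   # delta = 0.5
lo_core, hi_core = expected_utility_cut(1.0)   # delta = 1: crisp value
```

At $\delta = 1$ both endpoints coincide, reflecting that the core of this particular fuzzy utility is crisp.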
18.5 Problems
(a) Specialize the described approach to the classical situation of Bayesian decisions. What is the characterizing function of the expected utility in this case?

(b) Under which conditions for fuzzy expected utilities is a Bayesian decision uniquely determined? (Hint: Look at the support of the fuzzy expected utility.)
Part VI

REGRESSION ANALYSIS AND FUZZY INFORMATION

In regression analysis it is usually assumed that the observed data values are real numbers or vectors. In applications where continuous variables are observed this assumption is unrealistic. Therefore the data have to be considered as fuzzy. Another kind of fuzzy information is present in Bayesian regression models. Here the a priori distributions of the parameters in regression models are typical examples of fuzzy information in the sense of fuzzy a priori distributions. Both kinds of fuzzy information, fuzzy data as well as fuzzy probability distributions for the quantification of a priori knowledge concerning parameters, have to be taken into account. This is the subject of this part.
19

Classical regression analysis

Regression models describe nondeterministic dependencies between so-called independent variables $x$ and the corresponding so-called dependent variables $y$. In standard statistics the values of both kinds of variables are assumed to be numbers or vectors. In reality observed values are not precise numbers but more or less fuzzy. Therefore a generalization of regression analysis to the situation of fuzzy data is necessary. In this chapter regression models are first described, and then the generalization to fuzzy data is explained.
19.1 Regression models
Standard regression models consider $k$ independent deterministic variables $x_1, \cdots, x_k$ and a dependent variable $y$ which is modeled by a stochastic quantity $Y$ whose distribution depends on the values $x_1, \cdots, x_k$ of the independent variables. Using the notation $x = (x_1, \cdots, x_k)$, the corresponding stochastic quantity describing the dependent variable is denoted by $Y_x$. A so-called parametric regression model models the expectation $E Y_x$ of $Y_x$ as a function of the independent variables and the parameters $\theta_0, \theta_1, \cdots, \theta_k$ in the following way:
$$E Y_x = \psi(x_1, \cdots, x_k; \theta_0, \theta_1, \cdots, \theta_k) \tag{19.1}$$
where $\psi(\cdot)$ is a known function, and $\theta_0, \theta_1, \cdots, \theta_k$ are unknown constants called regression parameters. Usually the variance of $Y_x$ is assumed to be the same for all $x \in \mathbb{R}^k$:
$$\mathrm{Var}\, Y_x = \sigma^2 \quad \text{for all } x \tag{19.2}$$

Statistical Methods for Fuzzy Data, Reinhard Viertl. © 2011 John Wiley & Sons, Ltd
Condition (19.2) is called variance homogeneity or homoscedasticity. The values of the dependent variable $y$ are considered to be the sum of a deterministic function and a stochastic quantity $U$ with $EU = 0$ and $\mathrm{Var}\, U = \sigma^2$:
$$Y_x = \psi(x; \theta) + U \tag{19.3}$$
where $\theta = (\theta_0, \theta_1, \cdots, \theta_k)$ denotes the vector of unknown parameters. Based on observed data $(x_{i1}, \cdots, x_{ik}; y_i)$ for $i = 1(1)n$ the parameters $\theta_0, \theta_1, \cdots, \theta_k$ are estimated using the principle of least sum of squared distances (LSS):
$$\sum_{i=1}^n \left[ y_i - \psi(x_{i1}, \cdots, x_{ik}; \theta_0, \theta_1, \cdots, \theta_k) \right]^2 \longrightarrow \min.$$
An estimate of the variance $\sigma^2$ is obtained as a function of the number $n$ of data 'points' and
$$\sum_{i=1}^n \left[ y_i - \psi(x_{i1}, \cdots, x_{ik}; \hat{\theta}_0, \hat{\theta}_1, \cdots, \hat{\theta}_k) \right]^2.$$
A specialized type of regression models are linear regression models, for which Equation (19.1) specializes, using the notation $\mathring{x} = (1, x_1, \cdots, x_k) \in \mathbb{R}^{k+1}$, to the following form:
$$E Y_x = \theta\, \mathring{x}^T = (\theta_0, \theta_1, \cdots, \theta_k) \begin{pmatrix} 1 \\ x_1 \\ \vdots \\ x_k \end{pmatrix} = \theta_0 + \sum_{j=1}^k \theta_j x_j \tag{19.4}$$
where $\theta \in \mathbb{R}^{k+1}$. For linear regression models, under certain conditions the solution $\hat{\theta}_0, \hat{\theta}_1, \cdots, \hat{\theta}_k$ of the LSS problem can be given explicitly. This is stated in the well-known Gauss–Markov theorem. In order to formulate the Gauss–Markov theorem the so-called data matrix $X$ is useful.
For data $(x_{i1}, \cdots, x_{ik}; y_i)$, $i = 1(1)n$, the data matrix is defined by
$$X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & & & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix} \quad \text{with } x_{ij} \in \mathbb{R}.$$
The vector of dependent values $y$ is defined by
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \quad \text{with } y_i \in \mathbb{R}.$$
The vector $x_i$ is defined by $x_i = (x_{i1}, \cdots, x_{ik})$ for $i = 1(1)n$. So the data matrix is given by
$$X = \begin{pmatrix} \mathring{x}_1 \\ \mathring{x}_2 \\ \vdots \\ \mathring{x}_n \end{pmatrix}.$$

Definition 19.1: Estimators $\hat{\theta}_j$ for parameters $\theta_j$ in linear regression models are called linear if
$$\hat{\theta}_j = \sum_{i=1}^n c_{ij} y_i \tag{19.5}$$
where the coefficients $c_{ij}$ depend only on the known values $x_{ij}$ of the data matrix $X$.

Remark 19.1: Using the notation above and
$$C = (c_{ij})_{\substack{i = 1(1)n \\ j = 0(1)k}} \tag{19.6}$$
the vector of estimates $\hat{\theta} = (\hat{\theta}_0, \hat{\theta}_1, \cdots, \hat{\theta}_k)$ can be expressed in the following way:
$$\hat{\theta} = C\, y \tag{19.7}$$
The following Gauss–Markov theorem gives the explicit solution of the LSS estimation of $\theta$. Moreover it gives an unbiased estimator for $\sigma^2$.

Theorem 19.1: For a linear regression model (19.4) and data $(x_{i1}, \cdots, x_{ik}; y_i)$, $i = 1(1)n$, obeying the following conditions:
(1) $n \ge k + 1$;
(2) the elements $x_{ij}$ of the data matrix are known and $\mathrm{rank}(X) = k + 1$;
(3) the stochastic quantities $Y_1, \cdots, Y_n$ denoting the dependent variables $Y_{x_i}$, $i = 1(1)n$, are pairwise uncorrelated;
(4) $\mathrm{Var}\, Y_i = \sigma^2$ for $i = 1(1)n$, i.e. all variances are identical;
(5) the space of possible parameter values $\theta$ is not a hyperplane in $\mathbb{R}^{k+1}$;
the following holds true:

1) The LSS estimates for the parameters are given by
$$\hat{\theta}^T = \left( X^T X \right)^{-1} X^T Y$$
where $X^T$ denotes the transposed matrix of $X$, and $Y = (Y_1, \cdots, Y_n)^T$ is the vector of dependent variables.

2) The estimators $\hat{\theta}_j$ above are also the minimum variance unbiased linear estimators (MVULE) for $\theta_j$, $j = 0(1)k$.

3) The variance–covariance matrix of $\hat{\theta}$ is
$$\mathrm{VCov}(\hat{\theta}_0, \hat{\theta}_1, \cdots, \hat{\theta}_k) = \left( X^T X \right)^{-1} \sigma^2.$$

4) An unbiased estimator for the variance $\sigma^2$ is given by
$$S^2 = \frac{\left( Y - X \hat{\theta}^T \right)^T \left( Y - X \hat{\theta}^T \right)}{n - k - 1} = \frac{\sum\limits_{i=1}^n \left( Y_i - \hat{\theta}_0 - \sum\limits_{j=1}^k \hat{\theta}_j x_{ij} \right)^2}{n - k - 1}.$$

The proof is given in books on regression analysis. For the linear regression model $Y_x = \theta\, \mathring{x}^T + U$ and parameter estimates $\hat{\theta} = (\hat{\theta}_0, \hat{\theta}_1, \cdots, \hat{\theta}_k)$
a prediction for the dependent variable $y$ at the independent variables $x = (x_1, \cdots, x_k)$ is given by
$$\hat{Y}_x = \hat{\theta}\, \mathring{x}^T = \mathring{x}\, \hat{\theta}^T. \tag{19.8}$$

Proposition 19.1: For the prediction (19.8) the following holds true:
(1) $\hat{Y}_x$ is an unbiased estimator for $E Y_x$.
(2) $\mathrm{Var}\, \hat{Y}_x = \mathring{x} \left( X^T X \right)^{-1} \mathring{x}^T \sigma^2$.

Proof:
(1) $E \hat{Y}_x = E\left( \hat{\theta}\, \mathring{x}^T \right) = E\left( \hat{\theta}_0 + \sum_{j=1}^k x_j \hat{\theta}_j \right) = E \hat{\theta}_0 + \sum_{j=1}^k x_j E \hat{\theta}_j$. By the unbiasedness of $\hat{\theta}_j$ for $\theta_j$, $j = 0(1)k$, the last expression equals $\theta_0 + \sum_{j=1}^k x_j \theta_j = E Y_x$ by the linear regression model (19.4).
(2) $\mathrm{Var}\, \hat{Y}_x = \mathrm{Var}\left( \mathring{x}\, \hat{\theta}^T \right) = \mathrm{Var}\left( \mathring{x} \left( X^T X \right)^{-1} X^T Y \right)$. Here $\mathring{x} \left( X^T X \right)^{-1} X^T = z^T = (z_1, \cdots, z_n)$ is a vector of constants, and therefore we obtain $\mathrm{Var}\left( z^T Y \right) = \mathrm{Var}\left( \sum_{i=1}^n z_i Y_i \right)$. Because $Y_1, \cdots, Y_n$ are pairwise uncorrelated we obtain
$$\mathrm{Var}\left( \sum_{i=1}^n z_i Y_i \right) = \sum_{i=1}^n z_i^2\, \mathrm{Var}\, Y_i = z^T z\, \sigma^2.$$
By
$$z^T z = \mathring{x} \left( X^T X \right)^{-1} X^T X \left( X^T X \right)^{-1} \mathring{x}^T = \mathring{x} \left( X^T X \right)^{-1} \mathring{x}^T$$
we obtain for the variance $\mathrm{Var}\, \hat{Y}_x = \mathring{x} \left( X^T X \right)^{-1} \mathring{x}^T \sigma^2$.
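Theorem 19.1 and Proposition 19.1 can be illustrated numerically. The following sketch uses simulated data (all numbers are invented for the illustration); note that in the last line the unknown $\sigma^2$ is replaced by its estimate $S^2$:

```python
import numpy as np

# Numerical sketch of Theorem 19.1 / Proposition 19.1 with simulated data:
# k = 2 regressors plus intercept, Gaussian noise with sigma = 0.3.
rng = np.random.default_rng(0)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # data matrix
theta_true = np.array([1.0, 2.0, -0.5])
y = X @ theta_true + rng.normal(scale=0.3, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
theta_hat = XtX_inv @ X.T @ y              # LSS estimate (X^T X)^{-1} X^T y
resid = y - X @ theta_hat
s2 = resid @ resid / (n - k - 1)           # unbiased estimator S^2
vcov = XtX_inv * s2                        # estimated VCov(theta_hat)

x_ring = np.array([1.0, 0.2, -0.1])        # (1, x_1, x_2) for a prediction
y_pred = x_ring @ theta_hat                # prediction (19.8)
var_pred = x_ring @ XtX_inv @ x_ring * s2  # Prop. 19.1(2), sigma^2 -> S^2
```

The residual vector is orthogonal to the columns of $X$, which is the normal-equations characterization of the LSS solution.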
19.2 Linear regression models with Gaussian dependent variables

Assuming Gaussian distributions for the dependent variables $Y_i = Y_{x_i}$, the estimators from the Gauss–Markov theorem have additional properties. These can be used to construct confidence intervals for the parameters. In order to obtain the confidence intervals some results are necessary. These are given in the following theorems.

Theorem 19.2: Assume a linear regression model (19.4), all conditions from Theorem 19.1, and additionally that the stochastic quantities $Y_1, \cdots, Y_n$ have Gaussian distributions, i.e. $Y = (Y_1, \cdots, Y_n)^T$ has multivariate Gaussian distribution $N\!\left( X \theta^T, \sigma^2 I_n \right)$
where $I_n$ is the $n \times n$ identity matrix. Then the following holds:
(1) The estimator $\hat{\theta}$ is also the maximum likelihood estimator for $\theta$.
(2) The distribution of the estimator $\hat{\theta}$ is a multivariate normal distribution
$$N\!\left( \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_k \end{pmatrix};\ \left( X^T X \right)^{-1} \sigma^2 \right).$$
(3) The pivotal quantity $\dfrac{\left( Y - X \hat{\theta}^T \right)^T \left( Y - X \hat{\theta}^T \right)}{\sigma^2}$ has chi-square distribution with $n - k - 1$ degrees of freedom, and the pivotal quantity $\dfrac{(\hat{\theta} - \theta)\, X^T X\, (\hat{\theta} - \theta)^T}{\sigma^2}$ has chi-square distribution with $k + 1$ degrees of freedom.
(4) $\dfrac{\left( Y - X \hat{\theta}^T \right)^T \left( Y - X \hat{\theta}^T \right)}{\sigma^2}$ and $\dfrac{(\hat{\theta} - \theta)\, X^T X\, (\hat{\theta} - \theta)^T}{\sigma^2}$ are statistically independent.
The proof can be found in books on regression analysis.

Proposition 19.2: Under the assumptions of Theorem 19.2 the stochastic vector
$$Y - X \hat{\theta}^T = \begin{pmatrix} Y_1 - \hat{\theta}_0 - \sum_{j=1}^k x_{1j} \hat{\theta}_j \\ \vdots \\ Y_n - \hat{\theta}_0 - \sum_{j=1}^k x_{nj} \hat{\theta}_j \end{pmatrix}$$
is normally distributed and statistically independent from $\hat{\theta}$. For the proof compare books on regression analysis.

Now confidence intervals for $\theta_0, \theta_1, \cdots, \theta_k$ and $\sigma^2$ can be obtained. These are given in the next theorem.

Theorem 19.3: Under the assumptions of Theorem 19.2, and denoting the elements of $\left( X^T X \right)^{-1}$ by $s_{ij}$, the following confidence intervals for $\theta_j$, $j = 0(1)k$, and for $\sigma^2$ are obtained:
(1) A confidence interval for $\theta_j$ with coverage probability $1 - \alpha$ is given by
$$\left[ \hat{\theta}_j - t_{n-k-1; 1-\frac{\alpha}{2}} \cdot \sqrt{s_{jj}} \cdot S;\ \hat{\theta}_j + t_{n-k-1; 1-\frac{\alpha}{2}} \cdot \sqrt{s_{jj}} \cdot S \right]$$
where $t_{n-k-1; 1-\frac{\alpha}{2}}$ is the $1 - \frac{\alpha}{2}$ quantile of the Student distribution with $n - k - 1$ degrees of freedom, and $S$ is the positive square root of $S^2$ from Theorem 19.1.

(2) A confidence interval for $\sigma^2$ with coverage probability $1 - \alpha$ is given by
$$\left[ \frac{(n - k - 1) S^2}{\chi^2_{n-k-1; 1-\frac{\alpha}{2}}};\ \frac{(n - k - 1) S^2}{\chi^2_{n-k-1; \frac{\alpha}{2}}} \right]$$
where $\chi^2_{n-k-1; p}$ is the $p$-quantile of the chi-square distribution with $n - k - 1$ degrees of freedom, and $S$ is as above.

Proof:
(1) From Theorem 19.2 we obtain by standardization that $\frac{\hat{\theta}_j - \theta_j}{\sqrt{s_{jj}}\, \sigma}$ has standard normal distribution. The pivotal quantity $\frac{(n-k-1) S^2}{\sigma^2}$ is chi-square distributed with $n - k - 1$ degrees of freedom. By Proposition 19.2 the two pivotal quantities are statistically independent. Therefore the stochastic quantity
$$\frac{\left( \hat{\theta}_j - \theta_j \right) \big/ \left( \sqrt{s_{jj}}\, \sigma \right)}{\sqrt{\frac{(n-k-1) S^2}{\sigma^2} \Big/ (n-k-1)}} = \frac{\hat{\theta}_j - \theta_j}{S \sqrt{s_{jj}}}$$
is Student distributed with $n - k - 1$ degrees of freedom. Therefore
$$\Pr\left( t_{n-k-1; \frac{\alpha}{2}} \le \frac{\hat{\theta}_j - \theta_j}{S \sqrt{s_{jj}}} \le t_{n-k-1; 1-\frac{\alpha}{2}} \right) = 1 - \alpha,$$
and identical transformation of the inequalities yields
$$\Pr\left( \hat{\theta}_j - t_{n-k-1; 1-\frac{\alpha}{2}} \cdot \sqrt{s_{jj}} \cdot S \le \theta_j \le \hat{\theta}_j - t_{n-k-1; \frac{\alpha}{2}} \cdot \sqrt{s_{jj}} \cdot S \right) = 1 - \alpha.$$
Considering the symmetry of the density of the Student distribution we obtain $t_{n-k-1; \frac{\alpha}{2}} = -t_{n-k-1; 1-\frac{\alpha}{2}}$, and therefore the equivalent inequalities
$$\Pr\left( \hat{\theta}_j - t_{n-k-1; 1-\frac{\alpha}{2}} \cdot \sqrt{s_{jj}} \cdot S \le \theta_j \le \hat{\theta}_j + t_{n-k-1; 1-\frac{\alpha}{2}} \cdot \sqrt{s_{jj}} \cdot S \right) = 1 - \alpha.$$
(2) The confidence interval for $\sigma^2$ is obtained using the pivotal quantity $\frac{(n-k-1) S^2}{\sigma^2}$:
$$\Pr\left( \chi^2_{n-k-1; \frac{\alpha}{2}} \le \frac{(n-k-1) S^2}{\sigma^2} \le \chi^2_{n-k-1; 1-\frac{\alpha}{2}} \right) = 1 - \alpha.$$
Identical transformation of the inequalities yields
$$\Pr\left( \frac{(n-k-1) S^2}{\chi^2_{n-k-1; 1-\frac{\alpha}{2}}} \le \sigma^2 \le \frac{(n-k-1) S^2}{\chi^2_{n-k-1; \frac{\alpha}{2}}} \right) = 1 - \alpha.$$
This proves the confidence interval for $\sigma^2$.
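The coverage probability in Theorem 19.3(1) can be checked by simulation. The sketch below uses an invented simple linear regression setup with $n - k - 1 = 30$; the quantile value $t_{30;\,0.975} \approx 2.042$ is taken from standard tables:

```python
import numpy as np

# Monte Carlo check of the confidence interval from Theorem 19.3(1).
# Assumed setup: simple linear regression (k = 1), n = 32, so 30 degrees
# of freedom; t_{30; 0.975} = 2.042 is taken from standard tables.
rng = np.random.default_rng(1)
n, k, sigma = 32, 1, 0.5
t_q = 2.042
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
s11 = XtX_inv[1, 1]                       # element s_jj for the slope
theta = np.array([1.0, 2.0])

hits = 0
reps = 2000
for _ in range(reps):
    y = X @ theta + rng.normal(scale=sigma, size=n)
    th = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ th
    S = np.sqrt(resid @ resid / (n - k - 1))
    half = t_q * np.sqrt(s11) * S         # half-width of the interval
    hits += (th[1] - half <= theta[1] <= th[1] + half)

coverage = hits / reps                    # should be close to 0.95
```

With 2000 replications the empirical coverage fluctuates by about half a percentage point around the nominal level $1 - \alpha = 0.95$.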
19.3 General linear models
In generalization of the linear regression model, the functional form of $E Y_x = \psi(x_1, \cdots, x_k; \theta_0, \theta_1, \cdots, \theta_r)$ for a general linear model is assumed to be given by $r$ known functions $f_j(x_1, \cdots, x_k)$, $j = 1(1)r$, in the form
$$\psi(x_1, \cdots, x_k; \theta_0, \theta_1, \cdots, \theta_r) = \theta_0 + \sum_{j=1}^r \theta_j f_j(x_1, \cdots, x_k) \tag{19.9}$$
with unknown parameters $\theta_0, \theta_1, \cdots, \theta_r$, and linearly independent functions $f_1(\cdot), \cdots, f_r(\cdot)$. The functions $f_j(x_1, \cdots, x_k)$ can be written in compact form as $f_j(x)$, and using the notation
$$f(\cdot) = \begin{pmatrix} f_1(\cdot) \\ \vdots \\ f_r(\cdot) \end{pmatrix} \quad \text{and} \quad \mathring{f}(\cdot) = \begin{pmatrix} 1 \\ f_1(\cdot) \\ \vdots \\ f_r(\cdot) \end{pmatrix}$$
as well as $\theta = (\theta_0, \theta_1, \cdots, \theta_r)$ and $x = (x_1, \cdots, x_k)$, the general linear model (19.9) takes the form
$$E Y_x = \theta\, \mathring{f}(x).$$
As in the linear regression model the variances are usually assumed to be identical, i.e.
$$\mathrm{Var}\, Y_x = \sigma^2 \quad \text{for all } x.$$
In order to estimate the parameters $\theta_0, \theta_1, \cdots, \theta_r$ and $\sigma^2$, observations are taken, i.e. the data are of the same type as for linear regression models:
$$(x_{i1}, \cdots, x_{ik}; y_i), \quad i = 1(1)n.$$
Based on these data, and using the notation $x_i = (x_{i1}, \cdots, x_{ik})$, the data matrix $F$ is defined by
$$F = \begin{pmatrix} 1 & f_1(x_1) & \cdots & f_r(x_1) \\ \vdots & \vdots & & \vdots \\ 1 & f_1(x_n) & \cdots & f_r(x_n) \end{pmatrix}.$$
In order to estimate the regression parameters $\theta_0, \theta_1, \cdots, \theta_r$, again the method of LSS is used, i.e.
$$SS = \sum_{i=1}^n \left[ y_i - \theta_0 - \sum_{j=1}^r \theta_j f_j(x_{i1}, \cdots, x_{ik}) \right]^2 \longrightarrow \min.$$
Similarly to linear regression models, under the following conditions the estimates $\hat{\theta}_j$ for the parameters $\theta_j$ can be given explicitly. This is the subject of the following theorem:

Theorem 19.4: Let a general linear model (19.9) be given, furthermore a sample $(x_i, y_i)$, $i = 1(1)n$, and let the following conditions be fulfilled:
(1) $E Y_x = \theta\, \mathring{f}(x)$ for all $x$ to be considered;
(2) $\mathrm{Var}\, Y_x = \sigma^2$ for all $x$ to be considered;
(3) $Y_{x_1}, \cdots, Y_{x_n}$ are pairwise uncorrelated;
(4) $\mathrm{rank}\left( F^T F \right) = r + 1 < n$;
(5) the space of possible parameters $\theta$ is not a hyperplane in $\mathbb{R}^{r+1}$.
Then the following holds:
1) The LSS estimates for the parameters are given by
$$\hat{\theta}^T = \left( F^T F \right)^{-1} F^T y.$$
2) The estimator $\left( F^T F \right)^{-1} F^T Y$ provides the MVULE for the parameters.
3) The variance–covariance matrix of $\hat{\theta}$ is given by
$$\mathrm{VCov}(\hat{\theta}) = \left( F^T F \right)^{-1} \sigma^2.$$
4) An unbiased estimator for $\sigma^2$ is given by
$$S^2 = \frac{\left( Y - F \hat{\theta}^T \right)^T \left( Y - F \hat{\theta}^T \right)}{n - r - 1} = \frac{\sum\limits_{i=1}^n \left( Y_{x_i} - \hat{\theta}_0 - \sum\limits_{j=1}^r \hat{\theta}_j f_j(x_i) \right)^2}{n - r - 1}.$$
The proof is given in books on regression analysis. A prediction for the dependent variable $y$ based on the value $x$ of the independent variable is given by
$$\hat{Y}_x = \psi(x; \hat{\theta}) = \hat{\theta}\, \mathring{f}(x). \tag{19.10}$$
x = ˚f T (x) F T F −1 ˚f (x)σ 2 . Var Y
Proof: E θ ˚f (x) = E θ ˚f (x) = θ ˚f (x) = EY x T x = Var Var Y θ ˚f (x) = Var (F T F)−1 F T Y ˚f (x)
−1 T
= Var Y T F (F T F) $% #
⎛
⎛
n
⎞
˚ z i Yi ⎠ . f (x) = Var Y T z = Var ⎝ & ⎞ i=1
z1 ⎜ .. ⎟ with z = ⎝ . ⎠ zn
⎞
⎛
n
Since Y1 , · · · , Yn are pairwise uncorrelated we obtain Var ⎝
z i Yi ⎠ =
i=1 σ n % $ 2
z i2 Var Yi = z T zσ 2 .
i=1
T T T F (F T F)−1 ˚f (x) = From z T z = F (F T F)−1 ˚f (x)
−1 T T −1 T T ˚f (x) = ˚f T (x) (F T F)−1 T ˚f (x) ∈ R F F (F F) = ˚f (x) F T F # $% & In T T T T T ⇒ ˚f (x) (F T F)−1 ˚f (x) = ˚f (x) F T F)−1 ˚f (x) T T T ⇒ ˚f (x) (F T F)−1 ˚f (x) = ˚f (x)(F T F)−1 ˚f (x) x = ˚f T (x)(F T F)−1 ˚f (x)σ 2 . ⇒ Var Y
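The general linear model can be sketched with an assumed basis, here $f_1(x) = x$ and $f_2(x) = x^2$ (quadratic regression with a single input variable; all numbers are invented):

```python
import numpy as np

# Sketch of the general linear model (19.9) with assumed basis functions
# f_1(x) = x and f_2(x) = x^2, i.e. r = 2 and one input variable.
rng = np.random.default_rng(2)
n = 40
x = np.linspace(-1.0, 1.0, n)
F = np.column_stack([np.ones(n), x, x**2])     # data matrix F
theta_true = np.array([0.5, -1.0, 2.0])
y = F @ theta_true + rng.normal(scale=0.1, size=n)

theta_hat = np.linalg.solve(F.T @ F, F.T @ y)  # LSS estimate (F^T F)^{-1} F^T y

def predict(x_new):
    # prediction (19.10): theta_hat applied to (1, f_1(x), f_2(x))
    return theta_hat @ np.array([1.0, x_new, x_new**2])
```

Polynomial regression is thus just a linear model in the transformed variables, which is why the estimation theory of Section 19.1 carries over unchanged.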
19.4 Nonidentical variances
The assumption of identical variances in Theorem 19.1 is sometimes not justified. Moreover the error terms described by stochastic quantities $U_i$ are not always uncorrelated. Therefore the regression model
$$Y = X \theta^T + U \quad \text{with} \quad \mathrm{VCov}(U) = I_n \sigma^2$$
is generalized in the following way:
$$\mathrm{VCov}(U) = \Omega\, \sigma^2$$
where $\Omega$ is a given positive definite $(n \times n)$ matrix. The corresponding linear regression model is $E Y_x = \theta\, \mathring{f}(x)$, and including the error terms $U_x$ it becomes
$$Y_x = \theta\, \mathring{f}(x) + U_x$$
with $\mathrm{VCov}(U) = \Omega\, \sigma^2$. Again the method of LSS is applied to estimate the parameters $\theta_0, \theta_1, \cdots, \theta_r$ from data $(x_i, y_i)$, $i = 1(1)n$. The following theorem is a generalization of the Gauss–Markov theorem, called the Gauss–Markov–Aitken theorem.

Theorem 19.5: Let a general linear model (19.9) be given, furthermore a sample $(x_i, Y_i)$, $i = 1(1)n$. Assuming the following conditions are fulfilled:
(1) $\mathrm{rank}(F) = r + 1 < n$;
(2) $\mathrm{VCov}(Y) = \Omega\, \sigma^2$;
(3) the space of possible parameters $\theta$ is not a hyperplane in $\mathbb{R}^{r+1}$;
then the following holds:
1) The uniquely determined MVULE for the parameter vector $\theta$ is given by
$$\hat{\theta}^T = \left( F^T \Omega^{-1} F \right)^{-1} F^T \Omega^{-1} Y$$
(called the Aitken estimator).
2) The variance–covariance matrix $\mathrm{VCov}(\hat{\theta})$ of this estimator is given by
$$\mathrm{VCov}(\hat{\theta}) = \left( F^T \Omega^{-1} F \right)^{-1} \sigma^2.$$
3) An unbiased estimator for the parameter $\sigma^2$ is given by
$$S^2 := \frac{\left( Y - F \hat{\theta}^T \right)^T \Omega^{-1} \left( Y - F \hat{\theta}^T \right)}{n - r - 1}.$$
For the proof see books on regression analysis, for example Schönfeld (1969).
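The Aitken estimator can be sketched numerically. In the illustration below the positive definite matrix is an assumed AR(1)-type correlation matrix (all numbers are invented), and correlated errors are generated via its Cholesky factor:

```python
import numpy as np

# Sketch of the Aitken (generalized least squares) estimator from
# Theorem 19.5 with an assumed AR(1)-type error correlation matrix.
rng = np.random.default_rng(3)
n = 60
x = np.linspace(0.0, 1.0, n)
F = np.column_stack([np.ones(n), x])              # data matrix F (r = 1)
rho = 0.6
Omega = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Omega_inv = np.linalg.inv(Omega)
theta_true = np.array([1.0, 3.0])

L = np.linalg.cholesky(Omega)
y = F @ theta_true + 0.2 * (L @ rng.normal(size=n))   # VCov(U) = Omega * 0.04

A = F.T @ Omega_inv @ F
theta_gls = np.linalg.solve(A, F.T @ Omega_inv @ y)   # Aitken estimator
resid = y - F @ theta_gls
s2 = resid @ Omega_inv @ resid / (n - 2)              # unbiased S^2
```

For $\Omega = I_n$ the Aitken estimator reduces to the ordinary LSS estimator of Theorem 19.4.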
19.5 Problems

(a) Explain why polynomial regression functions are special forms of linear regression.

(b) What kind of functions $f_j(\cdot)$ in the general linear model generate the classical linear regression model?

(c) Assume the stochastic quantities $Y_{x_i}$ are all normally distributed. What does assumption (3) in Theorem 19.1 then imply?
20

Regression models and fuzzy data

As mentioned at the beginning of this book, measurement results of continuous quantities are never precise numbers. Therefore regression models also have to take care of fuzzy data. A sample is therefore given by $n$ vectors of fuzzy numbers, i.e.
$$\left( x_{i1}^*, \cdots, x_{ik}^*;\ y_i^* \right), \quad i = 1(1)n.$$
Another kind of fuzzy data is given if the vector of independent variables is a fuzzy vector $x_i^*$. Then the sample takes the following form:
$$\left( x_i^*;\ y_i^* \right), \quad i = 1(1)n.$$

Remark 20.1: It is important to note that the fuzzy vector $x_i^*$ is essentially different from the vector of fuzzy numbers $\left( x_{i1}^*, \cdots, x_{ik}^* \right)$. The former is defined by a vector-characterizing function, whereas the latter is defined by a vector of characterizing functions. In applications both kinds of fuzzy data appear.

Another kind of fuzziness in this context is the fuzziness of parameters in regression models. There are several situations possible if fuzziness is taken into account in regression models:

(a) The parameters and the independent variables $x_i$ are assumed to be classical real valued, but the dependent variable is fuzzy, i.e. the $y_i^*$ are fuzzy numbers.
(b) The independent variables $x_i$ as well as the values of the dependent variables $y_i$ are classical real numbers, but the parameters are fuzzy numbers $\theta_j^*$, i.e.
$$E Y_x = \theta_0^* \oplus \theta_1^* x_1 \oplus \cdots \oplus \theta_k^* x_k.$$
Here $\oplus$ denotes the generalized addition operation for fuzzy numbers.

(c) The values of the input variables $x_i$ are fuzzy and all other quantities are classical real numbers.

(d) The input variables as well as the output variables are fuzzy, and the parameters are classical real valued, i.e.
$$E Y_x = \theta_0 \oplus \theta_1 x_1^* \oplus \cdots \oplus \theta_k x_k^*,$$
and the data set is $\left( x_{i1}^*, \cdots, x_{ik}^*;\ y_i^* \right)$.

(e) All considered quantities are fuzzy, i.e.
$$E Y_x = \theta_0^* \oplus \theta_1^* x_1^* \oplus \cdots \oplus \theta_k^* x_k^*.$$

The possibilities described above also apply to general linear models; all of the above situations can be adapted. There are different approaches to the generalization of estimation procedures in the case of fuzzy data. The most direct approach is based on the extension principle. This is described in the next section.
20.1 Generalized estimators for linear regression models based on the extension principle
Depending on the kind of available data, generalized estimators for the parameters can be obtained in the following way:

ad (a): $E Y_x = \theta_0 + \theta_1 x_1 + \cdots + \theta_k x_k$ with fuzzy dependent variables $y_i^*$: For data $(x_{i1}, \cdots, x_{ik}; y_i^*)$, $i = 1(1)n$, the fuzzy values $y_i^*$ are characterized by their characterizing functions $\eta_i(\cdot)$, $i = 1(1)n$. In this situation the estimators $\hat{\theta}_j$ for $\theta_j$, $j = 1(1)k$, can be generalized in the following way: In order to apply the extension principle, first the fuzzy data $y_1^*, \cdots, y_n^*$ have to be combined to form a fuzzy vector $y^*$ in $\mathbb{R}^n$. The vector-characterizing function $\zeta(\cdot, \cdots, \cdot)$ of the combined fuzzy vector $y^*$ is obtained from the characterizing functions $\eta_1(\cdot), \cdots, \eta_n(\cdot)$ by application of the minimum t-norm, i.e. the values $\zeta(y_1, \cdots, y_n)$ are defined by
$$\zeta(y_1, \cdots, y_n) := \min\left\{ \eta_1(y_1), \cdots, \eta_n(y_n) \right\} \quad \forall (y_1, \cdots, y_n) \in \mathbb{R}^n.$$
The estimators $\hat{\theta}_j$ of the regression parameters $\theta_j$ can now be generalized. Using the notation $x_i = (x_{i1}, \cdots, x_{ik})$ for $i = 1(1)n$, the estimators $\hat{\theta}_j$ for $\theta_j$ from Theorem 19.1 can be written as
$$\hat{\theta}_j = \hat{\theta}_j(x_1, \cdots, x_n;\ y_1, \cdots, y_n).$$
Now the membership functions $\vartheta_j(\cdot)$ of the generalized (fuzzy) estimators $\hat{\theta}_j^* = \hat{\theta}_j\left( x_1, \cdots, x_n;\ y_1^*, \cdots, y_n^* \right)$ based on the fuzzy data $y_1^*, \cdots, y_n^*$ are given by their values $\vartheta_j(z)$ in the following way:
$$\vartheta_j(z) = \begin{cases} \sup\left\{ \zeta(y_1, \cdots, y_n) : \hat{\theta}_j(x_1, \cdots, x_n; y_1, \cdots, y_n) = z \right\} & \text{if } \exists\, (y_1, \cdots, y_n) : \hat{\theta}_j(x_1, \cdots, x_n; y_1, \cdots, y_n) = z \\ 0 & \text{if } \nexists\, (y_1, \cdots, y_n) : \hat{\theta}_j(x_1, \cdots, x_n; y_1, \cdots, y_n) = z \end{cases} \quad \forall z \in \mathbb{R}.$$
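For linear estimators the supremum above can be evaluated cut by cut without any search: since $\hat{\theta}_j = \sum_i c_{ij} y_i$ is monotone in each $y_i$, the extremes over the $\delta$-cuts of the data are attained at interval endpoints, chosen according to the sign of $c_{ij}$. The sketch below assumes triangular fuzzy data $\langle m, l, r \rangle_L$ and invented coefficients:

```python
# Sketch (assumptions noted in the text): for a linear estimator
# theta_hat_j = sum_i c_ij * y_i the extension principle reduces, per
# delta-cut, to interval arithmetic.  Triangular fuzzy data <m, l, r>_L
# have the delta-cut [m - (1-delta)*l, m + (1-delta)*r].
def tri_cut(m, l, r, delta):
    return m - (1.0 - delta) * l, m + (1.0 - delta) * r

def fuzzy_linear_estimate(c, fuzzy_y, delta):
    """delta-cut of the fuzzy estimator for coefficient vector c."""
    lo = hi = 0.0
    for ci, (m, l, r) in zip(c, fuzzy_y):
        a, b = tri_cut(m, l, r, delta)
        lo += ci * a if ci >= 0 else ci * b   # minimum of ci * y_i
        hi += ci * b if ci >= 0 else ci * a   # maximum of ci * y_i
    return lo, hi

# two fuzzy observations and illustrative (invented) coefficients c_ij
fuzzy_y = [(2.5, 0.17, 0.40), (2.0, 0.17, 0.45)]
c = [0.8, -0.3]
lo0, hi0 = fuzzy_linear_estimate(c, fuzzy_y, 0.0)   # support
lo1, hi1 = fuzzy_linear_estimate(c, fuzzy_y, 1.0)   # core (degenerate cut)
```

Repeating this over a grid of $\delta$-values and applying the construction lemma yields an approximation of $\vartheta_j(\cdot)$.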
Remark 20.2: By the continuity of the functions $\hat{\theta}_j(x_1, \cdots, x_n; y_1, \cdots, y_n)$ the membership functions $\vartheta_j(\cdot)$ are characterizing functions of fuzzy intervals if all the functions $\eta_i(\cdot)$ are characterizing functions of fuzzy intervals $y_i^*$.

A generalized (fuzzy) estimator of the variance $\sigma^2$ is given by the fuzzy element $\hat{\sigma}^{2*}$ of $\mathbb{R}$ whose membership function $\psi(\cdot)$ is given by its values
$$\psi(z) = \begin{cases} \sup\left\{ \zeta(y_1, \cdots, y_n) : S^2(x_1, \cdots, x_n; y_1, \cdots, y_n) = z \right\} & \text{if } \exists\, (y_1, \cdots, y_n) : S^2(x_1, \cdots, x_n; y_1, \cdots, y_n) = z \\ 0 & \text{if } \nexists\, (y_1, \cdots, y_n) : S^2(x_1, \cdots, x_n; y_1, \cdots, y_n) = z \end{cases} \quad \forall z \in \mathbb{R}$$
where $S^2$ is the function from conclusion 4) in Theorem 19.1.

Remark 20.3: The generalized regression function for fuzzy data is a fuzzy valued function of the classical variables $x_1, \cdots, x_k$. This fuzzy valued function
$$y^* = \hat{\theta}_0^* \oplus \hat{\theta}_1^* x_1 \oplus \cdots \oplus \hat{\theta}_k^* x_k$$
can be characterized by the lower and upper $\delta$-level functions $\underline{y}_\delta(x_1, \cdots, x_k)$ and $\overline{y}_\delta(x_1, \cdots, x_k)$, respectively.

An example for $k = 1$ is a generalized regression line
$$E Y_x = \theta_0 + \theta_1 x \quad \text{for } x \in \mathbb{R}.$$
For the fuzzy data given in Table 20.1 the generalized estimators $\hat{a}^*$ and $\hat{b}^*$ for the regression parameters $a$ and $b$ in
$$E Y_x = a + bx$$
P1: SBT c20 JWST020-Viertl
136
October 26, 2010
20:42
Printer Name: Yet to Come
STATISTICAL METHODS FOR FUZZY DATA
Table 20.1 Fuzzy data.

    x_i       y_i* = <m_i, l_i, r_i>_L
    6.15      <2.5, 0.17, 0.40>_L
    9.00      <2.0, 0.17, 0.45>_L
    12.00     <3.0, 0.30, 0.50>_L
    12.15     <2.5, 0.25, 0.53>_L
    15.00     <3.0, 0.13, 0.46>_L
    15.19     <3.5, 0.35, 0.63>_L
    17.55     <3.5, 0.27, 0.72>_L
    20.48     <4.0, 0.32, 0.61>_L
are given by
$$\hat{a}^* = \langle 1.3079,\ 0.9208,\ 1.0510 \rangle_L$$
$$\hat{b}^* = \langle 0.1259,\ 0.0707,\ 0.0820 \rangle_L.$$
For the generalized regression function $E Y_x = \hat{a}^* \oplus \hat{b}^* x$ we obtain the fuzzy line $f^*(x)$ with
$$f^*(x) = \langle 1.3079,\ 0.9208,\ 1.0510 \rangle_L \oplus \langle 0.1259,\ 0.0707,\ 0.0820 \rangle_L\, x.$$
In Figure 20.1 the fuzzy data $y_1^*, \cdots, y_8^*$ as well as the fuzzy regression line are depicted. The solid line is the ($\delta = 1$) level curve of $f^*(\cdot)$. The dashed lines are the connections of the lower and upper ends of the supports of $f^*(\cdot)$.

ad (c): If the values of the input variables $x_i$ are fuzzy, again the generalized estimators for the regression parameters become fuzzy. Let $\xi_i(\cdot)$, $i = 1(1)n$, denote the vector-characterizing functions of the fuzzy vectors $x_i^* = (x_{i1}, \cdots, x_{ik})^*$. In order to generalize the estimators $\hat{\theta}_j$ to the situation of fuzzy data, the fuzzy vectors $x_i^*$ have to be combined into a fuzzy element of the Euclidean space $\mathbb{R}^{nk}$. This fuzzy element $x^*$ of $\mathbb{R}^{nk}$ has a vector-characterizing function $\zeta : \mathbb{R}^{nk} \to [0; 1]$ whose values $\zeta(x)$ are defined by the minimum t-norm: Writing $x = (x_1, \cdots, x_n)$, the values $\zeta(x)$ are given by
$$\zeta(x_1, \cdots, x_n) := \min\left\{ \xi_1(x_1), \cdots, \xi_n(x_n) \right\} \quad \forall x_i \in \mathbb{R}^k.$$
Figure 20.1 Linear regression line.

Based on this the generalized estimators $\hat{\theta}_j^*$ are obtained by the extension principle. The membership function $\phi_j(\cdot)$ of $\hat{\theta}_j^*$ is given for $j = 1(1)k$ by
$$\phi_j(z) := \begin{cases} \sup\left\{ \zeta(x) : \hat{\theta}_j(x; y_1, \cdots, y_n) = z \right\} & \text{if } \exists\, x \in \mathbb{R}^{nk} : \hat{\theta}_j(x; y_1, \cdots, y_n) = z \\ 0 & \text{if } \nexists\, x \in \mathbb{R}^{nk} : \hat{\theta}_j(x; y_1, \cdots, y_n) = z \end{cases} \quad \forall z \in \mathbb{R}.$$
ad (d): If the input variables as well as the output variables are fuzzy, the fuzziness of all of these variables has to be combined in order to apply the extension principle. This is done by using the minimum t-norm. Applying the notation of case (c) above, and denoting the characterizing function of yi∗ by ηi (·) for i = 1(1)n, the combined fuzziness is contained in the fuzzy element t ∗ of Rn(k+1) whose vector-characterizing function τ (·, · · · , ·) is given by its values
$\tau(t)$, defined by
$$\tau(x_1, \cdots, x_n, y_1, \cdots, y_n) := \min\left\{ \xi_1(x_1), \cdots, \xi_n(x_n), \eta_1(y_1), \cdots, \eta_n(y_n) \right\} \quad \forall x_i \in \mathbb{R}^k,\ \forall y_i \in \mathbb{R}.$$
Based on this fuzzy element $t^*$ in $\mathbb{R}^{n(k+1)}$, which is a fuzzy vector, the estimators $\hat{\theta}_j$ for the regression parameters can be generalized. The characterizing function $\phi_j(\cdot)$ of the fuzzy estimator $\hat{\theta}_j^*$ is given by
$$\phi_j(z) := \begin{cases} \sup\left\{ \tau(t) : \hat{\theta}_j(t) = z \right\} & \text{if } \exists\, t \in \mathbb{R}^{n(k+1)} : \hat{\theta}_j(t) = z \\ 0 & \text{if } \nexists\, t \in \mathbb{R}^{n(k+1)} : \hat{\theta}_j(t) = z \end{cases} \quad \forall z \in \mathbb{R}.$$
The resulting estimates $\hat{\theta}_j^*$ are fuzzy elements of $\mathbb{R}$. In the case of continuous functions $\hat{\theta}_j(\cdot)$ the fuzzy estimators are fuzzy numbers.

ad (e): In this case the estimation of the parameters is done in the same way as in case (d).

For the model (b) classical estimation procedures would be suitable. The fuzzy parameters $\theta_j^*$ can be obtained by fuzzification of the standard estimates $\hat{\theta}_j$.
20.2 Generalized confidence regions for parameters

For standard regression models with an existing confidence function $\kappa_j(Y_1, \cdots, Y_n)$ for the parameter $\theta_j$, in the case of fuzzy data $y_1^*, \cdots, y_n^*$ corresponding generalized (fuzzy) confidence regions $\Theta_{1-\alpha}^*$ can be obtained by application of the method from Section 12.2.
20.3 Prediction in fuzzy regression models

Depending on the assumed regression model, predictions of the dependent variable for values of the independent variables where no data are given can be obtained using the generalized algebraic operations for fuzzy quantities. The predictions depend on the different cases (a)–(e) given at the start of the chapter.

ad (a): In this case the estimated parameters are fuzzy numbers $\hat{\theta}_j^*$ and the predicted value is
$$y^*(x) = \hat{\theta}_0^* \oplus \hat{\theta}_1^* x_1 \oplus \cdots \oplus \hat{\theta}_k^* x_k.$$
Therefore $y^*(x)$ is a fuzzy number whose characterizing function is obtained by the generalized arithmetic operations.

ad (b): For the prediction the situation is essentially the same as in (a).

ad (c): By the estimation procedure for $\theta_j$ the estimators are fuzzy values $\hat{\theta}_j^*$, and the prediction equation reads
$$y^*(x^*) = \hat{\theta}_0^* \oplus \hat{\theta}_1^* x_1^* \oplus \cdots \oplus \hat{\theta}_k^* x_k^*.$$
Therefore fuzzy multiplication as well as fuzzy addition have to be applied here.

ad (d): Here the prediction equation based on observed data is the same as in case (c).

ad (e): Again the prediction equation has the form from case (c).

Example 20.1 For applications triangular fuzzy numbers are not always sufficient; frequently trapezoidal fuzzy numbers are more realistic. In this situation, for the regression model
$$y^*(x^*) = \hat{\theta}_0^* \oplus \hat{\theta}_1^* x_1^* \oplus \cdots \oplus \hat{\theta}_k^* x_k^*,$$
the resulting fuzzy values $y^*(x^*)$ are not necessarily trapezoidal fuzzy numbers but fuzzy intervals.
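For case (a) with a crisp value $x > 0$ the prediction reduces to simple componentwise arithmetic on $L$-fuzzy numbers of the same shape: scalar multiplication scales the components $\langle m, l, r \rangle_L$, and $\oplus$ adds them. A minimal sketch, using the estimates $\hat{a}^*$ and $\hat{b}^*$ from the example above:

```python
# Sketch of the prediction y*(x) = theta0* (+) theta1* x for a crisp x > 0,
# with L-fuzzy numbers <m, l, r>_L of the same shape function L: positive
# scalar multiplication scales all three components, and (+) adds them
# componentwise for this representation.
def scale(t, x):
    m, l, r = t
    return (m * x, l * x, r * x)

def add(t1, t2):
    return tuple(a + b for a, b in zip(t1, t2))

a_star = (1.3079, 0.9208, 1.0510)   # a^* from the example above
b_star = (0.1259, 0.0707, 0.0820)   # b^* from the example above

y_star = add(a_star, scale(b_star, 10.0))   # fuzzy prediction at x = 10
```

For a fuzzy value $x^*$ of the independent variable, or for negative $x$, the full generalized multiplication via $\delta$-cuts has to be used instead of this componentwise shortcut.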
20.4 Problems

(a) For the regression line $y(x) = \theta_0 + \theta_1 x$ and two fuzzy observations $(x_1^*, y_1^*)$ and $(x_2^*, y_2^*)$, explain how to calculate the fuzzy estimators $\hat{\theta}_0^*$ and $\hat{\theta}_1^*$.

(b) Calculate the fuzzy prediction $y^*(x^*) = \hat{\theta}_0^* \oplus \hat{\theta}_1^* x^*$ for a fuzzy value $x^*$ of the independent variable which is a fuzzy number.
21

Bayesian regression analysis

In Bayesian inference the parameters $\theta$ are also considered as stochastic quantities $\tilde{\theta}$ with corresponding probability distribution $\pi(\cdot)$, called the a priori distribution. In the case of continuous parameters the a priori distribution is determined by a probability density $\pi(\cdot)$ on the parameter space
$$\Theta = \{ \theta : \theta \text{ possible parameter value} \}.$$
Then $\pi(\cdot)$ is a probability density on the parameter space, called the a priori density. The stochastic model for the dependent variable $y$ is $Y_x \sim f_x(\cdot \mid \theta)$.
21.1 Calculation of a posteriori distributions
For observed data $(x_i, y_i)$, $i = 1(1)n$, the likelihood function $\ell\left( \theta;\ (x_1, y_1), \cdots, (x_n, y_n) \right)$ is given by
$$\ell\left( \theta;\ (x_1, y_1), \cdots, (x_n, y_n) \right) = \prod_{i=1}^n f_{x_i}(y_i \mid \theta)$$
in the case of independent observations $y_1, \cdots, y_n$. The a posteriori density $\pi\left( \cdot \mid (x_1, y_1), \cdots, (x_n, y_n) \right)$ of $\tilde{\theta}$ is given by Bayes' theorem, which here reads
$$\pi\left( \theta \mid (x_1, y_1), \cdots, (x_n, y_n) \right) \propto \pi(\theta) \cdot \ell\left( \theta;\ (x_1, y_1), \cdots, (x_n, y_n) \right) \quad \forall \theta \in \Theta.$$
In extended form, Bayes' theorem reads

$$\pi\big(\theta \mid (x_1,y_1),\cdots,(x_n,y_n)\big) = \frac{\pi(\theta)\cdot\ell\big(\theta;(x_1,y_1),\cdots,(x_n,y_n)\big)}{\displaystyle\int_{\Theta}\pi(\theta)\cdot\ell\big(\theta;(x_1,y_1),\cdots,(x_n,y_n)\big)\,d\theta} \quad \forall\,\theta\in\Theta.$$

Here the parameter $\theta = (\theta_1,\cdots,\theta_k)\in\mathbb{R}^k$ can be $k$-dimensional. Then the above integral is

$$\int_{\Theta}\pi(\theta_1,\cdots,\theta_k)\cdot\ell\big(\theta_1,\cdots,\theta_k;(x_1,y_1),\cdots,(x_n,y_n)\big)\,d\theta_1\cdots d\theta_k.$$
The a posteriori density carries all information concerning the parameter vector θ = (θ1 , · · · , θk ) which comes from the a priori distribution and the data.
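Numerically, Bayes' theorem amounts to pointwise multiplication of the a priori density and the likelihood on a parameter grid, followed by normalization. The sketch below uses hypothetical data for a one-parameter line through the origin with unit error variance and an N(0, 1) prior; it is an illustration, not the book's code.

```python
import numpy as np

# hypothetical data for the model E[Y|x] = theta * x with unit error variance
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2])

theta = np.linspace(-2, 4, 2001)                      # grid on the parameter space
prior = np.exp(-0.5 * theta**2) / np.sqrt(2 * np.pi)  # N(0,1) a priori density

lik = np.ones_like(theta)
for xi, yi in zip(x, y):                              # product of f_{x_i}(y_i | theta)
    lik *= np.exp(-0.5 * (yi - theta * xi)**2) / np.sqrt(2 * np.pi)

post = prior * lik
post /= np.trapz(post, theta)                         # Bayes' theorem, normalized
```

Here the posterior is conjugate-normal, so the grid result can be checked against the closed form: posterior mean Σxᵢyᵢ/(Σxᵢ² + 1).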
21.2
Bayesian confidence regions
The a posteriori distribution can be used to construct Bayesian confidence regions for the parameter vector, called highest a posteriori density regions, abbreviated HPD-regions.
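An HPD region keeps the parameter values of highest a posteriori density until the desired coverage probability is reached. A minimal grid-based sketch (checked against an assumed standard normal posterior; not an algorithm from the book):

```python
import numpy as np

def hpd_interval(grid, dens, coverage=0.95):
    """Highest posterior density region via a threshold on the density values."""
    order = np.argsort(dens)[::-1]            # highest-density grid points first
    dx = grid[1] - grid[0]
    mass = np.cumsum(dens[order]) * dx        # accumulated probability
    keep = order[:np.searchsorted(mass, coverage) + 1]
    return grid[keep].min(), grid[keep].max()

grid = np.linspace(-5, 5, 4001)
dens = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)   # standard normal posterior
lo, hi = hpd_interval(grid, dens)                    # approximately [-1.96, 1.96]
```

For a unimodal symmetric posterior the HPD region coincides with the central interval; for multimodal posteriors this construction can return a union of intervals, which the min/max here would merge.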
21.3
Probabilities of hypotheses
Another way of using a posteriori distributions is the calculation of a posteriori probabilities of statistical hypotheses $H_0: \theta\in\Theta_0\subset\Theta$. The a posteriori probability of the hypothesis $H_0$ is the probability of $\Theta_0$ based on the a posteriori density $\pi\big(\cdot \mid (x_1,y_1),\cdots,(x_n,y_n)\big)$, i.e.

$$\Pr\big(\Theta_0 \mid (x_1,y_1),\cdots,(x_n,y_n)\big) = \int_{\Theta_0}\pi\big(\theta \mid (x_1,y_1),\cdots,(x_n,y_n)\big)\,d\theta.$$

Depending on the value of $\Pr\big(\Theta_0 \mid (x_1,y_1),\cdots,(x_n,y_n)\big)$ the hypothesis $H_0$ is accepted or rejected.
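The a posteriori probability of a hypothesis is a plain integral of the posterior density over Θ₀. A short sketch with an assumed N(1, 1) posterior and H₀: θ > 0, so the exact value is Φ(1) ≈ 0.8413 (illustrative assumptions, not from the book):

```python
import numpy as np

grid = np.linspace(-6, 8, 7001)
post = np.exp(-0.5 * (grid - 1.0)**2) / np.sqrt(2 * np.pi)  # assumed N(1,1) posterior
mask = grid > 0                                             # hypothesis H0: theta in (0, inf)
p_h0 = np.trapz(post[mask], grid[mask])                     # a posteriori probability of H0
```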
21.4
Predictive distributions
An elegant way of using the a posteriori distribution $\pi\big(\cdot \mid (x_1,y_1),\cdots,(x_n,y_n)\big)$ is the predictive distribution of $Y_x$ based on the observed data $(x_1,y_1),\cdots,(x_n,y_n)$. This predictive distribution is calculated as the marginal distribution of $Y_x$ of the stochastic vector $(Y_x, \tilde{\theta})$.
For a continuous variable $Y_x$ with probability density $f_x(\cdot\mid\theta)$ and continuous parameter $\tilde{\theta}$ with a posteriori density $\pi\big(\cdot\mid (x_1,y_1),\cdots,(x_n,y_n)\big)$, the joint density $g(\cdot,\cdot)$ of $(Y_x,\tilde{\theta})$ is given by its values

$$g(y,\theta) = f_x(y\mid\theta)\cdot\pi\big(\theta\mid (x_1,y_1),\cdots,(x_n,y_n)\big).$$

Therefore the predictive distribution of $Y_x$ is given by its probability density $p_x\big(\cdot\mid (x_1,y_1),\cdots,(x_n,y_n)\big)$, the marginal density of $Y_x$:

$$p_x\big(y\mid (x_1,y_1),\cdots,(x_n,y_n)\big) = \int_{\Theta} f_x(y\mid\theta)\cdot\pi\big(\theta\mid (x_1,y_1),\cdots,(x_n,y_n)\big)\,d\theta.$$

The density $p_x\big(\cdot\mid (x_1,y_1),\cdots,(x_n,y_n)\big)$ is called the predictive density of $Y_x$. The predictive distribution of $Y_x$ can be used to calculate Bayesian predictive intervals for $Y_x$. Based on the predictive density, a predictive interval $[a;b]$ for $Y_x$ with coverage probability $1-\alpha$ is defined by

$$\int_a^b p_x\big(y\mid (x_1,y_1),\cdots,(x_n,y_n)\big)\,dy = \Pr(Y_x\in[a;b]) = 1-\alpha.$$

Special predictive intervals are HPD intervals for $Y_x$, i.e. intervals $[a;b]$ on which the predictive density is highest.
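The predictive density is a mixture of the sampling densities weighted by the posterior. With an assumed N(1, 1) posterior and a unit-variance Gaussian observation model, the exact predictive distribution is N(1, 2), which the numerical integral reproduces (an illustrative sketch, not the book's code):

```python
import numpy as np

theta = np.linspace(-8, 10, 4001)
post = np.exp(-0.5 * (theta - 1.0)**2) / np.sqrt(2 * np.pi)  # assumed N(1,1) posterior

def predictive(y):
    """p(y | data) = integral of f(y | theta) * pi(theta | data) d theta."""
    fy = np.exp(-0.5 * (y - theta)**2) / np.sqrt(2 * np.pi)  # f_x(y | theta), sigma = 1
    return np.trapz(fy * post, theta)

p_at_mode = predictive(1.0)   # exact value: N(1,2) density at 1, i.e. 1/sqrt(4*pi)
```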
21.5
A posteriori Bayes estimators for regression parameters
In case one wants point estimates for the regression parameters $\theta_1,\cdots,\theta_k$, this is possible by using the marginal distributions of the individual parameters. Let $\pi_j(\cdot)$ be the marginal density of $\tilde{\theta}_j$ calculated from the a posteriori density $\pi(\cdot\mid D)$; then the a posteriori Bayes estimator for $\theta_j$ is the expectation of $\tilde{\theta}_j$, i.e.

$$\hat{\theta}_j := \int_{\Theta_j} \theta\cdot\pi_j(\theta)\,d\theta.$$

The a posteriori Bayes estimator for the parameter vector $\theta = (\theta_1,\cdots,\theta_k)$ is given by $\hat{\theta} = (\hat{\theta}_1,\cdots,\hat{\theta}_k)$.
Remark 21.1: Another possibility for a point estimate of $\theta_j$ would be the median of the marginal a posteriori distribution of $\tilde{\theta}_j$.
21.6
Bayesian regression with Gaussian distributions
For linear regression models with vector independent variable $x = (x_1,\cdots,x_k)$,

$$\mathbb{E}Y_x = \theta_0 + \sum_{j=1}^{k}\theta_j x_j,$$

and homoscedastic Gaussian distributions of the dependent variable $Y_x$, the likelihood function takes the following form. If the variance $\sigma^2$ is known we obtain

$$\ell\big(\theta_0,\theta_1,\cdots,\theta_k;(x_1,y_1),\cdots,(x_n,y_n)\big) = \prod_{i=1}^{n} f_{x_i}(y_i\mid\theta_0,\cdots,\theta_k) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left[-\frac{1}{2}\,\frac{\big(y_i-\theta_0-\sum_{j=1}^{k}\theta_j x_{ij}\big)^2}{\sigma^2}\right].$$

For unknown variance $\sigma^2$ the likelihood is a function of $\sigma^2$ as well,

$$\ell\big(\theta_0,\cdots,\theta_k,\sigma^2;(x_1,y_1),\cdots,(x_n,y_n)\big) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left[-\frac{\big(y_i-\theta_0-\sum_{j=1}^{k}\theta_j x_{ij}\big)^2}{2\sigma^2}\right],$$

and the a priori density has to be a function of $\theta_0,\cdots,\theta_k$ and $\sigma^2$, i.e. $\pi(\theta_0,\theta_1,\cdots,\theta_k,\sigma^2)$. The a posteriori density is given by Bayes' theorem:

$$\pi\big(\theta_0,\cdots,\theta_k,\sigma^2\mid (x_1,y_1),\cdots,(x_n,y_n)\big) = \frac{\pi(\theta_0,\cdots,\theta_k,\sigma^2)\cdot\ell\big(\theta_0,\cdots,\theta_k,\sigma^2;(x_1,y_1),\cdots,(x_n,y_n)\big)}{\displaystyle\int_{\Theta\times(0,\infty)}\pi(\theta_0,\cdots,\theta_k,\sigma^2)\cdot\ell\big(\theta_0,\cdots,\theta_k,\sigma^2;(x_1,y_1),\cdots,(x_n,y_n)\big)\,d\theta_0\cdots d\theta_k\,d\sigma^2}.$$
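For known σ² and a Gaussian a priori density N(0, τ²I) on the regression parameters, the a posteriori distribution is again Gaussian with a closed form (a standard conjugate result; the data below are simulated and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # design matrix with intercept
beta_true = np.array([1.0, 2.0, -1.0])
yobs = X @ beta_true + rng.normal(scale=0.5, size=n)

sigma2, tau2 = 0.25, 4.0                    # known error variance, prior variance
A = X.T @ X + (sigma2 / tau2) * np.eye(k + 1)
post_mean = np.linalg.solve(A, X.T @ yobs)  # a posteriori Bayes estimator (posterior mean)
post_cov = sigma2 * np.linalg.inv(A)        # posterior covariance matrix
ols = np.linalg.solve(X.T @ X, X.T @ yobs)  # ordinary least squares, for comparison
```

The posterior mean is a ridge-type estimator: it shrinks the least-squares solution toward the prior mean 0, with shrinkage controlled by σ²/τ².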
21.7
Problems
(a) Assume the probability density $f_x(\cdot\mid\theta)$ is determined by a parameter vector $\psi$ which depends on the regression parameters $\theta = (\theta_0,\theta_1,\cdots,\theta_k)$, i.e. $f_{x_i}(\cdot\mid\theta) = g\big(\cdot\mid\psi(\theta,x_i)\big)$. What form does the likelihood function take for observed data $(x_1,y_1),\cdots,(x_n,y_n)$? What does Bayes' theorem look like in this situation?
(b) Let the dependent quantity $Y_x$ have an exponential distribution $Ex_\tau$ with density $f(y\mid\tau) = \frac{1}{\tau}e^{-y/\tau}\,I_{(0,\infty)}(y)$, $\tau > 0$, and let $\theta = (\theta_0,\theta_1)^T$. Assuming a regression line $\theta_0 + \theta_1 x$ for the mean value $\tau$, i.e. $\tau(x) = \theta_0 + \theta_1 x$ and therefore

$$f_x(y\mid\theta) = \frac{1}{\theta_0+\theta_1 x}\,\exp\!\left(-\frac{y}{\theta_0+\theta_1 x}\right) I_{(0,\infty)}(y),$$

calculate the likelihood function and write down Bayes' theorem for this special case for data points $(x_1,y_1),\cdots,(x_n,y_n)$ in the plane $\mathbb{R}^2$.
22
Bayesian regression analysis and fuzzy information

In Bayesian regression analysis two kinds of fuzziness are present: fuzziness of a priori distributions and fuzziness of data. In this chapter we assume that fuzzy a priori distributions are given by fuzzy densities $\pi^*(\cdot)$ with corresponding δ-level functions $\underline{\pi}_\delta(\cdot)$ and $\overline{\pi}_\delta(\cdot)$, $\delta\in(0;1]$. The data are given in the form of $n$ vectors of fuzzy numbers, i.e.

$$\big(x_{i1}^*,\cdots,x_{ik}^*;\ y_i^*\big), \quad i = 1(1)n,$$

or in the form of fuzzy vectors $x_i^*$ for the independent variable $x = (x_1,\cdots,x_k)$ and fuzzy numbers $y_i^*$ for the dependent variable $y$:

$$\big(x_i^*;\ y_i^*\big), \quad i = 1(1)n,$$

where $x_i^*$ is a $k$-dimensional fuzzy vector. In order to apply Bayes' theorem, first the likelihood function has to be generalized for fuzzy data. In accordance with Chapter 15 the combined fuzzy sample has to be formed. In the first situation of data $\big(x_{i1}^*,\cdots,x_{ik}^*, y_i^*\big)$, $i = 1(1)n$, the combined fuzzy sample $z^*$ is an $n\times(k+1)$-dimensional fuzzy vector whose vector-characterizing function $\zeta(x_{11},\cdots,x_{nk},y_1,\cdots,y_n)$ is formed by the minimum t-norm, i.e.

$$\zeta(x_{11},\cdots,x_{nk},y_1,\cdots,y_n) = \min\big\{\xi_{ij}(x_{ij}),\ \eta_i(y_i) : i = 1(1)n,\ j = 1(1)k\big\}$$

where $\xi_{ij}(\cdot)$ are the characterizing functions of the fuzzy values $x_{ij}^*$, and $\eta_i(\cdot)$ are the characterizing functions of the fuzzy values $y_i^*$ of the dependent variable.
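The minimum t-norm combination can be sketched directly: the membership of the combined sample is the smallest membership over all coordinates. Assuming trapezoidal characterizing functions with hypothetical parameters:

```python
def trap_membership(t, m, s, l, r):
    """Membership of t in the trapezoidal fuzzy number t*(m, s, l, r), with l, r > 0."""
    if m - s <= t <= m + s:
        return 1.0
    if m - s - l < t < m - s:
        return (t - (m - s - l)) / l
    if m + s < t < m + s + r:
        return ((m + s + r) - t) / r
    return 0.0

def zeta(x, y):
    """Vector-characterizing function of one combined fuzzy data point (x_1*, y_1*):
    minimum t-norm of the individual characterizing functions (hypothetical parameters)."""
    return min(trap_membership(x, 2.0, 0.5, 1.0, 1.0),
               trap_membership(y, 5.0, 1.0, 1.0, 1.0))
```

With n data points and k regressors the same `min` simply runs over all n(k+1) coordinates.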
Based on the vector-characterizing function $\zeta(\cdot,\cdots,\cdot)$ of the combined fuzzy sample $z^*$, the fuzzy value $\ell^*\big(\theta; x_{11}^*,\cdots,x_{nk}^*, y_1^*,\cdots,y_n^*\big)$ of the generalized likelihood function is obtained by the extension principle. In order to apply the generalized Bayes' theorem, the δ-level functions $\underline{\ell}_\delta(\theta; z^*)$ and $\overline{\ell}_\delta(\theta; z^*)$ of the fuzzy likelihood $\ell^*(\theta; z^*)$ are used. Here $\theta = (\theta_0,\theta_1,\cdots,\theta_k)$, and the fuzzy a priori density $\pi^*(\theta)$ has also to be given by its δ-level functions $\underline{\pi}_\delta(\cdot)$ and $\overline{\pi}_\delta(\cdot)$, $\forall\,\delta\in(0;1]$. Now the generalized Bayes' theorem can be applied in order to obtain the fuzzy a posteriori density $\pi^*(\cdot\mid z^*)$ via its δ-level functions $\underline{\pi}_\delta(\cdot\mid z^*)$ and $\overline{\pi}_\delta(\cdot\mid z^*)$, $\forall\,\delta\in(0;1]$. This fuzzy a posteriori density carries the information from the a priori distribution and the data.
22.1
Fuzzy estimators of regression parameters
Based on the fuzzy a posteriori density $\pi^*(\cdot\mid z^*)$ on the parameter space $\Theta\subseteq\mathbb{R}^r$, generalized point estimators for the parameter $\theta = (\theta_1,\cdots,\theta_r)$ can be calculated. Without any loss function the most popular method is to calculate the generalized expectation of $\theta_j$ with respect to $\pi^*(\cdot\mid z^*)$, i.e.

$$\hat{\theta}_j^* = \mathbb{E}_{\pi^*(\cdot\mid z^*)}\,\theta_j = {\textstyle\int\!\!\!\!-}_{\Theta}\ \theta_j\cdot\pi^*(\theta\mid z^*)\,d\theta = {\textstyle\int\!\!\!\!-}_{\mathbb{R}}\ \theta_j\cdot\pi_j^*(\theta_j\mid z^*)\,d\theta_j$$

where $\pi_j^*(\cdot\mid z^*)$ is the fuzzy marginal density

$$\pi_j^*(\theta_j\mid z^*) = {\textstyle\int\!\!\!\!-}_{\mathbb{R}^{r-1}}\ \pi^*(\theta_1,\cdots,\theta_r\mid z^*)\,d\theta_1\cdots d\theta_{j-1}\,d\theta_{j+1}\cdots d\theta_r.$$

The characterizing function $\psi_j(\cdot)$ of $\hat{\theta}_j^*$ is obtained via the construction lemma for characterizing functions. The nested family $\big(A_\delta;\ \delta\in(0;1]\big)$ of subsets of $\mathbb{R}$ is given by

$$A_\delta := \big[\underline{A}_\delta;\ \overline{A}_\delta\big] \quad \forall\,\delta\in(0;1],$$

where

$$\underline{A}_\delta := \int_{\mathbb{R}}\theta_j\cdot\underline{\pi}_{j,\delta}(\theta_j\mid z^*)\,d\theta_j$$
and

$$\overline{A}_\delta := \int_{\mathbb{R}}\theta_j\cdot\overline{\pi}_{j,\delta}(\theta_j\mid z^*)\,d\theta_j.$$

Then $\psi_j(\cdot)$ of $\hat{\theta}_j^*$ is given by its values

$$\psi_j(x) = \sup\big\{\delta\cdot I_{[\underline{A}_\delta;\overline{A}_\delta]}(x) : \delta\in[0;1]\big\} \quad \forall\,x\in\mathbb{R}.$$

Another possibility is to calculate the generalized (fuzzy) median of the fuzzy marginal distribution $\pi_j^*(\cdot\mid z^*)$. The fuzzy median $x_{med}^*$ is obtained from the fuzzy cumulative distribution function $F_j^*(\cdot)$ corresponding to $\pi_j^*(\cdot\mid z^*)$. The fuzzy valued function $F_j^*(\cdot)$ is defined via its δ-cuts $C_\delta[F_j^*(\cdot)]$, which are defined by the δ-level functions $\underline{\pi}_{j,\delta}(\cdot)$ and $\overline{\pi}_{j,\delta}(\cdot)$ of $\pi_j^*(\cdot\mid z^*)$ for all $\delta\in(0;1]$:

$$\overline{F}_{j,\delta}(x) := \sup\Big\{\int_{-\infty}^{x} f(t)\,dt : f(\cdot)\in S_\delta\Big\}, \qquad \underline{F}_{j,\delta}(x) := \inf\Big\{\int_{-\infty}^{x} f(t)\,dt : f(\cdot)\in S_\delta\Big\},$$

where

$$S_\delta := \big\{f(\cdot) : f(\cdot)\ \text{density obeying}\ \underline{\pi}_{j,\delta}(x) \le f(x) \le \overline{\pi}_{j,\delta}(x)\ \forall\,x\in\mathbb{R}\big\}.$$

The generalized median $x_{med}^*$ of $F_j^*(\cdot)$ is the fuzzy number whose characterizing function $\varphi(\cdot)$ is constructed from the nested family of intervals $\big(B_\delta;\ \delta\in(0;1]\big)$, given by $B_\delta = \big[\underline{B}_\delta;\ \overline{B}_\delta\big]$ with

$$\underline{B}_\delta := \overline{F}_{j,\delta}^{-1}(0.5), \qquad \overline{B}_\delta := \underline{F}_{j,\delta}^{-1}(0.5) \qquad \forall\,\delta\in(0;1],$$

and

$$\varphi(x) := \sup\big\{\delta\cdot I_{[\underline{B}_\delta;\overline{B}_\delta]}(x) : \delta\in[0;1]\big\} \quad \forall\,x\in\mathbb{R}.$$
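The construction of a fuzzy estimator from the nested intervals can be sketched numerically: integrate the δ-level functions to obtain [A̲_δ, Ā_δ], then take the supremum of qualifying δ. Here the fuzzy posterior is modeled crudely by scaling a crisp N(1, 1) density by 1 ± (1 − δ)ε, an assumption made purely for illustration:

```python
import numpy as np

theta = np.linspace(-5, 7, 6001)
g = np.exp(-0.5 * (theta - 1.0)**2) / np.sqrt(2 * np.pi)  # crisp N(1,1) posterior density
eps = 0.2                                                 # assumed amount of fuzziness
deltas = np.linspace(0.0, 1.0, 101)

# lower and upper ends of the nested intervals A_delta (expectation of the level functions)
A_lo = np.array([np.trapz(theta * (1 - (1 - d) * eps) * g, theta) for d in deltas])
A_hi = np.array([np.trapz(theta * (1 + (1 - d) * eps) * g, theta) for d in deltas])

def psi(x):
    """Characterizing function via the construction lemma: sup{delta : x in [A_lo, A_hi]}."""
    inside = (A_lo <= x) & (x <= A_hi)
    return deltas[inside].max() if inside.any() else 0.0
```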
For a loss function $L(\cdot,\cdot)$ describing the loss $L(\theta,\hat{\theta})$ which occurs when the parameter value is $\theta$ and it is estimated by $\hat{\theta}$, the corresponding estimator is the value $\hat{\theta}_B$ for which the expectation $\mathbb{E}_\pi L(\tilde{\theta},\hat{\theta})$ is minimized.
In the case of fuzzy loss functions and fuzzy a posteriori distribution π ∗ (· | z ∗ ) a fuzzy expectation is obtained. For details see Sections 18.2, 18.3 and 18.4.
22.2
Generalized Bayesian confidence regions
The generalized (fuzzy) estimators $\hat{\theta}_j^*$ for the regression parameters $\theta_j$ are already fuzzy subsets of the parameter space, generalizing point estimators from standard statistics. In order to obtain generalized confidence sets for regression parameters, fuzzy confidence sets can be constructed based on the explanations in Chapters 16 and 21.
22.3
Fuzzy predictive distributions
Predictive distributions for the dependent variable $y$ in a Bayesian regression model are described in Section 21.4. In the case of fuzzy data the a posteriori distribution is a fuzzy density $\pi^*(\theta\mid z^*)$. The predictive density $p_x(\cdot\mid z)$ from Section 21.4 has to be generalized to the situation of fuzzy data $z^*$. Therefore the definition of the predictive density from Section 21.4, i.e.

$$p_x(y\mid z) = \int_{\Theta} f_x(y\mid\theta)\cdot\pi(\theta\mid z)\,d\theta,$$

has to be adapted to the case of fuzzy data $z^*$. This is possible using the integration of fuzzy valued functions described in Section 3.6. In the case of a fuzzy a posteriori density $\pi^*(\cdot\mid z^*)$, the defining equation for the predictive density becomes

$$p_x^*(y\mid z^*) = {\textstyle\int\!\!\!\!-}_{\Theta}\ f_x(y\mid\theta)\,\pi^*(\theta\mid z^*)\,d\theta \quad \forall\,y\in M_Y.$$

This generalized integration is conducted by integrating the δ-level functions and then applying the construction lemma for characterizing functions. The defining nested family $\big(A_\delta(y);\ \delta\in(0;1]\big)$ of subsets of $\mathbb{R}$ for $p_x^*(y\mid z^*)$, with $A_\delta(y) = \big[\underline{A}_\delta(y);\ \overline{A}_\delta(y)\big]$, is obtained by the following equations:

$$\underline{A}_\delta(y) := \int_{\Theta} f_x(y\mid\theta)\cdot\underline{\pi}_\delta(\theta\mid z^*)\,d\theta \quad\text{and}\quad \overline{A}_\delta(y) := \int_{\Theta} f_x(y\mid\theta)\cdot\overline{\pi}_\delta(\theta\mid z^*)\,d\theta \qquad \forall\,\delta\in(0;1].$$
From this the characterizing function $\rho(\cdot)$ of $p_x^*(y\mid z^*)$ is constructed by

$$\rho(x) := \sup\big\{\delta\cdot I_{A_\delta(y)}(x) : \delta\in[0;1]\big\} \quad \forall\,x\in\mathbb{R}.$$

The δ-level functions $\underline{p}_{x,\delta}(\cdot\mid z^*)$ and $\overline{p}_{x,\delta}(\cdot\mid z^*)$ are determined by

$$\underline{p}_{x,\delta}(y\mid z^*) = \underline{A}_\delta(y) \quad\text{and}\quad \overline{p}_{x,\delta}(y\mid z^*) = \overline{A}_\delta(y) \qquad \forall\,\delta\in(0;1].$$
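The δ-level functions of the fuzzy predictive density are themselves ordinary integrals. A sketch with lower and upper level functions taken as constant multiples 0.8 and 1.2 of a crisp N(1, 1) posterior (a simplifying assumption; real δ-level functions vary with δ):

```python
import numpy as np

theta = np.linspace(-6, 8, 3001)
g = np.exp(-0.5 * (theta - 1.0)**2) / np.sqrt(2 * np.pi)    # crisp posterior density
pi_lo, pi_hi = 0.8 * g, 1.2 * g                             # assumed delta-level functions

yv = np.linspace(-4, 6, 201)
f = np.exp(-0.5 * (yv[:, None] - theta[None, :])**2) / np.sqrt(2 * np.pi)  # f_x(y|theta)
p_lo = np.trapz(f * pi_lo, theta, axis=1)    # lower delta-level function of p_x*(.|z*)
p_hi = np.trapz(f * pi_hi, theta, axis=1)    # upper delta-level function of p_x*(.|z*)
```

By linearity of the integral the two level curves stay ordered pointwise, mirroring the nestedness of the intervals A_δ(y).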
Remark 22.1: For fuzzy multivariate data $(x_i^*; y_i^*)$ the vector-characterizing functions $\zeta_i(\cdots)$ of $x_i^*$ are frequently approximated by finitely many δ-cuts. The characterizing functions $\eta_i(\cdot)$ of $y_i^*$ are often assumed to be of trapezoidal shape. If the data are given in the form $(x_{i1}^*,\cdots,x_{ik}^*; y_i^*)$, the fuzzy values are usually assumed to be fuzzy intervals. Then the δ-cuts of the vector-characterizing function of the combined fuzzy sample are the Cartesian products of the δ-cuts of the individual fuzzy intervals.
22.4
Problems
(a) Let $\pi^*(\cdot)$ be a fuzzy density on $\mathbb{R}^r$ whose δ-level functions are all integrable functions with finite integral, i.e.

$$\int_{\mathbb{R}^r}\overline{\pi}_\delta(x_1,\cdots,x_r)\,dx_1\cdots dx_r < \infty \quad \forall\,\delta\in(0;1].$$

Prove that all marginal densities

$${\textstyle\int\!\!\!\!-}_{\mathbb{R}^{r-1}}\ \pi^*(x_1,\cdots,x_r)\,dx_1\cdots dx_{s-1}\,dx_{s+1}\cdots dx_r$$

are fuzzy densities.

(b) Explain that the definition of the expectation of a fuzzy probability density $f^*(\cdot)$ specializes to the classical expectation for classical densities $f(\cdot)$ with existing expectation.
Part VII

FUZZY TIME SERIES

Many time series are the results of measurement procedures, and therefore the obtained values are more or less fuzzy. Also, many economic time series contain values with remarkable uncertainty. In particular, development data and environmental time series are examples of time series whose values are not precise numbers. If such data are modeled by fuzzy numbers, the resulting time series are called fuzzy time series.

Based on the results from Part I of this book it is necessary to consider time series with fuzzy data. In standard time series analysis the values $x_t$ of a time series $(x_t)_{t\in T}$, where $T$ is the set of time points, i.e. $T = \{1, 2, \cdots, N\}$, are assumed to be real numbers. By the fuzziness of many data, especially all measurement data of continuous quantities, it is necessary to consider time series whose values are fuzzy numbers $x_t^*$. Such time series $(x_t^*)_{t\in T}$ are called fuzzy time series. Mathematically a fuzzy time series is a mapping from the index set $T$ to the set $F(\mathbb{R})$ of fuzzy numbers.

The generalization of time series analysis techniques to the situation of fuzzy values should fulfill the condition that it specializes to the classical techniques in the case of real valued time series.

In this part, first the necessary mathematical techniques are introduced which are used to analyze fuzzy time series. Then descriptive methods of time series analysis are generalized for fuzzy valued time series. Finally, stochastic methods of time series analysis are generalized for fuzzy time series. This part is based on the doctoral dissertation of Hareter (2003).
23
Mathematical concepts

In this chapter the mathematical concepts are described which are used later in this part.
23.1
Support functions of fuzzy quantities
Let $\|\cdot\|$ denote the Euclidean norm, $S^{n-1} = \{u\in\mathbb{R}^n : \|u\| = 1\}$ the unit sphere, and $\langle\cdot,\cdot\rangle$ the inner product in $\mathbb{R}^n$. Then each nonempty compact and convex subset $A\subseteq\mathbb{R}^n$ is characterized by its support function

$$s_A(u) := \sup\{\langle u,a\rangle : a\in A\} \quad \forall\,u\in S^{n-1}.$$

An element $a\in\mathbb{R}^n$ belongs to the set $A$ if and only if $\langle u,a\rangle \le s_A(u)$ for all $u\in S^{n-1}$. By a specialization of Definition 2.3 the δ-cuts of a fuzzy vector $x^*\in F(\mathbb{R}^n)$ are compact and convex subsets of $\mathbb{R}^n$. Therefore $x^*$ is characterized by

$$s_{x^*}(u,\delta) := \sup\{\langle u,a\rangle : a\in C_\delta(x^*)\} \quad \forall\,u\in S^{n-1},\ \forall\,\delta\in(0;1], \tag{23.1}$$

where $s_{x^*}(\cdot,\cdot)$ is called the support function of the fuzzy vector $x^*$. The set of all support functions of $n$-dimensional fuzzy vectors is denoted by $H(\mathbb{R}^n)$, i.e. $H(\mathbb{R}^n) := \{s_{x^*} : x^*\in F(\mathbb{R}^n)\}$. The support function $s_{x^*}(\cdot,\cdot)$ of a fuzzy vector $x^*\in F(\mathbb{R}^n)$ obeys the following [see Diamond and Kloeden (1994, page 242) and Körner (1997a, page 16)]:
(1) For all $\delta\in(0;1]$ the support function $s_{x^*}(\cdot,\delta)$ is continuous, positive homogeneous, i.e.

$$s_{x^*}(\lambda u,\delta) = \lambda\,s_{x^*}(u,\delta) \quad \forall\,\lambda\ge 0,\ \forall\,u\in S^{n-1},$$

and sub-additive, i.e.

$$s_{x^*}(u_1+u_2,\delta) \le s_{x^*}(u_1,\delta) + s_{x^*}(u_2,\delta) \quad \forall\,u_1,u_2\in S^{n-1}.$$

(2) For all $u\in S^{n-1}$ the function $s_{x^*}(u,\cdot)$ is nonincreasing, i.e.

$$s_{x^*}(u,\alpha) \ge s_{x^*}(u,\beta) \quad\text{for}\ 0<\alpha\le\beta\le 1,$$

and left continuous.

Remark 23.1: For every Lebesgue integrable function $f\in L\big(S^{n-1}\times(0;1]\big)$ fulfilling the conditions above there is exactly one fuzzy vector $x^*\in F(\mathbb{R}^n)$ whose δ-cuts $C_\delta(x^*)$ fulfill

$$C_\delta(x^*) = \big\{x\in\mathbb{R}^n : \langle u,x\rangle \le f(u,\delta)\ \forall\,u\in S^{n-1}\big\}$$

and whose support function is $f$ [see Körner and Näther (1998, page 100)].

Example 23.1
The unit sphere in $\mathbb{R}$ is $S^0 = \{-1,1\}$. Therefore the support function of a fuzzy number $x^* = \langle m,s,l,r\rangle_{LR}$ takes only two values:

$$s_{x^*}(u,\delta) = \begin{cases} m+s+r\,R^{(-1)}(\delta) & \text{for } u = 1 \\ -m+s+l\,L^{(-1)}(\delta) & \text{for } u = -1. \end{cases}$$

Later a suitable distance on the set $F(\mathbb{R}^n)$ is needed. This is defined in the next subsection.
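Example 23.1 can be checked directly from the δ-cuts: for an interval, the support function is the upper end at u = 1 and minus the lower end at u = −1. For a trapezoidal fuzzy number, where L⁻¹(δ) = R⁻¹(δ) = 1 − δ, this reproduces the two closed-form values (numbers below are arbitrary test inputs):

```python
def trapezoid_cut(m, s, l, r, delta):
    """delta-cut of t*(m, s, l, r); here L^{-1}(d) = R^{-1}(d) = 1 - d."""
    return (m - s - (1 - delta) * l, m + s + (1 - delta) * r)

def support(cut, u):
    """Support function of an interval cut = [lo, hi] for u in {-1, +1}."""
    lo, hi = cut
    return hi if u == 1 else -lo    # sup{u*a : a in [lo, hi]}

m, s, l, r, d = 3.0, 1.0, 2.0, 0.5, 0.4
cut = trapezoid_cut(m, s, l, r, d)
```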
23.2
Distances of fuzzy quantities
The so-called Hausdorff metric $d_H(\cdot,\cdot)$ on the space $\mathcal{K}^n$ of all nonempty compact and convex subsets of $\mathbb{R}^n$ is defined for $A,B\in\mathcal{K}^n$ by

$$d_H(A,B) := \max\Big\{\sup_{a\in A}\inf_{b\in B}\|a-b\|,\ \sup_{b\in B}\inf_{a\in A}\|a-b\|\Big\}.$$

With this metric the space $(\mathcal{K}^n, d_H)$ is a metric space.
There is the following relationship between this metric and the difference of support functions [see Diamond and Kloeden (1990, page 243)]:

$$d_H(A,B) = \sup_{u\in S^{n-1}}|s_A(u)-s_B(u)|. \tag{23.2}$$

Based on the Hausdorff metric different metrics can be defined on the set $F(\mathbb{R}^n)$. Frequently the following distances are used: for $x^*,y^*\in F(\mathbb{R}^n)$ and $1\le p<\infty$,

$$h_p(x^*,y^*) := \left(\int_0^1 d_H\big(C_\delta(x^*),C_\delta(y^*)\big)^p\,d\delta\right)^{1/p} \tag{23.3}$$

and

$$h_\infty(x^*,y^*) := \sup_{\delta\in(0;1]} d_H\big(C_\delta(x^*),C_\delta(y^*)\big). \tag{23.4}$$
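For compact intervals the Hausdorff distance reduces to max(|a − c|, |b − d|), so h_p can be approximated on a δ-grid. A sketch with two triangular fuzzy numbers of the same shape shifted by 1, for which d_H of the δ-cuts equals 1 for every δ and hence h₁ = h_∞ = 1 (arbitrary test inputs):

```python
import numpy as np

def dH(a, b):
    """Hausdorff distance of two compact intervals a = [a0, a1], b = [b0, b1]."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def tri_cuts(m, l, r, deltas):
    """delta-cuts of a triangular fuzzy number d*(m, l, r)."""
    return [(m - (1 - d) * l, m + (1 - d) * r) for d in deltas]

deltas = np.linspace(1e-3, 1.0, 1000)
A = tri_cuts(0.0, 1.0, 1.0, deltas)
B = tri_cuts(1.0, 1.0, 1.0, deltas)          # the same shape shifted by 1
dists = np.array([dH(a, b) for a, b in zip(A, B)])
h1 = np.trapz(dists, deltas)                  # approximates (23.3) with p = 1
h_inf = dists.max()                           # approximates (23.4)
```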
For details on the spaces $\big(F(\mathbb{R}^n),h_p\big)$ and $\big(F(\mathbb{R}^n),h_\infty\big)$ see Diamond and Kloeden (1990, 1992, 1994), Feng et al. (2001) and Puri and Ralescu (1986). $\big(F(\mathbb{R}^n),h_1\big)$ is complete and separable; $\big(F(\mathbb{R}^n),h_\infty\big)$ is only complete.

Other useful metrics on $F(\mathbb{R}^n)$ can be defined based on the support function $s(\cdot)$ and the metrics on the functional spaces $L^p\big(S^{n-1}\times(0;1]\big)$, $p\in[1;\infty]$. For $x^*,y^*\in F(\mathbb{R}^n)$ and $1\le p<\infty$ the metric $\rho_p(\cdot,\cdot)$ is defined by

$$\rho_p(x^*,y^*) := \|s_{x^*}-s_{y^*}\|_p = \left(n\int_0^1\int_{S^{n-1}}|s_{x^*}(u,\delta)-s_{y^*}(u,\delta)|^p\,\mu(du)\,d\delta\right)^{1/p} \tag{23.5}$$

where $\mu$ denotes the normalized Lebesgue measure on $S^{n-1}$ with $\mu(S^{n-1}) = 1$. For $n = 1$, i.e. $S^{n-1} = S^0 = \{-1,1\}$, we obtain $\mu(\{-1\}) = \mu(\{1\}) = 1/2$. For $p = \infty$ the metric $\rho_\infty(\cdot,\cdot)$ is defined by

$$\rho_\infty(x^*,y^*) := \sup_{\delta\in(0;1]}\ \sup_{u\in S^{n-1}}|s_{x^*}(u,\delta)-s_{y^*}(u,\delta)|,$$

and from Equation (23.2) we obtain

$$\rho_\infty(x^*,y^*) = \sup_{\delta\in(0;1]} d_H\big(C_\delta(x^*),C_\delta(y^*)\big) = h_\infty(x^*,y^*).$$
The corresponding norm is given by

$$\|x^*\|_p := \|s_{x^*}\|_p = \left(n\int_0^1\int_{S^{n-1}}|s_{x^*}(u,\delta)|^p\,\mu(du)\,d\delta\right)^{1/p} \tag{23.6}$$
for $1\le p<\infty$, and

$$\|x^*\|_\infty := \|s_{x^*}\|_\infty = \sup_{\delta\in(0;1]}\ \sup_{u\in S^{n-1}}|s_{x^*}(u,\delta)|$$

for $p=\infty$. For more details on the spaces $\big(F(\mathbb{R}^n),\rho_p\big)$ and $\big(F(\mathbb{R}^n),\rho_\infty\big)$ see Diamond and Kloeden (1994) and Krätschmer (2002). Based on the support function the space $F(\mathbb{R}^n)$ can be embedded in the Banach space $L\big(S^{n-1}\times(0;1]\big)$. This embedding obeys the following proposition:

Proposition 23.1: The mapping $s : x^*\mapsto s_{x^*}$ from the defining equation (23.1) is an isometry from $F(\mathbb{R}^n)$ onto the positive cone $H(\mathbb{R}^n)\subseteq L\big(S^{n-1}\times(0;1]\big)$, with

$$h_p(x^*,y^*) = \begin{cases} \left(\displaystyle\int_0^1\sup_{u\in S^{n-1}}|s_{x^*}(u,\delta)-s_{y^*}(u,\delta)|^p\,d\delta\right)^{1/p} & \text{for } 1\le p<\infty \\ \displaystyle\sup_{\delta\in(0;1]}\ \sup_{u\in S^{n-1}}|s_{x^*}(u,\delta)-s_{y^*}(u,\delta)| & \text{for } p=\infty, \end{cases}$$

and semilinear:

$$s_{\lambda\cdot x^*\oplus\nu\cdot y^*} = \lambda\,s_{x^*} + \nu\,s_{y^*} \quad\text{for}\ \lambda,\nu\ge 0\ \text{and}\ x^*,y^*\in F(\mathbb{R}^n).$$

Moreover,

$$s_{x^*}(u,\delta) \le s_{y^*}(u,\delta)\ \forall\,\delta\in(0;1] \quad\Leftrightarrow\quad C_\delta(x^*)\subseteq C_\delta(y^*)\ \forall\,\delta\in(0;1].$$

For the proof see Körner (1997a, Theorem II.9, page 11) and Körner and Näther (2002, page 30). By the foregoing proposition, addition and scalar multiplication with non-negative numbers in the space $F(\mathbb{R}^n)$ can be done in the functional space $L\big(S^{n-1}\times(0;1]\big)$.

Remark 23.2: By the definition of the difference $\ominus$ and Proposition 23.1, for the support function of the difference $y^*\ominus x^*$ we obtain $s_{y^*\ominus x^*} = s_{y^*\oplus(-x^*)} = s_{y^*} + s_{-x^*}$, where possibly $s_{y^*\ominus x^*}(u,\delta) \ne s_{y^*}(u,\delta) - s_{x^*}(u,\delta)$. For the support function of a linear combination $\lambda\cdot x^*\oplus\nu\cdot y^*$ of two fuzzy vectors $x^*$ and $y^*$, and $\lambda,\nu\in\mathbb{R}$, we obtain

$$s_{\lambda\cdot x^*\oplus\nu\cdot y^*} = |\lambda|\,s_{\mathrm{sign}(\lambda)\cdot x^*} + |\nu|\,s_{\mathrm{sign}(\nu)\cdot y^*}. \tag{23.7}$$
Based on the support function different characteristics of fuzzy vectors can be obtained. The first is the so-called Steiner point.
Definition 23.1: The Steiner point $\sigma_{x^*}\in\mathbb{R}^n$ of a fuzzy vector $x^*$ is given by

$$\sigma_{x^*} := n\int_0^1\int_{S^{n-1}} u\,s_{x^*}(u,\delta)\,\mu(du)\,d\delta.$$

Proposition 23.2: The mapping $\sigma : x^*\mapsto\sigma_{x^*}$ is linear, i.e.

$$\sigma_{\lambda\cdot x^*\oplus\nu\cdot y^*} = \lambda\,\sigma_{x^*} + \nu\,\sigma_{y^*} \quad\text{for}\ \lambda,\nu\in\mathbb{R}\ \text{and}\ x^*,y^*\in F(\mathbb{R}^n).$$

Proof: The linearity follows from (23.7) and

$$\begin{aligned} \sigma_{-x^*} &= n\int_0^1\int_{S^{n-1}} u\,s_{-x^*}(u,\delta)\,\mu(du)\,d\delta \\ &= n\int_0^1\int_{S^{n-1}} u\,\sup\{\langle u,a\rangle : a\in C_\delta(-x^*)\}\,\mu(du)\,d\delta \\ &= n\int_0^1\int_{S^{n-1}} (-u)\,\sup\{\langle -u,a\rangle : a\in C_\delta(-x^*)\}\,\mu(d(-u))\,d\delta \\ &= n\int_0^1\int_{S^{n-1}} (-u)\,\sup\{\langle u,a\rangle : a\in C_\delta(x^*)\}\,\mu(d(-u))\,d\delta = -\sigma_{x^*}. \end{aligned}$$
Remark 23.3: The Steiner point of a classical vector $x\in\mathbb{R}^n$ is the vector itself, i.e. $\sigma_x = x$. From this and Proposition 23.2 we obtain for the fuzzy vector $x_0^* := x^*\ominus\sigma_{x^*}$:

$$\sigma_{x_0^*} = \sigma_{x^*\ominus\sigma_{x^*}} = \sigma_{x^*} - \sigma_{\sigma_{x^*}} = 0.$$

Example 23.2
For a fuzzy number $x^* = \langle m,s,l,r\rangle_{LR}$ of LR-type the Steiner point is given by $\sigma_{x^*} = m + r\,\sigma_{x_R^*} - l\,\sigma_{x_L^*}$.

Proposition 23.3: The Steiner point $\sigma_{x^*}$ of a fuzzy vector $x^*$ fulfills

$$\min_{a\in\mathbb{R}^n}\rho_2(a,x^*) = \rho_2(\sigma_{x^*},x^*).$$

For $x_0^* = x^*\ominus\sigma_{x^*}$ and $y_0^* = y^*\ominus\sigma_{y^*}$ we have

$$\rho_2^2(x^*,y^*) = \rho_2^2(x_0^*,y_0^*) + \|\sigma_{x^*}-\sigma_{y^*}\|_2^2. \tag{23.8}$$

The proof is given in Körner (1997a, Theorem II.16, page 17).

Remark 23.4: For $p = 2$ the space $L^p\big(S^{n-1}\times(0;1]\big)$ is a Hilbert space. Therefore in the following the metric $\rho_2(\cdot,\cdot)$ on $F(\mathbb{R}^n)$ is used frequently. Moreover, only fuzzy
vectors in $F_2(\mathbb{R}^n)$, defined by $F_2(\mathbb{R}^n) := \{x^*\in F(\mathbb{R}^n) : \|x^*\|_2 < \infty\}$, will be considered in order to guarantee the existence of the metric $\rho_2(\cdot,\cdot)$. The space of all support functions $s_{x^*}(\cdot,\cdot)$ with $x^*\in F_2(\mathbb{R}^n)$ will be denoted by $H_2(\mathbb{R}^n)$.

Proposition 23.4: The space $\big(F_2(\mathbb{R}^n),\rho_2\big)$ is isomorphic to the closed and convex cone $H_2(\mathbb{R}^n)$ of the Hilbert space $L^2\big(S^{n-1}\times(0;1]\big)$ with the inner product

$$\langle x^*,y^*\rangle := \langle s_{x^*},s_{y^*}\rangle = n\int_0^1\int_{S^{n-1}} s_{x^*}(u,\delta)\,s_{y^*}(u,\delta)\,\mu(du)\,d\delta. \tag{23.9}$$
For the proof see Körner (1997a, page 15). The closedness of the cone which is isomorphic to $\big(F_2(\mathbb{R}^n),\rho_2\big)$ is important for least sum of squares approximations in linear models.

Proposition 23.5: The inner product defined by (23.9) obeys, for $x^*,y^*,z^*\in F_2(\mathbb{R}^n)$ and $\lambda\in\mathbb{R}_0^+$, the following:

(1) $\langle x^*,x^*\rangle \ge 0$ and $\langle x^*,x^*\rangle = 0 \Leftrightarrow x^* = 0$.
(2) $\langle x^*,y^*\rangle = \langle y^*,x^*\rangle$.
(3) $\langle x^*\oplus y^*,z^*\rangle = \langle x^*,z^*\rangle + \langle y^*,z^*\rangle$.
(4) $\langle\lambda\cdot x^*,y^*\rangle = \lambda\langle x^*,y^*\rangle$.
(5) $|\langle x^*,y^*\rangle| \le \sqrt{\langle x^*,x^*\rangle\,\langle y^*,y^*\rangle}$.

Moreover,

$$\rho_2(x^*,y^*) = \sqrt{\langle x^*,x^*\rangle - 2\langle x^*,y^*\rangle + \langle y^*,y^*\rangle}. \tag{23.10}$$

The proof follows from the definition of the inner product; see also Feng (2001, page 12). Similar to real vectors we have the following relationships for fuzzy vectors:

$$\langle x^*,y^*\rangle = \tfrac{1}{2}\big(\|x^*\oplus y^*\|_2^2 - \|x^*\|_2^2 - \|y^*\|_2^2\big) \quad\text{and}\quad \|x^*\|_2^2 = \langle x^*,x^*\rangle,$$

which is seen by application of the definition of the inner product (23.9) and the norm (23.6).

Example 23.3
For the inner product $\langle x^*,y^*\rangle$ of two fuzzy numbers $x^* = \langle m_x,s_x,l_x,r_x\rangle_{LR}$ and $y^* = \langle m_y,s_y,l_y,r_y\rangle_{LR}$ of LR-form we obtain:
$$\langle x^*,y^*\rangle = m_x m_y + s_x s_y + \|x_L^*\|_2^2\,l_x l_y + \|x_R^*\|_2^2\,r_x r_y + \|x_L^*\|_1\,(-m_x l_y - m_y l_x + s_x l_y + s_y l_x) + \|x_R^*\|_1\,(m_x r_y + m_y r_x + s_x r_y + s_y r_x)$$

where

$$\|x_L^*\|_2^2 = \frac{1}{2}\int_0^1\big(L^{(-1)}(\delta)\big)^2\,d\delta, \qquad \|x_L^*\|_1 = \frac{1}{2}\int_0^1 L^{(-1)}(\delta)\,d\delta,$$

and $\|x_R^*\|_2^2$ and $\|x_R^*\|_1$ are defined analogously. For the distance $\rho_2(x^*,y^*)$ of these fuzzy numbers we obtain:

$$\rho_2^2(x^*,y^*) = (m_x-m_y)^2 + (s_x-s_y)^2 + \|x_L^*\|_2^2\,(l_x-l_y)^2 + \|x_R^*\|_2^2\,(r_x-r_y)^2 - 2\|x_L^*\|_1\big[(m_x-m_y)-(s_x-s_y)\big](l_x-l_y) + 2\|x_R^*\|_1\big[(m_x-m_y)+(s_x-s_y)\big](r_x-r_y).$$

In the case of trapezoidal fuzzy numbers $t^*(m,s,l,r)$ we have $L^{(-1)}(\delta) = R^{(-1)}(\delta) = 1-\delta$ and

$$\|x_L^*\|_1 = \|x_R^*\|_1 = \frac{1}{2}\int_0^1(1-\delta)\,d\delta = \frac{1}{4}, \qquad \|x_L^*\|_2^2 = \|x_R^*\|_2^2 = \frac{1}{2}\int_0^1(1-\delta)^2\,d\delta = \frac{1}{6}.$$
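The constants 1/4 and 1/6 make the trapezoidal distance fully explicit, and Example 23.3's closed form for ρ₂² can be cross-checked against a direct discretization of the support functions (n = 1, with μ({−1}) = μ({1}) = 1/2). All parameter values below are arbitrary test inputs:

```python
import numpy as np

def rho2_sq_closed(x, y):
    """Closed-form rho_2^2 for trapezoidal fuzzy numbers given as tuples (m, s, l, r)."""
    dm, ds = x[0] - y[0], x[1] - y[1]
    dl, dr = x[2] - y[2], x[3] - y[3]
    return (dm*dm + ds*ds + dl*dl / 6 + dr*dr / 6
            - 2 * 0.25 * (dm - ds) * dl + 2 * 0.25 * (dm + ds) * dr)

def rho2_sq_numeric(x, y, n=20000):
    """Discretized 0.5 * integral over delta of the two support-function differences squared."""
    d = (np.arange(n) + 0.5) / n                    # midpoint rule on (0, 1]
    def s(f, u):                                    # support function of t*(m, s, l, r)
        m, sp, l, r = f
        return m + sp + r * (1 - d) if u == 1 else -m + sp + l * (1 - d)
    return 0.5 * np.mean((s(x, 1) - s(y, 1))**2 + (s(x, -1) - s(y, -1))**2)

x = (3.0, 1.0, 2.0, 0.5)
y = (2.0, 0.5, 1.0, 1.5)
```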
23.3

Generalized Hukuhara difference
The generalized difference operation $\ominus$ for fuzzy numbers via the extension principle increases the fuzziness. Even for symmetric fuzzy numbers $x^*$ in LR-form with $L(\cdot) = R(\cdot)$, $l = r$ and $m = 0$ we obtain $x^*\ominus x^* = x^*\oplus(-x^*) \ne 0$. In general, for given $x^*\oplus y^*$ and given $x^*$, the vector $y^*$ cannot be recovered as $(x^*\oplus y^*)\ominus x^*$. Moreover the difference $\ominus$ does not correspond to the difference in the functional space $L^2\big(S^{n-1}\times(0;1]\big)$, i.e. the support function of $y^*\ominus x^*$ is not equal to $s_{y^*} - s_{x^*}$. In order to find a difference which corresponds more closely to the usual difference of numbers, the following observation is helpful: the difference $x - y$ of two real numbers $x$ and $y$ is the real number $z$ for which $y + z = x$. This leads to the so-called Hukuhara difference.
Definition 23.2: For $x^*\in F_2(\mathbb{R}^n)$ and $y^*\in F_2(\mathbb{R}^n)$ the so-called Hukuhara difference $y^*\ominus_H x^*$ is, if it exists, the solution $h^*\in F_2(\mathbb{R}^n)$ of the equation $x^*\oplus h^* = y^*$.

Remark 23.5: In Diamond and Kloeden (1994) the following is proved: if the Hukuhara difference of two fuzzy numbers $x^*$ and $y^*$ exists in $L^2\big(S^{n-1}\times(0;1]\big)$, then for the corresponding support functions it follows that $s_{y^*\ominus_H x^*} = s_{y^*} - s_{x^*}$, and the δ-cuts of $y^*\ominus_H x^*$ are given by

$$C_\delta(y^*\ominus_H x^*) = \big\{z\in\mathbb{R}^n : C_\delta(x^*) + z \subseteq C_\delta(y^*)\big\} \quad \forall\,\delta\in(0;1].$$

The Hukuhara difference need not exist for all $x^*,y^*\in F_2(\mathbb{R}^n)$. If $\mathrm{supp}(y^*)\subset\mathrm{supp}(x^*)$, no element $h^*\in F_2(\mathbb{R}^n)$ obeying $x^*\oplus h^* = y^*$ is possible.

Example 23.4
The Hukuhara difference $y^*\ominus_H x^*$ of two symmetric triangular fuzzy numbers $x^* = d^*(m_x,l_x,l_x)$ and $y^* = d^*(m_y,l_y,l_y)$ with $l_y < l_x$ does not exist. In the case of $l_y \ge l_x$ the Hukuhara difference $y^*\ominus_H x^*$ is the triangular fuzzy number $d^*(m_y-m_x,\ l_y-l_x,\ l_y-l_x)$.

By the above results it is necessary to construct a more general difference for fuzzy vectors. This is possible by a generalization of the Hukuhara difference.

Definition 23.3: For $x^*,y^*\in F_2(\mathbb{R}^n)$ the generalized Hukuhara difference $\ominus_{\sim H}$ is the $L^2$-approximation of the Hukuhara difference, i.e.

$$y^*\ominus_{\sim H} x^* := h^* \quad\Leftrightarrow\quad \rho_2(x^*\oplus h^*,\,y^*) = \min_{z^*\in F_2(\mathbb{R}^n)}\rho_2(x^*\oplus z^*,\,y^*). \tag{23.11}$$

The generalized Hukuhara difference $\ominus_{\sim H}$ is an extension of the Hukuhara difference $\ominus_H$ to the whole space $F_2(\mathbb{R}^n)$, i.e. if $y^*\ominus_H x^*$ exists it is identical to $y^*\ominus_{\sim H} x^*$.

Remark 23.6: For the generalized Hukuhara difference usually $(x^*\ominus_{\sim H} y^*)\oplus y^* \ne x^*$, i.e. the generalized Hukuhara difference is not a difference in $F_2(\mathbb{R}^n)$ in the standard sense. The inequality holds if $y^*$ is more fuzzy than $x^*$; in this situation the result $x^*\ominus_{\sim H} y^*$ is a real number. But for known sum $x^*\oplus y^*$ and given $x^*$, the generalized Hukuhara difference $(x^*\oplus y^*)\ominus_{\sim H} x^*$ is the best possible value for the unknown $y^*$. Contrary to the operation $\ominus$, for the operations $\ominus_H$ and $\ominus_{\sim H}$ the fuzziness of the result is less than the fuzziness of the minuend; moreover, by $x^*\oplus 0 = x^*$ we have $x^*\ominus_H x^* = x^*\ominus_{\sim H} x^* = 0$.

Remark 23.7: For real numbers we have $y\ominus x = y\ominus_{\sim H} x = y - x$. For fuzzy numbers and fuzzy vectors with given sum and one summand, the other summand is best calculated by the generalized Hukuhara difference. The existence and uniqueness of $y^*\ominus_{\sim H} x^*$ follows from the projection theorem in Hilbert spaces.
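Example 23.4 gives the Hukuhara difference of symmetric triangular fuzzy numbers in closed form; a tiny helper with the existence check (a sketch, not library code):

```python
def hukuhara_sym_triangular(x, y):
    """Hukuhara difference y -_H x for symmetric triangular numbers d*(m, l, l),
    represented as pairs (m, l). Returns None when the difference does not exist."""
    (mx, lx), (my, ly) = x, y
    if ly < lx:
        return None              # supp(y*) too narrow: no h* with x* + h* = y*
    return (my - mx, ly - lx)

h = hukuhara_sym_triangular((2.0, 1.0), (5.0, 3.0))   # d*(3, 2, 2)
```

Indeed x* ⊕ h* = y* here: adding d*(2, 1, 1) and d*(3, 2, 2) gives d*(5, 3, 3), since centers and spreads add under ⊕.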
Theorem 23.1: Let $(H,\|\cdot\|)$ be a Hilbert space, $x\in H$ and $K\ne\emptyset$ a closed convex subset of $H$. Then there exists exactly one element $y\in K$ obeying

$$\|x-y\| = \inf_{z\in K}\|x-z\|.$$

The proof is given in books on functional analysis.

By Proposition 23.4 the space $\big(F_2(\mathbb{R}^n),\rho_2\big)$ is isomorphic to the closed convex cone $H_2(\mathbb{R}^n)$ in the Hilbert space $L^2\big(S^{n-1}\times(0;1]\big)$. The approximation problem (23.11) for the calculation of the generalized Hukuhara difference of two fuzzy vectors $x^*,y^*\in F_2(\mathbb{R}^n)$ can be transformed to an equivalent minimization problem in $L^2\big(S^{n-1}\times(0;1]\big)$. This equivalent problem is

$$\min_{s_{z^*}\in H_2(\mathbb{R}^n)}\big\|(s_{x^*}+s_{z^*}) - s_{y^*}\big\|_2 = \min_{s_{z^*}\in H_2(\mathbb{R}^n)}\big\|(s_{y^*}-s_{x^*}) - s_{z^*}\big\|_2.$$

The support function $s_{y^*\ominus_{\sim H}x^*}$ of the generalized Hukuhara difference is the unique orthogonal projection of the function $s_{y^*}-s_{x^*}$ onto the cone of support functions $H_2(\mathbb{R}^n)$. Another important property of the generalized Hukuhara difference $\ominus_{\sim H}$ is the following:

$$\sigma_{y^*\ominus_{\sim H}x^*} = \sigma_{y^*} - \sigma_{x^*}.$$

In general the calculation of the generalized Hukuhara difference can be complicated. For some special cases the result can be given in closed form. An important tool for this is the Kuhn–Tucker theorem.

Theorem 23.2 (Kuhn–Tucker): Let $C$ be a symmetric positive semidefinite matrix. Then the solution of the quadratic optimization problem

$$\min_{Ax\le a}\ (x-c)^T C\,(x-c)$$

fulfills the so-called Kuhn–Tucker condition

$$2C(x-c) + A^T u = 0, \qquad Ax + y = a, \qquad u\ge 0,\quad y\ge 0,\quad u^T y = 0.$$
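The Kuhn–Tucker condition can be verified on a toy quadratic program (constructed for illustration, not from the book): minimize (x − c)ᵀC(x − c) subject to x₁ ≤ 1, whose solution clips the first coordinate to the boundary.

```python
import numpy as np

C = np.diag([2.0, 1.0])               # symmetric positive definite
c = np.array([3.0, 1.0])              # unconstrained minimizer
A = np.array([[1.0, 0.0]])            # single constraint: x1 <= 1
a = np.array([1.0])

x = np.array([1.0, 1.0])              # candidate solution on the boundary
y = a - A @ x                         # slack variable, y >= 0
u = np.array([2 * C[0, 0] * (c[0] - x[0])])   # multiplier solving the first KT equation
kt_residual = 2 * C @ (x - c) + A.T @ u       # should vanish at the optimum
```

Here u = 8 ≥ 0, y = 0 and uᵀy = 0, so all Kuhn–Tucker conditions hold and, C being positive definite, x is the minimizer.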
The proof is given in Luptáčik (1981, Theorem IV.C, page 118). A special case where the generalized Hukuhara difference can be given in closed form is the generalized Hukuhara difference of triangular fuzzy numbers. For trapezoidal fuzzy numbers this is not possible in a simple way; an example is the following.
Figure 23.1 Generalized Hukuhara difference of two trapezoidal fuzzy numbers.
Example 23.5
Let $x^*$ and $y^*$ be the two trapezoidal fuzzy numbers $x^* = t^*(6,2,1,1)$ and $y^* = t^*(6,1,3,3)$. Then $h^* = y^*\ominus_{\sim H} x^*$ has a characterizing function of polygonal type (cf. Figure 23.1).

In many cases the generalized Hukuhara difference $y^*\ominus_{\sim H} x^*$ of two trapezoidal fuzzy numbers $x^* = t^*(m_x,s_x,l_x,r_x)$ and $y^* = t^*(m_y,s_y,l_y,r_y)$ is again a trapezoidal fuzzy number. But if $s_x > s_y$ and $2s_x + l_x + r_x < 2s_y + l_y + r_y$, as in this example (i.e. the support of $x^*$ is less fuzzy than the support of $y^*$ while the 1-cut of $x^*$ has larger uncertainty), the result is a polygonal fuzzy number

$$h^* = \{(x_1,0),\,(x_2,y_2),\,(x_3,1),\,(x_4,1),\,(x_5,y_5),\,(x_6,0)\}.$$

The determination of the numbers $x_1,\cdots,x_6,\ y_2$ and $y_5$ is done by

$$\min_{x_1,\dots,x_6,\,y_2,\,y_5}\ \rho_2(x^*\oplus h^*,\,y^*)$$

under the conditions $x_i \le x_{i+1}$, $i = 1(1)5$, and $0 \le y_2, y_5 \le 1$. With the abbreviations $f_1 = m_x - m_y - s_x + s_y$ and $f_2 = m_x - m_y + s_x - s_y$ the distance is calculated in the following way:

$$\begin{aligned} \rho_2^2(x^*\oplus h^*,\,y^*) &= \int_0^{y_2}\Big(f_1 - (1-\delta)(l_x-l_y) + x_1 + \delta\,\tfrac{x_2-x_1}{y_2}\Big)^2 d\delta \\ &\quad + \int_{y_2}^{1}\Big(f_1 + x_3 + (1-\delta)\big(l_y-l_x+\tfrac{x_2-x_3}{1-y_2}\big)\Big)^2 d\delta \\ &\quad + \int_0^{y_5}\Big(f_2 + (1-\delta)(r_x-r_y) + x_6 + \delta\,\tfrac{x_5-x_6}{y_5}\Big)^2 d\delta \\ &\quad + \int_{y_5}^{1}\Big(f_2 + x_4 + (1-\delta)\big(r_x-r_y+\tfrac{x_5-x_4}{1-y_5}\big)\Big)^2 d\delta. \end{aligned}$$
For this minimization problem the methods of nonlinear programming can be used. For details see Luptáčik (1981).

The Hukuhara difference of two polygonal fuzzy numbers is again of polygonal type. The computational complexity increases exponentially with the number of points describing the polygons. In applications the complexity of the calculation can be reduced by considering only a limited number of δ-cuts. Another possibility to reduce the complexity is to restrict the solution space for the projection.

Remark 23.8: If the solution space for the Hukuhara difference h* = y* ∼_H x* of two trapezoidal fuzzy numbers x* = t*(m_x, s_x, l_x, r_x) and y* = t*(m_y, s_y, l_y, r_y) is restricted to the set of trapezoidal fuzzy numbers, the parameters m_h, s_h, l_h and r_h of the solution h* = t*(m_h, s_h, l_h, r_h) can be calculated from the differences s̃_h := s_y − s_x, l̃_h := l_y − l_x, r̃_h := r_y − r_x and the quantities r₁ := ‖x*_R‖₁, l₁ := ‖x*_L‖₁, r₂ := ‖x*_R‖₂² and l₂ := ‖x*_L‖₂² in the following way:

s_h = max{ s̃_h, 0 }                        for l̃_h, r̃_h ≥ 0
s_h = max{ s̃_h + r₁ r̃_h, 0 }               for l̃_h ≥ 0, r̃_h < 0
s_h = max{ s̃_h + l₁ l̃_h, 0 }               for l̃_h < 0, r̃_h ≥ 0
s_h = max{ s̃_h + l₁ l̃_h + r₁ r̃_h, 0 }      for l̃_h, r̃_h < 0

l_h = max{ l̃_h, 0 }                                                              for s̃_h, r̃_h ≥ 0
l_h = max{ l̃_h + l₁ r₁/(l₂ − l₁²) · r̃_h, 0 }                                     for s̃_h ≥ 0, r̃_h < 0
l_h = max{ l̃_h + l₁ (r₂ − 2 r₁²)/((r₂ − r₁²)(l₂ − l₁²) − l₁² r₁²) · s̃_h, 0 }     for s̃_h < 0, r̃_h ≥ 0
l_h = max{ l̃_h + l₁ (s̃_h + r₁ r̃_h)/(l₂ − l₁²), 0 }                              for s̃_h, r̃_h < 0

r_h = max{ r̃_h, 0 }                                                              for s̃_h, l̃_h ≥ 0
r_h = max{ r̃_h + l₁ r₁/(r₂ − r₁²) · l̃_h, 0 }                                     for s̃_h ≥ 0, l̃_h < 0
r_h = max{ r̃_h + r₁ (l₂ − 2 l₁²)/((r₂ − r₁²)(l₂ − l₁²) − l₁² r₁²) · s̃_h, 0 }     for s̃_h < 0, l̃_h ≥ 0
r_h = max{ r̃_h + r₁ (s̃_h + l₁ l̃_h)/(r₂ − r₁²), 0 }                              for s̃_h, l̃_h < 0

and

m_h = m_y − m_x + r₁ (r̃_h − r_h) − l₁ (l̃_h − l_h).
The proof is given in Munk (1988, page 33). For the examples in the following chapter the solution space for the calculations of Hukuhara differences of two trapezoidal fuzzy numbers is restricted to the set of trapezoidal fuzzy numbers, and the calculations of Example 23.3 are used.
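The restriction to trapezoidal fuzzy numbers can be made concrete in a small computation. The following sketch is illustrative only (the class `Trapezoid` and the validity check are not from the book); it uses the fact that the Hukuhara difference h* = y* ∼_H x* is characterized by x* ⊕ h* = y*, and that ⊕ acts componentwise on the parameters (m, s, l, r) of trapezoidal fuzzy numbers:

```python
from dataclasses import dataclass

@dataclass
class Trapezoid:
    """Trapezoidal fuzzy number t*(m, s, l, r):
    1-cut [m - s, m + s], support [m - s - l, m + s + r]."""
    m: float
    s: float
    l: float
    r: float

def add(x: Trapezoid, y: Trapezoid) -> Trapezoid:
    """Minkowski sum of delta-cuts: componentwise for trapezoids."""
    return Trapezoid(x.m + y.m, x.s + y.s, x.l + y.l, x.r + y.r)

def hukuhara_diff(x: Trapezoid, y: Trapezoid):
    """h with x + h = y, if it exists as a trapezoid, else None
    (then only the generalized difference, a projection, exists)."""
    h = Trapezoid(y.m - x.m, y.s - x.s, y.l - x.l, y.r - x.r)
    if h.s >= 0 and h.l >= 0 and h.r >= 0:
        return h
    return None
```

For the pair of Example 23.5, x* = t*(6, 2, 1, 1) and y* = t*(6, 1, 3, 3), the spread difference s_y − s_x = −1 is negative, so no trapezoidal Hukuhara difference exists and a projection as in Remark 23.8 is needed.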
P1: SBT c24 JWST020-Viertl
November 15, 2010
11:14
Printer Name: Yet to Come
24
Descriptive methods for fuzzy time series

Descriptive methods of time series analysis work without stochastic models. The goal of these methods is to find trends and seasonal influences in time series by elementary methods. A short survey of such methods for standard time series is given in Janacek (2001). A fuzzy time series (x_t*)_{t∈T} is an ordered sequence of fuzzy numbers, where usually T = {1, 2, …, N}. Formally a one-dimensional fuzzy time series is a mapping T → F(R) which gives for any time point t a fuzzy number x_t*. Classical time series are special cases of fuzzy time series. A reasonable generalization should reproduce the classical results in the case of classical data. This is guaranteed if the extension principle is used. For the approximation by a polynomial trend in the case of fuzzy time series see Körner (1997a), Körner and Näther (1998), Munk (1998), Näther and Albrecht (1990).
24.1
Moving averages
In order to obtain an overview concerning the long-run behavior of a time series it is useful to eliminate the random oscillations of an observed time series (x_t*)_{t∈T}. This elimination is possible by local approximation. A simple way to do this is by smoothing the values of the time series by a local arithmetic mean. This smoothing can be taken from classical time series analysis, i.e.

y_t = 1/(2q+1) · Σ_{i=−q}^{q} x_{t+i},   t = q+1 (1) N−q,

and application of the extension principle in the case of fuzzy data x_t*. For that, first the fuzzy numbers x*_{t−q}, …, x*_{t+q} have to be combined into a fuzzy vector x* ∈ F(R^{2q+1}) with vector-characterizing function ξ_{x*}(·, …, ·). From this the fuzzy value y_t* and its characterizing function ξ_{y_t*}(·) is given by its values ξ_{y_t*}(y) for all y ∈ R by

ξ_{y_t*}(y) = sup{ ξ_{x*}(x_{−q}, …, x_q) : (x_{−q}, …, x_q) ∈ R^{2q+1}, 1/(2q+1) · Σ_{i=−q}^{q} x_i = y }.

The δ-cuts C_δ(y_t*) are given by Theorem 3.1:

C_δ(y_t*) = [ min_{(x_{−q},…,x_q)∈C_δ(x*)} 1/(2q+1) · Σ_{i=−q}^{q} x_i ,  max_{(x_{−q},…,x_q)∈C_δ(x*)} 1/(2q+1) · Σ_{i=−q}^{q} x_i ].

The result can be written in short as

y_t* = 1/(2q+1) · (x*_{t−q} ⊕ … ⊕ x*_{t+q}),   t = q+1 (1) N−q.

Statistical Methods for Fuzzy Data, Reinhard Viertl
© 2011 John Wiley & Sons, Ltd
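For trapezoidal values the fuzzy arithmetic mean can be computed componentwise, since ⊕ and multiplication by the positive constant 1/(2q+1) act componentwise on the parameters (m, s, l, r). A minimal sketch, with trapezoids represented as plain 4-tuples (a representation chosen here only for illustration):

```python
def moving_average(series, q):
    """Ordinary fuzzy moving average for a list of trapezoidal
    fuzzy numbers given as (m, s, l, r) tuples."""
    n = len(series)
    out = []
    for t in range(q, n - q):
        window = series[t - q:t + q + 1]          # the 2q + 1 neighboring values
        out.append(tuple(sum(v[k] for v in window) / (2 * q + 1)
                         for k in range(4)))      # componentwise arithmetic mean
    return out
```

As noted below, the smoothed series is shorter: it has 2q fewer elements than the observed series.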
For fuzzy observations (x_t*)_{t∈T} with approximately linear behavior, i.e. x_t* ≈ a* ⊕ t · b*, and corresponding δ-cuts of the coefficients C_δ(a*) = [a_δ, ā_δ] and C_δ(b*) = [b_δ, b̄_δ], respectively, the δ-cuts C_δ(x_t*) can be written, using random variables ε_{t,δ} and ε̄_{t,δ}, in the following way:

C_δ(x_t*) = [ a_δ + b_δ t + ε_{t,δ} , ā_δ + b̄_δ t + ε̄_{t,δ} ].

The random variables (ε_{t,δ})_{t∈T} and (ε̄_{t,δ})_{t∈T} can be considered as independent relative to the time, obeying E ε_{t,δ} = 0 and E ε̄_{t,δ} = 0 with finite variances Var ε_{t,δ} and Var ε̄_{t,δ}. For fixed time t ∈ T the random variables ε_{t,δ} and ε̄_{t,δ} are dependent. For the δ-cuts C_δ(y_t*) of the mean value y_t* we obtain

C_δ(y_t*) = [ min_{(x_{−q},…,x_q)∈C_δ(x*)} 1/(2q+1) · Σ_{i=−q}^{q} x_i , max_{(x_{−q},…,x_q)∈C_δ(x*)} 1/(2q+1) · Σ_{i=−q}^{q} x_i ]
 = [ 1/(2q+1) · Σ_{i=−q}^{q} (a_δ + b_δ (t+i) + ε_{t+i,δ}) , 1/(2q+1) · Σ_{i=−q}^{q} (ā_δ + b̄_δ (t+i) + ε̄_{t+i,δ}) ]
 = [ a_δ + b_δ t + 1/(2q+1) · Σ_{i=−q}^{q} ε_{t+i,δ} , ā_δ + b̄_δ t + 1/(2q+1) · Σ_{i=−q}^{q} ε̄_{t+i,δ} ]
 ≈ [ a_δ + b_δ t + E ε_{t,δ} , ā_δ + b̄_δ t + E ε̄_{t,δ} ] = [ a_δ + b_δ t , ā_δ + b̄_δ t ].
Figure 24.1 Moving averages of length 2.
For strictly linear time series, as in classical time series analysis, the moving averages are identical to the original time series. Moving average procedures are a kind of filtering of time series. The filtered time series is smoother if more observations are used for the filtration, i.e. the larger q is. But at the boundaries of T it is not possible to obtain filtered values. The smoothed time series (y_t*) is therefore shorter than the original time series (x_t*)_{t∈T}.

Example 24.1 Figure 24.1 gives a fuzzy time series with moving averages of length 2. Here T = {1, 2, …, 15}. The solid trapezoids are the characterizing functions of the original values (x_t*)_{t=1(1)15} = (t*(m_t, s_t, l_t, r_t))_{t=1(1)15}. The support of each individual measurement is the basis of the corresponding trapezoid. The characterizing functions of the values of the smoothed time series (y_t*)_{t=3(1)13} are depicted by broken lines.
24.2
Filtering
Filtering or filtration of time series refers to procedures which transform the values of a time series. Moving averages are special forms of filtering procedures.
24.2.1
Linear filtering
Moving averages are examples of linear transformations of fuzzy time series (x t∗ )t∈T to a smoothed time series (yt∗ ). A general method from classical time series analysis can be adapted for fuzzy time series. This generalization is given in the following definition.
Definition 24.1: A transformation L = (a_{−q}, …, a_s) ∈ R^{s+q+1} of a time series (x_t*)_{t∈T} to a time series (y_t*)_{t=q+1(1)N−s} by

y_t* = L x_t* = ⊕_{i=−q}^{s} a_i · x*_{t+i},   t = q+1 (1) N−s,

is called a linear filter. A linear filter L with Σ_{i=−q}^{s} a_i = 1 is called a moving average. If s = q with a_i = (2q+1)^{−1}, i = −q (1) q, it is called an ordinary moving average.

Remark 24.1: By the following rules for fuzzy numbers x*, y*, z* ∈ F(R) and real numbers α, β, γ ∈ R,

α · (β · x* ⊕ γ · y*) = (α β) · x* ⊕ (α γ) · y*
(α · x* ⊕ β · y*) ⊕ γ · z* = α · x* ⊕ (β · y* ⊕ γ · z*),

for two fuzzy time series (x_t*)_{t∈T} and (y_t*)_{t∈T} and α, β ∈ R we obtain the following:

L(α · x_t* ⊕ β · y_t*) = ⊕_{i=−q}^{s} a_i · (α · x*_{t+i} ⊕ β · y*_{t+i})
 = ⊕_{i=−q}^{s} [ (a_i α) · x*_{t+i} ⊕ (a_i β) · y*_{t+i} ]
 = ( ⊕_{i=−q}^{s} (a_i α) · x*_{t+i} ) ⊕ ( ⊕_{i=−q}^{s} (a_i β) · y*_{t+i} ) = α · L x_t* ⊕ β · L y_t*   (24.1)
Proposition 24.1: The δ-cuts C_δ(y_t*) of the smoothed time series y_t* = L x_t* can be calculated from the δ-cuts C_δ(x_t*) = [x_{t,δ}, x̄_{t,δ}] of the observed time series (x_t*)_{t∈T} by

C_δ(y_t*) = C_δ( ⊕_{i=−q}^{s} a_i · x*_{t+i} ) = [ Σ_{i=−q}^{s} a_i c_{t+i} , Σ_{i=−q}^{s} a_i c̄_{t+i} ]

with

c_{t+i} = x_{t+i,δ} for a_i ≥ 0,  x̄_{t+i,δ} for a_i < 0

and

c̄_{t+i} = x̄_{t+i,δ} for a_i ≥ 0,  x_{t+i,δ} for a_i < 0.
Proof: The assertion follows by applying Theorem 3.1 with f(x_{−q}, …, x_s) = Σ_{i=−q}^{s} a_i x_i and

min_{x∈C_δ(x*)} a x = a x_δ for a ≥ 0,   min_{x∈C_δ(x*)} a x = a x̄_δ for a < 0.

A prediction x̂*_{N_f} for a time N_f > N is given by x̂*_{N_f} = m̂*_{N_f}, i.e. the value of the regression trend.
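Proposition 24.1 translates directly into a computation on δ-cuts at one fixed δ: a negative filter coefficient swaps the roles of the interval endpoints. A hedged sketch (the list-based representation and the helper name are illustrative, not from the book):

```python
def linear_filter_cuts(cuts, weights, q):
    """delta-cuts of y_t* = sum_i a_i * x*_{t+i} at one fixed delta.

    cuts    -- list of (lower, upper) delta-cuts of x_1*, ..., x_N*
    weights -- the filter (a_{-q}, ..., a_s) as a plain list
    q       -- number of past values, so weights[j] corresponds to a_{j-q}
    """
    s = len(weights) - 1 - q
    out = []
    for t in range(q, len(cuts) - s):       # t = q+1 (1) N-s, 0-based here
        lo = up = 0.0
        for j, a in enumerate(weights):
            xl, xu = cuts[t + j - q]
            lo += a * (xl if a >= 0 else xu)   # endpoint chosen by sign of a_i
            up += a * (xu if a >= 0 else xl)
        out.append((lo, up))
    return out
```

With weights (−1, 1) this reproduces interval subtraction of consecutive δ-cuts, the classical special case of a difference filter.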
24.4.2
Model with seasonal component
For this kind of model the calculation of the components m_t* or s_t* is far more complicated than in the classical case. The problems come from the semilinear structure of F(R). In the classical situation moving averages over a period are used to remove seasonal effects. As estimate of the seasonal component for a time t the difference between the observed value x_t and the moving average is used. In the case of fuzzy data this method is not suitable. First, the fuzziness of the individual values increases when sums are formed. Secondly, an estimation by the Hukuhara difference between the observed value x_t* and the moving average is problematic, because for some time points the fuzziness of the observation is smaller than the fuzziness of the mean value. Using the Hukuhara difference would then yield a precise number as an estimate.
A method which reduces the determination of the fuzzy components to the methods of classical time series is the following: The seasonal influence on the fuzziness of the observations (x_t*)_{t∈T} is recognized in the lengths of the δ-cuts C_δ(x_t*) = [x_{t,δ}, x̄_{t,δ}]. The lengths

l_{t,δ} = l_{C_δ(x_t*)} = x̄_{t,δ} − x_{t,δ} = m̃_{t,δ} + s̃_{t,δ} + ε̃_{t,δ} ∈ R₀⁺

of the δ-cuts of the elements (x_t*)_{t∈T} can, for every δ ∈ (0; 1], be considered as a standard time series (l_{t,δ})_{t∈T} with seasonal component. Conditions for the trend component m̃_{t,δ} and the seasonal component s̃_{t,δ} are m̃_{t,δ} ≥ 0 and s̃_{t,δ} ≥ 0, respectively. These conditions can be fulfilled because l_{t,δ} ∈ R₀⁺ holds. In order to provide the uniqueness of the components m̃_{t,δ} and s̃_{t,δ} different conditions can be used. In the following, the condition

min_{t=1(1)N} m̃_{t,δ} = 0   (24.3)

is used. Other possibilities are min_{t=1(1)N} s̃_{t,δ} = 0 or a mixture of similar conditions. By using (24.3) a larger part of the fuzziness of the observations is ascribed to the seasonal component.

The quantities m̃_{t,δ} and s̃_{t,δ} are influenced by the fuzziness of the individual observations and not by their position. Therefore they can be used to estimate the fuzziness of trend and seasonal components. By Remark 23.3 the Steiner point σ_{x*} of a fuzzy number x* is a kind of midpoint. Therefore σ_{x*} can be used as information on the position, and x₀* = x* ⊕ (−σ_{x*}) can be used as information on the fuzziness of x*. In order to determine the trend and seasonal component, first for all values of the fuzzy time series (x_t*)_{t∈T} the Steiner point σ_{x_t*} is calculated. From the classical time series (σ_{x_t*})_{t∈T} of the Steiner points the position components m_t and s_t of the trend and seasonal component can be calculated with standard time series methods. The calculation of the fuzziness components m*_{t,0} and s*_{t,0} is done from the time series (x*_{t,0})_{t∈T} = (x_t* ⊕ (−σ_{x_t*}))_{t∈T}. With this decomposition the whole component model reads

x_t* ≈ (m_t + s_t + ε_t) ⊕ m*_{t,0} ⊕ s*_{t,0},   t = 1 (1) N.   (24.4)

The decomposition into position and fuzziness does not change the lengths of the δ-cuts, i.e. l_{C_δ(x_t*)} = l_{C_δ(x*_{t,0})} = l_{t,δ}. The components m̃_{t,δ}, s̃_{t,δ} and ε̃_{t,δ} of the classical time series (l_{t,δ})_{t∈T} from the beginning of this section and the δ-cuts C_δ(x*_{t,0}) are used to estimate m*_{t,0} and s*_{t,0} in the following way:

C_δ(m̂*_{t,0}) = (m̃_{t,δ}/l_{t,δ}) · C_δ(x*_{t,0})   and   C_δ(ŝ*_{t,0}) = (s̃_{t,δ}/l_{t,δ}) · C_δ(x*_{t,0})   if l_{t,δ} ≠ 0,

C_δ(m̂*_{t,0}) = C_δ(ŝ*_{t,0}) = [0; 0] = 0   if l_{t,δ} = 0.
The above decomposition of the δ-cuts follows the consideration that the value m̃_{t,δ} describes the influence of the trend, and s̃_{t,δ} the influence of the seasonal component, on the length of the δ-cut C_δ(x_t*). This method has the following problems: Without additional conditions concerning the quantities m̃_{t,δ} and s̃_{t,δ} the δ-cuts of the estimates for the components can be such that C_{δ₂}(m̂*_{t,0}) ⊈ C_{δ₁}(m̂*_{t,0}) and C_{δ₂}(ŝ*_{t,0}) ⊈ C_{δ₁}(ŝ*_{t,0}) for 0 < δ₁ < δ₂ ≤ 1. Additional conditions in order to avoid this are in general complicated and increase the computational work for the determination of the components.

A similar but simpler method is the following: For the determination of the fuzziness of the trend and seasonal component not all δ-cuts are considered but the 'average' length of fuzziness. Instead of the time series (l_{t,δ})_{t∈T}, δ ∈ (0; 1], the time series

l_t = ∫₀¹ l_{t,δ} dδ,   t ∈ T,

is considered. (l_t)_{t∈T} is a time series with seasonal influence, i.e.

l_t = m̃_t + s̃_t + ε̃_t ∈ R₀⁺,   t = 1 (1) N,

with m̃_t ≥ 0 and s̃_t ≥ 0. From these two values estimates of the fuzziness of the trend and seasonal component are obtained by

m̂*_{t,0} = (m̃_t/l_t) · x*_{t,0}   and   ŝ*_{t,0} = (s̃_t/l_t) · x*_{t,0}   if l_t ≠ 0

for t ∈ T. For l_t = 0 the estimates are m̂*_{t,0} = ŝ*_{t,0} = 0. Unfortunately for these estimates the condition ŝ*_{t,0} = ŝ*_{t+kp,0}, k ∈ Z, is not necessarily fulfilled.
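For trapezoidal observations both quantities used here have closed forms: the δ-cut of t*(m, s, l, r) is [m − s − (1−δ)l, m + s + (1−δ)r], so integrating the δ-cut midpoints and lengths over δ gives the Steiner point and the average length l_t. A sketch, under the assumption that in one dimension the Steiner point reduces to the average midpoint of the δ-cuts (consistent with its role as a "kind of midpoint" in Remark 23.3):

```python
def delta_cut(m, s, l, r, delta):
    """delta-cut of the trapezoidal fuzzy number t*(m, s, l, r)."""
    return (m - s - (1 - delta) * l, m + s + (1 - delta) * r)

def steiner_point(m, s, l, r):
    """Average over delta of the delta-cut midpoints m + (1-delta)(r-l)/2."""
    return m + (r - l) / 4.0

def average_cut_length(m, s, l, r):
    """Average over delta of the delta-cut lengths 2s + (1-delta)(l+r)."""
    return 2.0 * s + (l + r) / 2.0
```

For the symmetric value y* = t*(6, 1, 3, 3) of Example 23.5 this gives the Steiner point 6 and average δ-cut length 5.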
% x N∗ f = (% m N f +% sN f ) ⊕ m %∗N f ⊕% s N∗ f = (% mNf ⊕ m %∗N f ) ⊕ (% s N f ⊕% s N∗ f ) , s N f are the estimates calculated from the real valued time series % N f and % where m %∗N f and % s N∗ f are the estimates of the fuzziness. The quantity (% mNf ⊕ σxt∗ t∈T , and m m %∗N f ) is the fuzzy trend and (% s N f ⊕% s N∗ f ) is the fuzzy seasonal component. Example 24.4 Table 24.1 contains 48 trapezoidal fuzzy observations with seasonal component. The values are depicted in Figure 24.3. First, for every time the Steiner point σxt∗ and the average length lt of the δ-cuts of the observation are calculated. In Figure 24.4 the time series σxt∗ t∈T of the Steiner points is depicted as a solid line, and the time series (lt )t∈T of average lengths of the δ-cuts is depicted as a dotted line.
Table 24.1 Values of 48 trapezoidal fuzzy observations with seasonal component. x1∗ = t ∗ (1.35, 0.20, 0.44, 0.56) x3∗ = t ∗ (7.94, 0.26, 0.67, 1.25) x5∗ = t ∗ (7.41, 0.28, 0.74, 0.85) x7∗ = t ∗ (0.62, 0.16, 0.59, 0.66) x 9∗ = t ∗ (−1.47, 0.10, 0.56, 0.44) ∗ x11 = t ∗ (−1.42, 0.12, 0.34, 0.57) ∗ x13 = t ∗ (1.72, 0.19, 0.97, 0.63) ∗ x15 = t ∗ (9.74, 0.27, 1.28, 0.98) ∗ x17 = t ∗ (9.82, 0.27, 0.66, 1.14) ∗ x19 = t ∗ (2.65, 0.20, 0.81, 0.74) ∗ x21 = t ∗ (0.08, 0.12, 0.46, 0.46) ∗ x23 = t ∗ (0.33, 0.15, 0.58, 0.61) ∗ x25 = t ∗ (3.64, 0.22, 0.98, 0.52) ∗ = t ∗ (10.67, 0.34, 0.80, 0.81) x27 ∗ x29 = t ∗ (10.99, 0.30, 1.21, 1.36) ∗ = t ∗ (4.66, 0.23, 0.67, 0.86) x31 ∗ x33 = t ∗ (1.81, 0.16, 0.64, 0.51) ∗ = t ∗ (2.32, 0.18, 0.60, 0.40) x35 ∗ x37 = t ∗ (6.04, 0.22, 0.74, 0.73) ∗ = t ∗ (12.45, 0.35, 0.93, 1.43) x39 ∗ x41 = t ∗ (13.17, 0.33, 1.38, 1.34) ∗ x43 = t ∗ (6.02, 0.26, 1.10, 0.86) ∗ x45 = t ∗ (3.46, 0.19, 0.53, 0.47) ∗ x47 = t ∗ (3.92, 0.16, 0.67, 0.46)
x 2∗ = t ∗ (4.84, 0.16, 0.62, 0.74) x 4∗ = t ∗ (9.57, 0.23, 0.62, 1.24) x 6∗ = t ∗ (4.37, 0.25, 0.84, 0.84) x 8∗ = t ∗ (−1.49, 0.16, 0.70, 0.48) ∗ x 10 = t ∗ (−1.41, 0.11, 0.29, 0.43) ∗ x12 = t ∗ (−0.67, 0.13, 0.46, 0.58) ∗ x14 = t ∗ (5.45, 0.18, 0.55, 1.20) ∗ x16 = t ∗ (11.88, 0.26, 0.72, 0.77) ∗ x18 = t ∗ (6.46, 0.26, 0.85, 1.29) ∗ x20 = t ∗ (0.68, 0.18, 0.79, 0.49) ∗ x22 = t ∗ (0.49, 0.10, 0.49, 0.35) ∗ x24 = t ∗ (1.21, 0.14, 0.38, 0.74) ∗ x26 = t ∗ (7.24, 0.28, 1.17, 1.20) ∗ x28 = t ∗ (11.88, 0.28, 0.92, 0.98) ∗ x30 = t ∗ (8.20, 0.24, 0.72, 0.74) ∗ x32 = t ∗ (2.23, 0.19, 0.82, 0.84) ∗ x34 = t ∗ (2.46, 0.16, 0.53, 0.35) ∗ x36 = t ∗ (2.51, 0.19, 0.62, 0.56) ∗ x38 = t ∗ (10.34, 0.26, 1.15, 0.96) ∗ x40 = t ∗ (14.80, 0.27, 0.75, 1.36) ∗ x42 = t ∗ (8.86, 0.26, 0.96, 1.16) ∗ x44 = t ∗ (4.22, 0.18, 0.69, 0.74) ∗ x46 = t ∗ (3.58, 0.19, 0.55, 0.47) ∗ x48 = t ∗ (4.76, 0.21, 0.50, 0.77)
Figure 24.3 Fuzzy observations with seasonal influence.
Figure 24.4 Time series of Steiner points and average length of δ-cuts. From Figure 24.4 it can be seen, that the length of the period of both time series is 12. For both time series with classical methods the trend and seasonal component are calculated. After elimination of the seasonal component, both (σxt∗ )t=1(1)48 and (lt )t=1(1)48 show approximately linear behavior. To these values using regression models a line is attached. ∗ ,k ∈ From this the fuzzy seasonal components st∗ , t = 1 (1) 48, with st∗ = st+12k Z, are obtained. These are given in Table 24.2 and depicted in Figure 24.5. The fuzzy linear trend is m ∗t = t ∗ (1.59, 0.043, 0.16, 0.19) + t ∗ (0.14, 0.0018, 0.0061, 0.0095) · t . In Figure 24.6 the observed data xt∗ , t = 1 (1) 48, and the predicted values % x t∗ = m ∗t ⊕ st∗ , t = 1 (1) 48, are depicted for comparison. Table 24.2 Values of the fuzzy seasonal components. s1∗ s2∗ s3∗ s4∗ s5∗ s6∗
= t ∗ (−1.30, 0.13, 0.46, 0.37) = t ∗ (2.47, 0.16, 0.61, 0.75) = t ∗ (5.58, 0.22, 0.65, 0.78) = t ∗ (7.36, 0.17, 0.48, 0.69) = t ∗ (5.70, 0.22, 0.72, 0.87) = t ∗ (2.09, 0.16, 0.54, 0.63)
s7∗ = t ∗ (−1.55, 0.12, 0.44, 0.44) s8∗ = t ∗ (−3.90, 0.10, 0.42, 0.35) s9∗ = t ∗ (−4.36, 0.05, 0.20, 0.18) ∗ s10 = t ∗ (−4.12, 0.03, 0.11, 0.10) ∗ s11 = t ∗ (−4.36, 0.06, 0.21, 0.20) ∗ s12 = t ∗ (−3.88, 0.07, 0.19, 0.26)
Figure 24.5 Fuzzy seasonal components.
24.5
Difference filters
In matching polynomials to time series (x_t*)_{t∈T} a problem is the choice of the degree of the polynomial. Similar to classical time series, the degree can be determined for fuzzy time series as well, using so-called difference filters.
Figure 24.6 Observed and predicted values of the time series.
Remark 24.4: For a polynomial f(t) = a₀* ⊕ a₁* · t ⊕ a₂* · t² ⊕ … ⊕ a_m* · t^m with degree m > 0 and a₀*, a₁*, …, a_m* ∈ F₂(R) the difference g(t) = f(t) ∼_H f(t−1) is for t > 0 a polynomial of maximal degree m−1.

Proof: By Remark 23.6, with

g(t) = a₁* ⊕ a₂* · (t² − (t−1)²) ⊕ … ⊕ a_m* · (t^m − (t−1)^m)

and t^i − (t−1)^i ≥ 0, i = 1 (1) m, we obtain

g(t) ⊕ f(t−1) = a₁* ⊕ a₂* · (t² − (t−1)²) ⊕ … ⊕ a_m* · (t^m − (t−1)^m) ⊕ a₀* ⊕ a₁* · (t−1) ⊕ a₂* · (t−1)² ⊕ … ⊕ a_m* · (t−1)^m = f(t).

The Hukuhara difference g(t) is a polynomial of degree m−1 which is uniquely determined by Theorem 23.1.

Definition 24.4: Let (x_t*)_{t∈T} be a fuzzy time series. A mapping Δ, defined by

Δ x_t* = x_t* ∼_H x*_{t−1},   t = 2 (1) N,

is called a difference filter of order 1. Difference filters of order m with m > 1 are recursively defined by

Δ^m x_t* = Δ^{m−1} x_t* ∼_H Δ^{m−1} x*_{t−1},   t = m+1 (1) N.

The difference operation reduces the degree of a polynomial by 1. Applying the difference operation m times to a polynomial of degree m results in constant values.

Remark 24.5: A difference filter in the context of Definition 24.3 is a general filter L_H(a₋₁, a₀) = (−1, 1). Similar to standard time series, difference filters can be used to eliminate seasonal effects. In order to do this, so-called seasonal differences of length p, defined by

Δ_p x_t* = x_t* ∼_H x*_{t−p},   t = p+1 (1) N,

are used. It is possible to combine the difference operation with seasonal differences. If a fuzzy time series (x_t*)_{t∈T} has seasonal oscillations with period p and a linear trend, then the filtered series y_t* = Δ Δ_p x_t*, where first a seasonal filter of length p is applied and then a difference filter, is approximately constant.
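The combined effect of Δ and Δ_p can be checked on the classical special case of a crisp series, where ∼_H reduces to ordinary subtraction. A small sketch (the data are invented for illustration):

```python
def seasonal_diff(x, p):
    """Seasonal differences of length p: x_t - x_{t-p}."""
    return [x[t] - x[t - p] for t in range(p, len(x))]

def diff(x):
    """Ordinary difference filter of order 1: x_t - x_{t-1}."""
    return seasonal_diff(x, 1)

# crisp series with linear trend and seasonal oscillation of period 4
season = [3.0, -1.0, 0.5, -2.5]
x = [0.7 * t + season[t % 4] for t in range(20)]
filtered = diff(seasonal_diff(x, 4))   # first seasonal, then ordinary differences
```

The seasonal differences remove the seasonal pattern and turn the linear trend into the constant 0.7 · 4 = 2.8; the subsequent ordinary difference then makes the series identically 0, in accordance with the remarks above.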
For numerical determination of the length of the period and the trend, the hypothesis

Δ^m Δ_p x_t* ≡ const

can be tested for different natural numbers p and m. Some references for testing hypotheses based on fuzzy data are Arnold (1996), Grzegorzewski (2001), Körner (2000) and Römer and Kandel (1995).
24.6
Generalized Holt–Winter method
For details of the Holt–Winter method in the case of standard time series see Janacek (2001). The Holt–Winter method for standard time series (x_t)_{t∈T} assumes a component model

x_t = m_t + s_t + ε_t,   t = 1 (1) N,

with local linear trend m_t = a + b t and period length p of the seasonal component s_t. If for time N−1 estimators m̂_{N−1}, b̂_{N−1} and ŝ_{N−p} are given, then the estimates for time N can be calculated recursively:

b̂_N = (1−α) (m̂_{N−1} − m̂_{N−2}) + α b̂_{N−1}   (24.5)
m̂_N = (1−β) (x_N − ŝ_{N−p}) + β (m̂_{N−1} + b̂_N)   (24.6)
ŝ_N = (1−γ) (x_N − m̂_N) + γ ŝ_{N−p}   (24.7)

Starting values for the recursion are calculated from the first p values of the time series as follows:

m̂_p = (1/p) Σ_{t=1}^{p} x_t,   b̂_p = 0   and   ŝ_t = x_t − m̂_p, t = 1 (1) p.   (24.8)

The value b̂_p = 0 reflects the assumption that at the beginning no trend is present. An h-step prediction at time N, with h > 0, is given by

x̂_{N,h}(α, β, γ) = m̂_N + h b̂_N + ŝ_{N+h−p}    for h = 1 (1) p,
x̂_{N,h}(α, β, γ) = m̂_N + h b̂_N + ŝ_{N+h−2p}   for h = p+1 (1) 2p,

and so on. There are different approaches for the selection of the smoothing parameters α, β and γ. Most frequently the one-step predictions x̂_{t,1}(α, β, γ) are compared with the
data (x_t)_{t∈T}, and the values α̃, β̃ and γ̃ obeying

Σ_{t=p}^{N−1} (x_{t+1} − x̂_{t,1}(α̃, β̃, γ̃))² = min_{α,β,γ} Σ_{t=p}^{N−1} (x_{t+1} − x̂_{t,1}(α, β, γ))²

are used.
In the case of fuzzy data (x_t*)_{t∈T} a model of the form

x_t* ≈ m_t* ⊕ s_t*,   t = 1 (1) N,

with linear trend m_t* = a* ⊕ b* · t is assumed.
For given estimates m̂*_{N−1}, b̂*_{N−1} and ŝ*_{N−p} at time N−1, estimates m̂*_N, b̂*_N and ŝ*_N can be calculated analogously to Equations (24.5), (24.6) and (24.7):

b̂*_N = (1−α) · (m̂*_{N−1} ∼_H m̂*_{N−2}) ⊕ α · b̂*_{N−1}
m̂*_N = (1−β) · (x_N* ∼_H ŝ*_{N−p}) ⊕ β · (m̂*_{N−1} ⊕ b̂*_N)
ŝ*_N = (1−γ) · (x_N* ∼_H m̂*_N) ⊕ γ · ŝ*_{N−p}

Starting values can be obtained as described in Section 24.4 from the first p values by decomposition into position and fuzziness component. From (24.3), providing uniqueness of the decomposition, it follows that the estimate m̂*_p is the average of the Steiner points (σ_{x_t*})_{t=1(1)p}, and therefore a real number. The seasonal components are estimated by

ŝ_t* = x_t* ⊕ (−m̂*_p),   t = 1 (1) p.

Analogously to the classical case, b̂*_p = 0 is used. A prediction x̂*_{N,h} for x*_{N+h}, h > 0, at time N is calculated by

x̂*_{N,h}(α, β, γ) = m̂*_N ⊕ h · b̂*_N ⊕ ŝ*_{N+h−p}    for h = 1 (1) p,
x̂*_{N,h}(α, β, γ) = m̂*_N ⊕ h · b̂*_N ⊕ ŝ*_{N+h−2p}   for h = p+1 (1) 2p,

and so on. The determination of suitable smoothing parameters α̃, β̃ and γ̃ is possible as in the classical case by

Σ_{t=p}^{N−1} ρ₂²(x*_{t+1}, x̂*_{t,1}(α̃, β̃, γ̃)) = min_{α,β,γ} Σ_{t=p}^{N−1} ρ₂²(x*_{t+1}, x̂*_{t,1}(α, β, γ)).   (24.9)
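The classical recursions (24.5)–(24.7) with the starting values (24.8) can be sketched as follows. Note that (24.8) does not fix the level estimate one step before the start, which the trend recursion (24.5) needs in its first step; setting it equal to the starting level is an additional assumption made here for illustration:

```python
def holt_winter(x, p, alpha, beta, gamma):
    """Holt-Winter recursions (24.5)-(24.7) for a crisp series x (0-based),
    seasonal period p; returns levels m, trends b and seasonal values s."""
    N = len(x)
    m, b, s = [0.0] * N, [0.0] * N, [0.0] * N
    # starting values (24.8) from the first p observations
    m[p - 1] = sum(x[:p]) / p
    b[p - 1] = 0.0
    for t in range(p):
        s[t] = x[t] - m[p - 1]
    m[p - 2] = m[p - 1]            # assumption: no level change before the start
    for t in range(p, N):
        b[t] = (1 - alpha) * (m[t - 1] - m[t - 2]) + alpha * b[t - 1]      # (24.5)
        m[t] = (1 - beta) * (x[t] - s[t - p]) + beta * (m[t - 1] + b[t])   # (24.6)
        s[t] = (1 - gamma) * (x[t] - m[t]) + gamma * s[t - p]              # (24.7)
    return m, b, s
```

For a purely seasonal series around a constant level the recursions reproduce the level and the seasonal pattern exactly, for any choice of the smoothing parameters.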
Figure 24.7 Observed and predicted values.

Example 24.5 The generalized Holt–Winter method is applied to the 48 observations from Example 24.4. The smoothing parameters obtained from (24.9) are α̃ = 0.95, β̃ = 0.31 and γ̃ = 0.38. In Figure 24.7 both the observed data and the one-step predictions using the optimal smoothing parameters are depicted.
24.7
Presentation in the frequency domain
For an absolutely summable sequence (x_t)_{t∈Z} of real numbers and the functions

C(λ) = Σ_{t∈Z} x_t cos(2πλt)   and   S(λ) = Σ_{t∈Z} x_t sin(2πλt)

the inverse of the Fourier transformation exists and the following holds:

x_t = 2 ∫₀^{0.5} C(λ) cos(2πλt) dλ + 2 ∫₀^{0.5} S(λ) sin(2πλt) dλ.

Therefore the consideration of sequences in the time domain is equivalent to that in the frequency domain. The Fourier transformation contains the same information as the series itself. In the case of finite time series (x_t)_{t∈T} it is sufficient to know the Fourier transformation for finitely many frequency values. A time series with an odd number of observations N = 2M + 1 can be represented uniquely using the
so-called Fourier frequencies λ_k = k/N, k = 0 (1) M:

C(λ_k) = (1/N) Σ_{t=1}^{N} x_t cos(2πλ_k t),   k = 0 (1) M,

and

S(λ_k) = (1/N) Σ_{t=1}^{N} x_t sin(2πλ_k t),   k = 1 (1) M,

implies

x_t = C(λ₀) + 2 Σ_{k=1}^{M} [ C(λ_k) cos(2πλ_k t) + S(λ_k) sin(2πλ_k t) ].   (24.10)
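The exactness of the representation (24.10) for odd N can be verified numerically; the helper names below are chosen for illustration:

```python
import math

def fourier_coeffs(x):
    """C(lambda_k) and S(lambda_k) at the Fourier frequencies k/N, N = 2M+1 odd."""
    N = len(x)
    M = (N - 1) // 2
    C = [sum(x[t - 1] * math.cos(2 * math.pi * k * t / N)
             for t in range(1, N + 1)) / N for k in range(M + 1)]
    S = [sum(x[t - 1] * math.sin(2 * math.pi * k * t / N)
             for t in range(1, N + 1)) / N for k in range(M + 1)]
    return C, S

def reconstruct(C, S, N, t):
    """Right-hand side of the representation (24.10)."""
    M = (N - 1) // 2
    return C[0] + 2 * sum(C[k] * math.cos(2 * math.pi * k * t / N)
                          + S[k] * math.sin(2 * math.pi * k * t / N)
                          for k in range(1, M + 1))
```

For any real series of odd length the reconstruction reproduces the original values up to floating-point rounding.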
For a fuzzy time series (x_t*)_{t∈T} such a representation does not exist in general. The reason for this is the semilinearity of the structure (F(R), ⊕, ·). Another possibility would be using the Hukuhara addition. Examples show that uniqueness cannot be achieved for fuzzy time series.

Example 24.6 For three observations x₁*, x₂*, x₃* ∈ F₂(R) with corresponding characterizing functions ξ_{x₁*}(x) = I_{[−1,1]}(x), ξ_{x₂*}(x) = I_{[−1,1]}(x) and ξ_{x₃*}(x) = I_{{0}}(x), as well as frequencies λ_k = k/3, k = −1, 0, 1, there exist no fuzzy coefficients C*(λ_k), k = −1, 0, 1, and S*(λ_k), k = −1, 0, 1, with

x_t* = Σ^H_{k=−1}^{1} C*(λ_k) · cos(2πλ_k t) ⊕_H Σ^H_{k=−1}^{1} S*(λ_k) · sin(2πλ_k t),   t = 1 (1) 3,

where Σ^H denotes summation with respect to the Hukuhara addition ⊕_H.
Proof: For the time points t = 1, 2, 3 the following equations hold:

t = 1:
x₁* = (−1/2) · C*(λ₋₁) ⊕_H C*(λ₀) ⊕_H (−1/2) · C*(λ₁) ⊕_H (−√3/2) · S*(λ₋₁) ⊕_H (√3/2) · S*(λ₁)
    = ( C*(λ₀) ⊕ (√3/2) · S*(λ₁) ) ∼_H ( (1/2) · C*(λ₋₁) ⊕ (1/2) · C*(λ₁) ⊕ (√3/2) · S*(λ₋₁) )

t = 2:
x₂* = (−1/2) · C*(λ₋₁) ⊕_H C*(λ₀) ⊕_H (−1/2) · C*(λ₁) ⊕_H (√3/2) · S*(λ₋₁) ⊕_H (−√3/2) · S*(λ₁)
    = ( C*(λ₀) ⊕ (√3/2) · S*(λ₋₁) ) ∼_H ( (1/2) · C*(λ₋₁) ⊕ (1/2) · C*(λ₁) ⊕ (√3/2) · S*(λ₁) )

t = 3:
x₃* = C*(λ₋₁) ⊕_H C*(λ₀) ⊕_H C*(λ₁)

The equation for t = 3 implies that all coefficients C*(λ_k), k = −1, 0, 1, are real numbers. Therefore the following equations for the first two time points remain:

x₁* = (√3/2) · S*(λ₁) ∼_H (√3/2) · S*(λ₋₁)   (24.11)
x₂* = (√3/2) · S*(λ₋₁) ∼_H (√3/2) · S*(λ₁)   (24.12)

From x₁* = x₂* we obtain

(√3/2) · S*(λ₁) ∼_H (√3/2) · S*(λ₋₁) = (√3/2) · S*(λ₋₁) ∼_H (√3/2) · S*(λ₁).   (24.13)

Equation (24.13) can only be fulfilled if S*(λ₁) and S*(λ₋₁) are real numbers. But in this case Equations (24.11) and (24.12) cannot be fulfilled. Therefore for fuzzy time series the frequency approach is not used.
P1: SBT c25 JWST020-Viertl
November 15, 2010
11:15
Printer Name: Yet to Come
25
More on fuzzy random variables and fuzzy random vectors

For model-based inference methods in time series analysis the concept of fuzzy random variables is useful. Basic definitions and some propositions concerning fuzzy random variables are given in Chapter 9 (A Law of Large Numbers). For generalized stochastic methods in time series analysis more details on fuzzy random variables are necessary. These will be explained in this chapter.
25.1
Basics
Definition 25.1: A fuzzy random vector (also called an n-dimensional fuzzy random variable) is a mapping X from some probability space (Ω, E, P) into the space (F(Rⁿ), h_∞), with h_∞(·, ·) the metric defined in (23.4), if X is measurable relative to the σ-field in F(Rⁿ) which is generated by the open sets

{ x* ∈ F(Rⁿ) : sup_{δ∈(0;1]} d_H( C_δ(x*), C_δ(y*) ) < r } = { x* ∈ F(Rⁿ) : h_∞(x*, y*) < r },

with y* ∈ F(Rⁿ) and r > 0.

Remark 25.1: Using (F(Rⁿ), h_∞) in Definition 25.1 is important, as pointed out in Körner (1997b).
Theorem 25.1: Let X* and Y* be fuzzy random vectors, δ ∈ (0; 1], and u ∈ S^{n−1}. Then ‖X*‖₂, s_{X*}(u, δ) and ρ₂(X*, Y*) are random variables. Moreover, the Steiner point σ_{X*} is an n-dimensional stochastic vector. For the proof see Körner (1997b).

Definition 25.2: Two fuzzy random vectors X₁* and X₂* are called independent if

P(X₁* ∈ B₁*, X₂* ∈ B₂*) = P(X₁* ∈ B₁*) · P(X₂* ∈ B₂*)

for all elements B₁* and B₂* in the corresponding σ-fields.

Remark 25.2: In the following only fuzzy random variables with values in F₂(Rⁿ) are considered, i.e. X*(ω) ∈ F₂(Rⁿ) ∀ ω ∈ Ω. Such fuzzy random variables are quadratically integrable with E ‖X*‖₂² < ∞.
25.2
Expectation and variance of fuzzy random variables
Let X: Ω → M be a random quantity defined on (Ω, E, P) with values in a metric space (M, ρ). Using the so-called Fréchet principle the (not necessarily unique) expectation E_F X is defined to be the set of all A ∈ M which minimize the expected squared distance E ρ²(X, A), i.e.

E_F X = { A ∈ M : E ρ²(X, A) = inf_{B∈M} E ρ²(X, B) },   (25.1)

if an A ∈ M exists with E ρ²(X, A) < ∞ (see Fréchet, 1948).
The infimum of the expected quadratic distance is called the Fréchet variance, i.e.

Var_F X := inf_{A∈M} E ρ²(X, A).   (25.2)

Remark 25.3: For classical random variables X we have

Var X = E(X − E X)² = inf_{a∈R} E(X − a)².

Therefore the Fréchet principle generalizes the classical variance. The Fréchet principle for the expectation and the variance can be applied to fuzzy random quantities X* with values in the metric space (F₂(Rⁿ), ρ₂). The expectation E_F X* is calculated as

E_F X* = { x* ∈ F₂(Rⁿ) : E ρ₂²(X*, x*) = inf_{y*∈F₂(Rⁿ)} E ρ₂²(X*, y*) }   (25.3)

and the variance, using (23.5), as

Var_F X* = E ρ₂²(X*, E_F X*) = E [ n ∫₀¹ ∫_{S^{n−1}} |s_{X*}(u, δ) − s_{E_F X*}(u, δ)|² μ(du) dδ ]
 = n ∫₀¹ ∫_{S^{n−1}} Var s_{X*}(u, δ) μ(du) dδ.   (25.4)
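On the real line with the usual distance the Fréchet expectation reduces to the classical mean and the Fréchet variance to the classical variance (Remark 25.3). This can be checked empirically with a brute-force minimizer (grid search is chosen here only for transparency, not efficiency):

```python
def frechet_objective(sample, a):
    """Empirical expected squared distance E rho^2(X, a) on (R, |.|)."""
    return sum((x - a) ** 2 for x in sample) / len(sample)

def frechet_mean_1d(sample, grid):
    """Grid-search minimizer of the empirical squared-distance functional."""
    return min(grid, key=lambda a: frechet_objective(sample, a))
```

For the sample (1, 2, 3, 6) the minimizer is the arithmetic mean 3, and the minimal value 3.5 is the sample variance of the empirical distribution, i.e. its Fréchet variance.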
Another possibility for the definition of the expectation is the so-called Aumann expectation. Definition 25.3: The Aumann expectation E A X ∗ of a fuzzy random variable X ∗ is defined by its δ-cuts Cδ (E A X ∗ ), δ ∈ (0; 1], using families of n-dimensional random vectors Z : → Rn by Cδ (E A X ∗ ) := E Z : Z (ω) ∈ Cδ X ∗ (ω) P a.e. and E Z 2 < ∞ .
(25.5)
By the one-to-one correspondence between $X^*$ and its $\delta$-cuts $C_\delta(X^*)$, the Aumann expectation is well defined if all $C_\delta(E_A X^*)$ for $\delta \in (0;1]$ are non-empty. A third possibility for the definition of the expectation of $X^*$ is the so-called Bochner expectation.

Definition 25.4: The Bochner expectation $E X^*$ of a fuzzy stochastic quantity $X^*$ is defined by the support function

$$s_{E X^*}(u, \delta) = E\, s_{X^*}(u, \delta) \qquad \forall\, u \in S^{n-1},\ \forall\, \delta \in (0;1]. \qquad (25.6)$$
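For interval-valued fuzzy quantities (a single $\delta$-cut) Definition 25.4 becomes very concrete: the support function of an interval $[L, U]$ on $S^0 = \{-1, 1\}$ is $s(-1) = -L$, $s(1) = U$, and taking pointwise expectations of the support functions yields the interval $[EL, EU]$. A Python sketch (illustrative only, not the book's code):

```python
import numpy as np

# Interval-valued fuzzy random quantity X*(omega) = [L(omega), U(omega)]
# with a single delta-cut.  Support function on S^0 = {-1, +1}:
#   s_{X*}(-1) = -L,   s_{X*}(+1) = U.
rng = np.random.default_rng(1)
L = rng.normal(0.0, 1.0, size=5)
U = L + rng.uniform(0.5, 1.5, size=5)    # guarantees L <= U

s_minus, s_plus = -L, U                  # support values per realization

# Bochner expectation (25.6): pointwise expectation of the support
# functions, which corresponds to the interval [E L, E U].
e_lower, e_upper = -s_minus.mean(), s_plus.mean()
print(e_lower, e_upper)
```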
Remark 25.4: The Fréchet expectation and the Fréchet variance depend on the norm used. It can be proved [see Näther (1997) and Körner and Näther (1998)] that for a fuzzy stochastic quantity $X^*$ with values in the metric space $(F(\mathbb{R}^n), h_p)$, $p \in (1;\infty)$, the Fréchet expectation

$$E_F^{(h_p)} X^* = \left\{ x^* \in F(\mathbb{R}^n) : E h_p^2(X^*, x^*) = \inf_{y^* \in F(\mathbb{R}^n)} E h_p^2(X^*, y^*) \right\}$$

is in general not linear, i.e.

$$E_F^{(h_p)}(X^* \oplus Y^*) \neq E_F^{(h_p)} X^* \oplus E_F^{(h_p)} Y^*$$

is possible. Contrary to that, the Aumann expectation is linear:

$$E_A(\lambda_1 \cdot X^* \oplus \lambda_2 \cdot Y^*) = \lambda_1 \cdot E_A X^* \oplus \lambda_2 \cdot E_A Y^*.$$
Moreover, in Körner (1997b) the following is proved: for the norm $\rho_2(\cdot,\cdot)$ the Aumann expectation and the Bochner expectation are identical and uniquely determined, and for this norm the Fréchet expectation is linear. Therefore in the following the Fréchet expectation is used for $E X^*$, and the Fréchet variance for $\mathrm{Var} X^*$, for fuzzy random quantities $X^*$.

Remark 25.5: By definition the Fréchet variance is a real number. There are other definitions for the generalization of the variance to fuzzy random quantities; a survey of different definitions for the variance of a fuzzy random quantity $X^*$ is given in Körner (1997a). Natural conditions for the variance of a fuzzy stochastic quantity $X^*$ are the following:

(1) The variance is equal to 0 exactly if $X^*$ is constant almost everywhere:
$$\mathrm{Var} X^* = 0 \iff P\{\omega \in \Omega : X^*(\omega) = x^*\} = 1 \text{ for some } x^* \in F_2(\mathbb{R}^n).$$

(2) For all $\lambda \in \mathbb{R}$ it follows $\mathrm{Var}(\lambda \cdot X^*) = \lambda^2\, \mathrm{Var}(X^*)$.

(3) The variance does not depend on the location:
$$\mathrm{Var}(X^* \oplus x^*) = \mathrm{Var}(X^*) \qquad \forall\, x^* \in F_2(\mathbb{R}^n).$$
Lemma 25.1: Let $X^*$ and $Y^*$ be fuzzy random variables, $X: \Omega \to \mathbb{R}$ a classical random variable, $x^* \in F_2(\mathbb{R}^n)$, and $\lambda, \gamma \in \mathbb{R}$. Then the following holds:

(1) $E(\lambda \cdot X^* \oplus \gamma \cdot Y^*) = \lambda \cdot E X^* \oplus \gamma \cdot E Y^*$
(2) $\mathrm{Var} X^* = E\|X^*\|_2^2 - \|E X^*\|_2^2$
(3) $\mathrm{Var}(\lambda \cdot X^*) = \lambda^2\, \mathrm{Var} X^*$
(4) $\mathrm{Var}(X \cdot x^*) = \|x^*\|_2^2\, \mathrm{Var} X$
(5) $\mathrm{Var}(X^* \oplus x^*) = \mathrm{Var} X^*$
(6) $\mathrm{Var}(X^* \oplus Y^*) = \mathrm{Var} X^* + \mathrm{Var} Y^*$ if $X^*$ and $Y^*$ are independent.

For the proof see Körner (1997a). Expectation and variance of $X^*$ can be decomposed into a local part and a fuzzy part:

$$E X^* = E X^*_0 \oplus E\,\sigma_{X^*}, \qquad \mathrm{Var} X^* = \mathrm{Var} X^*_0 + \mathrm{Var}\,\sigma_{X^*}, \qquad (25.7)$$
where $X^*_0 = X^* \ominus \sigma_{X^*}$ is the centered quantity. The first equation follows from the linearity of the expectation operator (cf. Lemma 25.1), and the second from the definition of the variance (25.4) and formula (23.8).

Remark 25.6: The variance is a non-negative real number. For the special case of a classical random vector $X: \Omega \to \mathbb{R}^n$ it therefore cannot be a generalization of the $n \times n$-dimensional variance–covariance matrix. Using the notation $u^T$ for the transposed vector of $u$, and $I_n$ for the $n$-dimensional unit matrix, the following holds:

$$s_{X(\omega)}(u, \delta) = \langle u, X(\omega) \rangle = u^T X(\omega), \quad \omega \in \Omega, \qquad s_{EX}(u, \delta) = \langle u, EX \rangle = u^T EX$$

and

$$\begin{aligned} \mathrm{Var} X &= E\left[ n \int_0^1 \int_{S^{n-1}} |s_X(u,\delta) - s_{EX}(u,\delta)|^2 \,\mu(du)\,d\delta \right] \\ &= E\left[ n \int_0^1 \int_{S^{n-1}} |u^T X - u^T EX|^2 \,\mu(du)\,d\delta \right] \\ &= E\left[ n \int_0^1 \int_{S^{n-1}} (X - EX)^T u\, u^T (X - EX) \,\mu(du)\,d\delta \right] \\ &= E\left[ (X - EX)^T \underbrace{\left( n \int_0^1 \int_{S^{n-1}} u\, u^T \,\mu(du)\,d\delta \right)}_{I_n} (X - EX) \right] \\ &= E\, (X - EX)^T (X - EX). \end{aligned}$$

For classical stochastic vectors $X$ the Fréchet variance is therefore the trace of the variance–covariance matrix $E(X - EX)(X - EX)^T$. For classical (one-dimensional) random variables it is equal to the standard variance.
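The derivation above can be checked numerically; the following sketch (not from the book) treats a handful of sampled points as a uniform discrete distribution, so both sides are computed exactly.

```python
import numpy as np

# Check of Remark 25.6: for a classical random vector X the Frechet
# variance E (X - EX)^T (X - EX) equals the trace of the
# variance-covariance matrix E (X - EX)(X - EX)^T.
rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[1.0, -2.0],
                            cov=[[2.0, 0.3], [0.3, 0.5]],
                            size=4)        # 4 equally likely outcomes

EX = X.mean(axis=0)
centered = X - EX
frechet_var = np.mean(np.sum(centered ** 2, axis=1))  # E ||X - EX||^2
cov_matrix = centered.T @ centered / X.shape[0]       # E (X-EX)(X-EX)^T
print(frechet_var, np.trace(cov_matrix))              # identical values
```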
25.3  Covariance and correlation
Definition 25.5: For two fuzzy random variables $X^*$ and $Y^*$ the covariance $\mathrm{Cov}(X^*, Y^*)$ is defined using the scalar product of the support functions from (23.9):

$$\mathrm{Cov}(X^*, Y^*) := E\langle s_{X^*} - s_{EX^*},\, s_{Y^*} - s_{EY^*} \rangle = E\left[ n \int_0^1 \int_{S^{n-1}} (s_{X^*} - s_{EX^*})(s_{Y^*} - s_{EY^*}) \,\mu(du)\,d\delta \right] = n \int_0^1 \int_{S^{n-1}} \mathrm{Cov}(s_{X^*}, s_{Y^*}) \,\mu(du)\,d\delta.$$
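To make the definition concrete, the following Python sketch (illustrative, not the book's code) represents interval-valued fuzzy quantities by their support vectors $(s(-1), s(1)) = (-\text{lower}, \text{upper})$ on $S^0 = \{-1, 1\}$; the normalization of $\mu$ is a convention here and cancels in the bilinear identity $\mathrm{Var}(X^* \oplus Y^*) = \mathrm{Var} X^* + \mathrm{Var} Y^* + 2\,\mathrm{Cov}(X^*, Y^*)$ stated in Lemma 25.2.

```python
import numpy as np

# Interval-valued fuzzy quantities, one delta-cut, 6 equally likely
# outcomes omega.  Support vector of [a, b] is (s(-1), s(+1)) = (-a, b).
rng = np.random.default_rng(3)
n = 6
lo_x = rng.normal(size=n); up_x = lo_x + rng.uniform(1.0, 2.0, size=n)
lo_y = rng.normal(size=n); up_y = lo_y + rng.uniform(1.0, 2.0, size=n)
sX = np.column_stack([-lo_x, up_x])
sY = np.column_stack([-lo_y, up_y])

def inner(s1, s2):            # <x*, y*>: sum over u in {-1, +1}
    return np.sum(s1 * s2, axis=-1)

def var(s):                   # Var X* = E<X*, X*> - <EX*, EX*>
    m = s.mean(axis=0)
    return inner(s, s).mean() - inner(m, m)

def cov(s1, s2):              # Cov = E<X*, Y*> - <EX*, EY*>
    return inner(s1, s2).mean() - inner(s1.mean(axis=0), s2.mean(axis=0))

# Minkowski sum of intervals adds support functions: s_{X+Y} = s_X + s_Y.
print(var(sX + sY), var(sX) + var(sY) + 2 * cov(sX, sY))  # equal
```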
The correlation $\mathrm{Corr}(X^*, Y^*)$ is defined by

$$\mathrm{Corr}(X^*, Y^*) := \frac{\mathrm{Cov}(X^*, Y^*)}{\sqrt{\mathrm{Var}(X^*)\, \mathrm{Var}(Y^*)}}.$$

This covariance can be decomposed into a local and a fuzzy part. With $X^*_0 = X^* \ominus \sigma_{X^*}$ and $Y^*_0 = Y^* \ominus \sigma_{Y^*}$ we obtain

$$\mathrm{Cov}(X^*, Y^*) = \mathrm{Cov}(X^*_0, Y^*_0) + \mathrm{Cov}(\sigma_{X^*}, \sigma_{Y^*}).$$

This is proved by using the definition of the covariance.

Lemma 25.2: Covariance and correlation of two fuzzy random quantities $X^*$ and $Y^*$ fulfill the following:

(1) $\mathrm{Cov}(X^*, X^*) = \mathrm{Var} X^*$
(2) $\mathrm{Var}(X^* \oplus Y^*) = \mathrm{Var} X^* + \mathrm{Var} Y^* + 2\, \mathrm{Cov}(X^*, Y^*)$
(3) $\mathrm{Corr}(X^*, Y^*) = 0$ if $X^*$ and $Y^*$ are stochastically independent
(4) $|\mathrm{Corr}(X^*, Y^*)| \le 1$
(5) $\mathrm{Cov}(X^*, Y^*) = E\langle X^*, Y^* \rangle - \langle E X^*, E Y^* \rangle$.

For the proof see Körner (1997a). The following proposition gives further properties of covariance and correlation.

Proposition 25.1: For fuzzy random quantities $X^*$, $X^*_1$, $X^*_2$ and $Y^*$, and $x^*, y^* \in F_2(\mathbb{R}^n)$, the following holds:

(1) $\mathrm{Cov}(X^*_1 \oplus X^*_2, Y^*) = \mathrm{Cov}(X^*_1, Y^*) + \mathrm{Cov}(X^*_2, Y^*)$
(2) $E\langle X^*, x^* \rangle = \langle E X^*, x^* \rangle$
(3) $\mathrm{Cov}(\lambda \cdot X^* \oplus x^*, \nu \cdot Y^* \oplus y^*) = \lambda\nu\, \mathrm{Cov}(X^*, Y^*)$ for $\lambda, \nu \in \mathbb{R}$ with $\lambda\nu \ge 0$
(4) $\mathrm{Corr}(X^*, Y^*) = 1 \iff Y^* \oplus y^* = \alpha \cdot X^* \oplus x^*$ with $\alpha \ge 0$
(5) $\mathrm{Corr}(X^*, Y^*) = -1 \iff Y^* \oplus \alpha \cdot X^* = y^*$ with $\alpha \ge 0$.
Proof: ad (1) This follows immediately from the definition of the covariance and the properties of the support function.
ad (2) This follows directly from the definition of the scalar product [cf. (23.9) and (25.6)].

ad (3) By Lemma 25.1 and (23.7), for $\lambda\nu \ge 0$, i.e. $\mathrm{sign}(\lambda) = \mathrm{sign}(\nu)$, we obtain:

$$\begin{aligned} \mathrm{Cov}(\lambda \cdot X^* \oplus x^*, \nu \cdot Y^* \oplus y^*) &= E\langle s_{\lambda \cdot X^* \oplus x^*} - s_{\lambda \cdot EX^* \oplus x^*},\; s_{\nu \cdot Y^* \oplus y^*} - s_{\nu \cdot EY^* \oplus y^*} \rangle \\ &= E\langle s_{\lambda \cdot X^*} + s_{x^*} - s_{\lambda \cdot EX^*} - s_{x^*},\; s_{\nu \cdot Y^*} + s_{y^*} - s_{\nu \cdot EY^*} - s_{y^*} \rangle \\ &= E\langle |\lambda|\, s_{\mathrm{sign}(\lambda) \cdot X^*} - |\lambda|\, s_{\mathrm{sign}(\lambda) \cdot EX^*},\; |\nu|\, s_{\mathrm{sign}(\nu) \cdot Y^*} - |\nu|\, s_{\mathrm{sign}(\nu) \cdot EY^*} \rangle \\ &= |\lambda|\,|\nu|\; E\langle s_{\mathrm{sign}(\lambda) \cdot X^*} - s_{\mathrm{sign}(\lambda) \cdot EX^*},\; s_{\mathrm{sign}(\nu) \cdot Y^*} - s_{\mathrm{sign}(\nu) \cdot EY^*} \rangle \\ &= \lambda\nu\, \mathrm{Cov}(X^*, Y^*). \end{aligned}$$

ad (4) The necessity of the condition is seen in the following way: defining $\beta = \sqrt{\mathrm{Var} Y^* / \mathrm{Var} X^*}$, $\alpha = \beta$, $y^* = \beta \cdot E X^*$ and $x^* = E Y^*$, we have $\mathrm{Corr}(X^*, Y^*) = 1 \Rightarrow \mathrm{Cov}(X^*, Y^*) = \sqrt{\mathrm{Var} X^*\, \mathrm{Var} Y^*}$, and by condition (2), for the quadratic distance of the fuzzy random quantities $Y^* \oplus y^*$ and $\alpha \cdot X^* \oplus x^*$ we obtain

$$\begin{aligned} E\rho_2^2(Y^* \oplus y^*, \alpha \cdot X^* \oplus x^*) &= E\rho_2^2(Y^* \oplus \beta \cdot EX^*, \beta \cdot X^* \oplus EY^*) \\ &= E\langle Y^* \oplus \beta \cdot EX^*, Y^* \oplus \beta \cdot EX^* \rangle - 2\, E\langle Y^* \oplus \beta \cdot EX^*, \beta \cdot X^* \oplus EY^* \rangle \\ &\quad + E\langle \beta \cdot X^* \oplus EY^*, \beta \cdot X^* \oplus EY^* \rangle \\ &= E\langle Y^*, Y^* \rangle - \langle EY^*, EY^* \rangle - 2\beta \big( E\langle X^*, Y^* \rangle - \langle EX^*, EY^* \rangle \big) + \beta^2 \big( E\langle X^*, X^* \rangle - \langle EX^*, EX^* \rangle \big) \\ &= \mathrm{Var} Y^* - 2\beta\, \mathrm{Cov}(X^*, Y^*) + \beta^2\, \mathrm{Var} X^* \\ &= \mathrm{Var} Y^* - 2 \sqrt{\frac{\mathrm{Var} Y^*}{\mathrm{Var} X^*}}\, \sqrt{\mathrm{Var} X^*\, \mathrm{Var} Y^*} + \frac{\mathrm{Var} Y^*}{\mathrm{Var} X^*}\, \mathrm{Var} X^* = 0. \end{aligned}$$

Hence the fuzzy random quantities $Y^* \oplus y^*$ and $\alpha \cdot X^* \oplus x^*$ are identical $P$-almost everywhere. The sufficiency of the condition follows from (3) and conditions (3) and (5) of Lemma 25.1:
$$\begin{aligned} \mathrm{Cov}(X^*, Y^*) &= \mathrm{Cov}(X^*, Y^* \oplus y^*) = \mathrm{Cov}(X^*, \alpha \cdot X^* \oplus x^*) = \alpha\, \mathrm{Cov}(X^*, X^*) = \sqrt{\alpha^2\, \mathrm{Var} X^*\, \mathrm{Var} X^*} \\ &= \sqrt{\mathrm{Var} X^*\, \mathrm{Var}(\alpha \cdot X^* \oplus x^*)} = \sqrt{\mathrm{Var} X^*\, \mathrm{Var}(Y^* \oplus y^*)} = \sqrt{\mathrm{Var} X^*\, \mathrm{Var} Y^*}, \end{aligned}$$

i.e. $\mathrm{Corr}(X^*, Y^*) = 1$.

ad (5) The necessity follows similarly to the proof of the necessity in (4): from $\beta = \sqrt{\mathrm{Var} Y^* / \mathrm{Var} X^*}$, $\alpha = \beta$ and $y^* = \beta \cdot EX^* \oplus EY^*$ it follows that $E\rho_2^2(Y^* \oplus \alpha \cdot X^*, y^*) = E\rho_2^2(Y^* \oplus \beta \cdot X^*, \beta \cdot EX^* \oplus EY^*) = 0$. For the sufficiency, from (1) and (3) we obtain

$$\mathrm{Cov}(X^*, Y^*) = \mathrm{Cov}(X^* \oplus y^*, Y^*) = \mathrm{Cov}(X^* \oplus Y^* \oplus \alpha \cdot X^*, Y^*) = (1 + \alpha)\, \mathrm{Cov}(X^*, Y^*) + \mathrm{Var} Y^*,$$

from which follows

$$-\alpha\, \mathrm{Cov}(X^*, Y^*) = \mathrm{Var} Y^* \qquad (25.8)$$

and

$$\mathrm{Cov}(X^*, Y^*) = \mathrm{Cov}(X^*, Y^* \oplus y^*) = \mathrm{Cov}(X^*, Y^* \oplus Y^* \oplus \alpha \cdot X^*) = 2\, \mathrm{Cov}(X^*, Y^*) + \alpha\, \mathrm{Var} X^* \;\Rightarrow\; -\mathrm{Cov}(X^*, Y^*) = \alpha\, \mathrm{Var} X^*. \qquad (25.9)$$

From Equations (25.8) and (25.9) we obtain $\mathrm{Var} Y^* = \alpha^2\, \mathrm{Var} X^*$, and from Equation (25.9)

$$-\mathrm{Cov}(X^*, Y^*) = \alpha\, \mathrm{Var} X^* = \sqrt{\alpha^2\, \mathrm{Var} X^*\, \mathrm{Var} X^*} = \sqrt{\mathrm{Var} Y^*\, \mathrm{Var} X^*},$$

i.e. $\mathrm{Corr}(X^*, Y^*) = -1$.

25.4  Further results
A fuzzy stochastic quantity $X^*$ in $LR$-form is defined by four real random variables $m$, $s$, $l$ and $r$ with $P\{s \ge 0\} = P\{l \ge 0\} = P\{r \ge 0\} = 1$ and two fuzzy numbers $x^*_L, x^*_R \in F_2(\mathbb{R})$:

$$X^* = \langle m, s, l, r \rangle_{LR} := m \oplus s \cdot [-1, 1] \ominus l \cdot x^*_L \oplus r \cdot x^*_R.$$

In order to fulfill $E\|X^*\|_2^2 < \infty$ the random variables $m$, $s$, $l$ and $r$ have to be square-integrable. By the linearity of the expectation operator, the expectation $E X^*$ is a fuzzy number in $LR$-form obeying

$$E X^* = \langle Em, Es, El, Er \rangle_{LR} = Em \oplus Es \cdot [-1, 1] \ominus El \cdot x^*_L \oplus Er \cdot x^*_R.$$
The variance of this fuzzy random quantity is given by

$$\mathrm{Var} X^* = \mathrm{Var}\, m + \mathrm{Var}\, s + \|x^*_L\|_2^2\, \mathrm{Var}\, l + \|x^*_R\|_2^2\, \mathrm{Var}\, r - 2\|x^*_L\|_1 \left[ \mathrm{Cov}(m, l) - \mathrm{Cov}(s, l) \right] + 2\|x^*_R\|_1 \left[ \mathrm{Cov}(m, r) - \mathrm{Cov}(s, r) \right].$$

Many results for real random variables can be generalized to fuzzy random quantities.

Proposition 25.2 (Chebysheff's inequality): For fuzzy stochastic quantities $X^*$ with $E\|X^*\|_2^2 < \infty$ the following holds:

$$P\{\rho_2(X^*, EX^*) \ge \varepsilon\} \le \frac{\mathrm{Var}(X^*)}{\varepsilon^2} \qquad \forall\, \varepsilon > 0.$$

Proof: By the definition of the variance we have

$$\frac{\mathrm{Var}(X^*)}{\varepsilon^2} = \frac{1}{\varepsilon^2} \int_\Omega \rho_2^2\big(X^*(\omega), EX^*\big) \, P(d\omega) \ge \frac{1}{\varepsilon^2} \int_\Omega I_{\{\omega :\, \rho_2^2(X^*(\omega), EX^*) \ge \varepsilon^2\}}(\omega)\; \rho_2^2\big(X^*(\omega), EX^*\big) \, P(d\omega) \ge \int_\Omega I_{\{\omega :\, \rho_2^2(X^*(\omega), EX^*) \ge \varepsilon^2\}}(\omega) \, P(d\omega) = P\{\rho_2(X^*, EX^*) \ge \varepsilon\}.$$
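A quick exact check of the inequality in the classical special case, where $\rho_2(X, EX) = |X - EX|$ (a sketch; the discrete law below is chosen arbitrarily for illustration):

```python
import numpy as np

# Exact tail probabilities and Chebysheff bounds for a discrete law.
values = np.array([-2.0, 0.0, 1.0, 5.0])
probs = np.array([0.1, 0.4, 0.3, 0.2])

ex = np.sum(probs * values)                    # expectation
var = np.sum(probs * (values - ex) ** 2)       # variance

for eps in (0.5, 1.0, 2.0, 3.0):
    tail = np.sum(probs[np.abs(values - ex) >= eps])  # P(|X - EX| >= eps)
    print(eps, tail, var / eps ** 2)                  # tail <= bound
```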
The generalized Chebysheff inequality can be used to prove a generalized weak law of large numbers.

Theorem 25.2 (Weak law of large numbers): Let $(X^*_n)_{n \in \mathbb{N}}$ be a sequence of independent and identically distributed fuzzy random quantities obeying $E\|X^*_1\|_2^2 < \infty$. Defining $\overline{X}^*(n) = \frac{1}{n} \cdot \sum_{i=1}^n X^*_i$, the following holds:

$$\lim_{n \to \infty} P\left\{ \rho_2\big( \overline{X}^*(n), E X^*_1 \big) \ge \varepsilon \right\} = 0 \qquad \forall\, \varepsilon > 0.$$
Proof: The proof is analogous to the classical theorem, using $\mathrm{Var}\, \overline{X}^*(n) = n^{-1}\, \mathrm{Var} X^*_1$, which follows from Lemma 25.1.

Theorem 25.3 (Kolmogorov's inequality): For independent fuzzy random quantities $X^*_1, \ldots, X^*_n$ obeying $E\|X^*_i\|_2^2 < \infty$ for $i = 1(1)n$ the following inequality holds:

$$P\left\{ \max_{1 \le k \le n} \rho_2\left( \sum_{i=1}^k X^*_i,\ \sum_{i=1}^k E X^*_i \right) \ge \varepsilon \right\} \le \frac{1}{\varepsilon^2} \sum_{i=1}^n \mathrm{Var} X^*_i \qquad \forall\, \varepsilon > 0.$$

For the proof see Körner (1997a).
From the Kolmogorov inequality we obtain a generalized strong law of large numbers.

Theorem 25.4 (Strong law of large numbers): For a sequence $(X^*_n)_{n \in \mathbb{N}}$ of independent fuzzy random quantities with convergent series $\sum_{n=1}^\infty \frac{1}{n^2}\, \mathrm{Var} X^*_n$ and $E\|X^*_n\|_2^2 < \infty$ for $n \in \mathbb{N}$ the following holds:

$$P\left\{ \lim_{n \to \infty} \rho_2\left( \frac{1}{n} \cdot \sum_{i=1}^n X^*_i,\ \frac{1}{n} \cdot \sum_{i=1}^n E X^*_i \right) = 0 \right\} = 1.$$

For the proof see Körner (1997a).
26  Stochastic methods in fuzzy time series analysis

In Chapter 24 fuzzy time series are analyzed without assumptions on a stochastic model. The only stochastic element was the error component $\varepsilon^*_t$ for moving averages. In the present chapter the fuzzy observations are considered as realizations of a stochastic process whose elements are fuzzy stochastic quantities. The goal is to use dependencies between the observations for predictions of future values.
26.1  Linear approximation and prediction
A fuzzy stochastic process $(X^*_t)_{t \in T}$ is a family of fuzzy random quantities $X^*_t$, where $T$ is an index set, usually a subset of the set of real numbers $\mathbb{R}$. The first approach is looking for the 'best' linear approximation of the fuzzy random quantity $X^*_t$, $t \in T$, from the $p$ quantities $X^*_{t-1}, \ldots, X^*_{t-p}$, $p > 0$ with $t > p$. The approximation is assumed to be

$$X^*_t \approx \sum_{i=1}^p \alpha_i \cdot X^*_{t-i} \qquad (26.1)$$

with real numbers $\alpha_i$. The quality of the approximation is determined by the expectation of the squared distance between the approximated value and the observed value. This means looking for those coefficients $(\hat{\alpha}_1, \ldots, \hat{\alpha}_p) \in \mathbb{R}^p$ fulfilling

$$E\rho_2^2\left( X^*_t,\ \sum_{i=1}^p \hat{\alpha}_i \cdot X^*_{t-i} \right) = \min_{(\alpha_1, \ldots, \alpha_p) \in \mathbb{R}^p} E\rho_2^2\left( X^*_t,\ \sum_{i=1}^p \alpha_i \cdot X^*_{t-i} \right). \qquad (26.2)$$

Statistical Methods for Fuzzy Data, Reinhard Viertl. © 2011 John Wiley & Sons, Ltd.
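To connect (26.1)–(26.2) with data, here is a simulated illustration (a sketch under assumed dynamics, not the book's code): an interval-valued series with one $\delta$-cut whose centers follow an AR(1) recursion, and a grid search for the coefficient $\alpha_1 \ge 0$ minimizing the empirical version of (26.2) for $p = 1$.

```python
import numpy as np

# Simulated interval-valued fuzzy time series (one delta-cut):
# centers follow an AR(1) recursion, spreads are random fuzziness.
rng = np.random.default_rng(4)
T = 200
centers = np.zeros(T)
for t in range(1, T):
    centers[t] = 0.7 * centers[t - 1] + rng.normal(scale=0.5)
spreads = rng.uniform(0.2, 0.4, size=T)
# support vectors (s(-1), s(+1)) = (-lower, upper)
s = np.column_stack([-(centers - spreads), centers + spreads])

def criterion(a):
    # empirical E rho_2^2(X_t, a * X_{t-1}); for a >= 0 scalar
    # multiplication scales the support functions by a
    diff = s[1:] - a * s[:-1]
    return np.mean(np.sum(diff ** 2, axis=1))

grid = np.linspace(0.0, 1.0, 201)
a_hat = grid[int(np.argmin([criterion(a) for a in grid]))]
print(a_hat)    # tends to lie near the generating coefficient 0.7
```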
In the case of real-valued weakly stationary time series $(X_t)_{t \in T}$ — i.e. constant expectation $E X_t = \mu$ for all $t \in T$ and covariance $\mathrm{Cov}(X_t, X_s)$ depending only on the distance $t - s$ — with $E X_t = 0$ and $\mathrm{Cov}(X_t, X_s) = \gamma_{t-s}$, the coefficients $\alpha_i$, $i = 1(1)p$, can be obtained from the so-called Yule–Walker equations of order $p$. For details see Schlittgen and Streitberg (1984). From

$$E\left( X_t - \sum_{i=1}^p \hat{\alpha}_i X_{t-i} \right)^2 = \min_{(\alpha_1, \ldots, \alpha_p) \in \mathbb{R}^p} E\left( X_t - \sum_{i=1}^p \alpha_i X_{t-i} \right)^2, \qquad (26.3)$$

assuming the following matrix to be invertible, it follows:

$$\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_{p-1} \\ \alpha_p \end{pmatrix} = \begin{pmatrix} \gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{p-1} \\ \gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{p-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma_{p-2} & \gamma_{p-3} & \gamma_{p-4} & \cdots & \gamma_1 \\ \gamma_{p-1} & \gamma_{p-2} & \gamma_{p-3} & \cdots & \gamma_0 \end{pmatrix}^{-1} \begin{pmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_{p-1} \\ \gamma_p \end{pmatrix}$$
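The Yule–Walker system above is straightforward to solve numerically. A minimal sketch (with assumed autocovariances, not the book's code); for an AR(1) process with $\gamma(h) = \varphi^{|h|}$ the order-1 solution is $\alpha_1 = \varphi$, and the higher-order coefficients vanish.

```python
import numpy as np

def yule_walker(gamma, p):
    """Solve the Yule-Walker equations of order p.

    gamma[h] is the autocovariance at lag h, h = 0 .. p."""
    Gamma = np.array([[gamma[abs(i - j)] for j in range(p)]
                      for i in range(p)])        # Toeplitz matrix
    rhs = np.array([gamma[h] for h in range(1, p + 1)])
    return np.linalg.solve(Gamma, rhs)

phi = 0.6
gamma = [phi ** h for h in range(4)]             # AR(1) autocovariances
print(yule_walker(gamma, 1))                     # approx [0.6]
print(yule_walker(gamma, 2))                     # approx [0.6, 0.0]
```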
In the case of fuzzy data, by Equation (23.10) the criterion in Equation (26.2) is equivalent to

$$E\rho_2^2\left( X^*_t, \sum_{i=1}^p \alpha_i \cdot X^*_{t-i} \right) = E\left[ \left\langle X^*_t, X^*_t \right\rangle - 2\left\langle X^*_t, \sum_{i=1}^p \alpha_i \cdot X^*_{t-i} \right\rangle + \left\langle \sum_{i=1}^p \alpha_i \cdot X^*_{t-i},\ \sum_{i=1}^p \alpha_i \cdot X^*_{t-i} \right\rangle \right] \to \min.$$
The determination of the coefficients is complicated because the separation of the scalar product is only possible for positive coefficients, which follows from Theorem 23.2. Assuming all coefficients $\alpha_i$ to be positive, from Theorem 23.2 and Lemma 25.1 we obtain:

$$E\left[ \langle X^*_t, X^*_t \rangle - 2 \sum_{i=1}^p \alpha_i \langle X^*_t, X^*_{t-i} \rangle + \sum_{i=1}^p \sum_{j=1}^p \alpha_i \alpha_j \langle X^*_{t-i}, X^*_{t-j} \rangle \right] = E\langle X^*_t, X^*_t \rangle - 2 \sum_{i=1}^p \alpha_i\, E\langle X^*_t, X^*_{t-i} \rangle + \sum_{i=1}^p \sum_{j=1}^p \alpha_i \alpha_j\, E\langle X^*_{t-i}, X^*_{t-j} \rangle \to \min.$$
P1: SBT c26 JWST020-Viertl
November 15, 2010
11:17
Printer Name: Yet to Come
STOCHASTIC METHODS IN FUZZY TIME SERIES ANALYSIS
201
Using the notation

$$\alpha = (\alpha_1, \ldots, \alpha_p), \qquad \gamma = \left( E\langle X^*_t, X^*_{t-1} \rangle, \ldots, E\langle X^*_t, X^*_{t-p} \rangle \right)$$

and

$$\Gamma = \begin{pmatrix} E\langle X^*_{t-1}, X^*_{t-1} \rangle & E\langle X^*_{t-1}, X^*_{t-2} \rangle & \cdots & E\langle X^*_{t-1}, X^*_{t-p} \rangle \\ E\langle X^*_{t-2}, X^*_{t-1} \rangle & E\langle X^*_{t-2}, X^*_{t-2} \rangle & \cdots & E\langle X^*_{t-2}, X^*_{t-p} \rangle \\ \vdots & \vdots & \ddots & \vdots \\ E\langle X^*_{t-p}, X^*_{t-1} \rangle & E\langle X^*_{t-p}, X^*_{t-2} \rangle & \cdots & E\langle X^*_{t-p}, X^*_{t-p} \rangle \end{pmatrix},$$

where $\Gamma$ is a symmetric matrix, the problem is a minimization problem of a quadratic form,

$$E\rho_2^2\left( X^*_t, \sum_{i=1}^p \alpha_i \cdot X^*_{t-i} \right) = \alpha\, \Gamma\, \alpha^T - 2\, \gamma\, \alpha^T + E\langle X^*_t, X^*_t \rangle \to \min,$$

with side conditions $\alpha_i \ge 0$, $i = 1(1)p$. This problem is well known in economics; for details see Luptáčik (1981).

Remark 26.1: The matrix $\Gamma$ is positive semi-definite.

Proof: By the definition of the inner product $\langle \cdot, \cdot \rangle$ from Theorem 23.2, for the elements of the matrix $\Gamma$ the following holds:

$$E\langle X^*_{t-i}, X^*_{t-j} \rangle = E\left[ \int_0^1 \int_{S^0} s_{X^*_{t-i}}(u, \delta)\, s_{X^*_{t-j}}(u, \delta) \,\mu(du)\,d\delta \right], \qquad i, j = 1(1)p.$$
The functions $s_{X^*_{t-i}}(u, \delta)$ are, for all $u \in \{-1, 1\}$ and $\delta \in (0;1]$, one-dimensional random variables by Theorem 25.1. For an arbitrary vector $\alpha = (\alpha_1, \ldots, \alpha_p) \in \mathbb{R}^p$ therefore

$$\begin{aligned} \alpha\, \Gamma\, \alpha^T &= \sum_{i=1}^p \sum_{j=1}^p \alpha_i \alpha_j\, E\langle X^*_{t-i}, X^*_{t-j} \rangle = \sum_{i=1}^p \sum_{j=1}^p \alpha_i \alpha_j\, E \int_0^1 \int_{S^0} s_{X^*_{t-i}}(u, \delta)\, s_{X^*_{t-j}}(u, \delta) \,\mu(du)\,d\delta \\ &= E \int_0^1 \int_{S^0} \left( \sum_{i=1}^p \sum_{j=1}^p \alpha_i \alpha_j\, s_{X^*_{t-i}}(u, \delta)\, s_{X^*_{t-j}}(u, \delta) \right) \mu(du)\,d\delta = E \int_0^1 \int_{S^0} \left( \sum_{i=1}^p \alpha_i\, s_{X^*_{t-i}}(u, \delta) \right)^2 \mu(du)\,d\delta \ \ge\ 0. \end{aligned}$$
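Since $\Gamma$ is positive semi-definite, the constrained problem is a convex quadratic program. The following is a minimal numerical sketch (projected gradient descent — an illustration chosen here, not the book's algorithm; Luptáčik (1981) treats such quadratic programs in depth); projection onto $\{\alpha \ge 0\}$ is a simple clipping.

```python
import numpy as np

def min_quadratic_nonneg(Gamma, gamma, steps=5000, lr=0.1):
    """Minimize alpha Gamma alpha^T - 2 gamma alpha^T subject to
    alpha >= 0 by projected gradient descent (illustrative sketch)."""
    alpha = np.zeros(len(gamma))
    for _ in range(steps):
        grad = 2.0 * (Gamma @ alpha - gamma)          # gradient of objective
        alpha = np.maximum(alpha - lr * grad, 0.0)    # project onto alpha >= 0
    return alpha

Gamma = np.array([[2.0, 0.0], [0.0, 1.0]])
gamma = np.array([1.0, -1.0])
alpha = min_quadratic_nonneg(Gamma, gamma)
print(alpha)
# the unconstrained optimum is (0.5, -1); the constraint clips it to (0.5, 0)
```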
In applications the elements of the matrix $\Gamma$ are not known and have to be estimated from the data $(x^*_t)_{t \in T}$. In order to keep the number of quantities which have to be estimated reasonable, the following conditions are assumed for the fuzzy stochastic quantities $(X^*_t)_{t \in T}$:

(S1) $E X^*_t = \mu^*$ for all $t \in T$
(S2) $\sigma_{E X^*_t} = \sigma_{\mu^*} = 0$
(S3) $\mathrm{Cov}(X^*_t, X^*_s) = \gamma(t - s)$,

which means the expectation is constant with Steiner point 0, and the covariance of two different fuzzy stochastic quantities $X^*_t$ and $X^*_s$ depends on the time difference $t - s$ only. A more stringent condition instead of condition (S3) would be the following: for all $\delta \in (0;1]$, $u \in \{-1, 1\}$ and any pair of the one-dimensional random variables $s_{X^*_t}(u, \delta)$ and $s_{X^*_s}(u, \delta)$, the covariance has to fulfill $\mathrm{Cov}\big( s_{X^*_t}(u, \delta),\, s_{X^*_s}(u, \delta) \big) = \gamma_{t-s}(u, \delta)$. In the following the weaker condition (S3) is used.

By the definition of the covariance it follows that $\mathrm{Cov}(X^*_t, X^*_s) = E\langle X^*_t, X^*_s \rangle - \langle E X^*_t, E X^*_s \rangle$, so the scalar product

$$\eta(t - s) = E\langle X^*_t, X^*_s \rangle = \mathrm{Cov}(X^*_t, X^*_s) + \langle E X^*_t, E X^*_s \rangle = \gamma(t - s) + \langle \mu^*, \mu^* \rangle$$

also depends only on the time difference $t - s$. For classical time series $(X_t)_{t \in T}$ the above conditions correspond to the conditions $E X_t = 0$ and $\mathrm{Cov}(X_t, X_s) = \gamma(t - s)$, i.e. the conditions of a weakly stationary process.

In order to estimate the matrix $\Gamma$, estimates of $E\langle X^*_t, X^*_s \rangle$ are needed. First, the estimation of the expectation is considered. Similar to the standard real-valued case, the expectation $\mu^*$ is estimated by the temporal mean of the observations:

$$\hat{\mu}^* = \overline{X}^*_N = \frac{1}{N} \cdot \sum_{t=1}^N X^*_t.$$

By the linearity of the expectation operator (cf. Lemma 25.1), this mean is an unbiased estimator for $\mu^*$:

$$E\, \overline{X}^*_N = E\left( \frac{1}{N} \cdot \sum_{t=1}^N X^*_t \right) = \frac{1}{N} \cdot \sum_{t=1}^N E X^*_t = \mu^*.$$

For the variance of this estimator we obtain

$$\mathrm{Var}\, \overline{X}^*_N = \mathrm{Var}\left( \frac{1}{N} \cdot \sum_{t=1}^N X^*_t \right) = \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N \mathrm{Cov}(X^*_i, X^*_j) = \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N \gamma(i - j) = \frac{1}{N^2} \sum_{|h| < N} (N - |h|)\, \gamma(h) = \frac{1}{N} \sum_{|h| < N} \left( 1 - \frac{|h|}{N} \right) \gamma(h).$$
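The last identity can be verified exactly for any concrete autocovariance function; below is a small sketch with $\gamma(0) = 1$, $\gamma(\pm 1) = 0.4$ and $\gamma(h) = 0$ otherwise (values chosen only for illustration).

```python
# Exact check of Var(mean) = (1/N) * sum_{|h|<N} (1 - |h|/N) * gamma(h).
N = 8
gamma_vals = {0: 1.0, 1: 0.4, -1: 0.4}

def g(h):
    return gamma_vals.get(h, 0.0)

# double-sum form: (1/N^2) * sum_i sum_j gamma(i - j)
lhs = sum(g(i - j) for i in range(N) for j in range(N)) / N ** 2
# lag-weighted form
rhs = sum((1 - abs(h) / N) * g(h) for h in range(-(N - 1), N)) / N
print(lhs, rhs)     # both equal 0.2125 (up to float rounding)
```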