[Figure 17.4 plots cumulative distribution functions of VTTS; the income levels shown include 50 000 SEK/month.]
Source: Abou-Zeid et al. (2010).
Figure 17.4 Cdf of VTTS for the standard (base) model and the model with latent attitudes for three income levels expressed in Swedish Kronas per month
Hybrid choice models

3.2 Efficiency
Compared to the standard discrete choice mixture model, the behavioral mixture model enables the analyst to use indicators which improve the efficiency of the model, because they provide more information about the latent variables. In this section, we discuss efficiency in the context of a mode choice example with unobserved choice sets taken from Ben-Akiva and Boccara (1995), where the full model specification and further background are available. This example also differs from the previous example in that the latent variables representing choice sets are discrete (indicators are also available from a survey). In what follows, we describe the motivation, show the relevant equations (including how measurement equations can be specified when the latent variables are discrete), present estimation results comparing models with and without the latent variables and with or without indicators, and compare the models in terms of efficiency.

3.2.1 Motivation

A standard discrete choice model assumes that the choice set can be predicted deterministically for every individual. However, choice sets are better represented as latent variables because, in addition to observed socio-demographic variables that determine the choice set (for example, car availability, driver's license, and so on in the context of mode choice), the perceived availability of alternatives may depend on subjective factors like the individual's travel attitudes and perceptions of the attributes of the modes. The approach used to model choice set generation is based on the concept of random constraints discussed earlier (Swait and Ben-Akiva, 1987a, 1987b).
It is postulated that an individual perceives a certain alternative to be available only if a number of individual-specific constraints related to that alternative are satisfied (for example, in the case of transit, the constraints may be related to walking distance to the bus stop, travel time, and so on, and they are satisfied when the corresponding variables exceed certain individual-specific latent thresholds). Since different individuals may have different availability criteria, these constraints or criteria are latent, and so the availability of an alternative is also latent.

3.2.2 General formulation

Choice set (class membership) model

Let K_i denote the set of constraints related to the availability of alternative i, H_kin the value of the kth criterion or constraint for alternative i and individual n, and A*_in the latent availability of alternative i for individual n (A*_in is equal to 1 if the alternative is available and 0 otherwise). H_kin, which was given in equation (17.15) in a generic form, can be expressed as the difference of a systematic part h_kin and a random part ω_kin as follows:

H_kin = h_kin − ω_kin
(17.26)
The probability that alternative i is available for individual n can then be expressed as the probability that all constraints related to alternative i are satisfied:

Pr(A*_in = 1) = Pr(H_kin ≥ 0, ∀k ∈ K_i)
(17.27)
Stephane Hess and Andrew Daly - 9781781003145 Downloaded from Elgar Online at 04/07/2015 06:53:11AM via VRIJE UNIVERSITEIT AMSTERDAM (FREE)
Handbook of choice modelling
Conditional on having a non-empty choice set, the probability that choice set C is considered by individual n can then be expressed as follows (see Ben-Akiva and Boccara, 1995):

P_n(C) = Pr({A*_in = 1, ∀i ∈ C} ∩ {A*_jn = 0, ∀j ∈ M_n\C}) / [1 − Pr(A*_ln = 0, ∀l ∈ M_n)]
(17.28)
where M_n represents the set of all deterministically feasible alternatives for individual n, and M_n\C contains the alternatives that are in M_n but not in C. In the latent class terminology of section 2.1.2, equation (17.28) represents the class membership model Q(s | X_n; α, Σ_w) (with a certain choice set C representing a class), which is given behavioral meaning using the random constraints approach.

Choice probability

The unconditional choice probability P_n(i) is obtained by mixing the conditional choice probability P_n(i | C) given a choice set over the probability distribution of the choice sets as follows:

P_n(i) = Σ_{C ∈ G_n} P_n(i | C) P_n(C)   (17.29)
where G_n represents the set of all non-empty subsets of M_n.

Introducing indicators of alternative availabilities

Indicators of the availability of the alternatives can be obtained from a survey using an ordinal scale of availability (for example, 'never available' to 'always available') or a binary scale (available or not). Since perceived availability may be related to actual availability as well as to the desirability (for example, the utility or the choice probability) of an alternative, these indicators can be expressed as follows:

I*_n = I*(H_n, U_n) + υ_n
(17.30)
where I*_n denotes a vector of latent response variables underlying the ordinal or binary observed availability indicators I_n, H_n is the matrix of all latent constraints considered by individual n, U_n is the vector of utilities of all alternatives considered by individual n, and υ_n is a vector of error terms. The log-likelihood for individual n, given both the choice and the indicators, is expressed by considering all possible combinations of responses to the availability questions and the actual choice:

L_n = Σ_{i=1}^{J} [I_in y_in ln P_n(i) + I_in (1 − y_in) ln Pr(I_in = 1, y_in = 0) + (1 − I_in)(1 − y_in) ln Pr(I_in = 0, y_in = 0)]   (17.31)
where J represents the number of alternatives in the universal choice set and Iin is equal to 1 if the individual stated that alternative i is available and 0 otherwise. The relevant expressions are available in Boccara (1989).
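To make the choice set mechanics concrete, the following sketch (not from the chapter) enumerates all non-empty choice sets and applies equations (17.28) and (17.29), under the simplifying assumption that the alternative availabilities are independent; the utilities and availability probabilities below are invented.

```python
from itertools import combinations
import math

def choice_set_probs(avail):
    """Pr(C) for every non-empty choice set C (equation (17.28)), assuming
    the alternative availabilities A*_i are independent. avail maps
    alternative -> Pr(A*_i = 1)."""
    alts = list(avail)
    p_empty = math.prod(1.0 - avail[i] for i in alts)
    out = {}
    for r in range(1, len(alts) + 1):
        for C in combinations(alts, r):
            p = math.prod(avail[i] if i in C else 1.0 - avail[i] for i in alts)
            out[C] = p / (1.0 - p_empty)   # condition on a non-empty choice set
    return out

def logit_in_set(V, C):
    """Conditional logit probabilities P_n(i | C) within choice set C."""
    m = max(V[i] for i in C)
    e = {i: math.exp(V[i] - m) for i in C}
    s = sum(e.values())
    return {i: e[i] / s for i in C}

def unconditional_prob(i, V, avail):
    """P_n(i) = sum over C of P_n(i | C) P_n(C), equation (17.29)."""
    return sum(logit_in_set(V, C)[i] * pC
               for C, pC in choice_set_probs(avail).items() if i in C)

# Invented utilities and perceived availabilities for the three modes
V = {"DA": 0.5, "SR": 0.0, "T": -0.3}
avail = {"DA": 0.9, "SR": 0.95, "T": 0.6}
P = {i: unconditional_prob(i, V, avail) for i in V}
print(P)   # the three probabilities sum to 1
```

Note that in the Ben-Akiva and Boccara formulation the availabilities may be correlated through shared constraints; the product form above holds only under the independence assumption stated in the lead-in.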
3.2.3 Estimation

The above framework was applied to model mode choice among drive alone (DA), shared ride (SR), and transit (T). The dataset included the following binary indicators related to alternative availabilities: (i) Is drive alone available for your trip? (ii) Is shared ride available for your trip? (iii) Is transit available for your trip? The following three models were estimated:

● a logit choice model with deterministic choice sets (DA unavailable if the individual has no driver's license; SR and T always available)
● a probabilistic choice set model (PCS) combining a choice set model (choice sets including: {DA, SR, T}, {DA, SR}, {SR}, {SR, T} and {T}) and a logit choice model conditional on the choice set
● an integrated model which combines the PCS with measurement equations expressing the availability indicators as a function of the latent availabilities and desirability of the alternatives, that is, a PCS model with indicators.
All three models have the same utility specifications with the following attributes and characteristics: in-vehicle travel time, out-of-vehicle travel time divided by distance by auto, cost, number of cars divided by number of driving-age individuals in the household, and walking distance to transit. Overall, eight parameters in the utilities are estimated.

In the PCS and integrated models, one random constraint is used for each mode i as shown in equation (17.32), where a_1i and a_2i are parameters to be estimated, and the disturbance ω_1in is assumed to be logistic with a location of 0 and a scale of 1. It is assumed that DA is perceived to be available if the number of cars divided by the number of driving-age individuals in the household (x_1in in equation (17.32)) exceeds a certain threshold, while transit is available if the walking distance to transit (x_1in in equation (17.32)) is below a certain threshold. It is assumed that SR is available when DA is available; conditional on DA being unavailable, the systematic part of the SR constraint consists of a constant term only. Thus, five additional parameters are to be estimated in the random constraints equations (that is, a_1 and a_2 for each of DA and transit, and a_1 for SR):

H_1in = a_1i + a_2i x_1in − ω_1in
(17.32)
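With a logistic disturbance of location 0 and scale 1, the availability probability implied by equation (17.32) is simply the logistic CDF of the systematic part. A minimal sketch with made-up parameter values:

```python
import math

def availability_prob(a1, a2, x):
    """Pr(A*_in = 1) for the single random constraint of equation (17.32):
    H = a1 + a2*x - w >= 0 with w ~ logistic(0, 1), i.e. the logistic CDF
    evaluated at the systematic part a1 + a2*x."""
    return 1.0 / (1.0 + math.exp(-(a1 + a2 * x)))

# Illustrative (invented) thresholds: a positive a2 makes DA more available
# as the car ratio rises; a negative a2 makes transit less available as the
# walking distance grows.
print(availability_prob(-2.0, 4.0, 1.0))   # high car ratio: DA likely available
print(availability_prob(1.5, -2.0, 2.0))   # 2-mile walk: transit likely unavailable
```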
Finally, in the integrated (PCS plus indicators) model, the measurement equation takes the following form:

I*_in = (λ_1i + λ_2i P_in) A*_in + (λ_3i + λ_4i P_in)(1 − A*_in) − υ_in
(17.33)
where P_in denotes the probability that individual n chooses alternative i from the universal choice set, and λ_1i, λ_2i, λ_3i, and λ_4i are parameters to be estimated. Thus, compared to the PCS model, the integrated model contains four additional parameters per alternative.

Table 17.1 shows the parameter estimates and t-statistics of the parameters in the structural equations of the three models: the utility equations, and the random constraints in the PCS and integrated models. The coefficients of the measurement equations are not shown. There seems to be a difference in magnitude between the parameter estimates of the
Table 17.1 Estimation results for the logit, PCS, and integrated models

                                                       Logit              PCS                Integrated (PCS with indicators)
Variable                                               est.     t-stat    est.     t-stat    est.     t-stat
Utilities
  Constant for DA                                      −1.61    −3.63     −4.83    −1.90     −7.85    −4.20
  Constant for SR                                      −2.80    −6.75     −4.83    −2.15     −7.40    −4.53
  In-vehicle travel time (min)                         −0.48    −3.20     −0.23    −2.17     −0.25    −5.85
  Out-of-vehicle travel time/distance (min/miles)      −0.84    −3.76     −5.66    −3.18     −7.29    −5.85
  Cost (cents)                                         −0.18    −0.77      0.02     0.29      0.04     1.02
  Number of cars/number of driving-age
    individuals (specific to DA)                        4.25     6.39     −0.44    −0.24      1.48     1.19
  Number of cars/number of driving-age
    individuals (specific to SR)                        3.86     5.85     −1.06    −0.57      0.43     0.51
  Distance to transit (miles; specific to transit)     −1.07    −1.61      1.02     0.18     −2.90    −1.58
Availability constraints
  DA: constant (a1)                                     –        –         6.61     1.72      8.01     5.36
  DA: number of cars/number of driving-age
    individuals (a2)                                    –        –        16.05     4.89     18.88     5.99
  SR: constant (a1)                                     –        –         7.11     2.47      5.08     3.81
  Transit: constant (a1)                                –        –         1.19     1.99      1.05     5.18
  Transit: distance to transit (miles) (a2)             –        –         1.68     1.84      0.75     0.62
Source: Ben-Akiva and Boccara (1995).
logit model and those of the PCS model, possibly reflecting a scale effect. The parameter estimates of the PCS model and those of the integrated model are closer in magnitude. The signs are generally in line with expectations, and where they are not, the corresponding variables are insignificant. One interesting result is that the car availability variable has a positive coefficient and is highly significant in the utility equation of the logit model, but not in the PCS and integrated models. In the latter two models, this variable is instead a highly significant predictor of the perceived availability of DA and SR, as one would expect.

3.2.4 Efficiency

Using indicators of the latent variables in a HCM adds to the information content of the data and is expected to result in a gain in efficiency if the measurement equations are
correctly specified. Referring to the example considered in this section, one can expect the integrated model, which has indicators, to be more efficient than the PCS model. This gain in efficiency can be demonstrated by comparing the variance-covariance matrices of the parameter estimators. Considering the estimators of the common set of 13 parameters in the structural equations of the PCS and integrated models, denoted as β̂_PCS and β̂_Integrated, respectively, the difference in their variance-covariance matrices, that is, Σ̂_β̂_PCS − Σ̂_β̂_Integrated, turns out to be positive definite (all its eigenvalues are positive), which shows that the integrated model is more efficient.

In addition, one can compare the t-statistics of the parameter estimates. If both models are consistent (see below), their parameter estimates should be close to each other, yet the more efficient model will have lower standard errors and hence higher t-statistics. Referring to Table 17.1, the integrated model is more efficient, as indicated by the higher t-statistics of the parameter estimates in the integrated model for those variables that are common and significant in both models.

Finally, one can compare the standard errors of the predicted choice probabilities of the HCM and a mixture model without indicators. A more efficient model will result in smaller standard errors. For a given individual, the distribution of the choice probabilities can be simulated by drawing from the multivariate distribution of the parameter estimators (using the parameter estimates as their means and the estimated variance-covariance matrix).²

Having established that the integrated model is more efficient than the PCS, a Hausman specification test (Hausman, 1978) can be used to check for misspecification of the integrated model. The null hypothesis is that the difference in the parameter estimates of these two models is zero. The test is conducted using the common set of 13 parameters in the structural equations of the two models.
The test statistic is:

(β̂_PCS − β̂_Integrated)′ (Σ̂_β̂_PCS − Σ̂_β̂_Integrated)⁻¹ (β̂_PCS − β̂_Integrated)
(17.34)
This test statistic is chi-squared distributed with 13 degrees of freedom. The value of the above statistic turns out to be 3.61, which is smaller than the critical value of 7.04 at the 90 percent level of confidence. Therefore, the null hypothesis cannot be rejected, indicating that the parameter estimators of the integrated model are consistent.
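The efficiency check and the Hausman statistic of equation (17.34) are straightforward to compute with NumPy. In this sketch (not from the chapter), all parameter vectors and covariance matrices are invented two-parameter examples:

```python
import numpy as np

def efficiency_and_hausman(b_pcs, b_int, cov_pcs, cov_int):
    """Eigenvalues of cov_pcs - cov_int (all positive means the integrated
    model is more efficient) and the Hausman statistic of equation (17.34)."""
    d_cov = cov_pcs - cov_int
    eig = np.linalg.eigvalsh(d_cov)
    d = b_pcs - b_int
    stat = float(d @ np.linalg.inv(d_cov) @ d)
    return eig, stat

# Invented estimates: the 'integrated' covariance is uniformly smaller,
# so the difference matrix should be positive definite.
b_pcs = np.array([1.00, -0.50])
b_int = np.array([1.05, -0.48])
cov_pcs = np.array([[0.10, 0.010], [0.010, 0.08]])
cov_int = np.array([[0.04, 0.005], [0.005, 0.03]])
eig, stat = efficiency_and_hausman(b_pcs, b_int, cov_pcs, cov_int)
print(eig, stat)   # positive eigenvalues; compare stat to a chi-squared critical value
```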
3.3 Behavioral Realism
The standard discrete choice model has been criticized on the grounds that it is too simplistic to adequately model behavior. It can be viewed as a black box that maps observed inputs into observed choices through a preference function represented by the utility. The actual decision-making process involves several stages, including awareness/ knowledge of opportunities and attributes of alternatives, formation of perceptions and cognitive and affective attitudes, and plans or intentions for implementing a certain behavior (McFadden, 1999). It may also be affected by subjective norms (Ajzen, 1991) or other contextual factors related to the behavior of others (Ben-Akiva et al., 2012). These factors affecting individuals’ preferences are latent or unobserved by the analyst. The standard discrete choice model would gain behavioral richness by explaining observed behavior as a function of these latent factors. A recent example by Theis (2011) illustrates how the behavioral realism of airline
itinerary choice models can be enhanced by modeling preferences for connecting time between flights as a function of attitudes towards the risk of misconnection, rush aversion, and trust in airlines' scheduling abilities, thus representing more transparently how people make such decisions. This section discusses the motivation, data, and model for this case study.

3.3.1 Motivation

Theis (2011) postulated that, contrary to the standard assumption, used by airlines as a basis for scheduling flights, that individuals prefer the minimum possible connection time, individuals may often prefer to have some additional buffer time (beyond the minimum connection time set by an airport). This preference is given behavioral meaning by explaining it as a function of passengers' attitudes. In particular, if passengers fear the risk of missing their connecting flight, if they do not like to be rushed through an airport terminal, and if they have low trust in airlines' abilities to provide reliable connections, they are more likely to prefer some buffer time. This hypothesis is tested by explicitly incorporating passengers' attitudes towards risk, rush, and trust in a model predicting their itinerary choice.

3.3.2 Data

The dataset used in the study by Theis (2011) is obtained from an SP survey. Every respondent was presented with eight choice experiments, each involving the choice between the respondent's recent US domestic air trip (on which information was obtained) and an alternative flight itinerary. The attributes presented for each itinerary include: airline, aircraft type, departure airport, departure time, arrival airport, arrival time, layover (or connection) time including information about the minimum connection time required by the airport, number of connections, on-time performance (percentage of similar flights that are on time) and round-trip fare.
The on-time performance attribute was included to avoid bias towards choosing the recent trip itinerary. A snapshot of a choice experiment is shown in Figure 17.5. Individuals' preferences regarding specific airports and airlines were also collected to help in the design of the SP survey. Socio-demographic characteristics, number of US domestic air trips made in the last year, membership in frequent flyer programs for all ranked airlines, and information on whether an individual missed a connecting flight in the past two years were also collected. Finally, respondents' attitudes towards risk, rush and trust were measured by asking them to rate their level of agreement with the statements shown in Table 17.2 on a five-point Likert scale with response categories 'strongly disagree', 'somewhat disagree', 'neither agree nor disagree', 'somewhat agree' and 'strongly agree'.

3.3.3 Model framework and specification

Framework

The modeling framework is an integrated choice and latent variable model, as shown in Figure 17.6. The utility of a flight itinerary alternative is a function of attributes of the itinerary such as the fare, the airline and the buffer time; characteristics of the traveler such as gender, age, income and trip purpose; and the attitudes of risk, rush and trust, interacted with attributes of the itinerary such as buffer time and number of connections.
[Figure 17.5 shows a sample SP choice experiment for a trip to Jacksonville, FL. The respondent's current flight (Delta regional jet from Logan International Airport, Boston, departing 8:00 AM, arriving Jacksonville International 12:00 PM, 1 hr layover, 4 hrs total, 1 connection, 80% on-time, $250 round trip) is compared with an alternate flight (Continental standard jet from Burlington International Airport, Burlington VT, departing 5:00 PM, arriving Jacksonville International 10:00 PM, 40 min layover, 5 hrs total, 1 connection, 90% on-time, $188 round trip); both layovers are shown against the connecting airport's 40-minute minimum connection time.]

Source: Theis (2011).

Figure 17.5 SP experiment example

Table 17.2 Attitudinal statements

Indicator  Description
I1    I like to take my time when connecting between flights.
I2    It's hard for me to find my way through airports.
I3    I don't think time at airports is wasted because I can shop, eat, or work at airports.
I4    I don't mind being rushed at a connecting airport if this means I'll arrive at my final destination earlier.
I5    I enjoy having extra time at airports.
I6    I usually arrive at the check-in counter just before the check-in deadline.
I7    Catching my scheduled connecting flight is of great importance to me.
I8    I try to avoid short connections because of the risk of either me or my luggage missing the connecting flight.
I9    Given two itineraries that only differ in connecting time, I always choose the one with shorter connecting time.
I10   I'm willing to accept the risk of a missed connection if this gets me earlier to my destination most of the time.
I11   Airlines only sell connections that they expect passengers could make.
I12   Airlines sometimes underestimate the time needed to connect between flights.
I13   It is the passenger's responsibility to plan for a sufficient transfer time when booking a connecting itinerary.
I14   I make sure that the planned connecting time is adequate for me when booking a connecting itinerary.
[Figure 17.6 is a path diagram of the integrated choice and latent variable model: characteristics of the traveler (employment, age, income, trip duration, party size, who paid, distribution channel, trip purpose, frequency of trips, frequent flyer program level, gender, missed connections) enter the structural equations of the three latent attitudes RISK, RUSH and TRUST; the attitudes are measured by the indicators I1 to I14 and by rating exercise responses, and interact with attributes of the alternative (preferred airlines, most and second preferred airports, night departure, access time, elapsed time, minimum connecting time, buffer time, number of connections, fare, on-time performance, bags) in the utility U, which is measured by the choice y.]

Source: Theis (2011).

Figure 17.6 Integrated choice and latent variable model of airline itinerary choice

The interaction of an attitude such as rush with an attribute such as buffer time captures the varying sensitivity to buffer time as a function of the degree of rush aversion. The latent variables are functions of the characteristics of the traveler. The attitudinal indicators collected in the survey are used as indicators of the latent variables. The selection of specific indicators for a given attitude depends on a combination of the researcher's judgment of the correspondence of these statements with the attitude and the estimated factor loadings and their statistical significance; larger loadings correspond to a stronger relationship between the attitude and the corresponding indicator. Finally, the utility is measured by the choice.

We show below how the latent variables enter the utility equations, the form of the measurement equations of the indicators, the distributions of the disturbances and error terms in the various equations, and the likelihood function expression. All the structural and measurement equations are linear in the parameters. Note that indicator I7 is excluded from the model formulation and estimation results shown below.

Formulation

The latent variables enter the utility equation of alternative i as follows (the reference to an individual n is implicit and omitted for simplicity):
Stephane Hess and Andrew Daly - 9781781003145 Downloaded from Elgar Online at 04/07/2015 06:53:11AM via VRIJE UNIVERSITEIT AMSTERDAM (FREE)
Hybrid choice models
405
U_i = Ṽ_i + Rush × (β_16 Buffer time_i<15 min + β_17 Buffer time_i 15–59 min + β_18 Buffer time_i>60 min) + (β_19 Risk + β_20 Trust) × Number of connections_i + ε_i   (17.35)

where Ṽ_i denotes the systematic part of the utility excluding the latent variables, and Buffer time_i is the additional connecting time in minutes associated with itinerary i beyond the minimum connecting time. The disutility of buffer time is specified as a piecewise linear function with two breakpoints, at 15 and 60 minutes, and the three ranges of buffer time are defined as follows:

Buffer time_i<15 min = min(Buffer time_i, 15)
(17.36)
Buffer time_i 15–59 min = max(0, min(Buffer time_i − 15, 45))
(17.37)
Buffer time_i>60 min = max(0, Buffer time_i − 60)
(17.38)
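The three transforms in equations (17.36) to (17.38) translate directly into code; this sketch simply restates them:

```python
def buffer_time_pieces(buffer_time):
    """Split buffer time (minutes) into the three piecewise-linear variables
    of equations (17.36)-(17.38)."""
    under_15 = min(buffer_time, 15)
    from_15_to_59 = max(0, min(buffer_time - 15, 45))
    over_60 = max(0, buffer_time - 60)
    return under_15, from_15_to_59, over_60

for bt in (10, 30, 90):
    print(bt, buffer_time_pieces(bt))
# 10 -> (10, 0, 0); 30 -> (15, 15, 0); 90 -> (15, 45, 30)
```

Note that for any buffer time above 60 minutes the three pieces sum back to the original buffer time, so the piecewise specification is continuous.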
The buffer time variables and the number of connections variable are additionally included in the systematic utility without interaction with the latent variables. The disturbances in the utility equations of the flight itineraries are i.i.d. extreme value type I (0,1). The disturbances in the structural equations of the attitudes are i.i.d. normal (0,1); their variances are fixed at 1 to set their scale. The indicators are modeled as continuous variables for simplicity, and every indicator I_r, r = 1,...,6, 8,...,14, is expressed as a function of one or more latent variables as follows:

I_r = κ_r + λ_r1 Risk + λ_r2 Rush + λ_r3 Trust + υ_r,   υ_r ~ N(0, σ²_υr)
(17.39)
where κ_r is a constant and λ_r1, λ_r2, and λ_r3 are parameters to be estimated (some of which are fixed at 0). The error terms υ are assumed to be multivariate normally distributed with a diagonal variance-covariance matrix Σ_υ. For a given individual, the joint probability of the choice and the 13 indicators is expressed as the product of their conditional probabilities, integrated over the joint density function of the three latent variables as follows:

P(y, I_1,...,I_6, I_8,...,I_14 | X; β, α, λ, κ, Σ_ε, Σ_ω, Σ_υ) = ∫∫∫ P(y | X, Risk, Rush, Trust; β, Σ_ε) g(I_1,...,I_6, I_8,...,I_14 | Risk, Rush, Trust; λ, κ, Σ_υ) f(Risk, Rush, Trust | X; α, Σ_ω) dRisk dRush dTrust
(17.40)
The conditional choice probability is a logit model. The joint density function of the attitudinal indicators is expressed as follows:

g(I_1,...,I_6, I_8,...,I_14 | Risk, Rush, Trust; λ, κ, Σ_υ) = ∏_{r=1,...,6,8,...,14} (1/σ_υr) φ((I_r − κ_r − λ_r1 Risk − λ_r2 Rush − λ_r3 Trust)/σ_υr)
(17.41)
The product on the right-hand side of equation (17.41) does not include a term for the seventh indicator. The joint density function of the latent attitudes is expressed as follows:

f(Risk, Rush, Trust | X; α, Σ_ω) = (1/σ_ω,Risk) φ((Risk − h_Risk(X;α))/σ_ω,Risk) × (1/σ_ω,Rush) φ((Rush − h_Rush(X;α))/σ_ω,Rush) × (1/σ_ω,Trust) φ((Trust − h_Trust(X;α))/σ_ω,Trust)
= φ(Risk − h_Risk(X;α)) φ(Rush − h_Rush(X;α)) φ(Trust − h_Trust(X;α))   (17.42)

In the above expression, h_Risk(X;α), h_Rush(X;α), and h_Trust(X;α) represent the systematic parts of the structural equations of the attitudes Risk, Rush and Trust, respectively. The second equality follows from the normalization σ_ω,Risk = σ_ω,Rush = σ_ω,Trust = 1.
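In practice the triple integral in equation (17.40) is approximated by simulation. The following sketch (not from the chapter) averages the product of a logit choice probability and the indicator densities of equation (17.41) over normal draws of the attitudes; the systematic parts h(X; α) are set to 0 purely for brevity, and all alternatives, loadings, and indicator values are invented.

```python
import math, random

def normal_pdf(x, sd):
    """Density of N(0, sd^2) evaluated at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def simulated_likelihood(chosen, V, load, indicators, meas, n_draws=2000, seed=1):
    """Monte Carlo approximation of the integral in equation (17.40).
    chosen     : key of the chosen alternative
    V          : alt -> systematic utility excluding latent terms
    load       : alt -> (c_risk, c_rush, c_trust) latent loadings in the utility
    indicators : observed continuous indicator values
    meas       : per indicator (kappa, l_risk, l_rush, l_trust, sigma)"""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        risk, rush, trust = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        # logit choice probability conditional on the latent draws
        u = {i: V[i] + c[0] * risk + c[1] * rush + c[2] * trust
             for i, c in load.items()}
        m = max(u.values())
        p_choice = math.exp(u[chosen] - m) / sum(math.exp(v - m) for v in u.values())
        # indicator density, equation (17.41)
        g = 1.0
        for I_r, (kappa, l1, l2, l3, sd) in zip(indicators, meas):
            g *= normal_pdf(I_r - kappa - l1 * risk - l2 * rush - l3 * trust, sd)
        total += p_choice * g
    return total / n_draws

# Tiny invented example: two itineraries, two indicators
V = {"A": 0.2, "B": 0.0}
load = {"A": (0.1, -0.3, 0.05), "B": (0.0, 0.0, 0.0)}
L = simulated_likelihood("A", V, load, [3.1, 2.4],
                         [(3.0, 0.0, 0.5, 0.0, 1.0), (2.0, 0.4, 0.0, 0.2, 1.0)])
print(L)   # a small positive simulated likelihood value
```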
3.3.4 Estimation

Table 17.3 shows the estimation results for the parameters related to the following variables in the utility equations: number of connections, minimum connecting time, buffer time, and the interactions with the latent variables. While an increasing number of connections and an increasing minimum connecting time decrease the utility of a flight alternative, passengers who are rush averse may gain utility from the first 15 minutes of buffer time beyond the minimum connecting time, after which additional buffer time causes disutility. Figure 17.7 shows the utility of buffer time for two values of the latent variable Rush: a low value of −0.410 and a higher value of 0.0124 (the median in the sample³). Individuals with a rush value of −0.410 derive zero utility from the first 15 minutes of buffer time, after which utility decreases monotonically; individuals with a rush value smaller than −0.410 have a monotonically decreasing utility function.

Table 17.3 Estimation results (part of the utility equations) for the airline itinerary choice model with latent variables

Variable                                           Parameter estimate   Standard error   t-statistic
Number of connections                                  −0.418               0.132           −3.2
Minimum connecting time (in min)                       −0.00656             0.003           −2.1
Buffer time < 15 min (in min)                           0.0113              0.005            2.4
Buffer time 15–59 min (in min)                         −0.00397             0.002           −1.9
Buffer time > 60 min (in min)                          −0.00141             0.002           −0.9
Interactions
Buffer time < 15 min (in min) × rush aversion           0.0193              0.006            3.5
Buffer time 15–59 min (in min) × rush aversion         −0.00671             0.003           −1.9
Buffer time > 60 min (in min) × rush aversion           0.00117             0.001            0.9
Number of connections × risk tolerance                  0.0107              0.065            1.7
Number of connections × trust                           0.0720              0.072            1.0
Source: Theis (2011).
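Using the Table 17.3 coefficients, the buffer-time part of the utility can be traced for different rush values. Note that this sketch covers only the buffer-time portion of the utility (main effects plus rush interactions), so it illustrates the shape behind Figure 17.7 rather than reproducing it exactly; the full specification contains further terms.

```python
B_MAIN = {"u15": 0.0113, "m59": -0.00397, "o60": -0.00141}   # Table 17.3 main effects
B_RUSH = {"u15": 0.0193, "m59": -0.00671, "o60": 0.00117}    # rush interactions

def buffer_utility(buffer_time, rush):
    """Buffer-time contribution to utility for a given rush-aversion value,
    combining the piecewise variables of equations (17.36)-(17.38) with the
    main effects and rush interactions of Table 17.3."""
    pieces = {"u15": min(buffer_time, 15),
              "m59": max(0, min(buffer_time - 15, 45)),
              "o60": max(0, buffer_time - 60)}
    return sum((B_MAIN[k] + B_RUSH[k] * rush) * pieces[k] for k in pieces)

for rush in (-0.410, 0.0124):
    print(rush, [round(buffer_utility(bt, rush), 3) for bt in (0, 15, 60, 120)])
```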
[Figure 17.7 plots the utility of buffer time from 0 to 140 minutes for two rush aversion levels, −0.410 and 0.0124.]

Source: Theis (2011).

Figure 17.7 Buffer time utility for different rush aversion levels

Individuals with a rush value of 0.0124 gain utility as buffer time increases from 0 to 15 minutes, but further increases in buffer time cause disutility. Higher levels of risk tolerance and trust in airlines' schedule reliability decrease the disutility caused by the number of connections, as indicated by the positive coefficients for the interactions between these latent variables and the number of connections. This makes sense: passengers who tolerate risk are less likely to be annoyed by having more connections than risk-averse passengers, and similarly for the level of trust. To conclude, including the attitudes of rush, risk and trust helps explain the non-monotonicity of the disutility of buffer time through variation in the values of these attitudes. As for the attitudes themselves, the variables that were statistically significant in all three attitudinal structural equations were gender, having elite status on any airline, having missed a connection in the past 12 months and whether the trip is paid for by the individual's company, but the explanatory power of the attitude equations was low (pseudo-R² ranged from 0.07 to 0.14).

3.4 Policy Relevance

Modeling the influence of latent variables on behavior is likely to make a significant difference in the accuracy of predictions, the design of effective policies, and the appraisal of policies and projects, for several reasons. First, the HCM allows the analyst to segment people by latent variables such as attitudes and satisfaction; the importance of market segmentation by customer attitudes has long been recognized in the marketing literature as a policy tool for marketing products and services differently to different market segments (for example, Anable, 2005; Proussaloglou et al., 2001; Shiftan et al., 2008), thus allowing for greater customer satisfaction and potentially greater revenues.
The airline itinerary choice example illustrates the advantages of explicitly modeling passengers’ attitudes both from the perspective of the passengers themselves (having better options) and the airline that may be able to reduce its costs.
Second, explicitly modeling the latent variables is likely to lead to better predictions of the impacts of policies when the latent variables are important predictors of the choice and when there is significant heterogeneity in the latent variables across the population. This is illustrated by the value of time example and the latent choice set example. Ashok et al. (2002) also show that when the latent variables are important predictors of the choice yet are misspecified (for example, using the indicators directly in the utility equations, or using fitted latent variables from a factor analysis model without accounting for their distribution), the results can be misleading from a policy perspective.

Third, by explicitly modeling the determinants of the latent variables, one can test policies that may impact the choice indirectly through their influence on the latent variables. This effect is discussed below in the context of the latent choice set example. Another example may be a change in the transportation system, such as the introduction of a new rail system, which may influence people's attitudes towards travel modes (see, for example, Yáñez et al., 2010).

3.4.1 Airline itinerary choice

The airline itinerary choice case study (Theis, 2011) illustrates that individuals have varying preferences for connecting time based on their level of rush aversion, which is itself a function of several socio-economic variables and past experiences. This finding has several policy implications that airlines can capitalize on to improve their flight schedules in a way that better aligns with passengers' preferences, especially those passengers who favor some extra buffer time beyond the minimum connecting time.
First, airlines can enhance distribution channel displays by offering more choices (for example, with longer connecting times) to customers booking their itineraries online, possibly based on a customer’s socio-demographics, which influence attitudes, if the customer is identified, or by giving a warning if a customer selects a flight with a short connecting time. Second, airlines can change their default sorting of flights shown to a customer so that it is not necessarily in increasing order of elapsed time. Third, if airlines de-peak their timetables, which results in an increase in connecting times, they can save on operational costs owing to more effective use of resources (gates, ground equipment, and so on). Overall, airlines can benefit from longer connecting times (beyond the minimum possible) as this reduces irregularity costs (such as misconnection follow-up costs and passenger goodwill) for the airlines. Airlines can also increase their revenues if they are able to charge money for additional connecting time, since at least a segment of the population has a positive willingness to pay for additional buffer time.

3.4.2 Value of travel time savings
In standard appraisal methods of transportation projects, travel time savings represent the major category of benefits. These savings are monetized by using an estimate of the value of time. A richer representation of the value of travel time savings as a function of attitudes would lead to better estimates of the VTTS and consequently of the benefits of new transportation projects.

3.4.3 Mode choice with latent choice sets
The mode choice with latent choice sets example (Ben-Akiva and Boccara, 1995) also illustrates that there are several advantages from the explicit representation of latent
Hybrid choice models
409
Table 17.4 Prediction results for the logit and PCS models

                                                                  Logit     PCS
Scenario 1: 100% increase in DA and SR in-vehicle travel time
  Change in share of DA                                          −34.4%   −7.1%
  Change in share of SR                                          −10.5%  −10.2%
  Change in share of T                                           +44.9%  +17.3%
Scenario 2: 100% increase in transit out-of-vehicle travel time
  Change in share of DA                                           +2.3%   +2.9%
  Change in share of SR                                           +1.8%   +0.8%
  Change in share of T                                            −4.1%   −3.7%

Source: Ben-Akiva and Boccara (1995).
choice sets from a marketing perspective. One advantage has to do with the prediction of the impact of advertisements, promotions, and so on. Including these variables in the utility equations directly, as is typically done, is not desirable because such factors do not alter the utility of the product. The causality instead is at the level of the choice set through, for example, an increase in awareness about the products available in the market. Moreover, if the latent choice set model contains information capturing consumer captivity or loyalty to certain brands or products, specific marketing plans can then be customized to certain consumer segments. Another advantage of explicitly modeling the choice set is greater predictive power when there is significant heterogeneity in the choice set across consumers in the market. An example of differences in predictive power is shown next.

Since the integrated and PCS models are consistent with each other, as was shown in section 3.2.4, it is sufficient to compare the predictive power of one of them to that of the logit model. Table 17.4 shows the percentage change in modal shares for two scenarios: a 100 percent increase in DA and SR in-vehicle travel time, and a 100 percent increase in transit out-of-vehicle travel time. The logit model predicts larger changes in mode shares than the PCS model for changes in attributes that do not influence the choice set; this is because in the PCS model, any such changes in attributes for alternatives that are unavailable to a given individual make no difference in the individual’s choice probabilities, while they do in the logit model. On the other hand, if the scenario involves a variable that influences the alternative availabilities (for example, number of cars), the modal share changes predicted by the PCS model are expected to be larger than those predicted by the logit model. Thus, the predictive power of the PCS model seems to be stronger than that of the logit model.
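The mechanics behind this comparison can be illustrated with a small sketch. All numbers below — the systematic utilities, the latent choice-set distribution and the share of travellers captive to transit — are hypothetical values chosen for illustration, not the estimates of Ben-Akiva and Boccara (1995). The point is only that worsening DA and SR attributes cannot move the choices of travellers whose latent choice set excludes those modes, so the predicted transit-share change is damped relative to a plain logit.

```python
import math

def logit_shares(v):
    """Standard logit prediction: every alternative is treated as available."""
    e = [math.exp(x) for x in v]
    s = sum(e)
    return [x / s for x in e]

def pcs_shares(v, cset_probs):
    """Probabilistic choice set (PCS) prediction: logit shares are computed
    within each latent choice set C and averaged with weights P(C)."""
    shares = [0.0] * len(v)
    for cset, p in cset_probs.items():
        e = {j: math.exp(v[j]) for j in cset}
        s = sum(e.values())
        for j in cset:
            shares[j] += p * e[j] / s
    return shares

# Hypothetical systematic utilities for DA, SR and transit (T), and a
# hypothetical latent choice-set distribution: 40% of travellers are captive to T.
v_base = [0.5, 0.0, -0.2]
csets = {(0, 1, 2): 0.6, (2,): 0.4}

# Scenario: doubled DA and SR in-vehicle time, lowering those two utilities.
v_new = [-0.5, -1.0, -0.2]

for label, v in (("base", v_base), ("scenario", v_new)):
    print(label,
          "logit:", [round(s, 3) for s in logit_shares(v)],
          "PCS:", [round(s, 3) for s in pcs_shares(v, csets)])
```

For the captive segment the transit probability is already 1 within its choice set, so the PCS shift towards transit is smaller than the logit shift — the qualitative pattern of Table 17.4.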
4 CONCLUSION
The HCM, which integrates latent variable models with discrete choice models, has been in use for about a decade now. This chapter reviewed the framework and formulation of the HCM and discussed its four main advantages: ability to explicitly model unobserved heterogeneity, increased efficiency, enhanced behavioral realism and extended policy
relevance. These advantages were illustrated in the context of three applications: heterogeneity in the value of travel time savings arising from heterogeneity in attitudes towards travel modes; mode choice with latent choice sets; and airline itinerary choice incorporating attitudes towards risk of misconnection, rush aversion and trust in airlines.

Despite these advantages and the growing number of applications employing the HCM as a modeling framework, there remain a number of difficulties that have hindered widespread use of this framework. These are discussed below, along with directions for future research in this area.

First, there are estimation issues. Unlike a logit model, the likelihood function of the HCM is not globally concave, which makes the estimation process more complex and necessitates that the model be estimated from multiple starting values to check that convergence to the same set of ‘behaviorally plausible values’ is achieved (Ben-Akiva and Boccara, 1995). Moreover, since general conditions for the identification of these models have not been established, the researcher has to rely to some extent on empirical tests to ensure that the model is identified. Also, from a practical perspective, until recently there was no software that allowed simultaneous estimation of the HCM without coding the likelihood function.

Second, the structural equations of the latent variables usually have low explanatory power in empirical applications, as indicated by insignificant variables and low pseudo-R2 values. This is because latent variables like attitudes and perceptions are usually expressed as a function of socio-demographic variables. However, it is doubtful whether latent variables such as attitudes are actually a function of socio-demographic variables (see for example, Anable, 2005). They are more likely to be shaped by people’s life experiences, lifestyles, and so on.
The challenge is in adequately collecting such data in surveys and incorporating them in the models.

Third, there is the issue of endogeneity. In the HCM framework, the latent variables are predictors of choice. But it may also be the case that the latent variables are affected by the choice, such as attitudes towards a travel mode being affected by repeated exposure to that mode. If there is an effect of the choice on the attitude which is not modeled, the parameter estimates will be biased. This is less of an issue when using data from stated preference experiments, where it may be reasonably assumed that people’s attitudes (formed before the SP study) influence the choices they make in the SP experiment, but it should be tested for when using revealed preference data. Ideally, panel data would be needed to test these causalities.

Fourth, and from a practical perspective, the development of HCMs has mostly dealt with estimation as opposed to application. As discussed in this chapter, the structural part of the HCM (that is, the framework shown in Figure 17.2 without the indicators) can be used in application and generally does not require additional information beyond what is included in the model. More work on model application is needed to illustrate the potential of the HCM in leading to more sensible policy analysis.

Finally, in this chapter we presented the formulation and example applications of the static HCM. When the latent variables evolve over time, the dynamics in the behavior or actions are driven by the dynamics in the underlying latent variables. The dynamic HCM is a discrete choice model integrated with a hidden Markov model. Formulations and examples of the dynamic HCM are available in Ben-Akiva (2010) and Choudhury et al. (2010), but this area is still under-researched.
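The multiple-starting-values check described among the estimation issues above can be sketched in a few lines. The objective below is a hypothetical stand-in with two local minima, not an actual HCM log-likelihood, and the local optimizer is a deliberately crude numerical gradient descent; in practice one would plug in the model’s simulated log-likelihood and a quasi-Newton routine. The sketch shows how different starts can converge to different optima, which is exactly what the check is designed to detect.

```python
import random

def nll(theta):
    # Hypothetical stand-in for a non-concave negative log-likelihood:
    # a quartic with two local minima (the global one is on the left).
    return (theta ** 2 - 1) ** 2 + 0.3 * theta

def local_minimize(f, x0, step=1e-3, iters=20000, h=1e-6):
    """Crude gradient descent with a central-difference derivative;
    stands in for the local optimizer an estimation package would use."""
    x = x0
    for _ in range(iters):
        g = (f(x + h) - f(x - h)) / (2 * h)
        x -= step * g
    return x

random.seed(0)
starts = [random.uniform(-3, 3) for _ in range(10)]
optima = sorted({round(local_minimize(nll, s), 2) for s in starts})
print("distinct local optima reached:", optima)
```

If all starts converged to one point, the result would be (weak) evidence of a unique, behaviorally plausible optimum; here two distinct optima are reached, so reporting only one run would be misleading.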
NOTES

1. With exogenous latent variables, the formulation of the choice probability remains the same except that the distribution of the latent variables in the probability expression is no longer a function of explanatory variables (that is, it is not a behavioral model).
2. The data were unavailable to conduct this analysis.
3. A fitted value of the latent variable was computed for every individual in the sample as the systematic part of the structural equation of the latent variable evaluated at the estimated values of the parameters.
REFERENCES

Abou-Zeid, M., M. Ben-Akiva, M. Bierlaire, C. Choudhury and S. Hess (2010), ‘Attitudes and value of time heterogeneity’, in E. Van de Voorde and T. Vanelslander (eds), Applied Transport Economics: A Management and Policy Perspective, Antwerp: Uitgeverij De Boeck nv, pp. 523–45.
Ajzen, I. (1991), ‘The theory of planned behavior’, Organizational Behavior and Human Decision Processes, 50 (2), 179–211.
Anable, J. (2005), ‘Complacent car addicts or aspiring environmentalists? Identifying travel behaviour segments using attitude theory’, Transport Policy, 12 (1), 65–78.
Ashok, K., W.R. Dillon and S. Yuan (2002), ‘Extending discrete choice models to incorporate attitudinal and other latent variables’, Journal of Marketing Research, 39 (1), 31–46.
Ben-Akiva, M. (2010), ‘Planning and action in a model of choice’, in S. Hess and A. Daly (eds), Choice Modelling: The State-of-the-Art and the State-of-Practice, Proceedings from the Inaugural International Choice Modelling Conference, Bingley: Emerald, pp. 19–34.
Ben-Akiva, M. and B. Boccara (1995), ‘Discrete choice models with latent choice sets’, International Journal of Research in Marketing, 12 (1), 9–24.
Ben-Akiva, M. and S. Lerman (1985), Discrete Choice Analysis: Theory and Application to Travel Demand, Cambridge, MA: MIT Press.
Ben-Akiva, M., D. McFadden, K. Train, J. Walker, C. Bhat, M. Bierlaire, D. Bolduc, A. Boersch-Supan, D. Brownstone, D.S. Bunch, A. Daly, A. de Palma, D. Gopinath, A. Karlstrom and M.A. Munizaga (2002a), ‘Hybrid choice models: progress and challenges’, Marketing Letters, 13 (3), 163–75.
Ben-Akiva, M., A. de Palma, D. McFadden, M. Abou-Zeid, P.-A. Chiappori, M. de Lapparent, S.N. Durlauf, M. Fosgerau, D. Fukuda, S. Hess, C. Manski, A. Pakes, N. Picard and J. Walker (2012), ‘Process and context in choice models’, Marketing Letters, 23 (2), 439–56.
Ben-Akiva, M., J. Walker, A. Bernardino, D. Gopinath, T. Morikawa and A. Polydoropoulou (2002b), ‘Integration of choice and latent variable models’, in H. Mahmassani (ed.), Perpetual Motion: Travel Behaviour Research Opportunities and Application Challenges, Bingley: Emerald, pp. 431–70.
Boccara, B. (1989), ‘Modeling choice set formation in discrete choice models’, PhD dissertation, Massachusetts Institute of Technology.
Bolduc, D. and R. Alvarez-Daziano (2010), ‘On estimation of hybrid choice models’, in S. Hess and A. Daly (eds), Choice Modelling: The State-of-the-Art and the State-of-Practice, Proceedings from the Inaugural International Choice Modelling Conference, Bingley: Emerald, pp. 259–87.
Bolduc, D., M. Ben-Akiva, J. Walker and M. Michaud (2005), ‘Hybrid choice models with logit kernel: applicability to large scale models’, in M. Lee-Gosselin and S. Doherty (eds), Integrated Land-Use and Transportation Models: Behavioural Foundations, Amsterdam: Elsevier Science, pp. 275–302.
Bollen, K.A. (1989), Structural Equations with Latent Variables, New York: John Wiley and Sons.
Choo, S. and P.L. Mokhtarian (2004), ‘What type of vehicle do people drive? The role of attitude and lifestyle in influencing vehicle type choice’, Transportation Research Part A, 38 (3), 201–22.
Choudhury, C., M. Ben-Akiva and M. Abou-Zeid (2010), ‘Dynamic latent plan models’, Journal of Choice Modelling, 3 (2), 50–70.
Daly, A., S. Hess, B. Patruni, D. Potoglou and C. Rohr (2012), ‘Using ordered attitudinal indicators in a latent variable choice model: a study of the impact of security on rail travel behaviour’, Transportation, 39 (2), 267–97.
Everitt, B.S. (1984), An Introduction to Latent Variable Models, London: Chapman and Hall.
Gopinath, D.A. (1995), ‘Modeling heterogeneity in discrete choice processes: application to travel demand’, PhD dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Gopinath, D.A. and M. Ben-Akiva (1997), ‘Estimation of randomly distributed value of time’, working paper, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA.
Hausman, J. (1978), ‘Specification tests in econometrics’, Econometrica, 46 (6), 1251–71.
Johansson, M.V., T. Heldt and P. Johansson (2006), ‘The effects of attitudes and personality traits on mode choice’, Transportation Research Part A, 40 (6), 507–25.
Kitrinou, E., A. Polydoropoulou and D. Bolduc (2010), ‘Development of integrated choice and latent variable (ICLV) models for the residential relocation decision in island areas’, in S. Hess and A. Daly (eds), Choice Modelling: The State-of-the-Art and the State-of-Practice, Proceedings from the Inaugural International Choice Modelling Conference, Bingley: Emerald, pp. 593–618.
McFadden, D. (1986), ‘The choice theory approach to market research’, Marketing Science, 5 (4), 275–97.
McFadden, D. (1999), ‘Rationality for economists?’, Journal of Risk and Uncertainty, 19 (1–3), 73–105.
Morikawa, T., M. Ben-Akiva and D. McFadden (2002), ‘Discrete choice models incorporating revealed preferences and psychometric data’, in T.B. Fomby, R.C. Hill and I. Jeliazkov (eds), Advances in Econometrics, vol. 16, Bingley: Emerald, pp. 29–55.
Proussaloglou, K., K. Haskell, R. Vaidya and M. Ben-Akiva (2001), ‘An attitudinal market segmentation approach to commuter mode choice and transit service design’, paper presented at the 80th Annual Meeting of the Transportation Research Board, Washington, DC, January.
Raveau, S., M.F. Yáñez and J. de D. Ortúzar (2012), ‘Practical and empirical identifiability of hybrid discrete choice models’, Transportation Research Part B, 46 (10), 1374–83.
Shiftan, Y., M.L. Outwater and Y. Zhou (2008), ‘Transit market research using structural equation modeling and attitudinal market segmentation’, Transport Policy, 15 (3), 186–95.
Swait, J. and M. Ben-Akiva (1987a), ‘Incorporating random constraints in discrete models of choice set generation’, Transportation Research Part B, 21 (2), 91–102.
Swait, J. and M. Ben-Akiva (1987b), ‘Empirical test of a constrained discrete choice model: mode choice in São Paulo, Brazil’, Transportation Research Part B, 21 (2), 103–15.
Theis, G.W. (2011), ‘Incorporating attitudes in airline itinerary choice: modeling the impact of elapsed time’, PhD dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Train, K. (2003), Discrete Choice Methods with Simulation, Cambridge: Cambridge University Press.
Walker, J. and J. Li (2007), ‘Latent lifestyle preferences and household location decisions’, Journal of Geographical Systems, 9 (1), 77–101.
Walker, J.L. (2001), ‘Extended discrete choice models: integrated framework, flexible error structures, and latent variables’, PhD dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Walker, J.L. and M. Ben-Akiva (2002), ‘Generalized random utility model’, Mathematical Social Sciences, 43 (3), 303–43.
Walker, J.L. and M. Ben-Akiva (2011), ‘Advances in discrete choice: mixture models’, in A. de Palma, R. Lindsey, E. Quinet and R. Vickerman (eds), A Handbook of Transport Economics, Cheltenham, UK and Northampton, MA, USA: Edward Elgar, pp. 160–87.
Walker, J.L., M. Ben-Akiva and D. Bolduc (2007), ‘Identification of parameters in normal error component logit-mixture (NECLM) models’, Journal of Applied Econometrics, 22 (6), 1095–125.
Yáñez, M.F., S. Raveau and J. de D. Ortúzar (2010), ‘Inclusion of latent variables in mixed logit models: modelling and forecasting’, Transportation Research Part A, 44 (9), 744–53.
18 Choice modeling and risk management
Glenn W. Harrison and Jimmy Martínez-Correa1
Choice models in economics often present decision makers with a fixed set of alternatives. This setting certainly mimics many naturally occurring decisions, such as the homely selection of products from shelves in a supermarket. But it does not reflect the type of decisions that are the center of attention in the field of risk management, where it is precisely the ability to ‘fine tune’ the choice alternative that is the behavior of interest. It is one thing to ask someone to select between a safe lottery and a risky lottery, where the return on the former is higher than the latter, and another thing to provide a menu of options to change the consequences of risk for each lottery. Of course, this can be viewed as just an expanded choice set, but it is central to understanding how decision makers mitigate risk. They can do so by simply choosing the safer options, or by engaging in a range of activities which alter the risks of those options or the consequences of the options. In many respects this is a natural domain of application for ‘choice modeling,’ as the term is used here, since one can simultaneously discover how decision makers want to structure their choices as well as discover their attitudes to risk.

Although our focus is on the economics of risk management, many of the basic ideas have developed in other related disciplines. A good example is transport economics, where the alternatives may be fixed but the attributes are selected and adjusted endogenously. This possibility arises naturally in negotiation settings, as demonstrated by Hensher et al. (2007) and Marcucci et al. (2009), but applies equally well when considering the endogenous process as akin to the risk management choices discussed here. The same idea arises in marketing, where software-based ‘configurators’ allow mass customization of products to meet the preferences of customers (for example, Kamis et al., 2008). 
And the choice of one mode of transport implies tradeoffs between many attributes that need to be considered simultaneously, akin to the multivariate risk aversion discussed here. In fact, that literature derived from multi-attribute concerns of the kind that are standard in transport economics and marketing applications (for example, Louviere et al., 2000). It is also close to the literature that spans economics and transportation engineering on the value of a statistical life, when one tries to untangle the correlated effects of the risk of fatal and non-fatal injuries: what does not kill you likely injures you seriously (for example, Viscusi, 1993; Leeth and Ruser, 2003).

We review five major topics within this emerging domain of application of choice modeling, with an eye for how they change the nature of the choice modeling task. We particularly focus on a canonical choice task under risk, the purchase of insurance. Choices can be insured through market transactions with companies specializing in providing insurance coverage, but the same risk management can often be accomplished by the decision maker undertaking other types of costly activities. Certain transactions might change the probabilities that the decision maker faces, or change the final consequences of those choices. We introduce the basic elements of the vast literature on risk
414
Handbook of choice modelling
management in section 1, and draw out implications for the modeling of choice behavior over risky outcomes.

Choices are often modeled by assuming that decision makers perfectly integrate them with all other choices they have made, or that decision makers completely ignore other choices. The issue here is whether one assumes perfect or imperfect asset integration in some settings, or perfect or imperfect capital markets in other settings. We review the implications for choice modeling of these assumptions in section 2.

Choices are characterized by several attributes, and indeed the complex manner in which multiple attributes trade off is one of the central concerns of most models of choice behavior. If one is to properly characterize risky choice over multiple attributes, then one needs to recognize the formal implications of dealing with multi-attribute risk aversion. We do that in section 3.

The major ‘behavioral moving parts’ in any choice involving risk and time are risk attitudes, broadly defined, time preferences, and subjective beliefs. We have theoretical, experimental and econometric tools to evaluate their confounding role in understanding observed choice, but they are rarely applied systematically in the existing literature. We examine these tools in section 4, and in section 5 discuss extensions to consider uncertainty and ambiguity.
1 RISK MANAGEMENT OF FINANCIAL AND NON-FINANCIAL RISKS
Ehrlich and Becker’s (1972) paper is widely viewed as the first theoretical paper in risk management. They considered the role of financial and non-financial risk management tools, such as insurance and self-protection activities, to deal with hazards and their economic costs, and derived four important results.

First, when ‘market’ insurance is not available, individuals engage in self-insurance (for example, cash balances or savings) and self-protection activities (for example, theft alarms) according to the costs and benefits associated with those activities. Self-insurance is defined as an activity that is able to affect the cost associated with a potential loss, whereas self-protection affects the probability of the loss itself.2 Each of these activities has to involve some cost to the decision maker, so that there is an economic tradeoff in choosing the risk management strategy.

Second, market insurance and self-insurance are substitutes. Both risk management strategies are able to redistribute resources from good states of the world to bad states of the world, thus affecting the size of the loss and smoothing consumption across states of nature.

Third, market insurance and self-protection can be substitutes or complements. For example, the installation of a theft alarm at home might be enough to deter robbery, so home insurance might not be necessary. On the contrary, a theft alarm might increase the demand for insurance if this self-protection strategy reduces the probability of a loss and the cost of insurance varies proportionally with this probability. An implication of this relationship is that if the price of insurance to cover certain risks is independent of expenditures on self-protection activities that reduce the probability of a loss, then the market will tend to consider those risks as uninsurable since moral hazard is more likely
to arise. However, in contrast to the moral-hazard intuition, this relationship may also imply that the presence of market insurance may increase self-protection activities.

Finally, the framework of Ehrlich and Becker (1972, p. 627) is able to explain the insurance-gambling puzzle that Friedman and Savage (1948) identified. They argued that the behavior of people buying insurance and gambling at the same time can be explained if the broader choice options available are sufficiently favorable: ‘inferences about attitudes towards risk cannot be made independently of existing market opportunities: a person may appear to be “risk avoider” under one combination of prices and potential losses and a “risk taker” in another’ (Ehrlich and Becker, 1972, p. 627).

Using the expected utility theory (EUT) framework, the analysis has been extended in several directions. Dionne and Eeckhoudt (1985) found that more risk aversion increases self-insurance activities but does not necessarily increase self-protection. Briys and Schlesinger (1990, p. 466) provided an explanation for these results: self-insurance unambiguously reduces risk, while self-protection does not. Consequently, it is no surprise that an increase in risk aversion unambiguously increases the level of self-insurance, but may sometimes decrease the level of self-protection. However, Hiebert (1989, pp. 300–301) showed that self-insurance activities do not necessarily increase with more risk aversion if the effectiveness of these activities in mitigating losses is uncertain. Briys et al. (1991) take the analysis one step further and analyze the main tools of risk management when their reliability cannot be guaranteed; that is, when there is a chance that these tools might not work as expected in the case of a loss, just as an insurer might default on its clients because of solvency issues. In contrast to Ehrlich and Becker (1972), Briys et al. (1991, p. 
47) find that the ‘riskiness’ of final wealth is not necessarily reduced by an increase in market insurance or self-insurance when they are not fully reliable, and that market insurance and self-insurance might be complements without full reliability. Finally, Sweeny and Beard (1992) study the comparative statics of self-protection when initial wealth and the size of the loss change.

From a non-EUT perspective, many of the results in Ehrlich and Becker (1972) hold. In fact, Quiggin (1991, p. 340) showed that a wide range of comparative static results under EUT can be extended to rank-dependent utility, because this model ‘may be regarded as expected utility with respect to a transformed probability distribution.’ Konrad and Skaperdas (1993), Machina (1995, 2000) and Courbage (2001) showed that the main results in Ehrlich and Becker (1972) hold under a wide range of alternatives to EUT.

The framework of Ehrlich and Becker (1972) can be used to study how policies or activities that change risks can affect individuals’ welfare. Shogren and Crocker (1991, pp. 6, 9 and 11), using the EUT model, analyze the ex ante value of reducing risk when there are self-protection possibilities and find three important results. First, when self-protection influences the probability and/or the severity of a loss, the ex ante valuation of a reduction in risk is a function of risk attitudes and the marginal rate of technical substitution between self-protection and hazard concentrations. This implies that willingness to pay for reductions in risk cannot be studied just by looking at observable expenditures on self-protection and exposure to risk. This also implies that one cannot simply sum the unweighted compensating or equivalent variations to study the societal impact of a policy that affects a risk individuals are exposed to. 
Second, even under intuitive conditions, such as individuals’ pecuniary costs of hazards being convex to risk, increased exposure to a hazard does not necessarily mean that the individual needs to be compensated. The intuition behind this result is that a change in exposure to risk
induced by self-protection may have effects both on the probability and the severity of the loss. There are no ex ante reasons to believe that self-protection affects both probability and severity in the same direction. Finally, intuitive and simple conditions on the costs of risks are not sufficient to guarantee an unambiguous response of self-protection expenditures to changes in risk. This implies that observed expenditures on self-protection are not necessarily a lower bound on the subject’s ex ante value of a reduction in risk. Intuitively this may happen because these expenditures are not necessarily increasing in risk.

Quiggin (1992, p. 41) claims that, under certain intuitive conditions, the negative results in Shogren and Crocker (1991) will not hold and that the ‘standard willingness-to-pay approach to valuing environmental hazards is valid under fairly general conditions’. These conditions are decreasing absolute risk aversion and a separability condition. The former is an assumption that deserves discussion but is widely accepted by economists; however, the latter implies that self-protection activities mitigate the individual’s exposure to risk but do not affect the risk itself. Shogren and Crocker (1999) claim that this separability assumption is problematic and argue that risk is endogenous in many situations. This implies that self-protection activities can mitigate the consequences of risk to the individual but can also affect the general level of risk itself. If this is in fact the case, then the welfare analysis of risk reduction cannot avoid the identification of risk attitudes. This is an important instance in which choice tasks and experiments can help identify risk attitudes in order to carry out welfare analysis.

A modern perspective on the analysis of financial decisions was formally introduced by Mayers and Smith (1983), but recognized earlier by Gould (1969, p. 
151), who proposed that insurance decisions should be analyzed in the presence of traded and non-traded assets such as human capital. The introduction of a non-traded asset can significantly change some of the standard results in insurance economics, such as the presumption that a wealthier individual buys less insurance. A non-traded asset is closely related to the concept of uninsurable background risk, which is of general relevance to the analysis of decisions under risk. As pointed out by Schlesinger and Doherty (1985), incomplete markets induce the presence of uninsurable background risks that can affect the standard results in the analysis of insurance decisions. Doherty and Schlesinger (1983) studied the robustness of the standard results to the presence of an uninsurable background risk that is independent of the insurable loss. They found that under certain restrictive conditions the full insurance theorem by Mossin (1968) and Smith (1968) holds, and that more risk-averse people will choose a lower deductible. Turnbull (1983, p. 217) also showed that in the presence of many risks, the Arrow-Pratt measure of risk aversion is not a sufficient statistic to describe individuals’ behavior in insurance purchasing. Doherty (1984, p. 209) showed in an EUT framework that the Mossin-Smith theorem only holds if the covariance between the non-traded asset and the insurable loss is negative. The intuition is that even if the insurance premium is fair, the decision maker might still not be willing to fully insure if he or she can compensate high losses with high realizations of his or her human capital. However, if the covariance is negative, the individual might want to fully insure if a health shock negatively affects his or her productivity, which would undermine his or her human capital. 
Moreover, in the presence of a non-traded asset circumstances may arise where a risk-averse individual prefers a coinsurance arrangement to an actuarially equivalent insurance contract with straight deductible.
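To see how a correlated non-traded payoff can overturn or restore the full-insurance result at a fair premium, here is a small numerical sketch. The utility function, wealth levels and background payoffs are our own illustrative choices, not taken from the papers cited:

```python
import math

def expected_utility(alpha, b_loss, b_noloss, w0=100.0, loss=50.0, p=0.5):
    """EU with coinsurance rate alpha at an actuarially fair premium, plus a
    non-traded background payoff whose value depends on the loss state."""
    premium = alpha * p * loss
    w_if_loss = w0 - loss + alpha * loss + b_loss - premium
    w_if_no_loss = w0 + b_noloss - premium
    return p * math.sqrt(w_if_loss) + (1 - p) * math.sqrt(w_if_no_loss)

def best_alpha(b_loss, b_noloss):
    grid = [i / 1000 for i in range(1001)]
    return max(grid, key=lambda a: expected_utility(a, b_loss, b_noloss))

# Background pays off when the loss occurs (a natural hedge): the optimum
# is partial coverage, so the full-insurance result fails.
hedge = best_alpha(b_loss=+20.0, b_noloss=-20.0)
# Background falls together with the loss: full coverage is optimal.
reinforce = best_alpha(b_loss=-20.0, b_noloss=+20.0)
```

With these numbers the grid search settles on roughly 20 percent coverage in the hedging case and on full coverage when the background payoff falls with the insurable loss, which is the comparative static the text describes.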
Stephane Hess and Andrew Daly - 9781781003145 Downloaded from Elgar Online at 04/07/2015 06:53:06AM via VRIJE UNIVERSITEIT AMSTERDAM (FREE)
Choice modeling and risk management
2 ASSET INTEGRATION AND PERFECT CAPITAL MARKETS
A common assumption in many models of insurance choice, even those that consider portfolios and related risk management strategies, is perfect asset integration within a given time period. This amounts to the assumption that there exist perfect markets that allow all assets to be traded, and aggregated into one scalar value of wealth. This scalar wealth is then used as the sole argument of some utility function. The same issue arises in an intertemporal context. However, when imperfect capital markets exist, it is no longer possible for the individual to aggregate time-dated wealth or consumption into one aggregate. When imperfect markets are assumed, things change fundamentally. Pye (1966) considered the implications for the optimal investment rule of a company, and demonstrated that imperfect capital markets implied that there no longer existed a ‘utility free’ investment rule, such as implied by the Fisher separation theorem. That utility-free rule held that production and consumption choices could be separated, and that one does not need to know the utility function of the agent in order to identify optimal investment and production. Pye (1966) and Hirshleifer (1970, ch. 7) showed that when capital markets were imperfect, in general one could not define the intertemporal budget constraint without knowing the utility function of the individual. More generally, imperfect markets force you to consider multivariate risk aversion when evaluating insurance demand. Because it is then no longer possible to aggregate to a scalar wealth measure, you must pay attention to the utility evaluation of two or more components of wealth with tools of multivariate risk aversion. Generalizations of the one-dimensional Arrow-Pratt measure of risk aversion have been proposed by Kihlstrom and Mirman (1974), Duncan (1977) and Karni (1979). 
Kihlstrom and Mirman (1974) posed the issue of multivariate risk aversion under the restrictive assumption that the ordinal preferences underlying two expected utility functions exhibit the same preferences over non-stochastic outcomes. In this case they propose a scalar measure of total risk aversion that allows one to make statements about whether one person is more risk averse than another in several dimensions, or if the same person is more risk averse after some event than before. If one relaxes this assumption, which is not an attractive one in most applications, Duncan (1977) shows that the Kihlstrom and Mirman (1974) multivariate measure of risk aversion naturally becomes matrix-valued. Hence one has vector-valued risk premia, and this vector is now 'direction dependent' in terms of evaluation. Karni (1979) shows that you can define the risk premia in terms of the expenditure function, rather than the direct utility function, and then evaluate it 'uniquely' by further specifying an interesting statistic of the stochastic process. For example, if you are considering risk attitudes towards a vector of stochastic price shocks, then you could use the mean of those shocks.
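Direction dependence is easy to exhibit numerically: when curvature differs across wealth components, the same 50/50 shock commands a different risk premium depending on which component it hits. The utility function and shock size below are our own illustration:

```python
import math

def risk_premium(u, base, axis, eps=10.0, lo=0.0, hi=50.0):
    """Solve E[u(base + eps shock on one axis)] = u(base - pi on that axis)
    for the premium pi, by bisection; the shock is +/- eps with equal odds."""
    up, down = list(base), list(base)
    up[axis] += eps
    down[axis] -= eps
    target = 0.5 * u(*up) + 0.5 * u(*down)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        shifted = list(base)
        shifted[axis] -= mid
        if u(*shifted) > target:   # premium still too small
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Utility over two wealth components with different curvature in each:
u = lambda x, y: math.sqrt(x) + math.log(y)
pi_x = risk_premium(u, [100.0, 100.0], axis=0)   # premium for an x-shock
pi_y = risk_premium(u, [100.0, 100.0], axis=1)   # premium for a y-shock
```

The two premia differ (the log component is locally more concave), so no single scalar summarizes risk aversion in both directions.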
3 MULTI-ATTRIBUTE RISK ATTITUDES
A closely related literature defines multi-attribute risk aversion where the utility function is defined over more than one attribute. In this context, Keeney (1973) first defined the concept of conditional risk aversion, Richard (1975) defined the same concept as
bivariate risk aversion, and Epstein and Tanny (1980) defined it as correlation aversion.3 There are several ways to extend these pairwise concepts of risk aversion over two attributes to more than two attributes, as reviewed by Dorfleitner and Krapp (2007). One attraction of the concept of multi-attribute risk aversion is that it allows a relatively simple characterization of the functional forms for utility that rule out multi-attribute risk attitudes: additivity. To see the significance of this for insurance demand, consider time-dating as the attribute in question. If you assume the popular additive intertemporal utility function, you rule out correlation aversion. In this case, as is well known, atemporal risk preferences and the intertemporal elasticity of substitution cannot be estimated or calibrated independently: one is the inverse of the other. But with non-additive intertemporal utility functions, you can immediately separate 'risk preferences' and 'time preferences'. And you can then talk about individuals having preferences for how risk is resolved over time, the essence of any insurance contract. That is, preferences for how risk is resolved over time can be distinct from preferences for how risk is resolved at any given point of time, and hence be a separate behavioral determinant of the demand for insurance. Controlled experiments provide a way to identify and estimate the degree of correlation aversion, and Andersen et al. (2011b) present evidence that it exists and is significant for the Danish population.
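The role of additivity can be checked directly: with identical marginals in each attribute, an additive utility is indifferent between negatively and positively correlated outcomes, while a non-additive form need not be. The functional forms and numbers below are our own illustration:

```python
import math

def eu(u, lottery):
    """Expected utility of a lottery given as [(prob, (c1, c2)), ...]."""
    return sum(p * u(c1, c2) for p, (c1, c2) in lottery)

H, L = 100.0, 25.0
mixed = [(0.5, (H, L)), (0.5, (L, H))]     # outcomes negatively correlated
extreme = [(0.5, (H, H)), (0.5, (L, L))]   # outcomes positively correlated
# Both lotteries have identical marginal distributions in each attribute.

additive = lambda c1, c2: math.sqrt(c1) + math.sqrt(c2)
nonadditive = lambda c1, c2: -math.exp(-0.01 * (c1 + c2))  # correlation averse

gap_additive = eu(additive, mixed) - eu(additive, extreme)           # zero
gap_nonadditive = eu(nonadditive, mixed) - eu(nonadditive, extreme)  # positive
```

The additive form cannot express any preference over how the two attributes co-move; the non-additive exponential form strictly prefers the mixed lottery, which is exactly correlation aversion.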
4 ESTIMATING PREFERENCES AND BELIEFS
There are three fundamental, behavioral ‘moving parts’ in almost any decision of importance concerning the attributes of choice under risk: risk attitudes, time preferences and subjective beliefs. Experimental economists now have a robust set of tools to elicit each of these, although controversies remain, as expected in foundational concepts such as these. We consider the role of each ‘moving part’ in the effort to identify the determinants of choice over insurance products. Risk attitudes refer to the risk premium that individuals place on lotteries. The familiar diminishing marginal utility explanation of EUT provides one characterization of the risk premium, and allows a wide range of flexible utility functions to be estimated. But it is a simple matter to also allow for probability weighting to explain the risk premium: ‘pessimistic’ attitudes towards probabilities can just as easily account for risk aversion.4 Similarly, it is possible to extend the estimation to allow for sign-dependent preferences, whereby ‘losses’ are evaluated differently than ‘gains’. We add quotation marks for losses and gains because the Achilles heel of sign-dependent models is the specification of the reference point, and this is the subject of considerable debate. All of these approaches simply decompose and explain the risk premium in different ways, and build on the approach before it. Experimental and econometric methods for the estimation of risk attitudes using all of these approaches are relatively well developed: see Harrison and Rutström (2008) for an extensive survey. There is also considerable evidence that behavior towards risky lotteries is not characterized by just one model of decision-making under risk. Mixture specifications in rich and poor countries, in the laboratory and the field, show a remarkable combination, close to 50:50, of both EUT and non-EUT characterizations (for example, Harrison and Rutström, 2009, Harrison et al., 2010). 
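The probability-weighting point (see also note 4) can be made concrete by deliberately imposing linear utility; the power form for w(·) is just a convenient stand-in for a pessimistic weighting function, not a claim about any particular parameterization:

```python
# Linear utility switches off diminishing marginal utility; a 'pessimistic'
# weighting function with w(p) <= p on gains still yields a risk premium.
def w(p, gamma=2.0):
    return p ** gamma          # illustrative pessimistic form, w(p) <= p

p, prize = 0.5, 100.0          # win 100 with probability one-half
ev = p * prize                 # expected value: 50
weighted_value = w(p) * prize  # decision weight 0.25 on the gain: 25
risk_premium = ev - weighted_value   # positive despite linear utility
```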
This finding is likely to vary from domain
to domain, and population to population, but offers a much richer characterization of behavior than the usual approach favored by economists.5 Most of the effort that goes into estimating 'risk attitudes' is actually directed at estimating utility functions. So a trivial by-product of that effort is to be able to generate estimates of higher-order concepts such as 'prudence' and 'temperance', which under EUT reflect an aversion to skewness and kurtosis, respectively. Although it is possible to generate lotteries that identify preferences driven solely by prudence or temperance (for example, Ebert and Wiesen, 2011), these designs typically require that subjects satisfy reduction of compound lotteries, which is a strong assumption and appears to limit the generalization to non-EUT models such as RDU. Recent extensions include attention to the problem, noted earlier, of the presence of 'background risk' affecting decisions over foreground risk (for example, Harrison et al., 2007). For example, it makes little sense to evaluate the value of a statistical life without worrying about the confound of compensating differentials for non-fatal injuries: what does not kill often injures.

Time preferences are also now relatively well understood. The first generation of experiments used loose procedures by modern standards, often relying on the elicitation of present values using fill-in-the-blank (FIB) methods that have notoriously poor behavioral properties. This literature is characterized by the need to use scientific notation to summarize estimated astronomic discount rates, a sure sign that something was wrong with behavior, experimental design or inferential methods. Frederick et al. (2002) summarize the literature up to this point.
The second generation of experiments moved towards binary choice tasks to ensure incentive compatibility, albeit at the loss of information precision (if the FIB methods behaved the way theorists advertised them, which was not the case), and stakes that were more substantial. Inferred discount rates were now at the level of consumer credit cards: high, but believable (for example, Coller and Williams, 1999; Harrison et al., 2002). The third generation of experiments recognized that discount factors equalize time-dated utility, and not time-dated money, so you needed to account for diminishing marginal utility when inferring discount factors. This is a simple matter of theory, from the conceptual definition of a discount factor. Jensen’s inequality does the rest theoretically: inferred discount rates must be lower if you have a concave utility function than if you assume a linear utility function. Appropriate experimental designs and econometric inferences then simply quantify this insight from theory, with a dramatic reduction in estimated discount rates down to 10 percent or even lower (for example, Andersen et al., 2008). Quite apart from the level of discount rates, there appears to be no support for ‘hyperbolicky’ specifications of the discounting function in field data (for example, Andersen et al., 2011a). This does not mean that exponential specifications are appropriate for all populations, just that the monolithic presumption in favor of non-exponential specifications is not supported by the data. Subjective beliefs can be elicited using scoring rule procedures that have a venerable tradition, such as Savage (1971b). These procedures do require that one correct for risk attitudes, and only directly elicit true subjective beliefs under the assumption of risk neutrality. 
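As a sketch of why such scoring rules are incentive compatible for a risk-neutral agent, the quadratic (Brier-type) rule is maximized in expectation by truthful reporting; the payoff normalization below is our own:

```python
def expected_score(report, belief):
    """Expected quadratic (Brier-type) score when the event has subjective
    probability `belief`: score 1 - (1 - r)^2 if it occurs, 1 - r^2 if not."""
    return belief * (1 - (1 - report) ** 2) + (1 - belief) * (1 - report ** 2)

belief = 0.3
grid = [i / 1000 for i in range(1001)]
best_report = max(grid, key=lambda r: expected_score(r, belief))
# For a risk-neutral agent the expected score peaks at report = belief.
```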
But it is a relatively simple matter to condition inferences about beliefs on the estimated risk attitudes of individuals, by combining experimental tasks that allow you to identify the risk attitudes independently of the task that elicits subjective beliefs (for
example, Andersen et al., 2014). One can also use generalizations of these scoring rules to elicit whole subjective probability distributions, rather than just one subjective probability (for example, Mathieson and Winkler, 1976, for the theory). This area is the least developed of the three, but the experimental tools are in place for rigorous elicitation, and are being widely applied. It should be stressed that there are also many loose claims about how you can elicit risk attitudes, time preferences and subjective beliefs ‘on the cheap’ with simpler methods. In some cases these are hypothetical survey methods, with no theoretical claim to be eliciting anything of interest. In other cases these are experimental methods that rely, as noted, on tasks that are simply not incentive compatible: subjects could exploit the experimenter, for gain, by deliberately misrepresenting their true preferences. Or experimenters use FIB elicitation methods that have known behavioral biases.6 The fact that experimenters assert that these problems did not arise says nothing about whether they do. The existence of relatively transparent, incentive compatible methods leads you to wonder why you would risk using other methods.7 It is appropriate that all of these methods were first developed in laboratory environments, and that the econometric procedures for estimation of preferences and beliefs were first refined in that setting. Laboratory experiments give us control, if designed and executed correctly. If we cannot identify the conceptually correct measure in that setting, we cannot hope to do so in more complicated field settings. But there is a relatively easy bridge between the laboratory and the field, as stressed by Harrison and List (2004), so that both are complementary ways to make inferences.
5 RISK AND UNCERTAINTY
The evaluation of naturally occurring choices involves more than just the evaluation of objective risk. Does anything change when we allow for subjective beliefs in the evaluation of a choice? Unfortunately, yes and no. Nothing changes if we assume, following Savage (1971a), that decisions are made as if one obeys the reduction of compound lotteries (ROCL) axiom. But things change radically if one does not make that assumption. This seemingly technical issue is actually of great significance for the evaluation of policy choices, and is worth explaining carefully. Figure 18.1 illustrates the situation. Assume that the subjective beliefs are symmetric, with mean one-half as shown by the solid, vertical line. But they vary in terms of the underlying distribution, as shown in the four panels of Figure 18.1. Some are just more or less precise than others, and one is bimodal. Under ROCL, all would generate decisions with the same outcome, since all have the same (weighted) average. Something nags at us to say that behavior ought to be different under these different sets of beliefs, but ROCL begs to differ. Figure 18.2 raises the stakes by considering asymmetric distributions. Again, ROCL is a strong, identifying assumption. Together, Figures 18.1 and 18.2 remind us that Savage (1971a) did not assume that people had degenerate subjective probabilities that they held with certainty, he only assumed that under ROCL they behaved as if they did. We often forget that linguistic methodological sidestep, and confuse the ‘as if’ behavior for what was actually assumed. In some cases the difference does not matter, but here it does. The reason
is that when we have to worry about the underlying non-degenerate distribution, when ROCL is not assumed, then we have moved from the realm of (subjective) risk to uncertainty. And when the individual does not even have enough information to form any subjective belief distribution, degenerate or non-degenerate, we are in the realm of ambiguity.

[Figure 18.1 comprises four panels, each plotting a subjective probability distribution over [0, 1] with mean 0.5.]

Figure 18.1 Symmetric subjective probability distributions

[Figure 18.2 comprises four panels plotting subjective probability distributions over [0, 1] with means 0.35, 0.23, 0.16 and 0.13.]

Figure 18.2 Asymmetric subjective probability distributions

[Figure 18.3 plots subjective density against the subjective probability π, with mass at π = 0.6, 0.7 and 0.8.]

Figure 18.3 ROCL at work

Figure 18.3 allows a simple illustration of how ROCL allows one to collapse these disparate, non-degenerate distributions into one degenerate weighted average. Figure 18.3 displays a three-point discrete, non-degenerate, subjective distribution over a binary event in which the individual holds subjective probability π = 0.6 with 'prior' probability 0.1, π = 0.7 with 'prior' probability 0.6, and π = 0.8 with 'prior' probability 0.3, for a weighted average π = 0.72. Now consider a lottery in which one gets $X if the event occurs, and $x otherwise. Then the subjective expected utility (SEU) is

0.1 × 0.6 × U(X) + 0.1 × 0.4 × U(x) + 0.6 × 0.7 × U(X) + 0.6 × 0.3 × U(x) + 0.3 × 0.8 × U(X) + 0.3 × 0.2 × U(x),
(18.1)
which collapses to (0.1 × 0.6 + 0.6 × 0.7 + 0.3 × 0.8) × U(X) + (0.1 × 0.4 + 0.6 × 0.3 + 0.3 × 0.2) × U(x)
(18.2)
and hence to 0.72 × U(X) + 0.28 × U(x)
(18.3)
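The collapse in equations (18.1)-(18.3) is easy to verify numerically; the utility values assigned to the two prizes below are arbitrary placeholders:

```python
# Hypothetical utility values for the two prizes; any numbers would do.
U_X, U_x = 1.0, 0.0

priors = {0.6: 0.1, 0.7: 0.6, 0.8: 0.3}   # 'prior' weight on each pi

# Equation (18.1): evaluate each conditional lottery, then weight by priors.
seu = sum(w * (pi * U_X + (1 - pi) * U_x) for pi, w in priors.items())

# Equations (18.2)-(18.3): collapse to one weighted-average probability.
pi_bar = sum(w * pi for pi, w in priors.items())    # 0.72
collapsed = pi_bar * U_X + (1 - pi_bar) * U_x
```

The two evaluations coincide, which is exactly the identifying restriction ROCL imposes.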
under ROCL. So the non-degenerate distribution in Figure 18.3 can be boiled down to a degenerate subjective probability of 0.72 under ROCL: an impressive identifying restriction! How we relax ROCL is a matter for important, foundational research. Although it has taken half a century for the implications of Ellsberg (1961) to be formalized in tractable ways, we are much closer to doing so. One popular approach is the 'smooth ambiguity model' of Klibanoff et al. (2005). Another popular approach is due to Ghirardato et al. (2004), generalizing Gilboa and Schmeidler (1989). We can illustrate the smooth ambiguity model with a simple example. Let CE(π = 0.6) be the certainty equivalent of the lottery 0.6 × U(X) + 0.4 × U(x), CE(π = 0.7) be the certainty equivalent of the lottery 0.7 × U(X) + 0.3 × U(x), and CE(π = 0.8) be the certainty equivalent of the lottery 0.8 × U(X) + 0.2 × U(x). Then the evaluation of the lottery can be written

0.1 × f(U(CE(π = 0.6))) + 0.6 × f(U(CE(π = 0.7))) + 0.3 × f(U(CE(π = 0.8))), (18.4)

where f is a function defined over the certainty equivalent of the lottery that is conditional on a particular subjective probability value. Akin to the properties of U(·) defining risk attitudes under EUT or SEU, the properties of f(·) define attitudes towards the uncertainty over the particular subjective probability value.8 If f is concave, then the decision maker is uncertainty averse; if f is convex, then the decision maker is uncertainty loving; and if f is linear, then the decision maker is uncertainty neutral. The familiar SEU specification emerges if f is linear, since then ROCL applies after some irrelevant normalization. The overall evaluation of the lottery choice depends on risk attitudes and uncertainty attitudes, and there is no reason for the decision maker to be averse to both at the same time.
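A minimal sketch of the smooth ambiguity evaluation in equation (18.4), with our own choice of prizes and of U and f; a concave f pushes the evaluation below the ROCL benchmark, which is the sense in which the agent is uncertainty averse:

```python
import math

priors = {0.6: 0.1, 0.7: 0.6, 0.8: 0.3}    # 'prior' weight on each pi
U = math.sqrt                              # utility over money
U_X, U_x = U(100.0), U(0.0)                # hypothetical prizes X and x

def eval_smooth(f):
    """Equation (18.4): average f(U(CE(pi))) over the prior on pi, where
    U(CE(pi)) equals the conditional expected utility of the binary lottery."""
    return sum(w * f(pi * U_X + (1 - pi) * U_x) for pi, w in priors.items())

seu_value = eval_smooth(lambda v: v)       # linear f: ROCL / SEU again
averse_value = eval_smooth(math.sqrt)      # one concave f: uncertainty averse
benchmark = math.sqrt(seu_value)           # f applied to the collapsed SEU
```

By Jensen's inequality the uncertainty-averse evaluation lies strictly below the benchmark, so this agent would pay something to have the uncertainty over π resolved.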
An important econometric corollary is that one cannot infer attitudes toward uncertainty from observed choice until attitudes toward risk are characterized. An equally important implication is that the very definition of a choice setting as involving risk, uncertainty or ambiguity is subjective. The propensity to make decisions using ROCL is a subjective one. The propensity to fill in ambiguous blanks with well-defined subjective belief distributions, whether or not you then apply ROCL to them, is also a subjective one. So you cannot speak a priori of any given environment being one of risk, uncertainty or ambiguity without knowing, or assuming, more about the decision maker.
6 CONCLUSIONS
Choice modeling frameworks can be extended to consider the management and perception of risk, and there is already a rich literature on the implications for modeling and experimental inference. Working backwards from observed choice behavior to infer latent preferences is not easy when you allow for decision makers wanting to mitigate risk through their choices, but the theoretical, experimental and econometric tools are in place.
NOTES

1. We are grateful to a reviewer for constructive comments.
2. Many of the risk management strategies that people have at their disposal can potentially affect both the cost and the probability of the loss. However, for pedagogic reasons it is common to assume that there is a separation between self-insurance and self-protection activities.
3. Several studies note that the core concept appeared as early as de Finetti (1952), but this was written in Italian and we cannot verify that claim.
4. The logic is easy to see. Assume lotteries defined solely over gains, and a linear utility function just to remove the effect of diminishing marginal utility. Then if the weighted probability is always equal to or less than the actual (objective or subjective) probability, the EU based on these weighted probabilities will be less than the EV based on the actual probabilities, hence there is a risk premium.
5. In effect, the usual methodological approach is akin to running a horserace, declaring a winner, maybe by a nose, and shooting all of the losing horses. The fact that one of these losers might have done better on a different, wetter track is ignored.
6. See Harrison (1992, 2005, 2006a, 2006b) and Harrison and Rutström (2008) for examples and surveys of these biases.
7. This strong claim is supported by extensive discussion by Harrison (2006a) of the claims that one can easily generate incentive-compatible choices using appropriate hypothetical surveys.
8. In the original specifications f is said to characterize attitudes towards ambiguity, but the earlier definition of risk, uncertainty and ambiguity makes it apparent why one would not want to casually confound the two. One would only be dealing with ambiguity in the absence of well-defined prior probabilities over the three subjective probability values 0.6, 0.7 and 0.8.
REFERENCES

Andersen, S., J. Fountain, G.W. Harrison and E.E. Rutström (2014), 'Estimating subjective probabilities', Journal of Risk and Uncertainty, forthcoming.
Andersen, S., G.W. Harrison, M.I. Lau and E.E. Rutström (2008), 'Eliciting risk and time preferences', Econometrica, 76 (3), 583-618.
Andersen, S., G.W. Harrison, M.I. Lau and E.E. Rutström (2011a), 'Discounting behavior: a reconsideration', Working Paper 2011-03, Center for the Economic Analysis of Risk, Robinson College of Business, Georgia State University.
Andersen, S., G.W. Harrison, M.I. Lau and E.E. Rutström (2011b), 'Multiattribute utility, intertemporal utility and correlation aversion', Working Paper 2011-04, Center for the Economic Analysis of Risk, Robinson College of Business, Georgia State University.
Briys, E. and H. Schlesinger (1990), 'Risk aversion and the propensities for self-insurance and self-protection', Southern Economic Journal, 57 (2), 458-67.
Briys, E., H. Schlesinger and J.M.G. Schulenburg (1991), 'Reliability of risk management: market insurance, self-insurance and self-protection', The Geneva Papers on Risk and Insurance Theory, 16 (1), 45-58.
Coller, M. and M.B. Williams (1999), 'Eliciting individual discount rates', Experimental Economics, 2 (2), 107-27.
Courbage, C. (2001), 'Self-insurance, self-protection and market insurance within the dual theory of choice', The Geneva Papers on Risk and Insurance Theory, 26 (1), 43-56.
Dionne, G. and L. Eeckhoudt (1985), 'Self-insurance, self-protection and increased risk aversion', Economics Letters, 17 (1-2), 39-42.
Doherty, N.A. (1984), 'Portfolio efficient insurance buying strategies', Journal of Risk and Insurance, 51 (2), 205-24.
Doherty, N.A. and H. Schlesinger (1983), 'The optimal deductible for an insurance policy when initial wealth is random', Journal of Business, 56 (4), 555-65.
Dorfleitner, G. and M. Krapp (2007), 'On multiattributive risk aversion: some clarifying results', Review of Managerial Science, 1 (1), 47-63.
Duncan, G.T. (1977), 'A matrix measure of multivariate local risk aversion', Econometrica, 45 (4), 895-903.
Ebert, S. and D. Wiesen (2011), 'Testing for prudence and skewness seeking', Management Science, 57 (7), 1334-49.
Ehrlich, I. and G.S. Becker (1972), 'Market insurance, self-insurance and self-protection', Journal of Political Economy, 80 (4), 623-48.
Ellsberg, D. (1961), 'Risk, ambiguity, and the Savage axioms', Quarterly Journal of Economics, 75 (4), 643-69.
Epstein, L.G. and S.M. Tanny (1980), 'Increasing generalized correlation: a definition and some economic consequences', Canadian Journal of Economics, 13 (1), 16-34.
Frederick, S., G. Loewenstein and T. O'Donoghue (2002), 'Time discounting and time preference: a critical review', Journal of Economic Literature, 40 (2), 351-401.
Friedman, M. and L.J. Savage (1948), 'The utility analysis of choices involving risk', Journal of Political Economy, 56 (4), 279-304.
Ghirardato, P., F. Maccheroni and M. Marinacci (2004), 'Differentiating ambiguity and ambiguity attitude', Journal of Economic Theory, 118 (2), 133-73.
Gilboa, I. and D. Schmeidler (1989), 'Maxmin expected utility with a non-unique prior', Journal of Mathematical Economics, 18 (2), 141-53.
Gould, J.P. (1969), 'The expected utility hypothesis and the selection of optimal deductibles for a given insurance policy', Journal of Business, 42 (2), 143-51.
Harrison, G.W. (1992), 'Theory and misbehavior of first-price auctions: reply', American Economic Review, 82 (December), 1426-43.
Harrison, G.W. (2005), 'Hypothetical bias over uncertain outcomes', in J.A. List (ed.), Using Experimental Methods in Environmental and Resource Economics, Cheltenham, UK and Northampton, MA, USA: Edward Elgar, pp. 41-69.
Harrison, G.W. (2006a), 'Making choice studies incentive compatible', in B. Kanninen (ed.), Valuing Environmental Amenities Using Stated Choice Studies: A Common Sense Guide to Theory and Practice, Boston, MA: Kluwer, pp. 65-108.
Harrison, G.W. (2006b), 'Experimental evidence on alternative environmental valuation methods', Environmental and Resource Economics, 34 (1), 125-62.
Harrison, G.W. and J. List (2004), 'Field experiments', Journal of Economic Literature, 42 (4), 1009-55.
Harrison, G.W., S.J. Humphrey and A. Verschoor (2010), 'Choice under uncertainty: evidence from Ethiopia and Uganda', Economic Journal, 120 (543), 80-104.
Harrison, G.W., M. Lau and M.B. Williams (2002), 'Estimating individual discount rates in Denmark: a field experiment', American Economic Review, 92 (5), 1606-17.
Harrison, G.W., J. List and C.A. Towe (2007), 'Naturally occurring preferences and exogenous laboratory experiments: a case study of risk aversion', Econometrica, 75 (2), 433-58.
Harrison, G.W. and E.E. Rutström (2008), 'Experimental evidence on the existence of hypothetical bias in value elicitation experiments', in C.R. Plott and V.L. Smith (eds), Handbook of Experimental Economics Results, New York: Elsevier Press, pp. 41-196.
Harrison, G.W. and E.E. Rutström (2009), 'Expected utility theory and prospect theory: one wedding and a decent funeral', Experimental Economics, 12 (2), 133-58.
Hensher, D.A., S.M. Puckett and J.M. Rose (2007), 'Extending stated choice analysis to recognize agent-specific attribute endogeneity in bilateral group negotiation and choice: a think piece', Transportation, 34 (6), 667-79.
Hiebert, L.D. (1989), 'Optimal loss reduction and increases in risk aversion', Journal of Risk and Insurance, 56 (2), 300-305.
Hirshleifer, J. (1970), Investment, Interest and Capital, Englewood Cliffs, NJ: Prentice-Hall.
Kamis, A., M. Koufaris and T. Stern (2008), 'Using an attribute-based DSS for user-customized products online: an experimental investigation', MIS Quarterly, 32 (1), 159-77.
Karni, E. (1979), 'On multivariate risk aversion', Econometrica, 47 (6), 1391-401.
Keeney, R.L. (1973), 'Risk independence and multiattributed utility functions', Econometrica, 41 (1), 27-34.
Kihlstrom, R.E. and L.J. Mirman (1974), 'Risk aversion with many commodities', Journal of Economic Theory, 8, 361-88.
Klibanoff, P., M. Marinacci and S. Mukerji (2005), 'A smooth model of decision making under ambiguity', Econometrica, 73 (6), 1849-92.
Konrad, K.A. and S. Skaperdas (1993), 'Self-insurance and self-protection: a nonexpected utility analysis', The Geneva Papers on Risk and Insurance Theory, 18 (2), 131-46.
Leeth, J.D. and J. Ruser (2003), 'Compensating wage differentials for fatal and nonfatal injury risk by gender and race', Journal of Risk and Uncertainty, 27 (3), 257-77.
Louviere, J.J., D.A. Hensher and J.D. Swait (2000), Stated Choice Methods: Analysis and Application, New York: Cambridge University Press.
Machina, M. (1995), 'Non-expected utility and the robustness of the classical insurance paradigm', The Geneva Papers on Risk and Insurance Theory, 20 (1), 9-50.
Machina, M. (2000), 'The robustness of the classical insurance paradigm', in G. Dionne (ed.), Handbook of Insurance, Boston, MA: Kluwer, pp. 37-96.
Marcucci, E., L. Rotaris and G. Paglione (2009), 'A methodology to evaluate the prospects for the introduction of a Park&Buy service', European Transport, 42, 26-46.
Mathieson, J.E. and R.L. Winkler (1976), 'Scoring rules for continuous probability distributions', Management Science, 22 (10), 1087-96.
Mayers, D. and C.W. Smith Jr (1983), 'The interdependence of individual portfolio decisions and the demand for insurance', Journal of Political Economy, 91 (2), 304-11.
Mossin, J. (1968), 'Aspects of rational insurance purchasing', Journal of Political Economy, 76 (4), 553-68.
Pye, G. (1966), 'Present values for imperfect capital markets', Journal of Business, 39 (1), 45-51.
Quiggin, J. (1991), 'Comparative statics for rank-dependent expected utility theory', Journal of Risk and Uncertainty, 4 (2), 338-50.
Quiggin, J. (1992), 'Risk, self-protection and ex ante economic value: some positive results', Journal of Environmental Economics and Management, 23 (1), 40-53.
Richard, S.F. (1975), 'Multivariate risk aversion, utility independence and separable utility functions', Management Science, 22 (1), 12-21.
Savage, L.J. (1971a), The Foundations of Statistics, 2nd edn, New York: Dover Publications.
Savage, L.J. (1971b), 'Elicitation of personal probabilities and expectations', Journal of the American Statistical Association, 66 (December), 783-801.
Schlesinger, H. and N. Doherty (1985), 'Incomplete markets for insurance: an overview', Journal of Risk and Insurance, 52 (3), 402-23.
Shogren, J.F. and T.D. Crocker (1991), 'Risk, self-protection and ex ante economic value', Journal of Environmental Economics and Management, 20 (1), 1-15.
Shogren, J.F. and T.D. Crocker (1999), 'Risk and its consequences', Journal of Environmental Economics and Management, 37 (1), 44-51.
Smith, V.L. (1968), 'Optimal insurance coverage', Journal of Political Economy, 76 (1), 68-77.
Sweeney, G.H. and T.R. Beard (1992), 'The comparative statics of self-protection', Journal of Risk and Insurance, 59 (2), 301-9.
Turnbull, S.M. (1983), 'Additional aspects of rational insurance purchasing', Journal of Business, 56 (2), 217-29.
Viscusi, W.K. (1993), 'The value of risks to life and health', Journal of Economic Literature, 31 (4), 1912-46.
19 Multiple discrete-continuous choice models: a reflective analysis and a prospective view
Chandra Bhat and Abdul Pinjari
1 BACKGROUND
Several consumer choices are characterized by a discrete dimension as well as a continuous dimension. Examples of such choice situations include vehicle-type holdings and usage, appliance choice and energy consumption, housing tenure (rent or purchase) and square footage, brand choice and purchase quantity, and activity-type choice and duration of time investment in participation. Two broad model structures may be identified in the literature to handle such discrete-continuous choice situations. The first structure (sometimes referred to as the 'reduced-form' structure) has a separate equation for the discrete choice and another for the continuous choice, with jointness introduced through statistical correlation between the stochastic components of the two equations. That is, a discrete choice model and a continuous regression model are specified separately, and then statistically stitched together through the stochastic terms. This first structure has seen extensive use and has proved useful in many empirical situations, but it is not based on an underlying (and unifying) theoretical economic model (this structure does not include the class of indirect utility function-based models that are consistent with utility maximization, as discussed in the next section). The second structure for discrete-continuous choice modeling, and the one of interest in this chapter, originates from the classical microeconomic theory of utility maximization.
While much work in the context of consumer utility maximization has focused on the single discrete-continuous (SDC) choice situation (where the choice involves the selection of one of many alternatives, along with the continuous dimension associated with the chosen alternative), there has recently been increasing interest in the multiple discrete-continuous (MDC) choice situation (where the choice involves the selection of one or more alternatives, along with a continuous quantity dimension associated with each consumed alternative). Such MDC choices are pervasive in the social sciences, including transportation, economics, and marketing. Examples include individuals' time-use choices (decisions to engage in different types of activities and the time allocated to each activity), investment portfolios (where and how much to invest) and grocery purchases (brand choice and purchase quantity). Regardless of whether a choice situation belongs to the SDC case or the MDC case, at a basic level the choice process faced by the consumer can be formulated using the theory of utility maximization, as described next.
1.1 The Random Utility Maximization (RUM) Approach to Modeling Discrete-Continuous Choices
Consumers are assumed to maximize a direct utility function U(x) over a set of non-negative consumption quantities x = (x_1, \ldots, x_k, \ldots, x_K) subject to a budget constraint, as below:

\max U(\mathbf{x}) \quad \text{such that} \quad \mathbf{p}'\mathbf{x} = E \ \text{and} \ x_k \ge 0 \ \forall k \qquad (19.1)
where U(x) is a quasi-concave, increasing and continuously differentiable utility function with respect to the consumption quantity vector x, p is the vector of unit prices for all goods, and E is the total expenditure (or income). Note that we are suppressing the index for the consumer in equation (19.1) for presentation efficiency. The formulation above is equally applicable to complete or incomplete demand systems (that is, the modeling of demand for all commodities that enter preferences or for only a subset of such commodities).1 The vector x in equation (19.1) may or may not include an outside good. The outside good, when included, represents the part of the total budget (for example, income) that is not spent on the K inside goods of interest to the analyst. Generally, the outside good is treated as a numeraire with unit price, implying that the prices and characteristics of all goods grouped into the outside category do not influence the choice and expenditure allocation among the inside goods (see Deaton and Muellbauer, 1980). The outside good allows the overall demand for the inside goods to change in response to changes in prices and other influential factors of the inside goods. Other assumptions typically made in the above utility maximization formulation are: (a) the direct utility contribution due to the consumption of different alternatives is additively separable, and (b) the constraint is linear in prices, and it is the only constraint governing consumers' decisions. We will return to these assumptions later. The form of the utility function U(x) in equation (19.1) determines whether the formulation corresponds to an SDC model or an MDC model. The SDC case assumes that the choice alternatives are perfect substitutes; that is, the choice of one alternative precludes the choice of others.
The MDC case accommodates imperfect substitution among goods, thus allowing for the possibility of consuming multiple alternatives. A linear utility form with respect to consumption characterizes the perfect substitutes (or SDC) case, while a non-linear utility form allowing diminishing marginal utility with increasing consumption characterizes the imperfect substitutes (or MDC) case. An example SDC framework is Hanemann's (1984) specification:

U(\mathbf{x}) = U^*\!\left( \sum_{k=2}^{K} \psi_k x_k,\; x_1 \right) \qquad (19.2)

where U^* is a bivariate utility function and \psi_k (k = 2, \ldots, K) represents the quality index (or baseline preference) specific to each inside good k, with the first good considered as the outside good. This functional form ensures that, in addition to the outside good, exactly one inside good (k = 2, 3, \ldots, K) is consumed. Hanemann (1984) refers to this as the 'extreme corner solution'. Examples of MDC frameworks are discussed later.
Two approaches have been used to derive demand functions for the utility maximization problem in equation (19.1). The first approach, due to Hanemann (1978) and Wales and Woodland (1983), solves the constrained utility maximization problem in equation (19.1) directly via standard application of the Karush–Kuhn–Tucker (KKT) first-order necessary conditions of optimality. Considering the utility function U(x) to be random over the population leads to stochastic KKT conditions, which form the basis for deriving probabilities for consumption patterns (including corner solutions). This approach is called the KKT approach because of the central role played by the KKT conditions (more popularly, the approach is referred to as the KT approach, but we use the label 'KKT' to give credit to Karush who, in an unpublished manuscript, derived the first-order optimality conditions in a constrained optimization setting even earlier than Kuhn and Tucker). The second approach, due to Hanemann (1984) and Lee and Pitt (1986), solves the maximization problem in equation (19.1) by using 'virtual prices' (a method that is dual to the KKT approach), which allows the analysis to start with the specification of a conditional indirect utility function. Subsequently, the implied Marshallian demand functions are obtained via Roy's identity (Roy, 1947).2 The vast majority of applications in the literature have involved single discrete or SDC choices. These use the indirect utility approach as opposed to the KKT approach (that is, the direct utility approach), mainly because, until the past decade, the KKT approach was perceived to be difficult to use, primarily owing to the absence of practical methods for estimating the structural parameters.
In particular, the KKT conditions, in a stochastic setting, lead to a probability expression for the consumption vector that involves multidimensional integrals of the order of the number of goods in the analysis, as discussed in section 3.2 (and, until Bhat, 2005, this expression was thought to be analytically intractable). Further, simple and practically feasible prediction and welfare analysis methods were not available for models based on the KKT approach. However, recent interest in MDC problems has brought renewed attention to the KKT approach. Besides, the use of direct utility functions has some advantages: the relationship of the utility function to behavioral theory is more transparent, offering more interpretable parameters and better insights into identification issues. This is true even for the SDC case. For example, Bunch (2009) shows that the indirect utility function used by Chintagunta (1993) is in fact from the linear expenditure system, so the direct utility function is known. Applying the KKT approach yields the correct analytical expression for the reservation price in terms of parameters from the direct utility function, which has a clear behavioral interpretation. Over the past decade, the field has witnessed significant strides in using the KKT approach for modeling MDC choices, both for estimation of the parameters of KKT models and for application of the models for forecasting and welfare analysis. Thus, in this chapter, we focus on the KKT approach to modeling MDC choices. Specifically, we review the recent advances and outline an agenda for future research.

1.2 Structure of the Chapter

The rest of this chapter is organized as follows. The next section provides an overview of the utility forms used to model MDC choices. Section 3 outlines the econometric
structure and KKT conditions of optimality that form the basis for deriving the model structure and likelihood expressions. Section 4 outlines the specific model structures used in the literature based on different specifications of the utility form and the stochastic structure. Section 5 provides a brief discussion of the case where the choice alternatives comprise a combination of imperfect and perfect substitutes. Section 6 presents methods that enable the use of the KKT-based MDC models for forecasting and policy analysis purposes. Section 7 discusses several developments on the horizon and the challenges that lie beyond. Section 8 summarizes the chapter.
2 UTILITY FORMS FOR MODELING MDC CHOICES
As discussed earlier, non-linear utility forms that allow diminishing marginal utility with increasing consumption can be used to model 'multiple discreteness' in consumer choices. A number of different utility forms have been used in the literature. In this section, we discuss the following form used in Bhat (2008), as it subsumes a variety of utility forms used in previous studies as special cases:

U(\mathbf{x}) = \sum_{k=1}^{K} \frac{\gamma_k}{\alpha_k}\,\psi_k \left\{ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k} - 1 \right\} \qquad (19.3)

In the above utility function, U(x) is a quasi-concave, increasing, and continuously differentiable function with respect to the consumption quantity (K × 1)-vector x (x_k ≥ 0 for all k), and \psi_k, \gamma_k and \alpha_k are parameters associated with good k. The function in equation (19.3) is a valid utility function if \psi_k > 0 and \alpha_k ≤ 1 for all k. Further, for presentation ease, we assume temporarily that there is no Hicksian composite outside good that is consumed by all decision makers, so that corner solutions (that is, zero consumptions) are allowed for all the goods k. The possibility of corner solutions implies that the term \gamma_k, which is a translation parameter, should be greater than zero for all k. The reader will note that there is an assumption of additive separability of preferences in the utility form of equation (19.3). More on this assumption later. The form of the utility function in equation (19.3) highlights the role of the various parameters \psi_k, \gamma_k and \alpha_k, and explicitly indicates the inter-relationships between these parameters that relate to theoretical and empirical identification issues. The form also assumes weak complementarity (see Mäler, 1974), which implies that the consumer receives no utility from a non-essential good's attributes if he or she does not consume it (that is, a good and its quality attributes are weak complements). The functional form proposed by Bhat (2008) in equation (19.3) generalizes earlier forms used by Hanemann (1978), von Haefen et al. (2004), Phaneuf et al. (2000) and others. Specifically, the utility form of equation (19.3) collapses to the following linear expenditure system (LES) form when \alpha_k \to 0 for all k:

U(\mathbf{x}) = \sum_{k=1}^{K} \gamma_k\,\psi_k \ln\!\left( \frac{x_k}{\gamma_k} + 1 \right) \qquad (19.4)
2.1 Role of Parameters in the Utility Specification

2.1.1 Role of \psi_k
The role of \psi_k can be inferred by computing the marginal utility of consumption with respect to good k, which is:

\frac{\partial U(\mathbf{x})}{\partial x_k} = \psi_k \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k - 1} \qquad (19.5)

It is clear from above that \psi_k represents the baseline marginal utility, or the marginal utility at the point of zero consumption of good k. Alternatively, the marginal rate of substitution between any two goods k and l at the point of zero consumption of both goods is \psi_k/\psi_l. This is the case regardless of the values of \gamma_k and \alpha_k. Thus, a good k with a higher baseline marginal utility is more likely to be consumed than a good l with a lower baseline marginal utility.

2.1.2 Role of \gamma_k
An important role of the \gamma_k terms is to shift the position of the point at which the indifference curves are asymptotic to the axes from (0, 0, 0, \ldots, 0) to (-\gamma_1, -\gamma_2, -\gamma_3, \ldots, -\gamma_K), so that the indifference curves strike the positive orthant with a finite slope. This, combined with the consumption point corresponding to the location where the budget line is tangential to the indifference curve, results in the possibility of zero consumption of good k. To see this, consider two goods 1 and 2 with \psi_1 = \psi_2 = 1, \alpha_1 = \alpha_2 = 0.5, and \gamma_2 = 1. Figure 19.1 presents the profiles of the indifference curves in this two-dimensional space for various values of \gamma_1 (\gamma_1 > 0). To compare the profiles, the indifference curves are all drawn to go through the point (0, 8). Note also that all the indifference curve profiles strike the y-axis with the same slope. As can be observed from Figure 19.1, the positive values of \gamma_1 and \gamma_2 lead to indifference curves that cross the axes of the positive orthant, allowing for corner solutions. The indifference curve profiles are asymptotic to the x-axis at y = -1 (corresponding to the constant value of \gamma_2 = 1), while they are asymptotic to the y-axis at x = -\gamma_1.

Figure 19.1 Indifference curves corresponding to different values of \gamma_1

Figure 19.2 points to another role of the \gamma_k term, as a satiation parameter. Specifically, the figure plots the sub-utility function for alternative k for \alpha_k \to 0 and \psi_k = 1, and for different values of \gamma_k. All of the curves have the same slope \psi_k = 1 at the origin point, because of the functional form used here. However, the marginal utilities vary across the curves at x_k > 0. Specifically, the higher the value of \gamma_k, the less is the satiation effect in the consumption of x_k. Thus, different values of \gamma_k lead to different satiation effects, provided \alpha_k < 1.

Figure 19.2 Effect of \gamma_k value on good k's sub-utility function profile

2.1.3 Role of \alpha_k
The express role of \alpha_k is to reduce the marginal utility with increasing consumption of good k; that is, it represents a satiation parameter. When \alpha_k = 1 for all k, there is no satiation effect or, equivalently, marginal utility is constant. The utility function in equation (19.3) then collapses to \sum_k \psi_k x_k, which represents the perfect substitutes case. This is the case of single discreteness. As \alpha_k moves downward from the value of 1, the satiation effect for good k increases. When \alpha_k \to 0, the utility function collapses to the LES form, as discussed earlier. \alpha_k can also take negative values and, when \alpha_k \to -\infty, there is immediate and full satiation. Figure 19.3 plots the utility function for alternative k for \gamma_k = 1 and \psi_k = 1, and for different values of \alpha_k. Again, all of the curves have the same slope \psi_k = 1 at the origin point, and accommodate different levels of satiation through different values of \alpha_k for any given \gamma_k value.
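The roles of \psi_k, \gamma_k and \alpha_k described above can be checked numerically. The following sketch (not code from the chapter; function names and parameter values are illustrative) evaluates the sub-utility of one good from equation (19.3) and its marginal utility from equation (19.5), including the LES limit of equation (19.4) as \alpha_k \to 0:

```python
import math

def subutility(x, psi=1.0, gamma=1.0, alpha=0.5):
    """Sub-utility of one good in equation (19.3):
    (gamma/alpha) * psi * (((x/gamma) + 1)**alpha - 1).
    As alpha -> 0 this collapses to the LES form of equation (19.4):
    gamma * psi * ln(x/gamma + 1)."""
    if abs(alpha) < 1e-12:
        return gamma * psi * math.log(x / gamma + 1.0)
    return (gamma / alpha) * psi * ((x / gamma + 1.0) ** alpha - 1.0)

def marginal_utility(x, psi=1.0, gamma=1.0, alpha=0.5):
    """Marginal utility of equation (19.5): psi * ((x/gamma) + 1)**(alpha - 1).
    At x = 0 this equals psi, the baseline marginal utility."""
    return psi * (x / gamma + 1.0) ** (alpha - 1.0)
```

Evaluating `marginal_utility` at increasing x reproduces the satiation patterns of Figures 19.2 and 19.3: marginal utility falls with consumption whenever \alpha_k < 1, falls more slowly for larger \gamma_k, and is constant (the perfect substitutes case) at \alpha_k = 1.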
Figure 19.3 Effect of \alpha_k value on good k's sub-utility function profile

2.2 Empirical Identification Issues Associated with the Utility Form

The discussion in the previous section indicates that \psi_k reflects the baseline marginal utility, which controls whether or not a good is selected for positive consumption (the extensive margin of choice). The role of \gamma_k is to enable corner solutions, though it also governs the level of satiation. The purpose of \alpha_k is solely to allow satiation. The precise functional mechanisms through which \gamma_k and \alpha_k impact satiation are, however, different: \gamma_k controls satiation by translating consumption quantity, while \alpha_k controls satiation by exponentiating consumption quantity. Clearly, these two effects operate in different ways, and different combinations of their values lead to different satiation profiles. Empirically speaking, however, and as discussed in detail in Bhat (2008), it is very difficult to disentangle the two effects, which leads to serious empirical identification problems and estimation breakdowns when one attempts to estimate both the \gamma_k and \alpha_k parameters for each good. In fact, for a given \psi_k value, it is possible to closely approximate a sub-utility function profile based on a combination of \gamma_k and \alpha_k values with a sub-utility function based solely on \gamma_k values or solely on \alpha_k values. In actual application, it would benefit the analyst to estimate models based on both the \alpha_k-profile (that is, a sub-utility function based solely on \alpha_k values) and the \gamma_k-profile (that is, a sub-utility function based solely on \gamma_k values, with the \alpha_k values set to zero), and choose the specification that provides the better statistical fit. Alternatively, the analyst can stick with one functional form a priori, but experiment with various fixed values of \alpha_k for the \gamma_k-profile and of \gamma_k for the \alpha_k-profile.

2.3 Utility Form for Situations with an Outside Good

Thus far, the discussion has assumed that there is no outside numeraire good (that is, no essential Hicksian composite good). If an outside good is present, label it as the first
good, which now has a unit price of one. Then, the utility functional form needs to be modified as follows:

U(\mathbf{x}) = \frac{1}{\alpha_1}\,\psi_1 (x_1 + \gamma_1)^{\alpha_1} + \sum_{k=2}^{K} \frac{\gamma_k}{\alpha_k}\,\psi_k \left\{ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k} - 1 \right\} \qquad (19.6)
In the above formula, we need \gamma_1 \le 0, while \gamma_k > 0 for k > 1. Also, we need x_1 + \gamma_1 > 0. The magnitude of \gamma_1 may be interpreted as the required lower bound (or a 'subsistence value') for consumption of the outside good. The identification considerations discussed for the 'no-outside good' case carry over to the 'with outside good' case. For example, as in the 'no-outside good' case, the analyst will generally not be able to estimate both \alpha_k and \gamma_k for the outside and inside goods. Another important normalization necessary for parameter identification, regardless of the presence or absence of the outside good, is that the coefficients of the explanatory variables (including the constants) in the baseline utility parameters \psi_k (k = 1, 2, \ldots, K) should be normalized (for example, to zero) for at least one alternative. In situations with a Hicksian composite outside good, the natural candidate for such normalization is the baseline marginal utility parameter of the outside good. This identification condition is similar to that in the standard discrete choice model, though the origin of the condition differs between standard discrete choice models and multiple discrete-continuous models. In standard discrete choice models, individuals choose the alternative with the highest indirect utility, so that all that matters is relative utility. In multiple discrete-continuous models, the origin of this condition is the adding-up (or budget) constraint on the quantities of consumption of the goods.
3 ECONOMETRIC STRUCTURE AND KARUSH–KUHN–TUCKER (KKT) CONDITIONS OF OPTIMALITY
The KKT approach employs a direct stochastic specification by assuming the utility function U(x) to be random over the population. In all recent applications of the KKT approach for multiple discreteness, a multiplicative random element is introduced to the baseline marginal utility of each good as follows:

\psi(z_k, \varepsilon_k) = \psi(z_k) \cdot e^{\varepsilon_k} \qquad (19.7)

where z_k is a set of attributes characterizing alternative k and the decision maker, and \varepsilon_k captures idiosyncratic (unobserved) characteristics that impact the baseline utility for good k. The exponential form for the introduction of the random term guarantees the positivity of the baseline utility as long as \psi(z_k) > 0. To ensure this latter condition, \psi(z_k) is further parameterized as \exp(\beta' z_k), which then leads to the following form for the baseline random utility associated with good k:

\psi(z_k, \varepsilon_k) = \exp(\beta' z_k + \varepsilon_k) \qquad (19.8)
The z_k vector in the above equation includes a constant term. The overall random utility function of equation (19.3) then takes the following form:

U(\mathbf{x}) = \sum_{k=1}^{K} \frac{\gamma_k}{\alpha_k} \exp(\beta' z_k + \varepsilon_k) \left\{ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k} - 1 \right\} \qquad (19.9)

As indicated earlier, the part of \beta (that is, the coefficients of the explanatory variables) corresponding to at least one alternative must be normalized to zero. In the presence of a Hicksian composite outside good, arbitrarily designating the first alternative as the outside good, the overall random utility function can be written as:

U(\mathbf{x}) = \frac{1}{\alpha_1} \exp(\varepsilon_1)\,(x_1 + \gamma_1)^{\alpha_1} + \sum_{k=2}^{K} \frac{\gamma_k}{\alpha_k} \exp(\beta' z_k + \varepsilon_k) \left\{ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k} - 1 \right\} \qquad (19.10)
Note that, for identification, \psi(z_1, \varepsilon_1) is specified as e^{\varepsilon_1}, by normalizing the coefficients of z_1 to zero. But some studies (particularly those in the environmental economics literature) impose a stronger normalization by considering the utility of the outside good to be deterministic (that is, \varepsilon_1 = 0) and setting \psi(z_1, \varepsilon_1) = 1. Then the overall random utility function becomes:

U(\mathbf{x}) = \frac{1}{\alpha_1} (x_1 + \gamma_1)^{\alpha_1} + \sum_{k=2}^{K} \frac{\gamma_k}{\alpha_k} \exp(\beta' z_k + \varepsilon_k) \left\{ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k} - 1 \right\} \qquad (19.11)
While the above normalization is not theoretically inappropriate, it is unnecessary. Further, it is arbitrary to set a good's utility contribution to be deterministic. This is particularly a problem in situations with no Hicksian composite outside good, where the analyst has to arbitrarily choose the utility contribution of any one alternative to be deterministic. Further, as demonstrated in Bhat (2008), the probability expressions and probability values for the consumption pattern depend on which choice alternative is chosen for this normalization. Finally, in contexts with an outside good, including the stochastic term \varepsilon_1 on the outside good helps in capturing correlation among the random utilities of the inside goods. Such correlation helps in inducing greater competition among the consumptions of the inside goods, when compared with the competition between the inside goods and the outside good. Thus, we prefer the specification with stochasticity in the utility contributions of all choice alternatives, including that of the outside good in situations with an outside good.

3.1 Optimal Consumptions

The analyst can solve for the optimal expenditure allocations by forming the Lagrangian and applying the KKT conditions. For the utility form in equation (19.10), the Lagrangian function for the problem is:3

L = \frac{1}{\alpha_1} \exp(\varepsilon_1)\,(x_1 + \gamma_1)^{\alpha_1} + \sum_{k=2}^{K} \frac{\gamma_k}{\alpha_k} \exp(\beta' z_k + \varepsilon_k) \left\{ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k} - 1 \right\} - \lambda \left[ \sum_{k=1}^{K} p_k x_k - E \right] \qquad (19.12)
where \lambda is the Lagrangian multiplier associated with the budget constraint (that is, it can be viewed as the marginal utility of total expenditure or income). The KKT first-order conditions for the optimal consumptions (the x_k^* values) are given by:

\frac{\exp(\varepsilon_1)\,(x_1^* + \gamma_1)^{\alpha_1 - 1}}{p_1} = \lambda, \quad \text{since } x_1^* > 0

\frac{\exp(\beta' z_k + \varepsilon_k)}{p_k} \left( \frac{x_k^*}{\gamma_k} + 1 \right)^{\alpha_k - 1} = \lambda, \quad \text{if } x_k^* > 0, \; k = 2, \ldots, K \qquad (19.13)

\frac{\exp(\beta' z_k + \varepsilon_k)}{p_k} \left( \frac{x_k^*}{\gamma_k} + 1 \right)^{\alpha_k - 1} < \lambda, \quad \text{if } x_k^* = 0, \; k = 2, \ldots, K

In the above KKT conditions, the first condition is for the outside good, while the next two sets of conditions are for the inside goods (k = 2, 3, \ldots, K). Note that the price of the Hicksian outside numeraire good p_1 is unity. The optimal demand satisfies the conditions in equation (19.13) plus the budget constraint \sum_k p_k x_k^* = E. Substituting the expression for \lambda from the KKT condition for the outside good into the KKT conditions for the inside goods, and taking logarithms, one can rewrite the KKT conditions as:

V_k + \varepsilon_k = V_1 + \varepsilon_1 \quad \text{if } x_k^* > 0 \; (k = 2, 3, \ldots, K)
V_k + \varepsilon_k < V_1 + \varepsilon_1 \quad \text{if } x_k^* = 0 \; (k = 2, 3, \ldots, K) \qquad (19.14)

where V_1 = (\alpha_1 - 1) \ln(x_1^* + \gamma_1) - \ln p_1, and V_k = \beta' z_k + (\alpha_k - 1) \ln\!\left( \frac{x_k^*}{\gamma_k} + 1 \right) - \ln p_k \; (k = 2, 3, \ldots, K).
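The KKT system in equation (19.13) can be solved for the optimal consumptions by a one-dimensional search on \lambda: each candidate \lambda implies a demand for every good (with corner solutions for goods whose price-normalized baseline marginal utility falls below \lambda), and \lambda is adjusted until the budget is exhausted. The following sketch, with purely illustrative function names and parameter values (it is not code from the chapter), demonstrates this for the utility form in equation (19.10):

```python
import math

def optimal_consumption(psi, gamma, alpha, prices, E):
    """Solve the KKT conditions (19.13) by bisection on the budget multiplier
    lambda. psi[k] is the baseline marginal utility exp(beta'z_k + eps_k);
    good 0 is the outside good with unit price (psi[0] = exp(eps_1))."""
    def demand(lam):
        # outside good: exp(eps_1) * (x_1 + gamma_1)^(alpha_1 - 1) = lam
        x = [(lam / psi[0]) ** (1.0 / (alpha[0] - 1.0)) - gamma[0]]
        for k in range(1, len(psi)):
            if psi[k] / prices[k] <= lam:      # corner solution: x_k = 0
                x.append(0.0)
            else:                              # invert the first-order condition
                x.append(gamma[k] * ((lam * prices[k] / psi[k])
                                     ** (1.0 / (alpha[k] - 1.0)) - 1.0))
        return x

    lo, hi = 1e-10, 1e10                       # bracket for lambda > 0
    for _ in range(200):                       # geometric bisection
        lam = math.sqrt(lo * hi)
        spend = sum(p * xk for p, xk in zip(prices, demand(lam)))
        if spend > E:
            lo = lam                           # spending too high: raise lambda
        else:
            hi = lam
    return demand(math.sqrt(lo * hi))
```

By construction, every consumed good ends up with the same price-normalized marginal utility \lambda, and every non-consumed good with a lower one, which is exactly what equations (19.13) and (19.14) express.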
3.2 General Econometric Model Structure and Identification
To complete the model structure, the analyst needs to specify the error structure. In the general case, let the joint probability density function of the \varepsilon_k terms be f(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K). Then, the probability that the individual consumes the first M of the K goods is:

P(x_1^*, x_2^*, x_3^*, \ldots, x_M^*, 0, 0, \ldots, 0) = |J| \int_{\varepsilon_1 = -\infty}^{+\infty} \int_{\varepsilon_{M+1} = -\infty}^{V_1 - V_{M+1} + \varepsilon_1} \int_{\varepsilon_{M+2} = -\infty}^{V_1 - V_{M+2} + \varepsilon_1} \cdots \int_{\varepsilon_{K-1} = -\infty}^{V_1 - V_{K-1} + \varepsilon_1} \int_{\varepsilon_K = -\infty}^{V_1 - V_K + \varepsilon_1} f(\varepsilon_1, V_1 - V_2 + \varepsilon_1, V_1 - V_3 + \varepsilon_1, \ldots, V_1 - V_M + \varepsilon_1, \varepsilon_{M+1}, \varepsilon_{M+2}, \ldots, \varepsilon_{K-1}, \varepsilon_K) \, d\varepsilon_K \, d\varepsilon_{K-1} \cdots d\varepsilon_{M+2} \, d\varepsilon_{M+1} \, d\varepsilon_1 \qquad (19.15)
where J is the Jacobian whose elements are given by (see Bhat, 2005):

J_{ih} = \frac{\partial [V_1 - V_{i+1} + \varepsilon_1]}{\partial x_{h+1}^*} = \frac{\partial [V_1 - V_{i+1}]}{\partial x_{h+1}^*}; \quad i, h = 1, 2, \ldots, M - 1 \qquad (19.16)

The probability expression in equation (19.15) is a (K − M + 1)-dimensional integral. The dimensionality of the integral can be reduced by one by noticing that the KKT conditions can also be written in a differenced form. To do so, define \tilde{\varepsilon}_{k1} = \varepsilon_k - \varepsilon_1, and let the implied multivariate distribution of the error differences be g(\tilde{\varepsilon}_{21}, \tilde{\varepsilon}_{31}, \ldots, \tilde{\varepsilon}_{K1}). Then, equation (19.15) may be written in the equivalent (K − M)-integral form shown below:

P(x_1^*, x_2^*, x_3^*, \ldots, x_M^*, 0, 0, \ldots, 0) = |J| \int_{\tilde{\varepsilon}_{M+1,1} = -\infty}^{V_1 - V_{M+1}} \int_{\tilde{\varepsilon}_{M+2,1} = -\infty}^{V_1 - V_{M+2}} \cdots \int_{\tilde{\varepsilon}_{K-1,1} = -\infty}^{V_1 - V_{K-1}} \int_{\tilde{\varepsilon}_{K,1} = -\infty}^{V_1 - V_K} g(V_1 - V_2, V_1 - V_3, \ldots, V_1 - V_M, \tilde{\varepsilon}_{M+1,1}, \tilde{\varepsilon}_{M+2,1}, \ldots, \tilde{\varepsilon}_{K,1}) \, d\tilde{\varepsilon}_{K,1} \, d\tilde{\varepsilon}_{K-1,1} \cdots d\tilde{\varepsilon}_{M+1,1} \qquad (19.17)

The equation above indicates that the probability expression for the observed optimal consumption pattern of goods is completely characterized by the (K − 1) error terms in differenced form. Thus, all that is estimable is the (K − 1) × (K − 1) covariance matrix of the error differences. In other words, it is not possible to estimate a full covariance matrix for the original error terms (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K), because there are infinite possible densities for f(.) that map into the same g(.) density for the error differences (see Train, 2003, p. 27, for a similar situation in the context of standard discrete choice models). There are many possible ways to normalize f(.) to account for this situation. For example, one can assume an identity covariance matrix for f(.), which automatically accommodates the needed normalization. Alternatively, one can estimate g(.) without reference to f(.). In the general case when the unit prices p_k vary across goods, it is possible to estimate K(K − 1)/2 parameters of the full covariance matrix of the error differences, as just discussed (though the analyst might want to impose constraints on this full covariance matrix for ease of interpretation and stability in estimation). However, when the unit prices do not differ among the goods, an additional scaling restriction needs to be imposed. A typical way to do this is by normalizing the scale of the random error terms (that is, the scale of the \varepsilon_k terms) to one.
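The mapping from the covariance matrix of (\varepsilon_1, \ldots, \varepsilon_K) to the covariance matrix of the error differences can be written out directly, and doing so makes the identification point concrete: distinct original covariance matrices can imply the same differenced matrix. A minimal sketch (illustrative, not from the chapter):

```python
def differenced_cov(Sigma):
    """Covariance matrix of the error differences (eps_k - eps_1), k = 2..K,
    implied by a K x K covariance matrix Sigma of (eps_1, ..., eps_K).
    Entry (i, j) is Sigma_{i+1,j+1} - Sigma_{i+1,1} - Sigma_{1,j+1} + Sigma_{1,1}."""
    K = len(Sigma)
    return [[Sigma[i + 1][j + 1] - Sigma[i + 1][0] - Sigma[0][j + 1] + Sigma[0][0]
             for j in range(K - 1)] for i in range(K - 1)]
```

For example, the 3 × 3 identity matrix and the matrix obtained by adding a constant 0.5 to every one of its entries yield the same 2 × 2 differenced covariance matrix, so the two original specifications are observationally equivalent.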
4 SPECIFIC MODEL STRUCTURES
4.1 The Multiple Discrete-Continuous Extreme-Value (MDCEV) Model

Following Bhat (2005, 2008), consider an extreme value distribution for \varepsilon_k and assume that \varepsilon_k is independent of z_k (k = 1, 2, \ldots, K). The \varepsilon_k's are also assumed to be independently distributed across alternatives with a scale parameter of \sigma (\sigma can be normalized to one if there is no variation in unit prices across goods). With \psi(z_k, \varepsilon_k) = \psi(z_k) \cdot e^{\varepsilon_k}, the V terms are defined as follows:

V_1 = (\alpha_1 - 1) \ln(x_1^* + \gamma_1) - \ln p_1
V_k = \beta' z_k + (\alpha_k - 1) \ln(x_k^* + 1) - \ln p_k \; (k = 2, 3, \ldots, K), when the \alpha-profile is used, and \qquad (19.18)
V_k = \beta' z_k - \ln\!\left( \frac{x_k^*}{\gamma_k} + 1 \right) - \ln p_k \; (k = 2, 3, \ldots, K), when the \gamma-profile is used.
As discussed earlier, it is generally not possible to estimate the full V_k form in equation (19.14), because the \alpha_k terms and \gamma_k terms serve a similar satiation role. From equation (19.17), the probability that the individual allocates expenditure to the first M of the K goods (M ≥ 1), with a corresponding consumption vector x^* = (x_1^*, x_2^*, x_3^*, \ldots, x_M^*, 0, 0, \ldots, 0), is:

P(x_1^*, x_2^*, x_3^*, \ldots, x_M^*, 0, 0, \ldots, 0) = |J| \int_{\varepsilon_1 = -\infty}^{+\infty} \left\{ \prod_{i=2}^{M} \frac{1}{\sigma}\,\lambda\!\left[ \frac{V_1 - V_i + \varepsilon_1}{\sigma} \right] \right\} \left\{ \prod_{s=M+1}^{K} \Lambda\!\left[ \frac{V_1 - V_s + \varepsilon_1}{\sigma} \right] \right\} \frac{1}{\sigma}\,\lambda\!\left( \frac{\varepsilon_1}{\sigma} \right) d\varepsilon_1 \qquad (19.19)
where \lambda is the standard extreme value density function, \Lambda is the standard extreme value cumulative distribution function, and |J| is the determinant of the Jacobian matrix obtained from applying the change of variables calculus between the stochastic KKT conditions and the consumptions, given by the following expression (Bhat, 2008):

|J| = \frac{1}{p_1} \left( \prod_{i=1}^{M} f_i \right) \left( \sum_{i=1}^{M} \frac{p_i}{f_i} \right), \quad \text{where } f_i = \frac{1 - \alpha_i}{x_i^* + \gamma_i} \qquad (19.20)
The integral in equation (19.19) collapses to a surprisingly simple closed form, providing the following overall expression (Bhat, 2008):

\[
P(x_1^*, x_2^*, x_3^*, \ldots, x_M^*, 0, 0, \ldots, 0) = \frac{1}{p_1} \cdot \frac{1}{\sigma^{M-1}} \left(\prod_{i=1}^{M} f_i\right)\left(\sum_{i=1}^{M} \frac{p_i}{f_i}\right) \left[\frac{\prod_{i=1}^{M} e^{V_i/\sigma}}{\left(\sum_{k=1}^{K} e^{V_k/\sigma}\right)^{M}}\right] (M-1)! \tag{19.21}
\]
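Equation (19.21) is simple to evaluate directly. The sketch below (our own implementation with arbitrary illustrative inputs, not code from the chapter) codes equations (19.20) and (19.21) and verifies that with a single chosen good the expression reduces to the MNL choice probability:

```python
import numpy as np
from math import factorial

def mdcev_prob(x, V, p, alpha, gamma, sigma=1.0):
    """Sketch (our own) of equations (19.20)-(19.21): the MDCEV probability of
    a consumption vector x, with good 1 (index 0) designated as the first
    good. All argument names are ours."""
    chosen = x > 0
    M = int(chosen.sum())
    f = (1.0 - alpha[chosen]) / (x[chosen] + gamma[chosen])
    jac = (1.0 / p[0]) * np.prod(f) * np.sum(p[chosen] / f)   # |J|, eq. (19.20)
    eV = np.exp(V / sigma)
    return (jac / sigma ** (M - 1)
            * np.prod(eV[chosen]) / np.sum(eV) ** M
            * factorial(M - 1))

# With M = 1 the Jacobian factor cancels (f_1 * (p_1/f_1) * (1/p_1) = 1) and
# the expression reduces to the MNL probability e^{V_1} / sum_k e^{V_k}:
x = np.array([3.0, 0.0, 0.0])
V = np.array([0.5, 0.2, -0.1])
one = mdcev_prob(x, V, np.ones(3), np.full(3, 0.5), np.ones(3))
print(np.isclose(one, np.exp(V[0]) / np.exp(V).sum()))  # True
```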
Multiple discrete-continuous choice models 439

The reader will note that the above probability expression can be used even in contexts without an essential Hicksian composite outside good. The only difference in the probability expressions between the two contexts is in how V₁ is defined. Specifically, in situations without an essential Hicksian composite outside good, V₁ is defined in the same fashion as the V_k (k = 2, 3, ..., K) are defined in equation (19.18). Further, the expression in equation (19.21) depends on the unit price of the good that is used as the first one (see the 1/p₁ term in front). In particular, different probabilities of the same consumption pattern arise depending on the good that is labeled as the first good (note that any good that is consumed may be designated as the first good).4 In terms of the likelihood function, the 1/p₁ term can be ignored, since it is simply a constant in each individual's likelihood function. Thus, the same parameter estimates will result independent of the good designated as the first good for each individual. In the case when M = 1 (that is, only one alternative is chosen), there are no satiation effects (α_k = 1 for all k) and the Jacobian term drops out (that is, the continuous component drops out, because all expenditure is allocated to good 1). Then, the model in equation (19.21) collapses to the standard multinomial logit (MNL) model. Thus, the MDCEV model is a multiple discrete-continuous extension of the standard MNL model.

4.2 Closed Form Extensions of the Multiple Discrete-Continuous Extreme-Value (MDCEV) Model
Thus far, we have assumed that the ε_k terms are independently and identically extreme value distributed across alternatives k. The analyst can extend the model to allow correlation across alternatives using a generalized extreme-value (GEV) error structure. The advantage of the GEV structure is that it results in closed-form probability expressions for any and all consumption patterns.

4.2.1 The MDCNEV model

Pinjari and Bhat (2010) formulate a special two-level nested case of the multiple discrete-continuous generalized extreme-value (MDCGEV) model with a nested extreme value distributed error structure that has the following joint cumulative distribution:

\[
F(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K) = \exp\left\{-\sum_{s=1}^{S_K} \left(\sum_{i \in s\text{th nest}} e^{-\varepsilon_i/\theta_s}\right)^{\theta_s}\right\} \tag{19.22}
\]

In the above expression, s (= 1, 2, ..., S_K) is the index to represent a nest of alternatives, S_K is the total number of nests the K alternatives belong to, and θ_s (0 < θ_s ≤ 1; s = 1, 2, ..., S_K) is the (dis)similarity parameter introduced to induce correlations among the stochastic components of the utilities of alternatives belonging to the sth nest. This error structure assumes that the nests are mutually exclusive and exhaustive (that is, each alternative can belong to only one nest and all alternatives are allocated to one of the S_K nests). Without loss of generality, let 1, 2, ..., S_M be the nests the M chosen alternatives belong to, and let q₁, q₂, ..., q_{S_M} be the number of chosen alternatives in each of the S_M nests (thus, q₁ + q₂ + ... + q_{S_M} = M). Using the nested extreme value error distribution assumption specified in equation (19.22) (and the above-identified notation), Pinjari and Bhat (2010) derived the following expression for the multiple discrete-continuous nested extreme-value (MDCNEV) model:
(19.23)

In the above expression, sum(X_{rs}) is the sum of elements of a row matrix X_{rs} (see Pinjari and Bhat, 2010, app. A, for a description of the form of the matrix X_{rs}).

4.2.2 The MDCGEV model

More recently, Pinjari (2011) formally proved the existence of, and derived, closed form probability expressions for MDC models with error structures based on McFadden's (1978) GEV structure. To do so, he expressed the probability expression in equation (19.15) as an integral of an Mth order partial derivative of the K-dimensional joint cumulative distribution function (CDF) of the error terms (ε₁, ε₂, ..., ε_K):

\[
P(x_1^*, \ldots, x_M^*, 0, \ldots, 0) = |J| \int_{\varepsilon_1=-\infty}^{+\infty} \left[ \frac{\partial^M F(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K)}{\partial \varepsilon_1 \cdots \partial \varepsilon_M} \right]_{\varepsilon_i = V_1 - V_i + \varepsilon_1,\; \forall i = 1, 2, \ldots, K} d\varepsilon_1 \tag{19.24}
\]
where F(ε₁, ε₂, ..., ε_K) is the joint CDF of the error terms (ε₁, ε₂, ..., ε_K) specified based on McFadden's (1978) GEV form as below:

\[
F_{GEV}(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K) = \exp\left[-G\left(e^{-\varepsilon_1}, e^{-\varepsilon_2}, \ldots, e^{-\varepsilon_K}\right)\right] \tag{19.25}
\]
where G is a non-negative function with the following properties:

1. G(y₁, ..., y_i, ..., y_K) ≥ 0, ∀ y_i > 0 (i = 1, 2, ..., K);
2. G is homogeneous of degree μ > 0, that is, G(ay₁, ..., ay_i, ..., ay_K) = a^μ G(y₁, ..., y_i, ..., y_K);
3. lim_{y_i → +∞} G(y₁, ..., y_i, ..., y_K) = +∞, ∀ i = 1, 2, ..., K; and
4. (−1)^M ∂^M G(y₁, ..., y_K)/(∂y₁ ⋯ ∂y_M) ≤ 0, ∀ y_i > 0 (i = 1, 2, ..., K).
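The nested extreme-value structure of equation (19.22) corresponds to the generating function G(y) = Σ_s(Σ_{i∈s} y_i^{1/θ_s})^{θ_s}. A numerical sketch (our own, with illustrative values) of that function, checking homogeneity (property 2) and the collapse of the joint CDF to independent extreme-value CDFs when every θ_s = 1:

```python
import numpy as np

# Sketch (our own): the two-nest nested-logit generating function,
# G(y) = sum_s ( sum_{i in s} y_i^(1/theta_s) )^theta_s. The nested
# extreme-value CDF of equation (19.22) is then F(eps) = exp(-G(e^{-eps})).
def nested_logit_G(y, nests, theta):
    return sum(np.sum(y[nests == s] ** (1.0 / theta[s])) ** theta[s]
               for s in np.unique(nests))

eps = np.array([0.3, -0.2, 1.1, 0.0])
nests = np.array([0, 0, 1, 1])
theta = np.array([0.6, 0.8])
y = np.exp(-eps)

# Property 2: G is homogeneous (here of degree mu = 1).
print(np.isclose(nested_logit_G(3.0 * y, nests, theta),
                 3.0 * nested_logit_G(y, nests, theta)))  # True

# With theta_s = 1 in both nests, exp(-G(e^{-eps})) factors into the product
# of independent univariate extreme-value CDFs exp(-exp(-eps_i)).
iid = np.exp(-nested_logit_G(y, nests, np.array([1.0, 1.0])))
print(np.isclose(iid, np.prod(np.exp(-np.exp(-eps)))))  # True
```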
He then derived a general, closed form for the probability expressions as below:

\[
P(x_1^*, \ldots, x_M^*, 0, \ldots, 0) = |J| \left(\prod_{i=1}^{M} e^{V_i}\right) \times \left[ (M-1)!\, \frac{\sum (H_1 H_2 \cdots H_M)}{H^M} + (M-2)!\, \frac{\sum (H_{12}^2 H_3 \cdots H_M) + \sum (H_1 H_{23}^2 \cdots H_M) + \cdots + \sum (H_1 H_2 \cdots H_{(M-1)M}^2)}{H^{M-1}} + \cdots + 1!\, \frac{\sum (H_{123\cdots(M-1)}^{M-1} H_M) + \cdots + \sum (H_1 H_{234\cdots M}^{M-1})}{H^2} + 0!\, \frac{\sum (H_{123\cdots M}^{M})}{H} \right] \tag{19.26}
\]

where H = G(e^{V₁}, ..., e^{V_K}), H_i = ∂H/∂e^{V_i}, H^n_{123...n} = ∂^n H/(∂e^{V₁} ⋯ ∂e^{V_n}), and all other terms are defined similarly.5 Recognizing that working with the above general form of probability expressions becomes difficult in situations with complex covariance structures and a large set of choice alternatives (because of the sheer number of terms in the expression), Pinjari (2011) derived compact probability expressions for a variety of cross-nested error structures. The reader is referred to that paper for further details.

4.3 The Mixed MDCEV Model
The MDCGEV structure is able to accommodate flexible correlation patterns. However, it is unable to accommodate random taste variation, and it imposes the restriction of equal scale of the error terms. Incorporating a more general error structure is straightforward through the use of a mixing distribution, which leads to the Mixed MDCEV (or MMDCEV) model. Specifically, the error term ε_k may be partitioned into two components, ζ_k and η_k. The first component, ζ_k, can be assumed to be independently and identically Gumbel distributed across alternatives with a scale parameter of σ. The second component, η_k, can be allowed to be correlated across alternatives and to have a heteroscedastic scale. Let η = (η₁, η₂, ..., η_K)′, and assume that η is distributed multivariate normal, η ~ N(0, Ω). For given values of the vector η, one can follow the discussion of the earlier section and obtain the usual MDCEV probability that the first M of the K goods are consumed. The unconditional probability can then be computed as:

\[
P(x_1^*, x_2^*, \ldots, x_M^*, 0, \ldots, 0) = \int_{\eta} |J| \frac{1}{\sigma^{M-1}} \left[ \frac{\prod_{i=1}^{M} e^{(V_i + \eta_i)/\sigma}}{\left(\sum_{k=1}^{K} e^{(V_k + \eta_k)/\sigma}\right)^{M}} \right] (M-1)! \; dF(\eta), \tag{19.27}
\]

where F is the multivariate cumulative normal distribution.
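In practice, the integral in equation (19.27) is approximated by simulation. A Monte Carlo sketch (our own simplified version; the Jacobian |J| is passed in as a precomputed constant because it does not depend on η):

```python
import numpy as np
from math import factorial

def mixed_mdcev_prob(V, omega, M, jac, sigma=1.0, draws=5000, seed=1):
    """Monte Carlo sketch (our own) of equation (19.27): average the MDCEV
    kernel over draws of eta ~ N(0, omega), with goods ordered so the M
    chosen goods come first."""
    rng = np.random.default_rng(seed)
    eta = rng.multivariate_normal(np.zeros(len(V)), omega, size=draws)
    eU = np.exp((V + eta) / sigma)
    core = np.prod(eU[:, :M], axis=1) / np.sum(eU, axis=1) ** M
    return jac * factorial(M - 1) / sigma ** (M - 1) * np.mean(core)

V = np.array([0.4, 0.1, -0.3])
# As Omega -> 0 the mixture collapses back to the fixed-parameter kernel;
# with M = 1 and |J| = 1 that is just the MNL probability of good 1:
p0 = mixed_mdcev_prob(V, 1e-16 * np.eye(3), M=1, jac=1.0)
print(np.isclose(p0, np.exp(V[0]) / np.exp(V).sum()))  # True
```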
Other distributions may also be used for η. Note that the distribution of η can arise from an error components structure, a random coefficients structure, or a combination of the two, similar to the case of the usual mixed logit model. Thus, the model in equation (19.27) can be extended in a conceptually straightforward manner to also include random coefficients on the independent variables z_k, and random effects (or even random coefficients) in the α_k satiation parameters (if the α-profile is used) or the γ_k parameters (if the γ-profile is used).

4.4 The Multiple Discrete-Continuous Probit (MDCP) Model

The choice of an extreme value (either EV or GEV) stochastic specification is driven by convenience (analytical tractability) rather than theory. A multivariate normally (MVN) distributed stochasticity assumption leads to complex likelihood functions, one reason why the KKT approach did not gain traction for empirical analysis until recently. Attempts have been made to address this issue by using simulation methods such as the GHK simulator (see Kim et al., 2002) and Bayesian estimation methods. However, the GHK and other such simulators become computationally impractical as the dimensionality of integration increases with the number of alternatives. Bayesian estimation methods can also be computationally intensive and saddled with convergence-determination issues. Thus, no study has been able to estimate KKT demand systems with MVN distributions beyond a small number of alternatives. Notwithstanding the estimation difficulties, there are notable advantages of using an MVN error distribution. First, the MVN error kernel makes it easy to incorporate general covariance structures as well as random coefficients, as long as the number of choice alternatives is not too large. Second, an appealing feature of MVN errors is the possibility of negative correlations among the utilities of different alternatives (as opposed to MEV errors, which allow only positive dependency).
This can potentially be exploited to capture situations where the choice of one alternative may reduce (if not preclude) the likelihood of choosing another, where the pattern of substitution is fundamentally different from the substitution due to satiation effects. Given these advantages, we show below that the probability expression of the MDCP model involves the evaluation of a multivariate normal cumulative distribution function (MVNCDF). Equation (19.17) provides the general expression for consumption probabilities for an MDC model based on KKT conditions of random utility maximization. One can rewrite the probability expression using a differenced-errors form as below:

\[
P(x_1^*, x_2^*, \ldots, x_M^*, 0, \ldots, 0) = |J| \times \Pr\left[ \begin{array}{l} (\tilde{\varepsilon}_{2,1} = V_1 - V_2), (\tilde{\varepsilon}_{3,1} = V_1 - V_3), \ldots, (\tilde{\varepsilon}_{M,1} = V_1 - V_M), \\ (\tilde{\varepsilon}_{M+1,1} < V_1 - V_{M+1}), (\tilde{\varepsilon}_{M+2,1} < V_1 - V_{M+2}), \ldots, (\tilde{\varepsilon}_{K,1} < V_1 - V_K) \end{array} \right] \tag{19.28}
\]

In the above expression, (ε̃_{2,1}, ε̃_{3,1}, ..., ε̃_{M,1}, ε̃_{M+1,1}, ..., ε̃_{K,1}) is a (K−1)-dimensional vector of error differences following a multivariate normal distribution with a zero mean vector m (all elements in m are zeros) and a variance-covariance matrix S. For later use, partition this (K−1)-dimensional vector into two smaller vectors A and B, where A = (ε̃_{2,1}, ε̃_{3,1}, ..., ε̃_{M,1}) and B = (ε̃_{M+1,1}, ε̃_{M+2,1}, ..., ε̃_{K,1}). Thus, the m and S matrices can also be partitioned as:

\[
m = \begin{bmatrix} m_1 \\ m_2 \end{bmatrix} \quad \text{and} \quad S = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}.
\]

In the partition of S, S₁₁ and S₂₂ are the variance-covariance matrices of A and B, respectively, while S₁₂ and S₂₁ contain the covariance terms between the elements in A and those in B. Now, express the MDCP probability expression in equation (19.28) as:

\[
P(x_1^*, x_2^*, \ldots, x_M^*, 0, \ldots, 0) = |J| \times \Pr[A = a, B < b], \tag{19.29}
\]

where a = {V₁ − V₂, V₁ − V₃, ..., V₁ − V_M} and b = {V₁ − V_{M+1}, V₁ − V_{M+2}, ..., V₁ − V_K}. One can express the above expression as a product of marginal and conditional probabilities:

\[
P(x_1^*, x_2^*, \ldots, x_M^*, 0, \ldots, 0) = |J| \cdot P(A = a) \cdot P(B < b \mid A = a). \tag{19.30}
\]
To simplify the conditional probability expression above, we utilize the property of the MVN distribution that the distribution of B conditional on A = a is another MVN distribution, as given below (Tong, 1990, p. 35):

\[
(B \mid A = a) \sim N(\tilde{m}, \tilde{S}), \quad \text{where } \tilde{m} = m_2 + S_{21} S_{11}^{-1} (a - m_1) \text{ and } \tilde{S} = S_{22} - S_{21} S_{11}^{-1} S_{12}. \tag{19.31}
\]

In the above expression, since m₁ and m₂ are zero-vectors, one can write m̃ = S₂₁S₁₁⁻¹a. Using the above result, the conditional probability expression in equation (19.30) can be expressed as Pr(B < b | A = a) = Pr(C < b), where C is an MVN random vector distributed as described above. Then, the MDCP consumption probability can be expressed as:

\[
P(x_1^*, x_2^*, \ldots, x_M^*, 0, \ldots, 0) = |J| \cdot P(A = a) \cdot P(C < c). \tag{19.32}
\]
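The conditioning step of equation (19.31) is simple linear algebra. A sketch (our own, with an illustrative equicorrelated covariance matrix):

```python
import numpy as np

def conditional_mvn(sigma, mA, a):
    """Sketch (our own) of equation (19.31): mean and covariance of B | A = a
    for a zero-mean MVN vector whose first mA elements form A."""
    s11, s12 = sigma[:mA, :mA], sigma[:mA, mA:]
    s21, s22 = sigma[mA:, :mA], sigma[mA:, mA:]
    m_tilde = s21 @ np.linalg.solve(s11, a)          # S21 S11^{-1} a
    s_tilde = s22 - s21 @ np.linalg.solve(s11, s12)  # S22 - S21 S11^{-1} S12
    return m_tilde, s_tilde

# Equicorrelated 4-dimensional example (illustrative numbers only):
sigma = np.eye(4) + 0.5 * np.ones((4, 4))
m, S = conditional_mvn(sigma, 2, np.array([0.5, -1.0]))
print(np.allclose(m, [-0.125, -0.125]))              # True
print(np.allclose(S, [[1.25, 0.25], [0.25, 1.25]]))  # True
```

Conditioning shrinks the covariance of B by the Schur complement term, which is what makes the remaining MVNCDF lower-dimensional and better behaved.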
In the above joint probability expression, the marginal probability P(A = a) is a multivariate normal probability density function (pdf) with a simple closed form expression, whereas the MVNCDF Pr(C < c) does not have a closed form. Next, write the MVNCDF Pr(C < b) in standardized form as below:

\[
\Pr(C < b) = P\!\left(\frac{C - \tilde{m}}{\tilde{s}} < \frac{b - \tilde{m}}{\tilde{s}}\right) = P(W_1 < w_1, W_2 < w_2, \ldots, W_{K-M} < w_{K-M}), \tag{19.33}
\]

where (W₁, W₂, ..., W_{K−M}) is a vector of standardized, normally distributed random variables in (C − m̃)/s̃ and (w₁, w₂, ..., w_{K−M}) is a vector of scalars in (b − m̃)/s̃. Similarly, m̃ = (m̃₁, ..., m̃ᵢ, ..., m̃_{K−M}) is a vector of means and s̃ = (s̃₁, ..., s̃ᵢ, ..., s̃_{K−M}) is a vector of standard deviations6 of the normally distributed random variables in C. The problem now boils down to approximating the MVNCDF in equation (19.33). In the recent past, there has been some evidence that using analytical approximations (as opposed to simulation) for evaluating the MVN cumulative distribution function can help in easier estimation of single discrete choice models (for example, the multinomial probit model; see Bhat and Sidharthan, 2011). Bhat et al. (2013) show that such analytical approximation methods can help in the estimation of MDCP models as well (that is, MDC models with MVN errors). The performance of different analytical approximation methods to evaluate the MVNCDF to estimate the parameters of the MDCP models is an open avenue for further research.
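For intuition, the MVNCDF of equation (19.33) can always be approximated by brute-force simulation; the analytical approximations discussed above replace exactly this step with something far cheaper and smoother. A sketch (our own):

```python
import numpy as np

def mvn_cdf_mc(mean, cov, c, draws=200_000, seed=3):
    """Brute-force Monte Carlo estimate (our own) of Pr(C < c) for
    C ~ N(mean, cov)."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(mean, cov, size=draws)
    return np.mean(np.all(z < c, axis=1))

# Two independent standard normals: Pr(C1 < 0, C2 < 0) = 0.25 exactly.
est = mvn_cdf_mc(np.zeros(2), np.eye(2), np.zeros(2))
print(abs(est - 0.25) < 0.01)  # True (Monte Carlo error is ~0.001 here)
```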
5
THE JOINT MDCEV-SINGLE DISCRETE CHOICE MODEL
The MDCEV model and its extensions discussed thus far are suited for the case when the alternatives are imperfect substitutes, as recognized by the use of a non-linear utility function that accommodates a diminishing marginal utility as the consumption of any alternative increases. However, there are many instances where the choice situation is characterized by a combination of imperfect and perfect substitutes in the choice alternative set. The MDCEV model needs to be modified to handle such a combination of a multiple discrete-continuous choice among the imperfect substitutes and a single choice of one alternative from each subset of perfect substitutes. We do not discuss this case here due to space constraints; the reader is referred to Bhat et al. (2006, 2009) for such formulations. Both these studies by Bhat and co-authors assume the absence of price variation across the perfect substitutes. Formulation of KKT model systems to consider price variation across imperfect substitutes as well as perfect substitutes is a potentially fruitful avenue for further research.
6
PREDICTION AND WELFARE ANALYSIS
Thanks to the above advances, several empirical applications have appeared in the recent literature using the KKT approach to model MDC choices. These applications cover a wide range of empirical contexts, including individuals' time-use analysis, household expenditure patterns, household vehicle ownership and usage, household energy consumption, recreational demand choices and valuation of a variety of environmental goods (for example, fish stock, air quality and water quality). One reason why the KKT approach did not gain much attention until recently was the difficulty of estimating the model parameters. But we are now able to easily estimate KKT demand systems with a large number of choice alternatives (see Van Nostrand et al., 2013, for a model with 211 choice alternatives). Another reason why the KKT approach has not gained popularity is the lack of simple methods to apply the models for forecasting and policy analysis purposes. This section reviews the recent advances aimed at filling that gap. Once the model parameters are estimated, prediction exercises or welfare analyses with KKT-based MDC models involve solving the constrained, non-linear random utility maximization problem in equation (19.1) (or its dual form) for each consumer. In the presence of corner solutions (that is, multiple discreteness), there is no straightforward analytic solution to this problem. The typical approach is to adopt a constrained non-linear optimization procedure at each of several simulated values drawn from the distribution of the stochastic error terms (that is, the ε_k terms). The constrained optimization procedure itself has been based on either enumerative or iterative techniques.
The enumerative technique (used by Phaneuf et al., 2000) involves an enumeration of all possible sets of alternatives that the consumer can potentially choose. This brute-force method becomes computationally impractical as the number of choice alternatives increases. Von Haefen et al. (2004) proposed a numerical bisection algorithm based on the insight that, with additively separable utility functions, the optimal consumptions of all goods can be derived if the optimal consumption of the outside good is known. Specifically, conditional on unobserved heterogeneity, they iteratively solve for the optimal consumption of the outside good (and that of other goods) using a bisection procedure. They begin their iterations by setting the lower bound for the consumption of the outside good to zero and the upper bound to be equal to the budget. The average of the lower and upper bounds is used to obtain the initial estimate of the outside good consumption. Based on this, the amounts of consumption of all other inside goods are computed using the KKT conditions. Next, a new estimate of consumption of the outside good is obtained by subtracting the expenditure on the inside goods from the total budget available. If this new estimate of the outside good is larger (smaller) than the earlier estimate, the earlier estimate becomes the new lower (upper) bound of consumption for the outside good, and the iterations continue until the difference between the lower and upper bounds is within an arbitrarily designated threshold. To circumvent the need to perform predictions over the entire distribution of unobserved heterogeneity (which can be time-consuming), von Haefen et al. condition on the observed choices. Pinjari and Bhat (2011) undertook analytic explorations with the KKT conditions of optimality that shed new light on the properties of Bhat's MDCEV model with additive utility functions.
Specifically, they derive a property that the price-normalized baseline marginal utility (that is, ψ_k/p_k) of a chosen alternative must be greater than the price-normalized baseline marginal utility of an alternative that is not chosen. Further, they discuss a fundamental property of several KKT demand model systems in the literature with an additively separable utility form and a single linear binding constraint. Specifically, the choice alternatives can always be arranged in the descending order of a specific measure that depends on the functional form of the utility function. Consequently, when all the choice alternatives are arranged in the descending order of their baseline marginal utility, and the number of chosen alternatives (M) is known, it is a trivial task to identify the chosen alternatives as the first M alternatives in the arrangement. Based on this insight, Pinjari and Bhat (2011) propose computationally efficient prediction algorithms for different forms of the utility function in equation (19.3). One such forecasting algorithm, for the utility form with equal α_k parameters across all choice alternatives (that is, α_k = α ∀k = 1, 2, ..., K) for choice situations with an outside good, is outlined in four broad steps below. For prediction algorithms for other additively separable utility forms, the reader is referred to Pinjari and Bhat (2011).

Step 0: Assume that only the outside good is chosen and let the number of chosen goods M = 1.

Step 1: Given the input data (z_k, p_k), model parameters (β, γ_k, α), and the simulated error term (ε_k) draws, compute the price-normalized baseline utility values (ψ_k/p_k) for all alternatives. Arrange all the K alternatives available to the consumer in the descending order of the (ψ_k/p_k) values (with the outside good in the first place).
Step 2: Compute the value of λ using the following equation, and go to step 3:

\[
\lambda = \left( \frac{E + \sum_{k=2}^{M} p_k \gamma_k}{p_1 (\psi_1/p_1)^{\frac{1}{1-\alpha}} + \sum_{k=2}^{M} p_k \gamma_k (\psi_k/p_k)^{\frac{1}{1-\alpha}}} \right)^{\alpha - 1} \tag{19.34}
\]

Step 3: If λ > (ψ_{M+1}/p_{M+1}) (this condition represents the KKT condition for the (M+1)th alternative), compute the optimal consumptions of the first M alternatives in the above descending order using the following expressions, set the consumptions of the other alternatives to zero, and stop:

\[
x_1^* = \frac{(\psi_1/p_1)^{\frac{1}{1-\alpha}} \left( E + \sum_{k=2}^{M} p_k \gamma_k \right)}{p_1 (\psi_1/p_1)^{\frac{1}{1-\alpha}} + \sum_{k=2}^{M} p_k \gamma_k (\psi_k/p_k)^{\frac{1}{1-\alpha}}} \tag{19.35}
\]

\[
x_k^* = \gamma_k \left( \frac{(\psi_k/p_k)^{\frac{1}{1-\alpha}} \left( E + \sum_{k=2}^{M} p_k \gamma_k \right)}{p_1 (\psi_1/p_1)^{\frac{1}{1-\alpha}} + \sum_{k=2}^{M} p_k \gamma_k (\psi_k/p_k)^{\frac{1}{1-\alpha}}} - 1 \right); \quad \forall k = 2, 3, \ldots, M \tag{19.36}
\]

Otherwise (if λ ≤ (ψ_{M+1}/p_{M+1})), set M = M + 1 and go to step 4.

Step 4: If M = K, compute the optimal consumptions using equations (19.35) and (19.36) and stop. Otherwise (if M < K), go to step 2.

The algorithm outlined above can be applied a large number of times with different simulated values of the ε_k terms to sufficiently cover the simulated distribution of unobserved heterogeneity (that is, the ε_k terms) and obtain the distributions of the consumption forecasts. The above discussion is primarily oriented toward using KKT-based MDC models for prediction, but does not extend the discussion to include welfare analysis. For a discussion of how such prediction algorithms can be used for welfare analysis, see von Haefen and Phaneuf (2005).
7
FUTURE DIRECTIONS
In the recent past, there has been increasing recognition of the need to extend the basic formulation of the consumer's utility maximization problem in equation (19.1) in the following directions:

1. Flexible functional forms for the utility specification.
2. Flexible stochastic specifications for the utility functions.
3. Flexibility in the specification of constraints faced by the consumer.
Each of these directions is discussed next.
7.1 Flexible, Non-additive Utility Forms

Most KKT models in the literature assume that the direct utility contribution due to the consumption of different alternatives is additively separable. Mathematically, this assumption implies that U(x₁, ..., x_K) = U₁(x₁) + ... + U_K(x_K), and it greatly simplifies the task of model estimation and welfare analysis. However, this assumption imposes strong restrictions on preference structures and consumption patterns. First, the marginal utility of one alternative is independent of the consumption of another alternative. This assumption, with an increasing and quasi-concave utility function, implies that goods can be neither inferior nor complementary; they can only be substitutes. Thus, for example, one cannot model a situation where the consumption of one good (for example, a new car) may increase the consumption of other goods (for example, gasoline). Third, even flexible substitution patterns in the consumption of different goods can be achieved only by correlating the stochastic utility components of different goods, but not through an explicit functional form. To overcome the restrictions identified above, it is critical to develop tractable estimation methods with flexible, non-additively separable utility functions. There have been a handful of recent efforts in this direction. For example, building on Bhat's additively separable linear Box-Cox utility form, Vasquez-Lavin and Hanemann (2009) presented a general utility form with interaction terms between sub-utilities, as below:

\[
U(x) = \sum_{k=1}^{K} \psi_k \frac{\gamma_k}{\alpha_k} \left\{ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k} - 1 \right\} + \frac{1}{2} \sum_{k=1}^{K} \sum_{m=1}^{K} \theta_{km} \left\{ \left[ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k} - 1 \right] \left[ \left( \frac{x_m}{\gamma_m} + 1 \right)^{\alpha_m} - 1 \right] \right\} \tag{19.37}
\]

In the above expression, the second term induces interactions between pairs of goods (m, k) and includes quadratic terms (when m = k). These interaction terms allow the marginal utility of a good (k) to depend on the consumption of other goods (m). Specifically, a positive (negative) value for θ_{mk} implies that m and k are complements (substitutes). However, the quadratic nature of the utility form does not maintain global consistency (over all consumption bundles) of the strictly increasing and quasi-concave property. Specifically, for certain parameter values and consumption patterns, the utility accrued can decrease with increasing consumption, or the marginal utility can increase with increasing consumption, which is theoretically inconsistent. Bhat and Pinjari (2010) show how a simple normalization, setting θ_{mk} = 0 when m = k in equation (19.37), can resolve these issues of theoretical (in)consistency and parameter (un)identification. Other efforts on accommodating complementarity in consumption include Lee et al. (2010), who propose simpler interaction terms using log(quantities), and Gentzkow (2007), who accommodates interactions in indirect utility functions. Despite the above efforts, there are still unresolved conceptual and methodological issues pertaining to: (1) the form of non-additive utility functions; (2) the specification of stochasticity in non-additive utility functions; (3) estimation of parameters with an increasing number of choice alternatives; and (4) interpretation of the resulting dependency patterns in consumption. Resolving these issues will be a big step forward in enhancing the behavioral realism of KKT-based RUM MDC models. Further, within the context
of non-additively separable preferences, it is important to recognize asymmetric dependencies in consumption. For example, the purchase of a new car may lead to increased gasoline consumption, but not the other way round.

7.2 Flexible Stochastic Specifications

The above discussion was in the context of the form of the utility function. But there is potential for improving the stochastic specification as well. For example, most studies assume independent and identically distributed (iid) extreme value random error terms in the utility function. Recent advances on relaxing the iid assumption, specifically via employing MEV distributions, have been discussed in section 4.2. Although we are now able to estimate KKT-based RUM MDC models with general MEV stochastic distributions, no clear understanding exists of how different stochastic specifications and utility functional forms influence the properties of KKT models. Examining the substitution patterns implied by the different stochastic assumptions in KKT-based MDC models is a useful avenue for research. Further, the estimation of the MDCP model with MVN distributed stochasticity (as discussed in section 4.4) is an important avenue for investigation.

7.3 Multiple Constraints

Most MDC model applications to date consider only a single linear binding constraint as governing the consumption decisions (for example, the linear constraint in equation 19.1). This stems from an implicit assumption that only a single resource is needed to consume goods. However, in numerous empirical contexts, multiple types of resources, such as time, money and space, need to be expended to acquire and consume goods. While the role of multiple constraints has been long recognized in microeconomic theory (see Becker, 1965), the typical approach to accommodating the different constraints has been to convert them all into a single effective constraint.
For example, the time constraint has been collapsed into the money constraint using a monetary value of time. In many situations, however, it is important to consider the different constraints in their own right, because resources may not always be freely exchangeable with each other. To address this issue, a handful of recent studies (Satomura et al., 2011; Castro et al., 2012; Pinjari and Sivaraman, 2012) have provided model formulations to accommodate multiple linear constraints with additive utility functional forms. Satomura et al. (2011) provided a formulation to account for the role of money and space constraints in consumers' decisions on soft drink purchases. Castro et al. (2012) provide a general treatment of the issue by providing formulations for different scenarios, such as complete demand systems (that is, a case without the need of a Hicksian composite good) and incomplete demand systems (a case with the Hicksian composite good). Pinjari and Sivaraman (2012) provide a time- and money-constrained formulation in the context of households' annual vacation travel destination and mode choices.

7.4 Beyond Simple, Linear Constraints
The above discussion suggests that we have just begun to move toward models with multiple constraints. It is worth noting, however, that most of the literature on MDC
modeling is geared toward simple, linear constraints that do not represent the complexity of situations consumers face in reality. There are several reasons why linear constraints may not hold. First, linear constraints represent a constant price per unit of consumption (or a constant rate of resource use). In many situations, however, prices vary with the amount of consumption, leading to non-linear budget constraints. A classic example of such non-linear budgets is block pricing, typically used in energy markets (for example, electricity pricing). While the issue has long been recognized in the classical econometric literature on estimating demand functions, it is yet to be given due consideration in MDC choice studies. Second, linear constraints do not accommodate fixed costs (or set-up costs), which cannot be converted into a constant price per unit of consumption. For example, travel cost to a vacation destination is a fixed cost, unlike the lodging costs at the destination, which can be treated as variable with a constant price per night. Solving the consumer's direct utility maximization problem with non-linear constraints can become rather tedious, because the KKT conditions alone may not be sufficient anymore. In a recent study, Parizat and Shachar (2010) employ an enumeration approach to solve a direct utility maximization problem in the context of individuals' weekly leisure time allocation with fixed costs (for example, ticket costs of going to a movie, the price of a meal). They acknowledge rather large computation times to estimate the parameters for their 12-alternative case. Thus, an alternative approach to incorporating non-linear constraints may be to work with the dual problem using indirect utility functions. Lee and Pitt (1987) provide a methodological treatment of incorporating block pricing with the dual approach. Further studies exploring this approach may enhance our ability to incorporate block prices.
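For concreteness, a block-price schedule of the kind used in electricity pricing can be written as a simple non-linear cost function (our own illustrative sketch; the tariff numbers are made up):

```python
def block_cost(q, blocks):
    """Total cost of consuming q units under an increasing block tariff
    (our own sketch). blocks: list of (upper_limit, unit_price) pairs,
    with the last upper limit set to float('inf')."""
    cost, lower = 0.0, 0.0
    for upper, price in blocks:
        cost += max(0.0, min(q, upper) - lower) * price
        if q <= upper:
            break
        lower = upper
    return cost

# First 100 units at 1.0/unit, the next 200 at 1.5, anything beyond at 2.5:
tariff = [(100.0, 1.0), (300.0, 1.5), (float("inf"), 2.5)]
print(block_cost(50.0, tariff))   # -> 50.0
print(block_cost(250.0, tariff))  # -> 325.0  (100*1.0 + 150*1.5)
print(block_cost(400.0, tariff))  # -> 650.0  (100*1.0 + 200*1.5 + 100*2.5)
```

Because the marginal price jumps at the block boundaries, the budget set is kinked, which is exactly why the standard KKT conditions with a single constant price no longer suffice.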
Another approach is to econometrically 'treat' the inherent endogeneity between prices and consumption due to the dependency of prices on consumption, for example, by estimating price functions simultaneously with the consumer preferences (that is, utility functions). This approach can potentially help in dealing with demand-supply interactions in the market as well (see Berry et al., 1995).

7.5 Prediction and Welfare Analysis with Flexible Model Structures
Thanks to recent advances, we now have simple and computationally efficient methods to apply KKT models with additive utility forms for forecasting and welfare analysis purposes. As the field moves forward with the specification and estimation of more flexible MDC models, it is important to develop methods to apply these models as well. The prediction procedures proposed by von Haefen et al. (2004) and Pinjari and Bhat (2011), based on Karush-Kuhn-Tucker conditions of optimality, can potentially be extended to the case with multiple linear binding constraints, although with additional layers of computational effort (as many as the number of constraints). However, these procedures fall apart in situations with non-additive utility functions, as they hinge critically on the additive utility assumption. Similarly, the presence of non-linear constraints can make it difficult to apply KKT conditions alone for solving the utility maximization problem. Resolving each of these issues is a welcome research direction. Another useful direction of research is in the context of additive utility functions with a simple linear constraint. While we are now able to exploit the KKT conditions for
Handbook of choice modelling
obtaining the conditional predictions (given specific values of the random terms), we have not been able to characterize the unconditional distributions of the demand functions. In the presence of corner solutions, it is difficult to arrive at closed form expressions for the demand functions from equation (19.1). Perhaps this is why we are not aware of successful attempts to arrive at analytical expressions for price elasticities and sensitivities to explanatory variables. Besides, application of these models requires the simulation of the stochastic terms. In some cases, such as the case with MEV stochastic distributions, the stochastic terms themselves are difficult to simulate. Thus development of fast methods to simulate MEV distributions can aid in the application of KKT models with such stochastic specifications.
8 SUMMARY
There has been an increasing recognition of the MDC nature of consumer choices. Over the past decade the field has witnessed exciting developments in modeling MDC choices, especially with the advancement of the KKT approach to modeling consumer behavior based on RUM. Notable developments include:

1. Clever specifications with distributional assumptions that lead to closed-form probability expressions enabling easy estimation of the structural parameters (for example, the MDCEV model).
2. Application of the KKT approach to model MDC choices in a variety of empirical contexts.
3. Formulation of computationally efficient prediction/welfare analysis methods with KKT models.
4. Extension of the basic RUM specification in equation (19.1) to accommodate richer patterns of heterogeneity in consumer preferences and to allow flexibility in distributional assumptions. Most of these extensions have been 'econometrically' oriented, akin to the extensions of the multinomial logit model in the traditional discrete choice analysis literature.
In the recent past, there has been an increasing recognition of the need to extend the basic formulation of the consumer's utility maximization in equation (19.1) in the following directions:

1. Flexible functional forms for the utility specification, such as non-additive utility forms.
2. Flexible stochastic specifications for the utility functions, such as MVN distributions.
3. Flexibility in the specification of constraints faced by the consumer, including multiple inter-related constraints and non-linear constraints.
Given the pace of recent developments, we optimistically look forward to seeing model formulations, estimation methods and prediction/welfare analysis procedures for a general framework with non-additive utility forms, flexible stochastic distributional assumptions and general forms of constraints.
ACKNOWLEDGEMENTS

This chapter draws heavily in some places from papers by Bhat and colleagues, and a recent workshop report by Abdul Pinjari, Chandra Bhat and David Bunch.
NOTES

1. A complete demand system involves the modeling of the demands of all consumption goods that exhaust the consumption space of consumers. However, complete demand systems require data on prices and consumptions of all commodity/service items, and can be impractical when studying consumptions in finely defined commodity/service categories. Thus, it is common to use an incomplete demand system, typically in the form of a two-stage budgeting approach or in the form of the use of a Hicksian composite commodity assumption. In the former two-stage budgeting approach, separability of preferences is invoked, and the allocation is pursued in two independent stages. The first stage entails allocation between a limited number of broad groups of consumption items, followed by the incomplete demand system allocation of the group expenditure to elementary commodities/services within the broad consumption group of primary interest to the analyst (the elementary commodities/services in the broad group of primary interest are commonly referred to as 'inside' goods). The plausibility of such a two-stage budgeting approach requires strong homothetic preferences within each broad group and strong separability of preferences, or the less restrictive conditions of weak separability of preferences and the price index for each broad group not being too sensitive to changes in the utility function (see Menezes et al., 2005). In the Hicksian composite commodity approach, the analyst assumes that the prices of elementary goods within each broad group of consumption items vary proportionally. Then, one can replace all the elementary alternatives within each broad group (that is not of primary interest) by a single composite alternative representing the broad group.
The analysis proceeds then by considering the composite goods as ‘outside’ goods and considering consumption in these outside goods as well as the ‘inside’ goods representing the consumption group of main interest to the analyst. It is common in practice in this Hicksian approach to include a single outside good with the inside goods. If this composite outside good is not essential, then the consumption formulation is similar to that of a complete demand system. If this composite outside good is essential, then the formulation needs minor revision to accommodate the essential nature of the outside good. Please refer to von Haefen (2010) for a discussion of the Hicksian approach and other incomplete demand system approaches such as the one proposed by Epstein (1982) that we do not consider here. In this chapter, we consider incomplete demand systems in the form of the second stage of a two-stage incomplete demand system with a finite, positive total budget as obtained from the first stage (for presentation ease, we will refer to this case as the ‘inside goods only’ case in which at least one ‘inside’ good has to be consumed and there are no essential outside goods) or in the form of a Hicksian composite approach with a single outside good that is essential and no requirement that at least one of the inside goods has to be consumed (for presentation ease, we will refer to this case simply as the ‘essential outside good’ case or even more simply, as the outside good case; if the outside good is non-essential, the formulation becomes identical to the case of the ‘inside goods only’ case, while if there are multiple outside goods, the situation is a very simple extension of the formulations presented here depending on whether the outside goods are all essential, all non-essential, or some combination of essential and non-essential). Finally, a complete demand system takes the same formulation as the ‘inside goods only’ formulation. 2. 
Hanemann (1984) used this approach to derive a variety of SDC model forms consistent with equation (19.2). Chiang (1991) and Chintagunta (1993) extend Hanemann's SDC formulation to include the possibility of no inside goods being selected by introducing a 'reservation price'. In their approach, an inside good is selected only if the quality-adjusted price of at least one of the inside goods is below the reservation price. See Dubin and McFadden (1984) for another, slightly different, way of employing the (conditional) indirect utility approach for SDC choice analysis.
3. Note that the subsequent discourse is for the case with a Hicksian composite outside good that is essential. However, the derivations carry over to the case without an outside good in a straightforward manner.
4. This is not an issue in contexts with a numeraire Hicksian composite outside good because p_1 = 1.
5. G and H are similar functions, but with different arguments: G represents G(e^(−ε_1), . . ., e^(−ε_n)), whereas H represents G(e^(V_1), . . ., e^(V_i), . . ., e^(V_n)). Note from the ± signs that the sign in front of each mixed partial derivative term depends on the number of partial derivatives in the term and the number of chosen alternatives M. Also note that the model structures for MDCNEV and MDCGEV are derived for the case without price variation across choice alternatives. One can extend these structures to situations with price variation in a straightforward fashion.
6. σ_i is the square root of the ii-th element of the covariance matrix Σ.
REFERENCES

Becker, G.S. (1965), 'A theory of the allocation of time', Economic Journal, 75 (299), 493–517.
Berry, S., J. Levinsohn and A. Pakes (1995), 'Automobile prices in market equilibrium', Econometrica, 63 (4), 841–90.
Bhat, C.R. (2005), 'A multiple discrete-continuous extreme value model: formulation and application to discretionary time-use decisions', Transportation Research Part B, 39 (8), 679–707.
Bhat, C.R. (2008), 'The multiple discrete-continuous extreme value (MDCEV) model: role of utility function parameters, identification considerations, and model extensions', Transportation Research Part B, 42 (3), 274–303.
Bhat, C.R. and A.R. Pinjari (2010), 'The generalized multiple discrete-continuous extreme value (GMDCEV) model: allowing for non-additively separable and flexible utility forms', working paper, Department of Civil, Architectural and Environmental Engineering, University of Texas at Austin.
Bhat, C.R. and R. Sidharthan (2011), 'A simulation evaluation of the maximum approximate composite marginal likelihood (MACML) estimator for mixed multinomial probit models', Transportation Research Part B, 45 (7), 940–53.
Bhat, C.R., M. Castro and M. Khan (2013), 'A new estimation approach for the multiple discrete-continuous probit (MDCP) choice model', Transportation Research Part B, 55, 1–22.
Bhat, C.R., S. Sen and N. Eluru (2009), 'The impact of demographics, built environment attributes, vehicle characteristics, and gasoline prices on household vehicle holdings and use', Transportation Research Part B, 43 (1), 1–18.
Bhat, C.R., S. Srinivasan and S. Sen (2006), 'A joint model for the perfect and imperfect substitute goods case: application to activity time-use decisions', Transportation Research Part B, 40 (10), 827–50.
Bunch, D.S. (2009), 'Theory-based functional forms for analysis of disaggregated scanner panel data', working paper, Graduate School of Management, University of California-Davis.
Castro, M., C.R. Bhat, R.M. Pendyala and S.R. Jara-Diaz (2012), 'Accommodating multiple constraints in the multiple discrete-continuous extreme value (MDCEV) choice model', Transportation Research Part B, 46 (6), 729–43.
Chiang, J. (1991), 'The simultaneous approach to the whether, what, and how much to buy questions', Marketing Science, 10 (4), 297–315.
Chintagunta, P. (1993), 'Investigating purchase incidence, brand choice and purchase quantity decisions of households', Marketing Science, 12 (2), 184–208.
Deaton, A. and J. Muellbauer (1980), Economics and Consumer Behavior, Cambridge: Cambridge University Press.
Dubin, J. and D. McFadden (1984), 'An econometric analysis of electricity appliance holdings and consumption', Econometrica, 52 (2), 345–62.
Epstein, L.G. (1982), 'Integrability of incomplete systems of demand functions', Review of Economic Studies, 49 (3), 411–25.
Gentzkow, M. (2007), 'Valuing new goods in a model with complementarity: online newspapers', American Economic Review, 97 (3), 713–44.
Hanemann, W.M. (1978), 'A methodological and empirical study of the recreation benefits from water quality improvement', PhD dissertation, Department of Economics, Harvard University.
Hanemann, W.M. (1984), 'Discrete/continuous models of consumer demand', Econometrica, 52 (3), 541–61.
Kim, J., G.M. Allenby and P.E. Rossi (2002), 'Modeling consumer demand for variety', Marketing Science, 21 (3), 229–50.
Lee, L.F. and M.M. Pitt (1986), 'Microeconometric demand systems with binding nonnegativity constraints: the dual approach', Econometrica, 54 (5), 1237–42.
Lee, L.F. and M.M. Pitt (1987), 'Microeconometric models of rationing, imperfect markets, and nonnegativity constraints', Journal of Econometrics, 36 (1–2), 89–110.
Lee, S., J. Kim and G. Allenby (2010), 'A direct utility model for asymmetric complements', working paper, Ohio State University.
Mäler, K.-G. (1974), Environmental Economics: A Theoretical Inquiry, Baltimore, MD: Johns Hopkins University Press for Resources for the Future.
McFadden, D. (1978), 'Modelling the choice of residential location', in A. Karlquist, L. Lundqvist, F. Snickars and J. Weibull (eds), Spatial Interaction Theory and Residential Location, Amsterdam: North-Holland, pp. 75–96.
Menezes, T.A., F.G. Silveira and C.R. Azzoni (2005), 'Demand elasticities for food products: a two-stage budgeting system', NEREUS-USP, São Paulo (TD Nereus 09-2005).
Parizat, S. and R. Shachar (2010), 'When Pavarotti meets Harry Potter at the Super Bowl', working paper, Tel Aviv University.
Phaneuf, D.J., C.L. Kling and J.A. Herriges (2000), 'Estimation and welfare calculations in a generalized corner solution model with an application to recreation demand', Review of Economics and Statistics, 82 (1), 83–92.
Pinjari, A.R. (2011), 'Generalized extreme value (GEV)-based error structures for multiple discrete-continuous choice models', Transportation Research Part B, 45 (3), 474–89.
Pinjari, A.R. and C.R. Bhat (2010), 'A multiple discrete-continuous nested extreme value (MDCNEV) model: formulation and application to non-worker activity time-use and timing behavior on weekdays', Transportation Research Part B, 44 (4), 562–83.
Pinjari, A.R. and C.R. Bhat (2011), 'An efficient forecasting procedure for Kuhn-Tucker consumer demand model systems: application to residential energy consumption analysis', working paper, University of South Florida.
Pinjari, A.R. and V. Sivaraman (2012), 'A time and money budget constrained model of long-distance vacation travel demand', working paper, University of South Florida.
Roy, R. (1947), 'La distribution du revenu entre les divers biens' ['The distribution of income among various goods'], Econometrica, 15 (3), 205–25.
Satomura, S., J. Kim and G. Allenby (2011), 'Multiple constraint choice models with corner and interior solutions', Marketing Science, 30 (3), 481–90.
Tong, Y.L. (1990), The Multivariate Normal Distribution, New York: Springer-Verlag.
Train, K. (2003), Discrete Choice Methods with Simulation, Cambridge: Cambridge University Press.
Van Nostrand, C., V. Sivaraman and A.R. Pinjari (2013), 'Analysis of annual, long-distance, vacation travel demand in the United States: a multiple discrete-continuous choice framework', Transportation, 40 (1), 151–71.
Vasquez Lavin, F. and M. Hanemann (2009), 'Functional forms in discrete/continuous choice models with general corner solution', working paper, University of California Berkeley.
Von Haefen, R.H. (2010), 'Incomplete demand systems, corner solutions, and welfare measurement', Agricultural and Resource Economics Review, 39 (1), 22–36.
Von Haefen, R.H. and D.J. Phaneuf (2005), 'Kuhn-Tucker demand system approaches to nonmarket valuation', in R. Scarpa and A. Alberini (eds), Applications of Simulation Methods in Environmental and Resource Economics, Dordrecht: Springer, pp. 135–58.
Von Haefen, R.H., D.J. Phaneuf and G.R. Parsons (2004), 'Estimation and welfare analysis with large demand systems', Journal of Business and Economic Statistics, 22 (2), 194–205.
Wales, T.J. and A.D. Woodland (1983), 'Estimation of consumer demand systems with binding non-negativity constraints', Journal of Econometrics, 21 (3), 263–85.
20 Bayesian estimation of random utility models

Peter Lenk*
1 INTRODUCTION
Conjoint studies and their Bayesian estimation are remarkably intertwined. Luce and Tukey (1964) originated conjoint analysis for measuring judgment and perception in mathematical psychology. They proposed a system to measure constituent components of multi-attribute stimuli from subjects' ordering of the stimuli. Meanwhile in economics, Lancaster (1966) proposed a theory of consumer choice that decomposed the utility of goods into the utility for their attributes. Green and Rao (1971) synthesized these two ideas to decompose the desirability of product1 attributes from subjects' rankings of the products. For example, three attributes for hotels are room comfort, business centres and swimming pools. Based on a subject's ranking of hotels, the researcher can measure the preferences for each attribute. Then a hotel chain can use this information to design hotels for different segments of customers. For instance, business travellers may appreciate business centres but not swimming pools, while families travelling with children prefer swimming pools to business centres. Wind et al. (1989) conducted such a study to design Courtyard by Marriott. The connection between Bayesian inference and conjoint analysis runs deeper than merely providing practical, effective estimation and measurement methods. They both have foundations in utility theory. Random utility theory (RUT), introduced by McFadden (1974) and foreshadowed by Aitchison and Bennet (1970), provides the economic foundation for conjoint analysis. RUT assumes that subjects select products that maximize their utility, or 'brand enjoyment' in Aitchison and Bennet, among a competitive set of alternatives. Bayesian analysis also is grounded in utility theory. Savage (1954) extended the von Neumann and Morgenstern (1944) axioms of rational preferences to endogenize probability: probability becomes a subjective measure for belief.
Savage applied his theory to inference and derived decision rules that maximize expected utility (or minimize expected loss) with respect to the decision maker's subjective probability of the parameters. The decision maker updates his or her prior distributions by Bayes theorem as sample data become available. Bayesian analysis of conjoint models provides a unique setting where both the data-generating mechanism and the philosophy for inference share common theoretical roots. Most conjoint analyses use hierarchical models with two or more levels. The subject-level model relates the observed responses to the products' attributes, often with the intermediate step of imputing unobserved or latent utilities. This model usually contains subject-specific parameters that allow subjects to have different preferences for product attributes. The population-level model describes the heterogeneity or distribution of the subject-specific parameters across the population. Variations of conjoint analysis alter the specifications of the subject-level or population-level models and the elicitation task. Subjects may rate, rank, or choose products. Different functional forms and
distributional assumptions for the random utilities imply qualitatively different behaviour, and different population-level models result in different policy recommendations. These variations produce a large model space and numerous estimation procedures that differ in their details. Unlike classical inference, which uses different approaches for these variations, Bayesian inference applies one method. Bayesians pay the price for this simplicity with complex numerical methods to approximate integrals. Fortunately, the reduction in computing cost and the development of numerical methods over the last 20 years have brought Bayesian inference within reach of anyone who owns a laptop. Sawtooth Software and SAS have commercial-grade implementations for Bayesian conjoint models, and there is ever-growing freeware, especially WinBUGS and R. The goal of the chapter is to give readers a toolset for hierarchical Bayes (HB) analyses of a wide range of conjoint specifications. Hierarchical Bayes models are not specialized to conjoint analysis, and their application to conjoint models draws on all aspects of Bayesian theory. Hierarchical Bayes models have a long history: Hill (1965) introduced HB models for random-effects, one-way ANOVA; and Lindley and Smith (1972) and Smith (1973) proposed HB linear models. Lenk et al. (1996) applied HB analysis for metric conjoint, and Allenby and Lenk (1994, 1995) considered discrete-choice HB. Hierarchical Bayes models are often termed 'Bayesian random coefficient models', though the terminology can be misleading. Bayesians treat all unknown parameters as random, and they estimate random coefficients as though they are fixed coefficients. I will use the following operational definitions: 'fixed coefficients' have prior distributions, and 'random coefficients' have heterogeneity distributions. Bayesian models specify a joint probability distribution for the data and all unknown quantities.
The joint distribution includes the likelihood function for sample information, heterogeneity distributions for subject-level parameters, structural constraints and prior distributions. Bayesian inference derives posterior, predictive and marginal distributions from this joint distribution. Bayes estimators or Bayes rules minimize expected posterior loss for different loss functions. Bayesian inference is internally consistent and coherent (De Finetti, 1937) because all of the computations are obtained from the joint distribution by simple probability calculations. Bayesian analysis optimally combines all sources of information (Bernardo and Smith, 1994). Numerous studies have shown that Bayesian estimation also has desirable sampling properties, such as asymptotic normality and consistency (Berger, 1985). In very general settings, if the prior distributions do not rule out the truth, then Bayesian inference is consistent in probability (Doob, 1949). These theoretical properties, which are often overlooked in the rush to the computer, guarantee that researchers will be well treated by Bayesian analysis. Conjoint studies often produce 'broad and shallow' data: many subjects and few observations per subject. Researchers need many subjects to estimate the distribution of subject-parameter heterogeneity, which allows subjects to have different preferences. If studies were also 'deep' with many observations per subject, then two-stage methods would work well because the small estimation error for subject-level parameter estimates would not greatly distort the heterogeneity distribution. However, 'broad and deep' studies (many subjects and many observations per subject) are prohibitively expensive and difficult for subjects to complete. With broad and shallow data (many subjects and few observations per subject), two-stage methods fail because individual-level estimators may not exist or may have high sampling variation, thus distorting the heterogeneity
distribution. Hierarchical Bayes inference introduces bias in the individual-level estimates to reduce their sampling error by shrinking them towards population-level estimators (Allenby and Rossi, 1998). Shrinkage estimators also appear in classical statistics to reduce sampling error: James-Stein estimation (Stein, 1956; James and Stein 1961), ridge regression (Hoerl and Kennard, 1970) and penalized maximum likelihood (Good, 1971; de Montricher et al., 1975). These classical methods can be viewed as special cases of Bayesian inference. Bayesian shrinkage occurs automatically from combining different sources of information in the joint distribution. The amount of shrinkage depends on estimation error and the explanatory power of the population-level model. In this way, HB analysis reliably estimates both the subject-level and population-level models with relatively few observations for each subject. Researchers often conflate Bayesian analysis with its numerical methods, such as Markov chain Monte Carlo (MCMC). The next section identifies essential elements of Bayesian analysis. Section 3 then surveys numerical approximation methods for integration, starting with the well-known grid methods from high school calculus and ending with MCMC simulation algorithms, the workhorse of modern Bayesian computation. Section 4 presents a series of MCMC algorithms for HB regression models for continuous, ordinal and nominal data. Readers can skip these details without loss of continuity. If you decide to implement your own software, revisiting the details will be beneficial. Section 5 discusses Bayesian hypothesis testing and model selection, and section 6 concludes the chapter with a partial survey of extensions and elaborations of the basic random utility model. Recent texts on Bayesian inference or conjoint analysis are Louviere et al. (2000), Lancaster (2004), Orme (2006), Rossi et al. (2005), Koop et al. (2007) and Train (2009).
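The shrinkage behaviour described above can be made concrete with a toy normal-normal model. This is a minimal sketch added for illustration, not an example from the chapter: it assumes a subject-level mean with known noise variance and a known normal heterogeneity distribution, and all numbers are invented.

```python
# Toy normal-normal illustration of Bayesian shrinkage (hypothetical numbers):
# a subject-level posterior mean is a precision-weighted average of the
# subject's sample mean and the population-level mean, so subjects with
# few observations are shrunk more strongly towards the population.

def shrunk_mean(ybar_i, m_i, sigma2, mu_pop, tau2):
    """Posterior mean of a subject-level mean given m_i observations with
    sample mean ybar_i, known noise variance sigma2, and a population
    (heterogeneity) distribution N(mu_pop, tau2)."""
    w = (m_i / sigma2) / (m_i / sigma2 + 1.0 / tau2)  # data weight in [0, 1)
    return w * ybar_i + (1.0 - w) * mu_pop

# 'Shallow' data (2 observations) shrinks far more than 'deep' data (50).
shallow = shrunk_mean(ybar_i=3.0, m_i=2, sigma2=4.0, mu_pop=0.0, tau2=1.0)
deep = shrunk_mean(ybar_i=3.0, m_i=50, sigma2=4.0, mu_pop=0.0, tau2=1.0)
print(shallow, deep)  # 1.0 versus roughly 2.78: less data, more shrinkage
```

The weight w is exactly the ratio of the data precision to the total precision, which is the mechanism behind the statement that the amount of shrinkage depends on estimation error and the explanatory power of the population-level model.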
2 BASICALLY BAYES

Bayesian analysis rests on three pillars: the joint distribution of all random components to specify the model; probability calculus to derive marginal, posterior and predictive distributions; and loss functions to derive Bayes rules, which are decision rules for optimal estimation. Bayesian analysis consists of learning and summarization processes. Bayesians encode their prior beliefs about unknown parameters, such as attribute preferences, with probability distributions. When they obtain data from a conjoint study, they update these beliefs by computing posterior distributions in the learning step. They then estimate parameters with various statistics from the posterior distribution in the summarization step. Conjoint studies use repeated measures where each subject provides more than one evaluation. To fix notation, there are n subjects where subject i evaluates m_i products or options. The total number of evaluations is M = m_1 + . . . + m_n. Y_i is the vector of responses for subject i; X_i are exogenous variables, which can include attributes of the products, experimental manipulations, and subject-level covariates. The entire observed data are (X, Y) where Y = {Y_1, . . ., Y_n} and X = {X_1, . . ., X_n}. W are the unknown parameters. The joint distribution of Y and W given X is:

f(Y, W | X) = f(Y | X, W) g(W)    (20.1)
where f(Y | X, W) is the distribution of the data given the parameters and X, and g is the prior distribution for W. If Y or W is a continuous random variable, then f or g is a density function.2 If Y or W is a discrete random variable, then f or g is a probability mass function.3 The likelihood function L(W) = f(Y | X, W) expresses the information in the fixed data Y about the unknown parameter W, and the prior distribution summarizes our knowledge about the parameters before obtaining the data. Because X is fixed and exogenous in conjoint studies, we suppress it in the following. If X were endogenous, we would have to expand the joint distribution to include its distribution. If the subjects' responses are conditionally independent given X and W, then the overall likelihood factors into n subject-specific likelihoods: f(Y | W) = ∏_{i=1}^{n} f(Y_i | W). Further, if the m_i evaluations within subject i are conditionally independent given W, then f(Y_i | W) = ∏_{j=1}^{m_i} f(y_ij | W), where y_ij is subject i's evaluation for the j-th stimulus or product. The Bayesian learning process updates our prior knowledge about W after observing the sample information Y by using Bayes theorem. The updating process results in the posterior distribution of W given the data Y:

g(W | Y) = f(Y, W) / f(Y) = f(Y | W) g(W) / f(Y)    (20.2)
where f(Y) is the marginal distribution of Y or the integrated likelihood:4

f(Y) = ∫ f(Y | W) g(W) dW.    (20.3)
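When W is low-dimensional, the posterior and the integrated likelihood can be approximated on a grid. The following is a minimal sketch added for illustration, assuming a Bernoulli likelihood with a uniform prior; the data and grid size are invented.

```python
# Grid approximation of the posterior and the integrated likelihood for a
# hypothetical example: Bernoulli likelihood f(Y | w) with a Uniform(0, 1)
# prior g(w). All numbers are invented for illustration.

n_grid = 10001
grid = [i / (n_grid - 1) for i in range(n_grid)]           # values of w in [0, 1]
successes, trials = 7, 10                                  # observed data Y

def likelihood(w):
    return (w ** successes) * ((1.0 - w) ** (trials - successes))

prior = [1.0] * n_grid                                     # Uniform(0, 1) density
unnorm = [likelihood(w) * p for w, p in zip(grid, prior)]  # f(Y | w) g(w)

# Integrated likelihood f(Y) = integral of f(Y | w) g(w) dw, by Riemann sum
dw = 1.0 / (n_grid - 1)
f_Y = sum(unnorm) * dw

# Normalizing by f(Y) gives the posterior g(w | Y)
posterior = [u / f_Y for u in unnorm]
post_mean = sum(w * p for w, p in zip(grid, posterior)) * dw
print(post_mean)  # close to (7 + 1) / (10 + 2) = 2/3, the Beta(8, 4) posterior mean
```

The normalizing constant f_Y plays exactly the role of f(Y) in the text: it rescales the unnormalized product so that the posterior integrates to one without changing its shape.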
Because Y is fixed at the observed data, the integrated likelihood f(Y) is a normalizing constant for the posterior distribution. Bayesians simply write g(W | Y) ∝ f(Y | W) g(W), due to laziness and not to profundity. This normalizing constant f(Y) adjusts the posterior so that it integrates to one, and it does not affect its shape or location. The summarization process focuses on various aspects of the posterior distribution. It may be sufficient to graph the posterior distributions in one or two dimensions. Other summary measures are means, standard deviations, correlations, and percentiles. These measures have decision-theoretic justifications based on different loss functions (negative utility for using an estimator). The loss function L[D, R(W)] measures the penalty for using the decision rule or estimator D for parameter R(W), where R is a function of W. R could be as simple as the identity function or as complex as market shares, profits, willingness-to-pay, consumer surplus, or social welfare. For example, squared-error loss is L[D, R(W)] = [D − R(W)]′[D − R(W)]. The Bayes rule D_B minimizes the posterior expected loss for all possible D:

D_B = arg min_D ∫ L[D, R(W)] g(W | Y) dW.    (20.4)
The integral is the posterior expected loss for using decision rule D. D_B is the point estimator that gives minimal loss. Bayes rules are admissible: other estimators cannot uniformly dominate Bayes rules across all parameter values with respect to the loss function (DeGroot, 1970). The posterior mean E[R(W) | Y] = ∫ R(W) g(W | Y) dW is the Bayes rule for squared-error loss; the posterior median is the Bayes rule for absolute-error loss, and the posterior mode is the Bayes rule for 0/1 loss.5 The posterior Bayes risk measures the uncertainty in the Bayes rule: ∫ L[D_B, R(W)] g(W | Y) dW. Under squared-error loss, the
461
Bayesian estimation of random utility models
Bayes risk is the posterior variance. Many software packages report the posterior mean as the point estimator and the posterior standard deviation as a measure of estimation uncertainty. Bayesians use highest posterior density intervals (HPDI) as a substitute for confidence intervals. Conceptually, draw a horizontal line through the posterior density. Compute the area under the density and between the endpoints determined by the intersection of the horizontal line and density. Find the highest horizontal line such that the area is a specified value, say 90 percent or 95 percent. The HPDI is the set of parameter values corresponding to this area. If the posterior density is approximately normal, a fast and dirty approximation to the 95 percent HPDI is the posterior mean 1 2 posterior standard deviations. HPDI may be an intersection of disjoint subintervals if the posterior density is multi-modal. The learning process for unknown parameters also extends to prediction: future values of Y can be viewed as unknown parameters. Conceptually, Bayesians do not make a major distinction between inference and prediction, unlike classical statistics. The ‘posterior’ distribution for future observations are predictive distributions that integrate the likelihood function for future Yn11, . . ., Yn1k over the posterior distribution from past Y1, . . ., Yn. If the Y’s are conditionally independent given W, then the predictive distribution is: f (Yn 11,. . . ,Yn 1k 0 Y1, . . . ,Yn) 5
f (Y1,. . . ,Yn 1k) f (Y1, . . .,Yn)
5 3 c q f (Yn 1j 0 W) d g (W 0 Y1,. . . ,Yn) dW. k
(20.5)
j51
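The two-stage structure of eq. (20.5) — draw W from the posterior, then draw the future observations from the likelihood — translates directly into simulation. The sketch below is illustrative only: the Beta–Bernoulli model, the prior, and the data counts are invented for the example and are not from this chapter.

```python
import random

random.seed(0)
# Assumed conjugate illustration: Bernoulli trials with a Beta(1, 1) prior;
# 14 successes in 20 past trials give a Beta(15, 7) posterior for p.
ALPHA, BETA = 1 + 14, 1 + 6

def predictive_draws(k, n_draws=100_000):
    """Simulate eq. (20.5): draw W ~ g(W | Y), then k future Y's from f(Y | W)."""
    totals = []
    for _ in range(n_draws):
        p = random.betavariate(ALPHA, BETA)                 # posterior draw
        totals.append(sum(random.random() < p for _ in range(k)))
    return totals

draws = predictive_draws(k=5)
pred_mean = sum(draws) / len(draws)   # optimal point prediction under squared error
print(round(pred_mean, 2))            # close to 5 * 15/22 ≈ 3.41
```

Because each simulated total mixes the Bernoulli likelihood over posterior draws of p, the spread of `draws` reflects both sampling variability in the future Y's and parameter uncertainty, exactly as the integral in (20.5) prescribes.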
Loss functions also apply to prediction: the predictive mean is optimal for squared-error loss, and so on. For prediction, the equivalent of the HPDI uses the predictive distribution instead of the posterior distribution. These highest predictive density intervals indicate the range of most likely values for future Y variables. Bayesian summarization includes all sources of information, both sample information and prior information, by integrating over the posterior or predictive distributions. Unfortunately, integration is not easy. The next section briefly describes numerical approximation methods.
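The horizontal-line construction for highest-density intervals can be applied to any set of posterior or predictive draws: for a unimodal sample it reduces to finding the shortest interval containing the required mass. A minimal sketch, using a Gamma sample as an assumed stand-in for a skewed posterior:

```python
import random

def hpdi(draws, mass=0.95):
    """Shortest interval containing `mass` of the draws; for a (roughly)
    unimodal sample this approximates the highest posterior density interval."""
    xs = sorted(draws)
    m = int(round(mass * len(xs)))          # number of points the interval must cover
    # Slide a window of m consecutive order statistics; keep the narrowest window.
    width, i = min((xs[j + m - 1] - xs[j], j) for j in range(len(xs) - m + 1))
    return xs[i], xs[i + m - 1]

random.seed(1)
# Assumed skewed posterior: Gamma(shape=2, scale=1), with its mode at 1.0.
sample = [random.gammavariate(2.0, 1.0) for _ in range(50_000)]
lo, hi = hpdi(sample)
print(lo < 1.0 < hi)   # True: the high-density region around the mode is covered
```

Note the caveat from the text: for a multi-modal density the HPDI can be a union of disjoint subintervals, which this single-interval search cannot represent.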
3 NUMERICAL APPROXIMATIONS

Except for a small number of special models, such as linear regression with conjugate priors,6 Bayesians rely on numerical approximations of posterior expectations. In general, if R is a functional of the parameters W, then the posterior expectation of R(W) is:

E[R(W) | Y] = ∫ R(W) g(W | Y) dW.   (20.6)
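Eq. (20.6) is the common target of all the approximation tactics that follow. As a preview: once draws from g(W | Y) are available (obtaining them is what the later methods provide), the posterior expectation of even a nonlinear functional — such as a willingness-to-pay ratio — is just a sample average. The independent-normal posterior and the coefficient values below are invented for illustration:

```python
import random

random.seed(2)
# Invented illustration: an assumed independent-normal posterior over a travel
# time coefficient and a travel cost coefficient from some choice model.
def draw_posterior():
    return random.gauss(-0.040, 0.005), random.gauss(-0.200, 0.020)

def R(w):
    """A nonlinear functional of the parameters: value of time = b_time / b_cost."""
    b_time, b_cost = w
    return b_time / b_cost

# Eq. (20.6) approximated as a sample average over posterior draws.
draws = [R(draw_posterior()) for _ in range(100_000)]
posterior_mean_R = sum(draws) / len(draws)
print(round(posterior_mean_R, 2))   # near 0.040 / 0.200 = 0.20
```

The same draws also deliver the posterior standard deviation and an HPDI for R(W) at no extra cost, which is why simulation eventually displaced the deterministic methods surveyed first.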
This section presents different approximation tactics in roughly their historical order. As computational resources have dramatically increased, the methods have become more sophisticated, efficient and effective. However, the earlier techniques, which are easy to understand, provide insight into the recent methods, which are less intuitive.

Handbook of choice modelling

Grid Methods

Grid methods have been around since the beginning of calculus in the seventeenth century. A definite integral is the area between a curve and the W-axis between two end-points, and this area can be approximated by a sequence of rectangles or other shapes with known areas. These methods are feasible for evaluating posterior expectations if the dimension of W is small. In one dimension, we break the range of W into T intervals determined by the points W_0, …, W_T. These grid points form the bases of approximating rectangles. A simple approximation is:

∫ R(W) g(W | Y) dW
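The rectangle construction can be sketched in one dimension. The unnormalized normal posterior below is an assumed example; note that the same rectangle sum over the posterior kernel also supplies the normalizing constant f(Y), so the posterior never has to be normalized analytically.

```python
import math

def unnorm_post(w):
    """Assumed example: posterior kernel proportional to a N(1, 0.5**2) density."""
    return math.exp(-0.5 * ((w - 1.0) / 0.5) ** 2)

def grid_expectation(R, lo=-3.0, hi=5.0, T=2000):
    """Approximate E[R(W) | Y] with T rectangles on [lo, hi] (midpoint rule)."""
    step = (hi - lo) / T
    mids = [lo + (i + 0.5) * step for i in range(T)]   # midpoints of the T intervals
    weights = [unnorm_post(w) for w in mids]
    Z = sum(weights) * step                            # ≈ normalizing constant f(Y)
    return sum(R(w) * g for w, g in zip(mids, weights)) * step / Z

print(round(grid_expectation(lambda w: w), 3))   # 1.0, the true posterior mean
```

The cost of this tactic is exponential in the dimension of W — a grid of T points per dimension needs T^d function evaluations — which is why grid methods are confined to low-dimensional problems.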