NEW DEVELOPMENTS IN PSYCHOLOGICAL CHOICE MODELING
ADVANCES IN PSYCHOLOGY 60
Editors: G. E. STELMACH, P. A. VROON
NORTH-HOLLAND · AMSTERDAM · NEW YORK · OXFORD · TOKYO
Edited by
Geert DE SOETE, University of Ghent, Belgium
Hubert FEGER, Free University of Berlin, F.R.G.
Karl C. KLAUER, Free University of Berlin, F.R.G.
1989
ELSEVIER SCIENCE PUBLISHERS B.V., Sara Burgerhartstraat 25, P.O. Box 211, 1000 AE Amsterdam, The Netherlands. Distributors for the United States and Canada: ELSEVIER SCIENCE PUBLISHING COMPANY, INC., 655 Avenue of the Americas, New York, N.Y. 10010, U.S.A.
ISBN: 0 444 88057 7
© ELSEVIER SCIENCE PUBLISHERS B.V., 1989. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Science Publishers B.V./Physical Sciences and Engineering Division, P.O. Box 1991, 1000 BZ Amsterdam, The Netherlands. Special regulations for readers in the U.S.A.: This publication has been registered with the Copyright Clearance Center Inc. (CCC), Salem, Massachusetts. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the U.S.A. All other copyright questions, including photocopying outside of the U.S.A., should be referred to the copyright owner, Elsevier Science Publishers B.V., unless otherwise specified. No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Printed in The Netherlands.
CONTENTS
List of contributors  (vii)
Introduction  (1)
Order invariant unfolding analysis under smoothness restrictions. W. J. Heiser  (3)
An analytical approach to unfolding. H. Feger  (33)
GENFOLD2: A general unfolding methodology for the analysis of preference/dominance data. W. S. DeSarbo & V. R. Rao  (57)
Maximum likelihood unidimensional unfolding for a probabilistic model without distributional assumptions. P. M. Bossuyt & E. E. Roskam  (77)
Latent class models for the analysis of rankings. M. A. Croon  (99)
The wandering ideal point model for analyzing paired comparisons data. G. De Soete, J. D. Carroll, & W. S. DeSarbo  (123)
Analysis of covariance structures and probabilistic binary choice data. Y. Takane  (139)
Two classes of stochastic tree unfolding models. J. D. Carroll, W. S. DeSarbo, & G. De Soete  (161)
Probabilistic multidimensional analysis of preference ratio judgments. J. L. Zinnes & D. B. MacKay  (177)
Testing probabilistic choice models. P. M. Bossuyt & E. E. Roskam  (207)
On the axiomatic foundations of unfolding, with an application to political party preferences of German voters. B. Orth  (221)
Unfolding and consensus ranking: A prestige ladder for technical occupations. R. van Blokland-Vogelesang  (237)
Unfolding the German political parties: A description and application of multiple unidimensional unfolding. W. H. van Schuur  (259)
Probabilistic multidimensional scaling models for analyzing consumer choice behavior. W. S. DeSarbo, G. De Soete, & K. Jedidi  (291)
Probabilistic choice behavior models and their combination with additional tools needed for applications to marketing. W. Gaul  (317)
Author index  (339)
Subject index  (341)
LIST OF CONTRIBUTORS
P. M. Bossuyt, Center for Clinical Decision Making, Erasmus University, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands.
J. D. Carroll, AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, New Jersey 07974, U.S.A.
M. A. Croon, Psychology Department, Tilburg University, Tilburg, The Netherlands.
W. S. DeSarbo, Graduate School of Business, Marketing and Statistics Departments, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.
G. De Soete, Department of Psychology, University of Ghent, Henri Dunantlaan 2, 9000 Ghent, Belgium.
H. Feger, Institute for Psychology, Free University Berlin, Habelschwerdter Allee 45, 1000 Berlin 33, FR Germany.
W. Gaul, Institute of Decision Theory and Operations Research, Faculty of Economics, P.O. Box 6380, 7500 Karlsruhe 1, FR Germany.
W. J. Heiser, Department of Data Theory, University of Leiden, Middelstegracht 4, 2312 TW Leiden, The Netherlands.
K. Jedidi, Marketing Department, Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A.
K. C. Klauer, Institute for Psychology, Free University Berlin, Habelschwerdter Allee 45, 1000 Berlin 33, FR Germany.
D. B. MacKay, School of Business, Indiana University, Bloomington, Indiana 47405, U.S.A.
B. Orth, Department of Psychology, University of Hamburg, Von-Melle-Park 6, 2000 Hamburg 13, FR Germany.
V. R. Rao, Johnson Graduate School of Management, Cornell University, Ithaca, New York 14853, U.S.A.
E. E. Roskam, Mathematical Psychology Group, University of Nijmegen, Montessorilaan 3, 6500 HE Nijmegen, The Netherlands.
Y. Takane, Department of Psychology, McGill University, 1205 Docteur Penfield Avenue, Montreal, PQ, Canada H3A 1B1.
R. van Blokland-Vogelesang, Department of Psychology, Free University, Van der Boechorstraat 1, Room 1B-69, P.O. Box 7161, 1007 MC Amsterdam, The Netherlands.
W. H. van Schuur, Department of Statistics and Measurement Theory, Faculty of Social Sciences, University of Groningen, Oude Boteringestraat 23, 9712 GC Groningen, The Netherlands.
J. L. Zinnes, National Analysts, 400 Market Street, Philadelphia, Pennsylvania 19106, U.S.A.
INTRODUCTION
Historically, two of the most important contributions to psychological choice modeling are undoubtedly Thurstone's (1927) Law of Comparative Judgment and Coombs' (1950, 1964) unfolding theory. The framework that Thurstone's Law of Comparative Judgment provides for representing inconsistent choices is still the point of departure for much of the current work in probabilistic choice modeling. In 1987 the journal Communication & Cognition published a special issue on probabilistic choice models. Several of the papers in this special issue exemplify how many of the recent probabilistic choice models are still in one way or another related to Thurstone's general Law of Comparative Judgment.

An entirely different approach to modeling individual choice was offered by Coombs in his unfolding theory. Coombs' unfolding principle gave rise to many different unidimensional and multidimensional unfolding models, as illustrated in the 1988 special issue on unfolding of the German journal of social psychology Zeitschrift für Sozialpsychologie.

The editors of both special issues wanted to make the contributions in these issues available to a broader audience. Since the papers in the two special issues are often closely related to each other, in that some of the recent stochastic choice models are based on a geometric unfolding model or, equivalently, some of the recent unfolding models are probabilistic, it was decided to bundle the contributions into a single edited volume. Most papers have been substantially revised since their initial publication in either Communication & Cognition or Zeitschrift für Sozialpsychologie. The resulting volume is fairly representative of the current work in psychological choice modeling.

The papers by Heiser, Feger, and DeSarbo and Rao concentrate on devising efficient methods for fitting deterministic unfolding models to nonmetric (Heiser, Feger) or metric (DeSarbo & Rao) data.
In the papers by Bossuyt and Roskam, Croon, De Soete et al., Takane, Carroll et al., and Zinnes and MacKay new choice models are developed. Whereas Bossuyt and Roskam propose a new
unidimensional probabilistic unfolding model, De Soete et al. and Zinnes and MacKay elaborate new multidimensional probabilistic unfolding models. Takane proposes a family of stochastic models in which the within-subject and the between-subject inconsistency are explicitly modeled. An attempt to formulate discrete probabilistic analogs of the unfolding model is reported by Carroll et al.

Next come two papers that deal with the problem of assessing the validity of choice models. Bossuyt and Roskam discuss one approach to testing the assumptions of probabilistic models, while Orth explains and illustrates an axiomatization of the (deterministic) Coombsian unfolding model.

The remaining contributions of the volume contain some important applications of psychological choice modeling in the fields of political science and marketing research. Van Blokland-Vogelesang illustrates the use of an unfolding technique for constructing a prestige ladder, whereas van Schuur applies a specific unidimensional unfolding model to political science data. DeSarbo et al. and Gaul discuss probabilistic choice models and related tools that are applicable in consumer research.

As will be apparent from the various contributions in this volume, important progress has been made in psychological choice modeling in the last few years. However, many problems remain to be solved, and it is our sincere hope that this volume might stimulate other researchers to work on some of these problems.
References

Coombs, C. H. (1950). Psychological scaling without a unit of measurement. Psychological Review, 57, 145-158.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273-286.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
ORDER INVARIANT UNFOLDING ANALYSIS UNDER SMOOTHNESS RESTRICTIONS

Willem J. Heiser
University of Leiden, The Netherlands

Unfolding analysis is shown to have firm roots in the Thurstonian attitude scaling tradition. Next the nonmetric multidimensional approach to unfolding is described, and characterized in terms of objectives proposed for attitude scaling by Guttman. The nonmetric approach is frequently bothered by a phenomenon called degeneration, i.e., the occurrence of extremely uninformative solutions with good or even perfect fit. A new way to resolve this problem, while keeping the method order invariant, follows from the introduction of smoothness restrictions on the admissible model values. The effectiveness of requiring smoothness is illustrated with an example of political attitude scaling, and with a two-dimensional analysis of differential power attribution among children. Cross validation and resampling techniques can be used for establishing the stability of the unfolding results.
1. Introduction

(This paper is a revised version of an article published in Zeitschrift für Sozialpsychologie, 1987, 18, 220-235.)

Applications of the unfolding model, using any one of its associated techniques, have been remarkably scarce in social psychology, especially in view of the fact that this methodology has such a classic precursor: the Thurstonian attitude scaling approach (Thurstone, 1929, 1931; Thurstone & Chave, 1929; see also Thurstone, 1959). Thurstone transferred the unimodal response model familiar from psychophysics to the study of attitudes and opinions, more generally of affectively loaded responses. The attitude score of a subject was defined as the mean or the median scale value of the attitude statements endorsed. The selection and the allocation
of scale values to the statements was usually done in a preliminary study, in which judges had to compare them with respect to their "favorability". The reader is referred to Edwards (1957) for an extensive discussion of the Thurstonian approach, including its quality criteria and various early variants. In modern terms, it can be characterized as a way to perform an external unfolding analysis (a name coined by Carroll, 1972), with the model of equal appearing intervals - or the method of paired comparisons - as the first stimulus scaling step, and the computation of the mean or median as a primitive method to find the ideal point, i.e., the location of an imaginary statement that would get maximal support from any particular subject, or group of subjects.

After the Second World War, Thurstonian attitude measurement became more and more a curiosity. The assumed possibility to obtain unique, common scale values in the first step of the judgment-endorsement procedure had always been a matter of debate. The early evidence in a variety of attitude domains, such as attitude "toward the Negro" (Hinckley, 1932), "toward a particular candidate for political office" (Beyle, 1932), "toward war" (Ferguson, 1935), and "toward one's own country" (Pintner & Forlano, 1937), seemed to be positive in the sense that very high correlations were found between sets of scale values obtained from groups of judges with widely different attitudes. However, starting with Hovland and Sherif (1952) the influential social judgment school (Sherif & Hovland, 1961; Sherif, Sherif, & Nebergall, 1965) cast serious doubts on the validity of trying to separate "cognitive" judgments - presumably elicited in the first step - from "affective" judgments - presumably elicited in the second step. Objections were raised against some of the standard practices, such as eliminating judges with extreme categorizing behavior.
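In code, the scoring rule described above, a subject's attitude score as the mean or median scale value of the endorsed statements, amounts to a one-liner. A minimal sketch; the scale values, statement labels, and function name are invented for illustration:

```python
from statistics import median

def thurstone_score(scale_values, endorsed):
    """Attitude score: the median scale value of the statements a subject endorsed."""
    return median(scale_values[s] for s in endorsed)

# Hypothetical favorability scale values from a preliminary judgment study
scale_values = {"s1": 1.2, "s2": 3.4, "s3": 5.0, "s4": 7.1, "s5": 9.3}

# A subject endorsing the three middle statements receives the middle scale value
print(thurstone_score(scale_values, ["s2", "s3", "s4"]))  # -> 5.0
```

In the external-unfolding reading given above, this score plays the role of a primitive ideal-point estimate.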
Evidence was found for meaningful and systematic assimilation and contrast effects, reflected in local distortions of the stimulus scale. In addition, the social judgment school called attention to other aspects of attitudinal responses, i.e., the range of statements strongly endorsed (“the latitude of acceptance”), the subset of statements strongly rejected (“the latitude of rejection”, not necessarily consisting of statements in consecutive positions along the scale), and areas of neutrality (forming “the latitude of noncommitment” in between the regions of acceptance and rejection).
It is important to notice that, despite these criticisms and amendments, the major constituents of the Thurstonian approach remained intact. The statements were scaled in a separate judgment procedure. Attitude was conceived as a subject-specific response function with respect to these scale values. Although other aspects than the location of the peak were deemed important, it was still assumed - and empirically verified - that response strength tapers off as a function of the distance from the "own stand as an anchor point" (Sherif et al., 1965).

Meanwhile, Likert's short-cut (Likert, 1932) had become increasingly popular. It involves the reduction of the judgment to an a priori classification of the statements into two about equally sized classes: the favorable ones and the unfavorable ones. By adjusting the scoring direction of the responses accordingly, and by using "refinements" borrowed from test theory, the concept of a statement scale value seemed to be superfluous. Indeed, it has become common practice to ask subjects directly for their evaluations of the attitude object. Only Likert's response format survived, and statement scaling was abandoned altogether.

Guttman's (1941, 1944, 1947, 1950) contributions are much less easily summarized in a few sentences. At least three novelties that he introduced into the field of attitude measurement should be mentioned:

a. A method for finding a scale based on the endorsement alone;
b. Posing reproducibility as an explicit criterion for scale construction;
c. Scaling the response categories, rather than the statements themselves.

It is of some historical interest to notice that the desirability of (a), called the "response approach" by Torgerson (1958, pp. 45-48), had already been expressed at the very introduction of Thurstone's method: "Ideally, the scale should perhaps be constructed by means of voting only.
It may be possible to formulate the problem so that the scale values of the statements may be extracted from the records of actual voting. If that should be possible, then the present procedure of establishing the scale values by sorting will be superseded.” (Thurstone & Chave, 1929, p. 56). Guttman achieved (a) by using (b): the construction should be such that “from a person’s rank alone we can reproduce his response to each of the items in a simple fashion” (Guttman, 1947, p. 249). But at the same time -
although this would not have been strictly necessary - he switched from the concept of a statement point (i.e., a stimulus scale value) to the idea of characterizing each statement as a set of category points (i.e., response alternative scale values). In addition he assumed that all category points for a single statement would ideally be ordered along the scale in their "natural" order, from "strongly disagree" via "indifferent" to "strongly agree". So in Guttman scaling each subject is characterized by a score, and each statement by some monotonically increasing curve, for which frequently a step function is used as a first approximation. By contrast, and in line with the Thurstonian tradition, the unfolding technique represents each statement as a point along a scale, and each subject as some unimodal or single-peaked curve, for which frequently the location of the peak is considered to be the parameter of most interest. The approach of this paper will be to stick to aims (a) and (b), to replace (c) with a less restrictive requirement, and to bring in again the allocation of scale values to the objects of judgment.

Undoubtedly, Coombs (1950, 1964) contributed much to the conceptual development of the single-peaked response model, including coining the generic name unfolding. In particular, he convincingly argued that one should refrain from making strong assumptions about the measurement level of human judgments - within, but especially also across persons - and that metric information should be obtained through the study of scalability. However, his methods for actually fitting scaling models to any set of data at hand lacked the rigor of optimizing a single loss function (as the reproducibility criterion is called nowadays). The Nonmetric Multidimensional Scaling (NMDS) approach to unfolding, to be discussed in Section 2, does enjoy this property. However, it is frequently bothered by a phenomenon called degeneration, as shall be clarified in Section 3.
Then Section 4 proposes a new approach to resolve this difficulty, based on the idea of requiring a smooth succession of reproduced values. Next, the method will be applied in Section 5 to some political attitude data, and to a small example concerning the perceived importance of power characteristics by different groups of children in a classroom setting. Finally, Section 6 discusses some of the diagnostics that can be used in connection with an unfolding analysis.
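Before turning to Section 2, the contrast drawn above between Guttman's monotonically increasing response curves and the unfolding model's single-peaked curves can be made concrete with two deterministic toy response functions; all thresholds, scale values, and function names are invented for illustration:

```python
def guttman_response(subject_score, item_step):
    """Deterministic Guttman item: endorse iff the subject's score passes the item's step."""
    return 1 if subject_score >= item_step else 0

def unfolding_response(subject_ideal, statement_value, tolerance=2.0):
    """Deterministic unfolding item: endorse iff the statement lies close enough
    to the subject's ideal point."""
    return 1 if abs(subject_ideal - statement_value) <= tolerance else 0

# Guttman: endorsement never decreases as the subject's score grows
assert [guttman_response(s, 3.0) for s in (1, 2, 3, 4, 5)] == [0, 0, 1, 1, 1]

# Unfolding: endorsement is single-peaked around the ideal point (here 4.0)
assert [unfolding_response(4.0, v) for v in (0, 2, 4, 6, 8)] == [0, 1, 1, 1, 0]
```

The step function corresponds to Guttman's first approximation of the monotone curve; the interval around the ideal point mimics a latitude of acceptance.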
2. The Nonmetric Multidimensional Scaling Approach to Unfolding
The earlier formulations of the unimodal response model were all one-dimensional, perhaps for reasons of simplicity, or just "another manifestation of psychologists' peculiar evaluation monomania, reducing all information to this one dimension as if people think of themselves and other objects exclusively in terms of how good or how bad they are" (McGuire, 1985, p. 242, referring to McGuire, 1984). The model can be formulated q-dimensionally right from the start, with q = 1 merely a special case.

At our disposal is a table P with elements p_ij, each row of which corresponds to a particular subject, or group of subjects, i (i = 1, ..., n), whereas each column corresponds to a particular statement, or other piece of psychological material, j (j = 1, ..., m). P might contain a measure of preference or response strength, or the proportion of people in group i voting for alternative j, or any other indication of the attraction of object j for source i.

The first objective is to assign a point y_j to each object. In the one-dimensional case y_j is just one real-valued number that can be marked off on a line; in the two-dimensional case y_j is characterized by two coordinate values that can be plotted in a plane; in the q-dimensional case y_j is a location in a q-dimensional space (less easy to visualize and talk about, but the principles and notation remain the same). We may now view the response strength of source i as a function of the y_j. Under the unimodal response model it is assumed that this function has a single peak, i.e., it decreases monotonically in all directions with respect to some central point x_i. In addition, it is assumed that the location of the peak is specific for each source. Since response strength is maximal at the position of the central point, x_i is usually called the ideal point for source i. So the model associates objects with points, and sources with single-peaked curves or surfaces that are shifted with respect to each other.
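As a numerical sketch of this ideal-point idea, anticipating the Euclidean distance and the exponential decay introduced later in this section; the concrete parameter values and function names are assumptions for illustration, not taken from the text:

```python
import math

def euclidean(x, y):
    """Distance between ideal point x and object point y in q dimensions."""
    return math.sqrt(sum((xa - ya) ** 2 for xa, ya in zip(x, y)))

def response_strength(x_i, y_j, alpha=1.5, beta=10.0):
    """Single-peaked response surface: maximal (= beta) when the object point
    coincides with the ideal point, decreasing in every direction;
    alpha is an assumed tolerance parameter."""
    return beta * math.exp(-euclidean(x_i, y_j) / alpha)

ideal = (0.0, 0.0)
objects = {"j1": (0.0, 0.0), "j2": (1.0, 1.0), "j3": (3.0, 0.0)}

# Response strength decreases monotonically with distance from the ideal point
strengths = {j: response_strength(ideal, y) for j, y in objects.items()}
assert strengths["j1"] > strengths["j2"] > strengths["j3"]
```

Points at equal distance from the ideal point receive equal strength, which is the circular isochrest property discussed below.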
These shifts, or translations, are very important. Imagine, for instance, a set of unimodal curves precisely on top of each other; then any relocation of the object points along the line, although destroying the common shape, would still account for the same information. One could make the curves more skewed, double-peaked, monotonically increasing, any shape at all, by suitable reexpressions of the values against which they are plotted. But, when the curves are shifted along the object scale, the freedom of
simultaneous change of shape is reduced enormously. It was Coombs (1950) who first clearly demonstrated this property of shifted single-peakedness. Similar properties of shifted monotonically increasing curves have been studied in depth by Levine (1970, 1972).

So far the description characterizes what is common to all unfolding techniques (though some are confined to the one-dimensional case). The MDS approach now proceeds as follows. Attention is restricted to those single-peaked curves and surfaces that are a decreasing function of the distance d(x_i, y_j) of the object point y_j from the ideal point x_i. This is almost always the ordinary Euclidean distance

    d(x_i, y_j) = [ Σ_a (x_ia - y_ja)² ]^(1/2),    (1)
defined here on the coordinate values x_ia and y_ja for ideal points and object points respectively, where a = 1, ..., q. A major consequence of this restriction is that the response function will always be symmetric. Suppose we connect all points that have equal attractivity for a given source. Such a contour line is called an isochrest in this context, in analogy with "isobar" and "isotherm" for lines of equal atmospheric pressure and equal temperature on a map of physical locations (Heiser & De Leeuw, 1981). In the MDS approach to unfolding the isochrests are assumed to be sets of concentric circles (or spheres, or hyperspheres, for q > 2) centered at the ideal point, due to their dependence on the distance function (1). At this juncture, the set of single-peaked functions could be restricted still further, for instance by choosing the explicit model

    π̂_ij = β_i exp( -d(x_i, y_j) / α_i ).    (2)
Here π̂_ij denotes the predicted response strength, the decay function is of the negative exponential type, the parameter β_i represents the maximum of the function (attained when the ideal point x_i coincides with the object point y_j), and the parameter α_i represents the dispersion or tolerance of source i. Both α_i and β_i are assumed to be strictly positive. Note that α_i would be a parameter of interest to workers in the tradition of the social judgment school, as it indicates the size of the latitude of acceptance relative to the latitude of rejection. From (2) it follows that the logarithm of predicted response strength is linear in the distances, and a metric
unfolding technique could be based on this model feature (cf. Heiser, 1986, for a more detailed discussion hereof). Obviously, there are many more conceivable relationships between data and distances than the one expressed in (2). The nonmetric approach attempts to embrace them all by introducing an intermediate type of quantities called the pseudo-distances (a term from Kruskal, 1977). In the unfolding situation, where we deal with row-specific functions, they are defined as follows. Suppose the location of the object points is fixed, and consider a candidate ideal point x_i, also fixed. In order to evaluate how well the distances in this particular configuration correspond to the i'th row of the data, we compute the minimum value of the raw stress

    σ_i = Σ_j ( ψ_ij - d(x_i, y_j) )²
over all values of ψ_ij satisfying the monotonicity restrictions ψ_ij ≥ ψ_ih if p_ij ≤ p_ih.

The implications of the boundary positions (the three and four points extensions) prove to be very valuable for the analysis of incomplete data, such as the grade expectations data. They can also be used in constructing the admissible paths within the algorithm to find a solution for data with error.
8. Isotonic Regions in the Multidimensional Case

For k = 2, a cell in a contingency table is defined by three pairs. If these three pairs are related to only three points, e.g., AB, AC, BC, then the configuration of the boundaries A|B, A|C, B|C forms a "star", *ABC,
because A|B, A|C, B|C are mid-perpendiculars of the triangle ABC intersecting in one point. There are four topologically different possibilities for how these boundaries may intersect (see Figure 4); only one is compatible with the unfolding model. This one (Figure 4.I) provides six smallest open isotonic regions with boundaries that differ with respect to the points facing each other. All incompatible configurations contain at least one region representing intransitive preferences; e.g., in Figure 4.II the region marked with an X corresponds to C > A, B > C, but A > B.
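That the compatible configuration yields exactly six isotonic regions, each inducing one of the six transitive rank orders, can be checked numerically: place three object points, sweep ideal points over the plane, and collect the orders induced by distance. The coordinates below are arbitrary:

```python
import itertools
import math

def order_by_distance(ideal, points):
    """Rank order of the labelled points by their distance from the ideal point."""
    return tuple(sorted(points, key=lambda p: math.dist(ideal, points[p])))

points = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (2.0, 3.0)}

orders = set()
for x in range(-5, 10):
    for y in range(-5, 10):
        # small offsets keep the ideal points off the boundaries (no ties)
        orders.add(order_by_distance((x + 0.1, y + 0.2), points))

# Only the six transitive orders compatible with the star *ABC occur;
# an intransitive pattern such as C > A, B > C but A > B never appears.
assert len(orders) == 6
assert orders <= set(itertools.permutations("ABC"))
```

Since every order is produced by sorting on distance, intransitive patterns are impossible by construction, which is the content of the compatibility claim.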
Figure 4. Topologically different configurations of the boundaries A|B, A|C, B|C.
An Analytical Approach to Unfolding
If the three pairs defining a cell are related to four or more points, e.g., AB, AC, AD, then A|B, A|C, A|D form a boundary triangle (BT). To differentiate between different forms of a BT, the orientation of a boundary is defined. If this orientation is important, A|B means that A is oriented outwards; B|A, that B is oriented outwards and A inwards (see Figure 5). The cell with zero frequency in the contingency table determines the form of the BT. If this cell is, e.g., BA, CA, DA, then the form is B|A, C|A, D|A, the A-side inwards (see Figure 6).
Figure 5. Two different orientations of a boundary in a boundary triangle: A outwards and B inwards, versus A inwards and B outwards.
Coombs (1964, Fig. 7.3) reports 12 rank orders for four points on a circle. Contingency tables in this case (three or more points on a circle) contain more than one cell with zero frequency. E.g., for the Coombs data, the cells AB, CA, AD and BA, AC, DA are zero-frequency cells.
Every boundary is oriented inwards and outwards. Then these boundaries intersect in one point.

Observed rank orders: (1) ABCD, (2) DABC, (3) CABD, (4) DCAB, (5) BACD, (6) BDAC, (7) CBAD.

Contingency table for A|B, A|C, A|D: the zero-frequency cell is BA, CA, DA. Form of the boundary triangle: B|A, C|A, D|A, the A-side inwards.

Figure 6. Identification of the form of a boundary triangle.
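The bookkeeping behind Figure 6 can be reproduced mechanically: tabulate, for each observed rank order, which member of each pair comes first, and look for the cell that never occurs. A sketch using the seven rank orders of Figure 6 (the function name is illustrative; a cell label such as "BA" means B precedes A):

```python
from itertools import product

def pair_cell(order, pairs):
    """For one rank order, record which member of each pair comes first."""
    return tuple(p + q if order.index(p) < order.index(q) else q + p
                 for p, q in pairs)

orders = ["ABCD", "DABC", "CABD", "DCAB", "BACD", "BDAC", "CBAD"]
pairs = [("A", "B"), ("A", "C"), ("A", "D")]

observed = {pair_cell(o, pairs) for o in orders}
all_cells = set(product(*[(p + q, q + p) for p, q in pairs]))

# The single zero-frequency cell fixes the form of the boundary triangle:
# BA, CA, DA, i.e., B|A, C|A, D|A with the A-side inwards.
print(all_cells - observed)  # -> {('BA', 'CA', 'DA')}
```

Seven of the eight possible cells occur among the seven orders, so exactly one cell, BA-CA-DA, is empty, in agreement with the text's example.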
If three boundaries form a BT, this may be a minimal region or not; if not, it is decomposable and the decomposition leads to information on the position of intersection points relative to each other. A BT is not a minimal region if it contains a pair of boundaries which have one point in common, and this point is for both boundaries oriented inwards or for
both boundaries oriented outwards. In these cases, another boundary intersects the BT. Let the derived form of a BT be B|A, C|A, D|A. For the pair B|A, C|A the point A is inwards, i.e., B|A - A|C, which implies B|A - B|C - A|C. Thus B|C intersects with A|D, written AD|BC. This intersection lies between the intersection of A|D with A|B and with A|C, i.e., between *ABD and *ACD. This is written *ABD - AD|BC - *ACD. From the pair B|A, D|A one derives *ABC - AC|BD - *ACD; from the pair C|A, D|A it is *ABC - AB|CD - *ABD. Let the form of a BT be A|C, B|D, C|D. From B|D - B|C - D|C the intersection of BC and AC, thus *ABC, is obtained, and *ACD - *ABC - AC|BD is the information on the location of the intersections.

9. Quantitative Information in the Multidimensional Case

Rule I: If a point A is located exclusively in those isotonic regions for which "X closer than Y" is true, then AX < AY. This is true for a space with an arbitrary number of dimensions and the proof is trivial. An illustration is given for a BT of the form A|B, A|C, D|C. C|D does not intersect those open regions in which A is located, therefore AC < AD.

Rule II (comparison of diagonals): Let the relative position of stars *BCD - *ACD - *ABD be observed, which implies an intersection of two pairs of boundaries; then the distance between the inner points is shorter than the distance between the outer points (here: AC < BD).
To demonstrate the validity of this rule a well-known fact is used: the intersection of the mid-perpendiculars lies inside a triangle if all angles are acute, on the hypotenuse if one angle is 90°, and outside the triangle if one angle is obtuse. Let the quadrilateral be a rectangle with diagonals AC = BD. Then extend BD in the direction of B: the angle ABC becomes acute, and *ABD moves toward B while *ACD remains at ½AC. Because the angle BCD becomes obtuse, *BCD moves toward A, and because the angle BAD becomes obtuse, *ABD moves toward C. Then the assumed configuration of stars results. This is also true if BD is extended in the direction of D. On the other hand, if AC is extended in either or both directions, the configuration *ABC - *ABD - *ACD - *BCD results.
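The demonstration rests on the stars being circumcenters, the common intersection of the mid-perpendiculars, whose position shifts with the triangle's angles. A minimal numeric check of the defining property, with arbitrary coordinates:

```python
import math

def circumcenter(p, q, r):
    """The 'star' *pqr: intersection of the mid-perpendiculars of triangle pqr."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax * ax + ay * ay) * (by - cy) + (bx * bx + by * by) * (cy - ay)
          + (cx * cx + cy * cy) * (ay - by)) / d
    uy = ((ax * ax + ay * ay) * (cx - bx) + (bx * bx + by * by) * (ax - cx)
          + (cx * cx + cy * cy) * (bx - ax)) / d
    return (ux, uy)

A, B, C = (0.0, 0.0), (4.0, 0.0), (2.0, 3.0)
star = circumcenter(A, B, C)

# Equidistance from all three points is what makes the star a point of all
# three pairwise boundaries at once.
assert abs(math.dist(star, A) - math.dist(star, B)) < 1e-9
assert abs(math.dist(star, B) - math.dist(star, C)) < 1e-9
```

For the acute triangle chosen here the star falls inside the triangle; stretching one side until an angle turns obtuse pushes it outside, which is the movement of stars exploited in Rule II.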
Rule III (comparison of opposite sides in a quadrilateral): Let the observed configuration of stars and two further intersections include *ABC, *BCD, *ABD, and AB|CD; then AB < CD and BC < AD.

From AD|BC two BTs may be constructed, one with C|D, passing through *BCD and *ACD, the other with A|B, passing through *ABC and *ABD. The boundary closer to AD|BC, i.e., C|D, represents a side, namely CD, that is longer than the one represented by A|B, which is AB. To see the validity of this rule one may start with a parallelogram ABCD in which A|D and B|C as well as A|B and C|D are parallel. If C and D move toward each other, A|D and B|C incline toward each other over A|B to form the BT A|B, D|A, B|C.
10. Constructing a Multidimensional Solution

A solution is complete, i.e., all qualitative and quantitative information has been retrieved from the data assuming the model is valid, if the position of all intersections of boundaries relative to each other is known. All intersections in which A|B participates lie on the same line, A|B, of course. The order in which they are located on A|B can be inferred from all BTs with A|B. A solution in k = 2 consists of the set of all boundaries and the information on the positions of the intersections on their boundaries. This will be demonstrated using an example of Coombs (1964, p. 164; his Figure 7.8 does not contain all intersections). First, the positions of all intersections on A|B are determined, considering all BTs with A|B.
From (1) A|B, A|C, D|A one derives *ABC - *ABD - AB|CD.
From (2) A|B, A|C, E|A one derives *ABC - *ABE - AB|CE.
From (3) A|B, A|C, D|B one derives *ABC - *ABD - AB|CD, which is the same information as obtained from (1); the comparison of (1) and (3) thus provides the first consistency check.

All information for A|B combined leads to: *ABC - *ABD - AB|CD - *ABE - AB|CE - AB|DE.
Every other boundary cuts A|B exactly once, at that point, of course, where its intersection with A|B is located. This makes it possible to construct the complete lattice of boundaries.

To construct a solution in k > 2 the following decomposition rule may be used. If the analysis leads to the conclusion that a k-dimensional space is needed (or preferred) to represent the data, then the zero (or minimum) cell of a contingency table is defined by k + 1 pairs of points. Then every k-tuple of these pairs can be selected and represented as a configuration of boundaries in a k-dimensional space as usual. E.g., let k = 3 and the zero cell be AE, BE, CE, DE. This is equivalent to four boundary triangles: (1) A|E, B|E, C|E; (2) A|E, B|E, D|E; (3) A|E, C|E, D|E; (4) B|E, C|E, D|E. Of course, A|E, B|E, C|E may be decomposed into three lines A|E - E|B, A|E - E|C, B|E - E|C. From the four boundary triangles the three-dimensional configuration of the points can be inferred to be a tetrahedron containing E as an inner point.

For data with error and k ≥ 2 one strategy is to find all acceptable solutions for k = 2, then search for the optimal combination of two-dimensional spaces to form a solution in k = 3, etc. To find the best fitting solution in k = 2 one first determines for every boundary line separately the acceptable sequences of intersections and then tests for compatibility. As a small example for k = 2 and data with error, a reanalysis of the McElwain and Keats data (see Coombs, 1964, p. 175, for the data) will be reported. The authors collected 304 rank orders of children's preferences for four radio stations A, B, C, D. A solution in k = 1 leads to many errors. With N = 4 objects, 16 boundary triangles (and four stars) are to be determined. E.g., for the pairs AB, AC, AD the cell with the lowest
frequency is BA, AC, DA with s = 1. The corresponding boundary triangle [figure omitted] implies for the line AC: *ABC - AC/BD - *ACD. The same sequence of intersections on the line AC is implied by CA, BC, DC with s = 0. The sequences of intersections on all lines that were selected because the s-values were lowest and all sequences were compatible to form a solution are:
AB: *ABD - *ABC - AB/CD
AC: *ABC - AC/BD - *ACD
AD: *ABD - *ACD - AD/BC
BC: *ABC - *BCD - AD/BC
BD: *ABD - AC/BD - *BCD
CD: *ACD - *BCD - AB/CD

This leads to the solution

A - B
|   |
D - C

with AC < BD, AB < CD, AD < BC, which is equivalent to the one found by McElwain and Keats: only two (DBAC, DCBA) of the 304 rank orders are not represented in the solution (McElwain and Keats do not explicitly state the quantitative information).
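The search for lowest-frequency cells (the s-values used above) is mechanical and can be sketched in code. This is an illustrative reconstruction, not Feger's program; the function names, the encoding of rankings as strings, and the toy rank orders are my own.

```python
from itertools import product

def cell_frequencies(rank_orders, pairs):
    """Each ranking assigns a direction to every pair (which member is
    preferred); a 'cell' is one joint assignment of directions."""
    counts = {}
    for order in rank_orders:
        cell = tuple(a if order.index(a) < order.index(b) else b
                     for (a, b) in pairs)
        counts[cell] = counts.get(cell, 0) + 1
    return counts

def min_cell(rank_orders, pairs):
    """Return (s, cell): the lowest frequency s over all 2^len(pairs)
    direction cells, and one cell attaining it."""
    counts = cell_frequencies(rank_orders, pairs)
    return min((counts.get(c, 0), c)
               for c in product(*[(a, b) for (a, b) in pairs]))
```

For the pairs AB, AC, AD one would call `min_cell(orders, (("A", "B"), ("A", "C"), ("A", "D")))` on the collected rank orders.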
11. Discussion

What determines the dimensionality of the solution space? A set of rank orders may be characterized by those conditions it satisfies. E.g., ABC, BAC, BCA, CBA satisfies the condition: (A,C)B is empty. (A,C)B means either AC or CA preceding B. The points in parentheses will be called conditionals, and their number will be denoted by c. Data for which a k = 1 solution exists satisfy one or more conditions of the type with two conditionals. Data fitting a k = 2 space satisfy one or more conditions of the type "(A,B,C)D is empty", or "(B,C,D)A is empty" and "(B,A,D)C is empty" for the data in Coombs (1964, p. 157). In general, the number of dimensions necessary for an error-free representation is c - 1. With increasing c the restraints on the data are relaxed, i.e., more and more rankings are compatible with a solution. Expressed differently, the dimensionality of a solution is an index of agreement among the rank orders. This agreement is not identical with an average of Kendall's tau or his coefficient of concordance W. The kind of agreement is expressed by the formula given above, indicating which objects will not be preferred under the condition stated. One may, of course, characterize the positive side of the agreement; e.g., for data perfectly represented in k = 1 there exists in every triple one object which in all rankings is preferred to at least one of the two other objects.

There is no necessity to represent dimensions in a solution as axes onto which the points project their positions. It is in this respect that the present approach departs fundamentally from earlier multidimensional procedures, including the one developed by Coombs and his coworkers. But Coombs' basic idea is maintained: the essence of a solution is the configuration of isotonic regions. The present approach allows an exact specification of what is determined by the interaction of the data and the model, and what kind of information is not available.
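The emptiness condition can be checked mechanically. A minimal sketch (the function name and the encoding of rankings as strings are my own):

```python
def condition_empty(rank_orders, conditionals, target):
    """True iff the set '(conditionals)target' is empty, i.e., no ranking
    has every conditional object preceding the target."""
    return not any(
        all(order.index(c) < order.index(target) for c in conditionals)
        for order in rank_orders)
```

For ABC, BAC, BCA, CBA the condition (A,C)B is empty, so with c = 2 conditionals a k = c - 1 = 1 representation exists.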
The problems of degeneracy are thus transformed into the task of listing all possible variants of solutions. The well-known Monte Carlo studies on the recovery of preestablished solutions or on the uniqueness of representations only generate an impression of the extent to which an algorithm might fail. And these studies are difficult to evaluate because they used approximate algorithms. It has, to our knowledge, never been shown, and it is probably not true in
general, that these approximate procedures generate solutions containing all the information that is uniquely determined by the model and the data. A procedure was outlined to handle data with error. It may again be pointed out that "error" in this case results from the desire of the analyst to use fewer dimensions than necessary, not from lack of agreement in repeated measurement. Of course, other criteria for the optimality of a solution may be used than the one offered here. But minimizing the stress as in MDS programs does not, as was demonstrated in an example, prevent serious distortions, especially of the quantitative aspects of a solution, which a researcher cannot detect. The result of the analytical approach is a statement about which topological configurations of the points in a Euclidean space are compatible with the data if the model is assumed to be valid. Usually, further quantitative information not implied in the topological structure can be derived. But a numerical representation, e.g., of the coordinates of the points, is not given. To offer just one could be misleading; the isotonic regions of a solution give instead some limits for admissible sets of numerical representations, and that is what the model can provide without additional assumptions.
References

Coombs, C. H. (1964). A theory of data. New York: Wiley.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
57
GENFOLD2: A GENERAL UNFOLDING METHODOLOGY FOR THE ANALYSIS OF PREFERENCE/DOMINANCE DATA

Wayne S. DeSarbo, University of Michigan, U.S.A.
Vithala R. Rao, Cornell University, U.S.A.

This paper is a brief description of the GENFOLD2 methodology, a set of multidimensional unfolding models and algorithms for the analysis of preference or dominance data (cf. DeSarbo & Rao, 1984, 1986). GENFOLD2 allows one to perform internal or external analyses, constrained or unconstrained analyses, conditional or unconditional analyses, and metric or nonmetric analyses, as well as providing the flexibility of specifying and/or testing a variety of different types of unfolding-type preference models including simple, weighted, and general unfolding analysis. An alternating least-squares algorithm is utilized in the estimation of the specified parameters. The methodology is illustrated in this paper with a set of preference data for over-the-counter pain relievers. Some future research directions are also identified.

1. Introduction
From a managerial perspective, MDS methods are typically used to provide descriptions of preferences and/or perceptions of a sample of consumers toward a set of items in a product category. While these methods can assist in identifying "best" locations for existing or new products in the perceptual space, they offer little guidance on how to specifically alter

This paper is a revised version of an article published in Zeitschrift für experimentelle und angewandte Psychologie, 1988, 35.
58
DeSarbo & Rao
existing products or design new products. This problem of "reverse transformation," or making inferences about desired product attributes from an inspection of the resulting MDS map, has limited the use of MDS methods and has plagued applied researchers for some time; see Green (1975) for a discussion of this issue. Note that, in general, there will not be a unique "reverse mapping," since many combinations of product features and other marketing mix attributes may map into a specific perceptual product position. The objective of this paper is to present the GENFOLD2 methodology (DeSarbo & Rao, 1984, 1986) developed to address the "reverse mapping" problem in the context of spatial analyses of preferential data. GENFOLD2 (GENeral UnFOLDing Analysis, Version 2), using the Carroll and Arabie (1980) classification, analyzes two-mode, polyadic, two-way, ratio or interval or ordinal scale, unconditional or conditional (assumptions concerning the comparability of the data), complete data. GENFOLD2, like traditional unfolding models, is a spatial, distance model which allows for the estimation of two sets of points in the same space, allowing for a variety of different model specifications. GENFOLD2 is an improved, modified version of GENFOLD (DeSarbo & Rao, 1983), utilizing a more efficient algorithm and providing joint space solutions which are "nondegenerate." One particular option involving the reparameterization of stimulus and/or row coordinates enables the researcher to "manipulate" the derived spaces in answering various questions of relevance to applied work. We first review the relevant literature on the analytical problem of unfolding. GENFOLD2 is then presented in some detail, and the algorithm employed in the estimation of the parameters is discussed. The GENFOLD2 methodology is illustrated with a small set of data on preference judgments for over-the-counter pain relievers. Finally, some directions for future research are discussed.

2. Brief Review of the Literature
The literature on preference models has focused on two distinct types of spatial models - Tucker’s (1960) vector model and Coombs’ (1964) unfolding model. Both models assume that subjects arrive at their preference judgments by considering a multidimensional set of stimulus
GENFOLD2
59
characteristics, but differ in their assumptions about how subjects combine stimulus information to arrive at a judgment. Davidson (1972, 1973) and Carroll (1972, 1980) compare these two types of models and discuss the assumptions and implications of each. Examining the unfolding-type (distance) spatial models, Bennett and Hays (1960) first generalized Coombs' (1950) unidimensional unfolding model to the multidimensional case using the Euclidean distance metric. Here, subjects are represented as ideal points in the same multidimensional space as the stimuli. Several authors have proposed algorithms for estimating stimulus scale values and ideal point coordinates from preference judgments assumed to follow the unfolding model (Lingoes, 1972, 1983; Bennett & Hays, 1960; Roskam, 1973; Young & Torgerson, 1967; Kruskal, Young, & Seery, 1973; Kruskal & Carroll, 1969; Schonemann, 1970; Carroll, 1972, 1980; Spence, 1979; Greenacre & Browne, 1982; Heiser, 1981; Takane, Young, & De Leeuw, 1977). This approach of estimating both ideal points and stimulus coordinates is known as internal analysis, as opposed to external analysis methods which estimate only ideal points given the stimulus coordinates (obtained from, perhaps, an analysis of similarities). Carroll (1972, 1980) has introduced PREFMAP and PREFMAP2 as a series of models and algorithms to perform analyses of preference data. His methods allow the user to select between internal or external, metric or nonmetric, and unfolding or vector model analyses. Three different (nested) unfolding models can be estimated in PREFMAP and PREFMAP2: the simple unfolding model (which equally weights the dimensions in the space); the weighted unfolding model (which provides for unequal, possibly negative weights for the dimensions); and the general unfolding model (which allows for an idiosyncratic orthogonal rotation of the space for each subject).
There is controversy in the literature over the desirability of constraining the weights for the dimensions of the weighted unfolding model to be positive. Carroll (1972) claims that in the weighted unfolding model a negative w_it (the weight on the t-th dimension for the i-th individual) has a clear interpretation: if w_it is negative, the ideal point for individual i indicates the least preferred, rather than the most preferred, value, and the farther a stimulus is along that dimension from the ideal point, the more highly preferred is the stimulus. He thus argues for not constraining the
weights to be positive. Other authors, such as Srinivasan and Shocker (1973) and Davison (1976), dispute the value of unconstrained analyses. Srinivasan and Shocker (1973) present a nonmetric external unfolding analysis with this model using linear programming methods including nonnegativity constraints for the dimension weights. The same constraints are provided in a metric procedure using quadratic programming described by Davison (1976). Spence (1979) presents an interesting generalization of the external unfolding model allowing for linear constraints on the stimulus space as well as on the ideal points of individuals. In a similar vein, Heiser (1981) formulates an internal unfolding analysis that allows restrictions to be placed on the relationship between ideal points and stimuli to avoid typical degenerate solutions. The nature of the constraints used by Heiser (1981) does not call for the use of external information (e.g., stimulus features or individual characteristics). In this paper, we present GENFOLD2, a methodology for the GENeral UnFOLDing Analysis of preferential data. This methodology was introduced by DeSarbo and Rao (1984, 1986) to accommodate a number of different unfolding model specifications. GENFOLD2 can handle various scales of data (i.e., ratio, interval, or ordinal), and unconditional as well as conditional preference data. Further, GENFOLD2 subsumes several of the previously published unfolding models such as the simple, weighted, and general unfolding models. The specification of reparameterizations of stimulus and/or subject coordinates is extremely flexible in the sense that the user can relate stimulus coordinates to known characteristics of the stimuli and individuals' ideal points to their background variables. Thus, the derived spaces can be "manipulated" to yield pragmatically useful results.
3. The GENFOLD2 Methodology

The full GENFOLD2 model is essentially a type of general unfolding model which accommodates, for example, Carroll's (1972) simple, weighted, and general unfolding models as special cases. It also allows for the reparameterization of stimulus coordinates and/or individual ideal points. The underlying premises for stimulus and ideal point reparameterizations are that the physical or other characteristics of stimuli should in
some way "determine" the stimulus coordinates and that individual characteristics (e.g., age, gender, education, etc.) should in some way "determine" their ideal points. These premises are useful in specifying the relationships on the stimulus space and ideal points. Although our formulation specifies these relationships to be linear in the parameters, one could easily approximate nonlinearities in the constraints by including higher order terms (e.g., squared and cross products) if deemed essential. We will now describe the full model with the following notation. Let:

i = 1, . . . , I subjects;
j = 1, . . . , J stimuli;
t = 1, . . . , T dimensions;
l = 1, . . . , L subject descriptor variables;
k = 1, . . . , K stimulus descriptor variables;
Δ_ij = the "dispreference value" (inversely related to preference values) the i-th subject has for the j-th stimulus;
Δ = the I × J matrix [Δ_ij];
x_jt = the t-th coordinate of stimulus j;
y_it = the t-th coordinate of subject i's ideal point;
x_j = (x_j1, . . . , x_jT)', a T × 1 vector of the j-th stimulus coordinates;
y_i = (y_i1, . . . , y_iT)', a T × 1 vector of ideal point coordinates for the i-th individual;
X = the J × T matrix [x_jt];
Y = the I × T matrix [y_it];
W_i = subject i's linear (symmetric) transformation matrix;
a_i = subject i's multiplicative constant;
b_i = subject i's additive constant;
c_i = subject i's exponent;
d²_ij = the squared distance between subject i and stimulus j;
e_ij = error;
A_il = the l-th descriptor variable for subject i;
A = the I × L matrix [A_il];
α_lt = the importance or impact of the l-th descriptor variable on dimension t;
α = the L × T matrix [α_lt];
B_jk = the k-th descriptor variable for stimulus j;
B = the J × K matrix [B_jk];
γ_kt = the importance or impact of the k-th descriptor variable on dimension t;
γ = the K × T matrix [γ_kt].
Then, the full GENFOLD2 model can be written as:

Δ_ij = Δ̂_ij + e_ij,   (1)

where:

Δ̂_ij = a_i (d²_ij)^{c_i} + b_i,   with   d²_ij = (x_j − y_i)' W_i (x_j − y_i).
The stimulus space and the individuals' ideal points are optionally reparameterized by the relationships:

Y = Aα   and   X = Bγ,   (2)
where α and γ are matrices of order L × T and K × T, respectively, that are estimated. As in CANDELINC (Carroll, Pruzansky, & Kruskal, 1980) and in Three-Way Multivariate Conjoint Analysis (DeSarbo, Carroll, Lehmann, & O'Shaughnessy, 1982), these constraints can aid in the
interpretation of the dimensions derived (cf. Bentler & Weeks, 1978; Bloxom, 1978; Noma & Johnson, 1977; De Leeuw & Heiser, 1980; Lingoes, 1980) and can replace the post-analysis property-fitting methods often used to attempt to interpret results. GENFOLD2 attempts to estimate the desired set of constrained and/or unconstrained parameters described (i.e., some subset of W_i, X, Y, a_i, b_i, c_i, α, γ) given Δ and T (the number of dimensions), using an alternating least-squares algorithm in order to minimize the weighted sum of squares objective function:

Φ = Σ_i Σ_j δ_ij (Δ_ij − Δ̂_ij)²,   (3)
where the δ_ij's are defined by the user to weight the Δ_ij values differently. There has been considerable research attempting to cure unfolding of its tendency toward degenerate solutions. Degenerate solutions occur in multidimensional unfolding in a number of ways; see DeSarbo and Rao (1984) for a discussion of these approaches. The degeneracy problem in unfolding is handled in GENFOLD2 in the expression (3) of the loss function, by the inclusion of the weights. We share Heiser's (1981) implicit theory about a possible cause of degeneracy being the error or noise in the data, and we thus provide the flexibility of the user specifying δ_ij differently. For example, one may define the weights as:
respectively for the two cases of no preprocessing or specific preprocessing of the Δ_ij-values, where p is an exponent and r(Δ_ij) represents the row ranks (from smallest = 1 to largest = J) of the Δ_ij. Other weighting options are also possible. For example, one could specify δ_ij = 1, ∀ i, j, so that the "weighted" loss function reduces to the nonweighted one. Or, one could specify a bimodal or step weighting function where, say, the first three and last three choices would be highly weighted, and all others receive low weights. The choice of the "appropriate" weighting function depends upon such factors as the preprocessing options and scale assumptions of the data, the assumptions of the conditionality of the data, the assumptions
concerning the reliability of the different data values, and trial and error. Also, different δ_ij could be specified depending upon the assumptions made concerning the reliability of the Δ_ij collected. In addition, the value of p usually needs to be decided by trial and error, although our experience indicates that the value p = 2 appears to work well.

Table 1. Features of the GENFOLD2 algorithm
Feature: Input options

Preprocessing of Δ: Row center; Row center and standardize; Row and column center; Double center Δ and row standardize; Remove geometric mean from rows or columns; Normalize columns or rows to unit sum of squares.

Method for generating starting values (e.g., for X): Random start; External analysis (i.e., given X); Values given for all; A "close" start on X (i.e., MDPREF solution); "Close" values on parameters (i.e., using PREFMAP2 with X given by MDPREF).

Type of unfolding model: Simple unfolding; Weighted unfolding; General unfolding.

Type of data scale: Ratio; Interval; Ordinal.

Type of analysis: External (X given); Internal.

Constraints on Y: Yes; No.

Constraints on X: Yes; No.

Constraints on W_i: Symmetric W_i; Diagonal W_i; options on nonnegativity constraints.

Restrictions on c_i: Unconstrained; c_i = c (constant) ∀ i; c_i = 1 ∀ i.

Specifications of a_i and b_i: a_i = 1, b_i = 0, ∀ i; a_i = 1, b_i = b, ∀ i; a_i = 1, b_i unconstrained; a_i = a, b_i = 0, ∀ i; a_i = a, b_i = b, ∀ i; a_i = a, b_i unconstrained, ∀ i; a_i unconstrained, b_i = 0, ∀ i; a_i unconstrained, b_i = b, ∀ i; a_i and b_i unconstrained, ∀ i.
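The weighted loss of expression (3) is straightforward to compute once a weighting function is chosen. The sketch below is illustrative only: the exact form of expression (4) is given in DeSarbo and Rao (1984), and the power-of-row-ranks weights used here are an assumed stand-in; all function names are my own.

```python
import numpy as np

def row_ranks(Delta):
    """Rank within each row of the dispreference matrix
    (smallest = 1, largest = J)."""
    return Delta.argsort(axis=1).argsort(axis=1) + 1

def weighted_loss(Delta, Delta_hat, p=2):
    """Weighted sum-of-squares loss of expression (3), with assumed
    rank-power weights delta_ij = r(Delta_ij)**p standing in for (4)."""
    delta = row_ranks(Delta).astype(float) ** p
    return float(np.sum(delta * (Delta - Delta_hat) ** 2))
```

Setting delta_ij = 1 everywhere would recover the nonweighted loss, as noted in the text.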
General Description of the Algorithm: The algorithm for estimating the various parameters in the GENFOLD2 model uses an alternating least-squares method at its core, but it includes various options making it highly flexible and versatile. The several features built into the program are shown in Table 1. The exact details of the computation are found in DeSarbo and Rao (1984). The technical details of estimating X and Y (or α and γ) within the alternating least-squares cycle for the special case of the simple unfolding model are described in Appendix I.

4. An Illustration
A sample of I = 30 undergraduate business students of the University of Pennsylvania was asked to take part in a small study designed to measure preferences for various brands of existing over-the-counter (OTC) analgesic pain relievers. These respondents were initially questioned as to the brand(s) they currently use (as well as frequency of use) and their personal motivations for choosing such brand(s) (e.g., ingredients, price, availability, etc.). They were then presented fourteen existing OTC analgesic brands: Advil, Anacin, Anacin-3, Ascriptin, Bayer, Bufferin, Cope, CVS Buffered Aspirin (a generic), Datril, Excedrin, Nuprin, Panadol, Tylenol, and Vanquish. Initially, they were presented colored photographs of each brand and its packaging, together with price per 100 tablets, ingredients, package claims, and manufacturer. Each subject/consumer was requested to read this information and to return to it at any time during the experiment if he/she so wished. After a period of time, they were asked to make likelihood-to-buy/use judgments on each of the fourteen brands on an eleven-point scale (0 = definitely would not buy/use, 10 = definitely would buy/use). We conducted the GENFOLD2 analysis of Δ in T = 1, 2, and 3 dimensions for the simple unfolding model with the reparameterization option where X = Bγ, assuming interval scale, row conditional input data. As such, each vector of input data was standardized to zero mean and unit variance per subject. The brand design matrix, B (not shown in the paper), has also been standardized to zero mean and unit variance. This reparameterization specification was preferred since B contains features that consumers stated (in a pretest) were important in their choice of a
specific OTC analgesic brand. All consumers were encouraged to read the information contained in the color photographs of each brand and its packaging prior to their judgments. Based on an examination of the associated variance-accounted-for statistics and the respective solution interpretations, the T = 2 dimensional solution was selected (weighted R² = 0.921) as the most parsimonious one. Figure 1 depicts the derived joint space of the fourteen brands (labeled A-N) and the thirty ideal points (labeled as "*"s). Subjects' preferences appear to be quite diverse in spanning all quadrants of the space. However, there does appear to be a somewhat larger concentration of ideal points around the two ibuprofen brands A and K and the acetaminophen brands C, I, L, and especially M. The model fit was extremely good across the thirty subjects; for fifteen of the subjects, the variance accounted for was over 0.95, and for the remaining fifteen it was between 0.90 and 0.95.

Table 2. Correlations between design variables (B) and derived stimulus coordinates (X) for the GENFOLD2 simple unfolding model
Feature variable    Dimension I    Dimension II
1                       .905          -.533
2                       .273          -.573
3                       .050           .181
4                      -.316           .716
5                       .815          -.860
6                      -.399           .900
7                      -.089           .145
The γ impact coefficients are also represented in Figure 1 as vectors, given the "regression-like" manner in which they impact the brand coordinate locations. Based upon these vectors, the location of the brands, and the correlations between X and B presented in Table 2, we can easily interpret the dimensions. (These correlations will vary according to the particular orthogonal rotation utilized. No rotation was utilized for this solution. Even so, the correlations between the dimensions of X, Y, and γ are
low: Cor(X1, X2) = 0.152; Cor(Y1, Y2) = 0.135; and Cor(γ1, γ2) = 0.022.) Dimension I separates the lower cost, higher maximum dosage aspirin brands from the higher cost, lower maximum dosage aspirin substitutes. The second dimension separates the OTC analgesics that contain caffeine from those that do not. Thus, consumer preferences appear to be based upon aspirin vs. nonaspirin and caffeine vs. no caffeine. It is interesting to note the lack of brands in quadrant two of the figure, since there are presently no aspirin-substitute brands with caffeine available on the market.
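Correlations of the kind reported in Table 2 are simple to compute from the standardized design matrix and the derived coordinates. A minimal sketch (the function name and the toy data are my own, not part of GENFOLD2):

```python
import numpy as np

def feature_coordinate_correlations(B, X):
    """Correlate each design variable (column of B) with each derived
    dimension (column of X), as in Table 2."""
    Bz = (B - B.mean(axis=0)) / B.std(axis=0)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    return Bz.T @ Xz / B.shape[0]
```

As the text notes, such correlations are rotation-dependent, so they should be read relative to the particular orientation of the solution.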
Figure 1. GENFOLD2 joint space for the brands, ideal points, and product features. [Plot omitted. Brand symbols: A = Advil, B = Anacin, C = Anacin-3, D = Ascriptin, E = Bayer, F = Bufferin, G = Cope, H = CVS, I = Datril, J = Excedrin, K = Nuprin, L = Panadol, M = Tylenol, N = Vanquish. Feature symbols: 1 = Mg. of Aspirin, 2 = Mg. of Acetaminophen, 3 = Mg. of Ibuprofen, 4 = Mg. of Caffeine, 5 = Mg. of Buffered Compounds, 6 = Price, 7 = Max. Dosage.]
5. Future Research

We have presented a description of the GENFOLD2 unfolding model and our alternating weighted least-squares algorithm for fitting it. The methodology was illustrated using a small set of preference data for fourteen brands of pain relievers. In other papers on this algorithm (cf. DeSarbo & Rao, 1984, 1986) we have shown how the model can be employed to investigate policy simulation and to derive optimal positioning of product features to tackle the "reverse mapping" problem described earlier in this paper. Although we believe that the algorithm is ready for use in several research situations, more work needs to be done to investigate its behavior under several experimental and real-world conditions. Several questions can be pursued in future research. While the weighted loss function does appear tentatively to provide nondegenerate solutions, obvious questions are raised as to why. What really causes degenerate solutions in unfolding? Is it a particular (and common) form of error structure found in most data sets? Does it result from a poorly determined model or a flat objective (or loss) function response surface? Our approach seems to relieve the symptoms of the disease, but we still do not really know for certain what the disease is. More research is needed. Another related question concerns the choice of the weighting function δ_ij. While some guidelines can be established to rule out certain general forms of δ_ij, the choice of a specific δ_ij (especially p) remains a trial and error procedure. While applications suggest a p of 2 for the δ_ij defined in expression (4), more experience with the procedure must be obtained on more data sets before this recommendation can be made general. Finally, experience with more real data sets is required in order to answer many of the issues raised and to properly evaluate GENFOLD2 as a reliable methodology.
Appendix I. A Technical Description of the GENFOLD2 Algorithm for the Simple Unfolding Model The simple unfolding model with options for a reparameterization of X and Y can be stated as:
Δ_ij = Δ̂_ij + e_ij,   (A-1)

where:

Δ̂_ij = a_i d²_ij + b_i   (A-2)

and

d²_ij = Σ_{t=1}^{T} (x_jt − y_it)²,   (A-3)

with:

X = Bγ   (A-4)

and

Y = Aα.   (A-5)

The algorithm utilized to estimate the parameter values X (or γ) and Y (or α) uses an alternating weighted least squares formulation to minimize the loss function:

Φ = Σ_{i=1}^{I} Σ_{j=1}^{J} δ_ij (Δ_ij − Δ̂_ij)²,   (A-6)

where δ_ij is the weighting function described in DeSarbo and Rao (1984). Assuming the preprocessing, starting value, and control parameters (see DeSarbo & Rao, 1984) have been stipulated, the algorithm cycles between two major estimation phases:
Phase 1. A Quasi-Newton Gradient Procedure to Estimate X (or γ) and Y (or α)

A Quasi-Newton unconstrained algorithm (Davidon, 1959; Fletcher & Powell, 1963) is utilized to estimate the joint space so as to minimize Φ, holding the a_i and b_i values fixed at their current values. The partial derivatives of the loss function with respect to these parameters are:
∂Φ/∂x_jt = 4 Σ_i δ_ij a_i (Δ̂_ij − Δ_ij)(x_jt − y_it),   (A-7)

∂Φ/∂y_it = −4 Σ_j δ_ij a_i (Δ̂_ij − Δ_ij)(x_jt − y_it),   (A-8)

∂Φ/∂γ_kt = Σ_j B_jk ∂Φ/∂x_jt,   (A-9)

∂Φ/∂α_lt = Σ_i A_il ∂Φ/∂y_it.   (A-10)

For the sake of convenience, let us assume that the relevant parameters to be estimated are contained in the vector Θ, and that ∇Θ is the vector of partial derivatives for this set of parameters. Let:
r = T(L + K);
H_n = an r × r positive definite symmetric matrix at the n-th iteration;
h_n = the optimal step length at iteration n;
S_n = the search direction at iteration n.

The steps of the iterative algorithm used are as follows:
1. Start with given values Θ_0 and an r × r positive definite symmetric matrix H_0 = I (the identity matrix) initially. Set n = 1.

2. Compute ∇Θ at the point Θ_n and set

S_n = −H_n ∇Θ_n.   (A-11)

Note that at the first iteration the search direction will be the same as the steepest descent direction, −∇Θ_1, since H_0 = I.

3. Find the optimal step length h_n* in the direction S_n. This is done through the use of a quadratic interpolation line search procedure. Then we set:

Θ_{n+1} = Θ_n + h_n* S_n.   (A-12)
4. This new solution Θ_{n+1} is tested for optimality and for the maximum number of minor iterations. That is, we see whether

(a) (Φ_n − Φ_{n+1}) < TOL, or

(b) n > MINOR (the user-specified maximum number of such "minor" iterations).

If either of these two conditions holds, this procedure is terminated. If neither holds, then we proceed to step (5).
5. Update the H matrix as:

H_{n+1} = H_n + M_n + N_n,   (A-13)

where:

M_n = h_n* S_n S_n' / (S_n' Q_n),   (A-14)

N_n = −(H_n Q_n Q_n' H_n) / (Q_n' H_n Q_n),   (A-15)

Q_n = ∇Θ_{n+1} − ∇Θ_n.   (A-16)
6. Set n = n + 1 and go to step (2).

Gill, Murray, and Wright (1981) provide a derivation of this procedure as well as its convergence properties. The use of this Quasi-Newton method has been favorably compared with other gradient search procedures such as steepest descent and conjugate gradient methods (Himmelblau, 1972). It was found empirically that the approximate second derivative information can aid in speeding up convergence, especially when one is near the optimal solution. In addition, since the first step of this algorithm is a steepest descent step, one can take advantage of a steepest descent search when initially far away from the optimal solution (empirical research demonstrates that steepest descent is best used in early iterations when far from the optimal solution). Note that there is an indeterminacy with respect to the parameters X and Y in that one can define:
X* = XT,   Y* = YT,

where T is an orthogonal transformation (T'T = TT' = I), and still produce the same Δ̂_ij values as defined in (A-2). This particular indeterminacy is important when conducting configuration matching analyses to compare the solutions of two different simple unfolding analyses.
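Steps 1-6 can be sketched in code. This is an illustrative reconstruction of the DFP-style update, with a simple backtracking line search standing in for the quadratic interpolation search; all names and tolerances are my own.

```python
import numpy as np

def quasi_newton_minimize(f, grad, theta0, tol=1e-12, max_iter=200):
    """Quasi-Newton (DFP-type) minimization following steps 1-6:
    search direction S = -H g, a step length found by line search, and
    the rank-two update H <- H + M + N of (A-13)-(A-16)."""
    theta = np.asarray(theta0, dtype=float)
    H = np.eye(theta.size)                      # step 1: H0 = I
    g = grad(theta)
    for _ in range(max_iter):
        S = -H @ g                              # (A-11)
        h = 1.0                                 # crude backtracking search
        while f(theta + h * S) >= f(theta) and h > 1e-12:
            h *= 0.5
        theta_new = theta + h * S               # (A-12)
        if abs(f(theta) - f(theta_new)) < tol:  # step 4(a)
            return theta_new
        g_new = grad(theta_new)
        d = h * S                               # step actually taken
        Q = g_new - g                           # (A-16)
        if abs(d @ Q) > 1e-12:                  # guard the denominators
            H = (H + np.outer(d, d) / (d @ Q)   # M_n of (A-14)
                 - H @ np.outer(Q, Q) @ H / (Q @ H @ Q))  # N_n of (A-15)
        theta, g = theta_new, g_new             # step 6
    return theta
```

On a positive definite quadratic objective the iterates converge to the unique minimizer, which is a convenient way to sanity-check the update.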
Phase 2. A Weighted Least Squares Procedure to Estimate a_i and b_i

Let us first define the transformed values Δ*_ij:   (A-17)

Then current estimates of a_i and b_i can be obtained by performing I separate regressions of Δ*_ij on d̂_ij and a column of 1's:

(â_i, b̂_i)' = (L_i' L_i)^{-1} L_i' M_i,   (A-18)

where:

L_i = ((1, d̂_ij)),    M_i = ((Δ*_ij)),

with d̂_i a J × 1 vector of the d̂_ij for subject i, and 1 a J × 1 vector of 1's. Thus, the algorithm cycles back and forth between Phases 1 and 2 until either convergence in the value of the loss function is achieved or until more major iterations, or cycles, are used than the user stipulates as a maximum.
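The Phase 2 regressions of (A-18) can be sketched subject by subject. The array names below are hypothetical, and plain ordinary least squares is used for the illustration:

```python
import numpy as np

def phase2_estimates(delta_star, d):
    """For each subject i, regress Delta*_ij on the distances d_ij and a
    column of ones, as in (A-18), returning the current intercepts a_i
    and slopes b_i. Both inputs are I x J arrays (one row per subject)."""
    I, J = delta_star.shape
    a = np.empty(I)
    b = np.empty(I)
    for i in range(I):
        L = np.column_stack([np.ones(J), d[i]])   # L_i = ((1, d_ij))
        coef, *_ = np.linalg.lstsq(L, delta_star[i], rcond=None)
        a[i], b[i] = coef
    return a, b

# data generated exactly as Delta*_ij = a_i + b_i d_ij are recovered
rng = np.random.default_rng(0)
d = rng.random((3, 8))
a_true = np.array([1.0, -0.5, 2.0])
b_true = np.array([0.5, 2.0, -1.0])
delta_star = a_true[:, None] + b_true[:, None] * d
a_hat, b_hat = phase2_estimates(delta_star, d)
```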
References

Bennett, J. F., & Hays, W. L. (1960). Multidimensional unfolding: Determining the dimensionality of ranked preference data. Psychometrika, 25, 27-43. Bentler, P. M., & Weeks, D. G. (1978). Restricted multidimensional
scaling models. Journal of Mathematical Psychology, 17, 138-151. Bloxom, B. (1978). Constrained multidimensional scaling in N spaces. Psychometrika, 43, 397-408. Borg, I., & Lingoes, J. C. (1980). A model and algorithm for multidimensional scaling with external constraints on the distances. Psychometrika, 45, 25-38. Carroll, J. D. (1972). Individual differences and multidimensional scaling. In R. N. Shepard, A. K. Romney, & S. Nerlove (Eds.), Multidimensional scaling: Theory and applications in the behavioral sciences (Vol. I). New York: Seminar Press. Carroll, J. D. (1980). Models and methods for multidimensional analysis of preferential choice (or other dominance) data. In E. D. Lantermann & H. Feger (Eds.), Similarity and choice (pp. 234-289). Bern: Huber. Carroll, J. D., & Arabie, P. (1980). Multidimensional scaling. Annual Review of Psychology, 31, 607-649. Carroll, J. D., Clark, L. A., & DeSarbo, W. S. (1984). The representation of three-way proximity data by single and multiple tree structure models. Journal of Classification, 1, 25-74. Carroll, J. D., Pruzansky, S., & Kruskal, J. B. (1980). CANDELINC: A general approach to multidimensional analysis of many-way arrays with linear constraints on parameters. Psychometrika, 45, 3-24. Coombs, C. H. (1950). Psychological scaling without a unit of measurement. Psychological Review, 57, 148-158. Coombs, C. H. (1964). A theory of data. New York: Wiley. Davidon, W. C. (1959). Variable metric method of minimization. Argonne National Laboratory Report Number ANL-5990. Davidson, J. A. (1972). A geometrical analysis of the unfolding model: Nondegenerate solutions. Psychometrika, 37, 193-216. Davidson, J. A. (1973). A geometrical analysis of the unfolding model: General solutions. Psychometrika, 38, 305-336. Davison, M. L. (1976). Fitting and testing Carroll's weighted unfolding model for preferences. Psychometrika, 41, 233-247. De Leeuw, J., & Heiser, W. (1980). Multidimensional scaling with restrictions on the configuration. In P. R. Krishnaiah (Ed.), Multivariate analysis V (pp. 501-522). Amsterdam: North-Holland. DeSarbo, W. S., & Carroll, J. D. (1981). Three-way metric unfolding. In Proceedings of the 1981 TIMS/ORSA Market Measurement
Conference. Providence, Rhode Island: Management Science. DeSarbo, W. S., & Carroll, J. D. (1983). Three-way unfolding via weighted least-squares. Unpublished Memorandum, AT&T Bell Laboratories, Murray Hill, NJ. DeSarbo, W. S., Carroll, J. D., & Green, P. E. (1984). An alternating least-squares procedure for the estimation of missing preference data in product concept testing. Unpublished Memorandum, AT&T Bell Laboratories, Murray Hill, NJ. DeSarbo, W. S., Carroll, J. D., Lehmann, D., & O'Shaughnessy, J. (1982). Three-way multivariate conjoint analysis. Marketing Science, 1, 323-350. DeSarbo, W. S., & Rao, V. R. (1983). A constrained unfolding model for product positioning. In Proceedings of the 1983 ORSA/TIMS Marketing Science Conference, Los Angeles, California. DeSarbo, W. S., & Rao, V. R. (1984). GENFOLD2: A set of models and algorithms for the GENeral UnFOLDing analysis of preference/dominance data. Journal of Classification, 1, 147-186. DeSarbo, W. S., & Rao, V. R. (1986). A constrained unfolding methodology for product positioning. Marketing Science, 5, 1-19. Fletcher, R., & Powell, M. J. D. (1963). A rapidly convergent descent method for minimization. Computer Journal, 6, 163-168. Gill, P. E., Murray, W., & Wright, M. H. (1981). Practical optimization. New York: Academic Press. Green, P. E. (1975). Mathematical tools for applied multivariate analysis. New York: Academic Press. Greenacre, M. J., & Browne, M. W. (1982). An alternating least-squares algorithm for multidimensional unfolding. Presented at the 1982 Joint Meeting of the Psychometric and Classification Societies, Montreal, Canada. Heiser, W. J. (1981). Unfolding analysis of proximity data. Doctoral Dissertation, University of Leiden, The Netherlands. Himmelblau, D. M. (1972). Applied nonlinear programming. New York: McGraw-Hill. Kruskal, J. B., & Carroll, J. D. (1969). Geometric models and badness-of-fit functions. In P. R. Krishnaiah (Ed.), Multivariate analysis II. New York: Academic Press.
Kruskal, J. B., Young, F. W., & Seery, J. B. (1973). How to use KYST,
a very flexible program to do multidimensional scaling and unfolding.
Unpublished Memorandum, Bell Laboratories, Murray Hill, NJ. Lingoes, J. C. (1972). A general survey of the Guttman-Lingoes nonmetric program series. In R. N. Shepard, A. K. Romney, & S. Nerlove (Eds.), Multidimensional scaling: Theory and applications in the behavioral sciences (Vol. I). New York: Seminar Press. Lingoes, J. C. (1983). The Guttman-Lingoes nonmetric program series. Ann Arbor: Mathesis Press. Noma, E., & Johnson, J. (1977). Constraining nonmetric multidimensional scaling configurations. Technical Report #60, University of Michigan, Human Performance Center. Roskam, E. E. (1973). Fitting ordinal relational data to a hypothesized structure. Technical Report #73MA06, University of Nijmegen, The Netherlands. Schonemann, P. H. (1970). On metric multidimensional unfolding. Psychometrika, 35, 349-366. Spence, I. (1979). A general metric unfolding model. Paper presented at the 1979 Psychometric Society Meetings, Monterey, CA. Srinivasan, V., & Shocker, A. D. (1973). Linear programming techniques for multidimensional analysis of preferences. Psychometrika, 38, 337-369.
Takane, Y., Young, F. W., & De Leeuw, J. (1977). Nonmetric individual differences multidimensional scaling: An alternating least-squares method with optimal scaling features. Psychometrika, 42, 7-67. Tucker, L. R. (1960). Intra-individual and inter-individual multidimensionality. In H. Gulliksen & S. Messick (Eds.), Psychological scaling: Theory and applications. New York: Wiley. Young, F. W., & Torgerson, W. S. (1967). TORSCA: A Fortran IV program for Shepard-Kruskal multidimensional scaling analysis. Behavioral Science, 12, 498.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
MAXIMUM LIKELIHOOD UNIDIMENSIONAL UNFOLDING IN A PROBABILISTIC MODEL WITHOUT PARAMETRIC ASSUMPTIONS

Patrick M. Bossuyt
Erasmus University, Rotterdam, The Netherlands

Edward E. Roskam
University of Nijmegen, The Netherlands

This paper presents a new probabilistic unidimensional unfolding procedure for paired comparisons data. This procedure is related to a probabilistic unfolding theory in which a nonparametric random ideal coordinate assumption is added to the familiar unidimensional unfolding assumptions. The procedure can be used to find a maximum likelihood sequencing of alternatives or their midpoints, based on choices of a single subject or a group of subjects. It requires a seriation strategy and the calculation of maximum likelihood binomial probability estimates under order restrictions. Algorithms are presented for both purposes. The unfolding procedure can easily be modified to suit related probabilistic unfolding theories.
1. Introduction

The large appeal of Coombs' (1950, 1964) unfolding theory can most likely be attributed to the plausibility of its main ideas. According to the unfolding theory, a subject in a choice situation compares the available alternatives with an ideal alternative and chooses the alternative least dissimilar from this ideal. The ideal can be subjective, but some
The research reported in this paper was supported by Grant No. 40-30 of the Dutch Foundation for the Advancement of Pure Science Z.W.O. This paper is a revised version of an article published in Zeitschrift für Sozialpsychologie, 1987, 18, 282-294.
intersubjective cognitive structure is expected to exist in the pattern of dissimilarities between the alternatives. Representing dissimilarities as distances, Coombs has proposed an "unfolding" procedure based on these two notions. This procedure is elegant in its simplicity and leads to the construction of an underlying unidimensional sequencing of the alternatives and a partial order on the distances, out of a set of subjective preference rankings. Greenberg (1965) proposed a closely related procedure to be used with paired comparisons data. In spite of the simplicity of both procedures, successful applications of Coombs' and Greenberg's unfolding procedures are infrequent. Both procedures require all choices or rankings to be perfectly consistent with a pattern of distances in the underlying unidimensional space. In practice this appears to be a very strong necessary condition. Violations of this condition, however small they may be, and however likely they are to occur, cannot be handled in a satisfactory way. Several authors have relaxed the consistency requirement by adding probabilistic assumptions to the unfolding theory. Examples of probabilistic unfolding theories for binary choices have been presented in the literature by authors such as Bechtel (1968), Coombs, Greenberg & Zinnes (1961), DeSarbo & Hoffman (1986), Croon (in press), Ramsay (1980), Schonemann & Wang (1977), Sixtl (1973), and Zinnes & Griggs (1974). In this paper a new unfolding procedure for paired comparisons data is presented. The procedure is related to a simple, nonparametric probabilistic unidimensional unfolding theory. In a way, it is the probabilistic successor of Greenberg's (1965) proposal. The procedure differs from Greenberg's in that it requires a conditional estimation of binomial choice probabilities, subject to a hypothesized underlying sequencing of the interalternative midpoints.
A maximum likelihood seriation strategy is then adopted in finding the most plausible underlying sequencing. The second section of this paper contains a description of what we will call the probabilistic midpoint unfolding theory. It describes the assumptions and the resulting ordinal restrictions on the choice probabilities for the case of a single subject. In the third section these results are extended to data from a population of subjects. In the fourth section the estimation of choice probabilities under order restrictions is discussed, and an algorithm for finding maximum likelihood estimates is presented. In
the fifth section a branch and bound scheme is proposed for finding the best underlying sequencing, using the maximum likelihood principle. The paper concludes with a brief comparison of model and procedure with other probabilistic unfolding models for paired comparisons.
2. Probabilistic Midpoint Unfolding

2.1 A Single Subject

The theory is intended for the familiar paired comparisons task in which the "no-choice" option has been eliminated. This means that each pair of elements {x, y} of a set S of s alternatives has been presented n_xy times, of which x has been chosen k_xy times and y k_yx times, where k_xy + k_yx = n_xy. As data, or model of the data, we then have a binary choice frequency structure <S, k>.
2.1.1 Assumptions

The following set of assumptions defines the theory.

A.1 (Choice based on dissimilarities) In making a choice, the subject has picked the alternative least dissimilar to the ideal alternative z.
A.2 (Subjective metric) The dissimilarities in A.1 can be represented by a distance function. Let S' = S ∪ {z}. A metric d can then be defined on S' × S' such that x is chosen out of {x, y} if and only if the distance between the ideal z and x does not exceed the distance between z and y: d_zx ≤ d_zy.

A.3
(Unidimensionality) The metric space (S', d) can be mapped into a metric line. For every two elements x, y ∈ S' there exist real-valued coordinates x, y such that the distance between these elements can be expressed as d_xy = |x − y|. (We use no additional notation to distinguish between a point on the metric line and its coordinate.)
A.4
(Random ideal coordinate) The coordinate of the ideal alternative on the metric line is a random variable Z with a cumulative distribution function H(x) = Pr(Z ≤ x).
A.5
(Nonidentical coordinates) All distances between the alternatives in A.2 are nonzero: for every two elements x, y ∈ S', d_xy > 0.
The assumptions A.1 to A.3 define the unidimensional unfolding theory as proposed by Coombs (1950, 1964). Choices are seen as resulting from a comparison of dissimilarities, and these dissimilarities can be represented as distances. The assumption A.4 is added in order to accommodate small inconsistencies in unidimensional unfolding. Here we assume that the origin of the inconsistencies can be found in a random variability in the distances d_zx, which is itself a consequence of a random uncertainty in the location of the ideal on the metric line. For a review of alternative probabilistic assumptions in unfolding, see Croon (in press). A similar set of assumptions has been used by a number of authors who have proposed a related theory on probabilistic unfolding (Bechtel, 1968; Jansen, 1981; Sixtl, 1973). However, these authors specified the exact functional form of the cumulative distribution function in their version of assumption A.4. Bechtel (1968), for example, assumed that a cumulative normal distribution function was always appropriate, whereas Jansen (1981) and Sixtl (1973) proposed the logistic function. Those strong parametric assumptions will not be needed in the present approach, because only the assumed existence of a function H, together with its monotone nondecreasing property, will be used. The fifth assumption A.5 is added to avoid problems in the representation of the dissimilarities. Together the assumptions A.1 to A.5 define what will be called the probabilistic midpoint unfolding (PMU) theory. In line with the use of terminology advocated elsewhere (Bossuyt & Roskam, 1987), a "PMU model" will be a structure of the appropriate type in which the assumptions of the theory are satisfied. In this case a PMU model will be a binary choice frequency structure for which there exist a set of coordinate values and a cumulative distribution function such that assumptions A.1 to A.4 are satisfied.
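The content of assumptions A.1 to A.4 can be illustrated by simulation: draw the ideal coordinate Z from its distribution and choose x whenever |Z − x| ≤ |Z − y|, which for x < y happens exactly when Z falls at or below the midpoint of x and y. A minimal sketch, in which the uniform distribution for Z and the coordinate values are made-up choices:

```python
import random

def simulate_choices(x, y, H_sampler, n):
    """Simulate n paired comparisons under A.1-A.4: draw the ideal
    coordinate Z and choose x iff |Z - x| <= |Z - y|, which for x < y
    is equivalent to Z <= (x + y) / 2. Returns the choice frequency
    k_xy of x."""
    k_xy = 0
    for _ in range(n):
        z = H_sampler()
        if abs(z - x) <= abs(z - y):
            k_xy += 1
    return k_xy

# hypothetical illustration: Z uniform on [0, 1], so H(t) = t and the
# theory predicts a choice probability of H((x + y) / 2)
random.seed(1)
x, y, n = 0.2, 0.6, 100_000
k = simulate_choices(x, y, random.random, n)
# k / n should be close to H(0.4) = 0.4
```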
It is difficult, if not impossible, to define a set of necessary and sufficient conditions on a choice frequency function k that guarantees the existence of a PMU model. However, such a set of conditions can be defined on a structure of choice probabilities. This result will be the kernel of our approach to constructing a PMU model. Given a choice frequency structure, we will look for maximum likelihood estimates of the
choice probabilities satisfying this set of necessary and sufficient conditions. If estimates of these probabilities are available, the problem of finding values for the alternative coordinates and a distribution function for the ideal coordinate can be solved easily.
2.1.2 Binary Choice Probabilities

Assumption A.4 of the midpoint unfolding theory is basically a probabilistic choice assumption. It implies that each choice out of a pair of alternatives {x, y} can be regarded as the result of an independent Bernoulli trial, where x has a binary choice probability (BCP) p_xy of being chosen. As a consequence, the choice frequency k_xy is a value from a binomial distribution with parameters (n_xy, p_xy). The structure <S, p> will be called a BCP structure. The following relation holds for these binary choice probabilities:

p_xy = Pr(|Z − x| ≤ |Z − y|).   (1)

This is a simple result of the four assumptions made earlier. Relation (1) can be reformulated using the concept of a midpoint. The midpoint between two points x and y is defined as the point m_xy of the metric line for which the distances d_{m_xy,x} = d_{m_xy,y} are equal. Its coordinate value is then defined as m_xy = (1/2)(x + y). If we refer in the following to the midpoints in S, all midpoints m_xy between nonidentical elements x, y of S will be meant. For x < y, equation (1) now becomes:

p_xy = Pr(Z ≤ m_xy) = H(m_xy).   (2)

Through (2), a sequencing of the midpoints in S (a midpoint order) induces an ordering of the choice probabilities; a BCP structure will be said to satisfy midpoint monotonicity if the p_xy are monotone with respect to some midpoint order.

Theorem 1. The following two statements are equivalent.

1. The BCP structure for a binary choice frequency structure <S, k> satisfies midpoint monotonicity.
2. There exists a PMU model for <S, k>.
Proof: The first condition follows from the second as a simple consequence of the relation (2). To show that the second follows from the first, take a set of coordinates satisfying the inequalities (3) derived from the sequencing of the midpoints in the midpoint order. Then define an arbitrary cumulative distribution function such that (2) is satisfied for all midpoints in S. As S is finite, this poses no problem. □

Theorem 1 will be central in our unfolding technique. If the unfolded midpoint order is given, the choice probabilities are known to be monotonic with respect to this order. Consequently, the estimates of these
probabilities will have to satisfy the corresponding ordinal restrictions. An algorithm to calculate these estimates is presented later on. If the unfolded midpoint order is not known, we can, for every possible midpoint order, obtain the corresponding probability estimates subject to midpoint monotonicity. A midpoint order is then defined to be a maximum likelihood unfolded midpoint order in S if there is no other midpoint order for which the maximum likelihood estimates under midpoint monotonicity result in a higher value of the likelihood for <S, k>. One more result will be of use. If there exists a PMU model and the choice probabilities are organized in a matrix with the row and column indices arranged in the unfolded order, then the elements in each row of the resulting matrix do not increase from the left toward the main diagonal and do not decrease from the main diagonal to the right. This pattern has been called characteristic monotonicity by Dijkstra, Van der Eijk, Molenaar, Van Schuur, Stokman, and Verhelst (1980). An example of a BCP matrix satisfying characteristic monotonicity can be found in Table 2. Basically, a BCP structure <S, p> satisfies characteristic monotonicity if there exists a permutation of the alternatives in S such that for each triple of alternatives w, x, y, p_wx ≤ p_wy ≤ p_xy whenever w precedes x and x precedes y in this permutation. Midpoint monotonicity then implies characteristic monotonicity. The reverse does not hold. For an example, take the choice probabilities in Table 2. If we rank them and take the corresponding monotonic permutation of the midpoints, the result is not a midpoint order, because the resulting inequalities (3) on the coordinates are inconsistent. However, if the set S contains five elements or less, characteristic monotonicity always implies midpoint monotonicity.

2.2 A Population of Subjects

In most paired comparisons applications the alternatives are presented to more than one subject.
This occurs when the population of interest for the analysis consists of m subjects. Given this, there exists a wide variety of research designs for this multiple paired comparisons task. Not all designs use the same sampling procedure for the subjects. We will distinguish between the following two sampling schemes.
(SS.l) All pairs are presented at least once to each subject.
(SS.2) On each presentation of a pair of alternatives a subject is randomly sampled from the population. Each subject has a probability p_zi of being selected, with

Σ_{i=1}^{m} p_zi = 1.
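The characteristic monotonicity condition introduced in the previous subsection is easy to check mechanically for a given permutation. A sketch, using made-up probabilities rather than those of Table 2:

```python
def characteristic_monotone(p, order):
    """Check whether a BCP structure satisfies characteristic
    monotonicity for the given permutation: for every triple w, x, y
    with w before x before y in `order`, p_wx <= p_wy <= p_xy.
    `p` maps ordered pairs (first element chosen over second) to
    choice probabilities."""
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            for k in range(j + 1, len(order)):
                w, x, y = order[i], order[j], order[k]
                if not (p[w, x] <= p[w, y] <= p[x, y]):
                    return False
    return True

# hypothetical 4-alternative example consistent with the order a, b, c, d
p = {('a','b'): 0.55, ('a','c'): 0.70, ('a','d'): 0.85,
     ('b','c'): 0.80, ('b','d'): 0.90, ('c','d'): 0.95}
```

Lowering p[('a','d')] below p[('a','c')] breaks the triple (a, c, d) and the check fails, as midpoint monotonicity would also require.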
Both sampling schemes imply that we have as data a set of binary choice frequency structures <S, k_i> (i = 1, …, m) as described earlier.
2.2.1 Assumptions

We start by assuming that for each subject i (i = 1, …, m) the assumptions A.1 to A.5 defined earlier hold. This means that for each subject there exists a PMU model with coordinates x_i, y_i, m_xyi and a cumulative distribution function H_i for the ideal coordinate. It is typical for the unfolding theory to assume some additional structure relating the metrics in the individual PMU models. This assumption follows from the premise that a considerable degree of intersubjectivity is to be expected in the dissimilarity pattern for the alternatives. In deterministic unidimensional unfolding (Coombs, 1964) either one of the following two assumptions is made.

A.6
(Joint unfolded order) There exists a permutation of the elements in S that is an unfolded order for each subject i (i = 1, …, m).
A.7
(Joint unfolded midpoint order) There exists a permutation of the midpoints in S that is an unfolded midpoint order for each subject i (i = 1, …, m).
It will be clear that A.7 implies A.6, but not conversely. Both follow from the stronger assumption that the metrics of all subjects are proportionally related. The latter is frequently assumed in probabilistic unfolding; it will not be needed in the present approach. The assumptions A.1-A.6 or A.1-A.7 define the joint probabilistic midpoint unfolding (JPMU) theory. In the following subsection we will examine necessary and sufficient conditions on the binary choice probabilities for the existence of a JPMU model.
2.2.2 Binary Choice Probabilities

We will have to distinguish between situations in which the first sampling scheme SS.1 has been followed and situations in which the second scheme SS.2 has been adopted. We start with the former. Suppose the assumptions A.1 to A.7 hold. In that case there exists a PMU model for each subject. Assumption A.7 then implies that midpoint monotonicity is satisfied in all individual BCP structures for the joint unfolded midpoint order. Obviously midpoint monotonicity is also satisfied in every individual BCP structure if assumption A.6 holds, but a joint unfolded midpoint order does not necessarily exist. Yet through assumption A.6 all individual unfolded midpoint orders have to be related. More specifically, characteristic monotonicity has to be satisfied in each BCP structure for the joint unfolded order. These results are still valid in case sampling scheme SS.2 has been followed. Yet the construction of a JPMU model may be severely handicapped if a large number of the n_xyi are zero. In that case several probabilities p_xyi cannot be estimated. In the extreme situation where every subject has chosen out of only one pair of alternatives, all midpoint orders will be equivalent in terms of likelihood, because only one subject-dependent choice probability can be estimated. To deal with these situations we will follow a different approach. If sampling scheme SS.2 has been adopted, the binary choice probability p_xy that x is chosen by a subject selected at random can be expressed as

p_xy = Σ_{i=1}^{m} p_zi p_xyi.   (4)
We will now formulate the necessary conditions for the existence of a JPMU model on the "joint" choice probabilities p_xy. Since we assume that a PMU model exists for each subject, the individual BCP structures satisfy midpoint monotonicity. If there exists a joint unfolded midpoint order (A.7), the monotonicity is preserved in the joint BCP structure through addition (equation (4)). In a similar way it can be shown that the joint BCP structure satisfies characteristic monotonicity for the joint unfolded order if assumption A.6 holds.
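Equation (4) is simply a mixture of the individual BCP structures, weighted by the subject sampling probabilities; since the weights are nonnegative, any ordering shared by all individual structures carries over to the joint one. A small sketch with two hypothetical subjects:

```python
def joint_bcp(p_z, p_ind):
    """Joint binary choice probabilities under sampling scheme SS.2
    (equation (4)): p_xy = sum_i p_zi * p_xyi, mixing the individual
    BCP structures with the sampling probabilities p_zi."""
    pairs = p_ind[0].keys()
    return {pair: sum(w * p[pair] for w, p in zip(p_z, p_ind))
            for pair in pairs}

# two hypothetical subjects whose BCPs share the ordering
# p_ab <= p_bc <= p_ac; the mixture preserves it
p1 = {('a','b'): 0.6, ('a','c'): 0.8, ('b','c'): 0.7}
p2 = {('a','b'): 0.5, ('a','c'): 0.9, ('b','c'): 0.6}
p = joint_bcp([0.3, 0.7], [p1, p2])
```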
Recapitulating, the following strategies may be followed. If assumption A.6 holds, there are two sampling-dependent strategies.
(SS.1) For each subject, find the unfolded midpoint order using midpoint monotonicity on the choice probabilities, under the condition that characteristic monotonicity holds within each BCP structure for the joint unfolded order.

(SS.2) Find the joint unfolded order by using characteristic monotonicity on the joint choice probabilities.

If assumption A.7 is assumed to hold, these strategies are altered as follows.

(SS.1) Find the joint unfolded midpoint order by using midpoint monotonicity on every individual BCP structure.

(SS.2) Find the joint unfolded midpoint order by using midpoint monotonicity on the joint choice probabilities.
3. Estimation of the Probabilities

In this section a general algorithm will be described to find the maximum likelihood estimates of binomial probabilities under order restrictions. As data we have a binary choice frequency structure <S, k>. Let T be a set of ordered pairs (x, y) of the set of alternatives S. For each pair of alternatives x, y ∈ S, either (x, y) is a member of T, (y, x) is a member of T, or neither (x, y) nor (y, x) is a member of T. The probabilistic assumption A.4 implies that the binary choice frequencies k_xy are values from a binomial distribution with parameters (n_xy, p_xy). Let q_xy be the choice proportion of x in {x, y}: q_xy = k_xy / n_xy. If a function g assigns estimates of the BCP p in T, then the log likelihood of the choice frequencies in T is equal to the function

L_T(k : g) = Σ_T n_xy [q_xy ln g_xy + (1 − q_xy) ln(1 − g_xy)]   (5)

plus an additive constant, where Σ_T denotes summation over all elements (x, y) of T. Let R be a reflexive, transitive binary relation on T. The relation R then establishes a partial order on the set T. Assume the estimates g_xy of
the BCP p_xy in T are known to satisfy the following restrictions:

g_xy ≤ g_vw  whenever (x, y) R (v, w).   (6)
The problem of finding the maximum likelihood estimates of the probabilities p_xy in T, conditional on R, consists of finding the function f that maximizes the likelihood (5) within the set of all functions g satisfying the restrictions (6). If the ordinal restrictions are satisfied by the choice proportions, the latter are the conditional maximum likelihood estimates. If they are not, some other function f satisfying the order restrictions and maximizing the likelihood has to be found. We define two functions n and q on the power set of T. For each subset B ⊂ T:

n_B = Σ_B n_xy,    q_B = (1/n_B) Σ_B n_xy q_xy,

which implies that n_B contains the sum of presentations and q_B the weighted average of the choice proportions in B. The basic principles of the algorithm are embodied in Lemma 1 and Lemma 2.
Lemma 1. If, within a subset B ⊂ T, all estimates are equal, then the likelihood L_B is maximized for f_xy = q_B for all (x, y) ∈ B.
Proof: Let f_xy = s for a real s within B. Obviously, if f_xy = s, the function f satisfies the restrictions (6) in B. The likelihood L_B then can be expressed as a function of s:

L_B(k : s) = Σ_B n_xy [q_xy ln s + (1 − q_xy) ln(1 − s)]
           = n_B [q_B ln s + (1 − q_B) ln(1 − s)].   (7)

It is well known that the function L_B(s) reaches a unique maximum at s = q_B. □
For the purpose of what follows, one last piece of terminology needs to be introduced. We will call a partition of a subset B ⊂ T into k subsets B_j (k > 1) an R-consistent partition of B if for every two subsets B_i, B_j with q_Bi < q_Bj there are no elements (v, w) of B_i and (x, y) of B_j for which (x, y) R (v, w). Call such a partition the greatest R-consistent partition of B if there does not exist an R-consistent partition for any of the subsets B_i in this partition.
Lemma 2. Let g_xy = q_B for all (x, y) in a subset B ⊂ T. The following two statements are equivalent.

1. There exists an R-consistent partition of B.
2. There exists a function f on T satisfying (6) such that f increases the likelihood in B: L_B(k : f) > L_B(k : g).
Proof: First we show that (2) follows from (1). Because

q_B = Σ_{i=1}^{k} (n_Bi / n_B) q_Bi

and because of the convexity of the function (7), the result follows. To show that (1) follows from (2), create a partition of B by assigning two elements (x, y), (v, w) to the same subset B_i if and only if f_xy = f_vw. Since f satisfies the restrictions (6), the resulting partition is R-consistent. Set h_xy = q_Bi for all (x, y) in each subset B_i in this partition. Then, by Lemma 1, L_B(k : h) > L_B(k : g). □

The following theorem now can be proven.

Theorem 2. If for a subset B ⊂ T either

1. there exists an R-consistent partition of B, this partition is the greatest R-consistent partition, and f_xy = q_Bi for all (x, y) in each subset B_i in this partition, or
2. there does not exist an R-consistent partition, and f_xy = q_B for all (x, y) in B,

then the function f maximizes the likelihood in B.
Proof: Suppose there exists a function g on the elements of B such that L_B(k : g) > L_B(k : f). Through Lemma 2, the latter implies that there exists
an R-consistent partition of this subset B_i. Since this contradicts the assumptions, such a function g does not exist: f maximizes the likelihood in B. □

By Theorem 2 the problem of finding the function satisfying (6) and maximizing (5) can be solved by finding the greatest R-consistent partition of T, if there exists one. In the algorithm we propose, the set T is partitioned (not necessarily R-consistently) into two subsets, say T_1 and T_2. Initially T_1 contains only one element. Then, one by one, the elements of T_2 are transferred to T_1, and each time the greatest R-consistent partition of the new T_1 is found. If, finally, T_1 = T, the maximum likelihood estimates in T have been found. A more detailed description of this algorithm can be found in Bossuyt (1987). Both the definition of the likelihood and the binary relation R refer to the case of a single subject. However, the extension to the case of a group of subjects with sampling scheme SS.1 or SS.2 is straightforward. For sampling scheme SS.2, the algorithm is applied to the joint frequencies. For sampling scheme SS.1, the algorithm is repeated for each of the m frequency structures, with the same set T and the same relation R. It would be interesting to have a statistical test of the hypothesis that the probabilities satisfy the order restrictions as defined by (6) versus the alternative that they do not. A generalized likelihood ratio test seems indicated, since the maximum of the likelihood can be calculated both conditionally and unconditionally. Unfortunately the distribution of the statistic under the null hypothesis cannot be traced easily. For large sample frequencies, this distribution is a weighted chi-square distribution, but the weights for characteristic and midpoint monotonicity are hard to obtain (Robertson, Wright, & Dykstra, 1988). To overcome this difficulty we suggest a nonparametric estimation of the relevant quantiles of these distributions using order statistics.
This can be done by calculating the value of the likelihood ratio for a large number of binary choice frequency structures generated by Monte Carlo simulations with parameters satisfying the order restrictions. A test with an approximate size α can then be based on the estimated 1 − α quantile of the distribution of these values.
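For the special case in which R totally orders T, the conditional maximum likelihood estimation of this section reduces to pooling adjacent violators: each pooled block receives its weighted average q_B, as Lemma 1 prescribes. A sketch of this special case (the algorithm described above handles general partial orders; this one assumes a total order):

```python
def pava(q, n):
    """Maximum likelihood binomial probability estimates under the
    total-order restriction f_1 <= f_2 <= ... <= f_k: pool adjacent
    violating blocks, each pooled block taking the weighted average
    q_B of its choice proportions (Lemma 1), until the fitted values
    are nondecreasing."""
    # each block: [total weight n_B, weighted mean q_B, element count]
    blocks = []
    for qi, ni in zip(q, n):
        blocks.append([ni, qi, 1])
        # merge while the last block violates the order restriction
        while len(blocks) > 1 and blocks[-2][1] > blocks[-1][1]:
            nb, qb, cb = blocks.pop()
            na, qa, ca = blocks.pop()
            nt = na + nb
            blocks.append([nt, (na * qa + nb * qb) / nt, ca + cb])
    out = []
    for nb, qb, cb in blocks:
        out.extend([qb] * cb)
    return out

# proportions 0.2, 0.6, 0.4 with equal weights violate the restriction
# at positions 2-3; PAVA pools them into their average 0.5
est = pava([0.2, 0.6, 0.4], [10, 10, 10])
# est == [0.2, 0.5, 0.5]
```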
4. The Maximum Likelihood Unfolded Order

In this section a branch and bound algorithm is described to find the maximum likelihood unfolded midpoint order in the case of a single subject and the maximum likelihood unfolded order in the case of a group of subjects. Again, the extension to the remaining cases is comparatively straightforward. A branch and bound algorithm guarantees that the resulting solution is optimal because it evaluates all possible permutations at least implicitly. The branch and bound principle will be described first in its general form. In the remaining subsections the details for the case of the unfolded order and the unfolded midpoint order will be specified.

4.1 Branch and Bound

The algorithm first calculates the value of the likelihood under no order restrictions, L_max. Then, by some suboptimal method, an initial permutation is generated. The corresponding conditional estimates are found, and the value of the likelihood function is calculated. If this value equals L_max, the initial permutation is a maximum likelihood solution and the algorithm stops. If the likelihood for the initial permutation is lower than L_max, its value is stored as L_cut, and the initial permutation is stored as a provisional solution. Next the algorithm generates a permutation tree. An example of a permutation tree for a set of five elements is given in Figure 1. Let r* be the number of elements in the required permutation. Except for the branch at level 1, each branch at level r in this tree corresponds to a subset of permutations in which the (r − 1) leftmost elements are as specified by the labels on the nodes along the path to it. For example, a path through branch a at level 2, branch b at level 3, and branch c at level 4 in Figure 1 corresponds to the subset of all permutations in which abc are the three leftmost elements: abcde and abced. Consequently, the branch at level 1 corresponds to the set of all permutations of the elements in S, and a path down to level r* corresponds to one permutation only.
The algorithm looks for possible improvements on the initial solution in the following way. Starting from level 1 it consecutively tries to establish a path along branches between nodes down to the lowest level. From a branch at level r all branches to level r + 1 are examined
Maximum Likelihood Unidimensional Unfolding
Figure 1. A partial look at a permutation tree for five elements.
for feasibility. One device for evaluating feasibility, common to all branch and bound schemes, is to calculate the upper bound of the likelihood in the subset of permutations that corresponds to the path down to the branch at level r + 1. This maximum can be calculated using the restrictions on the estimates that are shared by all permutations in the corresponding subset. If the upper bound of the likelihood is lower than the current cutoff value Lcut, the branch is discarded: no element of the corresponding subset of permutations will lead to an improvement on the current provisional solution. The algorithm then continues to evaluate the remaining branches from level r to r + 1. If the upper bound is higher than or equal to the current cutoff value Lcut, the procedure is repeated at level r + 1. If the algorithm arrives at the lowest level, the upper bound is equal to the likelihood for the probabilities corresponding to a single permutation. If this likelihood is equal to the cutoff criterion, the permutation is equal to the provisional solution in terms of the likelihood. It is stored, and the search continues. If the likelihood is higher than the cutoff criterion, the permutation replaces the provisional solution(s) and the corresponding value of the likelihood function becomes the new cutoff criterion Lcut. If this new criterion equals La, the provisional solution is a maximum likelihood solution and the search stops.
Bossuyt & Roskam
If all branches from level r to r + 1 have been evaluated, the algorithm backtracks to level r - 1 along the path and checks whether all branches from this level have been evaluated. If not, the next branch is examined. Otherwise, the algorithm backtracks to the branch at level r - 2 along the path. If, ultimately, the algorithm has backtracked to level 1 and all branches have been examined for feasibility, the current provisional solution has to be a maximum likelihood solution.

4.2 Characteristic Monotonicity

When looking for a maximum likelihood permutation of the alternatives in S, the permutation tree contains as many levels as there are elements in S. An initial solution can be found by selecting the smallest choice proportion, say pab, and taking a and b to be the first two elements in the permutation. A sequencing of the remaining alternatives can be based on the rule: v precedes w if pav <= paw. This rule has been described by Greenberg (1965), who refers to a suggestion from Coombs.

Before calculating the upper bound of the likelihood to examine the feasibility of a branch, the algorithm first checks for the presence of permutations in the corresponding subset that have already been implicitly evaluated. Since the permutations abcde and edcba will lead to equivalent values of the likelihood, only one of them needs to be evaluated. If all permutations in a subset have been evaluated at an earlier stage, the branch can be discarded. The upper bound under characteristic monotonicity in a subset of permutations is calculated using the algorithm described in the previous section.

We will illustrate the construction of the set of ordered pairs T and the binary relation R by an example, with S = {a,b,c,d,e}. Suppose a path along the branches a, b has turned out to be feasible and the branch c at level 4 has to be examined. The set T is then composed of the ordered pairs (a,b), (a,c), (a,d), (a,e), (b,c), (b,d), (b,e), (c,d), (c,e).
The relation R contains as elements
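The branch and bound search described in this section can be sketched generically. The objective and bound functions below are hypothetical stand-ins (a position-weighted score whose bound completes a prefix optimally), not the likelihood computations of the paper; the sketch only illustrates the tree traversal with cutoff pruning.

```python
from itertools import permutations

def branch_and_bound(elements, score, bound):
    """Maximize `score` over permutations of `elements`, discarding any
    branch whose upper `bound` cannot beat the current cutoff value."""
    best = {"perm": None, "value": float("-inf")}

    def search(prefix, remaining):
        if not remaining:                      # lowest level: one permutation
            value = score(prefix)
            if value > best["value"]:
                best["perm"], best["value"] = tuple(prefix), value
            return
        for x in sorted(remaining):            # branch to the next level
            rest = remaining - {x}
            prefix.append(x)
            # keep the branch only if some completion could still improve
            # on (or tie with) the cutoff
            if bound(prefix, rest) >= best["value"]:
                search(prefix, rest)
            prefix.pop()

    search([], set(elements))
    return best["perm"], best["value"]

# Hypothetical objective: a position-weighted sum.  Its bound completes the
# prefix optimally (largest weights last), so it never underestimates the
# best permutation in the subtree.
weights = {"a": 3, "b": 1, "c": 4, "d": 2}

def score(perm):
    return sum((pos + 1) * weights[item] for pos, item in enumerate(perm))

def bound(prefix, remaining):
    return score(list(prefix) + sorted(remaining, key=weights.get))

best_perm, best_value = branch_and_bound(weights, score, bound)
```

In the paper's setting, `score` would be the conditional maximum of the likelihood for a single permutation and `bound` the likelihood maximized under only those order restrictions shared by the whole subset of permutations.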
4.3 Midpoint Monotonicity
When looking for a maximum likelihood midpoint order, the permutation tree has s(s - 1)/2 levels: the number of midpoints in S. An initial solution is found by taking a maximum likelihood permutation under characteristic monotonicity, and using the following rule: mwx precedes mwy if w precedes x and x precedes y in the unfolded order. If this permutation of the midpoints is a midpoint order, a maximum likelihood solution has been found. If the number of elements in S does not exceed five, the permutation will always be a midpoint order. If the number of elements in S exceeds five and the permutation is not a midpoint order, some midpoint order consistent with the maximum likelihood permutation under characteristic monotonicity is arbitrarily selected.

Three devices are used to evaluate the feasibility of a branch. A branch is discarded

a. if all permutations in the subset have been explicitly or implicitly evaluated, or

b. if the subset does not contain any midpoint orders, or

c. if the value of the upper bound in the subset is lower than the current cutoff criterion.

Device (b) is invoked because not every permutation of the midpoints is a midpoint order. To check this, the algorithm takes the inequalities (3) that are shared by all elements in the subset of permutations that is to be examined, and sees if there exists a solution. For this purpose we use an algorithm by Chernikova (1965), modified by Nagels and Elzinga (Roskam, 1987). If there is no solution, the subset does not contain any midpoint orders and the branch is discarded.

Though this algorithm always produces a maximum likelihood solution, it soon becomes very time-consuming as the number of midpoints increases. In that case some suboptimal modifications could turn out to be necessary. One modification consists of evaluating only those midpoint orders that are consistent with the maximum likelihood order under characteristic monotonicity. This leads to a considerable reduction in the size of the permutation tree, but the amount of time necessary to evaluate all the branches might still lead to problems. For a large set of alternatives S
a suboptimal pairwise interchange strategy could be used.
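Such a pairwise interchange strategy amounts to a greedy local search over adjacent transpositions. A minimal sketch with a hypothetical objective function (in the paper's setting the objective would be the conditional maximum of the likelihood for a permutation):

```python
def pairwise_interchange(perm, objective):
    """Greedy local search: repeatedly swap adjacent elements, keeping a
    swap whenever it improves the objective, until no swap helps.  The
    result is only locally optimal, hence 'suboptimal'."""
    perm = list(perm)
    current = objective(perm)
    improved = True
    while improved:
        improved = False
        for i in range(len(perm) - 1):
            perm[i], perm[i + 1] = perm[i + 1], perm[i]
            trial = objective(perm)
            if trial > current:
                current, improved = trial, True
            else:
                perm[i], perm[i + 1] = perm[i + 1], perm[i]  # undo the swap
    return perm, current

# Hypothetical objective: a position-weighted sum of item weights.
weights = {"a": 3, "b": 1, "c": 4, "d": 2, "e": 5}

def objective(perm):
    return sum((pos + 1) * weights[item] for pos, item in enumerate(perm))

local_perm, local_value = pairwise_interchange(["c", "a", "e", "b", "d"], objective)
```

For this particular objective every adjacent inversion strictly improves the score, so the local search happens to reach the global optimum; in general it need not.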
5. An Example

Greenberg (1965) asked 163 housewives to choose within all pairs of nine phrases describing possible attitudes towards the Volkswagen automobile. The phrases ranged from excellent (A), through indifferent (E), to terrible (I). Greenberg attempted to find an underlying midpoint order by using midpoint monotonicity on the resulting choice proportions. Unfortunately, his attempt was not successful. As can be concluded from an inspection of Table 1, the choice proportions do not satisfy midpoint monotonicity; they do not even satisfy characteristic monotonicity. Greenberg attributed this to "sampling error".

Table 1. Choice proportions, based on the proportions collected by Greenberg (1965).
        A      B      C      D      E      F      G      H      I
A   0.500  0.859  0.804  0.822  0.693  0.669  0.620  0.583  0.491
B   0.141  0.500  0.798  0.730  0.626  0.577  0.589  0.503  0.429
C   0.196  0.202  0.500  0.663  0.577  0.521  0.466  0.436  0.374
D   0.178  0.270  0.337  0.500  0.515  0.460  0.417  0.387  0.245
E   0.307  0.374  0.423  0.485  0.500  0.350  0.350  0.227  0.209
F   0.331  0.423  0.479  0.540  0.650  0.500  0.264  0.239  0.141
G   0.380  0.411  0.534  0.583  0.650  0.736  0.500  0.215  0.153
H   0.417  0.497  0.564  0.613  0.773  0.761  0.785  0.500  0.153
I   0.509  0.571  0.626  0.755  0.791  0.859  0.847  0.847  0.500
By multiplying the proportions reported in Greenberg (1965) by 163 we obtained a set of choice frequencies. The choice proportions based on these frequencies (Table 1) are not entirely equal to the proportions in Greenberg (1965), which seems to imply that not all subjects made all choices. Greenberg used sampling scheme SS.1 for his subjects. However, since the individual choice frequencies are unknown, we proceed as if sampling scheme SS.2 had been adopted. It is reasonable to assume that there exists a joint unfolded order (A.6) of the nine phrases within Greenberg's population of housewives. Therefore we looked for the
maximum likelihood joint unfolded order using characteristic monotonicity. Not surprisingly, this order corresponded to the a priori order of the nine phrases. The maximum likelihood estimates can be found in Table 2. The corresponding value of the -2 log likelihood ratio was 0.431. This value of the test statistic is lower than the .95 quantile of the distribution under characteristic monotonicity (13.315), estimated in a series of 500 Monte Carlo simulations.

Table 2. Choice probabilities estimated under characteristic monotonicity, based on the proportions in Table 1.
        A      B      C      D      E      F      G      H      I
A   0.500  0.859  0.813  0.813  0.693  0.669  0.620  0.583  0.491
B   0.141  0.500  0.798  0.730  0.626  0.583  0.583  0.503  0.429
C   0.187  0.202  0.500  0.663  0.577  0.521  0.466  0.436  0.374
D   0.187  0.270  0.337  0.500  0.515  0.460  0.417  0.387  0.245
E   0.307  0.374  0.423  0.485  0.500  0.350  0.350  0.233  0.209
F   0.331  0.417  0.479  0.540  0.650  0.500  0.264  0.233  0.149
G   0.380  0.417  0.534  0.583  0.650  0.736  0.500  0.215  0.149
H   0.417  0.497  0.564  0.613  0.767  0.767  0.785  0.500  0.149
I   0.509  0.571  0.626  0.755  0.791  0.851  0.851  0.851  0.500
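The Monte Carlo estimate of the critical value quoted above follows a general recipe: simulate choice frequencies under the fitted probabilities, recompute the test statistic on each simulated sample, and take the .95 quantile of the simulated values. The sketch below illustrates that recipe only; a plain binomial likelihood ratio statistic stands in for the paper's order-restricted statistic, and the fitted probabilities are hypothetical.

```python
import math
import random

def lr_statistic(k, n, p):
    """-2 log likelihood ratio of the observed frequency k/n against a
    hypothesized probability p, with 0 * log 0 taken as 0."""
    stat = 0.0
    if k > 0:
        stat += k * math.log((k / n) / p)
    if k < n:
        stat += (n - k) * math.log(((n - k) / n) / (1 - p))
    return 2.0 * stat

def mc_critical_value(p_fit, n_subjects, n_sim=500, q=0.95, seed=7):
    """Parametric bootstrap: simulate binomial choice frequencies under
    the fitted probabilities and return the q-quantile of the statistic."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        total = 0.0
        for p in p_fit:
            k = sum(rng.random() < p for _ in range(n_subjects))
            total += lr_statistic(k, n_subjects, p)
        sims.append(total)
    sims.sort()
    return sims[max(0, int(q * n_sim) - 1)]

# Hypothetical fitted probabilities for three choice pairs, 163 subjects.
critical = mc_critical_value([0.5, 0.7, 0.3], n_subjects=163)
```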
6. Discussion

The probabilistic unfolding models for binary choices can be divided into two categories. The first category contains the random configuration models (Croon, in press). These models are inspired by the Thurstonian scaling approach. They assume that either the ideal coordinates and/or the alternative coordinates, or the ideal-alternative distances, are random variables. Examples are the models proposed by Bechtel (1968), Coombs, Greenberg and Zinnes (1961), Croon (in press), Ramsay (1980), and Zinnes and Griggs (1974). The second category contains models that are inspired by other scaling approaches. Examples are the Bradley-Terry-Luce approach (Schonemann & Wang, 1972), the Rasch model (Sixtl, 1973; Jansen, 1981), and the Fechnerian scaling model (see Bossuyt & Roskam, 1985).

The present model belongs to the first category. It lacks an advantage of the other models in this category, since it does not provide exact
estimates of the ideal and alternative coordinates. Instead, the maximum likelihood unfolded order or unfolded midpoint order can be used to define a solution space for the coordinate values. However, the other models acquire this advantage at the cost of strong parametric assumptions. Bechtel (1968), for example, has specified a model much the same as ours, but he assumes that the distribution function is normal. As Sixtl (1973) has argued, this assumption is likely to be violated in most choice situations.

A disadvantage of all existing probabilistic unfolding models in both categories is that they assume that there exists a joint metric. There is considerable evidence that this condition is not always met in practice. Sherif and Sherif (1967, 1969), for example, demonstrated that there are situations in which a joint unfolded order exists without intersubjective agreement on the interalternative dissimilarities. The procedure presented in this paper offers a way out of this difficulty by presenting the user with the choice between two assumptions: the existence of a joint unfolded order versus the existence of a joint unfolded midpoint order. If necessary, a generalized likelihood ratio test with approximate size can be used to test the corresponding hypotheses on the binary choice probabilities.

It is not difficult to extend the applicability of the approach proposed in this paper to related theories of probabilistic unidimensional unfolding, or to probabilistic choice theories not involving the concept of an ideal alternative. In fact, any theory for which necessary ordinal conditions on the choice probabilities can be formulated lends itself to this strategy. This approach has been successful in a series of experiments designed to evaluate the appropriateness of probabilistic unidimensional unfolding models for paired comparisons data (Bossuyt & Roskam, 1985).
References

Bechtel, G. G. (1968). Folded and unfolded scaling of preferential pair comparisons. Journal of Mathematical Psychology, 5, 333-357.
Bossuyt, P. M. (1987). An algorithm for finding the maximum likelihood estimates of partially ordered binomial probabilities. Unpublished internal report, Mathematical Psychology Group, K. U. Nijmegen.
Bossuyt, P. M., & Roskam, E. E. (1985). A nonparametric test of
probabilistic unfolding models. Paper presented at the 4th European Meeting of the Psychometric Society, Cambridge, Great Britain.
Bossuyt, P. M., & Roskam, E. E. (1987). Testing probabilistic choice models. Communication & Cognition, 1, 5-16.
Chernikova, N. V. (1965). Algorithm for finding a general formula for the non-negative solutions of a system of linear inequalities. U.S.S.R. Computational Mathematics and Mathematical Physics, 5, 228-233.
Coombs, C. H. (1950). Psychological scaling without a unit of measurement. Psychological Review, 57, 145-158.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Coombs, C. H., Greenberg, M., & Zinnes, J. (1961). A double law of comparative judgment for the analysis of preferential choice and similarity data. Psychometrika, 26, 165-171.
Croon, M. (in press). A comparison of statistical unfolding models. Psychometrika.
DeSarbo, W. S., & Hoffman, D. L. (1986). Simple and weighted unfolding threshold models for the spatial representation of binary choice data. Applied Psychological Measurement, 10, 247-264.
Dijkstra, L., Van der Eijk, C., Molenaar, I. W., Van Schuur, W. H., Stokman, F. N., & Verhelst, N. (1980). A discussion on stochastic unfolding. Methoden & Data Nieuwsbrief, 5, 158-175.
Greenberg, M. G. (1965). A method of successive cumulations for the scaling of pair-comparison preference judgments. Psychometrika, 30, 441-448.
Jansen, P. G. W. (1981). Spezifisch objektive Messung im Falle nichtmonotoner Einstellungsitems. Zeitschrift für Sozialpsychologie, 12, 169-185.
Ramsay, J. O. (1980). The joint analysis of direct ratings, pairwise preferences and dissimilarities. Psychometrika, 45, 149-166.
Robertson, T., Wright, F. T., & Dykstra, R. L. (1988). Order restricted statistical inference. New York: John Wiley & Sons.
Roskam, E. E. (1987). ORDMET3: An improved algorithm to find the maximin solution to a system of linear inequalities. Internal report 87 MA 06, Mathematical Psychology Group, K. U. Nijmegen.
Schonemann, P. H., & Wang, W. M. (1972). An individual differences model for the multidimensional analysis of preference data. Psychometrika, 37, 275-309.
Sherif, M., & Sherif, C. W. (1967). The own categories procedure in attitude research. In M. Fishbein (Ed.), Readings in attitude theory and measurement. New York: John Wiley & Sons.
Sherif, M., & Sherif, C. W. (1969). Social psychology. New York: Harper & Row; Tokyo: Weatherhill.
Sixtl, F. (1973). Probabilistic unfolding. Psychometrika, 38, 235-248.
Zinnes, J. L., & Griggs, R. A. (1974). Probabilistic multidimensional unfolding analysis. Psychometrika, 39, 327-350.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
LATENT CLASS MODELS FOR THE ANALYSIS OF RANKINGS

Marcel A. Croon
Tilburg University

In this paper several latent class models for the analysis of rank order data are developed and discussed. These models try to accommodate the rationale of individual choice models to the situation in which a large number of respondents is sampled from a non-homogeneous population. By considering these individual choice models as statistical error theories, they may be seen to fall within the domain of general latent structure analysis and, as such, they may provide a viable alternative to the more traditional scaling methods for the analysis of rankings.
1. Introduction

For the analysis of rank order data several more or less traditional methods are available. A first, very broad class of data analysis methods belongs to the domain of what is commonly called scaling techniques and encompasses various methods which all essentially aim at a geometrical or pictorial representation of the data. This first class of methods can further be subdivided into two subclasses depending upon the geometric model on which the methods are based. Here we refer of course to the well-known distinction between vector and distance models. Unfolding analysis belongs to the subclass of the distance models, since its main objective is to represent subjects and stimuli as points in a joint space in such a way that the rank order of the distances between one particular subject point and the stimulus points optimally reflects the observed preference ranking of the stimuli by the corresponding subject. (This paper is a revised version of an article published in Zeitschrift für experimentelle und angewandte Psychologie, 1988, 35, 1-22.) Vector
models, on the other hand, usually represent subjects by means of vectors or directions in the joint space, while the stimuli remain mapped onto points. In these models the orthogonal projections of the stimulus points on the subject vectors are assumed to be related to the observed evaluation scores or rankings. Quite often these geometric scaling models succeed in adequately representing and summarizing the essential information in the data. Occasionally, however, situations arise in which these scaling methods seem less attractive and appropriate. This is for instance the case when a large sample of respondents is asked to rank a small number of stimuli on an evaluation criterion. In such a situation it is very likely that almost all, or at least a majority, of all possible rankings will indeed occur in the sample. As will be shown in a later section of this paper, distance or vector scaling models have some difficulties in adequately representing such abundant data, notwithstanding the fact that, when fitting a scaling model, a large number of parameters is estimated. In these situations, some relief may eventually be given by the use of alternative methods, some of which will be developed in this paper.

A second class of methods is more closely connected with the interest mathematical psychologists and economists have shown in the development of individual choice models. A landmark in this tradition is undoubtedly Luce's (1959) monograph Individual choice behavior, in which, starting from a not too unacceptable axiom, the author derives a fairly simple unidimensional choice model. The same model was described some years earlier in a more statistically oriented way by Bradley and Terry (1952). Even much earlier, the German set-theoretician Zermelo (1929) gave it some consideration in solving some chess tournament problems.
(Amazingly enough, quite recently another set-theoretician (Jech, 1983) arrived in a seemingly independent way at the same model when analyzing similar tournament problems.) Although the BTL model, as it has been known since then, has been used primarily for the analysis of paired comparisons data, it can easily be adapted for the analysis of rank order data. Luce (1959) himself devoted some pages of his monograph to this extension, but his remarks were mainly of a theoretical nature. Similar theoretical remarks on the analysis of rank order data by means of individual choice models can be found in Block and Marschak (1960) and in Luce and Suppes (1965). From a more statistical point of view, the adaptation of the basic rationale of the BTL
model to the analysis of rankings was treated by Pendergrass and Bradley (1960), Fienberg and Larntz (1976), and Beaver (1977). A related reference is Plackett (1975). As will be shown in the next section, all these models for the analysis of rankings lead to still manageable expressions for the ranking probabilities in terms of a small number of parameters which represent the stimulus scale values on an underlying unidimensional continuum. Due to the relative simplicity of these expressions, the maximum likelihood estimates of the unknown parameters can be determined for most data sets by a rapidly converging algorithm such as the Newton-Raphson procedure. After obtaining these estimates, various statistical tests can be performed in order to determine whether the proposed model provides an acceptable fit for the data at hand.

Unfortunately, it is precisely at this point that most users will be disappointed with the final result. Quite frequently, and especially so in the case of large samples of respondents, these statistical tests will indicate a very bad model fit, necessitating the conclusion that ultimately the proposed model does not apply to the data. The reason for this unfortunate state of affairs is, however, easy to give. The BTL model is a model for individual choice behavior. If we apply this model to the rankings observed in a random sample of respondents from a particular population, we implicitly assume that all members of this population perceive and evaluate the stimuli in essentially the same way. This strong assumption of complete homogeneity in the population is certainly untenable in social psychological applications of the BTL model. People consistently differ in their stimulus evaluations, and any analysis which does not leave room for these interindividual differences to show up is doomed to fail and to misrepresent the data.
In this paper an attempt is made to accommodate the rationale of the BTL model to the case in which respondents are sampled from a non-homogeneous population by linking this choice model to latent class analysis. Originally, latent class analysis, and latent structure analysis in general, was proposed by Lazarsfeld (see e.g., Lazarsfeld & Henry, 1968) to explain associations between observed variables in terms of unobserved latent variables. In our application of it we assume that the non-homogeneous population can be partitioned into several subpopulations, each of them being homogeneous with respect to the stimulus evaluations. In this way each subpopulation defines a latent class which is
characterized by a particular set of stimulus scale values.
2. Latent Class Models for Rankings

We first introduce some notation. Since we assume in the sequel that only a finite number n of stimuli are used in a ranking experiment, we may represent these stimuli by the first n natural numbers. Hence, if S denotes the stimulus set, we have

S = {1, 2, ..., i, ..., n}.

Furthermore, since we will only consider the case in which the subjects are required to rank the entire set of stimuli on some evaluation criterion, the ranking given by any particular subject may be represented by an ordered n-tuple r:

r = (r1, r2, ..., rk, ..., rn),

in which r1 is the number of the stimulus ranked first by the subject. In general, rk is the stimulus occupying the k-th position in the subject's ranking. The probability that a randomly selected subject will give the ranking r will be represented by pr or, if necessary, more explicitly by p(r1, r2, ..., rn).
Next we will discuss two adaptations of the basic BTL model to the ranking task. Both models lead to manageable expressions for the ranking probabilities pr in terms of some unknown parameters, which may be interpreted as the stimulus scale values. Since the first ranking model we will discuss is related to the strict random utility formulation of the BTL model, we will from now on refer to it as the strict utility ranking (SU for short) model, and since our second ranking model is based on the model proposed by Pendergrass and Bradley (1960) for the analysis of triple rankings, we will refer to it in the sequel as the Pendergrass-Bradley (PB for short) model.
2.1 The Strict Utility Ranking Model

A first adaptation of the BTL model starts from the well-known observation that the BTL model is compatible with a particular random utility model. Yellott (1977) is one of the most relevant references in this
respect. Suppose that the presentation of an arbitrary stimulus i results on the part of the subject in a subjective impression or evaluation, the strength of which may be represented by a real number ui. This real number is not to be considered as an unknown constant, but as a realization of a random variable Ui. Then, the BTL model is compatible with the random utility model which assumes that the random variables Ui follow independent double-exponential distributions with constant scale parameter but with differing location parameters, which correspond to the stimulus scale values. Since we may assume, without loss of generality, that the constant scale parameter is equal to one, this random utility model leads to the following density function for the random variable Ui:

f(ui) = exp{-(ui - ai) - exp[-(ui - ai)]}.
In this expression the location parameter ai represents the scale value of stimulus i. If we denote, for paired comparisons data, the probability that stimulus i is preferred to stimulus j by p(i,j), we may derive

p(i,j) = Prob(Ui ≥ Uj) = exp(ai) / [exp(ai) + exp(aj)].
Note that in this and also in the following derivations the assumption that the different random variables involved are independent of each other is crucial. If p(i,j,k) denotes the probability that in a ranking task with only three stimuli the ranking (i,j,k) will be given, then we obtain under this random utility model:

p(i,j,k) = Prob(Ui ≥ Uj ≥ Uk) = [exp(ai) / (exp(ai) + exp(aj) + exp(ak))] · [exp(aj) / (exp(aj) + exp(ak))].
These results are well known and can be found for instance in Bradley (1965) and in Yellott (1980). In the general case of a ranking task which involves n stimuli, we may derive the following expression for the ranking probabilities pr:
pr = Prob(Ur1 ≥ Ur2 ≥ ... ≥ Urn)
   = Π_{k=1}^{n-1} [ exp(a_rk) / (exp(a_rk) + exp(a_r(k+1)) + ... + exp(a_rn)) ].   (1)
In order to elucidate the true nature of this at first sight impressive expression, we give a concrete version of it for the case of n = 4 stimuli. Then, for instance,

p(3,1,4,2) = [exp(a3) / (exp(a1) + exp(a2) + exp(a3) + exp(a4))] · [exp(a1) / (exp(a1) + exp(a2) + exp(a4))] · [exp(a4) / (exp(a2) + exp(a4))].
This illustrates the role played by Luce's choice axiom: the probability of the ranking (3,1,4,2) can be thought of as the product of the probabilities that a particular item will be selected from a set of available alternatives. The first term in this product corresponds to the probability that item 3 will be chosen from the set {1,2,3,4}; the second term represents the probability that item 1 will be chosen from {1,2,4}; and finally, the third term is the probability that item 4 will be chosen from {2,4}. In this model for ranking probabilities, it is implicitly assumed that the ranking of the stimuli takes place by means of a sequence of selections of items from the sets of alternatives which remain available at each choice point. Furthermore, at each selection point the choices are assumed to be governed by the same set of stimulus scale values.
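The sequential-selection interpretation just given translates directly into code. A minimal sketch (the function name and example scale values are ours):

```python
import math
from itertools import permutations

def su_ranking_probability(ranking, a):
    """Strict utility probability of `ranking` (first-ranked stimulus
    first): at each choice point the next stimulus is selected from the
    remaining ones with Bradley-Terry-Luce probabilities."""
    remaining = list(ranking)
    prob = 1.0
    for stimulus in ranking[:-1]:
        denom = sum(math.exp(a[s]) for s in remaining)
        prob *= math.exp(a[stimulus]) / denom
        remaining.remove(stimulus)
    return prob

# Hypothetical scale values for four stimuli.
a = {1: 0.5, 2: -0.2, 3: 0.9, 4: -1.2}
p_3142 = su_ranking_probability((3, 1, 4, 2), a)
total = sum(su_ranking_probability(r, a) for r in permutations(a))
```

The probabilities of all n! rankings sum to one, which is a convenient check on any implementation.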
2.2 The Pendergrass-Bradley Model

A second approach to the adaptation of the BTL model to the analysis of rankings has been proposed by Pendergrass and Bradley (1960), who however only discuss the case of triple rankings (i.e., n = 3) extensively. These authors assume that there exist strictly positive real numbers vi such that

p(i,j,k) = vi^2 vj / s,
in which s = v1^2 (v2 + v3) + v2^2 (v1 + v3) + v3^2 (v1 + v2). One easily sees that s equals the sum of the vi^2 vj terms over all permutations of the symbols i, j and k. In this model the ranking probability p(i,j,k) is proportional to the product of three paired comparisons probabilities:

p(i,j,k) ∝ p(i,j) · p(i,k) · p(j,k).
By defining ai = ln vi, this model can be reparametrized as follows:

p(i,j,k) = exp(2ai + aj) / s.
Using the assumption that a ranking probability can be defined as the product of the paired comparisons probabilities which are induced by the ranking, the generalization of this model to the case in which n stimuli are to be ranked leads to the following expression for the ranking probability:

pr = exp( Σ_{k=1}^{n} (n - k) a_rk ) / s,   (2)

in which s again is the sum of all terms which occur in the numerator of some ranking probability. This sum is taken over all permutations of the stimulus indices. A possible advantage of the Pendergrass-Bradley approach resides in the fact that, as Fienberg and Larntz (1976) have shown, it allows for a log linear representation, so that its theoretical analysis and its practical application may benefit from the general results available from the theory of log linear models.

As described so far, both ranking models are as yet unidentified, since the scale values are defined only up to a translation along the real axis. One usually solves this identification problem by imposing the following linear constraint on the scale values:

Σ_{i=1}^{n} ai = 0.
This constraint fixes the origin of the scale at zero. In the sequel we will always assume implicitly that this type of restriction has been imposed on the scale values. This leaves n - 1 independent scale values to be estimated.
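A sketch of the Pendergrass-Bradley ranking probability under this parametrization (helper names are ours); the normalizing sum s is obtained here by brute-force enumeration of all n! permutations, which is only feasible for small stimulus sets:

```python
import math
from itertools import permutations

def pb_ranking_probability(ranking, a):
    """Pendergrass-Bradley probability of `ranking`: each stimulus is
    weighted by the number of stimuli ranked after it, and s sums the
    same numerator over all permutations of the stimulus set."""
    n = len(ranking)

    def numerator(r):
        return math.exp(sum((n - 1 - pos) * a[s] for pos, s in enumerate(r)))

    s = sum(numerator(r) for r in permutations(ranking))
    return numerator(ranking) / s

# Hypothetical scale values satisfying the zero-sum constraint.
a = {1: 0.2, 2: -0.5, 3: 0.3}
p_123 = pb_ranking_probability((1, 2, 3), a)
total = sum(pb_ranking_probability(r, a) for r in permutations(a))
```

For n = 3 the exponent pattern reduces to exp(2ai + aj), as in the triple-ranking formula above.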
2.3 Estimating the Parameters

For a given set of observed rankings, both models for the analysis of rankings allow the determination of the maximum likelihood estimates of the stimulus scale values by means of a Newton-Raphson iteration procedure. In this paper we will not dwell on the technical aspects of this procedure. It suffices here to say that in all our applications of the procedure to real data, the algorithm converged very rapidly. Even in the case of very bad starting values for the unknowns, convergence was generally reached in fewer than 10 iterations. However, it should be stressed that the Newton-Raphson estimation algorithm only converges if the maximum likelihood estimates exist. As an example of a situation in which these estimates do not exist, consider the case when, with n = 4 stimuli, only the following rankings are observed: (1,2,3,4), (1,2,4,3), (2,1,3,4), and (2,1,4,3). In this example the subset {1,2} dominates the subset {3,4} in the sense that each item from the dominating subset is always ranked before each item from the dominated subset. In such cases the likelihood function achieves its maximum at the boundary of the parameter space: the scale values of the items in the dominating set tend to plus infinity, whereas the scale values of the items in the dominated set tend to minus infinity. So, in order for the maximum likelihood estimates to be defined, no dominating subsets of items should exist. For a similar condition in the case of paired comparisons, see Mattenklott, Sehr and Mieschke (1982).
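This existence condition can be checked mechanically before estimation: no proper subset of items may occupy the leading ranks in every observed ranking. A small sketch (helper names are ours):

```python
from itertools import combinations

def has_dominating_subset(rankings):
    """True if some proper nonempty subset of items is ranked entirely
    before its complement in every observed ranking; in that case the
    maximum likelihood scale values diverge and do not exist."""
    items = set(rankings[0])
    for size in range(1, len(items)):
        for subset in combinations(sorted(items), size):
            # the subset dominates if its items always occupy
            # the first `size` ranks
            if all(set(r[:size]) == set(subset) for r in rankings):
                return True
    return False

# The example from the text: {1, 2} dominates {3, 4}.
degenerate = [(1, 2, 3, 4), (1, 2, 4, 3), (2, 1, 3, 4), (2, 1, 4, 3)]
repaired = degenerate + [(3, 1, 2, 4)]
```

Adding a single ranking that breaks the separation, as in `repaired`, is enough to restore existence of the estimates.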
2.4 Latent Class Models for Rankings

Both versions of the BTL model for rankings can be used in the formulation of a latent class model for the analysis of rankings from non-homogeneous populations. Basic to these latent class models is the assumption that the non-homogeneous population can be divided into a set of T homogeneous subpopulations or latent classes, each of them characterized by a distinctive set of stimulus scale values which are assumed to govern the ranking choices of the respondents belonging to that particular
class. So, instead of one set of scale values, we now have T sets of scale values which, in due course, have to be estimated from the data. If we denote the probability that ranking r is given within latent class t by p_rt, then both expressions (1) and (2) can easily be adapted to accommodate the existence of different latent classes. If r_i denotes the item occupying the i-th position in ranking r, the analogue of expression (1) becomes

p_rt = Π_{i=1}^{n-1} exp(a_{r_i,t}) / Σ_{j=i}^{n} exp(a_{r_j,t}),   (3)

whereas for expression (2) we have

p_rt = C_t Π_{i=1}^{n-1} Π_{j=i+1}^{n} exp(a_{r_i,t}) / [exp(a_{r_i,t}) + exp(a_{r_j,t})],   (4)

with C_t a normalizing constant. If we denote the probability that a randomly selected subject belongs to latent class t by π_t, we obtain the following expression for the probability p_r that ranking r is observed when sampling is from the entire population:

p_r = Σ_{t=1}^{T} π_t p_rt.
Obviously, the parameters π_t satisfy the constraint

Σ_{t=1}^{T} π_t = 1.
As a consequence, the total number of independent parameters to be estimated equals (n − 1)T + (T − 1) = nT − 1. A necessary condition for the latent class model to be identified is that the number of independent unknowns be smaller than, or equal to, the number of independent rankings:

nT − 1 ≤ n! − 1,

or

T ≤ (n − 1)!.
So, for instance, for n = 4, the number of latent classes should be smaller than or equal to 6. However, this condition is by no means sufficient to ensure identifiability of the model.
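The per-class ranking probability and the mixture probability can be sketched as follows (a minimal illustration assuming the strict utility form of expression (3); the dictionary representation of the scale values and all names are our own):

```python
import math
from itertools import permutations

def ranking_prob(ranking, a):
    """Strict utility (Luce) probability of a complete ranking: at each stage
    the top remaining item is chosen with probability
    exp(a[item]) / sum of exp(a[j]) over the items not yet ranked."""
    prob, remaining = 1.0, list(ranking)
    for item in ranking[:-1]:
        prob *= math.exp(a[item]) / sum(math.exp(a[j]) for j in remaining)
        remaining.remove(item)
    return prob

def mixture_prob(ranking, scale_values, pi):
    """p_r = sum over t of pi_t * p_rt, the unconditional ranking probability."""
    return sum(p_t * ranking_prob(ranking, a_t)
               for p_t, a_t in zip(pi, scale_values))

def n_independent_params(n, T):
    """(n - 1) free scale values per class plus (T - 1) class probabilities."""
    return (n - 1) * T + (T - 1)
```

Because only differences between scale values matter within a class, one scale value per class can be fixed (e.g., by centering), which is what makes only (n − 1) of them independent.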
Croon
The estimation of the unknown scale values and of the latent class probabilities would not pose any new difficulty if we only knew which latent class each respondent belonged to. If this were the case, then we could determine for each class how frequently each ranking was generated by subjects belonging to it, and on the basis of these observed frequencies f_rt all unknown parameters could be estimated. Unfortunately, latent class membership is, in our context, an unobserved variable, which implies that the data at our disposal should be considered "incomplete data" in the sense defined by Dempster, Laird and Rubin (1977). Consequently, the estimation of the unknown parameters should preferably proceed by means of the EM algorithm proposed by these authors. In this algorithm each iteration consists of two steps: an E-step and an M-step. During the E-step the missing data are estimated on the basis of the observed data and of the currently available provisional estimates of the model parameters. During the M-step maximum likelihood estimates of the model parameters are determined anew, using the completed data resulting from the preceding E-step. By alternating E- and M-steps a sufficient number of times, one may hope to achieve convergence to the global maximum likelihood solution. Although for relatively simple estimation problems, which are characterized by concave likelihood surfaces, the EM algorithm usually converges to the global maximum, no such reassuring statement can be made for more involved estimation problems, for which the likelihood surface may have several local maxima. In these latter cases, different runs of the estimation procedure, each starting from different initial parameter values, may give some comfort to the user, provided that these different runs converge to what is essentially the same final solution. A further relevant remark concerns the rate at which the EM algorithm converges.
In general, convergence is quite slow, at least when the rate of convergence is measured by the number of iterations required before the convergence criterion is reached. In the case at hand, the E-step of the EM algorithm consists of the determination of the frequencies f_rt with which each ranking r is observed within each latent class t. During this step the provisional estimates a_it of the stimulus scale values are used to determine the ranking probabilities p_rt. Depending upon which choice model is implemented, expression (3) or (4) is used in this respect. Then the following weights w_rt are computed:

w_rt = π_t p_rt / Σ_{t'=1}^{T} π_{t'} p_rt',
in which the provisional estimates π_t of the latent class probabilities are used. These weights represent the conditional probabilities that a particular ranking r originated from latent class t. Finally, the unobserved frequency f_rt is estimated by

f_rt = w_rt · f_r,
where f_r is the observed frequency of ranking r in the entire sample. During the M-step of our iterative procedure, the maximum likelihood estimates of the stimulus scale values and of the latent class probabilities are determined anew. The new estimates of the latent class probabilities are easily computed as

π_t = Σ_r f_rt / N,
in which the summation runs over all rankings. The determination of the new scale values is of course somewhat more involved, since it requires a Newton-Raphson iteration procedure separately for each latent class. By alternating the E- and M-steps a sufficiently large number of times, one may hope to reach in the end the maximum of the likelihood function. In our implementation of the EM algorithm two stop criteria were used. In the first place, the iteration process was discontinued whenever the difference between two successively evaluated log likelihoods fell below a preset tolerance; secondly, the iteration process was stopped after 250 iterations. If in the latter case there was an indication that the likelihood function might still increase substantially, a new iteration process was performed, starting from the previously obtained estimates of the parameters. Moreover, if there was any suspicion that the final solution might represent a local maximum of the likelihood function, a new iteration process was performed, starting from different initial estimates of the parameters. In the preceding pages, we tacitly assumed that the number T of latent classes was known beforehand. From a practical point of view this is certainly never the case. Instead, one would rather consider the parameter T
as an additional unknown to be estimated by the analysis. An obvious way to proceed in this respect is by means of statistical model tests. This amounts to estimating the model parameters under different hypotheses on the number of latent classes and subsequently comparing the value of the log likelihood function for each model with its value under an appropriate null model. In the case of a latent class analysis of rankings, the appropriate null model assumes that the n! different rankings define the categories of a multinomial random variable, implying that for each ranking r its theoretical probability p_r can be estimated by f_r / N. If we denote the value of the log likelihood function under this null model by F_0 and its value under the model with t latent classes by F_t, then standard results from the theory of log likelihood ratio tests imply that, under the hypothesis that t latent classes suffice to explain the data, the test statistic

L = 2(F_0 − F_t)
follows asymptotically a chi square distribution with degrees of freedom equal to n! − nt. Large values of this test statistic lead to the rejection of the hypothesis of t latent classes. In that case one should repeat the analysis with (t + 1) latent classes. As the final estimate of T one takes the smallest value of t for which the test statistic becomes nonsignificant. Although the rationale of this procedure to estimate T seems impeccable, apart from the often overlooked fact that actually a sequential estimation procedure is being carried out, the direct dependence of the test statistic on the sample size renders the conclusions based on it somewhat insecure. For large samples, the test procedure becomes so powerful as to reject any low dimensional model. For a latent class analysis this generally implies that only for sufficiently large values of t will the ensuing test statistics turn out to be nonsignificant. Similar problems have been encountered in, for instance, covariance structure analysis, where in large samples any model also tends to be rejected as inadequate. As a response to these difficulties Joreskog (1978) and Bentler and Bonett (1980) have recommended performing hierarchically nested model tests, which in their opinion might be more informative than the tests that compare each model with the saturated null model. In our context this amounts to testing the hypothesis of t latent classes against the hypothesis of t + 1 latent classes by means of the test statistic

L = 2(F_{t+1} − F_t),
which is asymptotically chi square distributed with n degrees of freedom.
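The E- and M-steps described above can be sketched as follows. This is a simplified illustration, not the author's program: the class probabilities are updated exactly, but a single gradient-ascent step stands in for the Newton-Raphson inner iteration on the scale values, and all names are our own.

```python
import math

def luce_prob(r, a):
    """Per-class Luce probability of ranking r (the strict utility form)."""
    prob, remaining = 1.0, list(r)
    for item in r[:-1]:
        prob *= math.exp(a[item]) / sum(math.exp(a[j]) for j in remaining)
        remaining.remove(item)
    return prob

def em_step(freq, scales, pi, step=0.1):
    """One EM iteration.  freq maps ranking -> observed frequency f_r;
    scales[t] maps item -> a_it; pi[t] is the class probability."""
    N = float(sum(freq.values()))
    T = len(pi)
    # E-step: completed frequencies f_rt = w_rt * f_r, with
    # w_rt = pi_t * p_rt / sum over t' of pi_t' * p_rt'
    f_rt = {}
    for r, f_r in freq.items():
        joint = [pi[t] * luce_prob(r, scales[t]) for t in range(T)]
        p_r = sum(joint)
        for t in range(T):
            f_rt[r, t] = f_r * joint[t] / p_r
    # M-step: exact update of the latent class probabilities
    new_pi = [sum(f_rt[r, t] for r in freq) / N for t in range(T)]
    # M-step: one gradient step per class on the weighted log likelihood
    # (a stand-in for the per-class Newton-Raphson iteration in the text)
    new_scales = []
    for t in range(T):
        grad = dict.fromkeys(scales[t], 0.0)
        for r in freq:
            remaining = list(r)
            for item in r[:-1]:
                denom = sum(math.exp(scales[t][j]) for j in remaining)
                for j in remaining:
                    grad[j] -= f_rt[r, t] * math.exp(scales[t][j]) / denom
                grad[item] += f_rt[r, t]
                remaining.remove(item)
        new_scales.append({i: a_i + step * grad[i] / N
                           for i, a_i in scales[t].items()})
    return new_scales, new_pi
```

Iterating `em_step` until the log likelihood stabilizes (with the stop criteria discussed above) yields the parameter estimates.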
3. Some Numerical Examples

We will illustrate our latent class model by analyzing a data set from the comparative cross-national study "Changing mass publics," which is described in Barnes et al. (1979). The present author would like to thank Dr. F. Heunks of the Sociology Department of the University of Tilburg for making these data available. In this study respondents from five Western countries were asked to rank the following four political goals according to their desirability:

1. Maintain order in the nation;
2. Give people more say in the decisions of the government;
3. Fight rising prices;
4. Protect freedom of speech.
In the rest of this paper we will only use the data from the German sample, in which N = 2262 respondents gave a complete ranking of the four items. Table 1 contains the 24 possible rankings together with their observed frequencies of occurrence. For clarity's sake, we stress that the rankings run from more to less desirable. The inclusion of this particular ranking task in the study was inspired by Inglehart's theory on value orientations (see e.g., Inglehart, 1977). This theory draws a distinction between a materialistic and a post-materialistic value orientation. Persons characterized by a materialistic value orientation are supposed to care primarily about social and economic stability and security, whereas post-materialistically oriented persons rather emphasize the humane and spiritual aspects of social life. If asked to rank the four political goals on a desirability criterion, materialists can be expected to give precedence to items 1 and 3 from the list, whereas for post-materialists items 2 and 4 should occupy the first positions in the ranking. It is obvious that for this quite simple ranking task the assumption of complete population homogeneity is untenable. If Inglehart's theory on value orientations is correct, one may expect at least two different latent classes, each of them
Table 1. Observed frequencies of the 24 rankings in the German sample

No.  Ranking  Frequency    No.  Ranking  Frequency
 1   1234       137        13   3124       330
 2   1243        29        14   3142       294
 3   1324       309        15   3214       117
 4   1342       255        16   3241        69
 5   1423        52        17   3412        70
 6   1432        93        18   3421        34
 7   2134        48        19   4123        21
 8   2143        23        20   4132        30
 9   2314        61        21   4213        29
10   2341        55        22   4231        52
11   2413        33        23   4312        35
12   2431        59        24   4321        27
corresponding with one of the ideal-typic value orientations. We first discuss the results of our analyses based on the strict utility model.

Table 2. Model fit tests for the analyses based on the SU model

T     L      df    α
1   315.05   20    0
2    84.32   16    0
3    23.68   12    0.022
4    10.59    8    0.226
Table 2 summarizes the corresponding model fit tests for the analyses, starting with T = 1 and continuing up to T = 4. This table contains the log likelihood ratio statistic L, the corresponding degrees of freedom, and the associated α-level for each of the analyses. From this table we infer that the hypothesis T = 3 should be rejected at α = 0.05 but not at α = 0.01, whereas the hypothesis T = 4 cannot be rejected at α = 0.05. In the
sequel we will restrict ourselves to a discussion of the solution with three latent classes. Table 3 contains the parameter estimates for this case.

Table 3. Parameter estimates for the SU model with T = 3 latent classes

Parameter    Latent Class
               1       2       3
a_1t         1.99    0.59   -1.25
a_2t        -0.92   -0.69    1.73
a_3t         0.06    0.63   -0.01
a_4t        -1.13   -1.07    0.07
π_t          0.33    0.45    0.22
From the table we note that the estimates of the stimulus scale values are quite similar in classes 1 and 2. These two classes conform to our expectations as to how materialists should evaluate the stimuli. Both classes are clearly characterized by a rejection of the post-materialist items. They differ with respect to which materialist item is emphasized: in latent class 1 item 1 is preferred to item 3, whereas in latent class 2 the reverse is the case. Since item 1 seems to tap "law and order" sentiments whereas item 3 is more concerned with problems of economic stability, the following very bold and tentative hypothesis may be formulated. At least in Germany, Inglehart's conception of a fairly homogeneous block of materialists needs revision; instead, a distinction should be drawn between people who are primarily concerned about economic stability and people who give precedence to problems of social stability. On the other hand, latent class 3 may be identified as the post-materialist class, although this characterization is not as pure as one might wish. The materialist item 3, which is indeed very popular in the entire German sample, still scores rather high in this third class. Finally, we note that only 22 percent of the respondents are estimated to belong to this third class, whereas the remaining 78 percent are distributed over the two materialist classes, underlining the strong concern with problems of social and economic stability in Germany. Of course, the latter conclusion could also have been reached by inspecting the original data as given in Table 1 and counting, for instance, the number of times each item occupies the
Table 4. Model fit tests for the analyses based on the PB model

T     L      df    α
1   284.51   20    0
2    31.79   16    0.011
3    25.88   12    0.011
4    14.12    8    0.079
first position in the ranking. We next turn to a similar discussion of the results obtained by the analysis based on the PB model. Table 4 summarizes the model fit tests for this model. As may be seen from this table, the successive model fit tests result in a somewhat less clear picture than was the case for the SU model. First of all we note that for this model even the hypothesis T = 2 could not be rejected at the 1% level, and that moreover the incremental fit gained when moving from T = 2 to T = 3 is not significant at the 5% level: L = 5.9098 with 4 degrees of freedom. However, the incremental fit obtained by moving from T = 3 to T = 4 is significant at the 5% level: L = 11.7673 with 4 degrees of freedom. So the decision as to which number of latent classes to retain is an uneasy one, due to the relatively bad fit provided by the analysis with three latent classes. Two possible explanations for this fact may be given. In the first place, it cannot be excluded that the analysis with three latent classes had not yet converged to the optimal maximum likelihood solution, despite the fact that approximately 1200 EM iterations were performed. Such a situation might arise if the likelihood surface is very flat in the neighborhood of the global maximum. A second explanation is that the estimation procedure converged to a local maximum of the likelihood function. However, several runs of the estimation procedure, each starting from different initial estimates, were performed, and all runs resulted in essentially the same final solution. Moreover, as we shall discuss in a moment, the solution with T = 3 for the PB model was quite similar to the corresponding solution for the SU model. Table 5 contains the parameter estimates for the PB analysis with three latent classes. The similarity between the PB and the SU solutions is striking. Once again we obtain
Table 5. Parameter estimates for the PB model with T = 3 latent classes

Parameter    Latent Class
               1       2       3
a_1t         0.64    0.55   -0.75
a_2t        -0.48   -0.47    0.57
a_3t         0.19    0.72   -0.06
a_4t        -0.35   -0.80    0.24
π_t          0.26    0.59    0.15
two materialistic classes, both characterized by a rejection of the post-materialistic items and differing from each other with respect to which materialistic item is given prominence. Furthermore, the third latent class of the PB solution also seems to capture the post-materialistic value orientation. The major difference between the SU and the PB solutions has to do with the estimates of the latent class probabilities. In the PB analysis only 15% of the respondents are estimated to belong to the post-materialistic class, whereas the corresponding figure in the SU analysis was 22%. Moreover, under the PB model the distribution of the materialists over the two materialistic classes is more uneven than it is under the SU model. A few additional comments on the solutions obtained with two and four latent classes may be in order here. First of all we note that in all these cases the SU and the PB solutions were very similar to each other. From the analysis with two latent classes, a pronounced materialistic and a clear post-materialistic class emerged. The analysis with four latent classes resulted for each of the models in a solution in which, in addition to two materialistic classes and one post-materialistic class, all three similar to those obtained by the analysis with three latent classes, a fourth latent class emerged in which items 2 and 3 were highly evaluated. This class was probably called into existence to accommodate the large popularity of item 3 in the German sample. At this point it may also be of some interest to compare our latent class analyses with the results of a classical unfolding analysis. To this end we analyzed our data set with the MINIRSA program from the
MDS(X) integrated series of scaling programs produced by A. Coxon and his collaborators at the University of Edinburgh. The MINIRSA program was originally developed by Prof. E. Roskam of the University of Nijmegen. The three-dimensional solution obtained by MINIRSA provided a perfect fit to the data. However, the resulting geometric representation of the stimuli and rankings can hardly be considered informative, since in this solution the four stimuli were located at the vertices of a tetrahedron. As a matter of fact, the start configuration computed by MINIRSA immediately yielded this perfect solution. This result does not come as a surprise, since it is well known that preferential choice data for n stimuli can always be unfolded perfectly in a joint space with n − 1 dimensions. Moreover, this perfectly fitting configuration can be constructed without any reference to the data whatsoever, and it is in this sense that we may label it uninformative. On the other hand, the one-dimensional solution provided a very bad fit to the data: STRESS-HAT equal to 0.393 after 144 iterations; a result which did not come as a surprise either, since for four stimuli the unidimensional unfolding model can maximally account for seven rankings. So it seems that we have no other option than to retain the two-dimensional solution. For this solution STRESS-HAT was equal to 0.175 after 105 iterations. Since for four stimuli in two dimensions maximally 18 different rankings can be accounted for by the unfolding model (see Table 7.1 in Coombs, 1964), we cannot expect a perfect representation of our data in two dimensions. But even then the MINIRSA solution was somewhat suboptimal, since only 12 rankings were perfectly accounted for by the two-dimensional solution. These 12 rankings represented only 62% of all rankings in the German sample. This situation highlights a major problem encountered by many scaling techniques when they are applied to abundant data such as ours.
If relatively few stimuli are used in an investigation and if moreover all logically possible response patterns occur in the sample, a low dimensional scaling solution seldom provides an acceptable fit to the data, whereas a solution in a high dimensional space easily becomes uninformative. Figure 1 represents the two-dimensional MINIRSA solution after an orthogonal rotation which was performed in order to let the first dimension optimally correspond to the opposition between materialistic and post-materialistic items. Although it is extremely risky to interpret a
Figure 1. Two-dimensional MINIRSA solution. (The rankings are represented by correspondingly labeled points; the items are represented by asterisks.)
configuration which consists of only four points, we may try to relate the
results of the unfolding analysis to the results obtained by the latent class analyses. The most striking feature of Figure 1 is perhaps the fact that item 1, item 3, and the cluster composed of items 2 and 4 are approximately located at the vertices of an equilateral triangle. This stimulus configuration is undoubtedly the best fitting two-dimensional projection of the tetrahedron obtained by the procedure which computes a start configuration for the MINIRSA analysis. In Figure 1 we easily recognize the contrast between the materialistic and the post-materialistic items. Due to the orthogonal rotation we performed on the final MINIRSA solution, this contrast defines the first dimension of the configuration. The second dimension, on the other hand, shows a contrast between the two materialistic items. In a certain sense this particular triangular pattern of the stimulus configuration agrees with the results of our latent class analyses in which three latent classes were retained. The fact that the latent class analyses resulted in two distinct materialistic classes seems to be reflected in the unfolding analysis by the opposition between the materialistic items along the second axis, whereas the contrast between the materialistic and post-materialistic items along the first axis corresponds to the distinction between the two materialistic classes and the one post-materialistic class.

4. Discussion
In this paper we developed two latent class models for the analysis of rankings. Basic to these models is the assumption that a nonhomogeneous population can be broken down into a set of subpopulations which are homogeneous with respect to the way in which the stimuli are evaluated. For each subpopulation or latent class a probabilistic choice model is assumed to hold. These choice models, which generally associate a scale value with each stimulus, can be considered as a formulation of a stochastic error theory which may explain inconsistencies in the rankings generated by different members of the same subpopulation. In this respect latent class analysis of ranking data falls within the broad domain of general latent structure analysis, in which observed variables are treated as imperfect operationalizations or indicators of underlying, theoretically relevant latent variables. Moreover, our approach implies that the distribution of the respondents within each latent class over the different
rankings can be viewed as a parametric multinomial distribution, showing that the latent class approach to the analysis of rankings may also be considered as a specific instance of a finite mixture problem (Redner & Walker, 1984). Without doubt, a latent class approach is relevant for the analysis of rankings obtained in large samples of respondents, especially when only a relatively small number of stimuli are used in the ranking task. In such a situation, one may expect that almost all logically possible response patterns will occur in the sample, a fact which may lead to quite suboptimal or even uninformative results if the more traditional scaling techniques are used. Furthermore, we presume our approach to be of some interest for the analysis of data from a ranking experiment in which the subjects only have to select and to rank a limited number of stimuli from a larger set of available alternatives, the so-called rank k/n data in the terminology of Coombs (1964). However, there still remain some substantial problems to solve in our approach. First of all, as has already been noted, the EM algorithm often converges very slowly, and efficient numerical procedures to accelerate the convergence process should be sought. On a more theoretical level, a comparison should be carried out of the different probabilistic choice models which may be implemented in a latent class model, eventually leading to still more general and flexible models.
References

Barnes, S. H. et al. (1979). Political action. Mass participation in five Western countries. London: Sage.
Beaver, R. J. (1977). Weighted least-squares analysis of several univariate Bradley-Terry models. Journal of the American Statistical Association, 72, 629-634.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.
Block, H. D., & Marschak, J. (1960). Random orderings and stochastic theories of response. In I. Olkin, S. Ghurye, W. Hoeffding, W. Madow & H. Mann (Eds.), Contributions to probability and statistics (pp. 97-132). Stanford: Stanford University Press.
Bradley, R. A. (1963). Another interpretation of a model for paired comparisons. Psychometrika, 30, 315-318.
Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs. I. Biometrika, 39, 324-345.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38.
Fienberg, S. E., & Larntz, K. (1976). Log linear representations for paired and multiple comparisons models. Biometrika, 63, 245-254.
Inglehart, R. (1977). The silent revolution. Princeton: Princeton University Press.
Jech, T. (1983). The ranking of incomplete tournaments: A mathematician's guide to popular sports. American Mathematical Monthly, 90, 246-266.
Joreskog, K. G. (1978). Structural analysis of covariance and correlation matrices. Psychometrika, 43, 443-477.
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston: Houghton-Mifflin.
Luce, R. D. (1959). Individual choice behavior. New York: Wiley.
Luce, R. D., & Suppes, P. (1965). Preference, utility and subjective probability. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 3). New York: Wiley.
Mattenklott, A., Sehr, J., & Mieschke, K. J. (1982). A stochastic model for paired comparisons of social stimuli. Journal of Mathematical Psychology, 25, 149-168.
Pendergrass, R. N., & Bradley, R. A. (1960). Ranking in triple comparisons. In I. Olkin, S. Ghurye, W. Hoeffding, W. Madow & H. Mann (Eds.), Contributions to probability and statistics (pp. 331-351). Stanford: Stanford University Press.
Plackett, R. L. (1975). The analysis of permutations. Applied Statistics, 24, 193-202.
Redner, R. A., & Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26, 195-239.
Yellott, J. I. (1977).
The relationship between Luce's choice axiom, Thurstone's theory of comparative judgment, and the double exponential distribution. Journal of Mathematical Psychology, 15, 109-144.
Yellott, J. I. (1980). Generalized Thurstone models for ranking: equivalence and reversibility. Journal of Mathematical Psychology, 22, 48-69.
Zermelo, E. (1929). Die Berechnung der Turnierergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift, 29, 436-460.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
THE WANDERING IDEAL POINT MODEL FOR ANALYZING PAIRED COMPARISONS DATA

Geert De Soete
University of Ghent, Belgium
J. Douglas Carroll
AT&T Bell Laboratories, Murray Hill, NJ, U.S.A.
Wayne S. DeSarbo
University of Michigan, U.S.A.

A recently developed probabilistic multidimensional unfolding model for paired comparisons data is described. Unlike the stochastic multidimensional unfolding models previously proposed in the literature, the present model is a moderate utility model. After presenting the model in its most general form, some properties and special cases are discussed. Subsequently, some practical issues related to applying the model, such as parameter estimation and model testing, are addressed. Finally, an illustrative application is reported.
1. Introduction

Ever since Coombs (1950, 1964) introduced the unfolding model for representing preferential choice data, attempts have been made to reformulate the model in a stochastic way. These attempts were motivated by the uncertainty and inconsistency that typically characterize human choice behavior. Although it is in principle possible to develop probabilistic
The first author is supported as "Bevoegdverklaard Navorser" of the Belgian "Nationaal Fonds voor Wetenschappelijk Onderzoek". This paper is a revised version of an article published in Zeitschrift für Sozialpsychologie, 1987, 18, 274-281.
De Soete, Carroll, & DeSarbo
models accounting for first choices on sets consisting of more than two stimuli, almost all efforts have been directed towards developing models for representing pairwise choice data that were obtained by means of the time-honored method of paired comparisons. While most probabilistic versions of the unfolding model were limited to the unidimensional case (Bechtel, 1976; Coombs, Greenberg, & Zinnes, 1961; Sixtl, 1973), a few attempts were undertaken to develop a probabilistic multidimensional unfolding model. Schonemann and Wang (1972; Wang, Schonemann, & Rusk, 1975) suggested a model in which the probability that subject i prefers stimulus j to stimulus k was defined as

p_ijk = 1 / (1 + exp[-c(d_ik^2 - d_ij^2)]),   (1)
where d_ij denotes the Euclidean distance between the points representing subject i and stimulus j in an r-dimensional space. Since model (1) is based on the well-known Bradley-Terry-Luce (Bradley & Terry, 1952; Luce, 1959) model, it implies the strong stochastic transitivity condition, which states that

if p_ijk ≥ 1/2 and p_ikl ≥ 1/2, then p_ijl ≥ max(p_ijk, p_ikl).   (2)
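Expression (1) is straightforward to compute from the two subject-stimulus distances; a minimal sketch (the function name and the default value of c are our own):

```python
import math

def sw_prob(d_ij, d_ik, c=1.0):
    """Schonemann-Wang choice probability (1): the probability that subject i
    prefers stimulus j to stimulus k, given the Euclidean distances from the
    subject's point to the two stimulus points."""
    return 1.0 / (1.0 + math.exp(-c * (d_ik ** 2 - d_ij ** 2)))
```

The logistic form guarantees binary complementarity, i.e., sw_prob(d_ij, d_ik) + sw_prob(d_ik, d_ij) = 1.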
A quite different multidimensional stochastic model was developed by Zinnes and Griggs (1974). In this model the coordinates of both the subject and the object points are assumed to be independently normally distributed with a common variance. When a subject is presented a pair of stimuli, he or she is assumed to sample for each element of the pair independently a point from his or her ideal point distribution. This leads to a choice probability that can be expressed in terms of the doubly noncentral F distribution F''(ν_1, ν_2, λ_1, λ_2), with degrees of freedom ν_1 and ν_2 and noncentrality parameters λ_1 and λ_2; here d_ij now indicates the Euclidean distance between the mean point of subject i and the mean point of object j. As De Soete, Carroll and DeSarbo (1986) demonstrated, this model also implies strong stochastic transitivity.
The Wandering Ideal Point Model
Although empirical choice proportions sometimes do satisfy strong stochastic transitivity, there is strong empirical evidence (Becker, DeGroot, & Marschak, 1963; Coombs, 1958; Krantz, 1967; Rumelhart & Greeno, 1971; Sjoberg, 1977, 1980; Sjoberg & Capozza, 1975; Tversky & Russo, 1969; Tversky & Sattath, 1979) indicating that pairwise choice proportions often violate strong stochastic transitivity in a systematic way. Empirical choice proportions seem to be influenced not only by the difference in utility between the choice objects, but also, to some extent, by the similarity or comparability of the choice alternatives. Dissimilar stimuli tend to evoke moderate choice proportions, even when the stimuli differ substantially in utility; similar stimuli, on the contrary, tend to evoke more extreme choice proportions, even when the difference in utility is not that large. A less stringent condition, which is usually satisfied by empirical choice data, is moderate stochastic transitivity, which states that

if p_ijk ≥ 1/2 and p_ikl ≥ 1/2, then p_ijl ≥ min(p_ijk, p_ikl).   (4)
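The two transitivity conditions can be checked for a single triple of choice proportions; a minimal sketch (function names are our own):

```python
def satisfies_sst(p_jk, p_kl, p_jl):
    """Strong stochastic transitivity (2) for one triple (j, k, l)."""
    if p_jk >= 0.5 and p_kl >= 0.5:
        return p_jl >= max(p_jk, p_kl)
    return True  # the condition is vacuous when the premise fails

def satisfies_mst(p_jk, p_kl, p_jl):
    """Moderate stochastic transitivity (4) for one triple (j, k, l)."""
    if p_jk >= 0.5 and p_kl >= 0.5:
        return p_jl >= min(p_jk, p_kl)
    return True
```

A similarity effect of the kind described above — p_jl falling between the two premise proportions — violates (2) while still satisfying (4).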
It can be proved that any model of the form

p_ijk = F[(u_ij - u_ik) / d_ijk],   (5)

where F is monotonically increasing with F(x) = 1 - F(-x), u_ij is the utility of stimulus j for subject i, and d_ijk is a (semi-)metric on the set of choice objects for subject i, implies (4) but not necessarily (2) (Halff, 1976). A model of the form (5) is called a moderate utility model. Contrary to models implying (2), moderate utility models can account for the empirically observed similarity effects. In this paper we discuss a recently developed probabilistic multidimensional unfolding model, called the Wandering Ideal Point (or WIP for short) model (De Soete et al., 1986), which is, unlike the Schonemann-Wang and Zinnes-Griggs models, a moderate utility model. The WIP model is an unfolding analogue of the wandering vector model originally proposed by Carroll (1980) and further elaborated by De Soete and Carroll (1983). In the wandering vector model, each stimulus is represented by a fixed point in a multidimensional space, while each subject is represented in the same space by a vector emanating from the origin with a terminus that follows a multivariate normal distribution. When a subject is presented a pair of stimuli, he or she samples a point from that
distribution and chooses the stimulus that has the largest orthogonal projection on the vector from the origin in the direction of the sampled point.
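The wandering vector choice rule is easy to simulate. The sketch below is a minimal Monte Carlo illustration (the stimulus coordinates, mean vector, and covariance are invented for the example, and the function name is mine, not from the original papers):

```python
import numpy as np

rng = np.random.default_rng(0)

def wandering_vector_choice(x_j, x_k, mu, sigma, rng):
    """One paired-comparison trial under the wandering vector model:
    sample a vector terminus from N(mu, sigma) and choose the stimulus
    with the larger orthogonal projection on the sampled direction."""
    y = rng.multivariate_normal(mu, sigma)
    direction = y / np.linalg.norm(y)
    return "j" if x_j @ direction > x_k @ direction else "k"

# Illustrative (invented) configuration: two stimuli and one subject.
x_j, x_k = np.array([2.0, 0.0]), np.array([0.0, 1.0])
mu, sigma = np.array([1.0, 0.5]), 0.25 * np.eye(2)
trials = [wandering_vector_choice(x_j, x_k, mu, sigma, rng) for _ in range(2000)]
p_jk = trials.count("j") / len(trials)  # estimated choice probability
```

With a narrow covariance the estimated proportion p_jk stabilizes near the model's analytic choice probability as the number of trials grows.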
2. The Wandering Ideal Point Model

2.1 General Formulation

In the WIP model, both the subjects and the stimuli are represented as points in a joint r-dimensional space. Whereas the stimuli 1, 2, …, M are represented by fixed points x_1, x_2, …, x_M, the subjects are represented by random points. More specifically, subject i (i = 1, …, N) is represented by a random point Y_i which is assumed to follow a multivariate normal distribution
Y_i ~ N(μ_i, Σ_i).  (6)
It is assumed that the distributions of the N subject points are independent of each other, i.e.,

Cov(Y_i, Y_i') = 0 for i, i' = 1, …, N and i ≠ i'.

According to the model, each time a pair of stimuli (j, k) is presented to subject i, he or she samples a point y_i from Y_i. Following Coombs' unfolding model, the subject prefers stimulus j to k whenever
d(y_i, x_j) < d(y_i, x_k),  (7)
where d(·, ·) denotes the Euclidean distance function, i.e.,

d²(y_i, x_j) = (y_i - x_j)'(y_i - x_j).  (8)
An illustration of the WIP model is shown in Figure 1. In the figure, the sampled point y_i is closer to x_j than to x_k. Therefore, subject i would on this particular occasion prefer stimulus j to stimulus k. Since the subject always prefers the choice alternative that is closest to y_i, y_i can be considered as subject i's ideal point. However, since each time a pair of stimuli is presented a new point y_i is sampled from Y_i, a subject's ideal point is not fixed, but "wanders" from trial to trial. Hence the name wandering ideal point model.
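The trial-to-trial sampling just described can be mimicked directly: draw an ideal point on each trial and apply the distance rule of eq. (7). A minimal numpy sketch, with invented subject and stimulus parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_wip_choices(x_j, x_k, mu, sigma, n_trials, rng):
    """Sample an ideal point y_i ~ N(mu, sigma) on every trial and
    prefer the stimulus closest to it (the rule in eq. 7); return the
    fraction of trials on which stimulus j is preferred."""
    y = rng.multivariate_normal(mu, sigma, size=n_trials)
    d_j = np.linalg.norm(y - np.asarray(x_j), axis=1)
    d_k = np.linalg.norm(y - np.asarray(x_k), axis=1)
    return float(np.mean(d_j < d_k))

# Invented example: subject centred at the origin, j closer than k.
frac_j = simulate_wip_choices([1.0, 0.0], [2.0, 0.0],
                              mu=[0.0, 0.0], sigma=np.eye(2),
                              n_trials=20000, rng=rng)
```

Averaging over many trials, the preference fraction approximates the analytic choice probability p_ijk of eq. (13).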
The Wandering Ideal Point Model
Figure 1. Illustration of the WIP model. The ellipse represents the random subject point.
By squaring both sides of (7) and rearranging terms, we obtain that subject i prefers stimulus j to k whenever

(x_k - x_j)'y_i < (x_k'x_k - x_j'x_j)/2.  (9)
Consequently, the probability that subject i prefers object j to k is

p_ijk = Prob{(x_k - x_j)'Y_i < (x_k'x_k - x_j'x_j)/2}.  (10)
Since it follows from (6) that

(x_k - x_j)'Y_i ~ N((x_k - x_j)'μ_i, δ²_ijk),  (11)

where

δ²_ijk = (x_k - x_j)'Σ_i(x_k - x_j),  (12)
eq. (10) becomes

p_ijk = Φ([(x_k'x_k - x_j'x_j)/2 - (x_k - x_j)'μ_i]/δ_ijk),  (13)

where Φ denotes the standard normal distribution function. Equation (13) provides the general formulation of the WIP model.

2.2 Properties
It is easy to show that the WIP model is a moderate utility model. By defining

u_ij = x_j'μ_i - x_j'x_j/2,  (14)

eq. (13) can be rewritten as

p_ijk = Φ((u_ij - u_ik)/δ_ijk).  (15)
Since as a covariance matrix Σ_i is always positive (semi-)definite, δ_ijk is a (semi-)metric and eq. (15) is of the form (5). That the choice probabilities defined by the WIP model do not necessarily satisfy strong stochastic transitivity is readily demonstrated by means of a simple counterexample. Let
then p_ijk = 0.98 and p_ikl = 0.69, but p_ijl = 0.82. Figure 2, taken from De Soete et al. (1986), visualizes some of the properties of the WIP model. When the distances between x_j and μ_i and between x_k and μ_i (in the figure indicated as d_ij and d_ik respectively) are fixed, the probability that subject i prefers stimulus j to k varies as a function of the distance between x_j and x_k. This illustrates that extreme choice proportions are more likely to occur when the stimulus points are close, while distant object points are more likely to induce more moderate choice proportions.
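Eq. (13) is also straightforward to evaluate analytically. In the sketch below (parameter values are illustrative choices of mine, not those of the counterexample above), d_ij = 1 and d_ik = 2 are held fixed while the distance between j and k varies, reproducing the similarity effect of Figure 2:

```python
import numpy as np
from math import erf, sqrt

def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def p_wip(x_j, x_k, mu, sigma):
    """Eq. (13): probability that a subject with mean ideal point mu
    and covariance matrix sigma prefers stimulus j to stimulus k."""
    x_j, x_k, mu = (np.asarray(v, dtype=float) for v in (x_j, x_k, mu))
    d = x_k - x_j
    delta = sqrt(d @ np.asarray(sigma, dtype=float) @ d)  # eq. (12)
    return phi(((x_k @ x_k - x_j @ x_j) / 2.0 - d @ mu) / delta)

mu, sigma = [0.0, 0.0], np.eye(2)
x_j = [1.0, 0.0]
p_near = p_wip(x_j, [2.0, 0.0], mu, sigma)   # k close to j: extreme
p_far = p_wip(x_j, [-2.0, 0.0], mu, sigma)   # k far from j: moderate
```

Here p_near = Φ(1.5) ≈ .93 while p_far = Φ(0.5) ≈ .69, mirroring the pattern in Figure 2, and by construction p_wip(j, k) + p_wip(k, j) = 1.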
[Figure 2 plots three curves of the probability of preferring stimulus j to k against the distance between j and k, for d_ij = 1.20, 1.50, and 1.80, each with d_ik = 2.00.]
Figure 2. Probability of preferring stimulus j to k as a function of the distance between j and k for fixed d_ij and d_ik (adapted from De Soete et al., 1986).

2.3 Degrees of Freedom

The following parameters occur in the WIP model: the mean subject points μ_i, the subject covariance matrices Σ_i, and the stimulus points x_j. Thus, with N subjects and M stimuli, the WIP model has in its general form as defined in eq. (13)
(M + N)r + Nr(r + 1)/2  (16)
parameters. However, the model does not determine all these parameters uniquely. More specifically, the choice probabilities are invariant under the following family of transformations of the parameters:
a. Translation of the subject and the stimulus points: Adding the same arbitrary r-component vector to all subject and stimulus points does not affect the choice probabilities.
b. Central dilation of the subject and the stimulus points: Simultaneous transformations of the form

x_j → αx_j  (j = 1, …, M),
Y_i → αY_i  (i = 1, …, N),

where α is an arbitrary positive constant, leave the choice probabilities invariant. Note that αY_i ~ N(αμ_i, α²Σ_i).
c. Orthogonal rotation of the subject and stimulus points: Applying the same orthogonal rotation T to all stimulus and subject points does not affect the choice probabilities predicted by the model. Note that the distribution of TY_i is

TY_i ~ N(Tμ_i, TΣ_iT').
Because of these indeterminacies, we must subtract r + 1 + r(r - 1)/2 from (16) (r for the translational indeterminacy, 1 for the scale indeterminacy, and r(r - 1)/2 for the rotational indeterminacy), in order to obtain the degrees of freedom of the general WIP model:
(M + N)r + Nr(r + 1)/2 - r(r + 1)/2 - 1.  (17)
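The parameter count (16) and the degrees of freedom (17) are simple to tabulate; a small sketch (function names are mine):

```python
def wip_params(M, N, r):
    """Raw parameter count, eq. (16): (M + N) points with r
    coordinates each, plus N symmetric r x r covariance matrices."""
    return (M + N) * r + N * r * (r + 1) // 2

def wip_df(M, N, r):
    """Eq. (17): eq. (16) minus the r translational, 1 dilational,
    and r(r - 1)/2 rotational indeterminacies."""
    return wip_params(M, N, r) - r - 1 - r * (r - 1) // 2
```

For the application in Section 4 (M = 9, N = 1, r = 2) this gives 19 model degrees of freedom; against a null model with one free binomial probability per stimulus pair (36 pairs for M = 9), the difference recovers the 17 df reported there.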
2.4 Special Cases
In empirical applications, it might be interesting to impose restrictions on the general WIP model, either to reduce the number of parameters to be estimated or to verify specific hypotheses. The validity of a hypothesis
can be tested statistically by comparing the fit of the restricted model with the fit of the general model. First of all, various kinds of restrictions can be imposed on the covariance matrices of the subject points. The Σ_i can for instance be constrained to be diagonal. Due to the rotational indeterminacy mentioned earlier, setting the off-diagonal elements of the covariance matrices equal to zero only imposes real constraints on the general WIP model when N > 1. The degrees of freedom of this constrained model are

(M + 2N)r - r - 1.  (18)
Note that when N = 1, (18) equals (17). A more restrictive constraint, which is effective even when N = 1, requires all Σ_i to be identity matrices. In this case, the model has

(M + N)r - r(r + 1)/2
degrees of freedom. Besides, or in addition to, constraining the covariance matrices Σ_i, various linear constraints could be imposed on the coordinates of the stimulus points in order to relate the stimulus point locations to known characteristics of the stimuli. Similarly, the mean subject points can be related to background information on the subjects by imposing appropriate linear restrictions on the μ_i. De Soete and Carroll (1986) consider the special case where it is supposed, in analogy with the factor analysis model, that the M stimuli have r (< M) dimensions in common and that, in addition, there is a specific dimension for each stimulus. The stimulus coordinates can therefore be written as
X* = (X  I_M),

where X = (x_1, …, x_M)' contains the coordinates of the M stimuli on the r common dimensions and I_M is an identity matrix of order M. Assume that Y*_i, the (r + M)-dimensional random point representing subject i, is distributed as follows
Y*_i ~ N((μ_i', 0_{1×M})', [Σ_i  0_{r×M}; 0_{M×r}  γ_i²I_M]),

where 0_{s×t} denotes an s by t matrix filled with zeros. I.e., Y*_i is assumed
to have zero expectation and a variance of γ_i² on each specific dimension. Now, since

(x*_k - x*_j)'Y*_i ~ N((x_k - x_j)'μ_i, δ²_ijk + 2γ_i²),

the model becomes

p_ijk = Φ([(x_k'x_k - x_j'x_j)/2 - (x_k - x_j)'μ_i]/(δ²_ijk + 2γ_i²)^½).
3. Applying the WIP Model

3.1 Parameter Estimation
In order to apply the WIP model, one must have available replicated paired comparison data for one or more subjects (or groups of subjects). Maximum likelihood estimates of the model parameters can be obtained by maximizing
L = Π_{i=1}^{N} Π_{j<k} p_ijk^{n_ijk} (1 - p_ijk)^{(N_ijk - n_ijk)},  (19)
where N_ijk denotes the number of times stimulus pair (j, k) was presented to subject i and n_ijk the number of times subject i preferred j to k. De Soete et al. (1986) use a generalized Fisher scoring algorithm for maximizing log L. This amounts to iteratively applying the following updating rule until no further improvement is possible:

θ^(q+1) = θ^(q) + α^(q) I(θ^(q))^+ g(θ^(q)),  (20)

where
θ is a vector containing the parameters to be estimated, q is the iteration index, α is a stepsize parameter, and g is the gradient of log L:

g(θ) = ∂ log L/∂θ.

I(θ) is the Fisherian information matrix:

I(θ) = E[(∂ log L/∂θ)(∂ log L/∂θ)'].
The classic scoring algorithm utilizes the regular inverse of the information matrix. Because the WIP model does not determine all parameters uniquely, the information matrix is not of full rank and has no regular inverse. Therefore, following Ramsay (1980), the Moore-Penrose inverse I(θ)^+ is used.

3.2 Model Validation
One of the advantages of maximum likelihood estimation is that it enables statistical model evaluation in a straightforward way. Whenever a model ω is subsumed under a more general model Ω, the null hypothesis that ω fits the data equally well as Ω can be tested by means of the statistic

U = -2 log(L̂_ω/L̂_Ω),  (21)

where L̂_ω and L̂_Ω denote the maximum of (19) for models ω and Ω respectively. Under the null hypothesis, U asymptotically follows a chi-square distribution with degrees of freedom equal to the difference between the degrees of freedom of model Ω and the degrees of freedom of model ω. The most general model, referred to as the null model, against which the WIP model can be tested, only assumes that for each subject i and each pair of stimuli (j, k) the data are sampled from a binomial distribution with probability p_ijk. It is well known that the maximum likelihood estimate of p_ijk under this model is simply n_ijk/N_ijk. When the goodness-of-fit of two non-nested models needs to be compared, one can resort to Akaike's (1977) information criterion, which is defined for model ω as
AIC_ω = -2 log L̂_ω + 2ν_ω,  (22)

where ν_ω is the degrees of freedom of model ω. The AIC statistic is a badness-of-fit measure that corrects for the gain in goodness-of-fit due to an increased number of free parameters in the model. The model with the smallest value of the AIC statistic is considered to give the most parsimonious representation of the data.

4. Illustrative Application
As an illustrative application, we report the WIP analyses carried out by De Soete et al. (1986) on a data set gathered by Rumelhart and Greeno (1971). Rumelhart and Greeno (1971) obtained pairwise preference judgments from 234 undergraduates about nine celebrities. These celebrities consisted of three politicians (L. B. Johnson, Harold Wilson, Charles De Gaulle), three athletes (A. J. Foyt, Johnny Unitas, Carl Yastrzemski), and three movie stars (Brigitte Bardot, Sophia Loren, Elizabeth Taylor). The subjects were treated as replications of each other, so that the case N = 1 applies. Two versions of the WIP model were applied in two dimensions: the general model with a diagonal covariance matrix (which is in the case of N = 1 equivalent to using an unconstrained covariance matrix) and the WIP model with an identity matrix as covariance matrix. Both models were tested against the null model described in the previous section. The chi-square statistic for the general WIP model has 17 df and amounted to 9.8, while the chi-square statistic for testing the constrained WIP model has 19 df and amounted to 10.1. Both chi-square values are clearly nonsignificant, showing that both representations give a good account of the data. Since the WIP model with an identity covariance matrix is subsumed under the general WIP model, a likelihood ratio test can be performed to see whether the constrained model fits the data equally well as the more general model. The relevant chi-square statistic has 2 df and amounted to 0.3, which is clearly not significant. This implies that the ideal point appears to wander to an equal degree in all directions of the space. This two-dimensional solution is presented in Figure 3. As is apparent from the figure, the politicians, athletes, and movie stars clearly show up as identifiable clusters. The politicians constitute the most preferred group of celebrities, whereas the movie stars are generally preferred
to the athletes. For a further discussion of this application, and a comparison with analyses of the same data according to other models, we refer the reader to De Soete et al. (1986).
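The tests just reported can be reproduced with a few lines; the sketch below is stdlib-only (the chi-square survival function uses the standard half-integer recurrence for the regularized upper incomplete gamma function, and the function names are mine):

```python
import math

def chi2_sf(x, k):
    """P(X > x) for X ~ chi-square with k df, i.e. the regularized
    upper incomplete gamma function Q(k/2, x/2)."""
    x2 = x / 2.0
    if k % 2:                        # odd df: start from Q(1/2, x2)
        q, s = math.erfc(math.sqrt(x2)), 0.5
    else:                            # even df: start from Q(1, x2)
        q, s = math.exp(-x2), 1.0
    while s < k / 2.0:               # Q(s+1, x2) = Q(s, x2) + x2^s e^-x2 / Gamma(s+1)
        q += x2 ** s * math.exp(-x2) / math.gamma(s + 1.0)
        s += 1.0
    return q

def aic(log_likelihood, df_model):
    """Akaike's criterion: -2 log L + 2 * (number of free parameters)."""
    return -2.0 * log_likelihood + 2.0 * df_model

# Chi-square values reported in the text: 9.8 on 17 df (general WIP),
# 10.1 on 19 df (identity covariance), 0.3 on 2 df (general vs. identity).
p_general = chi2_sf(9.8, 17)
p_identity = chi2_sf(10.1, 19)
p_lr = chi2_sf(0.3, 2)               # equals exp(-0.15) for 2 df
```

All three p-values come out far above conventional significance levels, in line with the conclusions above.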
Figure 3. Representation of the Rumelhart and Greeno (1971) data according to the WIP model with identity covariance matrix.
References

Akaike, H. (1977). On entropy maximization principle. In P. R. Krishnaiah (Ed.), Applications of statistics (pp. 27-41). Amsterdam: North-Holland.
Becker, G. M., DeGroot, M. H., & Marschak, J. (1963). Probabilities of choice among very similar objects. Behavioral Science, 8, 306-311.
Bechtel, G. G. (1968). Folded and unfolded scaling from preferential comparisons. Journal of Mathematical Psychology, 5, 333-357.
Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika, 39,
324-345.
Carroll, J. D. (1980). Models and methods for multidimensional analysis of preferential choice (or other dominance) data. In E. D. Lantermann & H. Feger (Eds.), Similarity and choice (pp. 234-289). Bern: Huber.
Coombs, C. H. (1950). Psychophysical scaling without a unit of measurement. Psychological Review, 57, 145-158.
Coombs, C. H. (1958). On the use of inconsistency of preferences in psychological scaling. Journal of Experimental Psychology, 55, 1-7.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Coombs, C. H., Greenberg, M., & Zinnes, J. (1961). A double law of comparative judgment for the analysis of preferential choice and similarities data. Psychometrika, 26, 165-171.
De Soete, G., & Carroll, J. D. (1983). A maximum likelihood method for fitting the wandering vector model. Psychometrika, 48, 553-566.
De Soete, G., & Carroll, J. D. (1986). Probabilistic multidimensional choice models for representing paired comparisons data. In E. Diday et al. (Eds.), Data analysis and informatics IV (pp. 485-497). Amsterdam: North-Holland.
De Soete, G., Carroll, J. D., & DeSarbo, W. S. (1986). The wandering ideal point model: A probabilistic multidimensional unfolding model for paired comparisons data. Journal of Mathematical Psychology, 30, 28-41.
Halff, H. M. (1976). Choice theories for differentially comparable alternatives. Journal of Mathematical Psychology, 14, 244-246.
Krantz, D. H. (1967). Rational distance function for multidimensional scaling. Journal of Mathematical Psychology, 4, 226-245.
Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York: Wiley.
Ramsay, J. O. (1980). The joint analysis of direct ratings, pairwise preferences, and dissimilarities. Psychometrika, 45, 149-165.
Rumelhart, D. L., & Greeno, J. G. (1971). Similarity between stimuli: An experimental test of the Luce and Restle choice models. Journal of Mathematical Psychology, 8, 370-381.
Schonemann, P. H., & Wang, M.-M. (1972).
An individual difference model for the multidimensional analysis of preference data. Psychometrika, 37, 275-309.
Sixtl, F. (1973). Probabilistic unfolding. Psychometrika, 38, 235-248.
Sjoberg, L. (1977). Choice frequency and similarity. Scandinavian Journal of Psychology, 18, 103-115.
Sjoberg, L. (1980). Similarity and correlation. In E. D. Lantermann & H. Feger (Eds.), Similarity and choice (pp. 70-87). Bern: Huber.
Sjoberg, L., & Capozza, D. (1975). Preference and cognitive structure of Italian political parties. Italian Journal of Psychology, 2, 391-402.
Tversky, A., & Russo, J. E. (1969). Substitutability and similarity in binary choices. Journal of Mathematical Psychology, 6, 1-12.
Tversky, A., & Sattath, S. (1979). Preference trees. Psychological Review, 86, 542-573.
Wang, M.-M., Schonemann, P. H., & Rusk, J. G. (1975). A conjugate gradient algorithm for the multidimensional analysis of preference data. Multivariate Behavioral Research, 10, 45-99.
Zinnes, J. L., & Griggs, R. A. (1974). Probabilistic, multidimensional unfolding analysis. Psychometrika, 39, 327-350.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
ANALYSIS OF COVARIANCE STRUCTURES AND PROBABILISTIC BINARY CHOICE DATA

Yoshio Takane
McGill University, Canada

Pair comparison judgments are often obtained by multiple-judgment sampling, which gives rise to dependencies among observations. Analysis of covariance structures (ACOVS) provides a general methodology for taking apart between-subject and within-subject variations, thereby accounting for the dependencies among observations. In this expository paper we show how various concepts underlying ACOVS can be used in constructing probabilistic choice models that take into account systematic individual differences.
1. Introduction

Stimulus comparison presents a general paradigm in diversified fields of scientific investigation (Bradley, 1976). In bioassay, the strength of life of an organism is compared with dosage levels of a drug. In psychology, econometrics, and political science, a subjective quality of a stimulus (e.g., subjective length of a line, grayness of a color, preference toward a political candidate, etc.) is compared against that of another. In statistics, loglinear analysis of a frequency table compares the strengths with which subjects belong to certain categories. In a mental test, subjects' ability is compared against the difficulty of a test item.
The work reported in this paper has been supported by Grant A6394 to the author from the Natural Sciences and Engineering Research Council of Canada. Thanks are due to Jim Ramsay for his helpful comments on an earlier draft of this paper. This paper is a revised version of an article published in Communication & Cognition, 1987, 20, 45-62.
In each case, p_ij, the probability that stimulus i is chosen over stimulus j, indicates the degree to which stimulus i dominates stimulus j. However, there are two possible interpretations of p_ij, which closely parallel two sampling schemes of pair comparison data (Thurstone, 1927). In Case 1, replications (both within and across stimulus pairs) are made strictly within a single subject, and thus inconsistency in choice is attributed to momentary fluctuations in the internal state of the subject. The p_ij in this case represents the proportion of times stimulus i is chosen over stimulus j by the subject. In Case 2, on the other hand, the probability distribution is over a population of subjects. That is, the stochastic nature of choice is attributed to subject differences. The p_ij in this case represents the proportion of the subjects in the population who choose stimulus i over stimulus j. Despite the difference in interpretation, basically the same class of models has been used in both cases. Typically, these models assume statistical independence among observed choice probabilities. However, in Case 1 all pair comparison judgments are made by a single subject, so that there must be no sequential effects. This rules out the use of identifiable stimuli in Case 1, because of the memory effect. In Case 2, each subject is supposed to contribute one and only one observation. This usually ensures the statistical independence. On the other hand, it requires a huge number of subjects. Pair comparison experiments thus rarely use either one of these extreme sampling designs. Instead they typically employ a mixed design, in which each of a group of subjects is asked to respond to all possible pairs of stimuli. That is, replications over different stimulus pairs are obtained within subjects, and replications within stimulus pairs are obtained across subjects.
This mixed mode sampling scheme is analogous to the treatment by subject design in ANOVA and is called multiple-judgment sampling in this paper. This sampling design is especially popular in preference judgments, because researchers in this area are often interested in how preferences toward various stimuli correlate with each other, how patterns of preference are distributed in the population of subjects, and how an individual's pattern of preference can be represented in relation to others. In multiple-judgment sampling p_ij can still be interpreted as the proportion of the subjects who choose stimulus i over stimulus j, as in Case 2. However, due to within-subject replications across different
stimulus pairs, observed choice probabilities are no longer statistically independent. Systematic individual differences give rise to the dependencies among the observations. For example, a person who tends to prefer product A to B may also tend to prefer C to D. Models of pair comparisons in this case should take into account the systematic individual differences in pair comparison judgments. However, with notable exceptions (Bock & Jones, 1968, pp. 143-161; Bloxom, 1972; Takane, 1985), nearly all previous models of pair comparisons have ignored the systematic individual differences. What is needed is a general methodology for separating the systematic individual differences components in the data from strictly random components. The method particularly relevant in this context is the analysis of covariance structures (ACOVS), originally proposed by Bock and Bargmann (1966) and subsequently amplified by Jöreskog (1970). As has been demonstrated recently (Takane, 1985), the ACOVS framework can be successfully used to extend conventional Thurstonian pair comparison models to multiple-judgment sampling situations. In addition, the ACOVS framework may bring considerable richness to the analysis of pair comparison data in general. The purpose of this paper is to explore and overview this possibility.

2. Thurstonian Models of Pair Comparisons
Let us begin with a brief review of Thurstonian random utility models (Thurstone, 1927, 1959). Over the past several years there have been interesting developments in this approach (Takane, 1980; Heiser & de Leeuw, 1981; Carroll, 1980; De Soete & Carroll, 1983), which lead directly to the ACOVS formulations of these models. In Thurstone's original pair comparison model each stimulus is associated with a random variable (called a discriminal process) with prescribed distributional properties. Let Y_i represent the random variable for stimulus i. It is assumed that

Y_i ~ N(m_i, s_i²),  i = 1, …, n,  (1)

where m_i = E(Y_i) and s_i² = V(Y_i). The m_i represents the mean scale value (e.g., preference value), and s_i² the degree of uncertainty of stimulus i. When stimuli i and j are presented for comparison, random variables
corresponding to these stimuli, namely Y_i and Y_j, are generated, and the comparison is supposedly made on the realized values of the random variables at the particular time. The comparison process is supposed to take the difference between Y_i and Y_j, and either the value of Y_i - Y_j or some monotonic transformation of it is directly reported, or only its sign (whether Y_i - Y_j is positive or negative) is reported in the form of a choice (either stimulus i is chosen or stimulus j is chosen). Under the distributional assumption made above,

Y_i - Y_j ~ N(m_i - m_j, d_ij²),

where

d_ij² = s_i² + s_j² - 2s_ij

with s_ij = Cov(Y_i, Y_j). Thus the probability that stimulus i is chosen over stimulus j is given by

p_ij = Φ(q_ij),

where q_ij = (m_i - m_j)/d_ij and Φ denotes the standard normal distribution function.
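The Thurstonian choice probability described above reduces to a one-line computation; a stdlib-only sketch (function name mine), including Thurstone's Case V simplification of equal variances and zero covariances:

```python
import math

def thurstone_p(m_i, m_j, s_i2, s_j2, s_ij=0.0):
    """p_ij = Phi((m_i - m_j)/d_ij), with
    d_ij^2 = s_i^2 + s_j^2 - 2*s_ij."""
    d_ij = math.sqrt(s_i2 + s_j2 - 2.0 * s_ij)
    q_ij = (m_i - m_j) / d_ij
    return 0.5 * (1.0 + math.erf(q_ij / math.sqrt(2.0)))

# Case V: equal variances, uncorrelated discriminal processes.
p_uncorr = thurstone_p(1.0, 0.0, 1.0, 1.0)           # Phi(1/sqrt(2))
# A positive covariance shrinks d_ij and sharpens discrimination.
p_corr = thurstone_p(1.0, 0.0, 1.0, 1.0, s_ij=0.5)   # Phi(1)
```

Note how a positive covariance s_ij shrinks d_ij and therefore sharpens discrimination between the two stimuli.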
For k > 7 no tables for P_θ(Y = y) are as yet available. They will be available soon.

Table 2. Goodness of fit for the Goldberg data with and without outliers, case k = 7.
Goldberg* data (γ̂ = 1.02, θ̂ = .21, τ̂ = .86):

Y:    0      1      2      3     4     5    6    7    8    9   10
obs:  54     49     23     9     3     2    0    0    0    0    0   (total 140)
exp:  55.01  46.31  24.46  9.73  3.25  .93  .24  .05  .01
cell X²: .02  .16  .09  .06  .06  .00  .00;  X² = .37

Goldberg data (γ̂ = 1.18, θ̂ = .23, τ̂ = .84):

Y:    0      1      2      3      4     5     6    7    8    9   10
obs:  54     49     23     9      3     2     0    0    2    0    1   (total 143)
exp:  49.17  46.67  27.82  12.50  4.72  1.53  .44  .11  .02
cell X²: .47  .12  .84  .98  .20  .00  .00;  X² = 2.61
Case k = 7. For the Goldberg* data X² = .37, df = 3, .90 < p < .95. For the Goldberg data X² = 2.61, df = 3, .25 < p < .50. If we had treated the values Y = 4 and Y ≥ 5 as separate categories, we would have found
X² = 6.95, df = 4, .10 < p < .25. Hence, the Feigin and Cohen model fits the errors in the data well for k = 7 (see Table 2). In sum, the Goldberg data without the three outliers unfold into a quantitative J scale for nine items. The best qualitative J scale for ten items is consistent with this scale. The median ranking is a folded J scale and represents the prestige hierarchy of these technical occupations. The J scale is bipolar and can be interpreted as going from "working with techniques" to "working with people" (see Figure 6). The Goldberg data including the outliers do not give the same results: an extreme item flips over to the other end of the scale because of the increased level of error due to these outliers.
Figure 6. The unfolding scale for the Goldberg* data. The median ranking is a folded J scale and represents the prestige ladder for these technical occupations.
For all k that have been investigated, deviations from the perfect unfolding model can be explained by Feigin and Cohen’s model.
Unfolding and Consensus Ranking
7. Discussion
In unfolding a set of data we are searching for an underlying J scale. This J scale can be seen as a reference frame for the evaluation of stimuli. If people's rankings all unfold into the same quantitative J scale, the consensus ranking is the ranking of the median individual on the J scale. Two points come up for discussion now. The first point concerns the stability of an unfolding solution for increasing numbers of items. If the J scale is a reference frame in some domain of research, it should not vary with the number of stimuli in the analysis. That is to say: we are looking for a J scale which grows with increasing numbers of stimuli. Also, for each k the same items must be on the J scale in a constant order. If this is the case we can conclude that an underlying reference frame has been established. However, as we have seen from the two Goldberg data sets, extreme rankings or - in general - the level of error in the data may cause stimuli not to have firm positions on the J scale. If this is the case, stimuli may start flipping over from one end of the scale to the other. A related problem is that of "irrelevant stimuli". The majority decision and the consensus ranking in general are not independent of irrelevant alternatives. More specifically: by introducing irrelevant stimuli in the data, the median ranking (and the J scale) may be different for increasing k. For some k an item may crop up in the J scale and disappear again with the next larger k. If a stable continuum seems to arise from the analysis, it seems wise to ignore a J scale which contains that particular stimulus and to take a next best J scale for that value of k which is consistent with the whole set of J scales. In the previous section it was shown that the Goldberg (1976) data (excluding three outlying rankings) unfold into a nested set of J scales.
Also, errors from the perfect unfolding model could be explained by Feigin and Cohen's (1978) error model (at least for k ≤ 7). Since the occupations were ranked according to the degree of social prestige associated with each of them, the consensus ranking is interpreted as the prestige ladder for these technical occupations.
Appendix 1. E_θ(X) (first row) and E_θ(T) (second row) for selected values of θ and k.
[Table of E_θ(X) and E_θ(T) for θ = 0, .05, …, 1.00 and k = 3, …, 10; at θ = 1.00, E_θ(X) = k(k - 1)/4 and E_θ(T) = 0.]
References

Arrow, K. J. (1951). Social choice and individual values. New York: Wiley.
Black, D. (1948a). On the rationale of group decision making. Journal of Political Economy, 56, 23-34.
Black, D. (1948b). The decisions of a committee using a special majority. Econometrica, 16, 245-261.
Cohen, A., & Mallows, C. L. (1980). Analysis of ranking data. Bell Laboratories Memorandum, Murray Hill, NJ.
Coombs, C. H. (1954). Social choice and strength of preference. In R. M. Thrall, C. H. Coombs, & R. L. Davis (Eds.), Decision processes. New York: Wiley.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Davison, M. L. (1979). Testing a unidimensional, qualitative unfolding model for attitudinal or developmental data. Psychometrika, 44, 179-194.
Feigin, P. D., & Cohen, A. (1978). On a model for concordance between judges. Journal of the Royal Statistical Society, B, 40, 203-213.
Goodman, L. A. (1954). On methods of amalgamation. In R. M. Thrall, C. H. Coombs, & R. L. Davis (Eds.), Decision processes. New York: Wiley.
Goldberg, A. I. (1976). The relevance of cosmopolitan/local orientations to professional values and behaviour. Sociology of Work and Occupations, 3, 331-356.
Kemeny, J. G. (1959). Mathematics without numbers. Daedalus, 88, 577-591.
Kemeny, J. G., & Snell, J. L. (1972). Preference rankings: An axiomatic approach. Cambridge, MA: MIT Press.
Kendall, M. G. (1975). Rank correlation methods. London: Griffin.
Lehmann, E. L. (1975). Nonparametrics. New York: McGraw-Hill.
Mallows, C. L. (1957). Non-null ranking models I. Biometrika, 44, 114-130.
van Blokland-Vogelesang, R. A. W., Verbeek, A., & Eilers, P. (1987a). Iterative estimation of pattern and error parameters in a probabilistic unfolding model. In E. E. Roskam & R. Suck (Eds.), Progress in mathematical psychology I. Amsterdam: Elsevier (North-Holland).
van Blokland-Vogelesang, R. A. W. (1988). UNFOLD: A computer program for the unfolding of complete rankings of preference in one dimension. Amsterdam: Free University.
van Blokland-Vogelesang, R. A. W. (in press). Midpoint sequences, intransitive J scales and scale values in unidimensional unfolding. In E. E. Roskam & E. Degreef (Eds.), Progress in mathematical psychology II. Amsterdam: Elsevier (North-Holland).
Van der Ven, A. H. G. S. (1977). Inleiding in de schaaltheorie [Introduction to scaling theory]. Deventer: Van Loghum Slaterus.
Van Schuur, W. H. (1984). Structure in political beliefs. Doctoral thesis, University of Groningen, The Netherlands.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
UNFOLDING THE GERMAN POLITICAL PARTIES: A DESCRIPTION AND APPLICATION OF MULTIPLE UNIDIMENSIONAL UNFOLDING

Wijbrandt H. van Schuur
University of Groningen, The Netherlands

This paper discusses a number of problems with existing unfolding models and proposes a strategy of analysis to overcome these problems. This strategy assumes dichotomous or dichotomized data, and derives unfoldability criteria from information about ordered triples of stimuli. A unidimensional unfolding scale conforming to these criteria can be found for a maximal subset of stimuli. This procedure can be applied to full or partial rank orders of preference, which are dichotomized to “pick k/n” data, and to Likert-type rating scales, which are dichotomized to “pick any/n” data. This procedure is applicable to large data sets, such as survey data. As an example, the procedure is applied to preferences for five German political parties in electoral surveys in 1969, 1972, and 1980. A dominant left-right unfolding dimension is found, and violations of this representation are discussed.

1. Introduction
Coombs’ unfolding model, first presented in 1950, is regarded by many methodologists in the social sciences as an appealing model for the analysis of preferences. Introductions to unfolding appear in many textbooks on scaling, and computer programs for unfolding analysis continue to be developed. Despite favorable attention, however, reports of successful applications of unfolding are rare, and unfolding programs have only very recently found their way into some general-purpose statistical packages. This paper is a revised version of an article published in Zeitschrift für Sozialpsychologie, 1987, 18, 258-273.
There are two major reasons for the relative neglect of the unfolding model by applied social researchers. One is that most techniques for unfolding operate on full rank orders of preference, which ties unfolding to a relatively unpopular form of data collection. The other is that until now we have not been able to satisfactorily unfold imperfect data (i.e., data that do not conform perfectly to the unfolding model). In this paper I propose a new strategy for unidimensional unfolding that solves both these problems. This strategy is implemented in a computer program called MUDFOLD, for Multiple UniDimensional unFOLDing. To exemplify the technique, I present the unfolding analysis of preferences for five German political parties by German voters in 1969, 1972, and 1980.

2. Background: Problems With Existing Unfolding Models

2.1 Unfolding Analysis of Different Data Types
The tradition of using full rank orders of preference in unfolding analyses has obscured the fact that other types of data may be represented along an unfolding dimension as well. In particular, data obtained from five-point Likert-type rating scales may conform to the unfolding model. Researchers who do not realize that rating data may fit the unfolding model often subject their data to factor analysis. The use of factor analysis with data that can be unfolded gives rise to a problem, however: an extra, artificial factor will be introduced, in addition to the number of factors (i.e., dimensions) necessary for an unfolding representation (Coombs & Kao, 1960; Ross & Cliff, 1964). Such factor analysis results may then lead to interpretations of the data different from those that an unfolding analysis would suggest. However, researchers who try to unfold rating data with the currently available unfolding models often get degenerate results because their data contain many ties. The unfolding model presented below is capable of analyzing rating data without the problem of degeneracy. An essential aspect of this model is that data are dichotomized, e.g., into “preferred” and “not preferred” response alternatives. Dichotomization allows rating data and a large number of other data types, including full and partial rank orders, to be used in unfolding analysis. It is a desirable technique also for additional reasons, as will be discussed shortly.
2.2 Unfolding Analysis of Imperfect Data

When a data set is not perfectly unfoldable, its imperfections can be attributed either to random noise or to systematic deviations from the unidimensional unfolding model. Random noise can be handled by using a stochastic rather than a deterministic model. Stochastic models have been discussed by Sixtl (1973), Zinnes and Griggs (1974), Bechtel (1976), Dijkstra et al. (1980), and Jansen (1983), among others. All of these models are unsatisfactory in certain ways. Some are designed to be used only in a confirmatory way, i.e., to test whether a known order of all stimuli can be interpreted as a J-scale. Others require repeated questioning of subjects to obtain estimates of the probability with which they prefer one stimulus over the other. Still others depend on assumptions that are probably incorrect; for example, especially problematic is the assumption that if subjects are given a choice between two stimuli that lie close together on the J-scale but far away from their ideal point, they will almost deterministically prefer the one closer to their ideal point. However, they will probably prefer both to approximately the same (low) degree. Despite these difficulties with stochastic unfolding models, the strategy of relaxing the criterion for perfect representation to allow stochastic representation seems advantageous. I return to this when I propose a new strategy for unidimensional unfolding. Systematic deviations from the unidimensional unfolding model can be explained in at least four ways. According to one interpretation, respondents begin the task of picking preferred items by using the most salient common criterion, but in the course of evaluating stimuli that are less preferred, they bring other, more idiosyncratic criteria into play.
According to a second explanation, the preference judgment process is multidimensional rather than unidimensional, i.e., two or more criteria for preference play an independent but simultaneous role in all preference judgments for all stimuli. Thirdly, the set of stimuli may not be homogeneous with respect to the latent unfolding dimension; that is, one or more of the stimuli are indicators of a different latent trait. Finally, the set of subjects may not be homogeneous with respect to the latent unfolding dimension: they may either use different dimensions, or they may perceive the stimuli differently on the same dimension.
Let us look at these problems in more detail and consider some possible strategies for dealing with them.
2.2.1 Analyzing Dichotomous Data: “pick any/n” and “pick k/n” Analysis

The unfolding model assumes that successively chosen stimuli are decreasingly good substitutes for the subject’s ideal stimulus according to the criterion used for selection. However, in the course of giving a full rank order of preference for n stimuli, a subject may begin to use other criteria for choosing that are different from the criterion with which he or she started out. Coombs (1964) talked about the “portfolio” model in this connection, and Tversky (1972) and Tversky and Sattath (1979) suggested an “Elimination by Aspects” (EBA) model, in which different criteria for preference are hierarchically ordered. To deal with this problem in such a way that we can still find the dominant criterion that is used by all subjects, we should restrict ourselves to distinguishing only the first few most preferred stimuli from the remaining ones: otherwise we risk introducing idiosyncratic noise. Distinguishing the first few most preferred stimuli from the remaining ones can be done by dichotomizing the preference responses of each subject (see Leik & Matthews, 1968; Coombs & Smith, 1973; and Davison, 1980, among others). This is accomplished by assigning the code “1” to each subject’s most preferred stimuli and “0” to the remaining stimuli. The cutoff point between preferred and non-preferred stimuli depends on the type of data. In the case of Likert-scale items one or more response categories (e.g., “strongly in favor”) can be considered as the “preferred response” and the remaining categories as “non-preferred responses”. Since a subject can give the “preferred response” to any number of Likert-scale items, he can pick any of the n stimuli as “most preferred”. In the case of full or partial rank orders of preference, however, the researcher generally has to decide which k (k ≥ 2) most preferred stimuli will be distinguished from the remaining ones.
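As a concrete illustration of this dichotomization step, the short sketch below codes a full rank order of preference into “pick k/n” data. It is a minimal illustration, not part of the MUDFOLD program; the stimulus labels and the value k = 2 are made up for the example.

```python
def pick_k(rank_order, all_stimuli, k):
    """Dichotomize a full rank order into 'pick k/n' data:
    code the k most preferred stimuli 1, the remaining ones 0."""
    preferred = set(rank_order[:k])
    return [1 if s in preferred else 0 for s in all_stimuli]

# A subject who ranks C > B > D > A yields, for k = 2, this pattern
# over the stimuli listed in the fixed order A, B, C, D:
print(pick_k(["C", "B", "D", "A"], ["A", "B", "C", "D"], k=2))  # [0, 1, 1, 0]
```

For Likert-type (“pick any/n”) data the same 0/1 coding arises directly from the chosen cutoff category, so no value of k needs to be fixed in advance.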
The unfolding analysis of dichotomous data has been called “parallelogram analysis” by Coombs (1964): a data matrix of subjects and stimuli, ordered according to their position on a perfect unidimensional unfolding scale, shows a parallelogram pattern of “1”s from top left to bottom right.
Using dichotomous data has both advantages and disadvantages. An advantage is that a large number of different data types (including full and partial rank orders of preference, Likert-type rating scales, and roll call data) can all be subjected to such an analysis; all that is needed is that the most preferred stimuli can be distinguished from the others. A disadvantage is that, in contrast to the unfolding analysis of full rank orders of preference, the unfolding analysis of dichotomous data only leads to a qualitative J-scale. This means that no metric information about the relative distances between the stimuli is available and therefore that subjects cannot be discriminated as well as in a quantitative J-scale. However, Davison (1979) has argued convincingly that it is in any event unlikely that a single quantitative J-scale can be found for a large group of subjects in practical applications of unfolding analysis, because subjects often use different subjective metrics.
2.2.2 Multidimensional Unfolding

Multidimensional unfolding models assume that subjects do not use a single criterion in making their preference choices, but rather use two, three, or even more independent criteria simultaneously. Multidimensional models have been proposed by Bennett and Hays (1960), Roskam (1968), Schönemann (1970), Carroll (1972), Gold (1973), Kruskal, Young, and Seery (1973), and Heiser (1981), among others. Multidimensional unfolding models are appealing in part because there are various ways for combining the different dimensions, for example, the vector model and the weighted distance model (e.g., Carroll, 1972). However, disadvantages are that technical problems of degeneracy and the representation of I-scales as points in essentially open isotonic regions are more likely to arise in doing multidimensional than unidimensional unfolding. Also problematic is the assumption that different criteria for preference (i.e., more than one dimension) are used simultaneously and independently, and that they are relevant for each stimulus and each subject. Proponents of multidimensional unfolding insist that reality is multidimensional: e.g., a chair has a color, a weight, and a size; a person has an age, a sex, and a preference for certain goods; and a political party may be large, religious, and right wing. Still, subjects often do not evaluate items on the basis of all possible attributes at once. Often they compare them with respect to one attribute only, e.g., sizes of chairs, ages of
subjects, and ideological positions of political parties. There may be instances in which a multidimensional model is indeed the best one. But the relative merit of multidimensional versus unidimensional unfolding models in particular situations should be determined empirically.
2.2.3 Selecting a Maximal Subset: of Stimuli or of Subjects?

It is an established practice in (multidimensional) unfolding analysis to assign stress values to subjects. This practice reflects the assumption that difficulty in finding a representation can be explained by reference to subjects who apparently used criteria other than the overall dominant one(s), or who perhaps even behaved randomly. A possible procedure for reducing imperfection in one’s data is thus to delete subjects whose stress values are too high. However, high stress values may arise because one or more stimuli do not belong in the same universe of content along with the other stimuli, and therefore cannot be adequately incorporated into the same representation. For unfolding to apply, subjects should differ in their preferences for the stimuli, but they should agree about the cognitive aspects of the stimuli: whether gentlemen prefer blondes or brunettes is a different matter from establishing whether Marilyn is blond or brown. If there is disagreement among the subjects about the characteristics of a stimulus, differences in preference will be difficult to represent; such a stimulus can better be deleted from an unfolding scale. There are reasons for preferring the deletion of stimuli to the deletion of subjects from an unfolding scale. Subjects are often selected as representatives of a larger population. Deleting subjects therefore lowers the likelihood that the result will generalize successfully from a sample to a population. Stimuli, in contrast, are rarely a random sample from a population of stimuli, but rather are intended to serve as the best and most prototypical indicators of a latent trait. In other words, we are often less interested in the actual stimuli than in their potential for allowing us to measure subjects along a latent trait. This means that the deletion of stimuli can generally be defended more easily than the deletion of subjects.
Regardless of whether stimuli or subjects are deleted, an explanation for their nonscalability is called for. For stimuli this is especially true in the case that they constitute an entire population, such as all political
parties of a country. The nonscalability of certain stimuli in one dimension may mean that a less parsimonious spatial (multidimensional), or a discrete (cluster, or tree) representation is needed instead of a unidimensional unfolding representation. Alternatively, different well-specified groups of subjects may consistently use different criteria in judging a set of stimuli. Explaining why certain subjects are difficult to represent on an unfolding dimension or in an unfolding space is generally even more difficult than explaining why certain stimuli do not fit. Such explanations are virtually nonexistent in the applied unfolding literature.
3. A Proposed Strategy for Unidimensional Unfolding

The unidimensional unfolding model proposed in this paper is based on a combination of three of the strategies discussed above for dealing with data that are not perfectly unfoldable. It allows for a stochastic representation of a maximal subset of stimuli and all subjects in one dimension, using only the highest preference judgments of each subject. Subjects’ most preferred stimuli are distinguished from the remaining ones in a dichotomous way. The approach used here to find an unfolding scale is a form of hierarchical cluster analysis. The optimal smallest unfolding scale is first found and this is then extended with additional stimuli, for as long as the stimuli jointly continue to satisfy the criteria for an unfolding scale. If no more stimuli can be added to the p-stimulus unfolding scale, the procedure begins again by selecting the optimal smallest unfolding scale among the remaining n - p stimuli. The process by which more than one maximal subset of unidimensionally unfoldable stimuli can be found in a given pool of stimuli is called “multiple scaling”.
3.1 The Concept of “Error”

We generally do not know in advance which stimuli are representable in an unfolding scale, much less the order in which they are representable. The smallest unfolding scale consists of three stimuli, since it takes at least three stimuli to falsify a proposed proximity relation. For the unfolding scale of the ordered triple ABC, the response pattern in which A and C are preferred but B is not is defined as the “error pattern” for that triple of stimuli. Since part of our analysis is to establish the order in which the stimuli form an unfolding scale, we must consider all three
permutations in which each of the three stimuli is the middle one: BAC, ABC, and ACB (a reflection of the scale is an admissible transformation). For the triple consisting of the stimuli A, B, and C in this order, the response pattern 101 would be the error pattern for an unfolding scale ABC, 110 for the scale ACB, and 011 for the scale BAC. The amount of error in an individual response pattern to more than three stimuli in a proposed scale order is defined as the number of proximity relations in that pattern that violate the unfolding model, i.e., the number of triples that contain the error pattern. For example, the pattern (ABCD, 0101) contains one error, namely in the triple BCD; the pattern (ABCD, 1011) contains two errors: in the triples ABC and ABD; the pattern (ABCDEFG, 1110111) contains nine errors: in the triples ADE, ADF, ADG, BDE, BDF, BDG, CDE, CDF, and CDG; and the pattern (ABCDEFG, 1011111) contains five errors: in the triples ABC, ABD, ABE, ABF, and ABG. The amount of error in a data set being evaluated for its fit with a candidate unfolding scale is defined as the sum of errors over the response patterns of all subjects. This figure can be calculated by summing the number of errors in each triple of stimuli first over all subjects, and then over all triples of stimuli. The number of errors in a data set can be calculated for each candidate unfolding scale, i.e., for each set of three or more stimuli in each of their possible permutations.
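This error count is easy to compute mechanically. The sketch below (a minimal illustration, not the MUDFOLD implementation) counts, for a response pattern written in the candidate scale order, the number of triples that show the 101 error pattern; it reproduces the counts from the examples above.

```python
from itertools import combinations

def count_errors(pattern):
    """Count the triples i < j < k responding 1, 0, 1 -- the unfolding
    'error pattern' 101 -- in a dichotomous response pattern given in
    the candidate scale order."""
    bits = [int(c) for c in pattern]
    return sum(1 for i, j, k in combinations(range(len(bits)), 3)
               if bits[i] == 1 and bits[j] == 0 and bits[k] == 1)

# The counts from the text:
print(count_errors("0101"))     # 1 (triple BCD)
print(count_errors("1011"))     # 2 (triples ABC and ABD)
print(count_errors("1110111"))  # 9
print(count_errors("1011111"))  # 5
```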
3.2 Calculating the Expected Number of Errors

The number of errors found in a candidate unfolding scale must be compared with the number of errors that would be expected under statistical independence, i.e., under the assumption that a subject’s preferences for the stimuli are completely unrelated. In this “null model” it is assumed that subjects do not differ systematically from each other in their probability of giving a positive preference response to the stimuli. When subjects are free to select as many stimuli as they wish as their most preferred (the “pick any/n” situation), the expected frequency with which a given set of stimuli is preferred is the product of the relative frequencies with which each of the stimuli is preferred times the number of subjects:
Exp.Freq.(ijk, 101) = p_i * (1 - p_j) * p_k * N,
where p_i is the relative frequency with which stimulus i is “picked” and N is the number of subjects. In the case of “pick k/n” data, calculating the expected number of errors under statistical independence is a two-step procedure. This is first explained for “pick 3/n” data, and then generalized. a. The expected frequency of the “111”-response patterns is first determined by applying the n-way quasi-independence model (e.g., Bishop, Fienberg, & Holland, 1975). b. From the expected frequency of the “111”-response to each triple the expected frequency of other response patterns (e.g., 011, 101, or 110) is deduced. Ad a. In a data matrix in which each of the N subjects picks three of the n stimuli as most preferred, we can find the relative frequency p_i with which each stimulus is picked. Under the statistical independence model these p_i’s arise from the addition of the expected frequency of triples (i, j, k) for all combinations of j and k with a fixed i. The expected frequency of triple i, j, k (i.e., a_ijk) is the product of the item parameters f_i, f_j, and f_k times a general scaling factor f, without interaction effects:

a_ijk = f * f_i * f_j * f_k.
The values for f and each f_i are found iteratively. This procedure, first described by Davison (1980), was developed by Van Schuur and Molenaar (1982). The details of this procedure are given in Van Schuur (1984). Ad b. Once the expected frequency of the “111”-pattern of all triples is known, the expected frequency of the other response patterns can be found, assuming that each subject picked exactly three stimuli as most preferred. For example: consider the situation in which subjects pick three out of five stimuli A, B, C, D, and E. For the unfolding scale ABC, the error pattern is the one in which stimuli A and C are picked, but B is not. Since exactly three stimuli were picked, either D or E must have been picked in addition to A and C. We can therefore calculate the expected frequency across all subjects of the error response pattern for the triple (ABC, 101) by summing the frequency of the expected “111”-patterns for
the triples ACD and ACE. In general:

Exp.Freq.(ijk, 101) = f * f_i * f_k * Σ_{s ≠ i,j,k} f_s.
This procedure can be easily generalized to the “pick k/n” case, where k = 2 or where k > 3. First we find the expected frequency of each k-tuple, their number ranging between 1 and (n choose k). Second, we calculate the expected frequency of the error response pattern of an unfolding scale of three stimuli, as follows:

Exp.Freq.(ijk, 101) = f * f_i * f_k * Q,

where Q is the sum over all (n-3 choose k-2) (k-2)-tuples of the product of their f_s’s, where s is not equal to i, j, or k.

3.3 A Coefficient of Scalability

Once we know for a triple of stimuli in a particular permutation both the frequency of the error response observed, Obs.Freq.(ijk, 101), and the frequency expected under statistical independence, Exp.Freq.(ijk, 101), a coefficient of scalability can be defined analogous to Loevinger’s H (cf. Mokken, 1971, who uses Loevinger’s H for multiple unidimensional cumulative scale analysis):

H(ijk) = 1 - Obs.Freq.(ijk, 101) / Exp.Freq.(ijk, 101).

For each triple of stimuli (i, j, and k), three coefficients of scalability can be found: H(jik), H(ijk), and H(ikj). Perfect scalability is defined as H = 1. This means that no error is observed. When H = 0 the amount of error observed is equal to the amount of error expected under statistical independence. The scalability of a (candidate) unfolding scale of more than three stimuli can also be evaluated. In this case we simply calculate the sum of the error responses to all relevant triples of the scale for both the observed and expected error frequency, and then compare them, using the coefficient of scalability H:
H = 1 - [Σ Obs.Freq.(ijk, 101)] / [Σ Exp.Freq.(ijk, 101)],

where both sums run over all (p choose 3) triples ijk of the p-stimulus scale.
The scalability of individual stimuli in the scale can also be evaluated. This is done for each stimulus separately by adding up the frequencies of the error patterns, observed and expected, in only those triples that contain the stimulus, and then comparing these frequencies by using the coefficient of scalability H.
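For the “pick any/n” case, the observed error frequency, its expectation under the null model, and H(ijk) can be computed directly from a subjects-by-stimuli 0/1 matrix. The sketch below is a minimal illustration of the formulas above with made-up data, not the MUDFOLD program itself.

```python
def h_triple(data, i, j, k):
    """H(ijk) = 1 - Obs.Freq.(ijk, 101) / Exp.Freq.(ijk, 101), with
    Exp.Freq.(ijk, 101) = p_i * (1 - p_j) * p_k * N under statistical
    independence ('pick any/n' data; rows are subjects, columns stimuli)."""
    n = len(data)
    obs = sum(1 for row in data if row[i] and not row[j] and row[k])
    p = [sum(row[s] for row in data) / n for s in (i, j, k)]
    exp = p[0] * (1 - p[1]) * p[2] * n
    return 1 - obs / exp

# Ten fictitious subjects, stimuli A, B, C in the candidate order ABC;
# exactly one subject shows the error pattern 101.
data = [[1, 1, 0], [1, 1, 0], [0, 1, 1], [0, 1, 1], [1, 0, 1],
        [1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 0, 0]]
print(round(h_triple(data, 0, 1, 2), 2))  # 0.33
```

Here p_A = 0.5, p_B = 0.4, p_C = 0.5, so the expected error frequency is 0.5 * 0.6 * 0.5 * 10 = 1.5 against one observed error, giving H = 1 - 1/1.5 ≈ 0.33.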
3.4 The Search Procedure for an Unfolding Scale

After obtaining all the information needed for calculating the coefficients of scalability of each triple of stimuli in each of its three essentially different permutations (i.e., Obs.Freq.(ijk, 101), Exp.Freq.(ijk, 101), and H(ijk)), we can begin to construct an unfolding scale. This is done in two steps. First the best elementary scale (the “best triple”) is found, and second, new stimuli are added in one by one to the existing scale. The best elementary scale is defined as the triple of stimuli that conforms best to the following criteria: Its scalability value should be positive in only one of its three permutations. This guarantees that the best triple has a unique order of representation. Its scalability value is higher than some user-specified lower boundary. This helps to ensure that the scale will be interpretable in a substantively relevant way. In practical applications a lower boundary of H > 0.30 is suggested as a rule of thumb; this value is modeled on Mokken’s (1971) approach to cumulative scaling.
If more than one triple satisfies the first two criteria, we select that triple with the highest absolute frequency of the sum of the perfect patterns that contain at least two of the three stimuli. Each triple contains eight response patterns, one of which (101) is the error pattern, and four of which (000, 100, 010, and 001) are not very informative about preferences for sets of stimuli. The high frequency of occurrence of the patterns 111, 011, and 110 guarantees the
representability of the largest group of respondents for the elementary scale. Once the best elementary scale is found, each of the remaining n - 3 stimuli is investigated to see whether it might make the best fourth stimulus. The best fourth stimulus (e.g., D) may be added to the best triple (e.g., ABC) in any of four positions: DABC, ADBC, ABDC, or ABCD. These places are denoted as place 1 to place 4. The best fourth - or, more generally, p+1’th - stimulus has to meet the following criteria to be included in a p-stimulus unfolding scale:

1. All new (p choose 2) triples that include the p+1’th stimulus and two stimuli from the existing p-stimulus scale have to have a positive H(ijk)-value. This guarantees that all stimuli are homogeneous with respect to the latent dimension.

2. The p+1’th stimulus should be uniquely representable, i.e., it can be positioned in only one of the p+1 possible places in the p-stimulus scale. This helps to ensure the later usefulness and interpretability of the order of the stimuli in the scale.

3. The H_i-value of the p+1’th stimulus, as well as the H-value of the scale as a whole, have to be higher than some user-specified lower boundary (see second criterion for the best triple). Actually, adding a stimulus to a scale may even increase the H-value of the scale as a whole, depending on the scalability quality of the triples that are added to the scale.

4. If more than one stimulus conforms to the criteria mentioned above, that stimulus will be selected that leads to the highest scalability for the scale as a whole.
This procedure of extending a scale with an additional stimulus is repeated as long as the criteria mentioned above are satisfied. When no further stimulus conforms to the criteria, the p-stimulus scale is taken as a maximal subset of scalable stimuli. This maximal subset can then be further evaluated as an unfolding scale with additional goodness-of-fit criteria.
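The two-step search can be sketched in code as follows. This is a simplified illustration of the strategy under “pick any/n” data and the null model above, not the MUDFOLD program itself: the uniqueness-of-representation checks and the tie-breaking rules from the text are omitted, and the lower boundary of 0.30 is the rule of thumb mentioned earlier.

```python
from itertools import combinations

def scale_h(data, order):
    """Scale H: 1 minus the ratio of observed to expected 101-error
    frequencies, summed over all triples of the candidate order."""
    n = len(data)
    p = [sum(row[s] for row in data) / n for s in range(len(data[0]))]
    obs = exp = 0.0
    for a, b, c in combinations(order, 3):
        obs += sum(1 for r in data if r[a] and not r[b] and r[c])
        exp += p[a] * (1 - p[b]) * p[c] * n
    return 1 - obs / exp

def search(data, lower=0.30):
    """Greedy search: best elementary triple, then repeated extension."""
    m = len(data[0])
    # Step 1: best triple over the three essentially different permutations.
    triples = [perm for t in combinations(range(m), 3)
               for perm in (t, (t[1], t[0], t[2]), (t[0], t[2], t[1]))]
    best = max(triples, key=lambda o: scale_h(data, list(o)))
    if scale_h(data, list(best)) < lower:
        return None                       # no acceptable elementary scale
    scale = list(best)
    # Step 2: keep adding the stimulus/position with the highest scale H,
    # as long as the scale H stays above the lower boundary.
    while True:
        candidates = [(scale_h(data, scale[:pos] + [s] + scale[pos:]),
                       scale[:pos] + [s] + scale[pos:])
                      for s in set(range(m)) - set(scale)
                      for pos in range(len(scale) + 1)]
        candidates = [c for c in candidates if c[0] >= lower]
        if not candidates:
            return scale                  # maximal subset reached
        scale = max(candidates)[1]

# Perfect parallelogram data for the scale order A, B, C, D (columns 0-3):
perfect = [[1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1], [1,1,0,0],
           [0,1,1,0], [0,0,1,1], [1,1,1,0], [0,1,1,1]]
print(search(perfect))  # [0, 1, 2, 3]
```

With these error-free data the recovered order is the generating order (its reflection would be equally admissible; the deterministic tie-breaking here happens to return the identity order).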
3.5 Maximizing Perfection or Minimizing Error?

The search procedure of finding an unfolding scale is based on identifying a maximal subset of stimuli that contains the smallest proportion of errors in its triples. An alternative procedure might be to find a maximal subset of stimuli that contains the largest proportion of perfect patterns among all of its patterns (e.g., Davison, 1980). If we had applied this procedure we would have been interested in the extent to which the number of perfect patterns found exceeds the frequency of perfect patterns to be expected under statistical independence. We should not accept a set of stimuli as a scale if the observed frequency of perfect patterns is no more than can be explained by assuming statistically independent responses. Observed and expected frequencies of perfect patterns can also be compared by applying Loevinger’s coefficient of homogeneity. For a “pick 3/n” analysis this becomes:

H = 1 - Obs.Freq.(ijk, 111) / Exp.Freq.(ijk, 111),
where Obs.Freq.(ijk, 111) and Exp.Freq.(ijk, 111) are counted and calculated, respectively, in the same way as the error response patterns. Perfect response patterns - especially the “111”-responses to adjacent stimuli - should occur more frequently than expected under statistical independence, and should have a negative H-value, whereas imperfect patterns, that is, “111”-responses to non-adjacent stimuli, should occur less often than expected under statistical independence and should have a positive H-value. There are at least two problems in using the frequency of perfect patterns and the H(ijk, 111)-coefficients to find an unfoldable subset of stimuli. One problem is the difficulty in finding a “best” or even unique ordering of stimuli. Whereas the unique ordering of a “best” triple of stimuli follows from the (non)occurrence of errors in the three permutations, no unique ordering of the stimuli is implied in the H(ijk, 111)-coefficients. A more important problem is that with this procedure the evaluation of a set of stimuli as an unfolding scale cannot be based on the error patterns of all triples, but only on the set of (perfect) patterns of adjacent triples. In the “pick k/n” case this involves only the evaluation of n - k + 1 patterns, whereas in the procedure I am advocating all
triples are considered. Evaluating the frequencies of the perfect patterns will therefore only be used heuristically at the end of the scaling procedure to help in considering other possible start sets for the search procedure described above, and in evaluating a hypothesized unfolding scale.

3.6 The Dominance Matrix and the Adjacency Matrix

The use of the coefficient of scalability as a test for the goodness-of-fit of a candidate unfolding scale can be criticized on grounds that the coefficient is not specifically tuned to the unfolding model: a good fit can be obtained for data that conform either to the unfolding model or to Guttman’s cumulative scaling model. Although this criticism is justified for the “pick any/n” case, its force can be reduced by subjecting the dominance matrix and the adjacency matrix of the unfoldable stimuli, in their scale order, to visual inspection. The dominance matrix is a square asymmetric matrix whose cells (i, j) display the percentage of subjects who prefer stimulus i but not stimulus j. When the stimuli are ordered in their scale order along the J-scale, then for each stimulus i the percentage p_ij should decrease from the first column toward the diagonal and increase from the diagonal toward the last column. The adjacency matrix is a lower triangle whose cells (i, j) show the percentage of subjects who “picked” both i and j. When the stimuli are ordered in their scale order along the J-scale, then for each stimulus i the percentage p_ij should increase from the first column toward the diagonal and decrease from the diagonal toward the last row. The procedure for detecting stimuli that disturb the expected pattern of characteristic monotonicity is analogous to the procedure Mokken (1971) used in multiple unidimensional cumulative scaling. Table 1 shows the dominance matrix and the adjacency matrix for a perfect unidimensional unfolding data set. Note that in the dominance matrix no column-wise monotonicity is expected.
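Both matrices are straightforward to tabulate from dichotomous data once the stimuli are placed in their candidate scale order. A minimal sketch (proportions rather than percentages; the illustrative data are a perfect parallelogram pattern, not taken from the chapter's surveys):

```python
def dominance(data):
    """Cell (i, j): proportion of subjects who pick stimulus i but not j."""
    n, m = len(data), len(data[0])
    return [[sum(1 for r in data if r[i] and not r[j]) / n
             for j in range(m)] for i in range(m)]

def adjacency(data):
    """Cell (i, j): proportion of subjects who pick both i and j."""
    n, m = len(data), len(data[0])
    return [[sum(1 for r in data if r[i] and r[j]) / n
             for j in range(m)] for i in range(m)]

# Perfect data for the scale order A, B, C, D:
perfect = [[1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1], [1,1,0,0],
           [0,1,1,0], [0,0,1,1], [1,1,1,0], [0,1,1,1]]
dom, adj = dominance(perfect), adjacency(perfect)
# Row C of the dominance matrix decreases toward the diagonal and
# increases after it, as characteristic monotonicity requires:
print([round(x, 2) for x in dom[2]])  # [0.44, 0.22, 0.0, 0.33]
print(adj[3][0])                      # 0.0 -- A and D never picked together
```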
If stimuli form a cumulative scale rather than an unfolding scale, the monotonicity patterns of the dominance matrix of stimuli will differ from those just described, in that they will not reverse around the diagonal. An important difference between the use of the coefficients of scalability and the use of the dominance and adjacency matrices must be
Table 1. Dominance Matrix and Adjacency Matrix for a Perfect Four-Stimulus Unfolding Scale.

Data matrix

    A B C D   Frequency
    1 0 0 0   p
    0 1 0 0   q
    0 0 1 0   r
    0 0 0 1   s
    1 1 0 0   t
    0 1 1 0   u
    0 0 1 1   v
    1 1 1 0   w
    0 1 1 1   x

Dominance matrix

         A          B       C      D
    A    -          p       p+t    p+t+w
    B    q+u+x      -       q+t    q+t+u+w
    C    r+u+v+x    r+v     -      r+u+w
    D    s+v+x      s+v     s      -

Adjacency matrix

         A      B        C
    B    t+w
    C    w      u+w+x
    D    0      x        v+x
mentioned here. The coefficients of scalability reflect the relative number of errors, whereas the matrices reflect the absolute number of errors. Dijkstra et al. (1980) have already shown that the characteristic monotonicity requirement is not a sufficient condition for a set of stimuli to be interpreted as an unfolding scale. They give a counterexample in which a perfect characteristically monotone dominance matrix was derived from I-scales that did not belong to the same J-scale. Looking at the pattern of absolute frequencies only and disregarding the information from the H-coefficients may therefore lead to unjustified acceptance of an unfolding
214
van Schuur
scale.
3.7 Scale Values

Once an unfolding scale of a maximal subset of stimuli has been found, scale values for stimuli and subjects can be determined. The scale value of a stimulus is defined as its rank number in the unfolding scale. The scale value of a subject is defined as the mean of the scale values of the stimuli the subject “picked” as most preferred. Subjects who did not pick any stimulus from the scale cannot be given a scale value and have to be treated as missing data.
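A minimal sketch of these two definitions (hypothetical names; MUDFOLD computes these internally):

```python
def subject_scale_value(scale_order, picked):
    """Scale value of a subject: mean of the rank numbers (1-based) of the
    scale stimuli the subject picked; None marks a missing value."""
    ranks = [scale_order.index(s) + 1 for s in picked if s in scale_order]
    return sum(ranks) / len(ranks) if ranks else None
```

For example, a subject who picked the second and third stimuli of a four-stimulus scale receives scale value 2.5; a subject who picked none of the scale stimuli is treated as missing.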
4. An Application to Preference for German Political Parties

Transitive rank orders of preference were derived from pairwise preference comparisons for five German parties by German voters in 1969 (N = 907) and in 1980 (N = 1316). Full rank orders of preference given in 1972 were obtained directly from a random sample of 1785 German voters (the data are published in Pappi (1983); Norpoth (1979a, 1979b) also discusses the 1969 and 1972 data). The MUDFOLD model will be applied to a “pick 2/5” and a “pick 3/5” analysis of these three data sets. The parties are denoted by the capital letters A-E as follows: A: CDU/CSU (Christian democrats); B: SPD (social democrats); C: FDP (liberals); D: NPD (neo-national socialists); E: DKP (communists). The scalability values of each triple of all stimuli in each permutation, as well as the dominance and adjacency matrices, are given in the Appendix in Table 2 through Table 7. For each permutation of a triple i, j, k (e.g., jik, ijk, and ikj) the observed and expected frequencies of the error patterns are given (i.e., the patterns ijk,011, ijk,101, and ijk,110, which are the error patterns of the scales jik, ijk, and ikj, respectively), as well as their appropriate H-values. Expected frequencies are rounded to the nearest integer. On the basis of this information an unfolding scale of a maximal subset of stimuli is constructed. In the “pick 3/5” analyses, the observed and expected frequencies of the “ijk,111” patterns are also given, together with the matching H-value. The dominance matrix contains the percentage of subjects who “pick” the row party but not the column party among the most preferred. The adjacency matrix contains the percentage of subjects who “pick” both the row party and the column
party among the most preferred.
4.1 A “Pick 2/5” Analysis of the 1969 Data

Five of the ten triples have a positive and high enough coefficient of scalability (i.e., > 0.30) in only one of the three possible permutations; that is, they are “unique” triples:
BAD: SPD - CDU - NPD (H = 0.71)
EBA: DKP - SPD - CDU (H = 0.80)
CAD: FDP - CDU - NPD (H = 0.66)
ADE: CDU - NPD - DKP (H = 0.80)
BED: SPD - DKP - NPD (H = 0.71)
However, it is impossible to construct a scale of more than three stimuli. The three major parties (ABC, or: CDU, SPD, and FDP) have a negative H-value in all three permutations, which means that they cannot be represented together in one unidimensional unfolding scale. Moreover, the scalability value of a larger scale containing all three major parties would be very low. Finally, the position of the DKP (stimulus E) - either to the left of the SPD (stimulus B), or close to the NPD (stimulus D) - cannot be uniquely determined. The five three-item scales can be interpreted either in terms of a left-right dimension (the first three) or in terms of a government-opposition dimension (the last two). On the basis of the dominance and adjacency matrices an unfolding order of the stimuli SPD - CDU - FDP - NPD - DKP might have been expected. However, the “unique” triple CAD (FDP - CDU - NPD), which as we have already seen has an acceptably high coefficient of scalability, violates this order. The fact that the triple with the three major parties cannot be unfolded suggests that the German voters did not all use the same criterion for preference for the five political parties.
4.2 A “Pick 3/5” Analysis of the 1969 Data
The best candidate starting triple is the only “unique” ordered triple, BCE, that has an H-value larger than 0.30, namely 0.49. Since the triple BCD has negative H-values in all three permutations, the scale BCE cannot be extended with stimulus D. If we follow the strict procedure, the best triple cannot be extended with stimulus A either, since A can be represented in two places in the scale: position 1 (giving scale ABCE), or
position 2 (giving scale BACE). Moreover, in both positions the scalability value of stimulus A, H(A), falls slightly below the user-specified lower boundary of 0.30. And even if we are willing to accept stimulus A in the unfolding scale, it is difficult to choose between these two positions on the basis of the monotonicity patterns in the dominance and adjacency matrices. However, if we relax the criterion of unique representability to allow stimulus A to be represented in the position that gives the highest overall H-value, then it will be represented in scale BACE (SPD - CDU - FDP - DKP). This scale can probably best be interpreted in terms of a “government-opposition” dimension: the “Great Coalition” between SPD and CDU governed the Federal Republic until 1969. The scale is rather weak, however.

   Scale ABCE    pi    Hi          Scale BACE    pi    Hi
   A: CDU       0.98  0.28         B: SPD       0.97  0.35
   B: SPD       0.97  0.32         A: CDU       0.98  0.29
   C: FDP       0.94  0.32         C: FDP       0.94  0.32
   E: DKP       0.02  0.34         E: DKP       0.02  0.37
                  H = 0.32                        H = 0.33
where pi is the proportion of subjects who “pick” stimulus i as most preferred, and Hi is the coefficient of scalability for item i.
4.3 A “Pick 2/5” Analysis of the 1972 Data

The best triple among the “unique” triples ADE, CBE, and BED is CBE (or, reflected, EBC: DKP - SPD - FDP): its frequency of admissible patterns (011 and 110) is highest, and its H-value is 1.00. Unfortunately, as in the analysis of the 1969 data, this triple cannot be extended to form a scale that comprises all three major parties (CDU, SPD, and FDP). This is because each of the three pairs that can be made of these three parties (CDU + SPD, CDU + FDP, and SPD + FDP) is mentioned approximately as often as would be expected under statistical independence. There are 1730 respondents, or 97%, who “picked” two of the three major parties as most preferred. The representation of the two small parties DKP (stimulus E) and NPD (stimulus D) is also problematic. The unique triples ADE and BED suggest that D and E are relatively close together, whereas information
from BAD and CAD suggests that D is relatively close to A (CDU), and information from ABE suggests that E is relatively close to B (SPD), which is in accordance with the suggestion from other analyses that D and E are the end points of the scale. The two other criteria for evaluating data as an unfolding scale (the occurrence of perfect patterns and the characteristic monotonicity patterns of the dominance and adjacency matrices) do not suggest the same solution: the four pairs of parties that are mentioned together more often than expected under statistical independence (AD, DE, BE, and BC) might lead us to expect a scale ADEBC (i.e., CDU - NPD - DKP - SPD - FDP). However, the dominance and adjacency matrices suggest an unfolding scale ECBAD (DKP - FDP - SPD - CDU - NPD), in which the only deviations from the characteristic monotonicity patterns involve the item pairs BE (SPD-DKP) and CE (FDP-DKP). In fact, for the three major parties this last scale conforms to the order that Norpoth (1979a, 1979b) suggested on the basis of his own analyses: FDP - SPD - CDU, which he interpreted in terms of a “religious-secular” dimension. Still, this scale has an H-value of only 0.08, which makes the null hypothesis of statistical independence very plausible. The position of the two smaller parties, DKP and NPD, is based on the responses of a small number of subjects and is therefore highly unstable: of the 1785 subjects only 22 mentioned the DKP, 35 mentioned the NPD, and 2 mentioned both the DKP and the NPD among their two most preferred parties.
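The benchmark of statistical independence used throughout these analyses amounts to multiplying marginal pick proportions. A minimal sketch, using the marginal counts just cited for the two small parties (hypothetical function name):

```python
def expected_joint_count(n_subjects, marginal_proportions):
    """Expected number of subjects mentioning a set of parties together
    under statistical independence: N * p_i * p_j * ..."""
    expected = float(n_subjects)
    for p in marginal_proportions:
        expected *= p
    return expected

# 22 of 1785 subjects mentioned the DKP, 35 the NPD; under independence
# fewer than one subject is expected to mention both.
e = expected_joint_count(1785, [22 / 1785, 35 / 1785])
```

Observing 2 such subjects against an expectation below 0.5 is the kind of small-count comparison the text warns against over-interpreting.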
4.4 A “Pick 3/5” Analysis of the 1972 Data

The best elementary unfolding scale is the triple ACE, since the sum of the patterns 011, 110, and 111 is higher than for the other “unique” triples (the ordered triples ABE, ADE, BED, and CED). Its H-value is 0.38. (In this example the triple ABE could also have been considered: the sum of its patterns 011, 110, and 111 is only marginally smaller, and its H(ijk)-value is larger (0.65). The search procedure would lead to the same final conclusion, however.) This best triple cannot be extended with stimulus D, since there is at least one negative scalability value in the triples ACD, ADE, and CDE for each of the four possible places. Stimulus B can be represented in more than one position: position 2 (scale ABCE) and position 3 (scale ACBE). The position that gives the highest overall H-value is position 3, giving as
a final scale: CDU - FDP - SPD - DKP. The two perfect response patterns of this scale (ABC and BCE) are the two most preferred patterns, and they occur more often than expected under statistical independence, so these results do not violate the unfolding interpretation. This order of the stimuli conforms to the reflected order of the parties on the ideological left-right continuum. Since it is customary to represent political parties from left to right, I have reversed the order to EBCA in the final scale, as well as in the dominance matrix and adjacency matrix.

   Scale EBCA    pi    Hi          Scale ECBA    pi    Hi
   E: DKP       0.05  0.63         E: DKP       0.05  0.54
   B: SPD       0.98  0.43         C: FDP       0.98  0.29
   C: FDP       0.98  0.37         B: SPD       0.98  0.36
   A: CDU       0.93  0.32         A: CDU       0.93  0.30
                  H = 0.42                        H = 0.36
Let us return for a moment to the nonrepresented party, the NPD (stimulus D). The triples incorporating stimulus D that have the highest scalability values are BAD, CAD, and ADE. The triples BAD and CAD are mentioned relatively frequently (24 and 30 times, respectively), more frequently than would be expected under statistical independence. This suggests that the NPD should be represented to the right of the CDU. The scale in this case would be EBCAD. However, triple ADE also has a high scalability value (H = 0.76), and these three stimuli are also mentioned together more often than expected under statistical independence. In fact, all triples including both stimuli D and E (NPD and DKP) occur more often than expected under statistical independence. This relatively frequent co-occurrence of NPD and DKP - which are at opposite ends of the ideological left-right continuum - suggests that at least some subjects used another dimension in establishing their preference order (e.g., a “protest”, “anti-system”, or “government-opposition” dimension).

4.5 A “Pick 2/5” Analysis of the 1980 Data
Among the unique triples (ACB, ADE, BED, CAD, and CBE), triple CBE is the best one, according to the criteria given in Section 3.4. It can be extended with stimulus D in the fourth place, giving scale CBED. The best triple
cannot be extended with stimulus A: although all H(ijk)-values for the scale ACBE are positive, the scalability value of the scale as a whole drops below 0.30. The representation of stimulus D (NPD) next to E (DKP), rather than at the other end of the scale next to the FDP, depends on a single person who mentions stimuli D and E together. The expected number of subjects who would mention D and E together under statistical independence is 0.089. Because of this one subject, the values for H(EBD) and H(ECD) become negative, which precludes the scale EBCD, according to our criteria. Moreover, the scalability of scale DEBC is higher than that of the scale EBCD.

   Scale EBCD    pi    Hi          Scale DEBC    pi    Hi
   E: DKP       0.01  0.46         D: NPD       0.01  0.96
   B: SPD       0.69  0.67         E: DKP       0.01  0.83
   C: FDP       0.77  0.67         B: SPD       0.69  0.83
   D: NPD       0.01  0.52         C: FDP       0.77  0.89
                  H = 0.61                        H = 0.88
4.6 A “Pick 3/5” Analysis of the 1980 Data
The best triple has to be sought among the three unique triples with an H(ijk)-value of over 0.30: ABE, ACE, and BED. Triple ACE has the highest sum of the frequencies of the 011, 110, and 111 patterns, and is therefore chosen as the best triple (H = 0.40). Stimulus D cannot be added to the scale: according to the H-values of triple CDE, E should be represented between C and D, but due to the negative H-value of the ordered triple AED this representation is not possible. Stimulus B can be represented in two places: position 2 (scale ABCE) and position 3 (scale ACBE). Representation of stimulus B in position 3 gives the highest overall H-value, with no violations of the characteristic monotonicity patterns of the dominance matrix and the adjacency matrix. This scale (in reflected order: DKP - SPD - FDP - CDU) can be interpreted in terms of a left-right dimension.
   Scale EBCA    pi    Hi          Scale ECBA    pi    Hi
   E: DKP       0.07  0.81         E: DKP       0.07  0.75
   B: SPD       0.97  0.74         C: FDP       0.99  0.21
   C: FDP       0.99  0.64         B: SPD       0.97  0.39
   A: CDU       0.92  0.58         A: CDU       0.92  0.35
                  H = 0.70                        H = 0.39
5. Discussion
Applications of MUDFOLD, a computer program for the unidimensional unfolding analysis of dichotomous data, to preferences for five political parties by West German voters in 1969, 1972, and 1980 lead to unfolding scales for four of the five parties. It is not possible to represent all five German parties in a unidimensional unfolding scale. The difficulty in unfolding the preference rankings of the five German parties has already been pointed out by Norpoth (1979a, 1979b) and Pappi (1983). The detailed information obtained through MUDFOLD analyses suggests two major reasons for this difficulty: first, that there is very little structure in preferences for the three major parties, and second, that the two smallest parties can be represented in two conflicting ways. In the three years for which data have been analyzed, the number of subjects who mentioned one of the three pairs of the three major parties together as most preferred in the “pick 2/5” analysis, or who mentioned all three parties together as most preferred in the “pick 3/5” analysis, hardly deviates from the number that would be expected under statistical independence. Two possible explanations may be given for this finding, both of which are compatible with an unfolding representation. According to the first, the three parties are very close together on the unfolding scale and are therefore difficult for subjects to distinguish. Second, subjects may differ in their interpretation of the position of the three major parties along the underlying dimension. For instance, for some people the FDP may be representable to the right of the CDU, whereas for others the FDP should be placed between SPD and CDU, or even to the left of the SPD. Such cognitive differences would make the unidimensional representation of differences in preferences impossible. Klingemann (1972) and Pappi (1980), among others, present some evidence supporting this phenomenon.
The conflicting possible representations of the DKP and NPD, as either close together or at opposite ends of the unfolding scale, are found in all three data sets. This also suggests that different subjects may base their preference judgments on different criteria. However, only a small number of subjects mentioned these parties together among their two or three most preferred ones, and it is difficult to make valid inferences on the basis of a comparison between small numbers of observed and expected errors. An alternative explanation for the results for these two parties is that some respondents or some coders may have inadvertently reversed the appropriate pairwise preference judgments or the preference I-scales. The least-preferred parties would then have been interpreted as the most preferred, and vice versa. This reversed order is more in agreement with the dominant unfolding interpretation. However, there is no way to validate this suggestion on the basis of the published data. Despite the difficulty of representing all parties along one unidimensional unfolding scale, we still find some easily interpreted structure in subsets of the data. In 1969 the preference effects of the “Great Coalition” are clearly visible in a “government-opposition” dimension. Most of the additional structure found among unfoldable triples or four-tuples of parties can be interpreted in terms of the “left-right” dimension, which Klingemann (1972) also identified as important on the basis of other evidence. These results do not conform to the interpretation given by Norpoth (1979a, 1979b) for the same data. Norpoth analyzed these data by constructing an unfolding scale for a maximal subset of subjects rather than a maximal subset of stimuli, and concluded that the three major parties would form the best unfolding scale in the order FDP - SPD - CDU, which he interpreted as a “religious-nonreligious” dimension. However, he did not find this interpretation very plausible: “...
the overwhelming share [of subjects] claimed by this dimension strains credulity. Religious issues have rarely if ever topped the priority list of the public in recent years” (1979b, p. 729). By insisting on keeping all stimuli in the scale, which forced him to throw out at least 20% of his subjects without any substantive explanation, he found it impossible to obtain the left-right results that he also had expected on the basis of Klingemann’s previous studies.
The major reason for the difference between Norpoth’s findings and those presented here lies in Norpoth’s emphasis on absolute numbers of errors, compared to my emphasis on the number of errors relative to the number of errors that would be expected under the null hypothesis, i.e., the hypothesis that subjects’ responses are statistically independent of each other. It is true that the permutation FDP-SPD-CDU, which Norpoth accepts as an unfolding scale among the three major parties, is the order that leads to the smallest absolute number of errors. However, this number of errors does not differ significantly from the number of errors that would be expected under the null hypothesis of statistical independence.
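The contrast between absolute and relative errors can be made concrete with a Loevinger-type scalability coefficient, which compares observed errors with the errors expected under independence. A minimal sketch (MUDFOLD’s exact definitions differ in detail):

```python
def h_coefficient(observed_errors, expected_errors):
    """H = 1 - O/E: equals 1 for an error-free scale, 0 when errors match
    the independence expectation, and is negative when errors exceed it."""
    return 1.0 - observed_errors / expected_errors
```

A small absolute error count can still yield an H near 0 when the expected count is equally small, which is the crux of the disagreement with Norpoth’s analysis.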
Appendix. Detailed Information on MUDFOLD Analyses

Table 2. 1969 Data, Pick 2/5, N = 907
[Observed and expected error-pattern frequencies with H-values for each permutation of each triple (ABC through CDE), followed by the dominance matrix and adjacency matrix for SPD, CDU, FDP, NPD, and DKP. The tabulated numbers are too garbled in the source to reconstruct reliably.]
Table 3. 1969 Data, Pick 3/5, N = 907
[Observed and expected frequencies of the error patterns and of the 111 patterns with H-values per triple, plus the dominance and adjacency matrices. The tabulated numbers are too garbled in the source to reconstruct reliably.]
Table 4. 1972 Data, Pick 2/5, N = 1785
[Observed and expected error-pattern frequencies with H-values per triple, plus the dominance and adjacency matrices. The tabulated numbers are too garbled in the source to reconstruct reliably.]
Table 5. 1972 Data, Pick 3/5, N = 1785
[Observed and expected frequencies of the error patterns and of the 111 patterns with H-values per triple, plus the dominance and adjacency matrices. The tabulated numbers are too garbled in the source to reconstruct reliably.]
Table 6. 1980 Data, Pick 2/5, N = 1316
[Observed and expected error-pattern frequencies with H-values per triple, plus the dominance and adjacency matrices. The tabulated numbers are too garbled in the source to reconstruct reliably.]
Table 7. 1980 Data, Pick 3/5, N = 1316
[Observed and expected frequencies of the error patterns and of the 111 patterns with H-values per triple, plus the dominance and adjacency matrices. The tabulated numbers are too garbled in the source to reconstruct reliably.]
References

Bechtel, G. G. (1976). Multidimensional preference scaling. The Hague: Mouton.
Bennett, J. F., & Hays, W. L. (1960). Multidimensional unfolding: Determining the dimensionality of ranked preference data. Psychometrika, 25, 27-43.
Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis: Theory and practice. Cambridge, MA: MIT Press.
Carroll, J. D. (1972). Individual differences and multidimensional scaling. In R. N. Shepard, A. K. Romney, & S. Nerlove (Eds.), Multidimensional scaling: Theory and applications in the behavioral sciences (Vol. 1, pp. 105-155). New York: Seminar Press.
Coombs, C. H. (1950). Psychological scaling without a unit of measurement. Psychological Review, 57, 148-158.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Coombs, C. H., & Kao, R. C. (1960). On a connection between factor analysis and multidimensional scaling. Psychometrika, 25, 219-231.
Coombs, C. H., & Smith, J. E. K. (1973). On the detection of structure in attitudes and developmental processes. Psychological Review, 80, 337-351.
Davison, M. L. (1979). Testing a unidimensional, qualitative unfolding model for attitudinal or developmental data. Psychometrika, 44, 179-194.
Davison, M. L. (1980). A psychological scaling model for testing order hypotheses. British Journal of Mathematical and Statistical Psychology, 33, 123-141.
Dijkstra, L., van der Eijk, C., Molenaar, I. W., van Schuur, W. H., Stokman, F. N., & Verhelst, N. (1980). A discussion on stochastic unfolding. Methoden en Data Nieuwsbrief, 5, 158-175.
Gold, E. M. (1973). Metric unfolding: Data requirements for unique solutions and clarification of Schönemann’s algorithm. Psychometrika, 38, 441-448.
Heiser, W. J. (1981). Unfolding analysis of proximity data. Leiden: University of Leiden.
Jansen, P. G. W. (1983). Rasch analysis of attitudinal data. Nijmegen: Catholic University / The Hague: Rijks Psychologische Dienst.
Klingemann, H. D. (1972). Testing the left-right continuum in a sample of German voters. Comparative Political Studies, 5, 93-106.
Kruskal, J. B., Young, F. W., & Seery, J. B. (1973). How to use KYST. Murray Hill, NJ: Bell Laboratories.
Leik, R. K., & Matthews, M. (1968). A scale for developmental processes. American Sociological Review, 33, 62-75.
Mokken, R. J. (1971). A theory and procedure of scale analysis. The Hague: Mouton.
Norpoth, H. (1979a). Dimensionen des Parteikonflikts und Präferenzordnungen der deutschen Wählerschaft: Eine Unfoldinganalyse. Zeitschrift für Sozialpsychologie, 10, 350-362.
Norpoth, H. (1979b). The parties come to order! Dimensions of preferential choice in the West German electorate, 1961-1976. American Political Science Review, 73, 724-736.
Pappi, F. U. (1983). Die Links-Rechts-Dimension des deutschen Parteiensystems und die Parteipräferenz-Profile der Wählerschaft. In M. Kaase & H. D. Klingemann (Eds.), Wahlen und politisches System: Analysen aus Anlass der Bundestagswahl 1980 (pp. 422-441). Opladen: Westdeutscher Verlag.
Roskam, E. E. (1968). Metric analysis of ordinal data. Voorschoten: VAM.
Ross, J., & Cliff, N. (1964). A generalization of the interpoint distance model. Psychometrika, 29, 167-176.
Schönemann, P. H. (1970). On metric multidimensional unfolding. Psychometrika, 35, 349-366.
Sixtl, F. (1973). Probabilistic unfolding. Psychometrika, 38, 235-248.
Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79, 281-299.
Tversky, A., & Sattath, S. (1979). Preference trees. Psychological Review, 86, 542-573.
van Schuur, W. H., & Molenaar, I. W. (1982). MUDFOLD: Multiple stochastic unidimensional unfolding. In H. Caussinus, P. Ettinger, & R. Tomassone (Eds.), COMPSTAT 1982 (Part I, pp. 419-426). Vienna: Physica-Verlag.
van Schuur, W. H. (1984). Structure in political beliefs: A new model for stochastic unfolding with application to European party activists. Amsterdam: CT Press.
Zinnes, J. L., & Griggs, R. A. (1974). Probabilistic multidimensional unfolding analysis. Psychometrika, 39, 327-350.
New Developments in Psychological Choice Modeling
G. De Soete, H. Feger and K. C. Klauer (eds.)
© Elsevier Science Publishers B.V. (North-Holland), 1989
291
PROBABILISTIC MULTIDIMENSIONAL SCALING MODELS FOR ANALYZING CONSUMER CHOICE BEHAVIOR Wayne S. DeSarbo University of Michigan
Geert De Soete University of Ghent, Belgium
Kamel Jedidi University of Pennsylvania

We review the development of two new stochastic multidimensional scaling (MDS) methodologies that operate on paired comparisons choice data and render a spatial representation of subjects and stimuli. In the probabilistic vector MDS model, subjects are represented as vectors and stimuli as points in a T-dimensional space, where the scalar products or projections of the stimulus points onto the subject vectors provide information about the utility of the stimuli to the subjects. In the probabilistic unfolding MDS model, subjects are represented as ideal points and stimuli as points in a T-dimensional space, where the Euclidean distance between the stimulus points and the subject ideal points provides information as to the respective utility of the stimuli to the subjects. To illustrate the versatility of the two models, a marketing application measuring consumer choice for fourteen actual brands of over-the-counter analgesics, utilizing optional reparameterizations, is described. Finally, other applications are identified.
The second author is supported as “Bevoegdverklaard Navorser” of the Belgian “Nationaal Fonds voor Wetenschappelijk Onderzoek”. This paper is a revised version of an article published in Communication & Cognition, 1987, 20, 17-43.
DeSarbo, De Soete, & Jedidi
1. Introduction
The method of paired comparisons involves presenting a subject two stimuli at a time. The subject is then required to choose one of the two presented stimuli (cf. e.g., David, 1963; Thurstone, 1927). Since this paper is concerned with understanding consumer behavior, we will be using the terminology of consumers (for subjects) and products/brands (for stimuli). The method of paired comparisons can be gainfully applied in consumer behavior research whenever it is not possible or feasible to make continuous measurements of the utilities of a set of products or brands. With J products, each of the I consumers typically makes J(J - 1)/2 judgments. However, if this number is too large, incomplete designs may be utilized (cf. Bock & Jones, 1968; Box, Hunter, & Hunter, 1978) in order to reduce the number of judgments a consumer must make. Since consumers are often inconsistent when making judgments, probabilistic models are needed for analyzing such paired comparisons data. To display the structure in paired comparisons data, several models have been presented in the psychometric literature which represent the consumers and the products in a joint uni- or multidimensional space. There have been a number of unidimensional scaling procedures proposed to obtain scale values for products from such (aggregated) paired comparisons data (for a survey, see Bock & Jones, 1968; Torgerson, 1958). More recently, multidimensional scaling models have been devised to account for the multidimensional nature of the products. Here, two general classes of models have been typically utilized to represent such preference/choice data: vector and unfolding models. A vector or scalar products multidimensional scaling model (Slater, 1960; Tucker, 1960) represents the consumers as vectors and the products as points in a T-dimensional space. Figure 1 represents a hypothetical two-dimensional portrayal of such a representation where there are two consumers (represented by two vectors I and II) and five products (represented by the letters A-E). Here, utility or preference order for a given consumer is assumed to be given by the orthogonal projection of the products onto the vector representing that consumer. For example, for consumer I, product B has the highest utility, then E, then A, then D, and finally C. For consumer II, the order of utility (from highest to lowest) is A, B, C, D, and E. The goal of the analysis here is to estimate the “optimal” vector directions and product
Analyzing Consumer Choice Behavior
293
coordinates in a prescribed dimensionality. An intuitively unattractive property of the vector model is that it assumes preference or utility to change monotonically with all dimensions. That is, it assumes that if a certain amount of a thing is good, more must be even better. (The iso-utility contours therefore are parallel straight lines perpendicular to a consumer’s vector.) According to Carroll (1980), this is not an accurate representation for most quantities or attributes in the real world (perhaps with the exception of money, happiness, and health).
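The projection rule of the vector model can be sketched as follows (hypothetical names; utilities are scalar products of the product points with the consumer vector):

```python
import numpy as np

def vector_model_order(consumer_vector, product_points):
    """Rank products for one consumer under the vector model:
    utility = scalar product of each product point with the consumer
    vector; returns product indices from most to least preferred."""
    utilities = np.asarray(product_points) @ np.asarray(consumer_vector)
    return [int(i) for i in np.argsort(-utilities)]
```

Because utility is linear in the coordinates, every product further along the vector direction is always preferred, which is exactly the monotonicity property criticized above.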
Figure 1. Two-dimensional illustration of the vector model (taken from Carroll & DeSarbo, 1985).
There has been some work done concerning analyzing paired comparisons via such vector or scalar products models. Bechtel, Tucker, and Chang (1971) have developed a scalar products model for examining
graded paired comparisons responses (i.e., where consumers indicate which of two products is preferred and to what extent). Cooper and Nakanishi (1983) have devised two logit models (vector and ideal point) for the external analysis of paired comparisons data. Carroll (1980) has proposed the wandering vector model for the analysis of such paired comparisons data. According to this vector model, it is assumed that each consumer can be represented by a vector and that individual consumers will prefer the brand from a pair having the larger projection on that vector. The direction cosines of this vector specify the relative weights the consumer attaches to the underlying dimensions. The wandering vector model assumes that a consumer’s vector wanders or fluctuates from a central vector in such a way that the distribution of the vector termini is multivariate normal. De Soete and Carroll (1983, 1986) have developed a maximum likelihood method for fitting this model and have proposed various extensions of the original model to accommodate additional sources of error as well as graded paired comparisons. Unfortunately, the De Soete and Carroll (1983, 1986) model requires replicated paired comparisons per subject (or group of subjects) to estimate more than one vector. This turns out to be a rather difficult data collection task in consumer behavior research. Without such replications, a group of subjects must be considered as replications of each other. Assuming considerable heterogeneity within the group of subjects, the centroid vector for the group may be estimated with considerable variance at its terminus. In addition, no provision is available to explore individual differences (with replications) as a function of specified subject differences (such as demographic characteristics). DeSarbo, Oliver, and De Soete (1986) propose an alternative probabilistic vector MDS model which operates on paired comparisons.
This model can estimate separate subject vectors without requiring within-subject replications. A variety of possible model specifications are provided in which vectors and/or stimuli can be reparameterized as a function of specified background variables. We will describe its model structure as well as its program options, and provide a marketing application. The other major type of psychometric model for representing such preference/choice data is the unfolding model (Coombs, 1964). We will discuss only the simple unfolding model of Coombs (1964). In the simple unfolding model, both consumers and products are represented as
Analyzing Consumer Choice Behavior
295
Figure 2. Two-dimensional illustration of the simple ideal point model (taken from Carroll & DeSarbo, 1985). [Figure: ideal points I and III shown in an (x1, x2) plane, with preference orders BACDE and DABEC, respectively.]
points in a T-dimensional space. The points for the consumers represent ideal points, or optimal sets of dimension values. The farther a given product point is from a consumer's ideal point, the less utility that product has for the consumer. This notion of relative distance presupposes a Euclidean metric on the space, which implies that, in T = 2 dimensions, the iso-utility contours are families of concentric circles centered at a consumer's ideal point. Carroll (1980) demonstrates that the vector model is a special case of this unfolding model in which the ideal point goes off to infinity. Figure 2 illustrates a hypothetical two-dimensional space from an unfolding perspective. Here there are three consumers represented by ideal points labeled I, II, and III, and five products labeled A-E. The figure specifies the preference/utility order for each consumer as a function of distance
away from the respective ideal point. The objective in unfolding analysis is to estimate the "optimal" set of ideal points and product coordinates in a prescribed dimensionality. Although several unidimensional stochastic unfolding models have been proposed in the literature (Bechtel, 1968, 1976; Coombs, Greenberg, & Zinnes, 1961; Sixtl, 1973; Zinnes & Griggs, 1974), only three multidimensional unfolding models have been developed to accommodate paired comparisons data. The first, by Schönemann and Wang (1972) and Wang, Schönemann, and Rusk (1975), is based on the well-known Bradley-Terry-Luce model and consequently assumes strong stochastic transitivity. In the multidimensional unfolding model proposed by Zinnes and Griggs (1974), it is assumed that the coordinates of both the consumer and the product points are independently normally distributed with a common variance. Zinnes and Griggs (1974) assume that for each element of the product pair, a consumer independently samples a point from his or her ideal point distribution. In the Zinnes-Griggs model, the probability that consumer i prefers product j to k is defined by
$$P(\delta_{ijk} = 1) = F''_{\nu_1, \nu_2, \lambda_1, \lambda_2}(1),$$

where $F''_{\nu_1,\nu_2,\lambda_1,\lambda_2}$ denotes the doubly noncentral F distribution with degrees of freedom $\nu_1$ and $\nu_2$ and noncentrality parameters $\lambda_1$ and $\lambda_2$, and $d_{ij}$ (respectively $d_{ik}$) the Euclidean distance between the mean point of consumer i and the mean point of product j (respectively k). More recently, De Soete, Carroll, and DeSarbo (1986) and De Soete and Carroll (1986) have proposed the wandering ideal point model for the analysis of such paired comparisons data as an unfolding analogue of the wandering vector model. According to this model, it is assumed that each consumer can be represented by an ideal point and that he or she will prefer the product from a pair which has the smaller Euclidean distance from that ideal point. This model assumes that a consumer's ideal point wanders or fluctuates from a central ideal point in such a way that the distribution of the ideal point coordinates is multivariate normal. De Soete, Carroll, and DeSarbo (1986) have developed a maximum likelihood method for fitting this model and show that it is the only existing probabilistic multidimensional unfolding model requiring only moderate stochastic transitivity.
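The sampling story behind the Zinnes-Griggs model can be checked numerically: draw the consumer point (independently for each element of the pair, as the model assumes) and the two product points around their means with common variance, and count how often product j comes out closer. A Monte Carlo sketch with hypothetical coordinates (this is an illustration of the sampling assumptions, not the authors' estimation code):

```python
import numpy as np

def zg_choice_prob(ideal, prod_j, prod_k, sigma=1.0, n=200_000, seed=0):
    """Monte Carlo sketch of the Zinnes-Griggs choice probability:
    consumer and product points are sampled independently around their
    mean locations with common variance sigma**2; product j is chosen
    when its sampled point is closer to the sampled consumer point."""
    rng = np.random.default_rng(seed)
    T = len(ideal)
    # one independent consumer sample per element of the pair
    ci = rng.normal(ideal, sigma, size=(n, T))
    ck = rng.normal(ideal, sigma, size=(n, T))
    pj = rng.normal(prod_j, sigma, size=(n, T))
    pk = rng.normal(prod_k, sigma, size=(n, T))
    d_ij = np.linalg.norm(ci - pj, axis=1)
    d_ik = np.linalg.norm(ck - pk, axis=1)
    return float(np.mean(d_ij < d_ik))
```

Two sanity checks follow directly from the model: products equidistant from the ideal point are chosen with probability near one half, and the nearer of two products is preferred more often than not.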
Unfortunately, as in the case of the wandering vector model, the De Soete, Carroll, and DeSarbo (1986) model also requires replications of paired comparison matrices per consumer to estimate more than one ideal point. Again, this turns out to be a rather difficult task in terms of data collection. Without such replications, only one centroid ideal point can be estimated for a sample of I consumers. Given considerable heterogeneity in the sample, the single centroid ideal point may be estimated with considerable variance. In addition, no provision is currently available to explore individual differences (with replications) as a function of specified consumer differences (such as demographic characteristics), or to have similar reparameterizations on products (vis-à-vis attributes or features). DeSarbo, De Soete, and Eliashberg (1987) propose an alternative probabilistic MDS unfolding model which also operates on paired comparisons. This model can estimate separate consumer ideal points without requiring within-consumer replications. A variety of possible model specifications are provided in which ideal points and/or product coordinates can be reparameterized as a function of specified background variables, which aids the understanding of consumer choice behavior. We will describe its model structure as well as its program options, and provide a marketing example.
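The reparameterization idea just mentioned can be sketched in a few lines: instead of estimating each consumer's coordinates freely, they are written as linear combinations of observed background variables. The matrices and values below are hypothetical illustrations of that device, not data or code from the papers cited:

```python
import numpy as np

# Hypothetical data: I = 4 consumers, N = 2 background variables, T = 2 dimensions
Y = np.array([[1.0, 23.0],
              [1.0, 31.0],
              [1.0, 45.0],
              [1.0, 28.0]])       # e.g., an intercept and the consumer's age
alpha = np.array([[0.5, -0.2],
                  [0.01, 0.03]])  # impact coefficients alpha_nt

# Reparameterized consumer coordinates: a_it = sum_n Y_in * alpha_nt
A = Y @ alpha                     # shape (I, T)
```

The same device applies on the product side, with brand features H_jl and impact coefficients gamma_lt replacing Y and alpha.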
2. Methodologies
2.1 Research objectives
As stated, the objective of this paper is to review the two probabilistic MDS models proposed by DeSarbo, Oliver, and De Soete (1986) and DeSarbo, De Soete, and Eliashberg (1987) for representing paired comparison judgments so that consumers and products can be displayed in a joint space, thus permitting inferences concerning the nature of the consumer choice under investigation. In doing so, two sub-objectives will be addressed. The first concerns the ability to investigate the nature of individual (consumer) differences in preference/choice and its measurement, while the second involves modeling the effect of specific product features on the measurement of preference/choice. The discussion section will suggest further potential applications to the investigation of still other latent constructs.
2.2 Notation
Let

i = 1, ..., I consumers,
j, k = 1, ..., J brands/products,
t = 1, ..., T dimensions,
l = 1, ..., L brand features,
n = 1, ..., N consumer variables,

$\delta_{ijk}$ = 1 if consumer i finds product j more satisfying than k, and 0 otherwise,
$H_{jl}$ = the l-th feature/attribute value for the j-th brand,
$Y_{in}$ = the n-th background variable value for the i-th consumer,
$a_{it}$ = the t-th coordinate for consumer i,
$b_{jt}$ = the t-th coordinate for brand j,
$\alpha_{nt}$ = the impact coefficient of the n-th consumer variable on the t-th dimension,
$\gamma_{lt}$ = the impact coefficient of the l-th brand variable on the t-th dimension.

2.3 The Vector Model
DeSarbo, Oliver, and De Soete (1986) define a latent consumer preference or utility construct:

$$V_{ij} = U_{ij} + e_{ij}, \qquad (1)$$

where $V_{ij}$ = the (latent) utility of brand j to consumer i,

$$U_{ij} = \sum_{t=1}^{T} a_{it} b_{jt},$$

$e_{ij}$ = error. Here, $U_{ij}$ refers to a "true" utility or latent preference score for consumer i concerning brand j. It is modeled as the scalar product of the brand coordinates ($b_{jt}$) and the consumer vector ($a_{it}$). The order of utility or preference for a given consumer is thus assumed to be given by the projections of the brands onto the vector representing that consumer. As is characteristic of a vector MDS model, it also assumes that utility or
299
Analyzing Consumer Choice Behavior
preference changes monotonically with all dimensions. Assume now that:
$$e_{ij} \sim N(0, \sigma_i^2), \qquad (2a)$$

(where $\sigma_i^2$ is the variance parameter for the i-th consumer),

$$\mathrm{Cov}(e_{ij}, e_{ik}) = 0, \quad \forall\, i,\ j \neq k, \qquad (2b)$$

$$\mathrm{Cov}(e_{ij}, e_{i'k}) = 0, \quad \forall\, i \neq i',\ j, k. \qquad (2c)$$
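Under these normality and independence assumptions, the utility difference $V_{ij} - V_{ik}$ is normal with mean $U_{ij} - U_{ik}$ and variance $2\sigma_i^2$, so the implied choice probability takes a probit form. A minimal sketch with hypothetical coordinates (an illustration of what equations (1) and (2a)-(2c) entail, not the authors' program):

```python
import numpy as np
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def choice_prob(a_i, b_j, b_k, sigma_i):
    """P(consumer i prefers brand j to k) implied by (1) and (2a)-(2c):
    V_ij - V_ik ~ N(U_ij - U_ik, 2 * sigma_i**2)."""
    u_ij = float(np.dot(a_i, b_j))   # U_ij = sum_t a_it * b_jt
    u_ik = float(np.dot(a_i, b_k))
    return phi((u_ij - u_ik) / (sigma_i * sqrt(2.0)))

# Hypothetical 2-D example
a_i = np.array([1.0, 0.5])
b_j, b_k = np.array([2.0, 0.0]), np.array([0.0, 2.0])
p = choice_prob(a_i, b_j, b_k, sigma_i=1.0)
```

A quick Monte Carlo draw of the two error terms reproduces the same probability, which is a useful check that the closed form matches the sampling assumptions.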
Suppose that consumer i is presented two brands j and k and is asked to select the one that is “more preferred”. Then
$$P(\delta_{ijk} = 1) = P(V_{ij} > V_{ik}),$$
where
Figure 2. Two-dimensional solution of the Rumelhart and Greeno data according to the wandering vector model. [Figure: stimulus points labeled HW, LJ, ET, SL, and others in the plane.]
The students in the sample put a conversation with the politician Lyndon B. Johnson at the head of the ranking, whereas the athlete Carl Yastrzemski takes the last position. Of course, once such a ranking is known, another obvious question is "Why?". So far nothing has been said about the dimensions of the spaces in which the solutions are presented. Are there connections between the dimensions of the joint space used and salient aspects of the choice behavior, or is the dimensionality just something needed to fit the mathematical model? One of the many attempts to use the dimensions of the chosen space for interpreting the solutions obtained is contained in Heiser and de Leeuw (1981), who reanalyzed data originally collected by Sjöberg (1967) from studies by Ekman (1962). In the underlying paired comparisons experiment, offenses had to be judged
Probabilistic Choice Models for Marketing
321
with respect to "immorality"; in the two-dimensional space, directions distinguishing reckless vs. intentional causes and a gradation of the damage caused by the offenses could be identified. While these interpretations were derived without collecting additional information about the objects (and subjects), additional tools that can be combined with the evaluation possibilities described so far should be of interest.
Figure 3. Two-dimensional solution of the Rumelhart and Greeno data according to the wandering ideal point model. [Figure: stimulus points labeled CY, JU, AF, HW, CD, and others, together with the expected ideal point.]
3.2 Kaas (1977) Study
The second data set is taken from Kaas (1977), who collected paired comparisons data for 10 stimuli consisting of seven hair spray brands and three amounts of money (see Table 2). One hundred customers of a supermarket were asked to judge each brand-brand combination and each brand-money combination with respect to the question "Which stimulus
Gaul
328
possesses a higher worth?" Concerning the money-money combinations, it was assumed that a higher amount of money will always be preferred; see Table 3 for the aggregated paired comparisons data.

Table 2. Ten choice objects from Kaas (1977).

Brands:             1  Elidor
                    2  Gard
                    3  Poly
                    4  Pretty hair
                    5  Riar
                    6  Shamtu
                    7  Taft
Amounts of money:   8  2.00 DM
                    9  2.50 DM
                   10  3.00 DM
Table 3. Aggregated paired comparisons matrix for the choice objects in Table 2.

        1    2    3    4    5    6    7    8    9   10
  1     0   72   46   29   37   56   64   31   37   52
  2    28    0   18   21   23   34   28   14   19   41
  3    54   52    0   34   41   64   66   44   43   67
  4    71   79   66    0   53   72   68   44   59   68
  5    63   77   59   47    0   63   71   43   57   69
  6    44   66   36   28   37    0   48   31   35   50
  7    36   72   34   32   29   52    0   29   35   47
  8    69   86   56   56   57   69   71    0  100  100
  9    63   81   57   41   43   65   65    0    0  100
 10    48   59   33   32   31   50   53    0    0    0
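Aggregated matrices of this kind are the input to the Thurstonian (law of comparative judgment) scaling models evaluated next. A sketch of classical LCJ Case V estimation on a small hypothetical count matrix: preference proportions are transformed to unit normal deviates and row means give the scale values. The clipping of unanimous 0/100 cells (such as the money-money comparisons above) is a common practical workaround assumed here, not part of the original analyses:

```python
import numpy as np
from statistics import NormalDist

def case_v_scale(counts, n_judges, eps=0.005):
    """Classical Thurstone Case V scaling from an aggregated paired
    comparisons matrix, where counts[j][k] = number of judges
    preferring object j to object k."""
    p = np.asarray(counts, dtype=float) / n_judges
    p = np.clip(p, eps, 1.0 - eps)          # keep inv_cdf finite
    z = np.vectorize(NormalDist().inv_cdf)(p)
    np.fill_diagonal(z, 0.0)                # self-comparisons carry no information
    s = z.mean(axis=1)                      # scale value of each object
    return s - s.min()                      # anchor the lowest object at zero

# Hypothetical 3-object example with 80 judges
counts = np.array([[ 0, 60, 70],
                   [20,  0, 55],
                   [10, 25,  0]])
s = case_v_scale(counts, n_judges=80)
```

In this toy matrix object 1 wins most comparisons, so its scale value comes out highest.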
A cursory glance at Table 4 already shows the following:

- The LCJ case V model is not appropriate.
- The LCJ case III results already have a non-rejectable fit.
- All other one-dimensional model versions, and also the two-dimensional wandering vector model (for which a comparison with the two-dimensional factorial model is interesting), have a bad fit.
Thus, the attempt to incorporate price as a dominant dimension to support the interpretation of the paired comparisons choice behavior data within one-dimensional Thurstonian scaling models - as was done in the original study - was not fully successful. Again, the additional random disturbance parameters $\sigma^2$ could not increase the fit of the models significantly and were therefore omitted (except for the one-dimensional wandering ideal point and wandering vector model approaches) in Table 4.
Table 4. Summary of selected analyses on the Kaas (1977) data.
Model specification            ln L        Effective no.    Test against null model
                                           of parameters     χ²       d.f.
Null model                    -2653.68          45            -         -
LCJ Case V                    -2773.30           9          239.24      36
Wandering ideal point model   -2669.02          18           30.68      27

[Only these entries of Table 4 are legible; the rows for the wandering ideal point model in further dimensionalities, the weighted wandering ideal point model, the wandering vector model (one to three dimensions), and the two-dimensional factorial model, as well as the σ², p-value, and AIC (-5000) columns, are not recoverable.]
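The χ² entries recoverable from Table 4 are likelihood-ratio statistics of each restricted model against the null model, with degrees of freedom equal to the difference in effective parameter counts; the AIC column penalizes log-likelihood by parameter count. A short sketch reproducing the recoverable values (function names are ours, not from the chapter):

```python
def lr_test(loglik_general, k_general, loglik_restricted, k_restricted):
    """Likelihood-ratio chi-square of a restricted model against a more
    general model, with degrees of freedom = difference in parameters."""
    chi2 = 2.0 * (loglik_general - loglik_restricted)
    df = k_general - k_restricted
    return chi2, df

def aic(loglik, k):
    """Akaike information criterion: -2 ln L + 2k; smaller is better."""
    return -2.0 * loglik + 2.0 * k

# Values recoverable from Table 4: null model vs. LCJ Case V
chi2, df = lr_test(-2653.68, 45, -2773.30, 9)   # 239.24 on 36 d.f., as in Table 4
```

The same arithmetic recovers the wandering ideal point row: 2(−2653.68 − (−2669.02)) = 30.68 on 45 − 18 = 27 degrees of freedom.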