a theory and procedure of scale analysis
methods and models in the social sciences
1
MOUTON
·
THE HAGUE
·
PARIS
a theory and procedure of scale analysis With applications in political research
by
R. J. MOKKEN
University of Amsterdam
MOUTON
·
THE HAGUE
·
PARIS
This book was published with the aid of the Netherlands Organisation for the Advancement of Pure Research (Z.W.O.)
© Mouton & Co 1971. Printed in the Netherlands.
Library of Congress Catalog Card No. 70-152074
For Thelma, Wiebe, Marc and Fleur
Contents

ACKNOWLEDGEMENTS

1. THE SCOPE OF THE STUDY
1.1 Introduction
1.2 Data models
1.3 Measurement models
1.4 Outline of the study

PART I. THEORY AND METHOD

2. THE DETERMINISTIC MODEL: THE GUTTMAN SCALE
2.1 Introduction
2.2 The perfect scale
2.2.1 Dichotomous data
2.2.2 Monotone items and trace lines
2.2.3 Properties of the perfect scale
2.3 The two sets as populations
2.3.1 The population of subjects
2.3.2 The population of items
2.4 The imperfect scale: the problem of "error"
2.4.1 The failure of determinism: some consequences
2.4.2 The definition of "error"
2.5 Coefficients of scalability
2.5.1 Coefficients of reproducibility (Rep)
2.5.2 Coefficients of types S1 and S2
2.5.3 Various coefficients used in practice
2.5.4 The problem of sampling
2.6 Procedures of scale analysis
2.6.1 Criteria of scalability
2.6.2 The problem of scoring
2.7 The quasi-scale: a stochastic model

3. PROBABILISTIC RESPONSE: LATENT STRUCTURE MODELS
3.1 Latent structure analysis
3.1.1 Response functions
3.1.2 An exponential model
3.2 Latent structure models for dichotomous data
3.2.1 Trace lines
3.2.2 Specific forms of trace lines
3.2.3 Normal ogive and logistic models
3.2.4 The approach of Rasch: the two-parameter logistic model
3.2.5 The empirical equivalence of normal ogive and logistic models
3.2.6 A digression: some paired-comparison models

4. HOMOGENEITY AND HOLOMORPHISM IN SCALING MODELS: SOME SIMPLE PROPERTIES
4.1 Holomorphic two-parameter models
4.1.1 Some lemmas
4.1.2 Some marginal relations
4.1.3 Ordering items and subjects
4.1.4 Marginal second moments
4.2 Scale statistics: score, reliability, scalability and patterns
4.2.1 Properties of the simple score
4.2.2 Reliability
4.2.3 A coefficient of scalability: H
4.2.4 Response patterns
4.3 Approximate sampling distributions
4.3.1 Some remarks on estimation
4.3.2 The approximate sampling distribution of H: the null case
4.3.3 The approximate sampling distribution of H: the non-null case

5. A CLASS OF SCALING PROCEDURES
5.1 Summary and evaluation of the findings
5.1.1 Parametric models
5.1.2 Non-parametric models: homogeneity and holomorphism
5.1.3 Summary of the notation
5.1.4 The ordering of items and subjects
5.1.5 Positive correlation: small "error" probabilities
5.1.6 Checks for holomorphism
5.1.7 A criterion of scalability: H
5.1.8 Definition of a scale
5.1.9 The significance of response patterns
5.1.10 The simple score
5.1.11 Conclusion
5.2 Description of scaling procedures
5.2.1 The evaluation of a set of items as one scale
5.2.2 Constructing a scale from a pool of items
5.2.3 Multiple scaling
5.2.4 Extending an existing scale
5.2.5 Selecting a level of confidence
5.2.6 Investigating double monotony
5.2.7 Reliability coefficients
5.2.8 The effects of the order of selection

PART II. APPLICATIONS IN POLITICAL RESEARCH

6. CROSS-CULTURAL COMPARISON: THE DISCOVERY OF DIMENSIONAL IDENTITY
6.1 On cross-cultural comparison
6.2 The study by Anast (Madison, Wisc.)
6.2.1 Method
6.2.2 Results
6.3 The study by Carter Jr. and Clark (Minneapolis, Minn.)
6.3.1 Method
6.3.2 Results
6.4 A Dutch study (Amsterdam, 1964)
6.4.1 Method
6.4.2 Results
6.5 A second Dutch study (Amsterdam, 1965)
6.5.1 Method
6.5.2 Results
6.6 A scale of political interest
6.7 Conclusions

7. THE CROSS-CULTURAL ROBUSTNESS OF SCALES: POLITICAL EFFICACY
7.1 Dutch-American comparisons of the "sense of political efficacy"
7.1.1 Political efficacy: concept and scale
7.1.2 A new analysis of the efficacy items: Dutch-American comparisons
7.1.3 The original scale re-analyzed cross-culturally
7.1.4 The non-monotony of item 1 (Voting only way)
7.1.5 A cross-culturally robust scale
7.1.6 A comparison of the efficacy scale across sub-groups within the Dutch sample
7.1.7 The item marginals in cross-national comparison
7.2 Some further possibilities
7.2.1 Equivalence and comparability: an operational framework
7.2.2 The consequences for parametric models
7.2.3 The theoretical relevance of scale structure

8. AN EXPLORATION OF POLITICAL EFFICACY
8.1 A Dutch extension of the efficacy scale
8.1.1 A nine-item efficacy scale
8.1.2 A local efficacy scale (8 items)
8.2 An analysis across sub-groups
8.3 A broader efficacy dimension: a combined scale (17 items)
8.4 Investigating the double monotony of scales
8.4.1 Some examples
8.4.2 Conclusions
8.5 The reliability of doubly monotone scales

9. APPLICATIONS OF MULTIPLE SCALING
9.1 Scaling opinion leadership
9.1.1 Opinion leadership: the concept
9.1.2 A scale of opinion leadership
9.2 The sense of civic competence and the sense of political efficacy
9.3 Influence stereotypes
9.4 Scaling political participation
9.4.1 The dimensionality of political participation
9.4.2 Scales of political participation

10. CONCLUSION

APPENDIX
1. Sampling design
2. The Dutch text of the scales

REFERENCES

INDEX

Acknowledgements
This study is a result of my activities at three academic institutes in Amsterdam: the Department of Mathematical Statistics of the Mathematical Centre and the Institutes of Mass Communications and for Political Science of the University of Amsterdam. It served as a dissertation in the Faculty of Social Sciences of the University of Amsterdam.
Much gratitude is owed by me to my nearest colleagues and numerous former colleagues for their kind cooperation and assistance as well as for the opportunities they have given me. I benefited much from the critical comments and general guidance of Professor H. Daudt of the Institute for Political Science during the preparation of the manuscript. Special thanks are due to Drs. W. van Nooten of the Mathematical Centre for our thorough and frequent discussions about chapters 3 and 4, which led to many improvements. With pleasure I also acknowledge the collaboration with Drs. F. Bergsma and Drs. F. N. Stokman in the research, some results of which are reported in chapters 6 and 9, and that with Drs. Constance E. van der Maesen and O. Schmidt in the research reported in chapters 7, 8 and 9.
For the typing of the manuscript I owe much to the concerted efforts of Miss Anke Faddegon, Miss N. Kool and, last but not least, of my wife Thelma. The author's English was corrected skilfully by Mrs.
C. M. van Sta

indeed exist and may be measured by the observations. Only if this specific internal consistency is observed may we try to measure the variable in what is probably a more reliable way and, it is hoped or believed, with better chances of validity. As for general validity, however, all types of measurement will have to stand the ultimate test of construct validity: their satisfactory performance within the theoretical and operational context to which they refer (Cronbach and Meehl, 1955; De Groot, 1961, 271-8).
Other general types of measurement have been distinguished. Those forming part of what has been called "the classical view of measurement" are well-known and have been highly influential (Stevens, 1959, 21-3). This classical view produced what was mainly a reconstruction of the measurement procedures commonly used in the physical sciences in the form of a measurement theory (Krantz, 1967, 12). This theory has had great impact in the social sciences, although there measurement procedures as common and obvious in the light of everyday experience as those used in physics do not occur. A well-known typology of measurement used in this classical theory is that of Campbell, who distinguished fundamental measurement and derived measurement (Campbell, 1928, 14). Fundamental measurement is direct in the sense that no prior measurements are involved, whereas derived measurement is based on other measurements, as in the classical example of temperature and density. Fundamental measurement is performed directly on the observations and in this respect resembles measurement by fiat, except for the fact that in its original formulation as a measurement procedure Campbell restricted it to the measurement of quantities, i.e. empirical properties admitting an empirical operation equivalent (or rather isomorph) to the mathematical operation of addition. The classical example is the measurement of mass, in which the "additive" operation consists of putting weights together in a balance. Therefore, in the classical exposition referred to, the procedures of fundamental measurement and necessarily those of derived measurement, as they were ultimately based on the former, resulted in ratio or interval scales.
This narrow definition of measurement was broadened considerably as further developments of the measurement theory took place in the social sciences. S. S. Stevens took the first step by taking into consideration the range of transformations under which a particular
scale was invariant and introducing his well-known classification of scales of measurement (Stevens, 1951). A later axiomatic treatment of measurement theories (Scott and Suppes, 1958) in a more recent version introduced a method of fundamental and derived measurement in which no special value is attached to the existence of an additive operation, which in the social sciences is a very rare phenomenon (Suppes and Zinnes, 1963). In this system the various models giving rise to Coombs' unfolding technique (Coombs, 1950) are regarded as examples of the extended concepts of fundamental measurement, whereas both the Bradley-Terry-Luce models and the Thurstone models for paired comparisons data (Bradley and Terry, 1952; Luce, 1959; Thurstone, 1927) are seen as examples of derived measurement because a first operation of measurement, the introduction and estimation of probabilities for pairs, is necessary before the ultimate scale values can be derived. We shall see that in this study a first step of measurement is postulated by introducing probabilities of a positive response to items as functions of the unknown scale values of subjects. The models dealt with in this study can therefore be classified most appropriately as derived measurement models, if we follow the foregoing reasoning.*
* Pfanzagl (1968, 173-4), however, treats paired comparison as fundamental measurement. He puts up a rather convincing argument that "derived measurement" should not be considered measurement at all (Pfanzagl, 1968, 31).
Apart from what has been said about the development of a formal measurement theory, it has also rightly been stated that "... at the present stage of development of quantitative theory in social science, it is impossible to separate the search for interesting empirical laws from the discovery and refinement of measurement procedures" (Krantz, 1967, 13). Indeed, significant break-throughs in the social sciences have often
been associated with new methods of observation, data construction and finally measurement. The development of factor analysis is just one well-known example of this process. Such developments lead to a continuing increase of the observational possibilities and a multitude of data configurations, which almost defies classification. The development of modern probability theory and mathematical statistics has opened up further perspectives, making it possible to formulate and test probabilistic models which formalize behavioral theories in terms of a great variety of data. For a particular set of data we may derive from a plausible theory such a probabilistic model, which explains the data partially in terms of parameters that are defined within the model. One might argue that as soon as values of parameters are estimated for such models from the data, a problem of measurement is in question. This would lead to a very broad definition of measurement indeed. For the purposes of our study, at least, we shall use a more narrow one. In fact, we shall distinguish data models in general from measurement models in particular.
Coombs (1964, 4-6) has pointed out that the term data is commonly used in the behavioral sciences in a double meaning. In the first place it is used to denote the original recorded observations, the "raw" empirical items that are first collected as part of the process of inquiry. The voting record of legislators in a particular session of a legislature, the votes cast by members of the United Nations General Assembly, the answers of a respondent to a set of questions in a questionnaire are examples of data in this first meaning. According to another usage, the term data is used to denote only those observational elements that enter into the models used for analysis. These observational elements are not necessarily identical to the original recorded observations.
Thus a number of roll-calls concerning a group of legislative items may form the recorded observations. The matrix of correlations between pairs of legislative items, taken over legislators after a proper scoring of votes, may form the set of data for a factor analysis or a multidimensional scaling model. Another example that corresponds more closely to the type of problems in this study is that in which, after a proper dichotomous scoring of the answers to a set of questions, the set of (2 × 2)-tables between questions, taken over respondents, forms the data that are entered into a scale analysis. Coombs particularized the term data to denote only observational elements according to this second meaning, indicating the class of restructured observations that enter into the analytic models used for
inference and distinguishing them from the original recorded observations. The merit of this specialized definition of the term data is that it helps us keep in mind that these original recorded observations can be converted into a plurality of different types of data corresponding to different types of models. Following Coombs' argument, three phases of scientific activity may be discerned in the inferential process. The first phase covers the recording of the primary observations from the "real world" as a universe of potential observations. In the second phase these recorded observations are converted into data by a creative effort of interpretation on the part of the scientist. In this second phase the observations are mapped onto data, so to speak. This entails the identification and interrelation of individuals and stimuli, subjects and objects or whatever different classes of observational units are brought in relation to each other in a particular form of data. The third phase, finally, involves the detection of relations, order and structure that follow from the data and the model chosen for analysis, leading to a classification of e.g. individuals and stimuli. For instance, applying our model to the data, we may estimate parameters or other quantities that we require in the context of our research. In the case of measurement this third phase leads to the determination of scale values for individuals. For this type of model, one that reflects the data and is used to analyze them in order to obtain inferential evidence, the name data model has been used (Stouthard, 1965). Thus Coombs' view of the process of inquiry may be summarized as follows: observations are recorded, the recorded observations are converted into a set of data which then by means of a data model is analyzed for inferential evidence or classification.
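The first two phases described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the answers, the item names q1 to q3, the scoring rule POSITIVE and the helper functions dichotomize and two_by_two_tables are hypothetical and do not come from the study. The sketch takes recorded answers (phase one), applies a dichotomous scoring and forms the (2 × 2)-tables between questions taken over respondents (phase two). The third phase, the application of a data model to these tables, is deliberately left out.

```python
from itertools import combinations

# Phase 1: recorded observations. Hypothetical raw answers of five
# respondents to three questionnaire items (all values are invented).
raw_answers = [
    {"q1": "agree", "q2": "agree", "q3": "agree"},
    {"q1": "agree", "q2": "agree", "q3": "disagree"},
    {"q1": "agree", "q2": "disagree", "q3": "disagree"},
    {"q1": "disagree", "q2": "disagree", "q3": "disagree"},
    {"q1": "agree", "q2": "agree", "q3": "disagree"},
]

# Hypothetical scoring rule: the alternative counted as the "positive"
# response for each item.
POSITIVE = {"q1": "agree", "q2": "agree", "q3": "agree"}


def dichotomize(answers, positive):
    """Score each answer 1 if it equals the item's positive alternative, else 0."""
    return [
        {item: int(resp[item] == positive[item]) for item in positive}
        for resp in answers
    ]


def two_by_two_tables(scored):
    """Cross-tabulate every pair of items over respondents.

    tables[(a, b)][x][y] counts the respondents scoring x on item a
    and y on item b.
    """
    items = sorted(scored[0])
    tables = {}
    for a, b in combinations(items, 2):
        table = [[0, 0], [0, 0]]
        for resp in scored:
            table[resp[a]][resp[b]] += 1
        tables[(a, b)] = table
    return tables


# Phase 2: the recorded observations are converted into data.
scored = dichotomize(raw_answers, POSITIVE)
tables = two_by_two_tables(scored)
for pair, table in tables.items():
    print(pair, table)
```

The same recorded answers could equally be converted into other data, for instance a matrix of correlations between items, which is the point of distinguishing observations from data.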
1.3 MEASUREMENT MODELS
It will be clear from the discussion in the last section that, leaving out of account the trivial fact that there are an infinity of ways of collecting and recording observations, given a certain set of observations, there are many different ways of converting these into data. Again, given a particular set of data, there will usually be numerous possibilities of constructing models to describe and analyse these data. Stouthard (1965, 59-75) has stressed the point that data models
form a much larger class than measurement models. In this study we shall also consider measurement models as a sub-class of the data models, distinguished from the other types by the special purpose for which they are used. This purpose is not in the first place to seek an adequate unidimensional or multidimensional representation of our set of data, as is the case in, for instance, the analysis of the structure of party preferences in electoral research (Converse, 1966). The purpose is primarily the derivation of numerical values for a variable that may be used as valid measurements to relate such a variable to other quantitative attributes in research. For the purpose of this study, we may therefore define a measurement model as a data model applied with the intention of inferring from the postulated or verified fit of the data to the model the values of the model variable, as determined by the data, to be used as measurements in a wider research context. The insistence on the intention of actual use of the variable in research, though admittedly rather pragmatic, is not in disagreement with the traditional views on measurement and is intended to exclude to a certain degree models that are of limited use. It should incline us to select the simplest models with which the data are in agreement. Henceforth we shall use the term scaling model in a sense more or less synonymous with measurement model. All data models are based on theories that are partly imposed and partly tested on the data models, and this is therefore true of measurement or scaling models as well. As Coombs has remarked: "This illustrates the general principle that all knowledge is the result of theory - we buy information with assumptions - "facts" are inferences, and so also are data and measurements and scales"
(1964, 5). In addition to this we want to stress once again the fact that different data models, that is, different behavioral theories, can be used to describe the same set of data even in the case of measurement models. This whole study, for instance, may be regarded as an illustration of the fact that many different models can be used to explain or describe dichotomous data of a particular kind. From the multitude of possible models for a given set of data, we may prefer to choose for our measurement models only those that are relatively simple in their behavioral assumptions, as are some of those which we will discuss in later chapters. In a sense this is simply another application of Thurstone's "principle of parsimony".
Despite the narrowing of our definition of a measurement model, the variety of types of data and corresponding models appropriate to the construction of measurement procedures is still very great. In Coombs' theory of data, which is concerned with behavioral theory at the level that forms the basis for measurement and scaling as practised in the social sciences, a formal classification of data types is attempted on the basis of a triple dichotomy, superseding an older classification (1964, 3-31, 561-5; 1953, 491-4). One important basis for classification in Coombs' system may be mentioned here to locate the major part of the research reported in this study. This is a distinction based on the sets of observational elements that are brought into relation with each other in data and model. In a general formal context, two different sets may be involved: a set of subjects, individuals, or respondents and a set of items or stimuli. The classification of data involved here is based on the way the two types of sets are dealt with in the data. Either both sets or only one of them may be distinguished in the model. For instance, a large class of psychological scaling data is based on experiments in which stimuli are compared. Pair comparison data form an example here. In the corresponding scaling models, the individuals involved are not distinguished, but seen as replications of the same comparison experiment. In the models for these stimulus comparison data (or B data; Coombs, 1964, 27-8) only one set, the set of stimuli, is specified. In multiparty systems, for instance, we may ask a sample of the electorate to judge all possible pairs that can be formed from the relevant parties and to indicate their relative preference for each pair presented. In models for this type of data it is sometimes assumed that the preference judgements are determined mainly by the relative position of the parties with respect to each other irrespective of the individual differences between subjects. In that case only the values and positions of the stimuli (parties) enter into the data model.
In the type of scaling problems with which we are concerned in this study, a different situation prevails. In this case the respondents may be thought of as comparing themselves (i.e. their attitudes) to stimuli or items one at a time. A typical case is the situation where respondents compare their dispositions and the contents of certain questions when trying to answer them. Let us consider the case of a respondent determining his response to an interview question related to a concept such as the "sense of political efficacy" (see chapters 7 and 8). First of all we may assume that as a stimulus the content and meaning of that parti

isting of elements of each of the two different sets. The three approaches to scaling put forward by Torgerson (1958, 45-60) are closely related to the classification described above. He
first mentions the subject-centered approach. Stimuli are seen as replications of the same experiment and combined solely with the purpose of differentiating between subjects. As examples Torgerson mentions the Likert-type procedures and most of the methods in the field of mental testing. Following Coombs' terminology, one might call them individual comparison data (if one is prepared to include stimuli comparing individuals), as only the set of individuals is involved in these procedures, which Torgerson somewhat casually regards as belonging to the domain of measurement by definition. Methods of this type, which belong chiefly to the field of mental testing, are not discussed by Torgerson, who refers the reader to standard texts in this field (e.g. Gulliksen, 1950). In this study no very important place is given to them either. As a second approach Torgerson mentions the stimulus-centered or judgement approach, which is essentially based on Coombs' previously mentioned stimulus comparison data or B-data, whereas Torgerson's response approach, in which the variability of reactions to stimuli is ascribed to differences in the subjects as well as in the stimuli, corresponds to the individual-stimulus comparison data or A-data of Coombs. As has been said before, the major part of the research reported here has been carried out on the basis of this response approach.
Another important distinction is that between unidimensional and multidimensional scaling methods. In our discussion we have mentioned the two basic sets that can be considered in measurement models. In the unidimensional measurement models, the sets involved are mostly represented as sub-sets of the real line, "the single continuum" as it is traditionally called. The position of an individual is reduced to one value, which indicates his position (score) on the single dimension, the interpretation of which again depends on the problem behind the data. In the more general models that have been developed, these sets are sub-sets of n-dimensional Euclidean space. In the case of the set of individuals, for instance, the individual is considered as a point in this space, the nature and interpretation of which depend on the problem at hand. If the space can be interpreted as an attitude space, the individual's position (attitude) as a point in that space will be given by the coordinates of that point, i.e. the collection of values of that particular individual on each of the relevant attitude dimensions that are spanning the space. Factor analysis is one of the early methods used to generate metric multidimensional representations.
We shall mention just one recent example. Alker represented the nations participating in the Sixteenth General Assembly of the United Nations on the basis of a factor analysis of seventy votes from this session in terms of their positions in a two-dimensional space spanned by an "East-West" and a "North-South" conflict dimension (Alker, 1964). Besides the methods of metric multidimensional scale analysis based on an observed distance metric (Torgerson, 1958, 247-97), methods of non-metric multidimensional scaling have been developed which seem very promising indeed. As examples of more recent developments we may mention Shepard (1962a; 1962b) and the improvements of his model by Kruskal (1964a; 1964b) and Guttman (1968). (See also Roskam, 1968.) Stokes (1963) has been studying the use of spatial models to explain party competition. Converse (1966) suggested that non-metric multidimensional scaling methods be used to analyse perceived party distances in multiparty systems for the analysis of voting change.
The research that is reported here is for the most part restricted to unidimensional scaling methods. Despite the increasing importance of multidimensional measurement procedures, at least two arguments can be put forward in support of the continuing investigation and application of unidimensional scaling models. In the first place, many of the uses that have been made of multidimensional techniques fall outside the field of measurement as defined in this study. Then the problem is primarily to detect relevant dimensions and not to measure them. For instance, the major part of the applications of factor analysis falls into this class. In the same way,
multidimensional scaling models, apart from providing possibilities for genuine multidimensional measurement, may continue to serve to a considerable extent the purpose not so much of measurement, but of the reduction of data structures in search of simple and useful dimensions. Once these facts have been realized and relevant dimensions have been found, further application of these results in research may very often call for adequate unidimensional scaling models based on observations ("pure items") which are constructed especially for the purpose of measuring each of the dimensions concerned (for an example see section 6.6). Therefore adequate unidimensional techniques are still needed as well as multidimensional techniques.
Another argument in favour of the use of adequate unidimensional scales stems from the increasing use of more advanced multivariate techniques. A recent trend is the application of linear models for purposes of causal analysis (Blalock, 1961; Boudon, 1967, 30-202; Alker, 1966). One of the earlier examples of this approach in the field of political research is the analysis by Miller and Stokes (1963) of the influence of constituency opinion on roll call behavior of Congressmen. Other examples followed: Cnudde and McCrone (1966), in the same field as the last-mentioned study, and Tanter (1967), in the field of political development. Goldberg in a similar analysis of data on voting behavior even expects that such procedures will become a part of statistical orthodoxy (Goldberg, 1966). The fa

capability as a scaling criterion. When we are interested in scaling from the point of view of a scaling criterion we are interested in psychological theory; when we are interested in scaling from the point of view of a scaling technique we have already adopted some theory (that which led to the data and their conversion to measurements) and are interested in constructing tools for further research." It is exactly this last focus of interest that dominates the research reported here.
1.4 OUTLINE OF THE STUDY
We may summarize our introduction by saying that this study is mainly concerned with unidimensional measurement models based on data of the individual-stimulus comparison type (A-data in Coombs' first typology) and the response approach (Torgerson, 1958). Our analysis will be further restricted to stimuli or items with dichotomized
response alternatives. Both deterministic and probabilistic measurement models will be investigated. The major part of our study will, however, be devoted to models of the latter type, the former, deterministic, models serving mainly as a good starting point from which the analysis can proceed. The well-known Guttman scale will serve as the deterministic example because of its pragmatic value in research, demonstrated by the many scaling procedures it has fathered. In fact a major purpose of these probabilistic models will be to find a better methodological basis for the procedures of cumulative scaling that have proved so useful in many fields of social research. Yet these models have been given relatively little attention in the more comprehensive monographs that have been published up to now. Coombs, commenting on the individual-stimulus comparison data, the data which form the chief basis of our study (QIIa or b data in his later system of classification, 1964, 211-83), remarks that they are the most prevalent in psychology. Nevertheless he treats these models only summarily, referring for a more elaborate treatment to Torgerson's study (1958). The major part of the latter work is, however, devoted to Thurstonean models based on the judgement or stimulus-comparison approach which originated in the problems of psychophysical scaling. Compared to the elaborate treatment of this subject, the discussion in the Torgerson monograph of the measurement models that are the object of our study is but a cursory one. It provides a good though not complete survey of the techniques of Guttman scaling, but does not give much information about probabilistic versions; this is due to the fact that such stochastic scaling models are comparatively new developments in the behavioral sciences.
Our study will consist of two parts. In the first part (chapters 2-5) the theory and method of scaling proposed by us will be dealt with. In the second part (chapters 6-9) we shall consider a number of applications.
In chapter 2 (part I) we shall investigate the relevant characteristics and practices of the conventional methods of Guttman scaling for dichotomous data. It is not our purpose to give a complete survey of all the varieties of practical methods and techniques that are commonly used in social research. We are interested mainly in an analysis of the common deterministic assumptions from which they were derived. We shall end our analysis with a discussion of the conclusion that this deterministic framework led to cumbersome procedures frequently containing fallacious elements. The admission of "error", which is
necessary in order to arrive at practical procedures that can be used in research, and the corresponding concept of a "quasi-scale", underline the necessity of incorporating a theory of error in the model itself. Consequently, probabilistic response models should be analyzed and used to evaluate or set up scaling methods instead of deterministic response models. The survey of chapter 2 therefore suggests the framework of the next two chapters and the problems that will be investigated in these chapters.
Chapters 3 and 4 are unabashedly mathematical and contain analyses of a type which will probably be beyond the range of interest to be expected from many readers concerned with social measurement and its applications. Instead of relegating his mathematical analyses to appendices, the author preferred to develop them in the context of his study in separate chapters. The reader who is not sufficiently interested in these mathematical details may skip chapters 3 and 4 and will find in the first section of chapter 5 a summary of the main results of these chapters.
We begin the analysis in chapter 3 with a mathematical scaling model of extreme generality which allows of multidimensional subject and item representation. This model is presented as a generalization of the "latent structure" model suggested by Lazarsfeld (1950). It includes as special cases not only all the models proposed under this heading but also models such as factor analysis. This model gives us the opportunity of mentioning the types of problems involved in probabilistic models and dealing with a specific sub-class of latent structure models which can be derived through the application of an interesting new theory of measurement proposed by Rasch.
In the second part of chapter 3 the latent structure model is specialized for dichotomous data and one-dimensional subject representation. As an illustration of the theory developed in the first part, specific parametric models based on the cumulative normal (normal ogive) and the logistic curves are considered.
In chapter 4 we continue our mathematical analysis of models for dichotomous data with a general class of models with one-dimensional subject and item representation and without a specification of their parametric or functional form. These non-parametric models, which are called monotonely homogeneous and holomorph or doubly monotone, seem fairly natural probabilistic counterparts of the Guttman model. They give us the opportunity of deriving a number of properties which serve to evaluate prevalent usages in conventional Guttman scaling methods and of arriving at new scaling methods not containing the fallacious elements of the first.
This derivation of a class of scaling methods and procedures based on a clear definition of a scale is undertaken in chapter 5. The chapter begins with a summary and recapitulation of the main results of chapters 3 and 4. Using these results, a new, simple definition of a scale is given in terms of which a class of scaling procedures can be developed, and these were in fact subsequently developed in a system of programs at the Mathematical Centre in Amsterdam. Chapter 5 concludes Part I and the discussion of our scaling theory.
Part II (chapters 6-9) is devoted to a number of applications of our methods. Apart from the fact that these applications may serve as illustrations of several methods suggested in this study, the author hopes that these chapters will have a substantial interest of their own.
They are based on results obtained in the fields of local politics, electoral research and mass communication research at the Institute for Political Science and the Institute for Mass Communication at the University of Amsterdam. The investigations whose results are reported here were not just constructed as particular examples of scales but were carried out in the context of current research in accordance with the author's conviction that the development of theory and measurement should go firmly hand in hand.
The first two chapters (6 and 7) of Part II contain the results of some cross-national comparative investigations in the framework of cross-cultural methodology. The general validity of variables measured in terms of measurement models such as scales attains a special significance when we investigate the common existence of the dimensions concerned in a cross-cultural or cross-national context. We hope to demonstrate that our models and methods supply us with the means of developing a methodological and conceptual framework on the basis of which such problems may be analyzed with profit.
Chapter 6 gives an introduction to the subject with a comparative study of factor analyses of readership interest in the contents of newspapers. The chapter concludes with some evidence that in certain cases dimensions found by factor analysis may subsequently be scalable. In chapter 7 the cross-cultural existence of common dimensions is investigated in terms of the concept of the "robustness" of scales. The discussion is based on a comparative analysis of the scale of the "sense of political efficacy" for the United States and the Netherlands.
In chapter 8 we report our results concerning the development of a scale of the "sense of political efficacy" in the Netherlands as an extension of the original scale. After an additional scale concerning the sense of efficacy with respect to local politics had been set up,
the analysis produced evidence of a more general dimension of "local" and "national" political efficacy.
In chapter 9, finally, we report the results of a number of analyses in which our procedure of multiple scaling was applied with varying degrees of success. The results concern scales of the type of informal communication behavior associated with opinion leadership, of the sense of civic competence (compared with the sense of political efficacy), and of two dimensions of influence stereotypes. The chapter concludes with our findings concerning the scalability of political participation. In chapter 10, our final chapter, we summarize our main conclusions.
Some final remarks must be made with respect to the numerical references to sections, tables, figures and theorems in the text. Sections and sub-sections are numbered lexicographically within the chapters or sections. For instance, section 8.1 denotes the first section of chapter 8 and section 8.1.2 the second sub-section of section 8.1. Figures and tables are numbered consecutively according to the chapter, so that table 9.3 denotes the third table of chapter 9. In order to keep numerical references relatively simple, theorems, lemmas and corollaries will be numbered according to the sections
and sub-sections only, without reference to the number of the [...] coefficients defined for populations.
The deterministic theory of the Guttman scale has led to quite a number of rules of thumb, meant to refine practical methods of scaling. In chapter 4 we shall attempt to evaluate a number of such rules with the help of a stochastic scaling model. We may prepare the way for such an analysis by an appraisal of the Guttman model and the procedures that have prevailed in practice.
Still another reason for reviewing this simple model is that it displays in a rudimentary form virtually all the major properties and problems that characterize the more general scaling models which we shall consider in chapters 3 and 4 and of which it will prove to be a special case. As such the model is a good introduction to the general problem of scaling. In fact, when re-reading Guttman's original text, one is struck by the fact that almost all these aspects were seen or surmised by Guttman at that time (Guttman 1950a, b).
In our review we will not attempt to give a complete survey of all the techniques and problems that have been treated under the heading of the Guttman method. For this we may refer the reader to the well known reviews mentioned above. We shall restrict ourselves to the properties, problems and criteria that seem relevant in trying to evaluate and develop scaling procedures based on a non-deterministic theory. Moreover, we will consider only dichotomized items. As this type of data very much prevails in practical applications, this restriction does not seem too narrow. Nor shall we deal with the metric solution and the related problem of the principal components of scalable attitudes, intensity analysis and the determination of a zero point, for which we may refer the reader to Torgerson (1958, 336-46).
2.2 THE PERFECT SCALE
We saw in chapter 1 that in the case of individual-stimulus comparison data, two different types of sets may be distinguished. Guttman (1950a, 80) already referred to a set of objects and a set of attributes, remarking that "scale analysis is a formal analysis, and hence applies
to any universe of qualitative data of any science, obtained by any manner of observation" (1950a, 88). We shall specialize this general form to suit the type of empirical research reported in this study, as follows. In the first place there will be one set of individuals or respondents. We shall refer to this set mostly as the set of subjects. The second set consists of stimuli presented to these subjects, such as a number of items presented in the context of a survey questionnaire. We shall refer to this set as the set of items. It is assumed that each subject responds to each item as a single stimulus, that is, that he measures himself against the items one at a time and does not measure the items against each other.
2.2.1 Dichotomous data
A subject may respond in many ways to the stimuli that are presented to him. As a result we can usually discern for any stimulus or item a number of different possible responses or categories in the raw recorded observations. In the data models which we shall consider in this study, these numerous possible responses are reduced through an appropriate combination to just two responses, the same for each subject. In fact, we select one response category, or combine a number of important response alternatives into one category, and then consider all other response alternatives as the second, complementary response category. We may therefore think of this reduction as a one-sided dichotomization of the original response alternatives.
The adjective "one-sided" is used mainly to stress a certain point of view with respect to an interpretation of the resulting dichotomous response categories. We may clarify this point as follows. One-sided dichotomization occurs when we isolate from the possible response alternatives that alternative which is considered on the basis of item content to be related most significantly or meaningfully to the underlying continuum that is being measured. In this sense we may consider that alternative as containing the information most relevant for the measurement of the underlying variable. We shall call this alternative in the resulting dichotomy the scale alternative or scale response and occasionally, when no confusion seems possible, the positive alternative or response. The information in all the other alternatives is disregarded, for these alternatives are lumped together in the second complementary alternative of the dichotomy. The meaning of this second alternative is derived solely from the scale alternative in that the former is the negation of the latter.
Two examples borrowed from Measurement and Prediction (Suchman, 1950a, 115) are given to illustrate the procedure. They concern items from a scale measuring "Satisfaction with One's Army Job". First an item of the Yes-No variety.
"Would you change to some other Army job if given a chance?"
1 - Yes
2 - No
3 - Undecided
In this case 2: "No" may be considered to be the scale alternative (the "positive" answer), "Yes" and "Undecided" forming the complementary alternative. Next a multicategory item:
"Which of the following would you say best applies to your job?"
1 - Time always passes quickly.
2 - Time passes quickly most of the time.
3 - Enjoy working part of the time, but it drags at other times.
4 - Time drags most of the time.
5 - Time always drags.
("No answers" were all coded 0.)
Here alternative 1 may be the scale alternative. However, one might also combine several alternatives to form the scale alternative, e.g. alternatives 1 and 2. The other alternatives together form the second complementary alternative.
Guttman's original method was also designed to scale multicategory items without dichotomizing them. We shall not consider these items in this study, and again refer the reader to Torgerson (1958) or Matalon (1965). This restriction does not seem too severe when we consider that dichotomized data are used very frequently in measurement and analysis in the behavioral sciences.
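The one-sided dichotomization described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `dichotomize` and the response lists are hypothetical, and treating the "no answer" code 0 as part of the complementary alternative is an assumption made here for the example.

```python
def dichotomize(response, scale_alternatives):
    """Score 1 if the response falls in the scale alternative, 0 otherwise."""
    return 1 if response in scale_alternatives else 0


# Item 1: 1 = Yes, 2 = No, 3 = Undecided; the scale alternative is 2 ("No").
job_change = [1, 2, 3, 2]  # hypothetical raw responses of four subjects
job_change_scores = [dichotomize(r, {2}) for r in job_change]

# Item 2: alternatives 1-5, with 0 = no answer (assumed complementary here);
# the scale alternative combines alternatives 1 and 2.
time_passes = [1, 2, 3, 5, 0]  # hypothetical raw responses of five subjects
time_passes_scores = [dichotomize(r, {1, 2}) for r in time_passes]

print(job_change_scores)
print(time_passes_scores)
```

All responses outside the chosen set receive the same score 0, which mirrors the way the complementary alternative is defined only as the negation of the scale alternative.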
2.2.2 Monotone items and trace lines
It has been said that each subject "measures" himself against each item in terms of the variable we want to measure, his response giving the result of this comparison. The content of the item determines this response behavior. Now this response behavior can be partially classified into two
types, leading to a classification of items that is very important in scaling models. We can illustrate this with the usual example of tallness. When investigating the heights of people we might formulate two types of questions or items that lead to totally different forms of response behavior.
1. "Are you 1.73 m. tall?"
2. "Are you over 1.73 m. tall?"
The hypothetical results are shown in figure 2.1.
The variable that we are measuring and that may be thought to generate the responses of the subjects to the items, can be represented as the "single continuum" along the horizontal axis. In our example this "underlying" variable is the "unknown" length of the respondents. In our model each subject compares his own position on this continuum with the item content as related to the attribute. Let us take as scale alternative in both questions the answer "yes". We can imagine respondents of all possible heights distributed along the horizontal axis. A respondent of a certain height (the degree, quantity or value of the attribute) will give the "positive" answer (scale alternative) with a probability depending on that degree or value. Assuming that with all subjects of the same height, the same probability exists that they will do so, we can measure the probability of a positive answer (scale alternative) and plot this probability as a function of the "underlying" or "latent" attribute of height.
This method of representation, very common in stochastic scaling models, introduces the probabilities of a positive response as curves. In both cases these curves will range along a P-axis between the values zero and one.
Figure 2.1 The probability of the scale alternative for two types of questions: a "point" item (Question 1) and a "monotone" item (Question 2), each plotted as P against height. The shaded area indicates the interval with non-zero probability of positive response.
These curves have been called item trace lines in latent structure analysis (Lazarsfeld, 1950), item operating characteristics (Green, 1954) or item characteristic curves (I.C.C.'s) in mental test construction (Lord and Novick, 1968, 366). Following Lazarsfeld we will mainly use the term trace lines.
To return to our example: we first note a striking dissimilarity in the trace lines of items 1 and 2. Of course that difference is based on the different content of the items. In question 1 only respondents whose height is around 1.73 m. will be inclined to say "yes". By doing so they identify their height with the proximity of the point 1.73 m. With subjects that are either much taller or much shorter than 1.73 m., there is a high probability that they will not give the scale answer "yes". Items that elicit this behavior have been called point items (Mosteller, 1949) or differential items (Loevinger, 1948).
On the other hand, the trace line of item 2 indicates quite another type of behavior. This again is due to the wording of the question, which makes the subject compare his own height with that mentioned in the question and which admits of the answer "yes" only when the subject rates himself at least as tall as the item states. The respondent orders himself in comparison to the item.
The distinction between these two types of behavior, one indicating proximity to the stimuli and the other indicating the order relationship of subjects to items, has been made one of the three basic dimensions in Coombs' most recent typology of data (1964, 19).
With regard to the trace line of question 2 we may make two remarks. In the first place, as a function it is monotonic (weakly, non-decreasing): the probability of the scale alternative "yes" does not decrease with an increase in tallness. Items of this type have for this reason been called monotone by Coombs, who contrasted them with non-monotone items in an older typology of data (1953, 493-4; 1964, 561-3). Loevinger (1948) called these items cumulative.
Secondly, we may notice in figure 2.1 that the trace line of question 2 virtually bisects the underlying continuum of "tallness" at the point 1.73 m. Practically all the subjects with values greater than 1.73 m. will choose the scale alternative "yes". Practically all the subjects shorter than 1.73 m. will not choose that scale alternative and answer "no". There is virtually a 1-1 correspondence between the two alternatives and the two intervals into which the item divides the continuum of "tallness". This behavior is characteristic for the perfect Guttman item, as we shall see. In figure 2.1 this pattern is due
to the steepness of the trace line of question 2. In our example, as people in general are well aware of their own height as a stable attribute and as the item clearly embodies a well-defined value of the attribute of tallness, we may expect a trace line like this, though in the neighborhood of the value 1.73 m. some erroneous answers may be possible, as indicated by the slopes in figure 2.1.
For items related to vaguer and less stable attributes such as, let us say, attitudes, we have less reason to expect steep trace lines like those for question 2. For these attributes the items might be more like those given by Ford (1950), such as "Are you taller than a table?", "Are you taller than the head of a pony?" and "Are you taller than a good-sized bookcase?"
In this study we will consider only items with monotone trace lines. For methods of analyzing point items we refer the reader to Mosteller (1949) and Torgerson (1958, 312-7). Again this restriction does not seem too severe to us, because by far the greater part of scaling practice based on the measurement models that we shall consider shows at least an implicit use of monotone items.
2.2.3 Properties of the perfect scale
In general we shall consider the case of n subjects, each answering to the same set of k items, each item being scored dichotomously. According to conventional statistical practice each item will receive a score of 1 when the scale response is given and 0 when it is not. For each subject the resulting response pattern can be represented by a row vector:
$$x = (x_1, x_2, \ldots, x_k); \qquad x_i = 1, 0; \quad i = 1, 2, \ldots, k, \tag{1.1}$$
where $x_i$ is the score for item i. There are theoretically $2^k$ different response patterns.
We shall further suppose that each subject has a given but unknown numerical value on the variable (e.g. attitude). Let that unknown value be $\theta$. Our measurement problem amounts to getting information about $\theta$ from the observable response vector $x$. In our discussion of the trace lines of items we noted that they represented the probability of a scale answer (score 1) as a function of this unknown value $\theta$. We might formulate this probability as*
$$P\{\underline{x}_i = 1 \mid \theta\} \tag{1.2}$$
*Stochastic variables will be underlined in this study.
A perfect scale is based on perfect items. A perfect item is characterized by a trace line as shown in figure 2.2, where $P\{\underline{x}_i = 1 \mid \theta\}$ is plotted as a function of $\theta$. One of its properties is that there is a certain unknown, critical value, $\delta_i$, on the $\theta$-axis. Subjects with $\theta$-values lower than $\delta_i$ will never choose the scale alternative for item i. They will always score zero on that item. Subjects with $\theta$-values as high as or higher than $\delta_i$, on the other hand, will certainly score 1 on item i, as they will always choose the scale alternative.
1----
---------- -- �----
Figure 2.2
Trace line of G-item i. The shaded area indicates the interval with non
Lero probability of item score t.
A subject's response to any item is entirely determined by his own value on the variable ($\theta$) and by that of the item ($\delta_i$). Hence the term deterministic model. We see therefore that a subject will score 1 on item i either with probability one or with probability zero, depending on his own $\theta$-value and on the value $\delta_i$ of that item. This unknown value $\delta_i$ we may call the item difficulty of item i. To express this dependence on $\delta_i$, we may formulate the definition of the trace line for a perfect item (Guttman-item or G-item as we shall call it for short) as follows:
$$P\{\underline{x}_i = 1 \mid \theta, \delta_i\} = \begin{cases} 0 & \text{if } \theta < \delta_i \\ 1 & \text{if } \theta \geq \delta_i. \end{cases} \tag{1.3}$$
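The step-shaped trace line of (1.3) is easy to try out numerically. The following minimal Python sketch assumes four hypothetical item difficulties and five hypothetical subject values; the function names are illustrative only. Items are listed with the most difficult one first.

```python
def g_item_probability(theta, delta):
    """Trace line of a perfect item as in (1.3): 0 below delta, 1 from delta on."""
    return 1.0 if theta >= delta else 0.0


def response_pattern(theta, deltas):
    """Response vector of a subject with value theta on a set of G-items."""
    return [int(g_item_probability(theta, delta)) for delta in deltas]


# Hypothetical item difficulties, most difficult item first.
deltas = [4.0, 3.0, 2.0, 1.0]

# Hypothetical subject values spread over the latent continuum.
thetas = [0.5, 1.5, 2.5, 3.5, 4.5]

patterns = {tuple(response_pattern(theta, deltas)) for theta in thetas}
for pattern in sorted(patterns):
    print(pattern)
```

With these values only five distinct patterns arise, that is k + 1 of the $2^k = 16$ theoretically possible ones, because a subject who passes a given item necessarily passes every easier item.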
In other words, when a subject responds positively to an item if and only if he has a value on the latent variable (e.g. attitude) that is at least equal to the item difficulty, the item is a perfect item (G-item). In this case the respondent indicates by a positive answer that his attitudinal value is equal to or exceeds that indicated by the item difficulty. A set of k G-items forms a perfect Guttman scale.
We shall throughout this study adopt the convention by which items are numbered according to the order of their item difficulty, the most
difficult one being given 1, the least difficult one k. The same numbering will hold good for the components of the response vector given in (1.1).
We can now summarize the well-known properties of the perfect scale. In figure 2.3 we have given a version of the usual picture for a four-item scale adapted to our purposes. This illustration gives us a simultaneous view of the trace lines for four (perfect) G-items. Each of these tra [...]
Now it was the somewhat literal interpretation of Guttman's idea of sampling items, inspired by his own straightforward exposition of it, that led to much criticism of this concept (Torgerson, 1958, 332-6; Matalon, 1965, 29; Campbell and
Kerckhoff, 1957). Campbell and Kerckhoff point out in their criticism of the concept that the fact that a sample of items proves to be a perfect scale does not necessarily mean that the population of items is scalable. This, it must be said, is hardly a valid criticism of the concept of a population of items, because, as we saw at the conclusion of 2.3.1, the same argument can be used against the concept of a population of subjects. In fact, the argument implies a criticism of the perfect scale as a usable construct.
Campbell and Kerckhoff also point out, and with more reason, that there is no genuine sampling procedure for the items, because the boundaries of the population of items are not known. Consequently statements about the universe of items based on a set of items must be made within unknown probability limits. They conclude that therefore, beyond the items used in a scale, no meaningful construction of a wider universe is possible. Although we have our objections to this evaluation of the utility of the concept of a wider population of items, their concluding appraisal of scale analysis itself is much to the point. They remark that scale analysis enables us to judge whether the set of items used possesses "a single central core of meaning". If the set forms a scale we can say that respondents behaved as if these items had a single core of meaning. If not, then either the items lack uniformity of meaning for the respondents, or they have no unidimensional meaning for the respondents.
It is certainly true that the careful and purposeful methods of item selection and construction do not bear the faintest resemblance to the procedures of random selection in the sampling of respondents. Therefore the traditional sampling theory based on these last procedures cannot adequately be used for inferences about a wider population of items. But in the experimental branches of the behavioral sciences, the methods of subject selection are often as far removed from the survey sampling procedures. In other disciplines, such as biometric research, where models highly akin to those treated in chapters 3 and 4 are used in experiments to estimate the toxicity of medicines (bio-assay), guinea pigs and other animals are selected by procedures much farther removed from such sampling methods.
Yet the results of such experiments are being used for generalizations about other (e.g. human) populations. In all these cases statistical models form an attempt at providing a basis for the generalization, apparently with some claim to success.
The concept of a wider population of behavioral items relevant to a construct we wish to measure is well in agreement with intuition. As anyone knows who has tried to construct an attitude scale or has studied items as indicators of other types of behavior, in spite of the carefulness with which one works, one ends up with the feeling that other items might have done just as well. The whole empirically founded idea of the interchangeability of indices presupposes something like a population or universe in which they are embedded. Lazarsfeld put up a convincing argument for the introduction of this concept: "In the formation of indices of broad social and psychological concepts, we typically select a relatively small number of items from a large number of possible ones suggested by the concept and its attendant imagery. It is one of the notable features of such indices that their correlation with outside variables will usually be about the same, regardless of the specific 'sampling' of items which goes into them from the broader group associated with the concept. This rather startling phenomenon has been labeled 'the interchangeability of indices'" (Lazarsfeld, 1959b, 113).
We shall in due course investigate models incorporating the idea of a population of items from which a given set of k items were "selected". We shall see in chapters 3 and 4 that the formalization of such models enables us to gain some general insight into the effects of item selection and the properties that are invariant for variations in procedures of item selection (or the "distribution" of the population of items).
Many properties can be derived from a certain symmetry in the roles of the subject parameters ($\theta$) and the roles of the items ($\delta$). Formally the roles of the two sets are often identical. For instance, we may use this symmetry of the perfect Guttman model to reverse the roles of items and subjects and repeat our argument of section 2.3.1. This is illustrated in figure 2.6, which is similar to figure 2.5. The roles of items and subjects have, however, been reversed. In figure 2.6 we have introduced two subjects with values $\theta_1$ and $\theta_2$. Just as we postulated a trace line for a given item in figure 2.5, which gives at a certain point on the $\theta$-axis the probability that a subject with that value will pass that item, we may, inversely, introduce in figure 2.6 a trace line for subject $\theta_1$. This "subject trace line" gives for any point $\delta$ on the $\delta$-axis the probability that an item with that value
will be passed by subject $\theta_1$. For perfect [...]tic models.
2.4 THE IMPERFECT SCALE: THE PROBLEM OF "ERROR"
When we try to apply this Guttman model of a perfect scale to empirical data, its deterministic features will lead to imperfections. Perfect scales and perfect items rarely exist in practice. One has to face the fact that the ideal, as usual, can only be approximated. Scaling procedures that aim at rearranging the data so as to order subjects and items and arrive at a pattern like that in table 2.1 will be obstructed by the fact that, once the order of the items has been established and the perfect patterns have therefore been ascertained, numerous "imperfect" patterns will occur, due to responses that are "in error".
Let us consider, for instance, the situation in figure 2.3 and the corresponding table 2.1. The occurrence of a response pattern such as {0, 1, 0, 1} is contradictory to the deterministic model, according to which only the "perfect" pattern {0, 0, 1, 1} consistent with the scale score of 2 is possible. The pattern {0, 1, 0, 1} is "imperfect": the second item has a positive response whereas the third and "easier" item is answered negatively. From a deterministic viewpoint there must be an "error" somewhere. Yet in actual practice such deviations from the deterministic model may well occur. What do these "errors" lead to?
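The distinction between perfect and imperfect patterns can be checked mechanically once the item order is fixed. The sketch below is only an illustration, with hypothetical function names; it assumes the numbering convention of section 2.2.3, in which the most difficult item comes first, so that a perfect pattern with scale score s has its s positive responses on the s easiest (last) items.

```python
def perfect_pattern(score, k):
    """Perfect pattern for a given scale score: the `score` easiest items are passed."""
    return [0] * (k - score) + [1] * score


def is_perfect(pattern):
    """True when the pattern equals the perfect pattern with the same scale score."""
    return list(pattern) == perfect_pattern(sum(pattern), len(pattern))


observed = [0, 1, 0, 1]  # the imperfect pattern discussed in the text
print(perfect_pattern(sum(observed), len(observed)))
print(is_perfect(observed))
```

For the observed pattern the scale score is 2, the corresponding perfect pattern is {0, 0, 1, 1}, and the check reports the pattern as imperfect.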
2.4.1 The failure of determinism: some consequences
One of the immediate consequences of deviations from the deterministic model in practical applications is, of course, the loss of the property of reproducibility. No longer does the scale score of any subject predict his response pattern exactly: other "imperfect" patterns with the same score may occur.
Still other problems have to be coped with when the deterministic model is used as a guide line for the construction of practical instruments for measurement. The basic one is that of error. If the ordering of the items is given, the perfect patterns can be established. In all applications, patterns deviating from these perfect patterns are considered "imperfect" and held to contain some error. In order to arrive at some objective basis of judgement from which the seriousness of such deviations may be evaluated it is necessary to determine clearly what an "error" is. This problem of the definition of error will be the subject of section 2.4.2.
The other sections of this chapter contain a discussion of some of the remaining problems and the ways in which they have been solved in most practical applications. For instance, if a certain definition of error has been set up, we must find a way of ascertaining how serious the deviation of a set of items is from the ideal of the perfect scale for a given population of subjects. This is the problem of determining the scalability of a set of items: the degree to which a set of items may be said to fit the model of a perfect scale. We must therefore devise criteria of scalability in terms of which sets of items may be evaluated, and we must design procedures to find and construct scales from sets of items in terms of such criteria.
Finally, in the literature on the subject of scales, much attention has been paid to the problem of determining the proper score for subjects with imperfect response patterns. This scoring problem has generally led to the allocation of a perfect pattern to subjects with imperfect ones.
2.4.2 The definition of "error"
The main purpose of procedures of scale analysis is, as we have seen, to establish the order of the items and that of the subjects. Once the order of the items has been established, the perfect scale patterns corresponding to that order are known, and so therefore are the imperfect patterns, the "wrong" ones. We have to decide how "wrong" they are, that is, we have to define error. This definition of error is important because in all scaling procedures it determines the criteria by which the scalability of a set is judged. Much of the confusion and many of the difficulties that have often hindered a clear assessment of the many criteria of scalability that have been put forward are simply the result of a lack of insight into the concept of error, as we hope to demonstrate. In fact, the early standard works on the subject do not give an unambiguous definition of error. This has led to a rather unfortunate formulation of coefficients of scalability, as will be shown in section
2.5. One of the simplest definitions of error gives equal weight to every imperfect pattern: they all count as one error. This simple method of defining error has hardly been actually put into practice. In most definitions of error that are actually used, however, some imperfect patterns are judged "more wrong than others", as we shall see. First there is the definition by Guttman himself. It is clear enough. Defining a coefficient of reproducibility, he states: "It is secured by counting up the number of responses which would have been predicted wrongly for each person on the basis of his scale score, dividing these errors by the total number of responses and subtracting the resulting fraction from 1" (1950a, 77). The first part of this quotation gives a precise definition of error in terms of the principle of reproducibility which states that for a perfect
scale the scale score of any subject will be the result of one and only one response pattern: the perfect pattern corresponding to that score. Keeping in mind our method (see section 2.2.3) of ordering and numbering the item scores in the response vector according to their difficulty from left to right with decreasing δ_i; i = 1, 2, ..., k (or increasing "easiness"), we may consider the following examples from table 2.3, which is designed in the form of table 2.1.
Table 2.3  Imperfect patterns

                      Items
Patterns     1   2   3   4   5   6     Scale score (s)
   1         1   1   1   1   1   1           6
   2         0   1   1   1   1   1           5
   3         0   0   1   1   1   1           4
   4         0   0   0   1   1   1           3
   5*        1   1   0   1   0   0           3
   6*        0   0   1   1   0   1           3
   7         0   0   0   0   1   1           2
   8         0   0   0   0   0   1           1
   9         0   0   0   0   0   0           0

* Imperfect patterns.
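Guttman's counting rule quoted above compares each observed response with the response predicted from the subject's scale score. The following minimal sketch applies that rule to the two imperfect patterns of table 2.3 and then forms the coefficient of reproducibility as one minus the fraction of wrongly predicted responses. The function names are hypothetical, and the final computation treats the nine rows of the table as if each were a single subject, which is an assumption made purely for illustration and not data from the text.

```python
from typing import List, Sequence


def perfect_pattern(score: int, k: int) -> List[int]:
    """Perfect pattern for a given score: the `score` easiest items
    (rightmost positions) are answered positively."""
    return [0] * (k - score) + [1] * score


def count_errors(pattern: Sequence[int]) -> int:
    """Number of responses predicted wrongly from the scale score,
    i.e. positions where the pattern differs from the perfect
    pattern with the same score."""
    predicted = perfect_pattern(sum(pattern), len(pattern))
    return sum(1 for obs, exp in zip(pattern, predicted) if obs != exp)


def reproducibility(patterns: Sequence[Sequence[int]]) -> float:
    """Rep = 1 - (total errors) / (total number of responses)."""
    total_errors = sum(count_errors(p) for p in patterns)
    total_responses = sum(len(p) for p in patterns)
    return 1 - total_errors / total_responses


pattern_5 = (1, 1, 0, 1, 0, 0)
pattern_6 = (0, 0, 1, 1, 0, 1)
print(count_errors(pattern_5))  # 4
print(count_errors(pattern_6))  # 2

# Hypothetical sample: one subject per row of table 2.3.
table_rows = [tuple(perfect_pattern(s, 6)) for s in (6, 5, 4, 3, 2, 1, 0)]
table_rows += [pattern_5, pattern_6]
print(reproducibility(table_rows))
```

Both imperfect patterns have the same score of 3, yet they contribute different numbers of errors, so under this rule imperfect patterns are not all weighted equally.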
In table 2.3 we see enumerated for a six-item scale all the possible perfect response patterns, together with two "imperfect" patterns (patterns 5 and 6 with a score s = 3). This enables us to predict, from
the reproducibility property of a perfect scale, the pattern {0, 0, 0, 1, 1, 1} (pattern 4). Counting the errors in reproducibility per item, the imperfect pattern 5, {1, 1, 0, 1, 0, 0}, contains 4 errors according to this definition. Because this definition was used by Goodenough (1944), we shall henceforth call it the Guttman-Goodenough definition. The fact that according to this definition "imperfect" patterns are weighed differently according to their number of errors will
become clear when the definition is applied to the imperfect pattern 6, {0, 0, 1, 1, 0, 1}, which also has a score of 3. In this pattern there are two errors. In Measurement and Prediction, however, a second definition is used which leads to a definition of error and hence of scalability differing from the Guttman-Goodenough definition. This definition is contained in Suchman's discussion of the scalogram board technique: "We define 'scale type' for the purpose of scalogram analysis as that perfect scale type which the given individual most closely approaches with the least number of errors. If there is more than
one perfect scale type to which the given individual approaches most closely, he is classified as belonging to that scale type which best maintain an adjacent error pair. According to counts and eliminates from the pattern all adjacent error pairs met