List of Contents

Preface

Part I: Concepts and Methods of CFA

1. Introduction: The Goals and Steps of Configural Frequency Analysis
   1.1 Questions that can be answered with CFA
   1.2 CFA and the person perspective
   1.3 The five steps of CFA
   1.4 A first complete CFA data example

2. Log-linear Base Models for CFA
   2.1 Sample CFA base models and their design matrices
   2.2 Admissibility of log-linear models as CFA base models
   2.3 Sampling schemes and admissibility of CFA base models
       2.3.1 Multinomial sampling
       2.3.2 Product multinomial sampling
       2.3.3 Sampling schemes and their implications for CFA
   2.4 A grouping of CFA base models
   2.5 The four steps of selecting a CFA base model

3. Statistical Testing in Global CFA
   3.1 The null hypothesis in CFA
   3.2 The binomial test
   3.3 Three approximations of the binomial test
       3.3.1 Approximation of the binomial test using Stirling's formula
       3.3.2 Approximation of the binomial test using the DeMoivre-Laplace limit theorem
       3.3.3 Standard normal approximation of the binomial test
       3.3.4 Other approximations of the binomial test
   3.4 The X² test and its normal approximation
   3.5 Anscombe's normal approximation
   3.6 Hypergeometric tests and approximations
       3.6.1 Lehmacher's asymptotic hypergeometric test
       3.6.2 Küchenhoff's continuity correction for Lehmacher's test
   3.7 Issues of power and the selection of CFA tests
       3.7.1 Naud's power investigations
       3.7.2 Applications of CFA tests
             3.7.2.1 CFA of a sparse table
             3.7.2.2 CFA in a table with large frequencies
   3.8 Selecting significance tests for CFA
   3.9 Finding types and antitypes: Issues of differential power
   3.10 Methods of protecting α
       3.10.1 The Bonferroni α protection (SS)
       3.10.2 Holm's procedure for α protection (SD)
       3.10.3 Hochberg's procedure for α protection (SU)
       3.10.4 Holland and Copenhaver's procedure for α protection (SD)
       3.10.5 Hommel, Lehmacher, and Perli's modifications of Holm's procedure for protection of the multiple level α (SD)
       3.10.6 Illustrating the procedures for protecting the test-wise α

4. Descriptive Measures for Global CFA
   4.1 The relative risk ratio, RR
   4.2 The measure log P
   4.3 Comparing the X² component with the relative risk ratio and log P

Part II: Models and Applications of CFA

5. Global Models of CFA
   5.1 Zero order global CFA
   5.2 First order global CFA
       5.2.1 Data example I: First order CFA of social network data
       5.2.2 Data example II: First order CFA of Finkelstein's Tanner data, Waves 2 and 3
   5.3 Second order global CFA
   5.4 Third order global CFA

6. Regional Models of CFA
   6.1 Interaction Structure Analysis (ISA)
       6.1.1 ISA of two groups of variables
       6.1.2 ISA of three or more groups of variables
   6.2 Prediction CFA
       6.2.1 Base models for Prediction CFA
       6.2.2 More P-CFA models and approaches
             6.2.2.1 Conditional P-CFA: Stratifying on a variable
             6.2.2.2 Biprediction CFA
             6.2.2.3 Prediction coefficients

7. Comparing k Samples
   7.1 Two-sample CFA I: The original approach
   7.2 Two-sample CFA II: Alternative methods
       7.2.1 Gonzáles-Debén's π*
       7.2.2 Goodman's three elementary views of nonindependence
       7.2.3 Measuring effect strength in two-sample CFA
   7.3 Comparing three or more samples
   7.4 Three groups of variables: ISA plus k-sample CFA

Part III: Methods of Longitudinal CFA

8. CFA of Differences
   8.1 A review of methods of differences
   8.2 The method of differences in CFA
       8.2.1 Depicting the shape of curves by differences: An example
       8.2.2 Transformations and the size of the table under study
       8.2.3 Estimating expected cell frequencies for CFA of differences
             8.2.3.1 Calculating a priori probabilities: Three examples
             8.2.3.2 Three data examples
       8.2.4 CFA of second differences

9. CFA of Level, Variability, and Shape of Series of Observations
   9.1 CFA of shifts in location
   9.2 CFA of variability in a series of measures
   9.3 Considering both level and trend in the analysis of series of measures
       9.3.1 Estimation and CFA of polynomial parameters for equidistant points on X
             9.3.1.1 Orthogonal polynomials
             9.3.1.2 Configural analysis of polynomial coefficients
       9.3.2 Estimation and CFA of polynomial parameters for non-equidistant points on X
   9.4 CFA of series that differ in length; an example of confirmatory CFA
   9.5 Examining treatment effects using CFA; more confirmatory CFA
       9.5.1 Treatment effects in pre-post designs (no control group)
       9.5.2 Treatment effects in control group designs
   9.6 CFA of patterns of correlation or multivariate distance sequences
       9.6.1 CFA of autocorrelations
       9.6.2 CFA of autodistances
   9.7 Unidimensional CFA
   9.8 Within-individual CFA

Part IV: The CFA Specialty File and Alternative Approaches to CFA

10. More Facets of CFA
    10.1 CFA of cross-classifications with structural zeros
    10.2 The parsimony of CFA base models
    10.3 CFA of groups of cells: Searching for patterns of types and antitypes
    10.4 CFA and the exploration of causality
         10.4.1 Exploring the concept of the wedge using CFA
         10.4.2 Exploring the concept of the fork using CFA
         10.4.3 Exploring the concept of reciprocal causation using CFA
    10.5 Covariates in CFA
         10.5.1 Categorical covariates: stratification variables
         10.5.2 Continuous covariates
    10.6 CFA of ordinal variables
    10.7 Graphical displays of CFA results
         10.7.1 Displaying the patterns of types and antitypes based on test statistics or frequencies
         10.7.2 Mosaic displays
    10.8 Aggregating results from CFA
    10.9 Employing CFA in tandem with other methods of analysis
         10.9.1 CFA and cluster analysis
         10.9.2 CFA and discriminant analysis

11. Alternative Approaches to CFA
    11.1 Kieser and Victor's quasi-independence model of CFA
    11.2 Bayesian CFA
         11.2.1 The prior and posterior distributions
         11.2.2 Types and antitypes in Bayesian CFA
         11.2.3 Patterns of types and antitypes and protecting α
         11.2.4 Data examples

Part V: Computational Issues

12. Software to Perform CFA
    12.1 Using SYSTAT to perform CFA
         12.1.1 SYSTAT's two-way cross-tabulation module
         12.1.2 SYSTAT's log-linear modeling module
    12.2 Using S-plus to perform Bayesian CFA
    12.3 Using CFA 2002 to perform frequentist CFA
         12.3.1 Program description
         12.3.2 Sample applications
             12.3.2.1 First order CFA; keyboard input of frequency table
             12.3.2.2 Two-sample CFA with two predictors; keyboard input
             12.3.2.3 Second order CFA; data input via file
             12.3.2.4 CFA with covariates; input via file (frequencies) and keyboard (covariate)

Part VI: References, Appendices, and Indices

References
Appendix A: A brief introduction to log-linear modeling
Appendix B: Table of α*-levels for the Bonferroni and Holm adjustments
Author Index
Subject Index
Preface

Events that occur as expected are rarely deemed worth mentioning. In contrast, events that are surprising, unexpected, unusual, shocking, or colossal appear in the news. Examples of such events include terrorist attacks, when we are informed about the events in New York, Washington, and Pennsylvania on September 11, 2001; or, on the more peaceful side, the weather, when we hear that there is a drought in otherwise rainy Michigan; accident statistics, when we note that the number of deaths from traffic accidents that involved alcohol is smaller in the year 2001 than expected from earlier years; or health, when we learn that smoking and lack of exercise in the population do not prevent the life expectancy in France from being one of the highest among all industrial countries.

Configural Frequency Analysis (CFA) is a statistical method that allows one to determine whether events that are unexpected in the sense exemplified above are significantly discrepant from expectancy. The idea is that for each event, an expected frequency is determined. Then, one asks whether the observed frequency differs from the expected more than just randomly. As was indicated in the examples, discrepancies come in two forms. First, events occur more often than expected. For example, there may be more sunny days in Michigan than expected from the weather patterns usually observed in the Great Lakes region. If such events occur significantly more often than expected, the pattern under study constitutes a CFA type. Other events occur less often than expected. For example, one can ask whether the number of alcohol-related deaths in traffic accidents is significantly below expectation. If this is the case, the pattern under study constitutes a CFA antitype.

According to Lehmacher (2000), questions similar to the ones answered using CFA were asked as early as 1922 by Pfaundler and von Sehr. The authors asked whether symptoms of medical diseases can be shown to co-occur above expectancy. Lange and Vogel (1965) suggested that the term syndrome be used only if individual symptoms co-occurred above expectancy. Lienert, who is credited with the development of the concepts and principles of CFA, proposed in 1968 (see Lienert, 1969) to test for each cell in a cross-classification whether it constitutes a type or an antitype.
The present text introduces readers to the method of Configural Frequency Analysis. It provides an almost complete overview of approaches, ideas, and techniques.

The first part of this text covers concepts and methods of CFA. This part introduces the goals of CFA, discusses the base models that are used to test event patterns against, describes and compares statistical tests, presents descriptive measures, and explains methods to protect the significance level α.

The second part introduces CFA base models in more detail. Models that assign the same status to all variables are distinguished from models that discriminate between variables that differ in status, for instance, predictors and criteria. Methods for the comparison of two or more groups are discussed in detail, including specific significance tests and descriptive measures.

The third part of this book focuses on CFA methods for longitudinal data. It is shown how differences between time-adjacent observations can be analyzed using CFA. It is also shown that the analysis of differences can require special probability models. This part of the book also illustrates the analysis of shifts in location, and the analysis of series of measures that are represented by polynomials, autocorrelations, or autodistances.

The fourth part of this book contains the CFA Specialty File. Methods are discussed that allow one to deal with such problems as structural zeros, and that allow one to include covariates in CFA. The graphical representation of CFA results is discussed, and the configural analysis of groups of cells is introduced. It is shown how CFA results can be simplified (aggregated). Finally, this part presents two powerful alternatives to standard CFA. The first of these alternatives, proposed by Kieser and Victor (1999), uses the more general log-linear models of quasi-independence as base models. Using these models, certain artifacts can be prevented. The second alternative, proposed by Wood, Sher, and von Eye (1994) and by Gutiérrez-Peña and von Eye (2000), is Bayesian CFA. This method (a) allows one to consider a priori existing information, (b) provides a natural way to analyze groups of cells, and (c) does not require one to adjust the significance level α.

Computational issues are discussed in the fifth part. This part shows how CFA can be performed using standard general purpose statistical software such as SYSTAT. In addition, this part shows how Bayesian CFA can be performed using S-plus. The features of a specialized CFA program are illustrated in detail.

There are several audiences for a book like this. First, students in the behavioral, social, biological, and medical sciences, or students in empirical sciences in general, may benefit from the possibility to pursue questions that arise from taking the cell-oriented (Lehmacher, 2000) or person-oriented perspectives (Bergman & Magnusson, 1997). CFA can be used either as the only method to answer questions concerning individual cells of cross-classifications, or it can be used in tandem with such methods as discriminant analysis, logistic regression, or log-linear modeling.

The level of statistical expertise needed to benefit most from this book is that of a junior or senior in the empirical behavioral and social sciences. At this level, students have completed introductory statistics courses and know such methods as X²-tests. In addition, they may have taken courses in categorical data analysis or log-linear modeling, both of which would make it easier to work with this book on CFA. To perform CFA, no more than a general purpose software package such as SAS, SPSS, S-plus, or SYSTAT is needed. However, specialized CFA programs as illustrated in Part 5 of this book are more flexible, and they are available free (for details see Chapter 12).

Acknowledgments. When I wrote this book, I benefitted greatly from a number of individuals' support, encouragement, and help. First of all, Donata, Maxine, Valerie, and Julian tolerate my lengthy escapades in my study, and provide me with the human environment that keeps me up when I happen to venture out of this room. My friends Eduardo Gutiérrez-Peña, Eun-Young Mun, Mike Rovine, and Christof Schuster read the entire first draft of the manuscript and provided me with a plethora of good-willing, detailed, and insightful comments. They found the mistakes that are not in this manuscript any more. I am responsible for the ones still in the text. The publishers at Lawrence Erlbaum, most notably Larry Erlbaum himself, Debra Riegert, and Jason Planer expressed their interest in this project and encouraged me from the first day of our collaboration. I am deeply grateful for all their support.

Gustav A. Lienert, who initiated CFA, read and commented on almost the entire manuscript in the last days of his life. I feel honored by this effort. This text reflects the changes he proposed. This book is dedicated to his memory.

Alexander von Eye
Okemos, April 2002
Configural Frequency Analysis: Methods, Models, and Applications
Part I: Concepts and Methods of CFA
1. Introduction: The Goals and Steps of Configural Frequency Analysis
This first chapter consists of three parts. First, it introduces readers to the basic concepts of Configural Frequency Analysis (CFA). It begins by describing the questions that can be answered with CFA. Second, it embeds CFA in the context of Person Orientation, that is, a particular research perspective that emerged in the 1990s. Third, it discusses the five steps involved in the application of CFA. The chapter concludes with a first complete data example of CFA.
1.1 Questions that can be answered with CFA
Configural Frequency Analysis (CFA; Lienert, 1968, 1971a) allows researchers to identify those patterns of categories that were observed more often or less often than expected based on chance. Consider, for example, the contingency table that can be created by crossing the three psychiatric symptoms Narrowed Consciousness (C), Thought Disturbance (T), and Affective Disturbance (A; Lienert, 1964, 1969, 1970; von Eye, 1990). In a sample of 65 students who participated in a study on the effects of LSD 50, each of these symptoms was scaled as 1 = present or 2 = absent. The cross-classification C x T x A, which has been used repeatedly in illustrations of CFA (see, e.g., Heilmann & Schütt, 1985; Lehmacher, 1981; Lindner, 1984; Ludwig, Gottlieb, & Lienert, 1986), appears in Table 1.
Table 1: Cross-classification of the three variables Narrowed Consciousness (C), Thought Disturbance (T), and Affective Disturbance (A); N = 65

Pattern CTA    Observed Frequency
111            20
112             1
121             4
122            12
211             3
212            10
221            15
222             0
In the context of CFA, the patterns denoted by the cell indices 111, 112, ..., 222 are termed configurations. If d variables are under study, each configuration consists of d elements. The configurations differ from each other in at least one and maximally in all d elements. For instance, the first configuration, 111, describes the 20 students who experienced all three disturbances. The second configuration, 112, differs from the first in the last digit. This configuration describes the sole student who experiences narrowed consciousness and thought disturbances, but no affective disturbance. The last configuration, 222, differs from the first in all d = 3 elements. It suggests that no student was found unaffected by LSD 50. A complete CFA of the data in Table 1 follows in Section 3.7.2.2.

The observed frequencies in Table 1 indicate that the eight configurations do not appear at equal rates. Rather, it seems that experiencing no effects is unlikely, experiencing all three effects is most likely, and experiencing only two effects is relatively unlikely. To make these descriptive statements, one needs no further statistical analysis. However, there may be questions beyond the purely descriptive.
Given a cross-classification of two or more variables, CFA can be used to answer questions of the following types:
(1) How do the observed frequencies compare with the expected frequencies? As interesting and important as it may be to interpret observed frequencies, one often wonders whether the extremely high or low numbers are still that extreme when we compare them with their expected counterparts. The same applies to the less extreme frequencies. Are they still about average when compared to what could have been expected? To answer these questions, one needs to estimate expected cell frequencies. The expected cell frequencies conform to the specifications made in so-called base models. These are models that reflect the assumptions concerning the relationships among the variables under study. Base models are discussed in Sections 2.1 - 2.3. It goes without saying that different base models can lead to different expected cell frequencies (Mellenbergh, 1996). As a consequence, the answer to this first question depends on the base model selected for frequency comparison, and the interpretation of discrepancies between observed and expected cell frequencies must always consider the characteristics of the base model specified for the estimation of the expected frequencies. The selection of base models is not arbitrary (see Chapter 2 for the definition of a valid CFA base model). The comparison of observed with expected cell frequencies allows one to identify those configurations that were observed as often as expected. It allows one also to identify those configurations that were observed more often than expected and those configurations that were observed less often than expected. Configurations that are observed at different frequencies than expected are of particular interest in CFA applications.
(2) Are the discrepancies between observed and expected cell frequencies statistically significant? It is rarely the case that observed and expected cell frequencies are identical. In most instances, there will be numerical differences. CFA allows one to answer the question whether a numerical difference is random or too large to be considered random. If an observed cell frequency is significantly larger than the expected cell frequency, the respective configuration is said to constitute a CFA type. If an observed frequency is significantly smaller than its expected counterpart, the configuration is said to constitute a CFA antitype. Configurations with observed frequencies that differ from their expectancies only randomly constitute neither a type nor an antitype. In most CFA applications, researchers will find both, that is, cells that constitute neither a type nor an antitype, and cells that deviate significantly from expectation.
(3) Do two or more groups of respondents differ in their frequency distributions? In the analysis of cross-classifications, this question typically is answered using some form of the X²-test, some log-linear model, or logistic regression. Variants of X²-tests can be employed in CFA too (for statistical tests employed in CFA, see Chapter 2). However, CFA focuses on individual configurations rather than on overall goodness-of-fit. CFA indicates the configurations in which groups differ. If the difference is statistically significant, the respective configuration is said to constitute a discrimination type.
(4) Do frequency distributions change over time, and what are the characteristics of such changes? There is a large number of CFA methods available for the investigation of change and patterns of change. For example, one can ask whether shifts from one category to some other category occur as often as expected from some chance model. This is of importance, for instance, in investigations of treatment effects, therapy outcome, or voter movements. Part III of this book covers methods of longitudinal CFA.
(5) Do groups differ in their change patterns? In developmental research, in research concerning changes in consumer behavior, in research on changes in voting preferences, or in research on the effects of medicinal or leisure drugs, it is one issue of concern whether groups differ in the changes that occur over time. What are the differences in the processes that lead some customers to purchase holiday presents on the web and others in the stores? CFA allows one to describe these groups, to describe the change processes, and to determine whether differences in change are greater than expected.
(6) Are there predictor-criterion relationships? In educational research, in studies on therapy effects, in investigations on the effects of drugs, and in many other contexts, researchers ask whether events or configurations of events allow one to predict other configurations of events. CFA allows one to identify those configurations for which one can predict that other configurations occur more often than expected, and those configurations for which one can predict that other configurations occur less often than expected based on chance.

This book presents methods of CFA that enable researchers to answer these and more questions.
1.2 CFA and the person perspective

(This section borrows heavily from von Eye, 2002b; see also von Eye, Indurkhya, & Kreppner, 2000.)

William Stern introduced in 1911 the distinction between variability and psychography. Variability is the focus when many individuals are observed in one characteristic with the goal to describe the distribution of this characteristic in the population. Psychographic methods aim at describing one individual in many characteristics. Stern also states that these two methods can be combined.

When describing an individual in a psychographic effort, results are often presented in the form of a profile. For example, test results of the MMPI personality test typically are presented in the form of individual profiles, and individuals are compared to reference profiles. For example, a profile may resemble the pattern typical of schizophrenics. A profile describes the position of an individual on standardized, continuous scales. Thus, one can also compare the individual's relative standing across several variables. Longitudinally, one can study an individual's relative standing and/or the correlation with some reference change. Individuals can be grouped based on profile similarity.

In contrast to profiles, configurations are not based on continuous but on categorical variables. As was explained in Section 1.1, the ensemble of categories that describes a cell of a cross-classification is called a configuration (Lienert, 1969). Configurational analysis using CFA investigates such configurations from several perspectives. First, CFA identifies configurations (see Table 1). This involves creating cross-classifications or, when variables are originally continuous, categorization and then creating cross-classifications. Second, CFA asks whether the number of times a configuration was observed could have been expected from some a priori specified model, the base model. Significant deviations will then be studied in more detail. Third, researchers often ask, in a step that goes beyond CFA, whether the cases described by different configurations also differ in their mean and covariance structures in variables not used for the cross-classification. This question concerns the external validity of configurational statements (Aksan et al., 1999; see Section 10.11). Other questions that can be answered using CFA have been listed above. In the following paragraphs, CFA will be embedded in Differential Psychology and the Person-Oriented Approach.

This section covers two roots of CFA, Differential Psychology and the Person-Oriented Approach. The fundamental tenet of Differential Psychology is that "individual differences are worthy of study in their own right" (Anastasi, 1994, p. ix). This is often seen in contrast to General Psychology, where it is the main goal to create statements that are valid for an entire population. General Psychology is chiefly interested in variables, their variability, and their covariation (see Stern, 1911). The data carriers themselves, for example, humans, play the role of replaceable random events. They are not of interest per se. In contrast, Differential Psychology considers the data carriers units of analysis. The smallest unit would be the individual at a given point in time. However, larger units are often considered, for example, all individuals that meet the criteria of geniuses, alcoholics, and basketball players.

Differential Psychology as both a scientific method and an applied concept presupposes that the data carriers' characteristics are measurable. In addition, it must be assumed that the scales used for measurement have the same meaning for every data carrier. Third, it must be assumed that the differences between individuals are measurable. In other words, it must be assumed that data carriers are indeed different when they differ in their location on some scale. When applying CFA, researchers make the same assumptions.

The Person-Oriented Approach (Bergman & Magnusson, 1991, 1997; Magnusson, 1998; Magnusson & Bergman, 2000; von Eye et al., 2000) is a relative of Differential Psychology. It is based on five propositions (Bergman & Magnusson, 1997; von Eye et al., 1999a):
(1) Functioning, process, and development (FPD) are, at least in part, specific to the individual.

(2) FPD are complex and necessitate including many factors and their interactions.

(3) There is lawfulness and structure in (a) individual growth and (b) interindividual differences in FPD.

(4) Processes are organized and function as patterns of the involved factors. The meaning of the involved factors is given by the factors' interactions with other factors.

(5) Some patterns will be observed more frequently than other patterns, or more frequently than expected based on prior knowledge or assumptions. These patterns can be called common types. Examples of common types include the types identified by CFA. Accordingly, there will be patterns that are observed less frequently than expected from some chance model. CFA terms these the antitypical patterns or antitypes.
Two consequences of these five propositions are of importance for the discussion and application of CFA. The first is that, in order to describe human functioning and development, differential statements can be fruitful in addition to statements that generalize to variable populations, person populations, or both. Subgroups, characterized by group-specific patterns, can be described more precisely. This is the reason why methods of CFA (and cluster analysis) are positioned so prominently in person-oriented research. Each of these methods of analysis focuses on groups of individuals that share in common a particular pattern and differ in at least one, but possibly in all, characteristics (see Table 1, above).

The second consequence is that functioning needs to be described on an individual-specific basis. If it is a goal to compare individuals based on their characteristics of FPD, one needs a valid description of each individual. Consider, for example, Proposition 5, above. It states that some patterns will occur more frequently and others less frequently than expected based on chance or prior knowledge. An empirical basis for such a proposition can be provided only if intra-individual functioning and development is known.

Thus, the person-oriented approach and CFA meet where (a) patterns of scores or categories are investigated, and (b) where the tenet of differential psychology is employed according to which it is worth the effort to investigate individuals and groups of individuals. The methodology employed for studies within the framework of the person-oriented approach is typically that of CFA. The five steps involved in this methodology are presented in the next section.
1.3 The five steps of CFA
This section introduces readers to the five steps that a typical CFA application involves. This introduction is brief and provides no more than an overview. The remainder of this book provides the details for each of these steps. These steps are:

(1) Selection of a CFA base model and estimation of expected cell frequencies; the base model (i) reflects theoretical assumptions concerning the nature of the variables as either of equal status or grouped into predictors and criteria, and (ii) considers the sampling scheme under which the data were collected;

(2) Selection of a concept of deviation from independence;

(3) Selection of a significance test;

(4) Performance of significance tests and identification of configurations that constitute types or antitypes;

(5) Interpretation of types and antitypes.
The following paragraphs give an overview of these five steps. The following sections provide details, illustrations, and examples. Readers already conversant with CFA will notice the many new facets that have been developed to increase the number of models and options of CFA. Readers new to CFA will realize the multifaceted nature of the method.

(1) Selection of a CFA base model and estimation of expected cell frequencies. Expected cell frequencies for most CFA models² can be estimated using the log-frequency model

log E = Xλ,

where E is the array of model frequencies, that is, frequencies that conform to the model specifications. X is the design matrix, also called indicator matrix. Its vectors reflect the CFA base model or, in other contexts, the log-frequency model under study. λ is the vector of model parameters. These parameters are not of interest per se in frequentist CFA. Rather, CFA focuses on the discrepancies between the expected and the observed cell frequencies.

²Exceptions are presented, for instance, in the section on CFA for repeated observations (see Section 8.2.3; cf. von Eye & Niedermeier, 1999).
In contrast to log-linear modeling, CFA is not applied with the goal of identifying a model that describes the data sufficiently and parsimoniously (for a brief introduction to log-linear modeling, see Appendix A). Rather, a CFA base model takes into account all effects that are NOT of interest to the researchers, and it is assumed that the base model fails to describe the data well. If types and antitypes emerge, they indicate where the most prominent discrepancies between the base model and the data are.

Consider the following example of specifying a base model. In Prediction CFA, the effects that are NOT of interest concern the relationships among the predictors and the relationships among the criteria. Thus, the indicator matrix X for the Prediction CFA base model includes all relationships among the predictors and all relationships among the criteria. In other words, the typical base model for Prediction CFA is saturated in the predictors and the criteria. However, the base model must not include any effect that links predictors to criteria. If types and antitypes emerge, they reflect relationships between predictors and criteria, but not among the predictors or among the criteria. These predictor-criterion relationships manifest in configurations that were observed more often than expected from the base model or in configurations that were observed less often than expected from the base model. A type suggests that a particular predictor configuration allows one to predict the occurrence of a particular criterion configuration. An antitype allows one to predict that a particular predictor configuration is not followed by a particular criterion configuration.

In addition to considering the nature of variables as either all belonging to one group, or as predictors and criteria as in the example of Prediction CFA, the sampling scheme must be considered when specifying the base model. Typically, the sampling scheme is multinomial. Under this scheme, respondents (or responses; in general, the units of analysis) are randomly assigned to the cells of the entire cross-tabulation. When the sampling scheme is multinomial, any CFA base model is admissible. Please notice that this statement does not imply that any log-frequency model is admissible as a CFA base model (see Section 2.2). However, the multinomial sampling scheme itself does not place any particular constraints on the selection of a base model. An example of a cross-classification that can be formed for configurational analysis involves the variables Preference for type of car (P; 1 = minivan; 2 = sedan; 3 = sports utility vehicle; 4 = convertible; 5 = other) and Number of miles driven per year (M; 1 = 0 - 10,000; 2 = 10,001 - 15,000; 3 = 15,001 - 20,000; 4 = more). Suppose a sample of 200
The Five Steps of CFA respondents indicated their car preference and the number of miles they typically drive in a year. Then, each respondent can be randomly assigned to the 20 cells of the entire 5 x 4 cross-classification of P and M, and there is no constraint concerning the specification of base models. In other instances, the sampling scheme may be productmultinomial. Under this scheme, the units of analysis can be assigned only to a selection of cells in a cross-classification. For instance, suppose the above sample of 200 respondents includes 120 women and 80 men, and the gender comparison is part of the aims of the study. Then, the number of cells in the cross-tabulation increases from 5 x 4 to 2 x 5 x 4, and the sampling scheme becomes product-multinomial in the gender variable. Each respondent can be assigned only to that part of the table that is reserved for his or her gender group. From a CFA perspective, the most important consequence of selecting the product-multinomial sampling scheme is that the marginals of variables that are sampled productmultinomially must always be reproduced. Thus, base models that do not reproduce these marginals are excluded by definition. This applies accordingly to multivariate product-multinomial sampling, that is, sampling schemes with more than one fixed marginal. In the present example, including the gender variable precludes zero-order CFA from consideration. Zero-order CFA, also called Configural Chster Analysis, usesthe no effect model for a base model, that is, the log-linear model log E = lh, where 1 is a vector of ones and h is the intercept parameter. This model may not reproduce the sizes of the female and male samples and is therefore not admissible.
(2) Selection of a concept of deviation from independence. In all CFA base models, types and antitypes emerge when the discrepancy between an observed and an expected cell frequency is statistically significant. However, the measures that are available to describe the discrepancies use different definitions of discrepancy, and differ in the assumptions that must be made for proper application. The X²-based measures and their normal approximations assess the magnitude of the discrepancy relative to the expected frequency. This group of measures differs mostly in statistical power, and can be employed regardless of sampling scheme. The hypergeometric test and its normal approximations, and the binomial test also assess the magnitude of the discrepancy, but they presuppose product-multinomial sampling. The relative risk, RRi, is defined as the ratio Ni/Ei, where i indexes the configurations. This measure indicates the frequency with which an event was observed, relative to the frequency with which it was expected. RRi is a descriptive measure (see Section 4.1; DuMouchel, 1999). There exists an equivalent measure, Ii, that results from a logarithmic transformation, that is, Ii = log(RRi) (cf. Church & Hanks, 1991). This measure was termed mutual information. RRi and Ii do not require any specific sampling scheme. The measure log P (for a formal definition see DuMouchel, 1999, or Section 4.2) has been used descriptively and also to test CFA null hypotheses. If used for statistical inference, the measure is similar to the binomial and other tests used in CFA, although the rank order of the assessed extremity of the discrepancy between the observed and the expected cell frequencies can differ dramatically (see Section 4.2; DuMouchel, 1999; von Eye & Gutiérrez-Peña, in preparation). In the present context of CFA, we use log P as a descriptive measure.

In two-sample CFA, two groups of respondents are compared. The comparison uses information from two sources. The first source consists of the frequencies with which Configuration i was observed in both samples. The second source consists of the sizes of the comparison samples. The statistics can be classified based on whether they are marginal-dependent or marginal-free. Marginal-dependent measures indicate the magnitude of an association that also takes the marginal distribution of responses into account. Marginal-free measures only consider the association. It is very likely that marginal-dependent tests suggest a different appraisal of data than marginal-free tests (von Eye, Spiel, & Rovine, 1995).

(3) Selection of a significance test. Four criteria are put forth that can guide researchers in the selection of measures for one-sample CFA: exact versus approximative test, statistical power, sampling scheme, and use for descriptive versus inferential purposes. In addition, the tests employed in CFA differ in their sensitivity to types and antitypes. More specifically, when sample sizes are small, most tests identify more types than antitypes. In contrast, when sample sizes are large, most tests are more sensitive to antitypes than types. One consistent exception is Anscombe's (1953) z-approximation, which always tends to find more antitypes than types, even when sample sizes are small. Section 3.8 provides more detail and comparisons of these and other tests, and presents arguments for the selection of significance tests for CFA.

(4) Performing significance tests and identifying configurations as types or antitypes. This fourth step of performing a CFA is routine to the extent that significance tests come with tail probabilities that allow one to determine immediately whether a configuration constitutes a type, an antitype, or supports the null hypothesis. It is important, however, to keep in mind that exploratory CFA involves employing significance tests for each cell in a cross-classification. This procedure can lead to wrong statistical decisions, first because of capitalizing on chance. Each test comes with the nominal error margin α. Therefore, α% of the decisions can be expected to be incorrect. In large tables, this percentage can amount to large numbers of possibly wrong conclusions about the existence of types and antitypes. Second, the cell-wise tests can be dependent upon each other. Consider, for example, the case of two-sample CFA. If one of the two groups displays more cases than expected, the other, by necessity, will display fewer cases than expected. The results of the two tests are completely dependent upon each other. The result of the second test is determined by the result of the first, because the null hypothesis of the second test stands no chance of surviving if the null hypothesis of the first test was rejected. Therefore, after performing the cell-wise significance tests, and before labeling configurations as type/antitype constituting, measures must be taken to protect the test-wise α. A selection of such measures is presented in Section 3.10.
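As a small illustration of these quantities, here is a Python sketch of our own (the values anticipate the data example of Section 1.4; the choice of the natural logarithm for Ii is our assumption, since the text does not fix the base). It computes RRi, Ii, and the Bonferroni-adjusted significance level discussed in Section 3.10:

```python
import math

def relative_risk(n_obs, n_exp):
    """Relative risk ratio RR_i = N_i / E_i for configuration i."""
    return n_obs / n_exp

def mutual_information(n_obs, n_exp):
    """I_i = log(RR_i); the natural logarithm is assumed here."""
    return math.log(relative_risk(n_obs, n_exp))

# Bonferroni protection of the test-wise alpha (see Section 3.10):
# with t cell-wise tests, each test is performed at alpha* = alpha / t.
alpha, t = 0.05, 8
alpha_star = alpha / t                         # 0.00625 for a 2x2x2 table

print(round(relative_risk(20, 9.06), 2))       # 2.21: observed ~2.2x expected
print(round(mutual_information(20, 9.06), 2))  # 0.79
print(alpha_star)                              # 0.00625
```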
(5) Interpretation of types and antitypes. The interpretation of types and antitypes is fueled by five kinds of information. The first is the meaning of the configuration itself (see Table 1, above). The meaning of a configuration can often be seen in tandem with its nature as a type or antitype. For instance, it may not be a surprise that there exist no toothbrushes with brushes made of steel. Therefore, in the space of dental care equipment, steel-brushed brushes may meaningfully define an antitype. Inversely, one may entertain the hypothesis that couples that stay together for a long time are happy. Thus, in the space of couples, happy, long-lasting relationships may form a type.

The second source of information is the CFA base model. The base model determines the nature of types and antitypes. Consider, for example, classical CFA, which has a base model that proposes independence among all variables. Only main effects are taken into account. If this model yields types or antitypes, they can be interpreted as local associations (Havránek & Lienert, 1984) among variables. Another example is Prediction CFA (P-CFA). As was explained above, P-CFA has a base model that is saturated both in the predictors and the criteria. The relationships among predictors and criteria are not taken into account, thus constituting the only possible reason for the emergence of types and antitypes. If P-CFA yields types or antitypes, they are reflective of predictive relationships among predictors and criteria, not just of any association.

The third kind of information is the sampling scheme. In multinomial sampling, types and antitypes describe the entire population from which the sample was drawn. In product-multinomial sampling, types and antitypes describe the particular population in which they were found. Consider again the above example where men and women are compared in the types of car they prefer and the number of miles they drive annually. Suppose a type emerges for men who prefer sport utility vehicles and drive them more than 20,000 miles a year. This type only describes the male population, not the female population, nor the human population in general.

The fourth kind of information is the nature of the statistical measure that was employed for the search for types and antitypes. As was indicated above and will be illustrated in detail in Sections 3.8 and 7.2, different measures can yield different harvests of types and antitypes. Therefore, interpretation must consider the nature of the measure, and results from different studies can be compared only if the same measures were employed.

The fifth kind of information is external in the sense of external validity. Often, researchers are interested in whether types and antitypes also differ in other variables than the ones used in CFA. Methods of discriminant analysis, logistic regression, MANOVA, or CFA can be used to compare configurations in other variables. Two examples shall be cited here. First, Gürtelmeyer (1988) identified six types of sleep problems using CFA. Then, he used analysis of variance methods to compare these six types in the space of psychological personality variables. The second example is a study in which researchers first used CFA to identify temperamental types among preschoolers (Aksan et al., 1999). In a subsequent step, the authors used correlational methods to discriminate their types and antitypes in the space of parental evaluation variables. An example of CFA with subsequent discriminant analysis appears in Section 10.9.2.
1.4 A first complete CFA data example

In this section, we present a first complete data analysis using CFA. We introduce methods "on the fly" and explain details in later sections. The first example is meant to provide the reader with a glimpse of the statements that can be created using CFA. The data example is taken from von Eye and Niedermeier (1999).

In a study on the development of elementary school children, 86 students participated in a program for elementary mathematics skills. Each student took three consecutive courses. At the end of each course the students took a comprehensive test, on the basis of which they obtained a 1 for reaching the learning criterion and a 2 for missing the criterion. Thus, for each student, information on three variables was created: Test 1 (T1), Test 2 (T2), and Test 3 (T3). Crossed, these three dichotomous variables span the 2 x 2 x 2 table that appears in Table 2, below.

We now analyze these data using exploratory CFA. The question that we ask is whether any of the eight configurations that describe the development of the students' performance in mathematics occurred more often or less often than expected based on the CFA base model of independence of the three tests. To illustrate the procedure, we explicitly take each of the five steps listed above.

Step 1: Selection of a CFA base model and estimation of expected cell frequencies. In the present example we opt for a log-linear main effect model as the CFA base model (for a brief introduction to log-linear modeling, see Appendix A). This can be explained as follows.
(1) The main effect model takes the main effects of all variables into account. As a consequence, emerging types and antitypes will not reflect the varying numbers of students who reach the criterion. (Readers are invited to confirm from the data in Table 2 that the number of students who pass increases from Test 1 to Test 2, and then again from Test 2 to Test 3.) Rather, types and antitypes will reflect the development of students (see Point 2).

(2) The main effect model proposes that the variables T1, T2, and T3 are independent of each other. As a consequence, types and antitypes can emerge only if there are local associations between the variables. These associations indicate that the performance measures for the three tests are related to each other, which manifests in configurations that occurred more often (types) or less often (antitypes) than could be expected from the assumption of independence of the three tests.

It is important to note that many statistical methods require strong assumptions about the nature of longitudinal variables (remember, e.g., the discussion of compound symmetry in analysis of variance; see Neter, Kutner, Nachtsheim, & Wasserman, 1996). The assumption of independence of repeatedly observed variables made in the second proposition of the present CFA base model seems to contradict these assumptions. However, when applying CFA, researchers do not simply assume that repeatedly observed variables are autocorrelated. Rather, they propose in the base model that the variables are independent. Types and antitypes will then provide detailed information about the nature of the autocorrelation, if it exists.

It is also important to realize that other base models may make sense too. For instance, one could ask whether the information provided by the first test allows one to predict the outcomes in the second and third tests. Alternatively, one could ask whether the results in the first two tests allow one to predict the results of the third test. Another model that can be discussed is that of randomness of change. One can estimate the expected cell frequencies under the assumption of random change and employ CFA to identify those instances where change is not random.

The expected cell frequencies can be estimated by hand calculation, or by using any of the log-linear modeling programs available in the general purpose statistical software packages such as SAS, SPSS, or SYSTAT. Alternatively, one can use a specialized CFA program (von Eye, 2001). Table 2 displays the estimated expected cell frequencies for the main effect base model. These frequencies were calculated using von Eye's CFA program (see Section 12.3.1). In many instances, in particular when simple base models are employed, the expected cell frequencies can be hand-calculated. This is shown for the example below Table 2.
Step 2: Selection of a concept of deviation. Thus far, the characteristics of the statistical tests available for CFA have only been mentioned. The tests will be explained in more detail in Sections 3.2 - 3.6, and criteria for selecting tests will be introduced in Sections 3.7 - 3.9. Therefore, we use here a concept that is widely known. It is the concept of the difference between the observed and the expected cell frequency, relative to the standard error of this difference. This concept is known from Pearson's X²-test (see Step 4).
Step 3: Selection of a significance test. From the many tests that can be used and will be discussed in Sections 3.2 - 3.9, we select the Pearson X² for the present example, because we suppose that this test is well known to most readers. The X² component that is calculated for each configuration is

X²ᵢ = (Nᵢ - Eᵢ)² / Eᵢ,

where i indexes the configurations. Summed, the X² components yield the Pearson X² test statistic. In the present case, we focus on the X² components, which serve as test statistics for the cell-specific CFA tests. Each of the X² statistics can be compared to the X²-distribution with 1 degree of freedom.
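In code, this component and its tail probability take only a few lines. The following sketch is our own (plain Python; for 1 degree of freedom, the X² tail probability equals the complementary error function erfc(sqrt(x/2)), so no statistics library is needed). The numbers anticipate Configuration 111 of Table 2 below:

```python
import math

def chi2_component(n_obs, n_exp):
    """Pearson X^2 component for one configuration: (N_i - E_i)^2 / E_i."""
    return (n_obs - n_exp) ** 2 / n_exp

def tail_probability(x2):
    """Tail probability of X^2 with 1 df: P(X^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x2 / 2.0))

x2 = chi2_component(20, 9.06)
print(round(x2, 2))                    # 13.21
print(round(tail_probability(x2), 7))  # ~0.0002796, the value reported below
```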
Step 4: Performing significance tests and identifying types and antitypes. The results from employing the X²-component test and the tail probabilities for each test appear in Table 2. To protect the nominal significance threshold α against possible test-wise errors, we invoke the Bonferroni method. This method adjusts the nominal α by taking into consideration the total number of tests performed. In the present example, we have eight tests, that is, one test for each of the eight configurations. Setting α to the usual 0.05, we obtain an adjusted α* = α/8 = 0.00625. The tail probability of a CFA test is now required to be less than α* for a configuration to constitute a type or an antitype.

Table 2 is structured in a format that we will use throughout this book. The left-most column contains the cell indices, that is, the labels for the configurations. The second column displays the observed cell frequencies. The third column contains the expected cell frequencies. The fourth column presents the values of the test statistic, the fifth column displays the tail probabilities, and the last column shows the characterization of a configuration as a type, T, or an antitype, A.

The unidimensional marginal frequencies are T1₁ = 31, T1₂ = 55, T2₁ = 46, T2₂ = 40, T3₁ = 47, T3₂ = 39. We now illustrate how the expected cell frequencies in this example can be hand-calculated. For three variables, the equation is

Eᵢⱼₖ = (Nᵢ.. N.ⱼ. N..ₖ) / N²,

where N indicates the sample size, Nᵢ.. are the marginal frequencies of the first variable, N.ⱼ. are the marginal frequencies of the second variable, N..ₖ are the marginal frequencies of the third variable, and i, j, and k are the indices for the cell categories. In the present example, i, j, k = {1, 2}.
Table 2: CFA of results in three consecutive mathematics courses

Cell indices    Frequencies              Significance tests
T1 T2 T3        observed   expected      X²       p          Type/Antitype?
111             20          9.06         13.20    0.0003     T
112              4          7.52          1.65    0.1993
121              2          7.88          4.39    0.0362
122              5          6.54          0.36    0.5474
211             19         16.08          0.53    0.4661
212              3         13.34          8.02    0.0046     A
221              6         13.98          4.56    0.0328
222             27         11.60         20.44    < α*ᵃ      T

ᵃ < α* indicates that this tail probability is smaller than can be expressed with 4 decimal places.
6 111
=
3 1-46047 862 = g *062 -
This is the first value in Column 3 of Table 2. The values for the remaining expected cell frequencies are calculated accordingly. The value of the test statistic for the first configuration is calculated as
x2111
=
(20 - gJm2 9.062
= 13. 202 .
This is the first value in Column 4 of Table 2. The tail probability for this value is p = 0.0002796 (Column 5). This probability is smaller than the critical adjusted a* which is 0.00625. We thus reject the null hypothesis according to which the deviation of the observed cell frequency from the frequency that was estimated based on the main effect model of variable independence is random.
Step 5: Interpretation of types and antitypes. We conclude that there exists a local association which manifests in a type of success in mathematics. Configuration 111 describes those students who pass the final examination in each of the three mathematics courses. Twenty students were found to display this pattern, but only about 9 were expected based on the model of independence. Configuration 212 constitutes an antitype. This configuration describes those students who fail the first and the third course but pass the second. Over 13 students were expected to show this profile, but only 3 did show it. Configuration 222 constitutes a second type. These are the students who consistently fail the mathematics classes. 27 students failed all three finals, but fewer than 12 were expected to do so.

Together, the two types suggest that students' success is very stable, and so is lack of success. The antitype suggests that at least one pattern of instability was significantly less frequently observed than expected based on chance alone. As was indicated above, one method of establishing the external validity of these types and the antitype could involve a MANOVA or discriminant analysis. We will illustrate this step in Section 10.11.2 (see also Aksan et al., 1999).

As was also indicated above, CFA results are typically non-exhaustive. That is, only a selection of the eight configurations in this example stand out as types and antitypes. Thus, because CFA results are non-exhaustive, one can call the variable relationships that result in types and antitypes local associations. Only a non-exhaustive number of sectors in the data space reflects a relationship. The remaining sectors show data that conform with the base model of no association.

It should also be noticed that Table 2 contains two configurations for which the values of the test statistic had tail probabilities less than the nominal, non-adjusted α = 0.05. These are Configurations 121 and 221. For both configurations we found fewer cases than expected from the base model. However, because we opted to protect our statistical decisions against the possibly inflated α-error, we are not in a situation in which we can interpret these two configurations as antitypes. In Section 10.3, we present CFA methods that allow one to answer the question whether the group of configurations that describe varying performance constitutes a composite antitype.

The next chapter introduces log-linear models for CFA that can be used to estimate expected cell frequencies. In addition, the chapter defines CFA base models. Other CFA base models that are not log-linear will be introduced in the chapter on longitudinal CFA (Section 8.2.3).
2. Log-linear Base Models for CFA
The main effect and interaction structure of the variables that span a crossclassification can be described in terms of log-linear models (a brief introduction into the method of log-linear modeling is provided in Appendix A). The general log-linear model is
$$\log E = X\lambda,$$

where E is an array of model frequencies, X is the design matrix, also called indicator matrix, and λ is a parameter vector (Christensen, 1997; Evers & Namboodiri, 1978; von Eye, Kreppner, & Weßels, 1994). The design matrix contains column vectors that express the main effects and interactions specified for a model. There exist several ways to express the main effects and interactions. Most popular are dummy coding and effect coding. Dummy coding uses only the values of 0 and 1. Effect coding typically uses the values of −1, 0, and 1. However, for purposes of weighting, other values are occasionally used as well. Dummy coding and effect coding are equivalent. In this book, we use effect coding because a design matrix specified in effect coding terms is easier for many researchers to interpret than a matrix specified using dummy coding. The parameters are related to the design matrix by

$$\lambda = (X'X)^{-1}X'\mu,$$

where μ = log E, and the ′ sign indicates a transposed matrix. In CFA applications, the parameters of a base model are typically not of interest because it is assumed that the base model does not describe the data well.
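As an illustration of this relation (not from the book), the parameter vector can be recovered numerically when X has full column rank. The following is a minimal sketch, assuming NumPy; the design matrix is the effect-coded main effect matrix for three dichotomous variables presented in Section 2.1, and the expected frequencies are hypothetical.

```python
# A minimal sketch, assuming NumPy; E is a hypothetical vector of expected
# cell frequencies for the cells 111, 112, ..., 222.
import numpy as np

# Effect-coded design matrix: constant column plus one column per variable.
X = np.array([
    [1,  1,  1,  1],
    [1,  1,  1, -1],
    [1,  1, -1,  1],
    [1,  1, -1, -1],
    [1, -1,  1,  1],
    [1, -1,  1, -1],
    [1, -1, -1,  1],
    [1, -1, -1, -1],
], dtype=float)

E = np.array([9.0, 11.0, 14.0, 16.0, 9.0, 11.0, 14.0, 16.0])  # hypothetical
mu = np.log(E)

# lambda = (X'X)^{-1} X' mu; lstsq computes the same projection.
lam = np.linalg.lstsq(X, mu, rcond=None)[0]
print(lam)  # the first entry is the intercept lambda_0
```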
Types and antitypes describe deviations from the base model. If the base model fits, there can be no types or antitypes. Accordingly, the goodness-of-fit statistics of the base model are typically not interpreted in CFA. In general, log-linear modeling provides researchers with the following three options (Goodman, 1984; von Eye et al., 1994):
(1) Analysis of the joint frequency distribution of the variables that span a cross-classification. The results of this kind of analysis can be expressed in terms of a distribution jointly displayed by the variables. For example, two variables can be symmetrically distributed such that the transpose of their cross-classification, say A′, equals the original matrix, A.
(2) Analysis of the association pattern of response variables. The results of this kind of analysis are typically expressed in terms of first and higher order interactions between the variables that were crossed. For instance, two variables can be associated with each other. This can be expressed as a significant deviation from independence using the classical Pearson χ²-test. Typically, and in particular when the association (interaction) between these two variables is studied in the context of other variables, researchers interpret an association based on the parameters that are significantly different from zero.
(3) Assessment of the possible dependence of a response variable on explanatory or predictor variables. The results of this kind of analysis can be expressed in terms of conditional probabilities of the states of the dependent variable, given the levels of the predictors. In a most elementary case, one can assume that the states of the dependent variable are conditionally equiprobable, given the predictor states.
Considering these three options and the status of CFA as a prime method in the domain of person-oriented research (see Section 1.2), one can make the different goals of log-linear modeling and CFA explicit. As indicated in the formulation of the three above options, log-linear modeling focuses on variables. Results are expressed in terms of parameters that represent the relationships among variables, or in terms of distributional parameters. Log-linear parameters can be interpreted only if a model fits.
In contrast, CFA focuses on the discrepancies between some base model and the data. These discrepancies appear in the form of types and antitypes. If types and antitypes emerge, the base model is contradicted and does not describe the data well. Because types and antitypes are interpreted at the level of configurations rather than variables, they indicate local associations (Havránek & Lienert, 1984) rather than standard, global associations among variables. It should be noticed, however, that local associations often result in the description of a variable association as existing.

Although the goals of log-linear modeling and CFA are fundamentally different, the two methodologies share two important characteristics. First, both methodologies allow the user to consider all variables under study as response variables (see Option 2, above). Thus, unlike in regression analysis or analysis of variance, there is no need to always think in terms of predictive or dependency structures. However, it is also possible to distinguish between independent and dependent variables or between predictors and criteria, as will be demonstrated in Section 6.2 on Prediction CFA (cf. Option 3, above). Second, because most CFA base models can be specified in terms of log-linear models, the two methodologies use the same algorithms for estimating expected cell frequencies. For instance, the CFA program that is introduced in Section 12.3 uses the same Newton-Raphson methods to estimate expected cell frequencies as some log-linear modeling programs. It should be emphasized again, however, that (1) not all CFA base models are log-linear models, and (2) not all log-linear models qualify as CFA base models. The chapters on repeated observations (Part III of this book) and on Bayesian CFA (Section 11.12) will give examples of such base models.

Section 2.1 presents sample CFA base models and their assumptions. These assumptions are important because the interpretation of types and antitypes rests on them. For each of the sample base models, a design matrix will be presented. Section 2.2 discusses admissibility of log-linear models as CFA base models. Section 2.3 discusses the role played by sampling schemes, Section 2.4 presents a grouping of CFA base models, and Section 2.5 summarizes the decisions that must be made when selecting a CFA base model.
2.1 Sample CFA base models and their design matrices
For the following examples we use models of the form log E = Xλ, where E is the array of expected cell frequencies, X is the design matrix, and λ is the parameter vector. In the present section, we focus on the design matrix X, because the base model is specified in X. The following paragraphs present three sample CFA base models: classical CFA of three dichotomous variables; Prediction CFA with two dichotomous predictors and two dichotomous criterion variables; and classical CFA of two variables with more than two categories. More examples follow throughout this text.

The base model of classical CFA for a cross-classification of three variables. Consider a cross-classification that is spanned by three dichotomous variables and thus has 2 × 2 × 2 = 8 cells. Table 2 is an example of such a table. In "classical" CFA (Lienert, 1969), the base model is the log-linear main effect model of variable independence. When estimating expected cell frequencies, this model takes into account
(1) the main effects of all variables that are crossed. When main effects are taken into account, types and antitypes cannot emerge just because the probabilities of the categories of the variables in the cross-classification differ;

(2) none of the first or higher order interactions. If types and antitypes emerge, they indicate that (local) interactions exist, because these were not part of the base model.
Consider the data example in Table 2. The emergence of two types and one antitype suggests that the three test results are associated such that consistent passing or failing occurs more often than expected under the independence model, and that one pattern of inconsistent performance occurs less often than expected. Based on the two assumptions of the main effect model, the design matrix contains two kinds of vectors. The first is the vector for the intercept, that is, the constant vector. The second kind includes the vectors for the main effects of all variables. Thus, the design matrix for this 2 × 2 × 2 table is
$$X = \begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & 1 & -1 \\
1 & 1 & -1 & 1 \\
1 & 1 & -1 & -1 \\
1 & -1 & 1 & 1 \\
1 & -1 & 1 & -1 \\
1 & -1 & -1 & 1 \\
1 & -1 & -1 & -1
\end{bmatrix}.$$
The first column in matrix X is the constant vector. This vector is part of all log-linear models considered for CFA. It plays a role comparable to the constant vector in analysis of variance and regression, which yields the estimate of the intercept. Accordingly, the first parameter in the vector λ, that is, λ₀, can be called the intercept of the log-linear model (for more detail see, e.g., Agresti, 1990; Christensen, 1997). The second vector in X contrasts the first category of the first variable with the second category. The third vector in X contrasts the first category of the second variable with the second category. The last vector in X contrasts the two categories of the third variable. The order of variables and the order of categories have no effect on the magnitude of the estimated parameters or expected cell frequencies.
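For readers who prefer to generate such matrices rather than type them, the following sketch (a plain-Python illustration, not the book's software) builds the effect-coded main effect design matrix for any number of dichotomous variables.

```python
# A minimal sketch, assuming NumPy: category 1 is coded +1, category 2 is
# coded -1, and rows follow the usual cell order (last variable fastest).
import itertools
import numpy as np

def main_effect_design(n_variables):
    """Constant column plus one effect-coded column per dichotomous variable."""
    rows = [(1,) + cells
            for cells in itertools.product((1, -1), repeat=n_variables)]
    return np.array(rows, dtype=float)

print(main_effect_design(3))  # the 8 x 4 matrix displayed above
```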
The base model for Prediction CFA with two predictors and two criteria. This section presents a base model that goes beyond the standard main effect model. Specifically, we show the design matrix for a model with two predictors and two criteria. All four variables in this example are dichotomous. The base model takes into account the following effects:

(1) Main effects of all variables. The main effects are taken into account to prevent types and antitypes from emerging that would be caused by discrepancies from a uniform distribution rather than predictor-criterion relationships.
(2) The interaction between the two predictors. If types and antitypes are of interest that reflect local relationships between predictors and criterion variables, types and antitypes that are caused by relationships among the predictors must be prevented. This can be done by making the interaction between the two predictors part of the base model. This applies accordingly when an analysis contains more than two predictors.

(3) The interaction between the two criterion variables. The same rationale applies as for the interaction between the two predictors.
If types and antitypes emerge for this base model, they can only be caused by predictor-criterion relationships, but not by any main effect, interaction among predictors, or interaction among criteria. The reason for this conclusion is that none of the possible interactions between predictors and criteria are considered in the base model, and these interactions are the only terms not considered. Based on the effects proposed in this base model, the design matrix contains three kinds of vectors. The first is the vector for the intercept, that is, the constant vector. The second kind includes the vectors for the main effects of all variables. The third kind includes the vector for the interaction between the two predictors and the vector for the interaction between the two criterion variables. Thus, the design matrix for this 2 × 2 × 2 × 2 table is
$$X = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & -1 & 1 & -1 \\
1 & 1 & 1 & -1 & 1 & 1 & -1 \\
1 & 1 & 1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & 1 & 1 & -1 & 1 \\
1 & 1 & -1 & 1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & 1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 \\
1 & -1 & 1 & 1 & 1 & -1 & 1 \\
1 & -1 & 1 & 1 & -1 & -1 & -1 \\
1 & -1 & 1 & -1 & 1 & -1 & -1 \\
1 & -1 & 1 & -1 & -1 & -1 & 1 \\
1 & -1 & -1 & 1 & 1 & 1 & 1 \\
1 & -1 & -1 & 1 & -1 & 1 & -1 \\
1 & -1 & -1 & -1 & 1 & 1 & -1 \\
1 & -1 & -1 & -1 & -1 & 1 & 1
\end{bmatrix}.$$
This design matrix displays the constant vector in its first column. The vectors for the four main effects follow. The last two column vectors represent the interactions between the two predictors and between the two criteria. The first interaction vector results from element-wise multiplication of the second with the third column in X. The second interaction vector results from element-wise multiplication of the fourth with the fifth column vector in X.
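The element-wise multiplications described here are easy to reproduce in code. The sketch below is illustrative (it reuses the hypothetical main_effect_design helper from the previous sketch) and appends the predictor-predictor and criterion-criterion interaction columns.

```python
# A minimal sketch, assuming NumPy and the main_effect_design helper above.
import numpy as np

X = main_effect_design(4)              # columns: constant, A, B, C, D
ab = X[:, 1] * X[:, 2]                 # interaction of the two predictors
cd = X[:, 3] * X[:, 4]                 # interaction of the two criteria
X_pcfa = np.column_stack([X, ab, cd])  # the 16 x 7 design matrix shown above
print(X_pcfa)
```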
The base model for a CFA of two variables with more than two categories. In this third example, we create the design matrix for the base model of a CFA of two variables. The model will only take main effects into account, so that types and antitypes can emerge only from (local) associations between these two variables. The goal pursued with this example is to illustrate CFA for a variable A which has three categories and a variable B which has four categories. The design matrix for the log-linear main effect model for this cross-classification is
$$X = \begin{bmatrix}
1 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & -1 & -1 & -1 \\
1 & 0 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & -1 & -1 & -1 \\
1 & -1 & -1 & 1 & 0 & 0 \\
1 & -1 & -1 & 0 & 1 & 0 \\
1 & -1 & -1 & 0 & 0 & 1 \\
1 & -1 & -1 & -1 & -1 & -1
\end{bmatrix}.$$
The first vector in this design matrix is the constant column, for the intercept. The second and third vectors represent the main effects of variable A. The first of these vectors contrasts the first category of variable A with the third category. The second of these vectors contrasts the second category of variable A with the third category. The last three column vectors of X represent the main effects of variable B. The three vectors contrast the first, second, and third categories of variable B with the fourth category.
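Effect coding for polytomous variables can be generated in the same spirit as before. The following sketch is illustrative only (the helper name is hypothetical); it reproduces the 12 × 6 matrix above by contrasting each category against the last one.

```python
# A minimal sketch, assuming NumPy: each of the first I-1 categories of an
# I-category variable gets its own column; the last category is coded -1.
import numpy as np

def effect_codes(n_categories):
    codes = {c: np.eye(n_categories - 1)[c - 1] for c in range(1, n_categories)}
    codes[n_categories] = -np.ones(n_categories - 1)
    return codes

A, B = effect_codes(3), effect_codes(4)
X = np.array([np.concatenate(([1.0], A[i], B[j]))
              for i in range(1, 4) for j in range(1, 5)])
print(X)  # the 12 x 6 design matrix displayed above
```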
Notation. In the following sections, we use the explicit form of the design matrices only occasionally, to illustrate the meaning of a base model. In most other instances, we use a more convenient form to express the same model. This form is log E = Xλ. Because each column of X is linked to one λ, the model can be uniquely represented by referring only to its parameters. The form of this representation is
$$\log E = \lambda_0 + \sum_{\text{main effects}} \lambda_i + \sum_{\text{first order interactions}} \lambda_{ij} + \sum_{\text{second order interactions}} \lambda_{ijk} + \dots,$$
where λ₀ is the intercept and the subscripts i, j, and k index variables. For a completely written-out example, consider the four variables A, B, C, and D. The saturated model, that is, the model that contains all possible effects for these four variables, is
$$\log E = \lambda_0 + \lambda_i^A + \lambda_j^B + \lambda_k^C + \lambda_l^D + \lambda_{ij}^{AB} + \lambda_{ik}^{AC} + \lambda_{il}^{AD} + \lambda_{jk}^{BC} + \lambda_{jl}^{BD} + \lambda_{kl}^{CD} + \lambda_{ijk}^{ABC} + \lambda_{ijl}^{ABD} + \lambda_{ikl}^{ACD} + \lambda_{jkl}^{BCD} + \lambda_{ijkl}^{ABCD},$$
where the subscripts index the parameters estimated for each effect, and the superscripts indicate the variables involved. For CFA base models, the parameters not estimated are set equal to zero, that is, they are not included in the model. This implies that the respective columns are not included in the design matrix. To illustrate, we now reformulate the three above examples, for which we provided the design matrices, in terms of this notation. The first model included three variables for which the base model was a main effect model. This model includes only the intercept parameter and the parameters for the main effects of the three variables. Labeling the three variables A, B, and C, this model can be formulated as

$$\log E = \lambda_0 + \lambda_i^A + \lambda_j^B + \lambda_k^C.$$
The second model involved the four variables A, B, C, and D, and the interactions between A and B and between C and D. This model can be formulated as

$$\log E = \lambda_0 + \lambda_i^A + \lambda_j^B + \lambda_k^C + \lambda_l^D + \lambda_{ij}^{AB} + \lambda_{kl}^{CD}.$$
The third model involved the two variables A and B. The base model for these two variables was
$$\log E = \lambda_0 + \lambda_i^A + \lambda_j^B.$$

This last expression shows that the λ-terms have the same form for dichotomous and polytomous variables.
2.2 Admissibility of log-linear models as CFA base models
The issue of admissibility of log-linear models as CFA base models is covered in two sections. In the present section, admissibility is treated from the perspective of interpretability. In the next section, we introduce the implications of employing particular sampling schemes. With the exception of saturated models, which cannot yield types or antitypes by definition, every log-linear model can be considered a CFA base model. However, the interpretation of types and antitypes is straightforward in particular when certain admissibility criteria are fulfilled. The following four criteria have been put forth (von Eye & Schuster, 1998):
(1) Uniqueness of interpretation of types and antitypes. This criterion requires that there be only one reason for discrepancies between observed and expected cell frequencies. Examples of such reasons include the existence of effects beyond the main effects, the existence of predictor-criterion relationships, and the existence of effects on the criterion side.
Consider, for instance, a cross-classification that is spanned by the three variables A, B, and C. For this table, a number of log-linear models can serve as base models. Three of these are discussed here. The first of these models is the so-called null model. This is the model that takes into account no effect at all (the constant is usually not considered an effect). This model has the form log E = 1λ₀, where 1 is a vector of ones, and the model contains only the intercept parameter λ₀. If this base model yields types and antitypes, there must be non-negligible effects that allow one to describe the data. Without further analysis, the nature of these effects remains unknown. However, the CFA types and antitypes indicate where "the action is," that is, where these effects manifest. This interpretation is unique in the
sense that all variables have the same status and effects can be of any nature, be they main effects or interactions. No variable has a status such that effects are a priori excluded. Types from this model are always constituted by the configurations with the largest frequencies, and antitypes are always constituted by the configurations with the smallest frequencies. This is the reason why this base model of CFA has also been called the base model of Configural Cluster Analysis (Krüger, Lienert, Gebert, & von Eye, 1979; Lienert & von Eye, 1985; see Section 5.1).

The second admissible model for the three variables A, B, and C is the main effect model $\log E = \lambda_0 + \lambda_i^A + \lambda_j^B + \lambda_k^C$. This model also assigns all variables the same status. However, in contrast to CCA, types and antitypes can emerge here only if variables interact. No particular interaction is excluded, and interactions can be of any order. Main effects are part of the base model and cannot, therefore, be the reason for the emergence of types or antitypes.

Consider the following example of Configural Cluster Analysis (CCA) and Configural Frequency Analysis (CFA). In its first issue of the year 2000, the magazine Popular Photography published the 70 winners and honorable mentions of an international photography contest (Schneider, 2000). The information provided in this article about the photographs can be analyzed using the variables Type of Camera (C; 1 = medium format; 2 = Canon; 3 = Nikon; 4 = other), Type of Film used (F; 1 = positive film (slides); 2 = other (negative film, black and white, sheet film, etc.)), and Prize Level (P; 1 = Grand or First Prize; 2 = Second Prize; 3 = Third Prize; 4 = honorable mention). We now analyze the 4 × 2 × 4 cross-tabulation of C, F, and P using the null model of CCA and the model of variable independence, that is, the main effect base model of CFA. Table 3 displays the cell indices and the observed cell frequencies along with the results from these two base models. For both analyses we used an approximation of the standard normal z-test (this test will be explained in detail in Section 3.3), and we Bonferroni-adjusted α = 0.05, which led to α* = 0.05/32 = 0.0015625.

The results in the fourth column of Table 3 suggest that three configural clusters and no configural anticlusters exist. The first cluster, constituted by Configuration 224, suggests that more pictures that were taken with Canon cameras on negative film were awarded honorable mentions than expected based on the null model. The second cluster, constituted by Configuration 314, suggests that more pictures that were taken with Nikon
Table 3: CFA of contest-winning pictures based on the null model and on the independence model

Cell indices   Observed      Null model          Independence model
CFP            frequencies   ê_ijk    p_ijk      ê_ijk    p_ijk
111            1             2.188    .2110      .456     .21
112            1             2.188    .2110      .414     .18
113            1             2.188    .2110      .414     .18
114            0             2.188    .0670      1.616    .10
121            2             2.188    .4500      .644     .05
122            0             2.188    .0670      .586     .22
123            0             2.188    .0670      .586     .22
124            2             2.188    .4500      2.284    .43
211            0             2.188    .0670      1.367    .42
212            0             2.188    .0670      1.243    .13
213            2             2.188    .4500      1.243    .25
214            4             2.188    .1102      4.847    .35
221            3             2.188    .2914      1.933    .22
222            2             2.188    .4500      1.757    .43
223            2             2.188    .4500      1.757    .43
224            8             2.188
y_{i+1}, a − was assigned. The second variable is the early success criterion (S). A + was assigned if a subject reached the criterion before the eighth trial, and a − was assigned if a subject needed all eight trials. The third variable is the number-of-errors criterion (F). The number of wrong associations was counted in addition to the number of hits. A + was assigned if a subject produced more errors than the grand median, and a − was assigned if a subject produced fewer errors. Table 79 displays the (2 × 2 × 2) × 2 cross-classification of M, S, and F, with Gender, G. Instead of performing a standard two-sample CFA, we now employ a prediction test as presented for biprediction CFA in Section 6.2.2.2. Specifically, we compare females with males in configuration − − − of the three variables M, S, and F.
Table 79: Cross-classification of the monotonic trend (M), early success (S), and number of mistakes (F) in two samples of males and females

Configuration     Comparison groups      Totals
MSF               males     females
+++               12        12           24
++−               2         3            5
+−+               3         2            5
+−−               6         6            12
−++               5         6            11
−+−               3         2            5
−−+               2         2            4
−−−               15        4            19
Totals            48        37           85
The test is

$$X^2 = \frac{N\,(N_{11}N_{22} - N_{12}N_{21})^2}{N_{1.}\,N_{2.}\,N_{.1}\,N_{.2}},$$

where N₁₁ and N₁₂ are the numbers of males and females, respectively, who show configuration − − −, the second row of the 2 × 2 table collects all other configurations, and the denominator contains the four marginal totals. Inserting yields

$$X^2 = \frac{85\,(15 \cdot 33 - 4 \cdot 33)^2}{19 \cdot 66 \cdot 48 \cdot 37} = 5.029.$$

For df = 1, this value has a tail probability of p = 0.0249. Thus, we can
reject the null hypothesis, according to which configuration − − − does not allow one to discriminate between males and females. Note that α does not need to be adjusted, because we performed only one test.

In contrast to routine exploratory CFA, testing only a subset of configurations is part of confirmatory or explanatory CFA. In the example in Table 79, we only asked whether males and females differ in regard to the pattern non-monotonic slope – no early success – above-median number of errors. This hypothesis was largely fueled by an inspection of the frequencies in Table 79. In substantive applications, theory and prior results are needed to justify the selection of configurations for confirmatory analysis. The main advantage of confirmatory CFA is that the number of tests is smaller than in exploratory CFA. The protection of the family-wise or experiment-wise α only needs to take into account this smaller number. Thus, the α* that results in confirmatory CFA can be far less prohibitive than the α* in exploratory CFA. The next section presents additional examples of confirmatory applications of CFA.
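For readers who want to reproduce the arithmetic, the following sketch (assuming SciPy, not the book's software) evaluates the 2 × 2 test statistic for the configuration − − − example.

```python
# A minimal sketch, assuming SciPy. N11..N22 are the cells of the 2 x 2 table
# (configuration --- vs. all others combined, for males and females).
from scipy.stats import chi2

N11, N12, N21, N22 = 15, 4, 33, 33
N = N11 + N12 + N21 + N22                   # 85
num = N * (N11 * N22 - N12 * N21) ** 2
den = (N11 + N12) * (N21 + N22) * (N11 + N21) * (N12 + N22)
x2 = num / den                              # approx. 5.029
p = chi2.sf(x2, df=1)                       # approx. 0.0249
print(x2, p)
```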
9.5 Examining treatment effects using CFA; more confirmatory CFA
This section presents methods for a rather detailed configural examination of treatment effects. These methods are presented for pre-post designs without control group in Section 9.5.1 and with control group in Section 9.5.2.

9.5.1 Treatment effects in pre-post designs (no control group)
In evaluative and experimental research, researchers typically pursue specific, a priori formulated hypotheses. Data are examined in regard to these hypotheses. The analyses involve data exploration only in a secondary step, if at all. In this section, we exemplify the application of confirmatory CFA in an evaluation study. Lienert and Straube (1980) treated a sample of 75 acute schizophrenics with neuroleptic drugs for two weeks. Before and after this treatment, the patients were administered the Brief Psychiatric Rating Scale (Overall & Gorham, 1962). Three of the seventeen symptoms captured by this instrument are used for the following analyses: W = emotional withdrawal; T = thought disturbances; and H = hallucinations.
Each of the symptoms was scaled as either present (+) or absent (−). Table 80 displays the data.

Table 80: Evaluation of treatment of schizophrenics with neuroleptic drugs in a pre-post study

Configurations      Number of         Number of symptoms        Totals
WTH                 symptoms before   after treatment
                    treatment         3     2     1     0
+++                 3                 1     10    4     0         15
++−, +−+, −++       2                 6     11    17    4         38
+−−, −+−, −−+       1                 1     4     7     4         16
−−−                 0                 0     1     2     3         6
Totals                                8     26    30    11        75
We now ask whether the number of patients who display fewer symptoms after the treatment is greater than the number of patients with more symptoms. Table 80 has been arranged such that a count that leads to an answer can easily be performed. Instead of the usual arrangement of configurations in which all permutations are created using a routine scheme in which the last variable is the fastest changing one, the second last variable is the one changing next, and so on, the arrangement in Table 80 groups configurations based on the number of + signs. That is, configurations are grouped based on the number of symptoms displayed by the patient. Looking at the rows, the top configuration includes the patients who suffer from all three symptoms (Row 1). Then come three configurations with two symptoms. These three configurations are considered one category, the category of two symptoms. The following
three configurations are also considered one category, the one with one symptom. The last category includes the patients who show none of the three symptoms under study. All this applies accordingly to the columns in Table 80. The patients who suffer from fewer symptoms after the treatment can be found in the upper right triangle of Table 80, excluding the diagonal. For example, the 10 patients in the second cell in Row 1 are those who suffered from all three symptoms before the treatment and from only two symptoms after the treatment. The first row also indicates that no patient was freed from all three symptoms. The total number of patients freed from one or two symptoms is 10 + 4 + 0 + 17 + 4 + 4 = 39. No patient was freed from all three symptoms. The patients who suffer from more symptoms after the treatment than before can be found in the lower left triangle of the cross-classification in Table 80, again excluding the diagonal. For example, the table shows that one patient suffered from only one symptom before the treatment but from all three symptoms after the treatment (Row 3, Column 1). The total of patients with an increase in the number of symptoms is 6 + 1 + 4 + 0 + 1 + 2 = 14.
To compare these two frequencies, the one that indicates the number of improved patients and the one that indicates the number of deteriorated patients, we posit as the null hypothesis that there is no difference. That is, discrepancies between these two frequencies are random in nature. There is a number of tests that can be used to test this null hypothesis. Examples include the binomial test given in Section 3.2 and its normal approximations, given in Section 3.3; symmetry tests (see below); and the diagonal-half sign test. For the latter, let b denote the number of patients who improved, and w the number of patients who deteriorated. Then, the null hypothesis of no difference between b and w can be tested using

$$z = \frac{b - w}{\sqrt{b + w}}.$$

The test statistic is approximately normally distributed. Alternatively, in particular when the samples are small, the binomial test can be used with p = 0.5.
To illustrate these two tests, we use the data in Table 80. We insert in the z-test formula and obtain

$$z = \frac{39 - 14}{\sqrt{39 + 14}} = 3.434,$$
and p = 0.0003. We thus conclude that the neuroleptic drugs reduce the number of symptoms in schizophrenic inpatients. The same probability results from the normal approximation of the binomial test.

More detailed hypotheses can be tested by focusing on individual symptoms. Two methods of analysis are suggested. First, one can create a pre-intervention × post-intervention cross-tabulation for each symptom and analyze the resulting I × I table using the Bowker test (1948; cf. von Eye & Spiel, 1996), where I indicates the number of categories, or the McNemar test (1947), when I = 2. The test statistic for both tests is

$$X^2 = \sum_{i > j} \frac{(N_{ij} - N_{ji})^2}{N_{ij} + N_{ji}},$$

for i > j and i, j = 1, ..., I. This test statistic is approximately distributed as χ² with df = $\binom{I}{2}$. For I = 2, this equation simplifies to

$$X^2 = \frac{(b - w)^2}{b + w},$$

with df = 1 or, with continuity correction,

$$X^2 = \frac{(|b - w| - 1)^2}{b + w},$$

also with df = 1, where b and w denote the cell frequencies N₁₂ and N₂₁, respectively. Consider the following example. The cell frequencies for the symptom hallucinations in the neuroleptic drug treatment study are + + = 8, + − = 21, − + = 9, and − − = 32. For these values we calculate

$$X^2 = \frac{(21 - 9)^2}{21 + 9} = 4.80.$$

For df = 1, the tail probability of this value is p = 0.0285. We thus can reject the null hypothesis that the neuroleptic drug treatment only leads to random changes in hallucinations.
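Both computations can be reproduced in a few lines. The sketch below assumes SciPy and uses the frequencies reported above.

```python
# A minimal sketch, assuming SciPy.
import math
from scipy.stats import norm, chi2

# Diagonal-half sign test: b improved, w deteriorated (Table 80).
b, w = 39, 14
z = (b - w) / math.sqrt(b + w)          # 3.434
p_z = norm.sf(z)                        # approx. 0.0003

# McNemar test for the symptom hallucinations (+- = 21, -+ = 9).
b2, w2 = 21, 9
x2 = (b2 - w2) ** 2 / (b2 + w2)         # 4.80
p_x2 = chi2.sf(x2, df=1)                # approx. 0.0285
print(z, p_z, x2, p_x2)
```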
9.5.2 Treatment effects in control group designs
Control groups are often considered an indispensable necessity in research on treatment effects. Control groups allow researchers to distinguish between spontaneous recovery or spontaneous changes on the one hand and treatment effects on the other hand. CFA allows one to compare experimental groups and control groups with two-sample CFA (see Sections 7.1 and 7.2). When there are more than two groups, multi-sample CFA can be employed (see Section 7.3). In this section, we show how two samples can be compared in regard to the change from one configuration to another. Consider the following scenario. Pattern A is observed before treatment. Pattern B is the desired pattern, and is observed after the treatment. Both observations are made both in the treatment and the control groups. Then, the two groups can be compared in regard to the change from Pattern A to Pattern B based on the 2 × 2 tabulation that is schematized in Table 81.

Table 81: 2 × 2 table for the comparison of two groups in one pattern shift

Patterns                 Comparison groups                    Totals
                         Treatment       Control
A → B                    b               b′                   N_{A→B}
all others combined      a + c + d       a′ + c′ + d′         n + n′ − N_{A→B}
Totals                   n               n′                   n + n′
The middle columns in Table 81 separate the treatment and the control groups. The frequencies of the treatment group can be considered taken from a 2 × 2 table of the format given in Table 82. The frequencies of the control group can be considered taken from an analogous 2 × 2 table. Frequency b in Table 82 is the number of treatment group cases who switched from symptom Pattern A to symptom Pattern B. The remaining three cells contain cases who stayed stable or switched from Pattern B to Pattern A. The cell labels in Table 81 indicate that the same frequencies are used as in Table 82. Thus, cell frequency b in Table 81 is
the same as cell frequency b in Table 82. This applies accordingly to the control group, for which a cross-classification parallel to the one in Table 82 can be constructed. The frequencies in Table 81 can be analyzed using the methods described in Sections 7.1 (Table 47) and 7.2.

Table 82: 2 × 2 table of pattern change in treatment group

Patterns            Post-treatment                 Totals
Pre-treatment       A              B
A                   a              b               a + b
B                   c              d               c + d
Totals              a + c          b + d           n
Data example. The number of respondents in Lienert and Straube's (1980) investigation on the effects of neuroleptic drugs who switched from Pattern + + + to Pattern + + − was b = 9. The frequency a + c + d is then 66. Now suppose that in a control group of size 54, only 2 patients showed pattern + + +/+ + −. From these frequencies, the cross-classification in Table 83 can be created.

Table 83: Two-sample comparison with respect to change pattern + + +/+ + −

Patterns                 Comparison groups                              Totals
                         Treatment            Control
+ + +/+ + −              b = 9                b′ = 2                    N_{+++/++−} = 11
all others combined      a + c + d = 66       a′ + c′ + d′ = 52         n + n′ − N_{+++/++−} = 118
Totals                   n = 75               n′ = 54                   n + n′ = 129
Using the exact Fisher test described in Section 7.1, we calculate a probability of p = 0.086. Using the χ²-test without continuity correction, we calculate X² = 2.77 and p = 0.096 (df = 1). The conclusion made in Section 9.5.1, that is, the conclusion that the neuroleptic drugs improve hallucination problems in schizophrenics, must thus be qualified. While there is a significant improvement in units of the number of hallucinations from the first to the second observation, this improvement cannot be considered caused by the drug treatment. The control group patients experience improvements that are not significantly different from those experienced by the patients in the treatment group. This result again illustrates that the use of control groups can prevent researchers from drawing wrong conclusions.
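A sketch of this comparison, assuming SciPy (whose exact-test conventions may differ slightly from the variant used in Section 7.1):

```python
# A minimal sketch, assuming SciPy; the table is taken from Table 83.
import numpy as np
from scipy.stats import fisher_exact, chi2_contingency

table = np.array([[9, 2],      # pattern shift: treatment, control
                  [66, 52]])   # all others combined
_, p_fisher = fisher_exact(table, alternative='greater')    # book reports 0.086
x2, p_x2, _, _ = chi2_contingency(table, correction=False)  # x2 approx. 2.77
print(p_fisher, x2, p_x2)
```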
9.6 CFA of patterns of correlation or multivariate distance sequences
Thus far, we have covered CFA of the following characteristics of series of measures:

(1) slope, curvature, and higher order characteristics of series in the forms of differences and polynomial parameters;
(2) location/elevation in the form of means of ipsative scores relative to some reference;
(3) variability of series of measures as assessed by von Neumann's variance.
A fourth characteristic of series of measures is their autocorrelation structure. Repeated observations typically are strongly correlated with each other (autocorrelation). It can be of interest to researchers to identify types and antitypes of autocorrelations. Changes in the correlational structure can be as interesting and important as changes in the mean or slope characteristics. A fifth characteristic of series of measures can be captured by multivariate distances. In Section 9.1, we only considered univariate distances in the form of first, second, and higher order differences. Multivariate distances reflect differences between vectors of measures. This section is concerned with CFA of autocorrelations and multivariate distances.
9.6.1 CFA of autocorrelations
Consider the data box (Cattell, 1988) in Figure 9. This box describes the data that are collected from a number of individuals in a number of variables on a number of occasions. The r_{1.12} and r_{1.23} on the right-hand side of the box are correlations. r_{1.12} indicates that, at the first occasion (first subscript), Variables 1 and 2 (last two subscripts) are correlated using all subjects (period in the subscript). r_{1.23} indicates that, at the first occasion (first subscript), Variables 2 and 3 (last two subscripts) are correlated using all subjects (period in the subscript). Using all three occasions, for instance, the correlations r_{1.12}, r_{1.13}, r_{2.12}, r_{2.13}, r_{3.12}, and r_{3.13} can be estimated.
[Figure 9: Cattell's data box — a three-dimensional individuals × variables × occasions array, with the correlations r_{1.12} and r_{1.23} marked on its right-hand side]
In general, six correlation matrices can be created from a data box such as the one depicted in Figure 9. Each of these correlation matrices corresponds to one of the six elementary factor analytic techniques described by Cattell (1988). The first correlation matrix is of the individuals × variables type. The factor analytic R technique is used to extract factors of variables from this matrix. The second matrix is of the variables × individuals type, yielding factors of people (Q technique). The third matrix, occasions × variables, uses the P technique to create factors of variables. The fourth matrix, variables × occasions, yields factors of occasions (O technique). The fifth matrix, occasions × individuals, yields factors of people (S technique), and the sixth matrix, individuals × occasions, yields occasion factors (T technique).
Each of these matrices can also be subjected to a CFA. The matrices that contain correlations that vary across occasions are the most interesting ones in the present context of methods of longitudinal CFA. Which of these is selected for a particular analysis is determined by the researchers' research topic. None of the options is a priori superior. CFA of such a correlation matrix proceeds in the following steps:
(1) creating the correlation matrices of interest, e.g., the individuals × variables matrix, separately for each occasion;
(2) categorizing the correlations;
(3) creating the cross-classification of the categorized correlations;
(4) performing CFA.
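A minimal sketch of steps (1) through (3), assuming NumPy and a subjects × occasions × variables array of raw scores (the function name and the demo data are hypothetical):

```python
# A minimal sketch, assuming NumPy; cutoff r = 0.5 as in the example below.
import numpy as np

def autocorr_configurations(data, cutoff=0.5):
    """For each subject, correlate time-adjacent score vectors and
    dichotomize: 1 = above cutoff, 2 = at or below cutoff."""
    n_subj, n_occ, _ = data.shape
    configs = np.empty((n_subj, n_occ - 1), dtype=int)
    for s in range(n_subj):
        for t in range(n_occ - 1):
            r = np.corrcoef(data[s, t], data[s, t + 1])[0, 1]
            configs[s, t] = 1 if r > cutoff else 2
    return configs  # rows like (1, 2, 1) index cells of a 2 x 2 x 2 table

rng = np.random.default_rng(1)
demo = rng.normal(size=(148, 4, 3))  # 148 subjects, 4 occasions, 3 scales
print(autocorr_configurations(demo)[:5])
```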
It should be mentioned that a very large number of correlation measures has been proposed. Correlations can be calculated between categorical measures, continuous measures, or measures that differ in scale level. Any of these measures can be used for CFA of autocorrelations.

Data example. The following data example, taken from von Eye (1990), illustrates these four steps. A sample of 148 individuals participated in a study on fatigue and mood changes caused by a memory experiment. In the experiment, subjects had to read and recall narratives. Immediately before and after the experiment, subjects were presented with a questionnaire that measured anxiety, arousal, and fatigue. The subjects went through two of these routines, thus filling the questionnaire a total of four times. In the first step, time-adjacent vectors of scores were correlated separately for each individual. The 3 × 4 matrix of raw scores for each subject was thus transformed into a vector of three correlations. These correlations compare the first with the second, the second with the third, and the third with the fourth responses to the questionnaire. In the second step, these correlations were categorized. The distribution was bimodal with one mode at around r = −0.80 and the other mode at around r = 0.99. There were more positive than negative correlations. The median was located at r = 0.9. Still, the cutoff was chosen to be at r = 0.5. This value identifies the minimum of the frequencies between the two modes. Correlations above the cutoff were assigned a 1, correlations below the cutoff were assigned a 2. In Step 3, the three dichotomized correlations were crossed to form a 2 × 2 × 2 tabulation. This tabulation appears in Table 84, along with the results of CFA. We used Lehmacher's test with Küchenhoff's continuity
correction, and Holm's procedure to protect α, which led to α₁* = 0.00625.
con-.
CFA of correlations fatigue and mood Frequencies
between four observations of
Test statistics
Holm procedure
obs.
exp.
z
P
Rank w
critical P
111
65
56.41
2.61
.005
1
.006
112
12
13.16
-0.27
.393
6
.017
121
31
38.46
-2.34
.OlO
3
.008
122
9
8.97
-0.2 1
.418
7
.025
211
8
14.95
-2.55
.005
2
.007
212
3
3.49
0.01
.497
8
.05
221
16
10.19
2.23
.013
4
.Ol
222
4
2.38
0.79
.213
5
.0125
r12‘23
‘34
Type ? T
A
The results in Table 84 suggest that one type and one antitype exist. The type, constituted by Pattern 111, describes those subjects who have above cutoff correlations throughout. Thus, the strength of the autocorrelation of these subjects’ mood and fatigue scores does not seemto be affected by the two experimental trials. The antitype is constituted by Pattern 2 11. These are subjects who display a low or negative correlation between the mood and fatigue scores observed before and after the first trial. The correlations between the measures after the first and before the second trial are above the cutoff, and so are the correlation between the measures before and after the second trial.
CFA of Level. Variabilitv, and Shane 9.6.2
CFA of autodistances
It is well known that distancesand correlations are independentof each other. Correlationscanbe high or low regardlessof distanceandvice versa. Therefore, researchersoften consider both correlations and distancesin their analysesrather than only one of the measures.In this section,we call the multivariate distances between time-adjacent observations autodistances. This term can be viewed parallel to the term autocorrelations.
Many measuresof distancehave beenproposed.The best known is the Euclidean distance
s = co 21
j+lJ - YjJ)‘,
i
where i indexesthe observationsandj indexesthe variables(or other units of analysis). The Euclidean distanceand many other measuresof distance can be derived from the Minkowski metric d, =
1 ‘lr.
For instance,setting r = 2 yields the Euclidean distance,and r = 1 yields the city block metric. (Here, r is a distanceparameter,not a correlation.) It is important to take into accountthat the Euclideandistanceuses raw scores.Thus, if scalesare not commensurable(samescaleunits), there may be a weighting such that the scaleswith large numbersdominate the distancemeasurementto the extent that the scalewith the smaller numbers becomeirrelevant.Before usingdistances,researchersarethereforeadvised to make sure their scalesare commensurable. CFA of autodistancesproceedsin the samefour stepsas CFA of autocorrelations: Creating the distance matrices of interest, for example, the individuals x variables matrix, separatelyfor eachoccasion; Categorizing distances; Creating the cross-classificationof the categorizeddistances; Performing CFA. Data example. To illustrate that CFA of autocorrelations and CFA of autodistancescanyield different patternsof typesandantitypes,we usethe
270
CFA of patternsof correlation or multivariate distanceseouences
same data as in Section 9.6.1. The data were collected in a memory experimentin which 148subjectsreadandrecallednarrativesin two trials. Before and after eachtrial, the subjectsprovided information on mood and fatigue. For the following CFA, the distancesbetweenthe mood andfatigue scoresadjacent in time were calculated.The dichotomizedvariableswere scoredasa 1when their raw scoresincreasedanda 2 when their raw scores decreased.The cross-classification of the three dichotomized distances appears in Table 85, along with the results of CFA. To make results comparable with those in Section 9.6.1, we used Lehmacher’s test with Kuchenhoff’s continuity correction and Helm’s adjustmentof c1which led to a; = 0.00625. Table 85:
Distance s12s23
s34
CFA of distances between four observations of fatigue and mood Frequencies obs.
exp.
Test statistics Z
P
Holm procedure Rank
critical
cP>
P
111
17
26.25
-2.674
.0037
4
.Ol
112
18
17.40
0.033
.4867
7
.025
121
38
24.87
3.905
< a*
2
.007
122
12
16.49
-1.357
.0874
5
.013
211
16
19.46
-0.965
.1673
6
.017
212
25
12.90
4.228
~*w%l
?Ly
$\hat\lambda_j^M = 0.25(m_{111} + m_{112} - m_{121} - m_{122} + m_{211} + m_{212} - m_{221} - m_{222})$

$\hat\lambda_k^V = 0.25(m_{111} - m_{112} + m_{121} - m_{122} + m_{211} - m_{212} + m_{221} - m_{222})$

$\hat\lambda_{ij}^{DM} = 0.25(m_{111} + m_{112} - m_{121} - m_{122} - m_{211} - m_{212} + m_{221} + m_{222})$

$\hat\lambda_{ik}^{DV} = 0.25(m_{111} - m_{112} + m_{121} - m_{122} - m_{211} + m_{212} - m_{221} + m_{222})$

$\hat\lambda_{jk}^{MV} = 0.25(m_{111} - m_{112} - m_{121} + m_{122} + m_{211} - m_{212} - m_{221} + m_{222})$
Table A3 shows the following characteristics of parameters in log-linear modeling:
(1) In orthogonal designs, that is, in designs in which the correlations among the column vectors in X are zero, the weight with which the cell frequencies are used in hierarchical models is always equal. The weight can vary in nonstandard designs and in nonorthogonal designs.

(2) The meaning of a parameter is given by the pattern of signs and by the weights of the cell frequencies in the equations in the right-hand column in Table A3. For instance, the sign pattern + + + + − − − − for parameter $\lambda_i^D$ shows that the magnitude of this parameter is the result of the comparison of the first four cells (these are the cells that fall in the first category of the Penalty variable) with the second four cells (these are the cells that fall in the second category of the Penalty variable). This applies accordingly to the other main effect terms. To explain the meaning of the interaction terms, consider, for example, the parameter for the interaction between Penalty and Race of Murderer, $\lambda_{ij}^{DM}$. The signs for this parameter are + + − − − − + +. The first four signs are the same as in the vector for the main effect of Murderer, $\lambda_j^M$. The second four signs are the inverse of the first four signs. This interaction is thus used to test the hypothesis that the main effect Murderer is the same across the two categories of the variable Death Penalty. Equivalently, one can say that the parameter $\lambda_{ij}^{DM}$ is used to test whether the main effect Death Penalty is constant across the two categories of the variable Race of Murderer. The parameters for the other two two-way interactions and the three-way interaction (not represented in Table A3) can be interpreted in analogous fashion.
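The right-hand column of Table A3 amounts to signed sums of the logarithms m of the cell frequencies. A sketch, assuming NumPy, with hypothetical counts:

```python
# A minimal sketch, assuming NumPy; m holds the logarithms of the eight cell
# frequencies in the order 111, 112, ..., 222. The counts are hypothetical.
import numpy as np

m = np.log(np.array([19., 11., 132., 52., 6., 97., 9., 52.]))

sign = {
    'D': [+1, +1, +1, +1, -1, -1, -1, -1],
    'M': [+1, +1, -1, -1, +1, +1, -1, -1],
    'V': [+1, -1, +1, -1, +1, -1, +1, -1],
}
sign['DM'] = np.multiply(sign['D'], sign['M'])
sign['DV'] = np.multiply(sign['D'], sign['V'])
sign['MV'] = np.multiply(sign['M'], sign['V'])

# Each parameter is 0.25 times the signed sum, as in Table A3.
lam = {k: 0.25 * np.dot(v, m) for k, v in sign.items()}
print(lam)
```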
In the above data example, all parameters are significant, thus explaining significant portions of the information in the D × M × V cross-classification given in Table A1. To give an example, let us interpret parameter $\lambda_{ik}^{DV}$. The sign pattern for this parameter is + − + − − + − +. The first four of these signs correspond to those for the main effect parameter $\lambda_k^V$. The second four are inverted. Thus, using the parameter $\lambda_{ik}^{DV}$, one tests the hypothesis that the main effect Race of Victim is the same across the two categories of the variable Death Penalty. As before, one can also say that the parameter $\lambda_{ik}^{DV}$ is used to test whether the main effect Race of Victim is constant across the categories of the variable Death Penalty. This
applies accordingly to interactions of any level.

The relationship between log-linear modeling and CFA. The following brief discussion of the relationship between log-linear modeling and CFA focuses mostly on those cases in which either log-linear models are created using models that could also be used as base models for CFA and methods of residual analysis that are also used in CFA, or vice versa. The fact that (a) log-linear models exist that cannot be CFA base models and (b) methods of calculating expected frequencies exist that are not based on the log-linear model indicates that the two methods overlap only partly. When comparing the two methods, it must be noted that many CFA base models can be cast in terms of log-linear models. Most of these are hierarchical models. Some include covariates. Accordingly, the methods of estimating expected cell frequencies are the same also. What then is the difference between the two methods? The basic differences lie in the goals of analysis. The method of log-linear modeling, while applicable in the context of person-orientation, is mostly used in the context of variable-centered research (see Section 1.2). Results are typically expressed in terms of variable relationships such as interactions or dependency structures. In contrast, CFA is the prime method of person-centered research. CFA asks whether configurations (e.g., person profiles) occur at rates different than expected, or whether groups of individuals differ significantly in the occurrence rates of particular profiles. Lehmacher (2000) calls CFA a cell-oriented method.

These diverging goals have one major implication, which concerns the role played by the models under scrutiny. In log-linear modeling, researchers attempt to identify the model that best describes the data. In addition, this model must be parsimonious, and there cannot be significant model-data discrepancies. Only then can parameters be interpreted. In particular when there are significant model-data discrepancies, researchers modify the model, trying to improve model fit. The role played by cell-specific large or small residuals is that of guiding model improvement. This process of model testing and modifying is repeated until an acceptable and interpretable model is found or until the model is rejected. We note that log-linear modeling sometimes implies testing several models before one model is retained. In contrast, the typical CFA application uses only one base model. When significant model-data discrepancies exist, they are interpreted in terms of types and antitypes. The base model is not changed because of the existence of types and antitypes. If a different base model is considered, then either with the goal of identifying the reasons why types and antitypes
exist or to test additional hypotheses. We see from this brief discussion that log-linear modeling and CFA pursue different goals. However, the two methods can also be used in tandem. Here are two sample scenarios.
(1) Explaining types and antitypes. The existence of types and antitypes can be explained using substantive arguments. For example, one can explain the antitype that is constituted by the configuration depressed + happy-go-lucky as logical and as confirming these two concepts. In the context of test construction, this antitype could be considered one of the indicators of instrument validity. In addition to substantive arguments, one can ask whether types and antitypes reflect variable interactions. To determine which interactions exist, one can go two routes. The first route involves specifying a different, typically more complex CFA base model. For instance, one can move from a global first order CFA to a global second order CFA. If the new base model makes all types and all antitypes disappear, they can be considered explained by the effects included in the base model. It may not always be possible to explain all types and antitypes this way, because the selection of CFA base models underlies restrictions (see Section 2.5) which exclude models that are possible and can be meaningful in the context of a log-linear analysis. The second route involves fitting log-linear models. The result of this effort is a log-linear model that describes the data well, that is, without significant model-data discrepancies. There can be no types or antitypes for a well-fitting model. Regardless of whether the first or the second route is taken, log-linear modeling and CFA complement each other in the sense that log-linear modeling can lead to an explanation of types and antitypes that uses models that do not belong to the class of CFA base models (Lehmacher, 2000).

(2) Explaining interactions in log-linear models. Consider a researcher who has found a well-fitting log-linear model. This researcher may then ask whether a finer-grained analysis could help identify the sectors in the cross-classification that carry the effects. One way of answering this question is employing CFA with the model that does not contain the significant effects (if possible, see above). The resulting types and antitypes will tell this researcher where the variable interactions are the strongest (or exist at all).
Conclusions. It seems perfectly all right to employ only log-linear modeling when variable-centered questions need to be answered, and to employ only CFA when the focus of analysis is purely person-centered. However, there are many reasons why methods of analysis can be employed in tandem. This applies to both log-linear modeling and to CFA. In addition, this applies to Bayesian methods of typal analysis and to cell-directed methods of model modification as implemented in SYSTAT. Whatever method of categorical data analysis is employed, other methods can help researchers round out the picture. Thus, variable-centered methods can be used to bolster person- or cell-oriented results in terms of variable relationships. In turn, CFA can be used to add the person perspective to variable-centered analyses.
Appendix B: Table of α*-levels for the Bonferroni and Holm adjustments

t indicates either the total number of cells (for Bonferroni protection of α; see Lienert, Ludwig, & Rockefeller, 1982) or the remaining number of tests (for Holm protection of α). The tabled values are α*₀.₀₅ = 0.05/t and α*₀.₀₁ = 0.01/t.
t 380 379 378 377 376 375 374 373 372 371 370 369 368 367 366 365 364 363 362 361 360 359 358 357 356 355
t
a;,, 0.0001316 0.0001319 0.0001323 0.0001326 0.0001330 0.0001333 0.0001337 0.0001340 0.0001344 0.0001348 0.0001351 0.0001355 0.0001359 0.0001362 0.0001366 0.0001370 0.0001374 0.0001377 0.0001381 0.0001385 0.0001389 0.0001393 0.0001397 0.0001401 0.0001404 0.0001408
0.0000263 0.0000264 0.0000265 0.0000265 0.0000266 0.0000267 0.0000267 0.0000268 0.0000269 0.0000270 0.0000270 0.0000271 0.0000272 0.0000272 0.0000273 0.0000274 0.0000275 0.0000275 0.0000276 0.0000277 0.0000278 0.0000279 0.0000279 0.0000280 0.0000281 0.0000282
354 353 352 351 350 349 348 347 346 345 344 343 342 341 340 339 338 337 336 335 334 333 332 331 330 329
a;,, 0.0001412 0.0001416 0.0001420 0.0001425 0.0001429 0.0001433 0.0001437 0.0001441 0.0001445 0.0001449 0.0001453 0.0001458 0.0001462 0.0001466 0.0001471 0.0001475 0.0001479 0.0001484 0.0001488 0.0001493 0.0001497 0.0001502 0.0001506 0.0001511 0.0001515 0.0001520
ar;fls 0.0000282 0.0000283 0.0000284 0.0000285 0.0000286 0.0000287 0.0000287 0.0000288 0.0000289 0.0000290 0.0000291 0.0000292 0.0000292 0.0000293 0.0000294 0.0000295 0.0000296 0.0000297 0.0000298 0.0000299 0.0000299 0.0000300 0.0000301 0.0000302 0.0000303 0.0000304
328 327 326 325 324 323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308 307 306 305 304 303 302 301 300 299 298 297 296 295
a;,, 0.0001524 0.0000305 0.0001529 0.0000306 0.0001534 0.0000307 0.0001538 0.0000308 0.0001543 0.0000309 0.0001548 0.0000310 0.0001553 0.0000311 0.0001558 0.0000312 0.0001563 0.0000313 0.0001567 0.0000313 0.0001572 0.0000314 0.0001577 0.0000315 0.0001582 0.0000316 0.0001587 0.0000317 0.0001592 0.0000318 0.0001597 0.0000319 0.0001603 0.0000321 0.0001608 0.0000322 0.0001613 0.0000323 0.0001618 0.0000324 0.0001623 0.0000325 0.0001629 0.0000326 0.0001634 0.0000327 0.0001639 0.0000328 0.0001645 0.0000329 0.0001650 0.0000330 0.0001656 0.0000331 0.0001661 0.0000332 0.0001667 0.0000333 0.0001672 0.0000334 0.0001678 0.0000336 0.0001684 0.0000337 0.0001689 0.0000338 0.0001695 0.0000339
t
294 293 292 291 290 289 288 287 286 285 284 283 282 281 280 279 278 277 276 275 274 273 272 271 270 269 268 267 266 265 264 263 262 261
0.0001701 0.0001706 0.0001712 0.0001718 0.0001724 0.0001730 0.0001736 0.0001742 0.0001748 0.0001754 0.0001761 0.0001767 0.0001773 0.0001779 0.0001786 0.0001792 0.0001799 0.0001805 0.0001812 0.0001818 0.0001825 0.0001832 0.0001838 0.0001845 0.0001852 0.0001859 0.0001866 0.0001873 0.0001880 0.0001887 0.0001894 0.0001901 0.0001908 0.0001916
al; ns 0.0000340 0.0000341 0.0000342 0.0000344 0.0000345 0.0000346 0.0000347 0.0000348 0.0000350 0.0000351 0.0000352 0.0000353 0.0000355 0.0000356 0.0000357 0.0000358 0.0000360 0.0000361 0.0000362 0.0000364 0.0000365 0.0000366 0.0000368 0.0000369 0.0000370 0.0000372 0.0000373 0.0000375 0.0000376 0.0000377 0.0000379 0.0000380 0.0000382 0.0000383
260 259 258 257 256 255 254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235 234 233 232 231 230 229 228 227
0.0001923 0.0001931 0.0001938 0.0001946 0.0001953 0.0001961 0.0001969 0.0001976 0.0001984 0.0001992 0.0002000 0.0002008 0.0002016 0.0002024 0.0002033 0.0002041 0.0002049 0.0002058 0.0002066 0.0002075 0.0002083 0.0002092 0.0002101 0.0002110 0.0002119 0.0002128 0.0002137 0.0002146 0.0002155 0.0002165 0.0002174 0.0002183 0.0002193 0.0002203
0.0000385 0.0000386 0.0000388 0.0000389 0.0000391 0.0000392 0.0000394 0.0000395 0.0000397 0.0000398 0.0000400 0.0000402 0.0000403 0.0000405 0.0000407 0.0000408 0.0000410 0.0000412 0.0000413 0.0000415 0.0000417 0.0000418 0.0000420 0.0000422 0.0000424 0.0000426 0.0000427 0.0000429 0.0000431 0.0000433 0.0000435 0.0000437 0.0000439 0.0000441
t
226 225 224 223 222 221 220 219 218 217 216 215 214 213 212 211 210 209 208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193
0.0002212 0.0002222 0.0002232 0.0002242 0.0002252 0.0002262 0.0002273 0.0002283 0.0002294 0.0002304 0.0002315 0.0002326 0.0002336 0.0002347 0.0002358 0.0002370 0.0002381 0.0002392 0.0002404 0.0002415 0.0002427 0.0002439 0.0002451 0.0002463 0.0002475 0.0002488 0.0002500 0.0002513 0.0002525 0.0002538 0.0002551 0.0002564 0.0002577 0.0002591
0.0000442 0.0000444 0.0000446 0.0000448 0.0000450 0.0000452 0.0000455 0.0000457 0.0000459 0.0000461 0.0000463 0.0000465 0.0000467 0.0000469 0.0000472 0.0000474 0.0000476 0.0000478 0.0000481 0.0000483 0.0000485 0.0000488 0.0000490 0.0000493 0.0000495 0.0000498 0.0000500 0.0000503 0.0000505 0.0000508 0.0000510 0.0000513 0.0000515 0.0000518
192 191 190 189 188 187 186 185 184 183 182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166 165 164 163 162 161 160 159
0.0002604 0.0002618 0.0002632 0.0002646 0.0002660 0.0002674 0.0002688 0.0002703 0.0002717 0.0002732 0.0002747 0.0002762 0.0002778 0.0002793 0.0002809 0.0002825 0.0002841 0.0002857 0.0002874 0.0002890 0.0002907 0.0002924 0.0002941 0.0002959 0.0002976 0.0002994 0.0003012 0.0003030 0.0003049 0.0003067 0.0003086 0.0003106 0.0003125 0.0003145
a&, 0.0000521 0.0000524 0.0000526 0.0000529 0.0000532 0.0000535 0.0000538 0.0000541 0.0000543 0.0000546 0.0000549 0.0000552 0.0000556 0.0000559 0.0000562 0.0000565 0.0000568 0.0000571 0.0000575 0.0000578 0.0000581 0.0000585 0.0000588 0.0000592 0.0000595 0.0000599 0.0000602 0.0000606 0.0000610 0.0000613 0.0000617 0.0000621 0.0000625 0.0000629
t
158 157 156 155 154 153 152 151 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 128 127 126 125
0.0003165 0.0003185 0.0003205 0.0003226 0.0003247 0.0003268 0.0003289 0.0003311 0.0003333 0.0003356 0.0003378 0.0003401 0.0003425 0.0003448 0.0003472 0.0003497 0.0003521 0.0003546 0.0003571 0.0003597 0.0003623 0.0003650 0.0003676 0.0003704 0.0003731 0.0003759 0.0003788 0.0003817 0.0003846 0.0003876 0.0003906 0.0003937 0.0003968 0.0004000
0.0000633 0.0000637 0.0000641 0.0000645 0.0000649 0.0000654 0.0000658 0.0000662 0.0000667 0.0000671 0.0000676 0.0000680 0.0000685 0.0000690 0.0000694 0.0000699 0.0000704 0.0000709 0.0000714 0.0000719 0.0000725 0.0000730 0.0000735 0.0000741 0.0000746 0.0000752 0.0000758 0.0000763 0.0000769 0.0000775 0.0000781 0.0000787 0.0000794 0.0000800
t 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91
t
0.0004032 0.0004065 0.0004098 0.0004132 0.0004167 0.0004202 0.0004237 0.0004274 0.0004310 0.0004348 0.0004386 0.0004425 0.0004464 0.0004505 0.0004545 0.0004587 0.0004630 0.0004673 0.0004717 0.0004762 0.0004808 0.0004854 0.0004902 0.0004950 0.0005000 0.0005051 0.0005102 0.0005155 0.0005208 0.0005263 0.0005319 0.0005376 0.0005435 0.0005495
0.0000806 0.0000813 0.0000820 0.0000826 0.0000833 0.0000840 0.0000847 0.0000855 0.0000862 0.0000870 0.0000877 0.0000885 0.0000893 0.0000901 0.0000909 0.0000917 0.0000926 0.0000935 0.0000943 0.0000952 0.0000962 0.0000971 0.0000980 0.0000990 0.0001000 0.0001010 0.0001020 0.0001031 0.0001042 0.0001053 0.0001064 0.0001075 0.0001087 0.0001099
90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57
c&), 0.0005556 0.0001111 0.0005618 0.0001124 0.0005682 0.0001136 0.0005747 0.0001149 0.0005814 0.0001163 0.0005882 0.0001176 0.0005952 0.0001190 0.0006024 0.0001205 0.0006098 0.0001220 0.0006173 0.0001235 0.0006250 0.0001250 0.0006329 0.0001266 0.0006410 0.0001282 0.0006494 0.0001299 0.0006579 0.0001316 0.0006667 0.0001333 0.0006757 0.0001351 0.0006849 0.0001370 0.0006944 0.0001389 0.0007042 0.0001408 0.0007143 0.0001429 0.0007246 0.0001449 0.0007353 0.0001471 0.0007463 0.0001493 0.0007576 0.0001515 0.0007692 0.0001538 0.0007813 0.0001563 0.0007937 0.0001587 0.0008065 0.0001613 0.0008197 0.0001639 0.0008333 0.0001667 0.0008475 0.0001695 0.0008621 0.0001724 0.0008772 0.0001754
Appendix B: BonferroniMolm a*-levels t
56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29
t
a;,, 0.0008929 0.0009091 0.0009259 0.0009434 0.0009615 0.0009804 0.0010000 0.0010204 0.0010417 0.0010638 0.0010870 0.0011111 0.0011364 0.0011628 0.0011905 0.0012195 0.0012500 0.0012821 0.0013158 0.0013514 0.0013889 0.0014286 0.0014706 0.0015152 0.0015625 0.0016129 0.0016667 0.0017241
0.0001786 0.0001818 0.0001852 0.0001887 0.0001923 0.0001961 0.0002000 0.0002041 0.0002083 0.0002128 0.0002174 0.0002222 0.0002273 0.0002326 0.0002381 0.0002439 0.0002500 0.0002564 0.0002632 0.0002703 0.0002778 0.0002857 0.0002941 0.0003030 0.0003125 0.0003226 0.0003333 0.0003448
28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2
a;,, 0.0017857 0.0018519 0.0019231 0.0020000 0.0020833 0.0021739 0.0022727 0.0023810 0.0025000 0.0026316 0.0027778 0.0029412 0.0031250 0.0033333 0.0035714 0.0038462 0.0041667 0.0045455 0.005 0.0555556 0.00625 0.0071438 0.0083333 0.01 0.0125 0.0166667 0.025
0.0003571 0.0003704 0.0003846 0.0004000 0.0004167 0.0004348 0.0004545 0.0004762 0.0005000 0.0005263 0.0005556 0.0005882 0.0006250 0.0006667 0.0007143 0.0007692 0.0008333 0.0009091 0.001 0.0011111 0.00125 0.0014286 0.0016667 0.002 0.0025 0.0033333 0.005
Author Index

Abramowitz, M. 205, 242, 244; AbuShaba, R. 298; Achterberg, C. 298; Agresti, A. 23, 351, 401; Aksan, A. 6, 13, 18, 152, 191, 193, 327; Anastasi, A. 6, 173; Anscombe, F.J. 11, 62, 66, 70, 76, 78, 81, 83, 84, 378; Barth, A.-R. 174; Bartholomew, D.J. 195, 346; Bartoszyk, G.D. 220; Benjamini, Y. 87, 90; Bergman, L.R. iii, 6, 54, 55, 56, 81, 125, 173, 203, 274, 338, 340, 361; Bierschenk, B. 230; Bilalbegovic, A. 342; Bishop, Y.M.M. 152, 153, 286; Bollen, K.A. 296; Bonhoeffer, K. 293; Bowker, A.H. 262; Brandtstädter, J. 296, 300, 305, 323, 334, 335; Browne, M.W. 84; Cairns, R.B. 203; Campbell, D.T. 296; Carlin, J.B. 353, 354; Cattell, R.B. 266; Chipuer, H. 146, 154; Christensen, R. 19, 23, 32, 63, 390, 401; Church, C. 11; Clark, R. 6, 13, 18, 154, 191, 193, 327; Clinton, W.J. 34-37; Clogg, C.C. 53, 174, 195, 205, 286, 379; Cohen, J. 187; Cohn, R. 281, 282; Cook, T.D. 296; Copenhaver, M.D. 90, 93, 94; Cribbie, R. 87; Darlington, R.B. 294; Delaney, H.D. 248, 316; Diggle, P.J. 1; Dowling, E. 342; DuMouchel, W. 11, 48, 97, 99, 100, 379, 384; Dunkl, E. 350; Dunnett, C.W. 87; ElKhouri, B.M. 173, 338, 361; Erlbaum, L. iii; Essex, M.J. 6, 13, 18, 153, 191, 193, 327; Everitt, B.S. 66, 353; Evers, M. 19; Feger, H. 281; Feller, W. 54, 55; Fienberg, S.E. 59, 138, 152, 153, 154, 286, 323, 324; Finkelstein, J. 94, 115, 116, 339; Fisher, R.A. 244; Fitzgerald, H.E. 327, 331; Fleischmann, U.M. 136, 159; Foster, F.M. 167-170; Friendly, M. 330; Funke, J. 164, 166, 167; Funke, S. 361, 371; Funke, W. 164, 166, 167; Gabriel, K.R. 89; Gebert, A. 28, 106; Gelman, A. 353, 354; Glück, J. 122, 309, 317, 318, 320, 327, 377; Goldsmith, H.H. 6, 13, 18, 154, 191, 193, 327; Goldstein, H.I. 248; González-Debén, A. 179, 223-225, 379; Goodman, L.A. 20, 180, 181, 183, 186, 378; Gorham, D.R. 259; Gortelmeyer, R. 13, 219, 302, 342, 357-359; Gottlieb, G. 1, 305; Graham, P. 309-311; Greenacre, M.J. 125; Gutiérrez-Peña, E. ii, iii, 11, 48, 97, 99, 100, 102, 109, 317, 347, 356-358, 360, 371, 372, 376, 377, 384; Haberman, S.J. 62, 63; Hammond, S.M. 361; Hanks, P. 11; Hartigan, J.A. 125, 330; Havránek, T. 12, 21, 118, 136, 139, 238; Hawkins, M.M. 157; Hayes, A.F. 294; Hebben, H. 361; Heilmann, W.-R. 1, 42, 58, 139; Hennig, J. 345; Hochberg, Y. 87, 90, 91, 94, 95; Holland, B. 87; Holland, B.S. 90, 93, 94; Holland, P.W. 152, 153, 286, 296; Holm, S. 88, 91-94, 268, 270; Hommel, G. 64, 86, 87, 90-95, 107, 112, 119; Horn, J.L. 167-170; Hu, T.-C. 54; Huberty, C.J. 87, 89, 90; Hume, D. 295; Hussy, W. 241, 246, 251, 252; Hütter, U. 137, 151; Hyde, J.S. 6, 13, 18, 154, 191, 193, 327; Indurkhya, A. 5, 6, 30, 82, 83, 272, 274, 277, 323; Ising, M. 236, 237; Jacobs, A. 249; Jacobson, L.P. 102; Janke, W. 236, 237; Jobson, J.D. 32; Jones, L.V. 87; Jöreskog, K.G. 142, 256, 342; Kause, B. 176; Keenan, D.P. 298
Keselman, H.J. 87; Keuchel, I. 230; Khamis, H.J. 394, 396, 397; Kieser, M. ii, 279, 347-353, 356, 369, 370; Kimball, A.W. 159; Kirk, R.E. 244; Klein, M.H. 6, 13, 18, 154, 191, 193, 327; Kleiner, B. 330; Klingenspor, B. 106, 111, 115; Knott, M. 195, 346; Koehler, K.J. 66; Kohnen, R. 118, 195; Kotze, P.J.V. 157; Krause, B. 176; Krauth, J. 41, 42, 48, 56, 81, 85, 88, 125, 126, 128, 139, 153, 154, 173, 177, 187, 212, 214, 216, 231, 234, 241, 251, 254, 287, 361, 375, 376, 385; Krebs, H. 236, 237; Kreppner, K. 5, 6, 19, 20, 272, 274, 277, 323, 338; Kris-Etherton, P.M. 298; Kristof, W. 294; Krüger, H.-P. 28, 106; Küchenhoff, H. 62, 64, 65, 66, 70, 71, 76, 78, 79, 81, 95, 166, 267, 270, 302, 378, 383; Kutner, M.H. 15, 242; Lange, H.-J. i; Larntz, K. 66; Lautsch, E. 64, 189, 225, 345, 361; Lehmacher, W. i, iii, 62, 63-66, 70, 71, 76, 78, 79, 81, 83, 84, 86, 87, 90-95, 107, 112, 115, 117, 166, 267, 270, 274, 302, 340, 343, 348, 352, 378, 383, 394, 430, 431; Lerner, J.V. 296, 305, 306; Lerner, R.M. 296, 305, 306, 342; Li, J. 89, 90, 91; Liang, K.-Y. 1; Lickliter, R. 305; Lienert, G.A. i, iii, 1, 5, 12, 21, 22, 28, 41-43, 48, 56, 63, 64, 77, 85, 88, 106, 110, 118, 125, 126, 128, 136, 139, 143, 153, 154, 159, 160, 164, 166, 167, 173, 174, 177, 187, 189, 190, 195, 197, 212, 214, 216, 220, 225, 230, 231, 234, 238, 241, 251, 254, 256, 257, 259, 264, 287, 293, 294, 334-336, 349, 350, 375, 376, 378, 379, 383, 385, 391, 433; Lindner, K. 1, 62, 64, 79; Lindsay, B.G. 179, 379; Ludwig, O. 1, 433; Macht, M. 236, 237; Magnusson, D. iii, 6, 173, 274; Mahoney, J.L. 149, 150, 178, 183, 184, 342; Maly, V. 42, 137; Manning, W.D. 53; Marcus, R. 90; Marsiske, M. 106, 111, 115; Maxwell, A.E. 287; Maxwell, S.E. 248, 316; McCluskey, E.J. 334; Meehl, P.E. 49, 124; Mellenbergh, G.J. 3, 143, 148, 396; Méndez Ramírez, I. 179, 379; Metzler, P. 176; Molenaar, W. 57, 58; Müller, M.J. 173; Müller, U. 139, 153; Mun, E.-Y. iii, 327, 331; Nachtsheim, C.J. 15, 242; Namboodiri, N.K. 19; Naud, S.J. 57, 58, 66, 67, 69, 78, 92; Nesselroade, J.R. 241, 249, 377; Neter, J. 15, 242; Netter, P. 118, 159, 160, 173, 195, 197, 198, 212, 345, 378; Niedermeier, K.E. 8, 14, 204, 217, 309, 327, 401; Nilsson, L.-G. 203; Ninke, L. 345; Nyborg, H. 345; Nystedt, L. 203; Ohannessian, C.M. 306; Olejnik, S. 87, 89, 90; Osterkorn, K. 56, 78; Overall, J.E. 259; Paulsen, S. 273; Pearson, K. 15; Peritz, E. 90; Perli, H.-G. 64, 86, 87, 90-95, 107, 112, 119, 349; Petkova, E. 286; Pfaundler, H. i; Planer, J. iii; Preece, M.A. 94, 115, 116, 339; Pruchno, R. 249; Puttler, L.I. 327, 331; Quine, W.V.O. 334; Rey, E.-R. 139; Riegert, D. iii; Riley, J.W. Jr. 281, 282; Riley, M.W. 281, 282; Ripley, B.D. 362, 371; Rockefeller, K. 433; Rohner, R.P. 306; Rohrmann, S. 345; Rosenthal, R. 187; Rovine, M.J. iii, 11, 78, 79, 145, 173, 180, 182, 183, 186, 187, 189; Rubin, D.B. 187, 353, 354; Rudas, T. 179, 182, 186, 248, 316, 379; Rudolph, J. 195
Author Index 70,72 Schmitt, N. Schneider, J. 28 Schneider-Dtiker, M. 225,226 Schumacker, R.E. 189 iii, 27,32,33,43, Schuster, C. 139, 140, 142, 143, 145, 152,285,296,376,377 Schtitt, W. 1,58 Schiitze, Y. 273 246 Selder, H. Shaffer, J.P. 86 Shapiro, A. 84 Sher, K. ii, 347,354 Shihadeh, E.S. 286 Sidak, Z. 89 Simes, R.J. 88 Smider, N.A. 6, 13, 18, 154, 191, 193,327 Sobel, M.E. 296 S&born, D. 142,256,342 Spiel, C. 11,81, 145, 173, 180, 182, 183, 186, 187, 189,262,378,390,394 Stegmiiller, W. 296 Stegun, I.A. 205,242,244 Steiger, J.H. 85 Stemmler, M. 214 Stern, H.S. 353,354 Stern, W. 536 Stevens, S.S. 205 Stirling, 84 Straube, E. 2 15,259,264 Supapattathum, S. 87,89,90 296 Suppes, P. Tan-mane,A.C. 87 Taylor, C.S. 342 Thompson, K.N. 189 Toby, J. 281,282 Toll, C. 345 Tukey, J.W. 87 Upton, G.J.G. 378 Vandell, D.L. 6, 13, 18, 154, 191, 193,327
Vargha, A. 248, 316; Velleman, P.F. 205; Venables, W.N. 362, 371; Victor, N. ii, 279, 347-353, 356, 369, 370; Villaruel, F.A. 342; Vogel, F. 189; Vogel, T. i; von Eye, A. ii, iii, 5, 6, 8, 11, 14, 15, 19, 20, 27, 28, 30, 33, 41-43, 48, 54-56, 64, 78, 79, 81-84, 94, 96, 99, 101, 102, 106, 109, 111, 115, 118, 124, 136-141, 143, 146, 154, 159, 173, 174, 180, 182, 183, 186, 187, 189, 195, 204, 214, 217, 219, 230, 236, 237, 241, 246, 247, 251, 252, 254, 256, 257, 262, 267, 272, 274, 277, 285, 294, 296, 298, 300, 305, 306, 309, 317, 318, 323, 326, 327, 331, 334, 335, 339, 342, 347, 350, 354, 356-358, 360, 362, 371, 374-378, 384, 390, 394; von Eye, D. iii; von Eye, J. iii; von Eye, M. iii; von Eye, V. iii; von Neumann, J. 236; von Sehr, L. i; von Weber, S. 225, 361; Wahlsten, D. 305; Wanberg, K.W. 167-170; Wang, C.M. 330; Ward, J.H. 110, 339; Wasserman, W. 15, 242; Weijers, H.-G. 236, 237; Wermuth, N. 154; Wertheimer, M. 294; Weßels, H. 19, 20
Weyers, P. 236, 237; Wickens, T.D. 280; Wilkinson, L. 52, 53, 205, 362; Williams, V.S.L. 87; Wills, S.D. 102; Wise, M.E. 66; Wolfrum, C. 139, 153
Wood, P.K. ii, 81, 345, 347, 354, 378; Yang, M.C. 351; Yates, F. 64, 244; Zeger, S.L. 1; Zerbe, G.O. 254; Zucker, R.A. 327, 331; zur Oeveste, H. 214
Subject Index

2-sample CFA 173, 189, 239, 264; alternatives 178-186; and P-CFA 174; base model 174, 177, 178; binomial effect size BES 187-189, 194; comparison of tests 177; correlation ρ 180, 182-186, 201; discrimination types 174, 194, 202, 223, 239, 255; González-Debén's π* 179, 180, 200, 201, 223-225, 238-240, 255; measures of non-independence 182; non-weighted interaction λ 181-186, 201; odds ratio θ 180, 182-186, 201, 223-225, 237, 238; of differences 221-225, 254, 255; of polynomial parameters 254, 255; original approach 173-178; relative difference Δ 181-186, 201; weighted interaction λ 181-186, 201
aggregating results 334-337
Anscombe's z 62, 66, 70-81, 83, 84, 107, 112, 119, 197, 231, 249
antitype see type
association, local 12, 18, 21, 118
binomial test 48-53, 60, 67-75, 78-81, 122, 132, 155, 161-163, 185, 221, 225, 226; conservative 49; DeMoivre-Laplace approximation 55, 56; exact 49; normal approximation 55, 66, 70-78, 178, 179, 185, 254, 318, 320; other approximations 57, 58; Stirling approximation 54, 70-81
base model 3, 8, 12, 22, 231, 310, 311; admissibility 27-31, 44, 234; global 40-41, 105-124, 141, 142; grouping 40-43; hierarchy 105, 106; log-linear 19-27; regional 41-43, 125-172; selection 43-45; wrong choice 37-40
Bayesian CFA 353-360, 371-374; 4 steps 353, 354; first order 357, 358; patterns of types/antitypes 356, 357; posterior distribution 355; priors 354
Bonferroni α protection 87, 88, 93, 95, 107, 122, 132, 150, 155, 157, 161, 162, 163, 167, 178, 186, 191, 197, 200, 221, 226, 231, 234, 237, 239, 249, 254, 272, 275, 282, 287, 298, 302, 306, 311, 313, 318, 320, 336, 340, 343, 348, 352
causality 295-309; criteria 295, 296; fork 301, 302, 305; reciprocal causation 305, 308; wedge 296, 297, 300
CCA see zero order CFA
CFA basic concepts 1; confirmatory 124, 258, 259, 349, 350, 351, 359; 5 steps 8-13, 14-18; exploratory 14, 123, 258; goals vs. log-linear modeling 21
CFA of differences 205-228; and estimation of expected frequencies 216; a priori probabilities 216-220, 222, 223; and polynomials 208; ascending differences 206, 207; descending differences 206, 207; equidistance 207; first differences 206, 207, 212-218, 226, 227; higher order differences 207, 213-215, 227, 228; identification of errors 209-211; method of differences 205-228; second differences 206, 207, 212-219, 227, 228; selection of base models 227
CFA of level, variability, and slope of series 229-277
Chi²-test 58-62, 66-84, 107, 112, 120, 151, 161, 176, 186, 187, 193, 197, 223, 225, 234, 238, 239, 249, 262, 264, 272, 287; comparison with z 60-62; Krause-Metzler approximation 176; normal approximation 59-62, 70-84, 176, 177, 185; with continuity correction 176, 200
cluster analysis 338-340, 345, 346
collapsibility 151, 286
computer programs 361-399; CFA 2002 374-399
conditional CFA 152, 153
Configural Frequency Analysis see CFA
Configuration (def.) 2; vs. profile 5
correlation patterns 265-268
covariates 309-323; categorical 309-316; continuous 316-323; maximum number 317
Delta option 53
descriptive measures for global CFA 97-104; comparison with π 99-104
design matrix 8, 23-25, 107, 111, 117, 119, 120, 133, 178, 196, 272, 282, 402, 405; indicator matrix 8
deviation from independence 8, 10, 13; Goodman's 3 elementary views 180-186; marginal-dependent vs. marginal-free 11
differential psychology 6, 7
discriminant analysis 342, 344-346
first order CFA 18, 22, 28, 29, 36, 41, 62, 110-115, 123, 144, 146, 185, 191, 226, 227, 235, 249, 275, 302, 321, 322, 325, 327, 337, 338, 343; and zero order CFA 112; of differences 220, 221
Fisher's exact test 175, 176, 201, 264
graphical display of CFA results 326-333; bar charts 327-330; mosaic displays 330-333
groups of cells see patterns of types and antitypes
higher order CFA 143, 145
Hochberg α-protection 89, 90, 94
Holland & Copenhaver α-protection 90
Holm α-protection 88, 89, 93, 115, 268, 270
Hommel α-protection 90-93, 107, 112, 119
hypergeometric tests 62-65; Lehmacher's test 62-64, 66, 70-81, 83, 84, 115, 191, 274, 340, 343, 348, 352; with Küchenhoff's continuity correction 64-66, 70-81, 95, 167, 267, 268, 270, 302
Interaction Structure Analysis (ISA) 41, 42, 125-139, 285, 287; 3 or more groups 136-139; and k-sample CFA 195-202; and P-CFA 139, 140, 149-152; base model 127, 130, 136, 195-197; generalized ISA 42; groupings 127, 129, 130, 137, 138; higher order interactions 126; of shifts in location 236
I-States as Objects Analysis (ISOA) 338-340
jack-knife methods 183, 185
k-sample CFA 43, 173-202, 285; and ISA 195-202
Kimball's equation 159, 163
level and trend in a series 240-255; cubic trend 241, 246; linear trend 241, 246, 247; quadratic trend 241, 246, 247, 254
log-linear modeling 9, 26-28, 38, 52, 106, 107, 117, 119, 122, 130, 131, 139-142, 177, 178, 195-197, 271, 272, 281, 310, 311, 317, 325, 338, 345, 346, 350, Appendix A; general model 19; quasi-independence model 281, 347, 349; vs. CFA 430-432
log P 97, 98, 108, 114, 134, 135; compared with π and RR 99-104
longitudinal CFA 203-277; time series 203, 204
main effect model see first order CFA
Meehl's paradox 49-52, 124
mosaic displays see graphical display
multivariate distances 265, 268-271
null hypothesis 47, 48, 173
null model see zero order CFA
ordinal variables in CFA 323-326
parsimony 31; and CFA base models 284-293
patterns of types and antitypes 293-295
P-CFA see Prediction CFA
Person Orientation 1, 6, 45, 173, 203; 5 propositions 6, 7; vs. Variable Orientation 45, 155
polynomials 241-255; and 2-sample CFA 254; and regression 246, 247; approximation 242, 243; degree 241, 242; equidistant points 241, 244; interpretation 246; non-equidistant points 241, 251-254; orthogonal 243-247; parameters 241, 242, 244, 245, 247, 251-253
power 60, 316; and selection of CFA tests 65-69; differential 81-85; Naud's simulations 66-69
Prediction CFA 9, 23-25, 42, 43, 127, 139-172, 189, 191, 285, 297, 298, 301, 302, 340; base models 140-146, 151, 152, 161; biprediction 159-164, 255, 257; conditional 151-157; directed variable relations 142; prediction coefficients 164-172; vs. 2-sample CFA 174; vs. ISA 139, 140, 149-152
protecting α 12 (see also Bonferroni, Hochberg, Holland & Copenhaver, Holm, Hommel et al.); comparison of methods 91-95; local, global, multiple level 86; methods 85-99; relaxed protection 87
Relative Risk (RR) 10, 11, 97, 98, 108, 114, 134, 135; compared with π 99-104
sampling scheme 9, 13, 31, 43; implications for CFA 34-40; multinomial 9, 31, 32, 40, 62, 64, 67, 114, 138, 141, 143, 286, 289, 354; product multinomial 10, 31, 33, 34, 40, 62, 67, 114, 138, 141, 143, 286, 289, 354
SAS® 15, 353, 361
second order CFA 41, 118-121, 142, 143, 285
series that differ in length 256-259; criteria 256, 257
shifts in location 229-236; anchors 230; size of table 230, 231; transformations 230
significance tests 11 (see also Anscombe's z, binomial test, Fisher's exact test, hypergeometric tests, Lehmacher test, z-test, Pearson X², protecting α, π* test, X² approximations); capitalizing on chance 12, 86; conservative vs. non-conservative 65, 67; dependent tests 12, 85; multiple testing 86; selection for global CFA 78-81; sparse tables 69-75
S-Plus® 362, 371-374
SPSS® 15, 290
Stouffer test 294
structural zeros 117; in CFA 280-284
SYSTAT® 15, 52, 53, 242, 353, 362-371
third order CFA 121-124
transformations 214-216; and size of table 214, 215
treatment effects 259-265; diagonal-half sign test 261, 262; pre-post designs 259-262; with control group 259, 263-265
two-sample CFA 11
type/antitype 3, 7, 9; correlation type 183; discrimination type 4; interpretation 12, 27, 44, 81; interaction type 183; weighted interaction type 183
unidimensional CFA 271-274; base model 271, 272
validity, external 13, 342
variability in a series 236-240; transformations 237; von Neumann's variance 236, 265
variable independence see first order CFA
Victor and Kieser's CFA 347-353; stepwise search 352, 353
within-individual CFA 274-277; base model 275
X²-test see Chi²-test
zero order CFA (CCA; null model) 10, 27, 29, 40, 41, 106-110, 114, 115, 144, 226, 227, 272, 274; vs. cluster analysis 108; vs. first order CFA 112
z-test 66, 70-76, 114, 147, 150, 157, 185, 201, 225, 282, 298, 306, 311, 313, 336