New Frontiers in Comparative Sociology
International Studies in Sociology and Social Anthropology Series Editor
David Sciulli, Texas A&M University
Editorial Board
Vincenzo Cicchelli, Cerlis, Paris Descartes-CNRS
Benjamin Gregg, University of Texas at Austin
Carsten Q. Schneider, Central European University Budapest
Helmut Staubmann, University of Innsbruck
VOLUME 109
New Frontiers in Comparative Sociology Edited by
Masamichi Sasaki
LEIDEN • BOSTON 2009
The contents of this volume have previously been published in Volumes 1–6.3 of Brill’s journal Comparative Sociology.
This book is printed on acid-free paper.
Library of Congress Cataloging-in-Publication Data
New frontiers in comparative sociology / edited by Masamichi Sasaki.
p. cm. -- (International studies in sociology and social anthropology ; 109)
Includes bibliographical references and index.
ISBN 978-90-04-17034-6 (hardback : alk. paper)
1. Sociology. I. Sasaki, Masamichi S. II. Title. III. Series.
HM585.N459 2008
301--dc22 2008034492
ISSN 0074-8684
ISBN 978 90 04 17034 6
Copyright 2009 by Koninklijke Brill NV, Leiden, The Netherlands. Koninklijke Brill NV incorporates the imprints Brill, Hotei Publishing, IDC Publishers, Martinus Nijhoff Publishers and VSP.
All rights reserved. No part of this publication may be reproduced, translated, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission from the publisher.
Authorization to photocopy items for internal or personal use is granted by Koninklijke Brill NV provided that the appropriate fees are paid directly to The Copyright Clearance Center, 222 Rosewood Drive, Suite 910, Danvers, MA 01923, USA. Fees are subject to change.
Printed in the Netherlands
CONTENTS

Introduction
Masamichi Sasaki

PART ONE
METHODS IN COMPARATIVE SOCIOLOGY

Strategies in Comparative Sociology
Mattei Dogan

Methods for Assessing and Calibrating Response Scales across Countries and Languages
Tom W. Smith, Peter Ph. Mohler, Janet Harkness, and Noriko Onodera

PART TWO
RADICAL SOCIAL CHANGE

The Transition to Capitalism in China and Russia
Erich Weede

Social Structure and Personality during the Process of Radical Social Change: A Study of Ukraine in Transition
Melvin L. Kohn, Valeriy Khmelko, Vladimir I. Paniotto, and Ho-fung Hung

PART THREE
VALUES, CULTURE AND DEMOCRACY

A Theory of Cultural Value Orientations: Explication and Applications
Shalom H. Schwartz

Islamic Culture and Democracy: Testing the ‘Clash of Civilizations’ Thesis
Pippa Norris and Ronald Inglehart

The Cultural-Economic Syndrome: Impediments to Democracy in the Middle East
Brigitte Weiffen

PART FOUR
INSTITUTIONS IN COMPARATIVE PERSPECTIVE

Running Uphill: Political Opportunity in Non-Democracies
Maryjane Osa and Cristina Corduneanu-Huci

Does a Strong Institution of Religion Require a Strong Family Institution?
Kristen R. Heimdal and Sharon K. Houseknecht

PART FIVE
SOCIAL PROCESSES

Globalization and Income Inequality in the Developing World
Margit Bussmann, Indra de Soysa, and John R. Oneal

English as an International Language in Non-Native Settings in an Era of Globalization
Masamichi Sasaki, Tatsuzo Suzuki and Masato Yoneda

A New Test of Convergence Theory
Robert M. Marsh

Notes on Contributors
Index
INTRODUCTION

Masamichi Sasaki

New Frontiers in Comparative Sociology is a collection of notable papers from the journal Comparative Sociology, gathered from its first six volumes spanning the period 2002–2007. Choosing from among all the journal’s outstanding papers published during that period was, of course, not an easy task. Several members of the journal’s editorial board were polled for their suggestions. This work is the result of that sometimes agonizingly difficult selection process.

Equally difficult was selecting a title for this work. Among numerous potential titles, the idea of new frontiers stood out, as it suggested leading-edge work in the burgeoning science of comparative sociology. Indeed, the discipline is bursting with astute analyses of a globalizing world in transition. Given that not all scholars and interested laypersons are well acquainted with comparative sociology, this book was viewed as an opportunity to show unfamiliar readers the relevance of comparative sociology to the new world order.

New Frontiers in Comparative Sociology has been organized into five parts: Methods in Comparative Sociology; Radical Social Change; Values, Culture and Democracy; Institutions in Comparative Perspective; and Social Processes. Taken together, the articles in this book each highlight one or more aspects of comparative sociology: some theoretical, some methodological, and some substantive. Some compare social entities in subjective, case-study fashion, while others report on rigorous social research and analyses. Thus, all contribute in one form or another to describing the many and varied facets of the exciting “new” science of comparative sociology.

Methods in Comparative Sociology

There were several reasons for choosing Mattei Dogan’s “Strategies in Comparative Sociology” as the lead chapter of this work. Principal among these is Dogan’s broad-stroke portrait of comparative sociology.
Dogan details 15 strategies (methods, methodologies, techniques)
for conducting comparative sociology, but first he gets at the root of the concept of comparison and what it means in various scientific contexts. In the social sciences, he points out, “There is not a single sociological theory that has not been invalidated in some cases: in the social sciences there are very few paradigms.” This, then, calls for varied and abundant methodologies, both in terms of actual scientific research and statistical methods, or techniques, and also in terms of general strategies. To put this in context, he describes some excellent analogies from comparative architecture.

Dogan then proceeds to give concise descriptions of his 15 strategies. They are: (1) “comparing by replication of single case studies”; (2) “comparison by ideal types and by empirical typologies”; (3) “binary comparison”; (4) “comparing similar countries”; (5) “comparing contrasting countries by functional equivalence”; (6) “conceptual homogenization of a heterogeneous domain”; (7) “worldwide statistical comparisons”; (8) “cross-national comparisons of intra-national diversities”; (9) “longitudinal, diachronic and asynchronic comparisons”; (10) “comparison of causal relationships staggered over time”; (11) “comparison by composite indices”; (12) “comparison by scoring and scaling as a substitute for formal statistics”; (13) “comparing ecological environments”; (14) “comparing mini-states and mega-cities”; and (15) “anomaly, deviance, exceptionalism and uniqueness in comparative perspective.” Dogan concludes by pointing out that one must carefully select the appropriate and relevant strategies for a given comparative endeavor.

In “Methods for Assessing and Calibrating Response Scales across Countries and Languages,” Tom Smith, Peter Mohler, Janet Harkness and Noriko Onodera address the very real challenges inherent in comparative attitude surveys. How does one design an attitudinal questionnaire that is useful in multiple languages, countries and cultures?
How does one achieve the levels of comparability required to justify drawing comparative conclusions across nations, cultures, and peoples? This chapter focuses in particular on the construction and use of response categories for questions. The authors go to great lengths to explore the uses and linguistic nuances of unipolar and bipolar response categories such as “agree/disagree,” “important/not important,” and “in favor of/against” (along with appropriate positive and negative modifiers for increasing or decreasing the intensity of the response).
The authors explore several issues, including “how response categories influence the reported distributions of results,” measuring response category intensities, and using alternative response scales. These inquiries are carried out using American and German pilot studies and a Japanese replication. Extensive results are presented with a view toward seeking optimal techniques for constructing attitudinal survey questionnaires and their respective response categories. Finally, the authors propose numerous areas for further investigation with the aim of “achieving equivalence in cross-cultural, multiple-language surveys.”

Radical Social Change from Socialism to Capitalism

In “The Transition to Capitalism in China and Russia,” Erich Weede takes us on a comparative socio-economic tour of post-communist Russia and China. Relying heavily on pre-communist and post-communist economic statistics, Weede explores the relative failures of Russia vis-à-vis China with respect to the two countries’ socio-economic well-being. Indeed, in nearly every instance, Russia lags behind China: on per capita GDP, on numerous growth indices, on privatization (of agriculture in particular), on governance, on foreign investment, on capital flight, and so on. Along the way, Weede continually asks why. Inequalities are worse in Russia. Russia is far more ethnically heterogeneous, whereas China is much more homogeneous. The author notes, for instance, that “Only China has overcome collective agriculture, not yet Russia.” China has managed to establish market-preserving federalism, whereas Russia’s federalism is still “market-hampering.” China has proactively promoted foreign trade for many years, whereas Russia rests on the laurels of its depleting natural resource exports. All in all, Weede makes convincing arguments for the superior socio-economic performance of China over Russia in their post-communist years. At the same time, the author gives us a solid example of a two-country comparative case study.
In “Social Structure and Personality during the Process of Radical Social Change: A Study of Ukraine in Transition,” Melvin Kohn, Valeriy Khmelko, Vladimir Paniotto, and Ho-Fung Hung report on a subset of a massive cross-national comparative effort. This chapter focuses on analyses of surveys conducted in Ukraine shortly after the dissolution of the Soviet Union and then again three to three-and-a-half years later, much of which is then juxtaposed with analyses of similar investigations in Poland, the United States and, to a lesser extent, Japan. The surveys analyzed were conducted face-to-face among Ukrainian men and women in 1992–1993 and again in 1996.

Kohn’s group reports a number of very surprising findings. (It must always be kept in mind that these surveys were conducted during a period of radical social change.) For example, the investigators found that “the over-time correlations – the stabilities – of two underlying dimensions of personality – self-directedness of orientation and a sense of well-being or distress – were startlingly low” (emphasis added). Indeed, these and many other similar findings “flew in the face of” those of myriad previous studies. For persons not to experience changes in their levels of well-being or distress when their social positions changed strikingly was at first so implausible that the researchers expended a great deal of effort to turn up any methodological flaws, but none of any consequence were found. Among the many other interesting findings was that “change in none of the component dimensions of social stratification . . . is significantly related to change in either self-directedness of orientation or intellectual flexibility. . . .” Indeed, the list of such “startling” findings could go on and on. The substantive complexity of one’s work emerges as one of a number of interesting variables. Nonetheless, it is difficult to summarize this exceptional comparative study in just a few short paragraphs. The implications for further cross-national case studies and comparisons are enormous and the authors urge such pursuits.

Values, Culture, and Democracy

In “A Theory of Cultural Value Orientations: Explication and Applications,” Shalom Schwartz presents a theory of “seven cultural value orientations that form three cultural value dimensions.” Much of the work compares the theory to those of Hofstede and Inglehart.
Schwartz draws on data from 73 countries to validate his seven orientations. Ultimately Schwartz’s theory “yields three broad dimensions”: autonomy versus embeddedness (where autonomy subdivides into intellectual and affective autonomy), egalitarianism versus hierarchy, and mastery versus harmony. All three of these cultural dimensions “contribute uniquely to the explanation of important social phenomena.” Schwartz describes numerous analyses of the data, which reveal a circular pattern to the
seven orientations, which in turn explicates their common and opposing characteristics (interdependent as opposed to orthogonal) and ultimately boils down to the three dimensions.

Throughout the work, Schwartz compares his findings to those of Hofstede and Inglehart. Though simpler, their two approaches mesh and agree with Schwartz’s exceptionally well. For instance, they all yield similar world cultural regions: African, Confucian, East-Central European, English-Speaking, Latin American, South Asian, and West European. Considering the differences in the approaches, the fact that these same regions emerge in nearly all instances is quite remarkable. All orientations were examined closely for their associations with socio-economic development, demographics, and attitudes and behavior (moral, political, etc.). Also notable is confirmation of findings regarding “countries as cultural units,” thus supporting “the idea of national cultures.” The positioning of ethnicity in this complex equation is also described in some detail. Opportunities for future research abound, and Schwartz’s work suggests numerous avenues for such pursuits.

Pippa Norris and Ronald Inglehart, in “Islamic Culture and Democracy: Testing the ‘Clash of Civilizations’ Thesis,” explore Huntington’s hypothesis about the clash of civilizations from political and social perspectives. They begin by detailing Huntington’s thesis, which proposes that, while culture does matter, it is attitudes toward and expressions of democracy that lie at the root of events such as 9/11. Norris and Inglehart examine evidence from the World and European Values Surveys for the period 1995 to 2001, focusing on attitudes toward four key political values (democratic performance, democratic ideals, religious leaders, and strong leaders) and four key social values (gender equality, homosexuality, abortion, and divorce).
Cultural regions of the world are extensively described so that a comparison can be made between regions characterized by Islamic culture and other regions. The authors’ findings do not support the “core components” of Huntington’s “clash of civilizations” thesis, i.e., “societal values in contemporary societies are rooted in religious cultures; the most important cultural division between the Western and Islamic world relates to differences over democratic values; and, in the post-Cold War era, this ‘culture clash’ is at the source of much international and domestic ethnic conflict.” To the contrary, Norris and Inglehart found that there is hardly any difference at all between the Islamic world and the West in terms of political attitudes toward democracy in practice, attitudes
toward democratic ideals, and overall disapproval of strong leaders. Indeed, they found that the so-called democratic “clash” was in fact between the post-Communist states and most of the rest of the world (including “both Western and Islamic nations”). They next point out that while support for religious political leaders was stronger in the Islamic societies, this was also the case in a number of other, non- or minimally Islamic, cultural regions (such as sub-Saharan Africa and Latin America). Finally, in studying the results of the four key social attitudinal indicators, Norris and Inglehart found that “there is a substantial cultural cleavage, although one underestimated by Huntington, in social beliefs about gender equality and sexual liberalization.” Here the West is far more liberal with respect to gender equality and sexual liberalization than all other cultural groups, and especially the Islamic nations.

At first glance, Brigitte Weiffen’s “The Cultural-Economic Syndrome: Impediments to Democracy in the Middle East” would seem to directly contradict the preceding chapter by Norris and Inglehart. This is not the case, however, as Weiffen is studying the political realities of democracy in Muslim and non-Muslim states, whereas Norris and Inglehart are addressing the political attitudes and aspirations of survey respondents to the ideas and ideals of democracy. Weiffen proposes that there is a “cultural-economic syndrome” which afflicts Muslim states, based on a relatively overt “resistance to democratization” in the Middle East. Culturally, this resistance is attributed to the role of Islam, and economically it is attributed to oil wealth. Weiffen sets out to show that these two factors “mutually reinforce each other” to create the syndrome which hinders or retards the emergence of democratic regimes in the Middle East.
Weiffen analyzes data from diverse sources based upon a matrix of religious orientation and democratic aspirations. These in turn are analyzed within the context of oil-wealthy states and non-oil-wealthy states. She concludes that “in countries where oil wealth and Islamic cultural tradition are at work, religious doctrine, political authoritarianism and [oil] wealth . . . mutually reinforce each other in blocking the democratic option.” In those Muslim states without oil wealth, democratization is a much more likely phenomenon. Finally, Weiffen spends quite a bit of time speculating upon what will happen to Middle Eastern states as their oil wealth runs dry. In so doing, she makes clear that “Islam does not inherently make democracy impossible. It hinders democracy mainly as long as
Islamic doctrine is interpreted by autocratically-minded leaders or would-be autocrats.”

Institutions in Comparative Perspective

In “Running Uphill: Political Opportunity in Non-Democracies,” Maryjane Osa and Cristina Corduneanu-Huci describe a fascinating and expansive analysis of factors affecting social mobilization in non-democracies. They studied 24 cases in 15 “stable non-democratic regimes” “to determine conditions of political opportunity in high-risk authoritarian contexts” using Ragin’s Boolean method of Qualitative Comparative Analysis (QCA). This methodology greatly extends the power of otherwise relatively anecdotal comparative case studies. Osa and Corduneanu-Huci began their analysis by selecting four previously identified factors for mobilization: state repression, elite divisions, influential allies, and media access. They later found that adding social networks increased the model’s robustness considerably.

The analysis produced four “prime implicants, or specific combinations of political opportunity variables resulting in mobilization, and two prime implicants associated with its non-occurrence.” Among their specific results, they found that increasing repression contributes to mobilization when it occurs in conjunction with increased media access and a unified elite but no influential allies. Decreasing repression in conjunction with media access, a divided elite, and social networks also contributes to mobilization. Among the six cases where mobilization did not occur, two prime implicants were identified: (a) decreasing repression, unified elites, lack of media access and social networks, but with an influential ally present; and (b) unified elite, increasing repression, lack of media access, no allies, no social networks. Finally, in a reduction of the conditions to their most minimal level, media access and social networks emerged. What is interesting is that there was no single necessary condition for political opportunity.
The prime implicants identified differing combinations of the five independent variables studied. The authors present a number of specific policy implications as a consequence of their findings, such as supporting “cultural and educational exchanges,” and subsidizing “uncensored broadcasts such as Radio Free Europe.” External allies alone, however, are not sufficient to serve as a condition to create political opportunities for social mobilization.
Kristen Heimdal and Sharon Houseknecht, in “Does a Strong Institution of Religion Require a Strong Family Institution?” explore the relationships between the institutions of family and religion, principally by using World Values Survey data from 1990. Their main focus is whether one of these institutions has primacy over the other. They emphasize that this cross-sectional, cross-national (i.e., non-longitudinal) work is exploratory in nature; that is, “the goal here is not to see whether strength in family is a predictor of strength in religion, but rather to see whether strong family seems to be a requisite [i.e., ‘necessary but not sufficient’] for strong religion.” Forty-one countries were included in the analyses. One of the key features of the analyses is that they look at both attitudinal and behavioral measures of family and religion. Among the contextual variables included in the analyses are Catholic predominance, communist/communist transition status, level of democracy, level of development, degree of urbanization, per capita GDP, and educational level.

The authors found that their original hypotheses were “compellingly” supported. Family strength, at least at the time of this study (circa 1990), was not seen to have declined as much as some other investigators had suggested, nor was religious decline seen as inevitable. Finally, the authors caution that the available data have certain limitations, which suggest one of many solid opportunities for further research. For example, longitudinal studies, they note, would be of immense value.

Social Processes

In “Globalization and Income Inequality in the Developing World,” Margit Bussmann, Indra de Soysa and John Oneal look at income inequality in 72 countries as juxtaposed with these countries’ levels of foreign direct investment (a proven measure of globalization) over the period 1970 to 1990. Other political, economic, and social factors are brought into the study as well.
For many years it was thought that globalization negatively impacted income distributions in developing countries, toward increases in income inequalities. However, some studies have found this not to be the case. Indeed, there is a substantial amount of controversy when studying results related to income inequality and its commonly associated social indicators. Drawing on data from UNCTAD and the World Bank, as well as income inequality data from Deininger and Squire, the authors studied
“the influence of foreign investment to gauge the consequences of globalization.” They looked not only at the Gini index but also at the income of the poorest 20% of each country. They found “no evidence that globalization has adversely affected national income inequality.” With one minor exception, this was an across-the-board conclusion. Contrary to earlier thinkers who claimed globalization was harming the developing world, the authors’ findings reinforce recent studies that suggest globalization “has reduced global income inequality.” In conclusion, though, the authors point out that “our analyses make clear the limits of our understanding of the determinants of income inequality: We know more about what does not affect the distribution of incomes than what does.”

In “English as an International Language in Non-Native Settings in an Era of Globalization,” Masamichi Sasaki, Tatsuzo Suzuki, and Masato Yoneda underscore that language is an “unquestionable prerequisite for human communication,” and is thus intrinsic to sociology. The apparent dominance of English is addressed in some detail, and attitudes toward that dominance are examined through data from Tokyo’s National Language Research Institute’s 1996–1998 cross-national surveys in 25 nations where English is not the native language. The surveys asked about one’s preference (and tendency) for one’s mother tongue in talking with foreigners inside one’s own country. They also asked which language or languages would be essential for international communication in the future, as well as for communication within one’s own country, and which languages one would like one’s children to learn. Respondents were asked to agree or disagree about English as the world’s dominant or most influential language. Finally, respondents were asked what they thought about English’s dominance.
Using cross tabulations and correspondence analyses, the authors identified clusters of nations: (A) speakers tend to use their native language, do not think English dominance is good and advocate greater use of other languages; (B) speakers tend to use their native language, do not think English dominance is good but see no alternative; and (C) speakers use English when talking with foreigners and think English’s dominance is good.

In conclusion, the authors confirmed that English is or is becoming the dominant international language and that there are no immediate competing languages. This does not mean, however, that non-native speakers are happy about using English. As the data clearly show, there is a great deal of ambivalence and discontent about using English. Ultimately, the results suggest a plethora of opportunities for further study.

In “A New Test of Convergence Theory,” Robert Marsh sets out to test Marion Levy’s proposition of convergence theory as related to modernization theory. That is, Marsh takes data from 148 non-modernized societies and 52 modernized societies and sets out to determine degrees of convergence or divergence on a number of relevant parameters including economic development, capitalist market economies, demographics, technology, political democracy, cognitive modernization, health, income equality, gender particularism-universalism, and information and communications. The principal proposal is that already modernized societies will show less variation on the variables used in the analysis than will non-modernized societies; i.e., that modernized societies will show increasing “structural uniformity” as opposed to greater structural variation in non-modernized societies.

In the two-part analysis, Marsh first compares already modernized and non-modernized societies on 51 variables, and then he compares, within the modernized societies only, shifts in structural uniformity across time (generally between 10 and 20 years, depending upon data available for the specific variables). In the first part of the analysis, he found that the modernized societies showed more convergence on 49 of 51 variables. In the second part, he found that, on 32 of the 45 variables, modernized societies became more convergent over time. Marsh describes how the results support Levy’s general modernization theory across a broad range of social, economic, demographic and political parameters. In conclusion, Marsh roughly fits his results into four domains and shows their relative congruence with Alex Inkeles’ institution domains.
Marsh emphasizes that these findings are important for current theories of globalization. Convergence, he states, “is not some uniform process operating throughout the world. Rather, the fact of great variability among the less developed societies means that while some are converging toward the patterns of the developed world, others are diverging.”
PART ONE
METHODS IN COMPARATIVE SOCIOLOGY
STRATEGIES IN COMPARATIVE SOCIOLOGY

Mattei Dogan

Introduction

There is no such thing as comparative chemistry or contextual physics. In the natural sciences, the chain of causality is everywhere identical. In experimental physics or chemistry, discoveries have a universal validity. The social sciences, by contrast, are contextual and relativistic because of the diversity and idiosyncrasy of human societies. This is true for all living species, as has been demonstrated by the great comparativists Lamarck and Darwin. “Truth on this side of the Pyrénées, error on the other side”, said Montaigne several centuries ago. The best way to comprehend such biological and social diversity is the comparative method. But in the social sciences, there are very few theoretical explanations with universal validity or applicability. Indeed, John Stuart Mill had admitted that the method of concomitant variations could not be applied to social realities. There is not a single sociological theory that has not been invalidated in some cases: in the social sciences there are very few paradigms (Dogan 2002). Such a diversity of causal relationships requires a variety of methods.

The word “method” has two meanings. The first refers to the technicalities of analysis, such as sampling in survey research, classification of aggregate data, multiple regressions, ecological inference, and so on. The second designates the general approach, the stratagem of conducting the investigation. Here we are exclusively interested in the general strategies, leaving aside the research techniques.

In comparative research, the various strategies are complementary. They can be combined and used successively at various stages of the analysis and synthesis. A concrete example could be useful, chosen deliberately from outside the social sciences. Let us make an incursion into comparative architecture, focusing on religious monuments. We may start by comparing Gothic cathedrals in Europe.
This would be a comparison between similar cases, the common trait being the Gothic style. But we will soon discover that in spite of the same basic structure, no two cathedrals are identical. The basic model is easily recognizable, but each cathedral has its own form, size, decoration and stained-glass windows. Here, the comparison underscores these detailed differences within the category of Gothic cathedrals. Suppose now that we compare the Gothic style and the Roman style. We could this time adopt the strategy of comparison between contrasting cases, since the dome of Saint Peter’s Basilica in Rome and the towers of Notre Dame in Paris are fundamentally different from each other, aesthetically speaking. We can go further and include in the comparison Protestant temples, patriarchal Orthodox churches, Buddhist temples, mosques, Greek temples, pyramids of Incas and pyramids of pharaohs. What is the common denominator of all these buildings? Clearly enough, the willingness to offer a symbolic “home” to deities and to honour the Gods. We arrive then at the concept of “sanctuary”. Around this concept we homogenize a heterogeneous domain: a common belief despite the architectural diversity. This is what can be called conceptual homogenization of a heterogeneous domain.

This rapid incursion into comparative architecture enables us to distinguish three kinds of strategies in comparative research. There are many others. Before inventorying them, it is necessary to stress that an important decision has to be taken from the outset: the need to delineate the field of research.

Overall comparisons in the classical tradition are becoming increasingly rare, because the social sciences are now more analytical and functional. Today, almost all comparative studies deal with segments, with parts of a society. Overall analyses in the tradition of Montesquieu, Spencer, or Weber are giving way to them, because the progress of knowledge leads sociologists increasingly to define and limit their field of investigation. A similar specification of research is to be found in physics.
The intellectual profile of Nicolaus Copernicus, Leonardo da Vinci, or Isaac Newton no longer corresponds to that of the leading scientists of today. The discipline matures by dividing the social reality it studies. This is not to say that the holistic perspective has been abandoned. Some great comparativists have helped to keep it alive. But one has only to consult the bibliographies devoted to comparative analysis to note the overwhelming predominance of sectoral comparisons. Very few studies attempt to compare, in their entirety, vast political and social structures. The division of the system into segments is the normal course of the comparative approach. Confronted with the complexity
strategies in comparative sociology
of the political system, unless he opts for pure theory, the researcher is led to make a choice, to divide, to select the phenomenon on which to center the comparison. The distinction between segmentation and the global approach is a matter of degree. Between the restrictive sectoral study and the global approach that loses itself in abstract theory, there lies a progression from the particular to the general. In contrasting these two facets, it is the overall method that we wish to emphasize. Comparing always involves extracting a small or large sector from a society or political system. But there is a considerable distance between, for instance, the analysis of the political behaviour of workers in two countries and the study of the aggregative functions of parties in twenty countries.

Once the comparativist has delimited the object to compare, he has to make a second important decision: the choice of countries to be included in the comparison. This choice depends on many parameters, which will emerge progressively from the fifteen strategies that we shall now review.

1. Comparing by Replication of Single Case Studies

In bibliographies and citation indices concerning comparative research, the majority of studies mentioned deal with only one country (Sigelman and Gadbois 1983). The main reason is that, even if they are not directly comparative, they help comparative research in the sense that they contain a significant theoretical-conceptual component. Another reason is the confusion, accepted by many scholars and institutions, between truly comparative studies and foreign area studies. On this point, Giovanni Sartori is intransigent: “I must insist that as a ‘one case’ investigation the case study cannot be subsumed under the comparative method though it may have comparative merit” (Sartori 1994:23). Nonetheless, the case study is advocated by many comparativists.
Harry Eckstein long ago defined its “merit”: “case studies are first and foremost, part and parcel of theory-building, not of theory controlling” (Eckstein 1975). In this sense, Sidney Verba is right to say that one can validly explain a particular case only on the basis of general hypotheses (Verba 1967:114). All the rest is less relevant, and so is of no use. Alfred Grosser puts it differently: “In a certain sense, no monograph is scientific. There is science only if the analysis of a specific subject is conceived straightaway as a case study: that is to say if one
asks the subject questions deduced from a comparative, even though brief, view of similar subjects” (Grosser 1972:137). For Henry Teune “even single-country case studies, if theoretically framed, can be used to support generalizations. Such cases also can be important first steps to selecting other relevant cases to elaborate a theoretical problem” (Teune 1990:45). B. Guy Peters concurs: “scholars may utilise the case method as their fundamental basis for methodology but then themselves accumulate a number of cases that create a theoretical whole. These scholars had a common theoretical framework, which they then applied to a series of cases. This purposive selection would not meet the canons of experimental or statistical methodology, but it still permitted these scholars to make reasonable theoretical statements with a strong comparative basis” (Peters 1998:141). Many researchers have tested or developed general models within the framework of a single country. Here we have in mind the works of Rene Dumont on India, David Apter on Ghana, James Coleman on Nigeria, Fred Riggs on Thailand, Michael Hudson on Lebanon, and Lucian Pye on Burma. If we consider this last case, we see how, by pondering the problems of this new Asian country, Pye made certain theoretical observations that have since been widely recognized and discussed in studies on political development and communication. No modern society can take shape unless complex and efficient large organizations develop; but the case of Burma makes it clear that such organizations cannot be established in the absence of informal communication between citizens; that is, in the absence of an adequate social organization. These works exemplify how a case study may bring to light significant factors and variables neglected in more inclusive comparisons. Limiting the analysis to a single country has the advantage of allowing the researcher to study the subject in depth. 
The case study becomes “heuristic”, as Harry Eckstein says, when it contributes to the refinement of a theory. To study Canada as a consociational democracy means distinguishing between explanatory elements that become integrated into the cumulative knowledge of this type of democracy. To note, for example, how little resistance the consociational model offers to excessive governmental responsibilities, as in Lebanon, leads to a better understanding of the rules of the game in such a system. These examples illustrate how the case study, far from passively depicting social features, contributes actively to their explanation. “When explanations are drawn from such single-country
studies—explanations that have theoretical or potential applicability to other contexts—such studies clearly contribute to the goals of cross-national analysis” (Mayer 1983:175). “Indirectly, case studies can make an important contribution to the establishment of general propositions and thus to theory-building in political science”, admits Arend Lijphart (1971:691), who distinguishes between six types of case studies: nontheoretical, interpretative, hypothesis-generating, theory-confirming, theory-infirming, and deviant cases.

A study covering a single country could become, retrospectively, truly comparative if it is replicated in one or several other countries, and if the replication focuses on relationships between variables and is not limited to descriptive facts. Most comparativists make a clear distinction between descriptive facts and the replication of relationships (Nowak 1977:17).

What does the word “replication” mean? In the hard sciences it means the exact repetition of another’s research design and experiments to assess whether the same conclusions can be reached. In the social sciences, a case study is replicative if it is “consciously patterned after methods, hypotheses or measures that had previously been employed in another study” (Sigelman and Gadbois 1983:279).

The nature of the problem is an important factor in deciding whether a case study will be of value for the comparativist. Studies focused on structural or systemic data have rather good prospects because the political system already provides a universal matrix; that is, it exhibits a generally relevant set of issues and allows a replication from one particular experience to another. Such replication may become more difficult when the field considered in the monograph implies the kind of intimate approach that only history can provide.
Case studies dealing with segments of the political system, such as parties or parliaments, are generally more relevant to a comparative perspective than, say, analyses devoted to ideologies.

2. Comparison by Ideal Types and by Empirical Typologies

There are two types of typologies: deductive and inductive. The deductive approach consists in building abstract types. Max Weber is the classic representative of this method. The art of constructing ideal types implies a profound knowledge of the reality and a great capacity to synthesize. Max Weber’s typology of three kinds of legitimacy—charismatic,
rational and traditional—is one of the most cited typologies in sociology and political science. Today, it is nevertheless obsolete. In the Weberian typology of legitimacy, the concept of charismatic leadership plays a crucial role. But if one tries to apply the concept of charisma to contemporary leaders, without stretching it too far, one finds through empirical research only a handful of genuine cases during the last few decades, and even fewer cases of traditional hereditary legitimacy: only three or four, if ceremonial kings deprived of real political power are excluded. Consequently, two of the three “boxes” of the Weberian typology are almost empty for the contemporary world. The third “box”, legal-rational-bureaucratic rulership, is overfull with about 180 contemporary independent countries, and it is also diluted, since it amalgamates a large variety of regimes: Latin American bureaucratic authoritarianism, Scandinavian neocorporatist democracies, African tyrannical regimes. The theoretical discomfort becomes even more acute when the empiricist finds that most regimes included in this third “box” are not even legitimate. Thus the old Weberian typology does not accommodate the majority of contemporary regimes.

This classical typology must be updated by adding a fourth type, reserved for semi-legitimate regimes, and a fifth, for totally illegitimate rulerships. On closer inspection, the researcher finds that these two new types of authority are still too heterogeneous; the choice is then between multiplying the number of types and distinguishing several sub-types. Bearing in mind that, according to classical theory, legitimacy is not a paragraph in a constitution but a belief in the minds of the people, the researcher needs empirical data. He then discovers that the legitimacy-illegitimacy dichotomy is too rigid and that another concept is needed, one that can be operationalized in empirical research by measuring, in degrees, the notions of confidence and trust.
Hence the right questions to ask are: how much confidence, by whom, in which domain? Many examples of deductive typologies could be given, starting with Aristotle. Most of them are still alive in the literature: Tönnies on community and society; Durkheim on organic and mechanical solidarity; Redfield on folk societies and urban societies; and Parsons, who suggested several seminal but very abstract typologies. But in recent decades the trend has been toward empirically grounded typologies. Classification is an old undertaking in all sciences. A typology is a multidimensional classification. The simplest typology results from
the crossing of two dimensions. Typologies of social actors are often elaborated in a single context, whereas typologies concerning systems, regimes, and societies are conceived from the outset in an international framework. Someone who observes individuals or groups can compare them without looking beyond national frontiers. He can make a non-international comparison and build a typology of leaders or voters within a single country or city. This becomes difficult, if not impossible, when the analysis deals with groups, institutions or structures existing in limited numbers only. International comparisons become more valuable when the objects of analysis are classes or parties than when the study deals with families or individuals; they are more useful for understanding pressure groups and unions than for distinguishing between the leaders of these groups. Typologies require an extension of the field across national boundaries when the number of cases is insufficient: the typology of political systems falls naturally into the hands of the comparativist.

The value of typologies of regimes, like typologies of social actors, depends on the amount of debate they generate. A certain consensus must evolve for the typology to become a real instrument of comparison. The way the reflection on authoritarian types of government gradually emerged from discussion and confrontations illustrates the point. The democracy-totalitarianism dichotomy lost most of its analytical interest with the increase in the number of “hybrid” countries. As many countries of the Third World became independent, comparativists studying these new countries rapidly found that the concept of totalitarianism was inadequate, if only because of the absence of a technical infrastructure permitting the control of individuals. There was little real analogy between tyrannical African or Asian countries and Stalinist or Nazi regimes.
Leo Strauss has rightly defined totalitarianism by two elements. Contrary to the classical tyranny, he wrote, the totalitarian regime possesses technology and ideology. This means that the will to mobilize the population totally—the ideological factor—is not sufficient to transform the new state into a totalitarian state. For that, the development of the country must be at a level that enables it to penetrate deep into the society. It is necessary that the central government be in possession of the infrastructures, the means, and mechanisms, such as the media or educational system, for effectively controlling employment, incomes, travel, voluntary associations, and military or police forces.
The great diversity of newly independent states engendered many typologies that often overlapped. What is remarkable is the consensus that has finally been reached among the greatest comparativists. The typology elaborated by Edward Shils marked a pioneering stage in this direction. Shils distinguished two intermediary types between the extreme poles of democratic and totalitarian regimes: the tutelary democracies, characterized by the hypertrophy of the executive, and the modernizing oligarchies, whose dominant trait is domination by military or bureaucratic groups unconcerned with democratizing the country. To these four types, Shils added a fifth, which is rapidly disappearing: the traditional oligarchy. James S. Coleman distinguished between three types of developing countries: competitive, semi-competitive, and authoritarian; the orientation toward modernization introduced a second axis that permitted the elaboration of five types (Coleman 1960).

More complex, more ambitious, and more abstract than typologies of actors, the global typologies have a crucial place in comparative research. From Aristotle to Max Weber, history has been marked by these constructions, the best of which were true tools in the progress of sociological knowledge. It is because the analyst tries to fill the voids left by existing conceptual frameworks that he is led to formulate new ones. There is no better generator of concepts than a good typology (Dogan and Pelassy 1990:178–9).

3. Binary Comparison

There are two kinds of binary comparisons: explicit and implicit. An explicit binary comparison is a comparison between two countries chosen according to a clear hypothesis, crossing analogies against differences. A binary analysis may be aimed at covering two countries in their entirety, but such an attempt may result in a series of parallel studies that are not directly comparative, or in a series of analyses by sectors. The need to “segment” reality appears clearly in such attempts.
A good example is the comparison of modernization in Japan and Turkey (Ward and Rustow 1964). The choice of the pair of countries is of crucial importance. Some pairs are interesting, and others meaningless. The pair Japan and the United States is an instructive example. Here is how Lipset justifies this binary comparison: Japan and the United States are two of the foremost examples of industrial success in the contemporary world, and they took very different
paths to reach that position. Efforts to account for America’s past success have emphasized that it had fewer encrusted pre-industrial traditions to overcome, in particular, that it had never been a feudal or hierarchically state church dominated society. All of Europe and, of course Japan were once feudal, organized in terms of monarchy, aristocracy and fixed hierarchy, with a value system embedded in religious institutions. (Lipset 1994:153)
Similar justifications can be found for the pair junkers and samurai, chosen by Reinhard Bendix, and for the pair Japan and China by Marion J. Levy. Binary comparison permits a kind of detailed confrontation that is almost impossible when the analysis encompasses too many cases. Binary comparison sometimes seems the best way to undertake a study that leaves out neither the specific nor the general. Comparing two countries naturally enhances one’s interest in each one; in particular it stresses the main characteristics and the originality of each situation. But binary comparison can be used not only for increasing our knowledge of two different systems. It can also contribute to an understanding of more general phenomena. In the latter cases, the two countries considered are thought of as contrasted illustrations of a theoretical reflection. When the sociologist compares the British and French industrial revolutions, he proposes an analysis worthy of consideration not only for those interested in France and Great Britain but also for those who study the dynamic of industrialization. When a comparativist contrasts the political attitudes of the working class in Britain and France, he tries to identify variables that can explain the more or less developed propensity to political radicalism. In France, workers valued the political arena as the unique site where change is to be obtained; whereas, the “pragmatism” of the British working class gave union representatives the decisive role. Why did the samurai in Japan become agents of the central power and modernization, whereas in Germany the Junkers became a conservative force? In attempting to answer this question, Reinhard Bendix was able to bring to light some phenomena of general significance. 
It was partly because Japan had withdrawn into itself that its aristocracy, unlike that of Germany, did not feel threatened; it was partly because the samurai had been deprived of private lands that they so easily adapted to city and administrative life. Structural factors such as the openness of a country and the connections people have with the land can have important effects on the behaviour of members of a society. In such cases, binary comparison may provide general illustrations of
the way in which development, modernization, or national integration come about. Binary comparison is often used for countries that show contextual similarities, even if the aim of the analysis is to bring out differences in one or more specific fields. To analyze comparatively the recruitment and tenure of Cabinet ministers in France and Britain, considered as opposing systems, might show some analogies between the two countries, for instance in the stability of a “governmental nucleus” (Dogan and Campbell 1957:313–45). Conversely, a study of political cleavages in France and Italy, in contexts considered similar, might demonstrate that various social strata do not distribute themselves similarly between political parties in the two countries. A pairing of France and Italy has a natural appeal because the two countries share many features. The France-Italy pairing has stimulated lots of comparisons between the two communist parties, which were the strongest in the Western world. Britain and the United States share other characteristics, which have encouraged many comparisons on policy-making processes, since Ostrogorski first contrasted them. It is more attractive to use pairs of countries like France and Italy, Morocco and Tunisia, Norway and Sweden, or Uruguay and Costa Rica than to compare Finland and Bolivia or Brazil and Pakistan. Some pairs will produce a great deal of interest, while others will give only meagre results. A comparison of England and Japan, as two insular nations or two maritime powers, might be very meaningful, but an attempt to compare Switzerland and Chad, as two countries having no direct access to the sea, would be of little interest. Of course, the comparativist has the liberty to establish original pairs based on his own conception of relevance. 
It would be relevant to compare India and China in the framework of a study of the choices available to overpopulated Asian countries as they try to solve problems connected with demography, underemployment, and famine. On the other hand, someone interested in power structures or mobilization of the masses in totalitarian regimes would no doubt find it more meaningful to compare Nazi Germany and the USSR. For those interested in European fascism, Germany could pair with Italy. With England, Germany forms a pair often used by those studying the industrial revolution. Germany and France can be studied together in the framework of an analysis of social stratification. These examples show the range of possibilities open to binary comparisons (Dogan and Pelassy 1990:126–8).
A binary comparison may be implicit when a foreign country is seen in relation to the observer’s own country. By a kind of dialectical process, the view from afar strengthens the perception of our own society. One knows one’s own country better when one knows other countries too. Some characteristics of French society appeared more clearly to Tocqueville when he observed American society. Lucian Pye has perceived the “non-Western political process”, comparing it implicitly to his own Western culture. As Charles C. Ragin notes, “many area specialists are thoroughly comparative because they implicitly compare their chosen case to their own country or to an imaginary but theoretically decisive ideal-typic case” (Ragin 1987:4). The implicit binary comparison is not always immune from ethnocentrism.

4. Comparing Similar Countries

Geographical contiguity is evidently neither the only nor necessarily the best means to define a relatively homogeneous universe. Yet geographical contiguity often implies certain shared cultural, economic, social and historical similarities. The regional approach presents important advantages. This strategy ensures in the most natural way a control over those variables that the observer would like to keep constant so as to better analyse other variables. The comparison between similar countries in many cases overlaps with the regional approach. Studies on Western Europe, Latin America, the Middle East and Tropical Africa have eliminated from the analysis the impact of contextual, environmental and geographical factors.

However, the relevance of the geographical area approach for comparative politics is not as straightforward as it may appear at first sight. Similarity is not necessarily linked to contiguity. Geographical proximity in itself is not always meaningful for comparing.
There are nations or political systems that belong to the same region, are contiguous, and yet are very different—such is the case, for instance, for the South East Asian countries. As John Martz has shown, students of Latin American politics remain enduringly frustrated by the problems of diversity: none of the broad theories that were applied to Latin America as a whole, such as the dependencia (Cardoso and Faletto 1978), bureaucratic-authoritarianism (O’Donnell 1973) or the transition theory (O’Donnell et al., 1986; Linz and Stepan 1978), allowed a general conceptualization covering all
countries of the region and region-wide comparative studies. There are indeed certain risks of confinement that regional studies conceal when they put the emphasis too heavily on the specificities of the region.

A comparison between “relatively similar” countries sets out to neutralize certain differences in order to permit a better analysis of others. This strategy is at the heart of the comparative method. As John Stuart Mill once stated, it is by reducing, insofar as possible, the number of interacting variables that one has the means to observe the influence of the factors one wishes to study. It is easier to test the weight of certain institutional rules on political behaviour by choosing democracies that have common creeds than by incorporating authoritarian regimes or pseudo-democracies into the analysis.

We know that the comparativist, unlike the chemist, can never eliminate the impact of environment. No two nations in the world would enable the researcher to measure the influence of the Protestant religion or certain rules of ownership “all things being equal in other respects”. What the researcher can do is to increase the pertinence of his conclusions by carefully choosing the political and social entities to be compared. For the researcher who studies political systems, analogies are to be sought either in the sociocultural environment of those systems or in their structures and features of operation. The homogeneity will be more a cultural one if, for example, Anglo-Saxon countries are chosen, and more a structural one if the researcher decides to study single-party regimes.

The strategy of comparing similar countries has been criticized by Adam Przeworski:

I do not know one single study which has successfully applied Mill’s canon of only differences [“most similar systems design” in the Przeworski-Teune 1970 terminology]. I continue to be persuaded, indeed, that the “most similar systems design” is just a bad idea.
The assumption is that we can find a pair (or more) of countries which will differ in all but two characteristics and that we will be able to confirm a hypothesis, that X is a cause of Y under a natural experiment in which a ceteris paribus holds. There are no two countries in the world, however, which differ in only two characteristics and in practice there are always numerous competing hypotheses. (Przeworski 1987)
Such a criticism is not justified because the similar countries are not chosen on simple characteristics, but on the criteria of basic analogies, such as the sociological context or the socio-economic level.
5. Comparing Contrasting Countries by Functional Equivalence

A comparison between two series of contrasting countries implies that the contrasts are of broad significance and delineate areas defined by systemic features. The notion of functional equivalence plays an important role in comparisons between contrasting countries. This approach has generated a great number of new concepts and terms. Here is an extract from Gabriel Almond’s “manifesto”:

Thus, instead of the concept of “State”, limited as it is by legal and institutional meanings, we prefer “political system”; instead of “powers”, which again is a legal concept in connotation, we are beginning to prefer “functions”; instead of “offices” (legal again), we prefer “roles”; instead of “institutions”, which again directs us toward formal norms, “structures”; instead of “public opinion” and “citizenship training”, formal and rational in meaning, we prefer “political culture” and “political socialization”. We are not setting aside public law and philosophy as disciplines, but simply telling them to move over to make room for a growth in political theory that has been long overdue. (Almond 1960:4)
In the contrasting comparison, the researcher eliminates from the analysis the secondary differences and the similarities which may persist in spite of the profound contrasts. This strategy may generate new comparisons, also contrasting, but more refined. Comparativists initially contrasted democracy and totalitarianism, and then they focused on the differences between Nazism and Stalinism. Like the mountaineer who reaches one summit only to discover another, the comparativist finds that each contrast reveals a further one. The history of comparisons by contrasts looks like a chain of high mountains.

6. Conceptual Homogenization of a Heterogeneous Domain

The ingenious comparativist chooses carefully the countries to be compared. The criterion of choice is not immediately obvious. In all cases, however, the choice must be made according to a clear concept. The comparativist can select countries from the four corners of the world, and find characteristics common to apparently dissimilar countries. It is he who creates a concept capable of homogenizing a heterogeneous series of countries. Some examples will illustrate how this conceptual homogenization is achieved. The consociational democracy was not understood or recognized until the concept was forged, describing a society segmented into
religious, ethnic or cultural communities, but where consensus was institutionalized at the summit. In the same way, the concept of neocorporatism has allowed meaningful comparisons of some European countries. Another example is the concept of the “one-dominant-party system”. In these examples, the explanatory hypothesis pre-existed the selection of countries to be compared.

A recent example of this strategy is offered by a new interpretation of the concept of “presidentialism”, by Fred Riggs (1994). Many authors have compared presidential regimes and parliamentary regimes. We have learned a lot from this kind of comparison. But Riggs presents an innovative approach by comparing presidential systems to one another. The integrating concept, presidentialism, gives coherence to a heteroclite universe: Brazil, the United States, South Korea, Chile; in total more than 30 countries. This apparently disparate aggregation is conceptually coherent. Riggs finds a contrast, within this constructed universe, between the United States and all other presidential systems. He is the first scholar to try to explain the success of the American presidential system in the light of the failures of such systems in more than 30 countries in Latin America, Africa and Asia: “We must not reject comparisons between the US and other presidentialist regimes because of the failures of the latter—rather, they provide the information we need in order to explain the relative success of the United States”. He makes a clear distinction between formal constitutional rules and parapolitical constitutional practices. He arrives at a paradoxical conclusion: “The more democratic a presidentialist regime, the more likely it is to be overthrown and replaced by authoritarianism”.
So for him the United States is an exceptional case: it is the only successful presidential system, if we exclude the particular case of the French system (which is at times super-presidential without countervailing powers, at times simply parliamentarian, but never truly presidential; it is unofficially called semi-presidential). Riggs uses two strategies of comparative research. First, conceptual homogenization; secondly, the identification of an exceptional case. The conceptual framework, more than anything else, helps the scientific knowledge of social or political phenomena to advance. The historian Paul Veyne defends this strategy, arguing that facts are emphasised by their place in an intellectual construction. “The spatiotemporal continuum is only a didactic framework that perpetuates the lazily narrative tradition. Historical facts are not organized by period or
people, but by notion; they do not need to be replaced in their time, but grouped under their concepts. History does not study man in time; it studies human materials subsumed under concepts” (Veyne 1976:49).

7. Worldwide Statistical Comparisons

“The principal problems facing the comparative method can be succinctly stated as many variables, small number of cases”. “There is, consequently, no clear dividing line between the statistical and comparative methods; the difference depends entirely on the number of cases” (Lijphart 1971). More than three decades later, with the hindsight of the progress made in comparative politics, such a statement remains convincing only for certain types of comparisons. With 200 independent nations (in 2000), the number of existing cases does not seem to be that small. In the last two decades, many insignificant variables have been abandoned and other indicators, because of their interchangeability, have been combined in indices.

Worldwide analysis, called by some scholars “holonational” (adapted from the anthropological term hologeistic), consists of the study of whole societies: it counts each country as one case, computes formal mathematical measures of relationships among variables and uses these measures to test general theories (Naroll 1972:212–3). The larger the number of countries included in the comparison, the greater the need for quantitative data.

Worldwide correlational analysis has gone through a period of stagnation and today appears an exhausted, overused form of research. The main reason for this decline is the discrepancy between the quality of statistical data for the advanced countries and for the developing ones. Scholars became aware that in comparing the two sets of countries they were dealing with material of unequal accuracy. It became clear that the lower the level of development, the lower the validity of quantitative data.
The difficulties encountered in worldwide correlational analyses mark one of the limits to statistical approaches in comparative politics. The weakness of worldwide statistical comparison can also be explained by the fact that it is based on national averages, neglecting the within-nation diversity.
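The cost of relying on national averages can be illustrated with a simple variance decomposition. The sketch below is only an illustration under stated assumptions: the two countries, their regional values and the function name `decompose_variance` are hypothetical, and every regional observation is given equal weight.

```python
from statistics import fmean

def decompose_variance(groups):
    """Split total variance into between-group and within-group parts.

    `groups` maps a (hypothetical) country name to a list of regional values.
    Each observation has equal weight, so between + within equals total.
    """
    all_values = [v for values in groups.values() for v in values]
    n = len(all_values)
    grand_mean = fmean(all_values)
    total = sum((v - grand_mean) ** 2 for v in all_values) / n
    # Between-country part: what a comparison of national averages can see.
    between = sum(
        len(values) * (fmean(values) - grand_mean) ** 2
        for values in groups.values()
    ) / n
    # Within-country part: what national averages hide.
    within = sum(
        (v - fmean(values)) ** 2
        for values in groups.values()
        for v in values
    ) / n
    return total, between, within

# Invented regional figures: near-identical national averages, wide internal spread.
regions = {
    "Country A": [10.0, 20.0, 30.0, 40.0],
    "Country B": [12.0, 22.0, 32.0, 42.0],
}
total, between, within = decompose_variance(regions)
print(f"between-country share: {between / total:.1%}")
print(f"within-country share:  {within / total:.1%}")
```

In this invented case almost all of the variance lies within the countries, so a comparison of the two national averages alone would leave most of it unexplained.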
8. Cross-national Comparison of Intra-national Diversities
With very few exceptions, cross-national comparisons use national averages. But we all know that when, in a distribution, the distance between mean, median and mode is great, the average is not a meaningful statistical value. The average does not reflect skewness: a skewed distribution pulls the mean, the median and the mode apart. The assumption is that the internal diversity of countries is less significant than the differences between them. But in reality most countries are characterized by an important internal diversity, either regional, or vertical in terms of social strata. Some of the most significant characteristics are distributed unevenly. Internal diversities can be ethnic, linguistic, religious, social or economic. Almost all countries could be ranked according to their degree of homogeneity-heterogeneity. In some matters, such as pluralism, internal diversity is an essential dimension. The internal diversity of countries is not necessarily related to their size. Some small countries are very heterogeneous and some large countries relatively homogeneous. Regional diversities are visible in all European countries except, perhaps, Denmark. There are three Belgiums, four Italys, eight Spains. In France there are old regional contrasts. Yugoslavia has exploded into six pieces. Instead of a single national average for the entire Soviet colossus, there are today 15 independent nation-states and as many national averages. Geographical diversity may be expressed in survey research by the notion of social context. When these contexts are taken into consideration, the risk of the “individualistic fallacy” (Scheuch 1966) is seriously reduced, particularly in ethnically diverse countries. “Cross-national comparison may be more fruitful when based upon within-nation comparison” (Verba 1971:309).
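The remark on skewness can be checked on a toy example. The figures below are invented, and the helper name `summarize` is hypothetical; the sketch only shows that two distributions may share the same average while differing sharply in median and mode.

```python
from statistics import mean, median, mode

def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(values), median(values), mode(values)

# Invented income figures for two hypothetical regions with the same average.
symmetric = [18, 19, 20, 20, 20, 21, 22]
skewed = [10, 10, 10, 12, 14, 20, 64]

for label, values in (("symmetric", symmetric), ("skewed", skewed)):
    avg, med, most_common = summarize(values)
    print(f"{label}: mean={avg} median={med} mode={most_common}")
```

Both lists have the same mean, but in the skewed one the typical value lies far below it, which is exactly the information a national average conceals.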
For the analysis of intra-national diversities, statisticians and geographers long ago elaborated adequate indices, such as the Gini index of inequality, translated into Lorenz curves and coefficients of dispersion. We have the appropriate tools but the standardized statistical data on internal diversity were, until recently, scarce. An important indicator of internal diversity is the degree of linguistic homogeneity, which has been quantified for a large number of countries. Many political phenomena cannot be explained by national averages. Take, for instance, the level of poverty. People do not revolt against poverty as such, they revolt against injustice; they do not revolt against
the national average of poverty. In statistical terms social inequalities may be expressed in standard deviations. In some countries, governments have been reluctant to collect and publish data on regional, ethnic or social inequalities. Nevertheless, the World Bank has published data on income inequality for many countries, and so has the OECD for fifteen Western countries. Regional disparities have been studied in many fields, including voting behaviour. Disparities among social strata and their changes have so far received little systematic comparative attention, except for Western Europe. Today we can do better. We have more data on many more countries and we know much more about the diversity within these countries. It is very likely that in the future more attention will be given to intra-national disparities because, for many significant variables, within-nation differences are larger than between-country differences. In this way it will be possible to explain a larger part of the variance.
9. Longitudinal, Diachronic and Asynchronic Comparisons
Most international comparisons concern societies which exist at a given time. They are synchronic analyses. But societies can be compared over time, even at a distance of centuries. Distinctions should be made between longitudinal, diachronic and asynchronic comparisons. The longitudinal comparison throws light on continuous, gradual evolution, long-term change, irrespective of accidents of history. It is often based on statistical series. For instance, Peter Flora has made a comparative analysis of the development of mass democracy and the redistribution of national income in some fifteen European countries over a period of more than a century (Flora 1983). Another example is the work of Tatu Vanhanen covering some 150 countries over many decades (Vanhanen 1979). But the use of statistics is not indispensable.
The longitudinal comparison is different from the analysis of social change, which emphasizes short-term changes or accelerations of history or historical turning points, and not what persists, or changes only slowly. The longitudinal comparison divides time up into periods, more or less mechanically, whereas the comparison of social change is based on successions of events, often dwelling on the concept of generation. The longitudinal comparison shows what would not appear in a factual narration.
A method that relies on successive synchronic comparisons within a chronological framework is called diachronic (Thrupp 1970). A diachronic analysis compares two or several countries at different times, leaving aside the intermediary periods; for instance a comparison of social inequalities in Europe and the United States in 1900, 1950 and 2000. The asynchronic comparison is a twofold comparison in time and space. It involves a comparison of two or more countries or cultural areas at different moments; for instance a comparison between population growth in Europe in the mid-nineteenth century and the population growth in India and Brazil in the second half of the twentieth century without checking the intermediate trend. Such a comparison would vindicate Malthus by demonstrating that he was not a false prophet. The same kind of comparison can be made of the role played by mandarins in imperial China and contemporary Japan and France. The comparison between junkers and samurais made by Bendix comes under this asynchronic strategy. Another good example is Joshua Forrest’s comparison between the weak state in post-colonial Africa and Europe in the Middle Ages. The objective of the author is to show the similarities between the two worlds. He does so on the basis of an impressive amount of empirical evidence from contemporary Africa and mediaeval Europe, concerning the following features that characterize weak states: inadequate administrative capacity; low level of state penetration because of strong local powers; the dominance of informal politics, involving personal rule, unbounded power struggles, multiplicity of factions, use of force, military involvement and coups d’état, over formal political institutions. Forrest’s asynchronic comparative analysis illuminates certain problems of contemporary political systems in Africa. 
But the author remains very cautious about the possibility of extending the Africa-mediaeval Europe comparison to the study of future trends in African politics. On the contrary, he maintains that there is no certainty whatsoever that the historical paths of African politics will be similar to those of the post-mediaeval states of Europe. Forrest’s contribution is first of all an example of the asynchronic comparative method. But it also is an illustration of the strategy of conceptual homogenization of a heterogeneous field, since his analysis is built on the concept of “weak state”, bringing together countries from two continents separated by seven to ten centuries (Forrest 1994).
10. Comparison of Causal Relationships Staggered over Time
The time dimension is important for understanding political processes and effects. Rates of change are essential for the analysis of political development. Rapid changes may have different effects from slow ones. Comparisons of rates of change may reveal important differences. Time lags are crucial in understanding causality or probabilistic influence. Everything in politics takes time, and so do all changes in society. No social change is instantaneous. Even if communications take place with electronic speed, the social impact of political decisions takes time. Even revolutions need time to engender social consequences. Nevertheless, most comparative research over the last quarter century has used synchronic data, often because they seemed to be the only ones available. For a long time most survey data were synchronic; only recently have comparative time-series become available. Synchronic political analysis was an important step, but often it could only explain a fraction of the variance. This is one reason why many findings reach only minimal results, and often are not even published. A method for dealing with time is the use of lagged variables. If we assume for theoretical reasons, or from experience, that a change in variable A will have an impact on variable B, we must still ask how much later this impact will take place and have observable results. We must compare variables A and B not at the same time, but variable A at a certain moment and B at some later time. This delay may be quite long. The introduction of compulsory primary education in several Western countries around the 1860s was followed by the rise of the “yellow press” in the 1890s. The historian Daniel Vernet has demonstrated that in France, during the eighteenth century, revolutionary ideas and behaviour spread in the countryside two decades after the rise of radical ideas in the main cities.
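The use of lagged variables described above can be sketched in a few lines. This is a minimal illustration, not a recommended research design: the two series are invented so that B follows A with a delay of two periods, and the function names `pearson` and `lagged_correlation` are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def lagged_correlation(a, b, lag):
    """Correlate A at time t with B at time t + lag (lag in observation periods)."""
    if lag < 0 or lag >= len(a) - 1:
        raise ValueError("lag must leave at least two paired observations")
    if lag == 0:
        return pearson(a, b)
    return pearson(a[:-lag], b[lag:])

# Invented series: B reproduces A two periods later (B[t + 2] = 2 * A[t] + 1).
series_a = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
series_b = [5, 4] + [2 * x + 1 for x in series_a[:-2]]

for lag in range(4):
    print(f"lag {lag}: r = {lagged_correlation(series_a, series_b, lag):.3f}")
```

With these invented data the synchronic comparison (lag 0) understates the relationship, while the comparison staggered by two periods recovers it fully.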
Other time lags may be short, depending on the scale of the processes involved, but some lag is always to be expected. For instance, the attainment of power by social democratic or similarly welfare-oriented parties—often in the form of coalitions—has been linked by several authors to the enactment of additional social welfare legislation and to an actual rise in welfare benefits. Many of these studies, however, have not given enough weight to time lags, and hence have underestimated the actual impact that occurred. The time lags involved include the time between the formation of the government, the enactment of
specific legislation, its promulgation, its effective implementation at the administrative level, and the time it takes the public to learn to make full use of the opportunities under the new laws. The rise in the number of social security beneficiaries partly illustrates this process. In all Western democracies social expenditures have changed slowly, by an incremental trend. Many comparativists have tried to ascertain the importance of social democratic parties in the growth of government, but because they neglected the time dimension and the delayed, incremental social consequences of the participation of social democrats in power, they have succeeded in explaining only a small part of the variance (Dogan 2000:93–114). The vexed question of economic development and the prerequisites for the establishment of stable democratic regimes also involves considerable time lags too often neglected. Causal relationships in contemporary demographic trends in the Third World would emerge more clearly if urbanization and literacy were considered at a certain moment and birth rates and infant mortality one generation later. Such staggering does not require sophisticated statistical techniques. The neglect of the temporal dimension has long limited the explanation of variance. Its inclusion in research designs could enhance the potential for comparative quantitative analysis.
11. Comparison by Composite Indices
Single isolated indicators are often misleading. When a researcher relies on only one or two indicators to measure a complex phenomenon, these are likely to be invalid measures. An example: some still use the number of radios per 1000 population as an indicator of the development of the entire communications network of a nation. While such extrapolation may have been valid several decades ago for many nations, there are today cases where this indicator is invalid. A relatively poor country could rank in radios per 1000 inhabitants as high as a relatively rich country.
At the same moment the rich country could rank very high on television sets and daily newspaper circulation per 1000. Except for comparisons between the 50 or 60 poorest countries the indicator “radios per 1000” could today be abandoned. The same problem is evident in many other areas where there are complementary items, as in the transportation network. Cars, trains, buses, boats and aircraft all fulfil similar functions. The relative frequency
in the use of one or more of these modes of transport is influenced by geography, average distances, cost and cultural preferences. In Europe the rail system is more developed than in the US, there being shorter distances to cover and higher population densities. The train is not seen as a lowly form of transport in Europe, as it is in the US. It would be misleading, then, to use air traffic as an indicator of the development of the transportation system. While many social scientists have assumed that the number of cars per thousand inhabitants is a valid indicator of development, they may not have recognized the importance of the fact that there are alternatives available. Energy consumption per capita is another variable which needs an index to help integrate various energy data. The consumption of energy can reflect many social indicators: industrialization, mechanization and even mass communication. Forms of energy include oil, electricity, coal, gasoline and nuclear energy. For purposes of international standardization, the index of energy expresses data in coal equivalents to oil, natural gas and electrical energy. Another aspect of the relevance of indicators is whether certain variables can meaningfully be quantified. It is not enough to assign numbers to events. The second edition of the World Handbook (Taylor and Hudson 1972) contains quantified data on indicators of political protest. Aside from problems of accuracy, these data are of questionable validity: do they really measure unrest in a society? Even if we grant that demonstrations, riots, armed attacks, deaths from domestic violence and governmental sanctions can be quantified accurately, it is still questionable whether we can assume that these categories represent the true level of unrest in a society. Discontent may not appear without a spark to bring it into the open. Even more fundamentally, the indicators of unrest fail to acknowledge the role of suppression in affecting the statistics. 
Dictatorial governments around the world suppress the expression of unrest. The existence of this underlying level of unrest was demonstrated by the crises in East Germany, Czechoslovakia, Poland and Hungary in 1989–90. By compounding various indicators in an index, the sociological significance of statistical data could be enhanced. Too often, isolated indicators are still treated by complex methods, even when a simple statistical treatment of indices would be sufficient. But in some cases the components of a composite indicator may obscure more than they illuminate. We now possess quantified indicators difficult or impossible to obtain in the 1970s for a large number of countries; for instance, for life
expectancy, access to safe water, number of people per hospital bed, and school enrolment at age 10–12. By combining isolated indicators into indices, quantitative comparative analysis would be facilitated, because the number of variables would be reduced and their explanatory power enhanced. Certain indicators do not need to be combined into indices, because their explanatory power is sufficient, as attested by numerous empirical analyses. Among these privileged indicators is infant mortality. One does not need sophisticated factor analysis to understand why, sociologically, infant mortality is one of the best indicators in comparative research (Vanhanen 1989).
12. Comparison by Scoring and Scaling as a Substitute for Formal Statistics
Many of the most significant aspects of political life cannot be treated in statistical terms. The alternative is scaling by experts. The recourse to judgemental rankings and to scoring finds a justification in a statement by the mathematician Tukey: “Far better an approximate answer to a right question, than an exact answer to the wrong question, which can always be made precise” (cited by Banks and Textor 1963:7). The translation of qualitative aspects into measurable variables requires scaling by judges. The involvement of judges raises the question of coder reliability: how likely are two or several judges to rate the same situation in the same manner? If an expert says that country A is more democratic than country B, and country B more democratic than country C, he must also admit that A is more democratic than C. The reliability of an expert can be tested by the consistency of his rankings. To show the potential of scoring and judgemental rankings, I have selected four examples from the literature. The first one is from Phillips Cutright’s “National political development: its measurement and social correlates”.
This article is one of the most cited in the literature on comparative politics and one of the few still relevant today of those published three decades ago. With the help of experts, Cutright constructed an index of political development. He allocated for each country two points for each year in which a parliament existed and where the minority party had at least 30 percent of the seats. He allocated only one point when the minority party was weaker, and no points for each year when no parliament existed. He did the same scoring for the executive branch. Over a period of 22
years a country could accumulate 66 points. Cutright used a simple but pertinent index. The validity of his scoring can be tested retrospectively. He found for 1963 an imbalance for Chile, the Philippines, Indonesia, Nicaragua and Guatemala: political development was higher than socio-economic development. In the following years the regimes in these countries collapsed. The opposite was “predicted” for Spain, Portugal, Czechoslovakia and Poland. These countries were supposedly ripe for democracy. Cutright’s analysis based on scores and a simple statistical model should be compared with many other articles published at roughly the same time which disappeared from the literature despite the mountains of statistics on which they were built. Cutright’s method of scores could be applied retrospectively to Eastern Europe: the implosion in 1989–90 can be explained by the gap between the relatively high socio-economic level (education, health, urbanization, industrialization) and the low level of political development, before the implosion of the Soviet Union. A second example of scoring as a substitute for formal statistics is the voluminous book by Banks and Textor, A Cross-polity Survey (1963). They proposed a series of 57 dichotomized variables, most of which were directly political: interest articulation and aggregation, leadership charisma, freedom of group opposition, freedom of the press, role of the police, character of the bureaucracy, personalismo, westernization and others. The authors preferred significant aspects of political life to quantified but unimportant variables, even if their dichotomization was uncertain. They gave approximate answers to good questions. Another codification of variables which are not directly quantifiable was adopted by Irma Adelman and Cynthia Taft Morris in their Society, Politics and Economic Development: a quantitative approach (1967). 
This book has been severely criticized by some scholars (Kingsley Davis, among others) and appreciated by others. These contrasting evaluations can be explained by the fact that it consists of two parts. The first (pp. 1–129) contains an interesting discussion of 41 variables, most of which were and remain not directly quantifiable. The second part consists of a confusing factor analysis. I mention this book for its first part. I use the second part to try to vaccinate comparativists against the temptation to engage in factor analysis. Because of frequent malpractice in the use of this statistical tool, mass immunization is needed. In a series of volumes, Freedom in the World, Raymond D. Gastil (1979–90) has ranked countries with the help of experts according to two basic dimensions: political rights and civil liberties. The rating is
on a seven-point scale by univocal ranking. Published annually since 1979, this series has become a rich source of documentation for comparative politics. After decades of progress in comparative politics we still face this dilemma: whether to have recourse to judgemental variables or to neglect some of the most important aspects of political life.
13. Comparing Ecological Environments
Some older sociological schools of thought, particularly that of Ellsworth Huntington, overstressed geographical determinism. In reaction to these exaggerations (see Sorokin), geographical conditions were neglected by sociologists for more than a generation; more recently, however, the evolution of the ecological sciences has greatly increased the possibilities for analysing the environment. Some economists, such as Andrew M. Kamarck of the World Bank, have spoken of tropical societies as distinct from those in the temperate zones. Three-quarters of Africa is in the tropics. The well-known fact that the vast majority of the world’s poor people live in the tropical or semitropical zones is highlighted by the “North-South” categorization. Human behaviour depends not only on temperature and humidity, but also on the rarity or prevalence of morbidity and debilitation. The frequency of infection by parasites and chronic malnutrition not only reinforce each other, but they also interact in feedback cycles with economic productivity and growth, speed or slowness of behaviour, human energy and capacity to work, and the gap between thoughts and feelings on one side and effective action on the other. Large cohorts of tropical populations are not sick enough to die but sick enough to remain poor. Epidemics or widespread tropical diseases may not destroy governments but the population may lack the energy to wipe them out. Malaria and sleeping sickness have been driven back but not eliminated. Hookworm, bilharzia and trachoma still blight the lives of hundreds of millions.
Chronic malnutrition has been estimated to account for about two-thirds of morbidity and child mortality in Africa, South Asia and tropical Latin America. Trypanosomiasis kills horses and cattle and makes it difficult to get to the interior from the coast using animal transport. “The transport obstacle alone was quite sufficient to postpone for centuries any appreciable economic development in tropical Africa” (Kamarck 1976:19).
Comparativists have not yet asked this difficult question: How far is the low level of development in most of Africa and in some Asian countries to be explained by their tropical environments? Such an interrogation is completely absent even in the recent books on Africa. In the southern regions of the United States and in Northern Australia similar conditions of heat and humidity prevail but the economic handicaps have been overcome. Before the coming of modern hygiene in the eighteenth and nineteenth centuries, major ecological handicaps also prevailed among the poorer strata in the temperate zones. Rats and lice spread plague; epidemics of cholera, typhus and tuberculosis were frequent and often endemic, and so were rickets and other diseases due to deficient nutrition. The mass availability of industrially produced soap and cotton underwear increased as early as the eighteenth century. Clean drinking water, free of epidemic germs, is not available everywhere. The spread of tea in parts of Asia, but not in Africa, meant that drinking water was boiled. Disinfectants came later, used first in hospitals and later in homes. Malaria was wiped out in the south of Italy only about 50 years ago. The experience of Western countries, temperate or hot, suggests that social and economic conditions can contribute substantially to reducing morbidity and mortality. It is the same for the highest strata in tropical countries: the Latin American upper classes have for centuries been healthier than the poor. A set of additional quantified new indicators, highlighting these conditions, could lead to significant revisions of many received theories of economic and political underdevelopment. They might even lead to a revision of the received and often ethnocentric notions about easy self-help for tropical nations. In the advanced countries the ecological problem is reappearing at a higher level in concerns for the “quality of life”. 
Access to green spaces, to woods and meadows, is becoming rarer and more difficult. Water and air are polluted, less often by germs and parasites and more often by industrial effluents. Smog burdens eyes and lungs. Along with such conditions new political movements and parties have arisen in European countries. Indicators for these ecological problems are available at least for urban areas in many countries, but they are difficult to integrate as international statistical series. This is why comparative researchers have been slow to use them and to relate them systematically to social and political issues. But these problems will not go away; they will grow. And social scientists will have to catch up with them. The ecological dimension may require nominal indicators as well as quantified ones.
14. Comparing Mini-states and Mega-cities
In 1992, there were 214 countries and territories, 187 of them independent. Of these 214, only 132 have a population of over one million and only 122 over two million. One-quarter of the countries represented at the United Nations have together a population equivalent to that of Colombia, which ranks thirtieth among nations in demographic terms. At the same time, half the world’s population lives in four countries: China, India, the USA and Russia. For many comparative purposes, such disparity creates no difficulties. In a typology of political systems the size of the country does not matter. One can compare social mobility in a series of countries without taking their size into consideration. The political systems of Denmark and Costa Rica can be compared with those of India and Nigeria. It is appropriate to compare the presidential system of France and of Sri Lanka even if one country is six times larger than the other. But under some circumstances, size may have an impact on the functioning of a democratic regime. Size always has an impact in international relations: we cannot evaluate the role of Ghana and of Brazil in the international arena if we ignore their sizes. When the analytical approach is basically statistical, the number of cases and the diversity of their size can be an essential dimension. When it is remembered that Norway is not much more populous than Connecticut, there is a feeling of unease about a comparison of electoral behaviour in the United States and in Norway. A sample of 2000 individuals in each country might be statistically sufficient, but one cannot avoid certain doubts about the choice of this pair of countries. One remedy would be to weight the countries according to their demographic size. This is already done in some comparisons of European countries and in studies which consider the continent as a whole. In such analyses, France counts sixteen times more than Finland. 
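The remedy of weighting countries by demographic size can be sketched as follows. The two countries and their figures are invented (chosen so that one counts sixteen times more than the other), and the function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Average of `values` in which each value counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Invented data: (turnout in percent, population in millions).
countries = {
    "Large country": (60.0, 64.0),
    "Small country": (80.0, 4.0),
}
turnouts = [turnout for turnout, _ in countries.values()]
populations = [population for _, population in countries.values()]

unweighted = sum(turnouts) / len(turnouts)
weighted = weighted_mean(turnouts, populations)
print(f"each country counts once:     {unweighted:.1f}")
print(f"each country counts by size:  {weighted:.1f}")
```

The unweighted figure describes the average country, the weighted one the average inhabitant; which of the two is appropriate depends on the question asked.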
The problem of the size of nations is aggravated when we contrast small states with giant cities. Considering only independent countries (leaving aside the territories), one of every two had in 2001 less than four million inhabitants. At the same moment there were eighty megacities of over four million people, many in middle-sized countries. During the 1950s, urbanists defined cities as agglomerations of 5,000 and, for some world regions, of 10,000 inhabitants. Later they adopted a criterion of 20,000 people, and still later of 100,000 people
per agglomeration. Comparative political researchers followed these definitions, since they depended on the data made available to them. These changes in definition seemed to reflect reality, since urban centres evidently grew. But in part the new definitions were adopted for reasons of convenience rather than of insight. Giant cities of more than one million are a new category. They are of crucial importance in the politics of the countries in which they exist. In 1950, there were about 50 such cities; in 1982 there were 278; in 1992, about 330; in 2001 about 400. Statistical data on giant cities are not easily rendered comparable. Some include only the population in the city itself, administratively defined; others include the suburbs or the entire urbanized area gravitating to the central city. The United Nations has made a serious effort to standardize these criteria, but in many cases it is still necessary to evaluate rather than to count. These giant cities require separate treatment. Their number in a country makes a difference to its political system. If there is only one, it is apt to dominate the country and make it “monocephalic” (single-headed), usually with a star-shaped system of internal communication, as in France, Britain, Austria, Peru, or the Republic of Korea. Some 30 or more countries, from Hungary to Mexico, Argentina and Thailand, are in this condition. Other countries are “polycephalic”: they have several giant cities, none dominating the others, with a grid-shaped system of transport and communications. Here we find some of the largest countries in the world (China, India, Russia and the United States), but also middle-sized countries, such as Germany, Italy, Canada, Spain, Australia, Poland and Morocco, and a number of small countries such as Switzerland, The Netherlands and Belgium. Some countries are bicephalic or double-headed, such as Turkey, Syria and Vietnam.
For certain comparisons of European countries it is useful to take into consideration the system of cities. All else being equal, the fact that France, Austria, Denmark, Ireland and Finland are “macrocephalic” countries, and, on the contrary, that Germany, Italy and The Netherlands are “polycephalic”, makes an important difference in many political domains. If Yugoslavia had a single powerful mega-city instead of six important regional cities, the dismemberment of the federation would probably have taken a different course. The link between a network of old major cities, born of history, and federalism is obvious.
Comparing large American and European cities, one should take into consideration the public transportation system, particularly the underground infrastructure. The metros in Paris, London, Moscow or Tokyo represent an investment that Los Angeles would need more than 20 or 30 years to build. The cost of the Parisian metro is perhaps equivalent to the cost of the entire production of automobiles in the United States over two or three years. Such a comparison cannot remain at the statistical level. The metropolitan areas of Mexico City, Buenos Aires, São Paulo, Cairo, Bombay, Calcutta, Seoul and other mega-cities had in 1990 a larger population than each of the 120 smaller independent nations. In some countries, the primary city accommodates a significant part of the population (Athens and Santiago about 40 percent, Montevideo almost half, Beirut about three-quarters), and includes the lion’s share of economic, financial, cultural, educational, scientific, artistic and political activities of the country. A World of Giant Cities (Dogan and Kasarda 1988) is progressively replacing the world of territorial nations. In his Political Order in Changing Societies, Samuel Huntington asked in 1968, “What groups are most likely to be revolutionary in the city?”. In the last two decades it has been necessary to explain why the urban lumpenproletariat did not revolt despite the continuing growth of shantytowns, favelas (Brazil), poblaciones (Chile), barriadas (Lima), ciudades perdidas (Mexico), Kutcha (Calcutta) and other slums and bidonvilles at the peripheries of mega-cities from Casablanca to Bogota and from Bombay to Lagos. It may be that tomorrow many comparativists will have to give priority to political unrest in the giant cities of the Third World; they will then need new indicators to replace older ones.
Lerner’s model (urbanization → literacy → communication → participation) had a nice run in comparative studies but, for the study of primary cities in Asia, Latin America and Africa and their lumpenproletariat and troglodytes, it appears obsolete today. Some 40 quantified indicators are available for a large number of cities, not all standardized. There is an important monographic literature on mega-cities, but very few comparative studies. As the number of giant metropolises will inevitably continue to grow, there is a need for systematic comparison not only of metropolises, but also between small countries and giant cities; for instance, the budget of the municipality of New York or of the Metropolitan District of Mexico is higher than the national budget of dozens of small countries.
15. Anomaly, Deviance, Exceptionalism and Uniqueness in Comparative Perspective

Even in the most imaginative comparative research there remain, in the final analysis, certain irreducible phenomena which reflect the originality of each country. History is the greatest generator of national configurations. The older a country, the more it has been shaped by its history. Two or several countries may have many features in common, but they are never identical, because the attributes are combined differently for each country. We are always facing unique realities that we call China, Switzerland, Egypt, Russia, India or Spain. In international comparisons we may distinguish anomalies, deviant cases and exceptional cases. It is a matter of degree. An anomaly is an unexpected position in a ranking, curve or diagram. Statistical eccentricities can be discovered by crossing variables in a scattergram. For instance, life expectancy in Bulgaria is higher than we would expect considering the other correlates of this country. The number of students per thousand inhabitants in India appears “abnormal” for a country with such a low standard of living. An anomaly can be revealed only in a comparative light. In comparative sociology it may play the same role as the clinical case in biology or medicine. The deviant case is less frequent than the anomaly, but it is more significant because it is an entire sector of the society or the political system which appears unusual, abnormal. Deviance can be defined in relation to a set of expectations drawn from a series of countries similar in many ways. The exceptional case is a multiple deviant case, an extreme case from many points of view. India, according to certain theories, should not be democratic; it is an enormous exception. The difference between the deviant case and the exceptional case is a difference of degree. 
Exceptionalism refers to an accumulation of several deviances in systemic or contextual characteristics, forming a configuration, a Gestalt. It is in this sense that S.M. Lipset considers Japan and the United States as exceptional cases. When we decompose the configuration into variables, the distinctiveness tends to be obscured, because we are extracting the variables from their contextual significance. Japan and the United States are exceptional as wholes, but when we “segment” these configurations, when we isolate variables and indicators, the differences between the two countries become differences of degree. Among the three dozen partial comparisons and sectoral analyses, Lipset always finds differences
of degree; it is never zero percent in one case and 100 percent in the other. The exceptionalism of each of these two countries resides in their national configuration. The search for deviant and exceptional cases is an alternative strategy to the statistical approach. Instead of high correlations, we are looking for a meaningful clinical test. Exceptional cases do not limit the potential of international comparisons. On the contrary, the search for exceptions can be a sui generis strategy of comparative research, for only by comparing can one say that a country is or is not abnormal, deviant, exceptional.

Concluding Comments

Everyone agrees that the microscope and telescope serve different purposes. In the same way, some of these fifteen strategies could be appropriate for certain problems, but not necessarily for others. Not every strategy is equally useful in studying every kind of problem. The strategies set out here analytically as concurrent are not exclusive in practice. Some of them may appear complementary. The adoption of one strategy rather than another depends on the nature of the phenomenon to be studied. But for many questions, several strategies can be combined at different stages of the comparative research. The hypotheses validated for a given country in a case study can be tested in a second country by a binary comparison. At a later stage, the analysis can be extended to two series of countries, each one relatively homogeneous, by a comparison of contrasting countries. The comparativist’s freedom of choice of countries is great, unless the problem studied is too closely linked to a determined context. Post-industrial democracies, traditional Islamic societies or totalitarian regimes are social contexts with profound characteristics which condition the choice of countries. 
But even then it is possible to adopt another strategy by choosing an exterior pole of reference such as emerging pseudo-democracies, secularized societies, or oriental despotic societies. The comparativist establishes causal relationships and observes the interaction of various factors by dividing social reality into specific sectors. Before comparing, it is necessary to segment, choosing at the same time the appropriate countries to be included in the comparative research. It is by such segmentation and choice that during the last three decades a new comparative social science came to be established.
References

Adelman, J., and C.T. Morris. 1971. Society, Politics and Economic Development: A Quantitative Approach. Baltimore: Johns Hopkins Press.
Almond, Gabriel. 1960. “A Functional Approach to Comparative Politics.” In The Politics of Developing Areas, edited by G. Almond and J.S. Coleman.
Banks, A., and R.B. Textor. 1963. A Cross-Polity Survey. Cambridge, MA: M.I.T. Press.
Bebler, Anton, and Jim Seroka (eds.). 1990. Political Systems: Classifications and Typologies. Boulder, CO.
Coleman, James. 1960. “The Political Systems of the Developing Areas.” In The Politics of Developing Areas. Princeton University Press.
Cutright, Phillips. 1963. “National Political Development: Measurement and Analysis.” American Sociological Review 28 (April).
Dahl, R., and E. Tufte. 1973. Size and Democracy. Stanford: Stanford University Press.
Dogan, Mattei. 2000. “Class, Religion, Party, Triple Decline of Electoral Cleavages in Western Europe.” In Party Systems and Voter Alignments Revisited, edited by L. Karvonen and S. Kuhnle. London: Routledge.
Dogan, Mattei. 2002. “Are there Paradigms in the Social Sciences?” In International Encyclopedia of the Social and Behavioral Sciences. Oxford: Elsevier.
Dogan, Mattei, and Dominique Pelassy. 1990. How to Compare Nations (Second edition). New York: Chatham House.
Dogan, Mattei, and John Kasarda. 1988. A World of Giant Cities, 2 vol. London: Sage.
Dogan, Mattei, and Peter Campbell. 1957. “Le Personnel Ministériel en France et en Grande Bretagne.” Revue Française de Science Politique 7(2).
Flora, Peter (ed.). 1983. State, Economy and Society in Western Europe 1815–1975. Frankfurt: Campus Verlag.
Forrest, Joshua B. 1994. “Weak States in Post-Colonial Africa and Mediaeval Europe.” In Comparing Nations, edited by M. Dogan and A. Kazancigil. Oxford: Blackwell.
Grosser, Alfred. 1972. L’Explication Politique, Introduction à l’Analyse Comparative. Paris: Colin.
Holt, R.T., and J.E. Turner. 1970. The Methodology of Comparative Research. New York: Free Press.
Huntington, Samuel. 1968. Political Order in Changing Societies. New Haven: Yale University Press.
Kamarck, Andrew M. 1976. The Tropics and Economic Development. New York: The World Bank Publications.
Lijphart, Arend. 1971. “Comparative Politics and the Comparative Method.” American Political Science Review 65.
Linz, Juan. 1975. “Totalitarian and Authoritarian Regimes.” In Handbook of Political Science, edited by F. Greenstein and N.W. Polsby. Reading: Addison-Wesley.
Lipset, Seymour M. 1994. “American Exceptionalism-Japanese Uniqueness.” Pp. 153–212 in Comparing Nations, edited by M. Dogan and A. Kazancigil. Oxford: Blackwell.
Martz, John D. 1994. “Problems of Conceptualization and Comparability in Latin America.” In Comparing Nations, edited by M. Dogan and A. Kazancigil. Oxford: Blackwell.
Mayer, Lawrence C. 1993. “Practicing what we Preach: Comparative Politics in the 1980s.” Comparative Political Studies 16, 2 (July):173–194.
Naroll, Raoul. 1972. “A Holonational Bibliography.” Comparative Political Studies 16, 2 (July):5–2.
Nowak, Stefan. 1977. “The Strategy of Cross-National Survey Research for the Development of Social Theory.” In Cross-National Comparative Survey Research, edited by A. Szalai and R. Petrella. Oxford: Pergamon Press.
Peters, B. Guy. 1998. Comparative Politics, Theory and Methods. New York University Press.
Przeworski, Adam. 1987. “Methods of Cross-National Research: an Overview.” In Comparative Policy Research: Learning from Experience, edited by M. Dierkes, H. Weiler, and A. Berthoin Antal. Berlin: Wissenschaftszentrum.
Przeworski, Adam, and Henry Teune. 1970. The Logic of Comparative Social Inquiry. New York: Wiley.
Pye, Lucian. 1958. “The Non-Western Political Process.” Journal of Politics 20(3).
Ragin, Charles C. 1987. The Comparative Method. Berkeley: University of California Press.
Riggs, Fred W. 1994. “Presidentialism in Comparative Perspective.” Pp. 72–152 in Comparing Nations, edited by M. Dogan and A. Kazancigil. Oxford: Blackwell.
Sartori, Giovanni. 1994. “Comparing Miscomparing and the Comparative Method.” In Comparing Nations, edited by M. Dogan and A. Kazancigil. Oxford: Blackwell.
Scheuch, Erwin. 1966. “Cross-National Comparisons Using Aggregate Data: Some Methodological Problems.” In Comparing Nations, edited by R.L. Merritt and S. Rokkan. New Haven: Yale University Press.
Sigelman, L., and G.H. Gadbois. 1983. “Contemporary Comparative Politics: An Inventory and Assessment.” Comparative Political Studies 16, 3 (October):275–306.
Taylor, Ch.L., and M.C. Hudson. 1972. World Handbook of Political and Social Indicators. New Haven: Yale University Press.
Teune, Henry. 1990. “Comparing Countries: Lessons Learned.” In Comparative Methodology, edited by E. Oyen. London: Sage.
Thrupp, Sylvia L. 1970. “Diachronic Methods in Comparative Politics.” In The Methodology of Comparative Research, edited by R.T. Holt and J.E. Turner. New York: The Free Press.
Vanhanen, Tatu. 1979. Power and the Means of Power, A Study of 119 States 1850–1975. Ann Arbor, MI: University Microfilm International.
Vanhanen, Tatu. 1989. “The Level of Democratization Related to Socio-economic Variables in 147 States 1980–85.” Scandinavian Political Studies 12(2):95–127.
Verba, Sidney. 1967. “Some Dilemmas in Comparative Research.” World Politics 20(1).
Verba, Sidney. 1971. “Cross-National Survey Research: the Problem of Credibility.” In Comparative Methods in Sociology, edited by Ivan Vallier. Berkeley: University of California Press.
Veyne, Paul. 1976. L’Inventaire des Différences. Paris: Seuil.
Ward, R.E., and D.A. Rustow. 1964. Political Modernization in Japan and Turkey. Princeton University Press.
METHODS FOR ASSESSING AND CALIBRATING RESPONSE SCALES ACROSS COUNTRIES AND LANGUAGES

Tom W. Smith, Peter Ph. Mohler, Janet Harkness, and Noriko Onodera

Introduction

Scientific research rests on the reliable and consistent measurement of phenomena. In cross-national or cross-cultural survey research between countries or social groups that speak different languages, the goal of replicative measurement is greatly complicated by the necessity of designing and administering questionnaires in two or more languages. Only by assuring that the items in all languages and questionnaires are equivalent both in meaning and response scales can comparable measurement be obtained and valid inferences drawn. But the complexity of both survey measurement and of languages makes the goal of equivalency an extremely difficult challenge (Glick et al. 2004; Kumata and Schramm 1956; Ommundsen et al. 2002; Scheuch 1989; Smith 1988, 2002, 2004; Van de Vijver and Leung 1997).

Each question has two parts: 1) the point of the inquiry or substance of what is being asked about and 2) the implicit or explicit categories in which the response is requested. When the question is open-ended, the requested response is unstructured (e.g. “What is the most important problem facing the country today?” and “Why did you vote for George W. Bush for President?”). But most survey questions are closed-ended with an explicit set of response categories or some type of response mechanism described (e.g. “If you were to consider your life in general these days, how happy or unhappy would you say you are on the whole: Completely happy, Very happy, Fairly happy, Not very happy, or Not at all happy?” and “Do you favor or oppose the death penalty for people convicted of murder?”).

While there are effectively an unlimited number of subjects that questions ask about (and a wide variety of ways of asking about each subject), survey researchers tend to use a much smaller number of response categories in their questions. 
As Davis’ review (1993) of
301 questions on the 1985–1993 International Social Survey Program (ISSP) modules showed, several response scales were repeatedly used. For example . . .

# of Items   Scale
92           Agree strongly/Agree/Neither agree nor disagree/Disagree/Disagree strongly/Can’t choose
26           Essential/Very important/Fairly important/Not very important/Not important at all/Can’t choose
22           Definitely allowed/Probably allowed/Probably not allowed/Definitely not allowed/Can’t choose
11           Strongly in favor of/In favor of/Neither in favor of nor against/Strongly against
9            Very important/Important/Neither important nor unimportant/Not important/Not important at all/Can’t choose
Not only are the same scales utilized again and again, but certain terms tend to be repeated across scales. Note, for example, the use of “very” and “important” in the second and fifth examples above, of “strongly” in the first and fourth examples, and of “can’t choose” in all but the fourth example. Thus, by focusing on the response-scale part of questions, one deals with a set of measurement and translation issues that have widespread application across questions and surveys. In addition, most survey response scales seek to arrange responses along an underlying continuum such as agreement/disagreement, importance, allowance, being in favor of/against, etc.1 By assessing the position of each response category on the underlying continuum, the intensity of the response is determined. If this is done for items in two languages, it becomes possible to determine the equivalency of the individual response categories and ultimately of the response scale as a whole. The task then becomes developing a method for assessing where categories fall on a response continuum. This paper will examine 1) how response categories influence the reported distribution of results, 2) how to measure the intensity of response categories, 3) results from American and German pilot studies
1 Nominal scales do not do this, but these are rare in attitudinal scales.
of response scales and a Japanese replication, 4) the use of alternative response scales, and 5) the implications of these results for cross-national research.

Response Scales and Reported Distributions

Reported distributions are a function of a) the true distribution of attitudes in the population and b) measurement properties of the response scale.2 How much of an underlying distribution is captured by a given term/category is a function of a) the underlying distribution and b) the number, intensity, positioning, and intervals between the scale points utilized. In general, a) the more points used, the less of the distribution will be captured by a particular point, b) the closer two points are in intensity, the less of the distribution will be captured by each individual point, c) broader terms may capture more of the distribution than narrower terms (i.e. it is not only the mean intensity of a term, but its range, that determines how much of the distribution will be covered), and d) adding a new, more intense point to a scale can change how the previous end point was understood and alter (and typically increase) the share of the distribution captured by the displaced endpoint. The effect seems to be that some people avoid “extreme” categories where extremity is based on a category representing the end or extreme position on a scale, rather than on the extremity of the term actually used to express the scale point. To illustrate these points, let us start with the simplest case of a dichotomy: agree/disagree. Given the hypothetical distribution of attitudes in Figure 1, the reported distribution would be about 65% agree and 35% disagree. Now suppose a third category, “neither agree nor disagree,” was added and that half of the people closest to the midpoint (4.5) were attracted to this mid-category. The revised distribution would be agree 55%, neither agree nor disagree 17.5%, and disagree 27.5%. 
Next, suppose that “agree” was replaced with two categories “completely agree” and “somewhat agree”. If “completely agree” was at point 0 and “somewhat agree” at point 3, then the new distribution might be 15% completely agree and 40% somewhat agree. But suppose
2 This paper does not cover other reasons that people in various cultures may respond differently to the offered response options such as response effects due to social desirability or an extremity bias (Smith 2002; Javeline 1999).
the two new categories were “completely agree” and “strongly agree” with the former at point 0 and the latter at 2. The distribution would be something like 10% completely agree and 45% strongly agree. But if “strongly agree” was added as a third new category on the agreement side, then the distribution might become completely agree 10%, strongly agree 20%, and somewhat agree 25%. But if “completely agree” was then dropped, then the distribution might become 30% strongly agree and 25% somewhat agree.
[Figure 1. Hypothetical distribution. Horizontal axis: scale points 0 (Agree) to 9 (Disagree); each X = 5% of total.]
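The first two allocations described above (the agree/disagree dichotomy and the added mid-category) can be checked with a short sketch. This is only an illustration under stated assumptions: the percentages at points 0 to 5 follow the text's figures for Figure 1, while the split of the remaining 20% across points 6 to 9 is an assumed reading of the figure and does not affect the totals computed here. The function names are hypothetical.

```python
# Percent of respondents at each point of the 0-9 continuum (each X = 5%).
# Points 0-5 match the totals quoted in the text; the split across
# points 6-9 is an ASSUMPTION and is not used by the checks below.
DISTRIBUTION = {0: 5, 1: 10, 2: 15, 3: 15, 4: 20, 5: 15, 6: 10, 7: 5, 8: 5, 9: 0}
MIDPOINT = 4.5


def dichotomy(dist):
    """Allocate everyone below the midpoint to 'agree', the rest to 'disagree'."""
    agree = sum(pct for point, pct in dist.items() if point < MIDPOINT)
    return {"agree": agree, "disagree": sum(dist.values()) - agree}


def with_middle_category(dist, share=0.5):
    """Move `share` of the two points nearest the midpoint into 'neither'."""
    base = dichotomy(dist)
    from_agree = share * dist[4]      # point just below the midpoint
    from_disagree = share * dist[5]   # point just above the midpoint
    return {
        "agree": base["agree"] - from_agree,
        "neither": from_agree + from_disagree,
        "disagree": base["disagree"] - from_disagree,
    }


two_point = dichotomy(DISTRIBUTION)
three_point = with_middle_category(DISTRIBUTION)
print(two_point)
print(three_point)
```

The same fixed distribution yields 65% agreement on the two-point scale and 55% on the three-point scale, which is the point of the illustration: the reported share depends on the scale as well as on the underlying attitudes.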
Alternatively, assume that “slightly agree” was added and represented at 4. It might not only take over much of the somewhat agree cases, but draw in some of the distribution from neither agree nor disagree. Along with a matching “slightly disagree” category these might bring back in say half of the distribution lost to the middle category above leaving 59.375% in the combined agree categories, 8.75% in neither agree nor disagree, and 31.875% in disagree. Thus, assuming a fixed true distribution, seven different response scales ranging from two to seven categories, and the simplest of rules for allocating cases, there is considerable variation in distributions reported. In this hypothetical example the % agreeing varies from 55% to 65% and strongly agreeing from 20% to 45%. Now consider what the impact might be of using two scales to measure two different populations with the same true distribution of an attitude as in Figure 1. Suppose that in population A the completely agree/somewhat agree/neither . . . scale was used with the resulting distribution of 15%, 40%, 17.5%, 25%, and 2.5%. In population B the strongly agree/somewhat agree/neither . . . scale was employed and the distribution was 30%, 25%, 17.5%, 17.5%, and 10%. Now assume that population B was interviewed in another language and the researcher was told that the second (non-English) scale was a translation of and equivalent to the first (English) scale. Comparing these two scales using
the typical values of 1–5 for the five categories one would conclude that there was more agreement in population B than in population A (means respectively of 2.5 vs. 2.6, where lower values indicate more agreement) and that there was much more extremity in population B than in A (1+5 = 40% vs. 17.5%). Neither conclusion would be correct; both would merely be artifacts of mistranslations and/or misinterpretations of scales in two languages.

Measuring the Intensity of Response Categories

There are several ways to measure the strength of response categories along an underlying response scale. One approach is to have respondents rate the strength of terms defining each point on the scale. There are three standard variants of this approach. First, one can rank the terms from weaker to stronger (or from less to more or along any similar continuum) (Spector 1976). This, of course, only indicates their relative position and not the absolute strength or distance between terms. Second, one can rate each term on a numerical scale (usually with 10 to 21 points) (Wildt and Mazis 1978; Worcester and Burns 1975; Myers and Warner 1968; Cliff 1959; Jones and Thurstone 1955; Mosier 1941; Vidali 1975; Mittelstaedt 1971; Bartram and Yelding 1973; Traenkle 1987). This allows the absolute strength or distance between each term to be known and thus facilitates the creation of equal interval scales. Alternatively, it is also possible to use an alphabetical scale or unlabeled spaces, rungs, or boxes as in a semantic differential scale (Osgood et al. 1957). The letters or spaces are then transformed into their numerical equivalents. Finally, magnitude measurement techniques can be used to place each term on a ratio scale (Lodge et al. 1975, 1976, 1979, 1981, 1982; Hougland et al. 1992; Osinski and Bruno 1998). The magnitude measurement technique gives an arbitrary value to a reference term and has respondents rate other terms as ratios to this base term. 
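The ratio-rating step just described can be sketched in a few lines. This is a minimal illustration, not the procedure of any study cited here: the reference value of 50, the ratings, and the function name are all hypothetical, and summarizing respondents' ratios by their geometric mean is an assumed convention rather than something stated in the text.

```python
import math

# ASSUMPTION: the arbitrary value assigned to the reference term.
REFERENCE_VALUE = 50.0


def ratio_scores(ratings, reference_value=REFERENCE_VALUE):
    """Return each term's geometric-mean ratio to the reference term.

    `ratings` maps a term to the numbers respondents assigned to it;
    every number is read relative to `reference_value`.
    """
    scores = {}
    for term, values in ratings.items():
        ratios = [value / reference_value for value in values]
        mean_log = sum(math.log(ratio) for ratio in ratios) / len(ratios)
        scores[term] = math.exp(mean_log)
    return scores


# Hypothetical ratings from two respondents, for illustration only.
hypothetical_ratings = {
    "somewhat agree": [25.0, 100.0],
    "strongly agree": [100.0, 400.0],
}
scores = ratio_scores(hypothetical_ratings)
for term, score in scores.items():
    print(f"{term}: {score:.2f} times the reference term")
```

Because the ratings are unbounded ratios, no term is forced against the top or bottom of a fixed numerical scale.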
This allows more precision than the numerical scale approach (since the terms are not constrained by the artificial limits of the bounded number scale). Of these three variants the middle seems most useful. On the one hand, the ranking method fails to provide the numerical precision that is necessary to calibrate terms across languages. On the other hand, the magnitude measurement technique is much more difficult to administer and much harder for respondents to work with (about 10–15% seem
unable to master the procedure). In addition, the extra precision that the magnitude measurement procedure can provide over that achievable using a 21-point scale approach does not appear to be needed. The direct rating approach has been used to rate words along various dimensions. Of most interest to us are those that either rate terms along a general good/bad or positive/negative dimension or which rate the intensity of modifiers (Wildt and Mazis 1978; Worcester and Burns 1975; Myers and Warner 1968; Cliff 1959; Jones and Thurstone 1955; Mosier 1941; Vidali 1975; Mittelstaedt 1971; Bartram and Yelding 1973; Lodge et al. 1975, 1976, 1979, 1981, 1982; Hougland et al. 1992; Bullinger 1995; Szabo et al. 1997; Skevington and Tucker 1999; Skevington 2002). Similarly, other studies have rated probability statements (Wallsten et al. 1986; Lichtenstein and Newman 1967); frequency terms (Spector 1976; Schaeffer 1991; O’Muircheartaigh et al. 1993; Strahan and Gerbasi 1973; Bradburn and Sudman 1979; Schriesheim and Schriesheim 1974; Hakel 1968; Simpson 1944); and terms used in reports to describe percentages from public opinion surveys (Crespi 1981 and “RAC . . .,” 1984). The studies generally show that a) people (usually college students) can perform the required ratings tasks,3 b) ratings and rankings are highly similar across different studies and populations, c) there is high test/retest reliability, and d) several different treatments or variations in rating procedures yield comparable results. Thus, the general technique seems robust and reliable.4 A second approach for assessing the intensity of scale terms and response categories is to measure the distributions generated by using different response scales (Smith 1979; Laumann et al. 1984; Michael and Michaels 1994; Hougland et al. 1992; Orren 1978; Sigelman 1990). In an experimental, across subjects design, one random group is asked to evaluate an object (e.g. 
presidential popularity or one’s personal happiness) with one set of response categories and a second random group
3 While this is reassuring, other studies show that various measurement artifacts can influence responses to numerical scales (Wilcox et al. 1989; Smith 1993; Schwarz and Hippler 1995; Schwarz et al. 1985; and Schwarz et al. 1991). See also, O’Muircheartaigh et al. 1993; Wright et al. 1997.
4 An exception is that vague frequency terms correspond to different absolute values depending on the commonness or rarity of the specified event or behavior. Thus, people who “usually” vote may vote once a year, but people who “usually” dine out dine out more than once a week (Schaeffer 1991; Bradburn and Sudman 1979).
evaluates the same object with another set of response categories. Since the stimulus is constant and the sub-group assignment is random, the number of people attracted to each category will depend on the absolute location of each response category on the underlying continuum and the relative position of each of the scale points adopted. With some modeling around what the two observed distributions suggest is the underlying distribution, it is possible to estimate at what point each term is cutting the underlying scale (Clogg 1982, 1984). The alternative version uses a within subjects design in which people are asked the same question (i.e. presented with the same stimulus) two or more times with different response categories being used (Orren 1978). This differs from a test/retest reliability design in that a) the measurement instrument is not constant (since the response categories differ) and b) the two administrations are essentially consecutive without any intervening time and/or buffer tasks. This provides additional information since it allows the direct comparison of responses, but the initial evaluations may artificially influence responses to the later scales (e.g. a person may feel constrained to choose the same response in terms of position or term used on the first administration on a subsequent administration). The advantage of the distributional approaches is that they ask respondents only to do what they are normally required to do: answer substantive questions with a simple set of response categories. 
The disadvantages are that a) it is harder to assess a large number of response terms and thus is better suited for assessing a discrete response scale already adopted than for evaluating a large number of terms that might be utilized in possible response scales,5 b) results will depend on the precise underlying distribution and the modeling procedures adopted, and c) it creates more work for the analysts, since the strength of terms must be indirectly estimated from the distributions rather than directly calculated from respondent ratings. A final approach uses anchoring vignettes to establish comparability across measures (Banks et al. 2004; d’Uva et al. 2006; King et al. 2004;
5 It would be possible to evaluate more terms using more random sub-groups, but in order to maintain the same level of precision this would mean increasing the sample size. Similarly, the same people could be asked many repetitions of a question with different response scales, but this would soon become tedious and later repetitions would probably be distorted by the previous administrations.
Salomon et al. 2004). Short vignettes describing a person’s situation regarding the construct of interest are devised and then respondents evaluate the person’s situation and rate it. For example, the vignettes may describe a person’s health status and then ask respondents to rate that person’s health as “excellent, very good, good, fair, or poor.” Given that the vignette person’s objective, health-related conditions are fixed and identical across respondents, differences in ratings are deemed to reflect how the scale is understood and utilized by respondents. When comparing two groups such as respondents from two countries, the mean differences in responses to the vignettes can be used to anchor people’s ratings of their own health and thus make those ratings more comparable across groups. As with the response-scale calibration approach, the anchor-vignette approach does not have to be asked of all respondents on every survey, but can be used to generate general adjustment factors that can be applied whenever the tested construct and response scale are used. This approach rests on several assumptions. First, response consistency assumes that respondents use scales to rate people in vignettes in the same way that they use scales to rate their own situation. Second, vignette equivalence assumes the objective situations in vignettes are perceived by people across groups in the same way. While not implausible, neither of these assumptions has been seriously tested. Because the direct rating approach provides the quantified intensity scores needed in the most straightforward manner, this was adopted as the main technique in this study. In addition, there may be context effects in the rating of the intensity of terms. For example “very” may be rated more intensely if it was the first strong term presented than if it followed other stronger terms (e.g. completely, extremely). 
Context effects have generally not been searched for in this line of research, but the randomization of order in several studies has tried to average out any such effects. This latter approach is generally utilized here, but an ordered vs. not ordered experiment is also included.

American and German Pilot Studies and Japanese Replication

Pilot studies were carried out in the United States and Germany to use the above approach to evaluate the translation and equivalency of response scales. The American pilot study was carried out on a quasi-representative sample of adults living in households. Ten sample
points were selected to represent all four Census regions (West, South, Midwest, and Northeast) and three size of place strata (central cities, metropolitan areas outside of central cities, and non-metropolitan areas). Interviewers had quotas to fill based on gender, age, and employment status. They proceeded through neighborhoods in the selected communities until the quotas were completed. The study was designed and carried out by the National Opinion Research Center at the University of Chicago. Besides representing the adult population of the United States on the stratification and quota variables (region, size of place, gender, age, and employment status), the sample is also representative on race and marital status. The sample does underrepresent the less educated segment of the population (less than a high school degree: pilot study 6%, General Social Survey 17%). Interviews were conducted in July/August, 1995. A total of 119 interviews were collected, but two were lost in the mail for a final total of 117. The German pilot study stratified the country by states (Bundeslaender) and city size (cities over 100,000 vs. else). Within these areas interviewers filled quotas based on gender, age, and education. The study was designed and supervised by the Zentrum fuer Umfragen, Methoden, und Analysen, Mannheim, and interviewing was conducted by Infratest Burke Sozialforschung, Munich. The sample closely matches German Census figures on gender, age, and education. Fieldwork was carried out in September, 1995. A total of 221 interviews were conducted. In order to see how the results between two linguistically-similar languages (and two societies with close cultural and historical ties as well) compared to findings from a country with a very dissimilar language (and more remote culture and history in general), the study was later replicated in Japan. 
The Japanese study was carried out by the NHK Broadcasting Culture Research Institute on a national sample of 405 in March, 2001 (Onodera 2002). American Results In the pilot study attempts were made both to assess the intensity that people assigned to particular terms and therefore response categories and to evaluate the meaning of the underlying continuum on which intensity was being measured. First, people were asked to rate the intensity of 27 phrases on a 21-point agree/disagree scale (See Qs A3 and B3 in the Appendix). Item order was randomized by sorting cards
Tom
w. smith et al.
Table 1. Mean scores on agree/disagree terms

Term                          Mean   Standard Deviation
Completely agree              19.4   1.6
Definitely agree              19.0   1.5
Strongly agree                18.8   1.3
Very much agree               18.5   2.2
Agree a lot                   17.2   2.8
Agree                         16.1   2.9
Basically agree               13.8   3.1
Probably agree                13.6   2.9
Tend to agree                 13.5   2.8
Moderately agree              13.3   2.3
Somewhat agree                12.9   2.4
Agree a little                12.1   2.6
In the middle                 10.1   0.7
Neither agree nor disagree     9.9   1.3
Can’t choose                   9.8   2.7
Undecided                      9.6   1.8
Disagree a little              7.1   2.2
Somewhat disagree              6.6   2.1
Moderately disagree            6.4   2.3
Tend to disagree               6.4   2.7
Probably disagree              6.2   3.1
Disagree                       3.5   2.9
Not agree                      3.5   3.1
Disagree a lot                 3.0   3.6
Strongly disagree              1.5   2.2
Very much disagree             1.4   1.6
Definitely disagree            1.0   1.3
Completely disagree            0.8   2.3

N = 97–101
containing the phrases, except for “basically,” which was the first term rated by each respondent. Table 1 gives the means and standard deviations for the terms.6 In terms of magnitude and relative position the terms array themselves almost exactly as one would expect.7 The 11 agree terms run from “agree a little” at 12.1 to “completely agree” at 19.4. The four mid-point or uncertain terms are from 10.1 to 9.6. The 11 disagree terms range from “disagree a little” at 7.1 to “completely disagree” at 0.8. In addition, “not agree” exactly matches “disagree” at 3.5.

Standard deviations follow a wave pattern. They are small near the extremes, increase as intensity moderates, and then decrease to their lowest level for the two mid-point categories (in the middle and neither agree nor disagree). The lower range for the categories near the extremes (strongly agree/disagree) is only partly a function of floor and ceiling effects resulting from respondents rating the terms at or near the end-points. The unbounded end of the range is usually a little smaller than that for broader and more moderate terms. For example, the average upper range for strongly disagree is +1.6 compared to +2.2 for disagree, while the average lower range for strongly agree is –1.9 compared to –2.2 for agree. Thus these terms appear to have more precise and limited meanings not only because of floors and ceilings, but also because their greater intensity narrows people’s understanding of their meaning. The standard deviations narrow for the middle categories because people have a clear and consistent understanding of what the mid-point of a scale is. The uncertain terms “can’t choose” and “undecided” are also placed near the middle, but the standard deviations are a bit higher because some people wanted to rate them as off-scale and gave some different responses such as 0 to try to convey this idea. (In addition, a few more people than for the other terms did not rate these terms for the same reason.)

6 On a scale-by-scale basis, cases that failed to carry out the ratings adequately were excluded from the analysis. This excluded respondents who refused to do items, those with high item non-response, those who could not consistently associate terms with the proper pole, and those showing peculiar response patterns. People were not excluded for a few unusual responses, but for incomplete and erratic responses to the scale as a whole. There were 14 exclusions for Q. 3 (agree/disagree), 5 for Q. 4 (important/unimportant), 10 for Q. 5 (in favor of/against), and 12 for Q. 6 (ranges on agree/disagree). Overall, there were 15 respondents who were excluded for two or more individual scales. Exclusions were significantly associated with interviewer assessments that respondents misunderstood the word rating tasks and that these tasks were difficult. (The interviewer evaluation questions were “How was the respondent’s understanding of the word rating tasks? Completely understood/Mostly understood/Mostly misunderstood/Completely misunderstood” and “How hard were the word rating tasks for the respondent? Very difficult/Somewhat difficult/Somewhat easy/Very easy.”) Exclusions were also higher among the less educated, although only the association with Q. 5 was statistically significant.

7 On the agree/disagree rating scale the questionnaire was handed back to respondents after the question was completed and they were told by the interviewer, “Please look over your answers. If you want to change any of your responses, indicate in the right-hand column, the one headed ‘CHANGES,’ what number you now want to give a phrase.” Respondents to later questions were not given a chance to review their responses, but at any point while a question was being administered a respondent could change a response. Changes were fairly rare: 62.4% made no changes, 17.0% 1–2 changes, 14.6% 3–6 changes, and 6.0% 7+ changes. On average 1.7 changes were made among the 28 phrases rated. Two types of changes were common. First, there were minor upward or downward adjustments to have responses better fit in with other phrases being rated. Second, there were pole corrections when respondents realized they had oriented their response to the wrong end of the scale. These usually resulted in large changes (e.g. from 2 to 18). In almost all cases, the changes moved answers towards the modal response.

Second, a similar exercise was carried out on two important/unimportant scales (Qs. A4 and B4 in the Appendix). As in the case of the agree/disagree scale, order of presentation was randomized by sorting. Table 2 reports the means and standard deviations. One half of the sample rated terms on a unipolar scale measuring degree of importance and the other half on a bipolar scale of important/unimportant. The unipolar scale ran from 19.4 for “extremely important” to 1.4 for “not at all important.” The bipolar scale extended from 19.4 for “extremely important” to 0.8 for “extremely unimportant.” On this scale middle terms were placed very near the mid-point (“in between” = 10.0; “neither important nor unimportant” = 9.5). There were 15 “important” terms that were rated on both scales. In 13 of these cases the terms were rated somewhat higher on the bipolar scale than on the unipolar scale. It appears that on the important/not important scale people adjust terms down towards the not important end of the scale. For example, “neither important nor unimportant” is scored at 9.0 instead of the mid-point of 10.0. This suggests that “unimportant” defines a more extreme position than the lack of importance does. The latter is seen by at least some as indicating the absence of importance rather than the presence of unimportance. Standard deviations are smaller for high terms on both the unipolar and bipolar scales, but the pattern is less clear at the lower end of these scales. “Not important” terms have the largest standard deviations of all terms on the bipolar scale, and on the unipolar scale they have among the largest values. Some negative phrases tend to confuse people in general (Smith 1995), and especially on the bipolar scale people were less sure where to rate these terms vis-a-vis the “unimportant” terms.

Third, Table 3 rates another set of terms, “against/in favor of,” and also carries out an order experiment (Qs. A5 & B5). In terms of the means and standard deviations both orders are similar to each other and to the pattern shown with “agree/disagree” in Table 1. In particular, the means have magnitudes and relative positions as one would expect, and the standard deviations show the same wave pattern of going from small for extreme terms to larger for more moderate, general terms and then smaller for the middle term.
Table 2. Mean scores on important/unimportant terms

                             Important Only   Important/          Combined
                             List             Unimportant List    List
Term                         Mean    SD       Mean    SD          Mean    SD
Important:
  Extremely                  19.4    1.2      19.4    0.9         19.4    1.0
  Very, very                 19.0    2.4      –       –           –       –
  Exceptionally              18.9    2.4      –       –           –       –
  Completely                 18.6    2.9      19.1    1.8         18.8    2.5
  Definitely                 18.5    2.0      18.4    2.0         18.4    2.0
  Highly                     18.2    1.9      –       –           –       –
  Very                       18.2    1.5      18.3    2.6         18.2    2.1
  Quite                      16.8    2.8      –       –           –       –
  IMPORTANT                  15.1    3.6      16.3    2.8         15.5    3.5
  Pretty                     15.0    3.1      15.6    2.6         15.3    2.9
  Probably                   13.0    3.4      14.0    3.1         13.5    3.3
  Fairly                     13.4    3.5      13.9    2.2         13.6    3.0
  Somewhat                   12.2    3.5      13.2    2.5         12.7    3.1
  Slightly                   10.8    3.4      12.0    2.7         11.3    3.2
  A little bit               10.1    4.2      12.2    2.4         11.1    3.7
  Neither imp. nor unimp.     9.0    3.6       9.5    2.4          9.3    3.1
  Not too                     6.8    3.6      –       –           –       –
  Not very                    4.7    3.4       5.5    4.2          5.1    3.8
  Not                         2.4    3.4       4.1    4.4          3.2    4.0
  Not at all                  1.4    3.3       3.0    4.2          2.2    3.8
Unimportant:
  In between                 –       –        10.0    2.2         –       –
  Slightly                   –       –         8.0    2.7         –       –
  A little bit               –       –         7.9    3.0         –       –
  Somewhat                   –       –         6.6    2.7         –       –
  Probably                   –       –         6.1    3.8         –       –
  Fairly                     –       –         5.8    3.2         –       –
  Very                       –       –         5.1    6.7a        –       –
  Pretty                     –       –         4.7    3.1         –       –
  UNIMPORTANT                –       –         3.6    3.9         –       –
  Definitely                 –       –         1.8    3.3         –       –
  Completely                 –       –         1.3    3.6         –       –
  Extremely                  –       –         0.8    1.8         –       –
N                            56–58            51–54               109–112

a This item has a small number of cases coded near the high end of the scale (16–20). These cases create the large standard deviation (6.7) and also make the mean (5.1) much higher than the median (2). In all other cases the mean and median are very close (almost always within +/– 1). Inspection of the cases to see why a high number of pole reversals (i.e. errors of reference) occurred on this item did not reveal any special cause.
Table 3. Ratings of In favor of and Against

                                           Ascending Order     Mixed Order
Term                                       Mean    StdDev      Mean    StdDev
Strongly against                            1.6    2.0          1.3    1.9
Against                                     3.9    2.2          3.5    3.4
Slightly against                            6.9    2.4          7.2    1.9
Neither against nor in favor of             9.6    1.7          9.8    1.4
Slightly in favor of                       12.2    2.3         12.1    1.9
In favor of                                15.9    2.4         15.6    2.8
Strongly in favor of                       18.9    1.3         18.6    1.6

% with all items rated in ascending order  71.7    –           37.0    –
N                                          53                  52–54
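The "% with all items rated in ascending order" row of Table 3 can be illustrated with a minimal sketch. The respondent data below are hypothetical, and treating a tie between adjacent terms as an inconsistency is an assumption made here; the chapter does not say how ties were handled.

```python
from typing import Sequence


def is_ascending(ratings: Sequence[float]) -> bool:
    """Return True when every rating is strictly higher than the previous one.

    Assumption: ties count as inconsistent (the source does not specify).
    """
    return all(low < high for low, high in zip(ratings, ratings[1:]))


def percent_ascending(respondents: Sequence[Sequence[float]]) -> float:
    """Percentage of respondents whose ratings rise across all terms."""
    if not respondents:
        return 0.0
    consistent = sum(is_ascending(r) for r in respondents)
    return 100.0 * consistent / len(respondents)


# Hypothetical ratings on the 0-20 scale, listed in scale order from
# "strongly against" to "strongly in favor of" (not data from the study).
example = [
    [1, 4, 7, 10, 12, 16, 19],  # fully ordered
    [2, 3, 8, 10, 13, 15, 18],  # fully ordered
    [1, 5, 4, 10, 12, 16, 20],  # "slightly against" rated below "against"
    [0, 3, 7, 10, 10, 15, 19],  # tie between two adjacent terms
]

print(percent_ascending(example))
```

Applied separately to the ascending-order and mixed-order samples, a function of this kind would yield the two percentages compared in the order experiment.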
The order experiment did, however, reveal a decided difference in the consistency of ratings. On the version that arranged terms in ascending order from “strongly against” to “strongly in favor of,” as they would be presented as part of a response scale, 72% of people rated all seven terms in ascending order without any inconsistency. On the version that presented the terms in a fixed, unordered sequence, only 37% of respondents rated all seven terms in ascending order. This indicates that presenting the terms in ascending order, as they are presented in actual response scales, provides people with additional information and constrains how people perceive and evaluate the terms. When terms are organized as a scale, people are more likely to perceive and treat them as such.

Fourth, Table 4 shows that the values assigned to terms at both ends of the scales for agree/disagree, important/unimportant, and against/in favor of (Tables 1–3) are highly symmetrical. The first column gives the mean rating for each term when associated with the positive/top end of the scale. The second column gives the rating when used in conjunction with the lower end of the continuum. The third column reverses the numbers in the second column (20 minus the rating) to show what they equal if rated at the opposite end. Comparing the first and third columns shows how similar and symmetrical the ratings are. With one exception, all terms rated at the positive end practically match how they are rated at the negative pole. This indicates that people assign these terms a consistent value regardless of their positive or negative orientation.
Table 4. Symmetry in ratings

A. Ratings of Agree/Disagree (Samples A and B)

Term               Agree    Disagree    20 – Disagree
Completely         19.4     0.8         19.2
Definitely         19.0     1.0         19.0
Strongly           18.8     1.5         18.5
Very much          18.5     1.4         18.6
A lot              17.2     3.0         17.0
AGREE/DISAGREE     16.1     3.5         16.5
Not agree          –        3.5         16.5
Probably           13.6     6.2         13.8
Tend to            13.5     6.4         13.6
Moderately         13.3     6.4         13.6
Somewhat           12.9     6.6         13.4
A little           12.1     7.1         12.9

B. Ratings of Importance/Unimportance (Sample B)

Term                     Important    Unimportant    20 – Unimp.
Extremely                19.4         0.8            19.2
Completely               19.1         1.3            18.7
Definitely               18.4         1.8            18.2
Very                     18.3         5.1a           14.9
IMPORTANT/UNIMPORTANT    16.3         3.6            16.4
Pretty                   15.6         4.7            15.3
Probably                 14.0         6.1            13.9
Fairly                   13.9         5.8            14.2
Somewhat                 13.2         6.6            13.4
A little bit             12.2         7.9            12.1
Slightly                 12.0         8.0            12.0

C. Ratings of In favor of/Against (Samples A and B)

Term                   In favor of    Against    20 – Against
Strongly               18.7           1.5        18.5
In favor of/Against    15.8           3.7        16.3
Slightly               12.2           7.1        12.9

a See note “a” in Table 2.
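The reflection used in the third column of Table 4 (20 minus the rating at the negative pole) can be sketched as follows. The means are copied from Part A of Table 4; the function names and the idea of summarizing symmetry by the largest absolute gap are illustrative choices made here, not part of the original analysis.

```python
SCALE_MAX = 20  # top of the 0-20 rating scale used in the pilot studies

# Mean ratings from Table 4, Part A: term -> (agree version, disagree version)
TABLE_4A = {
    "completely": (19.4, 0.8),
    "definitely": (19.0, 1.0),
    "strongly": (18.8, 1.5),
    "very much": (18.5, 1.4),
    "a lot": (17.2, 3.0),
    "base word": (16.1, 3.5),
    "probably": (13.6, 6.2),
    "tend to": (13.5, 6.4),
    "moderately": (13.3, 6.4),
    "somewhat": (12.9, 6.6),
    "a little": (12.1, 7.1),
}


def reflect(score: float, scale_max: float = SCALE_MAX) -> float:
    """Map a rating at the negative pole onto the positive pole."""
    return scale_max - score


def asymmetry(positive: float, negative: float) -> float:
    """Positive-pole mean minus the reflected negative-pole mean."""
    return round(positive - reflect(negative), 1)


gaps = {term: asymmetry(pos, neg) for term, (pos, neg) in TABLE_4A.items()}
largest = max(gaps.values(), key=abs)

for term, gap in gaps.items():
    print(f"{term:12s} {gap:+.1f}")
print("largest absolute gap:", abs(largest))
```

A gap of zero means the modifier carries the same intensity at both poles; small gaps throughout are what the text describes as highly symmetrical ratings.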
Table 5. Comparisons across rating scales

A. Agree/Disagree and Important/Unimportant (Sample B)

Term                    Agree    Important    Disagree    Unimportant
Completely              19.4     19.1         0.8         1.3
Definitely              18.9     18.4         1.2         1.8
Very much/very          18.2     18.3         1.4         5.1a
BASE WORD               15.6     16.6         3.7         3.6
Probably                13.5     14.0         6.5         6.1
Somewhat                12.5     13.2         6.6         6.6
A little/a little bit   12.1     12.2         7.1         7.9

B. Agree/Disagree and In favor of/Against (Samples A and B)

Term                              Agree    In favor of    Disagree    Against
Strongly                          18.8     18.7           1.5         1.5
BASE WORD                         16.1     15.8           3.5         3.7
Neither Agree/In favor or . . .    9.9      9.7           –           –

a See note “a” in Table 2.
Fifth, Table 5 shows that terms are also rated in a highly similar manner when the underlying continuum varies. Part A indicates that terms rated on the agree/disagree and important/unimportant scales have highly consistent values. Part B reveals that terms rated on agree/disagree and in favor of/against are also quite similar. Along with the results from Table 4, this indicates that ratings are robust and that terms probably have similar intensities across various scales.

Sixth, the ratings of scales are also quite stable across sub-groups. Subgroup differences were examined for all items rated in the agree/disagree and important/unimportant scales. Differences by gender, age, education, and race were examined. While a few statistically significant results emerged, there were no consistent differences either across samples or demographics. Education showed the most significant differences (6 of 43 Pearson’s correlations), but only one significant one-way analysis of variance. The education effects that do appear seem to be related to the greater difficulty of less educated respondents in carrying out the rating task, rather than to systematic differences in the meaning of terms.

Finally, Table 6 examines intra- and inter-respondent variability in the rating of terms. Intra-respondent variability was measured by selecting eight terms rated on the agree/disagree scale and reminding people
Table 6. Range of acceptable values

Term                          Mean Range(a)   StdDev(b)
Strongly agree                2.6             1.3
Basically agree               4.0             3.1
Agree                         4.1             2.9
Neither agree nor disagree    1.6             1.3
Can’t choose                  1.6             2.7
Disagree                      4.1             3.5
Not agree                     3.9             3.1
Strongly disagree             2.6             2.1

a Difference between high and low limits in Q. 6
b Standard deviation of items in Q. 3
what score they had assigned to the terms. Next, people were asked what was the lowest value they would accept for the term and what was the highest (Qs. A6 & B6). If they thought that no variation from their earlier assignment was acceptable, then that same value was entered as the minimum and maximum score for the term and the acceptable range was 0. The first column shows the mean interval between the top and bottom values. First, these values follow the wave pattern described earlier for the standard deviations (which are presented in the second column for comparison). Acceptable ranges are narrow at the extremes and at the middle and widest between the middle and extremes. Second, the ranges are almost perfectly symmetrical, with strongly agree/strongly disagree and agree/disagree showing the same means. Third, most people see these terms as somewhat malleable. They do not believe that the terms have only a precise and invariant value (like agree = 16.1), but see terms covering a range of values (e.g. 14–18).

Next, assessments were made of the meaning of the underlying dimension on which the above terms were arrayed. First, the similarity between different pairs of words was examined. In Table 7 five pairs of words were compared with the pair “agree/disagree” (Qs. A8 & B8). People evaluated how similar the “agree/disagree” pair was to each of the other pairs. “For/against” and “favor/oppose” were considered to be the most similar, “positive/negative” the next closest, and “like/dislike” and “important/unimportant” the least alike. This indicates that “agree/disagree,” “for/against,” and “favor/oppose” come closest to tapping a similar underlying dimension, while the other pairs define more distinct continuums.
Table 7. Closeness of the meaning of various pairs of words to Agree/Disagree

Pair                       Mean(a)   % Very much the same
For/against                2.4       34.7
Important/unimportant      3.2        8.9
Like/dislike               2.9        8.8
Favor/Oppose               2.3       29.7
Positive/negative          2.9       14.9

n = 101

a Response scale ran from 1 = Very much the same to 4 = Very much different. Lower number indicates pairs are closer.
Table 8. Terms used in the definition of agree

Term                                     Mentions
Accept, acceptance                        6
Accord, accordance                        2
Against (not)                             1
Agree, agreeable, agreeing, agreement    13
Alike                                     1
Approve                                   3
Congenial                                 1
Consensus, consent                        4
Disagree (not)                            1
Favor                                     6
For                                       6
Harmony                                   2
In line with                              2
Like, liking                              3
Mutual                                    1
OK                                        2
Same                                     16
Similar                                   1
Support                                   2
True                                      2
Valid                                     1
Then, the similarity of other terms to those used in the “agree/disagree” (“agree,” “neither agree nor disagree,” and “disagree”) and “important/unimportant” (“important” and “unimportant”) dimensions was assessed by an open-ended item that asks people to define these terms.8

8 The definition tasks were found to be fairly hard by many people. A number of interviewers noted in the evaluation section that particular people had problems expressing themselves and often used the word itself as part of the definition.
Table 8 lists the terms offered to define “agree.” The list basically includes synonyms along with repetitions of “agree” itself. The use of this list will be discussed in the comparative section below.

America and Germany Compared

The preceding analysis indicates how useful the evaluations of the response terms are for understanding response scales in general. Here the use of this information for comparing scales in two countries and languages is considered. Tables 9 and 10 show that overall there is a high correspondence between the agree/disagree and important/unimportant scales in the United States and their counterparts in Germany. Table 9 presents the mean ratings for agree/disagree and the two German counterparts, stimme zu/lehne ab and stimme zu/stimme nicht zu. The American scores correlate almost as highly with both German scales (respectively r = 0.993 and 0.986) as the two German scales associate with each other (r = 0.995), and most means are close and not statistically different from one another (Mohler et al. 1997).

Despite this extremely high correlation and the general correspondence in scale scores, there are some important differences in the mean values. First, the base words (e.g. agree, stimme zu, disagree, lehne ab, etc.) have more extreme meanings in German than in English. For example, agree is 16.1 in English and stimme zu 17.4–17.5 in German. Semantically “lehne ab” (from “ablehnen”) is more like “to reject (e.g. an idea)” than “disagree”; German lacks a verb directly corresponding to “disagree.” Second, “definitely” is a stronger term in English than “bestimmt” is in German. Third, while “strongly” is a weaker term in English than either “completely” or “definitely,” this does not appear to be the case in German, where “voll und ganz” shows up as the strongest German term. But this discrepancy really stems from using “voll und ganz” to stand for “strongly.” It means literally “fully and wholly,” and as such it is not unexpected that it is rated more highly. In retrospect, perhaps the German phrase “stimme stark zu” should have been used to match “strongly agree.” Fourth, while “a lot” is an intensifier in English both in terms of semantics and its performance in this study, “ziemlich” in German is more complicated. In terms of semantics “ziemlich” is a middle to high intensifier depending on context. “Ziemlich” X can mean X to a considerable, but not extreme, degree, or it can mean very X. There is a direct and literal way of expressing very X in German (“sehr” X). In selecting “ziemlich” X with the intention of meaning very X, one
Table 9. American/German scores on Agree/Disagree, Stimme zu/Lehne ab, and Stimme zu/Stimme nicht zu

                                                     America        Germany           Germany
                                                                    Stimme zu/        Stimme zu/
                                                                    Lehne ab          Stimme nicht zu
English/German                                       Means   SD     Means   SD        Means   SD
Agree/Stimme zu:
  Completely/Voellig                                 19.4    1.2    19.3    1.8       19.3    1.4
  Definitely/Bestimmt                                19.0    1.5    17.9    2.5       17.6    3.1
  Strongly/Voll und ganz                             18.8    1.3    19.7    1.4       19.8    0.7
  Very much/Sehr                                     18.5    2.2    17.6    2.8       18.3    2.1
  A Lot/Ziemlich                                     17.2    2.8    16.0    2.2       16.4    2.6
  AGREE/STIMME ZU                                    16.1    2.9    17.5    2.5       17.4    2.8
  Basically/Im grunde                                13.8    3.1    14.4    2.9       14.6    3.5
  Probably/Wahrscheinlich                            13.6    2.9    13.8    2.8       14.0    3.1
  Tend to/Eher                                       13.5    2.8    13.8    2.5       13.8    3.1
  Moderately/Maessig                                 13.3    2.3    12.3    2.4       10.4    3.9
  Somewhat/Teilweise                                 12.9    2.4    13.3    2.5       12.8    2.5
  A little/Ein bisschen                              12.1    2.6    12.5    2.5       11.7    3.5
Middle/Mitte:
  In the middle/In der mitte                         10.1    0.7    10.0    1.3        9.9    1.3
  Neither agree nor disagree/
    Stimme weder zu noch lehne ab/
    Stimme weder zu noch nicht zu                     9.9    1.3     9.7    1.5        9.6    2.4
  Can’t Choose/Kann ich nicht sagen                   9.8    2.7     9.5    2.5        8.5    3.7
  Undecided/Unentschieden                             9.6    1.8    10.0    1.1       10.0    0.5
Disagree/Lehne ab/Stimme nicht zu:
  A little/Ein bisschen                               7.1    2.2     6.7    2.4       –       –
  Somewhat/Teilweise/Zum teil nicht                   6.6    2.1     6.8    2.5        7.6    3.1
  Moderately/Maessig                                  6.4    2.3     6.6    3.1       –       –
  Tend to/Eher                                        6.4    2.7     5.9    2.4        6.0    3.1
  Probably/Wahrscheinlich                             6.2    3.1     6.1    2.6        4.9    2.5
  DISAGREE/LEHNE AB                                   3.5    2.9     2.9    2.6        1.2    2.4
  Not agree/STIMME NICHT ZU                           3.5    3.1     3.5    3.6        1.8    2.2
  A lot/Ziemlich/Ueberwiegend nicht                   3.0    3.6     4.1    2.6        4.4    3.1
  Strongly/Stark/Ueberhaupt nicht                     1.5    2.2     1.7    2.7        0.4    0.9
  Very much/Sehr/Entschieden nicht                    1.4    1.6     2.0    2.6        1.2    2.9
  Definitely/Bestimmt                                 1.0    1.3     2.7    3.0        1.7    2.8
  Completely/Voellig/Ganz und gar nicht               0.8    2.3     1.1    2.6        0.6    1.6
Table 10. American/German scores on Important/Unimportant and Wichtig/Unwichtig

                                          America         Germany
English/German                            Means   SD      Means   SD
Important/Wichtig:
  Extremely/Aeusserst                     19.4    1.2     18.6    3.0
  Completely/Voellig                      18.8    2.5     18.8    3.0
  Definitely/Bestimmt                     18.4    2.0     16.5    2.6
  Very/Ganz                               18.2    2.1     18.4    2.9
  IMPORTANT/WICHTIG                       15.5    3.5     16.4    2.8
  Pretty/Schon ziemlich                   15.3    2.9     15.7    2.7
  Probably/Wahrscheinlich                 13.5    3.3     13.0    3.5
  Fairly/Einigermassen                    13.6    3.0     11.9    3.9
  Somewhat/Teilweise                      12.7    3.1     12.9    2.8
  A little bit/Ein wenig                  11.1    3.7     11.0    3.5
  Not very/Nicht sehr                      5.1    3.8      5.8    3.7
  Not/Nicht                                3.2    4.0      2.6    3.4
  Not at all/Ueberhaupt nicht              2.2    3.8      1.5    2.8
Middle:
  Neither important nor unimportant/
    Weder wichtig noch unwichtig           9.3    3.1      9.7    2.3
  In between/Dazwischen                   10.0    2.2     10.1    1.9
Unimportant/Unwichtig:
  A little bit/Ein wenig                   7.9    3.0      7.0    3.2
  Somewhat/Teilweise                       6.6    2.7      6.6    2.6
  Probably/Wahrscheinlich                  6.1    3.8      5.3    3.1
  Fairly/Einigermassen                     5.8    3.2      5.2    3.1
  Very/Sehr                                5.1    6.7      1.5    3.7
  Pretty/Schon ziemlich                    4.7    3.1      4.8    4.6
  UNIMPORTANT/UNWICHTIG                    3.6    3.9      2.2    3.4
  Definitely/Bestimmt                      1.8    3.3      2.6    3.1
  Completely/Voellig                       1.3    3.6      1.1    3.0
  Extremely/Aeusserst                      0.8    1.8      0.9    2.8
opts for stylistic understatement. In literal terms, however, “ziemlich” remains less intense than “very” or “a lot.” As a result, one can understand it as meaning less than “very” and thus more moderate. Thus, if “stimme zu” is taken as an absolute, “stimme ziemlich zu” can be understood as less than the absolute, meaning something like a good bit of agreement, but not all. This apparently is what many German respondents did, since empirically it acted as a deintensifier. In the ZUMA survey 58–85% of respondents rated it less strongly than they rated the base words (stimme zu, lehne ab, stimme nicht zu). It may be that “ziemlich” is not an appropriate translation of “a lot.”

Table 10 shows the mean scores for important/unimportant and wichtig/unwichtig. As before, the cross-national scale scores correlate very strongly (r = 0.987) and most means are quite close. But again there are some notable differences. First, as in previous comparisons, the base words are stronger in German than in English (e.g. unimportant = 3.6 and unwichtig = 2.2). Second, “definitely” again shows up as stronger than “bestimmt.” The difference in the intensity of base terms may be a general difference between English and German. The pattern appears not only for agree/disagree and important/unimportant (see Tables 9 & 10), but also for in favor of/against.

Next, Table 11 shows the frequency of English terms used to define “agree” and German terms used to define “stimme zu.” The next step in the analysis is to take each English term and translate it into German and each German term and translate it into English. “Agree” and “stimme zu” will be judged to mean the same thing to the extent that a) the German terms offered as meaning “stimme zu” match the German terms translated from English terms used to define “agree” and b) the relative frequencies of these terms are similar. Perfect correspondence would involve only matched terms appearing and in the same proportions. While the detailed analysis has not been carried out, the terms in Table 11 clearly show both much overlap and some distinctions. For example, “bejahe,” etc. means “accept” or “give consent,” both of which appear among the English terms, and translations of “approve,” etc. include “zustimmung,” which is among the German terms offered. But “positiv” in German is equivalent to “positive” in English, and this term is not mentioned in the American survey.
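The cross-national correlations reported for Tables 9 and 10 are ordinary Pearson correlations between columns of mean ratings. As a rough sketch, the snippet below computes one from the American means and the stimme zu/lehne ab means as read from Table 9. The row alignment of the German column for the middle and disagree terms is an assumption based on this reading of the printed table, so the result should be treated as illustrative rather than as a reproduction of the published coefficient.

```python
import math

# American means (Table 1 / Table 9), in table order, 28 terms.
US = [19.4, 19.0, 18.8, 18.5, 17.2, 16.1, 13.8, 13.6, 13.5, 13.3, 12.9, 12.1,
      10.1, 9.9, 9.8, 9.6,
      7.1, 6.6, 6.4, 6.4, 6.2, 3.5, 3.5, 3.0, 1.5, 1.4, 1.0, 0.8]

# German stimme zu/lehne ab means, same row order (assumed alignment).
DE = [19.3, 17.9, 19.7, 17.6, 16.0, 17.5, 14.4, 13.8, 13.8, 12.3, 13.3, 12.5,
      10.0, 9.7, 9.5, 10.0,
      6.7, 6.8, 6.6, 5.9, 6.1, 2.9, 3.5, 4.1, 1.7, 2.0, 2.7, 1.1]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


r = pearson_r(US, DE)
print(f"r = {r:.3f}")
```

Because a correlation of means is insensitive to shifts in level, a very high r can coexist with the systematic differences in base-word intensity discussed in the text, which is why the means themselves are also compared.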
Table 11. Terms used in the definition of Agree and Stimme zu

A. Agree

Term                                     Mentions
Accept, acceptance                        6
Accord, accordance                        2
Against (not)                             1
Agree, agreeable, agreeing, agreement    13
Alike                                     1
Approve                                   3
Congenial                                 1
Consensus, consent                        4
Disagree (not)                            1
Favor                                     6
For                                       6
Harmony                                   2
In line with                              2
Like, liking                              3
Mutual                                    1
OK                                        2
Same                                     16
Similar                                   1
Support                                   2
True                                      2
Valid                                     1

B. Stimme zu

Term                                                          Mentions
Akzeptieren, akzeptabel, akzeptiere                            3
Anerkannen                                                     1
Befueworte, befueworten, befuewortung                          4
Bejahe, bejahen, bejahung                                      8
Dafuer                                                        42
Einverstanden, Einverstandnis, einverstandniserkaerung        55
Gleiche, gleichen, gleicher                                   23
Grosse                                                         1
Grund, grunde                                                  4
Gut                                                            8
Identisch                                                      1
Positiv, positive                                              7
Richtig, richtige, richtigkeit                                 9
Selbe, selben, selber                                          6
Voll, volle, volles                                           11
Uebereinstimmen, uebereinstimmung                              7
Ueberzeugt                                                     9
Ueberzeugung                                                   3
Zustimme, zustimmen, zustimmung                               14
Zutreffend                                                     1
Zuveriaessig                                                   1
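The matching criterion described for Table 11 (matched terms appearing, and in similar proportions) has not been carried out in the chapter, but one way it might be operationalized is sketched below. The overlap score used here (the sum of the smaller of the two relative frequencies for each term) is an illustrative choice, not the authors' method. The counts come from Table 11; the pairings of “accept” with “bejahen” and of “approve” with “zustimmen” follow the text, while the pairings of “for” with “dafuer” and “same” with “gleiche” are assumptions made only for this example.

```python
from collections import Counter


def proportions(counts):
    """Convert raw mention counts into relative frequencies."""
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def overlap(source_counts, target_counts, translation):
    """Share of the two frequency distributions that coincides (0 to 1).

    Source terms are first mapped into the target language; terms without
    a translation are kept under a placeholder key so they never match.
    """
    translated = Counter()
    for term, n in source_counts.items():
        translated[translation.get(term, "[untranslated] " + term)] += n
    p = proportions(translated)
    q = proportions(target_counts)
    return sum(min(p.get(t, 0.0), q.get(t, 0.0)) for t in set(p) | set(q))


# Subset of Table 11 counts (not the full lists).
english = {"accept": 6, "approve": 3, "for": 6, "same": 16}
german = {"bejahen": 8, "zustimmen": 14, "dafuer": 42, "gleiche": 23, "positiv": 7}

# Illustrative English-to-German mapping; the last two pairs are assumptions.
en_to_de = {"accept": "bejahen", "approve": "zustimmen",
            "for": "dafuer", "same": "gleiche"}

score = overlap(english, german, en_to_de)
print(round(score, 3))
```

A score of 1 would correspond to the “perfect correspondence” described in the text (only matched terms, in identical proportions), and a score of 0 to no shared terms at all. A term such as “positiv,” offered only by German respondents, lowers the score.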
America and Japan Compared

The Japanese instrument was based on a translation of the American instrument. The German questionnaire was not consulted. The larger differences between Japanese and English compared to English and German, and the greater challenge of coming up with equivalent terms for testing, were apparent from the start. Two bilingual translators, one a native Japanese speaker and the other a native English speaker, independently translated the 50-some agree/disagree and importance terms to be rated. They came up with only one translation that was identical (although many were similar). Moreover, of the 28 terms used for agree/disagree, the final terms used in the study agreed with either of the initial translations in only 11 cases (Onodera 2002).9

Overall, there was high agreement on the ranking of terms between English and Japanese on both the agree/disagree and importance scales (Tables 12 and 13). There was also usually close agreement on the absolute scores given. However, absolute differences were somewhat larger than for the English-German comparisons. On agree/disagree 9 of 28 terms differed by one point or more for English and Japanese, but only 6 of 28 differed by that much for English and German. Similarly, on importance 12 of 26 terms were separated by a point or more for English/Japanese, while only 4 of 25 varied by this much for English/German. The main differences were that 1) the more intense terms on agree/disagree were always rated more towards the endpoints in English than the Japanese terms were (this was also true, but to a lesser extent, for the importance terms), 2) there were closer ratings toward the high end of the importance scale and less close ratings towards the low end, and 3) while there was close agreement on the mid-point location of “dochira to mo ieru/in the middle” and “dochira to mo ienai/neither agree nor disagree,” there was less agreement on “wakaranai/can’t choose.”10 Given the linguistic gulf between English and Japanese, the matching of terms was notable. But where differentiation occurred, further work on comparable translation and scaling is indicated.

9 Also, there is the matter as to how closely the base terms agreed. The Japanese terms used for agree/disagree are not literal equivalents, but come closer to the notion of “thinking X is the case/thinking X is not the case.”

10 On “wakaranai/can’t choose” 15–19% of Japanese respondents gave no rating (left blank), apparently thinking of the items as off-scale, and a number of people gave ratings of zero, perhaps trying to convey the same idea (Onodera 2002).
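The tallies of terms that differ "by one point or more" amount to counting absolute differences between paired means. A minimal sketch follows, using only a handful of rows taken from Table 12. Because the published means are rounded to one decimal, a count made from the printed tables need not reproduce the figures quoted in the text; the small tolerance in the comparison is an implementation choice made here to guard against floating-point noise.

```python
def count_large_gaps(pairs, threshold=1.0, tol=1e-9):
    """Number of terms whose two means differ by at least `threshold`."""
    return sum(1 for a, b in pairs.values() if abs(a - b) >= threshold - tol)


# A few (English, Japanese) mean pairs from Table 12; a subset only.
subset = {
    "completely agree": (19.4, 17.7),
    "definitely agree": (19.0, 18.8),
    "agree": (16.1, 15.7),
    "can't choose": (9.8, 8.3),
    "disagree": (3.5, 3.7),
    "completely disagree": (0.8, 2.1),
}

print(count_large_gaps(subset), "of", len(subset), "terms differ by a point or more")
```

In this subset the larger gaps fall on the most intense terms and on “can’t choose,” in line with the pattern the text describes.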
Table 12. American/Japanese scores on Agree/Disagree and Sou Omou/Sou Omowanai

                                                     (Means)
English/Japanese                                     English   Japanese
Agree/Sou Omou:
  Completely/Mattaku                                 19.4      17.7
  Definitely/Zettai                                  19.0      18.8
  Strongly/Hijou ni                                  18.8      17.7
  Strongly/Tsuyoku                                   18.8      17.7
  Very much/Ooi ni                                   18.5      17.1
  A lot/Kanari                                       17.2      16.4
  AGREE/SOU OMOU                                     16.1      15.7
  Basically/Kihonteki ni wa                          13.8      14.7
  Probably/Osoraku sou da to omou                    13.6      14.4
  Tend to/Dochira ka to ieba                         13.5      13.5
  Moderately/Aru teido                               13.3      13.8
  Somewhat/Tashou                                    12.9      12.8
  A little/Yaya                                      12.1      13.2
Middle/Dochira to mo Ieru:
  In the middle/Dochira to mo ieru                   10.1      10.4
  Neither agree nor disagree/Dochira to mo ienai      9.9      10.0
  Can’t Choose/Wakaranai                              9.8       8.3
  Undecided/Kimerarenai                               9.6       9.2
Disagree/Sou Omowanai:
  A little/Yaya                                       7.1       6.2
  Somewhat/Tashou                                     6.6       6.7
  Moderately/Amari                                    6.4       6.0
  Tend to/Dochira ka to ieba                          6.4       6.4
  Probably/Osoraku sou de wa nai to omou              6.2       5.8
  DISAGREE/SOU OMOWANAI                               3.5       3.7
  Not agree/Sou Omowanai                              3.5       3.7
  A lot/Kanari                                        3.0       3.7
  Strongly/Kesshite sou wa omowanai                   1.5       2.7
  Very much/Zenzen                                    1.4       2.1
  Definitely/Zettai                                   1.0       1.4
  Completely/Mattaku                                  0.8       2.1
Table 13. American/Japanese scores on Important/Unimportant and Juuyou Dearu/Juuyou De Wa Nai

                                                        (Means)
English/Japanese                                        English   Japanese
Important/Juuyou Dearu:
  Extremely/Kiwamete                                    19.4      18.1
  Completely/Mattaku                                    19.1      18.0
  Definitely/Zettai                                     18.4      18.9
  Very/Hijou ni                                         18.3      18.4
  IMPORTANT/JUUYOU DEARU                                16.3      16.8
  Pretty/Kanari                                         15.6      17.3
  Probably/Dochira ka to ieba                           14.0      13.2
  Fairly/Aru teido                                      13.9      13.9
  Somewhat/Tashou wa                                    13.2      13.4
  Slightly/Yaya                                         12.0      13.4
  A little bit/Maa                                      12.2      13.6
Middle/Dochira To Mo Ieru:
  Neither important nor unimportant/Dochira to mo ienai 10.0       9.9
  In between/Dochira to mo ieru                          9.5      10.0
Unimportant/Juuyou De Wa Nai:
  Slightly/Yaya                                          8.0       6.3
  A little bit/Maa                                       7.9       5.4
  Somewhat/Sorehodo                                      6.6       5.9
  Probably/Dochira ka to ieba                            6.1       6.5
  Fairly/Amari                                           5.8       5.0
  Not very important/Taishite                            5.5       5.0
  Very/Sukoshi mo                                        5.1       3.0
  Pretty/Hotondo                                         4.7       2.8
  Not Important/JUUYOU DE WA NAI                         4.1       2.5
  UNIMPORTANT/JUUYOU DE WA NAI                           3.6       2.5
  Not at all/Zenzen                                      3.0       1.6
  Definitely/Zettai                                      1.8       1.3
  Completely/Mattaku                                     1.3       1.5
  Extremely/Kesshite                                     0.8       2.4

Note: Except as indicated the American means are from the important/unimportant column in Table 2. As Table 2 shows, the English tried two formulations: important and unimportant, and important and not important. Japanese does not have two ways of expressing the presence/absence of importance comparable to the English. Slightly different results emerge if the other English scale or the combined scales are used.
Using Pilot Results to Formulate Response Scales

In an ideal world, a five-point response scale would mark the lowest value, the exact midpoint, the highest value, and two intermediate values, with equal intervals between the points. In the case of this study, the highest agreement value should be near 20, the intermediate agreement value near 15, the midpoint at 10, the intermediate disagreement value near 5, and the highest disagreement value at 0. Moreover, this should be the case for all countries and cultures (i.e. full agreement would represent ‘20’ in all cultures, intermediate agreement ‘15’, and so on). But as these pilot studies indicate, real-world response scales are not ideally calibrated in this way. The studies tested response categories outside a survey context, placing them in the unusual context of modifier comparison. As suggested later, respondents answering real survey questions may well fix the unbalanced scales and ‘transform’ them into well-calibrated scales. But this is something that will require further research. In the meantime, the pilot studies suggest what the next steps may be.

As one can see from Table 14, response scale terms used in America, Germany, and Japan do not form an ideally balanced scale. The midpoint is the only answer category fitting optimal response scale properties across all three countries. There is, for example, a difference in the highest agreement value as well as the expected compression of the Japanese scale. On the other hand, one could be satisfied with the results because the end-points in all three countries are at a marked distance from the midpoint. At the same time, the intermediate points raise some questions. The American results indicate that the “disagree” category (3.5) is nearer to “strongly disagree” (1.5) than is ideal, leaving a substantial gap between “disagree” and “neither agree nor disagree” (9.9). Similar observations can be made for Germany and Japan.
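The comparison against the ideal calibration can be made explicit with a few lines of arithmetic. The following is a minimal sketch, not part of the original study: it takes the means reported in Table 14 for the five substantive categories and computes the gaps between adjacent categories and the deviation of each category from the ideal values of 20, 15, 10, 5, and 0. The function and variable names are illustrative assumptions.

```python
# Means on the 0-20 rating scale for the five ISSP agree/disagree categories,
# copied from Table 14 (strongly agree ... strongly disagree).
TABLE_14 = {
    "Germany": [19.87, 19.05, 9.77, 2.41, 1.21],
    "USA": [18.80, 16.00, 9.90, 3.50, 1.50],
    "Japan": [17.70, 15.70, 10.00, 3.70, 2.70],
}

# Ideal positions of a perfectly calibrated five-point scale, as described in the text.
IDEAL = [20.0, 15.0, 10.0, 5.0, 0.0]


def intervals(scores):
    """Distance between each pair of adjacent categories (ideal: 5 each)."""
    return [round(a - b, 2) for a, b in zip(scores, scores[1:])]


def deviations(scores, ideal=IDEAL):
    """Signed difference between each observed mean and its ideal position."""
    return [round(s - t, 2) for s, t in zip(scores, ideal)]


def calibration_report(table=TABLE_14):
    """Collect intervals and deviations per country (illustrative helper)."""
    return {
        country: {"intervals": intervals(s), "deviations": deviations(s)}
        for country, s in table.items()
    }


if __name__ == "__main__":
    for country, result in calibration_report().items():
        print(country, result)
```

For the American scale, the intervals come out as 2.8, 6.1, 6.4, and 2.0 rather than four equal steps of 5, which is the pattern of crowded end categories and a wide gap around the midpoint noted above.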
Thus the pilot studies suggest that respondents in each country rate differently expressions that are assumed at face value to “match” across languages. As an alternative, one could assemble a different set of answer categories from the pilot lists, as illustrated in Table 15. The categories presented there are spread more evenly over the 0–20 range of possible values. They use the highest and lowest categories in each country (thus reducing the difference between the highest agreement in America and Japan from 1.1 to only .2). They are closer at the 15 and 5 intermediate points, and they use the lowest agreement level available (although a notable difference remains between Germany and America on the one hand and Japan on the other).
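One way to picture how a better-matched set of categories might be assembled is a simple nearest-to-target rule. The sketch below is an assumption made for illustration and is not the procedure the authors used: for each ideal scale position it picks the term whose American and Japanese means (a subset copied from Table 12) lie jointly closest to that position. German means are left out because the full German pilot list is not reproduced in this section. All names in the code are hypothetical.

```python
# (American mean, Japanese mean) on the 0-20 scale for a subset of the
# agree/disagree terms in Table 12. Keys are the English phrasings.
PILOT_MEANS = {
    "completely agree": (19.4, 17.7),
    "definitely agree": (19.0, 18.8),
    "strongly agree": (18.8, 17.7),
    "agree a lot": (17.2, 16.4),
    "agree": (16.1, 15.7),
    "basically agree": (13.8, 14.7),
    "probably agree": (13.6, 14.4),
    "in the middle": (10.1, 10.4),
    "neither agree nor disagree": (9.9, 10.0),
    "moderately disagree": (6.4, 6.0),
    "probably disagree": (6.2, 5.8),
    "disagree": (3.5, 3.7),
    "strongly disagree": (1.5, 2.7),
    "definitely disagree": (1.0, 1.4),
    "completely disagree": (0.8, 2.1),
}


def misfit(means, target):
    """Summed absolute distance of the country means from a target position."""
    return sum(abs(m - target) for m in means)


def best_term(target, terms=PILOT_MEANS):
    """Term whose country means are jointly closest to the target (assumed rule)."""
    return min(terms, key=lambda term: misfit(terms[term], target))


def build_scale(targets=(20, 15, 10, 5, 0)):
    """One candidate term per ideal scale position, from agree to disagree."""
    return [best_term(t) for t in targets]


if __name__ == "__main__":
    for target, term in zip((20, 15, 10, 5, 0), build_scale()):
        print(target, term, PILOT_MEANS[term])
```

Under this rule the two-country data select “definitely agree”, “basically agree”, “neither agree nor disagree”, and “definitely disagree”, as in Table 15, but “probably disagree” rather than “moderately disagree” at the lower intermediate point. The difference is a reminder that the outcome depends on the matching rule and on which countries are included.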
Table 14. Means of response categories commonly used in ISSP surveys

Item IDs   German Expressions             American Expressions         Japanese Expressions        Mean      Mean    Mean
D/US                                                                                                Germany   USA     Japan

A20/v      Stimme voll und ganz zu        Strongly agree               Hijouni sou omou            19.87     18.80   17.70
A16/b      Stimme zu                      Agree                        Sou omou                    19.05     16.00   15.70
A4/p       Stimme weder zu noch lehne ab  Neither agree nor disagree   Dochira to mo ienai          9.77      9.90   10.00
A3/j       Lehne ab                       Disagree                     Sou omowanai                 2.41      3.50    3.70
A5/w       Lehne stark ab                 Strongly disagree            Kesshite sou wa omowanai     1.21      1.50    2.70
A9/e       Kann ich nicht sagen           Can’t choose                 Wakaranai                    9.42      9.80    8.30
Table 15. Means of best matched response categories from pilot studies

Item IDs   German Expressions             American Expressions         Japanese Expressions        Mean      Mean    Mean
D/US                                                                                                Germany   USA     Japan

A17/h      Stimme bestimmt zu             Definitely agree             Zettai                      19.22     19.00   18.80
A1/a       Stimme im Grunde zu            Basically agree              Kihonteki ni wa             14.93     13.80   14.70
A4/p       Stimme weder zu noch lehne ab  Neither agree nor disagree   Dochira to mo ienai          9.77      9.90   10.00
A21/o      Lehne maessig ab               Moderately disagree          Amari                        6.63      6.40    6.00
A2/i       Lehne stark ab                 Definitely disagree          Zettai/Zenzen/Mattaku        1.21      1.00    2.10
Whether such a response scale would achieve better measurement than standard response scales would need to be tested.

Implications and Future Research

First, the comparison of American and German (and to a lesser extent American and Japanese) results on the agree/disagree and important/unimportant scales indicates a close, but not perfect, correspondence between the scale terms in general and, in particular, between the terms used in prior ISSP scales (e.g. the five-point agree/disagree scale). Some scale disparities do exist, and the above rating scores could be used to suggest the use of alternative terms in future response scales
Table 16. Mean correlations (Pearson’s r) using raw and adjusted response values for Agree/Disagree ISSP questions

A. Religion                          Raw     Adjusted
Mean Inter-item Correlations         .169    .160
Mean Correlation with
  Gender                             .069    .071
  Age                                .127    .123
  Years of Education                 .153    .147
  Highest Educational Degree         .141    .136
  Church Attendance                  .219    .215

B. Environment                       Raw     Adjusted
Mean Inter-item Correlations         .148    .130
Mean Correlation with
  Gender                             .088    .072
  Age                                .094    .089
  Years of Education                 .181    .162
  Highest Educational Degree         .177    .156
  Church Attendance                  .076    .073

Source: ISSP, 1991 and 1993.
or the adjustment of past scales according to their position on the underlying continuum.

Regarding the latter, attitudinal scales are often used in analysis as if they were interval scales with equal distances between each response. For example, a five-point agree/disagree scale will be used in analysis with the response points assigned values of one to five. But the above analysis indicates that the response points do not have equal intervals between them. For example, the scores on the American five-point agree/disagree scale are 18.8, 16.0, 9.9, 3.5, and 1.5, and the intervals are 2.8, 6.1, 6.4, and 2.0. To estimate the impact of these miscalibrations, 16 agree/disagree items on the 1991 ISSP religion module and 18 agree/disagree items on the 1993 ISSP environment module were correlated with one another and with five demographics (gender, age, years of education, highest educational degree, and frequency of church attendance) using both the raw 1–5 scale and the 18.8–1.5 scale. Overall, there was little difference between the raw and adjusted correlations (Table 16). What impact there is takes the form of slightly lower adjusted correlations. This may mean that the raw scale scores apply more regularity
to attitudes than really prevails, so that the adjusted figures show the marginally lower and truer associations. Alternatively, when presented with the terms as a response scale in the context of a survey, people may assign them equal distances and shift from scale-independent evaluations of the response terms to more ordered, scale-dependent assessments. This would mean that the scale adjustment would be less than optimal, since respondents had already self-adjusted their responses. Even if the placement of terms in a scale tends to establish order and distance more firmly than when the individual terms are used independently, the utilization of terms that naturally represent the proper intervals should facilitate creation of an optimal response scale.

Second, these results offer some tentative ideas about what kinds of scales might produce more equivalent cross-national comparisons. Symmetrical, bipolar scales with an explicit middle point are probably best for cross-national scales. First, people have a very clear understanding of what the mid-point is. It provides people with a third anchor point (in addition to the end points). Second, the division into two sides means that even if sub-categories within the two sides do not match, summing the categories within each side should produce comparable recoded categories. Third, modifiers generally appear to be balanced. For example, strongly agree and strongly disagree have reciprocal values. Of course, it is important that bipolar pairs exist in each language. This will not always be the case (Harkness 2003, 2005).

Unipolar scales without an explicit mid-point that ask about the amount of some quantity are likely to be more problematic. First, setting aside the translation of specific terms, it would be harder to match categories across languages, since on these scales the mid-point is either not clearly defined or subsumed into some broad middle category. Second, the terms used tend to be asymmetrical, which makes matching across languages harder to achieve. Third, research indicates that on unipolar scales people confound terms and position (Klockars and Yamagishi 1988). Without the mid-point clearly defined, people will often assume that the middle category represents the middle even when the term used (e.g. good or bad) is clearly towards the positive or negative end. Fourth, people may not consistently understand what the low end of a pure unipolar scale means. For example, if people are rating values from high to low on “conservativism,” does a low conservativism score mean the value is very liberal, or merely that it is not conservative and perhaps moderate? Fifth, without a clear mid-point it is possible for unipolar scales to “slide over,” so that categories are unintentionally tilted towards the upper or lower end.11

Finally, more research is needed. It would be extremely useful both to cover more languages and to have larger samples. Specific issues that need further study are:

1. How common is it that German terms are stronger (nearer the extremes) than the corresponding English terms, and what can be done to compensate for this?
2. How common is it that extreme English terms are rated more moderately in Japanese?
3. Would numerical response scales with only the endpoints labeled be more equivalent across languages than labeled scales? How would a scale with only the ends and mid-point labeled perform?
4. Does presenting terms in response scales change the intensity that people associate with them (and the distance between categories)? In particular, to what extent does the use of terms in response scales lead to them taking on the formal attributes of scale items (e.g. ordered and equally spaced)?
5. When people hold an attitude that lies between two response scale points, how do they decide whether to choose the category that is higher or lower than their precise position? Do they select the nearest category or are other decision rules used?
6. Do well-constructed response scales reduce respondent burden by reducing respondent effort to understand and “fix” less well-constructed response scales?
7. Are the positions of terms/categories independent of the substance of the scale (i.e. its subject matter and the nature of public attitudes toward it)? Would people rate “completely agree” about a moderate statement about the economy the same as they would in reference to an extreme statement about religion?
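The raw-versus-adjusted comparison summarized in Table 16 can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not the original analysis: the answer codes follow the appendix questionnaire (1 = agree strongly, 5 = disagree strongly), the adjusted values are the American means quoted in the text, raw codes are reversed so that higher values mean more agreement under both scorings, and the respondent data are randomly generated placeholders rather than ISSP data.

```python
import math
import random

# Adjusted score for each raw answer code, using the American means from the text.
# Coding direction (1 = agree strongly) is taken from the appendix questionnaire.
ADJUSTED_BY_CODE = {1: 18.8, 2: 16.0, 3: 9.9, 4: 3.5, 5: 1.5}


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def raw_and_adjusted_r(answer_codes, covariate):
    """Correlate one item with a covariate under raw (reversed 1-5) and adjusted scoring."""
    raw = [6 - code for code in answer_codes]            # 5 = agree strongly
    adjusted = [ADJUSTED_BY_CODE[code] for code in answer_codes]
    return pearson_r(raw, covariate), pearson_r(adjusted, covariate)


if __name__ == "__main__":
    rng = random.Random(0)                               # synthetic placeholder data
    ages = [rng.randint(18, 80) for _ in range(500)]
    answers = [rng.randint(1, 5) for _ in range(500)]
    r_raw, r_adjusted = raw_and_adjusted_r(answers, ages)
    print(f"raw r = {r_raw:.3f}, adjusted r = {r_adjusted:.3f}")
```

Across the five scale points themselves, the reversed raw codes and the adjusted values correlate above .98, so the two scorings are close to a linear transformation of each other. That is consistent with the small differences reported in Table 16.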
11 On unipolar and bipolar scales in general see Ostrom 1987. A suspect scale is the Eurobarometer life satisfaction scale (“On the whole are you very satisfied, fairly satisfied, not very satisfied, or not at all satisfied with the life you lead?”) (European Commission 1996). Year-to-year changes across the European Union are minor, and inter-country differences are large and fairly stable. It is suspected that the large inter-country differences are in part due to differences in the intensity of terms used in the scale, variations in translation of the underlying dimension itself, or both.
Achieving equivalence in cross-cultural, multiple-language surveys is a challenge. The numerical scaling of response options is one technique that can be used to further this goal. In combination with optimal, general translation procedures (Harkness 2003; Harkness et al. 2004; Smith 2004), it can notably assist in that endeavor.

References

Banks, James et al. 2004. “International Comparisons of Work Disability.” Discussion Paper IZA DP No. 1118. Institute for the Study of Labor.
Bartram, Peter, and David Yelding. 1973. “The Development of an Empirical Method of Selecting Phrases Used in Verbal Rating Scales: A Report on a Recent Experiment.” Journal of the Market Research Society 15 (July):151–156.
Bradburn, Norman M., and Seymour Sudman. 1979. Improving Interview Method and Questionnaire Design. San Francisco: Jossey-Bass.
Bullinger, Monika. 1995. “German Translation and Psychometric Testing of the SF-36 Health Survey: Preliminary Results from the IQOLA Project.” Social Science & Medicine 41:1359–1366.
Cliff, Norman. 1959. “Adverbs as Multipliers.” Psychological Review 66 (January):27–44.
Clogg, Clifford C. 1982. “Using Association Models in Sociological Research: Some Examples.” American Journal of Sociology 88:114–134.
——. 1984. “Some Statistical Models for Analyzing Why Surveys Disagree.” In Surveying Subjective Phenomena, edited by Charles F. Turner and Elizabeth Martin. Volume 2. New York: Russell Sage.
Crespi, Leo P. 1981. “Semantic Guidelines to Better Survey Reportage,” Office of Research, International Communication Agency, Memorandum, August 11.
D’Uva, Teresa Bago et al. 2006. “Does Reporting Heterogeneity Bias the Measurement of Health Disparities.” Tinbergen Institute Discussion Paper, TI 2006–033/3.
Davis, James A. 1993. “[Memorandum to] ISSP Methodology Group,” September.
European Commission. 1996. Eurobarometer: Public Opinion in the European Union. No. 45. Brussels: European Commission.
Glick, Peter et al. 2004. “Bad but Bold: Ambivalent Attitudes towards Men Predict Gender Inequality in 16 Nations.” Journal of Personality and Social Psychology 86 (May):713–728.
Hakel, Milton D. 1968. “How Often is Often?” American Psychologist 23 (July):533–534.
Harkness, Janet. 2003. “Questionnaire Translation.” In Cross-Cultural Survey Methods, edited by Janet A. Harkness, Fons J.R. Van de Vijver, and Peter Philip Mohler. New York: John Wiley & Sons.
——. 2005. Report to the ISSP General Assembly on Behalf of the Translation Group. Mannheim: ZUMA.
——, Beth-Ellen Pennell, and Alisu Schoua-Glusberg. 2004. “Survey Questionnaire Translation and Assessment.” In Methods for Testing and Evaluating Survey Questionnaires, edited by Stanley Presser et al. New York: John Wiley & Sons.
——, Peter Ph. Mohler, Tom W. Smith, and James A. Davis. 1997. Final Report of the Project on ‘Research into the Methodology of Inter Cultural Surveys’ (MINTS). Transcoop Research Reports for ZUMA and NORC. Mannheim: ZUMA.
Hougland, James G., Timothy P. Johnson, and James G. Wolf. 1992. “A Fairly Common Ambiguity: Comparing Rating and Approval Measures of Public Opinion.” Sociological Focus 25 (August):257–271.
Javeline, Debra. 1999. “Response Effects in Polite Cultures: A Test of Acquiescence in Kazakhstan.” Public Opinion Quarterly 63:1–28.
Jones, Lyle V., and L.L. Thurstone. 1955. “The Psychophysics of Semantics: An Experimental Investigation.” Journal of Applied Psychology 39 (February):31–36.
King, Gary, Christopher J.L. Murray, Joshua A. Salomon, and Ajay Tandon. 2004. “Enhancing the Validity and Cross-Cultural Comparability of Measurement in Survey Research.” American Political Science Review 98 (February):191–207.
Klockars, Alan J., and Midori Yamagishi. 1988. “The Influence of Labels and Positions in Rating Scales.” Journal of Educational Measurement 25 (Summer):85–96.
Kumata, Hideya, and Wilbur Schramm. 1956. “A Pilot Study of Cross-Cultural Meaning.” Public Opinion Quarterly 20:574–584.
Laumann, Edward O., John H. Gagnon, Robert T. Michael, and Stuart Michaels. 1994. The Social Organization of Sexuality: Sexual Practices in the United States. Chicago: University of Chicago Press.
Lichtenstein, Sarah, and J. Robert Newman. 1967. “Empirical Scaling of Common Verbal Phrases Associated with Numerical Probabilities.” Psychon. Sci. 9:563–564.
Lodge, Milton. 1981. Magnitude Scaling: Quantitative Measurement of Opinions. Beverly Hills: Sage Publications.
——, and Bernard Tursky. 1979. “Comparisons between Category and Magnitude Scaling of Political Opinion Employing SRC/CPS Items.” American Political Science Review 73:50–66.
——, and Bernard Tursky. 1981. “On the Magnitude Scaling of Political Opinion in Survey Research.” American Journal of Political Science 25 (May):376–419.
——, and Bernard Tursky. 1982. “The Social-Psychological Scaling of Political Opinion.” In Social Attitudes and Psychophysical Measurement, edited by Bernd Wegener. Hillsdale, NJ: Lawrence Erlbaum Associates.
——, David Cross, Bernard Tursky, Joseph Tanenhaus, and Richard Reeder. 1976. “The Psychophysical Scaling of Political Support in the ‘Real World’,” Political Methodology 3:159–182.
——, David V. Cross, Bernard Tursky, and Joseph Tanenhaus. 1975. “The Psychological Scaling and Validation of a Political Support Scale.” American Journal of Political Science 19 (November):611–649.
——, Joseph Tanenhaus, David Cross, Bernard Tursky, Mary Ann Foley, and Hugh Foley. 1976. “The Calibration and Cross-Modal Validation of Ratio Scales of Political Opinion in Survey Research.” Social Science Research 5:325–347.
MacKuen, Michael B., and Charles F. Turner. 1984. “The Popularity of Presidents, 1963–1980.” In Surveying Subjective Phenomena, edited by Charles F. Turner and Elizabeth Martin. Volume 2. New York: Russell Sage.
Mittelstaedt, Robert A. 1971. “Semantic Properties of Selected Evaluative Adjectives: Other Evidence.” Journal of Marketing Research 8 (May):236–237.
Mohler, Peter Ph., Janet Harkness, Tom W. Smith, and James A. Davis. 1996. “Calibrating Response Scales Across Two Languages and Cultures.” Paper presented to the International Sociological Association Conference, Colchester, July.
Mosier, Charles. 1941. “A Psychometric Study of Meaning.” Journal of Social Psychology 13 (February):123–140.
Myers, James H., and W. Gregory Warner. 1968. “Semantic Properties of Selected Evaluation Adjectives.” Journal of Marketing Research 5 (November):409–412.
O’Muircheartaigh, Colm A., George D. Gaskell, and Daniel B. Wright. 1993. “The Impact of Intensifiers.” Public Opinion Quarterly 57 (Winter):552–565.
Ommundsen, Reidar, Sven Moerch, Tony Hak, Knud S. Larsen, and Kees Van Der Veer. 2002. “Attitudes towards Illegal Immigration: A Cross-national Methodological Comparison.” Journal of Psychology 136:103–110.
Onodera, Noriko. 2002. “Notes on Broadcast Research: Answers Depend on Adverbial Terms Used in Questions—A Review of Expressions of Degree in Choices Used in International Comparative Studies.” The NHK Monthly Report on Broadcast Research (January):62–75.
Orren, Gary R. 1978. “Presidential Popularity Ratings: Another View.” Public Opinion 1 (May/June):35.
Osgood, Charles E., George J. Suci, and Percy H. Tannenbaum. 1957. The Measurement of Meaning. Urbana, IL: University of Illinois Press.
Osinski, I.C., and A.S. Bruno. 1998. “Response Categories in Likert-scale.” Psicothema 10 (November):623–631.
Ostrom, Thomas M. 1984. “RAC Quantifies the Vast Majority.” The Sampler from Response Analyses 31 (March):2.
——. 1987. “Bipolar Survey Items: An Information Processing Perspective.” In Social Information Processing and Survey Methodology, edited by Hans-J. Hippler, Norbert Schwarz, and Seymour Sudman. New York: Springer-Verlag.
Salomon, Joshua, Ajay Tandon, and Christopher J.L. Murray. 2004. “Comparability of Self-Rated Health: Cross-Sectional Multi-country Survey Using Anchoring Vignettes.” British Medical Journal 358 (January 31):258–264.
Schaeffer, Nora Cate. 1991. “Hardly Ever or Constantly? Group Comparisons Using Vague Quantifiers.” Public Opinion Quarterly 55 (Fall):395–423.
Scheuch, Erwin. 1989. “Theoretical Implications of Comparative Survey Research: Why the Wheel of Cross-Cultural Methodology Keeps on Being Reinvented.” International Journal of Sociology 4 (June):147–167.
Schriesheim, Chester, and Janet Schriesheim. 1974. “Development and Empirical Verification of New Response Categories to Increase the Validity of Multiple Response Alternatives Questionnaires.” Educational and Psychological Measurement 34 (Winter):877–884.
Schwarz, Norbert, and Hans-J. Hippler. 1987. “What Response Scales May Tell Your Respondents: Informative Functions of Response Alternatives.” In Social Information Processing and Survey Methodology, edited by Hans-J. Hippler, Norbert Schwarz, and Seymour Sudman. New York: Springer-Verlag.
——. 1995. “The Numeric Values of Rating Scales: A Comparison of their Impact in Mail Surveys and Telephone Interviews.” International Journal of Public Opinion Research 7 (Spring):72–74.
Schwarz, Norbert, Baerbel Knaeuper, Hans-J. Hippler, Elisabeth Noelle-Neumann, and Leslie Clark. 1991. “Rating Scales: Numeric Values May Change the Meaning of Scale Labels.” Public Opinion Quarterly 55:570–582.
Schwarz, Norbert, Hans-J. Hippler, Brigitte Deutsch, and Fritz Strack. 1985. “Response Scales: Effects of Category Range on Reported Behavior and Comparative Judgments.” Public Opinion Quarterly 49 (Fall):388–395.
Sigelman, Lee. 1990. “Answering the 1,000,000-Person Question: The Measurement and Meaning of Presidential Popularity.” Research in Micropolitics 3:209–226.
Simpson, Ray H. 1944. “The Specific Meanings of Certain Terms Indicating Differing Degrees of Frequency.” Quarterly Journal of Speech 30 (October):328–330.
Skevington, Suzanne M. 2002. “Advancing Cross-cultural Research on Quality of Life: Observations Drawn from the WHOQOL Development.” Quality of Life Research 11 (March):135–144.
——, and Christine Tucker. 1999. “Designing Response Scales for Cross-cultural Use in Health Care: Data from the Development of the UK WHOQOL.” British Journal of Medical Psychology 72 (March):51–61.
Smith, Tom W. 1979. “Happiness: Time Trends, Seasonal Variations, Intersurvey Differences, and Other Mysteries.” Social Psychology Quarterly 42:18–30.
——. 1993. “An Analysis of Response Patterns to the Ten-Point Scalometer.” American Statistical Association 1993 Proceedings. Alexandria, VA: ASA.
——. 1995. “The Holocaust Denial Controversy.” Public Opinion Quarterly 59 (Summer):269–295.
——. 2002. “Developing Comparable Questions in Cross-National Surveys.” In Cross-Cultural Survey Methods, edited by Janet Harkness, Fons van de Vijver, and Peter Ph. Mohler. London: Wiley Europe.
——. 2004. “Developing and Evaluating Cross-National Survey Instruments.” In Methods for Testing and Evaluating Survey Questionnaires, edited by Stanley Presser et al. New York: John Wiley & Sons.
Spector, Paul E. 1976. “Choosing Response Categories for Summated Rating Scales.” Journal of Applied Psychology 61 (June):374–375.
Strahan, Robert, and Kathleen Carrese Gerbasi. 1973. “Semantic Style Variance in Personality Questionnaires.” Journal of Psychology 85 (September):109–118.
Szabo, Silvija, John Orley, Shekhar Saxena, and Alison Harper. 1997. “An Approach to Response Scale Development for Cross-Cultural Questionnaires.” European Psychologist 2 (September):270–276.
Traenkle, Ulrich. 1987. “Auswirkungen der Gestaltung der Antwortskala auf Quantitative Urteile.” Zeitschrift fuer Sozial Psychologie 18:88–99.
Van de Vijver, Fons, and Kwok Leung. 1997. “Methods and Data Analysis of Comparative Research.” In Handbook of Cross-Cultural Personality, edited by J.W. Berry, Y.H. Poortinga, and J. Pandey. Boston: Allyn and Bacon.
Vidali, Joseph J. 1975. “Context Effects on Scales Evaluatory Adjective Meaning.” Journal of the Market Research Society 17 (January):21–25.
Wallsten, Thomas S., David V. Budescu, Amnon Rapoport, Rami Zwick, and Barbara Forsyth. 1986. “Measuring the Vague Meanings of Probability Terms.” Journal of Experimental Psychology 115 (December):348–365.
Wilcox, Clyde, Lee Sigelman, and Elizabeth Cook. 1989. “Some Like it Hot: Individual Differences in Responses to Group Feeling Thermometers.” Public Opinion Quarterly 53 (Summer):246–257.
Wildt, Albert R., and Michael B. Mazis. 1978. “Determinants of Scale Response: Label Versus Position.” Journal of Marketing Research 15 (May):261–267.
Worcester, Robert M., and Timothy R. Burns. 1975. “A Statistical Examination of the Relative Precision of Verbal Scales.” Journal of the Market Research Society 17 (July):181–197.
Wright, D.B., G.D. Gaskell, and Colm O’Muircheartaigh. 1995. “How Response Alternatives Affect Different Kinds of Behavioural Frequency Questions.” British Journal of Social Psychology 36 (December):443–456.
Appendix: American Questionnaires

Sample A:                                    Time: _____________

Section A: Translation

1. If you were to consider your life in general these days, how happy or unhappy would you say you are on the whole . . .

   Very happy                      1
   Fairly happy                    2
   Not very happy                  3
   Not at all happy                4

2. What is your opinion of the following statement? It is the responsibility of the government to reduce the differences in income between people with high incomes and those with low incomes. Do you . . .

   Agree strongly                  1
   Agree                           2
   Neither agree nor disagree      3
   Disagree                        4
   Disagree strongly               5
   CAN’T CHOOSE                    8
3. In order to help us write better and more understandable questions, we need to know how people like you use certain words. Here is a scale that goes from 0 to 20. The zero (0) point means you totally and completely disagree with an idea and 20 means you totally and completely agree with an idea. I’m going to read you some terms and I’d like you to tell me what number best represents how much agreement or disagreement the word or phrase means. A. What score between 0 and 20 would you give to . . .
HAND CARD Q. 3

                                  First Response   Changes
a. basically agree                _____            _____
SHUFFLE CARDS AND ASK REST OF PHRASES
b. agree                          _____            _____
c. agree a little                 _____            _____
d. agree a lot                    _____            _____
e. can’t choose                   _____            _____
f. completely agree               _____            _____
g. completely disagree            _____            _____
h. definitely agree               _____            _____
i. definitely disagree            _____            _____
j. disagree                       _____            _____
k. disagree a little              _____            _____
l. disagree a lot                 _____            _____
m. in the middle                  _____            _____
n. moderately agree               _____            _____
o. moderately disagree            _____            _____
p. neither agree nor disagree     _____            _____
q. not agree                      _____            _____
r. probably agree                 _____            _____
s. probably disagree              _____            _____
t. somewhat agree                 _____            _____
u. somewhat disagree              _____            _____
v. strongly agree                 _____            _____
w. strongly disagree              _____            _____
x. tend to agree                  _____            _____
y. tend to disagree               _____            _____
z. undecided                      _____            _____
aa. very much agree               _____            _____
bb. very much disagree            _____            _____

Code 95 = verbatim; 96 = can’t rate term; 98 = Don’t know term; 99 = missing/no answer/unreadable. Code 0.5 for value between the 21 numbered scale points. If range given, code 95 and indicate range specified.

B. HAND RESPONSE SHEET WITH ANSWERS RECODED TO RESPONDENT AND SAY: Please look over your answers. If you want to change any of your responses, indicate in the right hand column, the one headed “CHANGES,” what number you now want to give a phrase.

4. Now we’re going to use a similar scale that goes from 0 to 20 to rate some additional phrases. On this scale 0 indicates something of the lowest importance possible, something last and least in importance, and 20 indicates the highest importance possible, something that is first and foremost in importance. As I read you each phrase, please tell me what number best represents how much importance the phrase indicates.
HAND CARD Q. A4
SHUFFLE CARDS AND ASK IN THAT ORDER

a. Pretty important                      _____
b. Definitely important                  _____
c. Not too important                     _____
d. Extremely important                   _____
e. Not very important                    _____
f. Fairly important                      _____
g. Highly important                      _____
h. Probably important                    _____
i. Not at all important                  _____
j. Exceptionally important               _____
k. Not important                         _____
l. Very important                        _____
m. Somewhat important                    _____
n. Important                             _____
o. Neither important nor unimportant     _____
p. Quite important                       _____
q. Very, very important                  _____
r. A little bit important                _____
s. Slightly important                    _____
t. Completely important                  _____

5. And now consider a similar scale going from 0 to 20. Point 0 indicates that someone is totally and completely against an idea and point 20 means that someone is totally and completely in favor of the idea. I’m going to read you some terms and I’d like you to tell me what number best represents how much someone is either against or in favor of an idea.

HAND CARD Q. A5

a. Slightly against                      _____
b. Strongly in favor of                  _____
c. Against                               _____
d. Strongly against                      _____
e. Neither against nor in favor of       _____
f. Slightly in favor of                  _____
g. In favor of                           _____
GO BACK TO Q. 3, LOOK UP THE RATINGS GIVEN TO TERMS USED IN 6A-H, AND ENTER IN MIDDLE COLUMN, “PREVIOUSLY GIVEN,” BELOW: 6. Now let’s consider again a few of the terms you rated about disagreement and agreement. Here again is the scale that goes from 0 to 20. The zero (0) point means you totally and completely disagree with an idea and 20 means you totally and completely agree with an idea. You gave “basically
agree” a score of [mention number given to Q. 3]. Now I want you to think about what is the lowest score that you feel would still represent the phrase “basically agree” and what would be the highest score that would still mean “basically agree,” that is what numbers would represent the range from high to low that would describe where “basically agree” fits on our scale from 0 to 20. First, what would be the lowest number for “basically agree?” And what would be the highest number? REPEAT FOR 6B-H.

HAND CARD Q. 6

                                             INTERVIEWER: FILL-IN FROM Q. 3
                                  Lowest     Previously Given                 Highest
a. Basically agree                _____      _____                            _____
b. Strongly agree                 _____      _____                            _____
c. Neither agree nor disagree     _____      _____                            _____
d. Disagree                       _____      _____                            _____
e. Can’t Choose                   _____      _____                            _____
f. Not agree                      _____      _____                            _____
g. Strongly Agree                 _____      _____                            _____
h. Agree                          _____      _____                            _____
NOTE: IF ANY OF THE TERMS WERE NOT RATED 0–20 IN Q. 3, THEN SKIP AND DO NOT ASK IN Q. 6.

7. Now, I’m going to ask you about some of the words we’ve just been discussing. What does the word “agree” mean? What does it involve? How about “disagree”? What does it mean or involve? And what does the phrase “neither agree nor disagree” mean? What does it involve? And what about “important”? And how about “unimportant”?

A. Agree
B. Disagree
C. Neither agree nor disagree
D. Important
E. Unimportant

8. I’m going to read several pairs of words and I would like you to compare and contrast these with the pair “agree/disagree”. I want you to tell me whether you think they mean very much the same as “agree/disagree”, somewhat the same as “agree/disagree”, somewhat different from “agree/disagree” or very much different from “agree/disagree.”
First, does the phrase “for/against” mean very much the same as “agree/disagree”, somewhat the same as “agree/disagree”, somewhat different from “agree/disagree” or very much different from “agree/disagree”? REPEAT FOR 8B-E.

                              Very Much   Somewhat   Somewhat    Very Much
                              the Same    the Same   Different   Different   DK
a. for/against                1           2          3           4           8
b. important/unimportant      1           2          3           4           8
c. like/dislike               1           2          3           4           8
d. favor/oppose               1           2          3           4           8
e. positive/negative          1           2          3           4           8
9. What language did you mainly speak at home when you were a child? _______________

10. What languages do you now speak at home?
    First Mentioned: _______________
    Second Mentioned: _______________

Q. 3
0 – Totally and Completely Disagree
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 – Totally and Completely Agree

Q. A4
0 – Lowest Possible Importance/Last and Least in Importance
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 – Highest Possible Importance/First and Foremost in Importance

Q. A5
0 – Totally and Completely Against
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 – Totally and Completely in Favor of

Q. 6
0 – Totally and Completely Disagree
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 – Totally and Completely Agree

Q. 7 Agree/Disagree
   Very Much the Same       1
   Somewhat the Same        2
   Somewhat Different       3
   Very Much Different      4
Sample B: Time: _____________
Section A: Translation (B)
1. If you were to consider your life in general these days, how happy or unhappy would you say you are on the whole . . .
Completely happy      1
Very happy            2
Fairly happy          3
Not very happy        4
Not at all happy      5
2. What is your opinion of the following statement? It is the responsibility of the government to reduce the differences in income between people with high incomes and those with low incomes. Do you . . .
Completely agree      1
Somewhat agree        2
Neutral               3
Somewhat disagree     4
Completely disagree   5
CAN’T CHOOSE          8
3. In order to help us write better and more understandable questions, we need to know how people like you use certain words. Here is a scale that goes from 0 to 20. The zero (0) point means you totally and completely disagree with an idea and 20 means you totally and completely agree with an idea. I’m going to read you some terms and I’d like you to tell me what number best represents how much agreement or disagreement the word or phrase means. A. What score between 0 and 20 would you give to . . . HAND CARD Q. 3
                                 First Response   Changes
a. basically agree                   _____         _____
SHUFFLE CARDS AND ASK REST OF PHRASES
b. agree                             _____         _____
c. agree a little                    _____         _____
d. agree a lot                       _____         _____
e. can’t choose                      _____         _____
f. completely agree                  _____         _____
g. completely disagree               _____         _____
h. definitely agree                  _____         _____
i. definitely disagree               _____         _____
j. disagree                          _____         _____
k. disagree a little                 _____         _____
l. disagree a lot                    _____         _____
m. in the middle                     _____         _____
n. moderately agree                  _____         _____
o. moderately disagree               _____         _____
p. neither agree nor disagree        _____         _____
q. not agree                         _____         _____
r. probably agree                    _____         _____
s. probably disagree                 _____         _____
t. somewhat agree                    _____         _____
u. somewhat disagree                 _____         _____
v. strongly agree                    _____         _____
w. strongly disagree                 _____         _____
x. tend to agree                     _____         _____
y. tend to disagree                  _____         _____
z. undecided                         _____         _____
aa. very much agree                  _____         _____
bb. very much disagree               _____         _____
Code 95 = verbatim; 96 = can’t rate term; 98 = Don’t know term; 99 = missing/no answer/unreadable. Code 0.5 for value between the 21 numbered scale points. If range given, code 95 and indicate range specified.
B. HAND RESPONSE SHEET WITH ANSWERS RECODED TO RESPONDENT AND SAY: Please look over your answers. If you want to change any of your responses, indicate in the right hand column, the one headed “CHANGES,” what number you now want to give a phrase.
4. Now we’re going to use a similar scale that goes from 0 to 20 to rate some additional words. On this scale 0 indicates something that is completely and totally unimportant and 20 indicates something that is completely and totally important. As I read you various terms, please tell me what number best represents how much importance the word or phrase means. HAND CARD Q. B4
SHUFFLE CARDS AND ASK IN THAT ORDER
a. Pretty important                        ____
a. Pretty important                        ____
b. Definitely important                    ____
c. Probably unimportant                    ____
d. Extremely important                     ____
e. Not very important                      ____
f. Fairly important                        ____
g. Very unimportant                        ____
h. Probably important                      ____
i. Not at all important                    ____
j. Fairly unimportant                      ____
k. Not important                           ____
l. Very important                          ____
m. Somewhat important                      ____
n. Unimportant                             ____
o. Completely important                    ____
p. Neither important nor unimportant       ____
q. Somewhat unimportant                    ____
r. Important                               ____
s. Pretty unimportant                      ____
t. A little bit unimportant                ____
u. Slightly unimportant                    ____
v. A little bit important                  ____
w. Definitely unimportant                  ____
x. Extremely unimportant                   ____
y. Slightly important                      ____
z. Completely unimportant                  ____
aa. In between                             ____
5. And now consider a similar scale going from 0 to 20. Point 0 indicates that someone is totally and completely against an idea and point 20 means that someone is totally and completely in favor of the idea. I’m going to read you some terms and I’d like you to tell me what number best represents how much someone is either against or in favor of an idea. HAND CARD Q. B5
a. Slightly against                      _____
b. Against                               _____
c. Strongly against                      _____
d. Neither against nor in favor of       _____
e. Slightly in favor of                  _____
f. In favor of                           _____
g. Strongly in favor of                  _____
GO BACK TO Q. 3, LOOK UP THE RATINGS GIVEN TO TERMS USED IN 6A-H, AND ENTER IN MIDDLE COLUMN, “PREVIOUSLY GIVEN,” BELOW: 6. Now let’s consider again a few of the terms you rated about disagreement and agreement. Here again is the scale that goes from 0 to 20. The zero (0) point means you totally and completely disagree with an idea and 20 means you totally and completely agree with an idea. You gave “basically agree” a score of [mention number given to Q. 3]. Now I want you to think about what is the lowest score that you feel would still represent the phrase “basically agree” and what would be the highest score that would still mean “basically agree,” that is what numbers would represent the range from high to low that would describe where “basically agree” fits on our scale from 0 to 20. First, what would be the lowest number for “basically agree?” And what would be the highest number?
REPEAT FOR 6B-H. HAND CARD Q. 6
INTERVIEWER: FILL IN “PREVIOUSLY GIVEN” FROM Q. 3
                                  Lowest   Previously Given   Highest
a. Basically agree                _____        _____           _____
b. Strongly agree                 _____        _____           _____
c. Neither agree nor disagree     _____        _____           _____
d. Disagree                       _____        _____           _____
e. Can’t Choose                   _____        _____           _____
f. Not agree                      _____        _____           _____
g. Strongly Agree                 _____        _____           _____
h. Agree                          _____        _____           _____
NOTE: IF ANY OF THE TERMS WERE NOT RATED 0–20 IN Q. 3, THEN SKIP AND DO NOT ASK IN Q. 6.
7. Now, I’m going to ask you about some of the words we’ve just been discussing. What does the word “agree” mean? What does it involve? How about “disagree”? What does it mean or involve? And what does the phrase “neither agree nor disagree” mean? What does it involve? And what about “important”? And how about “unimportant”?
A. Agree
B. Disagree
C. Neither agree nor disagree
D. Important
E. Unimportant
8. I’m going to read several pairs of words and I would like you to compare and contrast these with the pair “agree/disagree”. I want you to tell me whether you think they mean very much the same as “agree/disagree”, somewhat the same as “agree/disagree”, somewhat different from “agree/disagree” or very much different from “agree/disagree.” First, does the phrase “for/against” mean very much the same as “agree/disagree”, somewhat the same as “agree/disagree”, somewhat different from “agree/disagree” or very much different from “agree/disagree”?
REPEAT FOR 8B-E.
                            Very Much   Somewhat   Somewhat    Very Much
                            the Same    the Same   Different   Different   DK
a. for/against                  1           2          3           4        8
b. important/unimportant        1           2          3           4        8
c. like/dislike                 1           2          3           4        8
d. favor/oppose                 1           2          3           4        8
e. positive/negative            1           2          3           4        8
9. What language did you mainly speak at home when you were a child? _______________
10. What languages do you now speak at home? First Mentioned: _______________ Second Mentioned: _______________
Q. 3
0 – Totally and Completely Disagree
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 – Totally and Completely Agree
Q. B4
0 – Completely and Totally Unimportant
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 – Completely and Totally Important
Q. B5
Strongly against
Against
Slightly against
Neither against nor in favor of
Slightly in favor of
In favor of
Strongly in favor of
0 – Totally and Completely Against
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 – Totally and Completely in Favor of
Q. 6
0 – Totally and Completely Disagree
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 – Totally and Completely Agree
Q. 7
Agree/Disagree
Very Much the Same    1
Somewhat the Same     2
Somewhat Different    3
Very Much Different   4
PART TWO
RADICAL SOCIAL CHANGE
THE TRANSITION TO CAPITALISM IN CHINA AND RUSSIA
Erich Weede
1. Introduction
For 1979, the World Bank (1981:134–135) provided numbers which imply a Soviet per capita product about 16 times as high as the Chinese product. For 1991, when Russia became the (most important) successor state of the USSR, the World Bank (1993:238–239) reported a Russian per capita product about 8.7 times as high as the Chinese product. Since the Chinese economy grew nearly 8% per year in the 1980s, most of the narrowing of the gap (from 16:1 to about 9:1) was due to Chinese success rather than Soviet failure. Measured with purchasing power parity data (World Bank 1993:296–297) instead, Russian per capita income in 1991 was only 4.1 times as high as Chinese income. But the decline of the Soviet Union was only the beginning.1 According to The Economist (1997:5) and the World Bank (1996:26), after the dissolution of the Soviet Union the Russian economy lost about half of its size by the mid-1990s.2 This implies worse economic losses than those suffered by the Russians during World War II. Simultaneously, China’s economy continued to grow. At the turn of the millennium, in 2000, Russia’s gross national income per capita was about twice as high as China’s (World Bank 2002:232–233). Looking at purchasing power parity data hardly affects this conclusion. Thereafter, high prices for Russia’s natural resources, in particular oil and gas, generated some recovery for Russia’s fortune. In 2005, Russian gross national income per capita in dollar terms was 2.6 times as high as Chinese income. For living standards, however, the ratio for per capita incomes in purchasing power parity terms, which was only 1.6, might be more relevant
1 Trenin (2002:2) has summarized the impact of the collapse of the USSR in the following terms. Russia kept 50% of its population, 60% of its industry, and 70% of its territory.
2 According to Silverman and Yanowitch (2000:151), in 1998 agricultural production and GDP had fallen to 60% of production in 1991. Industrial production had fallen to 50% in the same period.
(World Bank 2007:288–289). In this respect at least, China is still catching up with Russia. How could this happen?
2. The Communist Heritage of China and Russia
Both countries still suffer the consequences of Communist rule or misrule. Both societies lost millions of people because of the cruelty and incompetence of their rulers, which resulted in terrible losses of life, frequently because of starvation. Together, the number of victims of Soviet and Chinese Communism may be on the order of a hundred million, although researchers disagree about whether Soviet or Chinese Communism was more lethal (Courtois et al. 1998; Rummel 1994). If one trusts the results from cross-national analyses of growth rates (Barro and Sala-i-Martin 1995; Levine and Renelt 1992; Sala-i-Martin, Doppelhofer, and Miller 2004; World Bank 1993a:51), then Russia or the Soviet Union and China enjoyed favourable prospects for growth.3 Compared to the developed countries of the West and Japan, both countries were and still are poor and lagging behind, which promises potential advantages of backwardness. They should not yet suffer from diminishing returns to investment. They still have some potential to reallocate labour from agriculture to more productive employment elsewhere. As a matter of fact, of course, neither the Soviet Union nor China permitted labour to go where the rewards were highest. Instead mobility was controlled.4 Probably most important of all, they could borrow technologies from more advanced economies and thereby benefit from the fruits of the earlier establishment of economic freedom and capitalism in the West (Weede 2006). Moreover, their poverty protected both of them from post-materialism and an erosion
3 As Easterly (2001:200ff.) has pointed out, presumed determinants of growth (including level of development, investment and human capital) tend to be much more stable than growth rates. Therefore, “luck” or unknown determinants of growth must be important too.
4 In contemporary China the surplus rural workforce might have been as high as 150 million at the turn of the millennium. Up to 100 million people might live and work in places where they should not be, according to official regulations. Only in 2001 was lifting the restrictions at least considered and only for a later date (The Economist 2001b). Russia’s restrictive internal registration system also survived the demise of Communism (The Economist 2001a:8).
of achievement motivation by affluence (Inglehart 1997; but see also Mehlkop 2000 for criticism). Although the availability and quality of data for socialist countries has always been poor (Winiecki 1988), there is reason to believe that both countries, the USSR as well as China, invested5 a lot, although not always efficiently.6 Compared with nations at the same level of development, both countries did a lot for human capital formation. Investment and human capital formation should have permitted both of them to realize the potential advantages of backwardness. But this happened to an ever lesser degree in the Soviet Union after the 1960s, and it really happened in China only in the era of Deng Xiaoping. Under Mao, the Chinese economy grew more slowly than the global economy (Maddison 1998:15–16, 97). Looking at the second half of the 20th century and comparing growth rates with either the losers of World War II, Germany and Japan, or with the Asian tigers, especially Taiwan or South Korea, the economic performance of none of the socialist great powers looks very impressive. The reasons for the growth deficits of both of them have been known for centuries, or at least decades. According to Adam Smith (1776–1976), there are few incentives to produce rather than to shirk without property. According to Mises (1920), private property in the means of production is a prerequisite of scarcity prices in factor markets and therefore of a rational allocation of resources.7 One may point to a link between state-owned enterprises on the one hand and comparative-advantage-denying strategies of development on the other hand (Lin, Cai and Li 2003). The nationalization of the means of production facilitates value-subtracting production, i.e., turning useful raw materials which can be sold on free markets into unwanted and less valuable goods which can no longer be sold in free markets, but only to consumers in socialist economies who have no choice but to accept
5 Actually, the impact of investment on growth is disputed. According to Blomström, Lipsey and Zejan (1996), there is much better evidence for growth affecting later investment than for earlier investment affecting growth. In recent work investment is increasingly seen as endogenous to growth (Bleaney and Nishiyama 2002:44).
6 Only after the dissolution of the Soviet Union did investment in Russia collapse. In the late 1990s investment was only about one-fifth of what it still was in 1990 (Lynch 2001:19). This lack of investment and maintenance in the Yeltsin era resulted in seriously deteriorating infrastructure.
7 For a modern treatment of similar ideas and an identical conclusion, see Michael Keren (2002).
shoddy products. According to Hayek (1945), central planning by itself is incompatible with the mobilization of knowledge which is necessarily dispersed over millions of heads. The mobilization and the expansion of knowledge by innovation is dependent on an economy where a large number of actors exert free choices, where they command resources (labour, knowledge, etc.) of their own and where there are incentives to put them to good use. Under central planning, however, knowledge unavailable to the leadership exerts no beneficial impact. More than forty years ago, on the occasion of the ‘great leap forward’, Mao Zedong acted contrary to this Hayekian insight with the most tragic results. Then, comparatively small agricultural collectives encompassing villages sometimes consisting of only a few extended families or clans were combined into so-called people’s communes. Thereby, property rights became ever more attenuated. One’s living standard thereafter depended largely on the effort of a multitude of others whom one did not even know. Nor did most peasants share feelings of solidarity with most other commune members. Worst of all, the leadership of the commune decided what should be done, when, and on which field. Whereas peasants usually know from experience what grows best on which ones of their fields, an education in Marxism or Mao Zedong thought provides a poor substitute. Nowadays, estimates of the numbers of starvation victims exceed 30 million (Fu 1993:235, 304; Kristof and WuDunn 1994:66). A crucial task of all ex-socialist societies remains the re-establishment of respect for property rights. As Olson (2000) recognized, one may regard socialism and nationalized property as a kind of educational program to make people not respect property rights. In societies with private property rights, most of the costs of protecting property are borne by private owners, not by the state. 
If these private property owners no longer exist, then there is nobody to cover these costs. Securing property rights is much easier, if owners cooperate with the police and the courts. They install locks, they pay private guardians, they provide the police and the authorities with information after a theft or robbery has occurred. Without their assistance, fighting crimes against property becomes much more difficult. Where there is private property, the state can rely on the self-interests of owners to assist it in safeguarding property rights. The situation is quite different in a society with little private property, widespread scarcity, and prices which do not reflect supply and demand.
There producers and potential customers face strong incentives for illegal exchange. A typical case concerns the illegal sale of public property stolen at the place of work. This benefits the worker and thief. The customer benefits, too, because the stolen goods might not be available legally. Since such exchanges benefit all of those involved, nobody will cooperate with the police. If the police somehow learn about the exchange, there is an incentive to bribe them to buy their silence. Therefore, regulation and central planning boost crime and corruption.8 Although both countries, China and Russia, suffer from this heritage of socialism, the mis-education in not respecting property rights lasted much longer in Russia than in China, especially in rural China where it was essentially overcome after three decades. Moreover, it should be easier to re-establish respect for private property rights in a countryside where people know each other, where reputation is easily established and lost, than in big cities where people do not even know their neighbours. Finally, political instability shortens the time horizon of the authorities and therefore increases the kleptocratic inclinations of the powerful (Heilmann 2000:128). Compared to China after 1979, Russia looked much less stable, at least before the Putin presidency.
3. The Transition to Capitalism in Comparative Perspective
Despite some attempts, neither China nor Russia succeeded in establishing the rule of law (see chapters IVb and Xc in Weede 2000; Blankenhagel 2000; Pei 1998, 2006; Zakaria 2003:89ff.; Yavlinsky 1998), which is a prerequisite for safe property rights, economic freedom and incentives to work productively. Compared to Russia, however, China is a fairly safe country. Whereas more than 40% of all surveyed Russians claim to have been victims of crime, less than 20% of all Chinese claim to have suffered from it. 
Whereas only about 10% of all Russians asked are satisfied with the police, about 80% of all Chinese were so at the beginning of the millennium (Newman 2002:28). Because of being vulnerable to ethnic separatism, Russia under Yeltsin succeeded in continuing neither the tsarist and Communist tradition of strong central authority, nor an orderly and voluntary devolution of administrative controls and tasks to local and regional levels. Only under
8 See the econometric study by Peter Graeff and Guido Mehlkop (2003) for evidence.
Putin was central authority restored. By contrast, China could afford to shift many economic policy responsibilities from the central state to provinces, counties, cities and townships. According to Montinola, Qian and Weingast (1995), this shift of much decision-making from central to lower levels of authority contributed to the rise of “market-preserving” federalism in China. In my view, the Chinese thereby established a functional equivalent of the rule of law. In contrast to the central authorities, local and regional authorities have to compete more vigorously for the favour of investors, including overseas Chinese investors. Therefore, they have to act as if they desire to respect private property rights and provide some infrastructure. Whoever does worst in these respects, whoever is more corrupt than neighbouring units, whoever engages in more arbitrary and confiscatory taxation than others will drive investors, capital and even qualified workers elsewhere.9 From this perspective, the task of the central government is to guarantee a common market and to prevent the rise of local or regional protectionism. By contrast to China, the post-Communist Russian state has been less capable of containing regional protectionism, at least during the Yeltsin era (Heilmann 2000:210–211). Legally, one may argue that Russia is a federal state, whereas China is not. Administratively, both political systems are similar in consisting of five levels of government. What seems to matter, however, are neither similarities in administrative arrangements, nor legal fictions, but the actual independence of various levels of government from each other.10 Without a reasonable degree of fiscal independence of various layers of government, federalism is unlikely to be market-preserving. 
As Zhuravskaya (2000:134) has demonstrated, Russian cities enjoy little fiscal independence and face few incentives to improve the local economies because additional local revenues are almost entirely taxed away by regional authorities. Instead, Russian cities waste a lot of resources
9 Although population movements within the People’s Republic of China are still controlled, foreigners, overseas Chinese or Taiwanese enjoy more choices. About a million people from Taiwan live on the Chinese mainland. There even have been a quarter million cross-strait marriages (Ross 2005:82). Such people can discriminate in their choice of workplace and residence in favor of the better governed parts of China.
10 Feld and Voigt (2003) provide a beautiful econometric demonstration of the relevance of ‘de facto’ rule of law and the irrelevance of ‘de jure’ rule of law in accounting for economic performance. According to Blankart (2007), the constitutional ideal is “institutional congruence”, which implies that decision-making happens at the same level of government where benefits and costs occur.
on subsidizing loss-making enterprises. Whereas Chinese federalism may look weaker than Russian federalism from a legalistic perspective, the former may still be market-preserving, whereas the latter might be “market-hampering”. In Zhuravskaya’s (2000:148) view, “there is quite a lot of evidence that Russian local governments conduct predatory policies toward business, while Chinese governments make efforts to promote entrepreneurial activities in their communities”. In Treisman’s (2000:66) view, revenue sharing between various levels of government in Russia leads to “overgrazing of the common tax and bribe basis”.11 Moreover, differential treatment of regional governments in Russia seems to be politically rather than economically motivated. The main objective has been appeasement of potential troublemakers, especially in ethnic republics (Shleifer and Treisman 2000). By contrast to Russia, the central government of China kept tighter control of senior provincial-level appointments and frequently rotated them from one province to another (World Bank 2002:115). Only under Putin has the primacy of the central government been vigorously reasserted in Russia. Fiscal federalism in China suffers from some weaknesses too (Li 2000). From the late 1970s to the late 1990s, budgetary revenue as a proportion of GDP has been falling. More and more local and provincial revenue has been shifted off-budget. Farmers carry a heavy and frequently arbitrarily imposed burden (Wen 2000). The tax share of the central government had fallen so low that there was a major reform in 1994, which reinforced the role of the central government in tax collection. The fiscal system has not been successful in transferring resources from richer to poorer provinces. The reverse side of this latter shortcoming, however, might be that tax collection in China does not undermine local and regional autonomy. Therefore, it does not weaken
11 According to Easterly (2001:248ff.) “centralized corruption is less damaging than decentralized corruption”. Within Russia corruption has not only grown in magnitude, but under Yeltsin it has become more decentralized, thereby reinforcing the tendency to overgraze the bribe basis. Moreover, the combination of fiscal federalism and electoral pressure in Russia has made regional governments expand public employment beyond fiscal capabilities, resulting in wage arrears, strikes and ultimately bailouts from the central government in the 1990s (Gimpelson and Treisman 2000). Of course, there is a lot of decentralized corruption in China too. Pei (2006:132 and 167) even refers to “local mafia states” and argues: “The combination of lagging political reforms, entrenchment of rent-seeking groups, and decentralization of state predation is a recipe for deteriorating governance.” Although Pei is very good at listing all the weaknesses of China, he is less persuasive in reconciling his criticism with China’s stellar economic performance.
market-preserving federalism and incentives to work and to make profits too much. In China, fiscal decentralization actually curtailed government size (Zhu and Krug 2007). Both countries suffered from state-asset depletion and self-enrichment by politically well-connected persons (Shleifer and Treisman 2000; Xu 2000). This resulted in declining government revenue as a proportion of GDP, especially at the central level of government. In Russia, insider privatization, the loans-for-shares deal and bank profits from short-term treasury bills are the most important examples of self-enrichment for a few at the expense of the many. In China, asset transfers from state-owned enterprises (SOEs) to semi-public companies, SOE-attached collectives, joint ventures, or even purely private enterprises siphoned off investment capital supplied by the state-owned banks, profitable divisions and trademarks from SOEs. The SOEs were left with money-losing divisions and burdens, like social service obligations and loans, whereas the ‘new’ enterprises could prosper. This process of asset-stripping may be called ‘insider privatization with Chinese characteristics’. In both countries the mode of privatization is likely to have improved the incentives for economic performance (Shleifer and Treisman 2000:33, 38; Xu 2000:87) at the expense of social justice. A potential long-run consequence of this process might be undermining the legitimacy of the transition from socialism to capitalism and the market. Since the Chinese economy works much better than the Russian one, Russia is more likely than China to be at risk sooner rather than later. The reimposition of state control on many previously privatized enterprises in Putin’s Russia is compatible with this view. One of the reasons why China did so much better than the Soviet Union or Russia after 1979 is that responsibility for agriculture was transferred to ever smaller work units until the level of households had been reached. 
Moreover, the rent contracts of peasants became more extended in time. So, incentives to work and opportunities for the application of knowledge significantly improved in the Chinese countryside, although there is still no private property in land.12 By contrast, Russia had not yet overcome the legacy of collectivization at the beginning of the 21st century. According to Ryback (2000), only
12 Comparing property rights in land across Chinese provinces, the World Bank (2002:35) concluded “that higher levels of transferability were positively correlated with higher levels of farm investment.”
6% of the agricultural land was worked by private farmers and another 3% consisted of garden plots at the turn of the century. But these private plots are much more productive than other land. At the end of the 20th century, they accounted for 90% of the potatoes, 75% of the vegetables, and 55% of the meat produced in Russia (Silverman and Yanowitch 2000:157). Since China remained more rural than Russia into the early 21st century, agricultural reforms by themselves could never have had as beneficial an impact in Russia as they had in China in the early 1980s. Moreover the Chinese economy was never as centralized as the Russian one was under Soviet rule. Heilmann (2000:60) estimates that the state commanded about 90% of the Soviet economy, whereas the Chinese government commanded a mere 20% after the de facto privatization of agriculture. Since the 1950s there have always been waves of centralization and decentralization in China (Qian 2000). During periods of decentralization, village, township, county and provincial administrations got valuable experience in running local or regional economies. This experience was put to good use in later township village enterprises (TVEs).13 Their most important characteristic is that they have to compete with each other. In spite of collective ownership they have to behave as if they were capitalist enterprises. Actually, some of them have been quite close to private enterprises which preferred to take cover under some collective. Moreover, there are rising numbers of truly private enterprises in China, some of which are owned by overseas Chinese. Admittedly, the government wasted a lot of time instead of privatizing state-owned enterprises. 
Many of them were bankrupt by Western standards but nevertheless continued to operate for years, even though China could not afford to subsidize them endlessly (Lardy 1998; Weede 2000, chapter IVb). Growth elsewhere, however, at least reduced the economic weight of state-owned enterprises.14
13 According to Che and Qian (1998:490–491), TVEs are a second best solution to the ownership problem under insecure property rights. In their view “local government-owned enterprises have more secure property rights than private enterprises have because the national government expects them to better serve its interests . . . they provide more revenues to the national government; and they also spend more on local public goods . . .”
14 According to The Economist (2000:93–97), SOEs constituted about 28% of the Chinese economy at the turn of the century, but accounted for 44% of urban employment, 70% of government revenue and 80% of bank loans. From 1998 to 2000 SOEs released about 21 million workers (Taube 2001:135), most of whom did not benefit from a social security net. According to Pei (2006:2–3), the share of SOEs in industrial output fell
erich weede
Whereas Russian small enterprises are weak, Chinese small enterprises are comparatively strong. Why this difference is important has been persuasively argued by Pejovich (2001:28): “The small enterprises are the breeding ground for entrepreneurs, a work ethic, and a capitalist exchange culture. They educate ordinary people to appreciate a way of life that rewards performance, promotes individual liberties, and places high value on self-responsibility and self-determination”. Because of its head-start in nurturing small enterprises, capitalism stands a chance of growing much deeper roots in China than in Russia. Another reason for the differential success of China and Russia is the differential degree of export orientation of both economies. Econometric studies (Bleaney and Nishiyama 2002; Dollar 1992; Edwards 1998; Greenaway and Nam 1988; World Bank 1993a) demonstrate that open or export-oriented economies do grow more rapidly than others. Whereas China integrated itself into the global economy, the Soviet Union and its Russian successor largely neglected exports and the benefits to be derived therefrom. According to Taylor and Jodice (1983:226–228), exports and imports constituted little more than 10% of the Soviet economy during the 1960s and 1970s. A 6% export ratio for China in 1979 (World Bank 1981:142) demonstrates much similarity between the Soviet Union and pre-reform China. By 1991 the Chinese export ratio had leapt to 20% (World Bank 1993:254). Because of the dissolution of the Soviet Union and the resulting disorder, we don’t know the comparable number for Russia. In the long run, however, the dissolution of the USSR should have increased the importance of foreign trade for the Russian economy. By and large, economic size and trade dependence are negatively correlated. 
Whereas China achieved export growth of 13% in the 1990s, the Russians achieved a meagre 2.3% (World Bank 2000–2001:294–295).15 Because of Hong Kong’s entrepot role, Chinese numbers are more likely to be understated than overstated. Whereas Russians sell commodities, oil and gas, the Chinese
from 78 to 41% from 1978 to 2002, whereas the share of the private sector (including foreign invested enterprises) rose from 0.2 to 41% of industrial output.
15 According to Shleifer and Treisman (2000:103), foreign trade dependence helps to reduce corruption. Obviously, the civilizing impact of trade must be weaker in Russia than in China. As Colombatto (2001) has argued, Western advice and Western policies have not been helpful in promoting a free and open Russian economy.
the transition to capitalism in china and russia
sell labour-intensive products.16 As Sachs and Warner (1995) and Bleaney and Nishiyama (2002) have demonstrated, high ratios of natural resource exports to GDP are generally associated with low growth rates. Because of currently high prices of oil and other natural resources, Russian exports in 2005 were about one-third of the value of Chinese exports, or a quarter if one includes Hong Kong’s exports with China’s. Given China’s huge preponderance in population size, these numbers look fairly good for Russia. But export quality remained what it was. Manufactured exports constituted 91% of Chinese exports, but merely 21% of Russian exports. The gap in high technology exports is also large and in favour of China. High-technology products made up 30% of exports in China, but only 9% in Russia (World Bank 2007:296–297). For China globalization worked. According to The Economist (2007:5) “with a trade-to-GDP ratio around 70% and a sea of foreign investment, China is one of the world’s most open economies”. In the first decade of the 21st century China had already overtaken Japan as an exporter and remained behind only Germany and the United States (The Economist 2007:8). In 2008 China is likely to become the biggest exporter in the world. Differences in economic openness also manifest themselves in differential attractiveness for foreign investment. In 1998, China received about 43 billion US dollars, whereas Russia got less than 3 billion (World Bank 2000–2001:314–315). According to The Economist (2001a:5), the gap was similar in 2000. In 2004, however, Russia’s attractiveness to foreign capital looked much better. The Russians received nearly a quarter of what Mainland China got, or about one-seventh, if one includes Hong Kong with China (World Bank 2007:296–297).
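The export growth rates quoted earlier in this section (13% for China, 2.3% for Russia in the 1990s) are easier to compare once they are compounded. The following minimal sketch assumes that both figures are average annual rates sustained over ten years; that reading, the ten-year horizon, and the function name `cumulative_factor` are assumptions made for illustration, not statements from the World Bank source.

```python
# Illustrative sketch: cumulative effect of a constant annual growth rate.
# Assumption: 13% and 2.3% are average annual export growth rates
# that held for ten years ("the 1990s").

def cumulative_factor(annual_rate: float, years: int) -> float:
    """Return the factor by which a quantity multiplies after compounding."""
    return (1.0 + annual_rate) ** years

YEARS = 10  # assumed length of the period

china_factor = cumulative_factor(0.13, YEARS)
russia_factor = cumulative_factor(0.023, YEARS)

print(f"China:  exports multiply by about {china_factor:.2f}")
print(f"Russia: exports multiply by about {russia_factor:.2f}")
```

Under these assumptions Chinese exports would have more than tripled over the decade, whereas Russian exports would have grown by roughly a quarter.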
As foreign direct investment tends to be more productive than domestic investment, as foreign investment does not crowd out domestic investment (de Soysa and Oneal 1999), as foreign takeovers of enterprises frequently lead to performance improvements in transition economies
16 Rodrik (2006) notes that China’s export basket is more sophisticated than one would expect from its level of economic development. Consumer electronics and auto parts are examples of such exports. Although no government knows in advance where investment in non-traditional exports is likely to succeed, experimentation with subsidies or (in the case of auto parts) local content requirements may help to upgrade a country’s export basket and thereby improve its growth rate. In Rodrik’s (2006) account, this is what China succeeded in. From a political economy perspective, the main requirement for this type of industrial policy is to stop subsidizing losers. Picking winners only is impossible, but giving up losers is not. Although Rodrik does not even raise the question, one may speculate about the differential feasibility of giving up losers in autocracies and democracies.
(World Bank 2002:66), the long-standing preference of foreign capital owners for China over Russia looks like a self-fulfilling prophecy. Moreover, Chinese and Russian capitalists feel like foreigners toward their home countries. Capital flight in Russia exceeded inward investment for some time.17 In China it is the other way round. Much of Chinese flight capital re-enters under the foreign capital label via the backdoor (Heilmann 2000:235). Although Russia benefited from high and rising prices of oil during the first decade of the 21st century, which stabilized the currency and boosted economic growth, the middle-run prospects of Russia look much poorer than the prospects of China. In the early 1990s, Russia succeeded in privatizing a lot of its state-owned enterprises (Aslund 1995:223ff.; Layard and Parker 1996:125), but the preferential treatment of insiders in assigning private property rights, the dubious links between the new tycoons and the authorities, corruption and the absence of the rule of law together prevented the establishment of Russian industries (except for the extraction of natural resources) which can compete in global markets. Foreign investors acquired little property and control in Russia. But they might have contributed to healthier government finances in the 1990s in Russia, to more efficient management and to access to foreign technology and markets (Heilmann 2000:176–177). Therefore, the story of Russian privatization left Russian industry with serious handicaps.18 It cannot balance the Russian disadvantages compared to China—of a still largely collectivized agriculture, of establishing too few young enterprises, of inward orientation of its civilian manufacturing industries and too little foreign investment in them, of especially weak property rights. It is likely that China will build on its advantages and that Russia will fall further and further behind China. 
Russia does not owe this modest position to a poor endowment with natural resources—quite to the contrary—but to institutional deficits
17 The Economist (2001a:5) reported a monthly capital flight of $1.5 billion to be compared with yearly foreign direct investment of $2.7 billion. Russian capital flight is probably related to its frequently illegitimate acquisition. In 1998, 65% of the respondents in a survey favored confiscation of illegally acquired wealth (Silverman and Yanowitch 2000:151). Under such circumstances, capital flight looks rational.
18 Shleifer and Treisman (2000) argue that it was impossible to privatize Russian industry more efficiently. In their view, stakeholders or veto players had to be bought off. Insider privatization was a way of substituting an inefficient economic arrangement for an even less efficient prior arrangement. They cite restructuring and better productivity after privatization as support for their view.
and policy errors.19 One should not conclude therefrom that Russia could easily change and become as successful as China has been since the 1980s. Russia has neither established the rule of law, nor decentralized and developed market-preserving federalism, nor established private property rights in agricultural land, nor even started the route to export-oriented development. Worse still, political obstacles arising out of Russia’s ethnic heterogeneity might have prevented even a most enlightened Russian government from enacting more efficient reforms (see Shleifer and Treisman 2000). Except for the flat income tax of 13% which resulted in an 80% increase of government revenue within a year after its introduction, too many Russian reforms remained mere facades (Siegl 2001; World Bank 2002:64). Putin restored the Russian tradition of administration by subordination without recognizing its cost. According to Shevtsova (2003:97), “the regime of subordination went against Putin’s goal of building an efficient market economy, which demands freedoms and initiative”. Although both countries have made great progress on the road to economic freedom, albeit from a dismal starting point, China always scored better than Russia on the Fraser Index of economic freedom in the late 20th century. Only in 2004 did Russia at last come close to China’s score (Gwartney, Lawson, with Easterly 2006:19–21).20 The comparative performance of Russia and China fits with the view that economic freedom is productive (Doucouliagos and Ulubasoglu 2006; Gwartney et al. 2006; Liu 2007; Weede 2006). The Russians quickly established only a single trait of capitalist economies, i.e., inequality. In Russia, the top 10% received about 39% of the income—in the United States and China it was between 30% and 31%—at the end of the 20th century. In Russia the top 20% obtained nearly 54% of the income; in China and the United States it was between 46% and 47% (World Bank 2000–2001:282–283). 
19 It has been argued (Lal 1998:3; de Soysa 2000), however, that rich resource endowments might reinforce predatory behavior and thereby make institutional and economic development more difficult. According to Ross (1999:297), “there is now strong evidence that states with abundant resource wealth perform less well than their resource poor counterparts, but there is little agreement on why this occurs”. For further evidence on the negative effects of the resource curse, see Weiffen (2004) and Collier (2007).
20 O’Driscoll, Holmes and Kirkpatrick (2001), however, disagree. In their view, economic freedom in China and Russia had been quite similar in the late 1990s, but Russia pulled ahead of China in 2000. For 2007, however, Kane, Holmes, and O’Grady (2007) place China and Russia very close to each other on their economic freedom scale.
Admittedly, China is one of those globalizing economies where the size distribution of income deteriorated in the 1980s and 1990s. Because of strong economic performance, incomes of the poorest quintile in China nevertheless improved by 3.8% per year (Dollar and Kraay 2001:40). But the Chinese distribution of income has become more unequal than the Russian distribution according to the most recent data (World Bank 2006:280–281). Although inequality of income is essential in order to elicit effort and hard work and in order to guide producers to doing what consumers actually want, not all observable income inequality everywhere is justifiable in this way. Corruption, theft, robbery and rent-seeking may also produce income inequality. Most Russians do not explain inequality in their country by differences in effort, merit or hard work. Instead they believe that “swindlers and manipulators” stand the best chance to become rich in contemporary Russia (Silverman and Yanowitch 2000:33). The trademark of capitalism is not inequality of income, but competition among producers and satisfaction of the wants of consumers. In these respects, Russia is lagging. Male life expectancy provides another cue that Chinese society works better than Russian society. In poorer China it was about 63 years; in comparatively richer Russia it was only 58 years in the mid-1990s (Heilmann 2000:16). In 2004, life expectancy at birth was 70 for males and 73 for females in China, but still merely 59 for males and 72 for females in Russia (World Bank 2007:288–289). Russia might lose about 750,000 people per year, or up to 22 million in the first fifteen years of the 21st century (Legvold 2001:63). Establishing good government has always been easier in China than in Russia, because Chinese society is ethnically more homogeneous than Russian society. 
In addition to ethnic heterogeneity, there might be a second obstacle to good government and growth-promoting policies, i.e., income inequality and a potential for distributional struggles (Easterly 2001, chapter 13). When Communism was still practiced in Russia and China, both countries were fairly egalitarian at low levels of prosperity. Economic reforms and the transition to capitalism increased income inequality in both societies. Russia faces two obstacles to stable, effective and growth-promoting government, i.e., ethnic heterogeneity and inequality, and China merely one of them. Whereas China—like Japan before it (Weede 2004)—has succeeded in making nationalism an engine of growth, Russian nationalism always focused on the acquisition of territory and great power rhetoric (Trenin 2002), but never put the
idea of becoming a rich country and a global economic powerhouse high on its political agenda. Greenfeld (2001:218) argues that nationalism and “the view of the economy as a battlefield in the struggle for national supremacy” provides much of the motivation for economic growth. As in Japan more than one hundred years ago, nationalism in China might succeed in legitimizing entrepreneurship, private property rights, and capitalism and thereby overcome the traditional contempt and lack of respect from which merchants suffered in the Confucian societies of East Asia.
4. Conclusion
In comparative terms, the Chinese transition to capitalism has been a success, whereas the Russian transition so far would have been a failure, had it not been rescued by high and rising energy prices during the Putin presidency. China enjoyed a head-start of about a dozen years. Whereas the Russian economy suffered negative growth rates at the beginning of this transition, China grew vigorously and persistently since the transition began. The size of the Chinese economy grew eightfold; per capita incomes sevenfold (Pei 2006:2). Although the protection of private property rights, rule of law, and economic freedom leave much to be desired in both countries, China overcame the legacy of socialism sooner and to a greater degree than Russia. Only China, but not yet Russia, benefits from the existence and competition of a multitude of small and dynamic enterprises. Whereas China seems to have established market-preserving federalism, Russia suffers from market-hampering federalism. Moreover, China has opened its doors more vigorously for foreign trade and investment than Russia did. The one head-start of Russia over China, i.e., early privatization of state-owned enterprises, did not result in beneficial consequences, because the preferential treatment of insiders and the weakness of the rule of law in Russia neutralized much of this potential Russian advantage.
Moreover, the reassertion of state ownership under Putin has squandered some of the impressive productivity gains achieved by the previous, but transient privatization (Aron 2006). If Chinese administrative and economic policies were more efficient than Russia’s, this raises the question whether regime differences might account for divergent economic outcomes. To put it simply, in Russia democratization led to capitalist reforms, whereas in China we have
seen vigorous economic reforms under the autocratic guidance of a nominally still Communist party. Does this establish an inherent economic superiority of autocracies like China over more democratic, albeit still illiberal regimes like Russia? The economic history of China itself tells a different story. An autocracy is capable of getting its economic decisions disastrously wrong, as the Chinese did under Mao Zedong during the Great Leap Forward, which resulted in mass starvation. It is also capable of promoting an economic miracle, as the Chinese did under the leadership of Deng Xiaoping and his successors. Less autocratic leadership is less likely to result in either extreme. As has been argued elsewhere (Weede 1996; Quinn and Wooley 2001), autocracies differ from democracies not in their average performance, but in their variation of performance. After some decades of very poor performance under autocratic policy-makers, the Chinese have recently benefited from a string of better results. Neither China nor Russia has yet established limited government and the rule of law. Whoever does so makes enlightened leadership (and luck in getting it) less important than they still are in most transitional societies. Like Russia, China is destined to gray without having become a rich country. Although both countries are aging the hard way (Eberstadt 2006), this may well be worse for a resource-poor country like China, which has to work its way towards prosperity, than for Russia with its abundant natural resources. By 2020, China’s population will stagnate. By 2015, 120 million Chinese, or about 9% of the population, will be older than 65 (Eberstadt 1998, 2006; England 2005:17). Since the mandatory retirement age for male employees is still 60 (for females it is 55), this lower age threshold might be more relevant.
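The demographic figures in this passage pair a head count with a population share, so each pair implies a total population. The sketch below makes that arithmetic explicit for the over-65 figure quoted above and for the over-60 figures quoted in the sentences that follow (Williamson and Shen 2004:3; England 2005:23). The function name and dictionary labels are hypothetical, and the inputs are the rounded numbers from the text, so the results are only rough.

```python
# Illustrative consistency check on the demographic figures quoted in the text.
# All inputs are the rounded numbers cited there; names are hypothetical.

def implied_population_millions(headcount_millions: float, share: float) -> float:
    """Total population implied by a head count and its share of the total."""
    if not 0.0 < share <= 1.0:
        raise ValueError("share must lie in (0, 1]")
    return headcount_millions / share

# (head count in millions, share of total population)
quoted_figures = {
    "over 65 by 2015": (120.0, 0.09),
    "over 60 in 2030": (300.0, 0.22),
    "over 60 a decade after 2030": (400.0, 0.26),
}

implied = {
    label: implied_population_millions(count, share)
    for label, (count, share) in quoted_figures.items()
}

for label, total in implied.items():
    print(f"{label}: implied total of about {total:.0f} million")
```

Because the inputs are hedged in the sources themselves ("about 9%", "on the order of magnitude of 300 million"), the implied totals should be read as a plausibility check and not as population forecasts.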
In 1990, about 9% of the Chinese population was over age 60; in 2030 it is likely to be about 22%, i.e., on the order of magnitude of 300 million people (Williamson and Shen 2004:3). Another decade later the percentage might be 26 and the number of old people about 400 million (England 2005:23). Then China’s share of old people might be higher than the Chinese share of the global population. Currently, the Chinese state has promised pension benefits to only about 10% of these hundreds of millions of old people. Whether paid for privately by their children or publicly by the state, support for the elderly will soon become a significant burden on the Chinese economy.21 Although China cannot
21 This number is only a crude guess. It has been ‘confirmed’ at the International Institute for Sociology Conference in Beijing in July 2004 by John B. Williamson who
continue to grow by increasing inputs, whether capital or labour, as it did so massively and successfully in the past, the size of its labour force will continue to grow for another twenty years and then start to decline gently. Moreover, internal migration from the countryside to the cities will contribute to the avoidance of a European-type labour shortage and contribute to growth (England 2005:118). Nevertheless, given the already high level, major increases in capital investment are inconceivable. The workers for increasing the input of labour simply will no longer be available. Worse still, on top of the aging problem China faces a major imbalance between men and women. Since there might be 16 to 20% more men than women, about every sixth Chinese man will not be able to find a wife (Eberstadt 1998:63; Poston 2004). Small and rich countries might close this gap by inviting in foreign women. The world’s most populous country, and still a comparatively poor one, cannot solve its home-grown problems in this way. Conceivably, China’s gender imbalance could even contribute to future political instability (Hudson and Boer 2002; Poston 2004). It has been demonstrated in a cross-national econometric study (Bloom and Williamson 1998) that economic growth rates are affected by differential growth rates of dependent and working-age populations. In the future, China’s growth rate will be reduced by the stronger growth rate of its dependent population, whereas Russia’s growth rate might remain dependent on the prices of its natural resources, especially its oil and gas exports. It is conceivable that exports of natural resources could keep a greying Russia solvent. According to West (2005:125), “oil and gas revenues and taxes are as much as 50 percent of government revenues, generate most of the country’s foreign exchange and subsidize domestic industry and agriculture”. 
It is also conceivable that the reassertion of state control under Putin could increase wastefulness, reduce the quality of management, decrease investment and prevent a massive flow of Western capital and technology to the Russian energy sector (Aron 2006; West 2005). Certainly, an aging Russia is not prepared for a sharp drop in oil and gas prices. The faster China’s
has done research on old-age security in China together with Chinese social scientists. Most rural Chinese cannot look forward to public promises for their old age. Since about one-third of them do not have sons who are traditionally responsible for supporting their parents, their prospects are bleak. If they have to work during old age, they suffer the consequences of little formal schooling and no work experience beyond the low-income agricultural sector (Eberstadt 2006).
economy grows, the higher China’s demand for natural resources and foreign commodities, the better the Russian terms of trade and Russian growth prospects might become.22 Concerning institutional and political development, however, Russia’s ‘resource curse’ (Collier 2007; Ross 1999; Weiffen 2004) generates more problems for Russia’s future than for China’s future.23
References
Aron, Leon. 2006. “What Does Putin Want?” Commentary 122(5):19–24.
Aslund, Anders. 1995. How Russia Became a Market Economy. Washington, DC: Brookings.
Barro, Robert J., and Xavier Sala-I-Martin. 1995. Economic Growth. New York: McGraw-Hill.
Blankart, Charles B. 2007. Föderalismus in Deutschland und Europa. Baden-Baden: Nomos.
Blankenhagel, Alexander. 2000. “Legal Reforms in Russia.” Journal of Institutional and Theoretical Economics 156(1):99–119.
Bleaney, Michael, and Akira Nishiyama. 2002. “Explaining Growth: A Contest Between Models.” Journal of Economic Growth 7(1):43–56.
Blomström, Magnus, Robert E. Lipsey, and Mario Zejan. 1996. “Is Fixed Investment the Key to Growth?” Quarterly Journal of Economics 111(1):269–276.
Bloom, David E., and Jeffrey G. Williamson. 1998. “Demographic Transitions and Economic Miracles in Emerging Asia.” The World Bank Economic Review 12(3):419–455.
Che Jiahua, and Yingyi Qian. 1998. “Insecure Property Rights and Government Ownership of Firms.” Quarterly Journal of Economics 113:467–496.
Collier, Paul. 2007. The Bottom Billion. Why the Poorest Countries Are Failing and What Can Be Done About It. Oxford: Oxford University Press.
Colombatto, Enrico. 2001. “Was Transition About Free-Market Economics?” Journal des Economistes et des Etudes Humaines XI(1):63–76.
Courtois, Stephane. 1998. “Die Verbrechen des Kommunismus.” Pp. 11–43 in Das Schwarzbuch des Kommunismus, by Stephane Courtois et al. München: Piper.
De Soysa, Indra. 2000. “The Resource Curse: Are Civil Wars Driven by Rapacity or Paucity?” Pp. 113–135 in Greed and Grievance: Economic Agendas in Civil Wars, edited by Mats Berdal and David M. Malone. Boulder, CO: Lynne Rienner.
——, and John R. Oneal. 1999. “Boon or Bane? Reassessing the Productivity of Foreign Direct Investment.” American Sociological Review 64(5):766–782.
22 For a similar argument on the relationship between China’s growth and export success on the one hand and deteriorating terms of trade on the other hand, see The Economist (2007:10).
23 Environmental problems have not been considered in this paper. One may argue that air and water pollution—and water shortages in the North—pose a serious threat to China’s future (Harding 2007). Since Russia supports a much smaller population on a bigger territory than China, the consequences of environmental degradation promise to be more severe in China than in Russia.
Dollar, David. 1992. “Outward Oriented Developing Economies Really Do Grow More Rapidly.” Economic Development and Cultural Change 40(3):523–544.
——, and Aart Kraay. 2001. Trade, Growth, and Poverty. Washington, DC: World Bank (Working Paper).
Doucouliagos, Chris, and Mehmet Ali Ulubasoglu. 2006. “Economic Freedom and Economic Growth.” European Journal of Political Economy 22(1):60–81.
Easterly, William. 2001. The Elusive Quest for Growth: Economists’ Adventures and Misadventures in the Tropics. Cambridge, MA: MIT Press.
Eberstadt, Nicholas. 1998. “Asia Tomorrow, Gray and Male.” The National Interest 53:56–65.
——. 2006. “Growing Old the Hard Way: China, Russia, India.” Policy Review 136:15–39.
Economist, The. 1997. “Survey: Russia.” The Economist 344, 8025 (July 12th).
——. 2000. “China’s State Owned Enterprises.” The Economist 356, 8190 (September 30th):93–97.
——. 2001a. “Survey: Russia. Putin’s Choice.” The Economist 360, 8231 (July 21st).
——. 2001b. “Mobility in China. Off to the City.” The Economist 360, 8237 (September 1st):48.
——. 2007. “Reaching for a renaissance. A special report on China and its region.” The Economist 382, 8522 (March 31st).
Edwards, Sebastian. 1998. “Openness, Productivity and Growth: What Do We Know?” Economic Journal 108:383–398.
England, Robert Stowe. 2005. Aging China. The Demographic Challenge to China’s Economic Prospects. Westport, CT: Praeger (for the Center for Strategic and International Studies, Washington, DC).
Feld, Lars P., and Stefan Voigt. 2003. “Economic Growth and Judicial Independence: Cross-Country Evidence Using a New Set of Indicators.” European Journal of Political Economy 19(3):497–527.
Fu Zhengyuan. 1993. Autocratic Tradition and Chinese Politics. Cambridge: Cambridge University Press.
Gimpelson, Vladimir, and Daniel Treisman. 2002. “Fiscal Games and Public Employment: A Theory with Evidence from Russia.” World Politics 54(2):145–183.
Graeff, Peter, and Guido Mehlkop. 2003. “The Impact of Economic Freedom on Corruption: Different Patterns for Rich and Poor Countries.” European Journal of Political Economy 19(3):605–620.
Greenaway, David, and Chong Hyun Nam. 1988. “Industrialization and Macroeconomic Performance in Developing Countries under Alternative Trade Strategies.” Kyklos 41:419–435.
Greenfeld, Liah. 2001. The Spirit of Capitalism. Nationalism and Economic Growth. Cambridge, MA: Harvard University Press.
Gwartney, James D., Randall D. Holcombe, and Robert A. Lawson. 2006. “Institutions and the Impact of Investment on Growth.” Kyklos 59:255–273.
Gwartney, James, Robert Lawson, with William Easterly. 2006. Economic Freedom of the World. Annual Report 2006. Vancouver, BC: Fraser Institute, and Potsdam: Liberales Institut.
Harding, Harry. 2007. “Think Again. China.” Foreign Policy 159:26–32.
Hayek, Friedrich August Von. 1945. “The Use of Knowledge in Society.” American Economic Review 35(4):519–530.
Heilmann, Sebastian. 2000. Die Politik der Wirtschaftsreformen in China und Rußland. Hamburg: Mitteilungen des Instituts für Asienkunde, Band 317.
Hudson, Valerie M., and Andrea Den Boer. 2002. “A Surplus of Men, A Deficit of Peace: Security and Sex Ratios in Asia’s Largest States.” International Security 26(4):5–38.
Inglehart, Ronald. 1997. Modernization and Postmodernization. Cultural, Economic and Political Change in 43 Societies. Princeton: Princeton University Press.
Kane, Tim, Kim R. Holmes, and Mary Anastasia O’Grady. 2007. 2007 Index of Economic Freedom. Washington, DC: Heritage Foundation, and New York: Wall Street Journal.
Keren, Michael. 2002. “Socialism and Stalinism: Never the Twain Shall Part? Or Why Can’t we have Liberal Socialism?” Paper presented at the European Public Choice Meeting, Belgirate (Italy), April 4–7.
Kristof, Nicholas D., and Sheryl WuDunn. 1994. China Wakes. New York: Random House.
Lal, Deepak. 1998. Unintended Consequences: The Impact of Factor Endowments, Culture, and Politics on Long-Run Economic Performance. Cambridge, MA: MIT Press.
Lardy, Nicholas R. 1998. China’s Unfinished Economic Revolution. Washington, DC: Brookings.
Layard, Richard, and John Parker. 1996. The Coming Russian Boom. New York: Free Press.
Legvold, Robert. 2001. “Russia’s Unreformed Foreign Policy.” Foreign Affairs 80(5):62–75.
Levine, Ross, and David Renelt. 1992. “A Sensitivity Analysis of Cross-Country Growth Regressions.” American Economic Review 82:942–963.
Li Shi. 2000. “Efficiency and Redistribution in China’s Revenue Sharing System.” Pp. 103–122 in Governance, Decentralization and Reform in China, India and Russia, edited by Jean-Jacques Dethier. Dordrecht and Boston: Kluwer Academic Publishers.
Lin Justin Yifu, Fang Cai, and Zhou Li. 2003. The China Miracle: Development Strategy and Economic Reform. Hong Kong: The Chinese University Press.
Liu Lirong. 2007. Wirtschaftliche Freiheit und Wachstum. Münster: Lit-Verlag.
Lynch, Allen C. 2001. “Einen Schritt vor, zwei Schritte zurück. Wurzeln des wirtschaftlichen Dilemmas in Rußland.” Internationale Politik 56(10):17–26.
Maddison, Angus. 1998. Chinese Economic Performance in the Long-Run. Paris: OECD.
Mehlkop, Guido. 2000. “Methodische Probleme bei der Analyse von Wertvorstellungen und Wirtschaftswachstum.” Zeitschrift für Soziologie 29(3):217–226.
Mises, Ludwig Von. 1920. “Die Wirtschaftsrechnung im sozialistischen Gemeinwesen.” Archiv für Sozialwissenschaft und Sozialpolitik 47(1):86–121.
Montinola, Gabriella, Yingyi Qian, and Barry Weingast. 1995. “Federalism Chinese Style. The Political Basis for Economic Success in China.” World Politics 48(1):50–81.
Newman, Graeme. 2002. “Crimes and Punishments.” Foreign Policy 127:28–29.
O’Driscoll, Gerald P., Kim R. Holmes, and Melanie Kirkpatrick. 2001. Index of Economic Freedom. New York: Wall Street Journal, and Washington, DC: Heritage Foundation.
Olson, Mancur. 2000. Power and Prosperity: Outgrowing Communist and Capitalist Dictatorships. New York: Basic Books.
Pei Minxin. 1998. “Is China Democratizing?” Foreign Affairs 77(1):68–82.
——. 2006. China’s Trapped Transition. The Limits of Developmental Autocracy. Cambridge, MA: Harvard University Press.
Pejovich, Svetozar. 2001. “After Socialism. Where Hope for Individual Liberty Lies.” Journal des Economistes et des Etudes Humaines XI(1):9–30.
Poston, Dudley L. 2004. “The Demographic Destiny of China, South Korea and Taiwan: Changes and Implications for the Family.” Paper presented at the 36th World Congress of the International Institute of Sociology, Beijing, July 7–11.
Qian Yingyi. 2000. “The Process of China’s Market Transition (1978–1998).” Journal of Institutional and Theoretical Economics 156(1):151–171.
Quinn, Dennis P., and John T. Wooley. 2001. “Democracy and National Economic Performance: The Preference for Stability.” American Journal of Political Science 45(3):634–657.
Rodrik, Dani. 2006. What is so Special About China’s Exports? Cambridge, MA: Harvard University, Kennedy School of Government, Faculty Research Paper RWP 06–001.
Ross, Michael L. 1999. “The Political Economy of the Resource Curse.” World Politics 51(2):297–322.
Ross, Robert S. 2005. “Assessing the China Threat.” The National Interest 81:81–87.
Rummel, Rudolph J. 1994. Death by Government. New Brunswick, NJ: Transaction.
Ryback, Andrzej. 2000. “Putin steuert bei der Bodenreform Zick-Zack-Kurs.” Financial Times Deutschland, March 20:14.
Sachs, Jeffrey D., and Andrew M. Warner. 1995. “Natural Resource Abundance and Economic Growth.” Cambridge, MA: NBER Working Paper W 5398.
Sala-I-Martin, Xavier, Gernot Doppelhofer, and Ronald I. Miller. 2004. “Determinants of Long-Term Growth.” American Economic Review 94(4):813–835.
Shevtsova, Lilia. 2003. Putin’s Russia. Washington, DC: Carnegie Endowment for International Peace.
Shleifer, Andrei, and Daniel Treisman. 2000. Without a Map. Political Tactics and Economic Reform in Russia. Cambridge, MA: MIT Press.
Siegl, Elfie. 2001. “Rußland baut Reformfassaden.” Frankfurter Allgemeine Zeitung 205 (September 4):17.
Silverman, Bertram, and Murray Yanowitch. 2000. New Rich, New Poor, New Russia. Armonk, NY: Sharpe.
Smith, Adam. 1776–1976. An Inquiry into the Nature and Causes of the Wealth of Nations. Oxford: Oxford University Press.
Taube, Markus. 2001. “Ostasien—neues Gravitationszentrum der Weltwirtschaft im 21. Jahrhundert?” Pp. 119–139 in Jahrbuch für internationale Sicherheitspolitik, edited by Erich Reiter. Hamburg: Mittler.
Taylor, Charles Lewis, and David A. Jodice. 1983. World Handbook of Political and Social Indicators. 3rd ed. New Haven, CT: Yale University Press.
Treisman, Daniel. 2000. “Russia’s Federal System of Public Finance: Trends, Politics, and Pressing Issues.” Pp. 65–68 in Governance, Decentralization and Reform in China, India and Russia, edited by Jean-Jacques Dethier. Dordrecht and Boston: Kluwer Academic Publishers.
Trenin, Dmitri. 2002. The End of Eurasia. Washington, DC: Carnegie Endowment for International Peace.
Weede, Erich. 1996. “Political Regime Type and Variation in Economic Growth Rates.” Constitutional Political Economy 7(3):167–176.
——. 2000. Asien und der Westen. Baden-Baden: Nomos.
——. 2004. “Comparative Economic Development in China and Japan.” Japanese Journal of Political Science 5(1):69–90.
——. 2006. “Economic Freedom and Development.” CATO Journal 26(3):511–524.
Weiffen, Brigitte. 2004. “The Cultural-Economic Syndrome: Impediments to Democracy in the Middle East.” Comparative Sociology 3(3–4):353–375.
Wen Tie Jun. 2000. “The Reform of the Agricultural Tax System.” Pp. 279–296 in Governance, Decentralization and Reform in China, India and Russia, edited by Jean-Jacques Dethier. Dordrecht and Boston: Kluwer Academic Publishers.
West, J. Robinson. 2005. “The Future of Russian Energy.” The National Interest 80:125–127.
Williamson, John B., and Ce Shen. 2004. “Do Notional Defined Contribution Accounts Make Sense as Part of the Old-Age Security Mix for China?” Paper presented at the 36th World Congress of the International Institute of Sociology, Beijing, July 7–11.
Winiecki, Jan. 1988. The Distorted World of Soviet-Type Economies. London: Routledge.
World Bank. 1981. 1993. 1996. 2000–2001. 2002. 2006. 2007. World Development Reports. New York: Oxford University Press.
——. 1993a. The East Asian Miracle. New York: Oxford University Press.
Xu Yi-Chong. 2000. “State Assets Depletion and Economic Reform in China.” Studies in Comparative International Development 35(1):73–100.
Yavlinsky, Grigory. 1998. “Russia’s Phony Capitalism.” Foreign Affairs 77(3):67–79.
118
erich weede
Zakaria, Fareed. 2003. The Future of Freedom. New York: Norton. Zhu Ze, and Barbara Krug. 2007. Is China a Leviathan? Rotterdam: Erasmus University, Rotterdam School of Management, Manuscript. Zhuravskaya, Ekaterina V. 2000. “Market-Hampering Federalism: Local Incentives for Reform in Russia.” Pp. 127–170 in Governance, Decentralization and Reform in China, India and Russia, edited by Jean-Jacques Dethier. Dordrecht and Boston: Kluwer Academic Publishers.
SOCIAL STRUCTURE AND PERSONALITY DURING THE PROCESS OF RADICAL SOCIAL CHANGE: A STUDY OF UKRAINE IN TRANSITION

Melvin L. Kohn, Valeriy Khmelko, Vladimir I. Paniotto and Ho-fung Hung

The theoretical question that motivates this inquiry is whether the relationships between social structure and personality previously found in both Western and non-Western, capitalist and socialist societies (Kohn et al. 1990; Kohn and Slomczynski 1990) during periods of apparent social stability continue to obtain even during periods of radical social change. Following Williams (1970), we define social change as change in the structure of the society, not merely as an eventful or dramatic period in the life of that society: “Change occurs when there is a shift in pattern, when new relationships emerge . . . ” (Williams 1970:620–621). By radical social change, we refer not to the pace of change but to the nature of the change: the transformation of one political and economic system into a quite different system. Our exemplar of radical social change is the transformation of the countries of Eastern Europe and the former Soviet Union from socialism to nascent capitalism.

The question of whether the relationship of social structure and personality continues to hold under conditions of radical social change has been provisionally answered by comparative analyses of Poland and Ukraine, based on cross-sectional surveys of the adult populations of the urban areas of those countries in 1992–93 (Kohn et al. 1997).
In all those respects in which socialist Poland had shown a pattern of relationships of social class and of social stratification with personality similar to that found in studies of the capitalist United States and Japan, it continued to do so after the advent of nascent capitalism: Under conditions of radical social change, just as under conditions of social stability, people of more advantaged class position, and of higher social-stratificational level, enjoyed much greater opportunity to be self-directed in their work (that is, to do more substantively complex work, to be less closely supervised, and to work under less routinized conditions) than did people of less advantaged social-structural position. Occupational self-direction, in turn, continued to be conducive
to more self-directed orientations to self and society and to greater intellectual flexibility. Where, however, socialist Poland had differed from the United States and Japan (notably, in that people of more privileged position in the capitalist countries had a stronger sense of well-being, and people of less privileged position were more distressed, while nearly the opposite obtained in then-socialist Poland), Poland in transition now fully exemplified the capitalist pattern. Ukraine seemed to be following a similar trajectory, albeit at a slower pace: Ukraine showed the same pattern of relationships between social structure and personality as did Poland, but all the relationships were weaker in magnitude, with those for distress not even statistically significant.1 The evidence of the cross-sectional analyses thus demonstrates that the radical social change attendant on the transformation of the social and economic structures of Poland and Ukraine had not fundamentally affected the relationships between social structure and personality, at least for the employed segments of the population, except insofar as the social structures of these countries had become more like those of capitalist countries. Yet, the comparative analyses of Poland and Ukraine tell us little about the dynamics of the ongoing process. Not only were the analyses necessarily cross-sectional, but the transformation of Poland had occurred so rapidly that, by the fall and early winter of 1992–93, the relationships of social class and social stratification with personality already exemplified the capitalist pattern of the United States and Japan (Kohn et al. 1990; 1997). The Ukrainian transformation had not advanced nearly so far, and the relationships between social structure and personality, while similar to those for Poland, were not nearly so sharply pronounced; Ukraine was still very much in process of transformation to nascent capitalism. 
In terms of movement away from the long-term domination of the economy by a centralized system of command, though, the Ukrainian transformation was even more profound than the Polish. For sixty years prior to the beginning of the transformation, Ukrainians had had no experience with private enterprise: even small private enterprises were forbidden. In the rural areas of Ukraine, where more than 80 percent of the population lived, private ownership was eradicated by draconian measures, beginning in the 1930s. In Poland, by contrast, small private business was never forbidden, and agriculture was never socialized. Moreover, the economy of Ukraine was an integral part of the centralized economy of the USSR. With the disintegration of the USSR, the industrial connections of the enterprises of Ukraine with tens of thousands of enterprises in fourteen newly independent countries of the former Soviet Union were abruptly broken. The result was a much sharper decline in production and in the standard of living in Ukraine than in Poland. Thus, in terms of the depth of the changes that were occurring, the early years of the transformation in Ukraine were extremely radical. The process was hardly complete in 1992–93, and remained ongoing for some years to come.

1 Kohn et al. (1997) speculated about whether the weaker relationships for Ukraine than for Poland of job conditions and personality, and thus also of class and stratification with personality, were more likely a carryover of Ukraine’s history as part of the Soviet Union or the result of the extreme conditions of uncertainty that Ukraine was experiencing at the time of the 1992–93 survey. Without pertinent data from Ukraine while it was part of the Soviet Union, there was no way to be certain which explanation was valid.

Herein lie the impetus and the opportunity for the present study. With the realization that Ukraine had been at a very early stage of a very radical transformation at the time of the cross-sectional surveys in 1992–93, the Ukrainian collaborators in the present research grasped the unique opportunity to secure the data that would make possible longitudinal analyses of the dynamics of change during the ongoing process of radical social change. In the spring and summer of 1996 they re-interviewed all those men and women in the original sample who had been in the labor force at the time of the initial interview. This made possible the conversion of a cross-sectional survey conducted at a time when the transformation of Ukraine had barely begun into a longitudinal data-set extending three to three and a half years into the ongoing transformation.

For this study, even more than for most studies of social structure and personality, context is crucial.
Khmelko’s (2002) analysis of macrosocial change in the first decade of Ukrainian independence documents that by 1996 (and even later), although Ukraine had left its former socialist economy far behind, it had not moved decisively to a capitalist social and economic structure. This paper, then, is not a study of Ukraine before and after its transformation from socialism to capitalism, nor of Ukraine during and after the transformation, but of Ukraine during the early stages of an ongoing transformation whose eventual outcome was still uncertain. The strategic value of a longitudinal study of Ukraine during these years is that it enables us to study the dynamics of the relationships between
social structure and personality during the ongoing process of radical social change, our analytic lens being an examination of what happens to these relationships under such uncertain, changing conditions. This inquiry thus provides an extreme test of whether the relationships of social structure and personality found in studies conducted under conditions of apparent social stability obtain even during the ongoing process of radical social change. As will be apparent in the analyses to be presented shortly, the test is not only longitudinal, and not only conducted during the very process of radical social change, but extremely severe for a reason that we did not anticipate and that contrasts sharply with the findings of many studies of personality conducted during times of social stability: the over-time correlations of the dimensions of personality we study are astonishingly low.

The central questions we shall pursue, then, are (1) whether the relationships of social structure and personality are meaningful, non-trivial in magnitude, and consistent over time even during the very process of radical social change and even in the face of instability of personality during this period of time; and (2) if they are meaningful, non-trivial, and consistent under these extraordinary circumstances, what makes this possible.

Sample and Methods of Data-Collection

The Baseline Cross-sectional Survey of 1992–93

The initial, cross-sectional survey of Ukraine, which we now treat as the baseline for our longitudinal analyses, was conducted in the winter of 1992–93. It was based on face-to-face interviews with representative samples of all men and women living in urban areas of the country. The sample was drawn by the Ukrainian members of our research team, who designed a method to overcome the limitations of past procedures for selecting samples in the former Soviet Union and the poor quality of official statistics in Ukraine.
Their method is based on multi-stage random sampling: the first stage being to sample from seven hundred districts, then to successively sample post offices, streets, buildings, and apartments, and finally residents aged 18 or older living in the selected dwellings. The survey was carried out by the Kiev International Institute of Sociology, a Research Center that the Ukrainian investigators had created in 1990. Since sociologists in the former Soviet Union had had little
experience in conducting surveys based on face-to-face interviews (see Kohn 1993), they had to develop their survey research center almost from scratch. Fortunately, they had the expert assistance of Michael Haney of the Research Institute of Radio Liberty, who conducted intensive interviewer training sessions (in Russian, which he speaks fluently), in preparation for surveys that the Center carried out for Radio Liberty, with further training in the conduct of academic surveys by the uniquely knowledgeable sociologist and Sovietologist, Michael Swafford, again in Russian, which he too spoke fluently. By the time the Institute carried out the cross-sectional survey of 1992–93, it had a trained and experienced field staff and a good system for ensuring that their interviews were of high quality. The investigators successfully interviewed 81% of their designated respondents, interviewing 2322 people (966 men and 1356 women). The apparent over-representation of women in this sample reflects the demographic composition of the country. (For further information about sampling and methods of data collection, and for information about the methods used for pretesting the interview schedule and for ensuring comparability of meaning and measurement, not only between the Russian and Ukrainian versions of the interview schedules, but also with past studies of other countries in other languages, see Appendix A of Kohn et al. 1997.)

The Follow-up Survey

With limited resources for fieldwork, the investigators restricted the follow-up survey to those respondents in the original survey who had at that time been in the labor force (defined in Ukraine, as in the United States, as either gainfully employed or not employed and looking for work). This was a strategic subsample for studying movement into and out of the ranks of the employed and for studying the psychological concomitants of interclass mobility.
Limiting the follow-up survey to people in the labor force, though, had the corresponding disadvantage of precluding longitudinal analysis of housewives and pensioners. Still, we can compensate for much of this loss by juxtaposing the cross-sectional data about people who in 1992–93 were housewives or pensioners to the follow-up data about people who in 1992–93 were in the labor force but by 1996 had become housewives or pensioners. Securing an adequate completion rate in the 1996 follow-up survey proved to be even more difficult than in the baseline survey of 1992–93,
in part because many Ukrainians had become disillusioned with the formal institutions of their society; and in part, too, because economic conditions were so difficult that many people, even employed urbanites, spent the time when they were not at their jobs doing what amounts to subsistence farming, in small plots in or near the cities or towns in which they lived, and so were not available to be interviewed.2 With great persistence, the Ukrainian investigators did secure interviews with approximately 75% of their intended sample: admittedly no longer fully representative of the population to which we would like to generalize but, interpreted cautiously, useful for the study of the dynamics of change.

2 A survey of a representative sample of 4500 Ukrainian households carried out by the Kiev International Institute of Sociology in the summer of 1996 found that approximately 62 percent of the urban population were engaged in subsistence agriculture, spending on average 24 hours per week on such activities. The urban residents who engaged in these agricultural activities were not limited to manual workers, but included the self-employed and small-scale employers.

The Over-Time Stability of Personality

Authoritarian Conservatism

An obvious question with which to begin our analysis is how stable personality was during this period of radical change. To answer this question, and to provide crucial indices of personality for analyses to follow, we developed longitudinal measurement models of the same dimensions of personality as had been studied in the cross-sectional analyses of Poland and Ukraine for 1992–93 (see Kohn et al. 1997, Appendix Table A-1), and had earlier been studied for the United States (Kohn and Schooler 1983, Chapter 6 and Appendices C and D), Poland when it was socialist (Kohn and Slomczynski 1990, Chapter 4), and Japan (Kohn et al. 1990). We began with authoritarian conservatism vs. open-mindedness, deliberately selected as a well-measured dimension of orientation to self and society, one that had been shown to be highly stable in the longitudinal analyses of U.S. men. An initial model showed the over-time correlation (which we shall call its stability) of this dimension of orientation to be astonishingly low (at 0.18 for men and 0.37 for women), particularly for the relatively short interval
of three to three-and-a-half years, even considering the tumultuous times that Ukraine was then experiencing. By contrast, for U.S. men over a 10-year period of much greater social stability, from 1964 to 1974, the over-time correlation had been 0.78 (Kohn and Schooler 1983:328). What may be a more apt comparison, even though based on a small sample: Bogdan Mach’s analysis of subsamples of 99 men and 98 women representative of the southern half of Poland during approximately the same span of time as the Ukrainian analyses,3 albeit during a more advanced stage of transformation, yielded over-time correlations for both men and women about as high as that for U.S. men: 0.78 for men and 0.76 for women. We were so astonished at the extraordinarily low stability of authoritarian conservatism for Ukraine that we thought it unwise to pursue the analysis until we were confident that the finding was not an artifact of some flaw in the fieldwork, or in matching baseline respondents and follow-up respondents, or in data-processing.4 We therefore retraced our steps, beginning with the selection and locating of respondents for the follow-up survey, not only reviewing field notes but also conducting brief re-interviews of respondents in the follow-up survey, to be certain that they really were the same people as the original respondents. We also checked our procedures for merging the baseline and follow-up data-files, to be certain that we had not mismatched any 1992–93 and 1996 respondents. Since we had earlier found that the information provided in four respondents’ initial and follow-up interviews about such identifying personal characteristics as age, gender, educational attainment, and marital and parental status had been so inconsistent as to suggest that the wrong person had been re-interviewed or even that one or the other interview had been fraudulent, we did a systematic analysis of the consistency of such information for all people in the longitudinal sample. 
We also refined the criteria that had been employed for including respondents in the follow-up sample.
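The consistency check described above can be sketched in code. The following is a minimal illustration in Python, not the investigators' actual procedure: the record fields, the plausible age-gain window of three to four years (suggested by the interval between the two interviews), and all function and variable names are assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interview:
    """One respondent's identifying characteristics in one wave (hypothetical fields)."""
    respondent_id: int
    age: int                  # age in whole years at the time of the interview
    gender: str
    years_of_education: int


def find_inconsistencies(baseline, followup, min_age_gain=3, max_age_gain=4):
    """List reasons why two interviews may not describe the same person.

    The age-gain window is an assumed tolerance, not a value from the study.
    """
    problems = []
    if baseline.gender != followup.gender:
        problems.append("gender differs")
    age_gain = followup.age - baseline.age
    if not min_age_gain <= age_gain <= max_age_gain:
        problems.append(f"age gain of {age_gain} years is implausible")
    if followup.years_of_education < baseline.years_of_education:
        problems.append("educational attainment decreased")
    return problems


def flag_suspect_matches(baseline_wave, followup_wave):
    """Match follow-up records to baseline records by ID and collect any problems."""
    baseline_by_id = {record.respondent_id: record for record in baseline_wave}
    flagged = {}
    for later in followup_wave:
        earlier = baseline_by_id.get(later.respondent_id)
        if earlier is None:
            flagged[later.respondent_id] = ["no baseline interview found"]
            continue
        problems = find_inconsistencies(earlier, later)
        if problems:
            flagged[later.respondent_id] = problems
    return flagged


# Invented example records: respondent 1 is consistent, respondent 2 is not.
baseline_wave = [Interview(1, 40, "F", 12), Interview(2, 35, "M", 10)]
followup_wave = [Interview(1, 43, "F", 12), Interview(2, 52, "M", 8)]

suspects = flag_suspect_matches(baseline_wave, followup_wave)
for respondent_id, problems in suspects.items():
    print(respondent_id, "; ".join(problems))
```

A flagged case is only a candidate for review: as the text notes, an inconsistency may reflect miscoding rather than a mismatched or fraudulent interview, so each one would still need to be examined by hand.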
3 The baseline survey for this analysis (and for other Polish analyses to be discussed below) consisted of face-to-face interviews with a representative sample of all men and women living in the urban areas of Poland (see Kohn et al. 1997). The follow-up survey was conducted with a small but representative subsample of those members of the original sample living in the southern half of the country.

4 We here follow the strategy that, before embarking on substantive interpretations of cross-national differences, one should always attempt to rule out the possibility of these differences being a methodological artifact (Kohn 1987:719–721).
We found no inconsistencies that required excluding other respondents from the sample (although we did find a few inconsistencies that seem to have resulted from miscoding rather than mismatching, which we were able to correct). We did, however, find that 28 respondents who had said they were “working” at the time of the initial interview did not provide convincing evidence that they were actually employed, or were on leave from their jobs, or were actively seeking employment. We removed them from the sample as not really being in the labor force. Neither their removal from the sample nor the correction of inconsistent information made the slightest difference in the measurement model of authoritarian conservatism. In particular, the stability of authoritarian conservatism was unchanged, both for men and for women (for the final measurement model, see Table 1). We count this as important evidence against the possibility that the low over-time correlations were somehow the result of methodological artifact. There is much more evidence to come.

Other Dimensions of Orientation

The next logical question is whether the low stability of authoritarian conservatism is indicative of a more general pattern of instability of orientation to self and society or is somehow unique to this particular dimension. To answer this question entails assessing the stability of all seven dimensions of orientation that we use in our analyses: anxiety, authoritarian conservatism, receptiveness or resistance to change, personally responsible standards of morality, self-confidence, self-deprecation, and trustfulness. We developed satisfactory longitudinal measurement models for all these dimensions of orientation, satisfactory as judged by the models’ fitting the data well (see the first column of Table 2) and the parameters being consistent with those of cross-sectional models for Ukraine and other countries and with longitudinal measurement models for the United States and Poland.
From these models, we learn that, although authoritarian conservatism is one of the two least stable of the seven dimensions of orientation for men, it is of intermediate rank for women, and in any case (even for men) is not unique in having a low stability: several dimensions of orientation had only small-to-modest stabilities for one or both genders (see the second and third columns of Table 2). We also learn that the magnitudes of over-time correlation
Table 1. Longitudinal measurement model of authoritarian conservatism, Ukraine (1992–93 and 1996)

Columns: Standardized Paths, Concept to Indicators, for Men (1992–93, 1996) and Women (1992–93, 1996)

Concept: Authoritarian Conservatism

Indicators:
The most important thing to teach children is absolute obedience to their parents.
It’s wrong to do things differently than past generations did.
Any good manager should be demanding and strict with people under him in order to gain their respect.
In this complicated world, the only way to know what to do is to rely on specialists.
No decent man can respect a woman who has engaged in sexual relations before marriage.
One should always show respect to those in authority.
You should obey your superiors whether or not you think they’re right.
Young people should not be allowed to read books that are likely to confuse them.

Standardized paths (values in the order given): .50* .40* .64* .53* .67* .49* .62* .47* .44* .56* .44* .42* .41* .46* .47* .31* .42* .31* .52* .54* .48* .31* .41* .42* .30* .52* .36* .53* .30* .40* .49* .42*

Over-time correlation of the concept: Men .18*; Women .37*
Ratio of chi-square to degrees of freedom: 2.11
Root mean square error of approximation: 0.05
Number of cases: Men 380; Women 460
*p