Outcome-Based Evaluation
Second Edition

Robert L. Schalock
Professor Emeritus
Hastings College
Hastings, Nebraska

KLUWER ACADEMIC PUBLISHERS
NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW

eBook ISBN: 0-306-47620-7
Print ISBN: 0-306-46458-6

©2002 Kluwer Academic Publishers, New York, Boston, Dordrecht, London, Moscow
Print ©2001 Kluwer Academic/Plenum Publishers, New York

All rights reserved. No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher.

Created in the United States of America

Visit Kluwer Online at: http://kluweronline.com
and Kluwer's eBookstore at: http://ebooks.kluweronline.com
Preface to the First Edition

This book is the product of 30 years of experience with program evaluation. During this time, both service recipients and educational and social programs have experienced major cultural and political shifts in service delivery philosophy, including a focus on quality, mainstreaming, deinstitutionalization, community inclusion, and an emphasis on measurable outcomes. Recently, stakeholders of these programs have demanded more than just the provision of service, forcing program administrators to evaluate their programs' effectiveness and efficiency. The "era of accountability" is here, and my major goal in writing this book is to help current and future program administrators understand that they need to look beyond simply the provision of service. Indeed, they need to be competent in outcome-based evaluation, which I define as a type of program evaluation that uses valued and objective person-referenced outcomes to analyze a program's effectiveness, impact, or benefit-cost.

By design, this book can be read from the perspective of a consumer or producer of outcome-based evaluation. As a consumer, the reader will be introduced to the various techniques used in outcome-based evaluation, and how to interpret data from outcome-based evaluation analyses. As a producer, the reader will be instructed in how to do outcome-based evaluation analyses, along with how to use and act on their results. For both the consumer and producer, two questions should guide the use of outcome-based evaluation: For what purpose will I use the outcome-based evaluation data, and What information will I need for the intended use? Numerous examples of outcome-based evaluations that reflect answers to these two questions will be provided throughout the text.

The reader will encounter a number of key terms throughout the text. Chief among these are:

Valued, person-referenced outcomes that reflect both the results of the intervention provided and an enhanced quality of life for the service recipient.
Performance-based assessment that involves using objective indicators to evaluate a person's adaptive behavior level and role status.

Outcome-based analyses that include effectiveness, impact, or benefit-cost. These analyses are used respectively to determine whether the program is meeting its goals, whether the program made a significant difference, or whether the program represents a reasonable return on investment.

Data-based management systems that are used to provide the information necessary for both outcome-based analysis and formative feedback that can be used by program administrators to increase their programs' effectiveness and efficiency.

I have attempted to make this book as "user friendly" as possible. I realize that most of the readers are neither program evaluators nor statisticians. As a teacher and program evaluator for these 30 years, I have discovered that outcome-based evaluation requires primarily logical thinking and being clear in the questions asked. Once one knows where he/she is going and the basic road map to get there, then the journey is much easier. Thus, I have attempted throughout the text to provide the reader with easily-read and followed tables, graphs, and exhibits that should facilitate both our tasks. For those readers who like to count and keep track, there are 16 figures, 24 tables, 34 exhibits, and 20 guiding principles that summarize key points. For those readers who want to go into greater detail, I have provided study questions and a list of additional readings for each chapter. Whether this book is read from cover to cover or by topical area, my goal has been to make your journey easier by stressing the critical need for thinking clearly and asking specific questions that can then be answered via one or more of the OBE analytic techniques discussed in the text's 9 chapters. This book is divided into five sections, beginning with an overview and concluding with the future of outcome-based evaluation.

Working with program administrators for the last 30 years has made me sensitive to the challenging job they face. Their task is probably best exemplified in a recent book by Hammer and Champy (1993) entitled, Reengineering the Corporation: A Manifesto for Business Revolution. In the book, the authors discuss the "Three Cs" of current service delivery: consumers, competition, and change. Consumers are asking more from education and social programs; fiscal restraints are forging new, competitive service delivery mechanisms; and change is constant. Thus, any program evaluation effort must be designed and implemented within the current zeitgeist that demands accountability within the context of the "Three Cs." My hope and anticipation is that this text will assist present and future program administrators to understand and use outcome-based evaluation to demonstrate their programs' effectiveness, impact, or benefit-cost. If this is the case, the journey has been both beneficial and rewarding.
Preface to the Second Edition

Doing a second edition of a book is a wonderful challenge. On the one hand, an author wants to retain the core values and approaches presented in the first edition, yet at the same time update the reader with the current thinking and new methods that have developed over the past five years in the field of outcome-based evaluation. Certainly the main trends that led to the publication in 1995 of Outcome-Based Evaluation have continued, including the focus on person-referenced outcomes, the increased need for program accountability, the increasing use of methodological pluralism, and the popularity of the pragmatic evaluation paradigm. Additionally, there continue to be major philosophical shifts in the delivery of education, health care, and human and social service programs. These shifts emphasize programmatic issues such as demonstrated quality of life outcomes, consumer empowerment, increased accountability, and the need to evaluate the efficiency and effectiveness of programs and services. As a result of this emphasis, service providers, policymakers, funders, and program evaluators have been challenged to examine critically the manner in which programs are delivered and evaluated.

Both the published reviews and the informal comments received from colleagues regarding Outcome-Based Evaluation were quite positive. Overall, reviewers and text users characterized it as a user-friendly guide to the challenges of documenting a program's impact, effectiveness, or benefit-cost. Others commented about the important contribution the text made to matching the type and complexity of the evaluation to the needs of their programs. Still others commented on its 20 guiding principles, study questions, focus on internal evaluation, basis in logical thinking, and use of contextual analysis. At a personal level, I was very honored when the first edition was chosen by Doody's Rating Service as one of the best health science books in 1996.

However, there were also a number of suggestions should a second edition be published. Across several reviewers, there was a suggested need to expand coverage and examples into other fields such as education, health,
mental health, aging, and corrections, and reducing the apparent emphasis on developmental disabilities. Indeed, one reviewer actually counted the published studies in the text involving persons with mental retardation and reported that 65 of the 115 studies (42 percent) involved these persons. Analogously, another reviewer recommended expanding the material to make the text more valuable to administrators in school districts, hospitals, universities, human resource departments, drug rehabilitation agencies, and local governments. Two reviewers suggested the need to integrate the concept of multigoal, theory-driven evaluation throughout the text, rather than simply referring to it in the beginning section only. Still others suggested the need to illustrate how using an impact model can guide the selection of process variables, intermediate outcomes, and end-of-program outcomes to help develop recommendations for managers and staff. Finally, at least one reviewer noted the need to expand on the differences among statistical, clinical, and practical significance. I have taken these suggestions seriously in completing the second edition of Outcome-Based Evaluation. The thrust of the second edition is more generic, focusing on the interests and needs of a larger evaluation community–administrators, students, policymakers, funders, policy analysts, consumers, and educators. More specifically, readers of the second edition will find: A user-friendly, practical, “how to” presentation of the four types of outcome-based evaluation: program, effectiveness, impact, and policy. A multiple measurement approach to accountability assessment. Applications to the fields of education (regular and special), health care (medical and mental), and human and social service programs (disabilities, aging, substance abuse, and corrections). An outcome-based evaluation model that is used with slight variations throughout the text. The model responds to the dual needs of program evaluators to focus on both organization-referenced outputs and person-referenced outcomes. A detailed discussion of outcomes research and why it is so critical in program evaluation. Homepage Web sites for organizational, state, and national databases. A detailed explanation of methodological pluralism that allows one to use both qualitative and quantitative research methods to evaluate subjective and objective outcomes. Scenarios and examples of program, effectiveness, impact, and policy evaluation across a wide range of education, health care, and human service programs.
Current application of key accountability concepts such as report cards, benchmarks, performance measurement, informatics, national databases, practice guidelines, and participatory action research. Updated evaluation scenarios and exhibits reflecting the challenges, opportunities, and utility of outcome-based evaluation. Further reading suggestions and study questions for each chapter. 19 Figures, 32 Tables, and 21 exhibits. I have enjoyed the challenges posed by this second edition. The field of outcome-based evaluation is expanding rapidly. In the text, I attempt to integrate the important trends and techniques into a user-friendly approach to evaluation that will be both valuable and useful to a wide constituency of program evaluation users. In the end, outcome-based evaluation represents a way of thinking about and approaching complex human, social, and political issues. It offers an evaluation orientation in which values for the well-being of people are the foundation. It demonstrates that effective evaluation clearly is not just a set of technical methods and tools; at its core, outcome-based evaluation is anchored ultimately in the values of stakeholders. I hope that this orientation is very apparent to the reader, along with the tools and techniques that make outcome-based evaluation the viable approach to program evaluation that it is today.
Acknowledgments

This book is dedicated to my students, program administrators, consumers, colleagues, and my wife, Susan, who have been receptive and supportive of my ideas and efforts throughout my career. I have learned so much from each of them and deeply appreciate the significant roles they have played in my life. I am also most deeply appreciative of the editorial advice provided by Frank Rusch throughout the early development of both editions. My thanks also for the expert technical assistance provided by Darlene Buschow and Janet Burr.
Contents

I: AN OVERVIEW OF OUTCOME-BASED EVALUATION

1. An Overview of Outcome-Based Evaluation and Its Application
   Overview
   Definition
   Elements
   Methodological Pluralism
   Comparison with Other Types of Evaluation
   Formative Feedback
   Summary
   Study Questions
   Additional Readings

2. Program Evaluation
   Overview
   Use of Outcome Measures
   A Multiple Measurement Approach to Accountability
   Performance Assessment
   Consumer Appraisal
   Functional Assessment
   Personal Appraisal
   Evaluability
   Program Evaluation Factors
   Action Steps Involved in Using Desired Outcomes to Guide Organization Improvement
   Step 1: Establish Baseline
   Step 2: Determine Desired Outcomes
   Step 3: Align Services with Desired Outcomes
   The Utilization of Program Evaluation Data
   Understanding the Organization's Personality
   Being Aware of Key Success Factors
   Summary
   Study Questions
   Additional Readings

3. Effectiveness Evaluation
   Overview
   Effectiveness Evaluation Model and Analysis Steps
   Performance Goals (Anticipated Outcomes)
   Purpose and Comparison Condition
   Methodology
   Data Collection and Analysis
   Person and Organization-Referenced Outcomes
   Example 1: Effectiveness of a Demonstration Program
   Overview
   Step 1: Performance Goals (Anticipated Outcomes)
   Step 2: Purpose and Comparison Condition
   Step 3: Methodology
   Step 4: Data Collection and Analysis
   Step 5: Outcomes
   Critique
   Example 2: Effectiveness of Consumer-Generated Survey Data
   Overview
   Step 1: Performance Goals (Anticipated Outcomes)
   Step 2: Purpose and Comparison Condition
   Step 3: Methodology
   Step 4: Data Collection and Analysis
   Step 5: Outcomes
   Critique
   Example 3: Influence of Participant Characteristics and Program Components
   Overview
   Step 1: Performance Goals (Anticipated Outcomes)
   Step 2: Purpose and Comparison Condition
   Step 3: Methodology
   Step 4: Data Collection and Analysis
   Step 5: Outcomes
   Critique
   Summary
   Study Questions
   Additional Readings

4. Impact Evaluation
   Overview
   Outcomes versus Impacts
   Comparison Condition
   Impact Evaluation Designs
   Person as Own Comparison
   Pre/Post Change Comparison
   Longitudinal Status Comparison
   Hypothetical Comparison Group
   Matched Pairs (Cohorts)
   Experimental/Control
   Steps Involved in Impact Evaluation
   Study 1: The Impact of Different Training Environments
   Purpose/Questions Asked
   Comparison Condition
   Core Data Sets and Their Measurement
   Results
   Discussion of Results and Their Implications
   Study 2: The Impact of Transitional Employment Programs
   Purpose/Questions Asked
   Comparison Condition
   Core Data Sets and Their Measurement
   Results
   Discussion of Results and Their Implications
   Summary
   Study Questions
   Additional Readings

5. Policy Evaluation
   Overview
   An Overview of Benefit-Cost Analysis
   Policy Evaluation Model and Process Steps
   Model
   Data Sets
   Process Steps
   Example 1: Families and Disability
   Example 2: Welfare-to-Work Paths and Barriers
   Example 3: Implementation of the 1992 Vocational Rehabilitation Amendments
   Guidelines
   Summary
   Study Questions
   Additional Readings

II: OUTCOMES: THEIR SELECTION, MEASUREMENT, AND ANALYSIS

6. Selecting Outcomes
   Overview
   The Reform Movement
   Accountability Dimension
   Quality Dimension
   Selection Criteria
   Outcome Measures: Regular Education
   Outcome Measures: Special Education
   Outcome Measures: Health Care
   Outcome Measures: Mental Health
   Outcome Measures: Disabilities
   Outcome Measures: Aging
   Outcome Measures: Substance Abuse
   Outcome Measures: Corrections
   Generic Outcome Measures
   Summary
   Study Questions
   Additional Readings

7. Measuring Outcomes
   Overview
   Psychometric Measurement Standards
   Reliability
   Validity
   Standardization Group Norms
   Performance Assessment
   Effectiveness
   Efficiency
   Consumer Appraisal
   Satisfaction
   Fidelity to the Model
   Functional Assessment
   Adaptive Behavior
   Role Status
   Personal Appraisal
   The Concept of Quality of Life
   The Assessment of Quality of Life
   Summary
   Study Questions
   Additional Readings

8. Analyzing and Interpreting Outcomes
   Overview
   Input Variables: Recipient Characteristics
   Age and Gender
   Diagnosis or Verification
   Adaptive Behavior Level
   Role Status
   Throughput Variables: Core Service Functions and Cost Estimates
   Core Service Functions
   Cost Estimates
   Statistical Principles, Guidelines, and Analyses
   Statistical Principles
   Statistical Guidelines
   Statistical Analyses
   Interpreting External Influences on Outcomes
   Clinical Significance
   Threats to Internal Validity
   Organization Variables
   Attrition
   Summary
   Study Questions
   Additional Readings

9. Future Scenarios
   Overview
   Increased Variability of the Service Delivery System
   Balance between Performance Measurement and Value Assessment
   Evaluation Theory: Embracing the Postmodernist Paradigm
   Managing for Results
   Outsourcing of Evaluation
   Summary and Conclusion
   Study Questions
   Additional Readings

References

Author Index

Subject Index
I

An Overview of Outcome-Based Evaluation

Could Mother Teresa survive an outcomes-oriented world?
LISBETH SCHORR (1997, p. 135)
A lot has happened since the first edition of Outcome-Based Evaluation. The fields of education, health care, and human services have continued their focus on results-based accountability, outcomes research, and performance reporting. New terms have emerged such as "evidence-based medicine" and "evidence-based mental health." Policymakers and funders have focused increasingly on cost containment, service reforms, and practice guidelines. The evaluation field has moved increasingly towards methodological pluralism, participatory action research, and policy evaluation. Common to these changes is the basic fact that evaluators–and outcome-based evaluation–seek answers to questions such as:

What outcomes is my program producing in its service recipients?
How can my program meet increasing accountability demands?
Is my program meeting its goals and objectives?
Is my program efficient?
Is my program effective?
Does Program X produce better outcomes or results than Program Y?
Does this education, health care, or human service program work?
Does this policy work?
How can outcome information be used to improve programs or policies?

Numerous people are interested in outcome-based evaluation and its application. Throughout the text, I will sensitize the reader to the key players
in outcome-based evaluation (OBE) and make a distinction among promoters, stakeholders, and evaluators. Promoters include policymakers, funders, and consumers who are demanding results-based accountability, outcome reviews, and performance reporting. Stakeholders include governing/corporate boards, policy analysts, administrators, and consumers who are increasingly having to respond to cost containment, service reforms, and practice guidelines. Evaluators are those caught in the middle. They are the ones who are asked to answer efficiency and effectiveness questions. Although this book on outcome-based evaluation and its application is written for all three groups–promoters, stakeholders, and evaluators–the primary emphasis is on the evaluator who must answer the questions asked by promoters and stakeholders. My goal is twofold: first, to suggest an outcomes approach to evaluation that reflects the current dual-emphasis on accountability and program-policy improvement; and second, to sensitize key players in OBE to ask the right questions, to recognize the complexity of outcome-based evaluation and its application, and to appreciate the role that OBE plays in accountability and program-policy improvement. Part I of the text provides the reader with an overview of outcome-based evaluation and its application. Chapter 1 introduces you to the interrogatories and utility of OBE. The chapter discusses a number of interrogatories that are essential to understanding OBE: its definition, components, methodology, application, and comparison with other types of evaluation approaches. The chapter also discusses a number of reasons why an outcome-based approach to program evaluation is a good way to address the major trends currently impacting education, health care, and social programs: the quality revolution with its emphasis on quality of life outcomes, consumer empowerment, increasing accountability demands, and the emerging supports and pragmatic program evaluation paradigms that are challenging us to look differently at the way we think about and do program evaluation. Chapters 2–5 address each of the four types of outcome-based evaluation: program, effectiveness, impact, and policy. As a general overview: Program evaluation determines current and desired person and program-referenced outcomes and their use. Effectiveness evaluation determines the extent to which a program meets its stated goals and objectives. Impact evaluation determines whether a program made a difference compared to either no program or an alternative program. Policy evaluation determines policy outcomes in reference to their equity, efficiency, or effectiveness. Throughout Part I of the text, you may find yourself asking a number of questions concerning outcome-based evaluation and its application. Three of
the most common ones that I have encountered since the publication of the first edition involve Why should I use it; why do I have this gnawing feeling in the pit of my stomach about it; and how can I use OBE and still feel comfortable?
Why Should I Use OBE?

Peter Drucker (as found in Schorr, 1997, p. 115) is purported to have said, "What is the bottom line when there is no bottom line? If profits are not the measure of value, what is?" As the reader is aware, much of the current discussion about whether anything works is ideological. But that does not diminish the need for rigor in distinguishing between actual success and failure in achieving public and other purposes. As stated by Schorr,

Most legislators want to know what works when they vote on laws and appropriations; parents want to know how well their children are being educated; foundations want to know about the impact of their support; and the staff of social programs want to know how effective they are. . . . As a result, improving the ability to judge the success of agencies and programs in achieving agreed-upon outcomes is becoming a major reform strategy. (1997, p. 115)
But there are other reasons for key players to focus on outcome-based evaluation. The most important of these include:

Understanding the contributions of specific programs/services/interventions to the lives of persons.
Helping consumers, families, providers, policymakers, and funders make rational education, health care, and social service–related choices based on a clearer understanding of the effectiveness, impact, and benefit-cost of the services or interventions.
Improving education, health care, and social service programs based on the use of outcomes data.
Meeting the increasing needs for program/service accountability and responsiveness.
Increasing community support through the demonstration of valued outcomes and efficient services.
Why Do I Have a Gnawing Feeling in the Pit of My Stomach about OBE?

Although the use of outcome-based evaluation and its application can be most productive, there are also a number of fears that people have about OBE. Depending upon one's perspective, these fears might involve (1) the distortion
of programs to meet the expected results; (2) the responsibility for both progress and failure that cannot be accurately ascribed; (3) the true causes of person- and program-referenced outcomes often being outside the control of those held accountable; and (4) outcomes accountability becoming a screen behind which protections of the vulnerable are destroyed (Schorr, 1997). But the gnawing feeling may also be related to things that OBE will not tell you. For example, outcomes in isolation cannot improve education, health care, or social services. They need to be viewed as goals and objectives that provide guidance for program efficiency and effectiveness. Also, direct cause-effect relations are the exception rather than the rule in education, health care, and social services. Thus, one must be realistic about what to expect from OBE. In that regard, a major emphasis found throughout the text is putting OBE in its proper context and balancing its benefits against its costs and potential misunderstandings.
How Can I Use OBE and Still Feel Comfortable?

There are some things you can do in reference to OBE to minimize the perceived dangers and fears and to maximize the benefits. Throughout the text I present a number of guidelines to increase your comfort level. Among the most important (Schalock, 1995a; Schorr, 1997; Weiss, 1972):

work with program personnel to determine what needs to be in place for outcomes to occur (that is, stress process and outcomes);
choose outcomes that are easy to understand and persuasive to skeptics;
measure the outcomes reliably and with demonstrated validity;
match the type and complexity of the evaluation to the program's needs and resources;
build on a strong theoretical and conceptual base;
emphasize shared interests rather than adversarial relationships between evaluators and program personnel;
employ multiple methods and multiple perspectives;
offer both rigor and relevance;
distinguish between short-term, intermediate, and long-term outcomes;
realize that the most powerful tool you have is conceptual, not statistical.

Hopefully, my answers to these three questions have allayed any fears that you have about OBE as you proceed to Chapter 1. There you will read about the interrogatories and utility of outcome-based evaluation and its application.
1

An Overview of Outcome-Based Evaluation and Its Application

Overview
Definition
Elements
Methodological Pluralism
Comparison with Other Types of Evaluation
Formative Feedback
Summary
Study Questions
Additional Readings
If you don’t know where you are going, you will wind up somewhere else. YOGI BERRA
Overview

Welcome to the twenty-first century! In case you haven't noticed, things have continued to change significantly in the world of service delivery and program evaluation since the 1995 publication of the first edition of Outcome-Based Evaluation (Schalock, 1995a). The term that is used most frequently to reflect this change is paradigm, which refers to how we approach or think about something. Indeed, education, health care, and social service programs and the techniques we use to evaluate their efforts and outcomes are continuing to undergo significant changes as we continue to adapt to the "four Cs" of
today’s social-political environment: change, competition, consumer, and cost containment. This paradigm shift, occurring at the same time that we are seeing a clear need for increased accountability, competition among service providers, program improvement, and constant social-cultural change, has resulted in new ways of thinking about program evaluation and the techniques we use to evaluate the outcomes from education, health care, and social service programs. My major purpose in writing this book is to familiarize you with these new ways of thinking and to acquaint you with the rapidly emerging outcome-based approach to program evaluation and its application. My goal is to make you a more knowledgeable and effective evaluation consumer or producer. As a consumer of program evaluation, you need to understand what OBE is; as a producer of OBE, you need to know how to use its techniques so that you are communicative, accurate, and credible. Our odyssey begins with an overview of OBE and its application. By the end of the chapter, you will know what OBE is, and why it is emerging rapidly as an essential approach to program evaluation.
Definition

Outcome-based evaluation encompasses the central question of what education, health care, and social service programs ought to achieve for persons receiving them: valued, person-referenced outcomes. It also encompasses what outcome-based evaluation players (promoters, stakeholders, and program evaluators) are requesting of education, health care, and social service programs: organization-referenced outcomes that reflect the organization's effectiveness and efficiency. These two questions provide the basis for the definition of OBE:

A type of evaluation that uses person- and organization-referenced outcomes to determine current and desired person- and program-referenced outcomes and their use (program evaluation), the extent to which a program meets its goals and objectives (effectiveness evaluation), whether a program made a difference compared to either no program or an alternative program (impact evaluation), or the equity, efficiency or effectiveness of policy outcomes (policy evaluation).
This definition includes a number of terms that need to be understood clearly by both users and producers of outcome-based evaluation.

Evaluation: a process that leads to judgments and decisions about programs or policies.
Program: a set of operations, actions, or activities designed to produce certain desired outcomes. Throughout the text, three types of programs
will be considered: education, health care (including mental health) and social services (including disabilities, aging, substance abuse, and corrections).
Policy: a course or method of action selected from among alternatives to guide and determine present and future decisions.
Outcomes: personal or organizational changes or benefits that follow as a result or consequence of some activity, intervention, or service. Some outcomes relate to the organization and some to the person. Outcomes can be short, intermediate, or long term.
Analysis: the use of data collection, data storage and retrieval, and statistical manipulation of information resulting in trends, findings, and relationships regarding person-referenced or organization-referenced outcomes.
Effectiveness: the extent to which a program meets its stated goals and objectives.
Impact: whether the program made a difference compared to either no program or an alternate program.
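To make these terms concrete, the following minimal sketch contrasts what an effectiveness analysis, an impact analysis, and a simple benefit-cost ratio each compare. It is an illustration only: the outcome measure, numbers, and variable names are hypothetical and are not drawn from the text or from any program discussed in it.

# A minimal, hypothetical sketch (illustration only): how the analyses differ
# in what they compare. All numbers are invented.

program_goal = 0.75        # stated goal: 75% of participants employed at follow-up
program_outcome = 0.68     # observed employment rate for program participants
comparison_outcome = 0.52  # observed rate for a no-program comparison group

program_benefits = 410_000.0  # estimated dollar value of person-referenced outcomes
program_costs = 325_000.0     # estimated program costs

# Effectiveness: is the program meeting its stated goal?
effectiveness_gap = program_outcome - program_goal        # -0.07 -> goal not yet met

# Impact: did the program make a difference relative to no program?
impact_estimate = program_outcome - comparison_outcome    # +0.16

# Benefit-cost ratio (used in policy-oriented analyses): return on investment
benefit_cost_ratio = program_benefits / program_costs     # about 1.26

print(f"Effectiveness gap:  {effectiveness_gap:+.2f}")
print(f"Impact estimate:    {impact_estimate:+.2f}")
print(f"Benefit-cost ratio: {benefit_cost_ratio:.2f}")

A simple difference of this kind is only as credible as the comparison condition behind it, a point taken up in Chapter 4.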
Elements

Today's education, health care, and social service programs are confronted with two evaluation needs: to demonstrate increased accountability and continuous improvement. From a management perspective, these needs equate to managing for results, quality, and valued consumer-referenced outcomes. The five elements of outcome-based evaluation summarized in Figure 1.1 address these evaluation and management needs.

Outcome-based evaluation begins by asking questions. Although potentially multifaceted, five questions asked frequently by promoters, stakeholders, or evaluators relate directly to the four types of evaluation discussed in the text.

What outcome is my program producing in its service recipients (program evaluation)?
Is my program meeting its goals and objectives (effectiveness evaluation)?
Is my program better than others (impact evaluation)?
Does this policy work (policy evaluation)?
How can outcome information be used to improve programs or policies (evaluation utilization)?

Note that some of these questions relate to the focus of evaluation (organization or individual) and some to the standard of evaluation (performance
or value). Note also that some of the outcome measures required to answer the questions are organization outcomes (performance, value) and some are individual outcomes (performance, value). These types of questions and evaluation type, focus, standards, and outcome measures establish the framework for this text. As we will see in Chapter 7, for example, the measurement approaches used in OBE are the techniques used to assess the outcome measures selected. Evaluation of these measures is done within the context of results and interpretation, which stresses the importance of validity, clinical significance, attrition, and external variables. The last element of OBE is the utilization of the evaluation’s results, which necessitates being clear about the purpose of the evaluation and being aware of a number of factors that increase the successful application of outcome-based results. Table 1.1 relates each of these elements to specific text chapters.
Methodological Pluralism

Most evaluations of education, health care, and social services programs use a combination of performance measurement and value assessment. This use is consistent with a significant change that has occurred during the past decade in program evaluation: the use of both qualitative and quantitative research methods. A model that reflects this combination and change is shown in Figure 1.2.

As depicted in the center of Figure 1.2, the model's three components include standards, focus, and outcomes. The model's standards reflect two perspectives on accountability: performance versus value; its focus reflects an emphasis on the organization (agency/service) or the individual (client/customer/consumer); and its outcomes denote measurable results that are captured in a number of operationalized individual or organization-referenced performance or value indicators.
Specific examples of outcome measures for each matrix cell are presented in Chapter 6 (see especially Tables 6.2–6.10). As an initial overview of these outcomes, consider the following: Organization performance outcomes: service coordination, financial stability, health and safety, program data, and staff tenure/turnover. Organization value outcomes: access to services, consumer satisfaction, staff competencies, family/consumer supports, wrap-around services, and community support. Individual performance outcomes: health status (physical and mental), functional status, financial status, residential status, and educational status. Individual value outcomes: self-determination, social inclusion, social relationships/friendships, rights and dignity, and personal development. Extending from the model’s standard/focus/outcome components are the measurement approaches used in outcome-based evaluation: two performance measurements and two value assessments. These four OBE measurement approaches are: Performance assessment: the preferred evaluation method for measuring organizational performance outcomes. Specific methods include performance planning and reporting, performance indicators (such as critical performance indicators and report cards), and financial accountability measures (such as a financial audit). Consumer appraisal: the preferred evaluation method for measuring organizational value outcomes. Specific methods include customer satisfaction surveys and measures reflecting fidelity to the service delivery model. Functional assessment: the preferred evaluation method for measuring individual performance outcomes related to adaptive behavior and role status. Specific measures include rating scales, observation, objective behavioral measures, and status indicators (such as education, living, employment status). Personal appraisal: the preferred evaluation method for measuring individual value outcomes. Specific measures include quality of life evaluations obtained from personal interviews, surveys, or focus groups. The methodological pluralism model presented in Figure 1.2 is fundamental to outcome-based evaluation and its application for a number of reasons. First, it guides and clarifies the evaluation process. Despite the need for
accountability and continuous improvement, "the problem is that clear and logically consistent methods have not been readily available to help program managers make implicit understandings explicit" (McLaughlin & Jordan, 1999, p. 65). Second, all measurements and assessments are focused on agreed-upon outcomes related to the person or the organization. Thus, one uses a balanced approach that is responsive to the expressed needs of the key players in OBE. Third, methodological pluralism allows program evaluators to meet the following objectives of using mixed-method evaluations:

triangulation: the determination of correspondence of results across consumer and personal appraisal, and functional and accountability assessment (Cook, 1985);
complementarity: the use of qualitative and quantitative methods to measure the overlapping, but distinct, facets of the outcomes (Greene, Caracelli, & Graham, 1989);
initiation: the recasting of questions or results (i.e., if...then) from one strategy with questions or results from a contrasting strategy (Caracelli & Greene, 1993).
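As a concrete illustration of the standards-by-focus structure just described, the following sketch records a few of the example outcomes listed above and the preferred measurement approach for each cell. The dictionary layout and the plan_measurement helper are hypothetical conveniences for illustration, not part of the model presented in Figure 1.2.

# A hypothetical sketch (illustration only): the standards-by-focus matrix as a
# simple data structure, holding a few of the example outcome measures named above.
outcome_matrix = {
    ("organization", "performance"): ["service coordination", "financial stability", "staff tenure/turnover"],
    ("organization", "value"): ["access to services", "consumer satisfaction", "community support"],
    ("individual", "performance"): ["health status", "employment status", "educational status"],
    ("individual", "value"): ["self-determination", "social inclusion", "personal development"],
}

# Preferred measurement approach for each cell, as listed above.
measurement_approach = {
    ("organization", "performance"): "performance assessment",
    ("organization", "value"): "consumer appraisal",
    ("individual", "performance"): "functional assessment",
    ("individual", "value"): "personal appraisal",
}

def plan_measurement(focus: str, standard: str) -> str:
    """Return a one-line measurement plan for one cell of the matrix."""
    cell = (focus, standard)
    outcomes = ", ".join(outcome_matrix[cell])
    return f"{focus}/{standard}: assess {outcomes} via {measurement_approach[cell]}"

print(plan_measurement("individual", "value"))
print(plan_measurement("organization", "value"))

For example, plan_measurement("organization", "value") would point the evaluator toward consumer appraisal methods such as satisfaction surveys, reinforcing the balanced, mixed-method approach the model is meant to support.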
Comparison with Other Types of Evaluation

Evaluation theory and strategies have undergone tremendous changes over the past 30 years. At least four stages of evaluation theory can be identified (Shadish, Cook, & Leviton, 1991): (1) the 1960s wherein program evaluation theory stressed the assessment of program effectiveness at solving social problems; (2) the 1970s wherein program evaluation focused on how information is used in the design and modification of social programs; (3) the 1980s, wherein the major focus was to integrate work from the previous two stages; and (4) the late 1980s and 1990s, which are referred to as the postmodernist period of program evaluation. This postmodernist approach is characterized by minimizing the role of science-based, quantitative research methodology and maximizing a social constructivist, qualitative, and pluralistic approach.

Throughout these four stages, numerous evaluation strategies have evolved. As defined and used in this text, outcome-based evaluation is consistent with:

Formative and summative approaches of Chambers (1994), Posavac and Carey (1980), Rossi and Freeman (1993), and Scriven (1972). Formative is very similar to program and effectiveness evaluation; summative to impact and policy evaluation.
Performance and efficiency measurement of Suchman (1967) that is similar to program and effectiveness evaluation.
Responsive evaluation of Stake (1983) that is similar to effectiveness evaluation in that evaluators employ proper goals as evaluative criteria.
Performance-oriented evaluation of Wholey (1983) that is similar to performance and functional assessment.
Utilization-focused evaluation (Patton, 1997; Weiss, 1988), which corresponds to the text's formative feedback process.
Theory-driven program evaluation (Chen & Rossi, 1989; Finney & Moos, 1989) that is reflected in the outcome-based methodological pluralism evaluation model presented in Figure 1.2.
Social constructivism orientation of Fishman (1992), Guba and Lincoln (1989), and Denzin and Lincoln (1994) and their emphasis on the pragmatic evaluation paradigm, ideographic research, context-specific knowledge, decision-oriented evaluation, and use of methodological pluralism. (The reader will find a similar orientation presented throughout the text.)
Logic models (Conrad, et al., 1999; McLaughlin & Jordan, 1999) whose focus on accountability, managing for results and quality, and the use of evaluation results for program improvement are consistent with the model presented in Figure 1.2 and the concept of formative feedback.
Formative Feedback

Outcome-based evaluation involves description, interpretation, and value judgments. In their discussion of scientific realism, Bhaskar (1975) and House (1991) suggest that evaluation should not focus on events, but rather on the structures or the causal entities that produce the events, and then use this information to modify or change the structures or causal entities. This process is facilitated through the use of the Formative Feedback Model presented in Figure 1.3. Note the key components of this model: evaluation planning that includes strategic and performance plans, evaluation activities, feedback to the organization in the form of program performance reports, and program improvement activities. The importance of incorporating a formative feedback loop such as shown in Figure 1.3 is that it:

reflects the trends toward decentralization and the need that individual programs have to evaluate how well they are doing;
represents critical input to stakeholders and promoters who are operating within the context of increased accountability and asking for outcome-based information to evaluate the effectiveness and efficiency of education, health care, and social service programs;
constitutes a reasonable way for programs to respond to the broader context of increased accountability, effectiveness/efficiency evaluation, and potential programmatic change.
Summary

At the same time that we are responding to the evaluation challenges represented by the increased need for accountability and continuous improvement, we are also experiencing changes in how we approach program evaluation. Historically, the experimental paradigm has been used for hypothesis testing or theory building. According to some (e.g., Schorr, 1997), this traditional approach to evaluation is overreliant on a biomedical, experimental model as the basis of understanding social and human service programs. As such, it does not fit well with the current service delivery system that is characterized as being support-oriented, community-based, comprehensive, and individualized. The historical approach, which required experimental and control conditions, has recently been replaced with a pragmatic evaluation paradigm that emphasizes a practical, problem-solving orientation to program evaluation. As stated by Fishman:

In the pragmatic paradigm, a conceptually coherent program is designed to address a significant social or psychological problem within a naturalistic, real-world
setting in a manner that is feasible, effective, and efficient. Quantification is used to develop performance indicators of a system’s functioning. The system is monitored in terms of both baseline and changes due to identified interventions. (1991, p. 356)
The pragmatic approach reflects a postmodernist approach to evaluation (Chelimsky & Shadish, 1997). Key aspects of the postmodernist approach that the reader will find throughout this book include: A commitment to an epistemology of social constructionism that assumes that there are only alternative, subjective constructions of reality produced by different persons. Application of the pragmatic evaluation paradigm. Focus on ideographic (person-referenced) evaluation and participatory action research (PAR). Emphasis on decision-oriented knowledge. Use of methodological pluralism. Emphasis on context-specific results and knowledge. Thus, the approach to outcome-based evaluation presented in this text is consistent with the emerging pragmatic and postmodernist approaches to evaluation. As a trained scientist, I believe in the experimental method and will argue for research methods that involve the true experimental/control research design whenever possible. However, throughout this book, I will also present alternative evaluation designs that can be used if true experimental conditions cannot be established. But I should add a caution: when one uses these alternative designs, one will need to make more assumptions, will have less certainty, and will be less precise about what the results mean. The shift toward the postmodernistic, pragmatic approach to program evaluation affects the measurement and analysis techniques used. Equally important, it changes the role of the program evaluator to one who facilitates interpretative dialog among the program’s stakeholders, attains consensus among the stakeholders about the program’s values and outcomes, and incorporates into their roles the concepts of internal program evaluation and formative feedback. The quotation by Lisbeth Schorr (1993, p. 1), “Could Mother Teresa survive an outcomes-oriented world?” is worth thinking about. There is no doubt that the “worlds” of education, health care, and social services have changed significantly over the past decade. We now live in a different world: one characterized by both the “four Cs” (consumers, change, competition, and cost containment) and the “three As” (assessment, accountability, and action). These changes pose significant challenges to those of us in outcome-based evaluation. The subsequent chapters represent my best approach to meeting those
challenges within the confines of good evaluation theory and strategies, and important values.
Study Questions

1. What are the four types of outcome-based evaluation? Give examples.
2. Define outcome-based evaluation and compare it with other approaches to program evaluation discussed in the chapter.
3. Review the methodological pluralism model presented in Figure 1.2. Generate a list of specific outcomes for each of the matrix cells for an education, health care, and social service program with which you are familiar.
4. What is the difference between performance measurement and value assessment?
5. Assume the role of an evaluation producer. Outline the critical steps that you will need to perform to complete an effectiveness evaluation of an education program. Remember that effectiveness evaluation is the extent to which a program meets its stated goals and objectives. Use Figure 1.1 as a guide.
6. What is methodological pluralism? Why is it important in outcome-based evaluation?
7. What is formative feedback? Give an example of how the components expressed in Figure 1.3 can be used in a program evaluation.
8. Give examples of triangulation, complementarity, and initiation. How are these three techniques related to mixed-method evaluations?
9. Find examples in the literature that reflect the two evaluation needs felt today–to demonstrate increased accountability and continuous improvement.
10. What are your thoughts about "can the world survive an outcomes-oriented approach to program evaluation?"
Additional Readings

Baker, E. L., O'Neill, H. F., & Linn, R. L. (1993). Policy and validity prospects for performance-based assessment. American Psychologist, 48(2), 1210–1218.
Bickman, L. (1996). The application of program theory to the evaluation of a managed mental health care system. Evaluation and Program Planning, 19(2), 111–119.
Capra, F. (1996). The web of life: A new scientific understanding of living systems. New York: Doubleday.
Fishman, D. B. (1992). An introduction to the experimental versus the pragmatic paradigm in evaluation. Evaluation and Program Planning, 14, 353–363.
Meyer, L. H., & Evans, I. M. (1993). Science and practice in behavioral intervention: Meaningful outcomes, research validity, and usable knowledge. Journal of the Association for Persons with Severe Handicaps, 18(4), 224–234.
Reichardt, C. S., & Rallis, S. F. (Eds.). (1994). The qualitative-quantitative debate. San Francisco: Jossey-Bass.
Schorr, L. B. (1997). Common purpose: Strengthening families and neighborhoods to rebuild America. New York: Anchor Books, Doubleday.
Sederer, L. I., & Dickey, B. (Eds.). (1996). Outcomes assessment in clinical practice. Baltimore: Williams & Wilkins.
2

Program Evaluation

Overview
Use of Outcome Measures
A Multiple Measurement Approach to Accountability
Performance Assessment
Consumer Appraisal
Functional Assessment
Personal Appraisal
Evaluability
Program Evaluation Factors
Action Steps Involved in Using Desired Outcomes to Guide Organization Improvement
Step 1: Establish Baseline
Step 2: Determine Desired Outcomes
Step 3: Align Services with Desired Outcomes
The Utilization of Program Evaluation Data
Understanding the Organization's Personality
Being Aware of Key Success Factors
Summary
Study Questions
Additional Readings
A mind stretched to a new idea never goes back to its original dimensions. OLIVER WENDELL HOLMES
Overview

Our discussion in Chapter 1 indicated that current education, health care, and social service programs are confronted with two needs: to demonstrate
increased accountability and continuous improvement. From a management perspective, these needs equate to managing for results, quality, and valued consumer-referenced outcomes. This chapter expands on these evaluation and management needs from five perspectives. First, the concept of outcome measurement will be proposed as central in efforts to gauge the effectiveness and efficiency of education, health care, and social services. Second, the discussion of accountability will be expanded to include the four measurement approaches used in outcome-based evaluation: performance assessment, consumer appraisal, functional assessment, and personal appraisal. Third, I will suggest that before a program can be held “accountable,” it needs to be “evaluable” (as reflected in a proposed evaluability assessment). Fourth, I will discuss three action steps involved in using desired outcomes for continuous improvement. And fifth, I will present a number of utilization guidelines that can be used across the four types of OBE evaluations, addressing the “utilization” element of outcome-based evaluation (see Figure 1.1). The approach to program evaluation presented in this chapter is based on a variant of the Methodological Pluralism Model presented in Figure 1.2. What is shown in Figure 2.1 are suggested outcomes for each of the matrix cells of Figure 1.2: Organization performance outcomes: service coordination, financial stability, health and safety, program data, and staff turnover/tenure Organization value outcomes: access to services, consumer satisfaction, staff competencies, family/consumer supports, and community support Individual performance outcomes: physical well-being (health status and wellness indicators) and material well-being (employment status, living status, education status) Individual value outcomes: emotional well-being, personal development, self-determination, interpersonal relations, social inclusion, and rights The Program Evaluation Model shown in Figure 2.1 was developed based on my integration of the literature on outcomes research and outcome-based evaluation. This integration resulted in the development of a list of 40 potential outcomes that were based on the work of Ashbaugh et al. (1997); Bruininks et al. (1986), Gardner (1999), Gettings (1998), Gettings and Bradley (1997), Kaplan (1992), Morreau and Bruininks (1991), Ridenour (1996), Schalock (1995a, 1999), and Trabin, Freeman, and Pallak, (1995). The 40 potential outcomes were aggregated into each cell of the matrix (10 indicators per cell), based on the standards and foci shown in Figure 2.1. For example, organization-referenced outcomes that dealt with effectiveness and efficiency issues (and to which costs could logically be assigned) were placed in the organization-performance cell; similarly, outcomes that dealt with value issues (such as
access, staff competence, and customer satisfaction) were placed in the organization-value cell. Analogously, outcomes that were person-referenced (such as health status, employment status, or educational status, and to which costs could logically be assigned) were placed in the individual-performance cell; and outcomes that related to the person’s preferences, desirable qualities, or important values were placed in the individual-value cell. Through a series of focus groups (using Q-sort techniques) and literature-based searchers, 20 potential outcomes were identified for each matrix cell of Figure 2.1. These 80 potential outcomes form the basis of the Outcomes Planning Inventory shown later as Exhibit 2-2.
Use of Outcome Measures The concept of outcome measurement is central in efforts to gauge the quality and effectiveness of education, health care, and social services. The major purpose of outcomes measurement in the public sector revolves around the concept of enhanced accountability. For example, in an era in which revenues are growing more slowly than the demand for services, governments are forced to make tough decisions about priorities. A greater consciousness of tax bur-
and policy has resulted in a desire to not only prioritize services based on need and demand, but also to measure outcomes to ensure that the resources put into services are used to the best advantage. Despite these advantages, a basic concern about using outcomes within the context of the strong current emphasis on accountability is to be certain that any proposed outcomes measurement system balances measures of costs, adequacy of services, and benefits. This concern points out the importance of understanding the different perspectives on accountability such as those discussed next.
A Multiple Measurement Approach to Accountability

As discussed in Chapter 1, two significant changes have occurred during the past decade in program evaluation: the use of both qualitative and quantitative research methods (that is, methodological pluralism), and the use of different measurement approaches. At the individual level, methodological pluralism includes the use of personal appraisal and functional assessment; at the organizational level, it includes performance assessment and consumer appraisal. Each of these four strategies is discussed below briefly to show their relevance to program evaluation and the multiple perspectives on "accountability." They are discussed in considerably more detail in Chapter 7.
Performance Assessment

The concept of performance assessment is central in efforts to gauge the relative effectiveness and efficiency of education, health care, and social services. Although specific approaches to performance assessment (including report cards, performance planning and assessment, benchmarks, and cost-effectiveness) will be presented in Chapter 7, it is important at this point to stress that the major purpose of performance assessment in the public sector revolves around the concept of enhanced accountability. For example, in an era in which revenues are growing more slowly than the demand for services, governments and for-profit organizations are forced to make tough decisions about priorities. A greater consciousness of tax burdens and corporate profits has resulted in policies to prioritize services based on need and demand and to ensure that the resources put into services are used to the best advantage. Citizens, boards of directors, and shareholders demand greater accountability for the resources they commit and the profits their organizations generate. In this process, they insist on objective data to prove or disprove the need for, worth, and outcome of education, health care, and social service programs.

There are a number of advantages to performance assessment (Drucker,
1994; Kane, Bartlett, & Potthoff, 1995; Price Waterhouse, 1993). First, it enhances public accountability. While publicizing performance evaluation can be threatening to some, the public use of measurement data is one of the strongest benefits of a good measurement system. Reporting organization outcomes can stimulate a much greater interest on the part of the public and can result in an even greater emphasis on quality. Second, performance assessment improves internal accountability. Measuring organization effectiveness and efficiency gives program administrators a significant tool to achieve accountability, since they are accountable to upper-level managers, elected officials, or boards of directors for their performance or that of their staff. This relationship becomes much clearer when organization outcomes are measured by commonly accepted standards. Third, performance assessment focuses on long-term goals and strategic objectives, since an important aspect of program evaluation involves comparing actual performance against expectations. Without the ability to measure performance and progress, the process of developing strategic plans and goals is less meaningful. While there is some benefit to thinking and planning strategically, the evaluation of such plans and goals cannot be objective without measuring performance and achievement. Fourth, performance assessment provides information to stakeholders. Performance measures are the most effective method for communicating to legislatures, boards, and citizens about the success of programs and services. Fifth, performance assessment also enhances decision making. Since hierarchical structures and extensive oversight requirements can obstruct organization effectiveness, effectiveness and efficiency measures free senior executives for more strategic planning and management, while clarifying the responsibilities and authority of managers. And finally, performance assessment allows entities to determine effective resource use. With increasing public concern over levels of taxation and user fees and boards of directors concerned about the “bottom line,” funding bodies are under pressure to justify the existence of key programs. Here is where the accountant meets the program evaluator. For example, a current trend in program and policy evaluation is to determine if government is, in fact, the best provider of some services. Contracting services, privatizing, and abandoning some services are clearly directions for the public sector in the future. The ability to decide if government is the best provider of a given service, or if that service is really making a difference in the lives of its citizens, is dependent on a good performance measurement system. Without such data, public policymakers cannot make decisions on solid qualitative and quantitative bases. Despite these advantages—enhancing public accountability, improving internal accountability, focusing on long-term goals and strategic objectives, providing performance information to stakeholders, enhancing decision making, and allowing entities to determine effective resource use—a basic concern
about performance assessment and the current strong emphasis on accountability is the tendency to treat it as the only measurement approach to accountability. Nothing could be further from the truth, since "accountability" is defined differently by different program constituents.
Consumer Appraisal

Current education, health care, and social service programs are being impacted significantly by two phenomena: (1) the movement toward assessing the value, quality, and "accountability" of respective programs on the basis of customer satisfaction; and (2) the development of new models of service delivery that reflect the devolution of government, the homogenization of services, and the community-based movement in mental health, disabilities, aging, substance abuse, and corrections. These two phenomena challenge program evaluators to adequately assess the program outcomes that define, for the consumer, "an accountable program." The two consumer appraisal techniques considered in the text (see Chapter 7 for more details) are satisfaction and fidelity to the model of service delivery being employed.

Satisfaction
Measuring customer satisfaction with organization-referenced outcomes requires a range of methods, both because the focus of services or interventions differs and because individuals define quality differently. Typically, rating or attitude scales are used that permit respondents to indicate how satisfied they are with particular aspects of the services provided by the program or agency. The advantages and disadvantages of using satisfaction as one's major outcome measure will be discussed in Chapter 7.

Fidelity to the Model
With the significant shift to case management and brokered services, community-based programs, and the supports model, a critical component of consumer appraisal is the evaluation of how well the respective model is meeting its objectives and providing the value promised by its advocates. Chapter 7 will outline and discuss in detail the following four critical steps in evaluating the fidelity of the service delivery system to its espoused model (Bryant & Bickman, 1996): (1) an explicit description of the model; (2) the use of theory to guide the evaluation; (3) the use of methodological pluralism; and (4) a statement of the criteria for assessing the quality of the outcomes. The major advantage of using fidelity to the model in organization value assessment is that it potentially reflects the outcomes of system change and the significant
paradigm shift that has occurred over the past decade in education, health care, and social services. The assumption has been that the shifts are effective and efficient. The role of program evaluation is to determine the value of the paradigm shift and to balance the fidelity to the model being implemented with customer satisfaction with the services provided by the (model) program. The downside is that the service delivery system is in a constant state of flux and change, and therefore the model may change before an adequate evaluation of it can be accomplished.
Functional Assessment

This measurement approach addresses the objective nature of one's life and life experiences and is generally organized around the concepts of adaptive behavior and role status. Adaptive behavior refers to those behaviors that are required for successful adaptation to or functioning in different major life activity areas such as home and community living, school or work, and health and wellness. The more important of these include self-care, receptive and expressive language, learning, mobility, self-direction, and capacity for independent living and economic self-sufficiency. Role status refers to a set of valued activities that are considered normative for a specific age group. Examples include one's living arrangement, employment status, education level, community participation, recreation-leisure patterns, and health status. For youth, attending school is a valued, age-specific activity; whereas for high school graduates and adults, living and working in the community are valued activities. The most typical formats used in functional assessments include rating scales, participant observation, and questionnaires. Each attempts to document a person's functioning across one or more adaptive behavior life activity areas. To accomplish this, most instruments employ some form of an ordinal rating scale to yield a profile of the individual's functioning (a simple computational sketch appears at the end of this section). For example, one can ask (or observe), "How frequently do you use health care facilities?" "How frequently do you visit the doctor?" "How many days out of the last month have you been sick in bed?" or "How many civic or community clubs do you belong to?" There are a number of advantages to using functional assessments to evaluate one's life experiences. First, objective measures can confirm the results from a personal appraisal strategy. Second, adding objective measures to personal appraisal overcomes the commonly reported low correlation between
subjective and objective measures of life experiences. Third, their use allows for the evaluation of outcomes across groups. Fourth, objective measures provide important feedback to service providers, funders, and regulators as to how they can change or improve their services to enhance the recipient's functioning level. However, there are also some disadvantages to functional assessment. First, functional assessment must be balanced with other considerations. For example, it is clear that not all outcomes related to interventions or services can be measured. Second, functional assessments can have more cost than benefit. One needs to be cautious that the functional assessment system does not consume more resources than its information is worth. Third, the usefulness of functional assessments varies by their use, since they are only useful to management or the decision-making process to the extent that they are used and that they answer the right questions. Fourth, organizations are sometimes limited in their ability to influence outcomes; therefore, users of functional assessment data need to understand the role that many factors play in person-referenced outcomes and not focus exclusively on the service provider.
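As a concrete illustration of the ordinal-rating approach described in this section, the sketch below averages hypothetical item ratings within two life activity areas to produce a simple functional profile. The items, areas, and scale anchors are my own examples, not those of any published instrument.

    def functional_profile(item_ratings):
        """Average ordinal ratings (e.g., 1 = never ... 4 = frequently) within
        each adaptive behavior or life activity area."""
        return {area: sum(ratings) / len(ratings)
                for area, ratings in item_ratings.items()}

    print(functional_profile({
        "health and wellness": [3, 2, 4],  # e.g., doctor visits, days sick in bed, checkups
        "community living": [4, 3],        # e.g., club memberships, community outings
    }))
    # {'health and wellness': 3.0, 'community living': 3.5}

In practice the area averages would be interpreted against the instrument's own norms; the point here is simply that an ordinal scale yields a profile rather than a single score.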
Personal Appraisal

The personal appraisal component of methodological pluralism addresses the subjective nature of life experiences and is typically approached today within the context of quality of life. We are currently experiencing a quality revolution that focuses on quality of life, quality enhancement techniques, and quality assurance (Schalock, 1999). This revolution, evident in both industry and human services, stresses that quality is integral to both the processes we use in service delivery and the accountability that we ascribe to those services. One major aspect of the quality revolution affecting program evaluation is the increasing tendency to assess the core dimensions of person-centered and health-related quality of life. Person-centered core quality of life dimensions include emotional well-being, interpersonal relationships, material well-being, personal development, physical well-being, self-determination, social inclusion, and rights (Schalock, 1996, 1999). Health-related core quality of life dimensions include general satisfaction and feelings of well-being, physiological state/symptoms of illness, neurological functioning, interpersonal relationships, performance of social skills, and economic and employment status (Faden & Leplege, 1992; Lindstrom, 1992, 1994).
Personal appraisal is most commonly operationalized as a person's stated level of satisfaction. Its advantages include the fact that: (1) satisfaction is a commonly used aggregate measure of individual life domains and demonstrates a traitlike stability over time (Edgerton, 1996); (2) there is an extensive body of research on levels of satisfaction across populations and clinical conditions (Cummins, 1998); and (3) satisfaction as a dependent variable allows one to assess the relative importance of individual and environmental factors to one's assessed level of satisfaction (Schalock, 2000). Its major disadvantages are that satisfaction provides only a global measure of perceived well-being, is sensitive to response perseveration, and its subjective measures are poorly correlated with objective measures.
Evaluability

The concept of evaluability assessment is not new in program evaluation (Wholey, 1987). Many evaluators, myself included, have worked with numerous agencies and programs over the years and have found that some are more able to be evaluated than others. For example, some agencies or programs are more process- than outcome-oriented, some are committed to evaluation and program enhancement, some have a clearer vision than others, some are proactive versus reactive, some see the relevance of data and evaluation, and some have had a positive experience with evaluation while others have not. Given these characteristics, I have often wondered about the differences among those programs and agencies that are receptive to program evaluation and those that have the capability to engage in evaluation and application activities, as opposed to those that are neither. Over the years, I have come to the conclusion that three factors are the basis for the difference: the history and culture of the organization, the presence or absence of evaluation catalysts, and whether the program or agency has the ingredients in place to be evaluated. These three factors compose the Program Evaluation Index referenced in Figure 2.2.
Program Evaluation Factors

History and Culture
It has been my experience that some of the most important contextual variables regarding evaluation are the organization's history and culture, its focus on process versus outcomes, and its commitment to data and its utilization.

Evaluation Catalysts
It is a safe assumption that most education, health care, and social service organizations find evaluation to be a chore, fraught with risks and of little or no potential payoff. My experience has also shown that frequently one needs a catalyst to embark on evaluation, such as accreditation, which has forced schools to evaluate student outcomes, and health maintenance organizations, which have built outcome evaluation into reimbursement schedules. Some of these catalysts are internal (such as stakeholders) and some are external (such as promoters who are demanding increased accountability).

Evaluation Ingredients
Program evaluation requires more than a commitment to outcomes evaluation and a catalyst for doing so; it also requires a number of critical ingredients for it to be successful. Chief among these are data sets, a data management system, and expertise (time, money, and skills). To emphasize the importance these three key factors play in program evaluation, I have developed a simple way to determine a program's "evaluability" and a resulting Program Evaluation Index. The questions and scoring for the index are found in Exhibit 2-1.
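Because the scoring is nothing more than summing six ratings and reading the total against the bands shown in Exhibit 2-1, it can be automated in a few lines. The following Python sketch is offered only as an illustration; the function name and the sample ratings are mine, not part of the exhibit.

    RATING_LABELS = {3: "definitely in place", 2: "somewhat in place", 1: "not in place"}

    def program_evaluation_index(ratings):
        """Sum six factor ratings (each 1, 2, or 3) and interpret the total
        against the bands given in Exhibit 2-1."""
        if len(ratings) != 6 or any(r not in RATING_LABELS for r in ratings):
            raise ValueError("Exactly six ratings of 1, 2, or 3 are expected.")
        index = sum(ratings)
        if index <= 9:
            advice = "Don't undertake a large-scale evaluation; work on prerequisites and pilot studies."
        elif index <= 13:
            advice = "Evaluate with caution; stress 'current status' considerations."
        else:
            advice = "The program should be evaluable."
        return index, advice

    # Hypothetical ratings for the six factors in Exhibit 2-1:
    print(program_evaluation_index([3, 2, 2, 3, 1, 2]))  # (13, "Evaluate with caution; ...")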
Action Steps Involved in Using Desired Outcomes to Guide Organization Improvement

A number of assumptions underlie the use of desired outcomes to guide change and results-based accountability. First, new models of quality management developed from process engineering and the use of social science measurement techniques make objective, data-based approaches to conversion possible (Friedman, 1995). These approaches include report cards, benchmarking, quality indicators, practice guidelines, and monitoring. Second, desired outcomes can be considered as benchmarks that allow an organization to compare its performance to either the best practices among similar
Exhibit 2-1. Evaluability Assessment

Directions: Use the following rating scale to evaluate whether or not each factor involved in a program's evaluation capability is in place:
3 = definitely in place or has been experienced
2 = somewhat in place or experienced to some degree
1 = not in place, or has either not been experienced or the program's experience has been negative

Factor (circle 3, 2, or 1 for each)
1. A history of experience with program evaluation. (3 2 1)
2. A culture that encourages empiricism, evaluation, looking at outcomes, or is committed to data and data utilization. (3 2 1)
3. Promoters (internal or external) who are stressing the need for evaluation. (3 2 1)
4. Stakeholders who are stressing the need for evaluation. (3 2 1)
5. Data sets and a data management system composed of person-referenced and program-referenced outcomes. (3 2 1)
6. Expertise within the organization, defined as time, money, and personnel who are competent in outcome-based evaluation. (3 2 1)

Program Evaluation Index: ____ (sum of the six items)
6–9: Don't undertake a large-scale evaluation. Best to work on prerequisites and do smaller pilot studies.
10–13: Evaluate with caution. Be cautious in explanations. Stress "current status" considerations.
14–18: The program should be "evaluable."

Intended Use of Index (check):
____ diagnosis of the program's evaluation capability
____ basis for strategic planning
____ basis for staff training
____ basis for program enhancement or improvement
organizations or against its desired outcomes. Third, conversion planning and the targeting of resources involve a fundamental need to use multiple outcomes. Three action steps are involved in implementing continuous improvement. First, a baseline of currently monitored outcomes is established. This is accomplished by completing the Outcomes Planning Inventory presented as Exhibit 2-2 on the basis of the outcomes on which the organization currently focuses. Second, desired outcomes are identified by organization personnel completing the inventory on the basis of desired outcomes that are either person- or organization-referenced, realizing that most organizations will strive for a reasonable balance between outcomes related to performance and value. The resulting summary indicates both the desired outcomes to guide the organization and the discrepancy between where the organization is and where it wants to go. The third step involves aligning services with the desired outcomes.
Step 1: Establish Baseline

To obtain baseline data (that is, the outcomes currently being used), organization personnel complete the Outcomes Planning Inventory (Exhibit 2-2) using the baseline instructions: For each row, circle the outcome that BEST describes your current outcome measures or evaluation focus. (The reader should note that the outcomes presented in Exhibit 2-2 represent the four most commonly used exemplars for each outcome measure listed in Figure 2.1.) Although this inventory uses a forced-choice strategy to determine one outcome per row, some practitioners may wish to use other strategies (such as the Delphi method or a Likert scale). After completing the inventory (either individually or collectively), the number of circled items in each column is summed (or averaged if there are multiple respondents), resulting in a total score for each cell of the Program Evaluation Model (Figure 2.1). The number of As reflects organization-performance outcomes; Bs, organization-value outcomes; Cs, individual-performance outcomes; and Ds, individual-value outcomes. An exemplary result of this baseline outcomes assessment is shown in Figure 2.3a, which indicates that this particular organization is currently emphasizing organization-performance outcomes, with significantly less emphasis on the other three matrix cells.
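To make the tallying step concrete, the following sketch counts one forced choice per row into the four cells of the Program Evaluation Model. The letter codes follow the A–D convention above, but the ten hypothetical responses are mine (the actual items appear in Exhibit 2-2); averaging across multiple respondents would simply apply the same tally to each respondent and then average the cell totals.

    from collections import Counter

    CELLS = {"A": "organization performance", "B": "organization value",
             "C": "individual performance", "D": "individual value"}

    def inventory_profile(row_choices):
        """Tally one circled choice (A, B, C, or D) per inventory row into the
        four cells of the Program Evaluation Model."""
        counts = Counter(row_choices)
        return {letter: counts.get(letter, 0) for letter in CELLS}

    # Hypothetical baseline responses for a ten-row inventory:
    baseline = inventory_profile(["A", "A", "B", "A", "C", "A", "A", "D", "A", "B"])
    print(baseline)  # {'A': 6, 'B': 2, 'C': 1, 'D': 1}

The same profile, computed a second time under the Step 2 instructions, yields the desired-outcomes profile against which the baseline can be compared.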
Step 2: Determine Desired Outcomes

After determining the outcomes on which the organization is currently focusing, the second action step involves determining where the organization
[Figure 2.3. Exemplary outcomes profiles: (a) exemplary baseline outcomes; (b) exemplary desired outcomes.]
wants to go, that is, determining on which outcomes to base continuous improvement activities and to target resources. This step involves the organizational staff completing the Outcomes Planning Inventory using Step 2 instructions: For each row, circle the outcome that BEST describes the outcomes your organization wishes to measure or evaluate. As shown in Figure 2.3b, the organization feels that a better balance among the four outcome categories is desired. This will require focusing more on organization and individual-value outcomes than was reflected in the organization's baseline (Figure 2.3a). Setting conversion goals and targeting resources depend largely upon the organization's strategic plan and its "personality." Based on Figure 2.1, one can identify four organization personalities that are summarized in Figure 2.4.

Stability organization: focus is primarily on organization performance outcome categories related to service coordination, financial solvency, health and safety, data systems, and/or staff turnover (Cell A in Figure 2.1 and Column A in Exhibit 2-2).

Outreach organization: focus is primarily on outcome measures related to organization values, including access issues, consumer satisfaction, staff competencies, consumer supports, or community supports (Cell B in Figure 2.1; Column B in Exhibit 2-2).
Rehabilitation organization: focus is primarily on outcome categories related to the person's health status, functional status, financial status, residential status, or educational status (Cell C in Figure 2.1; Column C in Exhibit 2-2).

Advocacy organization: focus is primarily on outcome categories related to self-determination, social inclusion, social relations, rights and dignity, and personal development (Cell D in Figure 2.1; Column D in Exhibit 2-2).

A detailed listing of potential outcomes and their selection criteria will be discussed and presented in Chapter 6 (Tables 6.2-6.10). For the time being, a number of guidelines should be used to select these specific outcome measures. Five of the most important guidelines are summarized in Table 2.1.
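If one wanted to label an organization's dominant "personality" directly from its inventory cell totals, a simple rule of thumb is to take the cell with the highest count. The sketch below is my own illustration of that rule, not a scoring procedure prescribed in the text.

    PERSONALITIES = {
        "A": "Stability organization (organization-performance outcomes)",
        "B": "Outreach organization (organization-value outcomes)",
        "C": "Rehabilitation organization (individual-performance outcomes)",
        "D": "Advocacy organization (individual-value outcomes)",
    }

    def dominant_personality(cell_totals):
        """Return the personality label for the cell with the highest total."""
        dominant = max(cell_totals, key=cell_totals.get)
        return PERSONALITIES[dominant]

    print(dominant_personality({"A": 6, "B": 2, "C": 1, "D": 1}))
    # Stability organization (organization-performance outcomes)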
Step 3: Align Services with Desired Outcomes

Once the desired outcomes are selected and measured (see Chapters 6 and 7), the question is asked, What needs to be in place for the organization to effectively use the desired outcomes to guide its continuous improvement and thereby increase its accountability? Three suggestions include: (1) foster a culture of change; (2) develop and evaluate strategic plans and performance goals; and (3) implement an outcomes-oriented monitoring system.

Foster a Culture of Change
There is considerable literature about organization change, the concept of a culture of change, how organizational culture influences outcomes information utilization, and the linking of outcome accountability to change and continuous improvement (Colarelli, 1998; Hodges & Hernandez, 1999; Schalock, 1999). Although a detailed discussion of organization change is beyond the scope of this chapter, key factors operating in change-oriented organizations include the organization’s mission and its view of itself, the communication style prevailing within the organization, previous experiences with using outcome data, and the organization’s relationship with regulatory bodies (Green & Newman, 1999). The concept of total quality management (TQM) is basic to fostering a culture of change. As stated by Hodges and Hernandez: The application of TQM principles requires the systematic analysis of quantitative data with the involvement and commitment of people throughout an organization in order to concentrate organizational efforts on constantly improving quality of the goods and services it offers. The philosophy of TQM encompasses the idea of making simultaneous improvements to both quality and cost effectiveness. (1999, p. 184)
TQM involves the following six management principles that facilitate the use of desired outcomes to guide continuous improvement and to enhance an organization's accountability (Albin-Dean & Mank, 1997; Drucker, 1998; Hodges & Hernandez, 1999; Hoffman et al., 1999):

• Strong quality leadership: adoption of quality outcomes as a part of the corporate philosophy and a system to deploy this philosophy throughout the organization.
• Consumer orientation: organizations must be responsive to consumer needs.
• Continuous improvement: an emphasis on incremental change on an ongoing basis.
• Data-driven decision making: an emphasis on structured problem solving based on the analysis of data.
• Teamwork: employees throughout an organization work together in the process of quality improvement.
• Focus on organization process: an emphasis on organizational processes and systems that affect the organization's clientele and services.

Develop and Evaluate Strategic Plans and Performance Goals
Current accountability initiatives seek to improve management, increase efficiency and effectiveness, and improve public confidence in government. For example, reform acts throughout the world provide a legislative base for many of the most important reform efforts, asking agencies to articulate goals in their strategic plans and to report results via program performance reports. Through strategic planning, an organization develops its mission statement covering the agency's major functions and operations; establishes and periodically updates long-term goals and objectives, including outcome-related goals and objectives; describes how those goals and objectives are to be achieved; describes how annual program performance goals will be related to the agency's long-term goals and objectives; and identifies key external factors (that is, contextual variables) that can significantly affect the achievement of the long-term goals and objectives. Through annual performance planning, the organization prepares annual performance plans that define performance goals for each fiscal year. These plans should include targeted levels of outputs and outcomes to be achieved by key agency programs, the resources and activities required to meet these performance goals, and the establishment of performance indicators to assess relevant program outcomes and compare actual program results with performance goals. Through annual program performance reports, organizations report actual program results compared with the performance goals for that fiscal year, report actual program results for prior fiscal years, and explain why any performance goals were not met and what action is recommended.

An Outcomes-Oriented Monitoring System
Many of the agencies with which I am familiar are data rich and information poor, due primarily to the organization not knowing for sure what data to collect, what to measure, how to measure it, or how to organize and retrieve relevant outcomes-oriented data. Part of the implementation of an outcomes-oriented monitoring system has already been discussed in reference to Action Step 2 (determine desired outcomes). The next step is to implement a monitoring system that can store, analyze, and report the status over time of the
desired outcome measures selected. Key components of such a system include the use of:

• Data sets that meet the following criteria: person- or organization-referenced, complete (available for all program participants or relevant program components), timely (current and cover the period you are interested in), affordable (in terms of time, money, and expertise), and accurate (reflect actual events and characteristics).
• Data collection formats that can lead directly (and electronically) to data management and entry, and data analysis.
• A data collection responsibility center that is trained in data collection strategies, data management principles, data analysis, and data reporting procedures.
• Data collection time lines that are consistent with the organization's strategic plan, required reporting period, and annual performance review cycle.
• Standardized report formats that will allow annual outcome reports (such as report cards) and longitudinal comparisons (such as benchmarks).
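As one illustration of how such a monitoring system might screen incoming records against the completeness, timeliness, and accuracy criteria just listed, consider the following sketch; the field names and the plausibility range are hypothetical, and a real system would of course add verification against source documents.

    from datetime import date

    REQUIRED_FIELDS = ("participant_id", "outcome", "value", "collected_on")

    def screen_record(record, period_start, period_end, value_range=(0, 100)):
        """Return a list of data-quality problems for one outcome record."""
        problems = []
        if any(record.get(field) is None for field in REQUIRED_FIELDS):
            problems.append("incomplete: missing required field(s)")
        collected = record.get("collected_on")
        if collected is not None and not (period_start <= collected <= period_end):
            problems.append("not timely: outside the reporting period")
        value = record.get("value")
        low, high = value_range
        if value is not None and not (low <= value <= high):
            problems.append("suspect accuracy: value outside plausible range")
        return problems

    record = {"participant_id": 17, "outcome": "days worked in past month",
              "value": 22, "collected_on": date(2001, 3, 15)}
    print(screen_record(record, date(2001, 1, 1), date(2001, 12, 31)))  # -> []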
The Utilization of Program Evaluation Data

The evaluation utilization literature is extensive. Slow but steady progress has been made in our understanding of the use of program evaluation data to increase a program's accountability and as a basis for continuous improvement. Numerous evaluation models have been developed to improve evaluation utilization, including those of Johnson (1998) and Patton (1997). Common themes among these models include:

• the importance of stakeholder involvement in planning and implementation;
• the use of evaluation information as a marketplace of ideas and information;
• the use of evaluation information for programmatic change and improvement;
• the key roles that both the program's internal and external environments play in the utilization of evaluation results;
• the need to impact decision makers' understanding of and commitment to change based on the evaluation data;
• the necessity of changing managers' performance based on the evaluation's results;
• the realistic potential for organization learning and change;
• the importance of a utilization model that drives the implementation process.

To these common themes I would add two additional suggestions: the importance of understanding the organization's personality and being aware of key success factors.
Understanding the Organization's Personality

As summarized in Figure 2.4, four organization personalities can be identified depending upon their focus on performance or value standards, and individual or organization outcomes: stability, outreach, rehabilitation, and advocacy. These organization personality factors impact the utilization of program evaluation results, especially if the results are congruent with the organization's personality. This is another reason to "begin with the end in mind" and realize that in the early stages of an outcome-based evaluation, one should both evaluate the program's evaluation capability (see Exhibit 2-1) and determine the "personality" of the organization. My experience has been that organizations do have personalities, and as a result, some are more receptive than others to outcome-based evaluation and the utilization of program evaluation results. For example, stability organizations want to be stable and therefore are hesitant to change; although they embrace evaluation results that confirm their efficiency and effectiveness, they are less likely to change in significant ways. Similarly, the rehabilitation organization will use evaluation results that permit better person-referenced outcomes, but within the confines of resources, potential feasibility, and consumer involvement. In contrast, both outreach and advocacy organizations are quite receptive to evaluation results that confirm their expectations and beliefs that they are not doing enough and need to change to enhance organization-value outcomes and person-referenced, valued outcomes.
Being Aware of Key Success Factors

There are a number of possible reasons for not implementing program evaluation results. For example, there is often lag time between when the questions are asked and when the evaluation results are available; there are constantly changing environments in which current education, health care, and social programs operate; and what's relevant today may become secondary to a more pressing issue tomorrow, especially if different personnel are involved than those who originally asked the questions. However, there is
good agreement that the 10 factors listed in Table 2.2 are closely allied with the successful implementation of program evaluation results. The importance of the 10 factors listed in Table 2.2 is apparent in at least four ways. First, they allow all key players to see the relationship between program evaluation data and their use. Second, they underscore the primary purpose of outcome-based evaluation, which is to provide formative feedback to program administrators, policymakers, and funding bodies. This information can be used to bring about program and system changes that will result in increased equity, efficiency, and person and organization-referenced outcomes. Third, the 10 factors include the key component of change, for change can come about in regard to any component of a program: the input (goals and objectives), throughput (core service functions and costs), or output (person and organization-referenced outcomes) components of a program. And fourth, the list underscores a truism: change is hard because of the number of players involved.
Summary

This chapter has focused on the increasing need for education, health care, and social service programs to demonstrate increased accountability within the context of two powerful, potentially conflicting forces: person-centered values and economic-based restructured services. To address this increased need for accountability, profit and not-for-profit organizations have had to make significant organizational changes over the past two decades to remain competitive and viable. Common to these approaches have been:
• adopting a framework that allows for entrepreneurship, resource development, and capital formation;
• creating organization-based systems involving marketing, fiscal management, clinical decision making, data-based management, and evaluation;
• making total quality management and continuous improvement fundamental aspects of an organization's culture;
• focusing on outcome-based evaluation;
• committing the organization to effective utilization management, including cost-control and risk reduction procedures linking outcomes to systems change;
• shifting to results-based measurement.

A basic premise of this chapter is that increased accountability and continuous improvement do not just happen; rather, they require a clear vision of where the organization wants to go and benchmarks related to desired outcomes that give value, direction, strategies, and reinforcement to the conversion efforts. The Program Evaluation Model discussed in this chapter allows an organization to:

• develop an appreciation and understanding of the use of desired outcomes to increase accountability and guide continuous improvement;
• aid thinking about the anticipated benefits of organization change;
• allow organization staff to work together on agreed-upon desired outcomes;
• provide change strategies based on quality improvement principles;
• guide the change efforts.

This chapter is based on the principles of outcomes research and outcome-based evaluation, whose fundamental assumptions are that outcomes need to be objective and measurable, monitored, and used for multiple purposes, including increased accountability and continuous improvement. As discussed throughout the chapter, there are a number of advantages of using desired outcomes. First, they are complementary to the characteristics of the change process. For example, change involves an ongoing process that needs direction and monitoring toward desired outcomes. Second, change is also gradual and requires positive feedback for all those involved in the change process, as well as communication to those persons about how well the process is doing. Third, change also requires a commitment to objective, measurable results that are reflected in desired outcomes. Without that commitment, change is often lost in philosophy and mission statements that are less than effective without the specific indicators provided by the desired outcomes.
And finally, change is incremental, with results building on results. Without measurable outcomes, the incremental nature and positive direction of change are often overlooked. Despite their potential advantages and benefits, there are also some limitations regarding outcome evaluation and the performance and value measurement on which it is based. First, outcome measurement should be balanced with other considerations. For example, it is clear that not all outcomes can be measured quantitatively. Second, outcome measurement can have more cost than benefit. One needs to be cautious that the outcomes measurement system does not consume more resources than its information is worth. Third, the usefulness of outcomes measurement varies by its use. Outcome measurements are only useful to the management or decision-making process to the extent that they are used and that they answer the right questions. And fourth, organizations are limited in their ability to influence outcomes. For example, education and social service programs are often directed at changing significant social concerns or problems. For programs as diverse as AFDC, Medicaid, forestry, corrections, and higher education, outcome measures will focus on societal issues such as crime rates, self-reliance, and performance in the job market. However, in each of these areas, the organization or program is not the sole determinant of outcomes. Economic trends, demographics, natural disasters, and other factors also play a role. Yet, the specific program is frequently held accountable for the reported results or outcomes. Users of outcome measurements need to understand that limited role, and recognize that a particular program cannot by itself determine outcomes. But these outcomes are still important, because they are indicators of whether one is making a difference and converting in the desired direction. If the measures are in place, one can begin to explain the relative worth of public and private programs and try to determine if the combination of efforts is accomplishing the organization's mission. As stated by Senge (1990, p. 88), "What ought to be primary are the results and accomplishments that the people in that enterprise really care about. Then the real question becomes how good you are at it and how you can do better."
Study Questions

1. What is the difference between organization performance and organization value outcomes? Why is this distinction important?
2. What is the difference between individual performance and individual value outcomes? Why is this distinction important?
3. What is outcome measurement and what is its importance?
4. Compare and contrast performance assessment to consumer appraisal. What are their similarities and differences?
5. Compare and contrast functional assessment to personal appraisal. What are their similarities and differences?
6. Complete an "evaluability assessment" (Exhibit 2-1) on an education, health care, or social services program with which you are familiar. What does it tell you?
7. Complete Exhibit 2-2 (Outcomes Planning Inventory) on an education, health care, or social service program with which you are most familiar. Complete first as a baseline evaluation, then as desired outcomes. Critique the resulting totals in each cell (see Figure 2.3). How would you characterize the results?
8. What is an organization's personality? Use the descriptors found in the chapter to characterize the two profiles obtained in question 7.
9. Review the 10 success factors listed in Table 2.2. Why is each factor important to outcome-based evaluation utilization?
10. Describe in your own words the critical components and steps in program evaluation and organization change.
Additional Readings

Hodges, S. P., & Hernandez, M. (1999). How organizational culture influences outcome information utilization. Evaluation and Program Planning, 22, 183–197.
Johnson, R. B. (1998). Toward a theoretical model of evaluation utilization. Evaluation and Program Planning, 21, 93–110.
Labovitz, G., & Rosansky, V. (1997). The power of alignment: How great companies stay centered and accomplish extraordinary things. New York: John Wiley & Sons, Inc.
Lowenthal, J. (1994, March). Reengineering the organization: A step-by-step approach to corporate revitalization. Quality Progress, 131–133.
Mowbray, C. T., Bybee, D., Collins, M. E., & Levine, P. (1998). Optimizing evaluation quality and utility under resource constraints. Evaluation and Program Planning, 21, 59–71.
Patton, M. Q. (1997). Utilization-focused evaluation (3rd ed.). Beverly Hills, CA: Sage Publications.
Torres, R. T., Preskill, H., & Piontek, M. E. (1996). Evaluation strategies for communicating and reporting: Enhancing learning in organizations. Newbury Park, CA: Sage Publications.
Turnbull, B. (1999). The mediating effect of participation efficiency on evaluation use. Evaluation and Program Planning, 22, 131–140.
3

Effectiveness Evaluation

OVERVIEW
Effectiveness Evaluation Model and Analysis Steps
  Performance Goals (Anticipated Outcomes)
  Purpose and Comparison Condition
  Methodology
  Data Collection and Analysis
  Person and Organization-Referenced Outcomes
Example 1: Effectiveness of a Demonstration Program
  Overview
  Step 1: Performance Goals (Anticipated Outcomes)
  Step 2: Purpose and Comparison Condition
  Step 3: Methodology
  Step 4: Data Collection and Analysis
  Step 5: Outcomes
  Critique
Example 2: Effectiveness of Consumer-Generated Survey Data
  Overview
  Step 1: Performance Goals (Anticipated Outcomes)
  Step 2: Purpose and Comparison Condition
  Step 3: Methodology
  Step 4: Data Collection and Analysis
  Step 5: Outcomes
  Critique
Example 3: Influence of Participant Characteristics and Program Components
  Overview
  Step 1: Performance Goals (Anticipated Outcomes)
  Step 2: Purpose and Comparison Condition
  Step 3: Methodology
  Step 4: Data Collection and Analysis
  Step 5: Outcomes
  Critique
Summary
Study Questions
Additional Readings
One never notices what has been done; one can only see what remains to be done.
MARIE CURIE
Overview

Effectiveness evaluation determines the extent to which a program meets its stated performance goals and objectives. Its primary uses in outcome-based evaluation are to: (1) compare the program's goals with its achieved outcomes; (2) report the program's performance and value outcomes; and (3) provide formative feedback information for program change and continuous improvement. The reader will see some similarities between program evaluation (Chapter 2) and effectiveness evaluation. The intent of both is to enhance a program's accountability and service quality. The major difference is that the determination of current and desired outcomes and their use in program evaluation (Chapter 2) does not require a comparison condition as is the case in effectiveness evaluation. Thus, a key point to remember about effectiveness evaluation is the necessity of establishing a comparison condition against which accountability and outcome information can be judged. As we will see throughout this chapter, the usual comparison condition is comparing anticipated with actual outcomes. The proposed approach to effectiveness evaluation discussed in this chapter is based on five factors that have been discussed thus far in the text. First, the increased need for accountability has changed our focus from process to outcomes. For example, the Government Performance and Results Act of 1993 requires agencies to articulate goals and report results and outcomes achieved; similarly, the National Performance Review Initiatives focus on the establishment of customer service standards and the development and use of performance agreements (Wholey, 1997). These and other efforts are focusing evaluation toward the measurement of a program's effectiveness and efficiency by setting and evaluating specific desired outcomes, performance indicators, and performance targets. Second, the reform movement in education, health care, and social services has resulted in both significantly changed service delivery systems and outcome evaluation strategies. Third, the changing evaluation strategies discussed in Chapter 1 have placed increased emphasis on methodological pluralism, participatory action research, empowerment evaluation, and consumer-focused research. Fourth, the changing role of the evaluator has resulted in evaluators becoming collaborators who help organizations develop and use outcome-based evaluation methods and results. And fifth, there is an increased tendency to use outcome data for formative feedback in decision making, action research, and internal evaluation. These five factors provide the conceptual and procedural basis for the Effectiveness Evaluation
Model discussed in this chapter. They are also reflected in the three detailed examples of effectiveness evaluation presented later in this chapter.
Effectiveness Evaluation Model and Analysis Steps

The model presented in Figure 3.1 summarizes the five effectiveness analysis steps: performance goals (anticipated outcomes), purpose and comparison condition, methodology, data collection and analysis, and person- and organization-referenced outcomes. In reference to Figure 3.1, remember that any outcome-based evaluation involves a comparison group or condition against which one compares the significance of the results. In effectiveness evaluation, the most appropriate comparison is the one shown in Figure 3.1: comparison of the obtained person- and organization-referenced outcomes to the anticipated outcomes.

Performance Goals (Anticipated Outcomes)
Effectiveness evaluation begins with the organization’s strategic and performance plans that specify anticipated outcomes. A detailed listing of potential outcomes and outcome indicators is presented in Table 3.1. Note that the
four outcome categories (organization performance, organization value, individual performance, and individual value) are consistent with those presented in Figures 1.2 and 2.1.
Purpose and Comparison Condition

The specific purpose of effectiveness evaluation should be stated clearly, along with the comparison condition. As we will see in the three examples presented later in this chapter, the evaluation purpose can relate to determining whether a particular program reform leads to improved outcomes for service recipients, whether self-advocates with disabilities can evaluate their own quality of life, or whether participant characteristics and program components enhance performance outcomes. As stated previously, outcome-based evaluation involves a comparison group or condition against which one compares the significance of the results. In effectiveness evaluation, the most appropriate comparison is that of the obtained person- and organization-referenced outcomes to the anticipated outcomes. There may well be other comparisons, however, depending on the specific effectiveness evaluation. For example, in Example 1 that follows, the comparison is between two service recipient groups; for other analyses, it may be pre/post intervention comparisons or longitudinal-status comparisons. The point to keep in mind is that the comparison condition will be determined by two factors: the specific purpose of the effectiveness evaluation and the evaluation design used.

Within-Subjects Evaluation Designs
The effectiveness of a particular program or intervention can be demonstrated by using a within-subjects evaluation design in which changes within program participants are compared against either their pre-enrollment status, their functional or behavioral condition over time, or their ability to exhibit behaviors reflective of a more functional, competent person. Three within-subjects evaluation designs are particularly useful: person as own comparison, pre/post comparisons, or longitudinal status comparisons. The second example presented in this chapter ("Effectiveness of Consumer-Generated Survey Data") shows how a person as own comparison can be used to demonstrate the effectiveness of a program whose goal was to teach self-advocates to be surveyors of other self-advocates' quality of life, and to do so with the same degree of reliability and validity as "professional surveyors." In regard to organization-referenced outcomes, a "within-subjects design" is also appropriate when the effectiveness evaluation focuses on organization performance or value outcomes. The typical scenario here is for the program
to compare itself over time in reference to goal achievement; efficiency measures; "grades" on report cards, reflecting increased person- or program-referenced outcomes; consumer satisfaction; or the program or intervention's fidelity to a new or modified service delivery model.

Between-Groups Evaluation Designs
Frequently, the effectiveness of a particular program or intervention involves comparing outcomes from individuals who have received particular services or interventions to those who have not. Three between-group evaluation designs are potentially useful here: hypothetical comparison group, matched pairs (cohorts), and experimental and control groups. As we will see in Examples 1 and 3 later in this chapter, a between-groups evaluation design was used to determine (1) if an integrated services agency demonstration program serving persons with persistent mental illness improved person-referenced outcomes compared to nonintegrated service programs; and (2) whether persons on welfare who participated actively in job training and job search programs have a higher probability of postprogram employment compared to participants who did not participate in these structured activities. In regard to organization-referenced outcomes, a between-groups evaluation design can also be used to determine a program’s effectiveness. A typical scenario is for benchmarks or hypothetical comparison groups formed on the basis of national databases to be used as a basis for the comparison and consequent judgments about the organization’s effectiveness.
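The chapter does not prescribe particular statistical tests for these designs, but a paired comparison for a within-subjects design and an independent-groups comparison for a between-groups design are common starting points. The sketch below, using hypothetical scores and SciPy's t-tests, is meant only to illustrate the difference between the two comparison conditions, not to represent any analysis reported in this book.

    from scipy import stats

    # Within-subjects (pre/post) comparison: each participant serves as his or
    # her own comparison condition.
    pre  = [48, 52, 45, 60, 55, 47, 50, 58]
    post = [55, 57, 50, 66, 59, 46, 57, 64]
    print(stats.ttest_rel(pre, post))

    # Between-groups comparison: outcomes for service recipients versus a
    # comparison group that did not receive the intervention.
    demonstration = [62, 58, 66, 71, 59, 64, 68, 61]
    comparison    = [55, 52, 60, 58, 49, 57, 61, 54]
    print(stats.ttest_ind(demonstration, comparison))

Whichever test is chosen, the logic is the same as in Figure 3.1: the obtained outcomes are judged against an explicit comparison condition.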
Methodology

The selection of specific outcome categories and measurement techniques depends not just on anticipated outcomes, but on three other factors as well: phase of program development, the immediacy of the outcomes, and the program's geography.

Phase of Program Development
There are at least three phases of program development: a feasibility/demonstration phase, an ongoing phase that has some longevity, or a changing phase, which is very typical in today’s education, health care, and social services program environments. These phases place additional challenges on an effectiveness evaluation. For example, in the pilot or demonstration phase, the key outcome variables might well include start-up costs, fidelity to the model standards, consumer satisfaction, and initial indicators of the program or intervention’s effects on the consumer’s education, health care, or functional
level. As the program or intervention matures and gains longevity, organization and individual performance indicators such as program efficiency indices and longitudinal changes in consumer behavior become possible, and organization and individual value outcomes, such as consumer satisfaction and consumer-related quality of life changes, become appropriate. If a program is in a change phase, then outcomes need to be selected that reflect the changed program or intervention focus, its fidelity to the new model, its potential for saving costs and improving outcomes, and its ability to meet the requirements of measurability, reportability, and accountability.

The Immediacy of the Outcomes
The central premise of any education, health care, or social service program is that the intervention, services, or supports it delivers to the target population induce some change in service recipients that positively affects their condition or values. A further premise of program impact theory is that the organization can produce this change efficiently and effectively, realizing that programs rarely exercise complete, direct control over the conditions they are expected to improve. Thus, education, health care, and social service programs generally must work indirectly by attempting to alter some critical but manageable aspect of the situation, which, in turn, is expected to lead to more far-reaching improvements (Rossi, Freeman, & Lipsey, 1999). Therefore, in selecting outcome categories and measurement techniques, users of effectiveness evaluation need to:

• Think clearly about what outcomes are logically related to the program's intervention, services, or supports.
• Establish performance goals that are consistent with the organization's strategic plan and capability.
• Determine whether the anticipated outcomes are short term, intermediate, or long term.

The Program's Geography
It is not uncommon to evaluate projects that have a common mission but that have local procedural autonomy. For example, foundation grant programs frequently have multiple sites but also have a common goal of bringing about a general change such as increased adult literacy, improved health practices of parents in rural communities, or increased participation of citizens in public policy formation. Where this is the case, Sanders (1997) suggests asking the following fundamental questions as a basis for the effectiveness evaluation: (1) have changes occurred in the desired directions, and what is the nature of
the changes; (2) in which types of settings have what types of changes occurred and why; (3) what insights have been drawn from the results to date; and (4) what is needed to maintain desired change?
Data Collection and Analysis

Outcome-based data need to be collected and analyzed as described in Chapter 8. However, before expending considerable time, effort, and expense in data collection and analysis, the data sets need to be evaluated as to their relevance and quality.

Data Relevance
Data relevance focuses on a simple question: are the data collected relevant to the effectiveness evaluation's purpose and comparison condition? Answering this question points out the key role that the evaluator plays as a consultant in effectiveness evaluation (see Figure 3.1). My experience has been that many education, health care, and social service programs are awash (some say drowning) in data; but unless the data sets collected and analyzed are related clearly to the effectiveness evaluation's purpose and comparison condition, then the analysis will be less than successful. Furthermore, by following the guidelines discussed previously regarding methodological pluralism, and determining the program's evaluation capability, both the study's data and the evaluation will be strengthened. The same can be said about selecting outcomes that are acceptable to key evaluation players, conform to psychometric standards, are affordable and timely, reflect the major organizational goals, are connected logically to the program or intervention, and are culturally sensitive.

Data Quality
The “quality revolution” has resulted in a focus on the quality of outcome data being collected and analyzed. Thus, as I suggested earlier in the text, there are three key quality criteria: complete (available for all program participants), timely (current and cover the period of the analysis), and accurate (reflect actual events and characteristics). Another data quality criterion that is increasingly important in outcome-based evaluation is construct validity. Construct validity is the extent to which an outcome variable may be said to measure a theoretical construct or trait. The need to demonstrate construct validity is essential in effectiveness evaluation (as well as the other three types of outcome-based evaluation) since the outcomes selected need to capture the intent of the service, intervention, or support. This aspect of program impact theory requires the development and use of conceptual hypotheses that relate
program services to both the anticipated outcomes and the actual person- and organization-referenced outcome measures used in the evaluation.
Person and Organization-Referenced Outcomes

The comparison of the obtained person- and organization-referenced outcomes to the performance goals and anticipated outcomes specified in Step 1 is done in the discussion and recommendations sections of the effectiveness evaluation report. Two factors typically affect both the discussion and the recommendations: formative feedback and empowerment evaluation.

Formative Feedback
As stressed repeatedly throughout both the previous edition of this book and the current text, one of the primary purposes of outcome-based evaluation is to provide meaningful information (referred to as formative feedback) to key players. Such formative feedback:

• Ensures key players a strong voice in the design and management of the program.
• Is an ongoing part of service delivery and organizing data collection, not something that is "added on" for program evaluation purposes.
• Links continuous improvement to person- and organization-referenced outcomes.
• Allows for the systematic evaluation and improvement of services.
• Identifies the potential foci for programmatic or systems change.

Empowerment Evaluation
Empowerment evaluation is designed to help people help themselves and improve their programs using a form of self-evaluation and reflection. It is typically defined as, “a collaborative group activity [where] program participants learn to assess their progress continually toward self determined goals and to reshape their plans and strategies according to this assessment” (Fetterman, 1997, p. 383). The four steps involved in empowerment evaluation include: (1) determining whether the program is including its strengths and weaknesses in the evaluation; (2) focusing on established goals and determining where the program wants to go in the future, with an implicit emphasis on program improvement; (3) developing strategies and helping participants determine their own strategies to accomplish program goals and objectives; and (4) helping program participants determine the types of evidence required to document progress credibly toward their goals.
In summary, the five-step Effectiveness Evaluation Model depicted in Figure 3.1 allows one to approach effectiveness evaluation systematically. The three examples that follow show how the model’s five steps can be used to conduct an effectiveness evaluation. The examples also show how effectiveness evaluation can be used for multiple purposes and with organizations that are at different phases of development. For example, the first example uses a longitudinal approach to evaluate the effectiveness of a program change on two groups of program recipients. The second example uses the analysis of person-referenced outcomes at a moment in time to evaluate the effectiveness of a new approach to data collection from persons with disabilities. The third example shows the influence of participant characteristics and program components in an effectiveness evaluation, and the importance of using effectiveness analysis data for formative feedback.
Example 1: Effectiveness of a Demonstration Program

Overview

The first example involves a three-year controlled study (Chandler et al., 1996) of two California integrated service agency (ISA) demonstration programs that combined structural and program reforms. The effectiveness of these programs was evaluated to determine if they produced improved outcomes for a cross section of clients with severe and persistent mental illness. The ISA model combines capitation with program innovations based on an assertive community treatment team model (Hargreaves, 1992; Test, 1992). The staff-to-client ratio was 1:10. The ISA model integrated services provided by the team with the services of program specialists in employment, substance abuse, and socialization. The model also emphasized clients' and family members' involvement in determining the direction of services.
Step 1: Performance Goals (Anticipated Outcomes)

Traditionally, programs for persons with severe and persistent mental illnesses have not stressed assertive community treatment, with its strong focus on assisted living, supported employment, community-based case management, and generic community services. The goals of the ISA were to provide these services in a cost-efficient manner and improve programmatic outcomes related to hospital care, welfare participation, living status, financial stability, social supports, leisure activities, family burden, family satisfaction, personal well-being, and friendship patterns.
Step 2: Purpose and Comparison Condition

The major purpose of the evaluation was to determine if the ISA demonstration programs that combined structural and program reforms produced improved outcomes (as listed in Step 1) for a cross section of clients with severe and persistent mental illness.
Step 3: Methodology

Methodologically, two groups of service recipients were compared in each program: one group (the demonstration group) that received the ISA services, and one group (the comparison group) that received the usual services. In the first program (urban), 127 clients were assigned to the demonstration group and 129 were assigned to the comparison group. Due to attrition, only 83 demonstration clients and 69 comparison clients were followed for the three-year evaluation. In the second program (rural), 125 clients were assigned initially to the demonstration program and 135 to the comparison group. Due to attrition, only 92 of the demonstration and 72 of the comparison group clients were followed for all three years. Groups were comparable on all demographic variables at the beginning of the evaluation. In reference to Figure 3.1, three outcome categories were used: outcomes related to organization value, individual performance, and individual value. The three measurement techniques used were satisfaction surveys, adaptive behavior/role status measures, and a quality of life measure. Key outcomes were listed in Step 1. For all groups, service utilization and cost data were obtained from the state and county data banks. Clients were interviewed once a year during the three-year evaluation by trained research staff to measure symptoms, self-esteem, and quality of life. Family members were eligible to be interviewed one year into the program, if clients consented. In both programs, demonstration and comparison family respondents generally did not differ on client demographic or baseline variables, relationship to the client, gender, or education.
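Because attrition of this magnitude bears on how far the results generalize, it is worth computing the follow-up rates implied by the figures just reported; a two-line calculation makes the point (the percentages below are simply derived from those counts).

    def follow_up_rate(assigned, followed):
        """Proportion of initially assigned clients followed for all three years."""
        return followed / assigned

    # Urban program: demonstration and comparison groups
    print(f"{follow_up_rate(127, 83):.0%}, {follow_up_rate(129, 69):.0%}")   # 65%, 53%
    # Rural program: demonstration and comparison groups
    print(f"{follow_up_rate(125, 92):.0%}, {follow_up_rate(135, 72):.0%}")   # 74%, 53%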
Step 4: Data Collection and Analysis

Findings were considered nonsignificant if the probability of their occurrence due to chance was greater than .1. Generally, tests of statistical significance used linear regression (in which case a t-ratio was reported) or logistic regression (in which case the likelihood-ratio chi-square was reported). Both types of regression models used age, gender, race, and diagnosis as covariates; data transformation and use of baseline as a covariate were sometimes used. Three-year results were summarized as an average annual mean, which
was calculated using the number of study years for which data were not missing. For example, the average annual mean for a client with complete data for only two years would equal the values for both years added together and divided by two.
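The averaging rule is simple to compute. The following minimal sketch (in Python, with hypothetical values rather than the study's data) shows the calculation for clients with different patterns of missing years:

```python
import numpy as np
import pandas as pd

# Hypothetical yearly outcome values for three clients; NaN marks a missing
# study year. Column and client names are illustrative only.
yearly = pd.DataFrame(
    {"year1": [12.0, 8.0, np.nan],
     "year2": [10.0, np.nan, 6.0],
     "year3": [np.nan, np.nan, 9.0]},
    index=["client_a", "client_b", "client_c"])

# Average annual mean: sum the non-missing years and divide by the number of
# years with data (pandas' mean() skips NaN values by default).
print(yearly.mean(axis=1))
# client_a: (12 + 10) / 2 = 11.0; client_b: 8.0; client_c: (6 + 9) / 2 = 7.5
```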
Step 5: Outcomes

Results indicated that compared to the comparison groups, clients served by the integrated service agencies had less hospital care, greater workforce participation, fewer group and institutional housing arrangements, less use of conservatorship, greater social support, more leisure activity, less family burden, and greater client and family satisfaction. Clients in the urban demonstration program, but not those in the rural program, did better than the comparison group on measures of financial stability, personal well-being, and friendship. No differences were found in either site in rates of arrest and conviction or in self-reported ratings of self-esteem, symptoms, medication compliance, homelessness, or criminal victimization. The capitated costs for demonstration clients were much higher than the costs of services used by comparison clients.
Critique

This example reflects both the strengths and potential limitations of an effectiveness evaluation. On the one hand, the study showed clearly that the program's goals related to enhanced outcomes were achieved, and that the person- and organization-referenced outcomes obtained were congruent with the stated performance goals. On the other hand, as with any evaluation, there were some limitations. For example, attrition was problematic with family interviews. In addition, because clients were not interviewed when they enrolled in the evaluation, no benchmarks were available for interview-based measures. Likewise, no benchmarks were available for family-related measures, as interviews occurred only after the first year of service. Finally, designers of the demonstration program seriously overestimated the usual system costs against which the capitated ISAs were compared. Not only did the low baseline and comparison group costs prevent a test of the capacity of capitation to reduce costs, but they also distorted the research question of whether ISA is a more cost-effective service model (Chandler et al., 1996, p. 1342). Despite these limitations, this example shows the value of an effectiveness evaluation of either a program change or a demonstration program. Our next example indicates that effectiveness evaluation can also be used to evaluate the effectiveness of a new approach to data collection.
Example 2: Effectiveness of Consumer-Generated Survey Data

Overview

The reform movement in disabilities has resulted in a number of changes in how people with disabilities are perceived. Terms such as inclusion, empowerment, and equity reflect this changed perception. Historically, persons with disabilities were considered "subjects" in experimental studies, surveys, and program evaluations. By-products of this scenario were response bias and response perseveration that reflected the subordinate-superior relationship. Today's strong self-advocacy movement throughout much of the world reflects this changed perception and is also changing the way we collect data in outcome-based evaluation. Participatory action research is commonly employed, and as shown in the following effectiveness evaluation, consumers can reliably and validly survey their own perceived quality of life.
Step 1: Performance Goals (Anticipated Outcomes)

In light of the changed perception and roles just described, the project's four goals were to develop a consumer-based quality of life survey instrument; train consumers in quality of life survey techniques; have consumer surveyors interview other consumers of community-based programs for persons with disabilities; and evaluate the results to determine whether consumer-generated data were comparable to that obtained by "professional surveyors" (Schalock, Bonham, & Marchant, 2000).
Step 2: Purpose and Comparison Condition

Apart from "could it be done," the effectiveness evaluation employed participatory action research to (1) develop a consumer-friendly survey instrument to measure quality of life for persons with disabilities; (2) identify, train, and assist people with disabilities to be interviewers; (3) interview people with different types and degrees of developmental disabilities who were receiving different types of state-funded services from several different agencies in all regions of the state; and (4) evaluate psychometric results related to survey reliability, respondent's response bias, and comparability of scores with those obtained from professional surveyors.
Step 3: Methodology

In reference to the outcome categories listed in Figure 3.1, the evaluation focused only on the individual value category and used personal appraisal
(quality of life) measurement techniques. The 237 survey respondents represented a random sample of consumers with disabilities receiving services from 10 participating providers in a mid-Atlantic state. Three-fifths of the respondents were men and two-fifths women. Median age for the respondents was 40. A few of the people tested in the profound range of mental retardation and a few tested in the low average or normal range. About one-third tested in the mild range and one-fourth tested in the borderline range of intellectual functioning. People responding to the survey had a range of complicating conditions. About two-fifths had severe problems speaking and one-third had behavior or emotional problems. One-fourth had seizures, and many had more than one complicating condition. One-third lived with their families, and an additional one-sixth lived on their own or shared housing with another consumer or two. The remainder lived with some type of caregiver, generally in small living units of one to three consumers and live-in agency staff. The survey instrument was based on the Quality of Life Questionnaire (Schalock & Keith, 1993), which is a 40-item, 3-point Likert-scale instrument that has four empirically derived factors: satisfaction, work, independence, and community integration. Two major adaptations were made to the questionnaire by the participatory action research team. First, the wording of questions and responses was simplified, and second, a fifth factor, dignity, was added. The final questionnaire contained 50 items, with 10 items in each of the following dimensions: (1) satisfaction, whose items related to life experiences, fun and enjoyment, comparison with others, feelings of loneliness, and family involvement; (2) dignity (or respect), whose items related to safety, health, concern expressed by others, helpfulness of others, and receiving assistance in reaching goals; (3) independence, whose items related to choices and personal control; (4) integration, whose items related to community access and use, friends visiting the interviewee, and treatment by neighbors; and (5) work, whose items related to job satisfaction, co-worker relations, learning of new skills, pay and benefits, and sense of the job's worthiness.
Step 4: Data Collection and Analysis

The 29 interviewers were selected from among 90 consumer applicants. Selection came primarily through face-to-face job interviews after "help wanted" information had been mailed to 300 individuals and organizations. The project was designed for team interviewing, so reading ability was desired but not required. Interview training lasted six hours, during which interviewers practiced the interview questions, role-played possible scenarios, and practiced interviewing as teams. Residential staff, job coaches, service coordinators, and family members volunteered to assist throughout. Service coordinators
were recruited to work as support persons during the interviews. The lead interviewer generally read the questions and answers and recorded the responses. The team member pointed to the response categories on a three-point picture card and helped the lead interviewer with any problems. All consumers had the opportunity to respond for themselves during the face-to-face interviews. The procedure and instrument enabled 81% of the consumers to respond for themselves; it enabled 93% of those who were verbal to respond for themselves; and the use of a flash card allowed 54% of those with severe expressive language problems to respond.
Step 5: Outcomes

The Cronbach alphas for the five scales ranged between 0.73 and 0.81. Thus, the reliability analysis confirmed both the expected scales and the fact that changing the wording had no impact on the ability of the items to cluster and show the same pattern as that obtained in the original questionnaire (Schalock & Keith, 1993). Factor analysis confirmed the five factors around which the survey instrument was constructed; moreover, the first four factors confirmed the four factors comprising the original questionnaire from which the present items were derived. The fifth factor ("Dignity"), whose items were not included in the original questionnaire, did not develop as clean a factor. Most consumers had complete data. Concerns about acquiescence among respondents with disabilities are periodically mentioned in the literature. Response analyses indicated that only 2.5% of the people answered more than 90% of the questions with the most positive response, and only 5.5% answered 10% or fewer questions with the most positive response. The median was 42% of questions answered with the most positive response, with the overall pattern resembling a bell curve, slightly skewed toward 33%, since all questions had three possible answers. In addition, raw and scaled scores were equivalent to those reported from the large standardization group involved in the original questionnaire's development and standardization (Schalock & Keith, 1993).
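The two psychometric checks described above, internal consistency and acquiescence, are straightforward to compute. The sketch below uses hypothetical 3-point responses rather than the project's data; note that purely random responses such as these will yield an alpha near zero, whereas the study's scales ranged from .73 to .81.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: rows are respondents, columns are items of one scale."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical 3-point Likert responses (1 = least, 3 = most positive) for a
# 10-item subscale answered by 237 respondents.
rng = np.random.default_rng(0)
subscale = pd.DataFrame(rng.integers(1, 4, size=(237, 10)))

print(round(cronbach_alpha(subscale), 2))

# Acquiescence check: proportion of respondents choosing the most positive
# answer on more than 90% of the items.
share_most_positive = (subscale == 3).mean(axis=1)
print(round((share_most_positive > 0.90).mean(), 3))
```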
Critique

This example shows that effectiveness evaluation can be used for multiple purposes, including the demonstration that consumer-generated quality of life survey data are psychometrically sound and comparable to that obtained from more traditional, professionally based procedures. Although not reported in the outcomes section, results of a path analysis conducted on the person-referenced outcomes indicated that perceived dignity and work contributed
the most to perceived life satisfaction; the degree of independence consumers felt and the degree of their integration into the community indirectly affected measured satisfaction. Consumer abilities (as measured by intelligence tests) or characteristics such as age, communication problems, and ambulating difficulties had no effect, either directly or indirectly, on life satisfaction. These data suggest that effectiveness evaluation involves not just an overall evaluation statement about whether the program or intervention meets its stated goals or objectives; it also identifies some of the variables that account for its effectiveness. Such is the case in our third example.
Example 3: Influence of Participant Characteristics and Program Components

Overview

Taxpayer concerns over government welfare programs for the poor are being politically expressed through various reform proposals (Seninger, 1998). Public perceptions of welfare as a source of long-term dependence on public assistance have led to proposals for making welfare payments contingent on work or participation in programs that prepare a person for work. The potential employment impact of welfare-to-work has been explored through a number of experimental welfare employment programs over the past 15 years. Outcome evaluations of these experimental programs show that they usually increase employment, and in a number of cases increase earnings or reduce welfare benefits (Greenberg & Wiseman, 1992). What has not been done to date, however, is to evaluate the effect of personal characteristics and program components on welfare-to-work employment outcomes. Such was the purpose of the following effectiveness evaluation (Seninger, 1998).
Step 1: Performance Goals (Anticipated Outcomes)

Despite the potential of the welfare-to-work reform, the problem of attrition between mandatory reporting for initial, assessment-orientation activities and subsequent participation in structured activity is a primary concern. People who are required to report to welfare employment programs have differences in behavioral attitudes and personal circumstances for the choice between work and welfare. For example, individuals who enter a program and actively participate in a structured activity probably have a stronger orientation toward work, which may, in turn, affect their probability of getting a job. Other persons who report to the program, but then do not participate in a structured activity, may have less of a chance of employment after the program
(Seninger, 1998). Thus, a program that provides skills training, work experience, job search, remedial education, or job-readiness training will have a higher probability of postprogram employment compared to one that does not have these structured activity tasks.
Step 2: Purpose and Comparison Condition

The purpose of this effectiveness evaluation (Seninger, 1998) was to estimate the effect of personal characteristics on postprogram employment, conditional on selection into structured activity participation. The comparison condition was that persons who actively participate in either skills training, work experience, job search, remedial education, or job-readiness training would have a higher probability of postprogram employment compared to participants who did not participate in one of these structured activity tracks.
Step 3: Methodology

In reference to Figure 3.1, the evaluation focus was on individual performance outcomes, organization value (fidelity to model) standards, and individual performance (role status) measures. The evaluation used management information systems data from a program for Aid to Families with Dependent Children (AFDC) recipients in a large mid-Atlantic seaboard city. The program was part of the federal government's JOBS program to provide employment-related services to AFDC recipients under the auspices of the 1988 Family Support Act. Participation was mandatory for able-bodied recipients who were not working full time, although mothers with young children, regardless of marital status, were exempted. The initial selection and referral occurred at local social service offices where recipients of AFDC grants go through an intake process to determine eligibility and exemptions. This intake results in a pool of nonexempt eligible persons who represented about 20% of the city's AFDC population. The first step within program selection occurred after initial intake into the program, when participants were placed in one of several structured activities including job search, work experience, skills training, job-readiness training, or remedial education. In many cases, the placement into a structured activity was a joint decision between the case worker's assessment of the participant's training needs and abilities and the participant's desire for a certain program track. Persons exited the program either when they completed their structured activity track or were on inactive hold for a long enough period to warrant termination from the program. Some of the inactive persons were terminated from the program when they became employed, and some of the active participants
failed to find employment at the time of their completion of program requirements. Nonsubsidized employment, a basic objective of the program, was achieved by persons finding a job on their own or with the assistance of a placement counselor. Some persons did not get jobs because of problems with health or transportation. The sample of 2,839 persons who had exited the program by December 1993 was drawn from a base of 8,503 persons who entered the program between January 1992 and March 1993. Of the 2,839 persons, 1,177 exited with a nonsubsidized job and another 1,662 left without a job. Postprogram employment was verified and reported by caseworkers assigned to the participants. Data quality controls with continual audits and revisions were imposed on the management information system data, since the results were reported to state and federal agencies that conducted comprehensive audits at the end of each fiscal year.
Step 4: Data Collection and Analysis

The effect of client demographics and program strategies on participation in a structured activity and on employment was estimated with a bivariate probit model. Participation in a structured activity was a joint decision between an individual and a case manager that occurred after mandatory reporting for intake assessment. A second decision was the search for a job, which occurred near the end of a person's program. Job search was supported through job counselors who provided job placement services to all persons. Some job seekers proceeded on their own and found a job. The estimated probabilities of participation and of employment were based on two assignment equations for unobserved indices. Participation in a structured activity was influenced by the expected present value of future earnings from program activities and a person's welfare grant income. The latter may be reduced by sanctioning for nonparticipation in a structured activity. An alternative present value was defined for not participating in a structured activity.
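The estimation strategy can be approximated with standard tools. The sketch below (Python with statsmodels; the data and variable names are entirely hypothetical) fits two separate probit equations, one for participation in a structured activity and one for postprogram employment. A true bivariate probit, as used in the study, would estimate both equations jointly and allow their error terms to be correlated to account for selection into participation.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the program's management information system records.
rng = np.random.default_rng(1)
n = 2839
df = pd.DataFrame({
    "single": rng.integers(0, 2, n),
    "teen_parent": rng.integers(0, 2, n),
    "work_experience": rng.integers(0, 2, n),
    "months_on_afdc": rng.integers(1, 60, n),
})
df["participated"] = (rng.random(n) < 0.45 + 0.15 * df["work_experience"]).astype(int)
df["employed"] = (rng.random(n) < 0.25 + 0.15 * df["work_experience"]
                  + 0.10 * df["participated"]).astype(int)

# Equation 1: probability of participating in a structured activity.
participation = smf.probit(
    "participated ~ single + teen_parent + work_experience + months_on_afdc",
    data=df).fit(disp=False)

# Equation 2: probability of postprogram employment.
employment = smf.probit(
    "employed ~ single + teen_parent + work_experience + months_on_afdc + participated",
    data=df).fit(disp=False)

print(employment.params)
```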
Step 5: Outcomes

Differences between those who did and did not get a job were most pronounced for several characteristics considered to be obstacles to employment. Specifically, the number of months a person had been receiving AFDC benefits when they entered the program did not significantly affect the probabilities of participation in a structured activity or getting a job. However, being single, teen parenthood, and lack of work experience significantly reduced the likelihood of getting a job. These results suggest that it is not welfare
recipiency per se that affects the likelihood of employment, but rather family status and previous work experience. Older participants had a higher probability of getting a job, an aging effect that may reflect some people’s desire to get a job after a period of time in the welfare system (Seninger, 1998).
Critique

This effectiveness evaluation shows that multiple outcome indicators and measurement techniques are used frequently to determine whether a program obtains its performance goals and to provide formative feedback to program managers. Education, health care, and social service programs serve a heterogeneous group of people, which in turn affects the interpretation of the program's goal attainment. Therefore, one of the advantages of an effectiveness evaluation such as that just presented is to determine which participant characteristics affect programmatic outcomes. In addition, an effectiveness evaluation can clarify which specific programmatic components account for person-referenced outcomes. In the analysis just described, for example, welfare recipients who received structured activity services had a higher probability of employment than recipients who did not receive skills training, work experience, job search, or a composite of other activities. This is important formative feedback information.
Summary

In summary, this chapter has focused on effectiveness evaluation whose major purpose is to determine the extent to which a program obtained its performance goals. Data resulting from effectiveness evaluations are used primarily for reporting program results and as a basis for data-based management, formative feedback, and programmatic change and improvement. As noted in Figure 3.1, effectiveness evaluation involves a five-step process that begins with stating the program's performance goals and then proceeds sequentially to indicate the evaluation's purpose and comparison condition, describe the effectiveness analysis methodology, discuss data collection and analysis, and use the obtained person- and organization-referenced outcomes as a basis for comparison with the program's anticipated outcomes. This comparison permits evaluators to say with some assurance the extent to which the program, intervention, or service has obtained its performance goals. The three examples presented in the chapter reflect the use of the effectiveness evaluation model presented in Figure 3.1 and how effectiveness evaluation can be used for multiple purposes, including determining the effectiveness of a demonstration program, consumer-generated survey data, and the
influence of participant characteristics and program components. Throughout the chapter, two additional points were made: the importance of the evaluator as a consultant to the process, and the critical need to use the obtained results for formative feedback to management for program change and improvement. In today's world of increasing accountability requirements, it is essential that program managers do effectiveness evaluation. Although the extent and complexity of those evaluations depend upon the program's capability, no program in the twenty-first century should be without this type of outcome-based evaluation. All key players are concerned about the degree to which a program meets its performance goals, and all program administrators need to use the results from their effectiveness evaluation for reporting and program improvement purposes. In addition, key players in outcome-based evaluation are also concerned about whether the program made a difference. We turn in Chapter 4 to address that issue, describing the third type of outcome-based evaluation: impact evaluation.
Study Questions

1. Summarize the basic components, uses, and weaknesses of effectiveness evaluation.
2. Assume that you are an administrator of an education program and plan to do an effectiveness evaluation of your program. Outline the specific steps and procedures of your analysis, following the five steps summarized in Figure 3.1. Review Table 3.1 if necessary.
3. Assume that you are an administrator of a mental health program for either children or adults and plan to do an effectiveness evaluation of your program. Outline the specific steps and procedures of your analysis, following the five steps summarized in Figure 3.1. Again, review Table 3.1 if necessary.
4. After you have outlined the evaluation activities in questions 2 and 3, answer each of the following: (a) how would you evaluate the degree of implementation? (b) what person- and organization-referenced outcomes would you use and how would you measure them objectively?
5. Assume that you are an administrator for a demonstration program for reducing substance abuse and are required to evaluate your initial results. How might the components of this evaluation compare with those outlined in questions 2 and 3?
6. What are some of the personal characteristics, program components, and contextual factors that would influence the effectiveness evaluation of a community-based corrections program?
7. If you could interview a program director for 60 minutes, what questions would you ask that would provide the most valid indicators of the respective program's ability to do an effectiveness evaluation?
8. How would each of the following affect the selection of outcomes described in questions 2 and 3: stage of program development, the immediacy of the outcomes, and the program's geography?
9. Review a journal article on effectiveness evaluation. How do the methods compare with those outlined in this chapter and summarized in Figure 3.1?
10. What obstacles do you see within education, health care, or social service programs to do effectiveness evaluation? What do you feel is the basis for the obstacles, and how would you overcome them?
Additional Readings

Chandler, D., Meisel, J., Hu, T., McGowen, M., & Madison, K. (1996). Client outcomes in a three-year controlled study of an integrated service agency model. Psychiatric Services, 47(12), 1337–1343.
Heckman, J. J., & Robb, R., Jr. (1985). Alternative methods for evaluating the impact of intervention. In J. J. Heckman and B. Singer (Eds.), Longitudinal analysis of labor market data (pp. 156–246). New York: Cambridge University Press.
Judge, W. Q. (1994). Correlates of organizational effectiveness: A multilevel analysis of multidimensional outcomes. Journal of Business Ethics, 13(1), 1–10.
Krathwohl, D. R. (1993). Methods of educational and social science research: An integrated approach. New York: Longman.
Martin, L. L., & Kettner, P. M. (1996). Measuring the performance of human service programs. Thousand Oaks, CA: Sage.
4

Impact Evaluation

OVERVIEW
Outcomes versus Impacts
Comparison Condition
Impact Evaluation Designs
Person as Own Comparison
Pre/Post Change Comparison
Longitudinal Status Comparison
Hypothetical Comparison Group
Matched Pairs (Cohorts)
Experimental/Control
Steps Involved in Impact Evaluation
Study 1: The Impact of Different Training Environments
Purpose/Questions Asked
Comparison Condition
Core Data Sets and Their Measurement
Results
Discussion of Results and Their Implications
Study 2: The Impact of Transitional Employment Programs
Purpose/Questions Asked
Comparison Condition
Core Data Sets and Their Measurement
Results
Discussion of Results and Their Implications
Summary
Study Questions
Additional Readings
You can learn a lot by looking. YOGI BERRA
Overview

Impact evaluation determines whether a program made a difference compared to either no program or an alternate program. An absolute requirement in impact evaluation is that you have a comparison group or condition against which you compare the significance of your results. For example, you might be interested in determining the impact of job training program A by comparing the posttraining job status of its graduates with graduates of job training program B, or with persons not involved in a job training program. It has been my experience that program administrators seldom look at a comparison group of similar persons not in the program and ask, what would have happened to my service recipients had they not entered the program or received the service? My experience has also been that funding groups and policymakers are very interested in impact evaluation, for they want to know whether a particular education, health care, or social services program made a difference, and whether some types of programs do better than others. Impact evaluation involves data collection, recontacting people over time, and thinking about what actually happens to the service recipients and what would have happened had they not been served, or served in a comparable program. Specific purposes include:

Focusing on the program's impacts.

Determining whether these impacts can be attributed with reasonable certainty to the intervention or services being evaluated.

Providing formative feedback to program managers, policymakers, and funding bodies for both accountability and continuous improvement purposes.

The data requirements for conducting an impact evaluation are similar to those required for effectiveness evaluation. The outcome indicators used are those shown in Figure 2.1 and Table 3.1: individual and organization performance outcomes, and individual and organization value outcomes. The measurement techniques are those shown in Figure 1.2: performance and functional assessment and consumer and personal appraisal. However, there are two important differences between effectiveness and impact evaluation: (1) cost estimates become more important in impact evaluation since they are frequently used for equating program intensities; and (2) estimated impacts are made based on the statistically significant mean differences (if any) between the outcomes.
Outcomes versus Impacts

Doing an impact evaluation is facilitated if you understand the difference between outcomes and impacts. This difference is diagrammed in Figure 4.1. The critical point to remember from Figure 4.1 is that program impacts represent the statistically significant differences in outcomes between the comparison conditions.
Comparison Condition

Conducting an impact evaluation requires a comparison group or condition against which the results are compared. However, one needs to think beyond a simple comparison of mean differences. For example, calculating group differences simply in terms of the mean values of outcome variables may produce biased estimates of intervention or treatment effects, especially if there are differences among preassignment characteristics. Hence, regression or analysis of covariance techniques are frequently used in impact evaluation. These techniques are advantageous because they control for initial sample differences and because they can be expected to produce unbiased estimates of intervention effects. Regression techniques also offer two additional advantages
over simple comparison of mean values: (1) they provide more powerful tests of the program or intervention's potential effects because they control statistically for the influence of other explanatory variables; and (2) by including the explanatory variables in the regression model, one can assess directly their individual net influences on the outcome variable(s).

Six outcome-based evaluation designs, as listed in Figure 4.2, are used commonly in impact analysis. The selection of a particular design is based on at least four factors. First, the evaluation design depends upon the question(s) asked and the program's evaluation capability. For example, one should not use an experimental/control design if one cannot assign participants randomly into different comparison conditions or use a comparable program with a different intervention or focus. Second, the evaluation design is influenced frequently by factors such as the developmental status of the program, the standardized data sets available, and the number of assumptions one is willing to make. Third, there is no such thing as the "best" evaluation design independent of the questions asked and the program's evaluation capability. And fourth, as shown in Figure 4.2, there is a direct relationship between outcome-based evaluation designs and the certainty, precision, comparability with other studies, and generalizability of the results. The general principle is that the closer one can come to a true experimental/control design, the more certain one is of the results, the more precise one is in maximizing internal and external validity, the fewer assumptions one needs to make about the comparability with similar studies, and the more one is able to generalize the results to similar populations. The potential downside is that for many education, health care, and social services programs the "true" experimental/control design is less feasible due to social, political, and cost factors.
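To make the regression-adjustment logic described above concrete, here is a minimal sketch (Python with statsmodels; the data, variable names, and effect sizes are all hypothetical) comparing a naive difference in group means with an ANCOVA-style estimate that controls for preassignment characteristics:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical evaluation data: a post-program outcome, a treatment indicator,
# and two preassignment covariates.
rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "baseline": rng.normal(50, 10, n),
    "age": rng.integers(18, 65, n),
})
df["outcome"] = 0.6 * df["baseline"] + 5 * df["treated"] + rng.normal(0, 8, n)

# Naive impact estimate: simple difference in group means.
naive = (df.loc[df.treated == 1, "outcome"].mean()
         - df.loc[df.treated == 0, "outcome"].mean())

# Regression-adjusted estimate: the coefficient on "treated" controls for the
# covariates, and the covariate coefficients show their net influence.
adjusted = smf.ols("outcome ~ treated + baseline + age", data=df).fit()

print(round(naive, 2), round(adjusted.params["treated"], 2))
```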
The three typical evaluation designs used to construct the comparison condition include hypothetical comparison group, matched pairs (cohorts), or experimental/control. Although these designs are more difficult to form than person as own comparison, pre/post change comparisons, or longitudinal status comparisons, their use results in greater precision, certainty, comparability, and generalizability of the evaluation's results. The methods of classical experiments generally provide the most accurate and statistically valid means of identifying a comparison group, since these methods randomly assign program applicants or participants to either experimental or control conditions. The advantage of this design is that if the number of persons assigned is moderately large, the analyst can be reasonably sure of the comparability of the two groups. The comparability of the groups in terms of unmeasurable characteristics is also important, since it is very difficult to control for the influence of such characteristics using statistical methods. Furthermore, results based on data generated from an experimental design tend to be stable with respect to change in the specific details of the estimation process. While experimental/control designs have been used for some education, health care, and social programs, they are not always feasible. For example, it may not be possible to conduct random assignment in an entitlement program that guarantees services to all members of a specific target group or to withhold medical care from some. However, if all applicants cannot be served, or if the intervention is a new one in which there is still doubt about its impact, then random assignment can be a fair way of deciding who should get the program services. Another alternative is to use matched pairs (cohorts), in which one member of each pair receives one type of intervention or rehabilitation service and the other member the other. When it is not feasible to use either the experimental/control or matched-pair design, then a third approach to identifying a comparison group is to use conjecture, in which a hypothetical comparison group is generated. By relying on a general knowledge about the average outcomes of nonparticipants or on a knowledge of preenrollment status, the analyst can estimate what would have happened to participants had they not enrolled in the program. Some researchers of supported work programs, for example, have estimated impacts under the assumption that had participants not enrolled in the program they would have continued in the activities they had prior to enrollment. This evaluation design clearly represents inexact estimation procedures, and therefore results in less precision, certainty, comparability, and generalizability.
discussed here due to the potential use of the other three (person as own comparison, pre/post change comparison, or longitudinal status comparison). Examples of each design are presented in Exhibits 4-1–4-6. Additionally, two examples of impact evaluation are discussed in more detail to outline the five steps that are typically involved in conducting an impact evaluation. The chapter concludes with a brief discussion of the strengths and limitations of impact evaluation.
Impact Evaluation Designs

In his book Designing Evaluations of Educational and Social Programs, Cronbach (1982) states:

Designing an evaluative investigation is an art. The design must be chosen afresh in each new undertaking, and the choices to be made are almost innumerable. Each feature of a design offers particular advantages and entails particular sacrifices. (p. 1)
Whether an art or a science, impact evaluation is challenging both practically and methodologically. Thus, in reading about the impact evaluation designs and procedures, please keep the six impact guidelines listed in Table 4.1 clearly in mind. Also remember that there is no such thing as the "best" evaluation design independent of the questions asked. Thus, ask yourself again
the two fundamental evaluation questions: For what purpose will I use the data, and what data will I need for the intended use?
Person as Own Comparison

I am sure that many readers have had the same experience that I have had regarding testifying before an appropriations committee, making a presentation at a conference, or talking with a group of stakeholders. We have all shared success stories. Person-as-own-comparison evaluation designs allow one to share individual success stories, and at the same time, to demonstrate the program's impact. But before you rush out and start doing this type of evaluation, keep in mind the following critical point: good single-subject research requires considerable skills at research methodology, since "controls" must be built into the design in order to demonstrate certainty and generalizability. Examples of single-subject evaluation designs can be found in Baker and Curbow (1991), Campbell (1992), Campbell and Stanley (1963), Cook and Campbell (1979), Hersen and Barlow (1984), and Kazdin and Tuma (1982). The essence of these designs is that you establish a baseline for the individual against which you then evaluate the effects of your intervention through one or more of the following person as own control designs:

Reversal (ABAB): Measure baseline (A); apply procedure (B); return to baseline (A); repeat procedure (B)

Multiple baseline across behaviors: Apply procedure to different behaviors one at a time with the same individual

Multiple baseline across situations: Apply procedure to behaviors across situations and at different times
Within clinical settings, however, frequently one cannot meet the more rigorous requirements of the reversal (ABAB) design. Therefore, one will most generally use the multiple-baseline design across behaviors and situations. Although this design is less precise and therefore limits one’s certainty, comparability, and generalizability, it does identify and describe promising intervention or treatment approaches.
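As an illustration of the reversal logic, the following sketch uses hypothetical single-subject data (not drawn from any study cited here) to summarize a behavior count by phase in an ABAB design; the expected pattern is a shift when the procedure is applied and a return toward baseline when it is withdrawn.

```python
import pandas as pd

# Hypothetical session-by-session behavior counts in a reversal (ABAB) design.
sessions = pd.DataFrame({
    "phase": ["A1"] * 5 + ["B1"] * 5 + ["A2"] * 5 + ["B2"] * 5,
    "count": [9, 8, 10, 9, 9,
              4, 3, 4, 2, 3,
              8, 9, 8, 8, 9,
              3, 2, 3, 3, 2],
})

# Phase means: baseline (A) phases should look alike, and intervention (B)
# phases should differ from them in the expected direction.
print(sessions.groupby("phase")["count"].mean())
```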
Pre/Post Change Comparisons

The requirement in the pre/post change evaluation design is that you have comparable measures on the individuals before intervention and sequentially thereafter. An example would be the employment status of service recipients after a job training program. This technique is used frequently when there is
no experimental or comparison group, and therefore it represents a low level of certainty in one's analysis. An example of organizational changes following an outcome-based staff training program is presented in Exhibit 4-1.

Exhibit 4-1 Example of Pre/Post Change Comparison

The study (VanGelder, Gold, & Schalock, 1996) involved evaluating organizational changes resulting from an outcomes-based staff training program. Administrators of each of the 33 participating programs completed on a pre/post basis the Enhancing Employment Opportunities Program Planning-Conversion Guide (Calkins et al., 1990). The nine critical agency change functions evaluated included: philosophy, program and resources, program practices, program evaluation, person/job match, employer expectations, systems interface, natural environment/supports, and quality of work life. The "pre" evaluation was done the first day of the training sessions, and the "post" evaluation, five months later. The evaluation was done by the same person (program manager) each time. The results of the evaluation are summarized below. The change ratio was computed by dividing the difference between the post- and pre-evaluation by the pre-evaluation. Note that significant changes occurred in the organization functions of philosophy, program evaluation, employer expectations, and [use of] natural environments/supports.

Changes in Overall Organizational Functions (table columns: Function, Average pre- and post-evaluation, Change ratio; one row for each of the nine functions listed above; *p < .05)
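The change ratio in Exhibit 4-1 is a simple computation. A minimal sketch with made-up ratings (the exhibit's actual values are not reproduced here):

```python
import pandas as pd

# Hypothetical pre/post ratings for three of the nine organizational functions;
# the numbers are illustrative only.
ratings = pd.DataFrame(
    {"pre": [2.1, 2.8, 2.4], "post": [3.0, 3.1, 3.3]},
    index=["Philosophy", "Program Practices", "Program Evaluation"])

# Change ratio as defined in the exhibit: (post - pre) / pre.
ratings["change_ratio"] = (ratings["post"] - ratings["pre"]) / ratings["pre"]
print(ratings)
```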
Longitudinal Status Comparisons

The longitudinal status comparison is a potentially good design since it allows one to look at change in service recipients over time and determine their living, work, educational, or health status at some point following program involvement. However, it is a relatively weak design in impact evaluation if there is no control or comparison group. Therefore, one is frequently limited in the degree of certainty, precision (internal validity), comparability, and generalizability (external validity). Longitudinal status comparisons require that: (1) multiple measures are taken at regular time intervals; (2) the person is followed for extended periods after program involvement; and (3) the measures taken before or during intervention are compared with those obtained during the follow-up periods. An example is shown in Exhibit 4-2.
Hypothetical Comparison Group

The hypothetical comparison group method requires the evaluator to form a hypothetical comparison group that can be used as a benchmark for comparison purposes. The hypothetical comparison group can be based on one's general knowledge of average outcomes from other, closely related programs; preenrollment status; archival data (for example, published indices such as Medicare costs); and national databases (see following chapter). By relying on a general knowledge of the average outcomes of nonparticipants or on a knowledge of preenrollment status, the analyst may estimate what would have happened to participants had they not enrolled in the program or had been involved in a comparable program. An example of a hypothetical comparison group is presented in Exhibit 4-3. Note the careful detail given to the assumptions made in forming the hypothetical comparison group and the potential impact of these assumptions on the evaluation's certainty, precision, comparability, and generalizability.
Matched Pairs (Cohorts)

There are two techniques that are used frequently in equating participants in an evaluation study: randomized placement into two or more conditions, or matching participants and creating matched pairs prior to random assignment (Fairweather & Davidson, 1986).
Exhibit 4-2 Example of Longitudinal Status Comparison

The purpose of the study (Schalock & Harper, 1978) was to determine (1) the current status nine years after deinstitutionalization of a group of 166 adults with mental retardation who had been placed into a community-based program for persons with mental retardation; and (2) the significant predictors of both successful community placement and program progression.

Outcome Variables (No. of Persons)

Community Successful
4 = exited or remained in program (10 or fewer incident reports): 117
3 = exited or remained in program (10 or more incident reports): 23
Community Nonsuccessful
2 = admitted to a mental health facility: 12
1 = returned to state mental retardation facility: 14
Program Successful
4 = exited (placed in independent/assistance housing and work): 55
3 = exited (placed in independent/assistance housing or work): 31
Program Nonsuccessful
2 = progressed through one or more training components (did not exit): 29
1 = no training program progression (within 2 years): 51

Predictor Variables

Community Success: Family involvement, work skills, institution size, visual processing, sensorimotor skills, gender, family attended IPP, social-emotional behavior, community size.
Program Success: Language skills, sensorimotor skills, tested intelligence, previous education, family attendance at IPP, institution size, visual processing, community size.
If equated on relevant variables, then the matched pairs theoretically provide a relatively equivalent representative sample from the targeted population who can then be randomly assigned to the two (or more) programs being compared. When such is done, the design is frequently referred to as a matched group design. Matched pairs (cohorts) is a good outcome-based evaluation design to use if (1) you have individuals in two different programs or in significantly different program foci; (2) you have comparable person- or program-referenced outcomes; and (3) you can match the individuals, groups, or programs on a number of recipient characteristics that are logically and statistically related to the outcome measure(s). The latter criterion is very important, since you want to be able to say that these individuals were essentially the same prior to the intervention. Common recipient characteristics on which individuals generally are matched include age, gender, functional level, educational status, intellectual level, risk level, diagnosis, prognosis, or the individual's time in the program. The reader will find four examples of the matched pairs (cohorts) design in Exhibits 4-4 and 4-5. Exhibit 4-4 describes the matching procedures used in three different studies dealing with therapy comparisons, training environments, and skill acquisition. Exhibit 4-5 presents a more detailed description of matching procedures in the evaluation of a staff training program (VanGelder, Gold, & Schalock, 1996).
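A simple way to build such pairs is greedy nearest-neighbor matching on the relevant characteristics. The sketch below is a minimal, hypothetical illustration (made-up data and matching variables), not the procedure used in any of the studies in Exhibits 4-4 and 4-5:

```python
import numpy as np
import pandas as pd

# Hypothetical recipients in two programs, with two matching variables.
rng = np.random.default_rng(3)
program_a = pd.DataFrame({"age": rng.integers(20, 60, 10),
                          "functional_level": rng.normal(50, 10, 10)})
program_b = pd.DataFrame({"age": rng.integers(20, 60, 30),
                          "functional_level": rng.normal(50, 10, 30)})

# Greedy nearest-neighbor matching: for each person in program A, take the
# closest still-unmatched person in program B (in practice the matching
# variables would first be standardized so each carries comparable weight).
available = program_b.copy()
pairs = []
for a_idx, person in program_a.iterrows():
    distance = np.sqrt((available["age"] - person["age"]) ** 2
                       + (available["functional_level"] - person["functional_level"]) ** 2)
    b_idx = distance.idxmin()
    pairs.append((a_idx, b_idx))
    available = available.drop(b_idx)   # each comparison person is used once

print(pairs)
```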
Experimental/Control

This is the "true" evaluation design and the one that most outcome-based evaluators strive for. It is the best evaluation design, since it controls for both internal and external validity. Before considering an example, let's review some of the key points about internal and external validity.

Internal Validity
This type of validity focuses on the certainty as to whether the intervention produced the effects or as Cohen (1987) states, “the extent to which the effect of the treatment variables can be disentangled from the correlated extraneous variables” (p. 98). In order to demonstrate internal validity, one must demonstrate successfully that the following “threats” to internal validity have been controlled: selection bias, selection maturation, history, instrumentation, and an effect referred to as “regression towards the mean,” which involves extreme scores on the first test or observation tending to shift toward the middle (mean) on subsequent tests or observations (Cook & Campbell, 1979).
Exhibit 4-3 Example of Hypothetical Comparison Group

This study (Rusch, Conley, & McCaughlin, 1993) analyzed the benefits and costs of supported employment programs in Illinois from 1987 to 1990. Benefits and costs were identified and valued from three perspectives: society's, taxpayer's, and supported employees'. Calculating costs and benefits involved, in part, the use of a hypothetical comparison group, as seen in each of the three analytical perspectives.

1. Social perspective. Social benefits were measured by the increase in earnings of supported employees over what they would have earned in an alternate program and the costs that would have been incurred if placed in an alternate program. A key assumption was that all participants would have been placed in an alternate program if not engaged in supported employment. One justification for this assumption was that almost all supported employment participants were selected out of alternative programs. Benefits and costs were measured over each of the four years that the 30 programs were in operation as well as the combined four-year period. To estimate increased earnings, the first step was to calculate the gross earnings of participants. The second step was to estimate the earnings that participants would have had in the absence of the supported employment program. It was assumed that participants would have had the same earnings as participants in the alternative-placement program from which the supported employment participants were selected. Average earnings currently reported by all alternative placement programs were used to estimate the probable alternative placement earnings for those few individuals who entered supported employment directly from school or without a reported previous placement. To estimate the savings in the costs of alternative placements, it was assumed that each supported employment participant would have incurred the same average costs as current clients in their previous placement. The costs were estimated from placements to these organizations. In those few cases involving participants who entered supported employment directly from school or without a reported previous placement, the
average costs of clients in all alternative placement programs combined were used to estimate what would have been experienced by these participants. The costs of supported employment programs were estimated by adding all reported payments by state and local agencies to the 30 providers and the tax savings to employers who made use of the Target Jobs Tax Credits program for participants involved in these programs.

2. Taxpayer perspective. Taxpayer benefits were measured as the total income and payroll taxes paid by supported employees, reductions in public support, and savings in expenditures for alternative programs. All supported employment costs were borne by taxpayers. Benefits from the taxpayer's perspective included taxes withheld, reduction in government subsidies, and savings in operational expenditures for alternate programs. Taxpayer costs were the same as those from society's perspective since participants did not incur costs.

3. Supported employee perspective. Total social benefits were divided between participants and taxpayers. The benefits to participants were estimated by subtracting taxes withheld and decreases in income support from public agencies from the estimated increased earnings of supported employees. Tax payments were estimated from the federal and state income taxes withheld, as well as FICA (Social Security), as reported monthly. For most of the 729 individuals, this was the first time in their lives that taxes had been withheld; no taxes had been deducted in their previous program placement. Savings in government subsidies were estimated by summing decreases in Supplemental Security Income (SSI), Social Security Disability Insurance (SSDI), Public Aid (PA), and Aid to Dependent Children (ADC). Reductions were calculated for each individual by comparing monthly benefits received before entering supported employment with those received while in the program. These reductions may be understated because they are based on the amount participants received before entering supported employment and were not adjusted for cost-of-living increases. In addition, some participants received government subsidies only after entering supported employment.

Adapted with permission from Rusch et al. (1993).

External Validity

This type of validity indicates the extent to which generalizations can be made to other programs or conditions and depends upon service recipients being selected randomly for program inclusion. If so, sampling theory suggests that the randomness of the selection process should result in the groups being
very similar to one another before the intervention. If that is true, then one can generalize the evaluation results to similar interventions or services. Because of its ability to control internal and external validity, the experimental/control design has significant advantages over all other evaluation designs.
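The core of the design is random assignment followed by a between-group test on the outcome. A minimal sketch (hypothetical applicants and scores, not from any study cited here):

```python
import numpy as np
from scipy import stats

# Hypothetical pool of 120 program applicants, randomly assigned to an
# experimental (program) group and a control (usual-services) group.
rng = np.random.default_rng(4)
applicant_ids = np.arange(120)
rng.shuffle(applicant_ids)
experimental_ids, control_ids = applicant_ids[:60], applicant_ids[60:]

# Hypothetical post-program outcome scores for each group.
experimental_scores = rng.normal(72, 10, size=60)
control_scores = rng.normal(65, 10, size=60)

# Because assignment was random, a simple between-group test is a defensible
# estimate of program impact.
t_stat, p_value = stats.ttest_ind(experimental_scores, control_scores)
print(round(t_stat, 2), round(p_value, 4))
```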
Exhibit 4-4 Examples of Matched Pairs

Example 1: "Reinforcement versus Relationship Therapy for Schizophrenics" (Marks, Sonoda, & Schalock, 1968).

Study's purpose: To evaluate the effects of reinforcement versus relationship therapy on a group of 22 persons with chronic schizophrenia.

Matching procedure: Before the study began, ward staff rated each person on the Hospital Adjustment Scale (HAS). On the basis of the total HAS adjustment scores, 11 matched pairs were selected. One member of each pair was placed randomly in the group which started with reinforcement therapy and the other member of the pair was assigned to the relationship group. Each person received during the course of the study both therapies (10–13 weeks on each), and both members of the pair had an equal number of interviews.

Example 2: "Effects of Different Training Environments on the Acquisition of Community Living Skills" (Schalock et al., 1984).

Study's purpose: To compare behavioral skill acquisition rates of 10 matched pairs of persons with mental retardation who received individualized, prescriptive programming for one year in one of two training environments: their own apartments or a group home.

Matching procedures: Persons were matched on age, tested intelligence, duration of community living skills training, total skills on a standardized community living skills screening test, medication history, and the number of recorded negative behavior incidents.

Example 3: "Skill Acquisition among Matched Samples of Institutionalized and Community-Based Persons with Mental Retardation" (Eastwood & Fisher, 1988).

Study's purpose: To test the hypothesis that community placement would have a positive effect on client skill acquisition among matched institutional and community samples.

Matching procedure: Community residents were matched on seven variables (age, gender, level of mental retardation, secondary disability, self-preservation, mobility, and visual impairment) based on four criteria: (1) appeared to be operative in the process of selecting persons for community placement; (2) correlated significantly with outcome measures used in the evaluation of the placement program; (3) had been shown to be related to similar outcome measures in earlier studies; and (4) was considered to affect the relocated residents' rate of adjustment to life in the community.
Exhibit 4-5 Evaluation of an Outcomes-Based Staff Training Program

Study's purpose: Most training evaluations use four potential levels of outcomes (Kirkpatrick, 1967): (1) reactive measures that indicate the participant's liking or feeling for the training; (2) learning measures that test retention of training material and indicate the extent to which new ability or knowledge is acquired; (3) behavioral measures, such as performance evaluations, that indicate the extent to which the training transfers to the job; and (4) results measures that show whether broad organizational goals are achieved. The purpose of this study (VanGelder, Gold, & Schalock, 1996) was to evaluate each outcome level, using a matched pairs design for evaluating outcomes.

Reactive measures: Each participant completed a questionnaire at the conclusion of the workshop asking for an evaluation of the materials and techniques. The questionnaire's items addressed the issues of organization and presentation of the materials, helpfulness of ideas and information presented, usefulness of audiovisual materials and handouts, and the utility of the information presented.

Learning measures: Each participant completed a questionnaire to assess their familiarity and knowledge about seven topics covered in the workshop. The questionnaire was given before and immediately after the workshop. Each of the 34 items was scored on a four-point Likert scale.

Behavioral measures: This part of the evaluation involved comparing on-the-job performance of the workshop participants to matched controls who had not attended the workshop. Behaviors evaluated reflected the logical outcomes from the topics covered in the workshop.

Results measures: Administrators of each of the participating programs completed on a pre/post basis the Enhancing Employment Opportunities Program Planning and Conversion Guide (Calkins et al., 1990). The nine critical agency-change functions evaluated included: philosophy, program and resources, program practices, program evaluation, person/job match, employer expectations, systems interface, natural environments/supports, and quality of work life. The "pre" evaluation was done the first day of the workshop and the "post" five months later. The evaluation was done by the same person both times.

Results:
1. Learner reactions were positive.
2. Program managers indicated a significant increase in knowledge in 26 of the 34 subareas surveyed.
3. Statistically significant impacts related to the quality of staff performance rather than the presence of specific skills.
4. Statistically significant organizational changes occurred in philosophy, program evaluation, natural environments, and employer expectations.
However, it is not without potential problems, including:

It is feasible only when the demand for program services exceeds available resources. If this is the case, then some potential service recipients can be put on a waiting list and thereby serve as "controls."

It is risky if the service recipients are known to the "controls." This situation might lead to what Cook and Campbell (1979) call "resentful demoralization" among control group members, with potentially significant diminished outcomes.

It has the side effect of dropouts. Ideally, experimental/control group size should be equivalent; thus one is faced with how to handle the data from the dropouts. One might unknowingly suggest that all you have to do is replace the ones who drop out with other persons. Unfortunately, this simple solution causes considerable problems regarding the effects of maturation, history, and program duration. My suggestion is that if some recipients do drop out, conduct an attrition analysis (see Chapter 8) to determine whether those who drop out (or on whom one cannot obtain longitudinal follow-up data) are significantly different from those remaining.

Assuming that the above problems are either nonexistent or can be resolved, and that the program's context allows the formation of experimental/control comparison conditions, then the experimental/control evaluation design is excellent to use in impact evaluation. An example is found in Exhibit 4-6.

Two variants of the experimental/control design that are used frequently in impact evaluation are: (1) the nonequivalent control group design; and (2) time series designs. The nonequivalent control group design includes a time series component (that is, baseline, then treatment/intervention), along with a "control group" that is not exposed to the treatment. The essence of the nonequivalent control group design is that a comparable group of subjects is chosen and observed for the same period as the group for which the treatment or intervention is introduced. The control group is nonequivalent because it comes from a similar program, intervention, or service. A study reported by Fiedler and his colleagues (1984) illustrates such a design. The evaluation investigated the impact of a training program on productivity and safety in the silver mining industry. Records of mine safety were examined for three years prior to the introduction of an organization development program, which developed procedures and promoted mine safety workers to the rank of supervisor. Also, a financial bonus was developed to be distributed to the members of mining shifts with the best safety records. Data were then collected postintervention. A second mine was included as a nonequivalent control group. Data were collected on mine safety at this mine for the same period as with the first.
Exhibit 4-6
Example of Experimental/Control Design

Ramey and Landesman-Ramey (1992) used an experimental/control design in two separate studies to evaluate the effects of early educational intervention programs designed to prevent mental retardation and to improve school readiness.

Study 1 (Project CARE): This study involved two experimental treatments and one control group. The study was designed to study home-based early intervention (wherein mothers learned more about how to provide good developmental stimulation for their infants and toddlers) versus center-based early intervention (daily early education intervention) when compared to a control group. The procedure involved randomly assigning all children to either one of the two treatment conditions or the control group. Children in the control group received free health and social services.

Study 2 (Infant Health and Development Program): This study involved one experimental and one control group. The target group was infants born prematurely and at low birth weight. The procedure involved assigning children and families randomly to receive either home- or center-based early intervention (birth to age three) or control services (additional medical and social services that the families ordinarily would not have received).

Adapted with permission from Ramey and Landesman-Ramey (1992).
The difference between the two mines was the organization development program. Initial results indicated clearly that the frequency of mine accidents at the first mine was consistently above the frequency for the second mine before the training program; after the intervention, the two mines reversed positions.

Although the nonequivalent control group design allows one to make comparisons one might not ordinarily be able to make, there are two potential weaknesses to the design. First, the validity of the design is compromised if the two groups differ on important variables before the evaluation begins. Second, if either group is selected on the basis of extreme scores on the pretest or condition, then any shift of scores from pretest to posttest toward less extreme values may be due to regression toward the mean rather than to the effect of the treatment or intervention.

The second variant of the experimental/control design is referred to collectively as time series designs or “quasi-experimental designs.” Two types of
such designs have been used in effectiveness evaluation: time series and interrupted time series (Campbell & Stanley, 1963). In time series designs, one makes several observations of behavior over time prior to the intervention and again immediately afterward. For example, one might measure children's school performance on a weekly basis and then introduce a new teaching program. Following the introduction of the new program, one would again measure performance on a weekly basis. A comparison is then made between preintervention and postintervention performance. In the interrupted time series design, one charts changes in behavior as a function of some naturally occurring event (such as the introduction of a new law or program) rather than manipulating an independent variable; the naturally occurring event serves as a quasi-independent variable. As with the first time series design, one makes comparisons of behavior prior to and after the participants are exposed to the naturally occurring event.

One advantage of quasi-experimental designs is that they allow one to evaluate the impact of a quasi-independent variable under naturally occurring conditions. Whether one manipulates the independent variable or simply takes advantage of a naturally occurring event, one may be able to establish clear, causal relationships among variables. However, quasi-experimental evaluation designs do have weaknesses, the most important of which is the lack of control over the variables influencing the outcomes, which weakens internal validity.
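To make the time series logic concrete, the following sketch (written in Python, using entirely hypothetical weekly scores rather than data from any study cited here) fits a simple segmented regression to observations made before and after a naturally occurring event. The coefficients estimate the preexisting trend, the immediate change in level at the event, and the change in trend afterward, which is one common way interrupted time series data are summarized.

    import numpy as np

    # Hypothetical weekly performance scores: 10 observations before and
    # 10 observations after a naturally occurring event (e.g., a new program).
    pre = [61, 63, 62, 64, 63, 65, 64, 66, 65, 66]
    post = [70, 71, 73, 72, 74, 75, 74, 76, 77, 78]
    y = np.array(pre + post, dtype=float)

    weeks = np.arange(len(y), dtype=float)          # time, in weeks
    event = (weeks >= len(pre)).astype(float)       # 0 before the event, 1 after
    time_after = np.where(event == 1, weeks - len(pre), 0.0)

    # Segmented regression: intercept + trend + level change + trend change.
    X = np.column_stack([np.ones_like(weeks), weeks, event, time_after])
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    intercept, trend, level_change, trend_change = coef

    print(f"Preintervention trend per week: {trend:.2f}")
    print(f"Immediate level change at the event: {level_change:.2f}")
    print(f"Change in weekly trend after the event: {trend_change:.2f}")

The same comparison of preintervention and postintervention observations applies whether the event is a deliberately introduced program or a naturally occurring one; what differs is the degree of control over competing explanations.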
In summary, impact evaluation involves comparison conditions. One or more of the six outcome-based evaluation designs just summarized is used to provide these comparison conditions. Each of these evaluation designs has strengths and weaknesses. Those that are the easiest to do, such as person-as-own comparison or pre/post change comparison evaluations, are also the weakest in terms of certainty, precision, comparability, and generalizability. Those that are the hardest to form, such as matched pairs (cohorts) or experimental/control, make the fewest assumptions about comparability and have the highest certainty, precision, and generalizability. Thus, the evaluator's dilemma is always balancing the need for certainty, precision, comparability, and generalizability against the availability of valid comparison groups or conditions. We will see how this dilemma plays out in two impact evaluation examples discussed in a following section of the chapter. But first we will focus on the steps involved in impact evaluation.

Steps Involved in Impact Evaluation

The five steps involved in conducting an impact analysis are summarized in Table 4.2. They begin by stating the purpose of the evaluation and the questions asked,
continuing with establishing and describing the comparison conditions, describing the core data sets and their measurement, summarizing the major results, and discussing these results and their implications. As with the previous chapter, a case study approach will be used to discuss the critical components and process steps involved in impact evaluation. Two studies will be reviewed: one reasonably simple and the other much more involved. These studies were selected on the basis of meeting the following six criteria: (1) the definition and approach to impact evaluation is consistent with that presented in the text; (2) there is a clearly described evaluation design; (3) there are clear operational definitions of the core data sets; (4) person-referenced outcomes were used that meet the five criteria of being valued by the person, multidimensional, objective and measurable, connected logically to the program, and evaluated longitudinally; (5) data analysis reflects an elaboration on the statistical analyses discussed in Chapter 8; and (6) the evaluation was published in an edited journal or book.
Study 1: The Impact of Different Training Environments

Purpose/Questions Asked

The purpose of this evaluation (“Effects of Different Training Environments on the Acquisition of Community Living Skills”; Schalock, Gadwood, & Perry, 1984) was to determine the impact on behavioral skill acquisition rates of 10 matched pairs of persons with mental retardation who received individualized, prescriptive programming for one year in one of two training environments: their own apartments or a group home.
Comparison Condition

Ten matched pairs were selected from among the 40 clients within two community living skills training programs. These two training programs had equivalent programmatic philosophy, assessment techniques, prescriptive programming techniques, staff-client ratios, and staff competencies. Ten community living skills instructors with BA degrees in social science were involved. Five provided training in staffed apartments; five within the center-based program. All staff had demonstrated 22 prescriptive programming competencies at the end of their initial two-day in-service training, and again on a competency probe conducted just before the study. The 14 program-writing competencies involved writing measurable behavioral objectives, demonstrating content and process task analyses, utilizing appropriate reinforcement and correction procedures, and specifying client data to be recorded. The eight program-conducting competencies involved following the prescribed strategies as outlined on the written program sheet, actively prompting and correcting if necessary, and recording training data during the training session. Each staff member was rated (five-point Likert scale) on each competency during the competency probe.

Clients were matched on gender, age, IQ (Wechsler Full Scale), duration of prior community living skills training, skill level on the Community Living Skills Screening Test, medication history, and number of recorded negative behavior incidents. Clients and staff within the two training programs were essentially equivalent before the analysis began.
Core Data Sets and Their Measurement

Recipient Characteristics
The evaluation involved 20 adult clients currently enrolled in the two training programs described above. The 10 females and 10 males averaged 31 years of age with an average IQ (Wechsler Full Scale) of 51, and had been enrolled in the community living training program on average for 23 months. Each client was assessed independently by two instructional staff prior to the study on the criterion-referenced Community Living Skills Screening Test. Across the 10 behavioral domains assessed by the test, interobserver reliability coefficients averaged .85, generalizability coefficients .84, and test-retest reliabilities .87.

Core Service Functions
Clients lived in either a group home or a staffed apartment during the one-year study. A staffed apartment was a residential quadplex with three clients
and one part-time staff member who provided general supervision in the evenings and on weekends. Group homes were renovated family homes that provided room, board, and live-in staff supervision for 8 to 10 clients. They were not used for systematic training.

During the study, training occurred within either a current-living or a center-based training environment. Training in the current-living environment involved one-to-one instruction by community living instructors in the client's individual apartment within the staffed apartment quadplex; center-based training occurred in a large group home adjacent to the adult developmental center that provided the facilities for one-to-one instruction in the same programmatic areas.

Data from the criterion-referenced Community Living Skills Screening Test were used to develop the client's annual Individual Program Plan (IPP). The IPP specified the behavioral skills within each of the behavioral domains on which the person was to receive prescriptive programming the next year. These prescriptive programs were carried out within the respective environment (current-living or center-based) for one year. The pass criterion for each prescriptive program was that stated in the criterion-referenced test. Training in both settings averaged six hours per day. There was a 20% turnover in instructional staff during the year. All replacement staff received in-service training and demonstrated the required prescriptive program-writing and program-conducting competencies before their involvement with the clients.

Cost Estimates
Cost estimates were made on the basis of the number of training units received by the participants. The costs were the same for all participants regardless of the training location.

Person-Referenced Outcomes
The major outcome variable was the number of behavioral skills acquired that met the pass criterion. Skill maintenance was evaluated by reassessing each person on the Community Living Skills Screening Test one year after the beginning of the analysis period.
Results

Persons receiving training in their staffed apartments gained significantly more (p < .05 in all comparisons) community living skills in 7 of the 10 skill training areas than the center-based group. These differences are summarized in Exhibit 4-7. The “estimated impact” is expressed as the difference between the
two groups’ average scores on each behavioral domain. A significant impact (that is, a statistically significant difference between the two means) is denoted by an asterisk (*).
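A minimal sketch of the computation behind such a table follows; the skill-gain values are hypothetical and are not the study's data. It simply shows how an estimated impact (the difference between the two group means) and its significance under a matched-pairs t-test would be obtained.

    import numpy as np
    from scipy import stats

    # Hypothetical total numbers of skills gained by the members of each of
    # 10 matched pairs (apartment-trained member vs. center-trained member).
    apartment_gains = np.array([14, 12, 15, 11, 13, 16, 12, 14, 13, 15])
    center_gains = np.array([5, 4, 6, 3, 5, 4, 6, 4, 5, 3])

    estimated_impact = apartment_gains.mean() - center_gains.mean()
    t_stat, p_value = stats.ttest_rel(apartment_gains, center_gains)

    print(f"Estimated impact (difference of means): {estimated_impact:.1f} skills")
    print(f"Matched-pairs t = {t_stat:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print("The difference would be flagged with an asterisk (p < .05).")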
Discussion of Results and Their Implications

This impact evaluation found that significantly more community living skills were acquired when skill training occurred in the client's living environment than in a center-based training facility; clients acquired more than three times as many behavioral skills when instructed in their own apartments. Equally important, the skills were maintained, as reflected in skill profile stability. These findings demonstrate that transfer of stimulus control is increased when an intervention brings appropriate responses under the control of training stimuli similar to those found in the transfer setting. With training in the natural environment, skill generalization is enhanced, in part because the training stimulus conditions are the same as those encountered in the natural environment.

There are at least two methodological problems with this impact evaluation. One relates to using gain scores as the major outcome variable. Potential weaknesses include the presence of artifacts, possible regression to the mean, or a potential plateau effect. Attempts to overcome these weaknesses were made by using multiple outcomes (skill acquisition and stability), independent evaluators, and an assessment instrument with demonstrated reliability and predictive validity. The second problem relates to the lack of random assignment to the two training environments, which raises the possibility of regression toward different means, thereby destroying any representativeness of the original group.

The analysis represents a common circumstance in impact evaluation in which the analyst is not in a position to implement a truly randomized design. Governmental regulations and political, community, financial, and practical considerations all affect decision making concerning sample selection. Under such circumstances, what are the choices? They appear to be either to make no attempt at empirical evaluation or to find a method that, without randomization, provides a reasonably sound basis for inference about effects. In the evaluation just described, a three-step multivariate matching technique was used, including: (1) identifying distinct subgroups within the larger group that consisted of individuals with similar responses to particular sets of stimuli; (2) matching service recipients on a pairwise basis by an interdisciplinary team; and (3) applying a variety of tests to the matched pairs to demonstrate equivalence of the matching.
Exhibit 4-7
Results of Impact Analysis of Different Training Environments
(Number of Community Living Skills Gained)

                                         Training Environment Means
Behavioral domain                        Apartment   Center   Estimated impact
Personal maintenance                        2.4        0.7         1.7*
Dressing/Clothing care                      1.6        0.5         1.1*
Eating/Food management                      1.4        0.3         1.1*
Social behavior                             0.5        0.4         0.1
Expressive skills                           0.7        0.8         0.1
Home living                                 2.2        0.5         1.7*
Money skills                                0.6        0.2         0.4
Time awareness and utilization              1.9        0.3         1.6*
Recreation and leisure skills               0.6        0.1         0.5*
Community awareness and utilization         1.5        0.7         0.8*
Total                                      13.4        4.5         8.9*

*p < .05 (matched pairs sample t-test)
At this point, the same selection procedure is completed, and from then on the assumption is that it has produced equivalent experimental and control groups; and the testing for impact of an intervention proceeds along the same lines as it would in studies in which the sample members had been randomly assigned to the experimental and control groups. (Sherwood, Morris, & Sherwood, 1975, p. 195)
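The pairwise matching step described above can be illustrated mechanically. The sketch below uses invented client records and a greedy nearest-neighbor rule on standardized covariates; the actual evaluation relied on an interdisciplinary team and additional equivalence tests, so this is only a simplified illustration of the idea.

    import numpy as np

    # Hypothetical covariates (age, IQ, months of prior training) for clients
    # in the two training programs; each row is one client.
    program_a = np.array([[29, 52, 20], [34, 48, 26], [31, 55, 22]], dtype=float)
    program_b = np.array([[30, 50, 24], [33, 47, 25], [36, 58, 30], [28, 54, 21]], dtype=float)

    # Standardize each covariate so no single scale dominates the distance.
    pooled = np.vstack([program_a, program_b])
    mean, std = pooled.mean(axis=0), pooled.std(axis=0)
    za = (program_a - mean) / std
    zb = (program_b - mean) / std

    # Greedy nearest-neighbor matching without replacement.
    available = list(range(len(zb)))
    pairs = []
    for i, person in enumerate(za):
        distances = [np.linalg.norm(person - zb[j]) for j in available]
        best = available.pop(int(np.argmin(distances)))
        pairs.append((i, best))

    print("Matched pairs (program A index, program B index):", pairs)

Once pairs are formed, the equivalence checks amount to comparing the matched groups on each covariate, much as the clients and staff in the two training programs were shown to be essentially equivalent before the analysis began.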
Study 2: The Impact of Transitional Employment Programs

Purpose/Questions Asked

The purpose of this evaluation (“Impacts of Transitional Employment for Mentally Retarded Young Adults: Results of the Short Term Employment Training Strategies (STETS) Demonstration”; Kerachsky et al., 1985) was to evaluate the impact of transitional employment programs in enhancing the economic and social independence of young adults with mental retardation. The STETS analysis was designed to address five basic questions: (1) does STETS improve the labor-market performance of participants; (2) does STETS participation help individuals lead more normal lifestyles; (3) in what ways do
the characteristics and experiences of participants influence the effectiveness of STETS; (4) does STETS affect the use of alternative programs by participants; and (5) do the benefits of STETS exceed the costs?
Comparison Condition

An experimental/control design was used in which individuals were assigned randomly into STETS/non-STETS groups. Eligibility criteria were established for two purposes: to limit program participation to those who could benefit potentially from program services and to encourage projects to recruit and enroll a broad range of clients in order to provide an adequate information base for examining the suitability of STETS for a diverse population. Each client met the following criteria: (1) between 18 and 24 years of age, inclusive; (2) mental retardation in the moderate, mild, or lower borderline ranges; (3) no unsubsidized full-time employment of six or more months in the two years preceding intake, and no unsubsidized employment of more than 10 hours per week at the time of intake into the program; and (4) no secondary disability that would make on-the-job training for competitive employment impractical.
Core Data Sets and Their Measurement

Recipient Characteristics
The sample consisted of 437 individuals: 226 experimentals and 211 controls. Fifty-nine percent were male, and 50% were members of minority ethnic/social groups. Sixty percent of the measured IQs were in the mild range and 12% in the moderate range of mental retardation. Eighty percent lived with parents and 10% in supervised settings, and fewer than 30% could manage their own finances. Two-thirds were using some form of public assistance, prior vocational experiences were limited primarily to workshops and activity centers, and one-third had no work experience in the two years prior to enrollment.

Core Service Functions
STETS involved three sequential phases. Phase I involved assessment and work-readiness training. This phase combined training and support services in a low-stress environment, the goal of which was to help participants begin to develop the basic work habits, skills, and attitudes necessary for placement into more demanding work settings. This preliminary stage, which was limited to 500 hours of paid employment, occurred in either a sheltered workshop or nonsheltered work setting; in all cases, the participants' wages were paid by the project.
Phase II involved a period of on-the-job training in local firms and agencies. During this stage, participants were placed in nonsheltered positions that required at least 30 hours of work per week, and in which, over time, the levels of stress and responsibility were to approach those found in competitive jobs. Wages were paid by either the project or the employers or some combination of the two. The STETS program provided workers in Phase II with counseling and other support services, and it helped the line supervisors at the host company conduct the training and necessary monitoring activities.

Phase III, which included postplacement support services, began after participants had completed Phase II training and were performing their jobs independently. The purpose of this phase of program services was to ensure an orderly transition to work by tracking the progress of participants, by providing up to six months of postplacement support services, and, if necessary, by developing linkages with other local service agencies.

Cost Estimates
The cost accounting framework disaggregated costs into three components: the operating costs of the projects, compensation paid to participants while they were in Phase I or II activities, and central administrative costs. The STETS service package cost an average of $6,200 per participant.
Person-Referenced Outcomes
The STETS impact evaluation focused on three general outcome categories and 11 specific variables: (1) employment (percentage employed in a regular job or any paid job; average weekly earnings in a regular job; and average weekly earnings in any paid job); (2) training and schooling (percentage in any training or any schooling); and (3) income sources (percentage receiving SSI or SSDI; average monthly income from SSI or SSDI; percentage receiving any cash transfers; average monthly income from cash transfers; and average weekly personal income). These person-referenced outcomes were collected at months 6, 15, and 22 of the project.
Results

The estimated program impacts on the key outcome measures are summarized in Exhibit 4-8. Note that the significance of the impacts (denoted by asterisks in the table) depends upon the evaluation period. Employment in regular jobs was significantly greater for experimental group members than
for control group members, and by month 22, experimentals were on average 62% more likely than controls to be employed in a regular job. A significant increase in average weekly earnings was also seen in the experimental group, as was a significant decrease in the percentage of experimental group members in training.
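For a categorical outcome such as “employed in a regular job,” the estimated impact at a given follow-up point is the difference between the two groups' employment rates, and its significance can be checked with a two-proportion z-test, as in the sketch below. The counts are purely illustrative and are not the STETS figures.

    from math import sqrt
    from scipy.stats import norm

    # Hypothetical counts of persons employed in a regular job at one follow-up.
    employed_exp, n_exp = 70, 220    # experimental group
    employed_ctl, n_ctl = 45, 210    # control group

    p_exp = employed_exp / n_exp
    p_ctl = employed_ctl / n_ctl
    impact = p_exp - p_ctl           # estimated impact on the employment rate

    # Two-proportion z-test using the pooled proportion.
    p_pool = (employed_exp + employed_ctl) / (n_exp + n_ctl)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_exp + 1 / n_ctl))
    z = impact / se
    p_value = 2 * norm.sf(abs(z))    # two-sided p-value

    print(f"Employment rates: experimental {p_exp:.1%}, control {p_ctl:.1%}")
    print(f"Estimated impact: {impact:.1%} (z = {z:.2f}, p = {p_value:.4f})")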
Discussion of Results and Their Implications

An interesting finding of the evaluation was that the significant impacts related to school and some transfer uses disappeared by the end of the demonstration period, which reinforces the need to collect data longitudinally. Because of the significant impacts on the incidence of job holding, hours worked, and earnings, it might be expected that the STETS demonstration would also have impacts on other areas of participants' lives, especially their overall economic status, their independence in financial management and living arrangement, their use of formal and informal services, and their general level of involvement in regular, productive activities.

The expected direction and duration of the effects of the program on economic status, measures of independence, and lifestyle were not always clear. Several factors in particular clouded the results of the evaluation of these impacts. First, the increased earnings observed for experimentals appeared to be partially offset by decreases in transfer benefits and other sources of income, thereby diluting the overall financial impacts of the program. Second, although STETS may have had impacts on financial management skills and independent living arrangements, those impacts may have followed others with a considerable time delay, in which case the 22-month observation period was too short to observe them. Third, although the program generated increased earnings for sample members, those increases might not have been enough to enable them to live independent lifestyles, especially in larger cities. Finally, parents and counselors might have wished to see more concrete and stable earnings gains before they were willing to give the sample members greater independence.

Despite these limitations in the ability to detect long-term effects, some relatively small intermediate program effects were observed on measures of independence, overall economic status, services received from community agencies, and involvement in activities oriented toward employment. However, these effects generally declined to a great extent in the postprogram period, seemingly due to two factors. First, in the later observation periods, either the direct effects of STETS participation on such outcomes as total income, service utilization, and level of inactivity were no longer evident; or, where they were evident (as with personal income), the estimated effects were not statistically significant. Second, while the STETS experience provided a
head start toward independence for many sample members, those who did not participate in the program (controls) also began to achieve similar levels soon afterward. Additionally, although certain subgroups (for example, those with a moderate level of retardation) did seem to continue to benefit from the program, even those who were more likely to achieve and maintain positive effects from their experience in STETS exhibited relatively low levels of independence by the end of the observation period. Thus, given the short postprogram period for which data were available, one cannot tell whether a more economically and socially independent lifestyle would eventually be achieved by the participants, or whether the effects of participating in a transitional employment program would become more evident at a later period. This last statement reflects the reality of a frequent experience in even a large, well-funded impact evaluation such as STETS: despite rigorous experimental methodology, there can still be uncertainty in the impact measures obtained. As is frequently the case in such evaluations, the comparison between experimental and control participants is weakened because some members of the control group obtained assistance in finding employment from other sources. Consequently, the difference in outcomes between the experimental and control groups probably underestimates the full impact of the STETS demonstration on earnings.
Summary

In summary, impact evaluation focuses on a program's impacts and is used to determine whether those impacts can be attributed with reasonable certainty to the intervention being studied. Impact evaluation asks a very basic question: did the program make a difference compared with either no program or an alternative program? It involves comparing outcomes from two comparable conditions or groups, with statistically significant differences between these outcomes representing the program's impacts. There are a number of additional key points to remember about impact evaluation.

It is best to use an experimental/control evaluation design if at all possible, for that will increase the analysis's certainty, precision, comparability, and generalizability.

The data requirements for an impact evaluation are stringent, with special emphasis on person-referenced outcomes (from which impacts will be determined) and core service functions (for establishing the comparison conditions and cost estimates).

Outcomes are not the same as impacts. At their simplest level, impacts
represent the significant differences between the mean scores on person-referenced outcomes for the two comparison conditions (see Figure 4.1).

Impact evaluation directly addresses accountability issues. The evaluation techniques and strategies described in this chapter can be used to answer questions key stakeholders frequently ask about whether the education, health care, or social service program being evaluated made a difference in the lives of program recipients. As a type of outcome-based evaluation, its results might appear less applicable to continuous improvement than the results of program and effectiveness evaluation. However, by identifying those factors that account for the differences between programs, or those variables that affect successful program outcomes, the results of an impact evaluation can be applied directly to program/service delivery improvement. Another advantage of impact evaluation is that it frequently provides the framework and basis for policy evaluation, a topic we turn to next.
Study Questions

1. What is the difference between outcomes and impacts (see Figure 4.1)? Relate this difference to the concept of a comparison condition and the three outcome-based evaluation designs most frequently used in impact evaluation to constitute the comparison condition.
2. Assume that you are a special education policymaker and that you plan to do an impact evaluation of your state's special education programs. Outline the specific steps and procedures involved in such an impact evaluation, following the five steps summarized in Table 4.2.
3. Assume that you are a state director of mental health and that you plan to do an impact evaluation of your state's active community treatment program. Outline the specific steps and procedures involved in such an evaluation, following the five steps summarized in Table 4.2.
4. Assume that you are a state director of vocational rehabilitation and that you plan to do an impact evaluation of your state's supported employment program. Outline the specific steps and procedures involved in the evaluation, following the five steps summarized in Table 4.2.
5. Compare the steps and procedures involved in your impact evaluations with those listed for your effectiveness evaluation (see Chapter 3, questions 2, 3, and 5). What additional questions are you able to answer on the basis of your impact evaluation? Generally speaking, what did these additional answers “cost you”?
6. Review Figure 4.1. Select an alternative program to the one you use or one that you are familiar with, and compare the persons served and the services provided by the alternate. Why does this exercise result in a realistic hesitancy to compare program outcomes?
7. What do you do if there is no alternate program and you still need (or want) to complete an impact evaluation?
8. Review Exhibit 4-7. What is meant by “estimated impact,” and how are those impacts determined?
9. Review Exhibit 4-8. What general trend across time do you see for the program's impacts? What are the implications for demonstrating the program's accountability?
10. Why is it that impact evaluation results are often equivocal? Despite this, why is impact evaluation so important?
Additional Readings

Blanck, P. D. (1995). Assessing five years of employment integration and economic opportunity under the Americans with Disabilities Act. Mental and Physical Disability Law Reporter, 19, 384-392.
Campbell, J. A. (1992). Single subject designs for treatment planning and evaluation. Administration and Policy in Mental Health, 19(5), 335-343.
Frey, S. J., & Dougherty, D. (1993). An inconsistent picture: A compilation of analyses of economic impact of competing approaches to health care reform by experts and stakeholders. Journal of the American Medical Association, 270(17), 2030-2042.
Howe, J., Horner, R. H., & Newton, J. S. (1998). Comparison of supported living and traditional residential services in the State of Oregon. Mental Retardation, 36(1), 1-11.
Kane, R. L., Bartlett, J., & Potthoff, S. (1995). Building an empirically based outcomes information system for managed mental health care. Psychiatric Services, 46(5), 459-461.
McFarlane, W. R., Dushay, R. A., Stastny, P., Deakins, S. M., & Link, B. (1996). A comparison of two levels of family-aided assertive community treatment. Psychiatric Services, 47(7), 223-243.
5

Policy Evaluation
OVERVIEW
An Overview of Benefit-Cost Analysis
Policy Evaluation Model and Process Steps
  Model
  Data Sets
  Process Steps
Example 1: Families and Disability
Example 2: Welfare-to-Work Paths and Barriers
Example 3: Implementation of the 1992 Vocational Rehabilitation Amendments
Guidelines
Summary
Study Questions
Additional Readings
Tell me what you pay attention to and I will tell you who you are.
José Ortega y Gasset
Overview

Policy evaluation determines policy outcomes in reference to their equity, efficiency, and effectiveness. Public policy is whatever governments choose to do or not do. As discussed by Dye (1984), governments do many things, including distributing a great variety of symbolic rewards and material services to members of society, regulating behavior, organizing bureaucracies, distributing benefits, and extracting taxes. Societal problems and values often are the catalysts for public policies. Examples include policy problems such as how to deal with education,
unemployment, poverty, crime, health care, rehabilitation, and economic security. Public policy goals refer to specific outcomes such as increasing effectiveness by reducing unemployment, increasing efficiency by reducing the tax costs of public aid (such as in the current welfare reform movement), increasing the equitable distribution of crime prevention and health care monies, increasing public participation in freedom of communication and the reforming of government structures, increasing the efficiency and effectiveness of schools, reducing health care costs, or increasing procedural due process in the administration of government programs and criminal prosecution (Cohen, 1986; Fisher & Forester, 1987; Nagel, 1990).

Figure 5.1 summarizes the five stages of the public policy process. As discussed by Tannahill (1994), these stages are agenda building, policy formulation, policy adoption, policy implementation, and policy evaluation.

Agenda building: The process through which issues become matters for public concern and government action by being placed on an official policy agenda. Key stakeholders involved in this process include public officials, interest groups, academic experts, private citizens, and the mass media.

Policy formulation: The development of courses of action for dealing with the problems on the policy agenda. Courses of action typically involve both government officials and political actors outside of government, such as interest groups, political parties, and the media.

Policy adoption: The official decision of a government body to accept a particular policy and put it into effect. In addition to formal legislation and adoption through the legislative process, policy can also be adopted through the judicial system, regulatory agencies, and the president (through executive order).

Policy implementation: The process of carrying out the policy. This typically involves government officials as well as individuals and groups outside the government, such as private business, consumers, and the courts.

Policy evaluation: The assessment of policy outcomes, typically involving questions of equity (similarly situated people are treated equally), efficiency (a comparison of a policy's costs with the benefits it provides), and effectiveness (the extent to which a policy achieves its goals). The impact of policy evaluation on the policy process is referred to as feedback, which provides useful information to key players in each stage of the process.

Since benefit-cost analysis is increasingly being used in policy evaluation to address equity and efficiency issues, the next section of this chapter presents a brief overview of one approach to benefit-cost analysis described in
detail elsewhere (Schalock, 1995a; Schalock & Thornton, 1988). The intent of the overview is to sensitize the reader to the potential and the complexity of benefit-cost analysis. Following that, a Policy Evaluation Model is presented that builds on the work of others and encompasses key concepts discussed thus far in the text: methodological pluralism, multiple perspectives, key stakeholders, and types of outcome-based evaluations. Three public policy examples are then presented, using the model as a framework for public policy evaluation. The chapter concludes with five guidelines regarding policy evaluation. Throughout the chapter, the reader is advised to keep the following statement by Wildavsky in mind:

Policy is one activity for which there can be no one fixed approach, for policy is synonymous with creativity, which may be stimulated by theory and sharpened by practice, which can be learned but not taught. (1979, p. 3)
An Overview of Benefit-Cost Analysis

The purpose of benefit-cost analysis is to determine whether a program's benefits outweigh its costs. The primary issue addressed in benefit-cost analysis is whether the impacts of a program, intervention, or service are big enough to justify the costs needed to produce them. Because of this focus, benefit-cost analysis is increasingly being used in policy evaluation. Common uses include assessing the overall success or failure of a policy, helping to determine whether the policy should be continued or modified, assessing the probable results of potential policy changes, or addressing equity issues from different perspectives.

Current public policy is based on two fundamental principles: equity and efficiency. Equitable programs contribute to balancing the needs and desires of the various groups in society, whereas efficient programs are those that serve to increase the net value of goods and services available to society. Benefit-cost analysis is a tool developed to help determine whether a program produces effects that justify the costs incurred to operate the program. Thus, benefit-cost analysis depends upon the availability of cost estimates, benefits to program participants, and impact statements, which are the statistically
significant mean differences of costs and benefits between the programs, interventions, or services being compared.

Part of understanding benefit-cost analysis is understanding its terminology. Key terms include:

Analytical perspective. Benefit-cost analysis addresses the perspective of different groups in society that are affected by a program, service, or intervention. Three common perspectives include the participant, the rest of society (that is, the “taxpayer”), and social (that is, “society as a whole”), which includes the sum of benefits and costs generated from the previous two perspectives. The inclusion of these three perspectives in a benefit-cost analysis is imperative, since a program effect (such as taxes withheld) can be a benefit to some and at the same time a cost to others.

Benefits. Outcomes that accrue to program participants, such as increased wages, a regular job, or reduced use of alternative programs. Specific examples include lives saved, improved health, increased productivity, increased jobs, increased skills, and increased assessed quality of life.

Costs. Expenditures associated with a particular program, intervention, or service, such as program expense, forgone market output (that is, opportunity costs), or increased use of complementary programs.

Efficiency. The extent to which there is an increase in the net value of goods and services available to society (that is, being productive with minimum waste).

Equity. The balance between the needs and desires of the various groups in society (that is, fairness).

Impacts. The significant mean differences on selected cost and benefit measures between the groups or conditions being compared.

Monetized. Benefits to which a monetary value can be assigned (for example, wages, taxes paid, reduced public taxes).

Nonmonetized. Benefits to which a monetary value cannot be assigned (for example, improved quality of life or increased satisfaction).

The suggested approach to benefit-cost analysis as a part of policy evaluation reflects the current trend toward using both monetized and nonmonetized benefits to evaluate efficiency issues. This approach also allows one to examine which groups in society gain from a program and which groups pay, a concept referred to as the analytical perspective. Any public policy or program will affect many groups. For example, a stroke rehabilitation program will clearly affect participating consumers and their families and may have long-term effects on agencies and employers in the community. It will
also have an impact on government budgets and hence indirectly affect taxpayers. Each of these groups has a perspective on the policy or program, and each of these perspectives has relevance to decision making. Thus, equity issues need to be addressed in a benefit-cost analysis through the perspectives of the specific groups affected by the policy or program. The three most appropriate analytical perspectives are the participant, the rest of society (the “taxpayer”), and the social perspective (“society as a whole”). These three perspectives, along with their major concerns and an example of each, are summarized in Table 5.1. The examination of equity is particularly important for education, health care, and social service programs, since a goal of many of these programs is to increase social equity by reallocating resources or equalizing opportunities. In fact, to many consumers, equity concerns dominate efficiency concerns.

Benefit-cost analysis is also based on the premise that one needs to look at all benefits and costs of a program, even though one may be able to monetize only some. This broader perspective leads to a more complete analysis and also minimizes the tendency to reduce benefit-cost analysis to a simple ratio of benefits to costs. Thus, benefit-cost analysis can be considered a process for systematically sorting through the available evidence of the multiple costs and benefits associated with education, health care, or social service programs, rather than relying on any single estimate of value or benefit-to-cost ratio.

Even though benefit-cost analysis is a powerful tool for evaluating the benefits and costs of education, health care, and social service programs, there are a number of controversial issues surrounding its use, including: difficulty in establishing the alternative or counterfactual comparison group(s) against which the program is being compared; no consensus on the time frame for estimating benefit streams; its numerous assumptions and estimates regarding costs and impacts; considerable controversy involved in estimating dollar values for program effects; methodological problems involved in incorporating intangible effects that are often a central concern of human service programs; and the considerable time and resources needed to complete a thorough benefit-cost analysis (Schalock, 1995a).
Despite these concerns and potential problems, the current Zeitgeist requires an understanding and use of benefit-cost analyses, provided they are consistent with the questions asked and one's capability to do the analysis. Readers interested in pursuing the concept of benefit-cost analysis in more detail are referred to sources listed at the end of this chapter.

In summary, the approach to benefit-cost analysis just described differs from a cost-efficiency evaluation model, which converts all of a program's benefits and costs into monetary units and calculates a simple benefit-cost ratio by dividing gross benefits by gross costs (Cimera & Rusch, 1999). In contrast, the approach described here uses both monetized and nonmonetized benefits to evaluate efficiency and equity issues. A key aspect of this approach is that benefit-cost analysis is viewed as a broad process whereby one looks at all the benefits and costs of a program or policy. This view is helpful to various stakeholders and minimizes the tendency to view benefit-cost analysis as a simple ratio of benefits to costs, which is seldom adequate for education, health, and social service programs.
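One way to keep this broader bookkeeping straight is to tabulate each monetized benefit and cost separately for the participant and "rest of society" (taxpayer) perspectives, sum the two for the social perspective, and list nonmonetized benefits alongside rather than forcing them into a single ratio. The entries and dollar figures in the sketch below are hypothetical placeholders, not estimates from any study.

    # Hypothetical annualized, monetized effects per participant
    # (positive = benefit, negative = cost) from two analytical perspectives.
    participant = {
        "increased earnings":        4000,
        "taxes withheld":            -600,
        "reduced transfer payments": -900,
    }
    taxpayer = {
        "program operating cost":             -6200,
        "taxes withheld":                        600,
        "reduced transfer payments":             900,
        "reduced use of alternative programs":  1500,
    }

    def net(perspective):
        """Sum the monetized benefits and costs for one analytical perspective."""
        return sum(perspective.values())

    # The social perspective is the sum of the participant and taxpayer
    # perspectives; pure transfers cancel out when the two are combined.
    social = net(participant) + net(taxpayer)

    nonmonetized = ["improved quality of life", "increased satisfaction"]

    print(f"Net monetized benefit, participant perspective: {net(participant):+}")
    print(f"Net monetized benefit, taxpayer perspective:    {net(taxpayer):+}")
    print(f"Net monetized benefit, social perspective:      {social:+}")
    print("Nonmonetized benefits (reported, not priced):", ", ".join(nonmonetized))

Note how the transfers (taxes withheld, reduced benefit payments) appear as a benefit from one perspective and a cost from the other, so they cancel in the social perspective, which then reflects only real resource gains and losses.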
Policy Evaluation Model and Process Steps

The proposed Policy Evaluation Model described next builds on the work of others. Indeed, there have been a number of approaches to policy evaluation that contain one or more components of the suggested model. Some policy analysts focus on consumer involvement and the demand that stakeholders be involved in program evaluation and policy evaluation (Guba & Lincoln, 1989). Specific examples include action research (Weiss, 1988) and empowerment evaluation (Fetterman, 1997). Other analysts suggest the need in policy evaluation to use methodological pluralism and multiple, discovery-oriented methods (Dennis, Fetterman, & Sechrest, 1994; Scriven, 1991). Still other policy analysts stress the need for contextual analysis. Weiss (1987) suggests, for example, that “policy evaluation needs to be sophisticated about the shape and contour of policy issues including the entire policy context surrounding the evaluated policy and the basic goals of a policy” (p. 42). Most evaluators stress the need for formative feedback to improve the management and performance of agencies and programs created by particular public policies (Wholey, 1985).
Model

The Policy Evaluation Model presented in Figure 5.2 is based on four premises. First, policy evaluation from an outcome-based perspective should
focus on the same standards discussed throughout the text: performance and values. Performance standards relate to the policy's effectiveness and efficiency; value standards relate to equity and fidelity to the model or policy. Second, policy analysis can focus on the individual, the program (or family), or the system. As shown in Figure 5.2, the three concentric circles represent the individual, the program, and the systems level, respectively. The “slice of the pie” depicts the notion that in policy evaluation, one needs to obtain outcome data for: (a) the individual (consumer); (b) the program (including potentially the family); and/or (c) the larger service delivery system. Third, multiple methods need to be employed. Figure 5.2 suggests that these methods (and the required data sets discussed next) will vary depending upon the focus: the individual, the program or service, or the larger service delivery system. The fourth premise is that the policy evaluation techniques employed include program, effectiveness, or impact evaluation (Chapters 2–4), or benefit-cost analysis (as just discussed).
Data Sets

Individual-Level Data Sets
A major purpose of policy evaluation is to validate, on the basis of individuals' experiences and outcomes, the policy's effectiveness, efficiency, and fidelity to a model. Effectiveness refers to whether the policy achieves its intended goals and objectives; efficiency refers to whether it achieves those goals and objectives in a cost-efficient manner; and fidelity to the model refers to whether the policy was implemented as designed or promised. There are a number of specific data sets summarized in Chapter 6 (see Tables 6.2–6.10) that reflect individual-level outcomes.

One data source increasingly used in policy evaluation is focus groups, which provide a reasonable and feasible way to obtain individual-referenced experiential and outcome data. The major purpose of using focus groups is to validate at the individual level whether the given policy has produced its desired outcomes and has been implemented consistent with its intended purpose and goals. I believe the advantages of using focus groups for this level of policy evaluation include: identifying the purpose and objectives of the evaluation activity, identifying relevant stakeholders to assist in making judgments about the policy's intended and real goals and their importance, identifying and organizing the criteria into a meaningful structure for analysis, and assigning importance to each outcome through stakeholder judgments. Other techniques that the interested reader might wish to pursue to obtain individual-referenced data for policy evaluation include multiattribute evaluation (Lewis et al., 1994), empowerment evaluation (Fetterman, 1997), and action research (Weiss, 1988).
Program-Level Data Sets
The major need here is for organizations to develop empirically based outcomes information systems. At a minimum, such a system includes five classes of variables (Kane, Bartlett, & Potthoff, 1995): the individual's baseline status, sociodemographic factors, clinical/educational/functional factors, intervention data, and person-referenced outcomes. The baseline data represent the level of activity for each of the outcome variables at the time the person enters the program or service. Sociodemographic variables include factors such as age, gender, socioeconomic status, and support systems. Clinical (or functional) information includes primary and secondary diagnoses, risk factors, reasons for entering the service or program, and previous program-related involvement. Intervention data relate to descriptions of what was actually done for the individuals receiving the service or program. Outcomes can be addressed at several levels: specific to the presenting problem, general behavior and social adaptation/role status, and functioning in the world at large. The latter include employment, social functioning, participation in community activities, legal problems, perceived health status, emotional health, utilization of medical care, satisfaction with treatment, and quality of life factor scores. Cost factors include unit or total costs.
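One way to picture such a system is as a per-person record with one field group for each of the five classes of variables plus a cost factor. The sketch below is illustrative only; the field names are assumptions, not a prescribed schema.

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class OutcomeRecord:
        """Minimal per-person record for an outcomes information system."""
        baseline_status: Dict[str, float]      # outcome variables at program entry
        sociodemographics: Dict[str, str]      # e.g., age group, gender, SES, supports
        clinical_functional: Dict[str, str]    # diagnoses, risk factors, referral reason
        interventions: List[str]               # what was actually done for the person
        outcomes: Dict[str, float]             # person-referenced outcomes at follow-up
        unit_cost: float = 0.0                 # cost factor (unit or total cost)

    record = OutcomeRecord(
        baseline_status={"employment_hours_per_week": 0, "adaptive_behavior_score": 42},
        sociodemographics={"age_group": "18-24", "gender": "F", "lives_with": "family"},
        clinical_functional={"primary_diagnosis": "illustrative only"},
        interventions=["job coaching", "community living skills training"],
        outcomes={"employment_hours_per_week": 25, "adaptive_behavior_score": 55},
        unit_cost=310.0,
    )

    # Change from baseline on one person-referenced outcome.
    gain = record.outcomes["adaptive_behavior_score"] - record.baseline_status["adaptive_behavior_score"]
    print(f"Adaptive behavior change from baseline: {gain}")

Aggregating such records across persons is what makes program-level, and eventually systems-level, analyses possible.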
Systems-Level Data Sets

Policy evaluation requires data sets that can be aggregated across large numbers of persons and programs. These data sets can be obtained via a number of techniques, including national surveys, national data sets, and meta-analyses. Increasingly, national databases are coming online to provide aggregated data to assist in policy evaluation (Committee on National Statistics). For example, the New Federalism State Data Base includes information on the 50 states and the District of Columbia in areas including income security, health, child well-being, demographic, fiscal and political conditions, and social services. The downloadable database is available as a Microsoft Access Version 2.0 database that consists of two parts: tables of data and an application (the Database Viewer) that allows the data to be browsed. There are two kinds of tables: those that describe the data (data dictionary tables) and those that contain data values. Other potentially useful national data sets include:

Catalog of Electronic Products, which contains public use files related to aging, births, deaths, health care and health services utilization, health care expenditures,
health and nutrition, health promotion, health status, marriage and divorce, maternal and child health, and risk factors.

Current Population Survey (CPS). The CPS is a monthly survey of about 50,000 households conducted by the Bureau of the Census for the Bureau of Labor Statistics. The survey has been conducted for more than 50 years. The CPS is the primary source of information on the labor force characteristics of the U.S. population. The sample is scientifically selected to represent the civilian noninstitutional population. Respondents are interviewed by phone to obtain information about the employment status of each member of the household 15 years of age and older; however, published data focus on those ages 16 and over. The sample provides estimates for the nation as a whole and serves as part of model-based estimates for individual states and other geographic areas. Estimates obtained from the CPS include employment, unemployment, earnings, hours of work, and other indicators. They are available by a variety of demographic characteristics, including age, sex, race, marital status, and educational attainment. They are also available by occupation, industry, and class of worker. Supplemental questions to produce estimates on a variety of topics, including school enrollment, income, previous work experience, health, employee benefits, and work schedules, are also often added to the regular CPS questionnaire.

National Education Longitudinal Study (NCES), which contains a nationally representative study of eighth graders in base year 1988 with follow-up assessments every two years through 1996 <www.ed.gov/pubs/ncesprograms/longitudinal/about.html>.

RSA 911 data tapes. Administrative data collected by the Rehabilitation Services Administration for people served in the fiscal year by state rehabilitation service agencies. The database contains about 600,000 records per year <www.arcat.com/arcatcos/cos08/arc08729.cfm>.

National Health Interview Survey (NHIS). A cross-sectional survey using a multistage probability design that allows continuous sampling and reliable population estimates. Data sets include self-reported health status, medical conditions, activity limitations, use of medical care, employment status, and demographic characteristics. Available from the National Center for Health Statistics, 6525 Belcrest Road, Hyattsville, MD 20782.

Office of Evaluation Statistics, Social Security Administration. These statistics include total SSI and SSDI recipients who are working and participating in SSA work incentive programs.

These national databases can be used for a number of purposes, including benchmarks, report cards, and policy evaluation. As we will see in the first example in the following section, the Survey of Income and Program Participation data were used to evaluate current disability policy.

Process Steps

Policy evaluation from an outcomes-based perspective involves the five process steps summarized in Table 5.2. The first step is a key one: describe the intent of the policy and its context. It is important to understand the goals of the policy being evaluated, for these goals become the basis of later program, effectiveness, or impact evaluation (or benefit-cost analysis). One also needs to describe the intent of the policy evaluation. Finally, one needs to understand the context of the evaluation, for the policy's context will definitely affect how one approaches the policy evaluation.
The second step involves analyzing the policy’s anticipated outcomes. This may be more difficult than it sounds, for frequently public policy focuses
more on process than outcomes. Because of this, over the past 30 years there have been a series of public management reforms that have attempted to increase the outcome orientation, and thereby the accountability, of public policy. Examples include management by objectives, zero-based budgeting, reinventing government, and, most recently, the Government Performance and Results Act (Wholey, 1997). This reform movement in evaluating public policy from an outcomes perspective is still evolving.

The third process step involves aggregating the stated goals into the respective cells of the program evaluation model presented in Figures 1.2 and 2.1. By way of review, the model has three components: evaluation focus (the individual or the organization), evaluative standards (performance measurement or value assessment), and outcome indicators (organization performance or value outcomes; individual performance or value outcomes). Generally, this aggregation will show that most anticipated outcomes from public policy fall into the individual performance or value cells of the model and reflect effectiveness more than impact evaluation. This focus may change in the future.

The fourth process step involves evaluating the status of the anticipated outcomes. This step involves methodological pluralism and can include any of the data sets listed in Figure 5.2. The three public policy evaluation examples presented next include the use of national surveys, individual value outcomes, fidelity standards, and effectiveness measures. Other data sets are reflected in the suggested additional readings presented at the end of this chapter.

The fifth process step involves providing formative feedback to policy stakeholders. As shown in Figure 5.1, this process step should “feed back” to influence the other steps of the public policy process: agenda building, policy formulation, policy adoption, and policy implementation. This feedback should also focus on how well the policy is doing from the multiple perspectives identified; barriers identified at the level of the person, program, or system; potential use of the information for policy change; and the key components of a “policy system” that link society (with its associated social and economic conditions), the political system (with its institutions, processes, and behaviors), and public policies (education, health care, and social service policies).

The three public policy evaluation examples given in this chapter were selected for a number of reasons. First, they reflect public policy areas with which I am familiar. This is important because public policy evaluation can be quite complex and involve difficult issues and problems for which there are frequently no simple solutions. Thus, it is important for the evaluator to understand the larger contextual issues addressed by the public policy, have some familiarity with its implementation, and be sensitive to its potential multiple outcomes and effects (both direct and indirect). Second, the
examples represent stakeholders at the individual, program (or family), and systems levels. Third, the data sets used in the evaluation examples reflect many of those listed in Figure 5.2. Fourth, they indicate how outcome-based evaluation techniques can be used to evaluate public policy. And finally, they reflect a range of public policies: those related to disability (Fujiura, 1998), welfare-to-work (Timmons et al., 1999), and vocational rehabilitation (Whitney-Thomas et al., 1997).
Example 1: Families and Disability

Analysis Intent and Context

At the outset of the nation's expansion of its system of community-based services for persons with developmental disabilities in the 1970s, a common paradox was the paucity of information on family care of persons with mental retardation and related developmental disabilities (MR/DD) in the face of near universal acknowledgment of the centrality of home-based supports (Fujiura, 1998). Ironically, we know the least demographically about what is most certainly the largest population: those persons with MR/DD supported at home by the immediate family, other relatives, or benefactors. The family remains a little-understood constituency with respect to nationally based MR/DD public policy. The principal unit of analysis thus far in policy research at the national level has been the individual receiving services. Fujiura states:

Though it is widely believed that the immediate family, other relatives, and benefactors play a similar if not larger role in the support of individuals with MR/DD, there are very few empirical data on the most basic of their demographic features. Given the potential size of this cohort, it is unlikely that any facet of MR/DD policy or program development is not in some fashion affected by the demography of family-based support. The capacity of the long-term care system to address future needs is a contemporary example. To the extent that the character and size of the base of potential consumers will affect future demand, then basic demographic data are of central importance. (1998, pp. 225–226)
Anticipated Outcomes Public policy regarding the expansion of services for persons with MR/DD has focused primarily on the development and expansion of residential and employment options. What has potentially been overlooked in this policy, however, are those who are not “in the system” and what impact this group will have on future needed services and long-term supports. In reference to Figure 5.2, the evaluation focused on the family (that is, “program”), system-level stakeholders, national data sets, and effectiveness evaluation.
Outcome Indicators The policy analysis was designed to establish a demographic profile of Americans with MR/DD supported outside the formal long-term care residential system. Three core questions were addressed: (1) what is the size of the “informal” care system in the United States; (2) to what extent does the aging of the post–World War II generation skew the age structure of the family-based population; and (3) what is the economic status of the family-based population? Answering these questions is important to state and federal MR/DD policy since there is only a limited and patchwork literature on the financial status of families providing home-based support. As a consequence of recent initiatives in reforming health care and welfare at the federal and state levels, basic economic status data are critically important to informed family policy planning (Fujiura, 1998).
Status of Outcome Indicators The policy analysis used the Survey of Income and Program Participation, which is a nationally representative, probability-based survey of economic well-being that has been conducted annually by the U.S. Census Bureau since 1983. The survey includes items on functional limitations, work disability, and disability benefits. In the survey, selected households are interviewed longitudinally at four-month intervals for up to three years. In the present evaluation, the 1990 and 1991 samples were employed. The survey data are represented at the individual, family, and household levels. Estimates were derived for total number of persons with MR/DD and persons with MR/DD living in family households defined by the age, gender, and ethnicity of the household head. Population estimates were computed by summing the weights of persons or households in the sample or subgroups. Weights were calculated by the Census Bureau and represented the inverse of selection probability. Households were compared on income, means-tested income benefits, and age of members. Results of the analysis indicated that the MR/DD households represented a total of 2.63 million individuals with MR/DD. Including the 338,000 individuals living in the long-term care MR/DD system in 1991 (Braddock et al., 1995), there were an estimated 2.97 million Americans with MR/DD–a prevalence rate of 1.2%. Comparison of the .95 confidence intervals indicated that households supporting a member with MR/DD had larger household sizes, younger household ages averaged across members, lower aggregate income, and greater dependence on means-tested income support than did U.S. households generally. Also, the percentage of households living below the official poverty level was significantly higher among MR/DD households.
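To make the mechanics of this kind of survey-based estimation concrete, the sketch below shows, in Python, how weighted population totals and a prevalence rate can be computed and how two interval estimates can be compared. It is only an illustration: the records, weights, helper names, and confidence intervals are hypothetical and are not drawn from the actual SIPP files or from Fujiura's analysis.

```python
# Minimal illustration of survey-weighted estimation; all records and
# weights below are hypothetical, not actual SIPP data.

def weighted_total(records, flag):
    """Estimate a population count by summing the sampling weights
    (inverse selection probabilities) of sample members with the flag."""
    return sum(r["weight"] for r in records if r[flag])

sample = [
    {"weight": 2450.0, "mr_dd": True,  "family_household": True},
    {"weight": 1980.0, "mr_dd": False, "family_household": True},
    {"weight": 3100.0, "mr_dd": True,  "family_household": False},
]

total_mr_dd = weighted_total(sample, "mr_dd")
population = sum(r["weight"] for r in sample)
prevalence = total_mr_dd / population  # weighted count / weighted population

def intervals_overlap(ci_a, ci_b):
    """Simple comparison rule: treat two group estimates as reliably
    different when their 95% confidence intervals do not overlap."""
    return not (ci_a[1] < ci_b[0] or ci_b[1] < ci_a[0])

print(f"Estimated persons with MR/DD: {total_mr_dd:,.0f}")
print(f"Prevalence estimate: {prevalence:.1%}")
print("Household income CIs differ:",
      not intervals_overlap((21_500, 24_300), (30_100, 33_800)))
```

In a real analysis the weights, variance estimates, and confidence intervals would come from the survey files and their documentation rather than from values typed in by hand.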
Formative Feedback The national estimates obtained in this policy evaluation indicate that the nation’s long-term residential care systems serve only a small proportion of all persons with MR/DD. This is despite the fact that the total number of persons served across all models of residential care increased approximately 34% since 1977. Thus, application of even the most conservative prevalence estimates indicates that total capacity represents only a small proportion of the total out-of-home cohort (Fujiura, 1998). The juxtaposition of three facts–dramatic increases in public sector spending, low to moderate growth in system capacity, and a large home-based population–emphasizes at least two critical questions regarding public policy and persons with MR/DD. First, in the present analysis, over one-quarter of those individuals living in a family setting were in households headed by a family member 60 years of age or older. Another 35 % were adults in households of middle-age caretakers, for whom transition issues have become important considerations. The size of the cohort suggests significant and as yet unrealized demands on the states’ service systems. Second, estimates of economic status suggest that family households may be disproportionately impacted by pending revisions in state welfare and other programmatic supports, particularly the single-parent and minority households. Important policy-related demographic features of family-based care are gender and ethnic differences in economic status, principally characterized by the female-headed household. How this discrepancy between need and access to services and supports is resolved in the coming years will depend largely on the resolution of state and federal welfare reform initiatives (Fujiura, 1998). It is to this issue that we now turn in the second example.
Example 2: Welfare-to-Work Paths and Barriers Analysis Intent and Context On August 22, 1996, President Clinton signed into law the Personal Responsibility and Work Opportunity Reconciliation Act (PRWORA), which abolished Aid to Families with Dependent Children (AFDC) as a federal entitlement. The stated goals under PRWORA were to encourage recipients of welfare to improve their economic status by returning to or entering employment. States received block grants to design their own reforms within specified parameters in the PRWORA and the Temporary Assistance for Needy Families (TANF) block grant. One of the major consequences of the new legislation is that states must prepare for the employment of a much larger (and potentially far needier) segment of the population than before. Many in this group have low skills and face numerous barriers.
The potential employment impact of welfare-to-work programs has been explored through a series of evaluations, which show generally that those programs that encourage, help, or require welfare recipients to find jobs, while also providing various supportive services, can lead to gains in employment and earnings as well as reductions in welfare receipt (Greenberg & Wiseman, 1992; Seninger, 1998). What is less clear is the impact that this public policy has on individuals with disabilities within the TANF system. For this group, the development of marketable skills and acceptable work behaviors and the movement along a path to employment can be complicated by substantial barriers and insufficient transitional support (Timmons et al., 1999).
Anticipated Outcomes The study evaluated perspectives of four stakeholders in order to discover how welfare reform initiatives have affected the lives of people with disabilities. Since little is known about this issue, the analysis included the perspectives of those at the grassroots level: (a) people receiving welfare who have a disability or people receiving welfare who care for a family member with a disability, (b) welfare agency caseworkers, (c) disability and welfare rights advocates, and (d) employers with experience in hiring former welfare recipients. These perspectives enable one to get a glimpse of how this public policy affects service delivery and day-to-day personal experiences. In reference to Figure 5.2, the evaluation focused primarily on the individual stakeholder, individual value outcomes, and equity and fidelity to the model measures.
Outcome Indicators The purpose of the evaluation was to investigate the impact of welfare reform on individuals with disabilities from the perspective of each of the key stakeholders described above. It is important to understand what supports exist in the TANF system (such as accommodations, the prevention of disability discrimination, alternative work arrangements, and the provision of specialized child care) to facilitate the transition to employment for persons with disabilities (Timmons et al., 1999). Thus, the following six questions served as the basis for the policy evaluation: (a) what is the impact of welfare reform on people with disabilities receiving welfare; (b) what is the impact of welfare reform on people receiving welfare who are caregivers for a family member with a disability; (c) how has welfare reform affected TANF caseworkers who work with people with disabilities; (d) how has welfare reform affected advocates who work with people with disabilities; (e) what supports are needed to
assist people receiving welfare who have a disability or a family member with a disability to return to or enter the workforce; and (f) what have been the experiences of employers who hire individuals formerly receiving welfare benefits?
Status of the Outcome Indicators A qualitative research approach was used because of the exploratory nature of the questions. The sample included participants from four key stakeholder groups: individuals receiving welfare who have a disability or individuals receiving welfare who are caring for a family member with a disability, caseworkers, disability/welfare rights advocates, and employers. Interviews and focus groups were tape-recorded, transcribed, coded, and analyzed. Analysis of interviews and focus group transcripts indicated a five-step path to work through the welfare system: the circumstances leading individuals into the TANF system, the benefit application process, assessment of job skills and readiness, job training or community service, and finally job search and job placement.
Formative Feedback The results of the policy evaluation suggested a number of public policy questions related to the significant roadblocks encountered on the path to work. First, individuals with disabilities experienced some of the same roadblocks as the general population receiving TANF benefits, including the ongoing conflict between work and family and challenges to building rapport in the individual-caseworker relationship. Second, key stakeholders also described roadblocks along the path that were specific to individuals with disabilities or individuals caring for a family member with a disability. For example, several participants described losing jobs due to disability and health issues and the lack of job accommodations; others described the difficulties in finding child care that accommodated their children’s needs. Third, issues in the disability determination process arose for both caseworkers and individuals applying for TANF benefits. Caseworkers stated their discomfort with assessing disability and their doubt about the legitimacy of individuals’ disability claims. Individuals receiving welfare benefits stated that caseworkers demonstrated limited sensitivity and understanding about disability issues. And fourth, key stakeholders also described job training and placements that did not accommodate their disability, were inaccessible, or were poor matches, given their disability (Timmons et al., 1999).
Example 3: Implementation of the 1992 Vocational Rehabilitation Amendments Analysis Intent and Context The 1992 Vocational Rehabilitation (VR) Amendments reflect a continuing evolution of public policy to include individuals with disabilities in community life and improve access to employment opportunities. A number of policy-related elements were included in the 1992 Amendments, including presumption of eligibility, expanded access to populations previously underserved, increased broad-based stakeholder involvement, and an expansion of services related to supported employment, on-the-job training, personal assistance services, and a wide range of rehabilitation technology. Given this ambitious agenda for change, however, one can question how the traditional VR system has responded to the call for greater access, consumer empowerment, and improved services. Years after the mandate for these changes, we are still left with questions as to whether these changes have found their way into local agency offices and the lives of individuals with disabilities. In order to fully comprehend the impact of this public policy, an understanding of both administrator and counselor perceptions of change is required (Whitney-Thomas et al., 1997).
Anticipated Outcomes Traditionally, people with disabilities have been thought to be unable or unwilling to work. During the past few decades, the field of vocational rehabilitation has assisted people with disabilities to enter the job market and become gainfully employed. A key factor in improved job-related outcomes is the role played by VR counselors and administrators. Thus, the purpose of the present evaluation was to gain an understanding of how the 1992 amendments have been implemented, as of 1996, and whether the practice has changed from the perspective of service providers (that is, administrators and counselors) on a national level. In reference to Figure 5.2, the key stakeholders were VR programs and the larger VR system; data sets included national surveys and fidelity to the model standards; and the policy analysis technique was effectiveness evaluation.
Outcome Indicators The analysis sought to gain a better understanding of how the 1992 Rehabilitation Act Amendments have been implemented, and whether practices since their passage have changed from the perspective of vocational rehabilitation
administrators and counselors as of 1996. To this end, four hypotheses were tested on differences between groups on their perceptions of change: (a) administrators and counselors differ on their responses to the global measure of change; (b) administrators and counselors differ on their responses to specific factors defined as consumer choice/awareness and consumer advocacy; (c) administrators and counselors differ on their responses to the areas of change regarding eligibility, use of existing information, consumer involvement, assessment and the Individualized Written Rehabilitation Program (IWRP), assistive technology and accommodation, and serving those who have not been adequately served in the past; and (d) administrators and counselors differ on their responses to level of change on each individual practice item (Whitney-Thomas et al., 1997).
Status of Outcome Indicators The analysis involved a national, cross-sectional survey. Data were collected from 251 administrators and 254 counselors from 25 states. The research instrument was designed and organized to cover the five topic areas referenced in the 1992 amendments: eligibility, use of existing information, consumer involvement, assistive technology and accommodations, and serving those who have been underserved in the past. In general, results showed that administrators perceived significantly more change than counselors. When asked how the 1992 Vocational Rehabilitation Amendments have impacted daily practice, neither group felt that more than “some change” had occurred in their offices or caseloads. The greatest disparity in the perceptions of administrators and counselors was in the area of consumer advocacy. Other results of the analysis indicated that a large percentage of both administrators and counselors perceived change in the severity of disability of individuals served; both administrators and counselors felt that the consumer was also actively involved in developing assistive technology accommodations, although the rate of using assistive technology was reportedly low; and administrators and counselors agreed that increased efforts and plans to target underserved groups are under development but have not been fully implemented (Whitney-Thomas et al., 1997).
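As a hedged illustration of the kind of group comparison reported here, the following sketch contrasts administrator and counselor ratings on a single perceived-change item using an independent-samples t test. The ratings are invented for the example; the actual study's instrument, scaling, and statistical procedures may have differed.

```python
# Hypothetical perceived-change ratings (1 = no change, 5 = a great deal
# of change); the values are invented for illustration only.
from scipy import stats

administrators = [3, 4, 3, 4, 3, 2, 4, 3, 3, 4]
counselors     = [2, 2, 3, 2, 1, 3, 2, 2, 2, 3]

t_stat, p_value = stats.ttest_ind(administrators, counselors)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p value here would be consistent with the reported finding that
# administrators perceived significantly more change than counselors.
```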
Formative Feedback Results of the evaluation indicated that the changes in VR practice since the implementation of the 1992 amendments have not been drastic, and that the major goals of the amendments, while being addressed, have not been fully met. This finding suggests a number of policy implications and important feedback to those who formulate rehabilitation policy. First, although some
change has been observed by both administrators and counselors, these two groups perceive this change differently. The fact that counselors feel there has been less change in daily practice than do the administrators suggests that organizational and communication issues need to be addressed. Second, the results of the analysis document that change is occurring, but that the pace of change is modest. For administrators, the perception of the amount of change seems to exceed the reality of change as reported by the practitioners and the front-line counselors. Third, attempts to streamline the rehabilitation process have resulted in increased access for some consumers since a greater number of persons with disabilities are being accepted for service. Fourth, while there is greater access for persons with severe disabilities, the true outcome is real work, not eligibility for services that either lead to an unsuccessful closure or closure in a setting not wanted by the individual. Fifth, consumer advocacy roles have been clarified in the law and practice; nevertheless, in order to ensure that consumer involvement is maintained and maximized, involvement should be monitored throughout the rehabilitation process. And finally, the task of monitoring improvement in services in the future should be accomplished through further definition of these services and resulting outcomes within the VR system. Greater use of supported employment, on-the-job training, personal assistance services, and a wider range of rehabilitation technology needs to be documented (Whitney-Thomas et al., 1997).
Guidelines These three examples reflect a number of factors that explain why policy evaluation is not easy and is still an “emerging science.” First, the evaluation is frequently complicated by the lack of clearly stated goals and anticipated objective, person- and organization-referenced outcomes. Second, any group of stakeholders not only represents a heterogeneous constituency with varying goals and agendas; its members also frequently move in and out of the picture with varying degrees of influence. Third, Figure 5.1 indicates clearly that any given policy is not static and varies across time in its public and political agendas, adoption, implementation, and enforcement. Fourth, there is still a lack of standardized data across programs for many public policies, which makes their evaluation from an outcome-based perspective challenging. And fifth, as shown in Figure 5.2, policy evaluation potentially involves a number of key stakeholders, data sets, and evaluation techniques. Because of these factors, I suggest the five guidelines summarized in Table 5.3 to keep in mind for both outcome-based policy evaluation producers and consumers. First, the policy evaluation should identify the values that underlie
the policy being evaluated. These values can be the basis for determining the fidelity of the policy to its intended purpose and outcomes. For example, a consensus evaluation conference (Turnbull, Turnbull, & Blue-Banning, 1994) identified the following core concepts of disability policies: independence, productivity, inclusion, role in decision making, nondiscrimination, individualized and appropriate services, service coordination and collaboration, accountability, priority based on severity, cultural responsiveness, family-centeredness, and services to the whole family. This values-analysis approach has been applied to policies and their underlying values in general education, special education, human and social services, and vocational rehabilitation (Chambers, 1994). The values identified can relate to whether a policy’s underlying purpose relates to human development (e.g., prevention and intervention in child maltreatment or nondiscriminatory access to health care), to bureaucratic efficiency, or to justice as street-level case managers define it. Intended outcomes can then be evaluated within this context. Second, policy analysis needs to move beyond a “positivist” approach, wherein one assumes that one can best comprehend and evaluate the policy by trying to be exclusively objective. One needs to recognize that the analyst always makes sense of the world through theories, and that a theoretical or conceptual framework is essential to securing an analysis that combines scientific elegance with social policy (Popper, 1959; Schorr, 1997). Thus, one needs to move to the “postpositivist” approach to evaluation and approach public policy as seeking to understand how webs of beliefs, desires, attitudes, and the histories of people who have experienced the effects of policies create a new context for the evaluation (Chambers, 1993). The importance of this second guideline was clearly evident in the face-to-face interviews with TANF clientele in the example described in the preceding section. Third, policy evaluation should also determine the constraints on, and the resources available for, meeting goals and objectives. This involves implementation research, which recognizes that policy is the result of the interaction of values, the providers who implement policies, and the policy recipients. For example, policy and its implementation are affected by politics and the struggle over ideas that occurs in political communities (Stone, 1997). Thus,
each policy is a collection of arguments in favor of different ways of seeing the world. Additionally, any policy and its implementation change as street-level providers and recipients implement it. In reference to the TANF evaluation, for example, a number of barriers were identified, which could be addressed through three “overarching issues”: interagency coordination, infrastructure, and the need for additional research. Fourth, no single factor influences policy more than any other. Some factors may be “in play” and others may not be, depending on the particular policy considered. And each “in play” factor is influenced by the others. An example of this was the discrepancy between VR counselors and administrators in their perception of how well the 1992 VR amendments were being implemented. Moreover, some policies that potentially affect people may or may not be funded; and still others may seem to have no clear or immediate relationship to persons, only to become important later. An example of this is the Fair Housing Act Amendments of 1988, which do not affect current families with young children who are disabled, but will affect their children’s children when they reach adult age. Fifth, significant attention needs to be given to the obstacles to policy analysis knowledge (Nagel, 1990): multiple dimensions on multiple goals, multiple missing information, multiple alternatives that preclude the determination of the effects of each one, multiple and possibly conflicting constraints, and the need for simplicity in drawing and presenting conclusions in spite of all the multiplicity. Techniques, strategies, and methods for overcoming these obstacles represent a significant 21st-century agenda for an outcomes approach to policy evaluation.
Summary In summary, the approach to policy evaluation discussed in this chapter attempts to capture the reality that public policy is not in a steady state and that policy evaluation needs to be responsive to two potential models of policy formulation: the rational comprehensive model and the incremental model (Rossell, 1993). The rational comprehensive model assumes that policymakers establish goals, identify policy alternatives, estimate the costs and benefits of the alternatives, and then select the policy alternative that produces the greatest net benefit. Conversely, the incremental model assumes that policymakers, working with imperfect information, continually adjust policy in pursuit of policy goals that are subject to periodic readjustment or are not clear to begin with. In practice, the incremental model is typically the case, which challenges policy evaluators to incorporate validity, importance, usefulness, and feasibility criteria into their evaluations (Nagel, 1990).
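To make the contrast between the two models concrete, the toy sketch below applies the rational comprehensive model's decision rule: estimate benefits and costs for each alternative and select the one with the greatest net benefit. The alternatives and dollar figures are entirely hypothetical.

```python
# Hypothetical policy alternatives with estimated benefits and costs.
alternatives = {
    "expand supported employment": {"benefits": 4_200_000, "costs": 3_100_000},
    "maintain current services":   {"benefits": 2_000_000, "costs": 1_800_000},
    "voucher-based services":      {"benefits": 3_500_000, "costs": 3_400_000},
}

def net_benefit(option):
    return option["benefits"] - option["costs"]

best = max(alternatives, key=lambda name: net_benefit(alternatives[name]))
for name, option in alternatives.items():
    print(f"{name}: net benefit = ${net_benefit(option):,}")
print(f"Rational comprehensive choice: {best}")
```

The incremental model, by contrast, would revise these estimates and the goals themselves repeatedly as new information arrived, which is why the validity, importance, usefulness, and feasibility criteria discussed next matter.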
Validity. In the context of policy evaluation, validity refers to being accurate and includes: (a) listing the major goals and nongoals of the evaluated policy; (b) encompassing the total set of feasible and nonfeasible alternatives that are capable of being adapted and implemented by the relevant policy; (c) describing the relations between the alternative policies and the goals; and (d) drawing logical conclusions that follow from the goals, policies, and relations.
Importance. Does the evaluation deal with an issue that is considered to be “important” by one or more stakeholder groups? More specifically, does the evaluation include both individual- and organization-referenced performance and value outcomes?
Usefulness. Does the analysis lead to some useful results and does the analysis produce good formative feedback to policymakers, funders, program administrators, and consumers regarding the effect or impact of the policy on people’s lives?
Feasibility. Is the evaluation feasible? Are the goals of the evaluated policy clear; has the policy been implemented, and if so, has sufficient time passed for an effect to occur; has the policy been enforced; are objective outcomes from the policy available; and are there sufficient resources (time, money, expertise) to validly evaluate the policy?
As evident in the three examples presented, policy evaluation does not occur in a vacuum. Rather, there are a number of stakeholders who need to be involved in the evaluation and a number of values that they will impose on an evaluation. Across public policies, there are at least four classes of stakeholders (Newman & Tejeda, 1996): the consumer, including the client, the client’s family, and the party paying for the service; the practitioner; the supervisors and service managers; and the policymakers who set the standards for services and reimbursement. These key stakeholders will insist on using a number of criteria to determine whether societal values are incorporated into the evaluation. Chief among these:
effectiveness, the benefits achieved from alternative public policies
efficiency, keeping the costs down in achieving the benefits
equity, providing the minimum level of benefits or a maximum level of costs across individuals, groups, or places
public participation and decision making by the target group, the general public, relevant interest groups, or other types of decisionmakers whose involvement appeals to our desire to use democratic procedures for achieving given goals
predictability, making decisions by following objective criteria so that similar decisions can be arrived at by others following the same criteria
procedural due process, those who have been unfairly treated are entitled to have notice of what they have done wrong, the right to present evidence, the right to confront their accusers, a decisionmaker who is not also an accuser, and an opportunity for at least one appeal
In his discussion of policy evaluation, Dye (1984, p. 13) makes an interesting comment: “It is questionable that policy evaluation can ever provide solutions to America’s problems.” And should it? It seems reasonable to assume that policy evaluation needs to be tempered by five realities: it is easy to exaggerate the importance, both for good and ill, of the policies of governments; policy evaluation cannot offer solutions to problems when there is no general agreement on what the problems are; policy evaluation deals with very subjective topics and must rely on interpretation of results; there are some inherent limitations in the design of social science research related to comparison groups and the frequent inability to do “the true experiment”; and social problems are sometimes so complex that social scientists are unable to make accurate predictions about the full impact or effectiveness of specific policies. Despite this caveat, it is hoped that this chapter’s discussion of policy evaluation from an outcome-based approach has helped the reader to understand and use policy evaluation from a fresh perspective. Outcome-based evaluation is not antithetical to or inconsistent with policy evaluation. Policies intend to produce outcomes, and our challenge is to capture and report those outcomes from the perspective of the consumer, service provider, and society at large.
Study Questions
1. What is public policy and what is its process? Identify one or more public policies that affect each of the text’s focus areas: regular education, special education, health care, mental health, disabilities, aging, substance abuse, and corrections.
2. Study Figure 5.1. What is the significance of the arrows denoting feedback from the policy evaluation?
3. What is benefit-cost analysis? Summarize in your own words the most important concepts basic to the suggested approach to benefit-cost analysis.
4. Study Table 5.1. Use these three analytical perspectives to “evaluate” three or four of the policies you identified in question 1.
5. Study Figure 5.2. Select four or five data sets that you could use to evaluate the policies you identified in question 4.
6. Visit one or more Web sites identified under “Systems-Level Data Sets.” What factors do you find in reference to person-referenced data criteria: complete (available for all participants), timely (current and covers the period of time you are interested in), and accurate?
7. Relate the three examples presented in the chapter to the Policy Evaluation Model and its process steps (Table 5.2).
8. Why are the policy evaluation guidelines summarized in Table 5.3 important? Relate each of these five guidelines to any one of the three examples presented in the chapter.
9. Review Figure 5.2 (Policy Evaluation Model) and the text regarding the use of national data sets in policy evaluation. What should be their primary use? What do you perceive to be their major strengths and limitations?
10. Review the guidelines presented in the chapter’s last section. Do you see a bias in these guidelines? If so, what is it? What additional guidelines might you suggest?
Additional Readings
Bergman, A. I., & Singer, G. H. S. (1996). The thinking behind new public policy. In G. H. Singer & A. L. Olson (Eds.), Redefining family support: Innovations in public-private partnerships (pp. 435–464). Baltimore: Brookes.
Buck, A. J., Gross, M., Hakim, S., & Weinblatt, J. (1993). Using the Delphi process to analyze social policy implementation: A post hoc case from vocational rehabilitation. Policy Sciences, 26(4), 271–288.
Chambers, D. E. (1993). Social policy and social programs: A method for the practical policy analyst (2nd ed.). New York: Macmillan.
Haskins, R., & Gallagher, J. J. (Eds.) (1997). Models for analysis of social policy: An introduction. Norwood, NJ: Ablex.
Kingdon, J. W. (1995). Agendas, alternatives, and public policies (2nd ed.). New York: HarperCollins.
Stone, D. (1997). Policy paradox: The art of political decision making. New York: Norton.
Benefit-Cost Analysis Sources:
Barnett, S. W. (1993). Benefit-cost analysis of pre-school education: Findings from a 25-year follow-up. American Journal of Orthopsychiatry, 63(4), 500–525.
Cimera, R. E., & Rusch, F. R. (1999). The cost-efficiency of supported employment programs: A review of the literature. International Review of Research in Mental Retardation, 22, 175–225.
Kee, J. E. (1994). Benefit-cost analysis in program evaluation. In J. S. Wholey, H. P. Hatry, & K. E. Newcomer (Eds.), Handbook of practical program evaluation (pp. 456–488). San Francisco: Jossey-Bass.
Reed, S. K., Hennessy, K. D., Mitchell, O. S., & Babigian, H. M. (1994). A mental health capitation program: II. Cost-benefit analysis. Hospital and Community Psychiatry, 45(11), 1097–1103.
Rogers, E. S., Sciarappa, K., MacDonald-Wilson, K., & Danley, K. (1995). A benefit-cost analysis of a supported employment model for persons with psychiatric disabilities. Evaluation and Program Planning, 18(2), 105–115.
Rossi, P. H., Freeman, H. E., & Lipsey, M. W. (1999). Evaluation: A systematic approach (6th ed.). Thousand Oaks, CA: Sage.
II Outcomes: Their Selection, Measurement, and Analysis
Don’t do anything that is not consistent with your culture.
PETER DRUCKER (1988)
Outcomes research is typically defined as any research that attempts to link either structure or process, or both, to the outcomes from education, health care, or social service programs. Focusing on outcomes allows outcome-based evaluators to:
Determine the contributions that education, health care, and social services make to the lives of program recipients.
Assist rational choices for consumers among services provided.
Develop benchmarks and practice guidelines.
Improve the effectiveness and efficiency of education, health care, and human service programs.
Place evaluation promoters and stakeholders at the center of program reform.
Provide outcome-based information to program personnel, funders, and policymakers to assist in programmatic improvement and change.
The attractiveness of outcomes research is that outcomes can be used for these purposes plus more. The questions, however, are which outcomes to use, how they can best be measured, and how they can be analyzed. These three questions provide the framework for this second section of the book. The selection, measurement, and analysis of outcomes cannot be separated from two critical concepts: the reform movement and alignment. The reform movement, with its emphasis on both accountability and quality, is
influencing not just the accountability and continuous improvement efforts discussed throughout Part I, but it is also affecting the selection, measurement, and analysis-interpretation of person- and program-referenced outcomes. The concept of alignment requires adjusting the approach one takes to outcome-based evaluation to the accountability and quality dimensions of the reform movement. Therefore, the program evaluation and methodological pluralism model presented in Figures 1.2 and 2.1 will continue to provide the conceptual and practical basis (that is, “alignment”) in Part II. Three chapters compose Part II. In Chapter 6, the reader will find a discussion of the critical aspects of the reform movement along with 10 criteria that are appropriate to use in selecting outcome measures. The chapter also discusses a number of person- and organization-referenced outcomes for each of the book’s targeted populations: education, health care, and social services (including mental health, disabilities, aging, substance abuse, and corrections). Each set of proposed outcomes is discussed in reference to the requirements of the reform movement. The chapter concludes with a composite listing across the target populations of potential outcomes that reflect the cells of Figures 1.2 and 2.1: organizational performance outcomes, organizational value outcomes, individual performance outcomes, and individual value outcomes. The measurement approaches to outcome assessment and appraisal presented in Chapter 7 are based on the evaluation standards (performance measurement or value assessment) and evaluation focus (individual or organization) presented in Figures 1.2 and 2.1. These standards and foci are expanded in Chapter 7 to include eight specific measurement and assessment approaches. For performance assessment, the two approaches include effectiveness and efficiency; for consumer appraisal, satisfaction and fidelity to the model; for functional assessment, adaptive behavior and role status; and for personal appraisal, person-centered or health-related quality of life. These eight measurement approaches have emerged over the past decade in response to the increased need for accountability and continuous improvement; devolution, homogenization, and downsizing of government; the dramatic growth in interest in quality measurement; the rise of consumerism; and the quality of life movement. Outcome-based evaluation requires data and its analysis, and that means that both outcome-based consumers and producers need to be familiar with techniques to analyze and interpret person- and program-referenced outcomes. Throughout Chapter 8, I stress the importance of understanding the relationship among the questions one asks, the evaluation design one uses, the data one collects to answer the questions, and how those data are collected and managed so that the questions asked and the outcomes selected and measured flow logically to analysis and interpretation. Chapter 8 also discusses a number of critical factors that affect the interpretation and application of out-
come-based results, including internal validity, clinical significance, attrition, and a number of contextual variables. It is also important to point out what is not covered in these three chapters. First, the reader will not find a listing of specific instruments that have been developed to measure outcomes. There are just too many, and rather than listing the plethora, I will discuss selection criteria and measurement standards that either the evaluation consumer or producer can use. Second, the proposed outcomes are not exhaustive; rather, they reflect the current state of outcomes research in the targeted areas and meet the 10 selection criteria discussed at the beginning of Chapter 6. What is considered in the chapter, however, is the rationale for why certain outcome measures make more sense than others. In that regard, each outcome measures section will begin with a brief rationale and overview of the impact of the reform movement on the respective targeted population (education, health care, and social service field). Third, in the area of health care, individual clinical conditions or illnesses are not considered, and the reader is referred to the literature on specific clinical conditions (e.g., cancer, epilepsy, coronary heart disease) for condition-specific outcome measures. Throughout these three chapters, the reader will want to think about answering three fundamental questions asked of any outcome-based evaluation: What outcomes do I select; how can they best be measured; and how can they be analyzed and interpreted? By the end of Chapter 8, the answers should be obvious.
6 Selecting Outcomes
OVERVIEW 127
The Reform Movement 128
Accountability Dimension 129
Quality Dimension 131
Selection Criteria 134
Outcome Measures: Regular Education 136
Outcome Measures: Special Education 137
Outcome Measures: Health Care 139
Outcome Measures: Mental Health 141
Outcome Measures: Disabilities 144
Outcome Measures: Aging 148
Outcome Measures: Substance Abuse 150
Outcome Measures: Corrections 152
Generic Outcome Measures 155
Summary 156
Study Questions 157
Additional Readings 158
Verbs, all of them tiring. CHARLES FRAZIER (1997, p. 92)
Overview The popularity of the term “outcomes” should be quite apparent at this point in our study. The end of the twentieth century saw a plethora of outcome indicators and measures that are a bit mind-boggling and overwhelming at times. Hence, one of the major purposes of Chapter 6 is to present both a rationale and a set of criteria that are appropriate for selecting the potential outcome measures presented later in the chapter.
The material presented in this chapter is organized around three basic points: First, outcomes should meet specific selection criteria that reflect the need discussed in Part I for accountability and continuous improvement. Second, outcomes are not merely “picked from a hat.” Rather, there needs to be a rationale for the selection of specific outcomes. A rationale based on the reform movement and the changing evaluation strategies discussed in Chapter 1 precedes the list of potential outcome measures for each of the text’s targeted areas: education, health care, and social services. And third, the selection of outcomes should be consistent with the evaluation model presented in Figures 1.2 and 2.1. Such a model is presented in Figure 6.1, which summarizes the four outcome measure selection categories around which the potential outcomes are discussed in this chapter and presented in tabular fashion in Tables 6.2–6.10. The chapter is divided into four sections: (1) the reform movement; (2) outcomes selection criteria; (3) rationale and potential outcomes for each targeted area; and (4) a general summary and listing of outcome measures that are commonly used across the targeted areas. In reference to the targeted populations, I have organized them thusly: education (regular and special), health care, and social services (mental health, disabilities, aging, substance abuse, and corrections).
The Reform Movement The 1990s have seen a significant change in how we view education, health care, and social service programs, including their purpose, character, respon-
sibility, and intended outcomes. I refer to this significant change as “the reform movement,” whose major characteristics include (Mawhood, 1997): focusing on outputs rather than inputs; being driven by goals and not by rules and regulations; redefining clients as customers; decentralizing authority; using market rather than bureaucratic mechanisms; catalyzing public, private, and voluntary sectors; empowering citizens; and introducing private finance. A recent example of the reform movement was the passage by Congress on August 7, 1998, of the Workforce Investment Act (WIA, P.L. 105-220). The WIA provides the framework for a reformed national workforce and employment system designed to meet the needs of the nation’s employers, job seekers, and those who want to further their careers. Title I of WIA specifies core performance and customer satisfaction measures. Chief among the performance outcome measures are rates related to: entering employment, six-month job retention, average earnings change in six months, credential attainment, skill attainment, diplomas or equivalent attainment, and long-term job retention. Customer satisfaction measures are based on standardized surveys that, along with the core indicators of performance, promote continuous improvement. As it relates to outcome-based evaluation, the reform movement, as reflected in the WIA, has two dimensions: accountability and quality. The accountability dimension involves performance-based assessment and results-based accountability; the quality dimension involves consumer satisfaction, benchmarks, and practice guidelines. The impact of these two dimensions on OBE is shown diagrammatically in Figure 6.2, which indicates clearly that the major impact of the reform movement is on the selection of outcome measures and the methodological pluralistic approach to their measurement.
Accountability Dimension Performance-Based Assessment
Current accountability initiatives seek to improve management, increase efficiency and effectiveness, and improve public confidence in government (Newcomer, 1997; Wholey, 1997). For example, the Government Performance and Results Act (GPRA) passed in 1993 provides a legislative base for many of the most important reform efforts, asking agencies to articulate goals and to
report results achieved through three processes: strategic plans, performance plans, and program performance reports (Wholey, 1997). 1. Through strategic planning each agency is to develop a mission statement covering the agency’s major functions and operations; establish and periodically update long-term goals and objectives, including outcome-related goals and objectives; describe how those goals and objectives are to be achieved; describe how annual program performance goals will be related to the agency’s long-term goals and objectives; and identify key external factors (that is, contextual variables) that can significantly affect the achievement of the long-term goals and objectives. 2. Through annual performance planning each agency is to prepare annual performance plans that will define performance goals for each fiscal year. These plans should include targeted levels of outputs and outcomes to be achieved by key agency programs, the resources and activities required to meet these performance goals, and the establishment of performance indicators to assess relevant program outcomes and compare actual program results with the performance goals. 3. Through annual program performance reports each agency is to report actual program results compared with the performance goals for that fiscal year, report actual program results for prior fiscal years, and explain why any performance goals were not met and what action is then recommended.
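A hedged sketch of the comparison an annual program performance report makes is shown below: actual results are set against the performance goals from the annual plan, and unmet goals are flagged for explanation. The indicators and values are hypothetical and are not drawn from any agency's actual GPRA plan.

```python
# Hypothetical performance goals and actual results for one fiscal year.
performance_goals = {
    "entered employment rate":      0.65,
    "six-month job retention rate": 0.70,
    "credential attainment rate":   0.55,
}
actual_results = {
    "entered employment rate":      0.66,
    "six-month job retention rate": 0.74,
    "credential attainment rate":   0.48,
}

for indicator, goal in performance_goals.items():
    actual = actual_results[indicator]
    status = "met" if actual >= goal else "not met; explanation and corrective action required"
    print(f"{indicator}: goal {goal:.0%}, actual {actual:.0%} -> {status}")
```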
Results-Based Accountability and Budgeting
Analogously, private and public funding agencies are increasing their attention to the use of performance information in the budget process, moving toward full-scale implementation of results-based accountability and budgeting. These efforts are giving increased prominence to value for money, including the responsibility of managers at all levels to make the best use of resources and the need for good output and performance information, both inside departments and for publication. Results-based accountability and budgeting emphasize managerial control over resources, the need to work within tight budgets, the need for improved financial information systems, and the requirement to measure departmental output and performance (Mawhood, 1997; Schorr, 1997; Wholey, 1997).
Quality Dimension As discussed in Chapter 1, current education, health care, and social service programs are being impacted significantly by two phenomena: the movement toward assessing the value and quality of respective programs on the basis of consumer satisfaction, and the development of new models of service delivery that reflect the devolution of government, the homogenization of services, and the community-based movement in mental health and social services. Consumer Satisfaction
Contemporary discussions of quality outcomes are no longer grounded in the industrial-regulatory perspective wherein quality was defined as conformity with regulation and specification (Wren, 1979). In contrast, current definitions of quality are rooted in the postindustrial, knowledge-based society. The worldwide growth of service economies and the information revolution have elevated the importance of customer service. Because these services change over time, they need to be flexible to accommodate the ever-changing consumer and his or her demands (Gardner, 1999). For example, Albrecht (1993) suggests that “quality in the 21st century must start with the customer, not with the tangible product sold or the work processes that create it...a profound change in focus, from activities to outcomes” (p. 54). In reference to the development of new models, the challenge is the same: how to evaluate the organization-referenced outcomes of programs that are changing the way they deliver services. Examples include community-based programs, public-private partnerships, managed care, capitation, sole proprietorships, and brokered case management. These two phenomena—the movement toward assessing the value and
quality of respective programs on the basis of consumer satisfaction and the development of new models of intervention and service delivery—have impacted outcome-based evaluation in three significant ways. First, they have provided a catalyst to the development of consumer appraisal and to the postmodernist’s emphasis on responsive, constructivist evaluation (Guba & Lincoln, 1989). Second, they emphasize the need to incorporate organization and individual value outcomes in outcome-based evaluation. And third, they have led to the development of benchmarks and practice guidelines. Benchmarks
Benchmarks are being used for comparison purposes, and are being developed and used out of the desire for organizations to achieve quality outcomes (Camp, 1989; Center for the Study of Social Policy, 1996; Schorr et al., 1994). The quest in benchmarking is to search for and, once found, to understand the underlying process that is responsible for generating the consistently superior results (Kinni, 1994). Although defined differently across the topical areas of the text, benchmarking is best defined as, the disciplined search for best practice. In contrast to conventional library research, benchmarking is conducted not by studying research literature but by identifying the organizations that are the best at what they do, determining what it is that makes them successful, and figuring out how to adapt their practices so that an organization can do better. (Tucker & Codding, 1998, p. 25)
Practice Guidelines
The concern for quality outcomes within a devolution environment has also resulted in the development of practice guidelines that reflect quality programming and intervention. Although not replacing Codes of Ethics and Ethical Standards, the current practice guidelines tend to focus on best practices in planning, performing, and evaluating. Within each area, one tends to find the following common themes across disciplines and programs (Moos & King, 1997):
Planning: professional context and regulatory requirements, assessment, program development, resources, group and member preparation, consultation and training with network organizations, and professional development
Performing: self-knowledge, group competencies, group plan adaptation, therapeutic conditions and dynamics, collaboration, diversity, and ethical surveillance
Evaluation: mission statement, intended goals and objectives, monitoring of process, and measurement of outcomes
In summary, the accountability and quality dimensions of the reform movement are impacting each of the topical areas covered in this text: education, health care, and social service programs. In some of these areas (for example, education and health care), the impact has been to shift the focus more toward the accountability dimension; in others (for example, disabilities and aging), the impact has been to shift the focus toward the quality dimension. In still others (for example, substance abuse and corrections), the pendulum is still swinging. What is definitely true, however, is that the reform movement has spurred the development of “accountability systems” to evaluate the outcomes from specific agencies and programs. Included here is a list of such agencies accessible on the Internet.
The National Committee for Quality Assurance (NCQA) has developed successive versions of the Health Plan Employer Data and Information Set (HEDIS) in order to gauge the performance of health plans from both a purchaser and consumer perspective. They have also just introduced a new, consumer-friendly HMO report card that provides consumers and employees with corporate information about health plans.
The Center for Mental Health Services has developed a Consumer Oriented Health Report Card as part of its Mental Health Statistics Improvement Program (MHSIP).
The American Managed Behavioral Healthcare Association has developed Performance Measures for Managed Behavioral Health-Care Programs (PERMS) that organize performance measures into access, consumer satisfaction, and quality care domains, and include measures of service utilization, cost, penetration rates, call abandonment rates, and consumer satisfaction with access to clinical care, efficiency, and effectiveness.
The Joint Commission on Accreditation of Healthcare Organizations (JCAHO) is currently integrating acceptable measures into its accreditation process under a single performance measurement umbrella. These performance measures are expected to include clinical, patient care, health/functional status, satisfaction, and administration and organization.
The Committee on National Statistics of the National Academy of Sciences (NAS) National Research Council is developing performance measures for states and the federal government to use in assessing progress toward objectives for public health programs in the areas of chronic diseases, disabilities prevention, emergency medical services, mental health, rape prevention, and substance abuse <www.facct.org>.
The federal Health Care Financing Administration (HCFA) is in the process of designing a quality indicator system for the Medicaid Home and Community-Based Waiver Program.
Selection Criteria There are several criteria for selecting specific person- or organization-referenced outcomes. The following 10, which are summarized in Table 6.1, are based on the work of Gettings (1998), Gettings and Bradley (1997), Rusch and Chadsey (1998), Schalock (1999), and Schorr (1997).
Acceptable to Promoters and Stakeholders: It is important to remember the orientation and expectations of the different key players in outcome-based evaluation. These persons need to “buy into” the measures selected so as to enhance the evaluation’s validity and ensure the utilization of its results. This “buying in” is facilitated if the outcome measures selected can be assessed in terms of relevancy to one’s life and ease of measurement, communication, and interpretation.
Conform to Established Psychometric Standards: Outcome measures need to be measured reliably and validly. One needs to remember also that outcomes need to be standardized, clear, objective, quantitative, and accurate. A thorough discussion of standardization procedures, reliability, validity, and norms is presented in Chapter 8.
Affordable: The outcomes selected need to be “affordable” to the program in terms of time, money, and necessary expertise. In that sense, the collection of information regarding outcomes must be relatively easy and straightforward and not pose undue hardships or burdens on personnel. Affordability and the program’s “evaluability index” (see Figure 2.2 and Exhibit 2-1) tend to be significantly related.
Timely: The outcome measures selected need to cover the evaluation period desired. This requires a functional management information system and the ability to collect and analyze the data in a timely fashion. Today’s questions cannot be answered by yesterday’s data.
Reflect Major Individual or Organization Goals: Person-referenced outcomes need to be related directly to the person’s goals and aspirations. Analogously, organization-referenced outcomes need to be related to the organization’s mission statement and the services, interventions, or supports provided.
Connected Logically to the Program or Intervention: Outcomes need to reflect what the program does and what the program has control over.
Otherwise, there is no way to relate results back to the organization or the intervention provided. This is especially true for performance enhancement and continuous improvement efforts.
Can Be Evaluated Longitudinally: Behavior changes slowly, and therefore the outcome measure(s) selected need to be assessed at different points in time (remember our earlier discussion of short-term, intermediate, and long-term outcomes). A number of ways to approach longitudinal evaluations are presented in Chapter 8.
Accurate: The outcome measures selected need to be accurate and reflect the actual events and characteristics under investigation. As discussed in Chapter 8, procedures need to be in place to validate one’s data and to ensure that the error rate is within acceptable limits.
Culturally Sensitive: Outcomes need to be sensitive to minority group and cultural factors. Primary factors to consider include: (1) all individuals develop in a cultural context; (2) culturally based values, norms, and behaviors are transmitted from one generation to the next via overt and covert processes of socialization and are adaptive to the demands of the local environment; (3) many aspects of culture are abstract in that they are not overtly or intentionally socialized; and (4) culture is evidenced in patterns and social regularities among members of a population and within the larger ecological context (Hughes, Seidman, & Williams, 1993).
Multidimensional: Behavior is not singular, and neither should the selected outcome measures be. The challenge of good outcome-based evaluation methodology is to select those person-referenced behaviors that reflect the complexity of the human condition, and the program-referenced outcomes that reflect the accountability and continuous improvement demands placed on education, health care, and social service programs.
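As one concrete check on the psychometric standards criterion above, the sketch below computes Cronbach's alpha, a common index of internal-consistency reliability, for a small matrix of respondent-by-item ratings. The data are hypothetical, and reliability is only one of the relevant standards; validity, standardization procedures, and norms are discussed in Chapter 8.

```python
import numpy as np

# Hypothetical ratings: rows are respondents, columns are items of a
# single outcome scale (e.g., a satisfaction measure).
ratings = np.array([
    [4, 3, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [3, 3, 2, 3],
    [4, 4, 5, 4],
])

def cronbach_alpha(scores):
    """Internal-consistency reliability of a k-item scale."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

print(f"Cronbach's alpha = {cronbach_alpha(ratings):.2f}")
```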
Outcome Measures: Regular Education

During the past decade there has been a clear shift in educational policy from an emphasis on the process of education to concerns about the desired outcomes of schooling and standards against which schools and education-related outcomes can be judged (Tucker & Codding, 1998). Those undertaking educational reform and restructuring efforts are increasingly looking to outcomes research and evaluation to document the results of their efforts. A large part of their motivation is to develop a system of indicators that can inform evaluation promoters and stakeholders of the educational health of the country in much the same way as key economic indicators inform one about the economic health of the country (Vanderwood et al., 1995). Reflective of this movement are the eight goals stated in Goals 2000: Educate America Act (U.S. Department of Education, 1996): readiness for school; higher graduation rates; improved achievement; professional development; having a high rank in the world in math and science; adult literacy and life-long learning; safe, disciplined, and drug-free schools; and increased parent involvement. The requirement for educational accountability is encompassed in the guidelines that states must follow to obtain Goals 2000 funds for reform.

There have been two major impacts of the reform movement in education on outcome-based evaluation: a focus on performance-based assessment at the individual level and the use of peer groups at the systems level. Performance-based assessment involves the demonstration of educationally related anticipated outcomes, and calls for the demonstration of understanding and skills in applied, procedural, or open-ended settings. Common attributes or elements of this performance-based approach include (Baker, O'Neil, & Linn, 1993): use of open-ended tasks, focus on higher order or complex skills, employment of context-sensitive strategies, and use of complex problems requiring several types of performance. The approach consists of either individual or group performance, and may involve a significant degree of student choice. In addition to other uses, performance-based assessment in education can be used for different kinds of accountability, including assigning grades, evaluating education-related outcomes, and determining educational achievement. At the systems level, educational peer groups have been used as benchmarks for a number of purposes, including comparison, increasing quality outcomes, serving as a basis for strategic planning, and improving one's competitive edge.

Potential outcome measures for regular education are summarized in Table 6.2. They are aggregated into the four measurement approaches depicted in Figure 6.1. The outcomes presented in Table 6.2 are based on the work of Blank (1993), Keith and Schalock (1994), Tucker and Codding (1998), and Vanderwood et al. (1995).
Outcome Measures: Special Education

The movement to include (mainstream) students with disabilities as full-time members of general education has been based, for the most part, on constitutional grounds, legal precedents, and ethical considerations (Hunt et al., 1994). Considerable attention has been placed on developing outcome measures for these students that represent the goals of special education; reflect the constitutional grounds, legal precedents, and ethical considerations on which the movement is founded; and reflect the need for greater accountability in special education. In 1989, for example, the National Council on Disability agreed with this latter need when it stated,

The time has come to ask the same questions for students with disabilities that we have been asking about students without disabilities. What are they achieving? Are they staying in school? Are they prepared to enter the work force when they finish school? Are they going on to participate in postsecondary education and training? Are they prepared for adult life? (p. 2)
Reflective of this need, the National Center on Educational Outcomes (NCEO) was funded to work with states and federal agencies to develop a conceptual model of educational outcomes for students with disabilities. The outcome classes and indicators developed through the work of NCEO include eight domains (Ysseldyke, Thurlow, & Gilman, 1993; Rusch & Chadsey, 1998):

1. Presence and Participation—opportunities for physical presence as well as active and meaningful participation in school and the community by all individuals.
2. Accommodation and Adaptation—availability and use of adjustments, adaptive technologies, or compensatory strategies that are necessary to achieve outcomes.
3. Physical Health—extent to which the individual demonstrates or receives support to engage in healthy behavior, attitudes, and knowledge related to physical well-being.
4. Responsibility and Independence—extent to which the individual's behavior reflects the ability to function, with appropriate guidance or support, independently or interdependently, and to assume responsibility for oneself.
5. Contribution and Citizenship—individual gives something back to society or participates as a citizen in society.
6. Academic and Functional Literacy—use of information to function in society, to achieve goals, and to develop knowledge.
7. Personal and Social Adjustment—individual demonstrates socially acceptable and healthy behaviors, attitudes, and knowledge regarding mental well-being, either alone or with guidance and support.
8. Satisfaction—a favorable attitude is held toward education.
Special education is also facing increasing questions about what happens to students with special needs following graduation. Therefore, outcome-based evaluation also considers the postgraduation and transitional status of these students. Desired postgraduation outcomes relate to the individual's employment, living/residential status, and quality of life (Halpern, 1993; Phelps & Hanley-Maxwell, 1997; Rusch & Chadsey, 1998). Potential measures that will allow evaluators to determine in-school, transition, and postschool outcomes for persons with disabilities are summarized in Table 6.3. The measures found in Table 6.3 are based on the work of Bruininks, Thurlow, and Ysseldyke (1992), Hunt et al. (1994), Meyer and Evans (1993), Rusch and Chadsey (1998), Schalock et al. (1992), Vanderwood et al. (1995), Ysseldyke, Thurlow, and Shriner (1992), and annual reports to the U.S. Congress on implementation of the IDEA. The eight NCEO outcome classes and indicators listed above are incorporated into the respective measurement approaches found in Table 6.3.
Outcome Measures: Health Care

There are a number of ways to think about outcome measures in health care. For example, one might consider the structure of the current health delivery system with its emphasis on managed care, health maintenance organizations, capitation, and patients' "bill of rights." One might also consider the comprehensiveness of health care that includes the aspects of structure that allow for health education, personal preventive services, diagnostic and therapeutic services, and rehabilitative and restorative services. Finally, one might consider that health is a state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity (World Health Organization, 1997).

Today, health care is almost synonymous with managed care. The term managed care refers to a collection of strategies for containing cost and utilizing care-related services and includes strategies for controlling costs and improving access that focus on primary care and prepaid arrangements as an alternative to traditional, fee-for-service based, retrospective reimbursement of costs. Managed health care can be as simple as gatekeeping structures that screen access to care, or as complex as systems that coordinate all of the various types of acute and long-term care that may be needed by an individual through contracts for the provision of this care at a reduced cost. Managed care may also control direct access to specialists and reduce costly emergency and inpatient care by increasing preventive care.

The movement toward managed care in the United States has had a number of significant impacts on the field of outcome-based evaluation. Among the most important:

It has irreversibly altered the relationship between patients and their health care providers. Some view it as the answer to the explosive growth of health care costs and as a way to ensure a more efficient use of limited health care resources.

At the same time, managed care has demanded increased vigilance and sophistication from consumers. The typical managed care enrollee is faced with charting a path through various plans and restrictions, gaining an understanding of rapidly changing health care relationships, and responding to decisions denying requested care or denying payment for care already received.

Person-referenced outcomes are difficult to track due to multiple providers, (potentially) brief involvement with any one provider, and potential provider instability.

The primary outcomes focus is on the accountability dimension of the reform movement, with more emphasis currently being placed on efficiency and effectiveness as opposed to consumer satisfaction. It is for this reason that we are currently (early 2000) seeing increased calls for a "patients' bill of rights."

The development of performance standards and practice guidelines to ensure adequate treatment and maximize potential person-referenced outcomes.
There are also a number of factors about health-related outcomes that need to be kept in mind before selecting particular ones for inclusion in an outcome-based evaluation. Chief among these are (Donabedian, 1992; Iezzoni et al., 1994):

Outcomes do not directly assess the quality of performance; they only permit an inference about the quality of the process and the structure of care. The degree of confidence in that inference depends on the strength of the predetermined causal relationship between the process and outcome, and the modification due to factors other than health care.

Because the relationship between process and outcome is a probability, it is necessary to collect an appropriately large number of cases before one can infer whether care is better or worse or meets specified standards.

Outcomes have the important advantage of being "integrative." They reflect the contributions of all those who provide care, including the contributions of patients to their own care.

Outcome measurement requires specification of the appropriate "time window," which is the time when outcome differences caused by degrees of quality in health care are most manifest.

There is a definite relationship between health-related outcomes and conditions prior to health-related interventions. This is especially true of the more chronic conditions such as cancer, acquired immunodeficiency syndrome, chronic pulmonary disease, coronary artery disease, congestive heart failure, peripheral vascular disease, severe chronic liver disease, diabetes mellitus with end organ damage, chronic renal failure, nutritional deficiencies, and dementia.

The current renaissance in outcomes measurement in the health care sector is largely driven by a rethinking of general health and health-related quality of life measurement.
General Health

Rosenblatt and Attkisson (1993) suggest three health domains necessary to achieve the requisite breadth for defining general health status: clinical status, functional status, and life satisfaction and fulfillment. Similarly, Donabedian (1992) suggests a number of outcome measures that can be used to evaluate health care. These include: clinical symptoms; physiological-biochemical functions; physical, psychological, social, and psychosocial outcomes; and integrative outcomes.
Health-Related Quality of Life

Health-related quality of life (HRQOL) is currently being used to reflect how well an individual functions in daily life and what their perceived well-being is. Within the HRQOL literature, there is a consensus that health-related quality of life is a multidimensional phenomenon encompassing the following core dimensions: general satisfaction and feelings of well-being, physiological state/symptoms of illness, neuropsychological functioning, interpersonal relationships, performance of social skills, and economic and employment status.
These general and HRQOL outcomes, along with organization-referenced measures, are summarized in Table 6.4. The tabled potential outcomes are based on the work of Donabedian (1982, 1992), Iezzoni et al. (1994), Pulcini and Howard (1997), and Walsh and Kastner (1999).
Outcome Measures: Mental Health

Today's mental health system—referred to increasingly as behavioral health care—is being impacted by a number of significant factors including prospective payment schemes, capitation, managed care, community placements, the supports paradigm, assertive community treatment, and electronic medical records (Bickman, 1996; Bond et al., 1990; Clifford, 1999; Fonagy, 1999; Graham, 1999; Sederer & Dickey, 1996). Additionally, the system is being transitioned from an institution-based service delivery system to one that is increasingly focusing on active community treatment or partial hospitalization. As the system undergoes these significant service delivery and cost control changes, there is tremendous variation in the transition of programs from fee-for-service (FFS) to capitated payments, from global budgets to capitated payments, and from FFS to global budgets (Masland, Piccagli, & Snowden, 1996). Many states, for example, have capitated Medicaid enrollees (the biggest group of those with mental illness).

Three additional factors influence how one interprets mental health–related outcomes (Manderscheid, 1998; Masland, Piccagli, & Snowden, 1996): (1) restrictions in public programs often limit the type and volume of services that will be paid for and thus frequently prevent the client from getting the most appropriate services; (2) capitated systems can remove these restrictions by combining revenue streams and increasing providers' flexibility to use funds, which may allow providers to reduce service fragmentation and substitute potentially more efficient and effective services; and (3) capitation also passes along to mental health providers greater risk for the cost of services. This creates incentives for providers to alter their practice patterns by replacing, when possible, high-cost services with equally effective low-cost services. Therefore, capitated service providers replace inpatient services with community-based services.

In thinking about factors impacting mental health–related outcomes, the following outcome measurement conceptual framework is helpful (Rosenblatt & Attkisson, 1993):

The impacts of severe mental disorder extend to the family members as well as to society at large. As a result, outcome measurement needs to address and integrate the perspectives of many potential stakeholders.
A severe mental disorder impacts a wide range of social activities. As a result, measures need to address the many social contexts and living environments in which the mental disorder is manifested.

A severe mental disorder is rarely cured and is most often a persistent condition with cyclical improvements and episodic, acute relapses. Thus, measures that assess both the "ebb and flow of symptoms" and the longitudinal changes in behavior and symptoms are required.

A severe mental disorder has consequences for, and interrelationships with, physical health status. Thus, mental health–related outcome measures need to address and capture changes in general health as well as mental health status.

Because of the complexity of the disorder and its manifestation, multidimensional measurement is essential.

The outcomes considered suitable for understanding mental health outcomes for adults have expanded from proximal indicators, such as recidivism and symptom reduction, to more remote indicators such as quality of life and functional ability. These measures are summarized in Table 6.5.

Beginning in the 1980s, children's mental health services have focused on providing family-centered, community-based services (Heflinger, 1992; Rugs & Kutash, 1994). In addition to those potential outcomes listed in Table 6.5, Rugs and Kutash (1994) suggest the following outcomes for children: (1) critical incident indicators such as abuse and neglect, suicide attempts, and sexually abusing others; (2) role performance indicators such as school attendance, peer relationships, and living with family; (3) moderating factors such as feelings toward self, physical health, substance use, and cognitive abilities/performance; and (4) organization outcomes such as reduced family burden, costs (institution versus community based), service utilization patterns, supports, and fidelity to the service delivery model.

A number of potential adult and children's mental health outcome measures, along with organization-referenced outcomes, are summarized in Table 6.5. The tabled outcomes are based on the work of Bachrach (1996), Bickman (1996), Calsyn et al. (1995), Campbell (1996), Chandler et al. (1996), Cook and Jonikas (1996), Dickerson (1997), Goodman et al. (1997), Lamb (1996), Lehman, Postrado, and Rachuba (1993), Lyons et al. (1997), McGlynn (1996), and Srebnik et al. (1997).
Outcome Measures: Disabilities

The 1990s saw a surge in the rethinking and redrafting of policy related to disability. The Americans with Disabilities Act of 1990, the reauthorization of the Individuals with Disabilities Education Act in 1991, and the 1992 Rehabilitation Act Amendments comprise a body of antidiscrimination legislation and service priorities that emphasize greater access to services and full involvement of individuals with disabilities in community life and service delivery. These laws were written through the collaborative efforts of people with disabilities, parents, professionals, and elected officials. This collaboration reflects an atmosphere of social activism, which has sought to empower persons to have greater control over their lives, the services they need, and their level of inclusion in the broader community.

The field of disabilities is also experiencing a change in the way people with disabilities are viewed. In this regard, the relationship among terms such as pathology, impairment, functional limitations, and disability is being rethought (Institute of Medicine, 1991; World Health Organization, 1997). For example, "pathology" is associated with an abnormality at the cellular or tissue level, such as tuberous sclerosis. The pathology then produces an "impairment" at the organ or organ system level, such as brain dysfunction. The brain dysfunction then produces a "functional limitation" at the organism level, such as low intelligence. However, a functional limitation becomes a "disability" only when it impacts or interferes with the person's social role or functional level. This four-stage conceptualization of the disabling process has a number of implications for outcome-based evaluation. Among the most important:

Disability is neither fixed nor dichotomized; rather, it is fluid, continuous, and changing, with an evolving set of characteristics that depend upon the characteristics and supports available within the environment.

One lessens functional limitations (and hence a person's disability) by providing interventions or services/supports that focus on adaptive behavior and role status.

Outcome-based evaluation focuses on the extent to which the functional limitations have been reduced and the person's adaptive behavior and role status enhanced.

Thus, both practically and conceptually, the field of disabilities is currently undergoing a significant paradigm shift that influences the selection of outcome measures. First, the consumer movement in the field of disabilities is revolutionizing the way education and rehabilitation programs operate and the outcomes that they use to evaluate their interventions, services, or supports. Critical concepts in the movement include choices, inclusion, equity, supports rather than facilities, and consumer-driven evaluation. The net impact is to focus on valued, person-referenced outcomes that reflect choices as to where one lives and works, decisions about what is important in one's life, inclusion in regular community activities and environments, and increased adaptive behavior and role status.

Second, we are also experiencing a change in the way people with disabilities are viewed and served. This change, which is frequently referred to as the supports paradigm, has impacted outcome-based evaluation significantly. Support assessment involves many disciplines working as a team, analyzing a variety of assessment findings to determine the anticipated level of needed support. This anticipated level of support is based on the strengths and limitations of the person and his or her environment, not simply on the individual's diagnosis. Within an individual's support profile, at any given time there is likely to be a varied array of support needs and intensities, which should decrease over time if the intervention or services received are effective. Thus, the intensity of support needs over time has become a major outcome (for either the individual or the organization) that needs to be monitored.

Third, one needs to focus on adaptive behavior and a person's role status that reflect their enhanced functioning. And fourth, person-centered quality of life (PCQOL) has become the unifying theme around which one can organize person-referenced outcomes for individuals with disabilities. There is an emerging consensus that PCQOL is a multidimensional concept that involves at least the following eight core dimensions: emotional well-being, interpersonal relationships, material well-being, personal development, physical well-being, self-determination, social inclusion, and rights (Schalock, 1996).

Table 6.6 lists a number of potential person- and organization-referenced outcome measures for persons with disabilities. The measures listed are based on the work of Bradley (1994), Bruininks, Thurlow, and Ysseldyke (1992), Gardner and Nudler (1997), Gettings (1998), and Schalock (1999).
Outcome Measures: Aging

Social service programs for persons of age are currently being buffeted by four significant trends. First, the life span of people has increased dramatically, raising significant challenges for policymakers and service providers. This increased longevity has occurred within the context of three moral dimensions that constitute powerful frameworks that potentially work against older persons (Taylor, 1989): (1) a sense of respect for and obligation to others that elevates freedom and self-control, places a premium on avoiding suffering, and sees productive activity and family life as central to our well-being; (2) the understanding of what makes a full life and the things we value, such as physical and mental well-being, choices, giving and receiving involvement, and productivity; and (3) a sense of dignity, or commanding the respect of those around us.

Second, as a result of the potential devaluation and depersonalization, there has emerged a strong movement to protect the rights of older persons. These rights include: freedom, independence, and free exercise of individual initiative; an income in retirement to provide an adequate standard of living; an opportunity for employment free from discriminatory practices; an opportunity to participate in the widest range of meaningful civic, educational, recreational, and cultural activities; suitable housing; the needed level of physical and mental health services; ready access to effective social services; appropriate institutional care when required; and a life and death with dignity.

Third, the recent emphasis on successful aging has been useful in focusing attention on environmental factors that can enhance the aging process. Advocates of successful aging emphasize the importance of environmental factors as moderators of the aging process. Common throughout the literature on successful aging are the following principles (Baltes & Baltes, 1990; Raphael, 1996): there is much latent reserve existing among the elderly; knowledge-based interventions can offset age-related declines in cognitive mechanisms; optimal aging occurs under development-enhancing and age-friendly environmental conditions; functioning is enhanced through factors such as an active lifestyle, social supports, socioeconomic status, and minimal medications; and policymakers and service providers need to focus on outcomes that are valued by the culture and the person.

Although there is strong consensus that outcome-based evaluation should be done regarding programs for the elderly, there is little agreement at this point about what to measure (Ebrahim, 1994; Rai & Kelland, 1995; Rockwood, 1995). Schultz and Heckhausen (1996), for example, suggest that outcome-based evaluation focus on outcomes related to physical, cognitive, intellectual, affective, and creative functioning, and social relations. Consistent with this suggestion, and with the work discussed earlier on person-centered and health-related quality of life, a number of potential outcome measures for persons of age are presented in Table 6.7. The tabled potential outcome measures are based on the work of Harbert and Ginsberg (1990), Holstein and Cole (1996), Schalock, DeVries, and Lebsack (1999), Taylor (1989), the United Nations (1991), and Ware et al. (1996).
Outcome Measures: Substance Abuse

Current programs for those who abuse alcohol and other substances are affected in large part by the same four trends discussed earlier for mental health services: community placements, community support and rehabilitation efforts, a focus on adaptive behavior and role status, and a broader conception of what constitutes successful treatment and rehabilitation. It is important to remember, however, that it was during the 1960s that drug addiction was redefined in terms of social policy as a disease rather than a crime, which set the stage for providing public funding to drug treatments through the 1963 Community Mental Health Center Act. However, community treatment for drug addiction did not really begin until 1970, when amendments to the legislation provided funds specifically for drug abuse. This change was taking place during a period when the number of "addicts" was increasing, and the changing sociodemographic profile showed that the problem was growing beyond inner-city neighborhoods and into nonminority populations. The high prevalence of criminal involvement by "addicts" was also a concern, and the existing treatment programs serving criminal justice referrals and civil commitments had a lackluster record. Finally, there were new and untried drug treatment modalities becoming available and claiming effectiveness, such as methadone maintenance (Simpson, 1993).

The past decade has brought radical change to substance abuse treatment. Lengthy inpatient treatment and residential programs are being replaced by community residential facilities (Moos & King, 1997). In addition, outcomes assessment, which was rarely a feature of inpatient treatment, has been integrated into the routine clinical procedures of most treatment programs (Grissom, 1997). The net impact of these changes on the outcome-based evaluation of substance abuse programs and their anticipated outcomes includes: provision of a wide range of interventions that include cognitive, behavioral, pharmacologic, counseling, and combined treatments; treatment in a wide variety of settings that include inpatient, residential, halfway house, day hospital, partial hospitalization, or outpatient; focus on a broad range of outcomes that include both cognitive and behavioral changes; and development of an integrative substance abuse outcomes management system.

Table 6.8 lists a number of potential person- and organization-referenced outcomes related to the evaluation of substance abuse programs. The measures listed are based on the work of Alemi et al. (1995), Floyd et al. (1996), Holcomb, Parker, and Leong (1997), Moos, Pettit, and Gruber (1995), and Rosenheck, Frisman, and Gallup (1995).
Outcome Measures: Corrections

Assessing the effectiveness of criminal justice programs is currently both evolving and undergoing significant change (Bureau of Justice, 1994; Byrne & Taxman, 1994; Heilbrun & Griffin, 1998; Mobley, 1998). As the field of corrections moves from a focus on incarceration and institution-based programs toward more community-based services and prevention/diversion programs, the emphasis on evaluation is also changing. My assumption is that the changes most relevant to the readers of this text probably relate to the shift toward comprehensive community justice, using a social-ecological model in program design, development, implementation, and evaluation (Heilbrun & Griffin, 1998; Hernandez, Hodges, & Cascardi, 1998; McMurran, Egan, & Ahmadi, 1998). Critical to this approach are four groups of factors that affect the potential expression of criminal behavior and thus become the focus of outcome-based evaluation: risk factors and their amelioration, supports and their enhancement, community participation and its increase, and legal involvement and its decrease.

Risk Factors
For each community these can include: percentage of families living at or below the poverty level, rates of domestic violence, presence of gangs, availability of drugs, number of individuals unprepared to enter the workforce because of a lack of job skills, homelessness in the community, rate of substance abuse, violent and physically harmful environments in the community, lack of positive opportunities for involvement, and rate of recidivism in the community. Their impact on crime-related behavior is clear: the more risk factors present in the community and experienced by individuals within the community, the higher the crime rate (Bureau of Justice, 1994).

Supports

There are a number of different types of supports that can play a significant role in potentially reducing criminal behavior. These include (Schalock, 1995b): emotional, which involves expressions of affection, interest, and concern; appraisal, which involves helping people to evaluate and make sense of their troubles and problems; informational, which involves providing advice about how to handle a problem; and instrumental, which involves providing material aid and services. There is currently considerable conceptual and practical interest in the use of natural community-based supports as an efficient and effective way to maximize rehabilitation and correctional services (Heilbrun & Griffin, 1998; Lovell & Rhodes, 1997; Lurigio, 1995; Roberts, 1997).
Community Participation

Evaluating the level and type of community participation of legal offenders is also in its infancy. A list of potentially significant community living and participation outcomes includes (Schalock & Genung, 1993): controlling one's environment and making decisions; contributing to one's community; socializing and visiting with one's neighbors; using one's community as other community members do; spending one's time as others do; living in a healthy and safe place; and being valued and accepted by others.
Legal Involvement

A number of data sets can be used to monitor legal involvement and evaluate the outcomes from community-based correction and prevention/diversion programs. Among the most important:

Law enforcement data. These data provide information on how law enforcement practices influence the criminal justice system, since law enforcement agencies have the greatest impact on the number of offenders entering the system. These data may include arrest and citation practices, traffic-related arrests, drug- and alcohol-related arrests, and charging practices.

Court data. These data are directly related to the number of offenders in the system and how the offender moves through the system. These data are important since the court's philosophy and policies direct the operation of the system. Data may include information on bonding practices, failure-to-appear rate, case processing time, percentage released after first appearance, charges plea bargained to lower charges, type of offenses for which sentences are given, and average length of sentence served by offense.

Corrections data. These data address the components of the system that supervise or restrict the offender's movement, including probation, parole, prison, jail, and other community punishment or program areas. This information can identify individuals whose eventual sentence is community based. Specific data sets can include: (a) probation data, such as numbers receiving probation based on presentence investigations or presentence recommendations, actual number under supervision, average length of supervision, percentage violated, and sanction used when violated; (b) jail data, such as average daily population, total bookings, number of felony and misdemeanor inmates, diversion programs, and number of pretrial and sentenced inmates; and (c) community punishment or program data, such as number of offenders in sanction, accessibility to programming, capacity, cost, eligibility requirements, length of program, and average length of time on a waiting list for service.

Potential measures for evaluating correction programs and policies are summarized in Table 6.9. However, since community-based correction models are just emerging, the organization-referenced outcomes relate more to the developmental or early phases of program development. The tabled potential measures are based on the work of the Bureau of Justice (1994), Byrne and Taxman (1994), Heilbrun and Griffin (1998), Lovell and Rhodes (1997), McMurran, Egan, and Ahmadi (1998), Mobley (1998), and Roberts (1997).
Generic Outcome Measures

By now, the discerning reader will have noted two trends in the potential outcome measures summarized in Tables 6.2 to 6.9: they are not exhaustive, and there is considerable duplication across the eight outcome areas. Neither of these two trends should surprise anyone, for on the one hand, the number of potential outcomes is limited, and on the other, there are obvious desired outcomes that all key players in outcome-based evaluation want to see. Summarizing those is the purpose of this last section of the chapter. Before I do that, however, there are a few additional points that need to be discussed.

First, the potential outcome measures summarized in Tables 6.2–6.9 generally meet the 10 selection criteria discussed earlier in the chapter and summarized in Table 6.1. Thus, in selecting specific outcomes, attention should be given to whether the outcomes incorporated into an evaluation are acceptable to promoters and stakeholders, conform to established psychometric standards, are affordable and timely, reflect major individual or organization goals, are connected logically to the program or intervention, are able to be evaluated longitudinally, and are accurate, culturally sensitive, and multidimensional.

Second, be judicious in the number of outcomes selected. As we will see in Chapters 7 and 8, outcome measurement and analysis take time and are expensive, so select only those measures that relate clearly to two key questions: for what purpose will I use the outcome data, and what data will I need for the intended use? Involving promoters and stakeholders at the evaluation stage of selecting outcomes will definitely help in answering these two questions.

And third, not all potential areas are covered in the eight outcome measurement areas targeted for this text. Some readers, for example, might be interested in determining the effectiveness of psychotherapy; others, the effectiveness of welfare-to-work programs; and still others, the impact of drug education programs. For these readers, I would recommend the following: analyze the respective topical area and program in terms of its current context; review the program or area's goals and objectives; study Tables 6.2–6.9 to determine logical outcomes; evaluate these potential outcomes against the 10 selection criteria found in Table 6.1; and select only those outcome measures that answer the two key questions and that are acceptable to your promoters and stakeholders.

Table 6.10 represents my attempt to develop a set of generic outcomes that are common to the eight targeted areas summarized in Tables 6.2–6.9. I believe these generic outcomes meet the 10 selection criteria and thus potentially have heuristic value to both outcome-based evaluation producers and consumers.
Summary

In summary, the impact that the reform movement is having on each of the topical areas just discussed has considerable potential for each of the key outcome-based evaluation players: promoters, stakeholders, and evaluators. First, consensus on desired outcomes and acceptance of accountability for those outcomes facilitates collaboration among agencies and evaluators, since no single agency or evaluator can accomplish all things for all consumers. Second, the reform movement can strengthen the role of individual programs or local communities in deciding the best strategies to solve local problems and the most effective available resources to use. Third, when service providers are rewarded for improving outcomes with additional resources, they are motivated to make continuous improvements in service delivery that achieve necessary organizational changes and desired outcomes. Fourth, a collective focus on outcomes orients service personnel to the new roles of evaluation participants and evaluation users. And fifth, focusing on outcomes helps clarify whether allocated resources are adequate to achieve both the organization's mission and goals and the expectations of evaluation promoters and stakeholders.

Both producers and users of outcome-based evaluation are greatly dependent on the validity of the outcome measures selected, the reliability with which those outcomes are assessed, and the value that evaluation promoters and stakeholders place on them. Successful outcome-based evaluation is like a three-legged stool that is only as strong as its weakest leg. The purpose of this chapter has been to "prop up" the first leg: the selection of valid outcome measures. Other chapters will deal with the other two legs (measurement and analysis). The important thing to remember is that producers of outcome-based evaluation face a number of issues and choices in planning and evaluating education, health care, and social service programs. Some of the issues are conceptual, some methodological, and some relate to analysis and interpretation. The next chapter addresses the methodology of measurement.
Study Questions

1. What are the differences among the four outcome measurement selection categories discussed throughout the chapter and summarized in Figure 6.1? Consider the program (including educational) that you are currently associated with. State two to three specific outcomes that would reflect performance assessment, consumer appraisal, functional assessment, and personal appraisal.

2. What is the reform movement, and how has it impacted outcome-based evaluation?

3. Interview an education, health care, or social service program administrator. Ask him or her to describe how the reform movement has impacted the respective program.

4. Review the selection criteria summarized in Table 6.1. Describe in your own words why these are so important in outcome-based evaluation.

5. Review Tables 6.2 and 6.3. What are the differences between the potential outcomes for regular versus special education? Justify these differences.

6. If you are a college or university student, review again Table 6.2. How many of these potential outcomes are appropriate to your school? Can you think of others? What evidence do you see on your campus that outcome measures, such as those listed in Table 6.2, are being used? If they are, for what purpose?

7. Compare Tables 6.5 and 6.8. Two facts are frequently noted in the literature: that many persons with mental illness also have problems with substance abuse, and that programs for the two populations are frequently provided by the same facility. Thus, in reference to Tables 6.5 and 6.8, two questions: (a) what are the similarities and differences among the potential outcomes; and (b) how would you justify any differences among the selected outcomes?

8. Assume that you are a person with a cognitive or physical limitation. Which outcomes in Table 6.6 would you be most concerned about? Second, assume you are an administrator. Which outcomes would you then be most concerned about? And finally, assume you are a family member. Which outcomes would you then be most concerned about, or prefer to see in your family member? Compare the three sets.

9. What is your community's and state's attitude toward crime? What changes do you see in the approach they are taking to dealing with criminal offenders? Which potential outcome measures suggested in Table 6.9 best reflect your answers to these two questions?

10. Review Table 6.10. Would you agree that these are generic outcome measures? Why or why not?
Additional Readings

Fonagy, P. (1999). Process and outcomes in health care delivery: A model approach to treatment evaluation. Bulletin of the Menninger Clinic, 63(3), 288–304.
Gardner, J. F., & Nudler, S. (1997). Beyond compliance to responsiveness: Accreditation reconsidered. In R. L. Schalock (Ed.), Quality of life: Vol. II: Application to persons with disabilities (pp. 135–148). Washington, DC: American Association on Mental Retardation.
Hess, A. K., & Weiner, I. B. (Eds.). (1998). The handbook of forensic psychology (2nd edition). New York: John Wiley & Sons.
Manderscheid, R. W. (1998). From many into one: Addressing the crisis of quality in managed behavioral health care at the millennium. Journal of Behavioral Health Services and Research, 25(2), 233–237.
Moos, R. H., & King, M. J. (1997). Participation in community residential treatment and substance abuse patients' outcomes at discharge. Journal of Substance Abuse Treatment, 14(1), 71–80.
Pulcini, J., & Howard, A. M. (1997). Framework for analyzing health care models serving adults with mental retardation and other developmental disabilities. Mental Retardation, 35(3), 209–217.
Schultz, R., & Heckhausen, J. (1996). A life span model of successful aging. American Psychologist, 51(7), 702–714.
Tucker, M. S., & Codding, J. B. (1998). Standards for our schools: How to set them, measure them, and reach them. San Francisco: Jossey-Bass Publishers.
Vanderwood, M. L. (1995). Willing but unable: The search for data on the outcomes of schooling. Journal of Disability Policy Studies, 6(1), 23–50.
7

Measuring Outcomes

OVERVIEW
Psychometric Measurement Standards
Reliability
Validity
Standardization Group
Norms
Performance Assessment
Effectiveness
Efficiency
Consumer Appraisal
Satisfaction
Fidelity to the Model
Functional Assessment
Adaptive Behavior
Role Status
Personal Appraisal
The Concept of Quality of Life
The Assessment of Quality of Life
Summary
Study Questions
Additional Readings

The only meaning of ideas is found in terms of their possible consequences.
WILLIAM JAMES
Overview

Chapter 6 introduced the reader to potential outcomes and their selection criteria. In this chapter, I discuss how these outcomes can be measured. The chapter introduces the reader to the "what" and "how" of performance measurement and value assessment. The "what" is shown in Figure 7.1, which is an extension of the evaluation model shown previously in Figures 1.2, 2.1, and 6.1. As shown in Figure 7.1, the evaluation standards (performance measurement or value assessment) and evaluation foci (individual or organization) are consistent with Figures 1.2, 2.1, and 6.1. However, each measurement category cell is expanded in Figure 7.1 to reflect the specific measurement focus discussed in this chapter: for performance assessment, the two foci are effectiveness and efficiency; for consumer appraisal, satisfaction and fidelity to the model; for functional assessment, adaptive behavior and role status; and for personal appraisal, person-centered or health-related quality of life.

The eight measurement foci summarized in Figure 7.1 have emerged over the past decade in response to those factors discussed thus far in the text. Among the more important of these factors:

Need for accountability in education, health care, and social service programs.

Use of outcomes data for continuous program/service improvement.

Devolution, homogenization, and downsizing, which have led to the need for best practices, standards, benchmarks, and report cards.

Dramatic growth of interest in quality measurement. Consumers want information to guide choices among service providers, and purchasers (private or government) want to ensure accountability for quality in a competitive environment.

Rise of consumerism and the desire for quality products. In this context, quality includes the concepts of appropriateness, access, acceptability, and satisfaction.

The quality of life movement, with its focus on quality products and person-referenced outcomes.
The measurement "how" requires that one operationalize the measurement foci identified in Figure 7.1 by assessing specific outcome indicators via specific measurement techniques. This process is summarized in Figure 7.2. As indicated in Figure 7.2, operationalizing the approach to performance measurement or value assessment is best considered a two-step process: first, specific outcome indicators are identified; and second, specific measurement techniques are used to quantify the outcome indicator(s).

In reference to the first step of identifying outcome indicators, two examples related to functional assessment and personal appraisal are shown in Tables 7.1 and 7.2. Table 7.1, which is adapted from the work of Vanderwood et al. (1995), presents a list of potential student-referenced educational outcomes and associated outcome indicators. Similarly, Table 7.2, which is based on the work of Schalock (1999), lists core quality of life dimensions and associated outcome indicators.

In reference to the second step, a number of specific measurement techniques can be used to quantify the selected outcome indicators. Later sections of the chapter discuss the following techniques:

For performance assessment, goal attainment scaling, report cards, and cost-effectiveness analysis.

For consumer appraisal, satisfaction surveys and fidelity to the model standards.

For functional assessment, adaptive behavior measures and role status measures.

For personal appraisal, quality of life attitude and rating scales.
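One way to keep this two-step logic explicit in an evaluation plan is to record, for each measurement focus, the indicators selected and the technique that will quantify each indicator. The mapping below is a minimal sketch: the four foci follow Figure 7.1, but the particular indicators and techniques shown are simply examples of the kinds listed in Tables 7.1 and 7.2 and discussed later in the chapter.

```python
# A minimal, illustrative plan structure: measurement focus -> outcome
# indicator -> the technique used to quantify that indicator.
measurement_plan = {
    "performance assessment (effectiveness)": {
        "graduation rate": "goal attainment scaling",
        "service access": "report card",
    },
    "consumer appraisal (satisfaction)": {
        "satisfaction with services": "satisfaction survey",
    },
    "functional assessment (role status)": {
        "employment status": "role status measure",
    },
    "personal appraisal (quality of life)": {
        "emotional well-being": "quality of life rating scale",
    },
}

for focus, indicators in measurement_plan.items():
    for indicator, technique in indicators.items():
        print(f"{focus}: quantify '{indicator}' via {technique}")
```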
In thinking about these measurement techniques, with their associated outcome categories and outcome indicators, a frequently asked question is probably running through your mind: "Which specific indicators should I select to measure?" In addition to the 10 criteria for selecting outcomes discussed in Chapter 6 (Table 6.1), the following should be considered (Smith et al., 1997): (1) outcome assessment should be appropriate to the application or questions being answered; (2) tools for assessing outcomes should have demonstrated validity and reliability and must be sensitive to important behavioral change over time; (3) outcomes assessment should include the consumer's perspective and responses; (4) outcome assessment systems should place a minimal burden on the respondent; (5) outcomes assessment should be multidimensional; and (6) outcomes should be initially assessed and reassessed at clinically or personally meaningful points in time.

An overview of the measurement approaches to outcome assessment and appraisal is presented in Table 7.3. Due to the voluminous nature of the available scales and questionnaires, specific instruments will not be included in this chapter, except as part of examples or exhibits. What the reader will find instead is an overview of each approach, along with relevant developmental principles and psychometric standards that are important to both the evaluation producer and user. I begin this chapter with a brief discussion of four psychometric measurement standards that should guide the measurement process. The remaining four sections of the chapter correspond to the four measurement foci shown in Figure 7.1: performance assessment, consumer appraisal, functional assessment, and personal appraisal. Each section begins with a brief summary of the current Zeitgeist ("mood of the time") and rationale for the respective measurement approach. Thereafter, the "how to's" are discussed. Examples are provided along with appropriate guidelines regarding the development of the respective measures, their administration, and their interpretation.
Psychometric Measurement Standards

Standardized instruments are the typical way by which outcomes are measured. Hence, such instruments are used frequently to assess a number of aspects about people: their educational level, functional level, occupational or career interests, clinical status, values, personal orientations, symptomatology, and academic achievement or aptitudes. Standardized instruments are also used to measure organizational outputs such as goal achievement, financial status, consumer satisfaction, and model standards. Himmel (1984) suggests three criteria for choosing a standardized instrument: psychometric quality, a graded response format, and utility. To these, I would add the following four essential psychometric standards: reliability, validity, standardization group, and norms.

Reliability
Check the instrument’s manual (or the method section of the report) about the type of reliability determined for the test or measurement instrument used, and the magnitude of the reliability coefficient. The most common types of reliability are test-retest (administer the measure to the same individuals at Time 1 and Time 2), split-half (correlation of the total score on the first half of a measure with the total score on the second half), odd-even (correlation of the total score of the odd-numbered items of a measure with the total score on the even-numbered items), item-total (an average of the correlations of the score on each item on a measure with the total score on the measure), and inter-rater (correlation of ratings made by one observer of behavior with ratings made by a second observer). Reliability coefficients should generally be within the .80 to .85 range for the test or measure to be considered reliable or consistent.
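For readers who want to check such coefficients themselves, the sketch below computes a test-retest coefficient (a Pearson correlation between Time 1 and Time 2 scores) and an odd-even coefficient adjusted with the Spearman-Brown correction, the usual adjustment when a half-length correlation is used to estimate full-length reliability. All scores are invented for illustration.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for the same ten people assessed at Time 1 and Time 2.
time1 = [12, 15, 9, 20, 17, 11, 14, 18, 10, 16]
time2 = [13, 14, 10, 19, 18, 12, 13, 19, 9, 17]
test_retest = pearson(time1, time2)

# Odd-even reliability: correlate the two half-scores, then apply the
# Spearman-Brown correction to estimate full-length reliability.
odd_half  = [7, 8, 4, 10, 9, 6, 7, 9, 5, 8]
even_half = [5, 7, 5, 10, 8, 5, 7, 9, 5, 8]
r_half = pearson(odd_half, even_half)
corrected = 2 * r_half / (1 + r_half)   # Spearman-Brown formula

print(f"test-retest r = {test_retest:.2f}; odd-even (corrected) = {corrected:.2f}")
```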
Validity

Again, check the manual or procedure section of the report to determine if the test's or measure's validity has been determined. Common forms of validity include content (do the measurement items measure what the test or measure purports to measure?), construct (do the items measure the underlying construct being studied?), predictive (do the person's test results actually predict anything, such as future living or work status or performance?), and concurrent (are the current results consistent with a second, independent measure of the behavior or output under consideration?). Validity coefficients should generally be within the .60 to .70 range for the test or measure to be considered valid.

Standardization Group

Here, you want to check the manual or procedure section to determine if the assessment is to be used for comparative purposes and, if so, on whom the test or measure was developed and standardized. One needs to be very careful not to use norms for comparisons that were developed on individuals dissimilar to those on whom the measurement is being made. This is an essential requirement in today's multicultural and consumer-oriented world.

Norms

The comparative standards, or norms, are essential to understand and should be described and presented in the manual or procedure section. Again, be sure that the normative sample is comparable to the group being evaluated; otherwise, comparisons will be meaningless and fraught with practical and legal issues. For quality of life assessment, the person should be his or her own comparison.
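When an instrument is used for comparative purposes, the normative sample's mean and standard deviation are what turn a raw score into a standing relative to that group. The sketch below shows the standard z-score conversion; the normative values and raw score are hypothetical, and, as noted above, the result is meaningful only if the normative sample resembles the people actually being assessed.

```python
from statistics import NormalDist

def z_score(raw, norm_mean, norm_sd):
    """Standing of a raw score relative to a normative sample."""
    return (raw - norm_mean) / norm_sd

# Hypothetical norms for a standardized functional-assessment scale.
NORM_MEAN, NORM_SD = 100.0, 15.0

raw = 112
z = z_score(raw, NORM_MEAN, NORM_SD)
percentile = NormalDist().cdf(z) * 100   # assumes the norms are roughly normal

print(f"raw = {raw}, z = {z:.2f}, approximate percentile = {percentile:.0f}")
# raw = 112, z = 0.80, approximate percentile = 79
```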
Performance Assessment

Effectiveness

Effectiveness assessment measures are designed for making comparisons: comparisons over time within a provider group or organization; comparisons between providers, provider groups, or organizations; or comparisons to a performance goal. Since the 1960s the use of periodic measurement of program effectiveness has gained a foothold throughout the world and is reflected most recently in the United States in the passage of the Government Performance and Results Act (GPRA) of 1993 (P.L. 103-62) and the Government Management Reform Act of 1994 (P.L. 103-356). Under the GPRA, for example, each agency is to have strategic plans, performance plans, and program performance reports. In addition to increasing accountability, the major stimuli for developing these measures are: the rising costs of education, health care, and social services; the resulting reimbursement limitations imposed by taxpayers or third-party payers; and the increasing need to understand, document, and report outcomes for the purpose of program change and improvement.

This section of the chapter outlines two approaches to effectiveness measurement: goal attainment scaling (GAS) and report cards. Although goal attainment scaling was developed initially for the evaluation of treatment outcomes (Kiresuk, 1973; Kiresuk & Sherman, 1968; Posavac & Carey, 1980), it shows considerable promise in evaluating program-referenced outcomes and is therefore useful in performance assessment. Similarly, even though report cards are just now beginning to be used in performance assessment, they are discussed in this section because of their current popularity and probable future use.

Goal-Attainment Scaling
Goal-attainment scaling is a technique for measuring a program's effectiveness. It involves three steps: (1) state the goal or anticipated outcome from the program or service intervention; (2) define operationally the anticipated outcome in objective, measurable terms; and (3) measure the level of attainment of the goal or anticipated outcome.

STATE THE GOAL OR ANTICIPATED OUTCOME. For each of the text's targeted groups, common goals include:

Education: proficiency in reading, writing, math, and science; graduation rates; personal competencies
Special Education: inclusion, adaptive behaviors, postsecondary status
Health Care: clinical improvement and perceived wellness
Mental Health: symptom reduction, recidivism, safety/welfare, medication level, living/employment status, legal involvement
Disabilities: independence, productivity, community integration
Aging: activities and instrumental activities of daily living; living arrangement; longevity; health status
Substance Abuse: reduced drug usage, living/employment status, mental health or legal involvement, functional level, role status, health status
Corrections: recidivism, involvement with the legal system, employment status
DEFINE OPERATIONALLY THE ANTICIPATED OUTCOME(S). This second step requires one to think about the results of the program or intervention. Mager (1962) describes a six-step process that works well for this second step: (1) write down the anticipated outcome; (2) jot down in words and phrases the performance that denotes that the outcome is achieved; (3) sort out the jottings and delete duplications and unwanted items; (4) repeat the first three steps for any remaining abstractions ("fuzzies") considered important; (5) write a complete statement for each outcome indicator describing the nature, quality, or amount considered acceptable; and (6) test the statement with the question: if someone achieved or demonstrated each of these outcome indicators, would I be willing to say they have achieved the outcome? When the answer is yes, the outcome is defined operationally, and then one moves to measuring the outcome.

MEASURE THE OUTCOME LEVEL. The third step involves measuring the outcome level, realizing that different degrees of success (that is, outcome levels) are possible. The usual procedure is to use a 5-point rating scale such as: 5 (most favorable) to 3 (expected) to 1 (most unfavorable). If one wishes, different weights can be assigned to each outcome evaluated. These weights are determined subjectively and can typically reflect priority outcomes in a strategic plan.

An example of the use of goal-attainment scaling in the evaluation of a community-based program's attempt to establish group homes is shown in Exhibit 7-1. In addition to simply evaluating the presence/absence of group homes, this assessment considered that the potential consumers' awareness was important and the present consumers' acceptability critical. Therefore, outcome levels were assessed in all three areas. Thus, consistent with the methodology outlined in Figure 7.2, indicators of each potential outcome were established: for availability, the number of group homes; for awareness, the percentage of persons surveyed who knew of the services; and for acceptability, the percentage of consumers who approved of the services. The respective levels on each outcome category were then assessed pre- and postdevelopment. If desired, these data can be converted to standard scores for further comparisons, but certain assumptions are necessary to do so (Kiresuk, 1973; Kiresuk & Lund, 1978).
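For readers who want to see how weighted attainment ratings can be combined into a single summary score, the sketch below follows the T-score formula usually attributed to Kiresuk and Sherman (1968). The ratings, weights, and assumed inter-scale correlation are all hypothetical, and the 1-to-5 ratings used in this chapter are first re-centered to the -2 to +2 metric that the formula expects.

```python
from math import sqrt

def gas_t_score(ratings, weights, rho=0.3):
    """Kiresuk-Sherman style summary of goal-attainment ratings.

    ratings: 1-5 ratings as used in this chapter (3 = expected level)
    weights: subjective importance weights, one per goal
    rho:     assumed average inter-correlation among the scales
             (0.3 is a conventional default, but it is an assumption)
    """
    x = [r - 3 for r in ratings]                      # re-center to -2..+2
    numerator = 10 * sum(w * xi for w, xi in zip(weights, x))
    denominator = sqrt((1 - rho) * sum(w * w for w in weights)
                       + rho * sum(weights) ** 2)
    return 50 + numerator / denominator

# Hypothetical post-development ratings for the three outcomes discussed above
# (availability, awareness, acceptability), with awareness weighted less heavily.
print(round(gas_t_score(ratings=[4, 3, 5], weights=[2, 1, 2]), 1))   # about 66
```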
Report Cards

Accreditation bodies, government agencies, and private organizations are developing performance measures or standards in the form of report cards as measures of a program's effectiveness. This new field of performance assessment is increasingly being used as a mechanism for comparing key quantifiable aspects of performance across a range of health care (Carpinello et al., 1998; Dickey, 1996) and behavioral/mental health programs (Dewan & Carpenter, 1997; Teague et al., 1997). Typical variables used in report cards include: complaint ratio, meeting industry standards, measured satisfaction, tracking clinical or behavioral condition(s), monitoring overall quality, focusing on prevention and screening, and reporting overall results and outcomes.

Although widely used, report cards raise a number of questions regarding their development and use. Among the more important (Hibbard & Jewett, 1996) are: (1) the type of information consumers want to see in the report card; (2) achieving consensus on a core battery of measures to use as a standard for a national report card; and (3) striking a balance between systems-level outcomes (e.g., service access, service appropriateness, administrative efficiency) and individual-level outcomes (e.g., wellness, skill acquisition, community integration).
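As a simple illustration of how such report-card variables can be assembled, the following sketch compares a program's values on a few indicators against benchmark standards. The indicators, values, and benchmarks are hypothetical and are not drawn from any of the report cards cited above.

```python
# Hypothetical report-card indicators; benchmark values are illustrative only.
indicators = {
    "complaint ratio (per 1,000 served)":   {"program": 4.2,  "benchmark": 6.0,  "higher_is_better": False},
    "measured satisfaction (% satisfied)":  {"program": 87.0, "benchmark": 85.0, "higher_is_better": True},
    "prevention/screening completed (%)":   {"program": 72.0, "benchmark": 80.0, "higher_is_better": True},
}

def report_card(indicators):
    """Return (name, program value, benchmark, status) rows for each indicator."""
    rows = []
    for name, v in indicators.items():
        meets = (v["program"] >= v["benchmark"]) if v["higher_is_better"] \
                else (v["program"] <= v["benchmark"])
        rows.append((name, v["program"], v["benchmark"], "meets" if meets else "below"))
    return rows

for name, program, benchmark, status in report_card(indicators):
    print(f"{name:38s} {program:6.1f} {benchmark:6.1f}  {status}")
```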
Efficiency

Efficiency assessment generally equates to cost-effectiveness analysis, in that the costs and outcomes of alternative programs with similar goals are used to assess the programs' relative efficiency. Note, however, that a comparison (real or hypothetical) is required for a valid cost-effectiveness analysis. Cost-effectiveness analysis combines the performance assessment of outcomes with the costs incurred by the program, intervention, or service.

There are a number of guidelines and cautions to consider before embarking on cost-effectiveness analysis. First, the programs or interventions being compared need to have similar goals and equivalent clientele. Second, a matched-pairs design (see Chapter 4) can be used to generate the comparison groups, but the matching variables need to correlate significantly with the outcome variables. Third, if programs or services cannot be matched successfully on the relevant variables, an analysis of covariance can be performed that statistically controls for any initial differences. Fourth, even with good matching, one needs to be aware of the major threat to the matched comparison design: regression artifacts, or the tendency for scores to drift or regress over time toward their respective population means. Fifth, the determination of costs can be problematic.

Rudolph and his colleagues (1998) recently published an example of cost-effectiveness analysis of a community behavioral support and crisis response demonstration project. The demonstration project reflects a number of community behavioral support and crisis response programs that have been developed recently to respond to the needs of individuals with disabilities and challenging behaviors without resorting to institutional placements. These programs share a number of characteristics, including interdisciplinary approaches, an emphasis on preventive services, and the provision of short-term residential alternatives to psychiatric hospitalization and/or institutional placement (Rudolph et al., 1998). The particular Special Services Program (hereafter The Program) compared in this analysis provided two basic types of services: (1) outreach services in the individual's home, workplace, school, or other community setting; and (2) short-term (that is, 90 days or less) crisis placement services in a specialized unit. Services involved multidisciplinary assessment, and intervention focused on nonaversive responses to and functional analyses of challenging behaviors.

Cost-effectiveness analysis of The Program involved comparing program expenditures with estimates of the increases or decreases in service expenditures that would have arisen in the absence of the program. Case managers familiar with the clients and alternative program options were asked to describe the most likely service for each individual had The Program not been available. Scenarios were described in terms of the services that would have been purchased and the number of budget units (days, hours) for those services. The costs of these projected outcomes were computed using current average catchment-area costs from the state service payment files. These alternative cost estimates were then compared to the development and operational costs of The Program. The results of the cost-effectiveness analysis are summarized in Exhibit 7-2.

In summary, performance assessment involves comparisons: comparisons over time within a provider group or organization; comparisons between stated goals and objectives and those actually obtained; or comparisons of the costs and outcomes of two or more alternative programs. The methods of goal-attainment scaling, report cards, and cost-effectiveness analysis described in this section represent only three of the many techniques that can be used to determine whether programs are meeting their goals and objectives, and whether they are doing so with some degree of efficiency. (Other methods can be found in Chelimsky and Shadish, 1997, and Rossi, Freeman, & Lipsey, 1998.)
Exhibit 7-2. Example of Cost-Effectiveness Analysis

As noted in the text, estimates of the cost-effectiveness of the Special Services Program [The Program] were based on projected changes in service dispositions of the 54 program recipients studied. Projected outcomes were obtained through interviews with each participant's case manager. Dispositions were expressed in terms of the types and amounts of program and professional services needed. Related expenditures were then estimated from average area expenditures for those services based on state payment files.

Although there was no better source of projected outcomes for individuals in the absence of The Program than those individuals' case managers, it was also important to validate their projections. This was done through follow-up on the 14 individuals who were unable to access The Program because they were not from the five-county priority service area served by The Program. Actual fiscal comparisons provided strong support for the validity of the case managers' projections of likely service scenarios and expenditures in the absence of The Program.

The cost-effectiveness analysis indicated that the net projected increase in expenditures for participants in the absence of The Program was over $722,000. Based on projected expenditures for alternate services and established costs for development and operation of The Program for the comparison year, expenditures for program participants were $282,172 less than the cost of what the services would have been without the availability of The Program.

Source: Reprinted with permission from Rudolph et al. (1998).
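The arithmetic behind this kind of comparison is straightforward. The sketch below mirrors the logic of Exhibit 7-2 with hypothetical case-manager scenarios and unit costs: projected expenditures for the most likely alternative services are summed and compared with the program's own cost. The services, units, and dollar figures are illustrative only and do not reproduce the Rudolph et al. (1998) data.

```python
# Hypothetical case-manager projections: (service, budget units, cost per unit)
# describing what each participant would most likely have received without the program.
projected_scenarios = {
    "participant_01": [("psychiatric hospital day", 30, 650.00),
                       ("outpatient therapy hour", 20, 90.00)],
    "participant_02": [("institutional placement day", 90, 310.00)],
}

program_cost = 41_500.00  # hypothetical development + operating cost for these participants

projected_alternative_cost = sum(units * unit_cost
                                 for scenario in projected_scenarios.values()
                                 for _service, units, unit_cost in scenario)

net_difference = projected_alternative_cost - program_cost
print(f"Projected cost of alternative services: ${projected_alternative_cost:,.2f}")
print(f"Program cost:                           ${program_cost:,.2f}")
print(f"Net difference (savings if positive):   ${net_difference:,.2f}")
```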
Consumer Appraisal Current education, health care, and social service programs are being impacted significantly by two phenomena. One is the movement toward assessing the value and quality of respective programs on the basis of customer satisfaction; the second is the development of new models of service delivery that reflect the devolution of government, the homogenization of services, and the community-based movement in mental health, disabilities, aging, substance abuse, and corrections. As discussed previously, quality outcomes are no longer grounded in the industrial-regulatory perspective wherein quality was defined as conforming
with regulation and specification (Wren, 1979). In contrast, current definitions of quality are rooted in the postindustrial, knowledge-based society. The worldwide growth of service economies and the information revolution have elevated the importance of customer service, and because these change over time, services must be flexible to accommodate the consumer (Gardner, 1999). Albrecht, for example, suggests that, “quality in the 21st century must start with the customer, not with the tangible product sold or the work processes that create it. This is a profound change in focus, from activities to outcomes” (1993, p. 54). This customer focus is not limited to the private sector. Osborne and Gaebler (1993) furthermore suggest that customer-driven services in all sectors have a number of advantages since they force service providers to be accountable to their customers, depoliticize the choice of provider services, stimulate more innovation, give people choices among different kinds of services, waste less because they match supply to demand, empower customers to make choices, and create greater opportunities for equity. In reference to the development of new models, the challenge is the same: how to evaluate the organization-referenced outcomes of programs that are changing the way they deliver services. Examples include community-based programs such as active community treatment in behavioral health care, supported living and employment programs in disabilities, in-home living assistance for the elderly, community-based programs and supports for people who abuse drugs, community-based diversion programs for legal offenders, and systems-level case management. These two phenomena–the movement toward assessing the value and quality of respective programs on the basis of customer satisfaction, and the development of new models of intervention and service delivery–challenge users and producers of outcome-based evaluation to use consumer appraisal methods to assess their programmatic outcomes. Although there are a number of techniques that one can use to conduct consumer appraisal, two will be considered in this section of the chapter: satisfaction and fidelity to the model. Both of these techniques employ rating scales, attitude scales, or questionnaires such as those summarized in Table 7.4. Rating scales are a series of ordered steps at fixed intervals used to rank people’s judgments of objects, events, or other people from low to high or from poor to good. Examples include (Sommer & Sommer, 1997): Graphic rating scales, wherein the person places a checkmark somewhere along the scale to indicate his or her evaluation (for example, “terrible” to “excellent”). Step scales, which require the rater to select one of a graded series of levels (for example, how would you rate the quality of : excellent through terrible).
Comparative rating scales, wherein the person is asked to compare one or more phenomena (for example, compared to your condition before admission, would you say that your condition is: much better, somewhat better, no different, worse, much worse?).

Although easy to construct and answer, the major problem with rating scales is that they may be neither reliable nor valid. Hence, if one chooses to use a rating scale, it is incumbent on the user to demonstrate at least test-retest reliability and either content or construct validity. A related problem is a lack of realization that rating scales tend to be at the ordinal level of measurement and should not be used in statistical analyses as though they were either interval or ratio.

Attitude scales are a special type of questionnaire designed to produce scores indicating the intensity and direction (for or against) of a person's feelings about an object or event. There are several types of attitude scales, but the two most frequently used are the Likert type and the Semantic Differential. Likert-type scales present lists of statements on an issue to which the respondent indicates degree of agreement, using categories such as strongly agree, agree, undecided, disagree, strongly disagree. A Likert scale contains only statements that are clearly favorable, neutral, or clearly unfavorable, and generally uses either a 3-, 5-, or 7-point scale. The major advantage of the Likert scale is that one can score items as 5, 4, 3, 2, or 1 (if using a 5-point scale) and thus be able to analyze the data. The Semantic Differential is a procedure for measuring the meaning of concepts, wherein the person is asked to rate an object or concept along a series of scales with opposed adjectives at either end (for example, quiet/noisy, dangerous/safe, sad/happy). This procedure is good for measuring the connotative meaning of things (Keith, Heal, & Schalock, 1996; Osgood, May, & Miron, 1975). Research on the Semantic Differential has found three major categories of connotative meaning: value (for example, good/bad, ugly/beautiful, friendly/unfriendly, wise/foolish), activity (for example, fast/slow, active/passive, energetic/inert, excitable/calm), and strength (for example, weak/strong, large/small, hard/soft, heavy/light).

Attitude scales pose the same advantages and disadvantages as rating scales. In addition, they reflect the fact that people's opinions on a topic are complex and multidimensional. Thus, as with rating scales, users of attitude scales need to report the basis on which the items were generated, the field test of the items, and their demonstrated reliability and validity.

Questionnaires are a series of written questions on a topic about which the respondent's opinions are sought. They are a frequently used tool in outcome-based evaluation to gather information about people's beliefs, attitudes, and values. There are two general aspects to a questionnaire: content (i.e., the subject matter) and format (i.e., its structure and appearance). The items (i.e., outcome indicators) of a questionnaire can be generated from a number of sources and via a number of techniques, such as focus groups and empowerment evaluation techniques (Whitney-Thomas, 1997). It is critical that the items be referenced to the outcome(s) being evaluated.

The format can include either open- or close-end questions. With open-end questions, the respondent is asked such questions as, "What do you like (or dislike) about ______?" Open-end questions are desirable when the evaluator does not know all the possible answers to a question; when the range of possible answers is so large that the question would become unwieldy in multiple-choice format; when the evaluator wants to avoid suggesting answers to the respondents; and when the evaluator wants answers in the respondent's own words. With close-end (multiple-choice) questions, the respondent is asked to choose among alternatives provided by the evaluator (for example, "What do you think about the availability of services from ______?"). Potential answers can then be scaled using a Likert-type scale such as described above. Close-end questions are desirable when there is a large number of respondents and questions; when the answers are to be scored by machine for further analyses; and when responses from several groups are to be compared (Sommer & Sommer, 1997).
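To illustrate the scoring mechanics just described for Likert-type items, the brief sketch below scores a small closed-end questionnaire, reverse-scoring unfavorably worded items so that a higher total consistently indicates a more favorable response. The items and responses are hypothetical.

```python
# Hypothetical 5-point Likert items (1 = strongly disagree ... 5 = strongly agree).
items = [
    {"text": "Staff treat me with respect.",              "reverse": False},
    {"text": "I often wait too long for an appointment.", "reverse": True},
    {"text": "I would recommend this service to others.", "reverse": False},
]

def score_likert(responses, items, points=5):
    """Reverse-score unfavorable items, then return (total, per-item scores)."""
    scored = []
    for r, item in zip(responses, items):
        scored.append((points + 1) - r if item["reverse"] else r)
    return sum(scored), scored

total, per_item = score_likert([4, 2, 5], items)
print(per_item, total)   # [4, 4, 5] 13
```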
Questionnaires have many of the same advantages and disadvantages as rating and attitude scales. In addition to the need to demonstrate test-retest reliability and content validity, the potential user should be sensitive to the following limitations. First, questionnaires may be of little use with respondents who are limited in either receptive or expressive language. Second, the responses offered may be superficial and biased toward a positive or socially desirable response (a definite problem with some elderly persons, some persons in the criminal justice system, some students, and some persons with disabilities). Third, questionnaires strike many respondents as impersonal, mechanical, and demeaning, and the response categories as limited, artificial, and constraining. Fourth, questionnaires are not suitable for examining deeper levels of motivation or opinions on complex issues (an interview might be better).

With these limitations in mind, one needs to consider a number of factors when evaluating questionnaire items (Sommer & Sommer, 1997): (1) Is the question necessary, and how useful will the answers be? (2) Is the item clear and unambiguous? (3) Is the respondent able to answer the question? (4) Will the respondent be willing to answer the questions asked? (5) Is the item as short as possible, while remaining clear and concise? (6) Do the response options provide a comprehensive choice of responses, such as very negative to very positive? (7) Is the answer likely to be affected by social desirability; if so, can the question be altered to reduce this bias? (8) Are the questions balanced so that the number of favorable items equals the number of unfavorable items?
Satisfaction

Measuring customer satisfaction with organization-referenced outcomes requires a range of methods such as those just described, both because the focus of services/interventions differs and because people define quality differently. Typically, rating or attitude scales are used to conduct customer value analysis (CVA), which tracks customer loyalty and satisfaction. The key question to ask, however, is: which outcomes should be measured via a satisfaction measure? Based on work in both private (Parasuraman, Zeithaml, & Berry, 1988) and public (Etter & Perneger, 1997; Schalock, 1986, 1995a) programs, the following are suggested outcomes to use as distinct dimensions of service quality or program-referenced outcomes:

tangibles: the appearance of physical facilities, equipment, and people
reliability: the ability to perform the promised service dependably and accurately
responsiveness: the willingness to help customers and provide prompt service
assurance: the knowledge and courtesy of employees and their ability to convey trust and confidence
empathy: the caring and individualized attention provided to customers
availability: the presence of needed services/interventions
affordability: the ability to procure the services/interventions
awareness: the degree to which the customer is aware of the potential services/interventions
accessibility: the ability to access the services/interventions (e.g., transportation, barrier-free environments)
extensiveness: the variety of services/interventions offered
appropriateness: the degree of correspondence between the needs of the customer and the available services/interventions

There are both advantages and disadvantages to using customer satisfaction as a measure of consumer appraisal. Its advantages include (Schalock, 1999): (1) it is a commonly used aggregate measure of organizational outputs; (2) it demonstrates a trait-like stability over time; (3) there is an extensive body of research on level of satisfaction across populations and service delivery recipients; and (4) it allows one to assess the relative importance of valued organization-referenced outcomes. Its major disadvantages include its limited utility for smaller group comparisons, the tendency of some consumers to show response perseveration or bias, its subjective nature, and its lack of correlation with objective measures. Hence, consumer appraisal should also include other approaches to outcome measurement, such as the one discussed in the following section on fidelity to the model.
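When satisfaction items are keyed to the service-quality dimensions listed above, dimension scores can be computed by averaging the relevant items across respondents. The following sketch shows one way to do this; the item-to-dimension mapping and the responses are hypothetical and cover only a subset of the dimensions.

```python
from statistics import mean

# Hypothetical satisfaction items keyed to service-quality dimensions;
# responses are on a 5-point satisfaction scale.
item_dimension = {
    "facility_appearance": "tangibles",
    "service_delivered_as_promised": "reliability",
    "prompt_help": "responsiveness",
    "staff_courtesy": "assurance",
    "individualized_attention": "empathy",
    "could_get_to_services": "accessibility",
}

responses = [  # one dict per respondent
    {"facility_appearance": 4, "service_delivered_as_promised": 5, "prompt_help": 3,
     "staff_courtesy": 5, "individualized_attention": 4, "could_get_to_services": 2},
    {"facility_appearance": 3, "service_delivered_as_promised": 4, "prompt_help": 4,
     "staff_courtesy": 4, "individualized_attention": 5, "could_get_to_services": 3},
]

# Pool item responses by dimension, then report the dimension means.
dimension_scores = {}
for item, dim in item_dimension.items():
    dimension_scores.setdefault(dim, []).extend(r[item] for r in responses)

for dim, scores in dimension_scores.items():
    print(f"{dim:15s} mean = {mean(scores):.2f}")
```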
Fidelity to the Model

With the significant shifts to case management/broker services, community-based programs, and the supports model, a critical component of consumer appraisal is the evaluation of how well the respective model is meeting its objectives and providing the value promised by its advocates (McHugo et al., 1999). In considering the evaluation of a program's fidelity to the model it is promoting, four critical standards need to be met (Bryant & Bickman, 1996): (1) an explicit description of the model; (2) the use of theory to guide the evaluation; (3) the use of methodological pluralism to evaluate the results; and (4) a statement of the criteria for assessing the quality of the outcomes. These four fidelity-to-the-model standards are listed in Table 7.5 and described in more detail below.

Description of the Model
In reference to the first standard, the model needs to be explained in terms of its rationale, intended services/interventions, and anticipated outcomes. A systems approach to such description is desirable, in which the input (mission statement and goals), process (services, interventions), and outcomes (both person- and organization-referenced) are described.

Use of Theory
Hypotheses about adequate implementation and expected outcomes should flow from a program theory (Chen & Rossi, 1983; Peterson & Bickman, 1992) that allows one to formulate testable hypotheses about the model’s components. Peterson and Bickman (1992), for example, suggest that these hypotheses can be tested using either traditional hypothesis-testing analyses or matching approaches comparing actual with ideal (model) service provision. Frequently, concept mapping (Trochim, 1989) is useful in generating these hypotheses or criteria for evaluating the congruence between ideal and actual. In addition, one needs to specify clearly the intended outcomes of the model. Such intended outcomes will be model-specific. Methodological Pluralism
Methodological pluralism combines qualitative and quantitative methods. Depending upon the model being evaluated, these methods might include such techniques as questionnaires, naturalistic observation, chart reviews, logs, and interviews. Criteria for Assessing Outcomes
It is important to remember that the perception of quality depends on the values of the observer. As stated by Peterson and Bickman, "quality is abstract. It is that which appeals to expectations and preferences–a statement of goodness or symbolic value that is not easily measured" (1992, p. 166). The
following three examples demonstrate that this fourth step (criteria) is also model-specific. As our first example, Bickman et al. (1995) and Bryant and Bickman (1996) evaluated the case management services at the Fort Bragg Demonstration Project, which provided a continuum of mental health care to military dependent children and adolescents in the Fort Bragg catchment area. In the project, traditional outpatient and acute hospital services were complemented by the immediate services of day-treatment, in-home counseling, therapeutic homes, specialized group homes, and 24-hour crisis management services. Furthermore, a multidisciplinary team planned care for children requiring services more intensive than usual outpatient treatment. The case manager was responsible for organizing the team’s reviews, ensuring that services were arranged, and monitoring children’s progress with respect to their treatment plan. Based on the four fidelity to a model standards summarized in Table 7.5, the following valued outcomes were identified: Treatment planning. Families and surrogate families are full participants in planning; the plan is comprehensive and individualized; and services are provided in the least restrictive, most normative environment that is clinically appropriate. Linkage and coordination. Services are accessible to families; staff are sensitive and responsive to cultural differences; the case manager mediates between the family and providers; the case manager keeps the family informed; the case manager links the family to supportive services; and the case manager helps adolescent clients with transition to adult mental health services. Monitoring and follow-up. Regular follow-up is maintained; and team reviews are conducted in a timely manner. Advocacy: The case manager focuses on the client and family; the parent feels efficacious; and the rights of the family and child are respected. The second example comes from the Core Indicators Project, which focuses on a number of potential outcomes dealing with managed long-term supports for people with disabilities (Gettings & Bradley, 1997). The model is based on person-centered planning and supports reflective of the current disabilities service delivery system, which is characterized by community-based alternatives, the supports paradigm, inclusion, empowerment, and the potential of persons with disabilities. The theory that guides the model is a combination of social constructivism, contextualism, human potential, and the supports paradigm. Methodological pluralism is used to evaluate both consumer outcomes and systems performance. A summary of the key domains assessed
and the criteria used to appraise organizational value-outcomes are found in Table 7.6. As shown, either averages or percentages are used regarding the number of persons who demonstrate each of the desired outcomes. The third example involves the desired outcomes from an assertive community treatment (ACT) model for persons with mental illness. This model reflects the current emphasis on community-based services and supports for persons with severe and pervasive mental illness. Typical goals for ACT programs include (Bond et al., 1990; Test, 1992): (1) service delivery within the individual’s natural environment; (2) assertive outreach; (3) shared caseloads among team members; (4) low staff-to-client ratio; (5) assistance with managing psychiatric symptoms, meeting basic needs, and improved instrumental functioning (e.g. employment, activities of daily living, and social interactions); and (6) ongoing personal, family, and social support. Given these goals, the following desired outcomes can be measured (via either one or more of the consumer appraisal techniques summarized in Table 7.4, or the functional assessment strategies described in the next section): movement of population to the community; services available and provided in the community; adequate community-based staffing (hours, coverage, and staff-to-client ratio); staff communication and planning; involvement of consumers in treatment planning evaluation; coordination with other support systems; demonstrated team approach; and evaluation of community outcomes.
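One simple way to operationalize such a fidelity assessment is to rate a program against each criterion and report the proportion of criteria adequately implemented. The sketch below does this for a few criteria suggested by the ACT goals above; the criteria wording, ratings, and threshold are hypothetical and would need to be grounded in the program theory and criteria a given evaluation actually adopts.

```python
# Hypothetical fidelity criteria, each rated 1-5 by reviewers
# (5 = fully implemented); the threshold is illustrative.
criteria_ratings = {
    "services delivered in natural environments": 5,
    "assertive outreach":                          4,
    "shared caseloads across the team":            3,
    "low staff-to-client ratio":                   2,
    "consumer involvement in treatment planning":  4,
    "coordination with other support systems":     4,
}

THRESHOLD = 4  # rating needed to count a criterion as adequately implemented

met = [name for name, rating in criteria_ratings.items() if rating >= THRESHOLD]
fidelity_index = len(met) / len(criteria_ratings)
print(f"Criteria met: {len(met)}/{len(criteria_ratings)} ({fidelity_index:.0%})")
```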
Functional Assessment Functional assessment involves evaluating the person’s level of functioning across a number of relevant adaptive behavior domains and the role status that reflects one’s enhanced functioning. The most typical formats used in
functional assessments include rating scales, participant observation, and questionnaires. Each attempts to document a person’s functioning across a number of relevant behavioral or clinical domains. To accomplish this, most instruments employ some form of an ordinal rating scale to yield a profile of the individual’s functioning. For example, one might ask (or observe), How frequently do you use health care facilities or what is your current living/ employment status? There are a number of advantages in using functional, more objective assessments of one’s adaptive behavior or role status. First, objective measures can confirm results from the consumer appraisal strategies discussed in the previous section. Second, adding objective measures to personal appraisal measurement overcomes the commonly reported low correlation between objective and subjective outcome measures. Third, the use of functional measures allows for the evaluation of outcomes across groups. Fourth, objective measures provide important feedback to service providers, funders, and regulators as to how they can change or improve their services or interventions to enhance the recipients’ adaptive behavior level or role status.
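As a minimal illustration of turning such ordinal ratings into a functioning profile, the sketch below records hypothetical 3-point ratings across several behavioral domains at admission and at follow-up and reports the change in each; the domains and scoring convention are illustrative rather than those of any particular instrument.

```python
# Hypothetical 3-point functional ratings (2 = does skill independently,
# 1 = does skill with assistance, 0 = does not do skill).
domains = ["self-care", "mobility", "communication", "money management", "community use"]

admission = {"self-care": 1, "mobility": 2, "communication": 1,
             "money management": 0, "community use": 1}
follow_up = {"self-care": 2, "mobility": 2, "communication": 1,
             "money management": 1, "community use": 2}

for d in domains:
    change = follow_up[d] - admission[d]
    print(f"{d:17s} admission={admission[d]}  follow-up={follow_up[d]}  change={change:+d}")

print("profile change total:", sum(follow_up[d] - admission[d] for d in domains))
```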
Adaptive Behavior Adaptive behaviors are those behavioral skills that permit one to adapt successfully to one’s environment. They are commonly the target for assessment and intervention in education, health care, and social service programs. Typically, their assessment includes a 3- to 5-point Likert scale in which the person’s adaptive behavior level is assessed in reference to “does skill independently,” “does skill with assistance,” or “does not do skill.” Such data can be used to reflect the person’s adaptive behavior level upon admission, during, and following program involvement or intervention. A list of the more common adaptive behaviors are listed in Table 7.7 An example of an adaptive behavior scale based on the seven major life activity areas referenced in current federal legislation such as Public Law 95602 (Developmental Disabilities Assistance and Bill of Rights Act) is shown in Exhibit 7-3. This questionnaire, which I developed, has been used in a number of outcome-based evaluation studies related to adaptive behavior changes in mental health (Schalock et al., 1995), mental retardation (Schalock & Genung, 1993), and aging (Schalock, Devries, & Lebsack, 1999) service recipients. The questionnaire is included here primarily for three reasons. First, it reflects the two key steps in outcomes assessment and appraisal: the specification of indicators for each outcome category, and the use of specific measurement techniques (in this case a 3-point Likert scale). Second, it demonstrates that either the consumer or service provider can provide information in adaptive behavior assessment (note in the directions that it asks, “indicate
the current level of assistance that either you or the service provider feel"). Third, it reflects the procedural need to evaluate potential outcomes at various points of a person's involvement in a health or social services program.

Role Status

Role status refers to a set of valued activities that are considered normative for a specific age group. Examples include one's living arrangement, employment setting, educational level, community participation, recreation-leisure patterns, and health status. For youth, attending school is a valued, age-specific activity; whereas for high school graduates and adults, living and working in the community are valued activities. A listing of valued role statuses is found in Table 7.7.

A person's role status is typically evaluated on the basis of longitudinal participant observation. This technique was used in a follow-up study (Schalock et al., 1992), for example, whose purpose was to summarize the current employment and living status of 298 students verified as having either a specific learning disability (SLD) or mental handicap (MH) who graduated between 1979 and 1988 from a rural special education program. Each graduate and/or parent was contacted for the purpose of collecting five sets of role-status-related outcome data: current employment status; employment outcomes (number of weeks employed during the past year, hours worked per week, and wages per hour); work-related benefits; current living arrangement; and primary source of income. Data were also collected on 12 predictor (of outcome) variables. Results indicated that: (1) 77% of the sample were working either full- or part-time; (2) most former students were living semi-independently with parents or roommates, or in supported living arrangements; (3) across all outcome measures, students verified as SLD did better than students verified as MH; and (4) significant predictors of the outcomes included tested IQ, gender, verified handicap, hours in vocational programs, school enrollment, and number of years the resource teacher had taught.

It is also possible to combine the use of adaptive behaviors, role status, and personal appraisal in the same evaluation study. That was the strategy employed in a recent study of the current status and quality of life of mental health service recipients (Schalock et al., 1997). This evaluation reflects a number of concepts discussed thus far in the text. For example, it reflects the challenge faced by administrators of mental health programs to evaluate person-referenced outcomes within the context of a changing paradigm in how we conduct mental/behavioral health research and program evaluation. Part of this changing paradigm is reflected in the use of methodological pluralism and longitudinal evaluation designs to evaluate multiple outcomes related to the person's adaptive behavior and role status in the community. The changing paradigm is also reflected in the emerging broader conception of what constitutes successful mental health treatment outcomes for persons with mental illness, focusing on personal functioning, community adjustment, and quality of life. The study also reflects the fact that person-referenced outcomes from mental health services can be addressed at several levels: personal functioning measures (e.g., current problems, areas of needed assistance, and current employment/living status); community adjustment (e.g., community resources used, social support networks, and activity patterns); and perceived
quality of life. The study involved 120 individuals representing three groups that were matched on age, gender, and educational level. The first group was composed of 40 (20 females and 20 males) psychiatric inpatients, whose average age was 38 years, whose average grade level was 12.1 years, and whose primary DSM-IV diagnoses were schizophrenia and other psychiatric disorders or mood disorders. The second group was composed of 40 (20 females, 20 males) club house clients who were previously inpatients at the same facility. Their average age was 37.7 years, average educational level, 11.7 years, and whose primary DSM-IV diagnoses were the same as Group 1. The third group was composed of 40 (20 females, 20 males) individuals from the adjacent community. They were matched on age, gender, and education level to members of Groups 1 and 2. Although summarized in more detail elsewhere (Schalock et al., 1997), both mental health samples reported significantly more problems than the community sample, especially mood swings, lack of motivation, physical problems, sadness and depression, sleep disturbances, problems with taking medication, and money management. The three groups also differed significantly in their mean scores on self-determination, capacity for independent living, and economic self-sufficiency. The club house group used more community resources, followed by the inpatient and community samples respectively. All three groups used family, friends, and church contacts approximately the same. In reference to social support networks, the mental health samples received less support from co-workers and neighbors. The pattern of support sources indicated that for the inpatient sample, the major support source was the family; for the club house group, friends; and for the community sample, a balance among family, friends, co-workers, and neighbors. For all groups, befriending was the most common support function, with in-home living assistance the second for the two mental health samples. Interestingly, there were no significant differences in the activity patterns among the three groups. As might be expected, both mental health groups had significantly lower assessed quality of life factor scores related to independence, productivity, community integration, and general life satisfaction.
Personal Appraisal Two trends have resulted in the focus on personal appraisal in outcomebased evaluation: the rise of consumerism and the pervasiveness of the concept of quality of life in education, health care, and social services. Although the measurement of a person’s attitudes is involved in consumer appraisal, as we saw earlier in the chapter, increasingly we are seeing personal appraisal used to assess one’s perceived quality of life. The concept of quality of life and its measurement concludes this chapter on measuring outcomes.
The Concept of Quality of Life As one who has studied and written extensively about the concept of quality of life over the past 15 years, it has been my experience that this concept has relevance to persons in education, health care, and social services for the following reasons (Schalock, 1999, 2000): (1) the concept of quality of life is a social construct that is impacting program development and service delivery in the areas of education, health care, disabilities, mental/behavioral health, aging, and substance abuse; (2) the concept is being used as the criterion for assessing individual-referenced valued outcomes from these programs; and (3) the pursuit of quality is apparent at three levels of today’s service programs: persons who desire a life of quality, providers who want to deliver a quality product, and stakeholders (policymakers, funders, consumers) who want quality outcomes. Additionally, it is apparent throughout the world that the concept of quality of life is being used as a: sensitizing notion that gives one a sense of reference and guidance from the individual’s perspective, focusing on the person and the individual’s environment; social construct that is being used as an overriding principle to evaluate person-referenced outcomes and to improve and enhance a person’s perceived quality of life; unifying theme that is providing a systematic framework to apply the quality of life concept. Education, health care, and social services have embraced the concept of quality of life both as a sensitizing notion and as an overarching principle of service delivery. As a sensitizing notion, the concept gives us a sense of reference and guidance from the individual’s perspective, focusing on the person and the individual’s environment. As a sensitizing notion, “quality” makes us think of the excellence or “exquisite standard” associated with human characteristics and positive values, such as happiness, success, wealth, health, and satisfaction; whereas “of life” indicates that the concept concerns the very essence or essential aspects of human existence (Lindstrom, 1994). The concept of quality of life has also become a social construct that is used as an overriding principle to improve and enhance a person’s perceived quality of life. In that regard, the concept is impacting program development, service delivery, management strategies, and evaluation activities in the areas of education (Snell & Vogtle, 1997), health (Coulter, 1997; Jencks, 1995; Jenkins, 1992; Nordenfelt, 1994; Renwick, Brown, & Nagler, 1997), disabilities (Felce & Perry, 1997; Gardner & Nudler, 1997; Keith, Heal, & Schalock, 1996; Schalock, 1996, 1997), mental health (Crimmins, 1994; Lehman, Postrando, & Rochuba, 1993; Schalock et al., 1997), and aging (Schalock, DeVries, &
Lebsack, 1999; Schultz & Heckhausen, 1996). The concept is also central in the areas of participatory action research (Whitney-Thomas, 1997), consumeroriented evaluation (Oliver, 1992), and empowerment evaluation (Fetterman, 1994, 1997). Health-Related Quality of Life
The term "quality of life" was first used in medicine in connection with life-or-death situations (Romney, Jenkins, & Bynner, 1992). Now, however, the evaluation of one's quality of life is routinely used in all kinds of medical decisions and evaluations, with the term "health-related quality of life" (HRQOL) being used to reflect how well an individual functions in daily life and his or her perceived well-being (Coulter, 1997; Hays, Anderson, & Revicki, 1993). Within the HRQOL literature, there is a consensus that quality of life is a multidimensional phenomenon encompassing the core dimensions of general satisfaction and feelings of well-being, physiological state/symptoms of illness, neuropsychological functioning, interpersonal relationships, performance of social skills, and economic and employment status (Coulter, 1997; Faden & Leplege, 1992; Hays, Anderson, & Revicki, 1993; Jenkins et al., 1990; Levine & Croog, 1984; Lindstrom, 1994; Romney, Jenkins, & Bynner, 1992; WHOQOL Group, 1993).

HRQOL data are being used for clinical care, outcome evaluation, and benefit-cost analyses. In clinical care, questions related to quality of life dimensions are typically used to provide topics of discussion between providers and consumers who are making treatment decisions and who participate in choices related to their care (Cramer, 1994). In outcome evaluation, the focus is generally on measuring postintervention functioning in daily life and the person's perceived well-being (Bech, 1993; Ebrahim, 1995). HRQOL data are also frequently used in benefit-cost analyses to evaluate the personal and societal benefits and costs of alternative approaches to treatment (Gellert, 1993; Harris, 1987; Mulkay, Ashmore, & Pinch, 1987; Williams, 1985).

Person-Centered Quality of Life
As with HRQOL, there is an emerging consensus in the person-centered quality of life (PCQOL ) literature that quality of life is a multidimensional concept. Current and ongoing research (Cummins, 1997; Felce & Perry, 1997; Hughes & Hwang, 1996; Hughes et al., 1995; Schalock, 1996, 1997) suggests strongly that PCQOL involves at least eight core dimensions: emotional wellbeing, interpersonal relationships, material well-being, personal development, physical well-being, self-determination, social inclusion, and rights. There is also good agreement that these eight core dimensions are valued by persons
differently, and that the value attached to each dimension varies across the life span (Flanagan, 1982; Schalock, 2000).
The Assessment of Quality of Life The application of the concept of quality of life to outcomes research and evaluation has been facilitated by the adoption of methodological pluralism that includes the use of personal appraisal and functional assessment. These two strategies reflect the historically based qualitative/subjective (personal appraisal) and quantitative/objective (functional assessment) methods. Personal Appraisal
This strategy addresses the subjective nature of quality of life, typically asking individuals how satisfied they are with the various aspects of their lives. For example, one can ask, "How satisfied are you with the skills and experiences you have gained or are gaining from your job?" or "How happy are you with your home or where you live?" Although individuals' responses are subjective, they need to be measured in psychometrically sound ways.

Increasingly, a person's measured level of satisfaction (i.e., personal appraisal) is a commonly used dependent measure in evaluating core quality of life dimensions. The advantages of using satisfaction as a basis are that: (1) satisfaction is a commonly used aggregate measure of individual life domains (Andrews, 1974; Campbell, Converse, & Rogers, 1976) and demonstrates a trait-like stability over time (Diener, 1984; Edgerton, 1996); (2) there is an extensive body of research on level of satisfaction across populations and service delivery recipients (Cummins, 1998; Schalock, 1997); and (3) satisfaction as a dependent variable allows one to assess the relative importance of individual quality of life dimensions and thereby assign value to the respective dimension (Campbell, Converse, & Rogers, 1976; Cummins, 1996; Flanagan, 1982; Felce & Perry, 1997). The major disadvantages of using only satisfaction as a measure of quality of life include the reported low or no correlation between subjective and objective measures of quality of life, its limited utility for smaller group comparisons, its tendency to provide only a global measure of perceived well-being, and its discrepancy with the multidimensional nature of quality of life (Cummins, 1996; Felce & Perry, 1997; Schalock, 1996). Because of these disadvantages, the general recommendation among quality of life researchers is to include both personal appraisal and functional assessment measures of the core quality of life dimensions.
Functional Assessment
This strategy addresses the objective nature of quality of life. The most typical formats used in functional assessment include rating scales, participant observation, and questionnaires. Each attempts to document a person’s functioning across one or more core quality of life dimensions. To accomplish this, most instruments employ some form of an ordinal rating scale to yield a profile of the individual’s functioning. For example one might ask (or observe), How frequently do you use health care facilities?, How frequently do you visit the doctor?, How many days out of the last month have you spent in bed?, or How many civic or community clubs do you belong to? There are a number of advantages to using functional assessments to evaluate quality of life core dimensions. First, objective measures can confirm results from the personal appraisal strategy. Second, adding objective measures to personal appraisal overcomes the commonly reported low correlation between subjective and objective measures of quality of life. Third, their use allows for the evaluation of outcomes across groups. Fourth, objective measures provide important feedback to service providers, funders, and regulators as to how they can change or improve their services to enhance the recipient’s functioning level. However, there are also some disadvantages to functional assessment. First, functional assessment must be balanced with other considerations. For example, it is clear that not all outcomes related to one’s perceived quality of life can be measured. Second, functional assessments can have more cost than benefit. One needs to be cautious that the functional assessment system does not consume in resources more than its information is worth. Third, the usefulness of functional assessments varies by their use, since they are only useful to management or the decision-making process to the extent that they are used and that they answer the right questions. Fourth, organizations are sometimes limited in their ability to influence outcomes. Users of functional assessment data need to understand the role that many factors play in one’s perceived quality of life and not focus exclusively on the service provider. The approach to quality of life assessment that I propose is based on three assumptions: (1) quality of life is composed of eight core dimensions: emotional well-being, interpersonal relationships, material well-being, personal development, physical well-being, self-determination, social inclusion, and rights; (2) the focus on quality of life assessment should be on personreferenced outcomes; and (3) measurement strategies should include both personal appraisal and functional assessment. A model that incorporates these three assumptions is presented in Figure 7.3. As shown in the model, each of the eight core dimensions is defined operationally in terms of a number of specific indicators, which can include attitudinal, behavioral, or performance factors representing one or more aspect of each core dimension.
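A minimal sketch of how the model's pairing of measurement strategies might be recorded is shown below: each core dimension carries both a personal-appraisal (satisfaction) rating and a functional indicator. The dimensions come from the text; the particular indicators and values are hypothetical.

```python
# Hypothetical pairing of one personal-appraisal item (satisfaction, 1-5) and one
# functional indicator per core quality of life dimension; indicators are examples
# only and would normally come from the program's selected outcome indicators.
qol_assessment = {
    "emotional well-being":        {"satisfaction": 4, "indicator": ("self-reported good days per week", 5)},
    "interpersonal relationships": {"satisfaction": 3, "indicator": ("social contacts in past month", 6)},
    "physical well-being":         {"satisfaction": 4, "indicator": ("days spent in bed last month", 1)},
    "self-determination":          {"satisfaction": 5, "indicator": ("choices made independently (of 10)", 8)},
}

for dimension, measures in qol_assessment.items():
    label, value = measures["indicator"]
    print(f"{dimension:28s} satisfaction={measures['satisfaction']}  {label}: {value}")
```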
Exemplary quality of life outcome indicators were listed in Table 7.2. Each of these indicators can be measured by using either the personal appraisal or functional assessment strategies described earlier. Each should also meet the outcome selection criteria summarized in Table 6.1. The advantage of using the approach to quality of life assessment depicted in Figure 7.3 is that one need not use different indicators for subjective versus objective measurement; rather, the core dimensions remain constant, and what varies is whether one uses a personal appraisal or a functional assessment approach to assess the respective indicator. Thus, all assessment is focused clearly on the core quality of life dimensions. However, it is important to point out that some of the domains are more amenable to personal appraisal; others to functional assessment. For example, personal appraisal might best be used for the core dimensions of emotional well-being, personal development, physical well-being and social inclusion. Hence, there is a definite need to use multiple measures when attempting to assess a person’s perceived quality of life. The ever-increasing interest in quality of life assessment and use of quality of life-related outcome data necessitate policy and practice guidelines. Based on my experience, nine assessment and application guidelines are summarized in Table 7.8. As an outcome measure, quality of life data need to be considered very carefully. Among the more important guidelines: Quality of life scores should be considered as relative and not absolute. Watch gain scores, for one’s subjective evaluation of life conditions may reflect a trait more than a changing condition. The potential ceiling effect may preclude longitudinal comparisons.
Do not use grouped data. One needs to realize that the standard of comparison is the person.
Summary In summary, this chapter has described the “whats” and “hows” of the four evaluation approaches to outcome assessment and appraisal: performance assessment, consumer appraisal, functional assessment, and personal appraisal. The suggested approach was based on an extension of the original outcome-based evaluation model (see Figures 1.2 and 2.1) that organizes outcome-based evaluation efforts around standards, foci, and critical indicators referenced to both the person and the organization. For performance assessment, the two major measurement approaches discussed were effectiveness and efficiency; for consumer appraisal, satisfaction and fidelity to the service delivery model; for functional assessment, adaptive behavior and role status; and for personal appraisal, person-centered and health-related quality of life. A two-step process to outcomes assessment and appraisal was also suggested: First, specific indicators for each outcome category are identified; and second, specific measurement techniques are used to quantify the outcome indicator. A summary of the suggested measurement approaches is found in Table 7.3. Throughout the chapter, a number of guidelines were suggested for measuring outcomes. Chief among these were: Tools for measuring outcomes should have demonstrated reliability and validity. Both personal appraisal and functional assessment strategies should be used. Outcome measurement should always include the person’s perspective. Outcome measurement systems should place minimum burden on the respondent and have the ability to be adapted across education, health care, and social service systems. Outcome measurement should use methodological pluralism. Outcomes need to be evaluated longitudinally. It is probably an understatement that measuring outcomes involves data collection and analysis. Good outcome-based evaluation includes gathering information about the characteristics of service recipients, the services or interventions they receive, and the cost of those services and interventions. It also involves the analysis and interpretation of the outcomes in reference to their validity, significance, and contextual basis. It is to these issues that we now turn in Chapter 8.
Study Questions

1. What is the importance of the four psychometric measurement standards to outcome-based evaluation?
2. What is the two-step process suggested for outcomes assessment and appraisal? Give two to three examples of each step.
3. Review Tables 7.1–7.3. Describe in your own words the relationship among an outcome category, critical indicator, and respective measurement approach.
4. What is performance assessment? Compare the approaches suggested for effectiveness with that proposed for efficiency. How are they alike and how are they different?
5. What is consumer appraisal? What role does the measurement of satisfaction play in the fidelity-to-the-model standards summarized in Table 7.5?
6. Find one example each of a rating scale, an attitude scale, and a questionnaire that is used to assess outcomes. Critique each in reference to (a) its stated purpose; (b) the relationship between the items rated and the measure's stated purpose; and (c) its interpretability and potential use.
7. Construct a rating scale to measure consumer satisfaction with service quality or program-referenced outcomes related to tangibles, reliability, responsiveness, assurance, empathy, availability, affordability, awareness, accessibility, extensiveness, and appropriateness. In your construction, note the importance of the two-step process of question 2 and the need to define your indicators operationally. What advantage do you see to using a 3- to 5-point Likert scale?
8. Analyze a service delivery model with which you are familiar. Examples include community-based programs, distance learning programs, or school improvement programs. How would you evaluate the fidelity of the selected program? The standards summarized in Table 7.5 might help.
9. What is the difference between personal appraisal and functional assessment? Give two to three examples of each.
10. Quality of life measures were suggested in the text as appropriate personal appraisal outcome measures. What are their strengths and limitations; what are their advantages and disadvantages?
Additional Readings

Fetterman, D. M. (1994). Steps of empowerment evaluation: From California to Cape Town. Program Planning and Evaluation, 17(3), 305–313.
Hibbard, J. H., & Jewett, J. J. (1996). What type of quality information do consumers want in a health care report card? Medical Care Research and Review, 53(1), 28–47.
McGlynn, E. A. (1996). Setting the context for measuring patient outcomes. New Directions for Mental Health Services, 71, 19–32. San Francisco: Jossey-Bass Publishers.
Nordenfelt, L. (1994). Concepts and measurement of quality of life in health care. Boston: Kluwer Academic Publishers.
Sommer, B., & Sommer, R. (1997). A practical guide to behavioral research: Tools and techniques
8

Analyzing and Interpreting Outcomes

OVERVIEW
Input Variables: Recipient Characteristics
  Age and Gender
  Diagnosis or Verification
  Adaptive Behavior Level
  Role Status
Throughput Variables
  Core Service Functions
  Cost Estimates
Statistical Principles, Guidelines, and Analyses
  Statistical Principles
  Statistical Guidelines
  Statistical Analyses
Interpreting External Influences on Outcomes
  Clinical Significance
  Threats to Internal Validity
  Organization Variables
  Attrition
Summary
Study Questions
Additional Readings
The mind must see before it can believe. ALEXANDER GRAHAM BELL
Overview

Believe me, there is no lack of data floating around most education, health care, and social programs! The problem is that much of it is either not usable for meaningful outcome-based evaluation or not retrievable. An unfortunate truism is that many programs are data rich but information poor. Part of the reason for this situation is that program administrators are bombarded with requests for data from funding, licensing, or accrediting bodies, who frequently ask for different data sets. But part of the reason is also that most programs' data systems have evolved over time, with little forethought given to the importance of developing an ongoing data-based management system that will provide data for multiple purposes, including analyzing and interpreting outcomes. This chapter was written to help that situation. But we also need to realize that data collection is hard work and expensive. Thus, an important recommendation: choose your data sets very carefully.

Outcome-based evaluation requires data. As we will see in this chapter, data refer not just to the person- and organization-referenced outcomes discussed so far in the text, but also to recipient characteristics, core service functions, and cost estimates. I refer to these as core data sets. Throughout the chapter, I will also caution you to keep analysis and interpretation simple, for the processes involved in analyzing and interpreting outcomes are both time-consuming and expensive. Thus, at the outset:

Review two basic questions: For what purpose will I use the outcome data, and what data will I need for my intended use?
Keep the following core data sets clearly in mind: recipient characteristics, core service functions, and cost estimates.
Remember that a key purpose in gathering outcomes data is to provide formative feedback for both accountability and continuous program improvement.
Keep it simple. My strong advice is to measure a small number of reliable and valid core data sets. Throughout the chapter I stress five data selection criteria: (1) person- or organization-referenced; (2) complete (available for all program participants); (3) timely (current and covering the period of time you are interested in); (4) affordable (time, money, expertise); and (5) accurate (reflecting actual events and characteristics).
Use methodological pluralism for performance measurement and value assessment (see Chapter 7).

Many factors influence a program's or policy's outcomes. As shown in Figure 8.1, outcomes are influenced significantly by three groups of factors:
recipient characteristics, the core services provided and their cost, and a number of external influences. As discussed in subsequent sections of this chapter, the purpose of analysis is to relate inputs (recipient characteristics) and throughputs (core service functions and costs) to outcomes. Thus, the first three sections of the chapter describe the important input and throughput variables, along with key statistical principles, guidelines, and analyses. As discussed in the chapter’s fourth section, interpretation also involves relating external influences to outcomes; thus, this section discusses a number of external factors that can influence inputs, throughputs, and outcomes. Chief among these are: clinical significance, threats to internal validity, organization variables, and attrition.
Input Variables: Recipient Characteristics

Think about three or four terms that describe typical clientele in education, health care, or social service programs. Most program recipients can be described quite adequately using five major descriptors: age, gender, diagnosis or verification status, adaptive behavior level, and role status.
Age and Gender

Age and gender are pretty straightforward recipient characteristics. They are required frequently for program recipient description, and outcomes are frequently analyzed separately for females and males, or for individuals of different age groupings.
Diagnosis or Verification
Diagnosis or verification is a critical characteristic to include, for one's diagnosis or verification identifies clearly the focus of the program's or policy's mission, goals, and intervention or services. Diagnosis or verification also determines eligibility for services, so from a reporting and accounting perspective, one needs to be sure that all service recipients are not just eligible for services, but also appropriate for those services.

Adaptive Behavior Level
Because of the movement toward noncategorical funding and programs, it is increasingly necessary to consider adaptive behavior profiles as critical data sets that one can use both for the description of service recipients and as a basis for evaluating outcomes. One's adaptive behavior level is inferred from one or more adaptive skills (see Table 7.7). Behavioral or functional skill indicators need to meet the following criteria (Jenkins, 1990):

- measure what they are supposed to measure (validity),
- be consistent across people or raters (reliability),
- measure change (sensitivity), and
- reflect changes only in the situation concerned (specificity).

Role Status
As discussed in Chapter 7 (see Table 7.7), role status refers to a set of valued activities that are considered normative for a specific age group. Examples include living environment (independent, semi-independent, supervised), employment situation (employed, unemployed, employment training, sheltered work, retired), education status (grade level/graduation or dropout status), health status (independent, semi-independent, congregate care), and community inclusion and involvement. A person's statuses are very descriptive of his or her functioning level and can be used both to describe a person upon enrollment and as an outcome in those programs wherein there is a logical connection between skill deficits and the services provided. Status indicators also need to meet certain criteria (Rapp et al., 1988):

- the list of statuses within a given domain, such as living, needs to be exhaustive and include all possibilities;
- each specific status needs to be mutually exclusive;
- statuses must be able to be hierarchically ordered from least desirable to most desirable, with a reasonable degree of consensus about the hierarchy; and
- the measures need to be sensitive to change over time.
In summary, information about recipient characteristics is used in all processes related to outcome-based data management, data analysis, data reporting, and data interpretation. The description need not be overly taxing as is seen in Exhibit 8-1. Note how a few core recipient characteristics can give the reader a good sense of the individuals involved in the evaluation.
Throughput Variables

Core Service Functions

Think for a moment about what services education, health care, and social service programs really deliver. There are four core functions: evaluation or assessment; education, health care, or rehabilitation program-specific services; service coordination; and ongoing supports.

Evaluation/Assessment
This information is essential for both diagnosis and intervention planning. The important thing is to approach assessment from the perspective of intervention planning so that the information can be used as baseline data against which one can compare program- or service-related changes. There are a number of important guidelines regarding assessment:

- assess in the areas of service provision,
- assess both adaptive behavior skill levels and role status,
- consider the assessment as baseline data against which recipient progress can be compared, and
- recognize that there are a number of ways that information about people can be obtained, including standardized procedures, personal appraisal, participant observation, status indicators, and performance-based assessment.

Program Services
Program services usually equate to education, rehabilitation, health-related intervention, support provision, diversionary programs, or training. Program services are the techniques employed that logically result in service recipients acquiring abilities, improved health, or increased functioning. Two essential points to consider: (1) one needs to describe the specific services that are being provided (for example, remedial education, job training, training in independent living, medical intervention, crisis management, counseling, work release, pain reduction, etc.); and (2) the description of these specific services frequently requires a functional analysis and description of the interrogatories involved.
Exhibit 8-1. Exemplary Descriptions of Service Recipients

Special Education. Students participating in the study had been verified in school as either specific learning disabled (SLD) or mentally handicapped (MH) according to the diagnostic criteria used by the State Department of Education. SLD children are children of school age who have a verified disorder in one or more of the basic processes involved in understanding or using language, spoken or written. MH children are children of school age who, because of retarded intellectual development, as determined by individual psychological examination and deficiencies in social adjustment, require additional regular educational programming.
Mental Health. The primary diagnosis (DSM-IV) for the 387 persons composing the study sample included: attention deficit disorder (1.3%), mental retardation (2.3%), organic brain syndrome (6.3%), schizophrenic disorders (42.8%), paranoid disorders (0.7%), affective disorders (35.1%), anxiety disorders (0.6%), and adjustment disorders (10.9%). Their average age was 38.9 years; 57% were males and 43% females; 43% were voluntary admissions, 8.9% were voluntary by guardian, 3.2% court order, and 33% were mental health board commitments. The sample averaged 2.9 previous admissions to the program. Marital status included 42.7% single, 22.4% married, 25% divorced, 4.8% separated, and 4.7% widowed/widower. Employment status included 71.7% unemployed at the time of admission.
Disability. Subjects were 85 individuals (42 males, 43 females) who had been placed 15 years ago into either independent living or competitive employment and who had remained within that placement for a minimum of two years. Their average age was 39.5 years (range = 33 to 74), and their average full scale IQ (Wechsler Adult Intelligence Scale) was, at the time of last testing, 67 (range = 40 to 91, SD = 12).
An example is presented in Exhibit 8-2, which summarizes the interrogatories regarding a school-to-work transition model for students with special needs (Schalock et al., 1992).

Service Coordination
Many service recipients obtain services from a number of sources. A typical scenario for a service recipient includes state-level certification and program
entry, receipt of services, and systems-level coordination of ongoing supports. Service coordination ensures appropriate services through three major activities: (1) working with the assessment/intervention personnel in identifying service or support needs in education, rehabilitation, living, medical and nursing, work, and recreational/leisure environments; (2) interfacing with other components of the service delivery system to obtain these services or opportunities; and (3) procuring the necessary supports for the individual to function well within the respective environment. The who and what interrogatories summarized in Exhibit 8-2 show well the critical importance of service coordination as a core service function. In reference to outcome-based evaluation and its interpretation, it is necessary to identify those service coordination functions that impact either person- or organization-referenced outcomes, especially if they are involved in cost estimates.

Ongoing Supports
The supports paradigm is increasingly impacting service provision. Supports can generally be considered as resources and strategies that promote the interests and causes of an individual; that enable him or her to access resources, information, and relationships within regular community environments; and that result in the person's enhanced independence, productivity, community integration, and satisfaction (Klinkenberg & Calsyn, 1996; Schalock, 1995b). While the concept of supports is not new, what is new is the demonstration that the judicious use of supports results in significantly enhanced adaptive skills, functioning, and role status.

There has been significant work recently in identifying support functions, as reflected in Tables 8.1 and 8.2. The support functions and indicators listed are based on a computer-based search of the supports literature. The 1,500 cited uses were initially aggregated into support categories related to quality of life core dimension areas and general support areas. A Q-sort was then completed by 50 professional collaborators. The top five support functions identified per area are presented in Table 8.1 (aggregated by core quality of life dimension) and Table 8.2 (aggregated by general support area). A comprehensive outcome-based evaluation requires that these support functions be incorporated into the description of the services provided and the program's cost estimates. The type and intensity of supports can also be related to person- and program-referenced outcomes.
Cost Estimates

After describing who is served and the core services provided, one needs to determine what the services or interventions cost. A cost estimate is one of
the most important components of outcomes analysis and interpretation. Being able to account for the resources used by a program is a crucial responsibility of any program administrator and is an absolute accountability requirement held by funders and other critical stakeholders. Cost estimates are also important for describing the intensity of the services provided, for budgeting program replications, and for evaluating whether the impacts produced by the program are sufficiently large to justify the program. Cost estimates begin with understanding revenue and expenditures, progressing to determining total and average cost per program recipient.
Revenue and Expenditures
Almost all education, health, and social programs have multiple sources of revenue that vary from public funds to user fees. Similarly, they have numerous expenditures. The secret of developing good cost estimates is to understand fully revenue, expenditures, and other economic costs. A simple listing is presented in Table 8.3.

Total Costs
All programs have an accounting system that provides the basis for estimating total program costs based on the expenditures and other economic costs summarized in Table 8.3. Cost estimates can be produced from this accounting system without much effort and in an incremental fashion. All programs should seek to provide some basic cost information about their services that can be culled directly from their accounting systems with little work. Then, depending upon the question(s) asked, programs can move to the more complex analyses involved in impact evaluation or benefit-cost analysis, which build on the basic estimates of cost. These additional cost data provide valuable information, but require more work and more complex analyses. The extent to which it is useful to pursue these additional issues depends on the questions asked, the developmental stage of the program, the interests of the stakeholders, and the information demands of policymakers. Therefore, a basic recommendation is that the type of cost data collected and the complexity of the cost analysis are dependent upon the type of outcome-based evaluation planned.

A detailed discussion of all the issues involved in cost analysis is beyond the scope of this book. For a discussion of these issues and potential methods, the interested reader is referred to Chang and Warren (1997), Criscione et al.
(1994), French et al. (1994), Kenny, Rajan, and Soscia (1998), and Ridenour (1996). For our purposes, the following guidelines will suffice:

1. A good accounting system will generally provide adequate information to estimate the total costs of the program as seen from the perspective of its own budget.
2. The first step is to define the period over which costs will be measured. It is generally best to select the most recently completed program fiscal year.
3. Use actual recorded costs rather than basing the cost analysis on budget or planning documents.
4. The period used for the cost analysis needs to correspond to the period of the evaluation.
5. Once the period has been defined, the costs are totaled for that period, resulting in expenditures reflecting actual resources used. This is relatively straightforward, except for capital expenditures and when the program accounts are maintained on a cash rather than accrual basis. As discussed more fully in Schalock (1995a) and Schalock and Thornton (1988), the accrual system offers substantial advantages in determining cost estimates.
6. In estimating total costs, it is important to keep in mind the level of precision needed for the cost analysis. It is often unnecessary to resolve all the issues about capital costs, accruals, or other cost factors. In many cases, it is clear that such refinements will only make trivial differences in the cost estimate.

Average Costs
Although the above guidelines focus on total costs, determining total costs is often only an intermediate step of the cost estimate. The end goal is average costs. Total costs are determined to a great extent by the number of persons served. This means that total costs reflect program scale as much as the intensity of services provided to the typical participant. Average costs, on the other hand, are more easily compared with estimates of program impacts that will generally reflect the effect of the program on the average enrollee. Average costs also offer an important advantage when looking at the costs of several programs. By looking at the average costs of a program, intervention, or service, the cost analyst can focus attention on the issues of service intensity. Average cost is generally calculated by dividing the estimate of total cost for a specific accounting period by the number of persons served during that period. Since most education and social programs and many health care programs
(for example, Medicaid and Medicare) are in what one might consider a "steady state," in which movement in and out is not rapid, average cost per participant can be determined by the following formula:

Average Cost per Participant = Total Cost for Time Period / Number of Persons Enrolled During That Period
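As a minimal illustration of this formula (hypothetical expenditure figures and enrollment count, not data from any program discussed in this text), the calculation can be scripted in a few lines:

```python
# Hypothetical sketch: total and average cost per participant for one
# completed fiscal year. All figures are illustrative only.

expenditures = {
    "salaries_and_benefits": 412_000,   # recorded (accrual-basis) costs
    "facilities_and_utilities": 56_500,
    "transportation": 18_200,
    "supplies_and_equipment": 12_300,
    "administrative_overhead": 61_000,
}

total_cost = sum(expenditures.values())
persons_enrolled = 74                   # persons enrolled during the period
average_cost = total_cost / persons_enrolled

print(f"Total cost for the period:    ${total_cost:,.0f}")
print(f"Average cost per participant: ${average_cost:,.0f}")
```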
Statistical Principles, Guidelines, and Analyses

Once the input and throughput data are available, the next question, which is asked frequently, is: how should these data be analyzed? I make an assumption here that most readers of the text have had at least one course in statistics and therefore have some understanding of statistics. Although later in this section I discuss a number of specific statistical analyses related to outcome-based evaluation, I am less concerned that you understand statistics per se than that you understand what data really mean. To that end, this section of the chapter begins by discussing a number of general statistical principles and guidelines.
Statistical Principles I’ve spent much of my professional life dealing with numbers and trying to make sense of them. Sometimes this has been an easy task, and sometimes very difficult. The difference was frequently how I selected and measured critical outcomes in ways that were reliable and valid; designed the data collection process in a logical, organized fashion; and managed the data so that they were accessible, accurate, and timely. During this time, I have also discovered six principles about data and its analysis. Measuring and analyzing a small number of variables reliably is much better than measuring a large number haphazardly. For example, in outcomes evaluation, a fundamental concern is to ensure that if an effect exists, it is detected. For that reason much consideration is devoted to determining sample sizes that afford adequate statistical power. Although there is always a trade-off between sample size and the feasibility and cost of the evaluation, it is essential that an evaluation be designed with the statistical power necessary to detect clinically and behaviorally meaningful effects. Potential ways to accomplish this include: increase the sample size, increase the effect size, use reliable and valid measurement strategies,
- measure at the level of functional impairment and its amelioration, rather than at the level of the pathology or limitation,
- group many separate measures into a smaller number,
- group many categorical items (such as diagnoses) into a smaller number of groups according to an index of similarity (for example, all persons with chronic mental illness, lower back pain, or over the age of 65), and
- calibrate the outcome measure(s) against behaviors and real events in people's lives.

Statistical approaches and methods change with time. Social science has historically relied heavily on the experimental/control research paradigm, in which the null hypothesis was tested to determine whether two or more groups are significantly different. The prevailing decision rule was that if the statistic (be it the t-ratio, F-ratio, or correlation coefficient) was sufficiently large, then one rejected the null hypothesis and stated that there was either a significant difference between or among groups, or a significant relationship among the variable(s). Although this is still the case in many evaluation studies, because of the complexity of most current education, health care, or social service programs, we are beginning to see increased use of multivariate evaluation designs with intact groups, rather than strict experimental/control comparisons. This shift is important to keep in mind, for it affects the questions that one asks and the way one plans evaluation efforts. One needs to think of multiple causes and multiple outcomes, and hence the need to use the multivariate statistical analysis designs discussed later in this chapter.

Successful evaluation studies use a variety of methods. Multiple methods are used to bound or simplify data analysis, including (Johnston, 1987):

- limiting the scope of the analysis;
- asking few questions;
- using data from other sources (for example, published handbooks, national databases, or evaluations of similar programs) to establish benchmarks against which to compare outcomes;
- limiting the follow-up period to 12–18 months, since this period balances sufficiently the instability of short-term change with the problems of missing data, sample attrition, and increased costs of longer term follow-up;
- using proxy measures whenever possible (for example, length of hospital stay can be used as a proxy for resource consumption costs); and
- focusing on both person- and organization-referenced outcomes.

Each key evaluation player uses statistics for his or her respective purpose. Promoters, stakeholders, and evaluators are frequently interested in different questions and thus will differ in the choice of what to measure, how to measure it, how often to measure it, and how to present and interpret the results. Some may be more interested in descriptive statistics; others, inferential statistics; others, relationships among variables; and still others, cause-effect
relationships. Thus, in outcome-based evaluations, methodological pluralism is frequently combined with multiple statistical analyses.

One has the option of focusing on the environment or the individual. If the individual is the reference, then one is more likely to focus on univariate or bivariate statistical analyses. However, if one focuses more on the environment, then one will measure more environmentally related factors; be more likely to use a multivariate statistical design; and be more environmentally or contextually oriented in explaining person- or organization-referenced outcomes.

No one evaluation is definitive. It is common for one's results to generate more questions than answers. Thus one needs a long-term commitment to both outcome-based evaluation and the incorporation of evaluation data into the ongoing evolution and improvement of the program.

These six principles are not exhaustive, but I have found that they help to understand what one gets from a "data analysis." A brief example will be useful to show how these six principles have been reflected in our own longitudinal studies of the placement outcomes of persons with mental retardation. In our first studies (Schalock & Harper, 1978; Schalock, Harper & Carver, 1981) we were interested primarily in the placement success of individuals whom we had placed five years previously into independent living or competitive employment. The dichotomized variable used in the statistical analysis was whether the individual remained in the placement environment or returned to the rehabilitation program. During that period, the program's primary goal and staffing patterns were directed at placing individuals with mental retardation into community environments that reflected increased independence and productivity.

Five years later (10 years after the initial placement) we (Schalock & Lilley, 1986) reevaluated the status of these individuals, expanding our data sets to include a measure of the person's quality of life (hence using a multivariate approach). The addition of a QOL measure reflected the organization's commitment to not just placing persons into the community, but also assisting them to become better integrated and satisfied with their community placement. Although the organization's mission statement reflected an enhanced quality of life for its clientele, the program structure during that five-year period had not changed significantly in terms of the quality enhancement techniques used. What we found essentially in the 1986 evaluation study was that the organization had fulfilled its goal of placing people into community living and employment environments, but had overlooked the multifaceted quality of their lives. Based on that finding, a number of changes were made in the program's structure and operation. Specifically, program structure (staffing patterns and
resource allocation) was aligned with the mission statement, which now focused on enhanced quality of life outcomes; quality management principles that empowered employees to find community-based opportunities and supports were implemented; and quality enhancement techniques related to increased use of natural supports and the permanence of one’s home were integrated into staff training. The net result was to change significantly the evaluation paradigm and data analyses used in the 15-year follow-up of the original group. In that study (Schalock & Genung, 1993) personal interview and observation data were used to evaluate the person’s social and support networks, lifestyles and role functions, activity patterns, measured quality of life, and expressed satisfaction.
Statistical Guidelines

I enjoy teaching the second or third course in statistics, for after two or three exposures, statistics start to make sense. My experience has been that statistical computations are not a problem, especially with today's computers. But the most difficult question is always, what statistical test should I use? Once that issue is resolved, then the related issue becomes, how do I interpret it? An absolute rule to follow in answering the "what" question is that the specific statistical test one employs depends upon four factors: (1) the level of measurement; (2) the number of persons and groups involved in the analysis; (3) the type of design that one is using; and (4) one's ability to meet the respective test's assumptions. A listing of the specific statistical tests keyed to these four variables is summarized in Table 8.4, which provides the basis for the following discussion. Any published statistics textbook will give computation procedures and interpretation guidelines for the statistical tests referenced in the table.

Level of Measurement
Frequency data are often used as the basis of one's evaluation. Examples include the number of persons with posttreatment pain relief, the number of special education students who are in integrated versus segregated classrooms, or the number of persons with different diagnoses living or working in particular environments. For this level of measurement, one would need to use a nonparametric statistical test such as chi-square to determine whether the groups are significantly different.

If, however, the data are ranked, then one needs to use a different statistical test. As the term rank implies, the statistical tests that are appropriate for ranked data compare the sum of the ranks in the groups to determine whether
one group yields scores that are generally lower in ranking than the other group. These rank tests are also considered nonparametric because they make no assumptions regarding the underlying distribution of the scores. Instead, statistical tests based on ranks simply assume that the dependent variable is at least ordinal in nature, so that scores can be ordered from lowest to highest and ranks can therefore be assigned (Russell & Buckwalter, 1991).

If the level of measurement is either interval or ratio, then one can use a more powerful parametric statistical test. Tests such as the t-test or analysis of variance can be used if a number of assumptions can be met, including (Russell & Buckwalter, 1991):

- the variables are at least interval levels of measurement;
- scores on the dependent (outcome) variable are normally distributed;
- the variability of scores on the dependent variable within the groups is equal; and
- the observations are independent of one another, which means that persons cannot be in more than one grouping.
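To make the link between level of measurement and test selection concrete, here is a brief sketch (hypothetical counts and scores, using the SciPy library; not an analysis from any study cited in this chapter): a chi-square test for frequency data and a t-test for interval-level scores.

```python
# Hypothetical sketch: matching the statistical test to the level of measurement.
import numpy as np
from scipy import stats

# Nominal (frequency) data: counts of students in integrated versus segregated
# classrooms for two diagnostic groups -> chi-square test of independence.
counts = np.array([[42, 18],    # group A: integrated, segregated
                   [25, 35]])   # group B: integrated, segregated
chi2, p_chi, dof, expected = stats.chi2_contingency(counts)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_chi:.3f}")

# Interval-level data: adaptive behavior scores for the same two groups ->
# t-test, provided the parametric assumptions discussed later can be met.
group_a = np.array([68, 72, 75, 80, 66, 71, 74, 79])
group_b = np.array([61, 64, 70, 67, 63, 69, 65, 72])
t, p_t = stats.ttest_ind(group_a, group_b)
print(f"t = {t:.2f}, p = {p_t:.3f}")
```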
Number of Groups and Persons Involved
In reference to groups, the material presented in Table 8.4 should make sense by itself. For example, if one has two groups, then certain statistical tests (such as chi-square or t-test) are appropriate. If, however, there are more than two groups, then depending upon the level of measurement and the ability to meet the test's assumptions, one needs to use a multiple group comparison statistic such as analysis of variance.

The number of persons included in the statistical analysis is very important, since the sample size determines both the magnitude of the statistic that is necessary for statistical significance and the practical significance of the statistical result(s). For example, with 1000 subjects and a correlation coefficient of .20, one could claim statistical significance. But the question that should be asked is, how much of the variance (in this case only 4 percent) in the outcome variable can reasonably be attributed to the fluctuation in the independent variable? Similarly, with a between-groups design, one may have a t-ratio of 1.98 that is statistically significant at the .05 level of significance, but the practical significance of this can also be questioned, since potentially only a small proportion of the total variation in the dependent (outcome) variable is due to the effect of the independent (intervention) variable. My purpose in making the distinction between statistical and practical significance (reflected in the "measures of strength of association" listed in Table 8.4) is that in reporting the results of outcome-based evaluation, it is important to look at the actual amount of change in one's outcome variable that one can attribute to the intervention. This distinction becomes even more important later in the chapter when the concept of clinical significance is discussed.
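A small worked example of this distinction (hypothetical numbers, not taken from any study reported here): the proportion of variance accounted for can be computed directly from a correlation coefficient, or estimated from a t-ratio and its degrees of freedom.

```python
# Hypothetical sketch: statistical significance versus practical significance.

# A correlation of .20 in a large sample can be statistically significant,
# yet it accounts for only r-squared of the variance in the outcome.
r = 0.20
print(f"Variance explained by r = {r}: {r ** 2:.0%}")        # 4%

# For a between-groups comparison, eta-squared estimates the proportion of
# variance in the outcome attributable to group membership.
t_ratio = 1.98
df = 120                                   # degrees of freedom for the t-test
eta_squared = t_ratio ** 2 / (t_ratio ** 2 + df)
print(f"Eta-squared: {eta_squared:.3f}")   # only about 3% of the variance
```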
Type of Evaluation Design

Refer back to Figure 4.2 (page 68) for a moment. Note the six evaluation designs listed: person as own comparison, pre-post change comparisons, longitudinal status comparisons, hypothetical comparison group, matched pairs (cohorts), and experimental/control. In reference to potential statistical tests, some of these are referenced in Table 8.4 as "within subject designs," and some are listed as "between group designs," depending upon whether the comparison involves the individual over time ("within subject") or two or more different groups ("between groups").

Meeting the Test's Assumptions
In addition to the three factors just discussed, one needs also to remember that there are a number of statistical assumptions that underlie statistical
tests–and especially parametric ones. The two big assumptions relate to the normal distribution of scores on the outcome measure and homogeneity of variance. If one cannot meet these assumptions, then a nonparametric test, which does not make these assumptions, should be used.
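One way to operationalize this decision is sketched below (hypothetical scores; the Shapiro-Wilk and Levene tests are one common choice among several for checking these assumptions):

```python
# Hypothetical sketch: check the assumptions, then choose the statistic.
import numpy as np
from scipy import stats

group_1 = np.array([55, 61, 58, 64, 70, 62, 59, 66, 63, 68])
group_2 = np.array([48, 52, 50, 57, 61, 54, 49, 58, 53, 60])

_, p_norm_1 = stats.shapiro(group_1)          # normality, group 1
_, p_norm_2 = stats.shapiro(group_2)          # normality, group 2
_, p_var = stats.levene(group_1, group_2)     # homogeneity of variance

if p_norm_1 > .05 and p_norm_2 > .05 and p_var > .05:
    stat, p = stats.ttest_ind(group_1, group_2)               # parametric
    print(f"t = {stat:.2f}, p = {p:.3f}")
else:
    stat, p = stats.mannwhitneyu(group_1, group_2,
                                 alternative="two-sided")     # nonparametric
    print(f"Mann-Whitney U = {stat:.1f}, p = {p:.3f}")
```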
Statistical Analyses

Descriptive Analysis
An extremely important aspect of program evaluation is often to describe events accurately rather than test specific hypotheses. Thus, sometimes all one wants to do is to describe such things as the characteristics of service recipients, to summarize the services or interventions received, or to summarize the status of service recipients on a range of outcome variables. The use of descriptive statistics is very important since it defines important characteristics about the involved clientele. It also allows one to quantify various aspects of the program and begin to generate or refine the specific questions to be answered by later phases of the evaluation (Rintala, 1987). This level of statistical analysis should not be interpreted as "high-level program evaluation" since the analyses are only descriptive in nature and do not permit generalizations from the program to a larger population. An example of a descriptive comparison of two groups of special education students on a number of student characteristics and outcome variables is shown in Exhibit 8-3 (Schalock et al., 1992).
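As a minimal sketch of such a descriptive summary (hypothetical records and variable names, using the pandas library; the numbers are illustrative, not those in Exhibit 8-3):

```python
# Hypothetical sketch: describing service recipients and their outcomes
# without testing any hypotheses.
import pandas as pd

records = pd.DataFrame({
    "group":      ["SLD", "SLD", "SLD", "MH", "MH", "MH"],
    "age":        [17, 18, 17, 18, 19, 18],
    "hours_week": [40, 35, 38, 25, 28, 20],
    "wage_hour":  [4.60, 4.25, 4.80, 3.10, 2.85, 3.00],
    "employed":   [1, 1, 1, 0, 1, 1],
})

# Counts, proportions, and means by group, the kind of summary an exhibit
# such as Exhibit 8-3 reports.
summary = records.groupby("group").agg(
    n=("age", "size"),
    pct_employed=("employed", "mean"),
    mean_hours=("hours_week", "mean"),
    mean_wage=("wage_hour", "mean"),
)
print(summary.round(2))
```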
Exploratory Correlation Analysis

Exploratory correlation analyses help explain relationships among the various aspects of the program. For example, the correlation coefficients shown in Exhibit 8-4 are from a recent study (Schalock et al., 1997) of factors related to recidivism of persons with mental illness. The significant correlation coefficients found between the major predictors of recidivism and other measured variables became the basis for the next phase of evaluation efforts, which involved doing in-depth interviews regarding the role that these factors might play in recidivism, and then implementing programmatic changes.
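A correlation screen of this kind can be sketched as follows (hypothetical variables and values; the coefficients it prints are illustrative, not those reported in Exhibit 8-4):

```python
# Hypothetical sketch: exploratory correlations among program variables.
import pandas as pd
from scipy import stats

data = pd.DataFrame({
    "previous_admissions": [0, 2, 1, 4, 3, 0, 5, 2, 1, 3],
    "days_hospitalized":   [12, 30, 18, 55, 41, 9, 62, 27, 15, 38],
    "community_resources": [5, 3, 4, 1, 2, 6, 1, 3, 5, 2],
    "daily_living_score":  [88, 72, 80, 55, 60, 91, 48, 70, 84, 62],
})

# Full correlation matrix as a first screen of relationships.
print(data.corr().round(2))

# A single coefficient with its p-value, for follow-up on one relationship.
r, p = stats.pearsonr(data["previous_admissions"], data["days_hospitalized"])
print(f"r = {r:.2f}, p = {p:.3f}")
```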
Group Comparisons

Frequently, outcome-based evaluators are interested in whether two or more groups differ significantly on particular person-referenced outcomes. This was the case in the study wherein we (Schalock & Genung, 1993) compared the lifestyles and role functions of two groups ("nonservice" and "service") of individuals with mental retardation who had been placed 25 years prior to the longitudinal study into community living and employment environments. The group comparisons are presented in Exhibit 8-5.
Exhibit 8-3. Example of a Descriptive Analysis

                                                 Verified Handicap
Variable                                     SLD (n = 189)   MH (n = 109)
Student characteristics
  Intelligence                                    93              63
  % of time in resource room                       7.4            12.7
  Hours in vocational program                     22              15
  Gender (no. females/males)                   (59/130)        (34/75)
Current employment status
  Employed full time                              70%             44%
  Employed part time                              13%             25%
  Vocational school                                6%              2%
  CBMR (MR Program)                               -0-              8%
  Unemployed                                      10%             17%
Current employment outcomes
  Number of weeks employed                        42              36
  Hours/week                                      40              26
  Wages/hour                                      $4.52           $2.93
Current work-related benefits (percent receiving)
  Medical                                         43              30
  Unemployment                                    28              10
  Sick                                            31              19
  Vacation                                        40              25
  Profit sharing                                  10              11
  Retirement                                      18               6
Current living environment
  Independent                                     45%             33%
  Semi-independent                                55%             67%
Current primary source of income
  Personal (self; parent)                         94%             83%
  Public (SSI; SSDI)                               6%             17%
Multivariate Analysis
Multivariate analysis allows one to determine the factors that lead to the obtained outcomes. In conceptualizing a multivariate analysis, it is useful to refer to the multivariate analysis model presented in Figure 8.2.
Exhibit 8-4. Example of Significant Correlations Between Predictors of Recidivism and Other Measured Variables

Predictors of recidivism:
- Number of previous admissions
- Employment status at admission
- Health problems
- Instrumental activities of daily living

Other measured variables (with correlation coefficients): Axis I diagnosis (r = –.19), Age (.21), Educational level (–.37), Employment status/follow-up (.18), Age (–.18), Living arrangement/follow-up (.23), Employment status/follow-up (.26), Instrumental activities of daily living (.21), Cognitive level (.33), Gender (–.25), Marital status (.22), Educational level (–.19), Living arrangement/follow-up (.55), No. of community resources used (.67), Instrumental activities of daily living (–.76), Age (–.35), Educational level (.27), Employment status/admission (.21), Days hospitalized (–.27), Self rating of change (.24), Living arrangement/follow-up (.42), Employment status/follow-up (.30), No. of community resources used (.84)

Source: Schalock et al. (1997).
The model shows the potential impact that recipient characteristics (which are usually considered independent variables in the statistical analysis), core service functions, cost estimates, and external influences (intervening variables) have on person- and organization-referenced outcomes (dependent variables). The advantage of using a multivariate analysis is that one can begin to "tease out" the relative contribution of the multiple factors that influence person- and program-referenced outcomes. An example of a multivariate analysis is shown in Exhibit 8-6, which summarizes the results of a study (Schalock et al., 1994) in which we evaluated the influence of three factors (referred to as "blocks" in the hierarchical statistical design) on the measured quality of life of 968 individuals with developmental disabilities five years after they moved into the community.
Exhibit 8-5. Example of Group Mean Score Comparisons

Variables being compared                    Nonservice   Service   t-value
Desired Community Outcomes
  Making decisions                              2.8         2.4      2.4*
  Contributing to community                     2.0         1.8      0.7
  Doing things myself                           2.9         2.8      0.9
  Arranging for assistance                      2.7         1.8      4.3**
  Visiting with others                          2.6         1.9      2.7**
  Using the community                           2.7         2.3      2.1*
  Spending time as others do                    2.6         2.3      1.8
  Living in a healthy and safe place            2.7         2.4      1.6
  Owning things                                 2.7         2.3      2.0*
  Being valued and accepted by others           2.5         2.4      0.5
Quality of Life Factors
  Independence                                  27          24       3.8**
  Productivity                                  26          22       3.0**
  Community integration                         22          21       0.8
  Satisfaction                                  25          24       0.7

* p < .05  ** p < .01
Source: Schalock & Genung (1993).
The results of the analysis show that personal characteristics contributed the most to the person's total quality of life score, followed by objective life conditions and perceptions of significant others.

Interpreting External Influences on Outcomes

Contextualism has emerged recently as a critical concept in understanding how external influences shape outcomes. The purpose of this last section of the chapter is to sensitize the reader to the importance of considering a number of contextual or external factors that have a direct impact on
how one approaches an outcome-based evaluation, and the interpretation given to outcome-based results. Four of these contextual issues will be discussed: clinical significance, threats to internal validity, organization variables, and attrition.

Contextualism has a number of central themes. First is the appreciation of the setting or context within which behavior occurs. In this regard, the context of an education, health care, or social service program includes more than just the immediate setting; rather, one needs to consider the larger cultural and historical setting that allows or invites the occurrence of an event and renders it socially acceptable and timely. Second, a central part of contextualism is its emphasis on reality as an active, ongoing, changing process that involves the program being transformed by its participants, who in turn are transformed by the program. A third contextual theme is an assault on the division of science into "basic" and "applied" branches, and an acceptance of the view that the best way to advance basic understanding is to study social reality as it occurs in everyday, practical states. This theme has had a profound impact on the changing evaluation strategies discussed earlier in the text. A fourth contextual theme is the notion that the program is an active determiner of its own development, and that information from the evaluation process can be used by various key players to document accountability and improve the organization's effectiveness and efficiency.
Exhibit 8-6. Multivariate Analysis of Predictors of Total QOL Score

Block/Predictor variable            Adjusted R²   R² change   F for block   Beta coeff.
Personal Characteristics               .426           –          89.34*
  Age                                                                          –.08
  Gender                                                                        .04
  Adaptive behavior index                                                       .40
  Challenging behavior index                                                    .21
  Health index                                                                 –.01
  Need for medication                                                          –.06
Objective Life Conditions              .505          .08         53.17*
  Earnings                                                                      .17
  Integrated activities                                                         .12
  Physical environment                                                         –.03
  Social presence                                                              –.01
  Living unit size                                                              .01
  Residential supervision                                                      –.05
  Home type                                                                     .16
  Employment status                                                             .07
Perception of Significant Others       .519          .01         43.75*
  Client progress                                                               .03
  Environmental control                                                        –.03
  Job satisfaction                                                              .07
  Working with person                                                           .07

*p < .001
Source: Schalock et al. (1994).
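An analysis of this general form can be sketched with ordinary least squares regression, entering blocks of predictors in sequence and noting the change in R squared at each step (a minimal illustration with invented variable names and simulated data, not the variables or results of the study summarized in Exhibit 8-6):

```python
# Hypothetical sketch: hierarchical (blockwise) regression with simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age":            rng.integers(20, 60, n),
    "adaptive_index": rng.normal(50, 10, n),
    "earnings":       rng.normal(400, 120, n),
    "family_support": rng.normal(3, 1, n),
})
df["qol_total"] = (0.6 * df["adaptive_index"] + 0.02 * df["earnings"]
                   + 2.0 * df["family_support"] + rng.normal(0, 8, n))

# Each block adds a set of predictors to those already entered.
blocks = [
    "qol_total ~ age + adaptive_index",                              # block 1
    "qol_total ~ age + adaptive_index + earnings",                   # block 2
    "qol_total ~ age + adaptive_index + earnings + family_support",  # block 3
]

previous_r2 = 0.0
for i, formula in enumerate(blocks, start=1):
    fit = smf.ols(formula, data=df).fit()
    print(f"Block {i}: adj. R2 = {fit.rsquared_adj:.3f}, "
          f"R2 change = {fit.rsquared - previous_r2:.3f}, F = {fit.fvalue:.1f}")
    previous_r2 = fit.rsquared
```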
Clinical Significance
The concept of clinical significance is based on a functional assessment and asks a simple, values-based question: is the person demonstrably improved or different (in regards to the medical condition, behavior, or personal development) as a result of the program or intervention? Operationally, this means determining whether the level of the person’s functioning or level of wellness
subsequent to intervention places them closer to the mean of the functional population than the dysfunctional population. In this regard, clinical significance includes: (1) a measure of meaningful change that determines whether the client has moved from the dysfunctional to the functional range; and (2) whether the amount of change is of sufficient magnitude to be considered statistically reliable (Ankuta & Abeles, 1993).

Unfortunately, the use of statistical significance tests to evaluate treatment efficacy is limited in at least two respects (Jacobson & Truax, 1991): first, the tests provide no information on the variability of response to intervention within the sample (thus the need to look at within-treatment variability); and second, a treatment effect that exists in the statistical sense may have little to do with the clinical significance of the effects (that is, clinical versus practical effects). Thus, one needs to understand the importance of concepts such as statistical significance and effect size. In that regard, Jacobson, Follette, and Revenstorf (1984) propose a two-condition criterion to judge clinical significance: (1) a measure of meaningful change that determines whether the person has moved from the dysfunctional to the functional range; and (2) whether the amount of change (or "effect size") is of sufficient magnitude to be considered statistically reliable.

Effect size refers to the magnitude of the difference between conditions (e.g., experimental versus control, or Group 1 versus Group 2). Effect size can be viewed as the variance among the population means and can be represented as:

Effect Size = (Treatment Mean – Control Mean) / Standard Deviation

Other things being equal, the larger the effect produced by the treatment or intervention on a given outcome, the more likely it is that statistical significance will be attained, and thus the greater the statistical power (Lipsey, 1990). However, the role of effect size in the statistical power of treatment or intervention effectiveness is problematic for at least two reasons: the difficulty of knowing what effect size is reasonable to expect from the treatment/intervention being evaluated, and the fact that effect size is influenced by both the actual difference between groups and the variance within the groups. Thus, factors that affect either group differences or group variance can drastically alter the overall effect size. General guidelines for overcoming each of these potential problems include:

- Reasonable effect size is generally described in reference to small, medium, and large. These effect size categories represent the effect size on some dependent measure in terms of the difference between
the treatment and control group means expressed in standard deviation units. Cohen (1987) reports that the spread of scores for most measures on most populations in the behavioral sciences is about five standard deviation units from the lowest value to the highest. He also reports that across a wide sampling of behavioral science research, large effects average about .8 standard deviation units, medium effects about .5, and small effects about .2.

- The influence of relative values can best be controlled by using an effect size formulation (or index) that standardizes differences between means to adjust for arbitrary units of measurement. This can be accomplished by dividing the scores by the standard deviation of their distribution to produce a measure in standard deviation units rather than in the units of the original scale (Lipsey, 1990).
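These ideas can be illustrated with a short sketch (hypothetical scores; the pooled standard deviation is used as the standardizer, and the cutoff shown is a simplified version of the functional/dysfunctional criterion described by Jacobson and colleagues):

```python
# Hypothetical sketch: effect size in standard deviation units, plus a simple
# clinical-significance check against a functional/dysfunctional cutoff.
import numpy as np

treatment = np.array([34, 29, 31, 38, 27, 33, 36, 30])   # post-intervention scores
control   = np.array([22, 25, 19, 24, 21, 26, 23, 20])

pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
effect_size = (treatment.mean() - control.mean()) / pooled_sd
print(f"Effect size: {effect_size:.2f} standard deviation units")

# Simplified criterion: the midpoint between the means of the functional and
# dysfunctional populations serves as the cutoff for meaningful change.
functional_mean, dysfunctional_mean = 35.0, 20.0
cutoff = (functional_mean + dysfunctional_mean) / 2
proportion_improved = np.mean(treatment > cutoff)
print(f"Proportion of treated persons beyond the cutoff: {proportion_improved:.0%}")
```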
Threats to Internal Validity

Internal validity is the degree of certainty that the program actually caused the effects reflected in the outcome variable(s). The concept of statistical conclusion validity is generally used to operationalize this type of validity, which rests on the demonstration that all sources of external variance have been controlled so that one is certain that the program produced the obtained outcomes. This requirement typically is very difficult, if not impossible, to meet in education, health care, and/or social service programs. In contrast, external validity refers to the degree to which the outcome-based evaluation results can be generalized to other settings or groups. External validity is a function of the representativeness of the sample(s), which typically equates to random selection and random assignment of participants (Garaway, 1997).

In a true experimental design, potential program participants are assigned randomly to receive or not receive an intervention or program service, or to different services or interventions for comparison purposes. The advantage of random assignment is that, based on sampling theory, the procedure should result in the groups being very similar to one another before the intervention. However, it is frequently not feasible to use random assignment due to ethical, moral, or practical considerations. When random assignment is not possible, a number of threats to internal validity emerge. The most important of these threats, and possible ways to overcome them, include:

- Selection bias. It is possible that the comparison groups were not equal to begin with, so any treatment or intervention effects might well be related to their initial differences, rather than to the intervention. Analysis of covariance can be used to equate these initial differences (see the sketch following this list).
- Participant maturation. If persons enter the program or receive the intervention at different times, or at different ages, it is possible that one group is "maturing" at a different rate than another, or that, in the case of Alzheimer's evaluations, for example, the memory system is changing between the groups disproportionately. If selection maturation is a factor, a good design to use is a matched sample design to ensure that the persons are equal from a maturational perspective.
- History. The history of the person between assessment periods can influence the outcome variable. Of particular concern are events that affect one group but not the other. As suggested by Russell and Buckwalter (1991, p. 7), "in evaluating the possibility that history may have affected the results of an evaluation study, a careful scrutiny of any relevant events that may have occurred during the course of the study is required."
- Instrumentation. Either "floor" or "ceiling" effects relate to the measurement of the outcome variable. These effects occur when scores on the dependent variable/measure cannot go below or above a certain level on a measure, or when the units of measurement are not uniform. Two suggestions to overcome this potential problem: (1) measures need to be selected that provide the possibility of a wide spread of scores, which is one of the reasons why 7-point Likert scales are potentially better to use than 3- or 5-point scales; and (2) scores should be obtained using reliable and valid instruments.
- Regression toward the mean. A potential regression to the mean effect is related to errors in measurement, in which it is likely that individuals who received extreme scores on a measure at the first assessment will tend to receive scores closer to the population mean on the instrument on subsequent assessments. As discussed by Russell and Buckwalter (1991), whenever one of the groups in the evaluation receives extreme scores (referred to as "outliers") before the intervention, attention needs to be paid to possible regression effects. One way to assess the presence of regression artifacts is to use a second assessment before the intervention, which will permit an evaluation to determine whether the differences reflect regression toward the mean or real treatment or intervention effects. Regression toward the mean effects are especially problematic when using nonequivalent control groups and quasi-experimental designs.
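As a concrete example of one remedy noted above, analysis of covariance adjusts post-intervention scores for pre-existing group differences. The sketch below uses the statsmodels formula interface with invented variable names and simulated data; it is an illustration, not an analysis from this chapter.

```python
# Hypothetical sketch: analysis of covariance to adjust for nonequivalent groups.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
half = 30
df = pd.DataFrame({
    "group": ["intervention"] * half + ["comparison"] * half,
    "baseline": np.concatenate([rng.normal(55, 8, half),    # groups differ
                                rng.normal(48, 8, half)]),  # before treatment
})
df["outcome"] = (df["baseline"]
                 + np.where(df["group"] == "intervention", 6, 0)
                 + rng.normal(0, 5, 2 * half))

# Covarying the baseline score isolates the adjusted group effect.
ancova = smf.ols("outcome ~ C(group) + baseline", data=df).fit()
print(ancova.params.round(2))
print("Adjusted group effect p-value:",
      round(ancova.pvalues["C(group)[T.intervention]"], 3))
```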
Organization Variables

Education, health care, and social service programs do not operate in a vacuum. The context within which these programs operate can best be
described by a number of metaphors. At one level, a program's context is like an umbrella: it protects and shields programs from unwanted intrusions. It is also like a personality assessment: it explains a number of characteristics about the program, including its development and dynamics. A program's context is also like a balloon: it surrounds the program and gives structure and dynamics to its mission, processes, and outcomes. Our task in this section is to analyze a program's context and attempt to understand the most significant organization variables that influence outcomes and their interpretation.

There are a number of key organization variables that I have found critical in understanding how education, health care, and social programs work
and produce outcomes. These are listed in Table 8.5. Note the wide variety of organization/contextual variables that need one's attention: organization description, philosophy and goals, phase of program development, resources, formal linkages, community factors, and family variables.

Organization Description
Education, health care, and social programs represent multi-faceted organizations that vary on a continuum from reasonably simple to unbelievably complex. To help understand this, I have listed in Table 8.5 five descriptive variables that will impact an organization's outcomes and their interpretation: the type of organization (profit or not-for-profit), the organization's governance structure, funding resource(s) and certainty, core service functions, and the organizational structure. As indicated in the table, most programs are legitimated and monitored through either a governmental or corporate structure, and their governance structure includes a board of directors. Funding sources are multiple, and funding dimensions typically focus on the certainty of funding and/or control over financial incentives. Core service functions usually include assessment, intervention or rehabilitation, service coordination, and ongoing supports.

Philosophy and Goals
Current programs and services operate from a number of philosophical concepts such as cost containment, quality services, customer responsiveness, normalization, integration, inclusion, equal opportunity, wellness, and quality of life. A program's stating and acting on its mission statement and philosophy provides its purpose, direction, efforts, and expected outcomes. For example, a program whose mission statement elaborates upon the concepts of inclusion and enhanced quality of life will look very different from one whose mission is self-preservation. However, the mission statement cannot stand by itself. It must be reflected in all aspects of the program's service provision, since the research is very clear: a program's outcomes are positively related to its culture, including artifacts, values, norms, ideology, and assumptions (Schein, 1990).

Phase of Program Development
Not all programs are in the same stage of development. Indeed, one can probably identify at least three phases of program development: feasibility/demonstration, ongoing, or changing. The interpretation of program outcomes needs to be sensitive to a program's developmental phase since the program's
purpose, capability, and anticipated use of data are very different depending upon its developmental phase. For example, if a program is established to determine the feasibility of a service or intervention, then we are dealing with a "pilot project." This is done often when policy- or decisionmakers are interested in determining whether to invest in a new program direction or component. Probably the best example of this is the supported employment initiative funded by the Rehabilitation Services Administration a number of years ago, in which project grants were given to a few employment programs that wanted to determine the feasibility of the concept of supported employment for persons with mental retardation. Other examples include work fairs, job retraining, inclusionary schools in special education, and club house programs in mental health. From an outcome-based evaluation perspective, it is critical to understand that a program in a feasibility phase of development is very different from one that is either ongoing or changing. Its organizational structure may well be quite different; its mission, philosophy, and goals will undoubtedly be different; and the type of data and their use will be significantly different. Here the basic question is one of cost versus outcomes, and unless the results of the analysis are favorable, policy- or decisionmakers are probably not going to be very supportive of further program development or expansion.

For ongoing programs, the history of the organization and its "place in the community" are critical. For example, one of the biggest barriers to community placement of individuals with special needs is the issue of what happens to the staff and the facility if all the clientele "move into the community." Additionally, the purpose for an outcome-based evaluation may well be different in ongoing programs. For example, in working with both community- and facility-based programs over the years, I have found that the focus of my evaluation efforts is quite different: community-based programs are typically more interested in placement and support evaluations, whereas facility-based programs are more interested in either change within the facility or why people return to the facility.

Programs that are in a changing phase will look much different from those in either a feasibility or ongoing phase. Here the emphasis will be on factors such as justification for the change, barriers to the change, social-cultural factors that provide ongoing support for the change, and/or evaluating whether the outcomes are congruent with the changed program's promises or model.
Resources are defined broadly to mean money, experience, and time. There is no doubt but that resources will affect the program’s processes and outcomes. But, one needs to be both critical and evaluative in reference to the
assumptions one makes. For example, I was involved recently in a 30-day study that evaluated how habilitation staff spent their time. The study was initiated by a program administrator who was concerned about minimal outcomes related to skills being gained by adults with disabilities. His assumption was that the 5:1 training staff to clientele ratio was more than sufficient to permit considerable habilitation training and hence good skill acquisition rates. What we found was quite disturbing: only about 15 percent of the staff's time was actually devoted to training. Almost a third was spent in assisting clients, another 13 percent in supervising clients, and 8 percent supporting ("doing for") persons. Other results were equally informative: 7 percent of staff time was spent on "general maintenance," 7 percent in meetings, 3 percent in transportation, and only 2.5 percent in inservice training. The point of all this is that one cannot simply assume, on the basis of a 5:1 staffing ratio, that habilitation and training are going on, or that the program's person-referenced outcomes will reflect that ratio. Without including an analysis of resources in interpreting an outcome-based evaluation, one can overlook a critical contextual variable.

Formal Linkages
This is an era of "interagency cooperative agreements." I have been involved in such efforts for a number of years and have written extensively about their importance in contextual analysis (Schalock, 1984, 1986). My experience suggests a simple rule of thumb in including formal linkages in the interpretation of outcomes: explain who does what for whom. Why is a discussion of formal linkages important to include? Very simply, it helps to explain the results obtained. It also explains a lot about the philosophy of programs as well as the combining of resources to fulfill the program's goals.

Community Factors
No education, health care, or social program exists in isolation; rather, it is a part of a larger environment that has specific economic, social, and political characteristics. Again, my experience has suggested a number of community factors that, when they exceed some threshold, affect either the program's processes or outcomes. These include attitudes regarding the program, as reflected in historical funding patterns; auxiliary support services (mental health, community living, employee assistance programs, respite care, homemaker services, employer round tables, etc.); crime rates; economic indicators such as average income and employment rates; public health indicators such as life expectancy rates; mobility patterns reflected in net in- or out-migration; tax structure and policies; transportation availability; and civil rights enforcement.
Family Variables
Families are key to understanding programmatic processes and outcomes. As suggested in Table 8.5, five factors related to the services provided frequently account for the family's involvement: whether the service is available, accessible, appropriate, accountable, and affordable.

There is considerable empirical support for the key role that families play in education and rehabilitation outcomes. For example, in 1978, we (Schalock & Harper, 1978) evaluated the significant predictors of successful community placement for persons with mental retardation. In that analysis, we found that the best predictor was the amount of parental support, as measured by attendance at Individual Program Plan meetings and assessed agreement with the thrust of the program toward community placement. Similarly, in a 1986 study of graduates from a rural special education program, we (Schalock et al., 1986) found that the parents' assessed level of involvement in the Individual Education Plan process was a statistically significant predictor of the postgraduation living and employment status of the students. And third, we (Schalock & Lilley, 1986) found that the amount of family involvement was positively (and significantly) related to a service recipient's assessed quality of life.

If the family's role is so critical, then how should it be assessed or handled in an analysis? My preference (as reflected in the studies referenced above) is to include family involvement as a variable in a multivariate analysis (see Figure 8.2) so that its importance can be evaluated empirically and statistically. Once that is done, it is simply a matter of discussing it in reference to the evaluation's interpretation.

There is a definite advantage to incorporating external variables/influences in the planning and implementation of an outcome-based evaluation: one can determine empirically whether they are related statistically to the evaluated outcomes. An example is presented in Exhibit 8-7. This study (Schalock, McGaughey, & Kiernan, 1989) involved two national surveys that were conducted to document selected employment outcomes as reported by 1,500 vocational rehabilitation agencies that met one of the following criteria during the survey periods: (1) placed adults with developmental disabilities into transitional, supported, or competitive employment settings; and/or (2) provided sheltered employment for adults with developmental disabilities. The analysis involved 17 predictor variables and 8 person-referenced outcomes. The results, based on multiple regression and summarized in Exhibit 8-7, indicate the critical role that facility size and philosophy, cost per participant, hours of job support, geographical environment, staff utilization patterns, and the area's unemployment rate have on placement of persons with disabilities into nonsheltered employment.
Exhibit 8-7. Example of Contextual Variables on Outcomes

Person-referenced outcome: Predictor variables*

Hours/Week: Disability level (+), Facility size (+), Age (–), Wage (+), Receives SSI (–), Transitional placement (–), Supported placement (–), Hours of job support (+), Gender (–), Cost per participant (+)

Wage/Hour: Disability level (+), Supported placement (–), Facility size (+), Transitional placement (–), Prior sheltered employment (+), Percent DD served (–), Hours of job support (–), Cost per participant (+)

SSI/SSDI Benefits Affected; Level of Integration: Wage/hour (+), Hours of job support (–), Supported placement (–), Transitional placement (–), Geographical environment (+), Percent sheltered staff (–), Prior nonsheltered employment (–), Unemployment rate (–)

Supported Employment Retention: Unemployment rate (+), Hours of job support (+)

*Denotes direction of the relationship. Specific multiple regression values can be found in Schalock, McGaughey, & Kiernan (1989).
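To illustrate the kind of multivariate analysis summarized in Exhibit 8-7, and the earlier suggestion of entering family involvement as a predictor, here is a minimal regression sketch. The data file, variable names, and model specification are hypothetical and are not taken from the studies cited; the point is simply that a contextual variable's contribution can then be judged from its estimated coefficient and significance test.

```python
# A minimal sketch, assuming a hypothetical dataset with one row per service
# recipient; all variable names are illustrative.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("outcomes.csv")  # hypothetical file

# Person-referenced outcome (e.g., hours worked per week) regressed on person,
# program, and family variables; family_involvement is the contextual variable
# whose importance we want to evaluate empirically.
model = smf.ols(
    "hours_per_week ~ family_involvement + disability_level"
    " + facility_size + cost_per_participant",
    data=df,
).fit()

print(model.summary())                     # coefficients, t statistics, p values
print(model.params["family_involvement"])  # sign and size of the family effect
```

The sign of each estimated coefficient corresponds to the (+) and (–) entries reported in Exhibit 8-7.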
Attrition

When evaluating an intervention, service, or program, many evaluators track the participants over time. This allows the evaluator to assess whether the effects of the intervention or service diminish or grow with the passage of time. An added benefit is that the data often include a pretreatment baseline measure for individuals in the treatment group, which allows the evaluator to adjust comparisons for any differences that might have existed prior to the treatment or intervention. Although powerful (and frequently preferred), longitudinal evaluations are beset by one of the greatest threats to internal validity: attrition. Attrition is nonresponse in a longitudinal context; it is the failure of individuals to respond to, or be found for, requests for information over time (Fitz, 1989; Foster & Bickman, 1996). Attrition is among the more difficult problems outcome-based evaluators face. As stated by Foster and Bickman (1996),

If individuals providing information differ systematically from those who do not, the estimated effect of the treatment may be grossly inaccurate. Worse still, although potentially very damaging to a study, an attrition problem is difficult to detect: By its nature an attrition problem hides any evidence that it exists. Missing is the essential piece of the puzzle: What would the nonrespondents have looked like had they responded. (p. 713)
Based on my experience, there are four approaches one can take to deal with the problem of attrition in outcomes interpretation: prepare the data collectors, be systematic, use modeling to fill in the missing observations, and conduct an attrition analysis.

Prepare Data Collectors
Adequate preparation and supervision of data collectors is an important part of maintaining subject participation in longitudinal evaluations. Data collectors need to have skills that enhance the desire of consumers to participate, reflect knowledge of the evaluation and its importance, demonstrate a high level of consideration and concern for others, demonstrate excellent communication skills, and reflect enthusiasm and commitment to the evaluation (Given et al., 1990).

Be Systematic
Recently, Desmond and his colleagues (1995) reviewed methods used to follow substance abusers. Methods associated with high completion rates included patience, persistence, time, travel, enthusiasm, and creative teamwork.
Exhibit 8-8. Example of an Attrition Analysis

An attrition analysis was done as part of a study (Schalock et al., 1995) whose purpose was to identify significant predictors of recidivism to a mental health residential facility. The study, conducted over a five-year period, included measures on 32 predictor variables collected either on admission, on discharge, or 12-15 months following discharge or upon readmission. Although the study identified health problems, instrumental activities of daily living, employment status, and number of previous admissions as significant factors in recidivism, complete data were available on only 61 percent of the initial sample. Specifically, the initial sample had been composed of 510 persons with mental illness; however, the eventual study sample was reduced to 309 due to persons leaving after admission but before we could obtain discharge data (n = 131), or for whom we had admission and discharge data but who could not be found during the follow-up contact period (n = 70). Thus, an attrition analysis was conducted in which these three groups (309, 131, and 70) were compared on eight "admission data sets" (age, gender, race, legal status, number of previous admissions, marital status, highest grade completed, and employment status on admission) and 13 "at discharge data sets" (cognitive level, treatment progress scale scores, days of hospitalization, individual psychotherapy hours, OT/IT/RT hours, number of family visits, living arrangement at discharge, Axis I diagnosis, relationship with others, general physical health, self-image, general enjoyment of life, and average ability to cope). The attrition analysis found that the only significant difference among these three groups was in reference to legal status, in which the study sample had proportionally more voluntary patients and fewer mental health board commitments than the "admission" or "at discharge" samples.
The top 10 suggestions for being systematic in longitudinal data collection included: collect complete locator information at the start of the study, inform subjects about the follow-up, provide adequate remuneration, hire appropriate staff, document all follow-up activities, exhaustively use institutional information, streamline the follow-up assessment, conduct the interviews in a mutually convenient location, provide resources for travel, and allow enough time to complete the follow-up.
Use Modeling
The most common statistical methods for purging parameter estimates of the effects of nonresponse rely on modeling to fill in the missing observations. These models require distributional assumptions or exclusion restrictions, which are assumptions that a given variable influences the likelihood of response but not the outcome of interest itself. These assumptions are generally untestable and can be very complex (Foster & Bickman, 1996).
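As a minimal sketch of what such model-based fill-in can look like, the example below uses a simple regression imputation under an assumed linear model; it is an illustration of the general idea, not any particular method from the literature, and all data are simulated.

```python
# A minimal sketch of regression-based "fill-in" for missing follow-up outcomes,
# assuming baseline covariates are observed for everyone and the outcome is
# missing for dropouts. All data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 200
baseline = rng.normal(size=(n, 2))          # e.g., standardized age, functioning score
outcome = 0.8 * baseline[:, 0] - 0.5 * baseline[:, 1] + rng.normal(scale=0.7, size=n)

# Simulate attrition: higher-functioning participants are more likely to respond.
responded = rng.random(n) < 1 / (1 + np.exp(-baseline[:, 1]))

# Fit a linear model on respondents only (this is where the distributional
# assumptions enter), then fill in the missing observations with predictions.
X_resp = np.column_stack([np.ones(responded.sum()), baseline[responded]])
coef, *_ = np.linalg.lstsq(X_resp, outcome[responded], rcond=None)
X_all = np.column_stack([np.ones(n), baseline])
outcome_filled = np.where(responded, outcome, X_all @ coef)

print(f"Respondent-only mean: {outcome[responded].mean():.3f}")
print(f"Model-adjusted mean:  {outcome_filled.mean():.3f}")
print(f"True (complete) mean: {outcome.mean():.3f}")
```

If the model linking baseline covariates to the outcome is badly wrong, the adjusted estimate can of course still be biased, which is the sense in which the assumptions are untestable.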
Conduct an Attrition Analysis

An attrition analysis can be conducted to determine the influence of the loss of subjects, which is especially characteristic of longitudinal evaluation studies wherein one is unable to contact former service recipients. Therefore, one needs to determine whether individuals who drop out of the evaluation are significantly different in any important way from those who remain. A simple way to complete an attrition analysis is to compare mean scores on available data sets for those who drop out with those for participants who remain in the evaluation. Such an example is presented in Exhibit 8-8.
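A minimal sketch of such a comparison is given below, assuming a hypothetical data frame with one row per participant; the variable names and the flagging threshold are illustrative choices, not part of the study reported in Exhibit 8-8.

```python
# A minimal sketch, assuming a hypothetical pandas DataFrame with one row per
# participant, baseline ("admission") variables, and a boolean column marking
# whether follow-up data were obtained. All names are illustrative.
import numpy as np
import pandas as pd

def attrition_table(df, baseline_vars, completed="has_followup"):
    """Compare completers and dropouts on each available baseline variable."""
    rows = []
    for var in baseline_vars:
        stayed = df.loc[df[completed], var].dropna()
        dropped = df.loc[~df[completed], var].dropna()
        pooled_sd = np.sqrt((stayed.var(ddof=1) + dropped.var(ddof=1)) / 2)
        rows.append({
            "variable": var,
            "mean_completers": stayed.mean(),
            "mean_dropouts": dropped.mean(),
            "std_difference": (stayed.mean() - dropped.mean()) / pooled_sd,
        })
    return pd.DataFrame(rows)

# Usage (hypothetical): flag baseline variables on which dropouts differ notably.
# report = attrition_table(df, ["age", "prior_admissions", "grade_completed"])
# print(report[report["std_difference"].abs() > 0.25])
```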
Summary

In conclusion, the mind (ours and others') must see something before it can believe it. Outcome-based evaluations not only require the collection and analysis of considerable data, they also require analyzing and interpreting data in ways that are accepted by others, and that fulfill the requirements of "good science." Data collection is more than simply collecting numbers or "counting beans." Rather, it is a systematic process wherein one judiciously collects relevant information on recipient characteristics, core service functions, cost estimates, and person- and program-referenced outcomes within the psychometric standards of reliability, validity, and standardization. Throughout this process, it is important to continually ask two questions: for what purpose will the outcome data be used, and what data are really needed for the intended purpose? Thus, in data collection and analysis, one always needs to begin with the end in mind and follow a few, but critical, guidelines: quantify the data, assign a data collection responsibility center, determine relevant data collection timelines, employ multiple perspectives on costs and outcomes, fulfill a number of data selection criteria, and be sensitive to the influence of external influences.

The context within which outcome-based evaluations are being conducted
and interpreted has changed significantly over the past two decades and promises to change even more in the future. The current emphasis on contextualism has a number of central themes: (1) the appreciation of the setting or context within which behavior occurs; (2) its emphasis on reality as an active, ongoing, changing process; (3) the view that the best way to advance basic understanding is to study social reality as it occurs; and (4) the view that an education, health care, or social service program is an active determiner of its own development.

Embedded within the context of any education, health care, or social service program are the key players in outcome-based evaluation: promoters who are demanding results-based accountability, stakeholders who are increasingly having to respond to cost containment and service reforms, and evaluators who are frequently caught in the middle. The present challenge in outcome evaluation and research is to recognize the importance that external influences (such as clinical significance, threats to internal validity, organization variables, and attrition) have on the production and interpretation of person- and program-referenced outcomes. Our future challenge is to recognize that outcome-based evaluation is changing very rapidly in response to the following trends:

• optimizing evaluation quality and utility under resource constraints;
• joining of program evaluation and economics in an emerging field, allocation efficiency;
• emerging trends within program evaluation, such as the social ecology of outcome-based evaluation;
• emphasizing data management and accountability and the use of informatics, report cards, and benchmarks;
• outsourcing of accountability status via accreditation bodies;
• increasing influence of consumerism, participatory action research, and needs assessment as the basis of outcome-based evaluation;
• increasing variability in education, health care, and social service programs that are increasingly characterized by multiple service delivery models and in a constant state of flux;
• increasing use of modeling to capture the dynamic and constantly changing social-ecological environment within which education, health care, and social service programs operate.

In the text's final chapter, I suggest five outcome-based evaluation scenarios based on these eight trends. These trends, and the five scenarios discussed in Chapter 9, remind one of the sage advice reportedly given by Lao-tzu: "Lay plans for the accomplishment of the difficult before it becomes difficult."
Study Questions

1. Select an outcome-based evaluation study from current evaluation literature. What statistical analyses were performed on the data? Critique these analyses in reference to the principles and guidelines discussed in this chapter.
2. What is internal validity and how can one best minimize its threats?
3. Why is it so important to simplify the data one collects for outcome-based evaluation? Give specific examples of how one can simplify data sets regarding client characteristics, core service functions, and cost estimates.
4. If you had to choose only one indicator each of client characteristics, core service function, cost estimate, and person- and organization-referenced outcome as the basis for an impact evaluation (Chapter 4), which one would you select? Why?
5. What is contextualism? Why is it so critical in the analysis and interpretation of outcomes?
6. Outline how you would incorporate support functions (Tables 8.1 and 8.2) into your approach to outcome-based evaluation.
7. Become familiar with an education, health care, or social program. Summarize key aspects of each of the following contextual variables: organization description, philosophy and goals, phase of program development, resources, formal linkages, community factors, and family variables.
8. Why is attrition such a potentially significant problem in the analysis and interpretation of outcomes? Outline two or three strategies to overcome the problem.
9. What is the difference between bivariate and multivariate evaluation designs and analyses? What are their respective strengths and limitations?
10. What is clinical significance and how is it judged? How does effect size relate to it?
Additional Readings

Cohen, J. (1987). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.
Howard, K. I., Moras, K., Brill, P. L., Martinovich, Z., & Wolfgang, L. (1996). Evaluation of psychotherapy: Efficiency, effectiveness, and patient progress. American Psychologist, 51(10), 1059–1064.
Kraemer, H. C., & Thiemann, S. (1987). How many subjects: Statistical power analysis in research. Newbury Park, CA: Sage Publications.
Lipsey, M. W. (1990). Design sensitivity: Statistical power for experimental research. Newbury Park, CA: Sage Publications.
McDonald, R. M. (1991). Assessment of organizational context: A missing component in evaluation of training programs. Evaluation and Program Planning, 14, 273–279.
Schein, E. H. (1990). Organizational culture. American Psychologist, 45(2), 109–119.
Schorr, L. B. (1997). Common purpose: Strengthening families and neighborhoods to rebuild America. New York: Anchor Books, Doubleday.
9

Future Scenarios

OVERVIEW
Increased Variability of the Service Delivery System
Balance between Performance Measurement and Value Assessment
Evaluation Theory: Embracing the Postmodernist Paradigm
Managing for Results
Outsourcing of Evaluation
Summary and Conclusion
Study Questions
Additional Readings
Even if you are on the right track, you will get run over if you just stand there. WILL ROGERS
Overview

As discussed throughout the preceding eight chapters, current education, health care, and social service programs are increasingly being asked to demonstrate their effectiveness and efficiency. This is happening within the context of the "four Cs" that are impacting all organizations: customers, competition, change, and cost containment. At the organization or program level, education, health care, and social service programs are responding to these challenges by changing the ways they do business and conduct program evaluation, using outcome-based data as a basis for making programmatic changes to both improve services and increase their measurability, reportability, and accountability. At the systems level, society and policymakers are asking questions about the effectiveness and efficiency of programs and public policies
such as welfare to work, school reform, managed care, the supports paradigm, and community-based approaches to mental/behavioral health, disabilities, aging, substance abuse, and corrections.

This book is about ways to both adapt to the "four Cs" and answer questions about a program or policy's effectiveness and efficiency. Throughout the text, I have stressed five principles that provide the framework for outcome-based evaluation:

• Person- and program-referenced outcomes can be used to demonstrate accountability and as a basis for continuous improvement.
• Multiple perspectives on accountability require the use of methodological pluralism.
• All stakeholders can play a key role in outcome-based evaluation.
• There needs to be a balance between performance measurement and value assessment.
• Outcome-based evaluations should be based on two fundamental questions: for what purpose will I use the outcome-based evaluation data, and what data will I need for the intended use?

In this last chapter, we end where we began: discussing outcome-based evaluation and the factors that influence its present status, and a number of future scenarios that will characterize its potential application in the twenty-first century. Throughout the preceding eight chapters, I have discussed factors that have influenced both the development and application of outcome-based evaluation: the need for increased accountability and continuous improvement, the rise of consumerism, the changing evaluation paradigm characterized by methodological pluralism, and the increased use of outcome-based information for policy and program analysis and change. Basic to both its development and application has been the need to ask good questions and optimize evaluation quality and utility by using a number of outcome-based evaluation guidelines related to methodological pluralism, the evaluation's validity, the program's evaluation capability, the strengths and limitations of outcome-based evaluation, and formative feedback.

Now it is time to think about the future and to keep clearly in mind Will Rogers's statement quoted above, plus that of the Spanish philosopher Ortega y Gasset that "human life is a constant preoccupation with the future." There is a popular myth that futurists are in the business of predicting what will happen in the future. The truth is quite different. Futurists know better than most people that the future is not predictable: we cannot know what will happen in the future. Then, what do futurists do? Quite simply, they try to suggest things that might happen in the future so that people can decide what they want to make happen. To that end, this chapter discusses five scenarios that I feel might well unfold during the early part of the twenty-first century:
(1) an increased variability of the service delivery system; (2) a balance within program evaluation between performance measurement and value assessment; (3) evaluation theory continuing to embrace the postmodernist paradigm; (4) the fields of education, health care, and social services managing for results; and (5) an outsourcing of program evaluation activities.

However, before considering each of these possible scenarios, let me be a "Monday morning quarterback" and briefly evaluate a list of predictions I made in the concluding chapter of the 1995 edition of this text. As I looked at the outcome-based evaluation scene in the early 1990s, it was apparent to me that six potential scenarios (that is, predictions) were quite likely. The first was the increase in a "noncategorical approach to services and supports." Indeed, this has happened, and we are continuing to see services and interventions being provided based on the functional and support needs of persons rather than strict diagnostic labels. Second, "accountability would be defined on the basis of outcomes." No contest. Third, we would see the emergence of "multiple evaluation designs." Methodological pluralism is alive and well as we enter the twenty-first century. Fourth, we would see "increased service provider networks." Interagency cooperation and agreements are a sign of the times! Fifth, "consumer-oriented evaluation would be a critical component of outcome-based evaluation." Welcome to participatory action research and empowerment evaluation. And sixth, we would see a "linking of program evaluation and forecasting." We are not quite there yet, but informatics and information technology are the potential vehicles whereby policymakers and service providers will be made aware of emerging issues, allowing them enough lead time to develop policies and programs to address education, health care, and social problems.

The reader can be the judge, but the crystal ball for these scenarios seemed to work well. Time, however, will be the judge of the accuracy of the five twenty-first-century scenarios discussed in the remaining sections of this chapter.
Increased Variability of the Service Delivery System

Education, health care, and social service programs will be fundamentally different in the future. The parameters of what they will no doubt look like are summarized well in the following principles, described by Osborne and Gaebler (1993), around which entrepreneurial public and private organizations are being built:

• steer more than row
• empower communities rather than simply deliver services
• encourage competition rather than monopoly
• driven by their mission, not their rules
• fund outcomes rather than inputs
• meet the needs of the customer, not the bureaucracy
• concentrate on earning, not just spending
• invest in prevention rather than cure
• decentralize authority
• solve problems by leveraging the marketplace, rather than simply creating public programs

These are powerful principles that many of us are already seeing influence education, health care, and social service programs. I am convinced that they will result in profound changes in service provision, funding streams, and the locus of decision making. Obviously, they will also have a profound impact on outcome-based evaluation, stressing the continued need to be sensitive to contextual factors in the design of outcome-based evaluations, the critical role that the key outcome-based evaluation players play in the process, and the continued need to balance performance measurement and value assessment.

In addition, the increased variability of the service delivery system will force program evaluation theory to develop new outcome-based evaluation designs and analyses. A future reality is that singular approaches to evaluation and analysis will not answer all the questions asked by the various stakeholders, and we will need to be sensitive to the fact that different evaluation designs are well suited to programs at different levels of evaluation capability and to analysts with different theoretical and practical orientations and training. There is value both in multiple ways of knowing and in using multiple designs and analyses that allow for the integration of values, practices, policies, and science. Thus, it will be acceptable for different programs to use different approaches to measurement (performance assessment, consumer appraisal, functional assessment, personal appraisal) and evaluation (program, effectiveness, impact, policy).

Finally, the increased variability of the service delivery system will result in the continued development and use of multivariate research methods driven by hypothesis testing. These methods are particularly valuable since they allow one to probe more deeply and realistically into the phenomena under investigation. Multivariate research methods are particularly valuable in education, health care, and social service programs, wherein person- and organization-referenced outcomes have multiple origins and external influences. These methods allow one to evaluate empirically the influence of a number of person, program, and environmental variables that significantly affect program outcomes.
Balance Between Performance Measurement and Value Assessment

Twenty-first-century education, health care, and social service programs will continue to be buffeted by two competing accountability requirements: to demonstrate valued, person-referenced outcomes, and to measure organization-referenced outcomes that reflect effectiveness and efficiency. There is no reason to assume that either consumer empowerment or economic realism (as reflected in the concept of allocation efficiency; Chang & Warren, 1997) will diminish as major forces in both service delivery and evaluation. The balance between performance measurement and value assessment will not be static; indeed, the pendulum will swing between the two extremes, and significant conflicts and public policy and funding debates will ensue. The relative influence of the key evaluation players will fluctuate, but the Aristotelian "golden mean" should prevail. To this end, a number of evaluation-oriented models were presented throughout the text to stress the need to balance performance measurement and value assessment. By way of review:

• Figure 2.1 ("program-evaluation model") stressed the importance of both standards and focus in outcome-based evaluation. Standards pertain to performance and value; focus, to the individual or organization. Within the matrix, one can evaluate a balance among organization performance and value outcomes and individual performance and value outcomes.
• Figure 6.2 ("dimensions of the reform movement and their impact on outcome-based evaluation") stressed the selection of outcome measures and the use of methodological pluralism as a way to balance the accountability and quality dimensions of the reform movement. As discussed more fully in Chapter 7, the use of methodological pluralism also ensures a balance among performance assessment, consumer appraisal, functional assessment, and personal appraisal (see Figure 7.2).
• Figure 7.1 ("measurement foci") related these same standards and foci to a balance among outcome-based evaluation measurement approaches, including effectiveness, efficiency, satisfaction, fidelity to the model, adaptive behavior, role status, and person-centered and health-related quality of life.

The importance of these figures should not be overlooked in this future scenario. Each figure represents a model that not only integrates current understanding of the issues of performance versus value, but also provides
tomorrow's road maps. The balance between performance measurement and value assessment will be a challenge, but one that is worthy of our efforts.
Evaluation Theory: Embracing the Postmodernist Paradigm

Program evaluation theory is still evolving. Shadish, Cook, and Leviton (1991), for example, describe the historical evolution of evaluation theory in at least two complementary ways: as a confrontation between an initial naiveté on the part of evaluation researchers and the unanticipated complexities of social and political reality; and as a partial cumulative progression and refinement of ideas, both theoretical and practical, within the field over that time.

The term that best reflects the current evolutionary status of program evaluation is "postmodernism." From a general perspective, postmodernism has been a growing movement, at least since 1960, in reaction against modernism, a worldview that is grounded in the paradigm of positivism. Modernism stemmed from the Enlightenment's vision that rationality would lead to the discovery of objective, general, quantitative laws that would spin off technologies for creating a "grand narrative" of continuous material and cultural progress (Fishman, 1995). Postmodernism provides the umbrella for at least two intellectual movements that have attacked modernism: one epistemological and one sociopolitical. The epistemological view argues against the rational, value-free, objectivist "purity" of positivism on logical grounds, and has been integrated into philosophy's phenomenological and hermeneutic orientation. Central to this position is the theme of social constructivism: that an important part of our reality is created or constructed from within our language, our culture, our subjective consciousness, our paradigms, and our core belief systems, rather than being discovered from without by sense impression of the external world. The sociopolitical attack on modernism is reflected in the rebellion against the mechanistic and deterministic principles of behaviorism, the civil rights movement, and identity politics that include disempowered groups who stress the importance of concepts such as diversity, pluralism, and multiculturalism (Fishman, 1995).

As with the previous scenario, there will continue to be conflicts between logical positivists and social constructionists, but my prediction is that program evaluation will continue to embrace the postmodernist paradigm. Within that paradigm, we will continue to see a focus on:

• utilization-focused evaluation,
• performance-oriented evaluation,
• theory-driven evaluation,
• a pragmatic evaluation paradigm,
• idiographic research,
• context-specific knowledge,
• decision-oriented evaluation, and
• methodological pluralism.
Managing for Results

Chapter 2 set the stage for this scenario. As you will recall, the material in Chapter 2 suggested using outcomes to guide program analysis and change and outlined an approach for doing so. Although logical and attractive, managing for change, which will undoubtedly characterize twenty-first-century education, health care, and social service programs, may be easier said than done. For example, reference was made frequently throughout the text to the Government Performance and Results Act of 1993 (GPRA). The act specified a seven-year implementation time period and required the federal Office of Management and Budget (OMB) to select pilot tests to help federal agencies develop experience with the act's processes and concepts. Initial evaluations of those programs by the U.S. General Accounting Office (1997, 1998) provide important information as to how this future scenario might play out.

In its evaluation of the implementation of GPRA to date, the General Accounting Office (GAO) conducted structured interviews with program officials in 20 departments and major agencies with experience in performance measurement. Programs were selected that represented diversity in program purpose, size, and other factors thought to affect their experience. For each program, the official responsible for performance measures and a program evaluator or other analyst who had assisted in this effort were interviewed. Program officials were asked a number of questions about each of the four programmatic requirements of the act: identifying goals, developing performance measures, collecting data, and analyzing data and reporting results. Then, for each stage, program officials were asked to describe how they approached their most difficult challenge and whether and how they used prior studies and technical staff.

Based on the GAO's analysis and subsequent reports (U.S. General Accounting Office, 1997, 1998), a number of findings have significant implications for this fourth scenario. For example, the following significant challenges were found in reference to identifying goals, developing performance measures, collecting data, and analyzing data and reporting results:

• Identifying goals: challenges included translating general, long-term strategic goals to more specific annual performance goals and objectives;
distinguishing between outputs and outcomes; and specifying how the program's operations will produce the desired outputs and outcomes.
• Developing performance measures: challenges included getting beyond program outputs to develop outcome measures of the results of those activities; specifying quantifiable, readily measurable performance indicators; developing interim or alternative measures for program effects that may not show up for several years; estimating a reasonable level of expected performance; and defining common, national performance measures for decentralized programs.
• Collecting data: challenges included ascertaining the accuracy and quality of performance data.
• Analyzing data and reporting results: challenges included separating the impact of the program from the impact of other factors external to it.

Despite these challenges, a number of approaches were developed to overcome them successfully, which reinforces one of my favorite statements: "there are no problems, only opportunities to excel and succeed." Specifically, the two GAO studies (1997, 1998; U.S. General Accounting Office, P.O. Box 6015, Gaithersburg, MD 20884-6015) describe a number of strategies that agencies used to overcome the challenges listed above. For example:

• Identifying goals: approaches used included specifying performance goals over an extended period; focusing annual goals on proximate outcomes; developing a conceptual model to specify annual goals; focusing annual goals on short-term strategies for achieving long-term goals; developing a qualitative approach; involving stakeholders; clarifying definitions of outputs and outcomes; focusing on known, quantitative outcomes; focusing on projected outputs; and surveying customers to identify potential outcomes.
• Developing performance measures: approaches used included developing a measurement model that encompasses state and local activity to identify outcome measures for the federal program; encouraging program managers to develop projections for different funding scenarios; conceptualizing the outcomes of daily activities; using multiple measures that are interrelated; developing measures of customer satisfaction; using qualitative measures of outcome; planning a customer survey; involving stakeholders; and identifying outcome measures used by similar programs.
• Data collection: approaches used included verifying and validating the data; researching alternative data sources; conducting a special study and redesigning a survey to develop new sources of outcome data;
involving stakeholders; creating new data elements; using data from other agencies; developing a customer survey; developing an activity-based cost system; providing training; using a certified automated data system; using data verification procedures; acknowledging data limitations; and using management experience.
• Data analysis and reporting: approaches included specifying as outcomes only the variables that the program can affect; advising field offices to use control groups; using customer satisfaction measures; monitoring the economy at the regional level; expanding data collection to include potential outcome variables; analyzing time-series data; analyzing local-level effects that are clearly understood; and involving stakeholders.

These results suggest a number of issues that need to be addressed as the scenario of managing for results unfolds. First, evaluators can play an important role in working with program administrators to identify goals, develop performance measures, collect and analyze data, and report the results. Second, agency personnel and program evaluators need to use a variety of strategies in situations where they have limited control over outcomes. For example, GAO (1998) identified four strategies that agencies used to address this challenge, occurring at different steps throughout the performance measurement process. Each strategy aimed to reduce, if not eliminate, the influence of external factors on the agencies' outcome measures: (1) selecting a mix of outcome goals over which the agency has varying levels of control; (2) redefining the scope of a strategic goal to focus on the narrower range of actual activities; (3) disaggregating goals for distinct populations for which the agency has different expectations; and (4) using data on external factors to adjust statistically for their effect on the desired outcome.

Third, all key OBE players need to become better able to clearly identify initial, immediate, and end outcomes for education, health care, and social service programs. Two currently available strategies should facilitate this process: data mining and logic models. Data mining is the basic process employed to analyze patterns in data and abstract information. It is a multivariate technique, to be used with a consolidated database, for the purposes of problem definition, data collection and consolidation, data analysis, model building and validation, and evaluation and interpretation (Trybula, 1997). The downside is that data mining is possible only with very large data sets. In distinction, logic models can be used to describe a program's performance story (McLaughlin & Jordan, 1999); they describe the logical linkages among program resources, activities, outputs, customers reached, and short-, intermediate-, and longer-term outcomes (a minimal sketch of such a model appears at the end of this section). Fourth, we need to make better use of national data sets that will fill
important gaps and obtain data that are not readily available to local providers. Web sites have been provided throughout the text to facilitate use of these national data sets. And finally, we need to be sure to involve all key players in the management of outcomes, since a consistent finding across the agencies analyzed by GAO was that "stakeholder involvement helped in selecting measures of program outcomes" (GAO, 1998, p. 18).
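As noted above, a logic model simply makes the assumed chain from resources to outcomes explicit. The sketch below records such a model as a small data structure; the program and all of its entries are hypothetical and are meant only to illustrate the linkages, not to prescribe a format.

```python
# A minimal sketch of a program logic model as a plain data structure; the
# program and every entry are hypothetical.
from dataclasses import dataclass, field

@dataclass
class LogicModel:
    program: str
    resources: list = field(default_factory=list)       # money, staff, partners
    activities: list = field(default_factory=list)      # what the program does
    outputs: list = field(default_factory=list)         # direct products/services
    short_term_outcomes: list = field(default_factory=list)
    intermediate_outcomes: list = field(default_factory=list)
    long_term_outcomes: list = field(default_factory=list)

model = LogicModel(
    program="Hypothetical supported employment program",
    resources=["job coaches", "employer agreements", "state funding"],
    activities=["job development", "on-site coaching", "follow-along support"],
    outputs=["placements made", "coaching hours delivered"],
    short_term_outcomes=["hours worked per week", "hourly wage"],
    intermediate_outcomes=["job retention at 12 months"],
    long_term_outcomes=["community integration", "assessed quality of life"],
)

for stage in ("resources", "activities", "outputs", "short_term_outcomes",
              "intermediate_outcomes", "long_term_outcomes"):
    print(f"{stage}: {', '.join(getattr(model, stage))}")
```

Writing the chain down in this form makes it straightforward for stakeholders to review whether each proposed performance measure actually sits on the path from activities to the outcomes of interest.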
Outsourcing of Evaluation

The outsourcing of evaluation is personally troublesome, since I believe strongly that outcome-based evaluation should be an internal, organizing process with the evaluator playing the role of an internal "consultant" who is familiar with the program's history, current context, and evaluation capability. However, a recent advertisement on the Web under the heading of "health care performance measurement" may well reflect our futures:

x group's low cost "x-system" is a JCAHO-ORYX approved performance measurement system. The "x system" collects 100% of your perioperative clinical indicators, eliminating wasted time manually going through charts every month to compile statistics. It presents your data in easy-to-read monthly reports and includes an annual benchmark. Receive a FREE MONTH. Sign up ONLINE.
This example reflects a number of factors that will potentially make the outsourcing of evaluation the program evaluation model of the twenty-first century. First, there is the cost of evaluation for programs that typically are lacking in resources as defined by time, expertise, and money (Mowbray et al., 1998). Second, report cards and benchmarks are both attractive and potentially able to provide data standards, standardized data sets, comparability across programs, apparent ease of understanding, and quick turnaround time (Epstein, 1995, 1998). They may also be perceived as cost-efficient in a cost-containment environment. In this regard, health care report cards and performance indices are a booming business because they help health care plans compete for big employer contracts in tough markets, particularly those markets where managed care is trying to control costs. Another force affecting how this scenario may well play out is organizations such as the National Committee for Quality Assurance (NCQA; <www.timeformerica.com/xopm/ncqa.html>; www.hcfa.gov.stats/hedis.htm). This national nonprofit accrediting agency for managed care plans has recently released its latest version of standardized performance measures for managed health care plans, HEDIS 3.0. HEDIS 3.0 is a set of standardized performance measures designed to allow purchasers to compare the performance of managed health care plans and includes a series of indicators that measure the performance of each
managed care plan reporting within the HEDIS parameters. For the most part, HEDIS 3.0 is very outcome-oriented and consists of 75 measures that fall under one of eight categories: the effectiveness of care, the accessibility and availability of care, the patient's satisfaction with the experience of care, the cost of care, the financial and provider stability of the health plan, the information made available to plan members, the use of plan services, and certain plan descriptive information regarding how the plan functions to provide the consumer with information that will assist in making informed health care choices. Such use of a report card is very attractive, especially in an environment that is changing rapidly, where the distinction between public and private is shrinking, and where devolution and heterogeneity are major forces.

Third, there will continue to be a recognized need for data standards. In this regard, the increasing need for accountability and continuous improvement may well increase the use of report cards and benchmarks as data standards. As discussed in Chapter 7, report cards are just "coming on line" in education, health care, mental health, and other social services as one way to report outcome-referenced data and allow consumers to gauge the accessibility, appropriateness, and quality of interventions and services. Generally, a report card covers four major areas that may well become data standards:

• Access. Does the program, intervention, or service plan offer quick and convenient access to a full range of services? Do consumers have access to information, such as best practice guidelines, needed to make informed choices about treatment/intervention/service options?
• Quality/appropriateness. Do services, interventions, or programs promote growth and wellness, focusing on strengths rather than weaknesses?
• Promotion/prevention. Does the service or intervention provide information about risk factors and promote prevention and proactive approaches?
• Outcomes. Does the intervention, program, or service result in measurable changes in the person's condition or behavior, and can these changes be related to process variables?

Similarly, benchmarks (in the form of data sets or peer groups) may also become data standards (Epper, 1999). In that regard, Kinni (1994) suggests three lessons that consistently occur in the literature on benchmarking:

• Start by benchmarking functions or processes that are critical to success. Don't waste time or money on insignificant studies.
• Benchmarking requires self-assessment. You cannot uncover performance gaps without first understanding and measuring your own processes.
• Gather the most cost-effective information first. The more you learn before an on-site visit, the more you will take away from it.

If costs, the attractiveness of report cards and benchmarks, and the need for data standards are the catalysts for the outsourcing of program evaluation, time and technology will be the vehicles that bring it about. According to Gates (1999), if the 1980s were about quality and the 1990s were about reengineering, then the 2000s will be about velocity and how quickly business (that is, evaluation) will be done. To function in the digital age, we have developed a new digital infrastructure that will have a significant impact on evaluation. Specifically, we will see an increase in the use of informatics.

Informatics is a relatively young discipline that at this point concerns primarily the management of information in medicine and mental health (Shortliffe, 1998). The role of the information sciences continues to grow, and the past few years have seen informatics begin to move into the mainstream of clinical practice, generally under the rubric of "electronic records." This new generation of electronic tools is designed to provide patients or health care professionals with the tools, skills, information, and support they need to play the role of primary practitioner in the emerging education, health care, and social service systems (Clifford, 1999). Some of the emerging subdisciplines are artificial intelligence, coding and classification of information, information retrieval, image processing, telemedicine, psychocyberepistemology, and information systems.

One of the most important recent advances in informatics is the development of secondary databases of summarized evidence that is ready for clinical application. A landmark in this area is the U.S. National Library of Medicine's provision of free MEDLINE searches on the Internet via two different search engines, Internet Grateful Med and PubMed. PubMed also includes embedded search strategies for optimizing the yield of clinically useful studies. Ovid Online <www.ovid.com>, Silver Platter <www.silverplatter.com>, and several other vendors provide more comprehensive (and expensive) services, including access to full-text articles. Readers who want to learn more about informatics can get free information from . The potential scope of informatics is enormous, with future application in the design of decision and support systems for practitioners, the development of computer tools for research, and the study of outcomes. The increasing application of computing and information technology has created the need for precise information including terminology, classification, and the modeling of information both statically (that is, defining the structural elements)
and dynamically (that is, showing the processing of information and its change with respect to time). In the future, informatics will provide: (1) consistency and efficacy in the application of education, health care, and social service interventions and services; (2) practitioners with an invaluable support in their relationships with their clients; (3) consumers with the information they need so they can take responsibility for critical decisions; and (4) key outcome-based evaluation players with aggregated data regarding person- and organization-referenced outcomes (Brennan, 1994; Haynes & Jadad, 1997; Tudor & Benov, 1995).

Three results are very possible due to these factors related to costs, the attractiveness of report cards and benchmarks, the need for data standards, and the availability of technology: (1) the outsourcing of program evaluation to national accrediting bodies, econometric firms using the emerging concept of allocation efficiency, and evaluation departments ("general accounting offices") within state education, health, and human and social service agencies; (2) more "point in time" (rather than longitudinal) outcome-based evaluation being done at the level of the consumer; and (3) the increasing use of informatics and electronic records for both intervention and outcome-evaluation purposes.
Summary and Conclusion

Each of these five scenarios reflects a number of challenges and opportunities for the field of program evaluation generally, and for outcome-based evaluation specifically. For example, if the last scenario comes true, outcome-based evaluation will look very different, and outcome-based evaluators will wear a number of hats, including those of the economist, social scientist, policy analyst, and group facilitator. For the other scenarios, the challenges and opportunities are quite clear. In the future, we must adapt to a service delivery system that is increasingly variable and complex; an ongoing debate between performance measurement and value assessment; a need to balance social constructivism with the realistic rigor of good program evaluation science; the managing of person- and organization-referenced outcomes; and the potential outsourcing of evaluation. These challenges and opportunities also contain within them the obligation to point out the realistic strengths and limitations of outcome-based evaluation, along with its appropriate and inappropriate uses.

Throughout the history of program evaluation, the field has faced a number of challenges and opportunities regarding the relevance of program evaluation, the best research and evaluation designs to use, how to improve evaluation utilization, the need to establish a strong infrastructure for evaluation,
and the necessity of increasing the visibility of evaluators. The role that evaluation plays in program development and policy analysis has progressed significantly since the early 1960s. Two major trends are emerging in the field that have the promise of "moving the field along" toward a more coherent approach, better acceptability, and greater impact. First is the trend of moving away from a "goal-fixed" approach to a "multi-goal, theory-driven" approach to program evaluation (Chen & Rossi, 1989). Whereas the "goal-fixed" approach typically focuses on identifying officially stated program goals, selecting a subset of measurable outcome measures, and assessing whether the program achieves the narrowly defined effects implied in the outcome measures, the "multi-goal, theory-driven" approach evaluates both the status of goal attainment and the effects of the program on desired person- and organization-referenced outcomes. The second trend is to rethink program evaluation theory and to expand it to include components dealing with practice, knowledge, value, use, and techniques to improve education, health care, and social service programs. As stated by Shadish, Cook, and Leviton (1991):

the fundamental purpose of program evaluation is to specify feasible practices that evaluators can use to construct knowledge of the value of social programs that can be used to ameliorate the social problems to which programs are relevant. (p. 36)
The methodological pluralistic approach to outcome-based evaluation presented in this text is both consistent with and supportive of these two trends. By focusing on valued, person- and organization-referenced outcomes within the context of quality of life, functional limitations, and noncategorical service delivery, the suggested approach is value-based and growth-oriented. By employing methodological pluralism, the approach is multifaceted and helps programs be both accountable and better able to answer consumer-generated policy and funding issues related to equity and efficiency. By focusing on formative feedback and internal program evaluation, the suggested approach addresses issues of knowledge and practice. By focusing on reporting and using outcome-based evaluation information, the approach suggests ways to improve education, health care, and social service programs.

As suggested metaphorically throughout the text, outcome-based evaluation is a three-legged stool: the selection of valid outcome measures, the measurement of those outcomes, and the utilization of the evaluation results. At various points in time, one leg may be unbalanced, as the pendulum swings from qualitative to quantitative, from accountability to responsiveness, from effectiveness to efficiency, and from internal to external. Constant throughout these swings, however, is the consistent need to know where one is going and to work closely with key outcome-based evaluation players to ask the
right questions, to use methodological pluralism in answering those questions, and to remember that you cannot pour from an empty vessel.
Study Questions

Note: Each of the following questions requires a literature search of post-1995 publications. Ideally, you will search relevant literature in your programmatic area to find support for the five scenarios discussed in the text.

1. What evidence do you find for increased variability of the service delivery system?
2. What evidence do you find for a balance between performance measurement and value assessment?
3. What evidence do you find for evaluation theory continuing to embrace the postmodernist paradigm?
4. What evidence do you find in your respective area of interest for managing for results?
5. What evidence do you find for outsourcing program evaluation?
Additional Readings

Chelimsky, E., & Shadish, W. R. (1997). Evaluation for the 21st century: A handbook. Thousand Oaks, CA: Sage Publications.
Collins, J. C., & Porras, J. I. (1997). Built to last: Successful habits of visionary companies. New York: HarperCollins.
Drucker, P. (1998). The future has already happened. The Futurist (Nov.), 16–18.
Gates, W. H. (1999). Business @ the speed of thought: Using a digital nervous system. New York: Warner Books.
Graham, P. (1999). Implementation of an outcomes management model of treatment. Bulletin of the Menninger Clinic, 63(3), 346–365.
References Albin-Dean, J. E., & Mank, D. M. (1997). Continuous improvement and quality of life: Lessons from organizational management. In R. L. Schalock (Ed.), Quality of life: Volume II: Application to persons with disabilities (pp. 165–180). Washington, DC: American Association on Mental Retardation. Albrecht, K. (1993). The only thing that matters: Bringing the power of the customer into the center of your business. New York: Harper Business. Alemi, F., Stephens, R. C., Llorens, S., & Orris, B. (1995). A review of factors affecting treatment outcomes: Expected treatment outcome scale. American Journal of Drug Alcohol Abuse, 21(4), 483–509. Andrews, F. M. (1974). Social indicators of perceived quality of life. Social Indicators Research, 1, 279–299. Ankuta, G. Y., & Abeles, N. (1993). Client satisfaction, clinical significance, and meaningful change in psychotherapy. Professional Psychology: Research and Practice, 24(1), 70–74. Armenakis, A., Harris, S., & Mossholder, K. (1993). Creating readiness for organizational change. Human Relations, 46(6), 681–703. Ashbaugh, J., Bradley, V. J., Taub, S., & Bergman, A. (1997). The development of indicators to monitor the performance of systems of long term services and supports for people with lifelong disabilities. Final report: Review of candidate performance indicators. Washington, DC: Office of Assistant Secretary for Planning and Evaluation, U.S. Department of Health and Human Services and Office for Special Education and Rehabilitation Services, U.S. Department of Education. Bachrach, L. L. (1996). Managed care: IV. Some helpful resources. Psychiatric Services, 47(9), 925–930. Baker, E. L., O’Neil, H. F., & Linn, R. L. (1993). Policy and validity prospects for performancebased assessment. American Psychologist, 48(12), 1210–1218. Baker, F., & Curbow, B. (1991). The case–control study in health program evaluation. Evaluation and Program Planning, 14, 263–272. Baltes, P. B., & Baltes, M. M. (eds.), (1990). Successful aging: Perspectives from the behavioral sciences. New York: Cambridge University Press. Bech, P. (1993). Quality of life measurements in chronic disorders. Psychotherapy and Psychosomatics, 59(1), 1–10. Bhaskar, R. (1975). A realist theory of science. Sussex: Harvester Press. Bickman, L. (1996). The application of program theory to the evaluation of a managed mental health care system. Evaluation and Program Planning, 19(2), 111–119.
Bickman, L., Guthrie, P. R., Foster, E. W., & Lambert, E. W. (1995). Evaluating managed mental health services: The Fort Bragg experiment. (pp. 146–147). New York: Plenum. Blank, R. K. (1993). Developing a system of education indicators: Selecting, implementing, and reporting indicators. Education Evaluation and Policy Analysis, 15(1), 65–80. Bond, G. R., Witheridge, T. F., Dincin, J., & Wasmer, D. (1990). Assertive community treatment for frequent users of psychiatric hospitals in a large city: A controlled study. American Journal of Community Psychology, 18(6), 865–891. Braddock, D., Hemp, R., Bachelder, L., & Fujiura, G. T. (1995). The state of the states in developmental disabilities (4th ed.). Washington, DC: American Association on Mental Retardation. Bradley, V. J. (1994). Evolution of a new service paradigm. In V.J. Bradley, J. Ashbaugh, & B. Blaney, (Eds.), Creating individual supports for people with developmental disabilities (pp. 110– 122). Baltimore: Brookes. Brennan, F. (1994). On the relevance of discipline in informatics. Journal of the American informatics Association, 1(2), 200–1. Bruininks, R. H., Hill, B. K., Weatherman, R. F., & Woodcock, R.W. (1986). Inventory For Client and Agency Planning. Riverside, CA: Riverside Publishing Corporation. Bruininks, R. H., Thurlow, M. L., & Ysseldyke, J. E. (1992). Assessing the right outcomes: Prospects for improving the education of youth with disabilities. Education and Training in Mental Retardation, 27, 93–100. Bryant, D. M., & Bickman, L. (1996). Methodology for evaluating mental health case management. Evaluation and Program Planning, 19(1), 121–130. Bureau of Justice. (1994). Assessing the effectiveness of criminal justice programs: Assessment and evaluation handbook series No. 1. Washington, DC: Bureau of Justice Assistance, Office of Justice Programs, U.S. Department of Justice. Byrne, J. M., & Taxman, F. S. (1994). Crime control policy and community corrections practice: Assessing the impact of gender, race, and class. Evaluation and Program Planning, 17(2), 227–233. Calkins, C. F., Schalock, R. L., Griggs, P. A., Kiernan, W. E., & Gibson, C. A. (1990). Program planning: In C. F. Calkins & H. M. Walker (Eds.), Social competence for workers with developmental disabilities: A guide to enhancing employment outcomes in integrated settings (pp. 51– 64). Baltimore: Brookes. Calsyn, R. J., Morse, G. A., Tempelhoff, B., & Smith, R. (1995). Homeless mentally ill clients and the quality of life. Evaluation and Program Planning, 18(3), 219–225. Camp, R. C. (1989). Benchmarking: The search for industry best practices that lead to superior performance. Milwaukee: American Society for Quality Control, Quality Press. Campbell, A., Converse, P. E., & Rodgers, W. L. (1976). The quality of American life. New York: Sage Publications. Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago, IL: Rand McNally. Campbell, J. A. (1992). Single-subject designs for treatment planning and evaluation. Administration and Policy in Mental Health, 19(5), 335–343. Campbell, J. A. (1996). Toward collaborative mental health outcomes systems. New Directions for Mental Health Services, 71 (pp. 69–78). San Francisco: Jossey-Bass Publishers. Caracelli, V. J., & Greene, J. C. (1993). Data analysis strategies for mixed-method evaluation designs. Educational Evaluation and Policy Analysis, 15(2), 195–207. Carpinello, S, Felton, C. J., Pease, E. A., DeMasi, M., & Donahue, S. (1998). 
Designing a system for managing the performance of mental health managed care: An example from New York State’s prepaid mental health plan. Journal of Behavioral Health Services Research,25(3), 269–278. Center for the Study of Social Policy. (1996, August). Beyond lists: Moving to results-based accountability. Washington, DC: Author.
References
251
Chambers, D. E. (1993). Social policy and social programs: A method for the practical policy analyst (2nd ed.). New York: Macmillan. Chambers, F. (1994). Removing confusion about formative and summative evaluation: Purpose versus time. Evaluation and Program Planning, 17, 9–12. Chandler, D., Meisel, J., Hu, T., McGowen, M., & Madison, K. (1996). Client outcome in a threeyear controlled study of an integrated service agency model. Psychiatric Services, 47(12), 1337–1343. Chang, Y., & Warren, J. T. (1997). Allocative efficiency and diversification under price-cap regulation. Information Economics Policy, 9(1), 3–17. Chelimsky, E., & Shadish, W. R. (1997). Evaluation for the 21st Century: A handbook, Thousand Oaks, CA: Sage Publications. Chen, H., & Rossi, P. H. (1983). Evaluating with sense: The theory-driven approach. Evaluation Review, 7, 283–302. Chen, H., & Rossi, P.H. (1989). Issues in the theory-driven perspectives. Evaluation and Program Planning, 12, 299–306. Cimera, R. E., & Rusch, F. R. (1999). The cost-efficiency of supported employment programs: A review of the literature. International Review of Research in Mental Retardation, 22, 175–225. Clifford, P. I. (1999). The FACE recording and measurement system: A scientific approach to person-based information. Bulletin of the Menninger Clinic, 63(3), 305–331. Cohen, J. (1987). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates. Cohen, R. (Ed.), (1986). Justice: Views from the social sciences. New York: Plenum. Colarelli, S. M. (1998). Psychological interventions in organizations. American Psychologist, 53(9), 1044–1056. Conrad, K. J., Randolph, F. L., Kirby, M. W., Jr., & Bebout, R. R. (1999). Creating and using logic models: Four perspectives. Alcoholism Treatment Quarterly, 17(1–2), 17–31. Cook, J. A., & Jonikas, J. A. (1996). Outcomes of psychiatric rehabilitation service delivery. New Directions for Mental Health Services, 71 (pp. 33–47). San Francisco: Jossey-Bass Publishers. Cook, T. D. (1985). Postpositivist critical multipluralism. In R. L. Shortland & M. M. Mark (Eds.), Social science and social policy (pp. 21–62). Beverly Hills, CA: Sage Publications. Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally. Coulter, D. L. (1997). Health-related application of quality of life. In R. L. Schalock (Ed.), Quality of life: Vol. 11: Application to persons with disabilities (pp. 95–104). Washington, DC: American Association on Mental Retardation. Cramer, J. A. (1994). Quality of life for people with epilepsy. Neurologic Clinics, 12, 1–14. Crimmins, D. B. (1994). Quality of life for persons with challenging behaviors: Intervention goal, contradiction in terms or both? In D. Goode (Ed.), Quality of life for persons with disabilities (pp. 208–217). Boston: Brookline Books. Criscione, T., Kastner, T. A., O’Brien, D., & Nathanson, R. (1994). Replication of a managed health care initiative for people with mental retardation living in the community. Mental Retardation, 32(1), 43–52. Cronbach, L. J. (1982). Designing evaluations of education and social programs. San Francisco: JosseyBass. Cummins, R. A. (1996). The domains of life satisfaction: An attempt to order chaos. Social Indicators Research, 38, 303–328. Cummins, R. A. (1997). Assessing quality of life. In R. I. Brown (ed.), Quality of life for people with disabilities: Models, research and practice (pp. 116–150). 
Cheltenham, UK: Stanley Thornes (Publishers) Ltd. Cummins, R. A. (1998). The second approximation to an international standard for life satisfaction. Social Indicators Research, 43, 307–334.
252
References
Dennis, M. L., Fetterman, D. M., & Sechrest, L. (1994). Qualitative and quantitative evaluation methods in substance abuse research. Evaluation and Program Planning, 17, 419–427. Denzin, N. K., & Lincoln, Y. S. (Eds.). (1994). Handbook of qualitative research. Thousand Oaks, CA: Sage Publications. Desmond, D. P., Maddux, J. F., Johnson, T. H., & Confer, B. A. (1995). Obtaining follow-up interviews for treatment evaluation. Journal of Substance Abuse Treatment, 12, 95–102. Dewan, N. A., & Carpenter, D. (1997). Performance measurement in healthcare delivery systems. In L. J. Dickstein & M. B. Riba (Eds.), American psychiatric press review of psychiatry, 16 (pp. 81–102). Washington, DC: American Psychiatric Press, Inc. Dickens, P. (1994). Quality and excellence in human services. Wiley Series in Clinical Psychology. Chichester: Wiley and Sons. Dickerson, F. B. (1997). Assessing clinical outcomes: The community functioning of persons with serious mental illness. Psychiatric Services, 48(7), 897–905. Dickey, B. (1996). The development of report cards for mental health care. In L. I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 142–160). Baltimore: Williams and Wilkins. Diener, E. (1984). Subjective well-being. Psychological Bulletin, 95, 542–575. Donabedian, A. (1982). Explorations in quality assessment and monitoring: The criteria and standards of quality (2). Ann Arbor, MI: Health Administration Press. Donabedian, A. (1992). The role of outcomes in quality assessment and assurance. Quality Review Bulletin, 18(11), 356–360. Drucker, P. F. (1994). Managing for results. New York: Harper & Row. Drucker, P. F. (1998, October 5). Management’s new paradigms. Forbes, 152–176. Dye, T. R. (1984). Understanding public policy (5th ed.). Englewood Cliffs, NJ: Prentice-Hall, Inc. Eastwood, E. A., & Fisher, B. A. (1988). Skills acquisition among matched samples of institutionalized and community-based persons with mental retardation. American Journal on Mental Retardation, 93(1), 75–83. Ebrahim, S. (1994). The goals of rehabilitation for older people. Reviews in Clinical Gerontology, 4(2), 93–95. Ebrahim, S. (1995). Clinical and public health perspectives and applications of health-related quality of life measurement. Social Science and Medicine, 41(10), 1383–1394. Edgerton, R. B. (1996). A longitudinal-ethnographic research perspective on quality of life. In R. L. Schalock (ed.), Quality of life: Vol. I: Conceptualization and measurement (pp. 83–90). Washington, DC: American Association on Mental Retardation. Epper, R. M. (1999). Applying benchmarking to higher education: Some lessons from experience. Change, 31(6), 24–31. Epstein, A. (1995). Performance reports on quality–Prototypes, problems, and prospects. New England Journal of Medicine, 333, 57–61. Epstein, J. H. (1998, March). Making sense of science: The rise of meta-analysis. The Futurist, 44–45. Etter, J., & Perneger, T. V. (1997). Quantitative and qualitative assessment of patient satisfaction in a managed care plan. Evaluation and Program Planning, 20(2), 129–135. Faden, R., & Leplege, A. (1992). Assessing quality of life: Moral implications for clinical practice. Medical Care, 30(5, Suppl.), 166–175. Fairweather, G. W.,& Davidson, W. S. (1986). An introduction to community experimentation: Theory, methods, and practice. New York: McGraw-Hill. Felce, D., & Perry, J. (1997). Quality of life: The scope of the term and its breath of measurement. In R. I. 
Brown (Ed.), Quality of life for people with disabilities: Models, research, and practice (pp. 56–70). Cheltenham, UK: Stanley Thornes (Publishers) Ltd. Fetterman, D. M. (1994). Steps of empowerment evaluation: From California to Cape Town. Program Planning and Evaluation, 17(3), 305–313.
References
253
Fetterman, D. M. (1997). Empowerment evaluation and accreditation in higher education. In E. Chelimsky & W. R. Shadish (Eds.), Evaluation for the 21st century: A handbook (pp. 381–395). Thousand Oaks, CA: Sage Publications. Fiedler, F. E., Bell, C. H., Chemers, M. M., & Patrick, D. (1984). Increasing mine productivity and safety through management training and organization development: A comparative study. Basic and Applied Social Psychology, 5,1–18. Finney, J. W., & Moos, R. H. (1989). Theory and method in treatment evaluation. Evaluation and Program Planning, 12, 307–316. Fisher, F., & Forester, J. (Eds.) (1987). Confronting values in policy analysis. Beverly Hills, CA: Sage Publications. Fishman, D. B. (1991). An introduction to the experimental versus the pragmatic paradigm in evaluation. Evaluation and Program Planning, 14, 353–363. Fishman, D. B. (1992). Postmodernism comes to program evaluation. Evaluation and Program Planning, 15, 263–270. Fishman, D. B. (1995). Postmodernism comes to program evaluation II: A review of Denzin and Lincoln’s Handbook of qualitative research. Evaluation and Program Planning, 18(3), 301–310. Fitz, D. (1989). Attrition and augmentation biases in time series analysis. Evaluation of clinical programs. Evaluation and Program Planning, 12, 259–270. Flanagan, J. C. (1982). Measurement of quality of life: Current state of the art. Archives of Physical Medicine and Rehabilitation, 63, 56–59. Floyd, A. S., Monahan, S. C., Finney, J. W., & Morley, J. A. (1996). Alcoholism treatment outcome studies, 1980–1992: The nature of the research. Addictive Behaviors, 21(4), 413–428. Fonagy, P. (1999). Process and outcome in health care delivery: A model approach to treatment evaluation. Bulletin of the Menninger Clinic, 63(3) 288–304. Foster, E. M., & Bickman, L. (1996). An evaluator’s guide to detecting attrition problems. Evaluation Review, 20(6), 695–723. Frazier, C. (1997). Cold mountain. New York: Vintage Books. French, M. T., Bradley, C. J., Calingaert, B., Dennis, M. L., & Karuntzos, G. T. (1994). Cost analysis of training and employment services in methadone treatment. Evaluation and Program Planning, 17, 107–120. Friedman, M. A. (1995). Issues in measuring and improving health care quality. Health Care Financing Review, 16(4), 1–13. Fujiura, G. T. (1998). Demography of family households. American Journals on Mental Retardation, 103(3), 225–235. Garaway, G. B. (1997). Evaluation, validity, and values. Evaluation and Program Planning, 20(1), 1–5. Gardner, J. F. (1999). Quality services. In J. F. Gardner & S. Nudler (Eds.), Quality performance in human services. leadership, values, and vision (pp. 3–19). Towson, MD: Brookes. Gardner, J. F., & Nudler, S. (1997). Beyond compliance to responsiveness: Accreditation reconsidered. In R. L. Schalock (Ed.), Quality of life: Vol. II: Application to persons with disabilities (pp. 135–148). Washington, DC: American Association on Mental Retardation. Gates, W. H. (1999). Business @ the speed of thought: Using a digital nervous system. New York: Warner Books. Gellert, G. A. (1993). The importance of quality of life research for health care reform in the USA and the future of public health. Quality of Life Research, 2(5), 357–361. Gettings, R. M. (1998). Core indicators project: Progress report #2. Alexandria, VA: National Association of State Directors of Developmental Disabilities, Inc. Gettings, R. M., & Bradley, V. J. (1997). Core indicators project. 
Alexandria, VA: National Association of State Directors of Developmental Disabilities Services, Inc. Given, B. A., Keilman, L. J., Collins, C., & Given, C. W. (1990) Strategies to minimize attrition in longitudinal studies. Nursing Research, 39(3), 184–186.
254
References
Gold, M., Van Gelder, M., & Schalock, R. L. (1999). A behavioral approach to understanding and managing organizational change (or how not to get mired in the mud). Journal of Rehabilitation Administration, 22(3), 191–207. Goodman, M., Hull, J. W., Terkelsen, K. G., & Smith, T. E., (1997). Factor structure of quality of life: The Lehman Interview. Evaluation and Program Planning, 20(4), 477–480. Graham, P. (1999). Implementation of an outcomes management model of treatment. Bulletin of the Menninger Clinic, 63(3), 346–365. Green, R. S., & Newman, F. L. (1999). Total quality management principles promote increased utilization of client outcome data in behavioral health care. Evaluation and Program Planning, 22, 179–182. Greenberg, D., & Wiseman, M. (1992). What did the OBRA demonstrations Do? In C. Manski & I. Garfinkel (Eds.), Evaluating welfare and training programs (pp. 25–75). Cambridge: Harvard University Press. Greene, J. C., Caracelli, V. J., & Graham, W. F. (1989). Toward a conceptual framework for mixedmethod evaluation designs. Educational Evaluation and Policy Analysis, 11, 255–274. Grissom, G. R. (1997). Treatment outcomes in inpatient and substance abuse programs. Psychiatric Annals, 27(2), 113–118. Guba, E. S., & Lincoln, Y. S. (1989). Fourth generation evaluation. Newbury Park, CA: Sage Publications. Halpern, A. (1993). Quality of life as a conceptual framework for evaluating transition outcomes. Exceptional Children, 59, 486–498). Harbert, A. S., & Ginsberg, L .H. (1990). Human services for older adults: Concepts and skills. Columbia: University of South Carolina Press. Hargreaves, W. A. (1992). A capitation model for providing mental health servuces in California. Hospital and Community Psychiatry, 43, 275–277. Harris, J. (1987). QUAL-fying the value of life. Journal of Medical Ethics, 11,142–145. Haynes, R. B., & Jadad, A. R. (1997). What’s up in medical informatics. Canadian Medical Association Journal, 157 (12), 1718-1720. Hays, R. D., Anderson, R., & Revicki, D. (1993). Psychometric considerations in evaluating health-related quality of life measures. Quality of Life Research, 2, 441–449. Heflinger, C. A. (1992). Client-level outcomes of mental health services for children and adolescents. New Directions for Program Evaluation, 54, 31–45. Heilbrun, K., & Griffin, P. A. (1998). Community-based forensic treatment. In R. M. Wettstein (Ed.), Treatment of offenders with mental disorders (pp. 168–210). New York: Guilford Press. Hernandez, M., Hodges, S. P., & Cascardi, M. (1998). The ecology of outcomes: System accountability in children’s mental health. Journal of Behavioral Health Services Research, 25(2),136–150. Hersen, M., & Barlow, D. H. (1984). Single-case experimental designs. New York: Pergamon Press. Hibbard, J. H., & Jewett, J. J. (1996). What type of quality information do consumers want in a health care report card? Medical Care Research and Review, 53(1), 28–47. Hodges, S. P., & Hernandez, M. (1999). How organizational culture influences outcome information utilization. Evaluation and Program Planning, 22, 183–197. Hoffman, F. L., Lechman, E., Russo, N., & Knauf, L. (1999). In it for the long haul: The integration of outcomes assessment, clinical services, and management decision-making. Evaluation and Program Planning, 22, 211–219. Holcomb, W. R., Parker, J. C., & Leong, G. B. (1997) Outcomes of inpatients treated on a VA Psychiatric Unit and a Substance Abuse Treatment Unit. Psychiatric Services, 48(5), 699–704. Holstein, M. B., & Cole, T. R. (1996). 
Reflections on age, meaning, and chronic illness. Journal of Aging and Identity, 1(1), 7–22. House, E. R. (1991). Realism in research. Educational Researcher, 20(6), 2–9.
References
255
Hughes, C., & Hwang, B. (1996). Attempts to conceptualize and measure quality of life. In R. L. Schalock (Ed.), Quality of life: Vol. I: Conceptualization and measurement (pp. 51–62). Washington, DC: American Association on Mental Retardation. Hughes, C., Hwang, B., Kim, J-H., Eisenman, L. T., & Killian, D. J. (1995). Quality of life in applied research: A review and analysis of empirical measures. American Journal on Mental Retardation, 99, 623–641. Hughes, D., Seidman, E., & Williams, N. (1993). Cultural phenomenon and the research enterprise: Toward a culturally anchored methodology. American Journal of Community Psychology, 21, 687–703. Hunt, P., Farron-Davis, F., Beckstead, S., & Curtis, D. (1994). Evaluating the effects of placement of students with severe disabilities in general education versus special classes. The Association for Persons with Severe Handicaps, 19(3), 200–214. Himmel, P. B. (1984). Functional assessment strategies in clinical medicine: The care of arthritic patients. In C. V. Granger & C. E. Gresham (Eds.), Functional assessment in rehabilitative medicine (pp. 343–363). Baltimore: Williams & Wilkins. Iezzoni, L. I., Heeren, T., Foley, S. M., & Daley, J. (1994). Chronic conditions and risk of inhospital death. Health Services Research, 29(4), 435–460. Institute of Medicine. (1991). Disability in America: Toward a national agenda for prevention. Washington, DC: National Academy Press. Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1984). Psychotherapy outcome research: Methods of reporting variability and evaluating clinical significance. Behavior Therapy, 15, 336– 352. Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59(1), 12–19. Jencks, S. F. (1995). Measuring quality of care under Medicare and Medicaid. Health Care Financing Review, 16(4), 39–54. Jenkins, C. D. (1992). Assessment of outcomes of health intervention. Social Science and Medicine, 35(4), 367–375. Jenkins, C. D., Jono, R. T., Stanton, B-A., & Stroup-Benham, C. A. (1990). The measurement of health-related quality of life: Major dimensions identified by factor analysis. Social Science and Medicine, 35(4), 925–931. Jenkins, R. (1990). Towards a system of outcome indicators for mental health care. British Journal of Psychiatry, 157, 500–514. Johnson, R. B. (1998). Toward a theoretical model of evaluation utilization. Evaluation and Program Planning, 21, 93–110. Johnston, M. V. (1987). Cost-benefit methodologies in rehabilitation. In M. J. Fuhrer (Ed.), Rehabilitation outcomes: Analysis and measurement (pp. 99–113). Baltimore: Brookes. Kane, R. L., Bartlett, J., & Potthoff, S. (1995). Building an empirically based outcomes information system for managed mental health care. Psychiatric Services, 46(5), 459–461. Kaplan, R. (1992). Behavior as a central outcome in health care. American Psychologist, 45(10), 1211–1220. Kazdin, A. E., & Tuma, A. H. (Eds.) (1982). Single case research designs. San Francisco: JosseyBass. Keith, K. D., Heal, L. W. & Schalock, R. L. (1996). Cross-cultural measurement of critical quality of life concepts. Journal of Intellectual and Developmental Disabilities, 21(4), 273–293. Keith, K. D., & Schalock, R. L. (1994). The measurement of quality of life in adolescence: The Quality of Student Life Questionnaire. The American Journal of Family Therapy, 22(1), 83–87. Kenney, G., Rajan, S., & Soscia, S. (1998, January-February). 
State spending for Medicare and Medicaid home care programs. Health Affairs, 201–212.
256
References
Kerachsky, S., Thornton, C., Bloomenthal, A., Maynard, R., & Stephens, S. (1985). Impacts of transitional employment for mentally retarded young adults: Results of the STETS demonstration. Princeton, NJ: Mathematia Policy Research. Kinni, T. B. (1994, December 5). Measuring up. Industry Week, 27–28. Kiresuk, T. J. (1973). Goal attainment scaling at a county mental health service. Evaluation, Monograph No. 1, pp. 15–30. Kiresuk, T. J., & Lund, S. H. (1978). Goal attainment scaling. In C. C. Attkisson, W. A. Hargreaves, & M. J. Horowitz (Eds.), Evaluation of human services programs (pp. 341–369). New York: Academic Press. Kiresuk, T. J., & Sherman, R. E. (1968). Goal Attainment Scaling: A general method for evaluating comprehensive community mental health programs. Community Mental Health Journal, 4, 443–453. Kirkpatrick, D. L. (1967). Evaluation of training. In R. L. Craig & L. R. Bittel (Eds.), Training and development handbook (pp. 135–160). New York: McGraw-Hill. Klinkenberg, W. D., & Calsyn, R. J. (1996). Predictors of receipt of aftercare and recidivism among persons with severe mental illness: A review. Psychiatric Services, 47(5), 487–496. Lamb, H. R. (Ed.). (1996). Using client outcomes information to improve mental health and substance abuse treatment. San Francisco: Jossey-Bass Publishers. Lehman, A., Postrado, L., & Rachuba, L. (1993). Convergent validation of quality of life assessments for persons with severe mental illnesses. Quality of Life Research, 2, 327–333. Levine, S., & Croog, S. H. (1984). What constitutes quality of life? A conceptualization of the dimensions of life quality in health populations and patients with cardiovascular disease. In N. K. Wenger, M. E. Mattson, C. D. Furberg, & J. Elinson (Eds.), Assessment of quality of life in clinical trials of cardiovascular therapy (pp. 46–58). New York: Le Jacq. Lewis, D. R., Johnson, D. R., Erickson, R. N., & Bruininks, R. H. (1994). Multiattribute evaluation of program alternatives within special education. Journal of Disability Policy Studies, 5(1), 77–90. Lindstrom, B. (1992). Quality of life: A model for evaluating health for all. Soz Praventivmed, 37, 301–306. Lindstrom, B. (1994). Quality of life for children and disabled children based on health as a resource concept. Journal of Epidemiology and Community Health, 48(6), 529–530. Lipsey, M. W. (1990). Design sensitivity: Statistical power for experimental research. Newbury Park, CA: Sage Publications. Lovell, D., & Rhodes, L. A. (1997). Mobile consultation: Crossing correctional boundaries to cope with disturbed offenders. Federal Probation, 61(3), 40–45. Lurigio, A. J. (1995). Crime and communities: Prevalence, impact, and programs. In L. B. Joseph (Ed.), Crime, communities and public policy (pp. 150–187). Chicago: University of Chicago Press. Lyons, J. S., O’Mahoney, M. T., Miller, S. I., & Neme, J. (1997). Predicting readmission to the psychiatric hospital in a managed care environment: Implications for quality indicators. American Journal of Psychiatry, 154(3), 337–350. Mager, R. F. (1962). Preparing obstructional objectives. Belmont, CA: Fearon Publishing Company. Majchrzak, A., & Wang, Q. (1996, September-October). Breaking the functional mind set in process organizations. Harvard Business Review, 93–99. Manderscheid, R. W. (1998). From many into one: Addressing the crisis of quality in managed behavioral health care at the millennium. Journal of Behavioral Health Services & Research, 25(2), 233–237. Marks, J., Sonoda, B., & Schalock, R. (1968). 
Reinforcement vs. relationship therapy for schizophrenics. Journal of Abnormal Psychology, 73(4), 397–402.
References
257
Masland, M. C., Piccagli, G. & Snowden, L. (1996). Planning and implementation of capitated mental health programs in the public sector. Evaluation and Program Planning, 19(3), 253– 262. Mawhood, C. (1997). Performance measurement in the United Kingdom (1985–1995). In E. Chelimsky & W. R. Shadish (Eds.), Evaluation for the 21st century: A handbook (pp. 134–144). Thousand Oakes, CA: Sage Publications. McHugo, G. J., Drake, R. E., Teague, G. B., & Xie, H. (1999). Fidelity to assertive community treatment and client outcomes in New Hampshire dual diagnosis study. Psychiatric Services, 50(6), 818–824. McGlynn, E. A. (1996). Setting the context for measuring patient outcomes. In J. A. Campbell (Ed.), New directions for mental health services, 71 (pp. 19–32). San Francisco: Jossey-Bass Publishers. McLaughlin, J. A., & Jordan, G. B. (1999). Logic models: A tool for telling your program’s performance story. Evaluation and Program Planning, 22, 65–72. McMurran, M., Egan, V., & Ahmadi, S. (1998). A retrospective evaluation of a therapeutic community for mentally disordered offenders. Journal of Forensic Psychiatry, 9(1), 103–113. Meyer, L. H., & Evans, I. M. (1993). Science and practice in behavioral intervention: Meaningful outcomes, research validity, and usable knowledge. Journal of the Association for Persons with Severe Handicaps, 18(4), 224–234. Mobley, M. J. (1998). Psychotherapy with criminal offenders. In A. K. Hess & I. B. Weiner (Eds.), The handbook of forensic psychology (2nd ed., pp. 603–639). New York: John Wiley & Sons, Inc. Moos, R. H., & King, M. J., (1997). Participation in community residential treatment and substance abuse patients’ outcomes at discharge. Journal of Substance Abuse Treatment, 14(1), 71–80. Moos, R. H., Pettit, B., & Gruber, V. A. (1995). Characteristics and outcomes of three models of community residential care for substance abuse patients. Journal of Substance Abuse, 7, 99–116. Morreau, L. E., & Bruininks, R. H. (1991). Checklist of Adaptive Living Skills. Austin, TX: DLM. Mowbray, C. T., Bybee, D., Collins, M. E., & Levine, P. (1998). Optimizing evaluation quality and utility under resource constraints. Evaluation and Program Planning, 21, 59–71. Mulkay, M., Ashmore, M., & Pinch, T. (1987). Measuring the quality of life: A sociological intervention concerning the application of economics to health care. Sociology, 21(4), 541–564. Nagel, S. (1990). Bridging theory and practice in policy and program evaluation. Evaluation and Program Planning, 13, 275–283. National Council on Disability. (1989). The education of students with disabilities: Where do we stand? Washington, DC: Author. Newcomer, K. E. (Ed). (1997). Using performance measurement to improve public and nonprofit programs. San Francisco: Jossey-Bass Publications. Newman, F. L., & Tejeda, M. J. (1996). The need for research that is designed to support decisions in the delivery of mental health services. American Psychologist, 51(10), 1040–1049. Nordenfelt, L. (1994). Concepts and measurement of quality of life in health care. Boston: Kluwer Academic Publishers. Oliver, M. (1992). Changing the social relations of research production? Disability, handicap and society, 7, 101–114. Osborne, D., & Gaebler, T. (1993). Reinventing government: How the entrepreneurial spirit is transforming the public sector. Reading, MA: Addison-Wesley. Osgood, C. E., May, W. H., & Miron, M. S. (1975). Cross cultural universals of affective meaning. Urbana, IL: University of Illinois Press. Parasuraman, A., Zeithaml, V. 
A., & Berry, L. (1988). SERVQUAL: A multi-item scale for measuring consumer perceptions of service quality. Journal of Retailing, 2, 12–40.
258
References
Patton, M. Q. (1997). Utilization-focused evaluation (3rd ed.). Beverly Hills, CA: Sage Publications. Peterson, K., & Bickman, L. (1992) Using program theory in quality assessments of children’s mental health services. In H. T. Chen & P. Rossi (Eds.), Using theory to improve program and policy evaluations, (pp. 146–170). New York: Greenwood Press. Phelps, L. A., & Hanley-Maxwell, C. (1997). School-to-work transitions for youth with disabilities: A review of outcomes and practices. Review of Educational Research, 67(2), 197–226. Popper, K. (1959). The logic of scientific discovery. New York: Harper & Row. Posavac, E. J., & Carey, R. G. (1980). Program evaluation: Methods and case studies. Englewood Cliffs, NJ: Prentice-Hall. Price Waterhouse. (1993). Performance measurement: The key to accelerating organizational improvement. Washington, DC: Price Waterhouse. Pulcini, J., & Howard, A. M. (1997). Framework for analyzing health care models serving adults with mental retardation and other developmental disabilities, Mental Retardation, 353, 209– 217. Rai,G. S., & Kelland, P. (1995). Quality of life cards: A novel way to measure quality of life in the elderly. Archives of Gerontology, 21(3), 285–289. Ramey, C. T., & Landesman-Ramey, S. (1992). Effective early intervention. Mental retardation, 30(6), 337–345. Raphael, D. (1996). Quality of life of older adults toward the optimization of the aging process. In R. Renwick, I. Brown, & M. Nagler (Eds.), Quality of life in health promotion and rehabilitation (pp. 290–306). Thousand Oaks, CA: Sage Publications. Rapp, C. A., Gowdy, E., Sullivan, W. P., & Winterstein, R. (1988). Client outcome reporting: The status method. Community Mental Health Journal, 24(2), 118–133. Renwick, R., Brown, I., & Nagler, M. (Eds.) (1997). Quality of life in health promotion and rehabilitation: Conceptual approaches, issues, and applications. Thousand Oaks, CA: Sage Publications. Ridenour, M. (1996). Performance accountability system: Services and costs—setting the stage. Government Services Newsletter, 30(3), 2–3. Rintala, D. H. (1987). Design and statistical considerations in rehabilitation outcomes analysis. In M. J. Fuhrer (Ed.), Rehabilitation outcomes: Analysis and measurement (pp. 87–97). Baltimore: Brookes. Roberts, A. R. (Ed.) (1997). Social work in juvenille and criminal justice settings (2nd ed.) Springfield, IL: Charles C. Thomas Publishers. Rockwood, K. (1995). Interaction of research methods and outcome measures. Canadian Journal on Aging, 14(1, Supp. 1), 151–164. Romney, D. M., Jenkins, C. D., & Bynner, J. M. (1992). A structural analysis of health-related quality of life dimensions. Human Relations, 45(2), 165–176. Rosenblatt, A., & Attkisson, C. C. (1993). Assessing outcomes for sufferers of severe mental disorder: A conceptual framework and review. Evaluation and Program Planning, 16, 347– 363. Rosenheck, R., Frisman, L., & Gallup, P. (1995). Effectiveness and cost of specific treatment elements on a program for homeless mentally ill veterans. Psychiatric Services, 46(11), 1131– 1140. Rossell, C. H. (1993). Using multiple criteria to evaluate public policies: The case of school desegration. American Politics Quarterly, 21, 155–184. Rossi, P. H., & Freeman, H. E. (1993). Evaluation: A systematic approach (5th ed.). Newbury Park, CA: Sage Publications. Rossi, P. H., Freeman, H. E., & Lipsey, M. W. (1999). Evaluation: A systematic approach (6th ed.). Thousand Oaks, CA: Sage Publications.
References
259
Rudolph, C., Lakin, K. C., Oslund, J. M., & Larson, W. (1998). Evaluation of outcomes and costeffectiveness of a community behavioral suport and crisis response demonstration project. Mental Retardation, 36(3), 189–197. Rugs, D., & Kutach, K. (1994). Evaluating children’s mental health service systems: An analysis of critical behaviors and events. Journal of Child and Family Studies, 3(3), 249–262. Rusch, F. R., & Chadsey, J. G. (Eds.). (1998). Beyond high school: Transition from school to work. Belmont, CA: Wadsworth Publishing Company. Rusch, F. R., Conley, R. W., & McCaughlin, W. B. (1993). Benefit-cost analysis of supported employment programs in Illinois. Journal of Rehabilitation (April–June), 31–36. Russell, D. W., & Buckwalter, K. C. (1991). Researching and evaluating model geriatric mental health programs, Part I: Design of mental health evaluation studies. Archives of Psychiatric Nursing, 5(1), 3–9. Sanders, J. R. (1997). Cluster evaluation. In E. Chelimsky and W.R. Shadish (Eds.), Evaluation for the 21st century: A handbook. (pp. 396–404). Thousand Oaks, CA: Sage Publications. Schalock, R. L. (1984). Comprehensive community services: A plea for interagency collaboration. In R. H. Bruininks and K. C. Lakin (Eds.), Living and learning in the least restrictive environment (pp. 37–63). Baltimore: Brookes. Schalock, R. L. (1986). Service delivery coordination. In F. R. Rusch (Ed.), Competitive employment issues and strategies (pp. 115–127). Baltimore: Brookes. Schalock, R. L. (1995a). Outcome-based evaluation. New York: Plenum Press. Schalock, R. L. (1995b). The assessment of natural supports in community rehabilitation services. In Karan, O. C. & S. Greenspan (Eds.), Community rehabilitation services for people with disabilities (pp. 184–203). Newton, MA: Butterworth-Heinemann. Schalock, R. L. (1996). Reconsidering the conceptualization and measurement of quality of life. In R. L. Schalock (Ed.), Quality of life. Volume I: Conceptualization and measurement (pp. 123– 139). Washington, DC: American Association on Mental Retardation. Schalock, R. L. (Ed.) (1997). Quality of life. Vol. II: Application to persons with disabilities. Washington, DC: American Association on Mental Retardation. Schalock, R. L. (1999). A quest for quality: Achieving organizational outputs and personal outcomes. In J. F. Gardner & S. Nudler (Eds.), Leadership for quality performance in human services, (pp. 55–81). Baltimore: Brookes. Schalock, R. L. (2000). Three decades of quality of life. In M. Wehmeyer & J. R. Patton (Eds.), Mental retardation in the 21st century (pp. 335–358). Austin: PRO-ED Publishers. Schalock, R. L., Bonham, G. S., & Marchand, C. B. (2000). Consumer based quality of life assessment: A path model of perceived satisfaction. Evaluation and Program Planning, 23(1), 77–88. Schalock, R. L., DeVries, D., & Lebsack, J. (1999). Enhancing the quality of life of older persons with developmental disabilities. In S. S. Herr & G. Weber (Eds.). Aging, rights and quality of life for older persons with developmental disabilities (pp. 81–92). Baltimore: Brookes. Schalock, R. L., Gadwood, L.S., & Perry, P. B. (1984). Effects of different training environments on the acquisition of community living skills. Applied Research in Mental Retardation, 5, 425–438. Schalock, R. L., & Genung, L. T. (1993). Placement from a community-based mental retardation program: A 15-year follow-up. American Journal on Mental Retardation, 98(3), 400–407. Schalock, R. L., & Harper, R. S. (1978). 
Placement from community-based mental retardation programs: How well do clients do? American Journal of Mental Deficiency, 83(2), 240–247. Schalock, R. L., Harper, R. S., & Carver, G. (1981). Independent living placement: Five years later. American Journal of Mental Deficiency, 86(2), 170–177. Schalock, R. L., Holl, C., Elliott, B., & Ross, I. (1992). A longitudinal follow-up of graduates from a rural special education program. Learning Disability Quarterly, 15, 29–38.
260
References
Schalock, R. L., & Keith, K. D. (1993). The Quality of Life Questionnaire. Worthington, Ohio: IDS Publishing Company. Schalock, R. L., Lemanowicz, J. A., Conroy, J. W., & Feinstein, C. S. (1994). A multivariate investigative study of the correlates of quality of life. Journal on Developmental Disabilities, 3(2), 59–73. Schalock, R. L., & Lilley, M. A. (1986). Placement from community based mental retardation programs: How well do clients do after 8-10 years? American Journal of Mental Deficiency, 90(6), 669–676. Schalock, R. L., McGaughey, J. J., & Kiernan, W. E. (1989). Placement into nonsheltered employment: Findings from national employment surveys. American Journal on Mental Retardation, 94(1), 80–87. Schalock, R. L., Nelson, G., Sutton, S., Holtan, S., & Sheehan, M. (1997). A multidimensional evaluation of the current status and quality of life of mental health service recipients. Siglo Cero, 28(4), 5–12. Schalock, R. L., & Thornton, C. (1988). Program evaluation: A field guide for administrators. New York: Plenum Press. Schalock, R. L., Touchstone, F., Nelson, G., Weber, L., Sheehan, M., & Stull, C. (1995). A multivariate analysis of mental hospital recidivism. Journal of Mental Health Administration, 22 (4), 358–367. Schalock, R. L., Wolzen, B., Elliott, B., Werbel, G., & Peterson, K. (1986). Post-secondary community placement of handicapped students: A five-year follow-up. Learning Disabilities Quarterly, 9, 295–303. Schein, E. H. (1990). Organizational culture. American Psychologist, 45(2), 109–119. Schorr, L. B. (1997). Common purpose: Strengthening families and neighborhoods to rebuild America. New York: Anchor Books, Doubleday. Schorr, L. B., Farrow, F., Hornbeck, D., & Watson, S. (1994). The case for shifting to resultsbased accountability. In N. Young, S. Gardner, S. Coley, L. Schorr, & C. Bruner (Eds.), Making a difference: Moving to outcome-based accountability for comprehensive service reforms (pp. 13–28) Des Moines, IA: Child and Family Policy Center/National Center for Service Integration. Schultz, R., & Heckhausen, J. (1996). A life span model of successful aging. American Psychologist, 51(7), 702–714. Scriven, M. S. (1972). The methodology of evaluation. In C. H. Weiss (Ed.), Evaluating action programs: Readings in social action and education (pp. 123–136). Boston: Allyn & Bacon. Scriven, M. S. (1991). Evaluation thesaurus. Newbury Park, CA: Sage Publications. Sederer, L. I., & Dickey, B. (Eds.). (1996). Outcomes assessment in clinical practice. Baltimore: Williams & Wilkins. Senge, P. M. (1990). The fifth discipline: The art and practice of organizational learning. New York: Doubleday. Seninger, S. F. (1998). Evaluating participation and employment outcomes in a welfare-to-work program. Evaluation and Program Planning, 21, 73–79. Shadish, W. R., Cook, T. D., & Leviton, L. C. (1991). Foundations of program evaluation: Theories of practice. Newbury Park, CA: Sage Publications. Sherwood, C. D., Morris, J. N., & Sherwood, S. (1975). A multivariate, nonrandomized matching technique for studying the impact of social interventions. In E. L. Struning & M. Guttentag (Eds.), Handbook of evaluation research: Vol. I (pp. 183–224). Beverly Hills, CA: Sage. Shortliffe, E. H. (1998). Medical informatics: Computer applications in health care. Menlo Park, CA: Addison-Wesley. Simpson, D. D. (1993). Drug treatment evaluation research in the United States. Psychology of Addictive Behaviors, 7(2), 120–128.
References
261
Smith, G. R., Manderscheid, R. W., Flynn, L. M., & Steinwachs, D. M. (1997). Principles for assessment of patient outcomes in mental health care. Psychiatric Services, 48(8), 1033– 1036. Snell, M. E., & Vogtle, L. K. (1997). Facilitating relationships of children with mental retardation in schools. In R. L. Schalock (Ed.), Quality of life, Vol. II: Application to persons with disabilities (pp. 43–62). Washington, DC: American Association on Mental Retardation. Sommer, B., & Sommer, R. (1997). A practical guide to behavioral research: Tools and techniques (4th ed.). New York: Oxford University Press. Srebnik, D., Hendryx, M., Stevenson, J., & Caverly, S. (1997). Development of outcome indicators for monitoring the quality of public mental health care. Psychiatric Services, 48(7), 903– 910. Stake, R. E. (1983). Program evaluation, particularly responsive evaluation. In G. F. Madaus, M. Scriven, & D. L. Stufflebeam (Eds.), Evaluation models (pp. 287–310). Boston: Kluwer-Nijhoff. Stone, D. (1997). Policy paradox: The art of political decision making. New York: Norton. Suchman, E. A. (1967). Evaluative research: Principles and practice in public service and social action programs. New York: Russell Sage Foundation. Tannahill, N. (1994). American government policy and politics (4th ed.). New York: HarperCollins College Publishers. Taylor, C. (1989). Sources of the self: The making of modern identity. Cambridge: Harvard University Press. Teague, G. B., Ganju, V., Hornik, J. A., & Johnson, J. R. (1997). The MHS1P mental health report card: A Consumer-oriented approach to monitoring the quality of mental health plans. Evaluation Review, 21(3), 330–341. Test, M. A. (1992). Training in community living. In R. P. Liverman (Ed.), Handbook of psychiatric rehabilitation (pp. 148–175). New York: Macmillan. Timmons, J. C., Foley, S., Whitney-Thomas, J., Green, J., & Casey, J. (1999). Negotiating the landscape: The path to employment for individuals in the TANF system. Boston: Institute for Community Inclusion. Boston Children’s Hospital. Trabin, T., Freeman, M. A., & Pallak, M. E. (Eds.). (1995). Inside outcomes: The national review of behavioral healthcare outcomes programs. Tiburon, CA: Central Link Publications. Trochim, W. (1989). An introduction to concept mapping for planning and evaluation. Evaluation and Program Planning, 12, 1–16. Trybula, W. (1997). Data mining and knowledge discovery. Annual Review of Information Science and Technology, 32, 197–229. Tucker, M. S., & Codding, J. B. (1998). Standards for our schools: How to set them, measure them, and reach them. San Francisco: Jossey-Bass Publishers. Tudor, R. K. & Benov, J. W. (1995). Consumer health informaties. Journal of Health Care Marketing, 15(3), 64–71. Turnbull, A., Turnbull, H. R., & Blue-Banning, M. (1994). Enhancing inclusion of infants and toddlers with disabilities and their families: A theoretical and programmatic analysis. Infants and Young Children, 7, 1–14. United Nations (1991). United Nations principles for older persons. (Resolution 46191). NY: Author. U.S. Department of Education. (1996). Report to Congress: Goals 2000: Increasing student achievement through state and local initiatives. Washington, DC: Author. U.S. Government Accounting Office. (GAO). (1997, May). Managing for results: Analytic challenges in measuring performance (Report GAO/HEHS/GSD-97-138). Washington, DC: U.S. General Accounting Office. U.S. General Accounting Office. (GAO). (1998, December). 
Managing for results: Measuring program results that are under limited control (Report GAO/GGD-99-16). Washington, DC: U.S. General Accounting Office.
262
References
Vanderwood, M. L., Spande, G.E., Thurlow, M. L., & Ysseldyke, J. E. (1995). Willing but unable: The search for data on the outcomes of schooling. Journal of Disabilitiy Policy Studies, 6(1), 23–50. Van Gelder, M., Gold, M. & Schalock, R. L. (1996). Does training have an impact: The evaluation of competency based staff training programs in supported employment. Journal of Rehabilitation Administration, 20(4), 273–290. Walsh, K. K. & Kastner, T. A. (1999). Quality of health care for people with developmental disabilities: The challenge of managed care. Mental Retardation, 37(1), 1–15. Ware, J. E., Bayliss, M. S., Rogers, W. H. & Kosinski, M. (1996). Difference in 4-year health outcomes for elderly and poor, chronically ill, patients treated in HMO and fee-for-service systems: Results from the medical outcomes study. Journal of the American Medical Association, 276(13), 1039–1047. Weiss, C. H. (1972). Evaluation research: Methods for assessing program effectiveness. Englewood Cliffs, NJ: Prentice-Hall. Weiss, C. H. (1987). Evaluating social programs: What have we learned? Society, 25(1), 40–45. Weiss, C. H. (1988). Evaluation for decisions: Is anybody there? Does anybody care? Evaluation Practice, 9, 5–19. Whitney-Thomas, J. (1997). Participatory action research as an approach to enhancing quality of life for individuals with disabilities. In, R. L. Schalock (Ed.), Quality of life. Vol. II: Application to persons with abilities (pp. 181–198). Washington, DC: American Association on Mental Redardation. Whitney-Thomas, J., Timmons, J. C., Thomas, D. M., Gilmore, D. S., & Lynch, S. L. (1997). Changes in Vocational Rehabilitation practice since the 1992 Rehabilitation Act Amendments. Boston: Institute for Community Inclusion Boston Children’s Hospital. Wholey, J. S. (1983). Evaluation and effective public management. Boston: Little-Brown. Wholey, J. S. (1985). Managing for high performance: The role of evaluation. Evaluation News, 6, 40–50. Wholey, J. S. (1987). Evaluability assessment, developing program theory. In L. Bickman (Ed.), Using program theory in evaluation. New directions for program evaluation, no. 33 (pp. 60–81). San Francisco: Jossey-Bass. Wholey, J. S. (1997). Trends in performance measurement. In E. Chelimsky & W. R. Shadish (Eds.), Evaluation for the 21st century: A handbook (pp. 124–133). Thousand Oaks, CA: Sage Publications. WHOQOL Group. (1993). Study protocol for the World Health Organization project to develop a quality of life assessment instrument: WHOQOL. Quality of Life Research, 2, 153–159. Wildavsky, A. (1979). Speaking truth to power. New York: John Wiley. Williams, A. (1985). Economics of coronary artery bypass grafting. British Medical Journal, 291, 326–329. World Health Organization (1997). ICIDH-2 International classification of impairments, activities and participation. Geneva, Switzerland: WHO. Wren, D. (1979). The evolution of management thought. New York: John Wiley & Sons. Ysseldyke, J. E., Thurlow, M. L., & Shriner, J. (1992). Outcomes are for special educators too. Teaching Exceptional Children, 25, 38–50. Ysseldyke, J. E., Thurlow, M. L., & Gillman, C. J. (1993). Educational outcomes and indicators for Students completing school. Minneapolis: University of Minnesota, National Center on Educational Outcomes.
Author Index

Abeles, N., 219 Ahmadi, S., 152, 154 Albin-Dean, J.E., 33 Albrecht, K., 131 Alemi, F., 150 Anderson, R., 188 Andrews, F.M., 189 Ankuta, G.Y., 219 Ashbaugh, J., 18 Ashmore, M., 188 Attkisson, C.C., 141, 143 Bachrach, L.L., 144 Baker, F., 71, 136 Baltes, M.M., 148 Baltes, P.B., 148 Barlow, D.H., 71 Bartlett, J., 21, 105 Bech, P., 188 Benov, J.W., 245 Berry, L., 175 Bhaskar, R., 13 Bickman, L., 22, 143, 144, 177, 178, 230 Blank, R.K., 136 Blue-Banning, M., 117 Bond, G.R., 143, 180 Bonham, G.S., 56 Bradley, C.J., 148 Bradley, V.J., 18, 134, 178 Brennan, F., 245 Brown, I., 187 Bruininks, R.H., 18, 138, 148 Bryant, D.M., 22, 177, 178 Buckwalter, K.C., 211
Bynner, J.M., 188 Byrne, J.M., 152, 154 Calkins, C.F., 79 Calsyn, R.J., 144, 201 Camp, R.C., 132 Campbell, D.T., 71, 75, 80, 82 Campbell, J.A., 71, 144, 189 Caracelli, V.J., 12 Carey, R.G., 12, 166, 168 Carpenter, D., 169 Carpinello, S., 169 Carver, G., 209 Cascardi, M., 152 Chadsey, J.G., 134, 138 Chambers, D.E., 117 Chambers, F., 12, 117 Chandler, D., 53, 55, 144 Chang, Y., 205, 237 Chelimsky, E., 15, 170 Chen, H., 13, 177, 247 Cimera, R.E., 102 Clifford, P.I., 143, 244 Codding, J.B., 132, 136 Cohen, J., 87, 220 Cohen, R., 98 Colarelli, S.M., 33 Cole, T.R., 149 Conley, R.W., 76 Conrad, K.J., 13 Cook, J.A., 144 Cook, T.D., 12, 71, 75, 80, 238, 246 Coulter, D.L., 187, 188 Cramer, J.A., 188
Crimmins, D.B., 187 Criscione, T., 205 Cronbach, L.J., 70 Croog, S.H., 188 Cummins, R.A., 25, 188, 189 Curbow, B., 71 Davidson, W.S., 73 Dennis, M.L., 102 Denzin, N.K., 13 Desmond, D.P., 228 DeVries, D., 149, 181, 187 Dewan, N.A., 169 Dickerson, F.B., 144 Dickey, B., 143, 169 Diener, E., 189 Donabedian, A., 141, 143 Drucker, P.F., 20, 33, 123 Dye, T.R., 97, 120 Eastwood, E.A., 78 Ebrahim, S., 188 Edgerton, R.B., 25, 189 Egan, V., 152, 154 Epper, R.M., 243 Epstein, A., 242 Epstein, J.H., 242 Etter, J., 175 Evans, I.M., 138 Faden, R., 24, 188 Fairweather, G.W., 73 Felce, D., 187, 188, 189 Fetterman, D.M., 52, 102, 103, 188
Fiedler, F.E., 80 Fisher, F., 78, 98 Fishman, D.B., 13, 14, 238 Fitz, D., 228 Flanagan, J.C., 189 Floyd, A.S., 150 Follette, W.C., 219 Fonagy, P., 143 Forester, J., 98 Foster, E.M., 228, 230 Frazier, C., 127 Freeman, M.A., 12, 18, 50, 170 French, M.T., 206 Friedman, M.A., 26 Frisman, L., 150 Fujiura, G.T., 108, 109, 110, 111 Gadwood, L.S., 83 Gaebler, T., 172, 235 Gallup, P., 150 Garaway, G.B., 220 Gardner, J.F., 18, 131, 148, 172, 187 Gates, W.H., 244 Gellert, G.A., 188 Genung, L.T., 153, 181, 210, 213, 216 Gettings, R.M., 18, 134, 148, 178 Gillman, C.J., 138 Ginsberg, L.H., 149 Given, B.A., 228 Gold, M., 72, 79 Goodman, M., 144 Graham, P., 143 Graham, W.F., 12 Green, R.S., 33 Greenberg, D., 59 Greene, J.C., 12 Griffin, P.A., 152, 154 Grissom, G.R., 150 Gruber, V.A., 150 Guba, E.S., 13, 102, 132 Halpern, A., 138 Hanley-Maxwell, C., 138 Harbert, A.S., 149 Hargreaves, W.A., 53 Harper, R.S., 74, 209, 226 Harris, J., 188 Haynes, R.B., 245
Hays, R.D., 188 Heal, L.W., 174, 187 Heckhausen, J., 188 Heflinger, C.A., 144 Heilbrun, K., 152, 154 Hernandez, M., 33, 152 Hersen, M., 71 Hibbard, J.H., 169 Hodges, S.P., 33, 152 Hoffman, F.L., 33 Holcomb, W.R., 150 Holstein, M.B., 149 Howard, A.M., 143 Hughes, C., 188 Hughes, D., 135 Hunt, P., 137, 138 Hwang, B., 188 Iezzoni, L.I., 141, 143 Jacobson, N.S., 219 Jadad, A.R., 245 Jencks, S.F., 187 Jenkins, C.D., 187, 188 Jenkins, R., 198 Jewett, J.J., 169 Johnson, R.B., 35 Johnston, M.V., 208 Jonikas, J.A., 144 Jordan, G.B., 12, 13, 241 Kane, R.L., 21, 105 Kaplan, R., 18 Kastner, T.A., 143 Kazdin, A.E., 71 Keith, K.D., 57, 58, 136, 174, 187 Kenney, G., 206 Kerachsky, S., 87, 91 Kiernan, W.E., 226, 227 King, M.J., 132, 150 Kinni, T.B., 132, 243 Kiresuk, T.J., 166, 167, 168 Kirkpatrick, D.L., 79 Klinkenberg, W.D., 201 Kutach, K., 144 Lamb, H.R., 144 Landesman-Ramey, S., 81 Lebsack, J., 149, 181, 188 Lehman, A., 144, 187
Leong, G.B., 150 Leplege, A., 24, 188 Levine, S., 188 Leviton, L.C., 12, 238, 246 Lewis, D.R., 103 Lilley, M.A., 209, 226 Lincoln, Y.S., 13, 102, 132 Lindstrom, B., 24, 187, 188 Linn, R.L., 136 Lipsey, M.W., 50, 170, 219, 220 Lovell, D., 152, 154 Lund, S.H., 167 Lurigio, A.J., 152 Lyons, J.S., 144 Mager, R.F., 167 Manderscheid, R.W., 143 Mank, D.M., 33 Marchand, C.B., 56 Marks, J., 78 Masland, M.C., 143 Mawhood, C., 129, 131 May, W.H., 174 McCaughlin, W.B., 76 McGaughey, J.J., 226, 227 McGlynn, E.A., 144 McHugo, G.J., 176 McLaughlin, J.A., 12, 13, 241 McMurran, M., 152, 154 Meyer, L.H., 138 Miron, M.S., 174 Mobley, M.J., 152 Moos, R.H., 132, 150 Morley, J.A., 154 Morreau, L.E., 18 Morris, J.N., 87 Mowbray, C.T., 242 Mulkay, M., 188 Nagel, S., 98, 118 Nagler, M., 187 Newcomer, K.E., 129 Newman, F.L., 33, 119 Nordenfelt, L., 187 Nudler, S., 148, 187 O'Neil, H.F., 136 Oliver, M., 188 Osborne, D., 172, 235 Osgood, C.E., 174
Pallak, M.E., 18 Parasuraman, A., 175 Parker, J.C., 150 Patton, M.Q., 13, 35 Perneger, T.V., 175 Perry, J., 187, 188, 189 Perry, P.B., 83 Peterson, K., 177 Pettit, B., 150 Phelps, L.A., 138 Piccagli, G., 143 Pinch, T., 188 Popper, K., 117 Posavac, E.J., 12, 166, 168 Postrado, L., 144, 187 Potthoff, S., 21, 105 Pulcini, J., 143 Rachuba, L., 144 Rajan, S., 206 Ramey, C.T., 81 Raphael, D., 148 Rapp, C.A., 198 Renwick, R., 187 Revenstorf, D., 219 Revicki, D., 188 Rhodes, L.A., 152, 154 Ridenour, M., 18, 206 Rintala, D.H., 213 Roberts, A.R., 152, 154 Rogers, W.H., 189 Romney, D.M., 188 Rosenblatt, A., 141, 143 Rosenheck, R., 150 Rossell, C.H., 118 Rossi, P.H., 12, 13, 50, 170, 177, 246 Rudolph, C., 170, 171 Rugs, D., 144 Rusch, F.R., 76, 102, 134, 138 Russell, D.W., 211
Sanders, J.R., 50 Schalock, R.L., 4, 5, 18, 24, 25, 33, 56, 57, 58, 72, 74, 78, 79, 83, 99, 102, 134, 136, 138, 148, 149, 153, 161, 163, 174, 175, 176, 181, 183, 186, 187, 188, 189, 200, 201, 206, 209, 210, 213, 215, 216, 218, 225, 226, 227, 229 Schein, E.H., 223 Schorr, L.B., 3, 4, 14, 15, 117, 131, 132, 134 Schultz, R., 188 Scriven, M.S., 12, 102 Sechrest, L., 102 Sederer, L.I., 143 Seidman, E., 135 Seninger, S.F., 59, 60, 62 Shadish, W.R., 12, 15, 170, 238, 246 Sherman, R.E., 166 Sherwood, C.D., 87 Sherwood, S., 87 Shortliffe, E.H., 244 Shriner, J., 138 Simpson, D.D., 150 Smith, G.R., 162 Snell, M.E., 187 Snowden, L., 143 Sommer, B., 172, 174, 175 Sommer, R., 172, 174, 175 Sonoda, B., 78 Soscia, S., 206 Srebnik, D., 144 Stake, R.E., 13 Stanley, J.C., 71, 82 Stone, D., 117 Suchman, E.A., 12
Tannahill, N., 98 Taxman, F.S., 152, 154 Taylor, C., 148, 149 Teague, G.B., 169 Tejeda, M.J., 119 Test, M.A., 53, 180 Thornton, C., 92, 206 Thurlow, M.L., 138, 148 Timmons, J.C., 108, 112, 113 Trabin, T., 18 Trochim, W., 177 Truax, P., 219 Trybula, W., 241 Tucker, M.S., 132, 136 Tudor, R.K., 245 Tuma, A.H., 71 Turnbull, A., 117 Turnbull, H.R., 117 Van Gelder, M., 72, 79 Vanderwood, M.L., 136, 138, 161, 162 Vogtle, L.K., 187 Walsh, K.K., 143 Ware, J.E., 149 Warren, J.T., 205, 237 Weiss, C.H., 4, 13, 102, 103 Whitney-Thomas, J., 108, 114, 115, 116, 174, 188 Wholey, J.S., 13, 25, 42, 107, 129, 130, 131 Wildavsky, A., 99 Williams, A., 188 Williams, N., 135 Wiseman, M., 59 Wren, D., 131, 172 Ysseldyke, J.E., 138, 148 Zeithaml, V.A., 175
Subject Index

Accountability
  accountability systems, 133–134
  assessment, 20–24
  current emphasis on, 6
  dimensions, 130
  impact on outcome-based evaluation, 130
  multiple measurement approach to, 20–25
  results-based accountability, 131
  See also Performance assessment; Consumer appraisal; Functional assessment; Personal appraisal
Action steps to guide organization change, 26, 28–32, 33–35
Adaptive behavior, 181–182
  Level of Assistance Evaluation Scale, 184–185
Aging
  recent changes in approaches to evaluating, 148–149
  outcome measures, 149
Analysis
  definition, 7
Analyzing outcomes
  input variables, 197–199
  multivariate analysis model, 217
  overview, 196–197
  throughput variables, 199–207
Attitude scales, 173–174
Attrition
  dealing with, 228–230
  definition, 228
Benchmarks, 132, 242
Benefit-cost analysis
  definition, 99
  key terms, 100
  suggested approaches to, 100–102
Between subject evaluation design, 49
  See also Evaluation designs
Clinical significance, 218–220
Comparison conditions, 48, 67
  See also Evaluation designs
Complementarity, 12
Consumer appraisal
  appraisal methods, 22–23
  attitude scales, 173–174
  definition, 10
  questionnaires, 174–175
  rating scales, 172–173
  See also Rating scales; Attitude scales; Questionnaires; Consumer satisfaction; Fidelity to the model; Satisfaction
Consumer satisfaction, 131–132
Contextual variables
  analysis of, 216–230
  key contextual variables, 222
Contextualism, 216–218
Core data sets, 196–197
Core service functions, 199–197
Corrections
  evaluation, 152–154
  factors impacting outcome-based outcome measures, 154
  recent changes in, 152
Cost-efficiency analysis, 102
Cost estimates, 201–207
Critical performance indicators
  individual performance, 30–31
  individual value, 30–31
  organizational performance, 30–31
  organization value, 30–31
Data
  collection guidelines, 35
  mining, 241
  quality, 51
  relevance, 51
  standards, 243
  See also Core data sets
Descriptive analysis, 213
Disabilities
  outcome measures, 147
  recent changes in approaches to evaluating, 144–147
Education
  outcome measures, 137
  recent changes regarding accountability requirements and reporting, 136–137
Effectiveness evaluation
  basis of, 42
  data collection and analysis, 54
  definition, 7, 42
  examples, 53–62
  impact of program's geography, 50
  methodology, 54
  model, 43
  performance goals/anticipated outcomes, 53
  purpose and comparison condition, 53
Effect size, 219–220
Efficiency assessment, 169–171
  See also Benefit-cost analysis; Cost-efficiency analysis
Empowerment evaluation, 52–53
Evaluation
  definition, 6
Evaluation approaches, See Outcome-based evaluation
Evaluability
  evaluability assessment, 25, 27
  model, 25
  program evaluation factors, 26
Evaluation designs, 48–49, 68–70
  See also Experimental/control design; Matched pairs design; Hypothetical comparison design; Longitudinal status comparisons; Pre-post change comparisons; Person as own comparison design
Evaluation strategies, See Outcome-based evaluation
Evaluation theory, 12–13
  See also Postmodernist approaches to evaluation
Evaluation utilization, 13–14
Experimental/control design, 75, 81
  See also Nonequivalent control group design; Time series designs
Exploratory correlation analysis, 213
External validity, 76–77
Fidelity to the model
  concept, 22–23
  examples, 178–180
  measurement of, 22–23
  standards, 176–178
Focus groups, 103
Formative feedback
  model, 14
  use, 13–14, 52
Functional assessment, 10, 22–24, 180–186
  See also Adaptive behavior; Role status
Future outcome-based evaluation scenarios
  balance between performance measurement and value assessment, 237–238
  evaluation theory: embracing the postmodernist paradigm, 238–239
  increased variability of service delivery system, 235–236
  managing for results, 239–242
  outsourcing of evaluation, 242–246
Goal attainment scaling, 166–167
Group comparisons, 213–214
Health care
  recent changes in, 139–143
  outcome measures, 142
Hypothetical comparison group design, 73, 76
Impact evaluation
  comparison condition, 67
  definition and use, 7, 66, 93–94
  examples, 83–93
  guidelines, 70
  impact evaluation designs, 70–75
  outcomes vs. impacts, 67
  overview, 66
  steps involved in, 82–83
  See also Evaluation designs
Indicators
  behavioral skill, 182
  role status, 182
Individual performance outcomes, 45–46
Individual value outcomes, 46–47
Informatics, 244–245
Initiation, 12
Internal validity
  definition, 75
  overcoming threats to, 220–221
  threats to, 75, 220–221
Interpreting outcomes
  contextualism, 216–218
  See also Clinical significance; Internal validity; Organization variables; Attrition
Logic models, 241
Longitudinal status comparison design, 73–74
Managing for results, 239–242
Matched pairs (cohort) design, 73, 75, 78–79
Measurement standards, 164–165
  See also Reliability; Validity; Standardization group; Norms
Mental health
  recent changes in evaluation strategies, 143–144
  outcome measures, 145
Methodological pluralism, 8–12
  See also Performance assessment; Consumer appraisal; Functional assessment; Personal appraisal
Mixed methods evaluations, 12
Multivariate analysis, 214–216
Nonequivalent control group design, 80–81
Norms, 165
Organization
  changes made, 37–38
  evaluation capability, 30–31
  key contextual variables, 221–227
  personality, 36
Organization improvement
  action steps involved in, 26, 28–32, 33–35
  organization personalities, 29, 32–31
Organization performance outcomes, 44
Organization value outcomes, 44–45
Outcomes
  definition, 7
  external influences on, 216–230
  guidelines for selecting, 32
  immediacy of, 50
  outcome indicators, 162–163
  overview, 127–128
  selection criteria, 134–135
  short term vs. long term, 241
  vs. impacts, 67
  See also Analyzing outcomes; Individual performance outcomes; Individual value outcomes; Interpreting outcomes; Organization performance outcomes; Organization value outcomes
Outcome-based evaluation
  comparison with other types of evaluation, 12–13
  definition, 6–7
  elements, 7–10
  measurement approaches in, 10
  model, 9
  outcome measures in, 10
  overview, 5–6
  principles, 234
  questions and answers about, 3–4
  trends impacting, 231
  See also Program evaluation; Effectiveness evaluation; Impact evaluation; Policy evaluation
Outcome-based evaluation utilization
  key success factors, 35–37
Outcome measures
  aging, 148–149
  corrections, 152–154
  disabilities, 144–148
  health care, 139–143
  generic, 155–156
  mental health, 143–144
  regular education, 136–137
  selection categories, 128
  special education, 137–139
  substance abuse, 150–151
  use of, 19–20
  See also Individual performance outcomes; Individual value outcomes; Organization performance outcomes; Organization value outcomes
Outcome measurement
  guidelines, 193
  measurement approach, 161
  measurement foci, 160
  measurement standards, 164–165
  outcomes and outcome indicators, 162–163
  overview, 159–160
  See also Performance assessment; Consumer appraisal; Functional assessment; Personal appraisal
Outcome-oriented monitoring system, 34–35
Outcomes Planning Inventory, 30–31
Outcomes research, 123
Outsourcing of evaluation, 242–245
Performance assessment
  advantages and disadvantages, 20–22
  definition, 10
  measures, 165–171
  relation to Government Performance and Results Act, 129–130
  See also Goal attainment scaling; Report cards; Efficiency assessment
Performance goals, 34, 43
  See also Organization performance outcomes; Organization value outcomes; Individual performance outcomes; Individual value outcomes
Performance measurement vs. value assessment, 237–238
Person as own comparison evaluation design, 71
Personal appraisal, 24–25, 186–193
  definition, 10
  use of satisfaction in, 25
  See also Quality of life
Policy evaluation
  criteria for, 119
  data sets, 103, 105–107
  definition, 97
  examples, 109–116
  guidelines, 116–118
  model, 102–104
  process steps, 107–109
  societal values to incorporate into, 119–120
  See also System-level data sets
Postmodernist approaches to evaluation, 15, 238–239
Practice guidelines, 132
Pragmatic evaluation paradigm, 14–15
Pre-post change comparison design, 71–73
Program evaluation
  definition, 6
  model, 18–19, 38
  overview, 17–18
  See also Postmodernist approaches to evaluation
Program development
  phases of, 49–50
Program evaluation data
  successful implementation factors, 37
  use of, 35–36
Psychometric measurement standards, See Norms; Reliability; Standardization group; Validity
Public policy
  definition, 97
  formulation, 118
  process, 98–99
Public policy evaluation, See Policy evaluation
Quality dimension to reform movement, 131–134
Quality of life
  assessment of, 189–193
  concept, 187–188
  health-related, 188
  person-centered, 188–189
Questionnaires, 174–175
Rating scales, 172–173
Recipient characteristics, 197–199
Reform movement
  accountability dimension, 129–130
  characteristics, 128–129
  impact on outcome-based evaluation, 130
  performance plans, 130
  performance reports, 130
  quality dimension, 131–134
  strategic plans, 130
  See also Accountability; Performance assessment; Quality of life
Reliability
  definition, 164
  types, 164
Report cards, 167–169, 242
  variables used in, 169
Role status, 182–183
Satisfaction
  advantages and disadvantages, 176
  dimensions, 175–176
  measurement of, 22
Special education
  recent changes in, 137–138
  outcome measures, 139
Standardized group, 165
Statistical analyses, 211, 213–216
Statistical guidelines, 210–213
Statistical principles, 207–210
Statistical vs. practical utility, 212
Strategic planning, 34
Substance abuse
  recent changes in approaches to evaluation, 150
  outcome measures, 151
Supports, 152, 201, 203–204
System-level data sets, 105–107
Time series designs, 81–82
Total quality management (TQM), 33–34
Triangulation, 12
Validity, 165
Value assessment
  role in outcome-based evaluation, 237–238
  See also Individual value outcomes; Organization value outcomes
Within-subjects evaluation design
  examples, 48–49
  See also Evaluation designs