When Handbook of Normative Data for Neuropsychological Assessment was originally published in 1999, it was the first book to provide neuropsychologists with summaries and critiques of normative data for neuropsychological tests. The second edition, which has been revised and updated throughout, presents data for 26 commonly used neuropsychological tests, including: Trailmaking, Color Trails, Stroop Color Word Interference, Auditory Consonant Trigrams, Paced Auditory Serial Addition, Ruff 2 & 7, Digit Vigilance, Boston Naming, Verbal Fluency, Rey-Osterrieth Complex Figure, Hooper Visual Organization, Visual Form Discrimination, Judgment of Line Orientation, Ruff Figural Fluency, Design Fluency, Tactual Performance, Wechsler Memory Scale-Revised, Rey Auditory-Verbal Learning, Hopkins Verbal Learning, WHO/UCLA Auditory Verbal Learning, Benton Visual Retention, Finger Tapping, Grip Strength (Dynamometer), Grooved Pegboard, Category, and Wisconsin Card Sorting tests. In addition, California Verbal Learning (CVLT and CVLT-II), CERAD List-Learning, and Selective Reminding Tests, as well as the newest versions of the Wechsler Memory Scale (WMS-III and WMS-IIIA), are reviewed. Locator tables guide the reader to the sets of normative data that are best suited to each individual case, depending on the demographic characteristics of the patient, and highlight the advantages associated with using data for comparative purposes. Those using the book have the option of reading the authors' critical review of the normative data for a particular test, or simply turning to the appropriate data locator table for a quick reference to the relevant data tables in the Appendices. The second edition includes reviews of 15 new tests. The way the data are presented has been changed to make the book easier to use. Meta-analysis tables of predicted values for different ages (and education, where relevant) are included for nine tests that have a sufficient number of homogeneous datasets.
No other reference offers such an effective framework for the critical evaluation of normative data for neuropsychological tests. Like the first, the second edition will be welcomed by practitioners, researchers, teachers, and graduate students as a unique and valuable contribution to the practice of neuropsychology.
Maura Mitrushina, Ph.D., is Professor of Psychology at California State University, Northridge, and Associate Clinical Professor of Psychiatry at UCLA School of Medicine. She is an ABPP/ABCN diplomate and maintains a clinical and forensic practice in Encino, California. Her research interests include cognitive correlates of normal aging and differential diagnosis of dementia, as well as factors influencing rates of recovery after traumatic brain injury.
Kyle B. Boone, Ph.D., is Professor-in-Residence of Psychiatry at UCLA School of Medicine, and Director of Neuropsychological Services and Training at Harbor-UCLA Medical Center. She is an ABPP/ABCN diplomate and maintains a clinical and forensic practice in Torrance, California. She has conducted research on the development and validation of techniques to identify noncredible cognitive performance, and on the effects of demographic factors and medical and psychological illnesses on neuropsychological test performance.
Jill Razani, Ph.D., is an Assistant Professor of Psychology at California State University, Northridge, and a licensed clinical psychologist in the state of California. In the past, she has conducted research on cognitive aspects of aging and neurodegenerative disorders. Presently, she has an active program of research examining issues related to multicultural and cross-cultural neuropsychology, as well as the relationship between cognitive functioning and activities of daily living in patients with dementia.
Louis F. D'Elia, Ph.D., is Assistant Clinical Professor of Psychiatry, and former Co-Director of the Neuropsychology Assessment Laboratory at the University of California, Los Angeles, School of Medicine. He remains active in the training, supervision, and mentoring of UCLA Postdoctoral Neuropsychology Fellows in his work with them in his private practice in Pasadena, California.
JACKET DESIGN: EVE SIEGEL
OXFORD UNIVERSITY PRESS www.oup.com
PRAISE FOR THE FIRST EDITION
"Should neuropsychologists purchase this volume? The answer is an unqualified yes. The book is a very valuable asset to any neuropsychology collection. This reviewer wholeheartedly recommends it for purchase; the tables alone justify the price.... The authors are due a great deal of credit for gathering together material that most of us would understand as a multi-year project. In examining this book in even a cursory way, the prospective buyer will see that the effort needed to bring it to fruition is humbling." -Kenneth M. Adams, PhD, in Journal of Clinical and Experimental Neuropsychology
"Overall, Mitrushina et al. have made a substantial contribution with their text, and it nicely complements other thorough overviews of neuropsychology authored by Lezak or Spreen and Strauss. It is concise, timely, comprehensive, and cogent, and it holds great utility for the practice of clinical neuropsychology.... Let us hope they continue this good work as additional data emerge..." -Michael R. Basso, PhD, in Neuropsychiatry, Neuropsychology, and Behavioral Neurology
"... a valuable and well-written addition to the literature that should find its way onto the reference shelves of practicing neuropsychologists. The book will be a useful educational tool.... There is a lot to be gained from consulting this book. In readability, utility, and practicality, it goes way beyond the norms." -Russell M. Bauer, PhD, in Journal of the International Neuropsychological Society
ISBN 0-19-516930-1
Handbook of Normative Data for Neuropsychological Assessment
OXFORD UNIVERSITY PRESS
Oxford University Press, Inc., publishes works that further Oxford University's objective of excellence in research, scholarship, and education. Oxford New York Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto With offices in Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam
Copyright © 2005 by Maura Mitrushina, Kyle B. Boone, Jill Razani, and Louis F. D'Elia
Published by Oxford University Press, Inc. 198 Madison Avenue, New York, New York 10016 www.oup.com
Oxford is a registered trademark of Oxford University Press
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press.
Library of Congress Cataloging-in-Publication Data
Handbook of normative data for neuropsychological assessment / Maura Mitrushina ... [et al.]. - 2nd ed. p. ; cm. Includes bibliographical references and indexes. ISBN-13 978-0-19-516930-0 ISBN 0-19-516930-1 1. Neuropsychological tests-Handbooks, manuals, etc. 2. Reference values (Medicine)-Handbooks, manuals, etc. [DNLM: 1. Neuropsychological Tests. 2. Reference Values. WL 141 H23654 2005] RC386.6.N48M58 2005 616.8'0475-dc22 2004054724
9 8 7 6 5 4 3 2 1 Printed in the United States of America on acid-free paper
With admiration and gratitude, we dedicate this book to those professionals whose normative research efforts made this volume possible.
Preface
The Handbook of Normative Data for Neuropsychological Assessment is our attempt to provide ready access to neuropsychological normative data and to evaluate their strengths and weaknesses. Because the interpretation of test scores profoundly affects the quality and utility of neuropsychological reports and research, we felt that a critical compendium containing most of the available normative data for commonly used tests was essential. Before this book's publication, only those lucky individuals with the time or staff to conduct exhaustive library searches or with extensive professional subscription lists could hope to be aware of more than a few normative reports for any specific test. Although several books cover the intricacies of administration and scoring procedures for neuropsychological tests and a few contain some normative data, no previous volume has been exclusively devoted to the presentation and discussion of existing normative data for specific neuropsychological tests or provided a framework for judging studies that report normative data. This handbook was written to help guide the busy clinician, researcher, and graduate student to the utility of commonly used neuropsychological tests and to the normative data, accompanied by critical reviews, that are available for comparison purposes for most of the tests described in this book. The following tests have been described: Trailmaking, Color Trails, Stroop Color Word Interference, Auditory Consonant Trigrams, Paced Auditory Serial Addition, Ruff 2&7, Digit Vigilance, Boston Naming, Verbal Fluency, Rey-Osterrieth Complex Figure, Hooper Visual Organization, Visual Form Discrimination, Judgment of Line Orientation, Ruff Figural Fluency, Design Fluency, Tactual Performance, Wechsler Memory Scale (WMS-R, WMS-III, WMS-IIIA), Rey Auditory-Verbal Learning, California Verbal Learning, Hopkins Verbal Learning, WHO-UCLA Auditory Verbal Learning, CERAD List-Learning, Selective Reminding, Benton Visual Retention, Finger Tapping, Grip Strength (Dynamometer), Grooved Pegboard, Category, and Wisconsin Card Sorting tests.
ORGANIZATION OF THE BOOK
The book contains 25 chapters. The basic concepts of normative neuropsychology are addressed in the first three chapters. The first chapter provides an introduction to the practice and philosophy of neuropsychology as a clinical discipline. The second chapter explores the interface of neuropsychology with other professional/clinical disciplines and revisits critical issues in neuropsychology. The third chapter provides an overview of statistical methods and the use of statistical and methodological concepts in neuropsychology, the history and applications of meta-analysis in clinical practice, and a description of the procedures for the use of meta-analysis in this book.
The remaining 22 test chapters review and present the normative data for specific neuropsychological tests, which are derived from articles and other communications reporting results of normative and clinical comparison studies. These chapters begin with a brief overview of the history, utility, and psychometric properties of the test under discussion, which indicates whether there are different versions of the test and/or varying administration procedures. If more than one version of a test exists, the differences in content, administration, and scoring are described. We purposely avoided an exhaustive review of the history and psychometric properties of the tests because this information is readily available in other Oxford publications, specifically Lezak et al. (2004) and Spreen and Strauss (1998).
The next part of the test chapters is a summary of the findings from research that has examined the influence of demographic variables (e.g., age, education, intellectual level, gender, ethnicity/culture, handedness) and administration procedures on test performance. The findings from this review highlight the critical variables needed to evaluate the normative reports for the test. These critical variables are broken down into two categories: (1) subject variables and (2) procedural variables.
Subject variables address such issues as: "How broad are the utilized age group ranges in data reporting?" Optimally, studies report data across rather discrete age groups (e.g., 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54 years) rather than across one all-inclusive range (e.g., 20-54 years).
"What is the education and/or IQ of the study participants?" Because education and IQ may have a dramatic impact on test performance, it is important to include this information so that data that closely match the education and/or IQ of the patient under study can be used.
"What was the sample size in each of the reported age or age/education categories?" "Is the sample from which data were collected well described?" For instance, the age of the subjects and the country where the study was conducted must always be reported. Depending on the test administered, other important variables may include gender, ethnicity/culture, and hand preference.
Procedural variables address such issues as: "What version of the test was administered?" "How was the test administered?" "How was the test scored?" "Did the data reported include mean and standard deviation scores?"
The next section of each of these chapters summarizes the status of the normative data for the test and answers the questions: "How many studies are out there?" "Which versions of the test have been the most frequently administered?" "What demographic characteristics have been the most frequently studied?" The next section presents critiques of the studies, with the strengths and considerations regarding the use of each normative report discussed in some depth.
Data tables are presented in the appendix corresponding to each chapter. Each appendix starts with the data locator table for that chapter, which summarizes the subject and procedural variables for each study reviewed in the text, organized in ascending chronological order. The table quickly highlights the most appropriate normative data, given the demographic characteristics of the patient under study, as well as the test administration and scoring criteria employed. The locator table also indicates the page number on which an extensive critical review of the study can be found in the text of the chapter and directs the reader to the corresponding data tables in the appendix. Therefore, readers have the option of reading the critical review of the normative data for a particular test or simply using the data locator table to rapidly identify the appropriate data set for quick test interpretation.
Several test chapters also include summaries of the results of the meta-analyses that were used to derive the predicted scores for different age groups. The tables of predicted scores with education or gender correction (where appropriate) are presented in the corresponding appendices, along with descriptive statistics for the aggregate sample, significance tests, and scatterplots depicting dispersion of the data points around the regression line. The test chapters conclude with a summary and suggestions for future research to improve the database for the test.
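As a rough illustration of what a table of predicted scores involves, the Python sketch below fits a straight line to study-level mean scores, weighting each study by its sample size, and then reads off a predicted score for a given age. This is not the procedure used in this book, which is described in Chapter 3. The linear form, the sample-size weights, the names `StudyCell`, `fit_weighted_line`, and `predict`, and every number in the example are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StudyCell:
    """One normative group taken from a published study (hypothetical layout)."""
    mean_age: float    # mean age of the group, in years
    mean_score: float  # mean test score reported for the group
    n: int             # number of participants in the group


def fit_weighted_line(cells: List[StudyCell]) -> Tuple[float, float]:
    """Weighted least squares of mean_score on mean_age, with weights equal to n.

    Returns (intercept, slope). Using n as the weight is an illustrative choice.
    """
    total_n = sum(c.n for c in cells)
    age_bar = sum(c.n * c.mean_age for c in cells) / total_n
    score_bar = sum(c.n * c.mean_score for c in cells) / total_n
    sxy = sum(c.n * (c.mean_age - age_bar) * (c.mean_score - score_bar) for c in cells)
    sxx = sum(c.n * (c.mean_age - age_bar) ** 2 for c in cells)
    if sxx == 0:
        raise ValueError("all groups have the same mean age; slope is undefined")
    slope = sxy / sxx
    intercept = score_bar - slope * age_bar
    return intercept, slope


def predict(intercept: float, slope: float, age: float) -> float:
    """Predicted mean score at a given age from the fitted line."""
    return intercept + slope * age


# Invented study-level data (completion times in seconds), not taken from any study.
example_cells = [
    StudyCell(mean_age=24.0, mean_score=27.0, n=40),
    StudyCell(mean_age=37.5, mean_score=30.5, n=55),
    StudyCell(mean_age=52.0, mean_score=36.0, n=32),
    StudyCell(mean_age=68.0, mean_score=44.5, n=28),
]
b0, b1 = fit_weighted_line(example_cells)
for age in (30, 50, 70):
    print(f"age {age}: predicted mean score {predict(b0, b1, age):.1f}")
```

A predicted value of this kind is only as trustworthy as the studies behind it, so a sketch like this is best read as a way to understand the tables, not as a substitute for them.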
HOW TO BEST USE THE BOOK
The process of selecting the most appropriate normative report for interpretive purposes involves determining the "best fit" between a patient's demographic characteristics (e.g., age, years of education, IQ, handedness) and the demographic characteristics of the study sample. It is also critical to ensure that the version of the test administered is the same as that used to collect the normative data. Likewise, it is critical that the scoring procedures are identical. As a general policy, before seeing a patient, we typically determine which normative data we are going to use to interpret his or her performance. This way we do not discover after a patient has gone home that the only reference data available utilized a different administration and/or scoring protocol from the version we used. Such "discoveries" undermine confidence in test score interpretation. Fortunately, however, the vast majority of normative reports use standard administration and scoring procedures.
If the data have already been collected, an important variable to screen for initially is country of origin. If the patient was born and/or educated in the United States, then the most appropriate comparison data should have been collected from individuals born and/or educated in the United States. Another critical variable is age. A patient's test scores must be compared to those of age peers because performance on most neuropsychological tests changes as a function of age. Educational level and/or IQ are also important variables. Because they can have a tremendous impact on performance on most neuropsychological tests, a patient's IQ and/or educational characteristics should closely match the demographics of the normative comparison sample. Optimally, normative data are reported by age/education or age/IQ categories (i.e., performance of those aged 20-25 years with 12 years of education, performance of those aged 20-25 years with 13-15 years of education, performance of those aged 20-25 years with 16 years of education, etc.). Sample size is also critical because small sample size within any of the comparison categories (i.e., age, age/education) can undermine the stability of the normative data and reduce confidence in score interpretation. For some tests, gender and handedness must be considered. Ideally, the administration and scoring procedures used to assess the patient should be identical to those used to collect the normative comparison data.
If the data locator table suggests that more than one study could be appropriately used, then the reader is especially advised to read the critical reviews of the studies closely to help determine whether one data set is more appropriate than others. Close inspection of the details of the studies often leads to clear-cut conclusions. If the data from different studies yield contradictory values, the reader is advised to consult the table of meta-analytically predicted values (when available) to aid in the selection of the appropriate normative data set. If normative data for a certain demographic group cannot be found in the studies reviewed, with proper caution (see Chapter 3), the expected value for that group can be extrapolated based on the table of predicted values or can be computed based on the regression equation provided with the table. However, we strongly discourage the use of predicted values when the actual data sets are available.
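The "best fit" screening described above can be sketched in Python. The sketch is a minimal illustration only, not a tool supplied with this book: the record layouts `NormCell` and `Patient`, the function names, the ranking rule (narrower age bands first, then larger samples), the default minimum sample size of 10, and all data values are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NormCell:
    """One age/education category from a normative study (hypothetical layout)."""
    study: str
    country: str
    test_version: str
    age_min: int
    age_max: int
    edu_min: int
    edu_max: int
    n: int        # sample size in this category
    mean: float   # mean score reported for this category
    sd: float     # standard deviation reported for this category


@dataclass
class Patient:
    country: str       # country where the patient was born and/or educated
    test_version: str  # version of the test that was administered
    age: int
    education: int     # years of education


def matching_cells(patient: Patient, cells: List[NormCell], min_n: int = 10) -> List[NormCell]:
    """Keep categories that fit the patient, ranked from best to worst fit.

    The min_n threshold and the ranking rule are illustrative assumptions.
    """
    hits = [
        c for c in cells
        if c.country == patient.country
        and c.test_version == patient.test_version
        and c.age_min <= patient.age <= c.age_max
        and c.edu_min <= patient.education <= c.edu_max
        and c.n >= min_n
    ]
    # Prefer discrete age bands over all-inclusive ranges, then larger samples.
    return sorted(hits, key=lambda c: (c.age_max - c.age_min, -c.n))


def z_score(raw: float, cell: NormCell) -> float:
    """Standardize a raw score against the category's mean and standard deviation."""
    return (raw - cell.mean) / cell.sd


# Invented categories, not taken from any study reviewed in this book.
example_cells = [
    NormCell("Study A", "US", "standard", 20, 54, 0, 20, 120, 30.0, 10.0),
    NormCell("Study B", "US", "standard", 45, 49, 12, 15, 25, 35.0, 8.0),
    NormCell("Study C", "US", "standard", 45, 49, 12, 15, 8, 36.0, 8.0),
    NormCell("Study D", "UK", "standard", 45, 49, 12, 15, 200, 34.0, 9.0),
]
patient = Patient(country="US", test_version="standard", age=47, education=12)
for cell in matching_cells(patient, example_cells):
    print(cell.study, cell.n, round(z_score(43.0, cell), 2))
```

A filter of this kind only narrows the candidates. Deciding between the categories that remain still depends on reading the critical reviews of the studies, as advised above.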
HISTORY OF THIS PROJECT
The Beginnings
The idea for this book originally grew out of the frustration that was experienced by Lou D'Elia in his attempts to locate appropriate normative data during the early years of his postdoctoral training. This frustration is familiar to anyone who has used normative data and was practicing before 1990. Back in the "old days," it was fairly typical for practicing professionals to have access to, at most, one or two sets of normative data for any particular neuropsychological test. More often than not, graduate students and postdoctoral fellows and trainees were handed a manual of norms to be used in the clinic or laboratory. These "lab manuals" containing tables of normative data were passed from mentor to trainee (and vice versa) as if they were the Holy Grail.
Early in his training, Lou began to ask "Where did these data come from?" Sometimes a graduate student, postdoctoral fellow, or faculty member would "discover" a new set of norms for a particular test and a new table would magically appear in the lab manual. Applying the new reference data to patient scores often yielded wildly different percentile performance interpretations from those based on the "standard" norms. This sent Lou to the UCLA Biomedical Library to search for the source of the data and to unearth the original research articles. Often, as he read the article, he discovered to his horror that the data had been collected from individuals not educated in the United States, that the sample size was extremely small (i.e., n < 10), or worse yet, that the data were generated from a different version of the test. If the same version of the test was being used, often the normative data had been collected by a nonstandard administration and/or scoring procedure. It was only after a thorough examination of how the studies were carried out, in terms of test administration, scoring, and demographic characteristics of the study participants, that one could begin to unravel the reasons why the use of one set of normative data yielded a different interpretation than use of another.
Those trips to the library resulted in the first article to summarize the available normative data for any neuropsychological test: "Wechsler Memory Scale: A Critical Appraisal of the Normative Studies" (D'Elia et al., 1989). It was during the preparation of this article that our basic template for analyzing normative reports was developed.
Lou's next question was "Why has no one gathered all this information together into a reference book?" Fortunately, Lou found two student colleagues in the same training program who shared his concern: Kyle Boone and Allen D. Brandon. Lou, Kyle, and Allen eagerly returned to the library to collect the data necessary to produce a reference book. Soon, however, they discovered why no such volume existed. It is hard to imagine now, but as recently as the late 1980s and early 1990s, the majority of neuropsychology-related professional journals still had not been referenced in databases. No subject category for "Norms" or "Normative Data" was listed in the key reference indices such as Index Medicus or Psychological Abstracts. As a result, most of the research papers were located by going through the various journals article by article. Gathering the necessary information proved to be a very large task, not one that we would recommend to a postdoctoral fellow at the beginning of his or her career. Yet, that is exactly what they did. Hindsight is 20/20!
Allen Brandon withdrew from the project upon completing his postdoctoral fellowship. Private practice called. Only Lou and Kyle remained. However, for Lou and Kyle, free time seemed to evaporate as they pursued developing professional careers and attended to their ever-increasing family activities and obligations. The project slowly moved forward. Finding and cataloguing the articles, then analyzing them using the templates, required much more work than they had imagined. Then, about 1994, Maura Mitrushina joined the project, and thanks to her considerable enthusiasm and efforts the first edition of the book was finally completed.
The Second Edition: Changes and Updates
Now, 6 years later, we are glad to have on board a new member of the team, the young and vibrant Jill Razani. We invited her to participate in the preparation of the second edition in order to share responsibilities for writing new chapters with reviews of additional commonly used tests in response to the wishes of our audience. This was the only way to keep our sanity, attend to our families and jobs, and have a semblance of "normal life" while working on the second edition.
The new tests reviewed in the second edition include Paced Auditory Serial Addition, Ruff 2&7, Digit Vigilance, Visual Form Discrimination, Judgment of Line Orientation, Ruff Figural Fluency, Design Fluency, WMS-IIIA, California Verbal Learning, Hopkins Verbal Learning, WHO-UCLA Auditory Verbal Learning, CERAD List-Learning, Selective Reminding, Benton Visual Retention, and Wisconsin Card Sorting tests. The chapters in the first edition have been updated and revised. Information on methodological issues, new versions and new approaches to the tests, and their clinical utility has been added. Studies published after 1998 that are based on well-defined, intact samples were reviewed. Outdated information, data on diagnosed clinical groups, and chapters describing tests that are not in wide use were removed.
The format of data presentation has been changed. Learning from our mistakes with the first edition (data tables are not exactly placed in the text of their description, as we originally envisioned!), we removed all data tables from the text and placed them in the appendices. We hope that this change will make it easier to locate the needed tables. In response to the wishes of the readers of our first edition, we synthesized the data in meta-analytic tables of predicted values with supporting statistics for those chapters that have a sufficient number and homogeneity of studies for such analyses. The limitations of such predicted norms were highlighted.
FUTURE DIRECTIONS
The handbook is as up-to-date as we could make it. We intend to update the handbook every few years; and with subsequent editions, it will be expanded to include additional tests frequently used by neuropsychologists. We have already taken a step in this direction with the second edition. Almost all of the tests in this book continue to appear on lists of the most popular tests in neuropsychology. We also managed to sneak in some information regarding a couple of published tests that were developed in our laboratory that seem to be gaining popularity elsewhere (i.e., Color Trails Test, WHO-UCLA Auditory Verbal Learning Test). We hope this book finds its place on the desks of professionals performing or reviewing neuropsychological assessments. We also hope it will be welcomed by teachers of assessment and psychological statistics and helpful to graduate students learning to interpret test scores. Our goal is to help bolster confidence in the basis for clinical judgments and to strengthen the credibility of research and clinical findings.
Los Angeles, California
M.M., K.B.B., J.R., L.F.D.
Acknowledgments
We extend our deepest gratitude to all the authors whose normative and clinical comparison research is reviewed in this book. Without their work, this book would not have been possible. This volume is not intended to disparage the work of any author as we strongly believe that each author has made an important contribution to our overall knowledge through their research efforts.
Over the years, several people have helped us with the preparation of the first and second editions of this book. Their help took many forms, including everything from typing tables and checking the accuracy of references to providing us with materials to be included in the book and simple moral support. We offer each one our heartfelt thanks for every kindness and courtesy extended to us along the way: Lidia Artiola i Fortuny, Jean Avezac, Eyzzz Baccarrdi, Julian Bach, Robert Bornstein, Virdette Brumm, Debora Burnison, Robert Butler, Flo Comes, Lou Costa, Michele Croisier, Jeffrey Cummings, Janine Czametzki, Doug Danaher, Dean Dellis, Jack Demick, Lois Desmond, Carl Dodrill, Linda Dukmajian, Katharine Earhart, Robert Elliot, Kadimah Elson, Gwenn Evans, Bee Fletcher, Travis Fogel, David Forney, Jennifer Forrest, Paula Fuld, Stephen Ganzell, Ismelda Gonzalez, Patricia Gross, Adrienne Gundry, Tiffany Harris, Lany Herrera, Charles Hinkin, Stacey Horowitz, Robert Ivnik, Lissy Jarvik, Irene Kassorla, Ellen Kester, Glen Larrabee, Asenath LaRue, Stanislav Levin, James Loong, Enrique Lopez, Christine LoPresti, Anahit Magzanyan, Mario Maj, Lawrence Majovski, Alfred Marohl, Gayle Marsh, James Marsh, Joan McConnell, Susan McPherson, Fernando Melendez, John Meyers, Eric Miller, Robin Morris, Hector Myers, Narine Nazari, Linda Nelson, Tina Noriega, Lara Orchanian, Elizabeth Pacheo, Daniel Parks, Nikki Passanante, Helen Paull, Eileen Pearlman, Marcel Ponton, Stephen Rebello, Matt Reinhard, Mark Richardson, Linda Ringer, Marcela Rivera, Eddie Rozenblat, Michael Salmone, Manuela Saul, Robert Sbordone, Jeffrey Schaeffer, Karen Schiltz, David Schretlen, Amanda Schrey, Ola Selnes, Glenn Smith, Fabrizio Starace, Norton Stein, Tony Strickland, Donald Stuss, Donald Trahan, Craig Uchiyama, Doug Umetsu, Harry Van der Vlugt, Wilfred Van Gorp, Valdis Volkovskis, Travis White, Jane Williams, Bennett Williamson, Lorne Yeudall, Betty Young, and Miguel Zavala.
We express endless gratitude to Courtney Sheen, who organized and coordinated the preparation of tables for the second edition. We thank Linda Fidell and Ingram Olkin for their advice on the design and statistical treatment of the meta-analyses. We are indebted to Xiao Chen and the UCLA ATS Statistical Consulting Group for their advice and support, ranging from providing ample literature resources on applications of Stata in meta-analyses to invaluable help with the set-up of command files and interpretation of results of the analyses. Special thanks go to Muriel Lezak and Edith Kaplan, who have been a constant source of encouragement and support from the very beginning of the project.
We extend our gratitude to Paul Satz, who fostered in three of the authors appreciation for the complexity and excitement of the field of neuropsychology. The contribution of Dale Sherman to the methodological accuracy of the first edition qualifies him for a spot in heaven. We also extend special thanks to Allen Brandon, who was an early collaborator on the first edition. Allen, your early efforts and great enthusiasm were deeply appreciated. Dr. D'Elia offers his admiration and appreciation to his three coauthors, whose efforts brought this project to completion.
Sincere thanks to our editors Jeff House, Fiona Stevens, and Nancy Wolitzer, whose support throughout has been continuous and enthusiastic. Finally, we thank our families: M.M. thanks Masha, Sasha, and Kaley for their endless patience and understanding; K.B.B. thanks Rodney, Galen, and Fletcher; J.R. thanks her parents and family, especially Bill, Rhonda, and Mike; L.F.D. thanks his parent and family, especially Michael D. Salazar, for their constant encouragement and support.
M.M., K.B.B., J.R., L.F.D.
Contents
I. BACKGROUND 1. Introduction, 3 Test-Taking Environment, 6 Test Norms, 7 Tests, 9 Standard and Experimental, 9 When Is a Test Considered Experimental?, 10 What Determines Whether a Test Is Considered "Standard?'', 11
2. Use of Methodological Concepts in Neuropsychology Practice, 12 Interface of Neuropsychology with Other Clinical Disciplines, 12 Applications of Neuropsychological Evaluation, 13 Different Levels of Data Integration in Neuropsychology Practice, 15 Judgment and Decision Making in Clinical Neuropsychology, 17 Strategies in Test Selection, 17 Normative References and Interpretation of Clinical Data, 18 Alternative Methods for Interpretation of Clinical Data, 22 Factors Influencing Performance on Neuropsychological Tests, 27 Effort and Motivation, 27 Issues in Cross-Cultural and Multicultural Neuropsychological Assessment, 28 Final Caveats, 30 Data Inclusion in Neuropsychological Reports, 31
3. Statistical and Psychometric Issues, 33 Measurement and Interpretation of Numerical Values, 33 Standardization of Raw Scores, 35 Standard Scores and Normal Distribution, 36 Interpretation of Infrequent (Outlying) Scores, 38 Interpretation of Scores That Are Not Normally Distributed, 38 Psychometric Properties of Tests, 39 Reliability, 39 Methods of Estimating Test Reliability, 39 Standard Error of Measurement, 40 Validity, 41 Decision Theory, 42 Base Rates, 42 XV
xvi
CONTENTS Selection Ratio, 43 Incremental Validity, 43 Cutoffs and Diagnostic Acctiracy of a Test or Interpretive Strategy, 44
Synthesis of Results of Differen~ Studies in a Meta-Analysis, 45 Historical Overview and the Raticinale for Using Meta-Analysis in This Book, 45 Application of Meta-Analysis in Quucal Practice, 46
Advantages, 46 Sources of Bias, 46 Selection of Studies and Procedures for Meta-Analyses Presented in 11lis Book, 47 Uterature Search and Selection ci Studies, 47 Procedures Used in the Analyses, 48 Data Editing, 48 Regression, 50 Prediction, 51 Standard Deviations, 51 Testing Model Fit and Parameter'Specilications, 52 Effect of Demographic Variables, ; 54 Comments on the Applicability oP;the Meta-Analyses Presented in This Book, 55
II. TESTS OF ATTENTION AND CONCENTRATION: VISUAL AND AUDITORY
4. Trailmaking Test, 59 Brief History of the Test, 59 Contributions of Cognitive Mechanisms and Physical Layout Differences to Performance on Parts A and B, 60 Utility of the Derived Measures, Which Are Based on Differences in Performance Times for Parts A and B, 61 Utility of the Error Analysis, 62 Utility of the Cutoffs for Impairment, 63 Effect of the Order of Presentation and Practice Time, Practice Effect, and Alternate Versions of the TMT, 64 Culture-Specific Sets of Normative Data and Cultural Adaptations for the TMT, 65 Modified Versions of the TMT Relationship Between TMT Performance and Demographic Factors, 67 Method for Evaluating the Normative Reports, 70 Summary of the Status of the Norms, 71 Summaries of the Studies, 72 Results of the Meta-Analyses of the Trailmaking Test Data, 96 Conclusions, 98
5. Color Trails Test, 99 Brief History of the Test, 99 Relationship Between CTT Performance and Demographic Factors, 101 Method for Evaluating the Normative Reports, 102 Summary of the Status of the Norms, 103 Summaries of the Studies, 103 Conclusions, 106
6. Stroop Test, 108 Brief History of the Test, 108 Current Administration Procedures, 110 Relationship Between Stroop Test Performance and Demographic Factors, 112 Method for Evaluating the Normative Reports, 114 Summary of the Status of the Norms, 115 Summaries of the Studies, 116 Results of the Meta-Analyses of the Stroop Test Data, 132 Conclusions, 133
7. Auditory Consonant Trigrams, 134 Brief History of the Test, 134 Administration Procedures, 134 Psychometric Properties, 135 Relationship Between ACT Performance, Demographic Factors, and Vascular Status, 135 Method for Evaluating the Normative Reports, 135 Summary of the Status of the Norms, 136 Summaries of the Studies, 137 Conclusions, 140
8. Paced Auditory Serial Addition Test, 141 Brief History of the Test, 141 Modifications and Alternate Formats of the PASAT, 142 Psychometric Properties of the Test, 143 Relationship Between PASAT Performance and Demographic Factors, 143 Method for Evaluating the Normative Reports, 145 Summary of the Status of the Norms, 145 Summaries of the Studies, 146 Conclusions, 158
9. Cancellation Tests, 160 Brief History of the Tests, 160 Ruff 2&7 Selective Attention Test, 160 Brief Overview of the Ruff 2&7, 160 Psychometric Properties of the Ruff 2&7, 161 Relationship Between Ruff 2&7 Performance and Demographic Factors, 162 Digit Vigilance Test, 162 Brief Overview of the DVT, 162 Psychometric Properties of the DVT, 163 Relationship Between DVT Performance and Demographic Factors, 163 Method for Evaluating the Normative Reports, 163 Summary of the Status of the Norms, 164 Summaries of the Studies, 164 Conclusions, 170
III. LANGUAGE
10. Boston Naming Test, 173 Brief History of the Test, 173 Studies Using BNT Error Quality Analyses, 174 Current Views on the Mechanisms Underlying Confrontation Naming Deficits, 176 Modifications and Short Versions of the BNT, 177 Cultural Adaptations and Culture-Specific Normative Data for the BNT, 178 Psychometric Properties of the Test, 179 Relationship Between BNT Performance and Demographic Factors, 180 Method for Evaluating the Normative Reports, 182 Summary of the Status of the Norms, 182 Summaries of the Studies, 183 Results of the Meta-Analyses of the Boston Naming Test Data, 197 Conclusions, 199
11. Verbal Fluency Test, 200 Brief History of the Test, 200 Psychometric Properties of the Test, 202 Cognitive Mechanisms Underlying Word Generation, 202 Biochemical and Anatomical Correlates and Effect of Brain Pathology on Verbal Fluency, 203 Assessment of Verbal Fluency in Different Languages, 205 Relationship Between VFT Performance and Demographic Factors, 206 Method for Evaluating the Normative Reports, 208 Summary of the Status of the Norms, 209 Summaries of the Studies, 209 Results of the Meta-Analyses of the Verbal Fluency Data, 235 Conclusions, 237
IV. PERCEPTUAL ORGANIZATION: VISUOSPATIAL AND TACTILE
12. Rey-Osterrieth Complex Figure, 241 Brief History of the Test, 241 Administration Procedures, 241 Alternate Versions, 242 Scoring Systems, 243 Reliability, 248 Clinical Utility, 249 Culture-Specific Studies and Normative Data for the ROCF, 251 Relationship Between ROCF Performance and Demographic Factors, 251 Method for Evaluating the Normative Reports, 253 Summary of the Status of the Norms, 254 Summaries of the Studies, 255 Results of the Meta-Analyses of the ROCF Data, 269 Conclusions, 270
13. Hooper Visual Organization Test, 272 Brief History of the Test, 272 Construct Validity, 273 Psychometric Properties of the Test, 274 Relationship Between HVOT Performance and Demographic Factors, 274 Method for Evaluating the Normative Reports, 274 Summary of the Status of the Norms, 275 Summaries of the Studies, 275 Conclusions, 277
14. Visual Form Discrimination Test, 278 Brief History of the Test, 278 Relationship Between VFDT Performance and Demographic Factors, 280 Method for Evaluating the Normative Reports, 280 Summary of the Status of the Norms, 281 Summaries of the Studies, 281 Conclusions, 282
15. Judgment of Line Orientation, 284 Brief History of the Test, 284 Psychometric Properties of the Test, 286 Alternate Brief Forms of the JLO, 286 Relationship Between JLO Performance and Demographic Factors, 286 Method for Evaluating the Normative Reports, 287 Summary of the Status of the Norms, 288 Summaries of the Studies, 288 Conclusions, 296
16. Design Fluency Tests, 298 Brief History of the Tests, 298 Psychometric Properties of the Design Fluency Tests, 300 Ruff Figural Fluency Test, 300 Design Fluency Test (Jones-Gotman/Milner Version), 300 Relationship Between Design Fluency Performance and Demographic Factors, 301 Method for Evaluating the Normative Reports, 301 Summary of the Status of the Norms, 302 Summaries of the Studies, 303 Conclusions, 310
17. Tactual Performance Test, 312 Brief History of the Test, 312 Psychometric Properties of the TPT, 314 Relationship Between TPT Performance and Demographic Factors, 314 Method for Evaluating the Normative Reports, 315 Summary of the Status of the Norms, 316 Summaries of the Studies, 318 Conclusions, 333
V. VERBAL AND VISUAL LEARNING AND MEMORY
18. Wechsler Memory Scale (WMS-R, WMS-III, and WMS-IIIA), 337 Brief History of the Test, 337 Relationship Between Test Performance and Demographic Factors, 344 Method for Evaluating the Normative Reports, 345 Summary of the Status of the Norms, 345 Summaries of the Studies, 346 Conclusions, 355
19. List-Learning Tests, 357 Rey Auditory-Verbal Learning Test, 357 Variability in Administration of the Rey AVLT, 357 Functioning of Different Memory Mechanisms, as Assessed by the Rey AVLT, 359 Practice Effect and Alternate Forms of the Rey AVLT, 361 Assessment of Auditory Verbal Learning with the Rey AVLT in Different Languages and Cultures, 362 California Verbal Learning Test-Second Edition, 362 Structure of the CVLT-II and Description of the Normative Data Provided in the Test Manual, 362 Alternate and Short Forms of the CVLT-II, 363 Review of the Recent Literature on the CVLT and CVLT-II, 363 Effect of Semantic Organization on Recall, 363 Anatomical Correlates, 364 Assessment of Learning and Memory in Traumatic Brain Injury, 365 Assessment of Serial Position Effect in Dementias, 366 Repeated Administration and Practice Effects, 366 Assessment of Effort with the CVLT, 367 Use of the CVLT in Other Languages and Cultures, 367 Adaptations and Alternate Versions of the CVLT, 367 Hopkins Verbal Learning Test, 368 WHO-UCLA Auditory Verbal Learning Test, 369 CERAD List-Learning Test, 370 Selective Reminding Test, 370 Other Verbal and Nonverbal List-Learning Tests, 371 Relationship Between List-Learning Test Performance and Demographic Factors, 372 Method for Evaluating the Normative Reports, 374 Summary of the Status of the Norms, 375 Summaries of the Studies, 375 Results of the Meta-Analyses of the Rey AVLT Data, 391 Conclusions, 392
20. Benton Visual Retention Test, 394 Brief History of the Test, 394 Psychometric Properties of the Test, 397 Relationship Between BVRT Performance and Demographic Factors, 398 Method for Evaluating the Normative Reports, 400 Summary of the Status of the Norms, 400 Summaries of the Studies, 402 Conclusions, 416
VI. MOTOR FUNCTIONS
21. Finger Tapping Test, 419 Brief History of the Test, 419 Relationship Between FTT Performance and Demographic Factors, 421 Method for Evaluating the Normative Reports, 422 Summary of the Status of the Norms, 422 Summaries of the Studies, 423 Results of the Meta-Analyses of the Finger Tapping Test Data, 441 Conclusions, 442
22. Grip Strength Test (Hand Dynamometer), 444 Brief History of the Test, 444 Relationship Between Hand Dynamometer Performance and Demographic Factors, 445 Method for Evaluating the Normative Reports, 445 Summary of the Status of the Norms, 446 Summaries of the Studies, 447 Results of the Meta-Analyses of the Hand Dynamometer Test Data, 457 Conclusions, 458
23. Grooved Pegboard Test, 459 Brief History of the Test, 459 Relationship Between GPT Performance and Demographic Factors, 460 Method for Evaluating the Normative Reports, 460 Summary of the Status of the Norms, 461 Summaries of the Studies, 462 Results of the Meta-Analyses of the GPT Data, 470 Conclusions, 471
VII. CONCEPT FORMATION AND REASONING
24. Category Test, 475 Brief History of the Test, 475 Alternate Formats, 477 Relationship Between Category Test Performance and Demographic Factors, 480 Method for Evaluating the Normative Reports, 481 Summary of the Status of the Norms, 482 Summaries of the Studies, 483 Results of the Meta-Analyses of the Category Test Data, 494 Conclusions, 495
25. Wisconsin Card Sorting Test, 496 Brief History of the Test, 496 Anatomical Correlates and Effect of Brain Pathology on the WCST, 498 Brief Overview of Clinical Findings Using the WCST, 499 Modifications and Alternate Formats of the WCST, 503 Psychometric Properties of the Test, 505 Relationship Between WCST Performance and Demographic Factors, 508 Method for Evaluating the Normative Reports, 511 Summary of the Status of the Norms, 512 Summaries of the Studies, 513 Conclusions, 531
References, 533
Appendices
1. Where to Buy the Tests, 611
2a. Subject Instructions for ACT According to Boone et al. (1990) and Boone (1999), 613
2b. Auditory Consonant Trigrams (Boone et al., 1990; Boone, 1999), 614
2c. Subject Instructions for ACT According to Stuss et al. (1987, 1988), 615
2d. Auditory Consonant Trigrams (Stuss et al., 1987, 1988), 616
3. WHO-UCLA Auditory Verbal Learning Test: Instructions and Test Forms, 618
4. Locator and Data Tables for the Trailmaking Test (TMT), 623
4m. Meta-Analysis Tables for Trailmaking Test (TMT), 648
5. Locator and Data Tables for the Color Trails Test, 657
6. Locator and Data Tables for the Stroop Test, 661
6m. Meta-Analysis Tables for Stroop Test (Golden Version, Interference Version), 680
7. Locator and Data Tables for Auditory Consonant Trigrams, 684
8. Locator and Data Tables for the Paced Auditory Serial Addition Test, 689
9. Locator and Data Tables for the Cancellation Tests, 705
10. Locator and Data Tables for the Boston Naming Test (BNT), 709
10m. Meta-Analysis Tables for the Boston Naming Test (BNT), 724
11. Locator and Data Tables
11m. 12. 12m. 13. 14. 15. 16. 17. 18. 19. 19m. 20. 21. 21m. 22. 22m. 23. 23m. 24. 24m. 25.

complete placing metal pegs in all the grooved slots on the pegboard, the PIN Test score is based on the number of holes punched, the Dynamometer score is expressed in kilograms,
and the Finger Tapping Test score is expressed in terms of the number of taps made in 10 seconds. The ability to convert each of these various scores to a standard score equivalent, regardless of the previously expressed units of measurement (seconds, number of holes punched, kilograms, etc.) allows determination of a subject's relative standing in one distribution and permits its comparison with relative standing in another. The underlying assumption when using z or T scores is that the distribution of scores obtained by the normative sample follows what is known as the "standard normal distribution," which approximates the bell-shaped normal curve (see Chapter 3). Therefore, there is a fixed relationship between the standardized test scores, z scores, and percentile ranks. Table 2.2 illustrates the interrelationship between z scores, percentile ranks, and corresponding WAIS-III IQ equivalents. A positive z score will translate to a percentile rank of 50 or greater (refer to left side of the percentile rank column) and to a WAIS-III IQ of 100 or greater (left side of the WAIS-III IQ column). A negative z score will translate to a percentile rank below 50 (use right side of column) and a WAIS-III IQ below 100 (right side of column).

Consider the following example. You have just assessed Mr. Smith's right (dominant) hand performance on the Grooved Pegboard Test, and you note that it took him 68 seconds to complete. Mr. Smith is 35 years old and has finished 11 years of formal schooling. He has lived almost his entire life in a large western Canadian city and only recently moved to the city where you evaluated his performance. After surveying the available normative data for possible comparison purposes (see Chapter 23), you decide that use of Bornstein's (1985) normative data for the Grooved Pegboard performance would be optimal. Examining the normative table, you note that males in his age and education group performed the test with their dominant hand in 65.3 (8.5) seconds, that is, a mean of 65.3 seconds with a standard deviation of 8.5 seconds. The difference between his score and the normative mean, divided by the standard deviation, is

$$\frac{68 - 65.3}{8.5} = 0.32$$

Considering that higher scores on this test reflect poorer performance (since it took
Table 2.2. Percentile Ranks and WAIS-III IQ Equivalent Scores for Corresponding z Scores

z Score (SD) | Percentile Rank (+SD) | Percentile Rank (-SD)
2.17-3.00 | 99 | 1
1.96-2.16 | 98 | 2
1.82-1.95 | 97 | 3
1.70-1.81 | 96 | 4
1.60-1.69 | 95 | 5
1.52-1.59 | 94 | 6
1.44-1.51 | 93 | 7
1.38-1.43 | 92 | 8
1.32-1.37 | 91 | 9
1.26-1.31 | 90 | 10
1.21-1.25 | 89 | 11
1.16-1.20 | 88 | 12
1.11-1.15 | 87 | 13
1.06-1.10 | 86 | 14
1.02-1.05 | 85 | 15
0.98-1.01 | 84 | 16
0.94-0.97 | 83 | 17
0.90-0.93 | 82 | 18
0.86-0.89 | 81 | 19
0.83-0.85 | 80 | 20
0.79-0.82 | 79 | 21
0.76-0.78 | 78 | 22
0.73-0.75 | 77 | 23
0.70-0.72 | 76 | 24
0.66-0.69 | 75 | 25
0.63-0.65 | 74 | 26
0.60-0.62 | 73 | 27
0.57-0.59 | 72 | 28
0.54-0.56 | 71 | 29
0.51-0.53 | 70 | 30
0.49-0.50 | 69 | 31
0.46-0.48 | 68 | 32
0.43-0.45 | 67 | 33
0.40-0.42 | 66 | 34
0.38-0.39 | 65 | 35
0.35-0.37 | 64 | 36
0.32-0.34 | 63 | 37
0.30-0.31 | 62 | 38
0.27-0.29 | 61 | 39
0.25-0.26 | 60 | 40
0.22-0.24 | 59 | 41
0.19-0.21 | 58 | 42
0.17-0.18 | 57 | 43
0.14-0.16 | 56 | 44
0.12-0.13 | 55 | 45
0.09-0.11 | 54 | 46
0.07-0.08 | 53 | 47
0.04-0.06 | 52 | 48
0.02-0.03 | 51 | 49
0.00-0.01 | 50 | 50

WAIS-III IQ Equiv. (+SD), from the highest z scores to the lowest: ≥133, 130-132, 127-129, 126, 124-125, then each whole number from 123 down to 100.
WAIS-III IQ Equiv. (-SD), from the highest z scores to the lowest: ≤67, 68-70, 71-72, 73-74, 75-76, then each whole number from 77 up to 100.

SD, standard deviation.
longer to complete placing all those pegs in the slots), you know that Mr. Smith has performed below the mean for the group, so the sign of the obtained value is reversed (z = −0.32). Obviously, this z score will be converted to something less than the 50th percentile. Indeed, when you locate a z score of −0.32 in Table 2.2, the corresponding value (using the right side of the percentile rank column) is the 37th percentile.

Percentile ranks permit one to indicate whether the performance is very superior, superior, high average, average, low average, borderline, or impaired. By convention, neuropsychologists use the percentile cutoffs presented in Table 2.3 in describing performance levels. However, it should be noted that these cutoffs may vary depending on psychometric properties of a given test. Therefore, whenever possible, the clinician is advised to use the cutoffs for performance levels that are provided by test authors.
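The conversion in the Grooved Pegboard example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the percentile is computed from the standard normal distribution rather than looked up in Table 2.2, and the z score is rounded to two decimals as in the text.

```python
import math

def z_score(raw, mean, sd, higher_is_worse=False):
    """Standardize a raw score against a normative mean and SD.

    When higher raw scores mean poorer performance (e.g., completion
    time in seconds), the sign is reversed so that negative z values
    always denote below-average performance.
    """
    z = (raw - mean) / sd
    return -z if higher_is_worse else z

def percentile_rank(z):
    """Percentile rank of z under the standard normal distribution."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Values from the worked example: 68 s observed, norms of 65.3 (8.5) s.
z = round(z_score(68.0, 65.3, 8.5, higher_is_worse=True), 2)
pct = percentile_rank(z)
print(f"z = {z:.2f}, percentile rank = {pct:.1f}")
```

The computed percentile rank falls between 37 and 38, in line with the 37th percentile read from Table 2.2.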
Performance across different tests, expressed in standard scores, is frequently compared in neuropsychology practice, to identify strengths and weaknesses across cognitive domains. This approach, traditionally viewed as the "pattern analysis," should be used with caution as the probability of obtaining abnormal scores in an intact individual increases
Table 2.3. Converting Percentiles to Performance Levels
Percentile: ≥98; 91-97; 75-90; 25-74; 9-24; 2-8
between age as a predictor variable and the test means as an outcome variable. R² indicates the proportion of variance in the test scores accounted for by the model. It should be noted that we used R² rather than adjusted R² (which corrects for chance variation) since we had only one predictor for a relatively large number of observations in each model and, therefore, both values are very close. The F value and a corresponding probability level indicate how reliably predictor variables predict test scores. The strength of the relationship between age and the test means is reflected in the dispersion of data points around the regression line. A scatterplot illustrating the dispersion is included in each relevant chapter, with the size of the bubbles reflecting the weight of the data point (larger bubbles indicate larger standard error [SE] and smaller weight).

Information for each term of the regression model is provided in the tables. The coefficient for a predictor variable indicates the extent of gain or loss in the test performance given a one-unit change in the value of that predictor variable (given that all other variables in the model are held constant). For example, for the FAS version of the Verbal Fluency test, the coefficient for the Education term is 0.498 (see Table 11m.1 in Appendix 11m, under "Effect of demographic variables"). This means that for each 1-year increment in education we expect a 0.498-unit increment in word production. In other words, with every additional year of schooling, an individual is expected to increase verbal fluency by almost 0.5 of a word, irrespective of age. The 95% CI for the coefficient shows how high and how low the actual population value of the coefficient might be. Dividing the coefficient by the SE for that parameter yields the t value, which is used in testing the null hypothesis that the coefficient for a given term is 0. In our example, a t value of 2.47 with a two-tailed p = 0.025 indicates that the coefficient for Education of 0.498 in a model based on 29 observations is significantly different from 0; thus, we can infer that education significantly contributes to test performance. This information was used in the tables to provide a correction factor for predicted test scores for different levels of education.
It should be noted that significance tests for the term age in the quadratic equations do not accurately reflect the linear effect of age on test performance due to collinearity with the quadratic effect, i.e., with the age² term in the equation. To address the linear effect of age, avoiding the collinearity, we present significance tests for age centered (by subtracting the mean age for the aggregate sample from the mean age for individual samples) in the footnotes to the relevant summaries of the regression models.
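Why centering helps can be seen in a short Python sketch. It is an illustration only: the ages are the category centers listed for the FAS in this chapter, used here as stand-ins for sample mean ages, and their unweighted mean stands in for the mean age of the aggregate sample (both are assumptions). The helper `pearson` is a hypothetical name.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Stand-in sample mean ages (assumption: category centers from the text).
ages = [19.0, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 62.5, 67.5, 72.5]

# Center each age on the (unweighted) mean of the set.
mean_age = sum(ages) / len(ages)
centered = [a - mean_age for a in ages]

# Correlation between the linear and quadratic terms, before and after.
r_raw = pearson(ages, [a * a for a in ages])
r_centered = pearson(centered, [c * c for c in centered])
print(f"age vs age^2: r = {r_raw:.3f}")
print(f"centered age vs centered age^2: r = {r_centered:.3f}")
```

With uncentered ages the linear and quadratic terms are almost perfectly correlated, whereas after centering the correlation is close to zero, so the test of the linear term is no longer confounded with the quadratic term.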
Prediction
The model that was estimated using the regression command was used to make out-of-sample predictions on another data set, which included values for age distributed in 5-year increments (with smaller intervals at the extremes of the age distribution in some cases), representing mathematical centers of respective age categories. For example, for the FAS, the data set includes values 19.0, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 62.5, 67.5, and 72.5. These numbers represent the age categories 18-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, and 70-74. Care was taken to avoid out-of-range estimates. For example, when the available data extended only to the age of 82, the 80-84 category was not used in the prediction table because an assumption that the same rule applies to ages 83 and 84 would not have an empirical basis. In some cases, when a partial age group was well represented, a predicted value for this age group is listed. It should be noted that distributions of the data used for model estimation were examined for continuity, to avoid gaps within the distribution. When such gaps were detected, the extreme data points were excluded from analyses.

Tables of predicted values with corresponding CIs for the relevant neuropsychological tests are presented in the meta-analytic tables in the appendices along with supporting statistics. Critical reviews of strengths and limitations of predicted values are included in the text of the respective chapters. In clinical practice, situations might arise where an estimated score is needed for an age that falls beyond the range of age categories included in a prediction table. We strongly recommend that the clinician seek the needed data in individual studies included in this book (using locator tables to facilitate the search) or turn to the data accumulated in that specific clinic.
However, if everything else fails, the needed score can be calculated using the regression equations included in the tables, which underlie calculations of the predicted scores. The equations are based on the coefficients for all predictor variables used by the program. The equation for a linear model is as follows:

$$\text{Predicted test score} = \text{constant} + \beta_{\text{age}} \times \text{age}$$

That for a quadratic model is as follows:

$$\text{Predicted test score} = \text{constant} + \beta_{\text{age}} \times \text{age} + \beta_{\text{age}^2} \times \text{age}^2$$

where $\beta_{\text{age}}$ is the coefficient for age and $\beta_{\text{age}^2}$ is the coefficient for age², respectively. For example, a quadratic equation derived from the model estimation for the FAS is 34.298 + 0.554 × age − 0.007 × age² (see Table 11m.1 in Appendix 11m). The equations are provided below the prediction tables; coefficients used in these equations are listed among the results of the analyses provided in the tables, specifically under the subtitle "Ordinary least square regression." As reflected in the shape of the regression line in the scatterplot in Appendix 11m, FAS performance is expected to increase somewhat up to approximately age 40, with a subsequent decline. The value for age when performance reaches its maximum can be derived from the regression equation using the following formula:

$$\text{age}_{\max} = -\frac{\beta_{\text{age}}}{2 \times \beta_{\text{age}^2}}$$

Using coefficients from the regression equation for the FAS, this value is −[0.554 / (2 × (−0.007))] = 39.57. The obtained value represents the age at which the curve turns over to the declining direction.
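The calculations described in this section can be sketched in Python as follows. The function names are hypothetical, and the coefficients are the rounded FAS values quoted above, so the predicted scores may differ slightly from the published prediction tables. The category center is computed by treating a category such as 20-24 as spanning ages 20 up to (but not including) 25, which reproduces the centers listed in the text.

```python
def category_center(low, high):
    """Mathematical center of an age category covering low..high years,
    treated as the interval [low, high + 1)."""
    return (low + high + 1) / 2.0

def predict_quadratic(age, constant, b_age, b_age2):
    """Predicted test score from a quadratic regression on age."""
    return constant + b_age * age + b_age2 * age ** 2

def age_at_maximum(b_age, b_age2):
    """Age at which the quadratic curve reaches its turning point."""
    return -b_age / (2.0 * b_age2)

# Rounded FAS coefficients quoted in the text.
CONSTANT, B_AGE, B_AGE2 = 34.298, 0.554, -0.007

for low, high in [(18, 19), (20, 24), (35, 39), (70, 74)]:
    center = category_center(low, high)
    score = predict_quadratic(center, CONSTANT, B_AGE, B_AGE2)
    print(f"ages {low}-{high}: center {center:.1f}, predicted {score:.2f}")

peak = age_at_maximum(B_AGE, B_AGE2)
print(f"performance peaks at about age {peak:.2f}")
```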
Standard Deviations
To test for a possible relationship between the variability in scores at different ages, regressions of SDs for test scores on age were run. When age accounts for a significant amount of variability in the SD, the predicted values for SDs and CIs (calculated using the same approach as above) are reported along with the predicted test scores. The results of significance tests for regressions on SDs are reported. Tests for model fit for the solutions on SDs were performed using the same approach as for the performance scores. The results of these tests were used for decision-making purposes, but they are not presented in the meta-analytic tables in the appendices, to avoid information overload. When the results suggested that age does not account for any notable amount of variability in SD, as reflected in a very low R², mean SDs derived from the original data are listed in the tables as they are applicable for all age groups.
Testing Model Fit and Parameter Specifications
Postestimation tests of parameter specifications were performed to ensure accuracy of the prediction. Though violation of the normality of the residuals would not affect estimates of regression coefficients and predicted values, it would affect the validity of hypothesis testing; in other words, significant deviation from normality would affect the validity of p values for the t-test and F-test. The Shapiro-Wilk W test was used to assess the normality of residuals for the variables used in the regressions. The p value for the W statistic is based on the assumption that the distribution is normal. Thus, high values of p indicate that we cannot reject the hypothesis that the variable is normally distributed. The normality of residuals was also assessed using the "kdensity" plot (Kernel Density Estimate), which approximates the probability density of a variable, and through visual inspection of residuals regressed on age. Close approximation of the estimated curve to the normal density overlaid on the plot and no pattern in the dispersion of residuals support the results of the Shapiro-Wilk test. Kernel Density Estimate and plot of the residuals regressed on age for the FAS are reproduced in Figures 3.3 and 3.4 for illustration purposes (the size of the bubbles in Fig. 3.4 reflects the size of the SEs of the data points, reciprocal to their weights). However, they are not included in the meta-analytic tables in the appendices.

Figure 3.3. Kernel Density Estimate, which compares the estimated curve to the normal density (data for the Verbal Fluency-FAS test were used).

Figure 3.4. Plot of residuals regressed on age (data for the Verbal Fluency-FAS test were used). The size of the bubbles reflects the size of the standard errors of the data points, reciprocal to their weights.

Homoscedasticity, or homogeneity of variances of the residuals, is one of the main assumptions of the regression analysis. We used White's general test for heteroscedasticity, which regresses the squared residuals on all distinct regressors, cross-products, and squares of regressors. It tests the null hypothesis that the variance of the residuals is homogeneous. Low values of the derived Lagrange multiplier statistic and high values of p indicate that we cannot reject the hypothesis of homogeneity of variance in the residuals. A dispersion of residuals plotted vs. fitted values on the residual-vs.-fitted plot (rvf plot) was visually inspected for each regression. In a model with a good fit and homogeneous residual variances, this distribution should have no pattern. An rvf plot for the FAS is included (see Fig. 3.5) to illustrate this technique. However, these plots are not reproduced in the meta-analytic tables in the appendices.

Figure 3.5. Residual-vs.-fitted plot (data for the Verbal Fluency-FAS test were used).

Independence assumption refers to the expectation that errors associated with one observation are not correlated with errors associated with any other observation. Our data clearly do not meet this assumption because the data points derived from the same study (e.g., when the scores are stratified by age group) are likely to be related and subjected to the same source of error. To account for the lack of independence, we used the cluster option for model estimation, which specifies that observations are independent across studies (clusters) but not within studies.
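The logic of White's test can be illustrated with a minimal Python sketch for a model with a single regressor (age). This is not the procedure used for the analyses in this book: the data are hypothetical, the fit is unweighted and ignores clustering, and all function names are assumptions. The squared residuals are regressed on age and age², the Lagrange multiplier statistic is n × R² from that auxiliary regression, and with two auxiliary regressors the chi-square p value reduces to exp(−LM/2).

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(columns, y):
    """Least squares with an intercept; returns (residuals, R-squared)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve_linear(xtx, xty)
    resid = [y[i] - sum(b * v for b, v in zip(beta, X[i])) for i in range(n)]
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return resid, 1.0 - sum(e * e for e in resid) / ss_tot

def white_test(x, y):
    """White's test for a one-regressor model; returns (LM, p value)."""
    mean_x = sum(x) / len(x)
    xc = [v - mean_x for v in x]  # centering leaves R-squared unchanged
    resid, _ = ols([xc], y)
    squared = [e * e for e in resid]
    _, r2_aux = ols([xc, [v * v for v in xc]], squared)
    lm = len(y) * r2_aux
    return lm, math.exp(-lm / 2.0)  # chi-square survival, 2 df

# Hypothetical study means by mean age (not data from this book).
ages = [19.0, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 62.5, 67.5, 72.5]
scores = [40.1, 42.3, 41.8, 44.0, 43.2, 44.9, 43.0, 42.1, 41.5, 39.8, 38.9, 36.5]
lm, p_value = white_test(ages, scores)
print(f"LM = {lm:.3f}, p = {p_value:.3f}")
```

A low LM value with a high p value would be read, as described above, as no evidence against homogeneity of the residual variances.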
Effect of Demographic Variables
The effect of education was explored with the "metareg" command, which yields an estimated between-study variance tau², measuring residual heterogeneity adjusted for covariates. The value of the tau² estimate was compared for regressions of test means on additive components of variance with and without education. If the tau² value for the regression with education was much lower, indicating that education explains a considerable amount of heterogeneity in test performance, education was entered as a predictor variable into the equation used for the model estimation. If R² considerably improved as a result of addition of the education term and the t value for education was high with a low p value, the coefficient for education, derived from the latter regression, was used as a correction factor in the tables for relevant tests.

Where education accounts for a large proportion of variance in test performance, the predicted scores listed in the age-stratified tables are accurate for individuals with education at the mean level for the original data set. With every year of education above or below the mean, expected gains or losses in test performance are equal to the coefficient for the education term. For example, values listed in the prediction table for the FAS are accurate for individuals with approximately 14 years of education since the mean education across all samples for this test is 14.31 (see Table 11m.1 in Appendix 11m, under "Description of the aggregate sample"). Thus, an expected score for a 37-year-old individual with 12 years of education (2 years below the mean of 14 cited above) is 45.17 − 2(0.50) = 44.17.

Correction tables provided for the FAS and the Trailmaking Test (TMT) parts A and B allow such adjustments by adding or subtracting the appropriate correction factor to/from the predicted scores provided in the prediction tables. The SD to be used with the education-corrected score is that for the person's actual age group (which is relevant to the TMT tables but not to the FAS as the same SD is used for all age ranges for the latter). It should be noted that the range of years of education for the correction tables is limited. This limitation is due to a lack of empirical data for individuals with lower levels of education in the studies reviewed. We do not know whether the pattern of education/test performance relationship linearly continues into the lower educational levels. Therefore, extrapolation of the suggested correction pattern onto educational levels falling below the empirically supported range might undermine the accuracy of the prediction.

The effect of gender on test performance was assessed by adding a variable accounting for a percent of males in the sample as a predictor variable into a regression of test means on age. In addition, a t-test was run on the data that are reported for males and females separately. Male/female differences in mean test scores are reported in the tables, where appropriate. If a sufficient number of studies for a specific neuropsychological test report the data stratified by gender and a significant relationship between gender and test scores is highlighted in the literature, age-predicted scores are presented for males and females separately (e.g., GPT). For a number of neuropsychological tests, the differences between genders were not large enough to warrant separate prediction tables or addition of a correction factor for gender.

Although it is widely known that intelligence level makes a considerable contribution to performance on certain tests, we could not provide corrections for IQ level because of the paucity of reported data on IQ in the samples aggregated for the analyses in this book. Similarly, the volume of information on ethnicity or other demographic variables gathered from the studies reviewed was not sufficient to conduct statistical analyses.
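The education correction described in this section amounts to a one-line adjustment, sketched below in Python with the FAS values quoted in the text (predicted score 45.17, mean education rounded to 14 years, coefficient rounded to 0.50). The function name is hypothetical, and the sketch assumes the linear education effect holds only within the range of education covered by the correction tables.

```python
def education_corrected_score(predicted, education_years,
                              mean_education, coef_education):
    """Adjust an age-predicted score for years of education above or
    below the mean education of the aggregate sample."""
    return predicted + coef_education * (education_years - mean_education)

# FAS example from the text: 37-year-old with 12 years of education.
corrected = education_corrected_score(
    predicted=45.17, education_years=12, mean_education=14, coef_education=0.50
)
print(f"education-corrected expected score: {corrected:.2f}")
```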
Comments on the Applicability of the Meta-Analyses Presented in This Book
As discussed earlier, an advantage of any meta-analysis is in increased power to direct informed clinical decisions based on synthesis of empirical data. Data derived from individual studies underlying our meta-analyses might be biased by imperfect sampling procedures, random individual differences due to small sample sizes in each demographic cell, and deviations from standard administration procedures. In addition, they are setting-specific and contain data for limited age ranges or demographic categories. Thus, choosing a normative data set as a reference for a specific patient might become a time-consuming undertaking. Meta-analytically derived regression estimates are based on large aggregate samples and represent the mathematical center of all studies across demographic groups. As such, regression-based tables of normative data are relatively free of chance factors affecting individual studies. However, regression-based norms should not be used as a substitute for empirically derived tables presented in the context of study reviews. Any averaging results in a loss of specific qualities. We intended to present corrections for variables that are in theory expected to affect test performance. However, we were limited by the data available in the literature, which in many cases seem to be at odds with the theory. Individual data sets based on a sample of participants who are similar in terms of setting, demographic characteristics, and/or functional level to the patient for whom normative comparisons are sought would provide more accurate estimates of expected performance than regression-based tables. It has been emphasized by a number of investigators (e.g., Heaton et al., 1986; Kalechstein et al., 1998; Ross & Lichtenberg, 1998; Van Gorp & McMullen, 1997) that a selection of the normative data set should be guided by the comparability of the patient's demographic characteristics to those of the normative data and, more specifically, by the moderating variable that is most likely to affect performance (e.g., age for tests tapping psychomotor
speed, education for tests emphasizing verbal achievement). Regression-based predictions are best considered an aid in selecting an appropriate table when results from different studies yield contradictory values. Regression-based norms have been criticized (Fastenau, 1998; Fastenau & Adams, 1996; Heaton et al., 1996; Morgan & Caccappolo-van Vliet, 2001; Moses et al., 1999). Major criticisms refer to the concerns of violation of assumptions undermining the accuracy of prediction and extrapolation of the rule summarizing the relationship between predictor and outcome variables to the ranges of the predictor variable that are not supported by available data. As follows from the above description of the procedures for heterogeneity and parameter specification testing applied in our analyses, the issue of violation of assumptions was closely attended to. In addition, all predicted values fall strictly within the empirically supported ranges of the predictor variables. In spite of these efforts, the scope and quality of our analyses are limited by the scope of the data available in the literature. The accuracy of regression solutions presented in this book is undermined by several factors:
1. Age groupings provided in the literature vary greatly between studies. Whereas for one study the mean age of 48 years might represent a range of 45-50, in another study the mean age of 48 represents a range of 20-86. The performance score reported in the latter study is much less meaningful in terms of age-referenced prediction than in the former. This situation was mitigated by weighting data points on SEs for the means as this weighting takes into account the dispersion around the mean.
2. Evaluation of the effects of demographic variables on estimates of test performance was limited by scarcity of demographic data provided in the literature. For example, an important variable such as IQ, which is expected to contribute significantly to variability in several neuropsychological tests, had very
limited variance across the data sets. Only a few studies reported IQ.
3. Levels of education and IQ for the majority of data sets are high. Therefore, the predicted values overestimate expected performance for individuals with a high school education or below and with average or lower than average range of intelligence.
4. We cannot describe our aggregate sample in terms of ethnic distribution because of scarcity of information on participants' ethnicity in the individual articles. We believe that the underlying samples are not representative of the mixture of ethnic groups according to U.S. Census figures since many samples were dominated by Caucasian participants. Those data that were collected exclusively on representatives of specific ethnic groups (e.g., Chinese, African American, or Hispanic) were not included in the meta-analyses as they increase the heterogeneity of the data. Ideally, separate analyses on data for different ethnic groups should be conducted in the future, provided that a sufficient number of studies reporting normative data specifically for different ethnic groups are generated.
5. Increments in the values of predictor or moderator variables extracted from the literature are uneven. As reflected in scatterplots depicting the distribution of data points around the regression line
for each relevant neuropsychological test, available data seem to cluster at the young and advanced ages, with more scarce data points in-between. Further investigations are needed to assure consistency in the relationship between predictor and outcome variables across all ages. However, large gaps in the ranges of predictor or moderator variables were avoided by eliminating extreme scores from the analyses. As a consequence of such adherence to empirically supported data, ranges of demographic categories covered in prediction tables are restricted; e.g., age groups are limited from both ends, and lower levels of education are not represented.
6. The suggested predictions for age (and education in a few cases) are based on the data for largely intact samples. It is unknown if the same relationship between demographic variables and test performance holds for individuals with brain pathology. Ultimately, normative databases should be expanded to include meta-analyses based on various clinical samples across test batteries, to acquire information on expected performance profiles for different diagnostic categories.
In spite of the weaknesses addressed above, we hope that the predictions presented in this book will facilitate the process of clinical decision making, which encompasses historical, clinical, and psychometric information.
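The weighting mentioned in point 1 above can be illustrated with a minimal sketch. It assumes inverse-variance weights (1/SE²) and a simple straight-line relationship between a study's mean age and its mean score; the weighting scheme and model form actually used for the tables in this book may differ. The function name and the data points are hypothetical and serve only to show how an imprecise study mean contributes less to the fitted line.

```python
def weighted_linear_fit(ages, means, ses):
    """Weighted least-squares line: score = intercept + slope * age.

    Assumption: each study mean is weighted by 1 / SE**2, so means with
    large standard errors (e.g., very wide age ranges) count for less.
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    # Weighted centroids of the predictor and the outcome
    age_bar = sum(w * a for w, a in zip(weights, ages)) / w_sum
    mean_bar = sum(w * m for w, m in zip(weights, means)) / w_sum
    # Weighted cross-product and sum of squares around the centroids
    sxy = sum(w * (a - age_bar) * (m - mean_bar)
              for w, a, m in zip(weights, ages, means))
    sxx = sum(w * (a - age_bar) ** 2 for w, a in zip(weights, ages))
    slope = sxy / sxx
    intercept = mean_bar - slope * age_bar
    return intercept, slope


# Hypothetical study-level data (not taken from any study reviewed here):
# mean age, mean completion time in seconds, and SE of that mean.
ages = [25.0, 48.0, 48.0, 70.0]
means = [25.0, 32.0, 45.0, 42.0]
ses = [1.0, 1.5, 6.0, 2.0]

intercept, slope = weighted_linear_fit(ages, means, ses)
print(f"predicted time at age 60: {intercept + slope * 60:.1f} sec")
```

In this sketch the second study with a mean age of 48 has a large SE, so it pulls the line far less than the first one, which is the intent of the weighting described above.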
II TESTS OF ATTENTION AND CONCENTRATION: VISUAL AND AUDITORY
4 Trailmaking Test
BRIEF HISTORY OF THE TEST
The Trailmaking Test (TMT) is included in the Halstead-Reitan Battery (HRB) and was originally part of the Army Individual Test Battery (1944). Part A is an 8" x 11" page on which the numbers 1-25 are scattered within circles. The patient is instructed to draw lines connecting the numbers in order as quickly as possible. Part B is a page with the numbers 1-13 and letters A-L within circles. The patient is instructed to draw lines connecting the numbers and letters in order, alternating between numbers and letters (e.g., 1-A-2-B, etc.). Specific administration procedures are provided in Reitan's (1979) Manual for Administration of Neuropsychological Test Batteries for Adults and Children. Two scores are obtained, reflecting the total time in seconds to complete each task. In the Reitan (1979) administration format, errors are not scored, but when they occur, the patient is alerted to the mistake and instructed to correct it, thus slowing overall performance time. The patient is presented with short sample items prior to the administration of each task. Detailed administration instructions are provided in Lezak et al. (2004) and Spreen and Strauss (1998). Charter et al. (1987) reported reliability coefficients expressed as correlations with the
alternate forms of the test in which the order of the progression was reversed but the locations of the circles were not altered. The resulting coefficients were 0.89 and 0.92 in the normal sample with over 300 participants and 0.95 and 0.94 in the mixed sample for Trails A and B, respectively. The standard errors of measurement were 8.05 and 21.7 for Trails A and B. Dikmen et al. (1999) reported test-retest reliabilities of 0.79 for Trails A and 0.89 for Trails B over a 9-month interval in a mixed sample. Data on repeated administration are also presented by McCaffrey et al. (2000). Reliability and validity of the TMT are further addressed in Franzen (2000), Lezak et al. (2004), and Spreen & Strauss (1998). A children's version of the tasks (age 9-14) is available, which incorporates fewer items. The TMT enjoys considerable popularity due to its high sensitivity to the presence of cognitive impairment. In addition, a number of studies document the usefulness of the TMT as a predictor of instrumental activities of daily living in the elderly (Cahn-Weiner et al., 2002) and of functional outcome following acquired brain injury (Acker & Davis, 1989; Millis et al., 1994; Ross et al., 1997; Schmidt et al., 1996). According to surveys of test usage in neuropsychology practice (Butler et al., 1991; Camara et al., 2000; Lees-Haley et al., 1996; Sellers and Nadler, 1992; Sullivan & Bowden, 1997), the
TMT is one of the most frequently used tests. The TMT is a standard component of screening batteries designed to detect cognitive impairment in different neuropsychological conditions. For example, in 1990, the TMT was adopted as a measure of cognitive impairment by the Drug Abuse Treatment Outcome Study (DATOS), sponsored by the National Institute on Drug Abuse of the National Institutes of Health. The DATOS was a naturalistic, prospective cohort study of adults enrolled in drug abuse treatment programs, which collected data on 10,010 adults in 96 programs across 11 cities in the United States between 1991 and 1993 (Horton & Roberts, 2003). The TMT data for a subsample of 8,521 adults were analyzed and presented by Horton and Roberts in a series of 19 articles published in the International Journal of Neuroscience between 2001 and 2003. The findings reflected in these publications point to significant effects of age, education, and ethnicity on many indices of TMT performance across various groups of drug users. However, the authors emphasized that these demographic effects are weak.
Contributions of Cognitive Mechanisms and Physical Layout Differences to Performance on Parts A and B
The TMT is described as a measure of visual conceptual and visuomotor tracking (Lezak et al., 2004); complex visual scanning with a motor component (Shum et al., 1990) with a contribution of motor speed and agility (Schear & Sato, 1989); simple motor-spatial skills and basic sequencing abilities (Lamberty et al., 1994); visual tracking, mental flexibility, and attention (Crowe, 1998b); visual perceptual abilities (Groff & Hubble, 1981); motor speed and visual attention (Gaudino et al., 1995); attention, simple motor and spatial skills, and sequencing abilities (Martin et al., 2003); and executive function (Burgess, 2003). Based on the results of a neuroimaging study exploring cognitive correlates of brain aging, Coffey et al. (2001) concluded that the neural substrates for the functions measured with the TMT part B involve multiple systems distributed throughout the brain. They attributed age-related slowing on part B to reduced
motor speed, impaired working memory, poor visual scanning, or a combination of several cognitive deficits. Factor analytic studies indicated that both parts A and B load on a visual perceptual factor (Groff & Hubble, 1981), a spatial factor (Moehle et al., 1990), a visuomotor scanning factor (Shum et al., 1990), a visuomotor speed and coordination factor (Swiercinsky, 1979), a motor problem-solving factor (Goldstein & Shelly, 1975), and a sustained attention and mental tracking factor (Lamar et al., 2002). Because of the complexity of mechanisms contributing to TMT performance, poor performance on this test is a nonspecific finding, which can be attributable to visual perceptual, motor, executive, motivational, or other factors (Anderson et al., 1995; Crowe, 1998b; Heilbronner et al., 1991; Iverson et al., 2002; Lezak et al., 2004; Lorig et al., 1986; Reitan & Wolfson, 1995b). To tease out a contribution of executive functioning to TMT performance, investigators turned to part B as a more complex measure requiring sequence alternation. According to the literature, several factors contribute to greater difficulty of part B in comparison to part A, which include cognitive demands and physical layout. Part B was found to place additional demands on the ability to alternate (Crowe, 1998b; Gaudino et al., 1995; Salthouse et al., 2000) and to flexibly modify a course of action (Arbuthnott & Frank, 2000; Kortte et al., 2002; Lamar et al., 2002; Lamberty et al., 1994; Pontius & Yudowitz, 1980) with a task-set inhibition component (Arbuthnott & Frank, 2000). Conversely, several investigators have identified additional demands on the ability to maintain two response sets simultaneously as the cognitive mechanism contributing to the greater difficulty of part B (Eson et al., 1978; Lezak et al., 2004; Reitan, 1971). Recent studies suggest that differences in physical layout further contribute to the greater difficulty of part B. 
Rossini and Karl (1994) reported that part B is 32% longer than part A. According to Gaudino et al. (1995), mean distances for parts A and B are 7.8 (3.2) and 10.2 (4.5) cm, respectively, which increases trail length for part B by 56 cm in
comparison to part A. In addition, analysis of visual interference indicated that part A has on average less than one visually interfering stimulus between each target, whereas part B averages more than one. Vickers et al. (1996) stated that the two parts of the TMT differ with respect to length and angular variability. However, there is no difference in structural complexity. Fossum et al. (1992) alternated the configural arrangement of test stimuli by placing stimuli of part A in the spatial configuration of part B and vice versa. The authors concluded that differences in symbolic complexity and spatial arrangement, as well as interactions between these factors, contribute to the greater difficulty of part B. Arnett and Labowitz (1995) developed a modified version, which used a standard layout of part B with numbers substituted for letters, thus eliminating the alternation component. The authors found that it takes about 1.4 times as long to complete part B relative to part A because of the more complex layout of part B. Based on analysis of differences in time to completion between parts A, B, and the new version, the authors related longer completion time for part B to three factors: a cognitive processing factor that is unique to part B, the more complex layout of part B, and a psychomotor-attentional factor that is common to both parts A and B.
Utility of the Derived Measures, which Are Based on Differences in Performance Times for Parts A and B
The above review suggests that TMT part B differs from part A in cognitive demands, length of trail, and perceptual complexity. However, studies that removed confounds of physical layout or visual complexity still documented a significant increase in time to completion with addition of an alternating component to the trailmaking condition (Crowe, 1998a; Gaudino et al., 1995). This suggests that sequence alternation places additional demands on executive function beyond the confounds of physical layout, which accounts for the increase in time to completion on part B. This assumption is further supported by the report that part B loads on an attention factor
(O'Donnell et al., 1994) and by findings of clinical studies indicating that samples of patients with frontal lobe damage or traumatic brain injury demonstrate lower performance on part B than normal samples or clinical samples with intact frontal lobes (Cicerone & Azulay, 2002; Corrigan & Hinkeldey, 1987; Pontius & Yudowitz, 1980; Reitan, 1971; Stuss et al., 2001). In contrast, Cicerone (1997) demonstrated low sensitivity of Part B to mild traumatic brain injury. Similarly, Anderson et al. (1995) and Reitan and Wolfson (1995a) did not find the TMT useful in detecting frontal lobe damage. These contradictory findings might be explained by differences in the severity of pathology in the study samples or by differences in the anatomy of the affected frontal regions (dorsolateral convexities vs. medial or basal-orbital regions). Several investigators have examined the relationship between performance on the two parts of the TMT, using the B-A difference and B:A ratio, in an attempt to identify an increment in time associated with the additional processing demands imposed by part B. Golden (1981) examined the properties of the B:A ratio and found that it has a curvilinear relationship with impairment. Both high and low ratios may indicate neuropsychological impairment, with a ratio score lower than 2 being indicative of deficient performance on part A and a ratio score greater than 3 reflecting deficient performance on part B. Heaton et al. (1985) recommended use of the B-A difference as a measure of cognitive efficiency. Corrigan and Hinkeldey (1987) found the difference and the ratio measures to be sensitive to the increased cognitive demands of part B. They recommended use of the ratio as an intrasubject comparative index, allowing one to control for individual variability. Lamberty et al. 
(1994) demonstrated the usefulness of the ratio measure as an index of cognitive flexibility controlling for intrasubject variability as it is relatively free of age and education confounds. The authors concluded that the ratio measure is most useful in a screening evaluation when strong diagnostic information is not available, in the context of age and education confounds. They also described its usefulness in the forensic context as
"fakers" are expected to have a smaller ratio than brain-damaged individuals. The authors suggested that ratios of 2.0-2.5 represent normative performance, with 3.0 being identified as a cutoff for the presence of neuropsychological impairment. The usefulness of this cutoff is supported by Arbuthnott and Frank (2000), who found an especially large cost for alternating switches in participants with a B:A ratio greater than 3.0. However, Drane et al. (2002) concluded that rates of false-positive misclassifications for the 3.0 cutoff were unacceptably high in their sample of normal adults, especially for older age groups. In addition, Martin et al. (2003) did not find the B:A ratio to be sensitive to the severity of traumatic brain injury. It also failed to identify examinees who were dissimulating, according to independent psychometric indicators. The authors concluded that the ratio measure does not enhance the clinical utility of the TMT in individuals with traumatic brain injury. Axelrod et al. (2000a) examined the sensitivity and specificity of different cutoffs for the ratio measure based on the Hebrew version of the TMT. They recommended use of a more conservative cutoff when performance is considered to be pathological.
Utility of the Error Analysis
The diagnostic utility of the rate of performance errors on the TMT was examined in several studies. Rasmusson et al. (1998) reported an increase in error rate on part B, but not on part A, with each decade of life in their sample of nondemented participants over the age of 60. However, there was no longitudinal change in the error rate on a 2-year follow-up. In the demented sample, dementia status was significantly associated with the proportion of participants making errors on both parts A and B, independent of age. Stuss et al. (2001) found error analysis to be more useful than time to completion in distinguishing between patients with frontal lobe injuries and those with damage to nonfrontal areas or normal controls. All patients who made more than one error on part B had frontal lesions. Further division of the frontal
patients into subgroups, based on the number of errors, indicated that damage in dorsolateral frontal areas was associated with the greatest degree of impairment, whereas damage to the inferior medial aspects of the frontal lobes did not significantly affect performance. Steffens et al. (2001) reported greater frequency of subsequent errors after controlling for the overall initial error rate in a sample of geriatric depressed patients in comparison to a control elderly sample. The authors interpreted this finding as a performance feedback deficit in geriatric depression which is linked to a dysfunction of the orbital frontal cortex. Ruffolo et al. (2000) investigated the diagnostic utility of TMT errors by comparing error rates in two head-injured groups (varying in degree of injury severity), patients with suspect effort, controls, and experimental malingerers. No differences in error rates between both head-injured and control groups were found. However, error rates on part B were significantly higher in both malingering groups in comparison to the head-injured and control groups. The authors concluded that performance errors lack diagnostic utility for persons with head injuries but that they may be helpful in the assessment of malingering if used in conjunction with time to completion. To explore the specific cognitive mechanisms contributing to poor performance on the TMT in clinical samples, several investigators have examined the frequency of different categories of errors. Klusman et al. (1989) categorized errors on part B into shifting (from number to letter and from letter to number) and sequencing (number and letter) errors. The authors found that neither error type nor frequency nor percentage of individuals making errors differed significantly between head-injured and control groups. McCaffrey et al. 
(1989) identified two types of errors: sequential errors, which involve omission of the consecutive element in the series, and perseverative errors, indicating a failure to alternate between categories, with the latter type being applicable only to part B. Stability of both types of error between two administrations over a 7-10 day period was evaluated by the authors on a sample of
polysubstance users. The error analyses revealed a significant improvement in performance from the test to retest. The authors interpreted this improvement as a practice effect and pointed to the questionable utility of the error rates as indicators of stable central nervous system (CNS) dysfunction in polysubstance users. Amieva et al. (1998) used similar error categories in an investigation of cognitive failures contributing to low TMT performance in Dementia of Alzheimer's Type (DAT). Every transition between two items in their demented and elderly control samples was examined, and errors were further broken down into subcategories. The sequential errors (SE) category, identified by the authors as a failure to efficiently pursue the letter or the number series by omitting an item, was further subdivided into proximity SE (characterized by spatial proximity), SE to rectify (an attempt to move back and rectify an initial error), displacement SE (displacement of the subsequent sequence), and unexplained SE (not falling under any of the above descriptions). The perseverative errors (PE) category was conceptualized as a failure to alternate between the series of numbers and letters. The results yielded a significantly greater frequency of proximity SEs and PEs in the patient group, which was interpreted as an inhibitory deficit largely accounting for poor TMT performance in demented patients. Thus, review of the literature suggests that error analysis is not sensitive to cognitive deficits associated with head injury or polysubstance use. However, analysis of frequency and type of errors might be diagnostically useful in dementia and might contribute to the detection of localized frontal lobe dysfunction, suboptimal effort, and age-related decline in performance accuracy when used in conjunction with time to completion. Further research is needed to replicate these findings and to investigate the usefulness of these indices in other clinical settings.
Utility of the Cutoffs for Impairment
The manual for the Army Individual Test (1944) provides a 10-point scale for converting raw scores, with 10 being the best score and 1
the worst. Reitan (1958) initially recommended use of this scaling method and suggested cutoffs for impaired performance based on the scaled scores. Given the significant association between TMT performance and age, IQ, education, and possibly gender, use of single cutoff scores does not appear to be appropriate, as has been confirmed by several studies. Bornstein et al. (1987b) found that, using a cutoff of ≥40 seconds on part A and ≥92 seconds on part B, 33% and 39% of a healthy elderly sample were misclassified as brain-damaged. Ernst (1987) found that use of cutoffs ≥39 and ≥92 seconds resulted in a misclassification rate of 48% for both parts A and B. Bak and Greene (1980) reported that at least 40% of their elderly normal participants were misclassified on part B. Dodrill (1987) documented misclassification rates of 11.7% and 13.3% when using cutoffs of ≥39 seconds on part A and ≥89 seconds on part B, respectively, in a young control sample. Bornstein and colleagues (1987b) noted that 96% and 98% correct classification rates for parts A and B, respectively, were obtained when cutoffs of ≥55 seconds and ≥137 seconds were used (but 46% and 40% of brain-damaged participants were then misclassified with these cutoffs). The authors emphasized that cutoff scores may be useful but only if considered in the context of other neuropsychological information obtained in a test battery and if age, education, and other appropriate adjustments are made. Cahn et al. (1995) used cutoffs of ≥66 and ≥172 seconds for parts A and B, respectively, in a study comparing DAT patients with a large sample of neurologically normal individuals. They reported sensitivity and specificity indices of 69% and 90% for part A and 87% and 88% for part B. The authors underscored the diagnostic effectiveness of part B, which was one of the few measures contributing to optimal differentiation between DAT and control participants. The part B cutoff of 172 seconds was used by Rasmusson et al. 
(1998) in distinguishing between nondemented elderly and those participants who met criteria for DAT. Obtained sensitivity and specificity indices of 77% and 89.4% support the usefulness of this cutoff.
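The sensitivity and specificity indices cited above can be made concrete with a short sketch. Sensitivity is the proportion of patients correctly classified as impaired, and specificity is the proportion of normal participants correctly classified as unimpaired. The sketch assumes that a completion time at or above the cutoff counts as impaired; the function name and all completion times are hypothetical and are not drawn from any of the studies cited.

```python
def sensitivity_specificity(patient_times, control_times, cutoff_sec):
    """Return (sensitivity, specificity) for a completion-time cutoff.

    Assumption: a time >= cutoff_sec is classified as impaired.
    """
    # Patients at or above the cutoff are correctly flagged (true positives)
    true_pos = sum(1 for t in patient_times if t >= cutoff_sec)
    # Controls below the cutoff are correctly cleared (true negatives)
    true_neg = sum(1 for t in control_times if t < cutoff_sec)
    return true_pos / len(patient_times), true_neg / len(control_times)


# Hypothetical part B completion times in seconds
patients = [200, 180, 150, 300]
controls = [90, 100, 175, 120, 80]

sens, spec = sensitivity_specificity(patients, controls, cutoff_sec=172)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Raising the cutoff in such a calculation trades sensitivity for specificity, which is the pattern reported by Bornstein and colleagues (1987b) above.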
The utility of the cutoff scores was further emphasized by Soukup et al. (1998), who recommended reporting cutoff scores that represent borderline (15th percentile) and defective (< 5th percentile) performance in addition to the means and standard deviations in future studies, to offset problems associated with the positive skew in the distribution of TMT scores.
Effect of the Order of Presentation and Practice Time, Practice Effect, and Alternate Versions of the TMT
The effect of the order of presentation on performance on parts A and B was examined by Taylor (1998a) in a sample of patients with neurological disorders and by Miner and Ferraro (1998) in a sample of undergraduate students. Both studies revealed a significant time x order interaction, with times to completion being lower for part A and higher for part B for the reverse order of presentation. Taylor (1998a) explained this trend in terms of a slight effect of practice in visual scanning and noted that part B can be used in isolation as omission of part A will not lead to serious distortion of part B performance. Thompson et al. (1999) examined the utility of practice times in predicting success or failure on the full version of the test. The authors presented tables of classification ac[...]
[...] between diastolic blood pressure at age 50 and performance on the TMT part B 20 years later in a population-based study conducted in Sweden on a sample of 502 men.
METHOD FOR EVALUATING THE NORMATIVE REPORTS
To adequately evaluate the TMT normative reports, six key criterion variables were deemed critical. The first five of these relate to subject variables, and the remaining dimension refers to a procedural issue. Minimal requirements for meeting the criterion variables were as follows.
Subject Variables
Sample Size
Fifty cases are considered a desirable sample size. Although this criterion is somewhat arbitrary, a large number of studies suggest that data based on small sample sizes are highly influenced by individual differences and do not provide a reliable estimate of the population mean.
Sample Composition Description
Information regarding medical and psychiatric exclusion criteria is important. It is unclear if gender, geographic recruitment region, socioeconomic status, occupation, ethnicity, handedness, or recruitment procedures are relevant. Until this is determined, it is best that this information be provided.
Age Group Intervals
This criterion refers to grouping of the data into limited age intervals. This requirement is especially relevant for this test since a strong effect of age on TMT performance has been demonstrated in the literature.
Reporting of Education Levels
Given the association between education and TMT scores, information regarding educational level should be reported for each subgroup, and preferably normative data should be presented by educational levels.
Reporting of Intellectual Levels
Given the relationship between TMT performance and IQ, information regarding intellectual level should be reported for each subgroup, and preferably normative data should be presented by IQ levels.
Procedural Variables
Data Reporting
Means and standard deviations, and preferably ranges, for total time in seconds for each part of the TMT should be reported. Given the demonstrated utility of the B-A difference, B:A ratio, and error analysis with some clinical groups, reporting of means and SDs for these indices would facilitate interpretation of the results.
SUMMARY OF THE STATUS OF THE NORMS
Our review of the literature located TMT normative reports for adults, as well as three interpretive guides for the HRB (Gilandas et al., 1984; Golden et al., 1981b; Reitan & Wolfson, 1985). Hundreds of other studies have also reported control subject data, and we have included a discussion of those investigations based on well-defined samples that involved some unique features, such as large sample size, retest data, elderly population, cutoff score analysis, reporting of derived measures, error analysis, etc. It should be noted that Russell and Starkey (1993) developed the Halstead-Russell Neuropsychological Evaluation System (HRNES), which includes the TMT among 22 tests. In the context of this system, individual performance is compared to that of 576 brain-damaged participants and 200 participants who were initially suspected of having brain damage but had negative neurological findings. Data were partitioned into seven age groups and three educational/IQ levels. The authors published an appendix to the manual (HRNES-R; Russell and Starkey, 2001), which contains tables of scale scores based on the original HRNES norms, demographic corrections, and regression-based predicted scores. These data will not be reviewed in this chapter because the "normal" group consisted of the Veterans Administration patients who presented with symptoms requiring neuropsychological evaluation. For further discussion of the HRNES system, see Lezak et al. (2004, pp. 676-677). There is a great deal of variability in the methodological aspects of studies summarized in this chapter. Sample sizes vary from 19 to over 700. Age represented in the studies varies from 15 to over 90 years. Sample compositions have been diverse and have included neurologically normal individuals (according to stringent exclusion criteria), job applicants,
medical/psychiatric patients, V.A. inpatients and outpatients, and homosexual/bisexual males. Similar concerns regarding variability between studies were raised by Soukup et al. (1998). In addition, some investigators set the maximum time for both parts A and B, which varies from 180 to 300 seconds. The majority of studies report mean age, education, and gender distribution for the sample and/or for the age groups. Some studies report WAIS-R IQs or estimated intelligence level, handedness, occupational level, and ethnic composition. Many studies present data divided into age groups. Few studies classify participants into education or IQ groups or present data for males and females separately; few studies report data for males only or present data in age by education by gender cells. Geographical origin of the data also varies widely: British, Australian, and Canadian data sets are presented in this chapter. Data for other cultural groups are also available in the literature (see above). The data are most commonly reported as time to completion for parts A and B. Some studies present raw data converted to T scores, error rate, percentile ranks, median time, total time for parts A and B, B-A difference, and B:A ratio. One study provides regression equations to correct raw data for age and education. Few studies present classification rates for different cutoff criteria. Test-retest data are reported in some studies with intertrial intervals ranging from 1 week to 24 weeks (in some studies, which are not reviewed for the purposes of data collection, up to 2 years). Issues of reliability and/or practice effect are discussed in these studies. Given that use of the TMT has typically been within the context of the HRB, the Reitan data and interpretation recommendations will be reported first, followed by a summary of the other interpretation formats and then by the normative publications and control data from clinical studies, presented in ascending chronological order. 
The text of study descriptions contains references to the corresponding tables identified by number in Appendix 4. Table A4.1, the locator table, summarizes information provided in the studies described in this chapter.¹
SUMMARIES OF THE STUDIES
Reitan and Wolfson, 1985
The authors provided general guidelines for TMT score interpretation in the form of test completion times (in seconds), which correspond to "severity ranges" for part B only:
0-60 sec: perfectly normal (or better than average)
61-72 sec: normal
73-105 sec: mildly impaired
≥106 sec: seriously impaired
No other information was provided, such as score means, SDs, or any data regarding the normative sample on which these guidelines were developed. These cutoffs represent a substantial departure from cutoffs published earlier; the definition of normal performance here is approximately 20 seconds less than in the 1958 and 1979 guidelines.
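The severity ranges above amount to a simple lookup, sketched here in Python. The function name is hypothetical, and the handling of fractional times is an assumption: the guidelines are stated in whole seconds, so a value such as 60.5 falls into the next band in this sketch.

```python
def part_b_severity(seconds):
    """Map a part B completion time (seconds) to the severity range listed above."""
    if seconds < 0:
        raise ValueError("completion time cannot be negative")
    if seconds <= 60:
        return "perfectly normal (or better than average)"
    if seconds <= 72:
        return "normal"
    if seconds <= 105:
        return "mildly impaired"
    return "seriously impaired"  # 106 seconds or longer
```

Because no normative sample is described for these guidelines, such a lookup says nothing about age, IQ, or education effects.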
Considerations regarding use of the study
The authors argued that these norms were meant as "general guidelines" and that "exact percentile ranks corresponding with each possible score are hardly necessary because the other methods of inference are used to supplement normative data in clinical interpretation of results of individual participants" (p. 97). However, we maintain that more precise scores as well as separate normative data for different age, IQ, and educational levels are necessary to avoid false-positive errors in diagnosis.
Gilandas, Touyz, Beumont, and Greenberg, 1984 (p. 102)
The authors provided the percentile ranks associated with Davies' (1968) TMT normative data and concluded that a percentile rank of 25 is "mildly suggestive of brain damage" and scores at the 10th percentile and lower are "moderately suggestive of brain damage."
¹Norms for children are available in Baron (2004) and Spreen and Strauss (1998).
Golden, Osmon, Moses, and Berg, 1981b (pp. 22-23)
The authors provided recommendations regarding the detection of laterality of brain damage: part A is generally considered more a measure of right hemisphere integrity (i.e., visual scanning skills), where part B is more indicative of left hemisphere intactness (i.e., language symbol manipulation and direction of behavior according to a complex plan). Therefore when one part indicates impairment relative to the other part, a lateralized injury may be present. . . . Part A is considered to indicate greater impairment if the score on part B is less than twice the score on part A. Part B indicates greater impairment if its score is more than three times the score on part A. Tests in which the part B score lies between two times and three times the part A score suggest that performances on the two parts are essentially equal. However, lateralizing properties of performance time ratios for two conditions have been repeatedly refuted in the literature (Hom & Reitan, 1990; Salthouse et al., 1996).
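The arithmetic of the ratio rule quoted above can be sketched as follows. This only illustrates how the rule partitions B:A ratios; as noted, its lateralizing value has been refuted, so it should not be read as a recommended procedure. The function name and return labels are hypothetical.

```python
def ratio_rule(part_a_seconds, part_b_seconds):
    """Apply the quoted two-times/three-times rule to raw times in seconds."""
    if part_a_seconds <= 0 or part_b_seconds <= 0:
        raise ValueError("completion times must be positive")
    ratio = part_b_seconds / part_a_seconds
    if ratio < 2:
        # Part B took less than twice as long as part A
        return "part A relatively more impaired"
    if ratio > 3:
        # Part B took more than three times as long as part A
        return "part B relatively more impaired"
    # Ratio between two and three (inclusive in this sketch)
    return "essentially equal"
```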
[TMT.1] Davies, 1968 (Table A4.2)
The author published TMT data on 540 British participants as a part of her investigation of the influence of age on TMT performance. Test scores were obtained on 50 men and 40 women in each of six decade age groups. The reference Davies cited as containing a further description of her subject sample could not be located. Mean times in seconds corresponding to 10th, 25th, 50th, 75th, and 90th percentile ranks for parts A and B are provided for each age decade, with the exception that the data on the participants in their 20s and 30s were collapsed. Davies also reports optimal cutoff points for young vs. middle-aged individuals. No significant gender differences were observed within any specific decade, although in the group as a whole men performed slightly but significantly more quickly on part B.
Study strengths
1. Presentation of the data in 10- or 20-year age intervals.
2. Very large sample size, large Ns within each age subgroup, and fairly equal representation of males and females.
Considerations regarding use of the study
1. Lack of IQ and education data or description of exclusion criteria.
2. Lack of test score SDs.
3. Tested in England, which may limit generalizability for clinical interpretive purposes in the United States.
[TMT.2] Goul and Brown, 1970 (Table A4.3)
The authors tested 103 (or 106) Canadian workers' compensation board non-brain-injured patients who had been hospitalized for at least 3 months. These data were collected as a part of the authors' analysis of the effects of age and intelligence on TMT performance. Participants had negative neurological histories and included amputees, burn victims, and patients with lumbosacral fusions. Educational levels ranged from 6 to 13 years of formal schooling; no means are reported. Participants were classified into five age groups: 20-29, 30-39, 40-49, 50-59, and 60-72. Individual group sizes ranged from 15 to 26. Mean (SD) WAIS FSIQs for the five groups were 103.8 (12.1), 110.1 (8.9), 105.3 (7.9), 112.7 (8.6), and 104.2 (12.2), respectively. TMT parts A and B data are presented in terms of mean time in seconds, SDs, ranges, medians, and recommended cutoff scores for the five age groups. Performance declined significantly with age. Contrary to expectations, IQ was significantly positively correlated with TMT scores.
Study strengths
1. Presentation of the data by age groups.
2. Information on mean IQ and SD for each age group.
3. Information on educational level and geographic area provided.
4. Means and SDs are reported.
Considerations regarding use of the study
1. Participants were medical patients with extensive hospitalizations.
2. Lack of data regarding education means.
3. Some variability in IQs across age groups.
4. Small sample sizes in the upper age ranges.
5. Data were collected in Canada, raising questions regarding their usefulness for clinical interpretation in the United States.
[TMT.3] Wiens and Matarazzo, 1977 (Table A4.4)
The authors collected TMT data on 48 male applicants to a patrolman program in Portland, Oregon, as a part of an investigation of the WAIS and Minnesota Multiphasic Personality Inventory (MMPI) correlates of the Halstead-Reitan battery. All participants passed a medical exam and were judged to be neurologically normal. Participants were divided into two equal groups, which were comparable in age (23.6 vs. 24.8 years), education (13.7 vs. 14.0 years), and WAIS FSIQ (117.5 vs. 118.3). TMT mean time in seconds and SDs are provided for each group. A random subsample of 29 of the applicants was readministered the TMT 14-24 weeks following the original administration. Means and SDs for TMT times in seconds for both the original testing and retest are reported. None of the 29 original participants obtained scores lower than Reitan's suggested cutoff for part B; however, one subject fell below the recommended cutoff for part A on the second administration. Correlations between test performance and IQ scores were not meaningful. In the first group, significant negative correlations were obtained between part B performance and FSIQ and VIQ, but no significant correlations were obtained between the second control group and IQ measures. In group 1, a significant negative correlation between part A and FSIQ and a significant positive correlation between part A and PIQ were documented; again, no significant correlations were obtained between part A scores and IQ measures in the second control group.
Study strengths
1. Demographic characteristics of the sample are presented in terms of gender, age, education, IQ, recruitment procedures, and geographic area.
2. Adequate medical exclusion criteria.
3. Means and SDs are reported.
4. The data are provided in a restricted age range.
5. Information on test-retest performance is provided.
Considerations regarding use of the study
1. High IQ level.
2. Relatively small sample size.
3. All-male sample.
[TMT.4] Eson, Yen, and Bourke, Personal Communication (Table A4.5)
The authors collected normative data for the TMT on a sample of 63 older participants. Mean time in seconds and SDs are provided for four age groups, with mean ages of 63.2, 67.0, 72.0, and 78.3 years. Sample sizes for each age group range between 15 and 16. No other information is provided, such as exclusion criteria or demographic data.
Study strengths
1. Data on an elderly sample are provided and stratified by age group.
2. Means and SDs are reported.
Considerations regarding use of the study
1. No reported exclusion criteria or other demographic, IQ, or geographic data.
2. Age range and SDs for each group are not reported.
3. Relatively low sample sizes.
[TMT.5] Harley, Leuthold, Matthews, and Bergs, 1980 (Table A4.6)
The authors collected TMT data on 193 V.A.-hospitalized patients in Wisconsin, ranging in age from 55 to 79. Exclusion criteria included FSIQ less than 80, active psychosis, unequivocal neurological disease or brain damage, and serious visual or auditory acuity problems. Patients with chronic brain syndrome were included. Patient diagnoses were as follows: chronic brain syndrome unrelated to alcoholism (28%), psychosis (55%), alcoholism (37%), neurosis (9%), and personality disorder (4%). Mean educational level was 8.8 years. The sample was divided into five age groups: 55-59 (n = 56), 60-64 (n = 45), 65-69 (n = 35), 70-74 (n = 37), and 75-79 years (n = 20). Mean educational level and percent included in each of the diagnostic classifications are reported for each age group. The authors also provide test data on a subgroup of 160 participants equated for percent diagnosed with alcoholism across the five age groups. The "alcohol-equated sample" was developed "to minimize the influence that cognitive or motor/sensory differences uniquely attributable to alcohol abuse might have upon group test performance levels" (p. 2). This subsample remained heterogeneous regarding representation of the other diagnostic categories. Mean time in seconds, SD, and ranges are reported for parts A and B for each age interval for the whole sample and for the alcohol-equated sample.
Study strengths
1. Large sample size, with some individual cells of approximately n = 50.
2. Reporting of IQ data, geographic area, age, and education.
3. Data presented in age groupings.
4. Means and SDs are reported.
Considerations regarding use of the study
1. The presence of substantial neurological (chronic brain syndrome), substance abuse, and major psychiatric disorders in the sample.
2. Low educational level, though IQ levels are average.
3. No information regarding gender, but given the V.A. setting, it is likely that most or all of the sample was male.
Other comments
The scores for the two oldest age groups are identical in the whole sample and the alcohol-equated group because these two groups did not have overrepresentation of alcoholics and, thus, did not need to be adjusted.
[TMT.6] Anthony, Heaton, and Lehman, 1980 (Table A4.7)
The purpose of the study was to cross-validate two computer programs designed to determine the presence, location, and process of brain lesions using scores from the HRB and the WAIS. Patients with structural brain lesions and normal controls were compared. The control group consisted of 100 volunteers with no medical or psychiatric problems and no history of head trauma, brain disease, or substance abuse. The study was conducted in Colorado. TMT data are presented in terms of mean times in seconds and SDs for part B only.
Study strengths
1. Information regarding education, IQ, age, and geographic area is provided.
2. Large sample size.
3. Adequate exclusion criteria.
4. Means and SDs are reported.
Considerations regarding use of the study
1. Undifferentiated age grouping.
2. The IQ range is high average.
3. No information is available regarding the gender ratio.
4. Data are provided for part B only.
[TMT.7] Bak and Greene, 1980 (Table A4.8)
The authors gathered TMT data on 30 right-handed Texan participants as a part of an investigation of the effect of age on performance on the HRB and the Wechsler Memory Scale. Participants were equally divided into two age groupings: 50-62 and 67-86. Participants were fluent in English and denied a history of CNS disorders, uncorrected sensory deficits, or illnesses or "incapacities" which might affect test results; participants in poor health were excluded. The mean (SD) ages of the two groups were 55.6 (4.44) and 74.9 (6.04), respectively. Participants in the first group were born between 1916 and 1929, and participants in the second group were born between 1892 and 1912. Nine individuals in the first group were female, and 10 participants in the second group were female. Four WAIS subtests were administered (Information, Arithmetic, Block Design, Digit Symbol); the mean scores on these measures suggested that IQ levels were within the high average range or higher. Mean times in seconds and SDs for parts A and B are presented for the two age groups. Significant differences in performance were documented between the two groups on both parts of the test.
Study strengths
1. The study provides data on a very elderly cohort not found in other published normative data.
2. Adequate exclusion criteria.
3. Sample composition is well described in terms of age, gender, education, fluency in English, handedness, and geographic area.
4. Means and SDs are reported.
Considerations regarding use of the study
1. Sample sizes are small.
2. High IQ and educational level for the older age grouping.
3. The older age grouping spans nearly two decades and may be too broad for optimal clinical interpretive use.
[TMT.8] Kennedy, 1981 (Table A4.9)
The author collected TMT data on 150 Canadian participants as a part of his analysis of the effects of age on TMT performance. Participants were employees of a mental health center "who represented diverse work roles" randomly selected from five age groups: 20-29, 30-39, 40-49, 50-59, and 60-69. Participants were excluded who reported histories of "central nervous system disorders, illnesses, or incapacities which would bias test results;" exclusion criteria were not further specified. Mean education was 13.73, 13.53, 13.11, 11.59, and 12.50 years, respectively; those 50-59 years old were significantly less educated than those 20-29 or 30-39 years old. The Ammons Quick Test was used as an estimate of intelligence level; average estimates for the five groups were 123.43, 127.10, 127.40, 123.30, and 128.54, respectively. Males and females were equally represented in each group. The mean time in seconds and SDs for parts A, B, and A + B for each group are provided. Performance decreased significantly with age, and significant negative correlations between TMT test scores and education and IQ suggest that lower education and IQ are adversely related to test performance.
Study strengths
1. Large sample size, although the individual cells had only 30 participants per cell.
2. Presentation of the data in terms of age groupings.
3. Reporting of education, IQ estimates, gender, and geographic area.
4. Means and SDs are reported.
Considerations regarding use of the study
1. Very high mean intelligence scores.
2. Some variability in educational level across groups, which may have led to some unusual findings; inexplicably, those 60-69 years old performed either as well as or slightly better than those 50-59 years old.
3. Vague exclusion criteria.
4. Lack of reference to ethnicity/language issues and the fact that data were obtained on Canadians, possibly reducing its generalizability for clinical interpretation in the United States.
[TMT.9] Fromm-Auch and Yeudall, 1983 (Table A4.10)
The authors obtained TMT data on 193 Canadian participants (111 male, 82 female) recruited through posted advertisements and personal contacts. Participants are described as "nonpsychiatric" and "nonneurological." Eighty-three percent of the sample were right-handed. Mean (SD) age was 25.4 (8.2) years (range = 15-64). Mean (SD) education was 14.8 (3.0) years (range = 8-26) and included technical and university training. Mean (SD) WAIS FSIQ, VIQ, and PIQ were 119.1 (8.8, range = 98-142), 119.8 (9.9, range = 95-143), and 115.6 (9.8, range = 89-146), respectively. Of note, no subject obtained an FSIQ which was lower than the average range. Mean time in seconds, SDs, and ranges for parts A and B are reported for five age groupings: 15-17, 18-23, 24-32, 33-40, and 41-64 years. Sample sizes range from 10 to 75. The two oldest age groupings had sample sizes less than 20. No gender differences were documented, and male and female data were collapsed.
Study strengths
1. The large overall sample size.
2. Data are partitioned into five age groups.
3. Sample composition is described in terms of IQ, educational level, age, gender, handedness, recruitment procedures, and geographic area.
4. Some psychiatric and neurological exclusion criteria are used.
5. Means and SDs are reported.
Considerations regarding use of the study
1. High intellectual and educational levels of the sample.
2. Sample size for some age groups is very small.
3. Data were collected in Canada, which may limit their usefulness for clinical interpretation in the United States.
4. Essentially no differences in performance were noted between those 18-23 years old and those 24-32 years old, suggesting that use of a single age grouping for 18-32 would have been appropriate.
[TMT.10] Bornstein, 1985 (Table A4.11)
The author collected data on 365 Canadian individuals (178 males and 187 females) recruited through posted notices on college campuses and unemployment offices, newspaper ads, and senior-citizen groups. Participants were paid for their participation. Participants ranged in age from 18 to 69 years, with a mean of 43.3 (17.1) years, and had completed 5-20 years of education, with a mean of 12.3 (2.7) years. Ninety-one and a half percent of the sample were right-handed. No other demographic data or exclusion criteria are reported. Mean time in seconds and SDs for parts A and B are reported for three age groupings (20-39, 40-59, and 60-69 years), two educational levels (less than high school, greater than or equal to high school), and gender, resulting in a total of 12 separate groups. Individual group sample sizes ranged from 13 to 86. Significant correlations were obtained between TMT scores and age and education, suggesting that better performance was associated with younger age and more years of education. Females generally outperformed males on both parts A and B.
Study strengths
1. Very large overall sample size.
2. Data are stratified by age, gender, and educational level.
3. This data set is unique in that it reports data for participants with less than a high school education.
4. Information on handedness, recruitment procedures, and geographic area is provided.
5. Means and SDs are reported.
Considerations regarding use of the study
1. Individual sample sizes of some cells are small.
2. Lack of any reported exclusion criteria.
3. Data were collected on Canadian citizens, which may limit generalizability for their use in the United States.
4. Lack of IQ data. The concern over the lack of IQ data is somewhat mitigated by the fact that the mean education level was not unduly elevated (12.3 years), which might suggest that mean intellectual levels were within the average range.
[TMT.11] Heaton, Grant, and Matthews, 1986 (Table A4.12)
The authors obtained TMT data on 553 normal controls in Colorado, California, and Wisconsin as a part of an investigation into the effects of age, education, and gender on HRB performance. Nearly two-thirds of the sample were male (males = 356, females = 197). Exclusion criteria were history of neurological illness, significant head trauma, and substance abuse. Participants ranged in age from 15 to 81 years, with a mean of 39.3 (17.5) years, and mean education was 13.3 (3.4) years, with a range of 0-20 years. The sample was divided into three age categories (less than 40, 40-59, and greater than or equal to 60 years) with 319, 134, and 100 participants, respectively, and into three education categories (less than 12 years, 12-15 years, and greater than or equal to 16 years) with 132, 249, and 172 participants, respectively.
Testing was conducted by trained technicians, and all participants were judged to have expended their best effort on the task. The TMT mean time in seconds for part B is reported for the six subgroups, as well as percent classified as normal using Russell et al.'s (1970) criteria. Approximately 30% of the test score variance was accounted for by age and approximately 20% was associated with education level. Significant group differences in TMT scores were found across the three age groups and across the three education groups, and a significant age-by-education interaction was documented. No significant differences in performance were found between males and females.
Study strengths
1. Large size of overall sample and individual cells.
2. Information regarding age, education, gender, handedness, and geographic area is provided.
3. Adequate exclusion criteria.
4. Data are grouped by age and educational level.
Considerations regarding use of the study
1. No reporting of data for part A.
2. SDs are not provided.
3. Mean scores are reported for individual WAIS subtest scaled scores but not for overall IQ scores.
4. Age groupings are quite large in terms of ranges.
[TMT.12] Alekoumbides, Charter, Adkins, and Seacat, 1987 (Table A4.13)
The authors report data on 118 medical and psychiatric inpatients and outpatients without cerebral lesions or histories of alcoholism or cerebral contusion from V.A. hospitals in southern California as a part of their development of standardized scores corrected for age and education for the HRB. Among the 41 psychiatric patients, nine were diagnosed as psychotic and 32 as neurotic. In addition to psychiatry services, patients were drawn from medicine (n = 57), neurology (n = 22), spinal cord injury (n = 9), and surgery (n = 6) units. Mean age was 46.85 (17.17) years, ranging from 19 to 82 years, and mean education was 11.43 (3.20) years, ranging from 1 to 20 years. Frequency distributions for age and years of education are provided. Mean WAIS FSIQ, VIQ, and PIQ were within the average range: 105.89 (13.47), 107.03 (14.38), and 103.31 (13.02), respectively. Means and SDs for individual age-corrected subtest scores are also reported. All participants except one were male; the majority were Caucasian (93%), with 7% African-American. The mean score on a measure of occupational attainment was 11.29. No differences were found in test performance between the two psychiatric groups and the nonpsychiatric group, and the data were collapsed. Mean times to complete parts A and B in seconds and SDs are reported. In addition, regression equation information to allow correction of raw scores for age and education is included.
Study strengths
1. Large sample size.
2. Information regarding IQ, age, education, ethnicity, gender, occupational attainment, and geographic area is provided.
3. Regression equation for computation of age- and education-corrected scores is provided.
4. Means and SDs are reported.
Considerations regarding use of the study
1. The sample was heterogeneous in terms of medical diagnoses; psychiatric patients were included in this sample, which was supposedly representative of "normal" participants.
2. Undifferentiated age range (mitigated by the regression equation information).
3. Nearly all-male sample.
[TMT.13] Bornstein, Baker, and Douglass, 1987a (Table A4.14)
The authors collected TMT test-retest data on 23 volunteers (14 women, nine men) who ranged in age from 17 to 52, with a mean age of 32.3 (10.3), as part of an examination of the short-term retest reliability of the HRB. Exclusion criteria consisted of a positive history of neurological or psychiatric illness. Mean Verbal IQ was 105.8 (10.8), ranging from 88 to 128, and mean Performance IQ was 105.0 (10.5), ranging from 85 to 121. Participants were administered the HRB in standard order both on initial testing and again 3 weeks later. Means, SDs, and ranges for time in seconds to complete parts A and B for both testing sessions are provided, as well as raw score change and SD, median raw score change, and mean percent of change. For part A, no significant correlations between mean change and age or education or between mean percent of change and age or education were documented. For part B, no significant correlations between mean change and age or education or between mean percent change and education were found; however, a significant correlation did emerge between mean percent of change and age.
Study strengths
1. Information on short-term (3-week) retest data is provided.
2. Sample composition is described in terms of age, VIQ, PIQ, and gender.
3. Minimally adequate exclusion criteria.
4. Means, SDs, and ranges are reported.
Considerations regarding use of the study
1. Undifferentiated age range.
2. Small sample size.
3. No data on educational level.
[TMT.14] Dodrill, 1987 (Table A4.15)
The author collected TMT data on 120 participants in Washington during the years 1975-1976 (n = 81) and 1986-1987 (n = 39). Half of the sample was female, and 10% were minorities (six black, three Native American, two Asian American, one unknown). Eighteen were left-handed, and occupational status included 45 students, 37 employed, 26 unemployed, 11 homemakers, and one retiree. Participants were recruited from various sources, including schools, churches, employment agencies, and community service agencies, and either paid for their participation or offered an interpretation of their abilities. Exclusion criteria were history of "neurologically relevant disease (such as meningitis or encephalitis);" alcoholism; birth complications "of likely neurological significance;" oxygen deprivation; peripheral nervous system injury; psychotic or psychosis-like disorders; or head injury associated with unconsciousness, skull fracture, persisting neurological signs, or diagnosis of concussion or contusion. Of note, one-third of potential participants failed to meet the above medical and psychiatric criteria, resulting in a final sample of 120. Mean age was 27.73 (11.04) years, and mean education was 12.28 (2.18) years. The participants tested in the 1970s were administered the WAIS, whereas the participants assessed in the 1980s were administered the WAIS-R; WAIS scores were converted to WAIS-R equivalents by subtracting 7 points from the VIQ, PIQ, and FSIQ. Mean FSIQ, VIQ, and PIQ scores were 100.00 (14.35), 100.92 (14.73), and 98.25 (13.39), respectively. The IQ scores ranged from 60 to 138 and reflected a normal distribution. Mean time in seconds and SDs for parts A and B are reported as well as IQ-equivalent scores for various levels of intelligence. Between 10% and 15% of the sample were misclassified as brain-damaged using cutoffs of 39 seconds for part A and 89 seconds for part B.
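The WAIS-to-WAIS-R conversion described above is a fixed 7-point subtraction applied to each IQ score. A minimal Python sketch follows; the function and constant names are hypothetical, and the input values in the usage note are illustrative rather than taken from the study.

```python
# Offset described in the text: 7 points subtracted from VIQ, PIQ, and FSIQ
WAIS_TO_WAIS_R_OFFSET = 7


def wais_to_wais_r(viq, piq, fsiq):
    """Convert WAIS VIQ, PIQ, and FSIQ to approximate WAIS-R equivalents."""
    return tuple(score - WAIS_TO_WAIS_R_OFFSET for score in (viq, piq, fsiq))
```

For instance, illustrative WAIS scores of 107, 105, and 107 would be carried forward as WAIS-R equivalents of 100, 98, and 100.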
Study strengths
1. Large sample size.
2. Comprehensive exclusion criteria.
3. Sample composition is described in terms of education, IQ, occupation, gender ratio, age, handedness, ethnicity, recruitment procedures, and geographic area.
4. IQ-equivalent scores are provided.
5. Data for different IQ levels are provided.
6. Means and SDs are reported.
Considerations regarding use of the study
1. Undifferentiated age range.
[TMT.15] Ernst, 1987 (Table A4.16)
The author obtained TMT data on 110 primarily Caucasian (99%) residents of Brisbane, Australia, aged 65-75. Fifty-nine were female and 51 were male, with a mean educational level of 10.3 years; men and women did not differ in years of education. Participants were recruited primarily through random selection from the Queensland State electoral roll (n = 97), with the remainder (n = 13) solicited through senior-citizen centers. Exclusion criteria were history of significant head trauma or neurological disease. Nearly one-half of the sample was diagnosed with at least one chronic disease (hypertension = 33, heart disease = 9, thyroid dysfunction = 7, asthma = 5, emphysema = 2, diabetes = 1) for which they were receiving treatment described as "well-controlled." Sixty-six of the participants were receiving medications, primarily for the diseases listed above. The test was administered according to Reitan's instructions. All participants were administered the TMT first, followed by either the Tactual Performance Test or Booklet Category Test. Using the standard cutoffs of 39 seconds and 92 seconds, 48% and 48% of all participants were misclassified as impaired for parts A and B, respectively. Gender differences were not significant; however, education was significantly related to performance on part B. There were no significant effects of chronic disease or medication intake.
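A misclassification rate of the kind reported above is the proportion of neurologically normal participants whose times exceed a cutoff. The Python sketch below shows the calculation. The function name and the sample times in the test are hypothetical, and treating a time strictly longer than the cutoff as "impaired" is an assumption about how the cutoffs are applied.

```python
def misclassification_rate(times_seconds, cutoff_seconds):
    """Proportion of normal participants whose time exceeds the cutoff.

    Assumption: a time strictly greater than the cutoff counts as impaired.
    """
    if not times_seconds:
        raise ValueError("at least one completion time is required")
    flagged = sum(1 for t in times_seconds if t > cutoff_seconds)
    return flagged / len(times_seconds)
```

Applied to a normative sample, a high rate indicates that the cutoff is too strict for that population, which is the concern raised for elderly participants here.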
Study strengths
1. Large sample size in a restricted age range.
2. Presentation of the data by gender.
3. Sample composition is described in terms of age, education, geographic recruitment area, recruitment procedures, and ethnicity.
4. Information regarding test administration order effects is provided.
5. Means, SDs, and error rates are reported.
Considerations regarding use of the study
1. Approximately half of the participants had at least one chronic illness, and over half were taking prescribed medications.
2. No information regarding IQ.
3. Low mean educational level.
4. Data were collected in Australia and may be unsuitable for clinical use in the United States.
[TMT.16] Stuss, Stethem, and Poirier, 1987 (Tables A4.17 and A4.18)
The authors collected normative data on 60 Canadian English- or French-speaking participants, who were recruited through personal contacts or employment agencies and paid for their participation. Tests were administered in each subject's native language. Participants were tested twice at 1-week intervals. Exclusion criteria were abnormal vision (even after correction); history of substance abuse; presence of medical, neurological, and/or psychiatric disorders; and current use of psychotropic medication (Stuss, personal communication). Ten participants were assigned to each of six age ranges: 16-19, 20-29, 30-39, 40-49, 50-59, and 60-69. Fifty-five percent of the sample were male, and 18% were left-handed. Mean education was 14.3 (2.62) years. Data are provided regarding handedness, gender distribution, and education. Mean time in seconds and SDs for the two parts of the TMT for the first, second, and combined testing sessions are reported for each age interval. Mean time and SDs are also provided for males, females, those with less than or equal to 12 years of education, and those with greater than 12 years of education, collapsed across age groupings. Older participants and those with a high school education or less performed significantly more poorly than younger participants or those with some college or university education. Educational level was somewhat irregularly distributed across age groups, and the authors suggest that the normative data be used with caution. A practice effect was present, but the authors question the clinical relevance of the improvement. No significant gender differences in performance were present.
Study strengths
1. Presentation of the data by age groupings, education groupings, and gender.
2. Extensive information on educational level.
3. Sample composition is described in terms of age, gender, handedness, geographic location, and recruitment procedures.
4. Adequate exclusion criteria.
5. Information regarding practice effect.
6. Means and SDs are reported.
Considerations regarding use of the study
1. Small sample sizes within each age group.
2. Variability in mean educational levels across age groups; of importance, those 50-59 years old had the lowest mean educational level, the lowest mean test scores, and the largest SDs relative to the other age groups.
3. Lack of IQ data.
4. Unknown influence of language differences.
5. Data were obtained in Canada and may be of limited usefulness for clinical interpretation in the United States.
[TMT.17] Yeudall, Reddon, Gill, and Stefanyk, 1987 (Table A4.19)
The authors obtained TMT data on 225 Canadian participants recruited from posted advertisements in workplaces and personal solicitations. The participants included meat packers, postal workers, transit employees, hospital lab technicians, secretaries, ward aides, student interns, student nurses, and summer students. In addition, high school teachers identified average students in grades 10-12 for participation. The participants (127 males and 98 females) did not report any history of forensic involvement, head injury, neurological insult, prenatal or birth complications, psychiatric problems, or substance abuse. Data were gathered by experienced testing technicians who "motivated the participants to achieve maximum performance" partially through the promise of detailed explanations of their test performance. Means and SDs for time in seconds to complete parts A and B are presented for four age groupings (15-20, 21-25, 26-30, and 31-40) for males and females combined and separately. Information regarding percent right-handers, mean years of education, and mean WAIS/WAIS-R FSIQ, VIQ, and PIQ is reported for each age grouping and age-by-gender grouping. For the sample as a whole, 88% were right-handed and had completed an average (SD) of 14.55 (2.78) years of schooling. The mean FSIQ, VIQ, and PIQ were 112.25 (9.83), 114.77 (10.34), and 108.50 (10.34), respectively.

Study strengths 1. Large sample size. 2. Grouping of data by age. 3. Data availability for a 15-20 year age group. 4. Adequate medical and psychiatric exclusion criteria. 5. Information regarding age, handedness, education, IQ, gender, occupation, recruitment procedures, and geographic area is provided. 6. Means and SDs are reported.

Considerations regarding use of the study 1. High educational level of the sample. 2. Data were obtained on Canadian participants, which may limit their usefulness for clinical interpretation in the United States due to possible subtle cultural differences.

Other comments No significant correlations were found between age or education and part A; a significant correlation emerged for age and part B (r = 0.27), but no significant relationship was documented between education and part B. Significant correlations emerged between parts A and B and PIQ but not VIQ. No significant gender differences were observed for part A or B. The authors recommend use of the combined age group norms for part A and the separate age-grouped norms for part B.

[TMT.18] Bornstein and Suga, 1988 (Table A4.20)
As part of their evaluation of the effect of educational level on neuropsychological test performance in the elderly, the authors report TMT data on 134 healthy elderly Canadian paid volunteers aged 55-70 according to three educational levels: ≤10 (n = 46), 11-12 (n = 44), and greater than 12 (n = 44) years. Nearly two-thirds of the sample were female (n = 85). The average (SD) age for the sample was 62.7 (4.3), and the mean ages of the three educational groups were comparable: 62.3, 62.9, and 63.0 years, respectively. Exclusion criteria were history of neurological or psychiatric disorder. Significant group differences in performance on both TMT A and B were obtained across the three education groups, which were due to the group with ≤10 years of education performing significantly worse than both of the other education groups (which did not differ from each other). Mean time in seconds and SDs for parts A and B are reported for the three education groups.

Study strengths 1. Large overall sample size and individual cell sizes are adequate. 2. Data are partitioned into three education groups; the study is unique in terms of representation of participants with less than 12 years of education. 3. Information regarding gender, age, and geographic area is provided. 4. Means and SDs are reported. 5. Minimally adequate exclusion criteria. 6. Reasonably restricted age grouping.

Considerations regarding use of the study 1. No information regarding IQ. 2. Greater than 12 years of education is too large a category. 3. Data collected in Canada, which may limit generalizability for use in the United States.

[TMT.19] Stuss, Stethem, and Pelchat, 1988 (Table A4.21)
In this publication, Stuss and colleagues expanded the data presented in Stuss et al. (1987). The size of the sample is increased, and the participants are collapsed into three age groupings of 30 participants each: 16-29, 30-49, and 50-69. Gender distribution was essentially equal across groups. Mean years of education for the youngest to oldest groups were 14.1 (1.34), with a range of 11-18; 14.9 (3.95), with a range of ~20; and 13.2 (2.38), with a range of 8-18, respectively (compare to TMT.16).
Mean time in seconds and SDs for the two parts on the initial test and retest 1 week later are reported for each age interval. The authors call attention to the skewness and lack of normal distribution of the test data, which they suggest have implications for test score interpretation.

Study strengths 1. Increased sample size per age interval. 2. Adequate exclusion criteria. 3. Information regarding age, education, gender, and handedness is provided. 4. Data regarding retest at 1-week intervals are provided. 5. Means and SDs are reported.

Considerations regarding use of the study 1. Considerations remain the same as for the initial report except for the improvement in sample size.

[TMT.20] Van Gorp, Satz, and Mitrushina, 1990 (Table A4.22)

The authors present TMT data for 156 healthy elderly participants ranging in age from 57 to 85, recruited from an independent-living retirement community in California. The data were collected as a part of their investigation of cognitive changes in normal aging. Information regarding general medical status was collected. Participants with a history of neurological or psychiatric disorder or substance abuse were excluded. Sixty-one percent of the sample were females. Mean education was 14.14 (2.86) years, and mean FSIQ (WAIS-R Satz-Mogel format) was 117.21 (12.59). Mean time in seconds and SDs to complete parts A and B were listed for the sample as a whole and for four age groups: 57-65, 66-70, 71-75, and 76-85. Sample sizes for each age group ranged from 26 to 57. Mean VIQ and PIQ and SDs for each age range are listed. Mean VIQs were consistently within the high average range, except for those 71-75 years old, who fell within the superior range. Mean PIQs were within the high average range for those 57-65 years old and 76-85 years old. Older participants (70 or older) did not differ significantly from younger participants (less than 70) in VIQ, PIQ, or years of education.

Study strengths 1. Large overall sample size. 2. Small age range within each grouping. 3. Adequate exclusion criteria. 4. Information on age, IQ, education, gender, and geographic recruitment area is provided. 5. Means and SDs are reported.

Considerations regarding use of the study 1. High intellectual level of the sample. 2. Relatively high educational level. 3. Inexplicably, part B performance was lower in those 66-70 years old relative to those 71-75 years old, and there appeared to be considerably more variation in performance in the 66-70 age grouping. Given that increasing age is associated with a worsening of performance, the data for the 66-70 year group are problematic.

[TMT.21] Heaton, Grant, and Matthews, 1991

The authors provided normative data on the TMT from 486 (378 in the base sample and 108 in the validation sample) urban and rural participants recruited in several U.S. states (California, Washington, Colorado, Texas, Oklahoma, Wisconsin, Illinois, Michigan, New York, Virginia, and Massachusetts) and Canada. Data were collected over a 15-year period through multicenter collaborative efforts. Sixty-five percent of the sample were males. Mean age for the total sample was 42.06 (16.8) years, and mean educational level was 13.6 (3.5) years. The majority of participants were administered the WAIS; mean FSIQ, VIQ, and PIQ were 113.8 (12.3), 113.9 (13.8), and 111.9 (11.6), respectively. Exclusion criteria were history of learning disability, neurological disease, illness affecting brain function, significant head trauma, significant psychiatric disturbance (e.g., schizophrenia), and alcohol or other substance abuse. The TMT was administered according to procedures outlined by Reitan and Wolfson (1985), with the exception that attempts to complete part B were limited to 10 minutes. In those situations when part B was
discontinued at 10 minutes, the time score was prorated by dividing 300 seconds by the number of items completed and then multiplying the resulting figure by 25. Participants were generally paid for their participation and judged to have provided their best efforts on the tasks. The normative data, which are not reproduced here, are presented in comprehensive tables in T-score equivalents for test scaled scores for males and females separately in 10 age groupings (20-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-80 years) by six educational groupings (6-8, 9-11, 12, 13-15, 16-17, ≥18 years). For part A, 30% of the score variance was accounted for by age, while 16% was attributable to educational level; gender accounted for a negligible amount of unique variance in performance (1%). A total of 35% of the test score variance was accounted for by demographic variables. For part B, 34% of the score variance was accounted for by age, while 27% was attributable to educational level; again, gender accounted for a negligible amount of unique variance (1%). A total of 45% of the test score variance was accounted for by demographic variables. For the sample as a whole, mean time in seconds for part A was 29.0 (12.5) and that for part B was 75.2 (42.8). The interested reader is referred to the Fastenau and Adams (1996) critique of the Heaton et al. (1991) norms and Heaton et al.'s (1996a) response to this critique. In 2004, the authors published revised norms, which are based on a sample of over 1,000 normal adults. In addition to age, education, and gender stratification, the data are partitioned by race/ethnicity (African-American and Caucasian).
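The proration rule for a discontinued part B described in this entry can be written out as a short calculation. The following Python sketch is illustrative only: the function name and argument names are hypothetical, and the 300-second figure and the multiplier of 25 are taken directly from the description above rather than from the original test materials.

```python
def prorate_part_b(items_completed: int,
                   numerator_s: float = 300.0,
                   total_items: int = 25) -> float:
    """Prorated part B time in seconds for a discontinued trial.

    Follows the rule as worded in the text: divide 300 seconds by the
    number of items completed, then multiply by 25. The defaults are
    assumptions copied from that wording.
    """
    if not 0 < items_completed <= total_items:
        raise ValueError("items_completed must be between 1 and total_items")
    seconds_per_item = numerator_s / items_completed
    return seconds_per_item * total_items


# A participant stopped after 20 items is credited with 375 seconds.
print(prorate_part_b(20))
```

The result is simply an extrapolation of the observed seconds-per-item rate to the full set of items, so it grows quickly as the number of completed items falls.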
Study strengths 1. Large sample size. 2. Comprehensive exclusion criteria. 3. Detailed description of the demographic characteristics of the sample in terms of age, education, IQ, geographic area, and gender. 4. Administration procedures are outlined.
5. The normative data are presented in comprehensive tables in T-score equivalents for males and females separately in 10 age groupings by six educational groupings.
Considerations regarding use of the study 1. Above average mean intellectual level (which is probably less of an issue given that these are WAIS rather than WAIS-R IQ data).
[TMT.22] Selnes, Jacobson, Machado, Becker, Wesch, Miller, Visscher, and McArthur, 1991 (Table A4.23)

The investigation used participants from the Multi-Center AIDS Cohort Study (MACS). The article presents data for seronegative homosexual and bisexual males collected in Los Angeles for the purpose of establishing normative data for neuropsychological test performance based on a large sample. Participants with a history of head injury with loss of consciousness greater than 1 hour and those who reported drinking 21 or more drinks per week in the previous 6 months were excluded. The majority of the sample consisted of Caucasian participants. African-American participants ranged from 3.4% to 4.1% for different age groups. Left-handers ranged from 11.3% to 14.9%.

Study strengths 1. The overall sample size and individual cell sizes are large. 2. Normative data are stratified by age and education. 3. The demographic composition of the sample is described in terms of age, gender, sexual orientation, handedness, ethnicity, and geographic area; demographic composition is described for each age and education cell separately. 4. Means and SDs, as well as scores for the 5th and 10th percentiles, are presented. 5. Minimally adequate exclusion criteria.
Considerations regarding use of the study 1. All-male sample. 2. No information on IQ is reported. 3. Very high educational level of the sample.
[TMT.23] Elias, Robbins, Walter, and Schultz, 1993 (Table A4.24)
The authors explored the influence of gender and age on performance on tests included in the HRB. The sample consisted of 427 community-dwelling volunteers. As per medical interview and self-report on the Cornell Medical Index, none of the participants had a history of treatment for neurological disorder, senility, alcoholism, brain trauma, mental illness, cerebral vascular or catastrophic disease, or a diagnosis of senile dementia. To achieve equivalence between age groups in terms of education, the lower and upper limits for education were set at 12 and 19 years, respectively. All participants had normal or corrected-to-normal vision. Occupations ranged from blue-collar to professional. Non-age-corrected WAIS Vocabulary scaled scores ranged from 13.9 to 14.7, and Information scores ranged from 13.2 to 13.7. Mean time in seconds and SDs to complete parts A and B were reported for six age groups (15-24, 25-34, 35-44, 45-54, 55-64, and ≥65) for males and females separately. The authors found significant linear trends across age cohorts for parts A and B.

Study strengths 1. Large overall sample and adequate sample size for individual cells. 2. The sample composition is well described in terms of age, education, gender, and WAIS Vocabulary and Information scaled scores. 3. Rigorous exclusion criteria. 4. Means and SDs for the test scores are reported.

Considerations regarding use of the study 1. Education and estimated intelligence level for the sample are high. 2. Age range for the oldest group is not reported.

[TMT.24] Cahn, Salmon, Butters, Wiederholt, Corey-Bloom, Edelstein, and Barrett-Connor, 1995 (Table A4.25)
The study examined the accuracy of neuropsychological measures at detecting Dementia of the Alzheimer's Type (DAT) in a community-dwelling elderly sample. The participants are stable, upper middle-class, retired older adults who entered the Rancho Bernardo Study, surveying for heart disease risk factors, between 1972 and 1974. The initial sample included 5,052 adults between 30 and 79 years of age, who have been followed until the present. Participants over the age of 65 who returned for a reexamination in 1988 and later and screened positive for cognitive impairment were seen in clinic for diagnostic purposes (n = 199). A matched control sample of 203 normal elderly participants who screened negative for cognitive impairment was randomly selected for the comprehensive evaluation, which included neurological examination, neuropsychological assessment, standard medical history and examination, and, in some cases, CT scans of the brain. On the basis of the diagnostic evaluation, the group composition was re-assessed. The final sample of normal elderly included 238 participants (97 males, 141 females), with a mean age of 78.4 (6.8), education of 13.8 (2.6), and Dementia Rating Scale (DRS) score of 136.8 (5.4). The TMT was administered as part of a larger battery by a trained psychometrist who was blind to the participants' group assignment. Time to completion was reported for the entire sample. In addition, the authors provided optimal cutoff scores and sensitivity/specificity of the TMT for the diagnosis of DAT: 69%/90% for part A at the cutoff of 66 seconds and 87%/88% for part B at the cutoff of 172 seconds.

Study strengths 1. Large sample size. 2. The sample composition is well described in terms of age, education, gender, DRS score, geographic area, history of the project, and recruitment procedures. 3. Rigorous exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported. 6. Sensitivity and specificity for optimal cutoff scores for the two parts of the test are reported.
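For readers who want to see how sensitivity and specificity at a completion-time cutoff, such as those reported in this study, are defined, the following Python sketch may help. It is not the authors' analysis: the function name is hypothetical, the completion times are invented for illustration, and treating a time at or above the cutoff as a positive screen is an assumption.

```python
def sensitivity_specificity(patient_times, control_times, cutoff_s):
    """Return (sensitivity, specificity) for a completion-time cutoff.

    Assumption: a time at or above cutoff_s counts as a positive screen.
    """
    true_positives = sum(t >= cutoff_s for t in patient_times)
    true_negatives = sum(t < cutoff_s for t in control_times)
    sensitivity = true_positives / len(patient_times)
    specificity = true_negatives / len(control_times)
    return sensitivity, specificity


# Invented part B times in seconds, for illustration only.
dat_times = [150, 180, 200, 240, 300]
normal_times = [90, 110, 130, 160, 175]
sens, spec = sensitivity_specificity(dat_times, normal_times, cutoff_s=172)
print(sens, spec)
```

Raising the cutoff generally trades sensitivity for specificity, which is why an "optimal" cutoff is specific to the sample in which it was derived.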
Considerations regarding use of the study 1. The data are not partitioned by age group. 2. No information on IQ is reported.

[TMT.25] Ivnik, Malec, Smith, Tangalos, and Petersen, 1996 (Table A4.26)
The study provides age-specific norms for the TMT obtained in Mayo's Older Americans Normative Studies (MOANS) projects, which aim at obtaining normative data for elderly individuals on different neuropsychological tests. The total sample consisted of 746 cognitively normal volunteers over age 55; however, only 359 volunteers participated in TMT testing. Mean MAYO FSIQ (which differs somewhat from standard WAIS-R FSIQ) for the whole sample was 106.2 (14.0) and mean Mayo General Memory Index on the WMS-R was 106.2 (14.2). For a description of their samples, the authors refer to their earlier publications. Participants were independently functioning, community-dwelling persons who were recently examined by a physician and had no active neurological or psychiatric disorder with the potential to impact cognition. Age categorization used the midpoint interval technique. The raw score distribution for each test at each midpoint age was "normalized" by assigning standard scores with a mean of 10 and SD of 3, based on actual percentile ranks. The authors provided tables of age-corrected norms for each age group. The procedure for clinical application of these data is described in the original article (Ivnik et al., 1996) as follows: first select the table that corresponds to that person's age. Enter the table with the test's raw score; do not use corrected or final scores for tests that might present their own age- or education-adjustments. Select the appropriate column in the table for that test. The corresponding row in the left-most column in each table provides the MOANS Age-Corrected Scaled Score . . . for your subject's raw score; the corresponding row in the right-most column indicates the percentile range for that same score.
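As a rough illustration of what "normalizing" a distribution to scaled scores with a mean of 10 and SD of 3 from percentile ranks can look like, the Python sketch below applies an inverse-normal transformation. This is an assumption about the general technique, not a reproduction of the MOANS tables; the function name is hypothetical, and clinical use requires the published tables.

```python
from statistics import NormalDist


def scaled_score_from_percentile(percentile_rank: float,
                                 mean: float = 10.0,
                                 sd: float = 3.0) -> int:
    """Map a percentile rank (0-100, exclusive) to a normalized scaled score.

    Assumption: the percentile is converted to a standard normal z value
    and rescaled to the given mean and SD, then rounded.
    """
    if not 0.0 < percentile_rank < 100.0:
        raise ValueError("percentile_rank must lie strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile_rank / 100.0)
    return round(mean + sd * z)


# The 50th percentile maps to the scale mean of 10.
print(scaled_score_from_percentile(50))
```

Because the mapping depends only on rank, it produces normally distributed scaled scores even when raw completion times are strongly skewed.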
Further, linear regressions should be applied to the normalized, age-corrected MOANS scaled scores (A-MSS) derived from the tables, to adjust the patient's score for education. Age- and education-corrected scores for the TMT (A&E-MSS) can be calculated as follows:

A&E-MSS(TMT) = K + (W1 * A-MSS(TMT)) - (W2 * Education)

where the following indices are specified for the two parts of the TMT:

          K      W1     W2
Part A    1.99   1.10   0.21
Part B    3.38   1.06   0.29
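The education adjustment can be sketched in Python as follows. The sketch assumes that the formula reads A&E-MSS = K + (W1 * A-MSS) - (W2 * Education) and that the three values listed for each part are K, W1, and W2 in that order; both readings are reconstructions from the printed layout and should be checked against Ivnik et al. (1996). The function and dictionary names are hypothetical.

```python
# Assumed order of the printed indices for each part: (K, W1, W2).
TMT_INDICES = {
    "A": (1.99, 1.10, 0.21),
    "B": (3.38, 1.06, 0.29),
}


def age_education_corrected_mss(part: str, a_mss: float,
                                education_years: float) -> float:
    """Compute A&E-MSS = K + (W1 * A-MSS) - (W2 * Education).

    a_mss is the age-corrected MOANS scaled score read from the tables;
    education_years is years of formal schooling.
    """
    k, w1, w2 = TMT_INDICES[part]
    return k + w1 * a_mss - w2 * education_years


# Example: an A-MSS of 10 with 12 years of education.
print(round(age_education_corrected_mss("A", 10, 12), 2))
print(round(age_education_corrected_mss("B", 10, 12), 2))
```

Under these assumptions, an average age-corrected score stays close to 10 at around 12 to 13 years of education and is adjusted downward for more highly educated examinees.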
Education should enter the formula in years of formal schooling. The tables of scaled scores per age group provided by the authors should be used in the context of the detailed procedures for their application, which are explained in Ivnik et al. (1996). Therefore, they are not reproduced in this book. Interested readers are referred to the original article. Table A4.26 in Appendix 4 summarizes sample sizes for different demographic groups. Study strengths 1. Information regarding age, education, gender, ethnicity, occupation, recruitment procedures, and geographic area is reported. 2. The data were stratified by age group based on midpoint interval technique. 3. The innovative scoring system was well described. The authors developed new indices of performance. 4. The sample sizes for most groups are large. 5. Restricted age range in each cell. Considerations regarding use of the study 1. The measures proposed by the authors are quite complicated and might be difficult to use in clinical practice. 2. Participants with prior history of neurological, psychiatric, or chronic medical illnesses were included.
Other comments 1. The theoretical assumptions underlying this normative project have been presented in Ivnik et al. (1992a,b). 2. The authors cautioned that the validity of the MAYO indices depends heavily on the match of demographic features of the individual to the normative sample presented in this article. 3. Correlations of parts A and B with age were 0.30 and 0.53, respectively, whereas correlations with education and gender were negligible.
[TMT.26] Richardson and Marottoli, 1996 (Tables A4.27 and A4.28)
The authors report data for 101 autonomously living elderly participants who comprise a subsample of a cohort of participants in Project Safety, a study on driving performance conducted in New Haven, Connecticut. Individuals with a history of neurological disease or excessive use of alcohol or those who were at risk for dementia based on MMSE scores were excluded. The sample includes 53 males and 48 females, with a mean age of 81.47 (3.30) years and mean education of 11.02 (3.68) years. Part B was administered and scored according to the standard instructions provided in the test manual. The data were divided into two age groups of younger-old (76-80 years) and older-old (81-91 years) and two education groups. The results indicated that the mean performance for participants with less than 12 years of education was stable across the younger-old and older-old age groups; however, it was considerably lower than for participants with ≥12 years of education and well below expectation in comparison to the Heaton et al. (1991) norms. For the participants with ≥12 years of education, performance for the younger-old age group was superior to that of the older-old and comparable to the norms published by Heaton et al. (1991).

Study strengths 1. Data for a relatively large sample of very elderly participants are presented. 2. Information on age, education, gender, and geographic location is reported. 3. Exclusion criteria are described. 4. The data are classified into two age groups by two education groups. 5. Means and SDs are reported.

Considerations regarding use of the study 1. Only part B of the test was administered. 2. No information on IQ is reported. 3. Sample sizes for each age-by-education cell are relatively small.
[TMT.27] Hoff, Riordan, Morris, Cestaro, Wieneke, Alpert, Wang, and Volkow, 1996 (Table A4.29)

The authors used the TMT in a study exploring the relationship of cocaine use to performance on neuropsychological tests tapping functions of frontal and temporal brain regions. The performance of crack cocaine users was compared to that of a control group consisting of 54 paid male volunteers with a mean age of 32.1 (9.7) years and mean education of 15.4 (2.4) years. The sample included 48 white, four black, and two Hispanic participants. Exclusion criteria were a history of medical, neurological, or psychiatric problems; more than moderate use of alcohol (12 oz./week); history of intravenous drug use; and self-reported history of learning disability (with enrollment in special education classes).

Study strengths 1. Relatively large sample size. 2. The sample composition is described in terms of age, education, and ethnicity. 3. Rigorous exclusion criteria. 4. Means and SDs for the test scores are reported.

Considerations regarding use of the study 1. Wide age and education range. No information on IQ or gender distribution. 2. Recruitment procedures were not reported. 3. Education level for the sample is high.
[TMT.28] Salthouse, Toth, Hancock, and Woodard, 1997 (Table A4.30)

The authors examined controlled and automatic processes underlying memory and attention using the process-dissociation procedure, as well as the uniqueness of age-related influences on these processes. Participants were 115 healthy adults (47% male, 53% female) between the ages of 18 and 78 years, who were recruited from appeals to groups and acquaintances. They were included in the study if they were in "reasonably good health," were not currently students, and had at least 11 years of education. No other exclusion criteria are reported. Participants were administered a battery of neuropsychological tests in their homes. The data were stratified into three age groupings: 18-39 (mean age = 29.0, SD = 4.8; mean education = 15.5, SD = 1.7), 40-59 (mean age = 49.1, SD = 5.1; mean education = 15.2, SD = 2.5), and 60-78 (mean age = 69.2, SD = 5.1; mean education = 15.3, SD = 2.6) years. The TMT was administered according to the standard instructions.
Study strengths 1. Sample size is large. 2. Sample composition is well described in terms of age, education, gender, and various health indices. 3. Recruitment procedures are specified. 4. Data are partitioned into three age groups. 5. Test administration procedures are specified. 6. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Exclusion criteria are not well identified. 2. High educational level for each age group.

[TMT.29] Rasmusson, Zonderman, Kawas, and Resnick, 1998 (Table A4.31)

The authors explored the effect of age and dementia status on TMT performance within the scope of the Baltimore Longitudinal Study of Aging (BLSA). The sample has been recruited continuously since 1958, and participants were asked to return for testing every other year. The majority of the sample are white (37% female); working or retired from scientific, professional, or managerial positions; graduated from college (71%); and married. All participants aged 70 years and older and some younger participants who met specific criteria were seen for a clinical evaluation by a neurologist, who classified participants in three categories: cognitively normal, suspect for early dementia, or dementia. The 667 nondemented participants who were included in the TMT portion of the study were 60 years of age or older at the last visit at which the TMT was administered. The mean age of the sample was 74.4 (8.2) years, mean education 16.0 (2.9) years, mean MMSE score 28.6 (1.6), and mean number of errors on the Blessed Mental Status Exam 1.3 (1.8). The TMT was administered according to the standard procedures. A maximum of 300 seconds was allowed for each part. The authors provided a detailed description of the administration procedures. The authors found a significant effect of age on completion times for both parts A and B. Incidence of errors increased with age only for part B. Dementia status was significantly associated with the proportion of participants making errors on both parts A and B, independent of age. The error rates did not increase over a 2-year longitudinal comparison made on a subset of the nondemented sample. The authors described the sensitivity and specificity of various cutoff scores in distinguishing between nondemented participants and those with cognitive dysfunction, based on receiver operating characteristic (ROC) analyses. They also evaluated, in their sample, the sensitivity and specificity of the optimal dementia cutoff score on part B of 172 seconds previously reported by Cahn et al. (1995). All significant effects were replicated.
Study strengths 1. Large sample size. 2. The sample composition is well described in terms of age, education, gender, geographic area, and recruitment procedures. 3. Mental status was assessed with MMSE and Blessed Mental Status Exam. 4. Adequate exclusion criteria. 5. Performance for very old group (90-96 years) is reported. 6. Test administration procedures are thoroughly described. 7. Means and SDs for the test scores and the percentage of participants who made errors on parts A and B are reported. 8. Data are partitioned by four age groups.
Considerations regarding use of the study 1. Education level for the sample is very high. 2. No information on IQ is reported.

[TMT.30] Miner and Ferraro, 1998 (Table A4.32)
The study examined the role of different information-processing factors and presentation order in TMT performance. The sample consisted of 110 undergraduate students (88 females and 22 males) from the University of North Dakota, with a mean age of 21.7 (5.24) years, who received a course credit for their participation. Their health was assessed with a background information questionnaire and with the Geriatric Depression Scale. The TMT was administered in a counterbalanced order as part of a larger battery. Those participants who received the test in the part B-part A order demonstrated considerably slower performance on part B in comparison to the group tested in the standard order.

Study strengths 1. Relatively large sample. 2. The sample composition is described in terms of age, education, gender, and incentive for participation. 3. Minimally adequate exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported.

Considerations regarding use of the study 1. Exclusion criteria are not described. 2. No information on IQ is reported.

[TMT.31] Crowe, 1998b (Table A4.33)

The TMT and a series of measures derived from it were administered to 98 undergraduate students from La Trobe University in Melbourne, Australia, in order to examine cognitive mechanisms contributing to performance on both parts. Participants were screened for a history of loss of consciousness or other neuropathology. The mean age for the sample was 23.4 (3.1) years, mean education 14.0 (2.3) years, and mean Wide Range Achievement Test (WRAT) Reading score 101.0 (9.0). The authors developed modified procedures in an effort to separate cognitive mechanisms contributing to TMT performance. They concluded that visual search and motor speed contributed to performance on part A, whereas visual search and cognitive alternation contributed to performance on part B. The latter was further influenced by reading level, ability to mentally maintain two simultaneous sequences, attention, and working memory. Time to completion for both TMT parts is provided.

Study strengths 1. Large sample size. 2. The sample composition is described in terms of age, education, gender, WRAT Reading score, and geographic area. 3. Minimally adequate exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported.

Considerations regarding use of the study 1. No information on IQ is reported. 2. High educational level of the sample. 3. The data were obtained on Australian participants, which may limit their usefulness for clinical interpretation in the United States.
[TMT.32] Tremont, Hoffman, Scott, and Adams, 1998 (Table A4.34)

The authors challenged Dodrill's (1997) findings of no relationship between level of intelligence and neuropsychological test performance by presenting data collected from archival files of the University of Oklahoma Neuropsychological Laboratory, stratified by intelligence level. The data included files for 157 patients (71 males and 86 females) between 16 and 74 years of age, with a mean age of 39.38 (15.80) and a mean education of 13.12 (3.26); 143 were Caucasian, nine African-American, and the rest of other races and/or unknown. All patients were evaluated for suspected neurological disease, and the evaluation yielded no biomedical evidence for brain impairment. The TMT was administered as part of the HRB. The results are stratified by three intelligence levels, based on patients' WAIS-R FSIQ. The authors concluded that performance on both parts of the test was affected by intelligence level, with the greatest impact on part B.

Study strengths 1. Relatively large sample. 2. The sample composition is well described in terms of age, education, gender, VIQ, PIQ, FSIQ, geographic area, and clinical setting. 3. It is presumed that standard administration procedures were used since the TMT was administered as part of the HRB. 4. Means and SDs for the test scores are reported. 5. Data are stratified by intelligence level.

Considerations regarding use of the study 1. Wide age range. 2. Data were collected from patients' files. Though biomedical evidence for brain impairment was negative, this is not a normal sample.

[TMT.33] Basso, Bornstein, and Lang, 1999 (Table A4.35)

The study examined the practice effect on repeated administration of several tests over a 12-month interval. The baseline sample consisted of 82 men recruited through newspaper advertisements, who were not paid for their participation. Fifty men out of this sample returned for the repeated testing 12 months later. The composition of the latter sample was 48 Caucasian, one African-American, and one Hispanic, with a mean age of 32.5 (9.27) years, a mean education of 14.98 (1.93) years, and a mean FSIQ of 109.30 (12.29) at baseline. At each probe, participants were screened for neurological disease, head injury, learning disabilities, or other medical illnesses based on an informal interview. They were also screened for psychiatric disorders through a structured clinical interview. None was excluded based on these screens. The TMT was administered according to standard procedures by thoroughly trained and supervised technicians. The authors compared TMT performance at baseline and on the retest using reliable change indices and concluded that TMT scores did not change on the retest. Performance on the TMT for the two probes is reported for the entire sample.

Study strengths 1. Adequate sample size. 2. The sample composition is described in terms of age, education, gender, ethnicity, FSIQ, and recruitment procedures. 3. Adequate exclusion criteria. 4. Test administration procedures are thoroughly described. 5. Means and SDs for the test scores are reported.

Considerations regarding use of the study 1. The data are not partitioned by age group. 2. Education level for the sample is high.

[TMT.34] Crews, Harrison, and Rhodes, 1999 (Table A4.36)
A control sample of 30 nondepressed women was used in a study on the effect of depression on executive functions in young women. Control participants were recruited via flyers/sign-up sheets from town and university settings. They did not meet diagnostic criteria according to the ADIS-R and scored within the nondepressed range on the Beck Depression Inventory (BDI). The exclusion criteria were past or present history of neurological problems or psychiatric disorders, alcoholism or drug abuse, learning disabilities, concurrent medication/drug usage, eating disorders, or current medical illness. The TMT was administered according to the standard procedures. Test performance is reported for the entire sample.
Study strengths 1. The sample composition is described in terms of age, education, gender, scores on selected WAIS-R tests, and recruitment procedures. 2. Adequate exclusion criteria. 3. Test administration procedures are specified. 4. Means and SDs for the test scores are reported. Considerations regarding use of the study 1. The sample is small. 2. Education level for the sample is high.
[TMT.35] Dikmen, Heaton, Grant, and Temkin, 1999 (Table A4.37) The TMT was used in a study on the psychometric properties of a broad range of neuropsychological measures, based on a sample of 384 normal or neurologically stable adults who were tested twice as part of several longitudinal studies. A group of "friend controls" consisted of 138 individuals who had no history of recent trauma and were friends of head-injured patients. Their mean age was 28.5 (12.2) years and mean education was 12.2 (1.9) years; 60% of the sample were males, and the test-retest interval was 11.1 (0.6) months. A group of "trauma controls" consisted of 121 individuals who had a recent traumatic injury that did not involve the head. They were tested at baseline 1 month after trauma and then 11 months later. Their mean age was 31.2 (13.6) years and mean education was 12.0 (2.6) years; 70% of the sample were males, and the test-retest interval was 10.7 (0.6) months. Both of these groups were tested at the University of Washington under the direction of one of the authors. Twenty percent of friend controls and 46% of trauma controls had preexisting conditions that might affect test performance, the most significant being alcohol abuse or a significant traumatic
brain injury. The rest of the participants in these samples denied any history of conditions that might be expected to affect brain function. The third group, mixed normal controls, consisted of 125 participants who had no history of trauma or disease involving the brain. They were enrolled in longitudinal research projects at multiple sites under the supervision of the neuropsychology laboratories at the University of Colorado and the University of California at San Diego. Their mean age was 43.6 (19.6) years and mean education was 12.0 (3.3) years; 68% of the sample were males, and the test-retest interval was 5.4 (2.5) months. The data are reported for all groups combined. Demographic information for all groups combined is also provided. The mean WAIS FSIQ (Wechsler, 1955) on the initial testing for the three groups combined was 108.8 (12.3). Trails A and B were administered according to the procedures specified by Reitan and Wolfson (1993). Time limits of 100 seconds on Trails A and 300 seconds on Trails B were imposed. The authors provide raw scores for performance at two time probes, as well as various measures of test-retest reliability and magnitude of practice effect. The test-retest reliability over an 11-month interval was r = 0.79 for Trails A and r = 0.89 for Trails B.
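To illustrate how a test-retest reliability coefficient of this kind can be used when judging change in an individual's score, the following is a minimal Python sketch of one commonly used form of the reliable change index. It is not the specific procedure used by the study authors, which is not detailed here. The function name, the baseline SD, the optional practice-effect adjustment, and the example scores are hypothetical; only r = 0.79 for Trails A is taken from the text.

```python
import math


def reliable_change_index(baseline, retest, sd_baseline, r_xx, practice_effect=0.0):
    """Return a reliable change index for one individual.

    baseline, retest : the individual's completion times (seconds) at the two probes
    sd_baseline      : SD of the reference sample at baseline (hypothetical value here)
    r_xx             : test-retest reliability coefficient
    practice_effect  : mean retest-minus-baseline change in the reference sample
                       (optional adjustment; 0.0 means no adjustment)
    """
    if not 0.0 <= r_xx < 1.0:
        raise ValueError("r_xx must be in the interval [0, 1)")
    sem = sd_baseline * math.sqrt(1.0 - r_xx)  # standard error of measurement
    se_diff = math.sqrt(2.0) * sem             # standard error of the difference
    return (retest - baseline - practice_effect) / se_diff


# r_xx = 0.79 is the Trails A value quoted above; all other numbers are made up.
rci = reliable_change_index(baseline=30.0, retest=42.0, sd_baseline=10.0, r_xx=0.79)
print(round(rci, 2))
```

Under this convention, an absolute value above about 1.96 is often read as a change unlikely to reflect measurement error alone. That threshold is a general statistical convention and not a recommendation drawn from the studies reviewed here.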
Study strengths 1. Large sample sizes for the three groups. 2. The sample composition is well described in terms of age, education, gender, IQ, geographic area, and setting. 3. Test administration procedures are specified. 4. Means and SDs for the test scores are reported. 5. Information on test-retest reliability is provided. Considerations regarding use of the study 1. Exclusion criteria are not clearly described. As the authors pointed out, 20% of friend controls and 46% of trauma controls had preexisting conditions that might affect test performance, the most significant being alcohol abuse and a significant traumatic brain injury. 2. The data are not partitioned by age group. 3. Time limits were imposed on test performance, which deviates from standard test administration procedures. However, these limits should not have had a noticeable effect on the results.
[TMT.36] Binder, Storandt, and Birge, 1999 (Table A4.38) The authors examined the relationship between performance on psychometric tests and a modified Physical Performance Test (modified PPT) in a sample of 125 adults aged 75 years and older, who participated in trials of exercise or hormone replacement therapy. The study was approved by the Washington University School of Medicine, St. Louis. The mean age for the sample was 82.3 (4.4) years, mean education was 13.5 (3.0) years, 25% were male, and 87% were Caucasian. Indices of physical health, Blessed score, and Geriatric Depression Scale score are reported. Preliminary screening included a medical history; physical examination; the Short Blessed Test of memory, concentration, and orientation; blood and urine chemistries; a chest X-ray; and a cross-validated self-report regarding health problems in the previous 12 months. Exclusion criteria were inability to walk 50 feet independently, active medical problems that would contraindicate performance of a graded exercise stress test, inability to complete the graded exercise stress test or the modified PPT, a score greater than 8 on the Short Blessed Test, inability to provide informed consent due to cognitive impairment, and inability to follow the directions for the psychometric tests due to visual or auditory impairments. The standard administration procedure was used except that the maximal allowed time for both parts A and B was 180 seconds. Time to completion and the number of lines correctly drawn within the allotted time were recorded. The authors found that part B performance was significantly associated with total modified PPT score.
Study strengths 1. Large sample size. 2. The sample composition is well described in terms of age, education, gender, ethnicity, indices of physical health, Blessed score, Geriatric Depression Scale score, geographic area, and research setting. 3. Adequate exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. The data are not partitioned by age group. 2. No information on IQ is reported.
[TMT.37] Ruffolo, Guilmette, and Willis, 2000 (Table A4.39) Time to completion and number of errors in TMT performance are compared for four clinical/experimental groups and a control group. The latter sample included 49 introductory psychology students, graduate students, and employees of a local social services agency, who were screened for any prior head injuries. The TMT was administered according to standard instructions.
Study strengths 1. Adequate sample size. 2. The sample composition is described in terms of age, education, and setting. 3. Minimally adequate exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores and error rates are reported.
Considerations regarding use of the study 1. The data are not partitioned by age group. 2. Education level for the sample is high. 3. No information on gender or IQ is reported.
[TMT.38] Saxton, Ratcliff, Newman, Belle, Fried, Yee, and Kuller, 2000 (Table A4.40) The TMT was administered as part of the Memory and Aging Study (MAS) conducted as an ancillary project to the CHS, a multicenter observational study of heart disease and stroke in Washington County, Maryland, and Pittsburgh, Pennsylvania. No selection criteria were used. Data were analyzed for a sample of 989 participants (444 males and 545 females), who completed all of the cognitive tests included in the battery. The mean age for the sample was 73.63 (4.45) years, and mean education was 13.23 (2.85) years; 93.9% of the sample were white. This sample was divided into two clinical groups and a "no disease" group, based on cardiovascular status. Times to completion for the TMT for the "no disease" sample of 357 participants are reproduced in Table A4.40. Demographic characteristics for this sample are not reported by the authors. However, we assume that they are similar to the demographics for the entire sample described above.
Study strengths 1. Large sample size. 2. The sample composition is described in terms of age, education, gender, setting, geographic area, and recruitment procedures. 3. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. No exclusion criteria. 2. The data are not partitioned by age group. 3. No information on IQ is reported. 4. Demographic characteristics for the "no disease" group are not reported.
[TMT.39] Chen, Ratcliff, Belle, Cauley, DeKosky, and Ganguli, 2000 (Table A4.41)
A control sample of 483 elderly nondemented individuals was derived from a community-based multiwave prospective study, the Monongahela Valley Independent Elders Survey (MoVIES), in southwestern Pennsylvania. The purpose of the study was to identify cognitive measures that are most accurate in discriminating between individuals with presymptomatic DAT and nondemented individuals. The control participants remained nondemented over a 10-year follow-up period. The study protocol included a standardized general medical history and physical examination; a detailed neurological and mental status examination; hematological, metabolic, and serological tests; and neuroimaging when appropriate. Relevant medical records were abstracted. The sample included 302 females and 181 males, with a mean age of 74.9 (4.4) years; 31.9% of participants had less than a high school education. Times to completion for the two parts of the TMT were reported for the entire sample. Results of the ROC analysis suggested that TMT part B was one of the tests that had the highest accuracy in discriminating between nondemented participants and those who were in the preclinical stages of DAT (area under the curve = 0.773).
Study strengths 1. Large sample size. 2. The sample composition is well described in terms of age, education, gender, history of the project, and geographic area. 3. Rigorous exclusion criteria. 4. Means and SDs for the test scores are reported. 5. Information on the diagnostic accuracy of part B is provided.
Considerations regarding use of the study 1. The data are not partitioned by age group. 2. No information on IQ is reported. 3. The number of participants with less than a high school education is reported. However, mean education and SD are not reported.
[TMT.40] Small, Graves, McEvoy, Crawford, Mullan, and Mortimer, 2000 (Table A4.42)
The authors examined the relationship between APOE genotype and cognitive functioning in normal aging based on a sample of 413 adults between 60 and 85 years of age, with a mean age of 72.90 years, who were randomly selected from a larger sample of participants in the community-based, cross-sectional Charlotte County Healthy Aging Study conducted in south Florida. The sample was stratified into two age groups, young-old (60-73 years, n = 202) and old-old (74-85 years, n = 211), and further divided into two groups according to the presence of the APOE-e4 allele. The sample was almost exclusively white. Education, gender distribution, and self-rated indices of health status are reported for each group. Intelligence levels were estimated using the Spot the Word Test. The TMT was administered according to standard procedures.
Study strengths 1. Large sample sizes per group. 2. The sample composition is well described in terms of age, education, gender, geographic area, and research setting. 3. Test administration procedures are specified. 4. Data are stratified by two age groups. 5. Estimated intelligence levels are reported. 6. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Exclusion criteria are not clearly described. 2. Education level for one of the groups is high.
[TMT.41] Stuss, Bisschop, Alexander, Levine, Katz, and Izukawa, 2001 (Table A4.43)
The study examined the relationship of the TMT to focal frontal lobe lesions. Time to completion and number of errors performed by the clinical groups with different lesion localizations and the control group were compared. The sample of 19 control participants with a mean age of 53.4 (13.6) years and a mean education of 13.7 (2.5) years was drawn from a general population pool of volunteers. The participants were fluent English speakers with adequate ability to read and had no prior history of any neurological or psychiatric disorders. The TMT was administered according to standard procedures. All participants continued working on the test until they completed the task. Time to completion for both TMT parts, the B-A difference, and the proportional score (B-A)/A are reported in both raw scores and their logarithmic transformations. Only four control participants made one error on part B. The results suggest that error analysis is a more useful method of categorizing performance than time to completion. All patients who made more than one error on part B had frontal lesions.
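The derived measures mentioned in this and other studies in the chapter (the B-A difference, the proportional score (B-A)/A, and the B:A ratio) follow directly from the two completion times. The short Python sketch below shows the arithmetic. The function name and the example times are hypothetical, and the use of the natural logarithm is an assumption, since the base of the transformation used by the authors is not stated here.

```python
import math


def tmt_derived_scores(time_a, time_b):
    """Return derived TMT indices from completion times given in seconds."""
    if time_a <= 0 or time_b <= 0:
        raise ValueError("completion times must be positive")
    return {
        "difference": time_b - time_a,               # B-A difference
        "proportional": (time_b - time_a) / time_a,  # proportional score (B-A)/A
        "ratio": time_b / time_a,                    # B:A ratio
        "log_a": math.log(time_a),                   # natural log of part A time (assumed base)
        "log_b": math.log(time_b),                   # natural log of part B time (assumed base)
    }


# Hypothetical completion times for one examinee.
scores = tmt_derived_scores(time_a=30.0, time_b=75.0)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Note that the proportional score always equals the B:A ratio minus 1, so the two indices rank examinees identically.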
Study strengths 1. The sample composition is described in terms of age, education, gender, estimated IQ, and clinical setting. 2. Adequate exclusion criteria. 3. Test administration procedures are specified. 4. Means and SDs for the test scores and derived measures are reported.
Considerations regarding use of the study 1. The sample size is small. 2. Age range is wide. 3. The data were obtained on Canadian participants, which may limit their usefulness for clinical interpretation in the United States.
[TMT.42] Bell, Hermann, Woodard, Jones, Rutecki, Sheth, Dow, and Seidenberg, 2001 (Table A4.44)
The TMT was administered as part of a larger battery in a study examining the neurobehavioral status of patients with early-onset temporal lobe epilepsy. The control group included 29 friends, relatives, and spouses of patients (72% female), who were between ages 16 and 60 years, with a mean age of 34.4 (12.5) years; FSIQ (as measured with the WAIS-III 7-subtest short form) between 69 and 110, with a mean FSIQ of 97.7 (6.4); and mean education of 13.0 (1.7) years. Exclusion criteria were current substance abuse, psychotropic medication use, medical or psychiatric condition that could affect cognitive functioning, an episode of loss of consciousness longer than 5 minutes, developmental learning disorder, and repetition of a grade in school. Time to completion for both TMT parts is provided.
Study strengths 1. The sample composition is well described in terms of age, education, gender, FSIQ, and recruitment criteria. 2. Adequate exclusion criteria. 3. Means and SDs for the test scores are reported. Considerations regarding use of the study 1. The sample is small and includes a wide age range. 2. The data are not partitioned by age group.
[TMT.43] Stein, Kennedy, and Twamley, 2002 (Table A4.45)
The authors compared cognitive functioning in female victims of domestic violence and nonvictimized women. The control sample included 22 participants who were recruited through posted advertisements and ongoing personal contacts. The study was conducted in San Diego, California. Control participants had no lifetime exposure to a Diagnostic and Statistical Manual-IV posttraumatic stress disorder criterion A stressor, spoke English fluently, and had at least an 8th-grade reading ability. Exclusion criteria were use of any psychotropic medications within 6 weeks before participation, use of oral or intramuscular steroids within 4 months before participation, history of learning disability or attention-deficit disorder, head injury with loss of consciousness greater than 10 minutes, seizure disorder, drug or alcohol use, and history of psychotic illness or neurological disorder. Mean age for the sample was 29.4 (10.7) years and mean education was 13.9 (1.5) years. Time to completion for both TMT parts as well as the B-A difference are provided.
Study strengths 1. The sample composition is described in terms of age, education, gender, geographic area, setting, and recruitment procedures. 2. Rigorous exclusion criteria. 3. Means and SDs for the test scores and the B-A difference are reported.
Considerations regarding use of the study 1. The sample is small. 2. Wide age span. 3. No information on IQ is reported. 4. All-female sample.
[TMT.44] Drane, Yuspeh, Huthwaite, and Klingler, 2002 (Table A4.46)
The purpose of the study was to examine the relationship of TMT time to completion as well as derived indices, such as difference scores and ratio scores, with demographic variables. The sample consisted of 285 adults (205 males and 80 females) between 18 and 90 years of age, who participated in a comprehensive neuropsychological normative project. They were recruited through a variety of civic organizations. Participants did not have any history of known psychiatric or neurological disorder, were living independently, had no history of substance abuse, and were not treated with psychotropic medications at the time of the examination, per clinical interview. All participants performed within the normal range on the MMSE. Mean age for the sample was 48.30 (19.68) years, mean education was 12.98 (2.65) years, and mean MMSE score was 28.63 (1.61). The TMT was administered according to standard procedures. Time to completion, B-A difference, and B:A ratio are reported for eight age groups. The authors evaluated the sensitivity of the B:A impairment cutoff score of 3.0 that was suggested by Lamberty et al. (1994) and concluded that rates of false-positive misclassification are unacceptably high, especially for older age groups.
Study strengths 1. Large sample size. 2. The sample composition is well described in terms of age, education, gender, MMSE scores, setting, and recruitment procedures. 3. Adequate exclusion criteria. 4. Test administration procedures are specified. 5. The data are partitioned by eight age groups.
6. Means and SDs for the test scores as well as derived indices are reported. Considerations regarding use of the study 1. Demographic characteristics for age groups are not reported. 2. Overall sample is large, but some individual cells are small. 3. No information on IQ is reported. [TMT.45] Grady, Yaffe, Kristof, Lin, Richards, and Barrett-Connor, 2002 (Table A4.47)
Data on TMT part B were collected for a subsample of 1,063 older women in a multicenter study examining the effect of hormone replacement therapy on cognitive functioning in postmenopausal women. This is a follow-up on the articles reporting normative data for different subgroups from the same study (Barrett-Connor & Goodman-Gruen, 1999; Kritz-Silverstein & Barrett-Connor, 2002). The participants were younger than 80 years old and had established coronary disease and an intact uterus. They were randomly assigned to treatment vs. placebo groups in a double-blind experiment. They were followed for 4.2 (.04) years. At the end of the trial, cognitive functioning was measured in both groups. The data are reported separately for 517 participants in the treatment group and 546 in the placebo group. The mean age for the two groups at the time of testing was 66.3 (6.4) and 67.3 (6.3) years, respectively, and mean education was 12.7 (2.7) years for both groups; approximately 90% of the sample were white. There are no notable differences between the groups on any demographic variables or physical indices. Trails B was administered according to standard procedures. The authors concluded that there were no differences between the treatment and placebo groups on any cognitive measures.
Study strengths 1. Large sample size. 2. The sample composition is well described in terms of age, education, gender, physical findings, clinical setting, and selection criteria.
3. Test administration procedures are specified. 4. Means and SDs for the test scores are reported. Considerations regarding use of the study 1. Participants had established coronary disease. It is unclear if any neurological exclusion criteria were used. 2. All-female sample. 3. The data are not partitioned by age group. 4. No information on IQ is reported. [TMT.46] Miller, 2003, Personal Communication (Table A4.48)
The investigation used participants from the MACS study. The data were collected from 949 seronegative homosexual and bisexual males for the purpose of establishing normative data for neuropsychological test performance based on a large sample. These data represent an update on the data provided by Selnes et al. (1991). Mean age for the sample was 38.0 (7.5) years and mean education was 16.3 (2.4) years; 91.5% were Caucasian, 3.0% Hispanic, 4.5% black, and 1% other. All participants were native English speakers. The TMT was administered according to standard instructions. The data are partitioned by three age groups (25-34, 35-44, 45-59) times three education levels (< 16, 16, > 16 years).
Study strengths 1. The overall sample size is large, and most of the individual cells have more than 50 participants. 2. Normative data are stratified by age x education. 3. Information on age, education, ethnicity, and native language is reported. 4. Means and SDs for the test scores are reported. Considerations regarding use of the study 1. All-male sample. 2. No information on IQ is reported. 3. No information on exclusion criteria.
[TMT.47] Tombaugh, 2004 (Table A4.49)
The author provided normative data for 911 community-dwelling adults between 18 and 89 years of age. The data for volunteers who participated in earlier studies were analyzed. Out of this sample, 823 participants were recruited through booths at shopping centers, social organizations, places of employment, psychology classes, and word of mouth. Exclusion criteria were history of neurological disease, psychiatric illness, head injury, or stroke, per self-report. The remaining 88 participants represent a subset of individuals who had received a consensus diagnosis of "no cognitive impairment" made by physicians and clinical neuropsychologists, based on history, clinical and neurological examination, and an extensive battery of neuropsychological tests, over two successive evaluations separated by approximately 5 years. The author pointed out that all participants 18-24 years old were university students. Mean age for the sample was 58.5 (21.7) years, mean education was 12.6 (2.6) years, and the male/female ratio was 408/503. All participants scored above 23 on the MMSE, with a mean of 28.6 (1.5), and below 14 on the Geriatric Depression Scale, with a mean of 4.1 (3.4). Elderly participants were also excluded on the basis of a clinical evaluation of depression. Trails A and B were administered as part of a larger battery according to the Spreen and Strauss (1998) guidelines. The results indicated that test performance for both Trails A and B was affected by age. Performance on Trails B was also related to education, particularly in individuals over 54 years of age. Therefore, tables of raw data and percentiles are stratified into 11 age groups. For ages 55 and above, they are further partitioned into two education levels (≤12 and 12+ years).
Study strengths 1. Large sample size. 2. The sample composition is well described in terms of age, education, gender, setting, and recruitment procedures. 3. Rigorous exclusion criteria.
4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported. 6. Data are stratified by age x education.
Considerations regarding use of the study 1. As the author pointed out, the sample size of the oldest group is small. 2. No information on the intellectual level of the sample is reported. 3. The data were obtained on Canadian participants, which may limit their usefulness for clinical interpretation in the United States.
RESULTS OF THE META-ANALYSES OF THE TRAILMAKING TEST DATA (See Appendix 4m)
Data collected from the studies reviewed in this chapter were combined in regression analyses in order to describe the relationship between age and test performance and to predict expected test scores for different age groups. Effects of other demographic variables were explored in follow-up analyses. The general procedures for data selection and analysis are described in Chapter 3. Detailed results of the meta-analyses and predicted test scores across adult age groups for parts A and B are provided in Appendix 4m. Educational range was unevenly represented, with a large gap between 8.5 and 11.59 years at the lower extreme. Based on the preliminary analyses, the data point with 8.5 years of education was retained in the main analyses but dropped in the analyses generating an education-correction factor (see below).
After data editing for consistency and for outlying scores, 28 studies for Trails A and 29 for Trails B, which generated 89 data points for each part based on totals of 6,317 and 6,360 participants, respectively, were included in the analyses. Quadratic regressions of the test scores on age yielded R² values of 0.905 for Trails A and 0.876 for Trails B, indicating that 91% and 88% of the variance in test scores for the two parts, respectively, is accounted for by the model. Based on these models, we estimated scores for both parts for age intervals between 16 and 89 years. If predicted scores are needed for age ranges outside the reported boundaries, they can be calculated, with proper caution (see Chapter 3), using the regression equations included in the tables, which underlie calculations of the predicted scores. It should be noted in the context of across-condition comparisons that mean age for Trails B is somewhat higher than for Trails A because data for one study based on the older sample were reported for Trails B only. Quadratic regressions of SDs on age yielded R² of 0.602 for Trails A and 0.676 for Trails B, indicating an increase in variability with advancing age, consistent with the literature. Predicted SDs, based on these models, are reported.
Examination of the effects of demographic variables on the test scores revealed that education is a significant predictor of test performance for both parts A and B. Values of estimated between-study variance (tau²) for regression of test means with education were considerably lower than the corresponding values for regression without education. This suggests that education explains a considerable amount of the heterogeneity in the outcome variable. Inclusion of education into the regression of test means on age considerably improved the R² (see Appendix 4m). In this analysis, regression with and without education was rerun on a subset of studies that reported education for each data point. In addition, the group with 8.5 years of education was dropped because of a large gap at the lower extreme of the educational range, with the next lowest level available for analyses being 11.59 years of education. Therefore, the data set for Trails A was based on 25 studies and that for Trails B, on 26 studies. The t-value for education is -3.00 (p = 0.006) for Trails A and -2.56 (p = 0.017) for Trails B.
The coefficient for education of -1.308, rounded to -1.31, for Trails A indicates that with a 1-year decrement in education we expect a 1.31-second slowing in test performance. This suggests that the table of predicted values is accurate for individuals with 13.87 years of education, rounded to 14 years in the education-correction tables (this is the mean education for the original data set). For every year of education below this level, we suggest adding 1.31 seconds to the predicted score given in the table for the relevant age group, and for every year above it, subtracting 1.31 seconds (see Chapter 3 for an example). The coefficient for education for Trails B is -6.446, rounded to -6.45 in the education-correction table. Thus, we suggest adding or subtracting 6.45 seconds per year of education to or from the predicted score given in the table for the relevant age group in the same manner. The SDs for the person's actual age group should be used with the education-corrected scores. Correction factors for different education levels for both Trails A and B are included in Appendix 4m. These corrections should be applied within the education range of 12-17 years since this is the range available in the original data set. Unfortunately, data for lower educational levels were not available in the literature. Any extrapolation of scores outside the reported range should be made with caution.
IQ did not have a significant effect on the test scores in our data set. Given consistent evidence of the effect of intellectual level on test performance described in the literature, the lack of association in our analyses is likely due to insufficient data regarding IQ levels reported in the studies reviewed (only seven studies reported IQ levels, which generated 21 data points for each TMT part). The difference in mean scores for the two genders across 17 studies reporting scores for males and 15 studies reporting scores for females separately was negligible: 0.704 in favor of males for Trails A and 0.379 in favor of females for Trails B.
Strengths of the analyses 1. Total sample sizes of 6,317 for Trails A and 6,360 for Trails B. 2. R² of 0.905 for Trails A and 0.876 for Trails B, indicating a good model fit. 3. Postestimation tests for parameter specifications did not indicate problems with normality.
4. Effect of education was evident, which is consistent with the literature. Significant effect of education on both parts A and B called for corrections for education. Limitations of the analyses 1. Postestimation tests for parameter specifications indicated lack of homoscedasticity for both Trails A and B. Variability in scores across age groups is greater than expected by chance, with a considerable increase in variability in the older age groups, as reflected in the size of the confidence intervals. Therefore, the predicted scores are less accurate for the older age ranges than for the younger ranges. 2. Levels of education and IQ for the samples included in the review are high. Although corrections for education are provided, mean IQ levels are 116.69 (7.48) for Trails A and 116.88 (7.80) for Trails B. According to the literature, there is a strong relationship between test performance and IQ level. Therefore, the predicted values are likely to underestimate expected time to completion for individuals with average and lower than average intellectual levels.
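The education correction described in the results above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a substitute for the tables in Appendix 4m: the education coefficients (-1.31 and -6.45 seconds per year), the 14-year reference level, and the 12-17 year range come from the text, whereas the function names, the `allow_extrapolation` flag, and the example table score, SD, and observed time are hypothetical. The final z-score step is a conventional way of comparing an observed time with the corrected expectation and is an addition for illustration.

```python
# Seconds per year of education, rounded coefficients quoted in the text.
EDUCATION_COEFFICIENT = {"A": -1.31, "B": -6.45}
REFERENCE_EDUCATION = 14    # years; level at which the predicted-score table applies
EDUCATION_RANGE = (12, 17)  # years; range available in the original data set


def education_corrected_score(part, table_score, education_years,
                              allow_extrapolation=False):
    """Adjust a table-predicted completion time (seconds) for education.

    part            : "A" or "B"
    table_score     : predicted score looked up for the person's age group
    education_years : the person's years of education
    """
    low, high = EDUCATION_RANGE
    if not allow_extrapolation and not low <= education_years <= high:
        raise ValueError("education is outside the 12-17 year range of the data set")
    shift = EDUCATION_COEFFICIENT[part] * (education_years - REFERENCE_EDUCATION)
    return table_score + shift


def z_score(observed, expected, sd):
    """Positive values mean the observed time is slower than expected."""
    return (observed - expected) / sd


# Hypothetical lookup values: a Trails B table score of 70.0 s with an SD of 25.0 s.
expected_b = education_corrected_score("B", table_score=70.0, education_years=12)
z = z_score(observed=95.0, expected=expected_b, sd=25.0)
print(round(expected_b, 2), round(z, 3))
```

Raising an error outside the 12-17 year range, unless extrapolation is explicitly requested, is a design choice meant to reflect the caution urged above regarding scores outside the reported range.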
CONCLUSIONS
The TMT has achieved high popularity as a screening tool for cognitive impairment. There is ample evidence supporting the sensitivity of performance times for parts A and B to cerebral dysfunction in mild traumatic brain injury, in the differential diagnosis of dementia, in detecting attentional/concentrational dysfunction in children and adults, and in other conditions. Although poor performance on the TMT is viewed as a nonspecific finding due to the complexity of the mechanisms contributing to test performance, the TMT is most sensitive to attentional/concentrational and executive problems, as well as to psychomotor slowing. It would be misleading to view the TMT as the test for organic brain pathology. For example, patients with memory deficits associated with temporal lobe pathology may perform normally on this test, and if the TMT is administered in isolation, the serious processing difficulties of this population might be overlooked.
Most commonly, clinical use of the TMT is based on a norm-referenced interpretation of completion times for each condition. Use of the TMT cutoff criteria for brain impairment is now quite infrequent (Spreen & Strauss, 1998). According to the literature, performance on the TMT (especially part B) is highly affected by age and education. This finding is supported by the results of the meta-analyses discussed above. Thus, it is of utmost importance to interpret individual scores with reference to the relevant normative data. Among the derived measures, B:A ratio was found to be diagnostically useful in several studies, with modest support for use of the B-A difference. There is little consensus on the utility of error analysis. Further research is needed to gain better insight into the diagnostic utility of the derived measures and error analysis with different clinical populations.
The optimal format for data reporting in future investigations is in age-by-education and/or -by-intelligence level cells. Given the demonstrated utility of the B-A difference, B:A ratio, and error analysis with some clinical groups, reporting of statistics for these indices would further facilitate interpretation of the results and contribute to diagnostic decision making. As to the use of the cutoffs, Soukup et al.'s (1998) recommendation to report cutoff scores for borderline (15th percentile) and defective (< 5th percentile) ranges in addition to the descriptive statistics should be given careful consideration due to the positive skew in the distribution of TMT scores.
5 Color Trails Test
BRIEF HISTORY OF THE TEST
Because of their ease of administration and sensitivity to brain damage, trail-making tasks have long been among the most widely used measures in neuropsychological practice (Lezak et al., 2004). The original Trail Making Test (TMT) was developed in 1944 (see Chapter 4); however, it relied upon the English alphabet as part of the test stimuli, thereby limiting its use in non-English-speaking countries. Further, its use in English-speaking countries was problematic when assessing adults with language and reading disorders, limited education, or English as a second language. The Color Trails Test (CTT) was created in response to a request made in 1989 by the World Health Organization (WHO) for a test that would be similar to the TMT (1944) in terms of its sensitivity and specificity yet allow broader application in cross-cultural contexts. The WHO wanted a test with standardized, equivalent, multiple forms for test-retest purposes. Additionally, although the TMT had been translated into other languages, its basic linguistic and phonological properties continued to limit its application in special-needs contexts (e.g., language disorders, specific reading disorders, or illiteracy). The WHO also wanted standardized test stimuli to ensure the new test's reliability (Maj et al., 1993).
Because of the TMT's popularity and availability in the public domain, it became perhaps the most frequently photocopied neuropsychological test of the 20th century. Poor photocopy quality often blurred the target stimuli, and it was not uncommon to discover TMT protocols in which the stimuli closest to the edge of the page had been cut off due to improper placement of the original on the photocopy machine. Successive generations of photocopies yielded slightly smaller or slightly larger versions of the test, thereby changing the distance between stimuli. Because the "time to complete" score obtained for the test reflects not only visual scanning and psychomotor speed but also the distance traveled between stimuli, the problem of not having a standard version of the TMT would necessarily hamper the comparability of research and clinical findings. Therefore, it was important to develop a format of the test that would also discourage photocopying (D'Elia et al., 1996).
The CTT is similar to the TMT in that it requires cognitive flexibility and visuomotor skills to complete the task. Additionally, the CTT is similar to the TMT in that it is administered under timed conditions. However, the CTT relies on the use of numbered, colored circles and universal sign language symbols to solve the task, rather than relying
on English (or any other) alphabet letters as part of the test stimuli. Instructions for the CTT may be administered verbally or nonverbally, using only visual cues. Both the TMT and CTT are paper-and-pencil tests that are administered in two parts on an 8½ x 11" page. However, for the CTT1, the numbers 1-25 are printed within colored circles. All even-numbered circles are printed with a bright yellow background and all odd-numbered circles, with a vivid pink background. These background color differences are perceptible even to color-blind individuals. The individual is instructed to quickly draw a continuous line that connects the numbers in consecutive/sequential order. The incidental fact that color alternates with each succeeding number is not highlighted or discussed with the subject since attention to color sequence is not necessary for completion of the CTT1. The CTT2 introduces a divided attentional component, requiring attention to the alternating and sequencing of the stimuli. For the CTT2, the number 1 circle is printed against a vivid pink background; however, the numbers 2-25 are presented twice: once with a vivid pink background and once with a bright yellow background. The subject again has to quickly connect the numbers in sequence; however, the task requires alternation of colors as the sequence of numbers advances, so the subject must ignore distracter circles that contain the correct number but are printed in the wrong color background (e.g., start with pink 1 and avoid pink 2, select yellow 2, avoid yellow 3, select pink 3, avoid pink 4, select yellow 4, etc.). Therefore, there is always a distracter number that must be avoided because it is printed against a color background that is not appropriate to the sequence. Before the CTT1 and CTT2 are administered, nontimed practice trials are administered to ensure that the subject understands the task.
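The CTT2 alternation rule just described can be made concrete with a short sketch. It is an illustration only: the function names are hypothetical, and the code simply encodes the rule stated in the text (odd numbers on pink, even numbers on yellow, with each of the numbers 2-25 also appearing once in the wrong color as a distracter).

```python
# Sketch of the CTT2 target order and its distracter circles.
def ctt2_target_sequence(last_number=25):
    """Correct (number, color) order: pink 1, yellow 2, pink 3, ..."""
    return [(n, "pink" if n % 2 == 1 else "yellow")
            for n in range(1, last_number + 1)]


def ctt2_distracters(last_number=25):
    """Circles to avoid: each number from 2 upward in the wrong color."""
    return [(n, "yellow" if n % 2 == 1 else "pink")
            for n in range(2, last_number + 1)]


print(ctt2_target_sequence()[:4])
print(ctt2_distracters()[:3])
```

The first call lists pink 1, yellow 2, pink 3, yellow 4, and the second lists pink 2, yellow 3, pink 4, matching the example given in the text.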
When the CTT1 and CTT2 forms are administered, however, the time required to complete each form is noted. Subjects must complete each form of the test in ≤240 seconds, or that part of the test is discontinued. The CTT1 is a less cognitively demanding task because it requires the subject to perceptually track only a single specified sequence (number), whereas the CTT2 requires the subject to simultaneously track both a specified number sequence and a separate color sequence. Therefore, an interference index was developed to quantify the difference between the visual attention and perceptual tracking required on the CTT1 and the more demanding sustained, divided attention and more complex perceptual tracking required by the CTT2.
\[
\text{Interference Index} = \frac{\text{CTT2 time raw score} - \text{CTT1 time raw score}}{\text{CTT1 time raw score}}
\]
The interference index reflects the comparison of the subject's performance on the CTT1 relative to the CTT2. This index is expressed as a function of the level of performance on the CTT1. Therefore, the index score is a relatively "pure" measure of the extent of interference (if any) attributable to the more complex divided attention and the alternating sequencing tasks required by the CTT2. For example, an interference index score of 0 indicates that the subject's time to complete the CTT1 was the same as that to complete the CTT2 (i.e., no interference). An interference index score of 1.0 indicates that the subject required twice as long to complete the CTT2 as the CTT1, whereas a score of 3.0 indicates that it took the subject four times as long to complete the CTT2 relative to the CTT1 (i.e., significant interference). An increasing interference index score suggests greater susceptibility to cognitive interference from alternating and sequencing demands (i.e., decreased cognitive flexibility).
The WHO's request for a test that would allow broader application in cross-cultural contexts seems quite reasonable. Ideally, neuropsychological procedures that assess the effects of conditions affecting neurological functioning, including brain injury, infectious diseases (e.g., HIV) and other pathologies, should be as culture-free as possible; but is it possible to develop a totally culture-free neuropsychological test? Perhaps not. If this is the case, then procedures should be developed that allow, at minimum, enhanced assessment
in cross-cultural contexts. Although color perception may not be a totally culture-free phenomenon (Bornstein, 1973), color was used as the test stimulus for the categorical shifting in the CTT because it typically transcends most cultural distinctions. Also, the decision to use numbers and colors was based on the fact that both are universal symbols that place limited demands on language production or knowledge (D'Elia et al., 1996). In cross-cultural pilot tests of the CTT, it was found that individuals in poor Third World countries in Africa, Asia, and South America, with little or no formal education, know and recognize the Arabic numbers 1-25, perhaps because they have to barter for goods and services (Maj et al., 1991). In developing the CTT, it was hypothesized that the alternating shift between number and color sequences would require more effortful executive processing than the shift between numbers and letters of the alphabet. Specifically, in the United States, the English alphabet is learned at a very early age. Students are taught not only to recite the alphabet but to sing it as well. As such, the alphabet sequence is strongly encoded. Indeed, it is not unusual to observe a premorbidly high-functioning individual presenting with a history of moderate brain injury who is able to call upon sufficient brain reserve capacity (Satz, 1993) to complete the TMT part B within a nominally "normal" time limit. Interestingly, these individuals have been occasionally observed to hum or sing the alphabet (although almost inaudibly) while solving part B. Removal of reliance on the English alphabet to solve the CTT2 was hypothesized to effectively eliminate this potential performance confound. Use of colors also permitted the development of identical, equivalent forms of the test for repeat administration in longitudinal research. Currently, there are four versions of the CTT (i.e., forms A, B, C, and D).
Form A is the standard test form, on which normative data were collected. Therefore, form A is the only one that should be used for clinical evaluation. The subsequent forms were created by printing a mirror-image version, a 90-degree rotated version, and a 90-degree mirror-image version of form A. This method of creating alternate forms ensured that the distance traveled between stimuli was standard for all forms. The alternate forms (i.e., forms B, C, and D) are considered experimental and should be used only in research settings.
The scoring of the CTT differs from that of the TMT, to allow quantification of the cognitive slippage that often occurs following mild brain injury. For instance, following mild cerebral insults, patients commonly report subtle changes in sequencing, planning, and ability to inhibit specific responses. They frequently complain that it takes extra effort to perform most tasks they formerly completed without much thought or effort. Unfortunately, current approaches to characterizing performance on most neuropsychological tests allow empirical quantification of only gross errors but not the more subtle forms of cognitive slippage frequently described by these patients. The near-miss score was developed to allow empirical quantification of this type of cognitive slippage. This response occurs when a subject initiates an incorrect response but self-corrects before actual connection to a distracter circle. Reporting near-miss scores allows the examiner to comment on the degree to which a patient is susceptible to distracters. Other scoring criteria include quantification of prompts, number-sequence errors, and color-sequence errors.
In the course of preparatory work for the WHO cross-cultural study on the neuropsychiatric aspects of HIV-1 infection, Maj et al. (1993) evaluated the CTT in comparison to translated versions of the TMT at four world sites: Munich, Germany; Bangkok, Thailand; Naples, Italy; and Kinshasa, Zaire. Those preliminary results suggested that the CTT was not only sensitive to HIV-1-associated cognitive impairment but also more culturally fair than the TMT. The sensitivity of the test was found to hold across the different cultures examined. However, whether it would hold in other cultures was unknown at that time, and more work still needs to be done.
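The interference index defined earlier in this section is a simple ratio, and a short sketch may help in checking hand calculations. The function name and the example times are hypothetical; only the formula and the interpretation of index values of 0, 1.0, and 3.0 come from the text.

```python
# Sketch of the CTT interference index: (CTT2 time - CTT1 time) / CTT1 time.
def interference_index(ctt1_seconds, ctt2_seconds):
    """Return the interference index from raw completion times in seconds."""
    if ctt1_seconds <= 0:
        raise ValueError("CTT1 time must be a positive number of seconds")
    return (ctt2_seconds - ctt1_seconds) / ctt1_seconds


# Hypothetical raw times illustrating the three cases described in the text.
print(interference_index(40, 40))   # equal times: index of 0.0
print(interference_index(40, 80))   # CTT2 twice as long: index of 1.0
print(interference_index(40, 160))  # CTT2 four times as long: index of 3.0
```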
RELATIONSHIP BETWEEN CTT PERFORMANCE AND DEMOGRAPHIC FACTORS
There are currently four normative reports regarding the CTT. Analyses conducted on
the CTT data obtained from the U.S. standardization manual revealed that increasing age adversely affects performance on both CTT1 and CTT2. Increasing education was found to enhance performance on CTT2 but not on CTT1. Gender and the interactions between gender and age were not significantly related to CTT performance scores after the effects of age were removed (D'Elia et al., 1996). Ponton et al. (1996) and LaRue et al. (1999), in examining their respective normative data from their Hispanic samples, also found a negative performance association between increasing age and CTT1 and 2 test scores. In addition, they found a positive relationship between education and CTT1 and 2 scores. No gender effects were found. Similarly, Hsieh and Riley (1997), in examining the normative data from their Chinese sample, found a negative performance association between increasing age and CTT1 and 2 test scores and a positive performance association between increasing education and CTT1 and CTT2 scores. No gender effects were found in the Chinese sample. In summary, research suggests that performance on the CTT is enhanced by education and negatively affected by increasing age. No gender effects have been reported. The CTT is available in adult and child formats from Psychological Assessment Resources (see Appendix 1 for ordering information). Normative data for the Children's CTT can be found in Llorente et al. (2003).
METHOD FOR EVALUATING THE NORMATIVE REPORTS
Our review of the literature located four normative reports: one for primarily Mexican-American, Central and South American, Spanish-speaking adults (Ponton et al., 1996); one for Mandarin-speaking mainland Chinese (Hsieh & Riley, 1997); one for senior adult bilingual Spanish/Mexican Americans (LaRue et al., 1999); and the U.S. standardization manual (D'Elia et al., 1996). To adequately evaluate the CTT normative reports, five key criterion variables were deemed critical. The first four of these relate to subject variables and the last to procedural variables. Minimal requirements for meeting the criterion variables were as follows.
Subject Variables
Sample Size
Fifty cases are considered a desirable sample size. Although this criterion is somewhat arbitrary, a large number of studies suggest that data based on small sample sizes are highly influenced by individual differences and do not provide a reliable estimate of the population mean.
Sample Composition Description
Information regarding medical and psychiatric exclusion criteria is important. It is unclear if geographic recruitment region, socioeconomic status, occupation, ethnicity, handedness, or recruitment procedures are relevant. Until this is determined, it is best that this information be provided.
Age Group Intervals
This criterion refers to grouping of the data into limited age intervals. This requirement is especially relevant for this test since a strong effect of age on CTT performance has been demonstrated in the literature.
Reporting of IQ and/or Education Level
Given the association between educational level and CTT scores, information regarding highest educational level completed should be reported. Optimally, normative data should be categorically reported by age and education level. It is unclear if IQ is relevant, so until this is determined, it is best that information on IQ be provided.
Procedural Variables
Data Reporting
Means, standard deviations, and preferably ranges for total time in seconds for each part of the CTT should be reported. Additional information regarding prompts, near-misses, errors, and interference index would facilitate interpretation of test performance.
SUMMARY OF THE STATUS OF THE NORMS
In terms of subject variables, the standardization manual as well as the Ponton et al. (1996) and LaRue et al. (1999) studies provide performance data grouped by age and education categories. Hsieh and Riley (1997) present data separately for age and for education. Although the total sample for the U.S. standardization study is 1,528, unfortunately the manual does not indicate the sample size within each of the 30 age/education categories. Whereas in the LaRue et al. (1999) study sample sizes for most of the age and education categories are generally adequate, the sample size for each of the age and education categories reported by Ponton et al. (1996) is small. Similarly, the sample size for each of the age categories reported by Hsieh and Riley (1997) is small. For the standardization study, Ponton et al. (1996), and LaRue et al. (1999), the younger age group categories are generally narrowly defined and therefore adequate; however, the older age group categories tend to be very broad. Hsieh and Riley (1997) report age data in 10-year increments as well as data according to the age groupings found in the U.S. standardization manual.
Regarding procedural variables, all studies report means and SDs for time to completion for CTT1 and 2. Only the U.S. standardization manual reports data regarding errors, near-miss responses, and prompts. The standardization manual and the Hsieh and Riley (1997) study provide data on the interference index. The LaRue et al. (1999), Ponton et al. (1996), and D'Elia et al. (1996) normative data were collected from participants residing in the United States. The Hsieh and Riley (1997) data were collected from participants residing in the mainland People's Republic of China.
In this chapter, normative publications are reviewed in ascending chronological order. The text of study descriptions contains references to the corresponding tables identified by number in Appendix 5. Table A5.1, the locator table, summarizes information provided in the studies described in this chapter.
SUMMARIES OF THE STUDIES
This section presents critiques of the normative studies for the CTT.
[CTT.1] D'Elia, Satz, Uchiyama, and White, 1996
This is the original standardization of the CTT. The manual reports normative data from a sample of 1,528 healthy, normal individuals residing in a variety of settings in diverse regions of the United States. Participants were excluded if there was a history of head trauma, neurological disorder, or substance abuse. The data were collected during the course of several norming studies with distinct samples, including medically and psychiatrically normal participants from a longitudinal cardiovascular epidemiological study that has been ongoing since 1960; medically and psychiatrically normal pilots from four major U.S. commercial airline manufacturing corporations undergoing a yearly medical examination as part of a nationally mandated Federal Aviation Administration/Equal Employment Opportunity Commission study to obtain normative data on neuropsychological functioning of pilots across the age span; medically and psychiatrically normal residents living in an independent retirement community in southern California; medically and psychiatrically healthy, HIV-negative, bisexual and homosexual men participating in a multi-center epidemiological study; and medically and psychiatrically healthy African American men living in Los Angeles with no history of drug/alcohol abuse, participating in a larger study of the neuropsychological, medical, and psychosocial consequences of polydrug abuse and HIV. The data are stratified by age and education. There are five age categories: 18-29, 30-44, 45-59, 60-74, and 75-89. For each age category, data are reported for performance of those with education of ≤8, 9-11, 12, 13-15, 16, and ≥17 years. The sample is primarily male; women comprise only 12% of the sample. The manual states:
Gender and the interactions between gender and age were not significantly related to CTT raw scores
after the effects of age were removed, explaining between 0.4% to 2.4% of the variance. Therefore, the relatively small proportion of women in the normative sample does not constitute a threat to either the validity or the utility of the CTT. (D'Elia et al., 1996)
Spanish-language administration instructions and preliminary normative data for Hispanics are provided in the manual. The preliminary normative data are from a sample of healthy, normal Hispanics living in southern California, participating in a large, ongoing normative study. The Hispanic data are reported separately since all participants in this subsample were educated outside the United States and were primarily Spanish-speaking or had Spanish as their first language. Data for Hispanics are presented by four age categories: 17-29, 30-39, 40-49, and 50-75 years. The normative data contained in the standardization manual are not reproduced here, and the interested reader is referred directly to the publication for further information.
Study strengths
1. Sample composition is well described in terms of exclusion criteria.
2. Performance is reported by age and education intervals.
3. Data reporting includes means and SD scores for each age/education interval.
4. Age group intervals are generally adequate.
Considerations regarding use of the study
1. Sample size within each of the 30 age/education categories is not indicated.
2. No information on the IQ of participants is reported, although the data are presented by age/education intervals.
[CTT.2] Ponton, Satz, Herrera, Ortiz, Urrutia, Young, D'Elia, Furst, and Namerow, 1996 (Tables A5.2 and A5.3)
This study presents normative data stratified by age and education for Spanish-speaking adults' performance on the Neuropsychological Screening Battery for Hispanics (NeSBHIS), which contains the CTT. This is the initial report from an ongoing project. The sample consists of 300 volunteers (180 female, 120 male) recruited through fliers and advertisements posted at community centers and churches in Los Angeles County, California (Santa Ana, Pasadena, Pacoima, Montebello, and Van Nuys). The sample was primarily right-handed (95%). Regarding language, 210 were monolingual Spanish and 90 were rated by the examiner to be bilingual. The average (SD) duration of residence in the United States was 16.4 (14.4) years; however, 55% of the total sample had lived in the United States less than 15 years, and half of those participants had less than 6 years of residence in this country. Sixty-two percent of the sample were born in Mexico, 15% in Central America, and 23% in other Latin countries. Exclusion criteria included a history of neurological disease, psychiatric disorder, alcohol or drug abuse, or head trauma. Participants ranged in age from 16 to 75 years (mean = 38.4 [13.5] years). Whereas the 30-39 and 40-49 age groupings are adequately narrow, the 16-29 and 50-75 age groupings are somewhat broad. The data are reported by age and education groupings. The tables separately present data for males and females.
Study strengths
1. Sample composition is well described in terms of exclusion criteria.
2. Educational levels are reported.
3. Mean and SD scores are reported.
4. Age group intervals are generally adequate for younger samples (w ...
... age groupings.
2. Good description of test stimuli and administration procedures.
3. Information on geographic recruitment area is provided.
4. Mean time in seconds is reported.
Considerations regarding use of the study
1. No exclusion criteria.
2. No information regarding gender or IQ, and cursory data on educational level are provided only for those 17-19 years old.
3. Small individual cell sizes.
4. No SDs reported.
[STROOP.4] Eson, Personal Communication (Comalli Version) (Table A6.3)
Eson provides Stroop data on 63 older participants in four age groupings that reflect the following mean ages: 63.2 (n = 15), 67.0 (n = 16), 72.0 (n = 16), and 78.3 (n = 16). The Comalli test stimuli and administration procedures were utilized. Means and SDs are reported.
Study strengths
1. Large overall sample size, and while individual cell sizes are small, they are for very restricted age ranges.
2. Test stimuli and administration procedures are specified.
3. Mean time in seconds and SDs are reported.
Considerations regarding use of the study
1. No information on exclusion criteria, education, gender, IQ, or other characteristics.
[STROOP.5] Stuss, Ely, Hugenholtz, Richard, LaRochelle, Poirier, and Bell, 1985 (Comalli Version) (Table A6.4)
These authors collected Stroop data on 20 control participants (13 male, seven female) in Canada as a part of their investigation of the neuropsychological effects of closed head injury. Participants spoke either English or French. Mean age was 29.2 (12.0), mean years of education was 12.5 (2.0), and mean WAIS IQ was 106.6 (13.4). Participants were paid $15 for their participation. The Comalli test stimuli and administration procedures were employed. The mean and SD for time in seconds to name colors were 64.0 (12.9) for the control group. (Data from the two other trials are not provided.) Performance on Color Naming was significantly depressed in the head injury group relative to controls; groups did not differ in word reading or color interference.
Study strengths
1. Data provided on gender, age, education, IQ, language, and geographic area.
2. Test stimuli and administration procedures are specified.
3. Mean and SD for time to name colors are reported.
Considerations regarding use of the study
1. No information on exclusion criteria.
2. Data were collected in Canada with at least some participants French-speaking; cultural and linguistic factors may limit usefulness for clinical interpretation in the United States.
3. Data from the word-reading and color-interference trials are not reported.
4. Small sample size.
5. Undifferentiated age range.
[STROOP.6] Boone, Miller, Lesser, Hill-Gutierrez, and D'Elia, 1990 (Comalli Version) (Table A6.5)
Data were collected on 61 middle-aged and older individuals ranging in age from 50 to 79 recruited as controls in southern California through newspaper ads, flyers, and personal
STROOP TEST
contacts as part of a study of the effect of aging on executive abilities. Participants had no history of psychotic, major affective, or alcohol and other drug-dependence disorders and spoke English fluently (a handful of participants spoke English as a second language). Participants were excluded if there was a history of neurologic disease, such as stroke, Parkinson's disease, or seizure disorder. Also excluded were individuals with laboratory findings showing serious metabolic abnormalities (e.g., low sodium level, elevated glucose level, or thyroid or liver function abnormalities). Eighteen percent of the original sample of 74 were eventually excluded due to the presence of previously unidentified strokes or other significant lesions documented on MRI (n = 9), metabolic abnormalities or undiagnosed medical illness (n = 2), or evidence from laboratory studies and EEG findings of alcohol abuse and substance intoxication (n = 2). The final sample (n = 61) included 25 men and 36 women grouped by three age decades: 50-59 (n = 25), 60-69 (n = 21), and 70-79 (n = 15). All but 10 participants were white: four were African-American, three were Asian, and three were Hispanic. Mean educational level was 14.34 (2.63) and mean WAIS-R FSIQ was 113.79 (13.51). The Comalli version of the Stroop was administered (i.e., word reading, color naming, and color interference). Mean time in seconds as well as number of errors, with SDs, are presented by age grouping for each card. A significant decline with age was observed for word reading and color naming, with a trend toward a decline with age on color interference.
Study strengths
1. Overall sample size is large, although individual cell sizes are small.
2. Data are presented by age groupings.
3. Good exclusion criteria.
4. Information regarding gender, educational level, IQ, geographic area, ethnicity, fluency in English, and recruitment procedures is provided.
5. Test stimuli and procedures are specified.
6. Means and SDs are reported for both time and errors.
Considerations regarding use of the study
1. The sample size within each age group interval is small (see 1 above).
2. High educational and IQ levels of the sample.
[STROOP.7] Boone, Ananth, Philpott, Kaur, and Djenderedjian, 1991 (Comalli Version) (Table A6.6)
Stroop data were collected on 16 controls as part of a study on the neuropsychological characteristics of obsessive-compulsive disorder (OCD). Participants were recruited in southern California from newspaper advertisements and from siblings (n = 9) of OCD patients. Medical exclusion criteria included history of alcohol or drug abuse, head injury, seizure disorder, cerebral vascular disease or stroke, current or past psychiatric disorder, or any renal, hepatic, or pulmonary disease. Participants included nine women and seven men, and 19% were left-handed (n = 3). Fourteen were Caucasian, and two were Asian; and all were fluent in English. Two participants had a history of learning disability. Mean age was 35.8 (13.7), mean educational level was 15.2 (2.8), and mean WAIS-R FSIQ was 109.1 (10.9). The Comalli version of the Stroop (i.e., word reading, color naming, color interference) was administered. Mean (SD) for time in seconds to complete the color-interference portion of the test was 112.9 (22.5). Controls and patients did not differ in test performance.
Study strengths
1. Good exclusion criteria, with the exception that two participants had histories of learning disability.
2. Information regarding age, education, FSIQ, gender, handedness, fluency in English, ethnicity, recruitment procedures, and geographic area is reported.
3. Information on test stimuli and administration procedures is provided.
4. Mean time in seconds and SD are provided but only for the color-interference card.
Considerations regarding use of the study 1. Small sample size. 2. Undifferentiated age range.
120
TESTS OF ATTENTION AND CONCENTRATION
3. High educational level; two participants had a history of learning disability. 4. No data reported for word-reading and color-naming trials. [STROOP.8] Demick and Harkins, 1997 (Comalli Version) (Tables A6.7-A6.10)
The sample consists of 231 individuals recruited in Massachusetts who participated in a study assessing the relationship between field dependence-independence (FDI) cognitive style and driving behavior. Participants were community-dwelling individuals who in telephone screening denied any history of major impairment in perception, cognition, or motor execution and described themselves as having good overall health; corrected visual problems were allowed. The average educational level of the sample was high school plus some college courses completed. The Comalli cards and Kaplan administration procedures (i.e., color naming, word reading, color interference) were employed. Means, SDs, and ranges for time in seconds, errors, color difficulty factor (total time on B/total time on A), and interference factor (total time on C - total time on B) are provided for four age groupings (20-39, 40-59, 60-74, 75+ years).
Study strengths 1. Overall sample size is large, with individual cell sizes exceeding 50. 2. Data are presented by age groupings. 3. Probably adequate exclusion criteria. 4. Information regarding gender, overall educational level, and geographic region is provided. 5. Test stimuli and procedures are indicated. 6. Means, SDs, and ranges are provided. Consideration regarding use of the study 1. No information regarding intellectual level.
Other comments 1. Theoretical issues concerning the Stroop (e.g., process vs. achievement measures, identification of a cognitive style) are discussed.
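As a rough illustration of the two derived scores reported by Demick and Harkins, the following Python sketch computes the color difficulty factor (total time on B divided by total time on A) and the interference factor (total time on C minus total time on B). The function name and the example times are hypothetical and chosen only for illustration; they are not values from the study's tables.

```python
def stroop_derived_scores(time_a: float, time_b: float, time_c: float) -> dict:
    """Return the two derived Stroop scores described for Demick and Harkins (1997).

    time_a, time_b, time_c are total completion times in seconds for
    cards A, B, and C. The function name is hypothetical.
    """
    if time_a <= 0:
        raise ValueError("time on card A must be positive")
    return {
        # Color difficulty factor: total time on B / total time on A
        "color_difficulty": time_b / time_a,
        # Interference factor: total time on C - total time on B
        "interference": time_c - time_b,
    }


# Hypothetical times in seconds, for illustration only
scores = stroop_derived_scores(time_a=40.0, time_b=60.0, time_c=110.0)
print(scores)
```

The resulting values would then be compared with the means and SDs reported for the matching age grouping.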
[STROOP.9] Boone, 1999 (Comalli Version) (Table A6.11)
The author obtained Stroop data on 155 middle-aged and older individuals (age range 45-84) recruited as described by Boone et al. (1990); data from the 1990 study were included in the 1999 publication. Mean age of the sample was 63.07 (9.29), mean years of education was 14.57 (2.55), and mean WAIS-R FSIQ was 115.41 (14.11). Fifty-three were male and 102 were female. Medical and psychiatric exclusion criteria are the same as in the 1990 publication, with the exception that participants with significant white-matter hyperintensities documented on MRI were retained in the sample. All participants considered themselves healthy, although 51 had some evidence of vascular illness (defined as cardiovascular disease and/or significant white-matter hyperintensities on MRI) based on self-report or evidence on examination of at least one of the following: current or past history of hypertension (n = 39), arrhythmia (n = 8), large area of white-matter hyperintensities on MRI (e.g., 10 cm²; n = 7), coronary artery bypass graft (n = 3), angina (n = 2), and old myocardial infarction (n = 1). Twenty-four participants were currently on cardiac and/or antihypertensive medications. The Comalli version of the Stroop was administered. Means and SDs for time in seconds to complete the color-interference portion of the test are provided. A stepwise regression analysis revealed that age and FSIQ were significant contributors to Stroop color-interference performance, accounting for 15% and 13%, respectively, of test score variance; educational level, gender, and vascular status did not account for a significant amount of unique test score variance. Stroop normative data are presented for color-interference time in seconds stratified by IQ and age (< 65 and ≥65; average, high average, and superior IQ).
Study strengths 1. Large overall sample size. 2. Presentation of the data by IQ and age groupings.
STROOP TEST
3. Comprehensive medical and psychiatric exclusion criteria including MRI brain scans on all participants. 4. Information regarding educational level, gender, geographic area, recruitment procedures, and fluency in English is provided. 5. Mean times in seconds and SDs for color interference are reported. 6. Test administration format is specified. Considerations regarding use of the study 1. Individual IQ by age groupings range in size from 16 to 37. 2. High IQ and educational level of the sample. 3. No data reported for word-reading and color-naming trials. [STROOP.10] Boone, Swerdloff, Miller, Geschwind, Razani, Lee, Gaw Gonzalo, Haddal, Rankin, Lu, & Paul, 2001 (Comalli Version) (Table A6.12)
Stroop performance was assessed in 22 male controls as part of a study on neuropsychological function in adult Klinefelter's syndrome. Participants were recruited from newspaper and radio ads and flyers in the southern California area and paid for their participation. Exclusion criteria included history of learning disability, major psychiatric disorder, substance abuse, or neurologic disorder. All participants were fluent in English. Mean age was 34.32 (14.81), mean years of education was 13.36 (2.15), mean WAIS-R (Satz-Mogel) VIQ was 106.46 (17.01), and mean PIQ was 107.46 (16.58). Means and SDs for the color-interference trial are reported.
Study strengths 1. Information regarding age, education, gender, IQ, geographic area, language, and recruitment procedures is provided. 2. Adequate exclusion criteria. 3. Means and SDs are reported.
Considerations regarding use of the study 1. Data are not stratified by age. 2. Small sample size. 3. All-male sample.
Kaplan Version
[STROOP.11] D'Elia, Satz, and Uchiyama, Unpublished Data (Kaplan Version) (Table A6.13)
These data were collected in 1993 and 1994 during the course of a Federal Aviation Administration/Equal Employment Opportunity Commission (FAA/EEOC)-mandated study to examine neuropsychological functioning of airline pilots. The sample consisted of 197 male, Caucasian airline pilots aged 40-59 employed by major airplane manufacturers in the United States. All pilots had recently passed their yearly comprehensive FAA physical examination. Data are presented in two age groupings: 40-49 (n = 118) and 50-59 (n = 79). Those 40-49 years old had an average of 16.1 (1.9) years of education, and those 50-59 years old had 15.6 (2.0) years of education. The Comalli cards and Kaplan administration procedures (i.e., color naming, word reading, color interference) were used to obtain Stroop data on the pilots as part of a 55-minute neuropsychological screening battery. Means and SDs for time in seconds, errors, and near-miss (i.e., self-corrected) errors are provided.
Study strengths 1. Overall sample size is large, with individual cell sizes exceeding 50. 2. Data are presented by age groupings. 3. Adequate exclusion criteria. 4. Information regarding gender, educational level, recruitment procedures, ethnicity, and occupation and some information regarding geographic region is reported. 5. Test stimuli and procedures are indicated. 6. Means and SDs for time and near-miss (i.e., self-corrected) errors are provided.
Considerations regarding use of the study 1. All-male sample. 2. High educational level of the sample. 3. No information regarding IQ.
[STROOP.12] Schiltz, Personal Communication (Kaplan Version) (Table A6.14)
The sample consists of 50 (28 male, 22 female) native English-speaking participants recruited from the University of California at Los Angeles undergraduate introductory psychology courses during 1988-1989. These were healthy normal adults aged 18-20 without a history of head trauma or loss of consciousness. Average years of education for the group was 13.36 (0.63, range 13-15). All students were required to participate in ongoing research as a part of their coursework, and students self-selected to the various studies based on the written descriptions of the studies. The Comalli stimulus cards and Kaplan administration procedures (i.e., color naming, word reading, color interference) were used as part of a larger neuropsychological battery assembled for the purpose of collecting norms. All participants were tested individually. Total battery length was 55 minutes, and the Stroop was administered about 30 minutes into the protocol. Means, SDs, and ranges in seconds are reported for the first half of each stimulus card as well as for each card in total. Performance time on the second 50 items can be calculated by subtracting the time to complete the first half of the card from the total time to complete the whole card.
Study strengths 1. The sample size is adequate for the restricted age interval. 2. Exclusion criteria were minimally adequate. 3. Data on gender composition, educational level, geographic area, and recruitment procedures are reported. 4. Test stimuli and administration procedures are specified. 5. Means for time in seconds and SDs are reported. 6. Data provided for the first half of each card as well as total.
Consideration regarding use of the study 1. No data on IQ.
[STROOP.13] Strickland, D'Elia, James, and Stein, 1997 (Kaplan Version) (Table A6.15)
Stroop data were collected in southern California on 42 African-American participants (15 males, 27 females) aged 19-41 with no remarkable history of neurologic, psychiatric, cardiovascular disease or substance abuse. Mean age for the whole sample was 30.17 (6.34) years; mean age of males was 31.93 (5.26) years, and that of females was 29.19 (6.75) years. Mean educational level for the sample was 14.76 (2.24) years. The Comalli stimulus cards and Kaplan administration procedures (i.e., color naming, word reading, color interference) were employed to obtain Stroop data. Mean times in seconds and SDs are provided for each of the three cards. Errors and near-miss (i.e., self-corrected) errors were tabulated. Women demonstrated significantly better performance than men on cards 1 and 2. There was a similar trend noted on card 3.
Study strengths 1. Adequate exclusion criteria. 2. Information regarding gender, ethnicity, age, educational level, and geographic area is reported. 3. Information regarding test stimuli and test administration procedures is provided. 4. Means and SDs for time in seconds, errors, and near-miss (self-corrected) errors are provided for each card.
Considerations regarding use of the study 1. Small sample size. 2. Undifferentiated age range. 3. High educational level of the sample. 4. No information regarding IQ.
[STROOP.14] Miller, 2003; Personal Communication (Kaplan Version) (Table A6.16)
The investigation used participants from the Multi-Center AIDS Cohort Study (MACS). The data were collected from a sample of seronegative homosexual and bisexual males for the purpose of establishing normative data
for neuropsychological test performance based on a large sample. There were 522 participants in the Color Naming, 521 in the Word Reading, and 692 in the Interference conditions. Mean age for the full sample used in the Interference condition was 40.57 (7.5) years, and mean education was 16.31 (2.3) years; 91.2% were Caucasian, 2.9% Hispanic, 5.5% black, 0.4% other. All participants were native English speakers. The three conditions of the Kaplan version of the Stroop were administered according to standard instructions. The data are partitioned by three age groups (25-34, 35-44, 45-59) x three educational levels (< 16, 16, > 16 years). Study strengths 1. The overall sample size is large, and most individual cells have more than 50 participants. 2. Normative data are stratified by age x education. 3. Information on age, education, ethnicity, and native language is reported. 4. Means and SDs are reported. Considerations regarding use of the study 1. All-male sample. 2. No information on IQ is reported. 3. No information on exclusion criteria.
Golden Version
[STROOP.15] Ingraham, Chard, Wood, and Mirsky, 1988 (Golden Version) (Table A6.17)
Data were gathered on 46 college students and college-educated adults in Tel Aviv using "the general format of Golden's 1978 version with new randomization, a bold typeface, and Hebrew lettering," which is carefully detailed. The sample consisted of 28 men and 18 women, with an average age of 28.4 (3.2) years and a range of 24-36 years. Exclusion criteria included prior psychiatric disorder, primary language which was not Hebrew, and prior familiarity with the Stroop Test. Means and SDs for number of items completed within 45 seconds for the three stimulus cards, as well as Golden's interference score, are provided. Performance on rapid word reading and color naming was significantly slower (by eight and three words, respectively) than the norms reported by Golden for English-speaking individuals, which the authors suggest may be due to longer speaking times because the Hebrew words were two-syllable. However, scores for the color-interference trial were not significantly different from English-language norms, although the interference score was significantly larger. The authors hypothesize that Hebrew speakers may show less of an interference effect because Hebrew readers are accustomed to reading words without vowels and determining meaning from context, allowing them on the color-interference trial to "read" the words as other than color names, thereby reducing any interference effect. No significant differences between men and women were detected.
Study strengths 1. Relatively large sample size for reasonably restricted age range. 2. Information regarding native language, educational level, gender distribution, and geographic area is reported. 3. Test stimuli and procedures are described. 4. Means and SDs for number of items completed and interference score are reported. 5. Data provided for Hebrew version.
Considerations regarding use of the study 1. Psychiatric exclusion criteria reported but no medical exclusion criteria. 2. No information regarding recruitment procedures or IQ level.
[STROOP.16] Connor, Franzen, and Sharp, 1988 (Golden Version) (Table A6.18)
Stroop data were obtained on 40 college student volunteers in West Virginia (17 male, 23 female) who ranged in age from 18 to 25, with the exception of one 32-year-old. The Golden version of the Stroop was administered with either standard instructions
as detailed in the test manual or standard instructions plus six suggestions ("looking at no more than three words at a time; focusing on only one letter in the word; remembering that the same color never occurs twice consecutively; going at an even, steady pace; trying not to become distracted or lose one's place; and not repeating an already-correct answer when correcting a mistake"). Participants were administered the Stroop at baseline (pretest), following five practice sessions (post-test), and at a 1-week follow-up. No effect of gender or instruction format was documented. A significant effect of practice was found between the pre- and post-test but not between the post-test and follow-up. Data are presented in means and SDs for number of items completed for the pretest, post-test, and follow-up sessions.
Study strengths 1. Information on the effects of practice, gender, and alternative instructions on Stroop performance is provided. 2. Information on age, gender, and geographic area, with some information on education and recruitment procedures, is reported. 3. Test stimuli and procedures are specified. 4. Data are presented in means and SDs for number of items completed.
Considerations regarding use of the study 1. Relatively small sample size. 2. Undifferentiated age range, although it is somewhat restricted. 3. No information on exclusion criteria or IQ. 4. Data are not broken down by gender or education.
[STROOP.17] Fisher, Freed, and Corkin, 1990 (Golden Version) (Table A6.19)
The authors collected Stroop data on 36 older controls (typically spouses of patients) from southern California as part of an investigation of Stroop performance in Alzheimer's disease. Mean age was 72.9 (8.3) years, mean educational level was 14.6 (2.7) years, and mean Blessed Dementia Scale score was (6.1). The sample included 13 males and 23 females. Participants had no history (as judged through medical records) of color blindness, cataracts, or glaucoma. The Golden Stroop Test stimuli and administration procedures were employed. Means and SDs are reported for number of items completed on each trial. Some participants (five female, three male) had difficulty discriminating between the colors blue and green on the color trial.
Study strengths 1. Data presented in a homogeneous age grouping. 2. Information is given regarding mean age, mean educational level, gender, mean Blessed Dementia Scale score, geographic area, and recruitment procedures. 3. Information is given on test stimuli and administration procedures. 4. Means and SDs reported for number of items completed. 5. Test administration format was described.
Considerations regarding use of the study 1. Relatively small sample size. 2. No information regarding IQ. 3. High educational level of the sample. 4. Unclear exclusion criteria.
[STROOP.18] Daigneault, Braun, and Whitaker, 1992 (Golden Version) (Table A6.20)
Stroop data were obtained on 128 French-speaking participants in Canada as part of a study investigating the effects of aging on prefrontal lobe skills. Participants were recruited through ads, trade union collaboration, and the help of a large sports center. Exclusion criteria included consumption of more than 24 beers, five bottles of wine, or 15 ounces of spirits per week; consumption of cocaine, LSD, or psychostimulants; any neurological or psychiatric consultation, psychoactive medication, head trauma with hospitalization, or major surgery (e.g., cardiac). Participants were divided into two age groupings: 20-35, with a mean of 27.71 (4.05) years (n = 70), and 45-65, with a mean of 56.62 (5.29) years (n = 58). The younger group contained 38 men and 32 women; they were primarily specialized blue-collar
workers, although some specialized white-collar and unskilled blue-collar professions were represented. The older group contained 30 men and 28 women, and slightly more than half were specialized blue-collar workers, with some unskilled blue-collar professions, specialized white-collar occupations, and professional occupations represented. The mean educational level of the younger group was 12.36 (2.09) years, and that of the older group was 12.11 (3.63) years. Mean number of items completed and SDs for the color-interference portion of the test are reported. The two age groups differed significantly in test performance, with the younger group outperforming the older group.
Study strengths 1. Good exclusion criteria. 2. Large overall sample size, and each of the two age groupings has more than 50 participants. 3. Information regarding educational level, gender, occupations, geographic area, and recruitment procedures is provided. 4. Information on test stimuli and administration procedures is reported. 5. Means and SDs for number of items completed on part C are provided.
Considerations regarding use of the study 1. Data were obtained on French-speaking participants in Canada; thus, it is unclear whether these data are appropriate for clinical interpretation on English-speaking individuals in the United States. 2. No information regarding IQ (although mean scores on the vocabulary subtest of the French-language WAIS analog are reported). 3. No data provided for the first two sections of the Stroop Test. 4. Test administration format may have been altered (i.e., participants appeared to scan the stimulus cards across rows from left to right).
[STROOP.19] Swerdlow, Filion, Geyer, and Braff, 1995 (Golden Version) (Table A6.21)
The authors collected Stroop data on 72 "normal" controls (34 males, 38 females) recruited through newspaper ads and posted advertisements. No subject had a history of psychiatric illness, substance abuse or dependence, recreational drug use in the month prior to testing, schizophrenia in a first-degree relative, sustained loss of consciousness, severe neurologic or medical illness, or psychotropic medication use. Three participants were excluded based on a urinalysis positive for illicit drug use. Participants were divided into "psychosis-prone" (n = 26) and "non-psychosis-prone" (n = 46) groups based on MMPI criteria (Goldberg index ≥60 and F ≥70 or Wiggins Psychoticism index ≥60). Psychosis-prone individuals scored significantly below non-psychosis-prone participants on the interference trial and "interference ratio." Women performed significantly better than men on color naming. Means and SDs are reported for word reading, color naming, interference, and the interference ratio for the psychosis-prone and non-psychosis-prone groups (and for subgroups based on MMPI scales that determine psychosis-proneness) and for men and women separately.
Study strengths 1. Excellent exclusion criteria. 2. Information provided regarding gender and recruitment strategies. 3. Large overall sample size. Considerations regarding use of the study 1. No information regarding age, education, IQ, ethnicity, or geographic area.
[STROOP.20] Ivnik, Malec, Smith, Tangalos, and Petersen, 1996 (Golden Version) (Table A6.22)
This study presents normative data for performance on the Golden (1978) version of the Stroop Test obtained on 356 individuals between the ages of 56 and 94, who participated in the ongoing Mayo Older Americans Normative Studies (MOANS), a project to develop normative data for elderly individuals on various neuropsychological tests. The data are derived from a population of "almost exclusively Caucasian older adults who live in
an economically stable region of the United States" (the area surrounding Rochester, MN). All participants were community-dwelling, had no active neurologic or psychiatric disorder, and had undergone recent physical exams. Data are reported in discrete age ranges. Age categorization used the midpoint interval technique. The raw score distribution at each midpoint age was "normalized" by assigning standard scores with a mean of 10 and SD of 3, based on actual percentile ranks. The authors provided tables of age-corrected norms for each age group. The procedure for clinical application of these data is described in the original article (Ivnik et al., 1996) as follows:
first select the table that corresponds to that person's age. Enter the table with the test's raw score; do not use "corrected" or "final" scores for tests that might present their own age- or education-adjustments. Select the appropriate column in the table for that test. The corresponding row in the leftmost column in each table provides the MOANS Age-Corrected Scaled Score . . . for your subject's raw score; the corresponding row in the rightmost column indicates the percentile range for that same score.
Mean and SD scores for performance by age are not reported; however, the raw performance score can be easily translated to percentile performance scores (and standard scores) using the data tables. MOANS scaled scores by age and education level (A&E-MSS) have to be empirically derived using the following equation:
\[ \text{A\&E-MSS}_{\text{Stroop}} = K + (W_1 \times \text{A-MSS}_{\text{Stroop}}) - (W_2 \times \text{Education}) \]
where $K$ is a constant for each test, $W_1$ is a weight to be applied to the age-corrected MOANS scaled score, and $W_2$ is a weight to be applied to the person's education. For the Stroop Test, the values are as follows:
Trial           K       W_1     W_2
Word            3.47    1.10    0.34
Color           1.88    1.10    0.23
Interference    1.38    1.09    0.19
Analyses did not suggest that a performance correction was necessary for gender.
Study strengths 1. Minimally adequate exclusion criteria. 2. Information regarding age, education, gender, handedness, ethnicity, recruitment procedures, and geographic area is reported. 3. The data are stratified by age group based on the midpoint interval technique. 4. Sample sizes for the five age groupings spanning ~79 exceed 50. 5. The test version and scoring procedures are specified.
Considerations regarding use of the study 1. The measures proposed by the authors are quite complicated and might be difficult to use in clinical practice. 2. No information on IQ.
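The education adjustment given for the Ivnik et al. (1996) norms can be sketched in Python as follows. The function and dictionary names are hypothetical, and the assignment of the printed values to K, W1, and W2 for each trial is an assumption about the layout of the weights table; it should be checked against the original article before any clinical use. The text does not state whether the result is rounded, so the sketch returns the unrounded value.

```python
# Assumed reading of the weights table: (K, W1, W2) for each Stroop trial.
MOANS_STROOP_WEIGHTS = {
    "word": (3.47, 1.10, 0.34),
    "color": (1.88, 1.10, 0.23),
    "interference": (1.38, 1.09, 0.19),
}


def age_education_mss(trial: str, a_mss: float, education_years: float) -> float:
    """Compute A&E-MSS = K + (W1 * A-MSS) - (W2 * Education).

    trial is one of "word", "color", "interference"; a_mss is the
    MOANS age-corrected scaled score; education_years is years of education.
    """
    k, w1, w2 = MOANS_STROOP_WEIGHTS[trial]
    return k + (w1 * a_mss) - (w2 * education_years)


# Hypothetical case: age-corrected scaled score of 10 with 12 years of education
for trial in MOANS_STROOP_WEIGHTS:
    print(trial, round(age_education_mss(trial, 10, 12), 2))
```

Under this reading, an average age-corrected score of 10 with 12 years of education stays close to 10 on all three trials, which is consistent with the adjustment being centered near a typical education level.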
[STROOP.21] Doan and Swerdlow, 1999 (Golden Version) (Table A6.23)
The Golden Stroop version translated into Vietnamese was administered to 30 native Vietnamese speakers, while the standard Golden version was given to 30 native English speakers. All participants resided in the San Diego area. The average age of the 13 males and 17 females in the Vietnamese group was 34.4 (13.1) years, with a range of 19-68, and the average educational level was 14.3 (3.5) years; 47% were students, and the sample averaged 16 (7) years residence in the United States. The 12 male and 18 female native English speakers averaged 31.2 (11.9) years of age, with a range of 19-57, and average educational level was 15.4 (1.6); 30% were students. All but one subject were right-handed (one Vietnamese speaker was ambidextrous). The colors employed for the Vietnamese translation were blue, brown, and red because the Vietnamese word for green is used for both blue and green. Means and SDs for number of responses for the word reading, color naming, and interference sections and the Golden interference score are provided. No significant differences in performance
were found between the Vietnamese speakers and English speakers or between monolingual Vietnamese speakers and bilingual speakers.
Study strengths 1. Data provided for Vietnamese test version. 2. Information regarding geographic area, language, ethnicity (although incomplete), age, education, handedness, and gender is provided. 3. Test stimuli and procedures are described. 4. Means and SDs for number of items completed and interference score are reported.
Considerations regarding use of the study 1. No exclusion criteria are reported. 2. Data are not stratified by age. 3. Small sample sizes. 4. No information regarding IQ or recruitment procedures.
[STROOP.22] Rapport, Van Voorhis, Tzelepis, and Friedman, 2001 (Golden Version) (Table A6.24)
Stroop data were collected on 32 controls (19 males, 13 females) who were either undergraduates at a large midwestern university or residents in the neighboring metropolitan area as part of a study of executive function in adults with ADHD. Exclusion criteria included history of significant neurologic disorder (head injury, stroke, seizure disorder), current substance abuse, or scores greater than 1 SD higher than mean values on ADHD behavior rating scales. Mean age was 33.2 (13.2), mean years of education was 14.8 (2.5), and mean WAIS-R FSIQ was 108.0 (7.7). Means and SDs for the word trial and color trial are reported.
Study strengths 1. Generally adequate exclusion criteria. 2. Information regarding age, education, gender, IQ, and geographic location is reported. 3. Test stimuli and procedures are described.
Limitations regarding use of the study 1. Small sample size. 2. Data are not stratified by age. 3. No specific information regarding recruitment strategies. 4. High educational level.
[STROOP.23] Rosselli, Ardila, Santisi, Arecco, Salvatierra, Conde, and Lenis, 2002 (Golden Version) (Table A6.25)
Stroop Test data were obtained on 40 English monolinguals, 71 Spanish-English bilinguals (90% with Spanish as the first language), and 11 Spanish monolinguals in south Florida who were primarily college students, their family members, and friends. All were right-handed. The average age of the 32 male and 39 female bilinguals was 31.98 (13.14), and average years of education was 14.92 (2.35). The 13 male and 27 female English monolinguals averaged 35.90 (13.08) years of age and 15.35 (2.45) years of education. The three male and eight female Spanish monolinguals averaged 40.91 (15.17) years of age and 14.25 (3.49) years of education. None had any psychiatric or neurological conditions, and all had normal MMSE scores. All bilinguals had had at least some formal education in English and averaged 19 years speaking the second language. A Spanish version of the Stroop Test (Rey, unpublished) was administered to the monolingual Spanish speakers and to the bilinguals, who also completed the English version (order of administration of the two versions was randomized). Number of errors and time in seconds to complete all stimuli (instead of the number of items completed in 45 seconds) were collected. Means and SDs for the three trials are reported for the three groups as well as for three subgroups of bilinguals (unbalanced-English-dominant, unbalanced-Spanish-dominant, and balanced). Groups did not significantly differ in Stroop scores, with the exception of slower performance in the bilinguals relative to English speakers on color naming in English. Testing bilinguals in their second language was associated with a 10%-15% increase in time for color naming and a 5%-10% increase in time for color interference. Bilinguals who were more facile in
Spanish were significantly slower on the English Stroop trials, while bilinguals more fluent in English were slower on the Spanish Stroop.
Study strengths 1. Data on Spanish-language and English-language Stroop performance in a large sample of bilingual participants (n = 71) as well as a smaller group of monolingual Spanish speakers are provided. 2. Adequate exclusion criteria. 3. Information provided regarding age, education, gender, and handedness, as well as comprehensive information on language characteristics. Considerations regarding use of the study 1. The administration format was altered (time to finish the stimuli rather than number of responses at 45 seconds). 2. No information regarding IQ. 3. Data are not stratified by age. 4. High educational level.
[STROOP.24] Lopez-Carlos, Salazar, Villasenor Saucedo, and Peña, 2003 (Golden Version) (Tables A6.26-A6.29)
The Golden version of the Stroop was used in a study investigating the effects of demographic variables on cognitive abilities in Spanish-speaking individuals with low education. The total sample included 115 volunteer monolingual Latino men with ≤10 years of formal education, who worked at manual labor in the Los Angeles area (n = 65) and Jalisco, Mexico (n = 50). Volunteers were recruited from posted advertisements in workplaces and personal solicitations. The mean age for the sample was 28.23 (8.74) years and mean education was 6.66 (2.54) years. Exclusion criteria consisted of any self-report of head injury, neurological insults, prenatal or birth complications, learning disabilities, psychiatric problems, or substance abuse. Scores on the Beck Depression Inventory-II-Spanish Version (Mean = 12.92, SD = 8.94) and the Beck Anxiety Inventory-Spanish Version (Mean = 6.60, SD = 6.03) are also reported. Standard administration procedures were used. Participants were tested in Spanish. Selected subtests from the WAIS-III (Mexican version) were included in the battery. WAIS-III Block Design raw scores are included in Tables A6.26-A6.29. Mean performance on the Marin Marin Acculturation Scale for the Los Angeles sample was 17.61 (6.19). For the Los Angeles group, Picture Vocabulary subscale scores from the Woodcock-Johnson III Tests of Achievement (Mean = 5.36, SD = 6.01) and the Bateria Woodcock-Muñoz-R, Pruebas de habilidad cognitiva-R (Mean = 29.77, SD = 5.37) were used to assess level of English and Spanish word expressive abilities. The results are presented by years of education (0-6, 7-10), age (18-29, 30-49 years), and education and age (18-29 years old, 0-6 and 7-10 years of education; 30-49 years old, 0-6 and 7-10 years of education). The authors found a significant difference (p < 0.05) in performance on the Stroop (color and color/word interference) between the two education groups. However, the two age groups did not differ significantly on any of the sections of the Stroop. No significant differences in scores between individuals from Los Angeles and Mexico were noted.
Study strengths 1. Large sample for age and education groups. 2. Data availability for a healthy, employable, monolingual Spanish-speaking group with low education level. 3. The sample is stratified into two education groups, two age groups, and four age × education groups. Additionally, data are available for United States and Mexico. 4. The sample composition is well described in terms of age, education, gender, geographic area, and recruitment procedures. 5. Adequate exclusion criteria. 6. Means and SDs are reported. 7. WAIS-III Block Design subtest scores are presented. Considerations regarding use of the study 1. All-male sample. 2. Small sample sizes for the combined age and education groups.
STROOP TEST
[STROOP.25] Cohen, Brumm, Zawacki, Paul, Sweet, and Rosenbaum, 2003 (Golden Version) (Table A6.30)
Twenty males who averaged 30.5 (10.7) years of age and 11.8 (3.3) years of education were used as controls in a study of cognitive function in domestic violence perpetrators at the University of Massachusetts Medical Center; most of the controls were hospital workers. Mean WAIS-R FSIQ was 100.7 (11.0), mean VIQ was 101.8 (10.6), and mean PIQ was 100.2 (11.2); 16.5% admitted to at least some past drug use, and 9.3% admitted to prior alcohol-related problems. Three reported previous head injury (two mild, one moderate), 10% had experienced learning difficulties, and 12.5% admitted to childhood behavioral problems. Means and SDs for the interference trial are provided. Study strengths 1. Information provided regarding gender, education, age, IQ, previous learning difficulties, childhood behavioral problems, head injury, substance use/abuse, and geographic area. 2. Means and SDs reported for interference trial. 3. Data for a sample of average IQ and education. Considerations regarding use of the study 1. Small sample size. 2. No apparent exclusion criteria; individuals with a history of substance use/abuse, learning and/or childhood behavioral problems, and head injury included. 3. Only males included. 4. Data provided only for interference trial. 5. Data are not stratified by demographic factors.
[STROOP.26] Moering, Schinka, Mortimer, and Graves, 2004 (Golden Version) (Table A6.31)
Stroop data were collected on 236 older African Americans, aged 60-84, living in private residences in the Tampa, Florida, area.
Participants were recruited through the use of epidemiological sampling procedures. Exclusion criteria included endarterectomy, transient ischemic attack, cerebrovascular accident, Parkinson's disease, or traumatic head injury with loss of consciousness and retrograde amnesia; no information on psychiatric disorders was collected. Fifteen individuals were excluded on the basis of the above factors, as well as one outlier whose Stroop data were 3 SDs from the mean of the sample and clearly separated from the remainder of the scores. The sample was divided into two age groupings: 60-71 (n = 111) and 72-84 (n = 125). Younger participants scored significantly better than older participants on all trials, with an effect size of > 0.25. In addition, education and gender were significant predictors of performance, with higher levels of education and female gender associated with better performance. The majority of the sample (72.5%) had12 years), for a total of 12 separate subgroups with sizes ranging 2-56. Means and SDs are reported. In addition, adjustments for education and gender to be applied to raw scores are provided, as well as data on percentile scores for raw scores for each age group. Study strengths 1. Large sample sizes for the two age groupings. 2. Data are stratified by age, education, and gender, although individual cell sizes ranged 2-56. 3. Good exclusion criteria for neurological conditions, although psychiatric conditions or chronic medical illnesses (e.g., hypertension) were not used as exclusion criteria and could at least partially explain the poorer performance observed in this sample relative to Caucasian individuals. 4. Data provided for an African-American population; however, most had a low level of education (although this was apparently representative of the communities in which they lived).
TESTS OF ATTENTION AND CONCENTRATION
5. Information on geographic area and recruitment strategies is provided. 6. Means and SDs are reported, as well as percentile equivalents of raw scores and score adjustments for education and gender.
Considerations regarding use of the study 1. Issues regarding exclusion criteria, lowered educational level, and small individual cell sizes. 2. No data available on IQ level.
Dodrill Version
[STROOP.27] Dodrill, 1978a (Dodrill Version) (Table A6.32)
Dodrill collected control data on 50 participants in the state of Washington as a part of his investigation of the cognitive correlates of epilepsy. Thirty were male and 20 were female, and mean age and education level were 27.34 (8.41) years and 11.96 (2.01) years, respectively. Forty-nine were Caucasian, with one listed as non-Caucasian. Nine were students, six were housewives, 20 were unemployed, and 15 were employed. Participants were recruited through employment facilities, churches, a community college, a public high school, a volunteer service agency, and a semisheltered workshop. Participants underwent a detailed neurological history, and those with diseases or other conditions affecting the nervous system were excluded. The Dodrill version of the Stroop was administered. Means and SDs are reported for time in seconds to complete parts I and II. In addition, means and SDs are provided for part I + part II, and part II - part I. Using a cutoff of 93/94 seconds on part I, 7p% of controls were correctly classified. A cutoff of 150/151 seconds for part II - part I resulted in a 74% correct classification rate.
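As an illustration only, the two cutoffs quoted above can be applied to a pair of completion times as in the following sketch. The function name and return structure are hypothetical, and the sketch assumes that a cutoff written as 93/94 means times of 93 seconds or less fall on the control side and times of 94 seconds or more fall on the impaired side; it is not a reproduction of Dodrill's own scoring procedure.

```python
# Minimal sketch, assuming "93/94" means <= 93 s is within the control
# range and >= 94 s exceeds the cutoff (longer times = poorer performance).
PART_I_CUTOFF = 93       # seconds, Dodrill Stroop part I
DIFFERENCE_CUTOFF = 150  # seconds, part II minus part I


def classify_dodrill(part_i_seconds, part_ii_seconds):
    """Compare Dodrill Stroop times against the two quoted cutoffs.

    Hypothetical helper: returns the derived difference score and a
    flag for each cutoff, without making any clinical interpretation.
    """
    difference = part_ii_seconds - part_i_seconds
    return {
        "part_i_exceeds_cutoff": part_i_seconds > PART_I_CUTOFF,
        "difference": difference,
        "difference_exceeds_cutoff": difference > DIFFERENCE_CUTOFF,
    }


result = classify_dodrill(part_i_seconds=90, part_ii_seconds=200)
print(result)
```

The flags are kept separate because the source reports a separate classification rate for each cutoff rather than a combined rule.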
Study strengths 1. Adequate sample size (n = 50). 2. Information on age, education, gender, occupation, geographic area, ethnicity, and recruitment procedures is provided. 3. Test stimuli and procedures are specified.
4. Mean time in seconds and SDs are reported.
Considerations regarding use of the study 1. No information on IQ. 2. Apparently adequate exclusion criteria, although some controls were recruited from sheltered workshops. 3. Undifferentiated age range. [STROOP.28] Sacks, Clark, Pols, and Geffen, 1991 (Dodrill Version) (Table A6.33)
Stroop data were obtained on 12 male university student volunteers in Australia, ranging in age from 18 to 32 with a mean of 22.4 (5) years, as a part of the development of five alternate forms of the Dodrill Stroop. All participants had normal vision (20:20, as tested with a standard Snellen wall chart) and no evidence of color blindness (assessed through Ishihara charts). Participants averaged 13.7 (2.3) years of education. Mean abbreviated WAIS-R FSIQ, VIQ, and PIQ were 109.1 (9.5), with a range of 100-124; 108.4 (8.7), with a range of 100-124; and 106.6 (7.1), with a range of 97-120, respectively. The exact procedures used to develop the alternate forms are specified. All participants were administered all six forms of the test in 1 day with a 50-minute rest period between trials on each form. Order of completion of the six forms was randomized. Participants were halted at each error and instructed to correct the mistake before proceeding. Means and SDs for time in seconds are reported for each form. The forms were judged to be equivalent, although a significant practice effect was still present between the first and second test administrations. Sets of the six alternate forms are available from the test authors.
Study strengths 1. Data provided on six alternate forms and practice effects. 2. Information reported on education, gender, IQ, vision, age, and geographic area. 3. Test stimuli development and administration procedures are carefully described.
4. Means and SDs for time in seconds are reported for each form. Considerations regarding use of the study 1. Small sample size (n = 12). 2. All-male sample. 3. Data are collected in Australia; cultural differences may render the data questionable for clinical interpretation in the United States. 4. No exclusion criteria.
Victoria Version
[STROOP.29] Regard, 1981, cited in Spreen and Strauss, 1991, 1998 (Victoria Version) (Table A6.34)
Data were obtained on 40 right-handed young adults of average intelligence. Average age was 26.7 (range 20-35). The Victoria Stroop Test stimuli and procedures were employed. Means and SDs are reported for time and errors. Study strengths 1. Homogeneous age grouping. 2. Information regarding age, IQ, and handedness is provided. 3. Test stimuli and procedures are described. 4. Means and SDs for time and errors are reported. Considerations regarding use of the study 1. Fairly small sample size. 2. No information regarding educational level, gender, fluency in English, geographic recruitment area (assumed to be Canada), or exclusion criteria. [STROOP.30] Spreen and Strauss, 1991 (Victoria Version) (Table A6.35)
These authors collected Stroop normative data on 86 healthy older participants aged 50-94; average age was 68.5 (10.78) years. Mean years of education was 13.2 (3.1) years. The Victoria Stroop Test stimuli and administration procedures were used. Means and SDs are reported for time and errors for four age groupings: 50-59 (n = 19), 60-69 (n = 28), 70-79 (n = 24), and 80-94 (n = 15).
Study strengths 1. Data are presented by narrow age groupings. 2. Information is provided regarding mean age and mean educational level. 3. Test stimuli and procedures are well described. 4. Means and SDs are reported for time and errors. Considerations regarding use of the study 1. Unclear exclusion criteria (participants are described as "healthy"). 2. No information regarding IQ, gender, fluency in English, and geographic recruitment area (assumed to be Canada). 3. Small cell sample sizes. Trenerry Version [STROOP.31] Anstey, Matters, Brown, and Lord, 2000 (Trenerry Version) (Table A6.36)
Stroop data were obtained on 369 retired individuals residing in Anglican retirement villages in Australia and involved in a randomized controlled trial of exercise on falls risk and psychological well-being. There were 52 males and 317 females, ranging in age from 62 to 95, with a mean of 79.04 (6.59) years; average years of education was 11.25 (2.79). Exclusion criteria included Parkinson's disease, stroke, or heart attack. Sixty-six percent rated their health as good or very good, 18% rated their health as excellent, and 16% rated their health as fair or poor. The most common health problems were arthritis (65%), cataract (53%), hypertension (50%), glaucoma or poor vision (38%), lung problems (19%), and diabetes (7%). Seventeen percent of the sample had MMSE scores 10 cm 2), significant declines are detected in ACT performance (Boone et al., 1992). In fact, ACT may be one of the most sensitive cognitive tests to the presence of white-matter damage sustained through hyperintensities of probable vascular origin (Boone et al., 1992) or white-matter and/or frontal-limbic-reticular activating system disruption secondary to acceleration-deceleration closed head injury (Stuss et al., 1985).
METHOD FOR EVALUATING THE NORMATIVE REPORTS
Our review of the literature located three ACT normative reports published since 1987 (Anil et al., 2003; Stuss et al., 1987, 1988) as well as data from two studies examining the impact of age, education, IQ, gender, and medical illness on ACT performance (Boone et al., 1990; Boone, 1999), which included unique features such as large sample size (Boone, 1999) and reporting of a wide range of ACT scores (Boone et al., 1990).
To adequately evaluate the ACT normative reports, six key criterion variables were deemed critical. The first four of these relate to subject variables, and the remaining two relate to procedural variables. Minimal criteria for meeting the criterion variables were as follows.
Subject Variables
Sample Size
Fifty cases are considered a desirable sample size. Although this criterion is somewhat arbitrary, a large number of studies suggest that data based on small sample sizes are highly influenced by individual differences and do not provide a reliable estimate of the population mean.
Sample Composition Description
Given the evidence that ACT performance may be significantly impacted by medical status (e.g., vascular illness), information regarding medical exclusion criteria is critical. In addition, as discussed previously, information should probably also be provided regarding educational level, gender, psychiatric exclusion criteria, geographic region, ethnicity, occupation, handedness, and recruitment procedures, even though there are as yet no data indicating that these factors influence test performance.
Reporting of Age
Given the equivocal and modest relationship between age and ACT performance, ACT normative data probably do not need to be presented by age group intervals, but information on the ages of the normative samples should be provided.
IQ Group Intervals
Given the evidence that IQ may account for more unique test score variance than do demographic factors, information regarding IQ level should be reported for each subgroup, and preferably normative data should be presented by IQ intervals.
Procedural Variables
Description of the Administration Format Used
Given that different test administration formats involve differing lengths of distraction intervals, specific information regarding the delays should be provided.
Data Reporting
Means and standard deviations, and preferably ranges, for total score out of 60 are important. In addition, it is advantageous for data to be provided for each of the distraction intervals separately.
SUMMARY OF THE STATUS OF THE NORMS
In terms of subject variables, only one study provides data by IQ level (Boone, 1999), although IQ data are reported in a second study (Boone et al., 1990). Information on age, gender, education level, geographic area, and recruitment procedures is reported for all studies. In addition, medical, psychiatric, neurologic, and substance abuse exclusion criteria are described and judged to be adequate for all studies. Ethnic composition was indicated in two studies (Anil et al., 2003; Boone et al., 1990). Handedness data were provided only in the investigations conducted by Stuss and colleagues (Stuss et al., 1987, 1988). While all studies exceeded a total sample size of 50, only one study reached the criterion of 50 participants per individual grouping cell (Anil et al., 2003). In terms of procedural variables, information is available regarding the precise administration formats for all studies. Means and SDs are reported for total score in all but one study (Anil et al., 2003), and means and SDs for individual distractor delays are provided in all but one study (Boone, 1999). Practice effects are investigated in the reports by Stuss and colleagues (Stuss et al., 1987, 1988), and data on qualitative performance variables (perseverations, errors in letter sequence) are provided in Boone et al. (1990).
AUDITORY CONSONANT TRIGRAMS
Data are presented in ascending chronological order for two ACT versions separately: first for the 9-, 18-, and 36-second and then for the 3-, 9-, and 18-second delay version. The text of study descriptions contains references to the corresponding tables identified by number in Appendix 7. Table A7.1, the locator table, summarizes information provided in the studies described in this chapter. 1
SUMMARIES OF THE STUDIES
Data for 9-, 18-, and 36-Second Delay Version
[ACT.1] Stuss, Stethem, and Poirier, 1987 (Tables A7.2 and A7.3)
ACT baseline and 1-week retest data were collected on 60 participants in Canada, who were recruited through employment agencies and paid $10 for the two testing sessions. Participants ranged in age from 16 to 69, with a mean of 39.6 (2.62) years. Years of education ranged 8-20, with a mean of 14.5 (2.63). Thirty-three participants were male and 27 were female. Forty-nine were right-handed. None had a history of significant medical, neurological, or psychiatric disorder; substance abuse; or current psychotropic medication use. Participants were tested in their native language (English or French). Each three-consonant combination was presented at a rate of one consonant per second followed by a three-digit number. Participants were instructed to count backward by 3s from the number for random delays of 9, 18, and 36 seconds and then to recall the trigram. Practice trials were employed until participants demonstrated understanding of the procedures. Five trials were conducted for each delay interval, with intertrial delays of 2-5 seconds. The counting delays were extended from those employed by Cermak and Butters (1972), to minimize ceiling effects. A total score of 15 was possible for each of the three delays. An alternate form was employed on retesting.
1 Norms for children and adolescents are available in Baron (2004) and Spreen and Strauss (1998).
Test performance was not impacted by age, educational level, or gender. A practice effect for the 9- and 18-second delays was observed despite the alternate form. ACT data are provided by six age groupings (16-19, 20-29, 30-39, 40-49, 50-59, and 60-69) for baseline testing, retesting, and the two testing sessions combined for the three delay intervals separately. Data on gender distribution, handedness, mean age and SD, and mean years of education, SD, range, and frequency of "≤high school" and ">high school" for each age grouping are provided. In addition, data are presented by gender and educational level (≤high school, >high school) separately. Study strengths 1. Information regarding age, education, gender, and handedness for the total sample and for individual age groupings is provided. 2. Adequate exclusion criteria. 3. Information on practice effects is reported. 4. Precise description of test administration procedures. 5. Data presented by age groupings, gender groupings, and education groupings. 6. Data presented separately for each distraction interval. 7. Information on geographic area and recruitment procedures is provided. 8. Means and SDs for the test scores are reported. Considerations regarding use of the study 1. Small individual cell sizes (n = 10). 2. Data collected in Canada, with some test administrations conducted in French; cultural and linguistic factors may limit usefulness of data for clinical interpretation in the United States. 3. No information regarding IQ level.
[ACT.2] Stuss, Stethem, and Pelchat, 1988 (Tables A7.4 and A7.5)
In this publication, the authors supplement the data reported in 1987 by expanding the number of participants, increasing cell sizes
by collapsing the data from six to three age groupings (16-29, 30-49, and 50-69), and presenting the combined data from the two testing sessions in box plots, which has the advantage of visual display of data variability. Each of three age groupings contained baseline and 1-week retest data on 30 participants, none of whom had a positive psychiatric or neurologic history. Data are presented on gender distribution, handedness, mean age and SD, and mean years of education, SD, and range for each age group separately.
Study strengths 1. Large overall sample size (n = 90). 2. Information on practice effects. 3. Adequate exclusion criteria. 4. Information on gender, educational level, and handedness for each age grouping is presented. 5. Presentation of test score variability via box plots. 6. Test administration format is the same as in Stuss et al. (1987). 7. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Same as above: although the sample has been increased by 50%, the three age groupings still have only 30 participants each.
Data for 3-, 9-, and 18-Second Delay Version
[ACT.3] Boone, Miller, Lesser, Hill, and D'Elia, 1990 (Table A7.6)
Data were collected on 61 middle-aged and older individuals ranging in age from 50 to 79, recruited as controls in southern California through newspaper ads, flyers, and personal contacts as a part of ongoing research on late-life depression and psychosis. Participants had no history of psychotic, major affective, or alcohol or other drug dependence disorder and spoke English fluently. (A handful of participants spoke English as a second language.) Participants were excluded if there was a history of physical findings of neurological disease, such as stroke, Parkinson's disease, or seizure disorder. Also excluded were individuals with laboratory findings showing serious metabolic abnormalities (e.g., low sodium level, elevated glucose level, or thyroid or liver function abnormalities). Eighteen percent of the original sample of 74 were eventually excluded due to the presence of previously unidentified strokes or other significant lesions documented on MRI (n = 9), metabolic abnormalities or undiagnosed medical illness (n = 2), or evidence from laboratory studies and EEG findings of alcohol abuse and substance intoxication (n = 2). The final sample (n = 61) included 25 men and 36 women grouped by three age decades: 50-59 (n = 25), 60-69 (n = 21), and 70-79 (n = 15). All but 10 participants were white; four were African American, three were Asian, and three were Hispanic. Mean educational level was 14.34 (2.63) years, and mean WAIS-R FSIQ was 113.79 (13.51). No significant effect of age on ACT performance was documented in comparisons of the three age groups. Means and SDs are presented for ACT total score as well as for 3-, 9-, and 18-second delay for each age group separately. Total possible was 60 (15 points for each delay interval as well as 15 points for a five-trial, 0-delay condition). Means and SDs are also reported for number of perseverations and altered sequences. Perseveration was defined as the reporting of an incorrect letter which was used as an answer on the preceding trial; a total of 57 perseverations were possible. Altered sequence referred to reporting of correct letters but in the wrong position within the trigram; a total of 20 altered sequences were possible.
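The scoring definitions above can be made concrete with a short sketch. It is illustrative only and rests on assumptions not stated in the source: each recalled letter that belongs to the target trigram earns one point regardless of position, a trial counts as one altered sequence if any correctly recalled letter is out of position, and the function name `score_act` and its input format are hypothetical.

```python
def score_act(trials):
    """Score a list of (target, response) trigram pairs in administration order.

    Assumptions (not from the source): one point per recalled letter that
    is in the target, regardless of position; at most one altered sequence
    per trial; responses are strings of up to three letters.
    """
    total = 0
    perseverations = 0
    altered_sequences = 0
    previous_response = ""
    for target, response in trials:
        target, response = target.upper(), response.upper()
        recalled = set(response)
        correct = [letter for letter in recalled if letter in target]
        total += len(correct)
        # Perseveration: an incorrect letter that was given as an answer
        # on the immediately preceding trial.
        perseverations += sum(
            1 for letter in recalled
            if letter not in target and letter in previous_response
        )
        # Altered sequence: correct letters reported in the wrong position.
        if any(response.index(l) != target.index(l) for l in correct):
            altered_sequences += 1
        previous_response = response
    return {
        "total": total,
        "perseverations": perseverations,
        "altered_sequences": altered_sequences,
    }


example = [("QLX", "QLX"), ("SZB", "SBZ"), ("HJT", "HJZ")]
print(score_act(example))
```

With 20 trials of three letters each, this scheme yields the maximum of 60 points, 57 possible perseverations (none on the first trial), and 20 possible altered sequences quoted in the text.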
Study strengths 1. Information on IQ level, years of education, gender distribution, geographic area, recruitment procedures, ethnicity, and fluency in English is presented. 2. Data are reported in terms of total score but also by individual delay intervals; information is also provided on perseverations and altered sequences.
3. Comprehensive medical and psychiatric exclusion criteria, including MRI brain scans, on all participants. 4. Test administration format is described. 5. Means and SDs for the test scores are reported. 6. Data stratified by age.
Considerations regarding use of the study 1. Fairly small individual cell sizes (n = 15-25). 2. High average IQ level.
[ACT.4] Boone, Ananth, Philpott, Kaur, and Djenderedjian, 1991
ACT data were obtained on 16 controls (nine women, seven men) as part of an investigation of the neuropsychological characteristics of obsessive-compulsive disorder (OCD). Nine of the participants were siblings of OCD patients, while the remaining participants were recruited through newspaper ads and friends of OCD patients in the southern California area. Mean age was 35.8 (13.7) years, and mean educational level was 15.2 (2.8) years. Mean FSIQ, VIQ, and PIQ were 109.1 (10.9), 106.3 (13.0), and 111.8 (10.8), respectively. Medical exclusion criteria were history of alcohol or drug abuse, head injury, seizure disorder, cerebral vascular disease or stroke, psychosurgery, current or past psychiatric condition, or any renal, hepatic, or pulmonary disease. Mean ACT total score was 44.3, with an SD of 7.5. Data are not reproduced in this book.
Study strengths 1. Information regarding age, education, gender distribution, IQ, geographic area, and recruitment procedures is reported. 2. Comprehensive psychiatric and medical exclusion criteria. 3. Though not stated, test administration procedures are the same as those in Boone et al. (1990). 4. Means and SDs for the test scores are reported. Considerations regarding use of the study 1. Small sample size. 2. Data are not stratified by age or IQ level.
3. Data are presented in terms of total score only, with no information regarding distraction intervals. 4. Nearly high average IQ level.
[ACT.5] Boone, 1999 (Table A7.7)
The author obtained ACT data on 155 middle-aged or older individuals (ranging in age from 45 to 84 and recruited as described above for Boone et al., 1990; data from the 1990 study are included in the 1999 publication). The mean age of the sample was 63.07 (9.29) years, mean educational level was 14.57 (2.55) years, and mean FSIQ was 115.41 (14.11); 53 were male and 102 were female. Medical and psychiatric exclusion criteria are listed above, with the exception that participants with significant white-matter hyperintensities documented on MRI were retained in the sample. All participants considered themselves healthy, although 51 had some evidence of vascular illness (defined as cardiovascular disease and/or significant white-matter hyperintensities on MRI) based on self-report or evidence on examination of at least one of the following: current or past history of hypertension (n = 39), arrhythmia (n = 8), large area of white-matter hyperintensity on MRI (e.g., > 10 cm²; n = 7), coronary artery bypass graft (n = 3), angina (n = 2), and old myocardial infarction (n = 1). Twenty-four participants were currently on cardiac and/or antihypertensive medications. A stepwise regression analysis revealed that FSIQ, age, and vascular status were significant contributors to total ACT score, accounting for 17%, 6%, and 3% of test score variance, respectively; educational level and gender did not account for a significant amount of unique test score variance. ACT normative data are presented for total ACT score stratified by IQ and age (<65 and ≥65; average IQ, high average IQ, and superior IQ).
Study strengths 1. Large overall sample size. 2. Presentation of data by IQ and age groupings. 3. Comprehensive medical and psychiatric exclusion criteria, including MRI brain scans, on all participants.
4. Information regarding educational level, gender, geographic area, recruitment procedures, and fluency in English. 5. Though not stated, test administration procedures are the same as those in Boone et al. (1990). 6. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Individual IQ-by-age groupings have sample sizes ranging 16-37. 2. Data are presented in terms of total score rather than separately for each distraction interval.
[ACT.6] Anil, Kivircik, Batur, Kabakci, Kitis, Güven, Basar, Turgut, and Arkar, 2003 (Table A7.8)
ACT data were collected on 236 individuals in Turkey, who were recruited from hospital staff or through personal contacts. Exclusion criteria included neurological or psychiatric conditions. The sample was stratified into three age groups (16-25, 26-45, and 46-65) and three education groups (8-10, 11-14, and >14 years). The youngest age group consisted of 40 males and 22 females, who averaged 22 (2.7) years of age. The middle age group was composed of 70 males and 55 females, who averaged 34.1 (5.9) years. The oldest group included 28 males and 21 females, who averaged 53.8 (4.7) years. The ACT was translated with the consultation of a linguist, and consonants from the Turkish alphabet showing similar phonetic characteristics to the original ACT were employed. Participants were instructed to count backward by 1s rather than the standard 3s. Means and SDs are reported for each delay interval. Analyses revealed no gender effects, although better performance was associated with younger age and more years of education.
Study strengths 1. Large overall sample size, although the sizes of the nine subcells are not reported. 2. Information provided for age, gender, education, geographic area, language, and recruitment procedures.
3. Data stratified by both age and education. 4. Adequate exclusion criteria. 5. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Test was translated into Turkish and data were collected in Turkey, rendering use problematic for English-speaking patients. 2. Test administration was not standard (subjects counted backward by 1s rather than 3s). 3. No information regarding IQ.
CONCLUSIONS
ACT has been underutilized as a clinical measure of executive dysfunction despite evidence that it may be particularly sensitive to white-matter disturbance. Given emerging interest in working-memory paradigms, the consonant trigrams task may experience an increase in popularity. Most working-memory paradigms have been used in experimental studies, and normative data are typically not available. The fact that a normative data pool of upward of 500 participants has been collected for ACT may make it an attractive working-memory procedure for clinical practice. In addition, the fact that the ACT task does not involve a timed response makes it a desirable executive measure in that test performance is not confounded by declines in mental speed. For tasks such as Trails B, Stroop Color Interference, and word and design generation, poor scores may reflect slowing in information-processing speed rather than executive dysfunction per se. Future research is needed to determine which delay intervals (i.e., 3, 9, and 18 seconds vs. 9, 18, and 36 seconds) are most sensitive and appropriate for clinical use. Also, normative data need to be obtained on populations with less than average IQs.2
2 Meta-analyses were not performed on ACT due to a lack of sufficient data.
8 Paced Auditory Serial Addition Test
BRIEF HISTORY OF THE TEST
Sampson (1956) originally developed an auditory and a visual version of the Paced Serial Addition Test. In a landmark study, Gronwall and Sampson (1974) used the auditory version, the Paced Auditory Serial Addition Test (PASAT), to assess information-processing speed and working memory in post-concussive patients. This version was further researched and made popular by Gronwall and colleagues (Gronwall & Wrightson, 1974; Gronwall, 1977a). It is now believed that the PASAT also measures other cognitive abilities, such as sustained and divided attention (Lezak, 1995; Lezak et al., 2004). In the original version of the PASAT, a random series of 61 digits (1-9) is presented on audiotape and the participant is to add the last digit presented to the preceding digit and verbalize the answer. For example, if the digits 1 and 2 are presented, the participant's correct response would be 3 (i.e., 1 + 2), and if the digit 4 is presented next, the participant's correct response would be 6 (i.e., 2 + 4) and so on. Each trial has the same random presentation of the 61 digits; however, the pace at which the digits are presented differs for the four trials. In trial 1, the digits are presented at the rate of one digit every 2.4 seconds, in trial 2 at 2.0 seconds, in trial 3 at 1.6 seconds, and in trial 4
at 1.2 seconds. The task of the participant is identical for each trial, and thus, a total of 60 correct responses per trial is possible. A practice trial of 10 digits presented at the rate of one digit every 2.4 seconds precedes the four test trials. Functional brain imaging studies have suggested that PASAT performance is associated with activation of right anterior and left posterior cingulate, consistent with an emerging body of literature relating cingulate function to attentional mechanisms (Deary et al., 1994). The PASAT was initially devised as a measure to detect cognitive deficits in postconcussive individuals. Several studies have shown that patients with head trauma perform significantly worse than their normal control counterparts on the PASAT (Bate et al., 2001; Brooks et al., 1999; Cicerone, 1997; Gronwall and Sampson, 1974; Maddocks & Saling, 1996; Ponsford & Kinsella, 1992; Stuss et al., 1989; Tiersky et al., 1998). Cicerone and Azulay (2002) documented the PASAT to be among the most sensitive neuropsychological tests for detecting impairment in patients with post-concussion syndrome, but Maddocks and Saling (1996), using only the 2.4-second presentation of the PASAT, did not find the same results. The original studies with the PASAT also found it to be a sensitive measure of recovery rate and capability to return to work (Gronwall & Wrightson, 1974; Gronwall, 1977a).
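The serial-addition rule described in the test history above (each response is the sum of the digit just heard and the digit before it) can be sketched as follows. This is a minimal illustration, not an official scoring program: the function names are hypothetical, omitted responses are represented as `None` by assumption, and only the simple count of correct responses is computed.

```python
# Inter-digit pacing in seconds for the four trials of the original version.
TRIAL_PACING_SECONDS = {1: 2.4, 2: 2.0, 3: 1.6, 4: 1.2}


def pasat_expected(digits):
    """Return the correct responses for a digit series.

    Each response is the sum of a digit and the one immediately
    preceding it, so a series of n digits yields n - 1 responses.
    """
    return [previous + current for previous, current in zip(digits, digits[1:])]


def pasat_score(digits, responses):
    """Count correct responses; None marks an omitted response (assumption)."""
    expected = pasat_expected(digits)
    return sum(1 for want, got in zip(expected, responses) if got == want)


# The worked example from the text: digits 1, 2, 4 give responses 3 and 6.
print(pasat_expected([1, 2, 4]))
# One omission out of three possible responses.
print(pasat_score([1, 2, 4, 9], [3, None, 13]))
```

A 61-digit series produces 60 expected responses, which matches the maximum of 60 correct responses per trial stated above.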
TESTS OF ATTENTION AND CONCENTRATION
However, of concern, several subsequent studies have been unsuccessful in finding a relationship between PASAT scores and severity of head injury (Levin et al., 1982; Sherman et al., 1997; Stuss et al., 1989). Fos et al. (2000) found that both the auditory and visual versions of the Paced Serial Addition Test significantly correlated with other tests of attention but that neither version of this test differentiated patients with mild traumatic brain injury from normal controls in a college population. In an interesting study by Chan (2001), there were no differences in PASAT scores between those postconcussive patients who were considered "low symptom reporters" and those who were considered "high symptom reporters."
More recently, the PASAT has been used to study cognitive functioning in patients with multiple sclerosis (MS). In fact, modified versions of the PASAT, using 3- and 2-second pacing in digit presentation, are included in the Brief Repeatable Battery of Neuropsychological Tests developed by the National Multiple Sclerosis Society to be used as a screening tool for MS (Rao et al., 1990). In a study by Shawaryn et al. (2002), the PASAT predicted mental and emotional responses of MS patients on a quality-of-life questionnaire. Johnson et al. (1996) found that patients with MS performed poorly on both the PASAT and the Paced Visual Serial Addition Test (a visual analog to the PASAT), while patients with Chronic Fatigue Syndrome displayed difficulty only with the PASAT. The authors postulate that deficits on both of these tasks by MS patients may suggest impairment of the central executive system, a view that is shared by D'Esposito et al. (1996). Kujala et al. (1995) reported impaired PASAT scores for a group of mildly deteriorated MS patients but intact scores for a nondeteriorated MS group. Solari et al. (1995) found that the PASAT was one of two neuropsychological tests that best discriminated between MS patients and controls.
Fisk and Archibald (2001) used a different scoring technique (the "dyad" method of counting two consecutive right responses as one correct point) and observed that controls outperformed MS patients on only the first two out of four presentation trials. Using the dyad scoring method of the
PASAT and magnetic resonance imaging, Snyder and Cappelleri (2001) found that PASAT scores correlated with the total area of sclerotic brain lesions in MS patients. This correlation was not observed when the original PASAT scoring method was used. It should be noted that studies have found the PASAT to be increasingly difficult for MS patients, and as a result a number of them refuse to perform the task (Aupperle et al., 2002). Additional factors have also been shown to affect PASAT performance. Studies have demonstrated a reduction in PASAT scores during pain (Sjogren et al., 2000), with sleep disruption (Martin et al., 1996), in solvent exposure (Rasmussen et al., 1993), during hypoglycemia in patients with diabetes (Gold et al., 1995), in HIV-positive individuals (Honn et al., 1999), in individuals with schizotypal personality disorder (Mitropoulou et al., 2002), in individuals with Attention-Deficit Disorder (Katz et al., 1998), and in cannabis addicts (Elwan et al., 1997). In addition, a negative effect of smoking on PASAT performance has been reported but only in poorly educated males (Elwan et al., 1997). Further details about Gronwall's (1977b) version of the PASAT and verbatim instructions can be obtained from the PASAT administration manual and test kit (see Appendix 1 for ordering information) or Spreen and Strauss (1998, pp. 243-251; see also Lezak et al., 2004).
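The task and the two scoring approaches described above can be illustrated with a short sketch. This is not taken from the PASAT manual: the function names are hypothetical, and the dyad rule is implemented under one assumed reading (a correct response counts toward the dyad score only when the immediately preceding response was also correct). Other readings of "two consecutive right responses" are possible.

```python
from typing import List, Optional


def expected_responses(digits: List[int]) -> List[int]:
    """Correct answer for each digit after the first: its sum with the preceding digit."""
    return [prev + cur for prev, cur in zip(digits, digits[1:])]


def score_trial(digits: List[int], responses: List[Optional[int]]) -> List[bool]:
    """Mark each response right or wrong; None stands for an omitted response."""
    expected = expected_responses(digits)
    if len(responses) != len(expected):
        raise ValueError("need exactly one response slot per digit after the first")
    return [given == want for given, want in zip(responses, expected)]


def total_correct(marks: List[bool]) -> int:
    """Conventional score: the number of correct responses in the trial."""
    return sum(marks)


def dyad_score(marks: List[bool]) -> int:
    """Dyad score (assumed reading): correct responses that directly follow a correct response."""
    return sum(1 for prev, cur in zip(marks, marks[1:]) if prev and cur)


# Illustrative five-digit sequence (a real trial uses 61 digits, giving 60 responses).
digits = [1, 2, 4, 3, 5]
marks = score_trial(digits, [3, 6, None, 8])  # third response omitted
print(expected_responses(digits))  # [3, 6, 7, 8]
print(total_correct(marks))        # 3
print(dyad_score(marks))           # 1
```

Under this reading the dyad score penalizes isolated correct answers, which is why it can diverge from the conventional count of correct responses.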
Modifications and Alternate Formats of the PASAT
While the original version is the most commonly administered format, modifications to the PASAT have been made. Several of these modified versions and alternate formats are presented below.
Levin Version
Levin et al. (1987) developed a version in which only 50 digits (rather than 61 digits) are presented in different random order (as opposed to the same random order) for each trial using the same 2.4-, 2.0-, 1.6-, and 1.2-second interval presentation. This version minimizes the practice effects observed with Gronwall's
PACED AUDITORY SERIAL ADDITION TEST
original version, as demonstrated by Stuss et al. (1987).
PASAT-200, PASAT-100, and PASAT-50
Shortened versions of the PASAT were also developed by Diehr et al. (1998) and further modified by Diehr et al. (2003). The Diehr et al. (1998) PASAT, also referred to as the PASAT-200, is very similar to Levin et al.'s (1987) version in that it consists of the presentation of 50 single digits (except for the number 7) in random order at four different pacing intervals. However, the pacing intervals are 3.0, 2.4, 2.0, and 1.6 seconds per digit instead of 2.4, 2.0, 1.6, and 1.2 seconds. A different random presentation of the digits is used for each trial. Diehr et al. (2003) shortened the PASAT-200 by providing normative data on trial 1 (3-second pacing trial) only, referred to as the PASAT-50, and trials 1 and 2 combined (3- and 2.4-second pacing trials), referred to as the PASAT-100.
Computerized Versions of the PASAT
Holdwick and Wingenfeld (1999) and Wingenfeld et al. (1999) created a computerized version of Gronwall's (1977a) original PASAT, in which the auditory stimuli are presented via external speakers and responses are recorded by an external microphone. However, the computerized administration does not record a response as correct if it occurs after presentation of the subsequent stimulus, while the traditional administration format of the PASAT has typically given credit for correct "late" responses. Tombaugh (1999) and Royan et al. (2004) recently developed an interesting computer version of the PASAT, referred to as the Adjusting-PSAT. This version measures speed of information processing and working memory by assessing temporal thresholds rather than by the traditional method of counting the number of correct responses. In this version, stimuli are presented through either auditory or visual modalities, and the duration of the interval between number presentations depends on the correctness of the response.
In other words, correct responses shorten the interval between digits and incorrect responses lengthen it.
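The adjusting rule just described can be sketched as a simple staircase. The source gives no step size, starting interval, or limits, so every number below (100 ms step, 500 ms floor, 3000 ms ceiling) is a hypothetical placeholder, as are the function names; this is an illustration of the principle, not the published Adjusting-PSAT procedure.

```python
from typing import List


def adjust_interval(interval_ms: int, correct: bool,
                    step_ms: int = 100, floor_ms: int = 500,
                    ceiling_ms: int = 3000) -> int:
    """Shorten the inter-digit interval after a correct response, lengthen it after an error.

    step_ms, floor_ms, and ceiling_ms are assumed values, not taken from the source.
    """
    new_interval = interval_ms - step_ms if correct else interval_ms + step_ms
    # Keep the interval inside the assumed limits.
    return min(max(new_interval, floor_ms), ceiling_ms)


def run_staircase(start_ms: int, outcomes: List[bool]) -> List[int]:
    """Return the sequence of intervals produced by a series of right/wrong outcomes."""
    intervals = [start_ms]
    for was_correct in outcomes:
        intervals.append(adjust_interval(intervals[-1], was_correct))
    return intervals


# Two correct responses followed by one error, starting at 2.4 seconds.
print(run_staircase(2400, [True, True, False]))  # [2400, 2300, 2200, 2300]
```

Intervals are held as whole milliseconds so that repeated adjustments do not accumulate floating-point error.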
An additional PASAT administration format includes giving only one or two trials of the original PASAT at select pacing rates (e.g., 2.4 or 2.0 seconds).
Psychometric Properties of the Test
Adequate reliability and validity have been reported for the original PASAT. Studies have cited split-half reliability of greater than 0.90 (Egan, 1988), suggesting high internal consistency, and test-retest reliability values of 0.93-0.97 (McCaffrey et al., 1995). O'Donnell et al. (1994) reported adequate construct validity of the PASAT, demonstrating relatively strong correlations with other tests of attention, such as the Visual Search and Attention Task (r = 0.55) and Trail Making Test Part B (r = 0.58). Moderate correlations between the PASAT and other tests of concentration, information processing, and working memory have also been noted by Crawford et al. (1998b), Deary et al. (1991), Gronwall and Wrightson (1981), and Larrabee and Curtiss (1995). However, some authors caution that the PASAT correlates not only with tests of attention but also with tests that measure mathematical skills and overall intellectual ability (Chronicle & McGregor, 1998; Sherman et al., 1997). Sherman and colleagues (1997) voiced concern that "PASAT performance depends on mathematical ability, at least as much as on attentional skills" and recommended that "the PASAT should not be interpreted as a measure of attention when mathematical skills are poor" (p. 43). For further information on the effect of repeated administration and psychometric properties of the PASAT, see Franzen (2000) and McCaffrey et al. (2000).
RELATIONSHIP BETWEEN PASAT PERFORMANCE AND DEMOGRAPHIC FACTORS
Age effects have been frequently reported for the PASAT. Stuss et al. (1988) found declining PASAT scores as a function of age grouping. Using the Levin version of the PASAT, Brittain et al. (1991) also demonstrated an age-related decline in performance. Roman et al. (1991)
found that individuals in the 6th and 7th decades of life performed significantly worse than two younger groups on all PASAT trials, and Wiens et al. (1997) reported a steady age-associated decline in PASAT performance for individuals in their twenties to late forties. Further, Diehr et al. (1998) documented a decline with age in a modified version of the PASAT for individuals in three age groups (20-34, 35-49, 50-68). A few studies, however, have shown weak or no age effects. Boringa et al. (2001) reported declining scores as a function of age on the 2-second trial of the PASAT but not the 3-second trial. This would suggest that age differences emerge only during the more difficult portion of the task. Epperson and Cripe (1985) found no significant age effects for a sample of individuals aged 18-49, and Elwan et al. (1997) found no significant correlation between PASAT scores and age in an Egyptian sample ranging from 20 to >60 years of age. Finally, in one study (using the 2-second delay in number presentation), PASAT scores for older individuals (mean age = 52) were actually higher than for young college students (mean age = 25) (Ward, 1997).
A relatively consistent relationship between PASAT performance and education has been reported. Stuss et al. (1987) found that individuals with less than a high school education performed more poorly on the PASAT than those with a college education or higher. Wiens et al. (1997) found education effects for trial 1 of the PASAT but not the other trials. Diehr et al. (1998) reported a steady increase in PASAT scores as a function of higher educational attainment. In contrast, Brittain et al. (1991) and Elwan et al. (1996, 1997) could detect no significant relationship between education and PASAT performance.
The results are mixed in terms of the relationship between general intelligence and the PASAT.
Gronwall and colleagues (Gronwall & Sampson, 1974; Gronwall & Wrightson, 1981) and others (Johnson et al., 1988; Roman et al., 1991) report weak or no correlation between intelligence and PASAT, while others have shown a moderate relationship between these two factors (Crawford et al., 1998b; Deary et al., 1991; Egan, 1988; Kanter, 1984; Wiens
et al., 1997). Kanter (1984) observed a strong correlation between PASAT responses and speeded nonverbal intelligence tasks, and significant relationships between PASAT scores and the Shipley tests of intelligence have also been reported (Brittain et al., 1991; Egan, 1988). Deary et al. (1991) found a significant correlation between PASAT scores and WAIS-R IQ in a group of diabetic patients, but on closer examination, the relationship was only significant between the PASAT and the freedom from distractibility index of the WAIS-R. In terms of basic math skills and the PASAT, Gronwall and Sampson (1974) found a weak correlation, but others have shown a stronger relationship (Sherman et al., 1997).
Gender differences have not been found in most studies using the PASAT (Boringa et al., 2001; Diehr et al., 1998, 2003; Roman et al., 1991; Stuss et al., 1987). Some studies have found statistically significant differences in performance in favor of males, but the differences were of little clinical or practical importance (Brittain et al., 1991; Wiens et al., 1997). Elwan and colleagues (1996, 1997), administering the PASAT to a sample of Egyptians, found better performance in males, particularly in subjects aged 60 or above. Interestingly, Wiens et al. (1997) noted that Hispanic, Asian, and Native American males in their sample appeared to perform "slightly" better than their female counterparts, while the opposite was true for African-American and Caucasian participants. However, cell sample sizes were too small to confirm these observations with statistical analyses.
Only a few studies have examined the relationship between race/ethnicity and PASAT performance. Brittain et al. (1991) reported a complex interaction effect between age, IQ, and race. They found that in older "minority" women, PASAT scores across all trials were associated with IQ scores.
The specific racial breakdown of their minority subjects was not provided, and this interaction effect was not reported for their Caucasian group. Wiens et al. (1997) found no statistically significant differences between African-American, Hispanic, Native-American, Asian, and Caucasian participants. Diehr et al. (1998), however, reported significantly better PASAT performance by
Caucasians relative to African Americans across three age groups (20-34, 35-49, 50-68). Additionally, using T-score conversions, Diehr et al.'s distribution of the PASAT scores of a small sample of Hispanic individuals more closely resembled that of the African Americans than the Caucasians.
METHOD FOR EVALUATING THE NORMATIVE REPORTS
To adequately evaluate the PASAT normative reports, seven key criterion variables were deemed critical. The first five of these relate to subject variables and the two remaining dimensions refer to procedural issues. Minimal requirements for meeting the criterion variables were as follows.
Subject Variables
Sample Size
Fifty cases are considered a desirable sample size. Although this criterion is somewhat arbitrary, a large number of studies suggest that data based on small sample sizes are highly influenced by individual differences and do not provide a reliable estimate of the population mean.
Sample Composition Description
Information regarding medical and psychiatric exclusion criteria is important. It is unclear if gender, geographic recruitment region, socioeconomic status, occupation, ethnicity, or recruitment procedures are relevant. Until this is determined, it is best that this information be provided.
Age Group Interval
This criterion refers to grouping of the data into limited age intervals. This requirement is especially relevant for this test since a strong effect of age on PASAT performance has been demonstrated in the literature.
Reporting of Education Levels
Given the strong association between education and PASAT performance, information regarding educational level should be reported for each subgroup, and preferably normative data should be presented by educational levels.
Reporting of Intellectual Levels
Given the probable association between PASAT performance and IQ, information regarding intellectual level should be reported for each subgroup, and preferably normative data should be presented by IQ levels.
Procedural Variables
Description of Administration Procedures
Due to variability in administration procedures, a detailed description of the procedures, including identification of the version of the test administered and number of trials (with reported pacing of digit presentation), is desirable. This would allow one to select the most appropriate norms or to make corrections in interpretation of the data.
Data Reporting
Group means and standard deviations for the number of correct responses for each pacing condition should be presented at minimum.
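As an illustration of the minimum data-reporting requirement, the sketch below computes the group size, mean, and sample standard deviation of correct responses for each pacing condition. The scores are invented for demonstration and do not come from any study reviewed here; the function name and the dictionary layout are likewise assumptions.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize_by_pacing(scores: Dict[float, List[int]]) -> Dict[float, Dict[str, float]]:
    """Return n, mean, and sample SD of correct responses for each pacing rate (seconds).

    Each pacing condition needs at least two scores, because the sample SD is undefined for one.
    """
    summary = {}
    for pace, values in scores.items():
        summary[pace] = {"n": len(values), "mean": mean(values), "sd": stdev(values)}
    return summary


# Hypothetical scores (number correct out of 60) for three participants per condition.
example_scores = {2.4: [50, 46, 48], 1.2: [28, 30, 26]}
for pace, stats in summarize_by_pacing(example_scores).items():
    print(f"{pace}-second pacing: n={stats['n']}, mean={stats['mean']}, SD={stats['sd']}")
```

Reporting n alongside the mean and SD also makes it easy to check a cell against the sample-size criterion given above.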
SUMMARY OF THE STATUS OF THE NORMS
Information presented in the studies reporting data for the PASAT differs across studies. Some of these differences will be summarized below. Of the studies reviewed below, nine were essentially designed to provide normative information (Boringa et al., 2001; Brittain et al., 1991; Diehr et al., 1998, 2003; Roman et al., 1991; Stuss et al., 1987, 1988; Wiens et al., 1997; Wingenfeld et al., 1999). Data for "normal" control groups from clinical comparison studies are also included in this chapter. Various test formats of the PASAT are used, with several studies devoted to modifying test versions or scoring methods. The variations in testing procedure and format include the number of digits used (e.g., 61 or 50), the same vs.
different random order of the digit presentation across trials, the number of trials administered, and the pace at which the digits are presented (e.g., 3.0-, 2.4-, 2.0-, 1.6-, and/or 1.2-second pacing). Among all of the clinical studies available in the literature, we selected for review those that used well-defined samples; presented means and SDs for more than one presentation condition (e.g., 2.4-second pace per digit); provided adequate description of the test version, procedures, and format; and provided descriptive statistics for sample demographics, such as age and education. In the studies reviewed below, the test scores represent the number of correct responses for each pacing rate or the total scores across all trials, unless indicated otherwise. Summaries of the studies are presented in ascending chronological order for each version of the test separately. Studies using Gronwall's administration procedure are presented first, followed by those using Levin's version, concluding with the PASAT-50, PASAT-100, and PASAT-200 versions. The text of study descriptions contains references to the corresponding tables identified by number in Appendix 8. Table A8.1, the locator table, summarizes information provided in the studies described in this chapter.¹
¹Norms for children and adolescents are available in Baron (2004) and Spreen and Strauss (1998).
SUMMARIES OF THE STUDIES
Gronwall's Administration Version
[PASAT.1] Gronwall, 1977a (Gronwall Version) (Table A8.2)
This is one of the first studies to use the PASAT in order to assess cognitive functioning in brain-damaged patients. A sample of 60 "normal" participants in New Zealand aged 14-55 years (with the majority aged 17-25), consisting of 10 non-head-injured accident cases, 10 naval "ratings," and 40 first-year university students, served as controls. All subjects were initially tested and then retested with the PASAT. The retesting was approximately 1 week later for head-injured patients; it can be assumed that it was the same time delay for the controls, but there is no specific mention of this. There is no additional information regarding age, gender, or education for this sample. No other exclusion criteria are reported. The 61-digit version of the PASAT was presented at four different pacing rates (2.4, 2.0, 1.6, and 1.2 seconds).
Study strengths
1. Adequate sample size.
2. Test administration procedures are well specified.
3. Means and SDs for the test scores are reported.
Considerations regarding use of the study
1. The sample composition is not well described in terms of age, education, gender, IQ, and recruitment procedures.
2. The age range of the group is quite large, and the majority of the participants are between the ages of 17 and 25 years.
3. No exclusion criteria are provided, and the non-head-injured "accident" cases are not well described.
4. The test-retest time frame for the normal controls is not provided (but the head-injured patients were tested 1 week apart).
5. The data were obtained on New Zealanders, which may limit their usefulness for clinical interpretation in the United States.
[PASAT.2] Stuss, Stethem, and Poirier, 1987 (Gronwall Version) (Table A8.3)
The authors examined age-related differences in performance on three neuropsychological tests, one of which was the PASAT. The authors recruited 60 participants from Ottawa, Canada, through personal contacts or various agencies (e.g., Seniors Employment Bureau, Youth Employment Agency). Participants were grouped by six decades of life (16-19, 20-29, 30-39, 40-49, 50-59, 60-69). Information regarding handedness, years of education, and ratio of males to females is provided for each
age group. None had a history of neurological or psychiatric illness. Educational levels of males (14.36) and females (14.55) were approximately the same, but significant differences were found between educational levels of participants in the different age groups, with the 50-59 group having the lowest educational level. The original Gronwall four-trial version of the PASAT was used. It should be noted that the authors report using 60 digits but also state that 60 correct responses are possible. Thus, it is believed that the original 61-digit version was used. Participants were tested at two different intervals, separated by 1 week. The test was administered in the participants' native language of French or English.
Study strengths
1. The sample composition is well described in terms of age, education, gender, geographic area, and recruitment procedures.
2. The data are stratified by six age groupings.
3. Adequate exclusion criteria.
4. Test administration procedures are specified.
5. Means and SDs for the test scores are reported.
Considerations regarding use of the study
1. Overall sample is adequate, but individual cells are very small.
2. Educational levels are not equal across the different age groups, and some of the groups are highly educated.
3. The data were obtained on Canadian subjects, sometimes in French, which may limit their usefulness for clinical interpretation in the United States.
Other comments
1. Individuals in the 50-59 age group had the lowest educational level and the lowest PASAT scores relative to the other age groups. Their PASAT scores were significantly lower than even the oldest age group (60-69).
2. The authors present another table that collapses PASAT scores across age groups, stratifying the data by gender and educational level (≤high school vs. >high school). Given the significant age effect, this table has not been reproduced in this chapter but can be found in the original source.
[PASAT.3] Stuss, Stethem, and Pelchat, 1988 (Gronwall Version) (Table A8.4)
This study builds on the previous normative study by Stuss et al. (1987) by collapsing the age groups (i.e., creating larger age ranges per group), thus increasing the number of participants per cell. In the current study, there were three age groups. For the 16-29 age group, there were 16 males and 14 females, with an average age of 22.43 (2.67) and education range of 11-18 years (mean = 14.1, SD = 1.34); for the 30-49 group, there were 14 males and 16 females, with an average age of 40.63 (2.97) and education range of 5-20 years (mean = 14.9, SD = 3.95); and for the 50-69 group, there were 14 males and 16 females, with an average age of 61.77 (3.0) and education range of ~18 years (mean = 13.2, SD = 2.38). See the above study (PASAT.2) for additional participant characteristics and recruitment procedures.
Study strengths
1. The sample composition is well described in a previous study (Stuss et al., 1987) in terms of age, education, gender, geographic area, and recruitment procedures.
2. The data are stratified by three age groupings.
3. Adequate exclusion criteria are described in a previous study (Stuss et al., 1987).
4. Test administration procedures are described in Stuss et al. (1987).
5. Means and SDs for the test scores are reported.
Considerations regarding use of the study
1. Need to access the Stuss et al. (1987) study in order to learn about the sample recruitment and testing procedures.
2. Mean educational levels for some of the age groups are relatively high; the 16-19 and 50-59 groups have substantially less education than the other age groups.
3. Overall sample size is adequate, but individual cells are small.
4. The data were obtained on Canadian subjects, sometimes in French, which may limit their usefulness for clinical interpretation in the United States.
[PASAT.4] Rao, Mittenberg, Bernardin, Haughton, and Leo, 1989 (Gronwall Version) (Table A8.5)
This study examined the effects of periventricular white-matter changes on cognitive functioning in healthy adults. The authors selected 40 participants (10 males, 30 females) who had normal brain imaging to serve as controls. Participants ranged in age from 25 to 60 years, with an average age of (8.1), average educational level of 14.0 (2. ), and average Verbal IQ of 106.5 (5.8). All participants were recruited from newspaper advertisements in the Milwaukee, Wisconsin, area. Additional exclusion criteria were a history of hypertension, cardiac or cerebrovascular disease, neurological illness, head injury, substance abuse, or psychiatric illness. Participants underwent physical and neurological exams. Gronwall's 61-digit test administration version of the PASAT was employed, but only two trials, at 3- and 2-second pacing rates, were used. Total correct responses for both trials are reported.
42!
Study strengths
1. The sample composition is well described in terms of age, education, gender, and recruitment procedures.
2. Exclusion criteria are provided.
3. Test administration procedures are described.
4. Means and SDs for the test scores are reported.
Considerations regarding use of the study
1. Relatively small sample size.
2. The data are not stratified by age, gender, or education.
3. Data for only two pacing rates for the PASAT are provided.
[PASAT.5] Stuss, Stethem, Hugenholtz, and Richard, 1989 (Gronwall Version) (Table A8.6)
The authors compared the performance of two groups of head-injured patients to controls on three neuropsychological tests. Twenty-six control participants (20 males, 6 females) with no history of neurological or psychiatric disorder were recruited. Participants were matched with head-injured patients on age (± 2 years), education (± 2 years), and gender. Thus, control subjects ranged in age from 17 to 57, with an average of 29.7 (12.4), and ranged in educational level from 7 to 20 years, with an average of 13.2 (3.0). The standard 61-digit version using four trials (2.4, 2.0, 1.6, and 1.2 seconds) was administered at two different points in study 1 and at five different points in study 2. Testing and retesting sessions were separated by approximately 1 week. Data for study 1 are reported in this review.
Study strengths
1. The sample composition is well described in terms of age, education, gender, and recruitment procedures.
2. Adequate exclusion criteria.
3. Test administration procedures are specified.
4. Means and SDs for the test scores are provided.
Considerations regarding use of the study
1. The geographic location where participants were recruited is not provided; however, it may be assumed that they were from the Ottawa, Canada, region, which may limit the usefulness of the data for clinical interpretation in the United States. While not mentioned in this study, in previous studies the authors have administered the test in French or English, depending on the participant's language preference.
2. Small sample size.
Other comments
1. Test data for two testing sessions (from study 1) have been reproduced in this chapter. In addition, the authors provide data for five testing probes (study 2), which can be found in the original study.
[PASAT.6] Rao, Leo, Bernardin, and Unverzagt, 1991a (Gronwall Version) (Table A8.7)
The study examined the pattern of cognitive deficits in patients with MS using a brief neuropsychological battery. The authors recruited 100 (25 males, 75 females) normal, healthy adults through newspaper advertisements in the Milwaukee, Wisconsin, area. Controls were matched to MS subjects based on age (±3 years), education (±1 year), and gender. Thus, control participants had an average age of 46.0 (11.6) years, an average education of 13.3 (2.0) years, and an average Verbal IQ of 107.2 (11.2). Exclusion criteria were history of substance abuse, psychiatric illness, head injury, or other neurological disorders. All controls were given neurological evaluations and MRI scans. Only one participant was non-Caucasian. All subjects were paid for their participation. Gronwall's 61-digit administration version of the PASAT was employed, but only two trials, at 3- and 2-second pacing rates, were used. Total correct responses for both trials are reported.
Study strengths
1. The sample composition is well described in terms of age, education, gender, and recruitment procedures.
2. Relatively large sample size.
3. Adequate exclusion criteria.
4. Test administration procedures are specified in a previous study (Rao et al., 1989).
5. Means and SDs for the test scores are reported.
Considerations regarding use of the study
1. The data are not stratified by age, gender, or education.
2. Data for only two pacing rates are provided.
[PASAT.7] Strauss, Spellacy, Hunter, and Berry, 1994 (Gronwall Version) (Table A8.8)
The authors examined the utility of the PASAT as a tool for detecting malingering. They selected 10 (four males, six females) undergraduate students from the University of Victoria to serve as controls. Participants ranged in age from 20 to 35, with an average age of 23.7 (2.58) and an average education of 15.21 (0.79) years. No exclusion criteria are provided. Two trials of Gronwall's 61-digit version of the PASAT were administered at 2.0- and 1.6-second pacing rates.
Study strengths
1. The sample composition is well described in terms of age, education, gender, geographic location, and recruitment procedures.
2. Means and SDs for the test scores are reported.
Considerations regarding use of the study
1. Sample size is small.
2. No exclusion criteria are described.
3. The data were obtained on Canadian subjects, which may limit their usefulness for clinical interpretation in the United States.
4. Only two trials of the PASAT were used.
5. Education level is high.
[PASAT.8] Zalewski, Thompson, and Gottesman, 1994 (Gronwall Version) (Table A8.9)
The authors compared the cognitive performance of patients with Post-traumatic Stress Disorder and Generalized Anxiety Disorder to controls. The data were selected from a large database of scores collected in the Vietnam Experience Study (VES) during 1985-1986 (for more description, see Decoufle et al., 1991). The control group consisted of 241 nonpsychiatric veterans randomly drawn from a larger sample of 1,579 veterans who had never met criteria for various psychiatric disorders (e.g., depression, bipolar disorder, substance abuse, personality disorders). No other exclusion criteria are provided. These participants were initially recruited for the VES in order to study the long-term health effects of military service in Vietnam. Participants were Vietnam and non-Vietnam veterans who entered the U.S. Army between 1965 and 1971. All participants underwent comprehensive medical and psychological evaluations. This sample is most likely predominantly male, but there is no mention of the gender composition. They were an average of 38.0 years old and had an average of 13.6 years of education (no SDs were reported). There were 189 Caucasians, 35 African Americans, 11 Hispanics, and 6 "others" in the sample. Two trials (2.4 and 1.2 seconds) of Gronwall's version of the PASAT were administered, and total correct responses for both trials are reported.
Study strengths
1. Large sample size.
2. Sample composition is well described in terms of age, education, and ethnicity.
3. Test procedures are relatively well described.
4. Means and SDs for the test scores are reported.
Considerations regarding use of the study
1. It is unclear whether the control group was recruited for research participation only or if any of the participants were referred for clinical assessment.
2. Sample composition is not well described in terms of gender or recruitment procedures, but reference is made to another study.
3. Exclusion criteria only included psychiatric disorder.
4. Only two trials of the PASAT were administered, and total scores were reported.
[PASAT.9] Crawford, Obonsawin, and Allan, 1998b (Gronwall Version) (Table A8.10)
The authors examined the relationship between age and PASAT performance, to obtain validity data on the PASAT and to provide additional normative data. A sample of 152 participants (77 males, 75 females) was screened for neurological, psychiatric, and systemic disorders. Participants ranged in age from 16 to 74, with an average age of 40.21 (13.89), an average education of 12.97 (2.86) years, and an average IQ of 105.0 (14.08). Participants were recruited from various communities and organizations within the United Kingdom, including recreational clubs, community centers, and public service, and were paid for their participation. The original 61-digit version of Gronwall's PASAT was administered in its entirety, and total scores for the four trials are reported for the total sample and for three age groups.
Study strengths 1. Large sample is used. 2. The composition is well described in terms of age, education, gender, IQ, geographic area, and recruitment procedures. 3. Adequate exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported. 6. The sample is stratified into three age groups.
Considerations regarding use of the study 1. The data are not stratified by education or IQ. 2. Total scores are reported instead of individual scores for each of the four trials. 3. The data were obtained on subjects from the United Kingdom, which may limit their usefulness for clinical interpretation in the United States.
[PASAT.10] Prevey, Delaney, Cramer, Mattson, and VA Epilepsy Cooperative Study 264 Group, 1998 (Gronwall Version) (Table A8.11)
As part of a large multicenter study of epilepsy, the cognitive functioning of patients with complex partial and generalized seizure disorders was examined. Control participants consisted of 45 neurologically normal individuals. Additional exclusion criteria were a history of serious medical disorders, psychiatric disorders, or substance abuse. There is no mention of the gender of the participants or their IQ; however, average age was 44.4 (11.4) years and average education was 12.8 (1.9) years. Participants were primarily recruited from nonmedical hospital staff at 13 different study centers across the United States. Only two trials (2.4 and 2.0 seconds) of Gronwall's 61-digit version of the PASAT were administered.
Study strengths 1. Sample composition is relatively well described in terms of age, education, and recruitment procedures but not gender or IQ. 2. Adequate exclusion criteria. 3. Test administration procedures are specified. 4. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. The data are not partitioned by age or education group. 2. Only two trials of the PASAT were used.
[PASAT.11] Holdwick and Wingenfeld, 1999 (Gronwall Version) (Table A8.12)
The relationship between mood, anxiety, and attention was assessed in college students. Undergraduate participants were randomly assigned to different conditions in which various mood states were induced (e.g., sad or anxious). Twenty controls were assigned to a neutral condition. There is no specific information regarding the age, education, IQ, or gender of the controls. All were native English speakers, had adequate hearing, and had no history of repeating grades in elementary or high school. Additional exclusion criteria were history of psychological problems, neurological illness affecting attention, head trauma, medication use, substance abuse, attention problems, or learning disability. Age, gender, and ethnicity are described for the sample as a whole but not specifically for the control group. The 61-digit Gronwall version of the PASAT was administered using a computer. The four trials (2.4-, 2.0-, 1.6-, and 1.2-second pacing) were delivered via synthesized computer voice, and responses were recorded by a microphone. All responses were scored manually.
Study strengths 1. Adequate description of participant recruitment procedures. 2. Adequate exclusion criteria. 3. Test administration procedures are specified. 4. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. The sample is small. 2. The age, education, and gender composition of participants in all conditions of the study are provided but not specifically for the control group.
[PASAT.12] Honn, Para, Whitacre, and Bornstein, 1999 (Gronwall Version) (Table A8.13)
The authors examined the role of exercise in HIV-positive and -negative males and found that exercise only minimally improved cognitive functioning in both groups. Seventy-six HIV-negative homosexual or bisexual males, with a mean age of 32.5 (6.3) and mean educational level of 14.6 (2.4) years, served as controls. Exclusion criteria were history of intravenous drug use, head injuries resulting in greater than 1 hour of unconsciousness, learning disability, or other neurological disease. In this control sample, 32.4% (n = 13) of nonexercisers and 13.2% (n = 5) of exercisers reported past history of marijuana abuse or dependence. Participants were also administered an intelligence test (WAIS-R), the SCID, and various anxiety and depression rating measures. Only three trials (2.4, 2.0, 1.6 seconds) of Gronwall's 61-digit version of the PASAT were administered.
Study strengths 1. Relatively large sample size. 2. The sample composition is well described in terms of age, education, gender, and IQ. 3. Adequate exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. An all-male sample is used. 2. Education levels are relatively high. 3. Recruitment procedures are not specified. 4. A portion of the sample reports a history of marijuana abuse or dependence.
Other comments 1. The exercisers scored significantly higher on the 1.6-second trial of the PASAT relative to the nonexercisers.
[PASAT.13] Wingenfeld, Holdwick, Davis, and Hunter, 1999 (Gronwall Version) (Table A8.14)
This study was designed to develop normative data for a computerized version of Gronwall's PASAT. The authors recruited 168 (80 males, 88 females) college students between the ages of 17 and 48 with an average age of 21 (5.1) years at the University of Arkansas, Fayetteville. The sample was 88% Caucasian, 4% African American, 4% Asian American, and 4% other ethnic group. The data were first stratified by gender and then by two age groups (17-29, 30-48 years). Exclusion criteria were any history of neurological illness, emotional problems, learning disability, attentional problems, or uncorrected hearing difficulty. Only native English speakers were included. Subjects were given course credit for participation. The testing procedures are similar to those of Gronwall, except that the digits are presented by the computer via speaker and responses are recorded through an external speaker. Additionally, while all four trials are delivered (2.4-, 2.0-, 1.6-, and 1.2-second pacing), a new random series of the 61 digits is presented during each trial.
Study strengths 1. Adequate sample sizes, except for the 30-48 age group. 2. The data are stratified first by gender and then by two age groups (17-29, 30-48 years). 3. The sample composition is well described in terms of age, gender, ethnicity, and recruitment procedures. 4. Adequate exclusion criteria. 5. Test administration procedures are specified. 6. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Cell size for the 30-48 age group is relatively small (n = 12).
Other comments 1. Additional outcome measures, such as number of errors committed and number of "no" responses, are reported in the original article but have not been reproduced in this chapter.
[PASAT.14] Bate, Mathias, and Crawford, 2001 (Gronwall Version) (Table A8.15)
This study examined the relationship between the Test of Everyday Attention and various neuropsychological measures in patients with severe head injury. The study was conducted in Australia, where 35 controls (20 males, 15 females) who were native English speakers with no history of psychiatric illness, neurological disorders, intellectual disability, substance abuse, or hemiplegia of the dominant hand were recruited. Participants were an average of 30.2 (10.3) years of age, obtained an average of 12.6 (2.0) years of education, and had an average premorbid IQ of 101.1 (9.1) based on the National Adult Reading Test-Revised (NART-R). The exact location and procedures for participant recruitment are not specified. Also, it is unclear whether the participants were patients with non-brain injury-related illness or healthy individuals from the community. The Gronwall 61-digit version of the PASAT was presented with all four trials (2.4-, 2.0-, 1.6-, 1.2-second pacing).
Study strengths 1. The sample composition is well described in terms of age, education, gender, and IQ. 2. Adequate exclusion criteria. 3. Test administration procedures are specified. 4. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. The sample size is small. 2. Recruitment procedures are not well described. Controls may be non-head-injured medical patients. 3. The data were obtained on Australian subjects, which may limit their usefulness for clinical interpretation in the United States.
[PASAT.15] Boringa, Lazeron, Reuling, Ader, Hennings, Underboom, de Sonneville, Kalken, and Polman, 2001 (Gronwall Version) (Table A8.16)
The sensitivity of the Brief Repeatable Battery of Neuropsychological Tests, used to assess cognitive functioning in patients with MS, was evaluated in Amsterdam. This battery includes a modified, two-trial version of Gronwall's PASAT. A total of 140 healthy participants (62 males, 78 females) between the ages of 22 and 73, with an average age of 45.8 years, were recruited from the community. None had central nervous system disease, psychiatric illness, learning disability, history of substance abuse, serious head injury, or other major medical illness. In terms of education, 31 participants had < 9 years, 55 had 9 or 10 years, and 53 had > 10 years (one participant did not state his education). Gronwall's 61-digit version of the PASAT was administered using only two trials (3- and 2-second pacing).
Study strengths 1. Large sample size. 2. The sample composition is well described in terms of age, education, and gender. 3. Adequate exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Over half of the sample has < 10 years of education. 2. The data were obtained on individuals from Amsterdam, which may limit their usefulness for clinical interpretation in the United States.
[PASAT.16] Fluck, Fernandes, and File, 2001 (Gronwall Version) (Table A8.17)
The study had two goals: (1) to examine the effects of two dosages of lorazepam on attention in healthy individuals and (2) to investigate the effects of age and gender on selected tests of attention. More comprehensive norms are presented for the part of the study that examined age and gender; thus, those data have been reviewed in this chapter. Participants were 60 (30 males, 30 females) young and middle-aged adults recruited from the Guy's College campus in the vicinity of London, England, via newspaper advertisements and notices. The "young" men were an average of 21.1 (0.4) years of age and had an average IQ of 113.0 (1.5), the "young" women were an average of 20.9 (0.2) years of age and had an average IQ of 112.4 (1.7), the "middle-aged" men were an average of 57.5 (1.3) years of age and had an average IQ of 117.7 (1.8), and the "middle-aged" women were an average of 60.3 (0.7) years of age and had an average IQ of 113.3 (2.2). All participants were screened for physical illness in the past week, use of any medication, history of psychiatric disorders, and high scores on a depression or anxiety scale. All four trials (2.4, 2.0, 1.6, and 1.2 seconds) of Gronwall's version of the PASAT were used, and scores for each trial are presented.
Study strengths 1. The sample composition is well described in terms of age, gender, IQ, geographic area, and recruitment procedures. 2. The data are stratified by two age groups (young and middle-aged) × gender. 3. Adequate exclusion criteria are used. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Overall sample size is adequate, but individual cells are relatively small. 2. Intelligence level for the sample is relatively high. 3. Educational levels are not reported. 4. The data were obtained on individuals from London, England, which may limit their usefulness for clinical interpretation in the United States.
[PASAT.17] Snyder, Cappelleri, Archibald, and Fisk, 2001 (Gronwall Version) (Table A8.18)
Using two different scoring methods for the PASAT, the authors examined the classification rates of patients with secondary progressive and relapsing-remitting types of MS. The authors reanalyzed data from MS patients and 35 (9 males, 26 females) healthy controls collected in an earlier study (Fisk & Archibald, 2001). Staff, volunteer workers, and students from the Queen Elizabeth II Health Science Centre, Dalhousie University, and MS Society in Nova Scotia, Canada, served as controls. The average age of the participants was 37.97 (12.94) years, average education was 14.06 (2.27) years, and average raw WAIS-R Vocabulary subtest score was 54.5 (7.0). Exclusion criteria were history of drug or alcohol abuse, major psychiatric illness, learning disability, seizures, head trauma, or other neurological disorder. Additional exclusion criteria were use of specific medications, such as neuroleptics, benzodiazepines, antiepileptic drugs, or sedatives. All four trials (2.4, 2.0, 1.6, and 1.2 seconds) of Gronwall's version of the PASAT were administered. Two mean outcome measures are reported: (1) the mean number of correct responses across the four trials (i.e., the sum of the correct responses for all trials divided by 4) and (2) the dyad score, in which pairs of correct responses were counted as one correct point.
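The two outcome measures can be illustrated with a minimal Python sketch. It assumes that each trial is recorded as a list of booleans, one per response. It also assumes that a dyad is a correct response immediately preceded by another correct response, so that consecutive pairs may overlap; the description above says only that pairs of correct responses count as one point, so this rule and the function names are illustrative assumptions, not the authors' published procedure.

```python
from typing import Sequence


def mean_correct(trials: Sequence[Sequence[bool]]) -> float:
    """Sum of correct responses over all trials, divided by the number of trials."""
    return sum(sum(trial) for trial in trials) / len(trials)


def dyad_score(trial: Sequence[bool]) -> int:
    """Count correct responses that directly follow a correct response.

    Assumption: dyads are consecutive, overlapping pairs of correct responses.
    """
    return sum(1 for prev, cur in zip(trial, trial[1:]) if prev and cur)


# Hypothetical response records for two short trials (True = correct).
example_trials = [
    [True, True, False, True, True, True],
    [True, False, False, True, False, True],
]

print(mean_correct(example_trials))   # (5 + 3) / 2
print(dyad_score(example_trials[0]))  # pairs at positions 1-2, 4-5, 5-6
```

Under this assumed rule, an isolated correct response earns no dyad credit, which is why the dyad score can differ sharply from the count of correct responses for the same trial.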
Study strengths 1. The sample composition is well described (in an earlier study by Fisk & Archibald, 2001) in terms of age, education, Vocabulary subtest performance, geographic area, and recruitment procedures. 2. Adequate exclusion criteria. 3. Test administration procedures are specified. 4. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. The sample size is relatively small. 2. The data were obtained on Canadian subjects, which may limit their usefulness for clinical interpretation in the United States. 3. The educational level is relatively high (14.1 years).
Levin's Administration Version
[PASAT.18] Brittain, La Marche, Reeder, Roth, and Boll, 1991 (Levin Version) (Tables A8.19 and A8.20)
In this normative study using the Levin et al. (1987) version of the PASAT, the authors present data for 526 healthy participants (aged 17-88 years). The data were stratified by four age groups (< 25, 25-39, 40-54, and > 55 years). In the < 25 age group, there were 145 (55 male, 90 female) participants, 79 Caucasians and 66 "other" race, with an average of 13.0 (1.3) years of education and an average Shipley IQ of 105.0 (9.1). In the 25-39 age group, there were 164 (67 male, 97 female) participants, 114 Caucasians and 50 "other" race, with an average of 14.0 (2.2) years of education and an average Shipley IQ of 103.0 (10.4). In the 40-54 age group, there were 95 (50 male, 45 female) participants, 79 Caucasians and 16 "other" race, with an average of 13.0 (3.1) years of education and an average Shipley IQ of 101.0 (12.6). In the > 55 age group, there were 122 participants, 119 Caucasians and 3 "other" race, with an average of 12.0 (2.5) years of education and an average Shipley IQ of 106.0 (15.1). For the > 55 age group, the authors report 82 males and 82 females, but this appears to be a misprint since there were only 122 participants in total for this age group. Exclusion criteria were a history of psychiatric or neurological problems, as well as concussions or loss of consciousness. A detailed description of this modified version of the PASAT is presented. Error rates (rather than correct responses) and seconds taken for each response are used as the outcome measures.
Study strengths 1. The sample composition is well described in terms of age, education, gender, and Shipley IQ. 2. The data are stratified by age and IQ level. 3. Adequate exclusion criteria. 4. Test administration procedures are well specified. 5. Means and SDs for the error scores are reported.
Considerations regarding use of the study 1. The data are not stratified by educational level. 2. Overall sample is adequate, but some of the individual cells are small.
Other comments 1. The number of errors, rather than correct responses, is reported. 2. Data for number of seconds taken to respond are reported in the original article, but since these data are rarely used in clinical evaluations, they have not been reproduced in this chapter.
[PASAT.19] Roman, Edwall, Buchanan, and Patton, 1991 (Levin Version) (Table A8.21)
The authors conducted this study in order to provide additional normative data for the Levin et al. (1987) version of the PASAT. They recruited 143 white adults in three different age groups (18-27, 33-50, and 60-75). IQ was prorated with the Block Design and Vocabulary subtests from the WAIS-R. In the 18-27 age group, there were 62 (58% female) participants, with an average education of 12.0 (0.77) years and an average IQ of 110 (12.3). In the 33-50 age group, there were 40 (50% female) participants, with an average education of 15.0 (2.6) years and an average IQ of 110 (12.3). In the 60-75 age group, there were 41 (51% female) participants, with an average education of 15.0 (3.2) years and an average IQ of 107.0 (11.0). Participants were undergraduate students and employees of Baylor University, students from a local business college, members of service clubs and retired professional groups, employees of local businesses, individuals from senior citizen organizations, and individuals in retirement communities. Only one-fourth of the participants were paid ($5 each). Exclusion criteria were a history of head injury with loss of consciousness, other neurological disorders, substance abuse, psychiatric disorders, or current use of psychoactive medication.
Study strengths 1. Relatively large sample. 2. The sample composition is well described in terms of age, education, gender, ethnicity, IQ, geographic location, and recruitment procedures. 3. The data are presented for three age groups. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Educational levels are high in the middle-aged and older adult age groups.
Other comments 1. IQ was estimated using only the Vocabulary and Block Design subtests of the WAIS-R.
[PASAT.20] Cicerone, 1997 (Levin Version) (Table A8.22)
The author compared the attentional abilities of mildly head-injured patients and normal controls on four neuropsychological tests. Forty control participants between the ages of 18 and 59, with an average age of 33.3 (12.4) years and average educational level of 14.9 (2.2) years, were enrolled. Participants had no history of head injury, neurological disease, or psychiatric illness and were recruited from the Edison, New Jersey, community. They were administered the Levin et al. (1987) version of the PASAT.
Study strengths 1. Adequate sample size. 2. The sample composition is well described in terms of age, education, geographic area, and recruitment procedures but not gender. 3. Adequate exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Wide age range among participants. 2. Educational level is relatively high. 3. Total PASAT scores, rather than individual scores for each of the four trials, are reported.
[PASAT.21] Wiens, Fuller, and Crossen, 1997 (Levin Version) (Tables A8.23 and A8.24)
This is a normative study for Levin et al.'s (1987) version of the PASAT. The authors selected 821 (672 male, 149 female) participants aged 20-49 years who were administered neuropsychological and psychological tests as part of a civil service job selection process. There were 699 Caucasians, 46 African Americans, 31 Hispanics, 32 Asians, and 13 Native Americans in the sample. The data were stratified by gender. Male participants were an average of 29.2 (6.1) years of age, with an average education of 14.6 (1.5) years and an average WAIS-R full-scale IQ (FSIQ) of 106.6 (11.0). Female participants were an average of 29.2 (5.6) years of age, with an average education of 14.5 (1.6) years and an average WAIS-R FSIQ of 105.4 (11.1). They were all from the Pacific Northwest of the United States. All participants had passed physical and medical health screening prior to test administration. All had passed a test of basic academic skills, and none had alcohol or substance abuse. All four trials of Levin's version of the PASAT were administered.
Study strengths 1. The sample composition is well described in terms of age, education, gender, IQ, ethnicity, geographic location, and recruitment procedures. 2. The data are stratified by gender and by age × IQ. 3. Adequate exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Overall sample size is adequate, but some of the individual cells are relatively small.
Other comments 1. The authors found differences between the ethnic groups, but the sample sizes were too small to make any definitive conclusions.
[PASAT.22] Tiersky, Cicerone, Natelson, and Deluca, 1998 (Levin Version) (Table A8.25)
Information-processing speed was compared among patients with chronic fatigue syndrome, mild head injury, and normal controls. All 20 normal control participants were females, who were recruited from advertisements in the local community of New Jersey and paid for their participation. Participants were an average of 37.1 (2.4) years of age, with an average education of 15.0 (0.55) years. Exclusion criteria were current medical illnesses, a history of loss of consciousness > 5 minutes, psychiatric illness, use of medication, or participation in a regular exercise program. The Levin et al. (1987) version of the PASAT was used, and the total number of correct responses for all four trials was reported.
Study strengths 1. The sample composition is well described in terms of age, education, gender, geographic area, and recruitment procedures. 2. Adequate exclusion criteria. 3. Reference is provided for test administration procedures. 4. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Small sample size. 2. Female participants only. 3. Education level is high. 4. Total scores are reported instead of individual scores for each of the four trials.
[PASAT.23] Stein, Kennedy, and Twamley, 2002 (Levin Version) (Table A8.26)
The authors examined the difference in neuropsychological test performance of female victims of partner violence with posttraumatic stress disorder (PTSD) compared to victims without PTSD and nonvictimized controls. Twenty-two female control participants were recruited through posted advertisements and personal contacts in the San Diego, California, community. They were an average of 29.4 (10.7) years of age, had an average of 13.9 (1.5) years of education, and had an average raw WAIS-III Verbal subtest score of 45.9 (7.4). All participants were fluent English speakers and had at least an 8th grade reading ability. Further exclusion criteria were meeting DSM-IV criteria for PTSD; use of psychotropic medication within the last 6 weeks of the study; use of oral or intramuscular steroids within the last 4 months of the study; learning disability; history of attention-deficit disorder, substance abuse, seizure disorder, schizophrenia, or other psychotic disorders; or neurological illness. The Levin et al. (1987) version of the PASAT was used, and the total number of correct responses for all four trials was recorded.
Study strengths 1. The sample composition is well described in terms of age, education, Vocabulary subtest performance, geographic area, and recruitment procedures. 2. Adequate exclusion criteria. 3. While test administration is not described, appropriate reference is made to the version of the PASAT used. 4. Means and SDs for the test scores are reported.
Considerations regarding use of this study 1. The sample is small. 2. An all-female sample is used. 3. Summary scores across all trials are reported, rather than correct responses for each individual trial.
[PASAT.24] Diamond, Deluca, Kim, and Kelley, 1997 (Levin Version) (Table A8.27)
This study compared performance on the PASAT and the visual analog version of the PASAT (the PVSAT) of patients with MS and controls. The authors recruited 22 participants to serve as controls on the PASAT task. There is no information about the gender of the participants. They ranged in age from 31 to 56, with an average age of 40.9 (8.9), average educational level of 15.4 (2.2), and average North American Adult Reading Test (NAART) premorbid IQ of 113.6 (13.0). None of the participants had a history of psychiatric or neurological disorders, drug or alcohol abuse, or loss of consciousness. All participants had normal Mini-Mental Status Exam scores. Participants were recruited from either the Kessler Institute in West Orange, New Jersey, or the local community. The authors report using a 50-digit version of the PASAT at four pacing intervals (2.4, 2.0, 1.6, and 1.2 seconds). However, it is unclear whether Levin et al.'s (1987) standard procedures were used.
Study strengths 1. The sample composition is relatively well described in terms of age, education, IQ, geographic area, and recruitment procedures but not gender. 2. Adequate exclusion criteria. 3. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. The sample size is small. 2. It is unclear whether the digits were presented in a different random order or in a fixed random order across trials. 3. The educational level is relatively high.
PASAT-50, PASAT-100, and PASAT-200 Administration Versions
[PASAT.25] Diehr, Heaton, Miller, and Grant, and the HIV Neurobehavioral Center, 1998 (PASAT-200 Version) (Table A8.28)
The authors present normative data for a large sample of Caucasian and African-American males and females, using a modified version of the PASAT (i.e., PASAT-200; see section on Modifications and Alternate Formats of the PASAT). A total of 566 participants were used from four separate studies. One hundred fifty of the participants were HIV-1-seronegative controls recruited from a research center in San Diego, California; 277 participants were African-American volunteers recruited for a normative study from the San Diego, California community; 78 served as controls for a study examining the effects of alcohol on cognitive performance; and 60 were controls for a study examining the effects of eosinophilia myalgia syndrome. Exclusion criteria for all studies were history of neuropsychiatric conditions such as substance abuse or dependence, head injury, and developmental disability. Participants ranged in age from 20 to 68, with an average age of 39.7 (12.1) years, and ranged in education from 9 to 20, with an average education of 14.2 (2.6) years; 39% were female and 55% were African American. Briefly, the PASAT-200 is very similar to Levin et al.'s (1987) version in that it consists of the presentation of 50 single digits (except for 7) in random order at four different pacing intervals. However, the pacing intervals are 3.0, 2.4, 2.0, and 1.6 seconds per digit, instead of 2.4, 2.0, 1.6, and 1.2 seconds.
Study strengths 1. Large sample size. 2. The sample composition is well described in terms of age, education, ethnicity, gender, geographic area, and recruitment criteria. 3. Test administration procedures are specified. 4. Adequate exclusion criteria are used. 5. Means and SDs for the test scores are reported. 6. Data are stratified by ethnicity and by educational level.
Considerations regarding use of the study 1. Total PASAT-200 scores, rather than individual scores for each of the four trials, are reported.
[PASAT.26] Diehr, Cherner, Wolfson, Miller, Grant, Heaton, and the HIV Neurobehavioral Research Center Group, 2003 (PASAT-50, -100, -200 Versions) (Table A8.29)
The authors present demographically corrected normative data for two shortened versions of the PASAT-200, namely, the PASAT-50 and the PASAT-100. The authors used 560 (61% male) participants from a pool of archival data on which the PASAT-200 normative information was based (Diehr et al., 1998). Participants ranged in age from 20 to 68, with an average age of 39.7 (12.1), and 24% of the sample was over 50 years. Their education level ranged from 9 to 20, with an average education of 14.2 (2.6) years. Most (33%) had between 13 and 15 years of education, 21% had a high school education, and 12% had lower than a high school education. Forty-five percent of the sample were Caucasian, while the remaining 55% were African American. All participants were screened for psychiatric illness, developmental disabilities, substance abuse, and head injuries. A more detailed description of the sample is provided above (PASAT.25) and in Diehr et al. (1998). Briefly, the PASAT-50 consists of one trial of 50 digits (excluding 7) presented in random order at a pace of 3 seconds. The PASAT-100 consists of the same 50 digits presented over two trials, 3-second pace and 2.4-second pace.
Study strengths 1. Large sample size. 2. The sample composition is well described in terms of age, education, ethnicity, gender, geographic area, and recruitment criteria. 3. Test administration procedures are specified. 4. Adequate exclusion criteria are used. 5. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. The average education level of the sample is relatively high. 2. Total scores, rather than individual scores for each trial, are reported.
CONCLUSIONS
Studies have documented the utility of the PASAT as a measure of attention/concentration, working memory, and information processing. In fact, the National Multiple Sclerosis Society included a version of this test in their Brief Repeatable Battery of Neuropsychological Tests. However, the major drawback of the original version of the PASAT is that it can be a lengthy, difficult, and stressful test. In fact, several studies have noted participant frustration and attrition. Fortunately, there are alternatives to the original version of the test. Clinicians can administer only one or two trials rather than
PACED AUDITORY SERIAL ADDITION TEST
all four or use alternative, shortened versions of the PASAT. A review of the literature reveals that there are no significant gender effects for the PASAT but that scores are strongly affected by age, education, and intellectual functioning. As would be expected for most tests involving speed, PASAT performance significantly declines with age, particularly as the pacing time for the digits is reduced, requiring more cognitive resources. Likewise, inspection of the data clearly reveals an improvement in performance with higher educational levels. While not all studies have found strong correlations between the PASAT and intellectual functioning, the data reviewed in this chapter indicate that it is an important factor to consider when administering this test. It is clear that
further normative studies partitioning the effects of age, education, and IQ are needed. Significant practice effects have been reported for the Gronwall (1977a,b) version of the PASAT, presumably because the digits are presented in the same random order during each pacing trial. This problem has been addressed to some degree with Levin et al.'s (1987) version, in which digits are presented in a different random order during each trial. The effects of culture, ethnicity, and linguistic background on the PASAT have received very little attention. Only one study explicitly examined the role of ethnicity in PASAT performance (Diehr et al., 1998). It is clear that future PASAT normative studies need to examine factors such as culture, ethnicity, and bilingualism. 2
•Meta-analyses for the PASAT were conducted using data reported in this chapter for each of the four presentation rates separately. Although the R² and significance level for the resulting regressions were minimally acceptable, we felt that the solution was greatly influenced by only a few data points that carried considerable weight. Therefore, the results of the meta-analyses are not presented in this chapter.
9 Cancellation Tests
BRIEF HISTORY OF THE TESTS
A number of cancellation tests have been developed over the years. Such tests are primarily designed to assess aspects of attention, such as sustained and selective attention. Sustained attention "refers to the ability to maintain a consistent level of performance over an extended period of time," while selective attention entails selection of relevant target stimuli while avoiding distractors (Ruff & Allen, 1996). Some cancellation tests are also referred to as "vigilance tests" (Lezak, 1995; Lezak et al., 2004) and typically involve measures of both speed and accuracy of performance. A number of cancellation tests using letters, numbers, or symbols as target stimuli are available to clinicians. The Ruff 2&7 (Ruff et al., 1986a), Digit Vigilance (Lewis & Rennick, 1979), Digit Cancellation Test (Della Salla et al., 1992, 1998), Visual Search and Attention Test (Trenerry et al., 1990), Verbal and Nonverbal Cancellation Tasks (Mesulam, 1985), Letter and Symbol Cancellation Task (Caplan, 1985), and Star Cancellation (Halligan et al., 1991; Wilson et al., 1987) are among the many cancellation tests available to clinicians and researchers (see Lezak, 1995, and Lezak et al., 2004, for more details on these tests). The Ruff 2&7 Selective Attention Test and Digit Vigilance Test are the two most
commonly used cancellation tests with the most available literature and have been selected for review in this chapter.
RUFF 2&7 SELECTIVE ATTENTION TEST
Brief Overview of the Ruff 2&7
The Ruff 2&7 Selective Attention Test was developed by Ruff and colleagues and is included in the San Diego Neuropsychological Test Battery (Baser & Ruff, 1987; Ruff & Crouch, 1991). The test is designed to examine both sustained and selective attention using two distractor conditions. The test consists of 20 blocks, each containing three lines of 50 characters. Within each line, 10 target digits (2s and 7s) are intermixed with either other number distractors or capital letter distractors. Ruff distinguished two test conditions: (1) blocks in which the target numbers are embedded among letters, referred to as the "Automatic Detection" condition, and (2) blocks in which the target stimuli are embedded among other numbers, referred to as the "Controlled Search" condition. The presentation of the conditions (blocks of all digits or blocks of digits and letters) is alternated. Following brief practice trials, the examinee is given 15 seconds to complete each of the 20 blocks. He or she is
CANCELLATION TESTS
prompted to move to the succeeding block when the examiner says "next." Ruff and Allen (1996) state that in the Automatic Detection condition, because the numbers belong to a different stimulus category from the letters, the selection process is automatic (i.e., "single-step retrieval of categorical information"). However, in the Controlled Search condition, since the targets and distractors belong to the same category, a more effortful search involving aspects of working memory is required. Three outcome measures can be obtained for each of the two conditions: (1) speed is measured with total number of target letters crossed out, (2) errors consist of the total number of commissions and omissions, and (3) detection accuracy is calculated by dividing the speed value by the sum of the speed plus error values (Ruff & Allen, 1996). A number of clinical studies have been conducted with the Ruff 2&7 test. Ruff et al. (1992) found that patients with right hemisphere cerebral lesions performed at far slower rates than those with left-sided lesions and normal controls. Interestingly, those with right anterior lesions were also far less accurate in their performance, while patients with left anterior lesions performed similar to controls. Ruff et al. (1989a) examined the effects of cognitive rehabilitation on Ruff 2&7 performance in patients with head injury. They found that teaching cognitive strategies, such as focused, sustained attention, as well as teaching spatial relationships and memory strategies actually improved test performance over time. Specifically, on the Ruff 2&7, patients in the cognitive strategy condition made fewer errors relative to those in the control condition. Bate et al. (2001) found that patients with severe traumatic brain injury (TBI) crossed out fewer target stimuli (i.e., were slower) than normal controls. 
Additionally, while significance values are not reported, the TBI patients who were within 1 year postinjury were slower than those who were at least 2 years postinjury. Cicerone and Azulay (2002), in their examination of the sensitivity and specificity of various neuropsychological tests in patients with mild TBI
(but whose symptoms persisted for at least 3 months), found the Ruff 2&7 test to be among the most sensitive and specific measures. They concluded that this test "can be used with confidence" since those without concussions were unlikely to display impairments on the Ruff 2&7. Finally, Ruff et al. (1993) found that the Ruff 2&7 was among the neuropsychological tests that most strongly predicted head-injured patients' ability to return to work after 1-6 months postinjury. Ruff (1994) observed relatively mild impairment in depressed patients on the Ruff 2&7. The percentile ranking of the majority of patients fell within the average range for speed and accuracy. In fact, none of the depressed patients was impaired on the accuracy measures, and only three patients exhibited slowed speed. Weiss (1996) reported that schizophrenic patients had more difficulty with speed (only 23% of patients scored in the normal range) than with accuracy (67% scored in the normal range) on the Ruff 2&7. Additionally, patients were better able to detect a target stimulus when it was embedded in letters (Automatic Detection condition) rather than within other digits (Controlled Search condition). Finally, Schmitt et al. (1988) discovered that AIDS patients and patients with AIDS-related complex who were on medication displayed improved performance on the Ruff 2&7 relative to those who were receiving a placebo. Further details about the Ruff 2&7 testing materials, administration procedures, and scoring can be obtained from the test manual and kit (see Appendix 1 for ordering information; also Lezak et al., 2004).
Psychometric Properties of the Ruff 2&7
Ruff et al. (1986a) performed a test-retest reliability study of the Ruff 2&7 for four age groups, ranging between 16 and 70 years of age. Testing probes were separated by 6 months. The correlation coefficients for the four age groups by the two conditions (i.e., automatic or controlled) ranged from 0.84 to 0.97. The r values were in approximately the same ranges for the four age groups; however,
TESTS OF ATTENTION AND CONCENTRATION
slightly better performance was noted for the automatic condition (letter distractors) relative to the controlled condition (digit distractors). While an improvement of approximately 10 points on the retest was reported, the two conditions showed similar rates of practice effects (Ruff et al., 1986a). Baser and Ruff (1987) conducted factor analysis on the Ruff 2&7 along with a host of other neuropsychological tests and found that in normal controls the Ruff 2&7 best loaded on a factor they termed "complex intelligence." This factor also contained such measures as Controlled Oral Word Association, Full Scale IQ, Vocabulary, Block Design, Digit Span, and Digit Symbol. However, in the same study, using a mixed clinical sample (e.g., psychiatric and head-injured patients), the Ruff 2&7 outcome measures loaded on an "arousal" factor (which also included Finger Tapping, mean designs on the Ruff Figural Fluency Test, Digit Symbol) and a "planning and flexibility" factor (which also included outcome measures from the Wisconsin Card Sorting Test, perseverative score from the Ruff Figural Fluency Test, and Ruff-Light Trail Learning Test).
Relationship Between Ruff 2&7 Performance and Demographic Factors
Ruff et al. (1986a) examined differences between genders, four age groups, and three educational levels on the two Ruff 2&7 conditions. They found no gender effects, with males and females performing similarly across the two conditions. Clear age effects were found across the two conditions, with a linear decline in performance as age increased. Similarly, they found that performance improved as educational level increased up to 15 years; Ruff 2&7 performance plateaued at >15 years of education. They also found that on average individuals performed approximately 15 points better on the Automatic Detection (letter distractors) relative to the Controlled Search (digit distractors) condition. Clearly, more normative studies are needed to better understand the relationship between key demographic factors and Ruff 2&7 performance. Additional studies should also examine
the effects of intellectual functioning, ethnicity, and motor functioning on the Ruff 2&7. For further normative information regarding the Ruff 2&7, see the professional manual produced by Ruff and Allen (1996).
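The detection accuracy score described in the overview of the Ruff 2&7 (speed divided by the sum of speed plus errors, with errors counted as omissions plus commissions) can be sketched in a few lines. This is only an illustration of the arithmetic stated in the text; the function names and the example counts are hypothetical, and the manual (Ruff & Allen, 1996) remains the authority on scoring.

```python
def total_errors(omissions: int, commissions: int) -> int:
    """Errors are the sum of omissions and commissions, as described in the text."""
    if omissions < 0 or commissions < 0:
        raise ValueError("error counts cannot be negative")
    return omissions + commissions


def detection_accuracy(speed: int, errors: int) -> float:
    """Accuracy as a proportion: speed / (speed + errors).

    `speed` is the number of targets correctly crossed out.
    Multiply the result by 100 if a percentage is preferred (an assumption
    about presentation, not something the text specifies).
    """
    if speed < 0 or errors < 0:
        raise ValueError("counts cannot be negative")
    denominator = speed + errors
    if denominator == 0:
        raise ValueError("accuracy is undefined when speed and errors are both zero")
    return speed / denominator


# Hypothetical counts for one condition (not taken from any normative table).
errors = total_errors(omissions=6, commissions=2)
accuracy = detection_accuracy(speed=120, errors=errors)
```

Because the score is a ratio, two examinees with the same accuracy can differ widely in speed, which is why the text treats speed and accuracy as separate outcome measures.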
DIGIT VIGILANCE TEST
Brief Overview of the DVT
The Digit Vigilance Test (DVT) was developed by Lewis and Rennick (1979) as part of a larger test battery, the Repeatable Cognitive-Perceptual-Motor Battery. The DVT is a test of vigilance and sustained attention, which also measures aspects of rapid visual tracking ability and psychomotor speed. This test consists of two pages, with 35 single digits appearing within 59 rows. The digits on the first page are printed in red ink, and the digits on the second page are printed in blue ink. For the standard administration, the task is to cross out the number 6, which is randomly dispersed throughout the page of digits. The alternate administration procedure requires that the participant cross out the number 9, which also randomly appears throughout the page of digits. The time in seconds taken to complete the task, the number of omissions (target numbers not crossed out), and the number of commissions (numbers other than the target crossed out) are recorded. There are relatively few clinical or normative studies on this test. In a study of mildly hypoxemic patients with chronic obstructive pulmonary disease (COPD), Prigatano et al. (1983) observed that patients required a significantly greater amount of time to complete the DVT relative to normal controls. In a study by Bardwell et al. (2001), DVT was the only neuropsychological test score to significantly improve in obstructive sleep apnea patients who were given continuous positive airway pressure relative to those who were given placebo treatment (Grant et al., 1987). These studies suggest that the DVT, and perhaps similar cancellation tests, is sensitive to detecting neuropsychological deficits in patients with even mild forms of hypoxemia. Smith et al. (2001) reported better performance on the DVT in postmenopausal women
who were on hormone replacement therapy (HRT) relative to their age-matched counterparts who were not taking HRT. Shean et al. (2002) found that coaching or providing test-taking instructions significantly improved DVT performance in a group of patients with schizophrenia. Additionally, these authors detected that negative symptoms and degree of disorganized thought significantly correlated with lack of ability to benefit from coaching on the DVT. These findings essentially replicated an earlier study by Eckman and Shean (2000).
Psychometric Properties of the DVT
Kelland and Lewis (1994) reported a test-retest (probes separated by 1 week) coefficient of 0.87, with a 95% confidence interval of 0.71-0.95, for the standard form administration of the DVT and a coefficient of 0.89, with a 95% confidence interval of 0.75-0.96, for the alternate form administration. Unfortunately, these data are based on a sample of only 20 individuals. In a subsequent study, Kelland and Lewis (1996) reported practice effects on the DVT, with test speed improving on the second week of test administration relative to the first (initial) testing session. However, no improvements were noted between the third week of testing relative to the second. Kelland and Lewis (1996) also assessed the convergent validity of the Repeatable Cognitive-Perceptual-Motor Battery, which contains the DVT, by evaluating its sensitivity to diazepam. While the overall score for the battery discriminated between individuals on diazepam and placebo, no differences were found between the two groups for the DVT. However, this was also a small sample, with each group containing only 20 individuals. Grant et al. (1987) conducted a factor analysis on tests from the Halstead-Reitan Neuropsychological Test Battery and several other neuropsychological tests, including the DVT, in COPD patients and healthy controls. They observed the DVT to cluster with tests of "alertness-psychomotor speed," such as Trails B and Digit Symbol. In the same study, they noted that the DVT was one of only three
neuropsychological tests that did not discriminate between mild, moderate, and severe hypoxemic COPD patients but did discriminate between the COPD group as a whole and normal controls. Overall, these authors conclude that the DVT clusters with tests of attention and psychomotor speed and that it is a sensitive test for discriminating COPD patients from controls but not for discriminating patients at various stages of COPD.
Relationship Between DVT Performance and Demographic Factors
As noted earlier, there are very few normative studies available for the DVT. Heaton et al. (1991) included the DVT in their comprehensive normative book on various neuropsychological tests, making this the largest normative study to date on the DVT. Heaton et al. (1991) detected that in a group of 210 participants, age and years of education accounted for 24% and 13% of variability in the time to complete the test, respectively, and for 15% and 16% of variability in the number of errors committed. However, gender alone accounted for only 2% of the variability in DVT outcome measures. Kelland and Lewis (1996) also found no gender effect for total time required to complete the task or for total number of errors in a group of college students.
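The width of the confidence intervals quoted above for the Kelland and Lewis (1994) test-retest coefficients follows largely from the sample of 20. The chapter does not say how those intervals were computed; the sketch below assumes the common Fisher z transformation, so it approximates rather than reproduces the published limits. The function name `fisher_ci` is hypothetical.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.959964) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z transformation: z = atanh(r), SE = 1 / sqrt(n - 3).
    z_crit defaults to the two-sided 95% normal critical value.
    This is an assumed method, not necessarily the one used in the cited study.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Standard-form coefficient and sample size as reported in the text.
lower, upper = fisher_ci(r=0.87, n=20)
```

With r = 0.87 and n = 20 this gives limits of roughly 0.70 and 0.95, close to the reported 0.71-0.95; rerunning it with a larger n shows how quickly the interval narrows.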
METHOD FOR EVALUATING THE NORMATIVE REPORTS
To adequately evaluate the Ruff 2&7 and DVT normative reports, five criterion variables were deemed critical. The first four of these are related to subject variables, and the last one refers to procedural issues.
Subject Variables
Sample Size
Fifty cases are considered a desirable sample size. Although this criterion is somewhat arbitrary, a large number of studies suggest that data based on small sample sizes are highly influenced by individual differences
and do not provide a reliable estimate of the population mean.
Sample Composition Description
Information regarding medical and psychiatric exclusion criteria is important. It is unclear if gender, intellectual level, handedness, geographic recruitment region, socioeconomic status, occupation, ethnicity, or recruitment procedures are relevant. Until this is determined, it is best that this information be provided.
Age Group Interval
This criterion refers to grouping of the data into limited age intervals. This requirement is especially relevant for this test since a strong effect of age on cancellation test performance has been demonstrated in the literature.
Reporting of Educational Levels
Given the possible association between education and cancellation test scores, information regarding educational level should be reported for each subgroup.
Procedural Variable
Data Reporting
For the Ruff 2&7, group means and standard deviations for the number of items correctly cancelled should be reported for the Automatic Detection and Controlled Search conditions separately. For the DVT, the mean and SD for time in seconds taken to complete the task should be reported. Additional useful information for the cancellation tests includes the number of omissions (target numbers not cancelled) and the number of commissions (numbers other than the target digits cancelled).
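The sample size criterion above can be made concrete with the standard error of the mean, SD / sqrt(n). The sketch below is an illustration only: it borrows an SD of 86.5 seconds, the value this chapter later reports for DVT completion time in Heaton et al. (1991), and the sample sizes are chosen to span the range seen in the reviewed studies. The function name is hypothetical.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: SD / sqrt(n)."""
    if sd < 0:
        raise ValueError("sd cannot be negative")
    if n < 1:
        raise ValueError("n must be at least 1")
    return sd / math.sqrt(n)


# Illustrative SD (seconds) taken from the DVT data summarized later in the chapter.
SD_SECONDS = 86.5

# Approximate 95% half-width of the mean (1.96 * SE) at several sample sizes.
half_widths = {n: 1.96 * standard_error(SD_SECONDS, n) for n in (20, 25, 50, 280)}

for n, half_width in half_widths.items():
    print(f"n={n:>3}: mean is known to within about +/-{half_width:.1f} s")
```

The uncertainty in the group mean shrinks only with the square root of n, so moving from 20 to 50 cases reduces it by about a third, which is one way to read the 50-case guideline.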
SUMMARY OF THE STATUS OF THE NORMS
Information presented in the studies reporting data for the cancellation tests differs across studies. Some of these differences will be summarized below.
Only one study was designed to provide normative information on the Ruff 2&7 (Ruff et al., 1986a). Other data on the Ruff 2&7 come from control groups in clinical comparison studies. Ruff et al. (1986a) partition normative data for the two conditions by four age groups and three educational levels; the other studies report demographic information. Another study by Ruff et al. (1992) provides normative data for speed and accuracy for normal controls. Finally, Bate et al. (2001) provide Ruff 2&7 data on a small sample of healthy controls. Most of these studies report either speed or speed and accuracy data summed across the two Ruff 2&7 conditions. Additional normative information, particularly tables for converting raw scores into T scores and percentiles, based on age and educational level, are provided in the Ruff 2&7 professional manual (Ruff & Allen, 1996). There are very few normative studies on the DVT. Most of the studies have small sample sizes (10-40), with the exception of Heaton et al.'s (1991, 2004) normative manuals, which include data for 210 participants with standardized scores adjusting for age, education, and gender presented for African-American and Caucasian participants separately in the 2004 edition. In this chapter, we review studies which use the Ruff 2&7, followed by DVT studies. Published manuals are reviewed first, followed by normative studies and control groups from clinical comparison studies presented in ascending chronological order for each test separately. The text of study descriptions contains references to the corresponding tables identified by number in Appendix 9. Table A9.1, the locator table, summarizes information provided in the studies described in this chapter.¹
¹Children's norms for various cancellation tests are available in Baron (2004) and Spreen and Strauss (1998).
SUMMARIES OF THE STUDIES
Ruff 2&7 Manual
[Ruff 2&7.1] Ruff and Allen, 1996
The normative information in this manual is primarily based on previous studies by Ruff
and colleagues (Ruff et al., 1986a; Baser & Ruff, 1987; Ruff & Crouch, 1991). A total of 360 (180 male, 180 female) healthy volunteers between the ages of 16 and 70 years and with 7-22 years of education participated in the study. The sample was initially stratified by four age groups (16-24, 25-39, 40-54, and 55-70 years) and three education groups (≤12, 13-15, ≥16 years) but not gender since this was not a significant factor in test performance. The authors mention that the sample "roughly approximated the 1980 U.S. census proportions with regard to race," but no specific ethnicity data are provided. Data are available for speed and accuracy for each condition individually, as well as total scores for speed and accuracy for the two conditions combined. Thus, a total of six outcome variables are available. Raw score to T score conversion and percentiles are available by age and educational level. Sixty-five percent of the sample was recruited from California, 30% from Michigan, and the rest from the eastern seaboard. The normative data contained in Ruff and Allen's manual are not reproduced here, and the interested reader is referred directly to this publication for further information.
Study strengths
1. The sample composition is well described in terms of age, education, gender, and geographic area.
2. The testing procedures and scoring are well described in the manual.
3. Means and SDs are reported for some of the Ruff 2&7 outcome measures.
4. Raw scores can easily be converted to T scores and percentiles for four age groups and three educational levels.
Considerations regarding use of the study
1. Overall sample is adequate, but some individual cells are relatively small (e.g., fewer than 20 participants in the 55-70 year age group who have 13-15 years of education).
2. No exclusion criteria and recruitment procedures are provided.
Normative Studies and Control Groups in Clinical Comparison Studies for the Ruff 2&7
[RUFF 2&7.2] Ruff, Evans, and Light, 1986a (Table A9.2)
The authors recruited 259 healthy participants (107 male, 152 female) as part of this normative study. Nearly half of the sample was recruited from California and the rest from Michigan. The investigators selected individuals with a wide age range and educational attainment in order to examine the effects of these demographic factors on test performance. Participants were aged 16-70. The authors report that their sample had 7-72 years of education, but it is unclear whether the upper limit reported is a misprint. The sample was stratified by four age groups (16-24, 25-39, 40-54, and 55-70 years) and three educational levels (≤12, 13-15, ≥16 years). Standard administration procedures were used.
Study strengths
1. The sample composition is well described in terms of age, education, gender, and geographic area.
2. Means and SDs for the test scores are reported.
3. Data are stratified by four age × three education groups.
Considerations regarding use of the study
1. Overall sample is adequate, but individual cells are relatively small (e.g., some cells contain only 10 participants).
2. No exclusion criteria and recruitment procedures are reported.
[RUFF 2&7.3] Ruff, Niemann, Allen, Farrow, and Wylie, 1992 (Table A9.3)
This study examined the effects of cerebral lesions on Ruff 2&7 performance. The authors selected 60 normal controls from a larger standardization sample of 259 reported by Baser and Ruff (1987). The larger sample was recruited from California, Michigan, and New York. Participants were screened for chronic medical illness, "extensive" substance abuse,
or loss of consciousness due to a head injury. The ethnic breakdown is reported by Baser and Ruff (1987) for the larger subject pool but not for the subsample that served in this study. The 60 participants in the current study were an average of 31.2 (4.1) years of age and had an average of 12.9 (1.5) years of education. There is no information on the gender distribution for this sample. Standard administration procedures were used.
Study strengths
1. Sample size is adequate.
2. The sample composition is well described in terms of age and education.
3. Adequate exclusion criteria.
4. Means and SDs for the total scores for both conditions are reported.
Consideration regarding use of the study
1. The data are not partitioned by age.
[RUFF 2&7.4] Bate, Mathias, and Crawford, 2001 (Table A9.4)
This study examined the relationship between the Test of Everyday Attention and various neuropsychological measures in patients with severe head injury. The study was conducted in Australia, where 35 controls (20 male, 15 female), who were native English speakers, with no history of psychiatric illness, neurological disorders, intellectual disability, substance abuse, or hemiplegia of the dominant hand, were recruited. The exact location and procedures for participant recruitment are not specified. Also, it is unclear whether the participants were patients with non-brain injury-related illness or healthy individuals from the community. Participants were an average of 30.2 (10.3) years of age, had an average of 12.6 (2.0) years of education, and had an average premorbid IQ of 101.1 (9.1), as estimated by the National Adult Reading Test-Revised (NART-R) (Crawford, 1992). Standard administration procedures were used.
Study strengths
1. The sample composition is well described in terms of age, education, gender, and premorbid IQ.
2. Adequate exclusion criteria.
3. Means and SDs for the test scores are reported.
Considerations regarding use of the study
1. The sample size is relatively small.
2. Recruitment procedures are not well described. Controls may be non-head-injured medical patients.
3. The data were obtained on Australian participants, which may limit their usefulness for clinical interpretation in the United States.
DVT Manual
[DVT.1] Heaton, Grant, and Matthews, 1991; Heaton, Miller, Taylor, and Grant, 2004
The DVT manual (Lewis, 1995) refers the reader to the comprehensive normative book published by Heaton et al. (1991). Heaton et al. (1991) gathered a large sample of data on various neuropsychological tests over a 15-year period using several studies. The DVT is among the tests for which normative data are presented. The total sample used in this normative book was recruited from various areas across the United States, including California, Washington, Colorado, Texas, Oklahoma, Wisconsin, Illinois, Michigan, New York, and Virginia, as well as Canada. It is unclear which specific regions were used for DVT data collection. All participants reportedly completed structured interviews, and those with a history of learning disabilities, neurological illness, "significant" head injury, "serious" psychiatric illness (e.g., schizophrenia), or substance abuse were excluded from the normative data set. The DVT normative data were gathered on a total of 280 participants, who were an average of 44.9 (20.0) years of age and obtained an average of 14.0 (3.2) years of education. The manual provides regression-based raw to T score and percentile conversion for the DVT (and other neuropsychological tests) based on gender, 10 age groups (20-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, and 75-80 years) and six education groups (6-8, 9-11, 12, 13-15, 16-17, and 18+ years). The average DVT raw score reported for the
entire sample of 280 participants for time taken to complete the task is 388.5 (86.5), and that for errors committed is 7.1 (8.7). Other data from the manual are not reproduced here. Interested readers are referred to the original publication. In their recently updated normative manual, Heaton et al. (2004) have gathered additional normative data for the DVT (and other neuropsychological tests). Their sample consists of 860 normal participants, of whom 466 are Caucasian and 394 are African American. The average age of the Caucasian sample was 47.0 (20.2) years, and average educational level was 14.0 (2.9) years; approximately 57.3% of the sample were male. The average age of the African-American sample was 38.7 (12.2) years, and average educational level was 13.5 (2.5) years; approximately 49.7% of the sample were male. The authors report that the data were gathered from various individual and multicenter collaborative research projects over a 25-year period. Participants were from various U.S. states and Canada, including California, Washington, Colorado, Texas, Oklahoma, Wisconsin, Illinois, Michigan, New York, Virginia, and the province of Manitoba, Canada. All participants reportedly completed structured interviews, and those with a history of learning disabilities, neurological illness, "significant" head injury, "serious" psychiatric illness (e.g., schizophrenia), or substance abuse were excluded from the normative data set. The manual provides regression-based raw to T score and percentile conversion for the DVT (and other neuropsychological tests) based on gender, 11 age groups (20-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, and 80-85 years), six education groups (7-8, 9-11, 12, 13-15, 16-17, and 18-20 years), and two ethnic groups. The average DVT raw score reported for the entire sample of 860 participants for time taken to complete the task is 390.87 (57.59).
The average time taken to complete the DVT for the Caucasian sample is 394.59 (88.92), and that for the African-American sample is 380.41 (86.63). Other data from the manual are not reproduced here. Interested readers are referred to the original publication. Standard administration procedures were used in both manuals.
Study strengths
1. The sample composition is well described in terms of age, gender, ethnicity, and education.
2. Adequate exclusion criteria.
3. Means and SDs are reported for Caucasian and African-American participants separately and for the entire sample. Additionally, T scores and percentiles corrected for age and education are reported for different demographic groups.
Considerations regarding use of the study
1. Specific sample sizes used per cell are not reported.
2. Recruitment procedures are not well described.
Other comments
1. The interested reader is referred to the Fastenau and Adams (1996) critique of the Heaton et al. (1991) norms, and Heaton et al.'s (1996a) response to this critique.
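For readers unfamiliar with the T score metric mentioned above, the following sketch shows the generic linear form (mean 50, SD 10) applied to a DVT completion time. It is explicitly not the regression-based, demographically corrected procedure of the Heaton et al. manuals; it uses only the unadjusted whole-sample mean and SD quoted in this section (388.5 and 86.5 seconds), and the function name and the convention that higher T means better performance are assumptions made for the illustration.

```python
def linear_t_score_for_time(raw_seconds: float, mean: float, sd: float) -> float:
    """Unadjusted linear T score for a timed measure.

    T = 50 + 10 * z, with z computed as (mean - raw) / sd so that
    faster-than-average times yield T scores above 50 (assumed convention).
    No age, education, gender, or ethnicity correction is applied.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    z = (mean - raw_seconds) / sd
    return 50.0 + 10.0 * z


# Whole-sample values quoted in the text for Heaton et al. (1991).
DVT_MEAN_SECONDS = 388.5
DVT_SD_SECONDS = 86.5

# A hypothetical examinee who needs 475 seconds is one SD slower than the mean.
t_score = linear_t_score_for_time(475.0, DVT_MEAN_SECONDS, DVT_SD_SECONDS)
```

A demographically corrected score would first predict the expected time from age, education, and other factors and then standardize the residual, so the value from this sketch should not be compared with T scores taken from the manuals.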
Normative Studies and Control Groups in Clinical Comparison Studies for the DVT
[DVT.2] Prigatano, Parsons, Levin, Wright, and Hawryluk, 1983 (Table A9.5)
The authors examined the neuropsychological test performance of mildly hypoxemic patients with COPD. Twenty-five healthy controls were matched to the COPD patients based on age, education, handedness, and gender. Control participants were an average of 59.6 (9.0) years of age and obtained an average of 10.5 (3.3) years of education. Participants were excluded if they had an "illness that might interfere with their neuropsychological testing (e.g., physical handicap, emotional problems, alcoholism or psychosis)," had COPD, were taking medications for heart or lung disease, or had diabetes. Fifteen of the participants were selected from Winnipeg, Manitoba, Canada, and 10 were selected from
Oklahoma City, Oklahoma. Standard administration procedures were used.
Study strengths
1. The sample composition is well described in terms of age, education, geographic location, and recruitment procedures.
2. Adequate exclusion criteria.
3. Means and SDs for the test scores are reported.
Considerations regarding use of the study
1. Small sample size.
2. Wide age range for the sample. Data are not presented by age group.
3. The data for over half of the sample were obtained on Canadian participants, which may limit their usefulness for clinical interpretation in the United States.
4. Low educational level.
[DVT.3] Grant, Prigatano, Heaton, McSweeny, Wright, and Adams, 1987 (Table A9.6)
The authors examined neuropsychological functioning in COPD patients with mild, moderate, and severe hypoxemia. They selected 99 "nonpatient" participants (75 male, 24 female) who did not have COPD, a history of "significant" head injury, a history of substance abuse, heart disease that required treatment, or neurological or metabolic illnesses. Participants were an average of 63.1 years of age and had obtained an average of 10.2 (3.6) years of education. The authors do not specify testing procedures but do mention the larger battery from which the DVT is drawn (i.e., the Rennick-Lafayette Repeatable Battery).
Study strengths 1. Relatively large sample size. 2. The sample composition is well described in terms of age, education, and gender. 3. Adequate exclusion criteria. 4. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Test administration procedures are not specifically described. 2. Data are not partitioned by age. 3. Low educational level.
[DVT.4] Kelland and Lewis, 1994 (Table A9.7)
This study was designed to assess the test-retest reliability and validity of the DVT, as well as to measure the single-dose effects of diazepam in groups of college students. The authors selected 20 college students (10 male, 10 female) from a "large urban university" to serve as controls (who were administered a placebo rather than diazepam). Participants ranged in age from 18 to 30, with an average age of 20.0 (2.8) and an average educational level of 13.1 (1.3) years. Participants were excluded from the study if they reported taking medications; had a history of substance abuse; had a medical history that required central nervous system depressant medication use; had a history of neurological, cardiac, renal, or hepatic disease; or drank more than two cups of coffee a day. The DVT, along with other neuropsychological tests, was administered two times to each participant, with each session separated by 1 week. Standard administration procedures were used. Data are reported for both the standard (crossing out 9s) and the alternate (crossing out 6s) administrations. These data were later reanalyzed by Kelland and Lewis (1996), who found a practice effect from week 1 to week 2 of test administration but no differences between week 2 and week 3. The Kelland and Lewis (1996) data for weeks 1 and 2 are the same as those reported in this study and, thus, will not be reproduced in this chapter.
Study strengths 1. The sample composition is well described in terms of age, gender, education, and recruitment procedures. 2. Adequate exclusion criteria. 3. Means and SDs for the test scores are reported. 4. Test-retest data are reported.
Consideration regarding use of the study 1. Small sample size.
[DVT.5] Bamcord and Wanlass, 1999 (Table A9.8)
The authors compared the performance of college students on six neuropsychological tests administered in the standard paper-and-pencil format vs. a more ecological format that used plastic sheet protectors to avoid creating paper waste. For the purposes of this chapter, the participants in the standard testing format were considered the "normal" controls. Ten college students (five male, five female) were recruited. Participants were an average of 19.8 (3.95) years of age, with an average of 12.8 (0.63) years of education.
Study strengths 1. The sample composition is well described in terms of age, education, and gender. 2. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. The sample is small. 2. No exclusion criteria are provided. 3. Test administration procedures are not specified.
[DVT.6] Smith, Giordani, Lajiness-O'Neill, and Zubieta, 2001 (Table A9.9)
The neuropsychological effects of HRT were examined in 29 healthy postmenopausal women. Participants were recruited through advertisements and selected if they were 60 years or older and either had received HRT without interruption after menopause or had never been treated with HRT. Participants were excluded if they had stopped and restarted HRT for more than 1 month at a time; had a significant general medical, neurological, or psychiatric illness; had a history of head trauma leading to loss of consciousness; had substance dependence; or were taking medications affecting the central nervous system. Standard administration procedures were used. Participants taking HRT were an average of 65.0 (4.0) years of age, with an average of 15.0 (2.0) years of education; and those not on HRT were an average of 67.0 (6.0) years of age, with an average of 16.0 (3.0) years of education. Average WAIS-R standard scores for the Vocabulary, Block Design, and Wide Range Achievement Test (WRAT) Reading for the HRT group were 14.2 (3.3), 12.7 (2.4), and 108.6 (5.5), respectively; values for the non-HRT group were 13.9 (3.7), 11.8 (3.5), and 108.8 (12.7), respectively. The women on HRT made significantly fewer errors on the DVT than those who were not on HRT.
Study strengths 1. The sample composition is well described in terms of age, education, gender, and recruitment procedures, with limited IQ data available. 2. Adequate exclusion criteria. 3. Means and SDs for the test scores are reported. 4. Data are reported for postmenopausal women on HRT and those not on HRT.
Considerations regarding use of this study 1. The sample is small. 2. Educational level is relatively high. 3. An all-female sample is used.
[DVT.7] Stein, Kennedy, and Twamley, 2002 (Table A9.10)
The authors compared neuropsychological test performance of female victims of partner violence with PTSD to victims without PTSD and nonvictimized controls. Twenty-two female control participants were recruited through posted advertisements and personal contacts in the San Diego, California, community. They were an average of 29.4 (10.7) years of age, had an average of 13.9 (1.5) years of education, and had an average raw WAIS-III Vocabulary subtest score of 45.9 (7.4). All participants were fluent English speakers and had at least an 8th-grade reading ability. Further exclusion criteria were presence of PTSD (DSM-IV criteria), use of psychotropic medication within the last 6 weeks of the study, use of oral or intramuscular steroids within the last 4 months of the study, learning disability, history of attention-deficit disorder, history of substance abuse, seizure disorder, a history of schizophrenia or other psychotic disorders, or neurological illness. Standard administration procedures were used.
Study strengths 1. The sample composition is well described in terms of age, education, geographic area, and recruitment procedures, with limited Verbal IQ data (i.e., Vocabulary raw scores were available). 2. Rigorous exclusion criteria. 3. Means and SDs for the test scores are reported.
Considerations regarding use of this study 1. The sample is small. 2. An all-female sample is used.
CONCLUSIONS
Clinicians and researchers use cancellation tests to assess various aspects of attention, including vigilance and sustained and selective attention. There are numerous such tests from which to choose, and most involve paper-and-pencil administration. Such tests also require aspects of psychomotor responding, as well as visual tracking ability. Two tests were selected for discussion in this chapter, the Ruff 2 & 7 Selective Attention Test and the DVT. A review of the literature indicates that there are no gender differences on either of these tests but that performance clearly declines with age. Performance on such tests appears to improve with higher levels of education. Additionally, there appear to be some critical gaps in the existing normative data for the cancellation tests reviewed in this chapter. For example, for the Ruff 2 & 7, when the data are partitioned by age, sample sizes are very small (fewer than 20), particularly for individuals older than 40 years. For the DVT, most participants over 50 years of age tend to have lower educational levels (24 and ranged in age from 57 to 85 years, with a mean age of 70.4 (5.0) years, at the first testing probe. Mean education was 14.1 (2.7) years, and mean FSIQ was 118.2 (13.0). The sample was partitioned into four age groups, which did not differ in level of education. Participants were screened for a history of neurological or psychiatric disorder. All participants were native English speakers. The BNT was administered according to standard instructions as part of a large neuropsychological battery. Some decline in scores after age 70 was apparent from cross-sectional age group comparisons. The pattern of correlations with various neuropsychological measures suggests a predominantly verbal mode of information processing in BNT performance on the first probe, as opposed to a visuospatial mode by the third probe. A comparison of BNT scores across the three probes revealed adequate stability of scores over time, with test-retest correlations ranging from r = 0.62 to r = 0.89.
BOSTON NAMING TEST
Study strengths 1. Information regarding age, education, gender, geographic area, IQ, and fluency in English is reported. 2. Adequate exclusion criteria were used. 3. The data are partitioned into four age groups. 4. Test-retest data are provided. 5. Overall sample size is large, with some cells approaching 50, although other cells are rather small. 6. Means and SDs for the test scores are reported.
Consideration regarding use of the study 1. Mean education and intelligence levels are high.
[BNT.5] Neils, Baris, Carter, Dell'aira, Nordloh, Weiler, and Weisiger, 1995 (Table A10.7)
The study addresses the effects of demographic factors on BNT performance. Participants were 323 normal elderly (244 females, 79 males) aged 65-97 residing in northern Kentucky and the greater Cincinnati, Ohio, area; 167 participants were living independently and 156 were institutionalized in extended-care facilities for at least 1 month. All participants were carefully screened for neurological disorders and had adequate vision, language comprehension, and attention. The administration procedure differed from standard in that the stimulus cues were offered after any error was made, irrespective of whether it was a visual-perceptual error. The data are presented in an age-by-education-by-living environment matrix. The combination of age, education, and living environment accounted for 32% of the performance variance. The results suggest that scores for low-education and high-education groups are less affected by age and living environment than scores for participants with 10-12 years of education. Correlation between BNT score and education was r = 0.38, whereas the correlation of BNT with age was r = -0.33.
Study strengths 1. Information regarding age, education, gender, and geographic area is provided. 2. Data across wide ranges of different demographic characteristics are presented. 3. Strict selection criteria were used for neurological disorders and cognitive dysfunction. 4. Overall very large sample size. 5. The data are presented in an age-by-education-by-living environment matrix. 6. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. No information regarding intellectual level. 2. Sample sizes in individual cells are small. 3. The administration procedure differed somewhat from standard instructions.
[BNT.6] Ross, Lichtenberg, and Christensen, 1995 (Table A10.8)
This article represents an expansion on the previously reported data in Lichtenberg et al. (1994). In study 1, the authors provide data for 123 geriatric medical inpatients at an urban rehabilitation hospital in Michigan (60% African American, 40% Caucasian, 62% female, 38% male). Mean age was 75.87 (7.42), with mean education of 11.05 (3.38). Rigorous exclusion criteria for neurological disorders and depression were used. Mean Mattis Dementia Rating Scale (DRS) score for the sample was 132.76 (4.93). Patients treated for hypertension, diabetes, and hypothyroidism were included if their conditions were well controlled with medications and without neurological complication. Some participants were tested 2-3 weeks after orthopedic surgery and were not on narcotic medications at the time of assessment. In study 2, participants from study 1 were compared as a "normative" group to a "cognitively impaired" group of 151 participants with Mattis DRS scores below 123 (61% African American, 39% Caucasian, 30% male, 70% female). Mean age for this group was 79.7, with mean education of 8.9 years. Participants from this group presented with a wide variety of physical disorders which are likely to affect cognitive status. Twenty-four
percent of these participants had scores above 10 on the Geriatric Depression Scale (GDS). The results of study 1 indicated significant correlations of BNT scores with age, education, and ethnicity (-0.308, 0.375, and 0.326, respectively). The combined effects of demographic variables accounted for 21% of the BNT variance. In study 2, a discriminant function analysis based on the BNT and demographic data discriminated between cognitively intact and impaired participants with an accuracy of 72.75% (sensitivity 63%, specificity 80%). The authors underscore the importance of using a demographically appropriate set of normative data and suggest use of their data in urban medical settings.
Study strengths 1. Means and SDs for the test scores are reported. 2. Data are presented by age group. 3. A comparison of BNT performance for clinical and medical control groups is presented. 4. Information regarding age, education, ethnicity, gender, and geographic area is reported. 5. Individual cell sizes approach 50.
Considerations regarding use of the study 1. "Normal" participants were geriatric inpatients, many of whom had physical illnesses potentially affecting cognitive status. 2. The age range for the oldest group is not reported. 3. No information on intellectual level.
[BNT.7] Worrall, Yiu, Hickson, and Barnett, 1995 (Table A10.9)
The authors assessed the validity of the BNT as part of a large educational project on 136 independently living older Australians. Participants were recruited through advertisements. Participants with a reported history of neurological disease and non-native English speakers were excluded. The mean age for the sample was 70.43 (SD = 7.8) years, and 74.3% were female.
The BNT was administered according to standard instructions, followed by a trial of seven alternative items as potential substitutes for low-frequency original items. In addition to standard scoring, an analysis of errors was conducted according to current systems (e.g., Nicholas et al., 1989). The results revealed that the mean BNT score was 2-5 points below that reported for North American samples. Interrater reliabilities for the total score and for error scoring were high (94.89% and 98.17% agreement, respectively). Age, education, visual acuity, and backward digit span were significantly related to BNT scores (r = 0.23-0.33). The analysis of errors indicated that semantically related errors and "don't know" responses were most frequent. The authors emphasized an effect of culture-related word frequency on BNT performance. The proposed alternate items for "beaver" and "pretzel" were "platypus" and "pizza." The longitudinal follow-up data for 91 participants from this sample are reported in Cruice et al. (2000).
Study strengths 1. Minimally adequate exclusion criteria are reported. 2. Data are presented by age group. 3. Authors recommend cutoff scores. 4. Analysis of errors was performed. 5. Information regarding age, gender, geographic area, and recruitment procedures is reported.
Considerations regarding use of the study 1. Education and intellectual level are not reported. 2. Sample sizes for most of the age groups are small. 3. Participants were recruited in Australia, and it is unclear if these norms are suitable for clinical interpretation in the United States given that this sample scored 2-5 points below North American samples.
[BNT.8] Lafleche and Albert, 1995 (Table A10.10)
The BNT was administered to 20 volunteers who comprised a control group in a study on
executive function deficits in mild AD. The control group included nine men and 11 women, with a mean age of 76.2 years, mean education of 14.7 years, and mean MMSE score of 29.4 (0.8). Participants were screened for severe head injury, alcoholism, major psychiatric illness, epilepsy, and learning disabilities. They did not show evidence of a dementing process, either on testing or by history.
Study strengths 1. Adequate exclusion criteria. 2. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. The sample is small. 2. SDs and ranges for age and education are not provided. 3. Recruitment procedures are not reported. 4. Education level for the sample is high. 5. No information on IQ is reported.
[BNT.9] Ivnik, Malec, Smith, Tangalos, and Petersen, 1996 (Table A10.11)
The study provides age-specific norms for the BNT obtained in Mayo's Older Americans Normative Studies (MOANS), which produce normative data for elderly individuals on different neuropsychological tests. The total sample consisted of 746 cognitively normal volunteers residing in Minnesota, over age 55, 663 of whom took the BNT. Mean MAYO FSIQ (which differs somewhat from standard WAIS-R FSIQ) for the whole sample was 106.2 (14.0), and mean Mayo General Memory Index on the Wechsler Memory Scale-Revised (WMS-R) was 106.2 (14.2). For a description of their samples, the authors refer to their earlier publications. Participants were independently functioning, community-dwelling persons who were recently examined by a physician and had no active neurological or psychiatric disorder with the potential to impact cognition. Age categorization utilized the midpoint interval technique. The raw score distribution for each test at each midpoint age was "normalized" by assigning standard scores with a mean of 10 and SD of 3, based on actual percentile ranks. The authors provided tables of age-corrected norms for each age group. The procedure for clinical application of these data is described in the original article (Ivnik et al., 1996) as follows: first select the table that corresponds to that person's age. Enter the table with the test's raw score; do not use "corrected" or "final" scores for tests that might present their own age- or education-adjustments. Select the appropriate column in the table for that test. The corresponding row in the left-most column in each table provides the MOANS Age-Corrected scaled score . . . for your subject's raw score; the corresponding row in the right-most column indicates the percentile range for that same score.
Further, linear regressions should be applied to the normalized, age-corrected MOANS scaled scores (A-MSS) derived from the tables, to adjust patient scores for education. Age- and education-corrected scores for the BNT (A&E-MSS) can be calculated as follows:
$$\text{A\&E-MSS}_{\text{BNT}} = K + (W_1 \times \text{A-MSS}_{\text{BNT}}) - (W_2 \times \text{Education})$$
where the following indices are specified for the BNT: $K = 3.32$, $W_1 = 1.07$, $W_2 = 0.34$. Education should enter the formula as years of formal schooling. The tables of scaled scores per age group provided by the authors should be used in the context of the detailed procedures for their application, which are explained in Ivnik et al. (1996). Therefore, they are not reproduced in this book. Interested readers are referred to the original article. Table A10.11 summarizes sample sizes for different demographic groups.
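As an illustration only, the education adjustment described above (A&E-MSS = K + W1 × A-MSS − W2 × Education, with K = 3.32, W1 = 1.07, and W2 = 0.34 for the BNT) can be sketched in a few lines of Python. The function name and the example patient values are hypothetical and are not taken from Ivnik et al. (1996); the A-MSS input must still be looked up in the authors' age-specific tables, which are not reproduced here.

```python
# Sketch of the education adjustment for the BNT as quoted in the text.
# The constants are the indices given for the BNT; the function name
# and the example inputs below are hypothetical.
K = 3.32
W1 = 1.07
W2 = 0.34


def age_education_corrected_mss(a_mss: float, education_years: float) -> float:
    """Return the age- and education-corrected scaled score (A&E-MSS).

    a_mss: age-corrected MOANS scaled score taken from the published tables.
    education_years: years of formal schooling.
    """
    return K + (W1 * a_mss) - (W2 * education_years)


if __name__ == "__main__":
    # Hypothetical patient: A-MSS of 10 with 12 years of schooling.
    print(round(age_education_corrected_mss(10, 12), 2))
```

With the same A-MSS, more years of schooling give a lower corrected score, which reflects the negative sign on the education term in the formula.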
Study strengths 1. Information regarding age, education, IQ, gender, ethnicity, handedness, and geographic area is reported.
2. The data are stratified by age group based on the midpoint interval technique. 3. The innovative scoring system is well described. The authors developed new indices of performance. 4. The sample sizes for each group are large. 5. Restricted age range in each cell.
Considerations regarding use of the study 1. The measures proposed by the authors are quite complicated and might be difficult to use in clinical practice. 2. Participants with prior history of neurological, psychiatric, or chronic medical illnesses were included.
Other comments 1. The theoretical assumptions underlying this normative project have been presented in Ivnik et al. (1992a,b). 2. The authors cautioned that the validity of the MAYO indices depends heavily on the match of demographic features of the individual to the normative sample presented in this article. 3. Correlation of the BNT with age was -0.46, whereas correlations with education and gender were 0.26 and -0.19, respectively.
[BNT.10] Welch, Doineau, Johnson, and King, 1996 (Tables A10.12-A10.14)
The study provides data on BNT performance for 176 normal older adults from middle Tennessee (74 males, 102 females), ranging in age from 60 to 93, with a mean age of 74 years. Education ranged from third grade to lf years, with a mean of 12.28 years. The sample consisted of 61% urban and 39% rural participants; 29% professional, 28% skilled, and 43% labor workers; 71% white, 28% African American, and 1% other. Participants were recruited mostly from senior-citizen organizations and retirement centers, to ensure sample representation approximating the general population for the following parameters: various occupational levels (skilled, professional, or manual labor), race, and living characteristics (urban vs. rural). Strict medical and psychiatric
exclusion criteria were employed. Participants with well-controlled hypertension or who had adequate corrected vision were included. The data were presented for five age groups and then further stratified into five age groups by two educational levels and into five age groups for males and females separately. The table for five age groups includes suggested cutoff scores. The results indicated that the interaction of age and education is a better predictor of BNT performance than age alone. Performance variability was higher in the older age and lower education groups. In the ~12th grade education group, BNT performance remained stable until 80 years, while in the 17 years. Volunteers from local churches in Richland and Florence counties in South Carolina; students, faculty, and staff from the University of South Carolina (USC); and participants in a lexical function study at the USC were included in the sample. Exclusion criteria were a history of mental retardation, dementia or developmental language disorders, traumatic brain injury, cerebrovascular accident, treatment for alcoholism, or current psychiatric illness including depression. Participants with scores above 3 on the Hachinski Ischemia Rating Scale, above 0.5 on the Zung Depression Scale, and 27, had Beck Depression Inventory-II scores 50. 2. The sample composition is described in terms of age, education, and ethnicity. 3. Rigorous exclusion criteria. 4. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. It is unclear which version of the test was administered. 2. Wide age and education range. No information on IQ is reported. 3. Recruitment procedures were not reported. 4. Educational level for the sample is high.
[VF.17] Ponton, Satz, Herrera, Ortiz, Urrutia, Young, D'Elia, Furst, and Namerow, 1996 (Table A11.22)
The FAS version was administered to Spanish-speaking volunteers as part of a larger battery in a project designed to provide standardization of the Neuropsychological Screening Battery for Hispanics (NeSBHIS). Volunteers were recruited through fliers and advertisements in community centers of the greater Los Angeles area over a period of 2 years. Exclusion criteria were a history of neurological or psychiatric disorder, drug or alcohol abuse, and head trauma. Data for a sample of 300 participants with a median educational level of 10 years were analyzed. Participants ranged in age from 16 to 75 years, with a mean of 38.4 (13.5) years. Education ranged from 1 to 20 years, with a mean of 10.7 (5.1) years. Male to female ratio was 40%/60%. The average duration of residence in the United States was 16.4 (14.4) years. Seventy percent of the sample were monolingual Spanish-speaking, and 30% were bilingual. The proportion of the sample respective to their country of origin closely approximates the 1992 U.S. Census distribution. Correlations between Marin and Marin (1991) acculturation scale scores and neuropsychological variables are provided. The FAS test was administered in the participants' native language, Spanish. In the follow-up study on the factor structure of the NeSBHIS (Ponton et al., 2000), which extracted five factors, the FAS primarily loaded on the Language factor, with a varimax-rotated factor loading of 0.71.
Study strengths 1. Large overall sample, with acceptable sample size for most of the cells. 2. The sample composition is well described in terms of age, education, gender, acculturation information, geographic area, and recruitment procedures. 3. Adequate exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported. 6. Data are partitioned by gender × age × education.
Considerations regarding use of the study 1. It is unclear whether the administration procedure required restrictions in the types of word to be used in the process of word generation. 2. No information on IQ is reported. 3. It is unclear which of the two educational groups included participants with 10 years of education.
[VF.18] Crossley, D'Arcy, and Rawson, 1997 (Table A11.23)
The authors compared performance on letter and category fluency in a sample of cognitively normal seniors (n = 635) and in samples of DAT and vascular dementia patients participating in the Canadian Study of Health and Aging. The control sample included community-dwelling individuals who were screened for cognitive impairment using the Modified Mini-Mental State Examination (3MS). All participants were fluent in either English or French. A detailed overview of the study participants, methods, and findings is provided by the Canadian Study of Health and Aging Working Group (1994). Letter fluency was assessed with the FAS task, administered in three 60-second trials. Participants were instructed to avoid proper nouns and the same word with a different suffix. Category fluency was assessed with the animal name generation task, within a 60-second interval. The data are reported by age group, gender, and educational level.
Study strengths 1. Administration procedures are well outlined. 2. Sample composition is well described in the previous reports. 3. Subject selection criteria are outlined. 4. Data are stratified by age group, gender, and education. 5. Means and SDs for the test scores are reported. 6. Sample sizes for each demographic grouping are very large.
Considerations regarding use of the study 1. Data were collected in Canada and, therefore, might be of limited use in the United States. 2. It is unknown to what extent having some data collected in French impacted the overall results.
[VF.19] Beatty, Testa, English, and Winn, 1997 (Table A11.24)
The authors used FAS and Animal Naming to investigate clustering and switching strategies
as determinants of hierarchical organization of semantic memory. Performance of an Alzheimer's group was compared to that of an elderly control group, which consisted of 38 volunteers: 18 males and 20 females. None of the participants had a history of major psychiatric or medical illness, drug or alcohol abuse, head injury, learning disability, or other neurological disease. Standard procedures for administration of the FAS and Animal Naming versions were used. Responses were recorded on audiotape and later analyzed. In the follow-up studies on VF mechanisms in Alzheimer's and Parkinson's diseases (Troester et al., 1998; Piatt et al., 1999a), the authors apparently used the same control sample (at least in part). Therefore, the data from these articles will not be reproduced in this book.
Study strengths 1. The sample composition is described in terms of age, gender, and education. 2. Rigorous exclusion criteria. 3. Administration procedure is well described. 4. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. The sample is relatively small. 2. Recruitment procedures were not reported. 3. No information on IQ is reported.
[VF.20] Nyberg, Winocur, and Moscovitch, 1997 (Table A11.25)
The FAS word fluency test was administered as part of a test battery sensitive to medial-temporal and frontal lobe function in a study investigating age-related differences in the effect of lexical priming on memory. The sample included 39 healthy elderly participants who ranged in age from 66 to 87 years, with a mean age of 77.3 years. Education ranged from 8 to 22 years, with a mean of 13.6. Performance on the WAIS Vocabulary test was used as a screening measure.
Study strengths 1. The sample composition is described in terms of age and education. 2. Test administration procedures are specified. 3. Means and SDs are reported for the FAS.
Considerations regarding use of the study 1. The sample is relatively small. 2. Exclusion criteria are not described. It is unclear which version of the WAIS was administered and what performance on Vocabulary served as a cutoff for inclusion into the study. 3. Recruitment procedures are not reported, and gender distribution is not specified. 4. It is unclear whether the administration procedure required restrictions in the types of word to be used in the process of word generation. 5. SDs for age and education are not reported. 6. The data were obtained on Canadian and/or Swedish participants, which may limit their usefulness for clinical interpretation in the United States.
[VF.21] Salthouse, Toth, Hancock, and Woodard, 1997 (Table A11.26)
The authors examined controlled and automatic processes underlying memory and attention using the process-dissociation procedure, as well as age-related influences on these processes. Participants were 115 healthy adults (47% male, 53% female) aged 18-78 years, who were recruited from appeals to groups and acquaintances. They were included in the study if they reported being in "reasonably good health," were not current students, and had at least 11 years of education. No other exclusion criteria are reported. Participants were administered a battery of neuropsychological tests in their homes. The data were stratified into three age groupings: 18-39 years [mean age = 29.0 (4.8); mean education = 15.5 (1.7)], 40-59 years [mean age = 49.1 (5.1); mean education = 15.2 (2.5)], and 60-78 years [mean age = 69.2 (5.1); mean education = 15.3 (2.6)]. Letters C, F, and L were used in the letter fluency test, with the constraint that none of the words should be proper nouns. Animal, furniture, and vegetable categories were used in the category fluency test. The data are reported for each trial separately.
Study strengths 1. Sample size is large. 2. The sample composition is well described in terms of age, education, gender, and various health indices. 3. Recruitment procedures are specified. 4. Data are partitioned into three age groups. 5. Test administration procedures are specified. 6. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Exclusion criteria are not well identified. 2. High educational level for each age group.
[VF.22] Kempler, Teng, Dick, Taussig, and Davis, 1998 (Table A11.27)
The Animal Naming test was administered to 317 Chinese, Hispanic, and Vietnamese immigrants, speaking primarily their native language, and to white and African-American English speakers 54-99 years old. Participants generated animal names in their native language. The test was administered as part of a normative study for the Cross-Cultural Neuropsychological Battery. Volunteers who had a history of stroke, head injury, or psychiatric, speech, language, or memory problems, as reported on a self-rated health history questionnaire, were not included in the study. The standard administration procedure was used. The results indicated an inverse relationship of word fluency with age and a positive relationship with education. A pronounced effect of native language was also noted (see above).
Study strengths 1. Large sample. 2. The sample composition is well described in terms of age, education, gender, ethnicity, and information on acculturation level for the immigrant groups. 3. Adequate exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported, grouped by age, education, gender, and ethnicity.
Consideration regarding use of the study 1. No information on IQ is reported.
[VF.23] Stuss, Alexander, Hamer, Palumbo, Dempster, Binns, Levine, and Izukawa, 1998 (Table A11.28)
The study addresses the effect of brain lesion location and etiology on VF. The control group included 37 participants (19 males, 18 females) without neurological or psychiatric disorder, with mean age of 54.4 (14.4) years and mean education of 13.9 (2.3) years. Mean NART-estimated IQ was 113.8 (6.1). The letter fluency task (FAS) was administered according to Benton and Hamsher's (1978) instructions (numbers were not excluded according to the instructions). Semantic fluency was measured with the animal name generation task. Number of target words generated, different error types, and measures of clustering were recorded. Measures of VF correlated with age but not with education or NART IQ. Normative data for letter and semantic fluency tasks for three age groups (21-39, 40-64, 65-81 years) stratified by gender are provided. The authors reviewed the results in light of the relationship between different cognitive processes and brain regions.
Study strengths 1. The sample composition is well described in terms of age, education, gender, and estimated IQ. 2. Adequate exclusion criteria. 3. Test administration procedures are described. 4. The data are stratified by three age groups and by gender. 5. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Small sample size, and data are inconsistent across age groups, with older females scoring considerably higher than younger females on the letter fluency task. 2. Recruitment procedures are not reported. 3. The data were obtained on Canadian participants, which may limit their usefulness for clinical interpretation in the United States.
[VF.24] Johnson-Selfridge, Zalewski, and Aboudarham, 1998 (Table A11.29)
The authors examined the effect of ethnicity on word fluency, measured with the FAS and Animal Naming versions. The sample included white, black, and Hispanic male veterans, with 200 participants in each group, who were randomly drawn from a larger sample of 4,462 veterans participating in the Vietnam Experience Study. Hispanic participants were not differentiated by country of origin or primary language. However, the authors stated that
1. Stratified random sample:
s were 74% female.
2. Uneducated sample: Data for literates (n = 26) and illiterates (n = 47) with no formal education were analyzed separately.
3. Stratified random Spanish-speaking sample: Stratified random sample of education-matched literate and illiterate elders (n = 32 for each group).
4. Uneducated Spanish-speaking sample: Uneducated literate (n = 17) and illiterate (n = 43) elders.
Three category fluency conditions were used (animals, food, and clothing), with standard administration procedures for the Boston Diagnostic Aphasia Examination (BDAE). The score represents the number of words averaged over the three conditions. The authors concluded that category fluency is not affected by literacy status.
Study strengths 1. Large overall sample size. 2. The overall sample is described in terms of age, education, gender, ethnicity, geographic area, setting, recruitment procedures, and sampling methods. 3. Adequate exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported. 6. Data for illiterate and low-education samples are provided.
Considerations regarding use of the study 1. Demographic characteristics for three out of four groups are not provided. 2. The data are not partitioned by age group. 3. No information on IQ is reported. However, WAIS-R Similarities performance is reported.
[VF.27] Boone, 1999 (Table A11.32)
The chapter summarizes the results of a study on the effect of aging, demographic factors, and medical conditions on executive functions, which were presented in earlier publications (Boone et al., 1990, 1995). Participants are 155 healthy elderly volunteers (53 males, 102 females) aged 45-84 years, with a mean age of 63.07 (9.29), mean education of 14.57 (2.55), and mean FSIQ of 115.41 (14.11). All participants were fluent English speakers and were recruited through newspaper ads. Participants underwent physical and neurological examinations and psychiatric interviews. Rigorous exclusion criteria were used, including history of psychosis, major affective disorders, alcohol dependence, neurological disorders, and serious metabolic abnormalities. Frequency of vascular illnesses and intake of
cardiac and/or antihypertensive medications was recorded. The FAS version of the test was used. Normative data are stratified by IQ level (average, high average, superior) based on performance on the Satz-Mogel abbreviation of the WAIS-R. The results identified the FSIQ as the only significant predictor of FAS performance, responsible for 15% of test score variance, based on stepwise regression analysis.
The standard administration procedure was used.
Study strengths 1. The sample size is large. 2. Composition of the sample is well described in terms of IQ, age, fluency in English, education, gender, and recruitment procedures. 3. Rigorous exclusion criteria. 4. Normative data are stratified by IQ. 5. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Age and education for each of the three IQ groups are not provided. 2. Education and intelligence levels of the sample are high. 3. Data are not presented by age groupings.
[VF.28] Demakis, 1999 (Table A11.33)
The author used the COWA as part of a battery in a study of response consistency across a 3-week interval in an analog malingering design. Data are presented for control and dissimulation groups. All participants were students from undergraduate psychology courses at a small midwestern liberal arts college. The control group consisted of 21 participants with a mean age of 22.5 years (7.99) and mean education of 13.6 (1.46) years; 67% were female. Control participants were told that they were in a car accident but that they had not suffered any injuries and were instructed to perform to the best of their ability. Participants were retested 3 weeks after the initial testing. Control participants demonstrated a practice effect on the retest. Only data for the initial testing probe for the control group are replicated in this book.
Study strengths 1. The sample composition is described in terms of age, education, gender, and geographic area. 2. Test administration procedure is specified. 3. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. The sample is small. 2. Exclusion criteria are not clearly described. 3. It is unclear which version of the test was administered. 4. Recruitment procedures were not reported. 5. No information on IQ is reported.
[VF.29] Epker, Lacritz, and Cullum, 1999 (Table A11.34)
The authors used FAS and Animal Naming in a study of the diagnostic utility of a qualitative scoring technique for fluency tasks in Alzheimer's and Parkinson's diseases. The control group included 65 elderly participants with a mean age of 70.6 years (4.7), mean education of 14.3 (2.9) years, and a male/female ratio of 22/43, who participated in an investigation of cognitive function in aging. They were screened for health problems using a semistructured neuromedical interview. Participants did not have a known history of substance abuse, major mental illness, learning disability, neurological disease, or major psychopathology. Standard administration procedures were used.
Study strengths 1. Relatively large sample. 2. The sample composition is well described in terms of age, education, gender, and MMSE score. 3. Adequate exclusion criteria. 4. Test administration procedures are well specified. 5. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Recruitment procedures were not reported. 2. Educational level for the sample is high. 3. No information on IQ is reported.
[VF.30] Tombaugh, Kozak, and Rees, 1999 (Tables A11.35-A11.37)
The article provides normative data for FAS and Animal Naming stratified by three levels of age (16-59, 60-79, 80-95) and three levels of education (0-8, 9-12, 13-21), as well as for nine age groups, four education groups, and the two genders separately. The total sample included participants from two different studies. Participants were recruited through booths at shopping centers, social organizations, places of employment, psychology classes, and word of mouth. Volunteers with a known history of neurological disease, psychiatric illness, head injury, or stroke were excluded from the study. A subsample of participants was judged to be cognitively intact on the basis of history, clinical and neurological examination, and an extensive battery of neuropsychological tests. All participants stated that English was their first language.
The subset of the sample for the FAS test included 895 participants aged 16-95 years, with a mean age of 60.7 years (19.9), and education ranging 0-21 years, with a mean education of 12.1 (3.2). The male-to-female ratio was 559n4L
The subset of the sample for the Animal Naming test included 735 participants aged 16-95 years, with a mean age of 67.0 years (19.8), and education ranging 0-21 years, with a mean education of 11.4 (3.4). The male-to-female ratio was 310/425.
The standard administration procedures were used, with the exception that numbers were allowed on the FAS test. Mean numbers of words are presented for four education groups, nine age groups, and the two genders separately. Percentile scores and mean number of words are also presented in three age (16-59, 60-79, and 80-95) by three education (0-8, 9-12, and 13-21) cells.
FAS was found to be more sensitive to the effects of education than age. For Animal Naming, the relationship was opposite. Gender was not found to affect performance on either test.
Study strengths 1. Large sample. 2. The sample composition is well described in terms of age, education, gender, and recruitment procedures. 3. Adequate exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported. 6. Data are stratified by age, education, gender, and age × education.
Considerations regarding use of the study 1. The data were obtained on Canadian participants, which may limit their usefulness for clinical interpretation in the United States. 2. No information on IQ is reported.
[VF.31] Basso, Bornstein, and Lang, 1999 (Table A11.38)
The study examined the practice effect on repeated administration of several tests over a 12-month interval. The baseline sample consisted of 82 men recruited through newspaper advertisements, who were not paid for their participation. Fifty men out of this sample returned for the repeated testing 12 months later. The composition of the latter sample was 48 Caucasian, 1 African American, and 1 Hispanic, with a mean age of 32.5 (9.27) years, mean education of 14.98 (1.93) years, and mean FSIQ of 109.30 (12.29) at baseline. At each probe, participants were screened for neurological disease, head injury, learning disabilities, or other medical illnesses based on an informal interview. They were also screened for psychiatric disorders through a structured clinical interview. None was excluded based on these screens. The FAS was administered according to standard procedures by thoroughly trained and supervised technicians. The authors compared FAS performance at baseline and on the retest using reliable change indices and concluded that FAS scores did not change on the retest.
The number of words generated on the FAS for the two probes, with age, gender, and education corrections applied, is reported for the entire sample.
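The reliable change comparison mentioned above can be illustrated with a short sketch. It assumes the commonly cited Jacobson-Truax form of the index (retest minus baseline, divided by the standard error of the difference); the summary does not state which variant Basso et al. used, and the function name and all numeric inputs below are hypothetical.

```python
import math


def reliable_change_index(baseline, retest, sd_baseline, test_retest_r):
    """Jacobson-Truax style RCI (an assumed variant, not necessarily the study's)."""
    # Standard error of measurement from the baseline SD and test-retest reliability.
    sem = sd_baseline * math.sqrt(1.0 - test_retest_r)
    # Standard error of the difference between two administrations.
    se_diff = math.sqrt(2.0) * sem
    return (retest - baseline) / se_diff


# Hypothetical FAS totals and psychometric values, for illustration only.
rci = reliable_change_index(baseline=40, retest=44, sd_baseline=10.0, test_retest_r=0.80)
# An absolute RCI below 1.96 is conventionally read as no reliable change (95% level).
print(abs(rci) < 1.96)
```

Some RCI variants also subtract the mean practice effect before dividing; that adjustment is omitted here to keep the sketch minimal.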
Study strengths 1. Adequate sample size. 2. The sample composition is described in terms of age, education, gender, ethnicity, FSIQ, and recruitment procedures. 3. Adequate exclusion criteria. 4. Test administration procedures are thoroughly described. 5. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. The data are not partitioned by age group. 2. Educational level for the sample is high.
[VF.32] Gladsjo, Schuman, Evans, Peavy, Miller, and Heaton, 1999 (Tables A11.39, A11.40)
The authors provided normative data and demographic corrections for age, education, and ethnicity, for letter and category fluency tasks, based on a sample of 768 normal adults aged 20-101 years, with education of 0-20 years; 55% are Caucasian and 45% African American; 52% are male. Mean age is 50.4 (19.4) years; mean education is 13.6 (3.1) years. The sample consists of volunteers who were enrolled as normal comparison participants in various clinical studies at the University of California San Diego. Caucasian participants were recruited through local media announcements and personal contacts. African-American participants were part of a federally funded study (African American Norms Project) and were recruited to match the census representation of African Americans within the larger San Diego area. Participants were screened with the Structured Clinical Interview for DSM-III-R or based on self-report of no past history of diagnosis or treatment for an Axis I disorder. Exclusion criteria were history of significant head trauma with loss of consciousness for >20 minutes or persisting neurological sequelae, neurological illness, conditions expected to affect neuropsychological test performance, psychotic disorder, other major psychiatric illness, current substance dependence or abuse within the last 6 months, or primary language other than English. FAS and Animal Naming were administered. According to the FAS instructions, proper names and plurals were excluded. Total number of words generated for three FAS trials and for Animal Naming are reported for the sample stratified by three age groups (20-34, 35-49, 50-101 years) and three education groups (0-11, 12-15, 16-20 years). Data stratified by age are also presented for African Americans and Caucasians separately. In addition, multiple regression analyses were used to develop equations for demographic corrections. Tables for conversion of raw scores to demographically corrected T scores were provided by the authors. Raw scores for FAS and Animal Naming are reproduced in this chapter.
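To make the idea of a regression-based demographic correction concrete, the following is a minimal sketch of one generic approach: predict the expected raw score from age and education, standardize the residual, and rescale it to a T score (mean 50, SD 10). It is not the authors' published procedure or equations; the coefficients, the standard error of estimate, and the function name are hypothetical placeholders.

```python
def demographically_corrected_t(raw, age, education, coef, see):
    """Convert a raw score to a T score using a linear demographic prediction.

    coef and see are hypothetical regression weights and the standard error
    of estimate; they are not values from the study.
    """
    predicted = coef["intercept"] + coef["age"] * age + coef["education"] * education
    z = (raw - predicted) / see  # standardized residual
    return 50.0 + 10.0 * z       # T-score metric: mean 50, SD 10


# Hypothetical coefficients, for illustration only.
HYPOTHETICAL_COEF = {"intercept": 30.0, "age": -0.1, "education": 1.0}

t_score = demographically_corrected_t(
    raw=40, age=50, education=12, coef=HYPOTHETICAL_COEF, see=10.0
)
print(round(t_score, 1))
```

In practice the published conversion tables should be used; the sketch only shows why two examinees with the same raw score can receive different corrected scores.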
Study strengths 1. Large sample size. 2. The sample composition is well described in terms of age, education, gender, ethnicity, geographic area, setting, and recruitment procedures. 3. Adequate exclusion criteria. 4. Test administration procedures are specified. 5. Normative data are stratified by age × education for the whole sample and by age for African Americans and Caucasians separately. 6. Means and SDs for the test scores are reported.
Consideration regarding use of the study 1. No information on IQ is reported.
[VF.33] Binder, Storandt, and Birge, 1999 (Table A11.41)
The authors examined the relationship between performance on psychometric tests and a modified Physical Performance Test (modified PPT) in a sample of 125 adults aged 75 years and older, who participated in trials of exercise or hormone replacement therapy. The study was approved by the Washington University School of Medicine, St. Louis. The
mean age for the sample was 82.3 (4.4), mean education was 13.5 (3.0), 25% were male, and 87% were Caucasian. Indices of physical health, Blessed score, and Geriatric Depression Scale score are reported. Preliminary screening included a medical history; physical examination; the Short Blessed Test of memory, concentration, and orientation; blood and urine chemistries; a chest X-ray; and a cross-validated self-report regarding health problems in the previous 12 months. Exclusion criteria were inability to walk 50 feet independently, active medical problems that would contraindicate performance of a graded exercise stress test, inability to complete the graded exercise stress test or the modified PPT, a score >8 on the Short Blessed Test, inability to provide informed consent due to cognitive impairment, and inability to follow the directions for the psychometric tests due to visual or auditory impairments. The test was administered according to standard instructions. The authors found that VF was not significantly associated with total modified PPT score.
Study strengths 1. Large sample size. 2. The sample composition is well described in terms of age, education, gender, ethnicity, indices of physical health, Blessed score, Geriatric Depression Scale score, geographic area, and research setting. 3. Adequate exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. The data are not partitioned by age group. 2. No information on IQ is reported.
[VF.34] Fama, Sullivan, Shear, Cahn-Weiner, Marsh, Lim, Yesavage, Tinklenberg, and Pfefferbaum, 2000 (Table A11.42)
Fluency tests were administered to Alzheimer's patients and normal controls in a study
on the relationship between regional brain volume and semantic, phonological, and nonverbal fluency. The control group included 51 participants with a mean age of 66.7 (7.4) years and mean education of 16.4 (2.3) years. Exclusion criteria were significant history of psychiatric or neurological disorder, past or present alcohol or drug abuse or dependence, or other serious medical condition, as identified on a psychiatric interview and medical examination. The standard administration procedure was used for the FAS, with the exception that participants were not instructed to avoid numbers. Semantic fluency was measured with two 1-minute trials, in which participants were instructed to generate names of animals and names of inanimate objects, respectively. These data were used in a previous article by Fama et al. (1998) in calculations of standardized z scores for Alzheimer's participants that corrected the raw scores for age.
Study strengths 1. Relatively large sample. 2. The sample composition is described in terms of age and education. 3. Rigorous exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Recruitment procedures are not reported. 2. Gender distribution is not reported. 3. The data are not partitioned by age group. 4. Educational level for the sample is high. 5. No information on IQ is reported.
[VF.35] Troyer, 2000 (Table A11.43)
The study addressed clustering and switching on phonemic and semantic VF tasks in a total sample of 411 healthy adults aged 18-91. This is a follow-up on previous publications by this author and colleagues (Troyer et al., 1997, 1998a,b). The mean age for the sample was 59.8 (20.7) years, and education ranged 5-21 years, with a mean of 13.9 (2.9). The male/female ratio was 30%/70%. All participants were
fluent in English. Participants were screened for neurological or psychiatric disorders. Participants aged ≥60 were screened for cognitive decline. Only those participants who obtained MMSE score ≥25 or scores within the normal range on an episodic memory test were included. The FAS version of the phonemic fluency test was administered to 257 participants and the CFL version, to 154 participants. Standard administration procedures were used, with the exception that participants were not instructed to avoid numbers. Two 60-second semantic fluency trials were administered: the animal fluency version was administered to 407 participants; 156 participants from this sample were also administered supermarket fluency. Based on the results of regression analyses, the author inferred that age had a greater effect on semantic than on phonemic fluency. Education affected both semantic and phonemic fluency. Gender was not related to VF performance.
Study strengths 1. Large sample. 2. The sample composition is well described in terms of age, education, gender, and native language. 3. Test administration procedures are specified. 4. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. Recruitment procedures are not reported. 2. Participants were screened for neurological or psychiatric disorders; however, medical exclusion criteria are not reported. 3. Demographic characteristics for subsets of participants in each condition are not provided. 4. Phonemic fluency norms are provided as the mean for FAS/CFL. 5. The data are not partitioned by age group. 6. No information on IQ is reported. 7. The data were obtained on Canadian participants, which may limit their usefulness for clinical interpretation in the United States.
[VF.36] Acevedo, Loewenstein, Barker, Harwood, Luis, Bravo, Hurwitz, Aguero, Greenfield, and Duara, 2000 (Tables A11.44-A11.47)
The authors provided normative data for three conditions of the Category Fluency test (Animals, Vegetables, and Fruits) for 424 English-speaking and 278 Spanish-speaking participants over the age of 50. The sample was drawn from a larger pool of community-dwelling individuals who presented for free memory screening sessions offered by the Wien Center for Alzheimer's Disease and Memory Disorders between 1994 and 1999. Participants in the English-speaking group spoke English as their primary language and were born in the United States. Participants in the Spanish-speaking group spoke Spanish as the primary language and were born in a country where Spanish is the primary language. All participants were screened in their primary language using the MMSE, Hamilton Depression Rating Scale (Hamilton, 1960), and questionnaires related to demographic information, medical and psychiatric history, and cognitive status. Only participants who had MMSE score ≥27 and a score ≥10 on four delayed recall trials of the three words used in the MMSE (based on the cutoff identified in Loewenstein et al., 2000) were included in the study. For English speakers, the mean age was 69.1 (6.9) years, mean education was 14.4 (2.5) years, male/female ratio was 26%/74%, and mean MMSE score was 28.9 (1.0). For Spanish speakers, the mean age was 64.9 (7.7) years, mean education was 13.4 (3.2) years, male/female ratio was 30.8%/69.2%, and mean MMSE score was 28.7 (1.0). Among English speakers, 99% were classified by the examiner as white,
r copy scores. The authors attributed this loss of information to inadequate encoding, impaired consolidation, or accelerated rates of forgetting. Deckersbach et al. (2000a,b) and Savage et al. (1999, 2000) found problems with organization of the drawing in spite of accurate reproduction of the geometric figure in their sample of individuals with OCD, which implicates frontostriatal dysfunction, based on neuroimaging findings. Organization during the copy condition was a strong predictor of subsequent memory performance in their sample. A role of executive dysfunction in OCD, identified through ROCF performance, is further described by Savage and Otto (2003). In a study by Waber et al. (1994), long-term survivors of childhood acute leukemia recalled fewer organizing-scheme components on the ROCF but more incidental features in comparison to normative expectations. The authors suggest a metacognitive basis for this weakness, rather than a visuoperceptual deficit. Shorr et al. (1992) computed measures of copy accuracy, perceptual clustering, encoding, and savings for 50 neuropsychiatric patients based on their ROCF performance
24, with a mean of 28.5 (1.7). The sample consisted of 31 males and 29 females. All participants were white, and 10% of the sample were left-handed. All participants were administered copy and immediate recall trials with no time limits,
which were followed by one of the four delay durations. Timing of the delay started from completion of the copy trial. During the delay periods, participants were administered other neuropsychological tests of a verbal nature. Each protocol was scored for accuracy by two independent raters using the system described by Berry et al. (1991). Interrater reliability for the three trials was as follows: copy, r = 0.95; immediate recall, r = 0.98; delayed recall, r = 0.99. Scores for data analysis represent the average of final scores assigned by two raters for each protocol. The results revealed no significant effect of delay period on recall. Scores on immediate and delayed-recall trials were similar. The authors inferred that most forgetting occurs very quickly, as a result of "overloading" working memory.
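The two-rater scoring step described above can be sketched as follows: each protocol's analysis score is the mean of the two raters' scores, and interrater reliability is summarized with a Pearson correlation. The rater scores below are invented for illustration, and the helper names are hypothetical; the study's actual data are not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def averaged_scores(rater_a, rater_b):
    """Score used for analysis: the mean of the two raters' scores per protocol."""
    return [(a + b) / 2 for a, b in zip(rater_a, rater_b)]


# Hypothetical copy-trial scores (0-36 scale) from two independent raters.
rater_a = [30.0, 28.5, 33.0, 25.0]
rater_b = [31.0, 28.0, 34.0, 24.5]

print(averaged_scores(rater_a, rater_b))
print(round(pearson_r(rater_a, rater_b), 2))
```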
Study strengths 1. Sample composition is well described in terms of age, ethnicity, gender, educational level, handedness, and geographic location. 2. Adequate exclusion criteria were used. 3. Interrater reliability and scoring system are reported. 4. Means and SDs are reported. 5. Age range is probably sufficiently narrow.
Considerations regarding use of the study 1. While overall sample is adequate, individual sample sizes are small. 2. High educational level. 3. No IQ information is reported.
[ROCF.11] Delaney, Prevey, Cramer, and Mattson, 1992 (Table A12.13)
This study addressed the comparability of the ROCF and Taylor figure in a nonpatient sample and is based on the control sample data collected as part of a large study carried out in various locations of the United States on the effect of anticonvulsant medications on memory functioning. Participants were free of neurological and psychiatric disorders or current drug history. Ages ranged 22-61 years and education, 6-16 years.
The ROCF and Taylor figure were administered in the same order to all participants, which is consistent with the order used in clinical practice. The time interval between administration of the two figures was approximately 1 month. Three conditions were administered for both figures: copy, immediate recall, and 20-minute delayed recall (delay filled with nonvisuospatial tasks). Reproductions were scored according to the standard criteria. Interrater reliability based on scoring of 10 samples by two experienced neuropsychologists was 0.91. Correlations with age (-0.11 to -0.26) and education (-0.01 to 0.20) were relatively low. The authors concluded that performance on the copy condition for both figures was nearly identical; however, participants performed significantly better on the Taylor figure on both recall conditions.
Study strengths 1. Information on interrater reliability is provided. 2. Information regarding age, education, and geographic area is provided. 3. Information on alternate form is provided. 4. Sample size approximates 50. 5. Minimally adequate exclusion criteria. 6. Means and SDs are reported.
Considerations regarding use of the study 1. The data are not broken down by age. 2. SDs for age and education are not reported. 3. No information regarding IQ or gender.
[ROCF.12] Kuehn and Snow, 1992 (Table A12.14)
The study explored the comparability of the ROCF and Taylor figure in a clinical sample. Participants were 38 Canadian patients referred for neuropsychological assessment for various forms of brain damage. Patients unable to draw a Greek cross or administered either figure previously were excluded from the study. Mean age was 46.7 years. The procedure consisted of copying each figure with a lead pencil, followed by 40-minute delayed recall (without forewarning). Approximately 3 hours elapsed between
administration of the two figures, during which time tests involving drawings or visual memory were not administered. Two figures were presented in a counterbalanced order. The standard scoring systems were used for both figures. Percent recall was calculated. The authors concluded that performance on both figures was equivalent for copy and recall scores. Percent recall scores, however, were higher for the Taylor figure, when it was administered first.
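The summary does not give the formula behind the percent recall score. A common definition, assumed here, is the delayed recall score expressed as a percentage of the copy score; the function name and the example scores are hypothetical.

```python
def percent_recall(recall_score, copy_score):
    """Recall as a percentage of the copy score (assumed definition)."""
    if copy_score <= 0:
        raise ValueError("copy score must be positive")
    return 100.0 * recall_score / copy_score


# Hypothetical scores on the standard 36-point scale.
print(percent_recall(recall_score=18.0, copy_score=36.0))
```

Expressing recall relative to copy separates memory loss from poor initial reproduction, which is why two patients with the same raw recall score can differ on this index.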
Study strengths 1. Scoring system is specified. 2. Information on gender, age, education, IQ, and geographic area is provided. 3. Information on alternate form is provided. 4. Means and SDs are reported.
Considerations regarding use of the study 1. Data are not broken down by age group. Age range is not specified. 2. The two groups, used for counterbalancing, are not comparable in education but are comparable in IQ. 3. Clinical sample; no exclusion criteria. 4. No information on interrater reliability. 5. Small sample size. 6. Data were collected in Canada and may be problematic for use in the United States.
[ROCF.13] Boone, Lesser, Hill-Gutierrez, Berman, and D'Elia, 1993b (Table A12.15)
The investigators collected data on 91 fluent English-speaking healthy older adults recruited in southern California through newspaper ads, flyers, and personal contacts as part of their investigation of the effects of age, IQ, education, and gender on ROCF performance. Exclusion criteria were current or past history of major psychiatric disorder or alcohol or other substance abuse, neurological illness, and significant medical illness which could affect central nervous system function (e.g., uncontrolled hypertension or diabetes). In addition, potential participants were rejected if they had abnormal findings on neurological examination, metabolic disturbances detected with laboratory tests, or abnormal findings on EEG or MRI. The final sample included 34 males and 57 females. Seventy-one participants were Caucasian, 10 were African American, five were Asian, and five were Hispanic. Mean educational level was 14.5 (2.5) years, and mean WAIS-R FSIQ (Satz-Mogel format) was 115.9 (13.0). Participants were instructed to copy the figure onto a blank paper "as carefully as you can without tracing." Performance was not timed, and participants were allowed to make erasures. Following a 3-minute verbal fluency task and without forewarning, participants were instructed to draw what they could remember of the figure on a second sheet of blank paper. The E. M. Taylor (1959) scoring system was employed. Means and SDs are reported for copy scores and percent retention for three age groupings (45-59, 60-69, and 70-83) and four FSIQ levels (90-109, 110-119, 120-129, and 130-139). Interrater reliability between two experienced neuropsychologists was 0.82 for copy and 0.93 for delay. In regression analyses, a relatively small but significant percent of the variance in ROCF performance was associated with age and FSIQ; gender and education were not predictive of ROCF scores. In addition, ROCF copy score was not associated with delay score or percent retention. Significantly poorer ROCF scores did not emerge until age 70 and older, and individuals of average IQ showed a trend toward poorer performance on ROCF delay relative to participants falling in the very superior intelligence range. No interaction effects between age and FSIQ were observed. The number and type of errors committed on copy and recall are summarized.
Study strengths 1. Information regarding education, gender, geographic recruitment area, ethnicity, and recruitment procedures is provided. 2. Rigorous exclusion criteria. 3. Data are presented by age and IQ groupings. 4. Scoring system is specified, and information on interrater reliability is provided. 5. Information regarding error number and type for copy and delay is provided. 6. Large overall sample size, although individual cells all fall short of 50. 7. Means and SDs are reported.
Consideration regarding use of the study 1. High intellectual and educational level.
Other comments 1. For participants older than 74, age-corrected FSIQs were based on Ryan et al. (1990) tables.
[ROCF.14] Chiulli, Haaland, LaRue, and Garry, 1995 (Table A12.16)
The study explored rates of decline in ROCF performance after age 70. Participants were 153 healthy elderly individuals aged 70-93, living independently, who participated in the New Mexico Aging Process Study, which explores nutrition and aging. Persons with serious medical illnesses or taking prescription medications were excluded. The sample was partitioned into three age groups. The ROCF was administered as part of a brief battery of psychological tests. Standard administration and scoring procedures were used. A copy condition was followed by immediate and 30-minute delayed recall. If the reproduction started with the drawing of the large rectangle, the approach was categorized as "configural." All other approaches were determined to be "nonconfigural." All protocols were checked by a second, blind evaluator. The results revealed a significant main effect for age group. Accuracy was greatest in the copy condition but did not differ between the immediate and delayed recall conditions. The most pronounced decline in performance was demonstrated between the first and second groups, which did not differ considerably from the third group performance. No gender effects were evident. The number of participants using the configural approach did not differ significantly for the three age groups.
Study strengths 1. Data for an elderly sample are partitioned into three age groups.
2. Relatively large sample size, and individual cells approximate 50. 3. Administration system is specified. 4. Exclusion criteria are specified. 5. Information on education, gender, and geographic recruitment area is reported. 6. The study assessed strategy used in approach to drawings. 7. Means and SDs are reported.
Consideration regarding use of the study 1. High educational level. 2. Data were checked by a blind evaluator, but no information on interrater reliability is provided. 3. No information on IQ. [ROCF.15] Meyers and Meyers, 1995a (Table A12.17)
The study explored the effect of different administration procedures on the rate of recall of the ROCF. Participants were undergraduate students from a college in Iowa and had no prior history of head injury, drug abuse, learning disability, or psychiatric illness. Participants were randomly assigned to one of four groups, each of which received a different combination of trials (30 participants in each group). There was no significant difference between the groups on age, gender, or education. Reproductions were scored according to the system developed by Meyers and Meyers (1992), which is based on the standard scoring system with the addition of a 1/4" rule for misplacement and a 1/8" rule for drawing errors. In addition, the authors used a recognition trial (Meyers & Lange, 1994). The authors suggest use of a 3-minute recall instead of immediate recall due to its higher correlation with the 30-minute recall.
Study strengths 1. Scoring system is described. 2. Sample composition and demographic characteristics are described, as well as geographic area. 3. Overall sample size is large (n = 120), although individual groupings are relatively small.
4. Adequate exclusion criteria. 5. Means and SDs are reported. 6. Age grouping is suitably restricted. Consideration regarding use of the study 1. No information regarding interrater reliability or IQ. [ROCF.16] Ponton, Satz, Herrera, Ortiz, Urrutia, Young, D'Elia, Furst, and Namerow, 1996 (Table A12.18)
The ROCF was administered to Spanish-speaking volunteers as part of a larger battery in a project designed to provide standardization of the Neuropsychological Screening Battery for Hispanics (NeSBHIS). Volunteers were recruited through fliers and advertisements in community centers of the greater Los Angeles area over a period of 2 years. Exclusion criteria were a history of neurological or psychiatric disorder, drug or alcohol abuse, and head trauma. Data for a sample of 300 participants with a median educational level of 10 years were analyzed. Participants ranged in age 16-75 years, with a mean of 38.4 (13.5) years. Education ranged 1-20 years, with a mean of 10.7 (5.1) years. The male-to-female ratio was 40%/60%. The average duration of residence in the United States was 16.4 (14.4) years. Seventy percent of the sample were monolingual Spanish-speaking, and 30% were bilingual. The proportion of the sample respective to their country of origin closely approximates the 1992 U.S. Census distribution. Correlations between Marin and Marin (1991) acculturation scale scores and neuropsychological variables are provided. Participants were instructed to copy the complex figure with no time limit. Reproductions were scored according to Taylor's (1959) criteria. The authors provided normative data for the copy and 10-minute delayed recall conditions. Study strengths 1. Large overall sample, with acceptable sample size for most of the cells. 2. The sample composition is well described in terms of age, education, gender, acculturation information, geographic area, and recruitment procedures.
3. Adequate exclusion criteria. 4. Test administration and scoring procedures are specified. 5. Means and SDs for the test scores are reported. 6. Data are partitioned by gender x age x education. Considerations regarding use of the study 1. No information regarding interrater reliability or IQ. 2. It is unclear which of the two educational groups included participants with 10 years of education. [ROCF.17] Rapport, Charter, Dutra, Farchione, and Kingsley, 1997 (Table A12.19)
The study addressed interrater and internal consistency reliabilities of the standard (as described in Lezak, 1995) and Denman scoring systems for the ROCF. Participants were 318 veterans (312 males, 6 females), aged 18-84 years, who were referred to a Veterans Administration hospital assessment service. The majority of participants were inpatients. Mean age was 55.01 (4.31) years and mean education, 12.62 (2.77) years. Three independent raters scored copy and immediate recall reproductions using standard and Denman criteria. Interrater reliabilities are presented for the entire sample and for three referral sources separately: neurology, psychiatry, and rehabilitation medicine. The authors concluded that internal consistency and interrater reliabilities for both scoring systems were high. Coefficient α reliabilities were also high, indicating psychometrically sound inter-item congruity for both scoring systems. Age was modestly related to performance on the copy condition and strongly related to recall. Education was modestly associated with copy and weakly associated with recall performance. Study strengths 1. Information on gender, age, education, and recruitment procedures is provided. 2. A large sample size. 3. Data on psychometric properties of the ROCF are provided.
4. Two scoring systems are compared. 5. Means and SDs are reported. Considerations regarding use of the study 1. Participants were V.A. inpatients from different wards, including neurology. Selection criteria and participants' diagnoses are not specified. The data on test scores are of limited use with the general population due to likely health confounds of the sample. 2. The sample was not partitioned into age groups. 3. No information on IQ. 4. Mostly male population. [ROCF.18] Hartman and Potter, 1998 (Table A12.20)
The authors explored the contributions of visuospatial ability, organization, and memory to age differences on the ROCF in adulthood. Participants were 30 undergraduate and graduate students aged 18-32, with a mean of 22.3 years, and older adults recruited through fliers and advertisements in local newspapers and senior-citizen newsletters. Participants were screened for history of neurological illness, head trauma or loss of consciousness,
significant psychiatric illness, untreated hypertension, current use of psychoactive medication, excessive current use of alcohol, and dementia. All participants lived independently in the community and reported themselves in good or excellent health. All older adults scored >24 on the MMSE. The two age groups were selected from a larger sample in order to match them on Shipley Hartford Vocabulary Test scores (36.2 vs. 35.5). The ROCF was administered according to Rey's (1941) original instructions, using different-colored pens handed to participants at equal intervals. Copy and immediate recall without forewarning were used. Scoring was done by two investigators using BQSS and the extended 36-point system. BQSS intraclass correlations for a subsample of 22 protocols ranged 0.79-1.00, with the exception of qualitative items (perseveration, confabulation, and neatness), which were low, ranging up to 0.65. Intraclass reliability coefficients for the latter system ranged 0.79-0.99. Mean scores for the
two age groups according to the extended 36-point scoring system are presented in Table A12.20. The authors found that lower performance for the older group, on the Copy condition, was the result of minor inaccuracies in drawing and, on the Recall condition, the result of omission of elements. No decline in organizational quality with age was evident. Small age differences were seen on the copy condition, with robust differences evident in recall. The authors discussed the advantages and disadvantages of the BQSS and the extended 36-point scoring system. Table A12.20 provides data according to the latter scoring system. Study strengths 1. The sample composition is well described in terms of age, gender, vocabulary test scores, and recruitment procedures. 2. Rigorous exclusion criteria. 3. Two scoring systems are compared. 4. Means and SDs for the test scores are reported. 5. Information on scoring system and interrater reliability is provided. Considerations regarding use of the study 1. The samples are relatively small. 2. Educational levels for the samples are high. 3. SDs or ranges for education are not provided. [ROCF.19] Ostrosky-Solis, Jaime, and Ardila, 1998 (Table A12.21)
The authors investigated the effect of normal aging on memory abilities. The sample included 105 participants (44 male, 61 female) aged 20-89 years, with a minimum of 6 years of formal education. The sample was partitioned into seven age groups, with 15 participants in each group. All volunteers were of average socioeconomic status, lived in Mexico City, and were native Spanish speakers. Exclusion criteria were presence of dementia according to the DSM-IV criteria, a score < 24 on the MMSE, and a history of neurological or psychiatric conditions, per self-report questionnaire.
The ROCF was administered according to Taylor's (1959) instructions. Copy, Immediate Recall, and 20-minute Delayed Recall conditions were administered. The standard scoring procedure was used. Study strengths 1. The sample composition is well described in terms of age, gender, incentive for participation, and geographic area. 2. Minimally adequate exclusion criteria. 3. Test administration procedures are specified. 4. Means and SDs for the test scores are reported. 5. Information on scoring system is provided. Considerations regarding use of the study 1. Overall sample is large, but individual cells are small. 2. Recruitment procedures are not reported. 3. Specific information on education is not provided, other than "the participants had a minimum of six years of formal education." 4. The data were obtained on Mexican participants, which may limit their usefulness for clinical interpretation in the United States. 5. No information on IQ is reported. 6. No information on interrater reliability is provided. [ROCF.20] Fastenau, Denburg, and Hufford, 1999
This normative study included 211 healthy adults aged 30-85 years, with a mean of 62.9 (14.2) years. Education ranged 12-25 years, with a mean of 14.9 (2.6) years; 55% were women, and over 95% were Caucasian. Participants were recruited using a stratified sampling procedure at three different sites as part of other studies and financially compensated. Exclusion criteria were history of cerebrovascular insult, head injury with loss of consciousness exceeding 5 minutes, and chronic substance abuse, per structured interview. The Extended Complex Figure Test was administered, which supplements the original Copy, Immediate Recall, and Delayed Recall
with Recognition and Matching trials. Testing and scoring were performed by trained personnel. Scores were generated using Osterrieth's (1944) criteria. The data for conversion of the raw scores into scaled scores are presented in overlapping age groups using the midpoint interval technique introduced by Ivnik et al. (1992a). These tables should be used in the context of the detailed procedures for their application, which are explained by the authors. Therefore, they are not reproduced in this book. Interested readers are referred to the original article. The authors concluded that age and education effects were evident on all trials but education explained minimal variance on the copy and memory trials. Gender had a minimal effect on performance. [ROCF.21] Schreiber, Javorsky, Robinson, and Stern, 1999 (Table A12.22)
The BQSS and the 36-point scoring system were compared on samples of adults with ADHD and matched controls. The control group included 18 participants (9 male, 9 female) aged 18-51, with a mean age of 29.5 (11.5) years and mean education of 15.1 (1.7) years. Exclusion criteria were history of neurological disorder, major medical illness, psychiatric illness, developmental disorder, learning disability, ADHD, or significant visual or auditory impairments. The ROCF was administered according to the procedures described in the BQSS manual (R. A. Stern et al., 1999), switching different-colored pens. The Copy, Immediate, and 20-30 minute Delayed Recall conditions were used. The test was administered and scored by trained personnel using the BQSS and the 36-point scoring system. The interrater reliability of these scorers was reported in the BQSS manual. Table A12.22 provides a score for the copy condition obtained using the 36-point scoring system. The authors discussed the superiority of the BQSS in discriminating between the two groups. Study strengths 1. The sample composition is well described in terms of age, education, and gender.
2. Rigorous exclusion criteria. 3. Test administration and scoring procedures are specified. 4. Means and SDs for the test scores are reported.
Considerations regarding use of the study 1. The sample is small, with a wide age range. 2. Data for the recall conditions are not reported. 3. Educational level for the sample is high. 4. No information on IQ is reported.
[ROCF.22] Deckersbach, Savage, Henin, Mataix-Cols, Otto, Wilhelm, Rauch, Baer, and Jenike, 2000 (Table A12.23)
The psychometric properties of scoring systems measuring organizational approach to the ROCF and influences of copy organization and accuracy on immediate recall were studied on individuals diagnosed with OCD and normal controls. Control participants were recruited through bulletin board notices at the Massachusetts General Hospital. The control group consisted of 55 healthy adults (38% male) 19-64 years of age, with a mean age of 35.13 (12.6) years, and education ranging 12-20 years, with a mean of 16.7 (2.3) years. Beck Depression Inventory scores ranged 0-15, with a mean of 2.3 (3.2). All participants were Caucasian and right-handed. Estimated intelligence level was above average. Their health status was determined based on a structured clinical interview. Exclusion criteria were history of Axis I psychiatric disorder, significant head injury, seizure, neurological condition, or current medical condition. Copy and Immediate Recall conditions of the ROCF were administered. The administration procedure used switching colored pencils every 15 seconds. The protocols were scored according to Meyers and Meyers' (1995b) system. In addition, the organizational approach used during the Copy condition was assessed according to the Shorr et al. (1992) and Savage et al. (1999) scoring methods. The interrater reliability for the Savage et al. method, established on a subsample of 15 randomly selected drawings, was moderate to high, with Cohen's κ coefficients ranging 0.69-0.92 for different organizational elements of the figure. Table A12.23 provides scores for the Copy and Immediate Recall conditions based on the Meyers and Meyers scoring system. The authors concluded that organization during the Copy condition was a strong predictor of subsequent recall.
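For readers less familiar with the statistic, the following is a minimal sketch of how Cohen's κ is computed for two raters judging the same set of drawings. The function name and the ratings are hypothetical and serve only as an illustration; they are not taken from the study described above.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, derived from each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical ratings of one organizational element on 15 drawings
# (1 = element judged as constructed in an organized manner, 0 = not).
rater_1 = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0]
rater_2 = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

Because κ corrects raw agreement for the agreement expected by chance, it is lower than the simple proportion of matching ratings whenever the raters' marginal frequencies make chance agreement likely.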
Study strengths 1. Relatively large sample. 2. The sample composition is well described in terms of age, education, gender, estimated intelligence level, geographic area, and recruitment procedures. 3. Rigorous exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported. 6. Information on scoring systems and interrater reliability is provided.
Considerations regarding use of the study 1. The data are provided for a wide age range. 2. Educational level for the sample is high. [ROCF.23] Miller, 2003; Personal Communication (Table A12.24)
The investigation used participants from the Multi-Center AIDS Cohort Study (MACS). The data were collected from 729 seronegative homosexual and bisexual males for the purpose of establishing normative data for neuropsychological test performance based on a large sample. Mean age for the sample was 40.4 (7.4) years, and mean education was 16.2 (2.4) years; 91.2% were Caucasian, 2.5% Hispanic, 5.6% black, 0.7% other. All participants were native English speakers. The Copy, Immediate Recall, and 20-minute Delayed Recall conditions were administered according to standard instructions. The data are partitioned by three age groups (25-34, 35-44, 45-59) x three educational levels (< 16, 16, >16 years).
Study strengths 1. The overall sample size is large, and most individual cells have more than 50 participants.
2. Normative data are stratified by age x education. 3. Information on age, education, ethnicity, and native language is reported. 4. Means and SDs for the test scores are reported. Considerations regarding use of the study 1. All-male sample. 2. No information on IQ is reported. 3. No information on exclusion criteria.
RESULTS OF THE META-ANALYSES OF THE ROCF DATA (See Appendix 12m)
Data collected from the studies reviewed in this chapter were combined in regression analyses in order to describe the relationship between age and test performance and to predict test scores for different age groups. Effects of other demographic variables were explored in follow-up analyses. The general procedures for data selection and analysis are described in Chapter 3. Detailed results of the meta-analyses and predicted test scores across adult age groups are provided in Appendix 12m. Only those data based on the standard 36-point scoring system, including Meyers and Meyers' (1995b) approach, were used in the analyses. Data generated using other methods were not included. Data provided in Meyers and Meyers' (1995b) manual are not included. Separate analyses were performed on the Copy, Immediate Recall, and Long-Delayed Recall conditions. Data for 3-minute delayed recall were not analyzed as only a few studies reported data for this condition. The long-delay interval varies widely in the data reviewed. According to the literature, varying the delay interval between 15 and 60 minutes has minimal effect on the rate of recall (Berry & Carpenter, 1992). Therefore, 20-, 30-, 40-, and 45-minute delayed recall trials were combined in one run of analysis. In all cases, the long-delayed recall was preceded by an immediate or a 3-minute delayed recall but not both.
After data editing for consistency and for outlying scores, the following data were included in the analyses: nine studies, which generated 19 data points based on a total of 1,340 participants for the Copy condition; seven studies, which generated 12 data points based on a total of 1,086 participants for Immediate Recall; seven studies, which generated 11 data points based on a total of 1,056 participants for the Long-Delayed Recall. Quadratic regressions of the test scores on age yielded R² of 0.899 for the Copy condition, 0.822 for Immediate Recall, and 0.862 for Long-Delayed Recall, indicating that 82%-90% of the variance in test scores for the three conditions is accounted for by the models. Based on these models, we estimated scores for the three conditions for age intervals between 22 and 79 years. If predicted scores are needed for age ranges outside the reported age boundaries, with proper caution (see Chapter 3), they can be calculated using the regression equations included in the tables, which underlie calculations of the predicted scores. It should be noted in the context of across-condition comparisons that mean age for the Copy condition is considerably higher than mean ages for Immediate Recall and Long-Delayed Recall because data for two large studies based on the older samples were reported only for the Copy condition. The scores for the Copy condition of the ROCF for healthy young to middle-aged samples are not expected to be normally distributed. It should be noted that the majority of the studies contributing to the aggregate sample for the Copy condition in our analyses reported data for older age groups, with the mean age of 62.73 (19.27). The mean Copy score for the aggregate sample is 32.20 (1.79), reflecting age-related decline from the optimal performance expected in younger samples.
Thus, the distribution of scores in our sample is more normal than expected in younger samples due to variability in both directions from the mean, avoiding scores being skewed due to ceiling effect. The pattern of SDs differs across the three conditions. For the Copy condition, linear regression of SDs on age yielded R² of 0.685;
and for Immediate Recall, quadratic regression of SDs on age yielded R² of 0.694, indicating increase in variability with advancing age, consistent with the literature. Predicted SDs, based on these models, are reported. Regressions of SDs for Long-Delayed Recall on age suggest that age does not account for a significant amount of variability in SDs (R² = 0.482). Though some increase in variability with advancing age is expected, this trend was not significantly evident in the collected data. Therefore, we suggest that the mean SD for the aggregate sample be used across all age groups. Predicted scores and SDs for 12 age ranges across three conditions are summarized in Table A12m.4. Examination of the effects of demographic variables on the ROCF scores indicated that education did not contribute to the test scores in the data available for analyses. The effects of intelligence level, gender, and handedness on ROCF performance were not explored due to a scarcity of data available for review.
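The general logic of regressing group means on age and then using the predicted mean and SD to evaluate an individual score can be sketched as follows. This is only an illustration under stated assumptions: the data points are hypothetical, the fit is an unweighted ordinary least-squares quadratic, and the function names are invented for this example. The procedures actually used for the meta-analyses are those described in Chapter 3, and the regression equations to apply in practice are those given in Appendix 12m.

```python
import numpy as np

# Hypothetical study-level data points: mean age of each group and its mean Copy score.
# These values are illustrative only and are NOT taken from the reviewed studies.
ages = np.array([25.0, 35.0, 45.0, 55.0, 65.0, 72.0, 78.0, 85.0])
mean_scores = np.array([35.1, 34.8, 34.2, 33.5, 32.4, 31.3, 30.1, 28.2])


def fit_quadratic(x, y):
    """Unweighted least-squares fit of y = b0 + b1*x + b2*x**2.

    Returns (coefficients, R squared); coefficients are ordered [b2, b1, b0].
    """
    coeffs = np.polyfit(x, y, deg=2)
    predicted = np.polyval(coeffs, x)
    ss_residual = np.sum((y - predicted) ** 2)
    ss_total = np.sum((y - np.mean(y)) ** 2)
    return coeffs, 1.0 - ss_residual / ss_total


def predict_score(coeffs, age):
    """Predicted mean score at a given age from the fitted equation."""
    return float(np.polyval(coeffs, age))


def z_score(observed, predicted_mean, predicted_sd):
    """Standardized distance of an observed score from the predicted mean."""
    return (observed - predicted_mean) / predicted_sd


coeffs, r_squared = fit_quadratic(ages, mean_scores)
predicted_at_70 = predict_score(coeffs, 70.0)

# Hypothetical examinee aged 70 with a raw Copy score of 29; the SD of 3.0 is an
# assumed value standing in for the predicted SD of the corresponding age range.
z = z_score(29.0, predicted_at_70, 3.0)
print(f"R squared = {r_squared:.3f}, predicted mean at 70 = {predicted_at_70:.1f}, z = {z:.2f}")
```

Given the non-normal distribution of Copy scores in younger samples noted above, a z score of this kind is most informative for low scores and should not be read as a percentile without caution.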
Strengths of the analyses
1. Total sample size of 1,340 for the Copy condition, 1,086 for Immediate Recall, and 1,056 for Long-Delayed Recall. 2. R² of 0.899 for the Copy condition, 0.822 for Immediate Recall, and 0.862 for Long-Delayed Recall, indicating a good model fit. 3. Postestimation tests for parameter specifications did not indicate problems with normality or homoscedasticity, with the exception of the marginally significant test for normality for Long-Delayed Recall. 4. It should be noted that the predicted values match closely the normative data provided in the Meyers and Meyers' (1995b) manual for all three conditions, with respect to both the extreme values and the direction/rate of age-related changes.
Limitations of the analyses
1. Postestimation test for normality for the Long-Delayed Recall was marginally significant. The Kdensity plot demonstrated a positive skew in the distribution
of residuals, which does not affect the estimates of regression coefficients and accuracy of prediction but does influence the results of significance tests. 2. Data for only a narrow range of higher levels of education are available for the analyses (12.2-16.2 years). Mean education of 14.33 (0.98) for the Copy condition is high. We were unable to fully explore the effect of education on the test scores because lower educational levels are not represented in the data. Though reports on the relationship between education and test scores are equivocal, a number of studies suggest that higher levels of education are associated with better test performance. Therefore, the predicted values might overestimate expected scores for individuals with lower educational levels. 3. Although the effect of intellectual level on ROCF performance has been reported in several studies, we could not include measures of intellectual level in our analyses due to great variability in the type of measures used to assess functional level among the different studies.
CONCLUSIONS
A great number of studies exploring the psychometric properties of the ROCF and its clinical utility attest to its popularity among clinicians and investigators alike. However, tremendous variability in administration and scoring of the ROCF obscures comparability of the results of these studies. To improve consistency across different studies, the procedures for administration and scoring need to be highlighted in detail by clinicians and investigators. It should be noted that the distribution of scores for the ROCF Copy condition deviates considerably from the normal distribution. A majority of participants are capable of copying the figure without major distortions. Therefore, a label of "superior" performance given to a subject achieving a high ROCF score is meaningless. On the other hand, the test is highly sensitive to deficits in visuospatial
information processing, and achieving a low performance score falling in the outlying range has clinical significance. In addition to the numerical expression of a subject's performance, the value of qualitative interpretation and the delineation of subject's strategy/type of errors was emphasized in several studies reviewed above. In this context, the two avenues of research on the ROCF, namely, studies on clinical utility and on the cognitive processes involved in figure drawing, are mutually enriching. Recommendations for future research on the ROCF include careful analysis of the effects of demographic factors on performance. The well-documented effects of age and intelligence (and possibly education) need to be considered in subject selection and data presentation format. Although education did not have an effect on ROCF performance in the meta-analyses described in this chapter, this is due to a narrow range of education in the
aggregate sample. The scope of the research literature should be expanded to include lower levels of education and intellectual functioning. A large number of studies on the learning/processing strategies in children and on the clinical sensitivity of the test to different neurological conditions in adults are available in the literature, but only a few studies are dedicated to the cognitive/processing strategies issues related to older age groups. The psychometric properties of different scoring systems need to be further assessed. Data on interrater reliability, internal consistency, and test-retest reliability are scarce. From the review of existing studies, it appears that different scoring systems are differentially applicable to specific clinical and research situations. Additional information on the current use of the ROCF and suggestions for future investigations, submitted by clinicians, are summarized by Knight et al. (2003).
13 Hooper Visual Organization Test
BRIEF HISTORY OF THE TEST
The Hooper Visual Organization Test (HVOT) consists of 30 line drawings of familiar objects which have been fragmented into pieces. The task requires the examinee to mentally reintegrate and name the objects, which are arranged in order of increasing difficulty. The response format can be oral or written, depending on whether the individual administration or the booklet format is used. The score is the number of correctly identified items, with half-points available for some of the items. Wetzel and Murphy (1991) suggest a discontinuation rule of five consecutive errors, based on a rating change of only 1% using this strategy. The test was first published in 1958 and revised in 1983. The test manual for the revised edition provides conversion tables to correct raw scores for age and educational level. Corrected or uncorrected raw scores can be converted to T scores according to the tables provided in the manual, with higher T scores representing a greater likelihood of neurological dysfunction. The standardization data reported in the manual are based on Mason and Ganzler's (1964) all-male sample of 231 patients, personnel, and volunteer workers from a Veterans Administration hospital. The sample was stratified into nine age cohorts: 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, and 65-69 years.
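The raw scoring and the discontinuation rule described above can be illustrated with a short sketch. The function name is hypothetical, and the treatment of half-credit responses is an assumption made for this example: only a zero-credit response is counted as an error, so a half-point response resets the run of consecutive errors. The test manual remains the authority on scoring.

```python
def score_hvot(item_credits, discontinue_after=5):
    """Sum item credits (0, 0.5, or 1) in administration order.

    Scoring stops once `discontinue_after` consecutive errors have occurred.
    Returns (total_score, items_administered).
    """
    total = 0.0
    consecutive_errors = 0
    administered = 0
    for credit in item_credits:
        if credit not in (0, 0.5, 1):
            raise ValueError(f"invalid item credit: {credit!r}")
        administered += 1
        total += credit
        # Assumption: only a zero-credit response counts as an error.
        if credit == 0:
            consecutive_errors += 1
            if consecutive_errors >= discontinue_after:
                break
        else:
            consecutive_errors = 0
    return total, administered


# Hypothetical protocol of 30 responses in order of administration.
responses = [1] * 10 + [0.5, 1, 0, 1] + [0] * 5 + [1] * 11
total_with_rule, items_given = score_hvot(responses)
total_full, _ = score_hvot(responses, discontinue_after=31)
print(total_with_rule, items_given, total_full)
```

Scoring the same protocol with and without the rule shows how much credit an examinee could lose to early discontinuation, which is the quantity a clinician would want to check before adopting the shortcut.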
In addition to using T-score tables, determination of impaired vs. normal performance can be made using the cutoff criteria. The cutoff scores recommended by the authors vary depending on test administration setting. In a clinical diagnostic setting, a cutoff score of ~24 is suggested in determining whether further assessment is needed. On the other hand, if the test is used as part of a screening battery administered to all patients admitted to a facility with a low incidence of organic brain pathology, a cutoff of 20 is recommended to minimize the rate of false-positive errors. Boyd (1981) argued: no single cutoff score can be recommended for use in all clinical situations. Factors such as the subject's age, educational level, intelligence, and whether the situation requires minimization of false positives or false negatives, must all be weighed in interpreting test results. (p. 19)
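The trade-off between false positives and false negatives that Boyd describes can be made concrete with a small sketch. The scores below are hypothetical and the function name is invented for this illustration; the sketch also assumes that scores at or below the cutoff are the ones flagged as impaired.

```python
def classification_rates(impaired_scores, intact_scores, cutoff):
    """Sensitivity and specificity when scores at or below `cutoff` are flagged.

    Assumption: lower scores indicate poorer performance.
    """
    if not impaired_scores or not intact_scores:
        raise ValueError("both groups must contain at least one score")
    true_positives = sum(score <= cutoff for score in impaired_scores)
    true_negatives = sum(score > cutoff for score in intact_scores)
    sensitivity = true_positives / len(impaired_scores)
    specificity = true_negatives / len(intact_scores)
    return sensitivity, specificity


# Hypothetical total scores; not drawn from any study reviewed here.
impaired = [12, 15, 18, 19, 21, 22, 23, 25, 14, 20]
intact = [26, 28, 24, 27, 29, 22, 25, 30, 21, 26]

for cutoff in (20, 24):
    sens, spec = classification_rates(impaired, intact, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In these illustrative data, the lower cutoff misclassifies no intact examinees but misses more impaired ones, whereas the higher cutoff does the reverse, which is the pattern that motivates setting-specific cutoffs.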
While the cutoff score suggested by Hooper was judged by Boyd (1981) to be optimal for evaluating chronically ill institutionalized patients, it appeared to be too low for less incapacitated patient populations. Furthermore, Nabors et al. (1997) suggested a cutoff score of ~ 15 for determination of cognitive impairment in medically ill elderly as this score provided the best correct classification in their sample of urban medical inpatients at
a post-acute geriatric rehabilitation unit (81% sensitivity, 79% specificity). Hooper also developed a qualitative system of response analysis involving four categories: isolate, perseverative, bizarre, and neologistic responses. Lezak et al. (2004) underscore the benefits of qualitative analysis of errors, pointing to the localizing significance of fragmentation tendencies. Nadler et al. (1996) concur that qualitative analysis of errors improves the differentiation between the effects of right vs. left hemisphere dysfunction on HVOT performance. Merten and Beal (2000) found item ranking for the HVOT to deviate from empirically based item difficulty in their sample of German-speaking neurological patients and rules for a number of items to be arbitrary. The authors proposed a revised version based on empirical item analysis, which retains the original items but has a modified set of instructions, order of items, and scoring and administration rules. Merten (2002) developed a short form consisting of 15 items, which was validated on another sample of German-speaking neurological patients.
Construct Validity
The HVOT was developed as a screening instrument for organic brain dysfunction. However, the issue of the test's sensitivity to general vs. lateralized dysfunction remains controversial. The test authors suggest that the HVOT "is sensitive to general impairments, not specific visuopractic functions" (Hooper, 1983, p. 6). This view is supported by Boyd (1981, 1982a), Wang (1977), and Wetzel and Murphy (1991). However, the HVOT's sensitivity to lateralized dysfunction has been demonstrated in several studies. Lewis et al. (1997) report that HVOT performance is vulnerable to acute lesions in the right anterior quadrant of the brain. In contrast, Fitz et al. (1992), Rathbun and Smith (1982), and Woodward (1982) demonstrate HVOT sensitivity to localized dysfunction of the nondominant parietal lobe. In fact, a heated debate over general vs. specific sensitivity of the HVOT is reflected in a series of articles published in response to Boyd's (1981) article (Boyd, 1982a,b; Rathbun & Smith, 1982; Woodward, 1982). The above issue is directly related to assumptions as to which cognitive functions are measured by the HVOT. Two components of information processing involved in HVOT responses are mental reintegration and naming of the objects for each test item. If visual perception and synthesis are the primary mechanisms involved in item analysis, then nondominant hemisphere contribution prevails. If test performance also imposes considerable naming demands, then both dominant and nondominant hemispheres contribute substantially to test performance. Studies exploring the relative contribution of these cognitive processes to HVOT performance are largely equivocal. Lezak (1995), Lezak et al. (2004), and Spreen and Strauss (1991, 1998) suggest caution in interpreting HVOT failures as a manifestation of visuospatial deficit due to the contribution of the naming component. Schultheis et al. (2000) developed the Multiple-Choice Hooper Visual Organization Test (MC-HVOT), which consists of the 30 original stimuli presented with four response choices, in order to remove the naming demands on test performance. The authors found that performance of anomic patients was significantly facilitated by the multiple-choice format. Furthermore, patients with both right and left hemisphere involvement benefited from diminished naming demands. In contrast, Ricker and Axelrod (1995) found that perceptual organization accounted for 44% of HVOT performance variance, whereas confrontation naming ability was not significantly related to test performance. Similarly, in a study designed to replicate and extend the above research, Paolo et al. (1996c) observed the HVOT to be a measure of perceptual organization, whereas performance on the test was not significantly impacted by poor naming ability. Paul et al.'s (2001) results are consistent with these findings. Greve et al. (2000) found a small but significant effect of naming on HVOT performance, which, however, was interpreted by the authors as having little or no practical impact. Such discrepant findings are likely to be related to composition of study samples, with
samples composed of aphasic patients demonstrating the largest effect of naming difficulty on HVOT performance. Merten and Beal (2000) indicated that the HVOT measures visuoperceptual and visuospatial-organizational dysfunction, Seidel (1994) found it to be a measure of general visual-perceptual-constructional abilities in a pediatric population, and Johnstone and Wilhelm (1997) concluded that the HVOT measures global visuospatial intelligence and shares 12%-23% of variance with WAIS-R PIQ subtests.
Psychometric Properties of the Test
Lopez et al. (2003) examined the psychometric properties of the test on a sample of 281 cognitively impaired and intact patients and reported acceptable estimates of internal consistency (α = 0.882) and interrater reliability (0.977-0.992). Similarly, an internal consistency estimate of >0.88 was reported by Merten and Beal (2000) on a sample of 320 German-speaking neurological patients. Additional data on the reliability and validity of the HVOT are provided by Gerson (1974), Franzen (2000), Franzen et al. (1989), Lezak et al. (2004), and Spreen and Strauss (1998). Item analysis for use of the HVOT with Indian participants was performed by Verma et al. (1993).
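For readers unfamiliar with how an internal consistency coefficient of this kind is obtained, the following is a minimal sketch, assuming the reported estimate is Cronbach's alpha computed from item-level scores. The function name and the small data set are hypothetical and are not taken from any of the cited studies.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of examinees, each a list of item scores.

    Illustrative only: assumes every examinee has a score on every item.
    """
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two examinees")
    # Sample variance of each item across examinees.
    item_vars = [
        variance([row[i] for row in item_scores]) for i in range(n_items)
    ]
    # Sample variance of the total score across examinees.
    total_var = variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical right/wrong scores for five examinees on four items.
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 3))
```

The coefficient approaches 1 as items covary more strongly relative to their individual variances, which is why it is read as an index of how consistently the items measure a common attribute.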
RELATIONSHIP BETWEEN HVOT PERFORMANCE AND DEMOGRAPHIC FACTORS
Age and intelligence level are consistently related to HVOT performance. Tamkin and Jacobsen (1984) report an effect of age and IQ on HVOT performance in their sample of 211 male, veteran, psychiatric inpatients. Similarly, Wentworth-Rohr et al. (1974) found a positive relationship between HVOT scores and intelligence level as well as a negative age/HVOT relationship beginning in the late 30s. Age-related changes in HVOT performance are also documented by Farver and Farver (1982) and by Tamkin and Hyer (1984). Hilgert and Treloar (1985) documented an effect of age and IQ level but no gender differences in elementary-school children. An effect of IQ is also reported by Gerson (1974). Education and gender were unrelated to HVOT scores in a study by Wentworth-Rohr et al. (1974). In contrast, Verma et al. (1993) found a significant effect of education on HVOT scores. Based on the analysis of HVOT performance of 434 normal children aged 5-13, Kirk (1992b) reported that boys attained adult performance by age 12, whereas girls participating in this study did not reach the adult level. Based on these data, Kirk documented an effect of age and gender on HVOT performance. An interaction between age and education in a sample of cognitively intact elderly was reported by Richardson and Marottoli (1996). Nabors et al. (1997) found HVOT scores to be significantly related to age and education in a total sample, which combined cognitively intact and impaired elderly urban medical patients, whereas performance was not significantly related to these demographic variables for the cognitively intact group considered separately. For further information regarding the HVOT, see Lezak et al. (2004) and Spreen and Strauss (1998).
METHOD FOR EVALUATING THE NORMATIVE REPORTS
To adequately evaluate the HVOT normative reports, seven key criterion variables were deemed critical. The first six of these relate to subject variables, and the remaining one refers to a procedural issue. Minimal requirements for meeting the criterion variables were as follows.
Subject Variables
Sample Size
Fifty cases are considered a desirable sample size. Although this criterion is somewhat arbitrary, a large number of studies suggest that data based on small sample sizes are highly influenced by individual differences and do not provide a reliable estimate of the population mean.
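The dependence of the precision of a sample mean on sample size can be illustrated with the standard error of the mean, SD/√n. The sketch below is only an illustration: the standard deviation of 3.0 points and the sample sizes are hypothetical values, not figures drawn from the HVOT literature.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of a sample mean for a given SD and sample size."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sd / math.sqrt(n)


HYPOTHETICAL_SD = 3.0  # assumed score SD, for illustration only

for n in (10, 50, 200):
    sem = standard_error_of_mean(HYPOTHETICAL_SD, n)
    print(f"n={n:>3}: standard error of the mean = {sem:.2f}")
```

Because the standard error shrinks only with the square root of n, a mean based on a small sample is considerably less stable than one based on 50 or more cases.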
Sample Composition Description
Information regarding medical and psychiatric exclusion criteria is important. It is unclear if geographic recruitment region, socioeconomic status, occupation, ethnicity, or recruitment procedures are relevant. Until this is determined, it is best that this information be provided.
Age Group Intervals
This criterion refers to grouping of the data into limited age intervals. This requirement is relevant for this test since a strong effect of age on HVOT performance has been demonstrated in the literature.
Reporting of Educational Levels
Given the possible association between education and HVOT performance, information regarding education should be provided for each subgroup.
Reporting of Intellectual Levels
Given the relationship between HVOT performance and IQ, information regarding intellectual level should be provided for each subgroup, and preferably normative data should be presented by IQ levels.
Reporting of Gender Composition
Given the possible association between gender and HVOT performance, information regarding gender composition should be reported for each subgroup.
Procedural Variables
Data Reporting
Means and standard deviations for the total number of correct responses should be reported.
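The seven criterion variables above can be thought of as a checklist applied to each normative report. The following is a minimal sketch of such a checklist. The class, field, and function names are hypothetical, the yes/no coding of each criterion is a simplifying assumption, and only the 50-case threshold comes from the text. The example study is invented for illustration.

```python
from dataclasses import dataclass

MIN_SAMPLE_SIZE = 50  # "Fifty cases are considered a desirable sample size"


@dataclass
class NormativeReport:
    """Hypothetical summary of what a normative study reports."""
    label: str
    sample_size: int
    describes_sample: bool        # exclusion criteria, recruitment, etc.
    uses_age_intervals: bool      # data grouped into limited age intervals
    reports_education: bool
    reports_intelligence: bool
    reports_gender: bool
    reports_means_and_sds: bool   # for total number of correct responses


def unmet_criteria(report):
    """Return the names of the criterion variables a report does not meet."""
    checks = {
        "sample size": report.sample_size >= MIN_SAMPLE_SIZE,
        "sample composition description": report.describes_sample,
        "age group intervals": report.uses_age_intervals,
        "educational level": report.reports_education,
        "intellectual level": report.reports_intelligence,
        "gender composition": report.reports_gender,
        "data reporting": report.reports_means_and_sds,
    }
    return [name for name, met in checks.items() if not met]


# Invented example: a small, well-described study with no IQ information.
example = NormativeReport(
    label="example study",
    sample_size=37,
    describes_sample=True,
    uses_age_intervals=True,
    reports_education=True,
    reports_intelligence=False,
    reports_gender=True,
    reports_means_and_sds=True,
)
print(unmet_criteria(example))
```

In the study summaries, met criteria correspond to the listed study strengths and unmet criteria to the considerations regarding use of the study. The summaries themselves rest on the authors' judgment and not on a mechanical rule of this kind.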
SUMMARY OF THE STATUS OF THE NORMS
Only a few studies in the literature provide performance levels for the HVOT. Several studies have reported data for psychiatric or neurological samples. Among the studies providing data for normal samples, several used only selected HVOT items. Only studies that report data for the full HVOT for normal samples are reviewed in this chapter. In all articles reviewed below, the score represents the total number of correct responses (out of 30). In this chapter, normative publications and control data from clinical studies are reviewed in ascending chronological order. The text of study descriptions contains references to the corresponding tables identified by number in Appendix 13. Table A13.1, the locator table, summarizes information provided in the studies described in this chapter. 1
SUMMARIES OF THE STUDIES
[HVOT.1] Rao, Leo, Bernardin, and Unverzagt, 1991a (Table A13.2)
The authors described the performance of a control group in their study on cognitive dysfunction in multiple sclerosis. The control group included 100 participants (75 females, 25 males), who were paid for their participation. The mean age of the sample was 46.0 (11.6), mean education was 13.3 (2.0), and estimated premorbid intelligence (based on demographic variables) was 106.5 (6.9). All except for one participant were Caucasian. Participants were recruited from newspaper advertisements. Exclusion criteria were history of substance abuse, psychiatric disturbance, head injury or any other nervous system disorder, or use of prescription medications. In addition to a detailed medical and psychosocial history, participants underwent a neurological examination, MRI, and neuropsychological testing. The HVOT was administered as part of a larger battery. For a description of the administration procedure, the authors referred readers to an earlier article.
Study strengths 1. Large sample size. 2. The sample composition is well described in terms of age, education, gender, ethnicity, IQ estimate, geographic area, clinical setting, and recruitment procedures. 3. Rigorous exclusion criteria. 4. Means and SDs for the test scores are reported. Consideration regarding use of the study 1. The data are not partitioned by age group.
1 Normative data for children 5-11 years old are provided by Seidel (1994) and for those 5-13 years old by Kirk (1992b). See also Baron (2004) and Spreen and Strauss (1998).
[HVOT.2] Libon, Glosser, Malamut, Kaplan, Goldberg, Swenson, and Sands, 1994 (Table A13.3)
The HVOT was administered to a sample of 37 right-handed participants aged 64-94 years as part of a study examining the relationship between age and cognitive functions in normal aging. Participants were recruited from a local community center and from the Active Life Program, an exercise and fitness program at the Philadelphia Geriatric Center. All participants scored ≥27 on the Mini-Mental State Exam (MMSE) and ≤10 on the Geriatric Depression Scale (GDS). All participants passed a physical examination and a graded exercise cardiac function test. Exclusion criteria were history of stroke, head injury, seizure disorder, or major psychiatric problems including substance abuse or psychoactive medications, per clinical interviews. The sample was divided into the young-old (64-74 years) and old-old (75-94 years) groups. There were no between-group differences in education or MMSE or GDS score. The HVOT was administered as part of a larger battery. The number of correct responses was recorded.
Study strengths 1. The sample composition is well described in terms of age, education, gender, handedness, MMSE and GDS scores, geographic area, setting, and recruitment procedures. 2. Rigorous exclusion criteria. 3. Means and SDs for the test scores are reported. 4. The sample is divided into two age groups.
Consideration regarding use of the study 1. Small sample size. [HVOT.3] Richardson and Marottoli, 1996 (Table A13.4)
The authors report data for 101 autonomously living, mostly Caucasian, elderly participants who comprise a subsample of a cohort of participants in Project Safety, a study on driving performance conducted in New Haven, Connecticut. Individuals with a history of neurological disease, excessive use of alcohol, or risk for dementia (based on MMSE score) were excluded. The sample consisted of 53 males and 48 females, with a mean age of 81.47 (3.30), mean education of 11.02 (3.68) years, and mean MMSE score of 26.97 (2.55). Ethnic composition was 90.1% white and 9.9% black. The HVOT was administered and scored according to the standard instructions provided in the test manual. The data were divided into two age groups of younger-old (76-80) and older-old (81-91) by two education groups. The results indicated that the mean performance for participants with <12 years of education was stable across younger-old and older-old age groups and considerably lower than for their more educated counterparts; however, performance for the younger-old age group with >12 years of education was superior to that of the older-old group with comparable education.
Study strengths 1. Data for a relatively large sample of elderly participants are presented. 2. Sample composition is well described in terms of gender, education, geographic area, and ethnicity. 3. Adequate exclusion criteria. 4. The data are classified into age-by-education groupings. 5. Means and SDs are reported. Considerations regarding use of the study 1. No information on intelligence level is provided. 2. Sample sizes for each age-by-education cell are relatively small.
[HVOT.4] Walsh, Lichtenberg, and Rowe, 1997 (Table A13.5)
The authors compared HVOT performance for three groups of geriatric rehabilitation inpatients: cognitively intact, mildly impaired, and severely impaired. Patients were referred for routine cognitive evaluations from two sites: a geriatric rehabilitation service of an urban university rehabilitation hospital and the physical medicine and rehabilitation unit at a suburban rehabilitation hospital. The cognitively intact group consisted of 32 participants (10 male, 22 female) who scored ≥123 on the Dementia Rating Scale or in the unimpaired range on all subtests of the Neurobehavioral Cognitive Status Examination. Participants had no evidence of closed head injury, stroke, or other neurological conditions which could affect cognition, as determined by medical chart review, patient interview, and/or negative radiological findings. The HVOT was administered according to standard instructions.
Study strengths 1. The sample composition is well described in terms of age, education, gender, and clinical setting. 2. Adequate exclusion criteria. 3. Test administration procedures are specified. 4. Means and SDs for the test scores are reported. Considerations regarding use of the study 1. The sample is relatively small. 2. No information on IQ is reported.
[HVOT.5] Lichtenberg, Ross, Youngblade, and Vangel, 1998 (Table A13.6)
The authors compared two groups of geriatric urban medical inpatients: cognitively intact and impaired. All patients were recruited from consecutive admissions to a geriatric medical
rehabilitation program in a midwestern urban university hospital. Seventy-four patients were identified as cognitively intact. This sample had a mean age of 76.9 (5.9) and mean education of 10.8 (3.0); 74% were women, 51% were African American, and 49% were European American. All participants were functionally independent across all cognitive domains and activities of daily living; had no history of neurological disease, psychiatric illness, or substance abuse; and had normal results of neurological examination. The HVOT was administered as part of a larger battery.
Study strengths 1. Adequate sample size. 2. The sample composition is well described in terms of age, education, gender, ethnicity, clinical setting, and recruitment procedures. 3. Adequate exclusion criteria. 4. Test administration procedures are specified. 5. Means and SDs for the test scores are reported. Considerations regarding use of the study 1. The data are not partitioned by age group. 2. No information on IQ is reported.
CONCLUSIONS The HVOT has been used clinically as a measure of visual perception and organization. However, the effect of naming impairment on HVOT performance remains unclear. The clinical utility of this test would be enhanced with the availability of normative data for a large sample of neurologically intact participants of both genders across a wide age span, partitioned by age group and intelligence level. 2
2 Meta-analyses were not performed on the HVOT due to lack of sufficient data.
14 Visual Form Discrimination Test
BRIEF HISTORY OF THE TEST
Many of the most commonly administered tests in neuropsychological practice require intact visual perception, and accurate interpretation of visually mediated tests often rests upon the assumption that visual perceptual skills are intact (Lezak et al., 2004). For example, in the absence of careful assessment of visual perceptual abilities, low performance scores on visual memory tests may be mistakenly attributed to memory impairment when in fact the deficits may be primarily related to visual perceptual ability rather than memory. The Visual Form Discrimination Test (VFDT) was developed by Arthur L. Benton and colleagues (Benton et al., 1983b) as a screening test for visual perceptual deficits. (Please see Appendix 1 for ordering information.) The VFDT is a multiple-choice, matching-to-sample task. The test is presented using a spiral-bound booklet (Benton et al., 1983b). The subject views an 8½ x 11 inch page in the booklet displaying a sample design containing three geometric elements. Directly below the stimulus page, the adjoining 8½ x 11 inch page presents four smaller three-element designs (numbered 1, 2, 3, or 4). The subject, therefore, can concurrently view the main stimuli and the four smaller design groupings below. The designs on both pages are similar
in that each contains two large geometric shapes and a small peripheral figure. However, only one of the smaller designs shown on the adjoining page below is an exact match for the larger stimulus design above. The other three designs are considered "distracters" and are variants of the larger stimulus design. One of the three distracter designs is created by moving or rotating the peripheral figure, the second by distorting one of the major figures, and the third by rotating one of the major figures. The subject is requested to point to or "say the number" of the design below that exactly matches the larger stimulus design. The VFDT consists of two practice items and 16 test items. There is no time limit, and the scoring system awards 2 points for each correct answer and 1 point for an error that involves only the peripheral figure. Errors involving the major figures receive no points. Scores range 0-32. Unimpaired individuals usually can complete the test in less than 5 minutes, and the test rarely takes longer than 10 minutes to complete regardless of the level of impairment. Because the VFDT is a nonmotoric task, it is especially useful when assessing senior adults, patients with severe arthritis or hemiparesis, and/or the medically ill. The validity of the VFDT to assess visual perceptual impairments with various neurological conditions has been well established.
For instance, the VFDT has been used to examine visual perceptual impairments in post-head injury patients (Iverson et al., 1997b, 2000; Malina et al., 2001; Millis et al., 2001; Wilde et al., 2000), aphasic patients (Varney, 1981), and patients with vascular dementia (Mast et al., 2000), Alzheimer's disease (Iverson et al., 1997a; Kaskie & Storandt, 1995), or Parkinson's disease (Tang & Liu, 1993). Patients with right hemisphere lesions show the highest rates of test failure (Benton, 1983a), although aphasic alexics have been observed to show a 36% failure rate (Varney, 1981) and recovery in letter recognition is accompanied by improvement in visual form discrimination. Test-retest reliability has been examined by Campo and Morales (2003) and found to be quite stable over brief intervals (e.g., ~1