Using JMP® Student Edition
For Windows and Macintosh
The User’s Guide to Statistics with JMP® Student Edition
The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2009. Using JMP® Student Edition. Cary, NC: SAS Institute Inc.
Copyright © 2009, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-60764-190-2
All rights reserved. Produced in the United States of America.
For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.
For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.
U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987). SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.
1st printing, April 2009
SAS® Publishing provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the SAS Publishing Web site at support.sas.com/publishing or call 1-800-727-3228.
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.
For more information about this or any other JMP product, contact: SAS INSTITUTE INC., SAS Campus Drive, Cary, NC 27513 USA
For permission to use this work, contact us: www.jmp.com/se fax: 919.677.4444 phone: 919.677.8000
Technical Support is provided by the publisher of the work that JMP-SE is bundled with, and to registered instructors.
Table of Contents
1 Getting Started with JMP Student Edition: Prerequisites For This Book; Computer and Operating System; Statistics; Learning JMP Student Edition with its Documentation; Conventions Used in This Book; Starting JMP Student Edition; JMP Student Edition Toolbars; Windows; Macintosh; First Session; Finding Means, Medians, and Standard Deviations; Where to Go from Here
2 The Distribution Platform: Introduction; About the Data; Launching the Platform; Using Histograms; Testing a Mean; Normality; Testing Probabilities; Annotating Results; The Modeling Type of Variables; Continuous Variable Graphs and Reports; Display Options; Histogram Options; Normal Quantile Plots; Outlier Box Plot; Quantile Box Plots; Stem and Leaf Plots; CDF Plot; Fit Distribution; Categorical Variable Graphs and Reports; Statistical Tests; Testing a Mean; Testing a Standard Deviation; Testing Categorical Probabilities; Confidence Intervals; Saving Information; Whole-Platform Options; Capability Analysis
3 The Fit Y by X Platform: Introduction; Launching the Platform; Computing a t-test; Pooled t test; Selecting and Marking Points; Analysis of Variance (ANOVA); Comparison Circles; Fitting Lines; Correlation Coefficient; Disclosure Icon; Residuals; Two-Way Contingency Tables; Logistic Regression; The Formula Editor; Scatterplots—The Continuous by Continuous Case; The Summary of Fit Table; The Lack of Fit Table; Analysis of Variance Table; Parameter Estimates Table; Other Fitting Commands; One Way ANOVA—The Continuous by Categorical Case; Script Submenu; Contingency Analysis—The Categorical by Categorical Case; Logistic Regression—The Categorical by Continuous Case; The Whole Model Test Table; Platform Options
4 The Matched Pairs Platform: Introduction; Preparing the Data; Launching the Platform; The Matched Pairs Launch Dialog; The Matched Pairs Scatterplot; Interpreting the Matched Pairs Plot
5 The Fit Model Platform: Introduction; Launching the Platform; Setting Titles; Examining Results; Least Squares Means; Re-running an Analysis; Linear Contrasts; The Fit Model Dialog Box; Roles; Model Effects; Macros; Fitting Personalities; Emphasis Choices; Run Model; Fit Model Report Items; Regression Reports; Leverage Plots; Effect Details; Exploring the Estimates; Row Diagnostics; Save Commands
6 Stepwise Regression: Introduction; The Stepwise Window; Stepwise Regression Control Panel; Current Estimates Table; Step History Table; Make Model; All Possible Regressions
7 Control Charts: Introduction; The Control Chart Launch Dialog; Process Information; Chart Type Information; Parameters; Using Specified Statistics; Tailoring the Horizontal Axis; Display Options; Single Chart Options; Window Options; Tests for Special Causes; Western Electric Rules; Westgard Rules; Excluded, Hidden, and Deleted Samples; Shewhart Control Charts; Shewhart Control Charts for Variables; XBar-, R-, and S- Charts; Run Charts; Individual Measurement Charts; Shewhart Control Charts for Attributes; p- and np-Charts; u-Charts; c-Charts; Phases; Example; Cumulative Sum (Cusum) Charts; Launch Options for Cusum Charts; Cusum Chart Options
8 Time Series: Introduction; The Time Series Platform; The Time Series Graph; Time Series Commands; Graph; Autocorrelation; Partial Autocorrelation; Number of Forecast Periods; Modeling Reports; Model Comparison Table; Model Summary Table; Parameter Estimates Table; Forecast Plot; Residuals; Iteration History; Model Report Options; ARIMA Model; Smoothing Models; Smoothing Model Dialog; Simple Exponential Smoothing; Double (Brown) Exponential Smoothing; Linear (Holt) Exponential Smoothing; Damped-Trend Linear Exponential Smoothing; Seasonal Exponential Smoothing; Winters Method (Additive)
9 Correlations and Multivariate Techniques: Introduction; Launch the Platform and Select Options; Correlations Multivariate; Inverse Correlations and Partial Correlations; Scatterplot Matrix; Covariance Matrix; Pairwise Correlations; Simple Statistics; Nonparametric Correlations; Computations and Statistical Details; Pearson Product-Moment Correlation; Nonparametric Measures of Association; Inverse Correlation Matrix
10 Importing, Exporting, and Charting Data: Introduction; Using the Chart Platform; Using the Overlay Plot Platform; Importing Data; Windows; Macintosh; Importing Text Files; Importing Microsoft Excel Files; Results from Platforms; The Chart Platform; Single-Chart Options; Frame Options; Level Options; Platform Options; The Overlay Plot Platform; Platform Options; Single-Plot Options
11 Full Factorial Designs: Introduction; Creating a Factorial Design; Entering Responses and Factors; Selecting Output Options; Making the Table
12 Screening Designs: Introduction; Creating a Screening Design; Entering Responses; Entering Factors; Choosing a Design; Displaying and Modifying the Design; Specifying Output Options; Viewing the Table; Continuing the Analysis
13 Response Surface Designs: Introduction; Creating a Response Surface Design; Entering Responses and Factors; Choosing a Design; Specifying Axial Value (Central Composite Designs Only); Specifying Output Options; Viewing the Design Table; Continuing the Analysis, If Needed
14 Prospective Power and Sample Size: Prospective Power Analysis; One-Sample and Two-Sample Means; Single-Sample Mean; Power and Sample Size Animation for a Single Sample; Two-Sample Means; k-Sample Means; One-Sample Variance; One-Sample and Two-Sample Proportions; Counts per Unit; Sigma Quality Level
Index
Notices: Technology License Notices
1 Getting Started with JMP Student Edition Welcome to JMP Student Edition—the version of SAS Institute’s award-winning JMP Statistical Discovery software tailor-made for the introductory statistics student. JMP Student Edition is easy to learn and easy to use. All of the statistics are accessible in a familiar, point-and-click format, and the statistical concepts are supported with both graphs and appropriate numerical results. In addition, all the data tables, graphs, and charts are dynamically linked together, allowing for interactive exploration of patterns and outliers, whenever they present themselves. We hope that this visualization makes learning statistics more fun and easier than it has ever been before.
Prerequisites For This Book To use JMP Student Edition, only minimal knowledge about computers and statistics is necessary. The specific prerequisites are as follows:
Computer and Operating System In this manual, familiarity with standard computer operations and operating system terminology is assumed, especially use of the mouse, standard menus, and commands. You should also know how to open, close, and save files before reading this guide. See the reference books for the operating system and computer for more information on these topics.
Statistics Since JMP Student Edition is specially-made for the beginning statistics student, it requires no formal statistics knowledge. This book shows how to accomplish simple statistical tasks, like those in all introductory statistics texts.
Learning JMP Student Edition with its Documentation JMP Student Edition includes an extensive online help system. It can be read like a book, since it contains a complete table of contents, or it can be used to search for a specific topic. In addition, JMP Student Edition is equipped with context-sensitive help. To use it, select the help tool (see Figure 1.1) and click anywhere inside a data table or report. JMP Student Edition opens help specific to the item you clicked.
Figure 1.1 The JMP Student Edition Help Tool
Conventions Used in This Book Throughout this manual, special typefaces are used to designate commands, menu items, or other unique features.
• Menu items, buttons, and report titles are usually set by JMP Student Edition and are not alterable by the user.
• Variables under study are arranged in columns in the data spreadsheet, so the words variable and column are often used interchangeably.
• File names are opened and saved to disk or network folders.
• New or important words are emphasized. Certain paragraphs are meant to be carried out while reading the text; they are designated by a mouse icon on the left.
• The notation File > Open means to select the Open command from the File menu.
• Sections titled “Introduction” provide a hands-on approach to learning the basics of JMP Student Edition. Each “Introduction” section explores the Denim.jmp sample data set using a specified platform or function. They are separate from the rest of the material in the chapter. In fact, all the “Introduction” sections could be read from each chapter before reading the rest of the material in the book, which is intended primarily as a reference.
Starting JMP Student Edition JMP Student Edition can be started in two ways: • Double-click the JMP Student Edition icon • Double-click a JMP Student Edition data set or script. By default, JMP Student Edition begins by opening a special navigation window, called the JMP Starter (see Figure 1.2). If the JMP Starter is not automatically opened, Select View > JMP Starter.
Figure 1.2 The JMP Starter Window
This window provides quick and easy access to all the menu commands of JMP Student Edition. Although these commands are accessible through menus and toolbars, they are also presented in the JMP Starter in a logical, organized way. There are nine tab groups that partition the commands based on their function:
• The File group contains commands related to opening and closing several types of files.
• The Basic group contains commands that perform analyses for one-variable and two-variable situations.
• The Model group contains commands for matched pairs (a special two-variable situation) and multivariate models.
• The Survival group contains reliability and survival commands.
• The Graph group contains commands for charts and 3D graphics.
• The Measure group contains tools for capability analysis.
• The Control group contains commands for control charts.
• The DOE group contains commands for designing an experiment.
• The Tables group contains commands used to manipulate data tables.
JMP Student Edition Toolbars An alternative way of accessing JMP Student Edition commands is by using toolbars.
Windows Toolbars that duplicate the JMP Starter’s commands include the File/Edit toolbar (Figure 1.3), the Tools toolbar (Figure 1.4), the Analyze toolbar (Figure 1.5), the Graph toolbar (Figure 1.6), and the Tables toolbar (Figure 1.7). There is also a Data Files toolbar, used to switch between open data tables, as well as user-customizable toolbars. Each of these commands is explained fully in later chapters.
Figure 1.3 The File/Edit Toolbar
(Toolbar buttons: New Data Table, Open, Save, Print, New Script, Run Script, Cut, Copy, Paste)
Figure 1.4 The Tools Toolbar
(Toolbar tools: Arrow, Crosshair, Hand, Brush, Selection, Zoom, Lasso, Polygon, Simple Shapes, Lines, Annotate, Scroller, Help)
Figure 1.5 The Analyze Toolbar
(Toolbar buttons: Distribution, Fit Y By X, Matched Pairs, Fit Model, Multivariate, Survival/Reliability, Time Series)
Figure 1.6 The Graph Toolbar
(Toolbar buttons: Overlay Plot, Spinning Plot, Bar and Pie Charts, Pareto Plot)
Figure 1.7 The Tables Toolbar
(Toolbar buttons: Subset, Summary, Sort, Stack, Split)
Some of these toolbars are not displayed by default. To activate toolbars that are not showing, • Select View > Toolbars > Show Toolbars to open the Show Toolbars window (Figure 1.8)
Figure 1.8 Show Toolbars Window
Toolbars that are checked become visible. Those that are unchecked are hidden.
Macintosh On the Macintosh, toolbars are not set in groups, but are all available to be added to a single toolbar. To see the definitions of each button on the toolbar, or to add and subtract buttons from the toolbar, Control-click on the toolbar area of a window. From the window that appears, drag buttons onto the toolbar to add them.
First Session This section is a guide through a few simple steps that demonstrate opening a data table, requesting an analysis, and closing a data table. To open a data table, select File > Open, select Open Data Table from the JMP Starter, or click the Open button on the File/Edit toolbar. Select the file Denim.jmp and click Open. The data should appear like the listing in Figure 1.9.
Figure 1.9 Partial Listing of the Denim Data File
This data set contains data on the starch content of processed denim. In this example, we examine the data for the Starch Content (%) variable and answer the following questions:
• What is the mean of the data?
• What is its median?
• What is its standard deviation?
• Also, produce a histogram of the data.
Finding Means, Medians, and Standard Deviations To answer these questions, use the Distribution platform. Select Analyze > Distribution. This brings up the launch dialog as seen in Figure 1.10.
Figure 1.10 The Distribution Dialog
Select the variable Starch Content (%), then click the Y, Columns button. This step tells JMP Student Edition the variable to analyze. Since Starch Content (%) is the only variable of interest, we are finished with this dialog. Click OK. The report is presented in its default vertical format. However, some people prefer a horizontal layout for the Distribution report. To change the layout to horizontal, Click on the red triangle next to the word Starch Content (%) in the report (see Figure 1.11).
Figure 1.11 Red Triangles Reveal Popup Menus
All of these red triangles reveal popup menus when they are clicked. Watch closely for them—they reveal further options and explorations available during the data exploration process. The menu next to Starch Content (%) shows the options for this single variable, although there are cases (seen later) where the Distribution platform operates on several variables. Options available to all the variables in the report are in the menu next to the word Distributions. Select Display Options > Horizontal Layout to see the report in Figure 1.12.
Figure 1.12 The Starch Content Distribution Report
The answers to the four questions are all in this report. Read off the mean (25.516634), the median (24.349), and the standard deviation (9.6568876). The histogram is shown on the left. If a printed copy of this report is needed, Select File > Print. Alternatively, this output may be included in a lab report written using a word processor. To move the report into another program, use the cut and paste features of JMP Student Edition: Select the Selection tool, which looks like a fat plus.
Hold down the Shift key and click on each part of the report that needs to be copied. In Figure 1.13, all the text columns and the histogram have been selected. None of the headings have been selected, nor has the box plot. Note that the histogram’s axis is selected separately from the histogram itself. Figure 1.13 Selection of Report Parts
Select Edit > Copy.
In the word processor, select Edit > Paste.
Now that the analysis is completed, close JMP Student Edition. Select File > Exit.
Where to Go from Here This simple example has shown all the steps needed to complete a JMP Student Edition analysis. From here, feel free to explore any of the sample data files that came with JMP Student Edition, explore the online help, or continue reading this book.
2 The Distribution Platform Single-variable statistics are the domain of JMP Student Edition’s Distribution platform. It calculates summary statistics, displays graphs, and computes hypothesis tests for these variables.
Introduction Open the data file Denim.jmp. For information on opening a file, see “First Session,” p. 15.
About the Data This file contains information from an experiment with blue jeans, and is referred to in each introductory section of this book. When blue jeans are manufactured, they usually contain a fair amount of starch, creating stiffness and stability in the fabric. However, most people find this stiffness undesirable—in fact, some customers say that jeans have a “breaking in” period before they become truly comfortable. This breaking in period is, in actuality, the time it takes for some of the starch present in the jeans to wear away and wash out. In an effort to minimize the amount of time needed to break in a new pair of jeans, denim manufacturers subject the fabric to a variety of treatments to remove some of the starch. This experiment used three such treatments in differently-sized wash loads. The three different treatments, recorded in the Method column, are as follows:
• Alpha Amalyze is an enzyme added to the wash water that eats the starch.
• Caustic Soda is a chemical dissolved in the wash water that chemically destroys the starch.
• Pumice Stone is a physical abrasive added to the wash water that literally pounds the starch out. These abrasive pebbles are the source of the so-called stone-washed jeans.
In addition, after the initial washing process, some jeans are sand blasted. Whether or not the fabric was sand blasted is recorded in the Sand Blasted? column. The samples came from several different rolls of fabric, with each roll identified in the Lot Number column. After treating the jeans, two measurements were taken: one to quantify the starch content of the fabric (measuring stiffness, recorded as a percentage of weight) and one as a count of destroyed threads (measuring wear-and-tear, recorded in the Thread Wear Measured column). The measured thread wear has been converted into an ordinal variable in the Thread Wear column by using the Formula Editor.
Launching the Platform In this example, the variables are examined one at a time. Select Analyze > Distribution from the menu bar. This brings up the Distribution platform launch dialog. Figure 2.1 Distribution Launch Dialog
Select the variables Method and Starch Content (%) from the list on the left by clicking on the first variable name, holding down the Control (Windows) or Option (Macintosh) key, and clicking on the second variable name. Click the Y, Columns button. Click OK. Histograms and textual information on all the variables now appear. For details on these reports, see “Continuous Variable Graphs and Reports,” p. 30 and “Categorical Variable Graphs and Reports,” p. 34. Two of the histograms from this report are used later in this chapter, and are seen in Figure 2.9 on page 29. Many descriptive statistics can be read directly off the text reports accompanying these histograms.
Using Histograms Histograms appear with bar widths and positions calculated internally by JMP Student Edition. Sometimes, it is desirable to change these settings. For example, suppose the bar widths and positions of the Starch Content (%) histogram need modifying. To change them, Select the hand tool (Figure 2.2) from the Tools toolbar. Figure 2.2 Hand Tool
Position the hand tool over the Starch Content (%) histogram and press the mouse button. Move the mouse horizontally (assuming the histogram is in its default vertical layout) to change the bar widths of the histogram. Move the mouse vertically to change the position of the bars.
These histograms are also useful in looking at some relationships among the variables. For example, Click on the histogram corresponding to Alpha Amalyze in the Method histogram.
The bar for Alpha Amalyze is highlighted, as are the bars in the other histograms for all the data points that have Alpha Amalyze as their method. Notice that the corresponding rows in the data table are also highlighted. To bring the data table to the front, select Window > Denim. To bring the Distribution report to the front, select Window > Denim-Distribution. Data rows are highlighted in the data table so that they can be assigned row states—specific markers, colors, or labels—that persist in all of JMP Student Edition’s active plots. Whenever a row is selected in any plot, its selection status ripples through all of JMP Student Edition’s open windows. Highlight and explore the other wash methods, paying attention to the starch content that gets highlighted with each one. Try to determine if one of the methods results in lower starch content than the others. Click in the histogram bars for Caustic Soda and Pumice Stone. Look at the corresponding points that are highlighted in the Starch Content (%) histogram. It is often useful to have confidence intervals on the means or levels in these histograms. To get, for example, a 95% confidence interval on the levels of Method and Starch Content (%),
Select Confidence Interval > .95 from the drop-down menu next to the variable names in the histograms’ title bar.
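The 95% interval reported for a continuous variable is the usual t-based confidence interval on the mean. As an illustration only (this is not JMP itself), the following Python sketch computes such an interval; the starch values are made-up placeholders, not the Denim.jmp data.

```python
import numpy as np
from scipy import stats

# Placeholder starch-content measurements (not the actual Denim.jmp values)
starch = np.array([24.1, 27.3, 19.8, 31.0, 22.5, 26.4, 28.9, 21.7])

n = starch.size
mean = starch.mean()
se = starch.std(ddof=1) / np.sqrt(n)          # standard error of the mean

# 95% t-based confidence interval, the usual interval for a mean
# when the population standard deviation is unknown
t_crit = stats.t.ppf(0.975, df=n - 1)
lower, upper = mean - t_crit * se, mean + t_crit * se
print(f"mean = {mean:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```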
Testing a Mean Continuing the analysis, suppose that prior research claims that the mean starch content of Alpha Amalyze-washed denim is 20%. To test whether the Alpha Amalyze denim has a mean starch content of 20%, two steps are required.
• Make separate histograms for each of the three levels of the Method variable.
• Test the mean using the Alpha Amalyze histogram.
To accomplish these two steps, Bring up the Distribution launch dialog by again selecting Analyze > Distribution from the menu bar. Select Starch Content (%) in the list of variables and click the Y, Columns button. Select Method in the list of variables and click the By button. Click OK. Three histograms should appear, with the corresponding level indicated in the title bar of the histogram. In the Method=Alpha Amalyze section, select Test Mean from the drop-down list next to Starch Content (%).
Figure 2.3 Test Mean
Since the hypothesized mean is 20%, type 20 in the entry field for Specify Hypothesized Mean.
Figure 2.4 Test Mean Dialog
The true standard deviation is not known, so leave the other entry field blank. This tells JMP Student Edition to compute a t-test of the mean. If the standard deviation had been known and entered, a z-test would be performed. Also, leave the box for the Wilcoxon Signed-Rank test unchecked. This is a nonparametric test that is not usually covered in an introductory course. The online help contains further information on these topics. Click OK. The results of the test are appended to the Distribution report. In this case, the t-test is two-tailed, since the percentage could be higher or lower than 20%. Therefore, examine the p-value listed beside Prob > |t|, which in this case is a non-significant 0.5740.
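For readers who want to see the arithmetic behind this report, here is a minimal Python sketch of the same kind of two-tailed, one-sample t-test. The values are placeholders, so the p-value will not match the 0.5740 shown in the report.

```python
import numpy as np
from scipy import stats

# Placeholder Alpha Amalyze starch values (illustrative, not the Denim.jmp data)
alpha_amalyze = np.array([21.4, 18.9, 23.7, 19.2, 22.8, 17.5, 24.1, 20.6])

# Two-tailed one-sample t-test of H0: mean = 20 (the hypothesized mean)
t_stat, p_two_tailed = stats.ttest_1samp(alpha_amalyze, popmean=20)
print(f"t = {t_stat:.4f}, Prob > |t| = {p_two_tailed:.4f}")

# If the population standard deviation were known, a z-test would be used instead:
# z = (xbar - 20) / (sigma / sqrt(n)), referred to a normal rather than a t distribution.
```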
Normality Many statistical tests make an assumption that the data is approximately normally distributed. Although there are usually more important things to worry about than the exact normality of the data, JMP Student Edition provides a quick way of assessing normality through the Normal Quantile Plot. Complete details of the Normal Quantile Plot are in the section “Normal Quantile Plots,” p. 31. To produce a Normal Quantile Plot, Select Normal Quantile Plot from the drop-down list next to one of the variables’ names. Figure 2.5 Normal Quantile Plot
Scroll down the report to see that this command only added a Normal Quantile Plot for one variable in the report. Many times, a command needs to be sent to all the variables in the report, yet it is tedious to select the same command many times. JMP Student Edition therefore provides a way to “broadcast” a command throughout a report, using the Control (Windows) or Command (Macintosh) key. Hold down the Control (Windows) or Command (Macintosh) key and again select Normal Quantile Plot from the drop-down list next to the variable’s name. This time, a Normal Quantile plot is appended to every histogram. This shortcut works for most commands in drop-down menus. You can also test for Normality by fitting a Normal distribution, then performing a goodness-of-fit test. Select Fit Distribution > Normal from the platform drop-down list. When the report appears, select Goodness of Fit from the fitted distribution report.
This produces a report showing the parameters of the distribution, along with a goodness-of-fit statistic testing the null hypothesis that the distribution is, in fact, Normal. Small p-values indicate a non-normal distribution.
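This manual does not name the exact goodness-of-fit statistic used, so the sketch below uses a Shapiro-Wilk test as a stand-in to illustrate the idea: estimate the Normal parameters, then test the fit, with small p-values suggesting non-normality.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.lognormal(mean=3.0, sigma=0.4, size=60)   # deliberately skewed sample

# "Fit Distribution > Normal": estimate the Normal parameters from the data
mu_hat, sigma_hat = x.mean(), x.std(ddof=1)
print(f"fitted Normal: mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")

# Goodness of fit: Shapiro-Wilk used here only as a stand-in for JMP's test.
# A small p-value is evidence that the data are not Normal.
w_stat, p_value = stats.shapiro(x)
print(f"W = {w_stat:.4f}, p = {p_value:.4f}")
```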
Testing Probabilities Another question that could be asked about this data is whether the three levels of Thread Wear occur with equal frequency. To test this assumption, a distribution of the Thread Wear variable is necessary. Make sure the original Denim data table is the front window. If not, select Window > Denim.jmp. Request a Distribution of Thread Wear (not Thread Wear Measured).
After the histogram appears, select Test Probabilities from the drop-down list next to Thread Wear in the title bar of the histogram. An addition to the report appears. A screen shot of the addition appears later in this chapter, in Figure 2.16 on page 36. Make sure Fix omitted at estimated values, rescale hypothesis is selected. JMP Student Edition automatically scales the numbers entered into the entry fields so that they sum to one. This allows an easy way to test for equal probabilities—simply enter 1 in each entry field. Enter 1 into each Hypoth Prob entry field in the Test Probabilities section of the report.
Click Done. Figure 2.6 Test Probabilities Results
The results of the test are listed in the column labeled Prob>Chisq. This test shows some highly significant results, indicating that the three levels of Thread Wear do not occur with equal frequency.
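The test behind this report is a chi-square test of the hypothesized probabilities. As an illustration (with made-up counts for the three Thread Wear levels), the equivalent calculation in Python looks like this:

```python
from scipy import stats

# Placeholder counts of the three Thread Wear levels (level names and counts are illustrative)
observed = [48, 23, 19]

# Chi-square test of equal probabilities: by default the expected
# frequencies are equal, i.e. the hypothesized probabilities are all 1/3
chi2, p_value = stats.chisquare(observed)
print(f"ChiSquare = {chi2:.4f}, Prob > Chisq = {p_value:.4f}")
```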
Continuous variables have an outlier box plot, constructed to show possible outliers. Outlier box plots are discussed in the section “Outlier Box Plot,” p. 31.
Continuous Variable Graphs and Reports Initially, JMP Student Edition produces graphs and text reports to give information from the analysis. The text reports for continuous variables summarize typical univariate statistics, such as the mean, standard deviation, confidence interval on the mean, number of data points, and quantiles. The popup menu for continuous variables (Figure 2.10) shows the options available for continuous variables. Figure 2.10 Continuous Variable Popup Menu
Display Options The Display Options menu contains the following items:
• Quantiles shows or hides the Quantiles table.
• Moments shows or hides the Moments table. This table displays the mean, standard deviation, standard error of the mean, upper and lower 95% confidence limits for the mean, and data set size.
• More Moments adds to the Moments table the variable’s sum, variance, skewness, kurtosis, and the coefficient of variation. (The sketch following this list computes these quantities.)
• Horizontal Layout arranges text reports to the right of their corresponding graphs and shows the histogram as a horizontal bar chart. Selecting this option again returns the report to a vertical layout.
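As a rough illustration of what the Moments and More Moments tables compute, here is a Python sketch; the data are placeholders, and JMP's exact conventions for skewness and kurtosis may differ slightly from the ones used below.

```python
import numpy as np
from scipy import stats

x = np.array([25.3, 31.2, 18.4, 22.9, 27.5, 30.1, 19.8, 24.6, 26.7, 21.0])
n = x.size

# Moments table quantities
mean = x.mean()
std = x.std(ddof=1)
sem = std / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem   # 95% limits on the mean

# More Moments quantities (conventions may differ slightly from JMP's)
total = x.sum()
variance = x.var(ddof=1)
skewness = stats.skew(x, bias=False)
kurtosis = stats.kurtosis(x, bias=False)       # excess kurtosis
cv = 100 * std / mean                          # coefficient of variation, in percent

print(mean, std, sem, (ci_low, ci_high), n)
print(total, variance, skewness, kurtosis, cv)
```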
Histogram Options The Histogram Options menu contains the following items:
• Histogram shows or hides the histogram.
• Std Err Bars draws the standard error bar on each level of the histogram. The standard error bar automatically adjusts to reflect the histogram’s bar widths and positions when you change them using the hand tool.
• Count Axis adds an axis that shows the frequency of each value represented by the histogram bars.
• Prob Axis adds an axis that shows the proportion of each value represented by histogram bars.
• The Density Axis is the length of the bars in the histogram.
Any combination of these axes can be added to categorical or continuous histograms. As the length of the bars is changed with the hand tool, the Count and Prob axes change, but the Density axis remains constant.
Normal Quantile Plots The Normal Quantile Plot option adds a graph to the report that is used to visualize the extent to which the variable is normally distributed. If a variable is normal, the normal quantile plot is approximately a diagonal straight line. This kind of plot is sometimes also called a quantile-quantile plot, or q-q plot. The Normal Quantile plot also shows confidence bounds. If the data fall within these confidence bounds, the data are approximately normal.
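To make the construction concrete, the sketch below computes normal quantile plot coordinates by pairing the sorted data with normal quantiles of plotting positions; the plotting-position convention shown is one common choice and may differ slightly from JMP's, and the confidence bounds are omitted.

```python
import numpy as np
from scipy import stats

x = np.sort(np.array([12.1, 15.3, 9.8, 14.2, 11.7, 13.5, 10.9, 16.0]))
n = x.size

# Plotting positions (one common convention; JMP's may differ slightly)
p = (np.arange(1, n + 1) - 0.5) / n
normal_quantiles = stats.norm.ppf(p)

# For normally distributed data these pairs fall near a straight line
for q, value in zip(normal_quantiles, x):
    print(f"{q:6.3f}  {value:6.2f}")
```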
Outlier Box Plot The Outlier Box Plot is a schematic that shows the dispersion of a variable. This makes the identification of points with extreme values, sometimes called outliers, relatively easy. The ends of the box are the 25th and 75th quantiles, also called the quartiles. The difference between the quartiles is the interquartile range. Outliers are often identified as points that fall above the upper quartile + 1.5×(interquartile range) or below the lower quartile – 1.5×(interquartile range). The line across the middle of the box identifies the median sample value, and the means diamond indicates the sample mean and 95% confidence interval. The dashed lines in the outlier box plot are sometimes called whiskers, extending from both ends of the box. The whiskers extend to the outermost data point that falls within the distances computed for judging outliers.
The red bracket along the edge of the box identifies the shortest half, the smallest length that contains 50% of the data. This is useful when determining the shape of underlying distributions.
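For readers who want to check the outlier rule by hand, the following Python sketch computes the quartiles, the interquartile range, and the 1.5×IQR fences for a small set of illustrative values. The data and the numpy quantile interpolation are assumptions; JMP Student Edition’s quantile definition may differ slightly at small sample sizes.

```python
import numpy as np

# Hypothetical measurements; any numeric column works the same way.
x = np.array([18.2, 20.1, 19.5, 22.3, 17.8, 21.0, 35.4, 19.9, 20.6, 18.7])

q1, q3 = np.percentile(x, [25, 75])   # the quartiles (25th and 75th quantiles)
iqr = q3 - q1                         # interquartile range
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = x[(x < lower_fence) | (x > upper_fence)]
print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```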
Quantile Box Plots The Quantile Box Plot command shows additional quantiles (sometimes called percentiles) on the axis of the histogram. If a distribution is normal, the quantiles are approximately equidistant from each other. Like the Normal Quantile Plot, the Quantile Box Plot is useful for seeing normality in a graphical way. For example, if the quantile marks are grouped closely at one end but have greater spacing at the other end, the distribution is skewed toward the end with more spacing. Note that the quantile box plot is not the same as the outlier box plot from page 31. Quantiles are values that divide a distribution into two groups, where the pth quantile is larger than p% of the values. For example, half the data are below the 50th percentile (median).
Stem and Leaf Plots The Stem and Leaf command constructs a plot that is essentially a variation on the histogram. Stem and leaf plots were developed for tallying data in the days when computer printouts were neither graphical nor easy to produce. They remain useful because they show the actual data at the same time as the shape of the data. Each line of the plot has a stem value that is the leading digit of a range of column values. The leaf values are made from the remaining digits of the values. The data values can be reconstructed by joining the stem and leaf (and multiplying by the scale factor, if one exists). In the example pictured in Figure 2.11, the third line of the table reveals that there are data points with values 40 and 41. Values are reconstructed by using the legend at the bottom of the plot. Figure 2.11 Stem and Leaf Plot
Stem and leaf plots have similar interactive capabilities to JMP Student Edition’s graphics plots, in that they highlight corresponding data points in the data table when they are selected in the plot.
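A stem and leaf tally is easy to reproduce outside the platform. The short Python sketch below builds one for a handful of made-up two-digit values, using a scale factor of 10; the data are illustrative, and the formatting is not meant to match JMP Student Edition’s report exactly.

```python
from collections import defaultdict

values = [23, 25, 31, 34, 34, 40, 41, 52, 55, 58]   # toy data

stems = defaultdict(list)
for v in sorted(values):
    stems[v // 10].append(v % 10)     # value = stem * 10 + leaf

for stem in sorted(stems):
    leaves = "".join(str(leaf) for leaf in stems[stem])
    print(f"{stem:>3} | {leaves}")
```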
CDF Plot The CDF Plot command plots a cumulative distribution function step plot using the observed data (with weights or frequencies if specified). Consult a statistics text for a definition of a density function. A CDF plot (Figure 2.12) estimates the area under the density curve up to each data point. Figure 2.12 CDF Plot for Size of Load
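The step function underlying a CDF plot can be computed directly: sort the data and, at each point, record the proportion of observations at or below it. A minimal Python sketch, using illustrative load sizes:

```python
import numpy as np

x = np.array([12.0, 15.5, 9.8, 20.1, 15.5, 18.3, 11.2])   # illustrative sizes of load

xs = np.sort(x)
cdf = np.arange(1, len(xs) + 1) / len(xs)   # proportion of data at or below each value

for value, p in zip(xs, cdf):
    print(f"{value:6.1f}  {p:5.2f}")
```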
Fit Distribution The Fit Distribution menu allows you to fit certain distributions (Normal, Lognormal, Weibull) to the data. After fitting, you can select among several options, including a Goodness of Fit test.
Categorical Variable Graphs and Reports The only text report that appears by default in categorical distribution reports is a frequencies table (Figure 2.13). Figure 2.13 Frequencies Table
This table lists the levels of a categorical variable, the count (sometimes called the frequency) of each level, and the probability associated with each level. This probability is simply the ratio of each level’s count to the total count. The standard error of these probabilities (StdErr Prob) and the cumulative probabilities (Cum Prob) for the data are also computed, but are not initially shown in the results table. To see them, Right-click on the table to bring up a popup menu (Figure 2.14). Select Columns to reveal a popup menu of all possible columns for the table. Columns that are currently shown have a check mark beside them. Select the column to be shown or hidden. Figure 2.14 Table Popup Menu
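The quantities in the frequencies table are simple to reproduce. The sketch below assumes a hypothetical wash-method column and uses a binomial-style standard error for each level probability; JMP Student Edition’s StdErr Prob may be computed slightly differently.

```python
import numpy as np
from collections import Counter

# Hypothetical categorical column, e.g. wash method.
method = ["Alpha Amalyze", "Caustic Soda", "Pumice Stone", "Alpha Amalyze",
          "Pumice Stone", "Alpha Amalyze", "Caustic Soda"]

counts = Counter(method)
n = len(method)
cum = 0.0
for level, count in counts.items():
    prob = count / n                              # level count / total count
    std_err = np.sqrt(prob * (1 - prob) / n)      # binomial-style standard error (assumption)
    cum += prob
    print(f"{level:15s} count={count} prob={prob:.3f} stderr={std_err:.3f} cumprob={cum:.3f}")
```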
The options listed in the drop-down menu for categorical variables (Figure 2.15) work the same as those for continuous variables (see “Continuous Variable Graphs and Reports,” p. 30.) Figure 2.15 Drop-down Menu for Categorical Variables
Statistical Tests JMP Student Edition contains numerous statistical tests for single variables, including:
• a test of the mean of a continuous variable
• a test of the standard deviation of a continuous variable
• a test of the probabilities of a categorical variable
These tests are all accessed through the popup menu next to the variable’s name at the top of the report.
Testing a Mean The Test Mean command prompts for a test value to compare to the sample mean. If a value is entered for the standard deviation, a z-test is computed. Otherwise, the sample standard deviation is used to compute a t-statistic. Optionally, the nonparametric Wilcoxon signed-rank test can be requested. After clicking OK, the Test Mean table is appended to the bottom of the reports for that variable. Use the Test Mean command repeatedly to test different values. Each time the mean is tested, a new Test Mean table is appended to the text report. The Test Mean command calculates and displays the following statistics: • t Test (or z test) lists the value of the test statistic and the p-values for the two-sided and one-sided alternatives. The test assumes the distribution is normal. • Signed-Rank lists the value of the Wilcoxon signed-rank statistic followed by the p-values for the two-sided and one-sided alternatives. The test assumes nothing about the normality of the distribution, only that it is symmetric. The probability values given in the Test Mean table are defined: • Prob > |t| is the probability of obtaining a greater absolute t value by chance alone when the sample mean is not different from the hypothesized value. This is the p-value for observed significance of the two-tailed t-test. • Prob > t is the probability of obtaining a t value greater than the computed sample t ratio by chance alone when the sample mean is not the hypothesized value. This is the p-value for observed significance of a one-tailed t-test. The value of this probability is half of Prob > |t|. • Prob < t is the probability of obtaining a t value less than the computed sample t ratio by chance alone when the sample mean is not the hypothesized value. This is the p-value for observed significance of a one-tailed t-test. The value of this probability is 1 – Prob>t.
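As a cross-check outside the platform, the same three tests can be run with scipy on illustrative data. The data values, the hypothesized mean, and the known sigma below are all assumptions for the example.

```python
import numpy as np
from scipy import stats

x = np.array([19.1, 20.4, 18.7, 21.2, 19.8, 20.9, 18.3, 20.1])
mu0 = 20.0                                             # hypothesized mean entered in the dialog

t_stat, p_two_sided = stats.ttest_1samp(x, mu0)        # t-test using the sample std dev
w_stat, p_signed_rank = stats.wilcoxon(x - mu0)        # Wilcoxon signed-rank test

# z-test when a known standard deviation is supplied
sigma = 1.0
z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))
p_z = 2 * stats.norm.sf(abs(z))

print(t_stat, p_two_sided, w_stat, p_signed_rank, z, p_z)
```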
Testing a Standard Deviation The Test Std Dev command requests a test value for statistical comparison to the sample standard deviation. After clicking OK, the Test Standard Deviation table is appended to the bottom of the reports for that variable. The Test Std Dev command can be used repeatedly to test different values. Each time a standard deviation is tested, a new table is appended to the text report. The Test Standard Deviation table shows the computed Chi Square statistic that tests whether the hypothesized standard deviation is the same as the computed sample standard deviation, and the probabilities associated with that Chi Square value:
• Prob>|ChiSq| is the probability of obtaining a greater absolute Chi Square value by chance alone when the sample standard deviation is not different from the hypothesized value. This is the p-value for observed significance of the two-tailed test.
• Prob>ChiSq is the probability of obtaining a Chi Square value greater than the computed sample Chi Square by chance alone when the sample standard deviation is not different from the hypothesized value. This is the p-value for observed significance of a one-tailed test.
• Prob<ChiSq is the probability of obtaining a Chi Square value less than the computed sample Chi Square by chance alone when the sample standard deviation is not different from the hypothesized value. This is the p-value for observed significance of the other one-tailed test.
Confidence Intervals The Confidence Interval menu provides confidence intervals for a variable at several common levels, such as 0.95. To obtain a confidence interval alpha that is not listed on the Confidence Interval menu, select Confidence Interval > Other and enter the desired level. An example is shown in Figure 2.19. Figure 2.19 Confidence Intervals for Categorical Variables
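The chi-square statistic behind Test Std Dev, and confidence intervals for a mean and a standard deviation, can be sketched as follows. The data and hypothesized sigma are illustrative, and the intervals use the standard normal-theory formulas, which is an assumption about how the platform computes them.

```python
import numpy as np
from scipy import stats

x = np.array([19.1, 20.4, 18.7, 21.2, 19.8, 20.9, 18.3, 20.1])
n, s2 = len(x), x.var(ddof=1)
sigma0 = 1.5                                   # hypothesized standard deviation
alpha = 0.05

chi2 = (n - 1) * s2 / sigma0**2                # Test Std Dev chi-square statistic
p_upper = stats.chi2.sf(chi2, df=n - 1)        # Prob > ChiSq (one-tailed)
p_two = 2 * min(p_upper, 1 - p_upper)          # two-tailed p-value

# 95% confidence intervals for the mean and the standard deviation
mean_ci = stats.t.interval(1 - alpha, n - 1, loc=x.mean(), scale=stats.sem(x))
sd_ci = (np.sqrt((n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, n - 1)),
         np.sqrt((n - 1) * s2 / stats.chi2.ppf(alpha / 2, n - 1)))
print(chi2, p_upper, p_two, mean_ci, sd_ci)
```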
Saving Information To save information computed from continuous response variables, use the Save menu commands. Each command generates a new column in the current data table named by appending the response column name (denoted colname in the following definitions) to the saved statistic’s name. The Save commands can be used repeatedly. This enables the same statistic to be saved multiple times under different circumstances, such as before and after combining histogram bars.
If the Save command is used multiple times, the column name for the statistic is named colname1, colname2, and so forth, to create unique column names. The Save menu contains the following commands:
• Level Numbers creates a new column, called Level colname. The level number of each observation corresponds to the histogram bar that contains the observation. The histogram bars are numbered from low to high, beginning with 1.
• Level Midpoints creates a new column, called Midpoint colname. The midpoint value for each observation is computed by adding half its level width to its lower level bound.
• Ranks creates a new column called Ranked colname that contains a ranking for each of the corresponding column’s values, starting at 1. If there are duplicates in the column, they are assigned consecutive ranks in order of their occurrence in the spreadsheet.
• Ranks averaged creates a new column, called RankAvgd colname. If a value is unique, its averaged rank is the same as its rank. If a value occurs k times, its average rank is computed as the sum of its value’s ranks divided by k.
• Prob Scores creates a new column, called Prob colname. For N non-missing scores, the probability score of a value is computed as the averaged rank of that value divided by N+1. This column is similar to the empirical cumulative distribution function.
• Normal Quantiles creates a new column, called N-Quantile colname. These normal scores are Van der Waerden approximations to the expected order statistics for the normal distribution.
• Standardized creates a new column, called Std colname. This contains the original column’s standardized values (each value has the column mean subtracted and is then divided by the column standard deviation).
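Several of these saved quantities have simple definitions that can be verified outside JMP Student Edition. The sketch below computes standardized values, ranks, averaged ranks, probability scores, and Van der Waerden-style normal quantiles for an illustrative column; JMP’s tie handling and normal-score approximation may differ in small details.

```python
import numpy as np
from scipy import stats

x = np.array([12.0, 15.5, 9.8, 20.1, 15.5, 18.3, 11.2])
N = len(x)

standardized = (x - x.mean()) / x.std(ddof=1)          # Std colname

ranks = stats.rankdata(x, method="ordinal")            # Ranked colname (ties in order of occurrence)
avg_ranks = stats.rankdata(x, method="average")        # RankAvgd colname (ties share their average rank)

prob_scores = avg_ranks / (N + 1)                      # Prob colname
normal_quantiles = stats.norm.ppf(prob_scores)         # N-Quantile colname (Van der Waerden scores)

print(standardized, ranks, avg_ranks, prob_scores, normal_quantiles)
```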
Whole-Platform Options Each statistical platform has a popup menu in the outermost outline level next to the platform name. Options and commands in this menu affect all text reports and graphs on the platform.
The whole-platform options for the Distribution platform include the following: • Uniform Scaling scales all axes with the same minimum, maximum, and intervals so that the distributions are easily compared. This option applies to reports for all response variables when selected. • Stack lets you orient all the output in the report window as either portrait or landscape. • Script lets you rerun or save the JSL script that produced the platform results. If the script is saved to a file, you can edit it; if it is saved with the current data table, it is available to run the next time you open the table. The JSL generated by Save Script for All Objects is the same as Save Script to Script Window if there are no By-Groups. When there are By-Groups the script includes JSL Where clauses that identify the By-Group levels.
• Data Table Window gives a view of the underlying data table, which is especially useful when there are By-Groups.
Capability Analysis The Capability Analysis option gives a capability analysis for quality control applications. The capability study measures the conformance of a process to given specification limits. A dialog prompts you for Lower Spec Limit, Upper Spec Limit, and Target. You only have to enter one of the three values. Only those fields you enter are part of the resulting Capability Analysis table. Optionally, you can enter a known value for sigma, the process standard deviation. Capability Analyses can calculate capability indices using several different short-term estimates for σ. After requesting a Distribution, select Capability Analysis from the popup menu on the outline bar for the variable of interest. The Dialog box shown in Figure 2.21 appears, allowing specification of long-term or one or more short-term sigmas, grouped by a column or a fixed sample size. Figure 2.21 Capability Analysis Dialog Box
All capability analyses use the same formulas. The difference between the options lies in how sigma is computed. These options for sigma can be explained as: • Long-term uses the overall sigma. This is the option used for Ppk statistics, and has sigma computed as
$$\sigma = \sqrt{\sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n - 1}}$$
• Specified Sigma allows the user to enter a specific, known sigma used for computing capability analyses. Sigma is, obviously, user-specified and is therefore not computed. This is the option used for control chart-generated capability analyses, where the sigma used in the chart is entered (in the dialog) as the specified sigma.
• Short Term, Grouped by fixed subgroup size computes σ using the following formula. In this case, if r is the number of subgroups and each ith subgroup is defined by the order of the data, sigma is computed as
$$\sigma = \sqrt{\sum_{i=1}^{r} \sum_{j} \frac{(x_{ij} - \bar{x}_{i\cdot})^2}{n - r - 1}}$$
• Short Term, Grouped by Column brings up a column list dialog from which you choose the grouping column. In this case, with r equal to the number of subgroups, sigma is computed as
$$\sigma = \sqrt{\sum_{i=1}^{r} \sum_{j} \frac{(x_{ij} - \bar{x}_{i\cdot})^2}{n - r - 1}}$$
(Note that this is the same formula for Short Term, Grouped by fixed subgroup size and is commonly referred to as the Root Mean Square Error or RMSE.) Note: There is a preference for Distribution called Ppk Capability Labeling that will label the long-term capability output with Ppk labels. This option is found using File > Preferences. When you click OK, the platform appends a Capability Analysis table, like the one in Figure 2.22, at the bottom of the text reports. You can remove and redo a Capability Analysis as many times as you want. The specification limits can be stored and automatically retrieved as a column property. To do this, choose Spec Limits from the Save command menu. When you save the specification limits, they appear on the histogram when opened at a later time.
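The distinction between the long-term and short-term sigma estimates can be illustrated with a few lines of Python. The measurements and the subgroup size below are invented, and the short-term denominator follows the grouped formula as printed above.

```python
import numpy as np

x = np.array([10.1, 9.8, 10.4, 10.0, 9.7, 10.3, 10.2, 9.9, 10.5, 9.6,
              10.0, 10.2])                       # process measurements, in time order
subgroup_size = 4                                # fixed subgroup size

# Long-term sigma: the overall sample standard deviation (used for Ppk-style indices)
sigma_long = x.std(ddof=1)

# Short-term sigma, grouped by fixed subgroup size: pool squared deviations
# from each subgroup's own mean (a sketch of the grouped formula in the text)
groups = x.reshape(-1, subgroup_size)
ss_within = ((groups - groups.mean(axis=1, keepdims=True)) ** 2).sum()
n, r = x.size, groups.shape[0]
sigma_short = np.sqrt(ss_within / (n - r - 1))

print(sigma_long, sigma_short)
```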
The Capability Analysis table is organized into two parts. The upper part of the table shows these quantities:
• The Specification column lists the names of items for which values are shown. They are Lower Spec Limit, Upper Spec Limit, and Spec Target.
• The Value column lists the values you specified for each limit and the target.
• %Actual is the observed percent of data falling outside the specification limits.
The lower portion of the Capability Analysis table lists five basic process capability indexes, their values, and their upper and lower confidence intervals. It also lists the percent and PPM for areas outside the spec limits. The PPM column (parts per million) is the Percent column multiplied by 10,000. The table also reports a Sigma Quality measurement, which is frequently used in Six Sigma methods and is also referred to as the process sigma:
$$\text{Sigma Quality} = \text{Normal Quantile}\left(1 - \frac{\text{Expected \# defects}}{n}\right) + 1.5$$
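A quick check of the Sigma Quality formula; the defect count and sample size are the ones used in the worked example that follows.

```python
from scipy import stats

expected_defects = 3
n = 1_000_000

sigma_quality = stats.norm.ppf(1 - expected_defects / n) + 1.5
print(round(sigma_quality, 2))   # about 6.03
```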
For example, if there are 3 defects in n=1,000,000 observations, the formula yields 6.03, or a 6.03 Sigma process. The above and below columns do not sum to the total column because Sigma Quality uses values from the Normal distribution, and is therefore not additive. Table 2.1 “Capability Index Names and Computations,” p. 41, describes these indices and gives computational formulas.
Table 2.1 Capability Index Names and Computations
CP (process capability ratio, Cp)
Computation: (USL – LSL)/6s, where USL is the upper spec limit and LSL is the lower spec limit
Lower CI on CP: $CP\sqrt{\chi^2_{\alpha/2,\,n-1}/(n-1)}$
Upper CI on CP: $CP\sqrt{\chi^2_{1-\alpha/2,\,n-1}/(n-1)}$
where $\chi^2_{p,\,n-1}$ denotes the pth quantile of a chi-square distribution with n – 1 degrees of freedom
CPK (PPK for AIAG) (process capability index, Cpk)
Computation: min(CPL, CPU)
CIs for CPK: Let
$$c = \frac{\sqrt{n}\,\bigl(\mu - (USL + LSL)/2\bigr)}{\sigma}$$
denote the noncentrality parameter and
$$d = \frac{USL - LSL}{\sigma}$$
represent the specification limit in σ units. The expected value $E(\hat{C}_{pk})$ and variance $\mathrm{Var}(\hat{C}_{pk})$ are then computed from expressions in c, d, n, the gamma function Γ, and the standard normal distribution function Φ, and the confidence limits are
Lower CI: $E(\hat{C}_{pk}) - k\sqrt{\mathrm{Var}(\hat{C}_{pk})}$
Upper CI: $E(\hat{C}_{pk}) + k\sqrt{\mathrm{Var}(\hat{C}_{pk})}$
where k depends on the chosen confidence level.
CPM (process capability index, Cpm)
Computation: $\dfrac{\min(\text{target} - LSL,\; USL - \text{target})}{3\sqrt{s^2 + (\text{mean} - \text{target})^2}}$
Lower CI on CPM: $CPM\sqrt{\chi^2_{\alpha/2,\,\gamma}/\gamma}$
Upper CI on CPM: $CPM\sqrt{\chi^2_{1-\alpha/2,\,\gamma}/\gamma}$
CPL (process capability ratio, case of one-sided lower specification)
Computation: (mean – LSL)/3s, where s is the estimated standard deviation
CPU (process capability ratio, case of one-sided upper specification)
Computation: (USL – mean)/3s
In Japan, a capability index of 1.33 is considered to be the minimum acceptable. For a normal distribution, this gives an expected number of nonconforming units of about 6 per 100,000. Exact 100(1 – α)% lower and upper confidence limits for CPL are computed using a generalization of the method of Chou et al. (1990), who point out that the 100(1 – α) lower confidence limit for CPL (denoted by CPLLCL) satisfies the equation
$$\Pr\{\, T_{n-1}(\delta = 3\sqrt{n}\,\mathrm{CPLLCL}) \le 3\,\mathrm{CPL}\sqrt{n} \,\} = 1 - \alpha$$
where $T_{n-1}(\delta)$ has a non-central t-distribution with n – 1 degrees of freedom and noncentrality parameter δ. Exact 100(1 – α)% lower and upper confidence limits for CPU are also computed using a generalization of the method of Chou et al. (1990), who point out that the 100(1 – α) lower confidence limit for CPU (denoted CPULCL) satisfies the equation
$$\Pr\{\, T_{n-1}(\delta = 3\sqrt{n}\,\mathrm{CPULCL}) \ge 3\,\mathrm{CPU}\sqrt{n} \,\} = 1 - \alpha$$
where $T_{n-1}(\delta)$ has a non-central t-distribution with n – 1 degrees of freedom and noncentrality parameter δ.
At the bottom of the report, Z statistics are reported. Z represents (according to the AIAG Statistical Process Control manual) the number of standard deviation units from the process average to a value of
interest such as an engineering specification. When used in capability assessment, Z USL is the distance to the upper specification limit and Z LSL is the distance to the lower specification limit.
Z USL = (USL - Xbar)/sigma = 3 * CPU
Z LSL = (Xbar - LSL)/sigma = 3 * CPL
Z Bench = Inverse Cumulative Prob(1 - P(LSL) - P(USL))
where
P(LSL) = Prob(X < LSL) = 1 - Cum Prob(Z LSL)
P(USL) = Prob(X > USL) = 1 - Cum Prob(Z USL)
Note: You can also do a non-normal capability analysis through the Fit Distribution options described earlier in this chapter. After you fit a distribution, you have the option to generate quantiles and a target value for the fitted distribution. If you give a target value, a capability analysis is automatically generated by using the quantile values and target you specified.
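The Z statistics can be reproduced directly from the process average, sigma, and specification limits; the values below are illustrative.

```python
from scipy import stats

xbar, sigma = 50.0, 2.0          # process average and sigma estimate (illustrative)
lsl, usl = 44.0, 58.0            # specification limits (illustrative)

z_usl = (usl - xbar) / sigma     # 3 * CPU
z_lsl = (xbar - lsl) / sigma     # 3 * CPL

p_lsl = stats.norm.sf(z_lsl)     # Prob(X < LSL) = 1 - Cum Prob(Z LSL)
p_usl = stats.norm.sf(z_usl)     # Prob(X > USL) = 1 - Cum Prob(Z USL)

z_bench = stats.norm.ppf(1 - p_lsl - p_usl)
print(z_usl, z_lsl, z_bench)
```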
3 The Fit Y by X Platform Any time two variables need to be compared, the Fit Y by X platform is the choice to make. This single platform produces one way ANOVA, scatterplots, and contingency table analysis—most of the two-variable analyses seen in an introductory statistics course.
Introduction After starting JMP Student Edition, Open the data file Denim.jmp. Details about this data are found in Chapter 2, “The Distribution Platform” in the section “About the Data,” p. 21.
Launching the Platform In this introduction section, the variables are examined in pairs. Select Analyze > Fit Y By X from the menu bar. This brings up the Fit Y By X platform launch dialog as seen in Figure 3.1 Figure 3.1 Fit Y By X Launch Dialog
Notice the word “Contextual” in the title bar. It is there because this dialog launches other platforms depending on the modeling types (continuous or categorical) of the variables in the analysis. More
information on modeling types is found in “The Modeling Type of Variables,” p. 29. Initially, this example consists of three analyses, with Starch Content (%) as the Y variable in all of them. Method, Size of Load (lbs), and Sand Blasted? are the X variables. All three analyses are requested at the same time to illustrate some of JMP Student Edition’s interactive capabilities. These analyses would be equally valid if performed separately. Select Starch Content (%) from the list of columns and click the Y, Response button. To select all three X variables, click on Method, hold down the Shift key, then click Sand Blasted?. Note that these dialog boxes respond to dragging as well as button clicks, as in the next step. Drag these highlighted variables to the box to the right of the X, Factor button. Click OK. Three plots appear as in Figure 3.2. Figure 3.2 Fit Y by X Results
On the far left and far right, dot plots of each level of a nominal variable are plotted side by side, a situation leading to one-way ANOVAs. In the middle plot, JMP Student Edition produces a scatterplot of two continuous variables, a situation leading to fitting lines and curves.
Computing a t-test As a simple example, examine the plot on the far right, relating starch content to whether the fabric was sand blasted or not. Is the starch content different for the two levels of Sand Blasted? This is a typical situation examined with a two-sample t-test. To conduct the t-test, Select t test from the drop-down menu in the plot’s title bar. The t-test report appears in the outline beneath the plot labeled t test. Figure 3.3 t-test Results
Some things should be noticed about this report.
• There is a statement on the second line of the t-test report that says “Assuming unequal variances”. This test is also known as the unpooled t-test. If you want the pooled version (where the variances are assumed to be equal), select the Means/Anova/Pooled t command.
• This is a two sample t-test, not a matched pairs t-test. If the data from the two groups have a natural pairing (for example, the before-and-after measurements of a patient taking an experimental medication), use the Matched Pairs platform. Details on matched pairs are found in “The Matched Pairs Platform,” p. 71 in the “The Matched Pairs Platform” chapter.
Pooled t test Select the Means/Anova/Pooled t command from the report’s drop-down menu.
• The plot gets embellished with means diamonds and other text tables. All of these are discussed later in this tutorial in “Analysis of Variance (anova),” p. 50. Here, note the p-value listed beside Prob>|t| in the new t test report. Notice that this is the same value as listed in the new Analysis of Variance report in the column labeled Prob>F. In essence, JMP Student Edition has tested the same hypothesis twice, with two different methods, and both methods agree (as they always should!). In fact, the square of the t statistic (listed under t-Test) is equal to the value of the F statistic (listed in the ANOVA table as F Ratio).
From the drop-down menu in the Oneway Analysis title bar, select Means/Anova/t test to remove the pooled t report.
Now, examine the plot on the left side in the report in Figure 3.2, of Starch Content (%) vs. Method. Denim washed with Alpha Amalyze appears to have a lower starch content than denim washed with
Caustic Soda or Pumice Stone. For more specificity, it is helpful to look at text reports of these results, examining the mean, median, standard deviation, and quantiles for the three levels of the Method variable, which are produced as follows. From the drop-down menu in the Oneway Analysis title bar, select Quantiles. Text reports appear below the plot, and box plots are superimposed on the plot. For details on box plots, see the section titled “Quantile Box Plots,” p. 32, or the online help. From the same drop-down menu, select Means and Std Dev. In addition to the new text reports, mean error bars and standard deviation lines appear on the plot. Box plots are superimposed on the plot, giving clues to the underlying distribution of each level. Details on these additions are found in “One Way anova—The Continuous by Categorical Case,” p. 61.
These additions can also be removed. In the same drop-down menu, select Quantiles and Means and Std Dev again.
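The agreement noted above between the pooled t-test and the one-way ANOVA can be checked with scipy on two illustrative groups: the squared pooled t statistic equals the F ratio, and the two p-values match. The group values below are invented, not the Denim.jmp data.

```python
import numpy as np
from scipy import stats

sand_blasted = np.array([16.2, 17.1, 15.8, 16.9, 17.4, 16.0])
not_blasted  = np.array([18.3, 19.0, 18.1, 19.4, 18.8, 17.9])

t_unpooled, p_unpooled = stats.ttest_ind(sand_blasted, not_blasted, equal_var=False)
t_pooled,   p_pooled   = stats.ttest_ind(sand_blasted, not_blasted, equal_var=True)
f_stat,     p_anova    = stats.f_oneway(sand_blasted, not_blasted)

print(p_unpooled, p_pooled, p_anova)        # pooled t and ANOVA p-values match
print(t_pooled**2, f_stat)                  # and the squared pooled t equals F
```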
Selecting and Marking Points This plot is not only useful for computing results, but also for selecting results in other plots. Select the Lasso tool from the Tools toolbar, as shown in Figure 3.4. Figure 3.4 Lasso Tool
Lasso Tool
The Lasso tool is used to draw curves around points. The “captured” points become selected.
While holding down the mouse button, drag the Lasso tool completely around the points for Alpha Amalyze, as shown in Figure 3.5. Release the mouse button. Figure 3.5 Selecting Points with the Lasso Tool
JMP Student Edition briefly flashes how many points are contained in the selection region (32 in this case) and selects the points. Notice that these points are highlighted in all the plots, and in the data table. The Lasso tool works with all plots that show individual points, like scatterplots and leverage plots. To make these points distinctive, assign them a unique color and marker. With the points selected, right-click inside the plot, select Row Colors from the popup menu, and choose a red color from the color palette. The Alpha Amalyze points turn red in all the plots. Again, right-click inside the plot and select Row Markers. Select the small triangle from the markers palette. The Alpha Amalyze points change to the triangle in all the plots. In fact, there is an easier way to change the colors and markers of points in a plot if there is a certain column that divides up the data. For example, suppose that plots are needed that clearly distinguish the three levels of the Method variable. To mark all the data at once, From the Rows menu, select Color or Mark by Column. In the resulting dialog box, select Method from the list of variables. Make sure that the Set Color by Value, Set Marker by Value, and Make Window with Legend checkboxes are checked. Unique colors and markers are assigned to each level of the Method variable in all the plots.
Analysis of Variance (ANOVA) Is knowledge of the wash method useful in predicting starch content of the denim? The statistical test to answer this question is called a one-way ANOVA, and is produced in the same way as the t-test above. Select Means/Anova from the drop-down menu next to Oneway Analysis. Note that this command reads Means/Anova/t test when the categorical variable only has two levels. In all other cases (like this one), the t test is not appropriate, so it is not available on the menu. An ANOVA table appears beneath the plots, and means diamonds appear on the plot. A means diamond illustrates a sample mean and its 95% confidence interval, as shown by the schematic in Figure 3.6. The horizontal line across each diamond represents the group mean. The vertical span of each diamond represents the 95% confidence interval for each group. Overlap marks are drawn above and below the group mean. For groups with equal sample sizes, overlapping overlap marks indicate that the two group means are not significantly different at the 95% confidence level. Figure 3.6 Means Diamonds Illustrated
Examining the ANOVA table shows that Method is a highly significant predictor of starch content. In other words, at least one level of the Method variable has a significantly higher or lower starch content than the others. The obvious question is which levels are different from each other. JMP Student Edition uses comparison circles to explore this.
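For reference, the F ratio and p-value of a one-way ANOVA can be computed outside the platform; the three groups below are illustrative stand-ins for the wash methods, not the Denim.jmp values.

```python
from scipy import stats

alpha_amalyze = [12.1, 11.8, 12.5, 11.6, 12.0]
caustic_soda  = [15.3, 15.9, 14.8, 15.5, 16.1]
pumice_stone  = [15.0, 14.6, 15.4, 15.8, 14.9]

f_ratio, p_value = stats.f_oneway(alpha_amalyze, caustic_soda, pumice_stone)
print(f_ratio, p_value)   # a small p-value says at least one group mean differs
```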
Comparison Circles To show comparison circles, Select Compare Means > Each Pair, Student’s t. Complete details of comparison circles are on page 63. Put simply, they show differences among levels of a variable, and are clickable. When a circle is clicked, it turns red, levels that are not significantly different from it turn red, and levels that are significantly different from it turn gray. To see this, Click on the bottom comparison circle, corresponding to Alpha Amalyze.
The display changes to the one shown in Figure 3.7. This shows that Alpha Amalyze is significantly different from the other two wash methods. Click on the other two circles to discover their relationships. These comparison circles are based on the confidence interval around the mean, which is itself based on the α level. By default, the α-level is 5%. However, it can be changed. From the popup menu in the Oneway title bar, select Set α Level > .10. Notice that the comparison circles change diameter when the α-level changes.
Fitting Lines The middle plot of the report in Figure 3.2 is of two continuous variables, a situation that allows fitting of lines and curves through least-squares regression. For example, suppose you want to predict starch content based on the size of the wash load. A good guess may simply be the mean starch content from all the data points. To see this mean, Select Fit Mean from the drop-down list in the title bar of the plot. A line representing the mean appears on the plot, and Fit Mean appears in a legend below the plot. Notice that Fit Mean below the plot has its own drop-down menu, as shown in Figure 3.8. Figure 3.8 Fit Mean Results
A more interesting statistical question is whether a line or a curve is a better predictor of starch content than this simple mean. To fit a regression line to this data, Select Fit Line from the platform menu in the plot’s title bar. A line is superimposed on the graph. This line should be compared with the simple mean to see if it is helpful in prediction. To do this comparison, JMP Student Edition can draw confidence intervals for the fit around the fitted line. If these confidence intervals do not contain the horizontal mean, then the fitted line is helpful. Select Confid Curves Fit from the Linear Fit menu in the legend below the plot. As seen in Figure 3.9, the dotted confidence interval around the linear fit does not contain the mean. Therefore, the linear fit is statistically significant. It is statistically sound to use the fitted line in predictions. Figure 3.9 Fit Line Results
There is also an option to produce shaded confidence curves, using the Confid Shaded Fit command.
The equation of the line, as well as several computed statistics, are found in the Linear Fit report. Values of the slope and intercept are also printed in the Parameter Estimates section of the report. Aside from the graphical confidence-curve method detailed above, there are numerical measures of the significance. One is the p-value associated with the slope of the line, also found in the Parameter Estimates report. In this case, the p-value is 0.0045, significant by almost any standard, reinforcing the graphical results above.
Another measure of fit is the correlation coefficient, frequently denoted by r. Its value does not appear on any of the reports so far, although the square of its value (r2) is listed beside RSquare in the Summary of Fit text report. To compute the value of r itself, request a density ellipse. Select Density Ellipse > .95 from the platform menu on the plot’s title bar. A new report named Correlation appears at the bottom of the text reports. It is initially closed, but can be opened by clicking on the blue disclosure icon (Figure 3.10).
Figure 3.10 The Disclosure Icon
The correlation coefficient is listed under the word Correlation. It is interesting to note that its significance (p=0.0045) is the same as that listed for the slope coefficient in the Parameter Estimates table, and the same as the Prob>F value in the Analysis of Variance table.
Residuals One of the best diagnostic tools for a linear fit is its residuals. All (good) introductory statistics textbooks discuss interpretation of residuals, which is not duplicated here. However, the first step in interpretation of residuals is to see them plotted. Of course, there are different residuals for each fit, so residuals commands are found in the fit menus in the legends below the plots. To see residuals for the linear fit, Select Plot Residuals from the fit popup menu in the legend below the plot. A plot of the residuals appears at the bottom of the report. The red horizontal line on the plot shows the mean of the residuals (which should, ideally, be near zero). Figure 3.11 Residual Plot
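The slope, intercept, p-value for the slope, correlation coefficient r, RSquare, and residuals of a simple linear fit can all be sketched with scipy; the load and starch values below are invented for illustration.

```python
import numpy as np
from scipy import stats

load = np.array([10, 14, 18, 22, 26, 30, 34, 38], dtype=float)         # illustrative X
starch = np.array([14.9, 14.1, 13.8, 13.0, 12.7, 12.1, 11.8, 11.0])    # illustrative Y

fit = stats.linregress(load, starch)
print(fit.slope, fit.intercept)       # the Parameter Estimates
print(fit.pvalue)                     # p-value for the slope
print(fit.rvalue, fit.rvalue**2)      # r and RSquare

residuals = starch - (fit.intercept + fit.slope * load)
print(residuals.mean())               # ideally near zero
```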
The plot is getting cluttered, so before continuing, remove the fits that are there now. For each of the fits below the plot, select Remove Fit from the fit popup menu. Another interesting question is whether a single line (like this model) is enough to describe starch content for all wash methods, or if a different line is needed for each level of the Method variable. In other
words, is starch content related to load size in the same way when washed in Alpha Amalyze as when washed in Caustic Soda or Pumice Stone? Although this question is more in the realm of the Fit Model platform (detailed in “The Fit Model Platform” chapter), some initial investigation is easy with this platform. First, instruct JMP Student Edition to group its calculations for each level of the Method variable. From the platform menu on the plot’s title bar, select Group By. In the resulting dialog box, select Method as the grouping variable. Now, request JMP Student Edition to fit a line as before. Select Fit Line from the Platform popup menu on the title bar above the plot. Separate lines for each wash method appear on the plot, as shown in Figure 3.12. Figure 3.12 Group By and Linear Fits
The statistical question is whether these lines are different enough to warrant the extra trouble in reporting all three, instead of the more compact (but possibly less accurate) reporting of the single line found previously.
Two-Way Contingency Tables In the next example, both X and Y are categorical variables. The question is whether the method of washing denim has an effect on thread wear. The analysis uses contingency tables—orderly ways of arranging count data. To generate a contingency table for this problem, Select Analyze > Fit Y by X from the menu bar. Assign Thread Wear (not Thread Wear Measured) to the Y, Response role. Assign Method to the X, Factor role. Click OK. Since both variables are categorical, a mosaic plot appears, followed by a contingency table. Details of these displays are in the section “Contingency Analysis—The Categorical by Categorical Case,” p. 66. Note that the mosaic plot is clickable, like all plots in JMP Student Edition. For example, to select all rows washed in Alpha Amalyze with a low thread wear, Click in the lower, red section in the mosaic plot in the bar above Alpha Amalyze (see Figure 3.13). Figure 3.13 Mosaic Plot
Just below the contingency plot are tests for the independence of the two variables. The p-value for this test appears in the column labeled Prob>ChiSq, which in this case is the non-significant 0.76. There is not enough evidence to say that these two variables are not independent—in other words, there is not enough evidence to say that the thread wear of denim is affected by wash method.
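The chi-square test of independence behind this report can be reproduced from a table of counts. The counts below are illustrative, not the Denim.jmp values.

```python
import numpy as np
from scipy import stats

# Rows are wash methods, columns are thread wear levels (illustrative counts).
table = np.array([[14, 10,  8],
                  [12, 11,  9],
                  [13,  9, 10]])

chi2, p_value, df, expected = stats.chi2_contingency(table)
print(chi2, p_value, df)
print(expected)     # expected counts under independence
```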
Logistic Regression The Logistic platform fits the probabilities for response categories to a continuous x predictor. The fitted model estimates probabilities attributed to each x value. The logistic platform is the nominal/ordinal by continuous personality of the Fit Y by X command. There is a distinction between nominal and ordinal responses on this platform: • Nominal logistic regression estimates a set of curves to partition the attributed probability among the responses. • Ordinal logistic regression models the probability of being less than or equal to a given response. This has the effect of estimating a single logistic curve, which is shifted horizontally to produce probabilities for the ordered categories. This model is less general but more parsimonious, and is recommended for ordered responses. As an example, Select Analyze > Fit Y By X. Assign Thread Wear as Y, Response and Size of Load (lbs) as X, Factor. Click OK. The report that appears shows the probability that the thread wear is low, moderate, or severe for each load size.
The p-value of 0.0657 hints at a weak association between these two variables.
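A rough sketch of fitting response-category probabilities to a continuous predictor is shown below using scikit-learn. It fits a nominal (multinomial) model, not the ordinal model JMP Student Edition recommends for ordered responses, and the load sizes and wear categories are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative data: load size (continuous X) and thread wear category (Y).
load = np.array([[10], [14], [18], [22], [26], [30], [34], [38]], dtype=float)
wear = np.array(["low", "low", "low", "moderate", "moderate",
                 "moderate", "severe", "severe"])

model = LogisticRegression(max_iter=1000).fit(load, wear)
probs = model.predict_proba(load)          # one probability per category at each x
print(model.classes_)
print(np.round(probs, 2))
```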
The Formula Editor A powerful (and often underused) feature of JMP Student Edition is its formula editor. Formulas serve a wide variety of purposes, from assigning simple values to computing complex calculations with parameters and conditional clauses. They are especially useful when transforming data. A column whose values are computed by a formula is both linked and locked. It is linked to (and dependent on) all other columns that are part of its formula. Its values are automatically recomputed whenever the values in these columns are edited. It is locked so that its data values cannot be edited individually. In the Denim.jmp sample data table, the ordinal variable Thread Wear is computed with a formula that partitions the values of Thread Wear Measured into low, moderate, and severe categories. To see the formula, Right-click in the heading of the column Thread Wear. Select Formula from the menu that appears. Click Cancel to return to the data table. The Formula Editor window operates like a pocket calculator with buttons, displays, and an extensive list of easy-to-use features. Example The essential features of the formula editor are best seen through an example. Suppose that it is necessary to calculate the logarithm of the values in the Starch Content (%) column (this calculation is common in real-world statistics). A new column is needed to store the new values. There are two methods to create a new column.
From the main menu bar, select Cols > New Column. Alternatively, double-click in the area to the right of the last column in the data table. When a new column is created, the default title is highlighted and ready to be changed. Change the title of the new column to Log Starch. Now, add a formula to the column. Right-click in the heading of the column Log Starch. Select Formula from the menu that appears. When the formula editor appears,
Click on Transcendental from the Functions list, then select Log from the resulting menu.
Note that Log is the natural logarithm (base e). Common (base 10) logs are computed using the Log10 function. Select Starch Content (%) from the columns list.
Click OK to close the Formula Editor and apply the formula. Further examples, as well as complete documentation of all the formula editor functions, are found in JMP Student Edition’s online help.
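The same transformation is a one-liner outside the formula editor; note the natural-log versus base-10 distinction mentioned above. The starch values are illustrative.

```python
import numpy as np

starch = np.array([12.4, 15.1, 9.8, 20.3])

log_starch = np.log(starch)       # natural (base e) logarithm, like the Log function
log10_starch = np.log10(starch)   # common (base 10) logarithm, like the Log10 function
print(log_starch, log10_starch)
```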
Scatterplots—The Continuous by Continuous Case If both the X and Y variables are continuous, JMP Student Edition produces a bivariate analysis that initially shows a scatterplot. There are a number of options once the scatterplot appears, all accessed through the popup menu beside the variable name in the title bar (Figure 3.14) Figure 3.14 Bivariate Popup Menu
Show Points alternately hides or displays the points in the plot.
It is often useful to first fit the mean as a reference line for other fits. The Fit Mean command adds a horizontal line to the plot at the mean of the response variable (Y). As with all the fitting commands in this platform, a legend appears below the plot with its own drop-down menu, where additional commands for each fit are accessed. The Fit Mean table shows the value of the mean, its standard deviation, its standard error, and the sum of squared errors around the mean. Figure 3.15 Fit Commands
The Fit Line command adds a straight-line fit to the plot using least squares regression. Its drop-down menu (accessed as in Figure 3.15) has commands to save predicted values and residuals for the linear fit as new columns in the current data table. If the confidence area around this line (produced through the Confid Curves Fit command) includes the horizontal line at the response mean, then the slope of the line of fit is not significantly different from zero at the 0.05 significance level.
The Fit Polynomial command fits a polynomial curve of the degree selected from the Fit Polynomial submenu. After selecting the polynomial degree, the curve is fit to the data points using least squares regression. 95% confidence limits are plotted with the Confid Curves Fit and Confid Curves Indiv display options. The Fit Polynomial option can be selected multiple times with different polynomial degrees for comparison. As with the linear fit, options can save predicted values and residuals as new columns in the current data table for each polynomial fit. Each time a linear or polynomial fit is chosen, three additional tables are appended to the report (see Figure 3.16). A Lack of Fit table also appears if there are replicates. For details of the lack of fit test, see “The Lack of Fit Table,” p. 90 or the online help. Figure 3.16 Linear Fit Text Reports
The Summary of Fit Table The summary of fit table shows the following information: • R2 (RSquare), a measure of how well the line fits • The adjusted R2 (RSquare Adj), used to compare models with different numbers of variables • The Root Mean Square Error, an estimate of the standard deviation of the random error • The mean of the response variable • The number of observations (or, if weighted variables are involved, the sum of the weights).
The Lack of Fit Table The Lack of Fit table shows a special diagnostic test that appears only when the data and the model provide the opportunity. It is a test to see if a different form of the model would fit the data better. A significant F statistic in this table indicates that a different model should be examined. For details of the lack of fit test, see “The Lack of Fit Table,” p. 90 or the online help.
Analysis of Variance Table This table looks similar to the ANOVA tables in most textbooks. However, there may be some differences in the terminology for the ANOVA table’s parts. JMP Student Edition uses the following:
• Source lists the three sources of variation, called Model, Error, and C Total. The “C” in C Total stands for corrected, as in corrected for the mean.
• DF records the associated degrees of freedom (DF) for each source of variation.
• Sum of Squares records an associated sum of squares (SS for short) for each source of variation.
• Mean Square is a sum of squares divided by its associated degrees of freedom.
• F Ratio is the model mean square divided by the error mean square. The underlying hypothesis of the fit is that all the regression parameters (except the intercept) are zero. If a parameter is a significant model effect, the F Ratio is usually higher than expected by chance alone.
• Prob > F is the observed significance probability (p-value) of obtaining a greater F value by chance alone if the specified model fits no better than the overall response mean.
Parameter Estimates Table The terms in the Parameter Estimates table for a linear fit (seen previously in Figure 3.16) are the intercept and the single X variable. The Parameter Estimates table displays the following: • Term lists the name of each parameter in the requested model. The intercept is a constant term in all models. • Estimate lists the parameter estimates of the linear model. These estimates are the coefficients in the linear model. • Std Error lists the estimates of the standard errors of the parameter estimates. They are used in constructing tests and confidence intervals. • t Ratio lists the test statistics for the hypothesis that each parameter is zero. It is the ratio of the parameter estimate to its standard error. Looking for a t ratio greater than 2 in absolute value is a common rule of thumb for judging significance, because it approximates the 0.05 significance level. • Prob>|t| lists the observed significance probability calculated from each t ratio. It is the probability of getting, by chance alone, a t ratio greater (in absolute value) than the computed value, given a true hypothesis. Often, a value below 0.05 is interpreted as evidence that the parameter is significantly different from zero.
Other Fitting Commands
The Fit Special command displays a dialog with choices for transformations of both the Y and X variables. Transformations include log, square root, square, reciprocal, and exponential. The fitted line is plotted on the original scale, so it appears as a curve on the plot. The regression report is shown with the transformed variables, but an extra report shows measures of fit transformed in the original Y scale (if there was a Y transformation). The Fit Each Value command fits a value to each unique X value. Fitting each value is like doing a one-way analysis of variance, but in the continuous by continuous bivariate platform. Compare it to other fitted lines to see the concept of lack of fit. The Density Ellipse command draws an ellipse that contains the specified mass of points, determined by the probability chosen from the Density Ellipse submenu. The Other selection allows the specification of any probability greater than zero and less than or equal to one. The density ellipsoid is a good graphical indicator of the correlation between two variables. The ellipsoid collapses diagonally as the correlation between the two variables approaches either 1 or –1. The ellipsoid is more circular (less diagonally oriented) if the two variables are uncorrelated. The Density Ellipse table that accompanies each Density Ellipse fit shows the correlation coefficient (r) for the X and Y variables and the probability that the correlation between the variables is significant. The Group By command in the fitting menu displays a dialog, allowing selection of a classification (grouping) variable. When a grouping variable is selected, the Fit Y by X platform computes a separate analysis for each level of the grouping variable, and overlays the regression curves or ellipses on the scatterplot. The fit for each level of the grouping variable is identified beneath the scatterplot, with individual popup menus to save or remove fitting information. The Group By command is checked in the fitting menu when a grouping variable is in effect. To change a grouping variable that is already in effect, Select the Group By command to remove (uncheck) the existing variable. Then, select the Group By command again and respond to its dialog as before.
One Way ANOVA—The Continuous by Categorical Case If the X variable is categorical and the Y variable is continuous, JMP Student Edition produces a one way ANOVA, initially displaying a plot that shows a vertical distribution of Y points for each X value. There are a number of options once this scatterplot appears, all accessed through the popup menu beside the variable name in the title bar (Figure 3.17).
Figure 3.17 One Way ANOVA Popup Menu
The Quantiles command displays the Quantiles table, which lists the 0% (minimum), 10%, 25%, 50% (median), 75%, 90%, and 100% (maximum) quantiles for each group. It also activates Box Plots from the Display Options menu. The Means/Anova/t test command fits means for each group and performs a one-way analysis of variance to test if there are differences among the means. Three tables are produced: a summary table, a one-way analysis of variance table, and a table that lists group frequencies, means, and standard errors computed with the pooled estimate of the error variance. If there are only two groups, a t-test also shows. This option automatically activates the Means Diamonds display option. See “Analysis of Variance (anova),” p. 50 for a detailed description of means diamonds. The Means and Std Dev command fits means for each group, but uses standard deviations computed within each group rather than the pooled estimate of the standard deviation used to calculate the standard errors of the means. This command also displays Means Dots, Error Bars, and Std Dev Lines display options. Compare Means has a submenu that provides the following four multiple comparison methods for comparing sets of group means. All activate the Comparison Circles display option.
• Each Pair, Student’s t displays a table with Student’s t statistics for all combinations of group means. • All Pairs, Tukey HSD displays a table that shows the Tukey-Kramer HSD (honestly significant difference) comparisons of group means. • With Best, Hsu’s MCB displays a table that shows Hsu’s MCB (Multiple Comparison with the Best) comparisons of group means to the best (maximum or minimum) group mean. • With Control, Dunnett’s displays a table showing Dunnett’s comparisons of group means with a control group.
Each multiple comparison test begins with a comparison circles plot, a visual representation of group mean comparisons. The plot follows with a table of means comparisons. The illustration in Figure 3.18 shows the alignment of comparison circles with the confidence intervals of their respective group means.
Figure 3.18 Alignment of Comparison Circles
Compare each pair of group means visually by examining how the comparison circles intersect. The outside angle of intersection tells whether group means are significantly different (see Figure 3.19). Circles for means that are significantly different either do not intersect or barely intersect, so that the outside angle of intersection is less than 90°. If the circles intersect by an angle of more than 90°, or if they are nested, the means are not significantly different. If the intersection angle is close to 90°, it is easy to verify whether the means are significantly different by clicking on the comparison circle, thus highlighting it. The highlighted circle appears with a thick solid line. Circles representing means that are not significantly different from the highlighted circle show with thin lines (see Figure 3.20). Circles representing means that are significantly different show with a thick gray pattern. To deselect circles, click in the graph outside the circles.
Figure 3.19 Angles in Comparison Circles (angle greater than 90 degrees: not significantly different; angle equal to 90 degrees: borderline significantly different; angle less than 90 degrees: significantly different)
Figure 3.20 Comparison Circles after Clicking
The Nonparametric submenu allows computation of three nonparametric tests: the Wilcoxon, Median, and Van der Waerden tests. Nonparametric tests are useful for testing whether group means or medians are the same across groups. However, the usual analysis of variance assumption of normality is not made. Nonparametric tests use functions of the response variable ranks, called rank scores.
• Wilcoxon rank scores are the simple ranks of the data. The Wilcoxon test is the most powerful rank test for errors with logistic distributions.
• Median rank scores are either 1 or 0 depending on whether a rank is above or below the median rank. The Median test is the most powerful rank test for errors with doubly exponential distributions.
• Van der Waerden rank scores are the ranks of the data divided by one plus the number of observations, transformed to a normal score by applying the inverse of the normal distribution function. The Van der Waerden test is the most powerful rank test for errors with normal distributions.
The UnEqual Variances command tests for equality of group variances. It uses (and reports) four different tests: O’Brien’s test, the Brown-Forsythe test, Levene’s test, and Bartlett’s test. When the variances across groups are not equal, the usual analysis of variance assumptions are not satisfied, so the standard ANOVA F test is not valid. There is a valid variant of the standard ANOVA, called the Welch ANOVA, which is displayed.
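Some of these tests have direct counterparts in scipy and can be sketched as follows. Levene’s test with median centering behaves like the Brown-Forsythe test, and the Kruskal-Wallis test is the k-group extension of the Wilcoxon rank-sum test; O’Brien’s test and the Welch ANOVA are not shown. The groups are illustrative.

```python
from scipy import stats

alpha_amalyze = [12.1, 11.8, 12.5, 11.6, 12.0]
caustic_soda  = [15.3, 15.9, 14.8, 15.5, 16.1]
pumice_stone  = [15.0, 14.6, 15.4, 15.8, 17.9]

levene_stat, levene_p = stats.levene(alpha_amalyze, caustic_soda, pumice_stone,
                                     center="median")   # Brown-Forsythe style centering
bartlett_stat, bartlett_p = stats.bartlett(alpha_amalyze, caustic_soda, pumice_stone)
kw_stat, kw_p = stats.kruskal(alpha_amalyze, caustic_soda, pumice_stone)  # rank-based test

print(levene_p, bartlett_p, kw_p)
```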
Set Alpha Level has a submenu that allows a choice from the most common alpha levels, or the specification of any level with the Other selection. Changing the alpha level recalculates any confidence limits, adjusts the means diamonds on the plot if they are showing, and modifies the upper and lower confidence level values in reports.
Normal Quantile Plot shows overlaid normal quantile plots for each level of the X variable. Along with
the standard normality-assessing capabilities of the single-variable Normal Quantile Plot, this plot shows both the differences in the means (vertical position) and the variances (slopes) for each level of the categorical X factor (Figure 3.21).
Figure 3.21 Normal Quantile Plot (slopes show standard deviations; separations show differences in means)
Normal Quantile Plot has these additional options:
• Plot Actual by Quantile generates a quantile plot with the response variable on the y-axis and quantiles on the x-axis. The plot shows quantiles computed within each level of the categorical X factor. • Plot Quantile by Actual reverses the x- and y-axes, as shown in Figure 3.21. • Line of Fit draws the straight diagonal reference lines for each level of the X variable. • Probability Labels shows probabilities on the right axis of the Quantile by Actual plot and on the top axis of the Actual by Quantile plot. CDF plots the cumulative distribution function for all the groups in the Oneway report. Save has a submenu of commands to save the following quantities as new columns in the current data
table: • Save Centered saves values computed as the response variable minus the mean of the response variable within each level of the factor variable. • Save Standardized saves standardized values of the response variable computed within each level of the factor variable. This is the centered response divided by the standard deviation within each level. • Save Normal Quantiles saves normal quantile values computed within each level of the categorical factor variable. Display Options allows addition or removal of plot elements.
• All Graphs shows or hides all graphs. • Points shows data points on the scatterplot. • Box Plots shows outlier box plots for each group. • Means Diamonds draws Means Diamonds. Complete details of means diamonds is found in “Analysis of Variance (anova),” p. 50. • Mean Lines draws a line at the mean of each group. • Mean CI Lines draws lines at the upper and lower 95% confidence levels for each group. • Mean Error Bars identifies the mean of each group with a large marker and shows error bars one standard error above and below the mean. • Grand Mean draws the overall mean of the Y variable on the scatterplot. • Std Dev Lines shows dotted lines one standard deviation above and below the mean of each group.
Figure 3.21 Normal Quantile Plot
• Comparison Circles show comparison circles computed for the multiple comparison method selected in the platform menu.
• Connect Means connects the group means with a straight line.
• Mean of Means draws a line at the mean of the group means.
• X-Axis Proportional makes spacing on the x-axis proportional to the sample size of each level.
• Points Spread spreads points over the width of the interval.
• Points Jittered adds random horizontal jitter so that points that overlay on the same Y value can be seen.
Script Submenu The Script submenu contains commands related to saving a script to redo an analysis. • Redo Analysis repeats the analysis represented in the report. • Save Script to Data Table generates a script that can redraw the report, and attaches it to the data table. • Save Script to Report appends a script to the top of the report. • Save Script to Script Window produces a script that can re-create the report in a text window. This script can then be edited or saved to an external file. • Save Script for All Objects is useful when several analyses — like those from a By group, or from several variables in a single Distribution report— are in the same window. The resulting script generates all reports in the window. • Data Table Window brings the data table to the front of the display.
Contingency Analysis—The Categorical by Categorical Case If both the X and Y variables are categorical, JMP Student Edition produces a contingency analysis that initially shows a mosaic plot, contingency table (sometimes referred to as a crosstabs table), and a table of chi-square tests. Figure 3.22 Contingency Analysis
Figure 3.23 Contingency Popup Menu
The popup menu for contingency analyses contains items to turn parts of the report on and off. Mosaic Plot, Contingency Table, and Tests all operate as toggles. Display Options > Horizontal Mosaic rotates the mosaic plot 90 degrees. The final item, Script, is explained in the section "Script Submenu," p. 66.
The contingency table itself has a popup menu to turn its cell contents on and off.
Figure 3.24 Contingency Table Popup Menu
• Count is the cell frequency, margin total frequencies, and grand total (total sample size).
• Total % is the percentage of cell counts and margin totals to the grand total.
• Row % is the percentage of each cell count to its row total.
• Col % is the percentage of each cell count to its column total.
• Expected is the expected frequency of each cell under the assumption of independence. It is computed as the product of the corresponding row total and column total, divided by the grand total.
• Deviation is the observed (actual) cell frequency minus the expected cell frequency.
• Cell ChiSq is the chi-square value computed for each cell as (Observed – Expected)² / Expected.
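As a hedged illustration of these cell quantities, the short sketch below computes expected frequencies, deviations, cell chi-square values, and the overall Pearson chi-square with SciPy. The two-way table of counts is made up, not taken from any JMP sample data.

```python
# Illustrative only: expected counts, deviations, and cell chi-square values
# for a small hypothetical two-way table of counts.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[20, 15, 10],
                     [30, 25, 40]])

chi2, p, df, expected = chi2_contingency(observed, correction=False)

deviation = observed - expected            # Deviation: observed minus expected
cell_chisq = deviation**2 / expected       # Cell ChiSq: (Observed - Expected)^2 / Expected

print("Expected:\n", np.round(expected, 2))
print("Cell ChiSq:\n", np.round(cell_chisq, 3))
print(f"Pearson chi-square = {chi2:.3f} on {df} df, p = {p:.4f}")
```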
Logistic Regression—The Categorical by Continuous Case
If the Y variable is categorical and the X variable is continuous, JMP Student Edition produces a logistic analysis that initially shows a logistic plot and text reports. The cumulative logistic probability plot gives a complete picture of what the logistic model is fitting. At each x value, the probability scale in the y direction is divided up (partitioned) into probabilities for each response category. The probabilities are measured as the vertical distance between the curves, and the probabilities across all Y categories sum to 1.
Figure 3.25 Interpreting the Logistic Plot (the figure labels P(thread wear is severe), P(thread wear is moderate), and P(thread wear is low) at load size 250; these three probabilities sum to one)
Markers for the data are drawn at their x-coordinate, with the y position jittered randomly within the range corresponding to the response category for that row. You can see that the points tend to push the lines apart and make vertical space where they occur in numbers, and allow the curves to get close together where there is no data. The data pushes the curves apart because the criterion that is maximized is the product of the probabilities fitted by the model; the fit tries to avoid attributing a small probability to any point, which happens when points are crowded by the curves of fit. There are only a couple of options in the Logistic drop-down menu (see Figure 3.26)—turning the plot on and off through the Logistic Plot command, and Script options. (See "Script Submenu," p. 66 for details on scripting options.)
The Whole Model Test Table
The Whole Model Test table shows whether the model fits better than constant response probabilities. This table is analogous to the Analysis of Variance table for a continuous response model. It is a specific likelihood-ratio Chi-square test that evaluates how well the categorical model fits the data. The negative sum of logs of the observed probabilities is called the negative log-likelihood (–LogLikelihood). The negative log-likelihood for categorical data plays the same role as sums of squares in continuous data. The difference in the log-likelihood between the model fitted to the data and the model with equal probabilities is a Chi-square statistic. This test statistic examines the hypothesis that the x variable has no effect on the response.
In Figure 3.25, the p-value of the Chi-square statistic is 0.3657, which is not statistically significant.
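The sketch below is one way to reproduce this kind of whole-model test outside JMP: it fits a two-level logistic model with statsmodels on simulated data and forms the likelihood-ratio chi-square as twice the difference in log-likelihood between the fitted model and an intercept-only (constant probabilities) model. The variable names and data are invented for illustration.

```python
# Illustrative only: whole-model likelihood-ratio chi-square for a logistic fit,
# compared against an intercept-only (constant probabilities) model.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(1)
x = rng.uniform(100, 400, size=80)               # a continuous X, e.g. a load size
p_true = 1 / (1 + np.exp(-0.02 * (x - 250)))     # true event probability
y = rng.binomial(1, p_true)                      # a two-level categorical response

full = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
reduced = sm.Logit(y, np.ones((len(y), 1))).fit(disp=0)

lr_chisq = 2 * (full.llf - reduced.llf)          # likelihood-ratio chi-square
p_value = chi2.sf(lr_chisq, df=1)                # one parameter beyond the intercept

print(f"-LogLikelihood (reduced) = {-reduced.llf:.3f}")
print(f"-LogLikelihood (full)    = {-full.llf:.3f}")
print(f"LR chi-square = {lr_chisq:.3f}, p = {p_value:.4f}")
```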
Platform Options
The menu of platform options is shown in Figure 3.26. They turn various plot elements on and off, and allow adjustment of the line color. In addition, the standard Script menu appears.
Figure 3.26 Logistic Plot Popup Menu
4 The Matched Pairs Platform Some two-variable data have a natural pairing to them. A classic example is a before-and-after study of the effect of a medication. Data in this form are handled by the Matched Pairs platform.
Introduction After starting JMP Student Edition, Open the data file Denim.jmp. Details about this data are found in “About the Data,” p. 21 in the “The Distribution Platform” chapter.
Preparing the Data This example examines the starch content of blue jeans, with one group having been sand blasted and the other not. The jeans being compared come from the same lot, so they form a paired situation that calls for the Matched Pairs platform. To use the Matched Pairs platform, the paired data must be in two columns. However, in the Denim file, all the starch data is in the single column Starch Content (%). Therefore, the column needs to be split into two starch columns, based on whether the denim was sand blasted or not. To split the data in this way, Select Tables > Split. In the dialog that results, select Sand Blasted? from the list of columns and click Split By. Select Starch Content(%) from the list of columns and click Split Columns. Select Lot Number and Method and click Group. At this point, the Split command is set to make a new data table, having split Starch Content (%) into two columns based on the value in the Sand Blasted? column. Not all of the original variables are used in the forthcoming example analysis, so they do not need to be included in this new table. In fact, no variables other than the ones already in the dialog need to be retained. To drop the unnecessary variables, Select the Drop All radio button at the bottom of the Split dialog. In the Output table name box, type "Paired Denim" to name the new data table. The dialog should appear like the one in Figure 4.1.
Figure 4.1 Split Columns Dialog
Click OK to create the data table. The data table appears as in Figure 4.2, with new columns no and yes containing starch information. Figure 4.2 Paired Denim Data
Launching the Platform Select Analyze > Matched Pairs from the menu bar. This brings up the Matched Pairs platform launch dialog as shown in Figure 4.4.
Select the no and yes variables from the columns list and click the Y, Paired Response button. The resulting report is easily interpretable, as shown in Figure 4.3.
Figure 4.3 Paired Denim Matched Pairs Report (in the plot, the red line is the difference given by the data, the horizontal gray line represents zero, and the dotted lines are a 95% confidence interval on the difference—if the interval does not contain zero, the difference is significant)
The text reports below this plot show the same result—there is a difference in starch content of denim based on sand blasting, with a p-value of 0.002.
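For readers who want to verify this sort of result by hand, the sketch below runs a paired t-test with SciPy on two short made-up columns standing in for the no and yes starch values; it is not the Denim data.

```python
# Illustrative only: a paired t-test on two hypothetical columns of
# starch content, analogous to the Matched Pairs report.
import numpy as np
from scipy import stats

no_blast  = np.array([22.1, 24.3, 21.8, 23.5, 25.0, 22.7])   # not sand blasted
yes_blast = np.array([19.4, 21.0, 19.9, 20.8, 22.1, 20.2])   # sand blasted

t, p = stats.ttest_rel(no_blast, yes_blast)
diff = no_blast - yes_blast
print(f"mean difference = {diff.mean():.3f}")
print(f"t = {t:.3f}, p = {p:.4f}")
```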
The Matched Pairs Launch Dialog The Matched Pairs platform launch dialog (Figure 4.4) requires at least two variables to be entered. These two variables are the values that are paired. Figure 4.4 Matched Pairs Launch Dialog
Optionally, a grouping variable can be entered in the X, Grouping role to have JMP Student Edition estimate means for the groups, and test both between and among the pairs. See the online help for an example using a grouping variable.
The Matched Pairs Scatterplot After it is launched, the Matched Pairs platform displays a scatterplot and numerical results. The primary graph in the platform is a plot of the difference of the two responses on the y-axis, and the mean of the two responses on the x-axis. This graph is the same as a scatterplot of the two original variables, but turned 45° clockwise (see Figure 4.5). A 45° rotation turns the original coordinates into a difference and a sum. By rescaling, this plot shows the difference between the two variables, and the mean of the two variables. See the online help for more details of this transformation.
Figure 4.5 Comparison of Scatterplot and Matched Pairs Plot
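The claim about the 45° rotation is easy to check numerically. In the sketch below, which uses made-up pairs, rotating the (Y1, Y2) coordinates shows that the new axes are just rescaled versions of the pair means and the pair differences.

```python
# Illustrative only: a 45-degree rotation of (Y1, Y2) pairs produces axes
# proportional to the pair sums and pair differences.
import numpy as np

y1 = np.array([10.0, 12.0, 11.5, 9.8])
y2 = np.array([ 9.0, 13.5, 10.0, 9.5])

theta = np.radians(45)
rotation = np.array([[np.cos(theta),  np.sin(theta)],
                     [-np.sin(theta), np.cos(theta)]])
rotated = rotation @ np.vstack([y1, y2])

print(np.allclose(rotated[0] / np.sqrt(2), (y1 + y2) / 2))   # x-axis ~ mean of each pair
print(np.allclose(rotated[1] * np.sqrt(2), y2 - y1))         # y-axis ~ difference of each pair
```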
Notice the following in Figure 4.6:
• The 45° tilted square shows the frame of the scatterplot of the original columns.
• The mean difference is shown as the horizontal line, with the 95% confidence interval above and below. If the confidence region includes the horizontal line at zero, then the means are not significantly different at the 0.05 level. In the example shown in Figure 4.6, the difference is significant.
• The mean of the mean of pairs is shown by the vertical line.
Figure 4.6 The Matched Pairs Scatterplot (labeled elements: the 95% confidence interval, the mean difference, the line where the two variables are equal, and the mean of means)
The Matched Pairs menu, shown in Figure 4.7, allows two plot options — plotting the difference by the mean, as in Figure 4.6, or plotting the difference by the row number. The square reference frame can also be toggled on and off, and standard scripting items are available. See "Script Submenu," p. 66 for details on the Script submenu. Figure 4.7 The Matched Pairs Menu
Interpreting the Matched Pairs Plot
There are many possibilities for making statements regarding the patterns to be discovered in the new, rotated coordinates. The examples below show several different situations and their interpretations.
Figure 4.8 No Change
The distribution vertically is small and centered at zero. The change from Y1 to Y2 is not significant. This is the high-positive-correlation pattern that is the typical situation. Figure 4.9 Highly Significant Shift Down
The Y2 score is consistently lower than Y1 across all subjects.
Figure 4.10 No Average Shift, But Amplified Relationship
This situation shows a low variance of the difference, and high variance of the mean of the two values within a subject. Overall, the mean is the same from Y1 to Y2, but individually, the high scores got higher and the low scores got lower. Figure 4.11 No Average Shift, But Reverse Relationship
This example shows a high variance of the difference, and low variance of the mean of the two values within a subject. Overall, the mean is the same from Y1 to Y2, but the high Y1s are associated with low Y2s, and vice-versa. This is a high-negative-correlation pattern, and is unusual.
Figure 4.12 No Average Shift, Positive Correlation, but Damped Instead of Accelerated
Overall, the mean is the same from Y1 to Y2, but the high scores drop a little, and low scores increase a little.
5 The Fit Model Platform General linear models—those that have more complicated forms than can be fit with simple linear regression—are fit with the Fit Model platform. Standard least squares fits, including stepwise procedures, are performed using this single platform.
Introduction After starting JMP Student Edition, Open the data file Denim.jmp. Details about this data are found in “About the Data,” p. 21 in the “The Distribution Platform” chapter. In this introduction, models are developed to determine which variables (if any) are predictors of the starch content of denim, and which are predictors of thread wear in denim.
Launching the Platform Select Analyze > Fit Model from the menu bar. This brings up the Fit Model dialog, which is illustrated in Figure 5.1. To begin with, fit a simple model with only main effects—no interactions, no powers. In the list of columns, select Starch Content (%) and click the Y button. To select all the model effects together, click on Method, hold down the Shift key, click on Sand Blasted?, then click the Add button. Click Run Model. Unlike the other platform launchers in JMP Student Edition, the Fit Model dialog does not go away once the model is run. To see that it is still available, Click Window in the top menu bar to see a list of open windows.
Figure 5.1 The Fit Model Dialog
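As a rough parallel outside JMP, a main-effects-only model can be fit with statsmodels' formula interface. The sketch below uses simplified, made-up stand-ins for the Denim columns (starch, method, load, blasted); it is not the JMP analysis itself.

```python
# Illustrative only: a main-effects-only linear model (no interactions, no powers),
# analogous to the Fit Model specification described above.
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data with simplified stand-in column names
df = pd.DataFrame({
    "starch":  [22.1, 19.4, 24.3, 21.0, 21.8, 19.9, 23.5, 20.8],
    "method":  ["Caustic Soda", "Pumice Stone", "Caustic Soda", "Alpha Amalyze",
                "Pumice Stone", "Alpha Amalyze", "Caustic Soda", "Pumice Stone"],
    "load":    [250, 300, 350, 250, 350, 300, 300, 250],
    "blasted": ["no", "yes", "no", "yes", "no", "yes", "yes", "no"],
})

# Categorical effects expand automatically; only main effects are included
fit = smf.ols("starch ~ C(method) + load + C(blasted)", data=df).fit()
print(fit.params)      # one coefficient per term (nominal effects are expanded)
print(fit.pvalues)
```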
Setting Titles The report generated here is used later in this presentation, so it will need to be identified among the other windows. To make identification easy, change its title to something meaningful as follows. Make sure the report window is the front window. Select Window > Set Title. (Window > Set Report Title on Macintosh) In the resulting dialog box, type "Main Effects Only". Title bars within the reports are also editable. Double click on the title bar that says Response Starch Content (%). Again type "Main Effects Only" and press Enter.
Examining Results Now, examine these results. There is a lot of information in this report, and although only a portion is used in this example, all of it is documented in sections that follow. In this initial look at the data, first check if the model as a whole is significant. Then, look at the p values associated with each effect, printed in several places, including just below each leverage plot. Leverage plots are detailed in “Leverage Plots,” p. 91. The first leverage plot is for the entire model, and its p value indicates that the model is significant. To examine the p values for the individual effects, scroll the window until each effect is visible. Scrolling is accomplished using scroll bars along the edges of the window, or by using the Scroller tool. Figure 5.2 The Scroller Tool
To scroll using the Scroller tool, Shift-click the Scroller tool in the Tools toolbar. Shift-clicking a tool keeps the tool active for multiple clicks. Without Shift-clicking, JMP Student Edition reverts to the arrow tool after a tool's first use. Move the Scroller tool over the results report.
Hold the mouse button and move the scroller tool to see the window move. Try moving the mouse while releasing the mouse button, repeatedly, in short movements. (That is, several short click-and-drags.) This illustrates the “inertia” that the scroller tool imparts on reports.
Least Squares Means Least squares means, called LSMeans by JMP Student Edition, show the values of the response (starch content in this case) for levels of a nominal effect. The response values are adjusted for the other terms in the model, so that the effect of each variable can be examined. By default, the LSMeans are displayed in this example model. To see a plot of the LSMeans, Select LSMeans Plot from the drop-down menu on the title bar of an effect. Figure 5.3 LS Means Table and Plot
In this case, the plot suggests that when controlled for the other effects in the model, the starch content for Caustic Soda is higher than Pumice Stone, which is in turn higher than Alpha Amalyze.
Re-running an Analysis After moving around the report a bit and observing the p-values for each effect, it should be clear that they are all significant at the 0.05 level. However, this analysis is fairly primitive, since it does not consider any interactions among the variables. Remove the existing effects from the model and re-run the analysis with interactions by doing the following. Click on the Window menu and select the Fit Model dialog. There are two ways to remove effects from the model. We want to remove all the effects in this case. Select one of the variables and click the Remove button located above the effects list. Double-click on each of the other variables. Now, add in an interaction effect. Select Method and Size of Load (lbs) in the list of columns and click the Cross button.
Since it is rather tedious to specify all main effects and all crossed effects one at a time, JMP Student Edition provides some pre-defined macros to add popular effects combinations to models. These macros are completely discussed in "Macros," p. 86. For now, request a full factorial model—all main effects with all possible interactions.
Select Method, Size of Load (lbs), and Sand Blasted? in the model effects list, remembering that the Control (Windows) or Command (Macintosh) key allows for multiple selections. Click the Macros button and select Full Factorial from the popup menu.
The appropriate effects appear in the effects list. Make sure Starch Content (%) is still in the Y role at the top of the launch dialog.
Click Run Model. Another report appears, this time much larger. A prudent model maker would, at this point, examine the p values of each effect, remove non-significant effects one at a time from the model, and re-run the model, repeating the process until all remaining effects are significant.
For example, noting that neither of the levels of the Method*Size of Load*Sand Blasted effect are significant at the 0.05 level, Again bring the Fit Model dialog to the front using the Window menu. Remove the Method*Size of Load*Sand Blasted effect. Click Run Model. This is exactly the reason that the Fit Model dialog persists, even after clicking the Run Model button. Many models have to be tweaked after initial results are examined. If you examine the new model results, you see that the Method*Size of Load effect levels are not significant, so it can be removed from the model as well. This iterative procedure may be repeated several times.
Linear Contrasts Another common task is to test that levels within an effect are different from each other. This is accomplished by using linear contrasts. For example, to test that the Alpha Amalyze wash method is significantly different from the Pumice Stone wash method, Select LS Means Contrast from the drop-down menu in the title bar of the Method variable. JMP Student Edition attaches a Contrast Dialog (Figure 5.4) to the report, where details of the linear contrast are specified. Figure 5.4 Contrast Dialog
Click the + button once next to Alpha Amalyze.
Click the - button once next to Pumice Stone.
Click Done.
This test shows a highly significant p-value, giving confirmation that Alpha Amalyze is significantly different from Pumice Stone in affecting the resulting starch content of denim.
The Fit Model Dialog Box Regardless of the model to be fit, the Fit Model dialog box is the first step. The dialog box is completely illustrated in Figure 5.1 on page 80. This is where the roles of each variable get specified and the type of fit is selected. In JMP Student Edition, the type of fit (standard least squares, nominal logistic, or ordinal logistic) is referred to as the fitting personality. The Fit Model dialog is different from JMP Student Edition’s other launch dialogs in that it does not disappear after the model is launched. This facilitates experimentation with the model. If one of the variables is not significant, it can be removed and the model re-run quickly. To remove a variable from its role, highlight it and click Remove, or, alternatively, double-click on the variable’s name.
Roles To assign a variable to a role, select the variable name and click the appropriate button. The roles a variable can take are: • Y, which identifies one or more response variables (the dependent variables) • Weight, an optional role that identifies a column whose values signify the importance of each row in the model
• Freq, an optional role that identifies a column whose values designate the frequency of rows in the analysis
Model Effects Effects are added to the model by using the buttons in the Construct Model Effects section of the dialog. To add a simple regressor to the model, select the variable name and click the Add button. To add a crossed effect to a model, select the two variables to be crossed (use Control-click or Command-click for multiple selections) in the Select Columns list and click the Cross button. When levels of an effect (call it B) only occur within a single level of an effect (call it A), then B is said to be nested within A, and A is called the outside effect. To add a nested effect,
• Select the outside effects in the column selection list and click Add or Cross.
• When the outside effect appears in the Model Effects list, select it again.
• Select the nested variable in the column selection list and click Nest.
Macros Common models can be generated using the macros drop-down list. Figure 5.5 Macros drop-down list
The following models are available: Full Factorial To look at many crossed factors, such as in a factorial design, use Full Factorial. It creates the set of effects corresponding to all crossings of all variables selected in the columns list. For example, with selected variables A, B, and C, the Full Factorial selection places A, B, C, A*B, A*C, B*C, and A*B*C in the Model Effects list. Factorial to Degree To create a limited factorial, select Factorial to Degree and enter the degree of interactions in the Degree box. A second degree factorial is a very common analysis.
Factorial Sorted The Factorial Sorted selection creates the same set of effects as Full Factorial, but lists them in order of degree. All main effects are listed first, followed by all two-way interactions, then all three-way interactions, and so forth. Response Surface Response surface models find the values of the terms that produce a maximum or a minimum expected response. This is accomplished by fitting a collection of terms in a quadratic model. The critical values for the surface are calculated from the parameter estimates and presented with a report on the shape of the surface. To specify a Response Surface effect, select the variable name, then select Response Surface Effect from the Attributes menu. Response surface effects appear with an ampersand (&) appended to their name. Mixture Response Surface Mixture response surface variables are selected in the same way as Response Surface Effect variables. Select Mixture Response Surface from the Attributes menu after selecting a variable name. Polynomial to Degree Polynomial effects are a series of terms that are powers of a single variable. To specify a polynomial effect,
• click one or more variables in the column selection list
• enter the degree of the polynomial in the Degree box
• select the Polynomial to Degree command in the Macros popup menu.
Scheffe Cubic Scheffe cubic models are an advanced topic not usually covered in an introductory course. See the JMP help if you are interested in Scheffe cubics.
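To make the Full Factorial expansion described above concrete, the sketch below generates the same set of effect names for placeholder variables A, B, and C with plain Python; it only mimics the naming convention, not JMP's model machinery.

```python
# Illustrative only: generate the full-factorial set of model effects
# (all main effects plus all possible crossings) for a list of variables.
from itertools import combinations

def full_factorial(variables):
    effects = []
    for k in range(1, len(variables) + 1):
        for combo in combinations(variables, k):
            effects.append("*".join(combo))
    return effects

print(full_factorial(["A", "B", "C"]))
# ['A', 'B', 'C', 'A*B', 'A*C', 'B*C', 'A*B*C']
```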
Fitting Personalities The three available personalities for model fitting are available in the Personality drop-down list, shown in Figure 5.6. Figure 5.6 Fitting Personalities
• Standard Least Squares models one or more continuous responses in the usual way through fitting a linear model by least squares.
• Stepwise regression is an approach to selecting a subset of effects for a regression model. The Stepwise feature computes estimates that are the same as those of the Standard Least Squares personality, but it facilitates searching and selecting among many models. The Stepwise personality allows only one continuous Y. Multiple categorical responses call for MANOVA or other advanced methods, only available in the professional version of JMP.
Emphasis Choices The Emphasis popup menu controls which plots and tables are initially shown in the analysis report: • Effect Leverage initially displays leverage and residual plots for the whole model. Select effect details and other statistical reports from the report itself. • Effect Screening shows whole-model information, followed by a scaled parameter report and the Prediction Profiler. • Minimal Report suppresses all plots. Request plots and reports from the report itself.
Run Model The Run Model button submits the model to the fitting platform, but does not close the dialog window. Use the dialog to make changes to the model for additional fits, or make changes to the data and then refit the same model.
Fit Model Report Items When a model is fit with the Standard Least Squares or Stepwise personality, several reports appear based on the Emphasis selected in the Fit Model dialog. Any report that does not appear by default can be requested from the platform menu.
Regression Reports
Regression reports give textual information about the fit.
The Summary of Fit Table
The Summary of Fit table appears first and shows the following numeric summaries of the response for the multiple regression model:
Rsquare (R2) estimates the proportion of the variation in the response around the mean that can be attributed to terms in the model, rather than to random error. It is also the square of the correlation between the actual and predicted response. An R2 of 1 occurs when there is a perfect fit (the errors are all zero). An R2 of 0 means that the fit predicts the response no better than the overall response mean.
Rsquare Adj adjusts R2 to make it more comparable over models with different numbers of parameters. Since adding terms to an existing model always increases R2, this adjustment compensates for adding terms to a model that already has terms in it. It is a ratio of mean squares instead of sums of squares.
Root Mean Square Error estimates the standard deviation of the random error. It is the square root of the mean square for error in the corresponding analysis of variance table, and it is commonly denoted as s.
The Mean of Response is the overall mean of the response values. It is important as a base model for prediction because all other models are compared to it. The variance measured around this mean is the Corrected Total (C Total) mean square in the Analysis of Variance table.
Observations (or Sum of Weights) records the number of observations used in the fit. If there are no missing values and no excluded rows, this is the same as the number of rows in the data table. If there is a column assigned the role of weight, this is the sum of that column's values. Weights are used in weighted least squares — an advanced topic.
The Analysis of Variance Table
The Analysis of Variance table shows the basic calculations for a linear model. The table compares the model to a model containing only the mean:
Source lists the three sources of variation, called Model, Error, and C Total.
DF records the associated degrees of freedom for each source of variation.
The C Total degrees of freedom is for the simple mean model. There is only one degree of freedom used (the estimate of the mean parameter) in the calculation of variation, so the C Total DF is always one less than the number of observations. The total degrees of freedom are partitioned into the Model and Error terms: • The Model degrees of freedom is the number of parameters (except for the intercept) used to fit the model. • The Error DF is the difference between the C Total DF and the Model DF. Sum of Squares records an associated sum of squares for each source of variation. Each is the sum of squares of the differences between the fitted response and the actual response.
• The Total (C Total) SS is the sum of squared distances of each response from the sample mean.
• The Error SS is the sum of squared differences between the fitted values and the actual values. This sum of squares corresponds to the unexplained error (residual) after fitting the regression model. A Mean Square is a sum of squares divided by its associated degrees of freedom. This computation converts the sum of squares to an average. The F Ratio is the model mean square divided by the error mean square. It tests the hypothesis that all the regression parameters (except the intercept) are zero. If there is a significant effect in the model, the F Ratio is higher than expected by chance alone. Prob>F is the probability of obtaining a greater F value by chance alone if the specified model fits no better than the overall response mean. Significance probabilities of 0.05 or less are often considered evidence that there is at least one significant regression factor in the model.
Note that large values of Model SS and small values of Error SS lead to large F ratios and low p values— desirable if the goal is to declare that terms in the model are significantly different from zero. Most practitioners check this F test first and make sure that it is significant before delving further into the details of the fit. This significance is also shown graphically by the whole-model leverage plot, described in “Leverage Plots,” p. 91. The Lack of Fit Table The Lack of Fit table shows a special diagnostic test and appears only when the data and the model provide the opportunity. Sometimes, it is possible to estimate the error variance independently of whether the right form of the model is the one under consideration. This occurs when observations are exact replicates of each other in terms of the X variables. The error for these exact replicates is called pure error. This is the portion of the sample error that cannot be explained or predicted no matter which form the model uses for the X variables. The difference between the residual error from the model and the pure error is called lack of fit error. A lack of fit error can be significantly greater than pure error if a regressor is in the model with the wrong functional form, or if too few interaction effects exist in an analysis of variance model. In these cases, consider adding interaction terms, if appropriate, to try to better capture the functional form of a regressor. There are two common situations where there is no lack of fit test: • There are no exactly replicated points with respect to the X data, and therefore there are no degrees of freedom for pure error. • The model is saturated, meaning that the model itself has a degree of freedom for each different X value; therefore, there are no degrees of freedom for lack of fit. The Lack of Fit table shows information about the error terms: Source lists the three sources of variation called Lack of Fit, Pure Error, and Total Error. Note that the pure error DF is pooled from each group where there are multiple rows with the same values for each effect.
The remaining portions of the Lack of Fit table are similar to those of the Analysis of Variance Table. The only additional information is the Max RSq, the maximum R2 that can be achieved by using only the variables in the model.
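The Summary of Fit and Analysis of Variance quantities defined above can be computed directly from a response vector and its fitted values. The sketch below does this with NumPy on made-up numbers; p here denotes the number of model parameters including the intercept, and the fitted values would normally come from an actual least squares fit.

```python
# Illustrative only: Summary of Fit and ANOVA quantities computed from a
# response vector and its fitted values (both made up here).
import numpy as np
from scipy import stats

y    = np.array([12.0, 14.5, 13.2, 15.1, 16.0, 14.8, 13.9, 15.6])
yhat = np.array([12.4, 14.1, 13.5, 15.0, 15.5, 14.9, 14.2, 15.3])
p = 3                                  # parameters in the model, including the intercept
n = len(y)

sst = np.sum((y - y.mean())**2)        # C Total SS
sse = np.sum((y - yhat)**2)            # Error SS
ssm = sst - sse                        # Model SS

df_model, df_error = p - 1, n - p
ms_model, ms_error = ssm / df_model, sse / df_error

rsq = 1 - sse / sst
rsq_adj = 1 - (sse / df_error) / (sst / (n - 1))
rmse = np.sqrt(ms_error)               # Root Mean Square Error
f = ms_model / ms_error
prob_f = stats.f.sf(f, df_model, df_error)

print(f"RSquare = {rsq:.4f}, RSquare Adj = {rsq_adj:.4f}, RMSE = {rmse:.4f}")
print(f"F Ratio = {f:.3f} on ({df_model}, {df_error}) df, Prob > F = {prob_f:.4f}")
```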
The Parameter Estimates Table
The Parameter Estimates table shows the estimates of the parameters in the linear model and a t-test for the hypothesis that each parameter is zero. Simple continuous regressors have only one parameter. Models with complex classification effects have a parameter for each anticipated degree of freedom. The Parameter Estimates table shows these quantities:
Term names the estimated parameter. The first parameter is always the intercept. Simple regressors show as the name of the data table column. Regressors that are dummy indicator variables constructed from nominal or ordinal effects are labeled with the names of the levels in brackets. The dummy variables are coded as 1, except for the last level, which is coded as –1 across all the other dummy variables for that effect.
Estimate lists the parameter estimates for each term.
Std Error is the standard error, an estimate of the standard deviation of the distribution of the parameter estimate. This is the value used to construct t-tests and confidence intervals for the parameter.
t Ratio is a statistic that tests whether the true parameter is zero. It is the ratio of the estimate to its standard error.
Prob>|t| is the probability of getting a greater t statistic (in absolute value), given the hypothesis that the parameter is zero. This is the two-tailed test against the alternatives in each direction. Probabilities less than 0.05 are often considered as significant evidence that the parameter is not zero.
The Effect Test Table The Effect Test table shows the following information for each effect: Source lists the names of the effects in the model. Nparm is the number of parameters associated with the effect. Continuous effects have 1 parameter.
Nominal effects have one less parameter than the number of levels. Crossed effects multiply the number of parameters for each term. Nested effects depend on how levels occur. DF is the degrees of freedom for the effect test. Note that if DF is zero, no part of the effect is testable. Whenever DF is less than Nparm, the note Lost DFs appears to the right of the line in the report. Sum of Squares is the sum of squares for the hypothesis that the listed effect is zero. F Ratio is the F statistic for testing that the effect is zero. It is the ratio of the mean square for the effect divided by the mean square for error. Prob>F is the significance probability for the F ratio. It is the probability that if the null hypothesis is true, a larger F statistic would only occur due to random error. Values less than 0.0005 appear as 0.0000.
Leverage Plots Leverage plots reveal the significance of an effect in the model. These plots show point-by-point what the residual would be both with and without that effect in the model (See Figure 5.8). The fitting platform produces a leverage plot for each effect in the model. An example leverage plot is shown in Figure 5.7.
Figure 5.7 Example Leverage Plot from Denim Data
In addition, there is a special leverage plot titled Whole Model that shows the actual values of the response plotted against the predicted values. This Whole Model leverage plot dramatizes the test that all the parameters (except intercepts) in the model are zero. This illustrates the same test reported in the Analysis of Variance report. In general, the horizontal line on the plot represents what the values of the model would be if the effect was removed from the model. The sloped line represents the values of the model with the effect included. Significance of the effect is seen by comparing the slope of the sloped line with that of the horizontal one, as in Figure 5.9.
Figure 5.8 General Leverage Plot (annotations: the residual; the residual constrained by the hypothesis; points farther out pull on the line of fit with greater leverage than points near the middle)
Figure 5.9 Significance of Effects (Significant: confidence curve crosses the horizontal line; Borderline: confidence curve asymptotic to the horizontal line; Not Significant: confidence curve does not cross the horizontal line)
Effect Details
In a Standard Least Squares analysis, the following effect details are available, dealing with least squares means, designated LS Means by JMP Student Edition. More detailed descriptions of each command are available in the online help. Figure 5.10 Effect Details
• LS Means Table shows predicted values from the specified model across the levels of a categorical effect. The other model factors are controlled—that is, set to neutral values. Least squares means show which levels produce higher or lower responses, holding the other variables in the model constant. Least squares means are also called adjusted means or population marginal means.
• LS Means Plot plots the LSMeans for nominal and ordinal main effects and two-way interactions.
• LS Means Contrast displays a dialog for specifying contrasts with respect to an effect. (See "Linear Contrasts," p. 84 for an example of using contrasts.) This command is enabled only for categorical effects. To construct a contrast, click the + and - buttons beside the levels to be compared. If possible, the dialog normalizes after each click to make the sum for a column zero and the absolute sum equal to two. It adds to the plus or minus score proportionately.
• The LS Means Student's t command requests multiple comparison tests.
Exploring the Estimates The following commands allow you to further explore the estimated coefficients of the model. Expanded Estimates The standard Fit Model output includes a Parameter Estimates Table, as seen in Figure 5.11. For continuous effects, the estimates are the (estimated) coefficients of each term in the linear model. For nominal effects, the estimates are the coefficients of dummy variables whose value is 1 for all levels of the variable except the last, which gets the value -1. Ordinal effects show coefficients for dummy variables that measure the difference at levels of the variable from the mean of all levels of the effect. Figure 5.11 Parameter Estimates Table
The Expanded Estimates command shows the same information, but with a coefficient for each continuous variable and each level of other variables.
Figure 5.12 Expanded Estimates Table
Compare the expanded estimates with the prediction formula for this model, shown here.
Custom Test In introductory statistics courses, null hypotheses are often about one variable at a time, frequently hypothesizing that a parameter is zero. However, it is possible to test far more complicated null hypotheses than this. For example, it is reasonable to test that several parameters are zero, one, or another value, or that some parameters are equal to others. These tests are known in statistics as general linear hypotheses, and are tested using JMP's Custom Test command. To test a custom hypothesis, select Custom Test from the Estimates popup menu, which displays the dialog shown in Figure 5.13. Figure 5.13 Custom Test Launch (you can enter a descriptive label for the test—useful if you are doing several tests)
The space beneath the Custom Test title bar is an editable area for entering a test name.
Parameter lists the names of the model parameters. To the right of the list of parameters are columns of zeros corresponding to these parameters. Click in these cells to enter a new hypothesized parameter value corresponding to the desired test.
One of the parameters is labeled “=”. In the edit box to its right, enter the value that you are testing the contrast against. For example, you may be testing that a certain combination of the factors sums to 1. You would enter a 1 beside the “=” in this dialog. Add Column adds another column of zeros so that several linear functions of the parameters can be jointly tested. Use the Add Column button to add as many columns to the test as needed.
When the test is specified, click Done to see the test performed. The results are appended to the bottom of the dialog. When the custom test is done, the report lists the test name, the function value of the parameters tested, the standard error, and other statistics for each test column in the dialog. A joint F test for all columns is at the bottom. Sample output for a custom test (that the Size of Load coefficient is equal to 1) is shown in Figure 5.14. Figure 5.14 Custom Test Output
Note: For tests within a categorical effect, instead of using a Custom test, consider using the contrast dialog, which tests hypotheses about the least squares means.
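General linear hypotheses of this kind can also be checked with statsmodels. The sketch below fits a small made-up regression and tests that the coefficient on a hypothetical load variable equals 1, mirroring the custom test described above; the column names are invented.

```python
# Illustrative only: a general linear hypothesis test that one coefficient
# equals 1, analogous to the Custom Test example described above.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
df = pd.DataFrame({"load": rng.uniform(200, 400, 30)})
df["starch"] = 5 + 1.1 * df["load"] + rng.normal(0, 20, 30)   # made-up response

fit = smf.ols("starch ~ load", data=df).fit()
print(fit.f_test("load = 1"))    # joint F test of the stated linear constraint
```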
Correlation of Estimates The Correlation of Estimates option in the Estimates platform menu produces a correlation matrix for all the effects in a model.
Row Diagnostics Leverage Plots (the Plot Actual by Predicted and Plot Effect Leverage commands) are covered previously in this chapter under “Leverage Plots,” p. 91.
• Plot Actual by Predicted displays the observed values by the predicted values of Y. This is the leverage plot for the whole model. • Plot Effect Leverage produces a leverage plot for each effect in the model showing the point-by-point composition of the test for that effect. • Plot Residual By Predicted displays the residual values by the predicted values of Y. You typically want to see the residual values scattered randomly about zero. • Plot Residual By Row displays the residual value by the row number of its observation. • Durbin-Watson Test displays the Durbin-Watson statistic to test whether or not the errors have first-order autocorrelation. The autocorrelation of the residuals is also shown. The Durbin-Watson table has a popup command that computes and displays the exact probability associated with the statistic. This Durbin-Watson table is only appropriate for time series data when you suspect that the errors are correlated across time.
Save Commands The Save submenu offers the following choices. Each selection generates one or more new columns in the current data table titled as shown, where colname is the name of the response variable:
Prediction Formula creates a new column, called Pred Formula colname, containing the predicted values computed by the specified model. It differs from the Save Predicted Values column in that the prediction formula is saved with the new column. This is useful for predicting values in new rows or for obtaining a picture of the fitted model. Use the Column Info command and click the Edit Formula button to see the prediction formula. The prediction formula can require considerable space if the model is large. If you do not need the formula with the column of predicted values, use the Save Predicted Values option.
Predicted Values creates a new column called Predicted colname that contains the predicted values computed by the specified model.
Residuals creates a new column called Residual colname containing the residuals, which are the observed response values minus predicted values.
Mean Confidence Interval creates two new columns called Lower 95% Mean colname and Upper 95% Mean colname. The new columns contain the lower and upper 95% confidence limits for the line of fit. Note: If you hold down the Shift key and select Save Mean Confidence Interval, you are prompted to enter an α-level for the computations.
Individual Confidence Interval creates two new columns called Lower95% Indiv colname and Upper95% Indiv colname. The new columns contain lower and upper 95% confidence limits for individual response values. Note: If you hold down the Shift key and select Save Individual Confidence Interval, you are prompted to enter an α-level for the computations.
Studentized Residuals creates a new column called Studentized Resid colname. The new column values are the residuals divided by their standard error.
Std Error of Predicted creates a new column, called StdErr Pred colname, containing the standard errors of the predicted values.
Std Error of Residual creates a new column, called StdErrResid colname, containing the standard errors of the residual values.
Std Error of Individual creates a new column, called StdErr Indiv colname, containing the standard errors of the individual predicted values.
Effect Leverage Pairs creates a set of new columns that contain the values for each leverage plot. The new columns consist of an X and Y column for each effect in the model. The columns are named as follows. If the response column name is R and the effects are X1 and X2, then the new column names are X Leverage of X1 for R, Y Leverage of X1 for R, X Leverage of X2 for R, and Y Leverage of X2 for R.
6 Stepwise Regression Stepwise regression is an approach to selecting a subset of effects for a regression model. It is used when there is little theory to guide the selection of terms for a model and the modeler, in desperation, wants to use whatever seems to provide a good fit. The approach is somewhat controversial. The significance levels on the statistics for selected models violate the standard statistical assumptions because the model has been selected rather than tested within a fixed model. On the positive side, the approach has been of practical use for 30 years in helping to trim out models to predict many kinds of responses. The book Subset Selection in Regression, by A. J. Miller (1990), brings statistical sense to model selection statistics. This chapter uses the term “significance probability” in a mechanical way to represent that the calculation would be valid in a fixed model, recognizing that the true significance probability could be nowhere near the reported one.
Introduction In JMP, stepwise regression is a personality of the Model Fitting platform—it is one of the selections in the Fitting Personality popup menu on the Model Specification dialog (see Figure 5.6 on page 87). The Stepwise feature computes estimates that are the same as those of other least squares platforms, but it facilitates searching and selecting among many models. As an example, Open the Fitness.jmp data table in the Sample Data folder. This data shows results from an aerobic fitness study. Figure 6.1 shows a partial listing of the Fitness.jmp data table.
Aerobic fitness can be evaluated using a special test that measures the oxygen uptake of a person running on a treadmill for a prescribed distance. However, it would be more economical to find a formula that uses simpler measurements that evaluate fitness and predict oxygen uptake. To identify such an equation, measurements of age, weight, runtime, and pulse were taken for 31 participants who ran 1.5 miles. To find a good oxygen uptake prediction equation, you need to compare many different regression models. The Stepwise platform lets you search through models with combinations of effects and choose the model you want.
Figure 6.1 The Fitness Data Table
Note: For purposes of illustration, certain values of MaxPulse and RunPulse have been changed from data reported by Rawlings (1988, p.105).
Figure 6.2 Model Specification Dialog for a Stepwise Model
To do stepwise regression, Select Fit Model in the Analyze menu. In the Model Specification dialog, Choose Oxy as the Y response Choose Weight, Runtime, RunPulse, RstPulse, and MaxPulse as Effects. Select Stepwise from the Fitting Personality popup menu. Click Run Model.
When the report appears, you are presented with a control panel, used to specify how effects should enter or exit the model. In this example, we want to add any effects that are significant at the 0.25 level or better.
Leave 0.25 as the Prob to Enter. We now have a choice of three stepwise methods: Forward (where effects are added as they become significant), Backward (where effects are removed as they become non-significant), or Mixed, a combination of the two detailed below. This example uses forward selection, so Leave the default Forward selection method. We now want to add significant effects. To add the first detected effect, Click the Step button. After one step, the most significant term Runtime is entered into the model (top Current Estimates table in Figure 6.3). To add all the detected effects automatically (rather than manually with the Step button), Click Go to see the stepwise process run to completion. The bottom table in Figure 6.3 shows that all the terms have been added except RstPulse and Weight, which are not significant at the Prob to Enter value of 0.25 specified in the Stepwise Regression Control Panel. Figure 6.3 Current Estimates Table (top: after one step; bottom: after all steps)
Now that we have selected the effects that contribute to explaining Oxy, we can make a model and examine its analysis. Click Make Model. A Fit Model dialog appears. Click Run Model. This produces a report identical to those seen in “The Fit Model Platform” chapter.
The Stepwise Window When launched, the stepwise platform displays a window that shows three areas: • The Stepwise Regression Control panel, which is an interactive control panel for operating the platform • The Current Estimates table, which shows the current status of the specified model, and provides additional control features • The Step History table, which lists the steps in the stepwise selection. The following sections describe the components of these areas and tell how to use them.
Stepwise Regression Control Panel The Stepwise Regression Control Panel (Control Panel for short), shown next, has editable areas, buttons and a popup menu. You use these dialog features to limit regressor effect probabilities, determine the method of selecting effects, begin or stop the selection process, and create a model.
You use the Control Panel as follows: • Prob to Enter is the significance probability that must be attributed to a regressor term for it to be considered as a forward step and entered into the model. Click the field to enter a value. • Prob to Leave is the significance probability that must be attributed to a regressor term in order for it to be considered as a backward step and removed from the model. Click the field to enter a value. • Direction accesses the popup menu shown here, which lets you choose how you want variables to enter the regression equation.
• Forward brings in the regressor that most improves the fit, given that term is significant at the level specified by Prob to Enter.
• Backward removes the regressor that affects the fit the least, given that term is not significant at the level specified in Prob to Leave.
• Mixed alternates the forward and backward steps. It includes the most significant term that satisfies Prob to Enter and removes the least significant term satisfying Prob to Leave. It continues removing terms until the remaining terms are significant and then it changes to the forward direction.
Buttons on the control panel let you control the stepwise processing:
• Go starts the selection process. The process continues to run in the background until the model is finished.
• Stop stops the background selection process.
• Step stops after each step of the stepwise process.
• Enter All enters all unlocked terms into the model.
• Remove All removes all terms from the model.
• Make Model forms a model for the Model Specification Dialog from the model currently showing in the Current Estimates table. In cases where there are nominal or ordinal terms, Make Model can create new data table columns to contain terms that are needed for the model.
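A minimal sketch of the forward rule described above is shown below, using statsmodels on simulated data with variable names borrowed from the fitness example; it only illustrates the mechanics and is not JMP's implementation.

```python
# Illustrative only: forward selection. At each step, enter the candidate
# term with the smallest p-value, as long as it is below prob_to_enter.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 31
data = pd.DataFrame({c: rng.normal(size=n)
                     for c in ["Runtime", "RunPulse", "MaxPulse", "Weight", "RstPulse"]})
oxy = 50 - 3 * data["Runtime"] + 1.2 * data["MaxPulse"] - data["RunPulse"] + rng.normal(size=n)

def forward_select(y, X, prob_to_enter=0.25):
    entered = []
    while True:
        candidates = [c for c in X.columns if c not in entered]
        if not candidates:
            break
        pvals = {}
        for c in candidates:
            exog = sm.add_constant(X[entered + [c]])
            pvals[c] = sm.OLS(y, exog).fit().pvalues[c]
        best = min(pvals, key=pvals.get)
        if pvals[best] > prob_to_enter:
            break
        entered.append(best)
    return entered

print(forward_select(oxy, data))
```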
Current Estimates Table The Current Estimates table lets you enter, remove, and lock in model effects. The platform begins with no terms in the model except for the intercept, as is shown here. The intercept is permanently locked into the model.
You use check boxes to define the stepwise regression process: • Lock locks a term in or out of the model. Lock does not permit a term that is checked to be entered or removed from the model. Click an effect’s check box to change its lock status. • Entered shows whether a term is currently in the model. You can click a term’s check box to manually bring an effect into or out of the model. The following quantities update continually during the fitting process and are used in determining the final model. • Parameter lists the names of the regressor terms (effects). • Estimate is the current parameter estimate. It is missing (.) if the effect is not currently in the model.
• nDF is the number of degrees of freedom for a term. A term has more than one degree of freedom if its entry into a model also forces other terms into the model. • SS is the reduction in the error (residual) SS if the term is entered into the model or the increase in the error SS if the term is removed from the model. If a term is restricted in some fashion, it could have a reported SS of zero. • “F Ratio” is the traditional test statistic to test that the term effect is zero. It is the square of a t-ratio. It is in quotation marks because it does not have an F-distribution for testing the term because the model was selected as it was fit. • “Prob>F” is the significance level associated with the F-statistic. Like the “F Ratio,” it is in quotation marks because it is not to be trusted as a real significance probability. Statistics for the current model appear above the list of effects: • SSE, DFE, MSE are the sum of squares, degrees of freedom, and mean square error (residual) of the current model. • RSquare is the proportion of the variation in the response that can be attributed to terms in the model rather than to random error. • RSquareAdj adjusts R2 to make it more comparable over models with different numbers of parameters by using the degrees of freedom in its computation. The adjusted R2 is useful in stepwise procedure because you are looking at many different models and want to adjust for the number of terms in the model. • Cp is Mallow’s Cp criterion. • AIC is Akaike’s Information Criterion.
Step History Table As each step is taken, the Step History table records the effect of adding a term to the model. The Step History table for the Fitness data example shows the order in which the terms entered the model and shows the effect as reflected by R2 and Cp.
If you use Mallow’s Cp as a model selection criterion, select the model where Cp approaches p, the number of parameters in the model. In this example, three or four variables appear to be a good choice for a regression model.
Make Model When you click Make Model, the model seen in the Current Estimates table appears in the Model Specification dialog. For example, if you click Make Model after the forward selection shown in Figure 6.3, the Model Specification dialog appears as shown in Figure 6.4, without a fitting personality selection.
Figure 6.4 New Model Specification dialog from Forward Stepwise Procedure
All Possible Regressions Stepwise includes an All Possible Models command. It is accessible from the red triangle drop-down menu on the stepwise control panel (see Figure 6.5). Figure 6.5 All Possible Models
When selected, all possible models of the regression parameters are run, resulting in the report seen in Figure 6.6. Note that this report is for a three-variable model consisting of Runtime, RunPulse, and MaxPulse.
Figure 6.6 All Models Report
The models are listed in decreasing order of the number of parameters they contain. The model with the highest R2 for each number of parameters is highlighted. We suggest that no more than about 15 variables be used with this platform. More may be possible, but can strain computer memory (and human patience). Note: Mallow’s Cp statistic is computed, but initially hidden in the table. To make it visible, Right-click (Control-click on the Macintosh) and select Columns > Cp from the menu that appears.
7 Control Charts
Control charts are a graphical and analytic tool for deciding whether a process is in a state of statistical control and for monitoring an in-control process. This monitoring process is often called quality control or QC.
Introduction
Control charts have the following characteristics:
• Each point represents a summary statistic computed from a subgroup sample of measurements of a quality characteristic.
• The vertical axis of a control chart is scaled in the same units as the summary statistic.
• The horizontal axis of a control chart identifies the subgroup samples.
• The center line on a Shewhart control chart indicates the average (expected) value of the summary statistic when the process is in statistical control.
• The upper and lower control limits, labeled UCL and LCL, give the range of variation to be expected in the summary statistic when the process is in statistical control.
• A point outside the control limits signals the presence of a special cause of variation.
• Graph > Control Chart subcommands create control charts that can be updated dynamically as samples are received and recorded or added to the data table.
(A schematic control chart marks the center line, the UCL and LCL, the measurement axis, the subgroup sample axis, and an out-of-control point.)
The following example uses the Coating.jmp data in the Quality Control sample data folder (taken from the ASTM Manual on Presentation of Data and Control Chart Analysis). The quality characteristic of interest is the Weight column. A subgroup sample of four is chosen. An X -chart and an R-chart for the process are shown in Figure 7.1. To replicate this example, Choose the Graph > Control Chart > XBar command.
Note the selected chart types of XBar and R. Specify Weight as the Process variable. Since our example has four measurements in each subgroup, change the Sample Size Constant from 5 to 4.
Click OK. Sample six indicates that the process is not in statistical control. To check the sample values, click the sample six summary point on either control chart. The corresponding rows highlight in the data table.
Figure 7.1 Variables Charts for Coating Data
You can use Fit Y by X for an alternative visualization of the data. First, change the modeling type of Sample to Nominal. Specify the interval variable Weight as Y and the nominal variable Sample as X. The box plots in Figure 7.2 show that the sixth sample has a small range of high values. Figure 7.2 Quantiles Option in Fit Y By X Platform
The Control Chart Launch Dialog When you select a Control Chart from the Graph > Control Chart menu (Figure 7.3), you see a Control Chart Launch dialog similar to the one in Figure 7.4. (The exact controls vary depending on the type of chart you choose.) Initially, the dialog shows three kinds of information: • process information, for measurement variable selection • chart type information • limits specification. Figure 7.3 Control Chart Menu
Specific information shown for each section varies according to the type of chart you request.
Figure 7.4 Control Chart Launch Dialog
(The dialog callouts identify the process information, chart type information, and limits specification areas, the fields for entering or removing known statistics, and the check box that adds a capability analysis to the report.)
Through interaction with the Launch dialog, you specify exactly how you want your charts created. The following sections describe the panel elements.
Process Information
The Launch dialog displays a list of columns in the current data table. Here, you specify the variables to be analyzed and the subgroup sample size. Process selects variables for charting. • For variables charts, specify measurements as the process. • For attribute charts, specify the defect count or defective proportion as the process. Sample Label enables you to specify a variable whose values label the horizontal axis and can also identify unequal subgroup sizes. If no sample label variable is specified, the samples are identified by their subgroup sample number. • If the sample subgroups are the same size, check the Sample Size Constant radio button and enter the size into the text box. If you entered a Sample Label variable, its values are used to label the horizontal axis. • If the sample subgroups have an unequal number of rows or have missing values and you have a column identifying each sample, check the Sample Grouped by Sample Label radio button and enter the sample identifying column as the sample label. For attribute charts (p-, np-, c-, and u-charts), this variable is the subgroup sample size. In Variables charts, it identifies the sample. When the chart type is IR, a Range Span text box appears. The range span specifies the number of consecutive measurements from which the moving ranges are computed. The illustration in Figure 7.5 shows an X -chart for a process with unequal subgroup sample sizes, using the Coating.jmp sample data from the Quality Control sample data folder.
Figure 7.5 Variables Charts with Unequal Subgroup Sample Sizes
Phase The Phase role enables you to specify a column identifying different phases, or sections. A phase is a group of consecutive observations in the data table. For example, phases might correspond to time periods during which a new process is brought into production and then put through successive changes. Phases generate, for each level of the specified Phase variable, a new sigma, set of limits, zones, and resulting tests.
Chart Type Information
Shewhart control charts are broadly classified as variables charts and attribute charts. Moving average charts and cusum charts can be thought of as special kinds of variables charts.
• The XBar menu selection gives XBar, R, and S checkboxes.
• The IR menu selection has checkbox options for the Individual Measurement, Moving Range, and Median Moving Range charts.
• The Cusum chart is a special chart for means or individual measurements.
• P, NP, C, and U charts, and Run Charts, have no additional specifications.
Parameters You specify computations for control limits by entering a value for k (K Sigma) or by entering a probability for α(Alpha). There must be a specification of either K Sigma or Alpha. The dialog default for K Sigma is 3. K Sigma allows specification of control limits in terms of a multiple of the sample standard error. K Sigma specifies control limits at k sample standard errors above and below the expected value, which shows as the center line. To specify k, the number of sigmas, click K Sigma and enter a positive k value into the textbox. The usual choice for k is three, which is three standard deviations. The examples shown in Figure 7.6 compare the X -chart for the Coating.jmp data with control lines drawn with K Sigma = 3 and K Sigma = 4. Figure 7.6 K Sigma =3 (left) and K Sigma=4 (right) Control Limits
Alpha specifies control limits (also called probability limits) in terms of the probability α that a single subgroup statistic exceeds its control limits, assuming that the process is in control. To specify alpha, click the Alpha radio button and enter the probability you want. Reasonable choices for α are 0.01 or 0.001.
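The two specifications differ only in how the multiplier of the standard error is chosen. The Python sketch below illustrates the relationship on made-up subgroup data, estimating sigma from the average subgroup range; JMP-SE's own sigma estimate may be computed differently, so the numbers are purely illustrative.

```python
# K Sigma versus Alpha limits for an XBar chart, on invented data.
import numpy as np
from scipy.stats import norm

subgroups = np.array([            # made-up weights, 4 measurements per subgroup
    [3.1, 3.4, 3.0, 3.3],
    [3.2, 3.5, 3.1, 3.4],
    [3.0, 3.2, 3.3, 3.1],
    [3.6, 3.4, 3.5, 3.3],
])
n = subgroups.shape[1]
d2 = 2.059                        # unbiasing constant d2 for subgroups of size 4
center = subgroups.mean(axis=1).mean()
sigma_hat = np.mean(subgroups.max(axis=1) - subgroups.min(axis=1)) / d2
se = sigma_hat / np.sqrt(n)       # standard error of a subgroup mean

k = 3                             # K Sigma limits
print("K Sigma limits:", center - k * se, center + k * se)

alpha = 0.001                     # probability limits
z = norm.ppf(1 - alpha / 2)
print("Alpha limits:  ", center - z * se, center + z * se)
```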
Using Specified Statistics If you click the Specify Stats (when available) button on the Control Chart Launch dialog, a tab with editable fields is appended to the bottom of the launch dialog. This lets you enter historical statistics (statistics obtained from historical data) for the process variable. The Control Chart platform uses those entries to construct control charts. The example here shows 1 as the standard deviation of the process variable and 20 as the mean measurement.
If you check the Capability option on the Control Chart launch dialog (see Figure 7.4), a dialog appears as the platform is launched asking for specification limits. The standard deviation for the control chart selected is sent to the dialog and appears as a Specified Sigma value, which is the default option. After entering the specification limits and clicking OK, capability output appears in the same window next to the control chart.
Tailoring the Horizontal Axis When you double-click the x-axis, the X Axis Specification dialog appears for you to specify the format, axis values, number of ticks, gridline and reference lines to display on the x-axis. For example, the Pickles.JMP data lists eight measures a day for three days. In this example, by default, the x-axis is labeled at every other tick. Sometimes this gives redundant labels, as shown to the left in Figure 7.7. If you specify a label at an increment of eight, with seven ticks between them, the x-axis is labeled once for each day, as shown in the chart on the right. Figure 7.7 Example of Labeled x-Axis Tick Marks
Display Options Control Charts have popup menus that affect various parts of the platform: • The menu on the top-most title bar affects the whole platform window. Its items vary with the type of chart you select. • There is a menu of items on the chart type title bar with options that affect each chart individually.
Note: When the mean is user-specified, it is labeled in the plot as µ0.
Single Chart Options The popup menu of chart options appears when you click the icon next to the chart name, or right-click the chart space.
Box Plots superimposes box plots on the subgroup means plotted in a Mean chart. The box plot shows the subgroup maximum, minimum, 75th percentile, 25th percentile, and median. Markers for subgroup means show unless you deselect the Show Points option. The control limits displayed apply only to the subgroup mean. The Box Plots option is available only for X -charts. It is most appropriate for larger subgroup sample sizes (more than 10 samples in a subgroup). Needle connects plotted points to the center line with a vertical line segment. Connect Points toggles between connecting and not connecting the points. Show Points toggles between showing and not showing the points representing summary statistics. Initially, the points show. You can use this option to suppress the markers denoting subgroup means when the Box Plots option is in effect.
Connect Color displays the JMP-SE color palette for you to choose the color of the line segments used to connect points.
Center Line Color displays the JMP-SE color palette for you to choose the color of the line segments used to draw the center line.
Limits Color displays the JMP-SE color palette for you to choose the color of the line segments used in the upper and lower limits lines.
Line Width allows you to pick the width of the control lines. Options are Thin, Medium, or Thick.
Show Center Line initially displays the center line in green. Deselecting Show Center Line removes the center line and its legend from the chart.
Show Control Limits toggles between showing and not showing the chart control limits and their legends.
Tests shows a submenu that enables you to choose which tests to mark on the chart when the test is positive. Tests apply only for charts whose limits are 3σ limits. Tests 1 to 4 apply to Mean, Individual, and attribute charts. Tests 5 to 8 apply to Mean charts and Individual Measurement charts only. If tests do not apply to a chart, the Tests option is dimmed. Tests apply, but will not appear for charts whose control limits vary due to unequal subgroup sample sizes, until the sample sizes become equal. These special tests are also referred to as the Western Electric rules. For more information on special causes tests, see “Tests for Special Causes” on page 119 later in this chapter.
Figure 7.8 Box Plot Option and Needle Option for Airport.jmp Data
Show Zones toggles between showing and not showing the zone lines with the tests for special causes. The zones are labeled A, B, and C as shown here in the Mean plot for weight in the Coating.jmp sample data. Control Chart tests use the zone lines as boundaries. The seven zone lines are set one sigma apart, centered on the center line.
Westgard Rules are detailed in a later section. See the text and chart in “Westgard Rules,” p. 122. Test Beyond Limits flags as a “*” any point that is beyond the limits. This test works on all charts with limits, regardless of the sample size being constant, and regardless of the size of k or the width of the limits. For example, if you had unequal sample sizes, and wanted to flag any points beyond the limits of an r-chart, you could use this command. OC Curve gives Operating Characteristic (OC) curves for specific control charts. OC curves are defined in JMP-SE only for X -, p-, np-, c-, and u-charts. The curve shows how the probability of accepting a lot changes with the quality of the sample. When you choose the OC Curve option from the control chart option list, JMP-SE opens a new window containing the curve, using all the calculated values directly from the active control chart. Alternatively, you can run an OC curve directly from the QC tab on the JMP-SE Starter window. Select the chart on which you want the curve based, then a dialog prompts you for Target, LCL, UCL, K, Sigma, and sample size.
Window Options The popup menu on the window title bar lists options that affect the report window. The example menu shown here appears if you request XBar and R at the same time. You can check each chart to show or hide it.
The following options show for all control charts except Run charts:
Show Limits Legend shows or hides the Avg, UCL, and LCL values to the right of the chart.
Connect thru Missing connects points when some samples have missing values. The left-hand chart in Figure 7.9 is a control chart with no missing points. The middle chart has samples 8, 9, and 10 missing with the points not connected. The right-hand chart appears if you use the Connect thru Missing option, which is the default.
Capability launches a capability analysis. Details are found in Figure 2.20 on page 39.
Script has a submenu of commands available to all platforms that let you redo the analysis or save the JSL commands for the analysis to a window or a file.
Figure 7.9 Example of Connect thru Missing Option
Tests for Special Causes The Tests option in the chart type popup menu displays a submenu for test selection. You can select one or more tests for special causes with the options popup menu. Nelson (1984) developed the numbering notation used to identify special tests on control charts. If a selected test is positive, the last point in the test sequence is labeled with the test number, where the sequence is the moving set of points evaluated for that particular test. When you select several tests for display and more than one test signals at a particular point, the label of the numerically lowest test specified appears beside the point.
The specific options that are available depend on the type of control chart you request. Unavailable options show as grayed menu items.
Western Electric Rules
Western Electric rules are implemented in the Tests submenu. Table 7.1 on page 120 lists and interprets the eight tests, and Figure 7.10 illustrates the tests. The following rules apply to each test:
• The area between the upper and lower limits is divided into six zones, each with a width of one standard deviation.
• The zones are labeled A, B, C, C, B, A, with the C zones nearest the center line.
• A point lies in Zone B or beyond if it lies beyond the line separating zones C and B. That is, if it is more than one standard deviation from the center line.
• Any point lying on a line separating two zones is considered to belong to the outermost zone.
Note: All tests and zones require equal sample sizes in the subgroups of nonmissing data.
Tests 1 through 8 apply to Mean (XBar) and Individual Measurement charts. Tests 1 through 4 can also apply to p-, np-, c-, and u-charts. Tests 1, 2, 5, and 6 apply to the upper and lower halves of the chart separately. Tests 3, 4, 7, and 8 apply to the whole chart. See Nelson (1984, 1985) for further recommendations on how to use these tests.
Table 7.1 Description and Interpretation of Special Causes Tests
Test 1, one point beyond Zone A: detects a shift in the mean, an increase in the standard deviation, or a single aberration in the process. For interpreting Test 1, the R-chart can be used to rule out increases in variation.
Test 2, nine points in a row in a single (upper or lower) side of Zone C or beyond: detects a shift in the process mean.
Test 3, six points in a row steadily increasing or decreasing: detects a trend or drift in the process mean. Small trends will be signaled by this test before Test 1.
Test 4, fourteen points in a row alternating up and down: detects systematic effects such as two alternately used machines, vendors, or operators.
Test 5, two out of three points in a row in Zone A or beyond: detects a shift in the process average or an increase in the standard deviation. Any two out of three points provide a positive test.
Test 6, four out of five points in a row in Zone B or beyond: detects a shift in the process mean. Any four out of five points provide a positive test.
Test 7, fifteen points in a row in Zone C, above and below the center line: detects stratification of subgroups when the observations in a single subgroup come from various sources with different means.
Test 8, eight points in a row on both sides of the center line with none in Zone C: detects stratification of subgroups when the observations in one subgroup come from a single source, but subgroups come from different sources with different means.
Figure 7.10 Illustration of Special Causes Tests (one panel for each of Tests 1 through 8, showing the zones, the center line, the control limits, and the point that triggers each test; adapted from Nelson 1984, 1985)
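To make the mechanics concrete, the following Python sketch applies two of the tests to a made-up series of subgroup means. The center line and sigma are assumed values rather than estimates, and only Tests 1 and 2 are shown.

```python
# A rough sketch of two special causes tests, applied to subgroup means.
import numpy as np

means = np.array([20.1, 19.8, 20.4, 20.2, 23.9, 20.3, 20.6, 20.7,
                  20.5, 20.4, 20.6, 20.8, 20.5, 20.9, 20.6, 20.7])
center, sigma = 20.5, 1.0           # assumed center line and sigma of the means

z = (means - center) / sigma        # distance from the center line in sigmas

# Test 1: one point beyond Zone A (more than 3 sigma from the center line)
test1 = np.where(np.abs(z) > 3)[0]

# Test 2: nine points in a row on one side of the center line (Zone C or beyond)
test2 = [i for i in range(8, len(z))
         if np.all(z[i - 8:i + 1] > 0) or np.all(z[i - 8:i + 1] < 0)]

print("Test 1 flags samples:", test1 + 1)          # 1-based, like the chart labels
print("Test 2 flags samples:", [i + 1 for i in test2])
```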
Westgard Rules Westgard rules are implemented under the Westgard Rules submenu of the Control Chart platform. The different tests are abbreviated with the decision rule for the particular test. For example, 1 2s refers to a test that one point is two standard deviations away from the mean.
Table 7.2 Westgard Rules
Rule 1 2s is commonly used with Levey-Jennings plots, where control limits are set 2 standard deviations away from the mean. The rule is triggered when any one point goes beyond these limits.
Rule 1 3s refers to a rule common to Levey-Jennings plots where the control limits are set 3 standard deviations away from the mean. The rule is triggered when any one point goes beyond these limits.
Rule 2 2s is triggered when two consecutive control measurements are farther than two standard deviations from the mean.
Rule 4s is triggered when one measurement in a group is two standard deviations above the mean and the next is two standard deviations below.
Rule 4 1s is triggered when four consecutive measurements are more than one standard deviation from the mean.
Rule 10 X is triggered when ten consecutive points are on one side of the mean.
(Each rule is accompanied in the table by a small chart showing the center line and limits at ±1s, ±2s, and ±3s.)
Because Westgard rules are based on sigma and not the zones, they can be computed without regard to constant sample size.
Excluded, Hidden, and Deleted Samples
The following table summarizes the effects of various conditions on samples and subgroups:
Table 7.3 Excluded, Hidden, and Deleted Samples
Sample is excluded before creating the chart: the sample is not included in the calculation of the limits, but it appears on the graph.
Sample is excluded after creating the chart: the sample is included in the calculation of the limits, and it appears in the graph. Nothing will change on the output by excluding a sample with the graph open.
Sample is hidden before creating the chart: the sample is included in the calculation of the limits, but does not appear on the graph.
Sample is hidden after creating the chart: the sample is included in the calculation of the limits, but does not appear on the graph. The sample marker will disappear from the graph and the sample label will still appear on the axis, but the limits remain the same.
Sample is both excluded and hidden before creating the chart: the sample is not included in the calculation of the limits, and it does not appear on the graph.
Sample is both excluded and hidden after creating the chart: the sample is included in the calculation of the limits, but does not appear on the graph. The sample marker will disappear from the graph and the sample label will still appear on the axis, but the limits remain the same.
Data set is subsetted with the sample deleted before creating the chart: the sample is not included in the calculation of the limits, the axis will not include a value for the sample, and the sample marker does not appear on the graph.
Data set is subsetted with the sample deleted after creating the chart: the sample is not included in the calculation of the limits, and it does not appear on the graph. The sample marker will disappear from the graph, the sample label will be removed from the axis, the graph will shift, and the limits will change.
Some additional notes:
1 Exclude and Hide operate only on the row state of the first observation in the sample. For example, if the second observation in the sample is hidden, while the first observation is not hidden, the sample will still appear on the chart.
2 An exception to the exclude/hide rule: Tests for Special Causes can flag if a sample is excluded, but will not flag if a sample is hidden.
Shewhart Control Charts Shewhart control charts are broadly classified into control charts for variables and control charts for attributes. Moving average charts are special kinds of control charts for variables. The Control Chart platform in JMP-SE implements a variety of control charts:
• X -, R-, and S-charts,
• Individual and Moving Range charts, • p-, np-, c-, and u-charts, • Phase Control Charts for X -, r-, IR-, p-, np-, c-, and u- charts One feature special to Control Charts, different from other platforms in JMP-SE, is that they update dynamically as data is added or changed in the table.
Shewhart Control Charts for Variables Control charts for variables are classified according to the subgroup summary statistic plotted on the chart: • X -charts display subgroup means (averages) • R-charts display subgroup ranges (maximum – minimum) • S-charts display subgroup standard deviations. • Run charts display data as a connected series of points. The IR selection gives two additional chart types: • Individual Measurement charts display individual measurements • Moving Range charts display moving ranges of two or more successive measurements.
XBar-, R-, and S-Charts
For quality characteristics measured on a continuous scale, a typical analysis shows both the process mean and its variability with a mean chart aligned above its corresponding R- or S-chart. Or, if you are charting individual measurements, the individual measurement chart shows above its corresponding moving range chart.
Example. XBar- and S-charts with varying subgroup sizes
This example uses the same data as example 1, Coating.jmp, in the Quality Control sample data folder. This time the quality characteristic of interest is the Weight 2 column. An XBar chart and an S-chart for the process are shown in Figure 7.11. To replicate this example,
• Choose the Graph > Control Chart > XBar command.
• Select the chart types of XBar and S.
• Specify Weight 2 as the Process variable.
• Specify the column Sample as the Sample Label variable.
• The Sample Size option should automatically change to Sample Grouped by Sample Label.
• Click OK.
Figure 7.11 X and S charts for Varying Subgroup Sizes
Weight 2 has several missing values in the data, so you may notice the chart has uneven limits.
Although each sample has the same number of observations, samples 1, 3, 5, and 7 each have a missing value.
Run Charts Run charts display a column of data as a connected series of points. The following example is a Run chart for the Weight variable from Coating.jmp.
Figure 7.12 Run Chart
When you select the Show Center Line option in the Run Chart drop-down, a line is drawn through the center value of the column. The center line is determined by the Use Median setting of the platform drop-down. When Use Median is selected, the median is used as the center line. Otherwise, the mean is used. When saving limits to a file, both the overall mean and median are saved. Run charts can also plot the group means when a sample label is given, either on the dialog or through a script.
Individual Measurement Charts Individual Measurement Chart Type displays individual measurements. Individual Measurement charts are appropriate when only one measurement is available for each subgroup sample. Moving Range Chart Type displays moving ranges of two or more successive measurements. Moving
ranges are computed for the number of consecutive measurements you enter in the Range Span box. The default range span is 2. Because moving ranges are correlated, these charts should be interpreted with care. Example. Individual Measurement and Moving Range Charts The Pickles.jmp data in the Quality Control sample data folder contains the acid content for vats of pickles. Because the pickles are sensitive to acidity and produced in large vats, high acidity ruins an entire pickle vat. The acidity in four vats is measured each day at 1, 2, and 3 PM. The data table records day, time, and acidity measurements. The dialog in Figure 7.13 creates Individual Measurement and Moving Range charts with date labels on the horizontal axis.
Figure 7.13 Launch Dialog for Individual Measurement and Moving Range Chart
To complete this example, • Choose the Graph > Control Chart > IR command. • Select both Individual Measurement and Moving Range chart types. • Specify Acid as the Process variable. • Specify Date as the Sample Label variable. • Click OK. The individual measurement and moving range charts shown in Figure 7.14 monitor the acidity in each vat produced.
Figure 7.14 Individual Measurement and Moving Range Charts for Pickles Data
Note: If you choose a Median Moving range chart, the limits on the Individuals chart use the Median Moving Range as the sigma, rather than the Average Moving Range.
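The arithmetic behind these limits can be sketched outside JMP-SE. The Python fragment below computes moving ranges for a given range span and derives Individuals-chart limits from the average moving range; the acid values are invented, and JMP-SE may estimate sigma differently (for example from the median moving range, as the note above describes).

```python
# Moving ranges and Individuals-chart limits, on invented acid measurements.
import numpy as np

acid = np.array([0.92, 0.95, 0.91, 0.98, 0.96, 0.90, 0.93, 0.97, 0.94, 0.99])
span = 2                              # Range Span from the launch dialog

moving_range = np.array([np.ptp(acid[i - span + 1:i + 1])
                         for i in range(span - 1, len(acid))])
mr_bar = moving_range.mean()
sigma_hat = mr_bar / 1.128            # d2 unbiasing constant for samples of size 2

center = acid.mean()
ucl, lcl = center + 3 * sigma_hat, center - 3 * sigma_hat
print("Individuals chart:  LCL =", lcl, " center =", center, " UCL =", ucl)
print("Moving range chart center:", mr_bar)
```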
Shewhart Control Charts for Attributes
In the previous types of charts, measurement data was the process variable. This data is often continuous, and the charts are based on continuous theory. Another type of data is count data, where the variable of interest is a discrete count of the number of defects or blemishes per subgroup. For discrete count data, attribute charts are applicable, as they are based on binomial and Poisson models. Since the counts are measured per subgroup, it is important when comparing charts to determine whether you have a similar number of items in the subgroups between the charts. Attribute charts, like variables charts, are classified according to the subgroup sample statistic plotted on the chart:
Table 7.4 Determining which Attribute Chart to use
• Each item is judged as either conforming or non-conforming: use an np-chart when the subgroups are a constant size, and a p-chart when the subgroups vary in size.
• For each item, the number of defects is counted: use a c-chart when the subgroups are a constant size, and a u-chart when the subgroups vary in size.
• p-charts display the proportion of nonconforming (defective) items in subgroup samples, which can vary in size. Since subgroup i for a p-chart consists of Ni items, and an item is judged as either conforming or nonconforming, the maximum number of nonconforming items in subgroup i is Ni.
• np-charts display the number of nonconforming (defective) items in constant-sized subgroup samples. Since each subgroup for an np-chart consists of N items, and an item is judged as either conforming or nonconforming, the maximum number of nonconforming items in a subgroup is N.
• c-charts display the number of nonconformities (defects) in a subgroup sample that usually consists of one inspection unit.
• u-charts display the number of nonconformities (defects) per unit in subgroup samples that can have a varying number of inspection units.
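As an illustration of how varying subgroup sizes affect the limits, the Python sketch below computes p-chart limits from pooled data; the defect counts and subgroup sizes are invented, not taken from the Washers.jmp table.

```python
# p-chart center line and limits with unequal subgroup sizes (invented data).
import numpy as np

defects = np.array([12, 15, 8, 10, 4, 7, 16, 9, 14, 10])
n_i = np.array([400, 400, 350, 400, 380, 400, 420, 400, 400, 390])

p_bar = defects.sum() / n_i.sum()                 # pooled proportion defective
se_i = np.sqrt(p_bar * (1 - p_bar) / n_i)         # standard error per subgroup

ucl = p_bar + 3 * se_i
lcl = np.clip(p_bar - 3 * se_i, 0, None)          # proportions cannot go below zero
for i, (p, lo, hi) in enumerate(zip(defects / n_i, lcl, ucl), start=1):
    flag = "out of control" if (p > hi or p < lo) else ""
    print(f"sample {i:2d}  p={p:.3f}  LCL={lo:.3f}  UCL={hi:.3f}  {flag}")
```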
p- and np-Charts Example. np-Charts The Washers.jmp data in the Quality Control sample data folder contains defect counts of 15 lots of 400 galvanized washers. The washers were inspected for finish defects such as rough galvanization and exposed steel. If a washer contained a finish defect, it was deemed nonconforming or defective. Thus, the defect count represents how many washers were defective for each lot of size 400. To replicate this example, follow these steps: • Choose the Graph > Control Chart > NP command. • Choose # defects as the Process variable. • Change the Constant Size to 400. • Click OK. The example here illustrates an np-chart for the number of defects. Figure 7.15 np-Chart
Example. p-Charts Again, using the Washers.jmp data, we can specify a sample size variable, which would allow for varying sample sizes.
Note: This data contains all constant sample sizes. Follow these steps:
• Choose the Graph > Control Chart > P command. • Choose Lot as the Sample Label variable. • Choose # defects as the Process variable. • Choose Lot Size as the Sample Size variable. • Click OK. The chart shown here illustrates a p-chart for the proportion of defects. Figure 7.16 p-Chart
Note that although the points on the chart look the same as the np-chart, the y-axis, Avg, and limits are all different since they are now based on proportions.
u-Charts
The Braces.jmp data in the Quality Control sample data folder records the defect count in boxes of automobile support braces. A box of braces is one inspection unit. The number of boxes inspected (per day) is the subgroup sample size, which can vary. The u-chart, shown here, is monitoring the number of brace defects per subgroup sample size. The upper and lower bounds vary according to the number of units inspected.
Note: When you generate a u-chart and select Capability, JMP-SE launches the Poisson Fit in Distribution and gives a Poisson-specific capability analysis.
Figure 7.17 u-Chart
Example. u-Charts
To replicate this example, follow these steps:
• Open the Braces.jmp data in the Quality Control sample data folder.
• Choose the Graph > Control Chart > U command.
• Choose # defects as the Process variable.
• Choose Unit size as the Unit Size variable.
• Choose Date as the Sample Label.
• Click OK.
c-Charts
c-charts are similar to u-charts in that they monitor the number of nonconformities in an entire subgroup, made up of one or more units. However, they require constant subgroup sizes. c-charts can also be used to monitor the average number of defects per inspection unit.
Note: When you generate a c-chart and select Capability, JMP-SE launches the Poisson Fit in Distribution and gives a Poisson-specific capability analysis.
Example. c-Charts for Nonconformities per Unit
In this example, a clothing manufacturer ships shirts in boxes of ten. Prior to shipment, each shirt is inspected for flaws. Since the manufacturer is interested in the average number of flaws per shirt, the number of flaws found in each box is divided by ten and then recorded. To replicate this example, follow these steps:
• Open the Shirts.jmp data in the Quality Control sample data folder.
• Choose the Graph > Control Chart > C command.
• Choose # Defects as the Process variable.
• Choose Box Size as the Sample Size.
• Choose Box as the Sample Label.
• Click OK.
Figure 7.18 c-Chart
Phases A phase is a group of consecutive observations in the data table. For example, phases might correspond to time periods during which a new process is brought into production and then put through successive changes. Phases generate, for each level of the specified Phase variable, a new sigma, set of limits, zones, and resulting tests. On the dialog for X -, r-, s-, IR-, p-, np-, c-, u-, Presummarized, and Levey-Jennings charts, a Phase variable button appears. If a phase variable is specified, the phase variable is examined, row by row, to identify to which phase each row belongs. Saving to a limits file reveals the sigma and specific limits calculated for each phase.
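The effect of the Phase role can be sketched as a grouped computation: limits are estimated within each phase instead of from all rows at once. The Python fragment below simulates two phases; the simple standard-deviation estimate used here is only a stand-in for the sigma estimate JMP-SE actually uses.

```python
# Per-phase center lines and limits, computed group by group (simulated data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "PHASE": ["1"] * 40 + ["2"] * 40,
    "DIAMETER": np.concatenate([rng.normal(4.4, 0.02, 40),
                                rng.normal(4.6, 0.01, 40)]),
})

for phase, grp in df.groupby("PHASE"):
    center = grp["DIAMETER"].mean()
    sigma = grp["DIAMETER"].std(ddof=1)     # illustrative sigma estimate only
    print(f"phase {phase}: center={center:.3f}  "
          f"LCL={center - 3 * sigma:.3f}  UCL={center + 3 * sigma:.3f}")
```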
Example Open Diameter.JMP, found in the Quality Control sample data folder. This data set contains the diameters taken for each day, both with the first prototype (phase 1) and the second prototype (phase 2). • Select Graph > Control Chart > XBar. • Choose DIAMETER as the Process, DAY as the Sample Label, and Phase as the Phase. • Click OK.
Figure 7.19 Launch Dialog for Phases
The resulting chart has different limits for each phase.
Figure 7.20 Phase Control Chart
Cumulative Sum (Cusum) Charts
Cumulative Sum (Cusum) charts display cumulative sums of subgroup or individual measurements from a target value. Cusum charts are graphical and analytical tools for deciding whether a process is in a state of statistical control and for detecting a shift in the process mean. JMP-SE cusum charts can be one-sided, which detect a shift in one direction from a specified target mean, or two-sided to detect a shift in either direction. Both charts can be specified in terms of geometric parameters (h and k, described in Figure 7.21); two-sided charts allow specification in terms of error probabilities α and β.
To interpret a two-sided Cusum chart, you compare the points with limits that compose a V-mask. A V-mask is formed by plotting V-shaped limits. The origin of a V-mask is the most recently plotted point, and the arms extend backward on the x-axis, as in Figure 7.21. As data are collected, the cumulative sum sequence is updated and the origin is relocated at the newest point.
Figure 7.21 Illustration of a V-Mask for a Two-Sided Cusum Chart (the annotations mark the upper and lower arms, the vertex, the distance d from the origin to the vertex, h as the rise in the arm over that distance d, and k as the rise in the arm corresponding to one sampling unit)
Shifts in the process mean are visually easy to detect on a cusum chart because they produce a change in the slope of the plotted points. The point where the slope changes is the point where the shift occurs. A condition is out-of-control if one or more of the points previously plotted crosses the upper or lower arm of the V-mask. Points crossing the lower arm signal an increasing process mean, and points crossing the upper arm signal a downward shift. There are major differences between cusum charts and other control (Shewhart) charts: • A Shewhart control chart plots points based on information from a single subgroup sample. In cusum charts, each point is based on information from all samples taken up to and including the current subgroup. • On a Shewhart control chart, horizontal control limits define whether a point signals an out-of-control condition. On a cusum chart, the limits can be either in the form of a V-mask or a horizontal decision interval.
• The control limits on a Shewhart control chart are commonly specified as 3σ limits. On a cusum chart, the limits are determined from average run length, from error probabilities, or from an economic design. A cusum chart is more efficient for detecting small shifts in the process mean. Lucas (1976) comments that a V-mask detects a 1σ shift about four times as fast as a Shewhart control chart.
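One way to see why cusum charts react quickly to small shifts is the tabular (decision-interval) form of the one-sided scheme, sketched below in Python. This is an illustration of the accumulation idea with made-up fill weights, not a reproduction of JMP-SE's V-mask computations; the allowance k and decision interval h follow common textbook choices.

```python
# One-sided tabular cusum with a decision interval (invented fill weights).
import numpy as np

weights = np.array([8.11, 8.12, 8.09, 8.14, 8.13, 8.18, 8.20, 8.22, 8.21, 8.24])
target, sigma = 8.10, 0.05
k = 0.5 * sigma          # allowance: half the shift to detect (here a 1-sigma shift)
h = 5 * sigma            # decision interval

s_plus = 0.0
for i, x in enumerate(weights, start=1):
    # accumulate only deviations that exceed the target plus the allowance
    s_plus = max(0.0, s_plus + (x - target - k))
    signal = "  <-- signal" if s_plus > h else ""
    print(f"hour {i:2d}  cusum+ = {s_plus:.3f}{signal}")
```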
Launch Options for Cusum Charts When you choose Graph > Control Charts > Cusum, the Control Charts Launch dialog appears, including appropriate options and specifications as shown here.
Note: The following items pertain only to cusum charts: Two Sided requests a two-sided cusum scheme when checked. If it is not checked, a one-sided scheme is used and no V-mask appears. If an H value is specified, a decision interval is displayed. Data Units specifies that the cumulative sums be computed without standardizing the subgroup means or individual values so that the vertical axis of the cusum chart is scaled in the same units as the data. Note: Data Units requires that the subgroup sample size be designated as constant. Beta specifies the probability of failing to discover that the specified shift occurred. Beta is the probability of a Type II error and is available only when you specify Alpha. H is the vertical distance h between the origin for the V-mask and the upper or lower arm of the V-mask for a two-sided scheme. When you click H, the Beta entry box is labeled K. You also enter a value for the increase in the lower V-mask per unit change on the subgroup axis (See Figure 7.26). For a one-sided scheme, H is the decision interval. Choose H as a multiple of the standard error. Specify Stats appends the panel shown here to the Control Charts Launch dialog, which lets you enter the process variable specifications.
Target is the target mean (goal) for the process or population. The target mean must be scaled in the same units as the data.
Delta specifies the absolute value of the smallest shift to be detected as a multiple of the process standard deviation or of the standard error, depending on whether the shift is viewed as a shift in the population mean or as a shift in the sampling distribution of the subgroup mean, respectively. Delta is an alternative to the Shift option (described next). The relationship between Shift and Delta is given by
δ = Δ / (σ / √n)
where δ represents Delta, Δ represents the shift, σ represents the process standard deviation, and n is the (common) subgroup sample size.
Shift is the minimum value you want to detect on either side of the target mean. You enter the shift value in the same units as the data, and you interpret it as a shift in the mean of the sampling distribution of the subgroup mean. You can choose either Shift or Delta.
Sigma specifies a known standard deviation, σ0, for the process standard deviation, σ. By default, the Control Chart platform estimates sigma from the data. You can use Sigma instead of the Alpha option on the Control Charts Launch dialog.
Head Start specifies an initial value for the cumulative sum, S0, for a one-sided cusum scheme (S0 is usually zero). Enter Head Start as a multiple of standard error.
Cusum Chart Options
Cusum charts have these options (in addition to standard chart options).
Show Points shows or hides the sample data points.
Connect Points connects the sample points with a straight line.
Mask Color displays the JMP color palette for you to select a line color for the V-mask. Connect Color displays the JMP color palette for you to select a color for the connect line when the Connect Points option is in effect. Center Line Color displays the JMP color palette for you to select a color for the center line. Show Shift shows or hides the shift you entered, or center line. Show V Mask shows or hides the V-mask based on the parameters (statistics) specified on the Control Charts Launch dialog when Cusum is selected as the Chart Type. Show Parameters displays a Parameters table (see Figure 7.26) that summarizes the Cusum charting parameters. Show ARL displays the average run length (ARL) information. Example 1. Two-Sided Cusum Chart with V-mask To see an example of a two-sided cusum chart, open the Oil1 Cusum.jmp file from the Quality Control sample data folder. A machine fills 8-ounce cans of two-cycle engine oil additive. The filling process is believed to be in statistical control. The process is set so that the average weight of a filled can, µ0, is 8.10 ounces. Previous analysis shows that the standard deviation of fill weights, σ0, is 0.05 ounces. Subgroup samples of four cans are selected and weighed every hour for twelve hours. Each observation in the Oil1 Cusum.jmp data table contains one value of weight along with its associated value of hour. The observations are sorted so that the values of hour are in increasing order. The Control Chart platform assumes that the data are sorted in increasing order. A two-sided cusum chart is used to detect shifts of at least one standard deviation in either direction from the target mean of 8.10 ounces. To create a Cusum chart for this example, • Choose the Graph > Control Chart > CUSUM command. • Click the Two Sided check box if it is not already checked. • Specify weight as the Process variable. • Specify hour as the Sample Label. • Click the H radio button and enter 2 into the text box.
• Click Specify Stats to open the Known Statistics for CUSUM chart tab.
• Set Target to the average weight of 8.1. • Enter a Delta value of 1. • Set Sigma to the standard deviation of 0.05. The finished dialog should look like the one in Figure 7.22. Figure 7.22 Dialog for Cusum Chart Example
When you click OK, the chart in Figure 7.23 appears. Figure 7.23 Cusum Chart for Oil1 Cusum.jmp Data
You can interpret the chart by comparing the points with the V-mask whose right edge is centered at the most recent point (hour=12). Because none of the points cross the arms of the V-mask, there is no evidence that a shift in the process has occurred.
A shift or out-of-control condition is signaled at a time t if one or more of the points plotted up to the time t cross an arm of the V-mask. An upward shift is signaled by points crossing the lower arm, and a downward shift is signaled by points crossing the upper arm. The time at which the shift occurred corresponds to the time at which a distinct change is observed in the slope of the plotted points. The cusum chart automatically updates when you add new samples. The Cusum chart in Figure 7.24 is the previous chart with additional points. You can move the origin of the V-mask by using the hand to click a point. The center line and V-mask adjust to reflect the process condition at that point. Figure 7.24 Updated Cusum Chart for the OIL Data
Example 2. One-Sided Cusum Chart with no V-mask Consider the data used in Example 1, where the machine fills 8-ounce cans of engine oil. Consider also that the manufacturer is now concerned about significant over-filling in order to cut costs, and not so concerned about under-filling. A one-sided Cusum Chart can be used to identify data approaching or exceeding the side of interest. Anything 0.25 ounces beyond the mean of 8.1 is considered a problem. To do this example, • Choose the Graph > Control Chart > CUSUM command. • Deselect the Two Sided check box. • Specify weight as the Process variable. • Specify hour as the Sample Label. • Click the H radio button and enter 0.25 into the text box. • Click Specify Stats to open the Known Statistics for CUSUM chart tab. • Set Target to the average weight of 8.1. • Enter a Delta value of 1. • Set Sigma to the standard deviation 0.05. The resulting output should look like the picture in Figure 7.25.
Figure 7.25 One-Sided Cusum Chart for the OIL Data
Notice that the decision interval or horizontal line is set at the H-value entered (0.25). Also note that no V-mask appears with one-sided Cusum charts.
The Show Parameters option in the Cusum chart popup menu shows the Parameters report in Figure 7.26. The parameters report summarizes the charting parameters from the Known Statistics for CUSUM chart tab on the Control Chart Launch dialog. An additional chart option, Show ARL, adds the average run length (ARL) information to the report. The average run length is the expected number of samples taken before an out-of-control condition is signaled:
• ARL (Delta), sometimes denoted ARL1, is the average run length for detecting a shift the size of the specified Delta.
• ARL(0), sometimes denoted ARL0, is the in-control average run length for the specified parameters (Montgomery 1985).
Figure 7.26 Show Parameters and Show ARL Options
8 Time Series
The Time Series platform lets you explore, analyze, and forecast univariate time series. A time series is a set y1, y2, ..., yN of observations taken over a series of equally-spaced time periods. The analysis begins with a plot of the points in the time series. In addition, the platform displays graphs of the autocorrelations and partial autocorrelations of the series. These indicate how and to what degree each point in the series is correlated with earlier values in the series and can be used to identify the type of model appropriate for describing and predicting (forecasting) the evolution of the time series. The model types include
• ARIMA, autoregressive integrated moving-average, often called Box-Jenkins models
• Smoothing Models, several forms of exponential smoothing and Winters' method.
Note: The Time Series Launch dialog requires that one or more continuous variables be assigned as the time series. Optionally, you can specify a time ID variable, which is used to label the time axis. If a time ID variable is specified, it must be continuous, sorted ascending, evenly spaced, and without missing values.
Introduction The data for the next examples are in the Seriesg.jmp table found in the Time Series sample data folder (Box and Jenkins 1976). The time series variable is Passengers and the time ID is Time. Select Analyze > Time Series to display the Time Series Launch dialog (Figure 8.1). This dialog allows you to specify the number of lags to use in computing the autocorrelations and partial autocorrelations. It also lets you specify the number of future periods to forecast using each model fitted to the data. For this example, assign Passengers as Y, Time Series and Time as X, Time ID. Figure 8.1 Launch Dialog
The first thing you see is a graph showing the time series, its autocorrelation graph, and its partial autocorrelation graph. Figure 8.2 Initial Time Series Report
The graph shows that the series has an increasing spread over time. This should be accounted for before modeling the series. In general, increasing variances are transformed using logarithms. A column containing the logarithm of Passengers, Log Passengers, is already included in the table. Again select Analyze > Time Series. Assign Log Passengers as Y, Time Series and Time as X, Time ID. The series now has an acceptable appearance for modeling.
Figure 8.3 Log Passenger Series
Since the autocorrelation graph decreases slowly and steadily, but the partial autocorrelation graph drops off drastically after lag 1, a reasonable guess for a model is an MA(1). To try this model, Select ARIMA from the platform menu. Enter a 1 beside q, Moving Average Order. Click Estimate. JMP Student Edition estimates the model and displays a model summary, parameter estimates, and a forecast graph. The most important graph, however, is the residuals, which is initially closed.
Figure 8.4 Model Results
Open the Residuals node to reveal a graph and autocorrelation plots for the model residuals.
Figure 8.5 MA(1) Model Results
The expected reduction in spikes did not occur, so an MA(1) is not an appropriate model. A second model, an MA(2), can be run in the same way.
Select ARIMA from the platform menu. Enter a 2 beside q, Moving Average Order. Click Estimate. Similar unsatisfactory results appear. However, note that JMP is accumulating a list of models, along with appropriate fit statistics, in the Model Comparison table. Figure 8.6 Model Comparison Table
Examine the R2 for the two models in this table. In fact, the MA(2) is a worse fit than the MA(1). Some reflection is necessary.
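The same exploration can be sketched outside JMP Student Edition; the Python fragment below uses statsmodels on a simulated series in place of the Seriesg data, fitting MA(1) and MA(2) and comparing their fit statistics. The numbers will differ from the report above since the data differ; only the workflow is the point.

```python
# Fit MA(1) and MA(2) to a simulated series and compare the models.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
e = rng.normal(size=200)
y = 5 + e[1:] + 0.6 * e[:-1]          # a simulated MA(1) series

for q in (1, 2):
    res = ARIMA(y, order=(0, 0, q)).fit()
    print(f"MA({q}):  AIC = {res.aic:.1f}   sigma2 = {res.params[-1]:.3f}")
```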
The Time Series Platform First, assign columns for analysis with the dialog in Figure 8.1. The selector list at the left of the dialog shows all columns in the current table. To cast a column into a role, select one or more columns in the column selector list and click a role button. Or, drag variables from the column selector list to one of the following role boxes: X, Time ID for the x-axis, one variable used for labeling the time axis Y, Time Series for the y-axis, one or more time series variables.
To remove an unwanted variable from an assigned role, select it in the role box and click Remove. After assigning roles, click OK to see the analysis for each time series variable versus the time ID. You set the number of lags for the autocorrelation and partial autocorrelation plots in the Autocorrelation Lags box. This is the maximum number of periods between points used in the computation of the correlations. It must be more than one but less than the number of rows. A commonly used rule of thumb for the maximum number of lags is n/4, where n is the number of observations. The Forecast Periods box allows you to set the number of periods into the future that the fitted models are forecast. By default, JMP uses 25 lags and 25 forecast periods.
The Time Series Graph
The Time Series platform begins with a plot of each time series by the time ID, or row number if no time ID is specified (Figure 8.7). The plot, like others in JMP, has features to resize the graph, highlight points with the cursor or brush tool, and label points.
Figure 8.7 Time Series Plot of Seriesg (Airline Passenger) Data
By default, graphs of the autocorrelation and partial autocorrelation (Figure 8.5) of the time series are also shown, but can be hidden with commands from the platform popup menu on the Time Series title bar. The platform popup menu, discussed next, also has fitting commands and options for displaying additional graphs and statistical tables.
Time Series Commands The popup menu next to the time series name has the commands shown here.
The first three items in this menu control the descriptive and diagnostic graphs and tables. These are typically used to determine the nature of the model to be fitted to the series. The ARIMA and Smoothing Model commands are for fitting various models to the data and producing forecasts. You can select the model fitting commands repeatedly. The result of each new fit is appended to the report. After the first model has been fit, a summary of all the models is inserted just above the first model report (an example is shown in “Model Comparison Table,” p. 151). The following sections describe options and model fits, discuss statistical results, and cover additional platform features.
Graph
The Time Series platform begins by showing a time series plot, like the one shown previously in Figure 8.7. The Graph command on the platform popup menu has a submenu of controls for the time series plot with the following commands. • Time Series Graph hides or displays the time series graph. • Show Points hides or displays the points in the time series graph. • Connecting Lines hides or displays the lines connecting the points in the time series graph. • Mean Line hides or displays a horizontal line in the time series graph that depicts the mean of the time series.
Autocorrelation The Autocorrelation command alternately hides or displays the autocorrelation graph of the sample, often called the sample autocorrelation function. This graph describes the correlation between all the pairs of points in the time series with a given separation in time or lag. By definition, the first autocorrelation (lag 0) always has length 1. In addition, confidence curves show twice the large-lag standard error (± 2 standard errors). The autocorrelation plot for the Seriesg data is shown on the left in Figure 8.8. You can examine the autocorrelation and partial autocorrelations plots to determine whether the time series is stationary (meaning it has a fixed mean and standard deviation over time) and what model might be appropriate to fit the time series.
Partial Autocorrelation
The Partial Autocorrelation command alternately hides or displays the graph of the sample partial autocorrelations. The plot on the right in Figure 8.8 shows the partial autocorrelation function for the Seriesg data. The solid black lines represent ± 2 standard errors for approximate 95% confidence limits.
Figure 8.8 Autocorrelation and Partial Correlation Plots
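The quantity plotted in the autocorrelation graph is just the lagged sample correlation. The Python sketch below computes it directly for a simulated series, together with the rough ±2 standard error band; the large-lag standard errors drawn by JMP-SE are typically a little wider at higher lags.

```python
# Sample autocorrelations computed directly from the definition (simulated series).
import numpy as np

rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(size=120))      # a wandering, nonstationary-looking series
ybar, n = y.mean(), len(y)
denom = np.sum((y - ybar) ** 2)

for k in range(1, 6):
    r_k = np.sum((y[:-k] - ybar) * (y[k:] - ybar)) / denom
    print(f"lag {k}: r = {r_k:+.3f}")
print("approximate 2-SE band: +/-", round(2 / np.sqrt(n), 3))
```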
Number of Forecast Periods The Number of Forecast Periods command displays a dialog for you to reset the number of periods into the future that the fitted models will forecast. The initial value is set in the Time Series Launch dialog. All existing and future forecast results will show the new number of periods with this command.
Modeling Reports The time series modeling commands are used to fit theoretical models to the series and use the fitted model to predict (forecast) future values of the series. These commands also produce statistics and residuals that allow you to ascertain the adequacy of the model you have elected to use. You can select the modeling commands repeatedly. Each time you select a model, a report of the results of the fit and a forecast is added to the platform results. The fit of each model begins with a dialog that lets you specify the details of the model being fit as well as how it will be fit. Each general class of models has its own dialog, as discussed previously in their respective sections. The models are fit by maximizing the likelihood function, using a Kalman filter to compute the likelihood function. The ARIMA, seasonal ARIMA, and smoothing models begin with the following report tables.
Model Comparison Table
The Model Comparison table summarizes the fit statistics for each model. You can use it to compare several models fitted to the same time series. Each row corresponds to a different model. The numerical values in the table are drawn from the Model Summary table for each fitted model. The Model Comparison table shown above summarizes the ARIMA models (1, 0, 0), (0, 0, 1), and (1, 0, 1) respectively.
Model Summary Table
Each model fit generates a Model Summary table, which summarizes the statistics of the fit. In the formulae below, n is the number of nonmissing observations and k is the number of fitted parameters in the model.
• DF is the number of degrees of freedom in the fit, n − k.
• Sum of Squared Errors is the sum of the squares of the prediction errors, SSE.
• Variance Estimate is the unconditional sum of squares (SSE) divided by the number of degrees of freedom, SSE / (n − k). This is the sample estimate of the variance of the random shocks a_t, described in the section “ARIMA Model,” p. 154.
• Standard Deviation is the square root of the variance estimate. This is a sample estimate of the standard deviation of a_t, the random shocks.
• Akaike’s Information Criterion [AIC] and Schwarz’s Bayesian Criterion [SBC or BIC] are goodness-of-fit statistics, detailed in the online help. Smaller values of these criteria indicate better fit.
• RSquare and RSquare Adj are also goodness-of-fit statistics, where values closer to 1 indicate a better fit.
• –2LogLikelihood is minus two times the natural log of the likelihood function evaluated at the best-fit parameter estimates. Smaller values indicate better fits.
• Stable indicates whether the autoregressive operator is stable, that is, whether all the roots of φ(z) = 0 lie outside the unit circle.
• Invertible indicates whether the moving average operator is invertible, that is, whether all the roots of θ(z) = 0 lie outside the unit circle.
Note: The φ and θ operators are defined in the section “ARIMA Model,” p. 154.
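Several of these quantities are simple functions of the SSE, n, and k defined above. The fragment below is only an illustration of those definitions in Python, not how JMP itself builds the report; the example numbers are made up.

import math

def summary_from_sse(sse, n, k):
    # DF = n - k; Variance Estimate = SSE / (n - k); Standard Deviation = sqrt(variance)
    df = n - k
    variance = sse / df
    return {"DF": df,
            "Sum of Squared Errors": sse,
            "Variance Estimate": variance,
            "Standard Deviation": math.sqrt(variance)}

print(summary_from_sse(sse=1000.0, n=144, k=3))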
Parameter Estimates Table
There is a Parameter Estimates table for each selected fit, which gives the estimates for the time series model parameters. Each type of model has its own set of parameters. They are described in the sections on specific time series models. The Parameter Estimates table has these terms: • Term lists the name of the parameter. These are described below for each model type. Some models contain an intercept or mean term. In those models, the related constant estimate is also shown. The definition of the constant estimate is given under the description of ARIMA models. • Factor (Seasonal ARIMA only) lists the factor of the model that contains the parameter. This is only shown for multiplicative models. In the multiplicative seasonal models, Factor 1 is nonseasonal and Factor 2 is seasonal. • Lag lists the degree of the lag or backshift operator that is applied to the term to which the parameter is multiplied. • Estimate lists the parameter estimates of the time series model. • Std Error lists the estimates of the standard errors of the parameter estimates. They are used in constructing tests and confidence intervals. • t Ratio lists the test statistics for the hypotheses that each parameter is zero. It is the ratio of the parameter estimate to its standard error. If the hypothesis is true, then this statistic has an approximate Student’s t-distribution. Looking for a t-ratio greater than 2 in absolute value is a common rule of thumb for judging significance because it approximates the 0.05 significance level. • Prob>|t| lists the observed significance probability calculated from each t-ratio. It is the probability of getting, by chance alone, a t-ratio greater (in absolute value) than the computed value, given a true hypothesis. Often, a value below 0.05 (or sometimes 0.01) is interpreted as evidence that the parameter is significantly different from zero. The Parameter Estimates table also gives the Constant Estimate, for models that contain an intercept or mean term. The definition of the constant estimate is given under “ARIMA Model,” p. 154.
Forecast Plot
Each model has its own Forecast plot. The Forecast plot shows the values that the model predicts for the time series. It is divided by a vertical line into two regions. To the left of the separating line, the one-step-ahead forecasts are shown overlaid with the input data points. To the right of the line are the future values forecast by the model and the confidence intervals for the forecasts. You can control the number of forecast values by changing the setting of the Forecast Periods box in the platform launch dialog or by selecting Number of Forecast Periods from the Time Series drop-down menu. The data and confidence intervals can be toggled on and off using the Show Points and Show Confidence Interval commands on the model’s popup menu.
Residuals The graphs under the residuals section of the output show the values of the residuals based on the fitted model. These are the actual values minus the one-step-ahead predicted values. In addition, the autocorrelation and partial autocorrelation of these residuals are shown. These can be used to determine whether the fitted model is adequate to describe the data. If it is, the points in the residual plot should be normally distributed about the zero line and the autocorrelation and partial autocorrelation of the residuals should not have any significant components for lags greater than zero.
Iteration History The model parameter estimation is an iterative procedure by which the log-likelihood is maximized by adjusting the estimates of the parameters. The iteration history for each model you request shows the value of the likelihood function for each iteration. This can be useful for diagnosing problems with the fitting procedure. Attempting to fit a model which is poorly suited to the data can result in a large number of iterations that fail to converge on an optimum value for the likelihood.
Model Report Options
The title bar for each model you request has the popup menu shown to the right, with the following options for that model: Show Points hides or shows the data points in the forecast graph. Show Confidence Interval hides or shows the confidence intervals in the forecast graph. Save Columns creates a new data table with columns representing the results of the model. Residual Statistics controls which displays of residual statistics are shown for the model. These displays are described in the section “Time Series Commands,” p. 148; however, they are applied to the residual series (the one-step-ahead model predictions minus the input series).
ARIMA Model
An AutoRegressive Integrated Moving Average (ARIMA) model predicts future values of a time series by a linear combination of its past values and a series of errors (also known as random shocks or innovations). The ARIMA command performs a maximum likelihood fit of the specified ARIMA model to the time series. For a response series {y_t}, the general form of the ARIMA model is

φ(B)(w_t − µ) = θ(B) a_t

where
• t is the time index
• B is the backshift operator, defined by B y_t = y_{t−1}
• w_t = (1 − B)^d y_t is the response series after differencing
• µ is the intercept or mean term
• φ(B) and θ(B) are, respectively, the autoregressive operator and the moving average operator, written
  φ(B) = 1 − φ_1 B − φ_2 B^2 − … − φ_p B^p and θ(B) = 1 − θ_1 B − θ_2 B^2 − … − θ_q B^q
• a_t is the sequence of random shocks.
The a_t are assumed to be independent and normally distributed with mean zero and constant variance. The model can be rewritten as

φ(B) w_t = δ + θ(B) a_t

where the constant estimate δ is given by the relation

δ = φ(B)µ = µ − φ_1 µ − φ_2 µ − … − φ_p µ.
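To make the rewritten form concrete, consider an ARIMA(1,1,1) model: the differenced series w_t = (1 − B)y_t follows w_t = δ + φ_1 w_{t−1} + a_t − θ_1 a_{t−1}. The Python sketch below simulates such a series from that recursion; it is purely illustrative (the parameter values are made up), and it is not the Kalman-filter estimation that JMP performs.

import random

def simulate_arima_111(n, mu=0.0, phi=0.5, theta=0.3, sigma=1.0, seed=1):
    # (1 - phi*B)(w_t - mu) = (1 - theta*B) a_t, with w_t = (1 - B) y_t
    random.seed(seed)
    delta = mu * (1 - phi)          # constant estimate: phi(B) applied to mu
    w_prev, a_prev = 0.0, 0.0
    y = [0.0]
    for _ in range(n):
        a = random.gauss(0.0, sigma)                    # random shock a_t
        w = delta + phi * w_prev + a - theta * a_prev   # ARMA(1,1) on the differences
        y.append(y[-1] + w)                             # undo the differencing
        w_prev, a_prev = w, a
    return y[1:]

series = simulate_arima_111(200)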
The ARIMA command displays the Specify ARIMA Model dialog, which allows you to specify the ARIMA model you want to fit. The results appear when you click Estimate.
Use the Specify ARIMA Model dialog to set the three orders that define an ARIMA model:
• The Autoregressive Order is the order (p) of the polynomial operator φ(B).
• The Differencing Order is the order (d) of the differencing operator.
• The Moving Average Order is the order (q) of the moving average operator θ(B).
An ARIMA model is commonly denoted ARIMA(p,d,q). If any of p, d, or q are zero, the corresponding letters are often dropped. For example, if p and d are zero, then the model would be denoted MA(q).
The Confidence Intervals box allows you to set the confidence level between 0 and 1 for the forecast confidence bands. The Intercept check box determines whether the intercept term µ will be part of the model. If the Constrain fit check box is checked, the fitting procedure will constrain the autoregressive parameters to always remain within the stable region and the moving average parameters within the invertible region. You might want to uncheck this box if the fitter is having difficulty finding the true optimum or if you want to speed up the fit. You can check the Model Summary table to see if the resulting fitted model is stable and invertible.
Smoothing Models
JMP offers a variety of smoothing techniques. Smoothing models represent the evolution of a time series by the model

y_t = µ_t + β_t t + s(t) + a_t

where
• µ_t is the time-varying mean term,
• β_t is the time-varying slope term,
• s(t) is one of the s time-varying seasonal terms,
• a_t are the random shocks.
Models without a trend have β_t = 0, and nonseasonal models have s(t) = 0. The estimators for these time-varying terms are
• L_t, a smoothed level that estimates µ_t
• T_t, a smoothed trend that estimates β_t
• S_{t−j} for j = 0, 1, …, s − 1, the estimates of s(t).
Each smoothing model defines a set of recursive smoothing equations that describes the evolution of these estimators. The smoothing equations are written in terms of model parameters called smoothing weights. They are
• α, the level smoothing weight
• γ, the trend smoothing weight
• ϕ, the trend damping weight
• δ, the seasonal smoothing weight.
While these parameters enter each model in a different way (or not at all), they have the common property that larger weights give more influence to recent data while smaller weights give less influence to recent data. Each smoothing model has an ARIMA model equivalent. These ARIMA equivalents are used to estimate the smoothing weights and provide forecasts. You may not be able to specify the equivalent ARIMA model using the ARIMA command, because some smoothing models intrinsically constrain the ARIMA model parameters in ways the ARIMA command will not allow.
Smoothing Model Dialog The Smoothing Model dialog appears in the report window when you select one of the smoothing model commands.
The Confidence Intervals box allows you to set the confidence level for the forecast confidence bands. The dialogs for seasonal smoothing models include a Periods Per Season box for setting the number of periods in a season. The dialog also lets you specify what type of constraint you want to enforce on the smoothing weights during the fit. The constraints are:
Zero To One keeps the values of the smoothing weights in the range zero to one.
Unconstrained allows the parameters to range freely.
Stable Invertible constrains the parameters such that the equivalent ARIMA model is stable and invertible.
Custom expands the dialog to allow you to set constraints on individual smoothing weights. Each smoothing weight can be Bounded, Fixed, or Unconstrained, as determined by the setting of the popup menu next to the weight’s name.
The example shown here has the Level weight (α) fixed at a value of 0.3 and the Trend weight (γ) bounded by 0 and 1. In this case, the value of the Trend weight is allowed to move within the range 0 to 1 while the Level weight is held at 0.3. Note that you can specify all the smoothing weights in advance by using these custom constraints. In that case, none of the weights would be estimated from the data although forecasts and residuals would still be computed. When you click Estimate, the results of the fit appear in place of the dialog.
Simple Exponential Smoothing
The model for simple exponential smoothing is y_t = µ_t + a_t. The smoothing equation, L_t = α y_t + (1 − α) L_{t−1}, is defined in terms of a single smoothing weight α. This model is equivalent to an ARIMA(0, 1, 1) model where

(1 − B) y_t = (1 − θB) a_t with θ = 1 − α.

The moving average form of the model is

y_t = a_t + Σ_{j=1}^∞ α a_{t−j}
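The level recursion above translates into a few lines of code. The sketch below is an illustration in Python, not JMP’s fitting procedure; the starting value and the fixed weight are arbitrary assumptions, whereas the platform estimates the weight by maximum likelihood.

def simple_exponential_smoothing(y, alpha=0.3):
    # One-step-ahead forecast of y_t is the previous smoothed level L_{t-1}.
    level = y[0]                     # a simple, arbitrary starting value
    forecasts = []
    for value in y:
        forecasts.append(level)
        level = alpha * value + (1 - alpha) * level   # L_t = a*y_t + (1-a)*L_{t-1}
    return forecasts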
Double (Brown) Exponential Smoothing
The model for double exponential smoothing is y_t = µ_t + β_t t + a_t. The smoothing equations, defined in terms of a single smoothing weight α, are

L_t = α y_t + (1 − α) L_{t−1} and T_t = α (L_t − L_{t−1}) + (1 − α) T_{t−1}.

This model is equivalent to an ARIMA(0, 1, 1)(0, 1, 1)_1 model

(1 − B)^2 y_t = (1 − θB)^2 a_t where θ_{1,1} = θ_{2,1} with θ = 1 − α.

The moving average form of the model is

y_t = a_t + Σ_{j=1}^∞ (2α + (j − 1)α^2) a_{t−j}
Linear (Holt) Exponential Smoothing
The model for linear exponential smoothing is y_t = µ_t + β_t t + a_t. The smoothing equations, defined in terms of smoothing weights α and γ, are

L_t = α y_t + (1 − α)(L_{t−1} + T_{t−1}) and T_t = γ (L_t − L_{t−1}) + (1 − γ) T_{t−1}

This model is equivalent to an ARIMA(0, 2, 2) model where

(1 − B)^2 y_t = (1 − θ_1 B − θ_2 B^2) a_t with θ_1 = 2 − α − αγ and θ_2 = α − 1.

The moving average form of the model is

y_t = a_t + Σ_{j=1}^∞ (α + jαγ) a_{t−j}
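The two Holt recursions can be transcribed the same way. Again, this Python sketch is only illustrative: the starting values and weights are assumptions, not estimates, and it is not JMP’s implementation.

def holt_linear_smoothing(y, alpha=0.3, gamma=0.1):
    # L_t = a*y_t + (1-a)(L_{t-1} + T_{t-1}); T_t = g*(L_t - L_{t-1}) + (1-g)*T_{t-1}
    level, trend = y[0], y[1] - y[0]      # crude starting values
    forecasts = []
    for value in y:
        forecasts.append(level + trend)   # one-step-ahead forecast of y_t
        prev_level = level
        level = alpha * value + (1 - alpha) * (prev_level + trend)
        trend = gamma * (level - prev_level) + (1 - gamma) * trend
    return forecasts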
Damped-Trend Linear Exponential Smoothing
The model for damped-trend linear exponential smoothing is y_t = µ_t + β_t t + a_t. The smoothing equations, in terms of smoothing weights α, γ, and ϕ, are

L_t = α y_t + (1 − α)(L_{t−1} + ϕ T_{t−1}) and T_t = γ (L_t − L_{t−1}) + (1 − γ) ϕ T_{t−1}

This model is equivalent to an ARIMA(1, 1, 2) model where

(1 − ϕB)(1 − B) y_t = (1 − θ_1 B − θ_2 B^2) a_t with θ_1 = 1 + ϕ − α − αγϕ and θ_2 = (α − 1)ϕ.

The moving average form of the model is

y_t = a_t + Σ_{j=1}^∞ [ α + αγϕ(ϕ^j − 1)/(ϕ − 1) ] a_{t−j}
Seasonal Exponential Smoothing
The model for seasonal exponential smoothing is y_t = µ_t + s(t) + a_t. The smoothing equations, in terms of smoothing weights α and δ, are

L_t = α (y_t − S_{t−s}) + (1 − α) L_{t−1} and S_t = δ (y_t − L_t) + (1 − δ) S_{t−s}

This model is equivalent to a seasonal ARIMA(0, 1, 1)(0, 1, 0)_s model where we define θ_1 = θ_{1,1}, θ_2 = θ_{2,s}, and θ_3 = −θ_{1,1} θ_{2,s}, so

(1 − B)(1 − B^s) y_t = (1 − θ_1 B − θ_2 B^s − θ_3 B^{s+1}) a_t

with θ_1 = 1 − α, θ_2 = δ(1 − α), and θ_3 = (1 − α)(δ − 1).

The moving average form of the model is

y_t = a_t + Σ_{j=1}^∞ ψ_j a_{t−j}

where ψ_j = α for j mod s ≠ 0, and ψ_j = α + δ(1 − α) for j mod s = 0.
Winters Method (Additive)
The model for the additive version of the Winters method is y_t = µ_t + β_t t + s(t) + a_t. The smoothing equations, in terms of weights α, γ, and δ, are

L_t = α (y_t − S_{t−s}) + (1 − α)(L_{t−1} + T_{t−1}),
T_t = γ (L_t − L_{t−1}) + (1 − γ) T_{t−1}, and
S_t = δ (y_t − L_t) + (1 − δ) S_{t−s}.

This model is equivalent to a seasonal ARIMA(0, 1, s+1)(0, 1, 0)_s model

(1 − B)(1 − B^s) y_t = (1 − Σ_{i=1}^{s+1} θ_i B^i) a_t

The moving average form of the model is

y_t = a_t + Σ_{j=1}^∞ ψ_j a_{t−j}

where ψ_j = α + jαγ for j mod s ≠ 0, and ψ_j = α + jαγ + δ(1 − α) for j mod s = 0.
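All three Winters recursions fit the same pattern as the earlier smoothing sketches. The following Python sketch is illustrative only: the crude initialization and the fixed weights are assumptions, and the platform itself estimates the weights by maximum likelihood through the ARIMA equivalent.

def winters_additive(y, s, alpha=0.3, gamma=0.1, delta=0.2):
    # s = number of periods per season; seasonal[t % s] holds S_{t-s} when it is used.
    level = sum(y[:s]) / s
    trend = 0.0
    seasonal = [y[j] - level for j in range(s)]   # rough initial seasonal terms
    forecasts = []
    for t, value in enumerate(y):
        forecasts.append(level + trend + seasonal[t % s])   # one-step-ahead forecast
        prev_level = level
        level = alpha * (value - seasonal[t % s]) + (1 - alpha) * (prev_level + trend)
        trend = gamma * (level - prev_level) + (1 - gamma) * trend
        seasonal[t % s] = delta * (value - level) + (1 - delta) * seasonal[t % s]
    return forecasts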
9 Correlations and Multivariate Techniques
The Multivariate platform specializes in exploring how multiple variables relate to each other. The platform begins by showing a standard correlation matrix. The Multivariate platform popup menu gives additional correlation options and other techniques for looking at multiple variables, such as
• a scatterplot matrix with normal density ellipses
• inverse, partial, and pairwise correlations
• a covariance matrix
• nonparametric measures of association
• simple statistics (such as mean and standard deviation)
All plots and the current data table are linked. You can highlight points on any scatterplot in the scatterplot matrix or on the outlier distance plot. The points are highlighted on all other plots and are selected in the data table.
Introduction For a short tour of the Multivariate platform, Open Solubility.jmp from the sample data folder. Select Analyze > Multivariate to bring up the launch dialog.
When the report appears, you see correlations and a scatterplot matrix.
From here, you can calculate several different kinds of correlations, including nonparametric correlations (see below for instances of each kind). Note that the first two variables (1-Octanol and Ether) are correlated with each other. In addition, the last four variables are similarly correlated. This suggests that the variability in these six variables could be explained in fewer dimensions. Principal Components would confirm this. Select Principal Components > on Correlations from the platform’s drop-down list.
The principal components report, shown here, indicates that there are two strong directions of variation, corresponding to the eigenvalues of 4.785 and 0.945. We can express 95% of the variation of these six dimensions in only two dimensions.
Launch the Platform and Select Options
When you choose Analyze > Multivariate, a standard correlation matrix and scatterplot matrix appears first. The platform popup menu shown here lists additional correlation options and other techniques for looking at multiple variables. The following sections describe the tables and plots offered by the Multivariate platform.
In most of the following analysis options, a missing value in an observation causes the entire observation to be deleted. The exceptions are in Pairwise Correlations, which exclude rows that are missing on either of the variables under consideration, and Simple Statistics > Univariate, which calculates its statistics column-by-column, without regard to missing values in other columns. Many of the following examples use the Solubility.jmp sample data table.
Correlations Multivariate The Correlations Multivariate option gives the Correlations table, which is a matrix of correlation coefficients that summarizes the strength of the linear relationships between each pair of response (Y) variables. This correlation matrix only uses the observations that have nonmissing values for all variables in the analysis.
Inverse Correlations and Partial Correlations The inverse correlation matrix (Inverse Corr table), shown at the top in the next figure, provides useful multivariate information. The diagonal elements of the matrix are a function of how closely the variable is a linear function of the other variables. In the inverse correlation, the diagonal is 1/(1 – R2) for the fit of that variable by all the other variables. If the multiple correlation is zero, the diagonal inverse
element is 1. If the multiple correlation is 1, then the inverse element becomes infinite and is reported missing.
The partial correlation table (Partial Corr table) shows the partial correlations of each pair of variables after adjusting for all the other variables. This is the negative of the inverse correlation matrix scaled to unit diagonal.
Scatterplot Matrix To help you visualize the correlations, a scatterplot for each pair of response variables displays in a matrix arrangement, as shown in Figure 9.1. The scatterplot matrix is shown by default. If the scatterplots are not showing, select Scatterplot Matrix from the platform popup menu. The cells of the scatterplot matrix are size-linked so that stretching a plot from any cell resizes all the scatterplot cells. By default, a 95% bivariate normal density ellipse is imposed on each scatterplot. If the variables are bivariate normally distributed, this ellipse encloses approximately 95% of the points. The correlation of the variables is seen by the collapsing of the ellipse along the diagonal axis. If the ellipse is fairly round and is not diagonally oriented, the variables are uncorrelated.
Figure 9.1 Example of a Scatterplot Matrix (two clusters of correlations: the first two variables and the next four)
The popup menu on the Scatterplot Matrix title bar lets you tailor the matrix with color and density ellipses and by setting the α-level.
Density Ellipses toggles the display of the density ellipses on the scatterplots, constructed using the α level that you choose. By default they are 95% ellipses.
Show Correlations shows the correlation of each pair of variables in the upper left corner of each scatterplot.
Show Histogram draws histograms in the diagonal of the scatterplot matrix. These histograms can be specified as Horizontal or Vertical. In addition, you can toggle the counts that label each bar with the Show Counts command.
Ellipse α lets you select from a submenu of standard α-levels, or select the Other command and specifically set the α level for the density ellipses.
Ellipse Color lets you select from a palette of colors to change the color of the ellipses.
You can reorder the scatterplot matrix columns by dragging a diagonal (label) cell to another position on the diagonal. For example, if you drag the cell of the column labeled 1-octanol diagonally down one cell, the columns reorder as shown in Figure 9.2. When you look for patterns in the whole scatterplot matrix with reordered columns, you clearly see the variables cluster into groups based on their correlations, as illustrated previously by the two groups showing in Figure 9.1. Figure 9.2 Reorder Scatterplot Matrix
Covariance Matrix The Covariance Matrix command displays the covariance matrix for the analysis.
Pairwise Correlations
The Pairwise Correlations table lists the Pearson product-moment correlations for each pair of Y variables, using all available values. The count values differ if any pair has a missing value for either variable. These are values produced by the Density Ellipse option on the Fit Y by X platform. The Pairwise Correlations report also shows significance probabilities and compares the correlations with a bar chart, as shown in Figure 9.3. Figure 9.3 Pairwise Correlations Report
Simple Statistics
The Simple Statistics submenu allows you to display simple statistics (mean, standard deviation, and so on) for each column. These statistics can be calculated in two ways that differ when there are missing values in the data table.
Univariate Simple Statistics are calculated on each column, regardless of values in other columns. These values match the ones that would be produced using the Distribution platform.
Multivariate Simple Statistics are calculated by dropping any row that has a missing value for any column in the analysis. These are the statistics that are used by the Multivariate platform to calculate correlations.
Nonparametric Correlations
When you select Nonparametric Correlations from the platform popup menu, the Nonparametric Measures of Association table is shown. The Nonparametric submenu offers these three nonparametric measures:
Spearman’s Rho is a correlation coefficient computed on the ranks of the data values instead of on the values themselves.
Kendall’s Tau is based on the number of concordant and discordant pairs of observations. A pair is concordant if the observation with the larger value of X also has the larger value of Y. A pair is discordant if the observation with the larger value of X has the smaller value of Y. There is a correction for tied pairs (pairs of observations that have equal values of X or equal values of Y).
Hoeffding’s D is a statistical scale that ranges from –0.5 to 1, with large positive values indicating dependence. The statistic approximates a weighted sum over observations of chi-square statistics for two-by-two classification tables, and detects more general departures from independence.
The Nonparametric Measures of Association report also shows significance probabilities for all measures and compares them with a bar chart similar to the one in Figure 9.3. See “Computations and Statistical Details,” p. 169, for computational information.
Computations and Statistical Details
Pearson Product-Moment Correlation
The Pearson product-moment correlation coefficient measures the strength of the linear relationship between two variables. For response variables X and Y, it is denoted as r and computed as

r = Σ(x − x̄)(y − ȳ) / √( Σ(x − x̄)^2 · Σ(y − ȳ)^2 )
If there is an exact linear relationship between two variables, the correlation is 1 or –1, depending on whether the variables are positively or negatively related. If there is no linear relationship, the correlation tends toward zero.
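The formula translates directly into code. This small Python function is only an illustration of the definition, not the routine JMP uses; it assumes two equal-length sequences with no missing values.

import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))   # close to 1: a strong positive linear relationship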
Nonparametric Measures of Association For the Spearman, Kendall, or Hoeffding correlations, the data are first ranked. Computations are then performed on the ranks of the data values. Average ranks are used in case of ties. Spearman’s ρ (rho) Coefficients Spearman’s ρ correlation coefficient is computed on the ranks of the data using the formula for the Pearson’s correlation previously described. Kendall’s τb Coefficients Kendall’s τb coefficients are based on the number of concordant and discordant pairs. A pair of rows for two variables is concordant if they agree in which variable is greater. Otherwise they are discordant, or tied. The formula
∑ sgn ( x i – xj ) sgn ( y i – y j ) i <j τ b = --------------------------------------------------------------( T0 – T1 ) ( T0 – T2 ) computes Kendall’s τb where T0 = ( n ( n – 1 ) ) ⁄ 2 , T1 =
∑ ( ( ti ) ( ti – 1 ) ) ⁄ 2 , and
T2 =
∑ ( ( ui ) ( ui – 1 ) ) ⁄ 2 ,
Note that sgn ( z ) is equal to 1 if z > 0 , 0 if z = 0 , and –1 if z < 0 . The ti (the ui) are the number of tied x (respectively y) values in the ith group of tied x (respectively y) values, n is the number of observations, and Kendall’s τb ranges from –1 to 1. If a weight variable is specified, it is ignored.
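A direct, O(n^2) transcription of the τb formula, including the tie corrections T1 and T2, looks like the following. It is an illustrative Python sketch of the definition, not JMP’s algorithm (which uses the ranking-and-interchange approach described next).

import math
from collections import Counter

def kendall_tau_b(x, y):
    n = len(x)
    sgn = lambda z: (z > 0) - (z < 0)
    s = sum(sgn(x[i] - x[j]) * sgn(y[i] - y[j])
            for i in range(n) for j in range(i + 1, n))
    t0 = n * (n - 1) / 2
    t1 = sum(t * (t - 1) / 2 for t in Counter(x).values())   # tied groups in x
    t2 = sum(u * (u - 1) / 2 for u in Counter(y).values())   # tied groups in y
    return s / math.sqrt((t0 - t1) * (t0 - t2))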
Computations proceed in the following way:
• Observations are ranked in order according to the value of the first variable.
• The observations are then re-ranked according to the values of the second variable.
• The number of interchanges of the first variable is used to compute Kendall’s τb.
Hoeffding’s D Statistic
The formula for Hoeffding’s D (1948) is

D = 30 [ (n − 2)(n − 3) D1 + D2 − 2(n − 2) D3 ] / [ n(n − 1)(n − 2)(n − 3)(n − 4) ]

where
D1 = Σ_i (Q_i − 1)(Q_i − 2)
D2 = Σ_i (R_i − 1)(R_i − 2)(S_i − 1)(S_i − 2)
D3 = Σ_i (R_i − 2)(S_i − 2)(Q_i − 1)
The Ri and Si are ranks of the x and y values, and the Qi (sometimes called bivariate ranks) are one plus the number of points that have both x and y values less than the ith points. A point that is tied on its x value or y value, but not on both, contributes 1/2 to Qi if the other value is less than the corresponding value for the ith point. A point tied on both x and y contributes 1/4 to Qi. When there are no ties among observations, the D statistic has values between –0.5 and 1, with 1 indicating complete dependence. If a weight variable is specified, it is ignored.
Inverse Correlation Matrix
The inverse correlation matrix provides useful multivariate information. The diagonal elements of the inverse correlation matrix, sometimes called the variance inflation factors (VIF), are a function of how closely the variable is a linear function of the other variables. Specifically, if the correlation matrix is denoted R and the inverse correlation matrix is denoted R⁻¹, the ith diagonal element is denoted r^ii and is computed as

r^ii = VIF_i = 1 / (1 − R_i^2)

where R_i^2 is the coefficient of determination from the model regressing the ith explanatory variable on the other explanatory variables. Thus, a large r^ii indicates that the ith variable is highly correlated with some combination of the other variables. Note that the definition of R^2 changes for no-intercept models. For no-intercept and hidden-intercept models, JMP uses the R^2 from the uncorrected sum of squares (that is, from the zero model) rather than from the corrected sum of squares (the mean model).
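The correspondence between the diagonal of the inverse correlation matrix and the VIFs can be checked numerically. The sketch below uses NumPy (an assumption about your environment) and simulated data; it is not JMP code.

import numpy as np

def inverse_correlation_diagonal(data):
    # data: n x p array with complete rows; returns the VIFs, i.e. 1 / (1 - R_i^2).
    corr = np.corrcoef(data, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
third = x[:, 0] + 0.1 * rng.normal(size=100)       # nearly a linear function of column 1
print(inverse_correlation_diagonal(np.column_stack([x, third])))   # large VIFs for columns 1 and 3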
10 Importing, Exporting, and Charting Data
This chapter shows you how to use JMP to interact with the rest of the world. It illustrates ways of importing data from various formats into JMP for analysis. No statistics package is useful unless its results can be communicated to others. Graphs and charts are usually used to summarize results, so this chapter also describes copying and pasting results from JMP Student Edition into word processors, presentation managers, or web authoring tools. In addition to techniques for accomplishing these tasks, the Chart and Overlay Plot platforms are documented.
Introduction After starting JMP Student Edition, Open the data file Denim.jmp. Details about this data are found in “About the Data,” p. 21 in the “The Distribution Platform” chapter. In this introduction, some of the variables’ statistics are charted.
Using the Chart Platform To begin with, produce a bar chart showing the mean starch content and the maximum thread wear for each of the wash methods. Select Graph > Chart from the menu bar. This brings up a dialog like the one in Figure 10.7 on page 181, shown later in this chapter. Select Starch Content (%) from the list of columns. Click on the Statistics button and select Mean from the drop-down list. Select Thread Wear Measured from the list of columns. Click on the Statistics button and select Max. Select Method from the list of variables. Click the Categories, X Levels button.
Click OK. A bar chart like the one in Figure 10.1 appears. Figure 10.1 Bar Chart
The Chart platform makes it easy to format this chart, or even change to another chart type. Right-click on Mean(Starch Content (%)) in the legend to the right of the plot, select Overlay Color, and choose a color from the resulting palette. The colors of the bars for starch content change to the selected color. Note that although JMP Student Edition allows a great many chart options to be changed, you need not change them all: simple charts are almost always the most effective. With Mean(Starch Content (%)) still selected: From the drop-down list in the title bar next to Chart, select Y Options > Needle Plot. The bars for Mean(Starch Content (%)) change to a needle chart, as shown in Figure 10.2.
Figure 10.2 Half Needle Chart
Click somewhere in the blank area above the legend to deselect all columns. From the drop-down list in the title bar next to Chart, select Y Options > Line Chart.
The entire chart changes to a line plot. Thus, charting options can be applied to entire charts as well as to individual levels.
Using the Overlay Plot Platform In this example, two charts are produced: • A plot of starch content and thread wear against wash method. • A plot of starch content versus thread wear for each level of wash method. Select Graph > Overlay Plot from the menu bar. Assign Thread Wear Measured and Starch Content (%) to the Y role, and Method to the Categories, X Level role.
Click OK. This produces a plot with both variables plotted on the y-axis. To connect the points and produce the plot shown in Figure 10.3, From the drop-down list in the title bar next to Overlay Plot, select Y Options > Connect Points. Figure 10.3 Final Overlay Plot
To produce the plot of starch content versus load size for each level of wash method:
Again select Graph > Overlay Plot from the menu bar. Select Starch Content (%) from the list of columns and click the Y button. Select Size of Load from the list of columns and click the X button. Select Method from the list of columns and click the By button. This produces three separate plots, one for each level of the Method variable. The graphs shown here have been reduced in size by holding down the Control or ⌘ key and dragging on the corner of one graph; they all resize together.
Importing Data
The File > Open command displays a specialized open file dialog used to locate a file to open and tell JMP Student Edition the file format of the incoming file. The Open command then reads the file into a JMP Student Edition data table. JMP Student Edition directly reads JMP data tables, JMP journal files, JMP Script files, SAS transport files, text files with any column delimiter, Excel files, and flat-file database files.
Windows The Files of Type selection filters the list of files displayed in the dialog. If applicable, JMP Student Edition gives additional information about the file in the File Open dialog. The example in Figure 10.4 shows an Open Data File dialog when the Files of Type drop-down list is changed from the default to JMP data table. The dialog shows the table notes, if they exist. If *.* is chosen from the Files of Type menu, JMP looks at the type of file given by the 3-character extension appended to its file name and opens it accordingly. This works as long as the file has the structure indicated by its name.
Figure 10.4 The Open Data File Dialog to Read a JMP table
Macintosh The Open window allows you to open any type of file into JMP. If you select a text document to open, the Open As menu appears in the window (see the next section for details on text importing).
Importing Text Files
You import text in one of the following ways, depending on the operating system you are working with.
Windows
Under Microsoft Windows, choose Text from the Files of Type drop-down list to import the data based on your current preference settings. Select the Data With Preview checkbox to see a preview of an incoming text file. You then see the dialog shown in Figure 10.5 for specification of delimiters and other import details.
Macintosh
On the Macintosh, select a text file to open. This displays the Open As menu in the lower part of the dialog. Then choose one of the text commands:
• Text opens the file in a simple text editing window.
• Data (Best Guess) opens the data in a format that JMP thinks is appropriate based on the contents of the file.
• Data (Using Preview) presents a dialog similar to the one shown in Figure 10.5 that lets you designate delimiters and other information, and shows a preview of the resulting data table.
• Data (Using Preferences) opens the file and uses default rules (set in the preferences panel) to interpret end-of-field and end-of-line delimiters to create a JMP data table.
All Platforms
JMP Student Edition attempts to discern the arrangement of text data. This is adequate for a rectangular text file with no missing fields, a consistent field delimiter, and an end-of-line delimiter.
Note: If double-quotes are encountered when importing text data, JMP changes the delimiter rules to look for an end double-quote. Other text delimiters, including spaces embedded within the quotes, are ignored and treated as part of a text string.
The initial settings in the delimited import dialog are taken from the current Preferences file. The dialog also shows the column names, data types, and the first two rows of data. In Figure 10.5, preferences are set that indicate the incoming table contains column headers to be used as the column names; the column names are name, age, sex, and height. If no column names are indicated, the Name fields are called Column 1, Column 2, and so on. One or more end-of-field delimiters, end-of-line delimiters, the option to Strip enclosing quotes, and the ability to set how many rows and columns will be read are additional options presented in the dialog. Figure 10.5 Import Text File
Fixed-Width Text
If your data is fixed width (that is, each variable uses a set number of columns in the text file), click the Try Fixed Width button to specify the separations of each column to be imported.
Importing Microsoft Excel Files
JMP Student Edition can directly import Microsoft Excel worksheets and workbooks under Macintosh and Microsoft Windows. Excel worksheets and workbooks are imported simply by choosing Excel Files from the Files of Type (Windows) or Enable (Macintosh) lists, as shown in Figure 10.6.
Figure 10.6 Excel Open Choice
JMP Student Edition can also import Excel workbooks that contain several data tables inside them. After selecting Excel Files(*.xls) from the Open dialog and double-clicking on the desired workbook, JMP Student Edition opens all the worksheets in the workbook.
Results from Platforms
The results from JMP Student Edition’s platforms can be cut and pasted into other programs using the system’s clipboard, through standard cut and paste facilities. To copy results into another program:
Select the Selection tool.
Hold down the Shift key and click on each part of the report that needs to be copied. Note that axes frequently need to be selected in addition to the graphs they accompany.
Select Edit > Copy.
In the word processor, select Edit > Paste.
The Chart Platform The Chart platform computes and plots data and statistics about the data. Unlike the statistical platforms in JMP, the Chart platform is not intended as an exploratory device. It is used to report the results from other explorations. For that reason, the plots in the Chart platform do not “bristle with interactivity” to the degree that other platforms do. In essence, what is going to be reported should be known before the Chart platform is used. To plot descriptions of data, complete the following steps after bringing up the Chart launch dialog (Figure 10.7) Figure 10.7 Chart Launch Dialog
Select the data column in the column list. Click the Statistics button (revealing the menu shown in Figure 10.8) and select the statistic to be charted. Optionally, include an X, Level column to be plotted on the horizontal axis, or a Grouping variable to generate separate graphs for each level of the column, either in separate windows or overlaid in the same window. Weight, Freq, and By options work as in other platforms. From the section labeled Options, use the drop-down list to select the orientation of the chart (vertical or horizontal) and the type of chart (bar, line, pie, needle, or point chart) to be generated. Don’t worry too much about getting the orientation and chart type correct initially — they can be changed after the chart has been generated.
Click OK. An example bar chart, plotting the mean of Starch Content (%) using Method as Categories, X Levels is shown in Figure 10.9.
Figure 10.8 Plottable Statistics
Figure 10.9 Chart Example
Many of the options on the launch dialog can be changed using the platform popup menu. Platform options affect all charts in the report window. However, some options can be applied to individual charts.
Single-Chart Options To apply the following options to individual charts, right-click on a chart legend to see the popup menu shown in Figure 10.10. When this menu is accessed through a chart legend, the commands apply only to the individual chart. When accessed through the platform popup menu, without any legends highlighted, the commands apply to all charts. Figure 10.10 Chart Options
• Bar Chart displays a bar for each level of the chart variables. The default chart is a bar chart. • Line Chart replaces a bar chart with a line chart and connects each point with a straight line. Choose the Show Points option to show or hide the points. • Needle Chart replaces each bar with a line drawn from the axis to the plotted value. • Point Chart shows only the plot points, without connecting them. • Show Points toggles the point markers on a line or needle chart off and on. • Connect Points toggles the line connecting points on and off, leaving a point chart. • Std Error Bars overlays needles at plus or minus one standard error from the mean. • Overlay Color assigns color to a variable to identify it when overlaid with other charts. • Overlay Marker assigns plot points a marker to identify them in overlaid charts. • Overlay Pattern assigns bars a fill pattern to identify them in overlaid charts. • Pen Style allows the choice of a line style from the palette shown in Figure 10.10.
Frame Options
Frame options allow control of the plot frame’s elements as a whole.
Figure 10.11 Frame Options
• Background Color colors the background of the plot with the color chosen from the JMP Student Edition color palette. • Marker Size allows selection of the marker from a palette of six point sizes that range from dot to very large. The Preferred Size is set in the Preferences. The marker size applies to the plot point and its associated rows. • Marker Drawing Mode lets you adjust the way markers are drawn. Fast mode is useful for graphs with large number of points, so that they re-draw quickly after adjustments. Outlined mode is often used in presentations. • Border lets you turn border lines on or off. • Size/Scale allows changes in axes and plot frames. X Scale is only active on platforms where the X axis is numeric (the X axis for the Charts platform is categorical). Y Scale shows the standard axis scale dialog, which can also be shown by double-clicking in the Y axis. The axis scale dialog allows the specification of the maximum, minimum, increment, and tick marks for the axis, and draws tailored reference lines or a grid on the plot. The Frame Size command displays a dialog to enter the exact pixel size for the plot frame. • Add Graphics Script displays a text entry box to enter JSL commands, usually to tailor the graphics output in ways not provided with commands and options. See the JMP Scripting Language Guide for documentation of JSL commands. • DisplayBox lists the commands to conveniently select and deselect the plot area, and redraw the plot.
Level Options Click on a value in the legend to highlight it. Right-clicking on a highlighted legend shows the commands for the Colors and Markers palettes, which are identical to their Overlay equivalents in Figure 10.10. The commands affect only the highlighted level and its associated rows in the data table.
Platform Options After the charts appear, their drop-down menu has the following options. • Overlay displays a single overlaid chart when the chart has more than one Y (statistics) variable. Each chart can have its own chart type, which can be overlaid. For example, the chart shown in Figure 10.12 has two overlaid variables, with one as a bar chart and one as a needle chart. When Overlay is not checked, the platform shows duplicate axis notation for each chart. Figure 10.12 Different Chart Types
• Vertical changes a horizontal bar chart or a pie chart to a vertical bar chart. • Horizontal changes a vertical bar chart or a pie chart to a horizontal bar chart • Pie changes a horizontal or vertical chart into a pie chart. • Y Options accesses the options described in “Single-Chart Options,” p. 182. These options affect the variable whose legend is highlighted. • Level Options accesses the color, marker, and pattern palettes options described in “Level Options,” p. 184. These options affect highlighted bars. • Separate Axes duplicate the axis notation for each chart when there are multiple charts. By default, the axis notation only shows for the last chart displayed if the charts are not overlaid. Separate Axes is only enabled on the menu when there are multiple Y variables that are not overlaid. • Script has a submenu of commands available to all platforms that redo the analysis, or save the JSL commands for the analysis to a window or a file. (See “Script Submenu,” p. 66.)
The Overlay Plot Platform
The Overlay Plot platform overlays numeric Y variables against a single numeric or character X variable. Optionally, the values of the X variable appear in ascending order, with points plotted and connected in that order. The Overlay Plot platform has platform plotting options accessed by the popup menu icon on the Overlay Plot title bar. There is also a single-plot options menu for each Y variable, which shows when the Y variable legend beneath the plot is right-clicked. The individual plot options are the same as those in the Y Options submenu at the platform level. When one of these options is selected at the platform level, it affects all plots in the report if no legends are highlighted. If one or more plot legends are highlighted, the option affects all those plots.
Platform Options Platform options affect every plot in the report window. • Overlay overlays plots for all columns assigned the Y role. Plots initially appear overlaid with the Connect Points option in effect. When Overlay option is turned off, the plots show separately. • Separate Axes lets the X axis scale values be printed only once, on the last plot in the window. When Separate Axes is selected, the X axes for other Y variables show tick marks but show no scale values. • Uniform Y Scale makes the Y scales the same on grouped plots. • Connect Through Missing connects adjacent points in the plot, regardless if there are missing values between them. • Range Plot connects the lowest and highest points at each X value with a line with bars at each end. Note: The Needle option, described below, and Range option cannot be selected at the same time. • Y Options has a submenu of options that apply to all variables and plots in the report window when selected from the main platform window. The section “Single-Plot Options,” p. 186, describes Y options for each individual variable. • Ungroup Plots creates a separate chart for each level of a grouping variable • Arrange Plots allows you to specify the number of plots in each row • Script has a submenu of commands available to all platforms that redo the analysis or save the JSL commands for the analysis to a window or a file. (See “Script Submenu,” p. 66.)
Single-Plot Options Each Y variable is labeled beneath the plot, showing its name and symbol. Each Y variable’s plot can be modified by right-clicking on the variable name to bring up a menu. • Show Points alternately shows or hides points. • Connect Points is a toggle that alternately connects the points with lines. Connect Points can be activated when Show Points is not, allowing for greater flexibility in plot displays. • Needle draws a vertical line from each point to the X axis.
• Step joins the position of the points with a discrete step by drawing a straight horizontal line from each point to the X value of the following point, and then a straight vertical line to that point. Step, without showing points, is illustrated in Figure 10.13. Note: Only one of Connect Points, Needle, and Step can be chosen at a time. Figure 10.13 Overlay Step Plot
• Function Plot plots a formula (stored in the Y column) as a smooth curve. To use this function, store a formula in the Y column that is a function of a single X column. For example, the following column contains a formula involving the sine function. When used in an overlay plot, the function is plotted as a curve rather than individual points.
Note: Overlay Plot normally assumes you want a function plot when the Y column contains a formula. However, formulas that contain random number functions are more frequently used with simulations, where function plotting is not often wanted. Therefore, the Function Plot option is off (by default) when a random number function is present, but on for all other functions. • Connect Color displays the standard JMP color palette (see Figure 10.10) for assigning colors to lines that connect points. • Overlay Marker assigns markers to plotted points using the standard JMP marker palette (see Figure 10.10). • Overlay Marker Color lets you select the color of the overlay marker. • Line Style and Line Width let you adjust the appearance of the lines drawn on the plot.
11 Full Factorial Designs Designing Experiments A full factorial design contains all possible combinations of a set of factors. This is the most conservative design approach, but it is also the most costly in experimental resources. The full factorial designer supports both continuous factors and categorical factors with up to nine levels. In full factorial designs, you perform an experimental run at every combination of the factor levels. The sample size is the product of the numbers of levels of the factors. For example, a factorial experiment with a two-level factor, a three-level factor, and a four-level factor has 2 x 3 x 4 = 24 runs. Factorial designs with only two-level factors have a sample size that is a power of two (specifically 2f where f is the number of factors). When there are three factors, the factorial design points are at the vertices of a cube as shown in the diagram below. For more factors, the design points are the vertices of a hypercube. Full factorial designs are the most conservative of all design types. There is little scope for ambiguity when you are willing to try all combinations of the factor settings. Unfortunately, the sample size grows exponentially in the number of factors, so full factorial designs are too expensive to run for most practical purposes.
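The run-count arithmetic (every combination of every factor’s levels) is easy to illustrate outside of JMP. The Python sketch below uses the factor names from the example that follows, but the levels shown are made up; it is a sketch of the combinatorial idea, not the JMP full factorial designer.

from itertools import product

factors = {
    "Catalyst":    ["low", "high"],           # two levels
    "Stir Rate":   [60, 80, 100],             # three levels
    "Temperature": [120, 140, 160, 180],      # four levels
}

runs = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(runs))    # 2 x 3 x 4 = 24 runs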
Introduction
The following example, adapted from Meyer et al. (1996) and Box, Hunter, and Hunter (1978), shows a five-factor reactor example. To follow along with this example, open the folder Sample Data that was installed when you installed JMP. Within this folder, open Design Experiment > Reactor 32 Runs.jmp. Suppose you have used the screening designer to investigate the effects of five factors on the percent reaction of a chemical process. The factors (Feed Rate, Catalyst, Stir Rate, Temperature, and Concentration) are all two-level continuous factors.
1 Select DOE > Full Factorial Design.
2 Click the red triangle icon on the Full Factorial Design title bar and select Load Responses.
3 To load the responses, open the Sample Data folder that was installed with JMP and open Design Experiment > Reactor Response.jmp.
4 Click the red triangle icon on the Full Factorial Design title bar and select Load Factors.
5 To load the factors, open the Sample Data folder that was installed with JMP and open Design Experiment > Reactor Factors.jmp.
The completed dialog is shown in Figure 11.1. Figure 11.1 Full-Factorial Example Response and Factors Panels
6 Click Continue to see the Output Options panel. A full factorial design includes runs for all combinations of high and low factors for the five variables, giving 32 runs.
7 Click Make Table. The design data table (Figure 11.3) contains a run for every combination of high and low values for the five variables. Since there are five variables, there are 2^5 = 32 runs. This covers all combinations of five factors with two levels each. Initially, the table has an empty Y column named Percent Reacted for entering response values when the experiment is complete. The values in your table may be different from those shown below.
Figure 11.2 2^5 Factorial Reactor Data
To see the completed experiment and continue following this example, open the folder Sample Data that was installed when you installed JMP. Within this folder, open Design Experiment > Reactor 32 Runs.jmp.
Figure 11.3 Reactor 32 Runs.jmp
Begin the analysis with a quick look at the data before fitting the factorial model. 1 Select Analyze > Distribution. 2 Highlight Percent Reacted and click Y, Columns. Then click OK. 3 Click the red triangle icon on the Percent Reacted title bar and select Normal Quantile Plot. The results are shown in Figure 11.4. Figure 11.4 Distribution of Response Variable for Reactor Data
Start the formal analysis with a stepwise regression. The data table has a script stored with it that automatically defines an analysis of the model that includes main effects and all two-factor interactions, and brings up the Stepwise control panel. 1 Click the red triangle icon next to the Fit Model script and select Run Script. 2 The probability to enter a factor (Prob to Enter) in the model should be 0.05. 3 The probability to remove a factor (Prob to Leave) should be 0.1.
Figure 11.5 Run JSL Script for Stepwise Regression
4 A useful way to use the Stepwise platform is to check all the main effects in the Current Estimates table. To do this, make sure the menu beside Direction specifies Mixed. 5 Check the boxes for the main effects of the factors as shown in Figure 11.6. 6 Click Go. Figure 11.6 Starting Model For Stepwise Process
The mixed stepwise procedure removes insignificant main effects and adds important interactions. The end result is shown in Figure 11.7. Note that the Feed Rate and Stir Rate factors are no longer in the model. Figure 11.7 Model After Mixed Stepwise Regression
7 Click the Make Model button. The Model Specification window that appears is automatically set up with the appropriate effects (Figure 11.8).
Figure 11.8 Fitting a Prediction Model
8 Click Run Model to see the analysis for a candidate prediction model (Figure 11.9). The figure on the left in Figure 11.9 shows the actual by predicted plot for the model. The predicted model covers a range of predictions from 40% to 95% reacted. The size of the random noise as measured by the RMSE is only 3.3311%, which is more than an order of magnitude smaller than the range of predictions. This is strong evidence that the model has good predictive capability. The figure on the right in Figure 11.9 shows a table of model coefficients and their standard errors (labeled Parameter Estimates). All effects selected by the stepwise process are highly significant. Figure 11.9 Actual by Predicted Plot and Prediction Model Estimates
The Prediction Profiler also gives you a way to compare the factors and find optimal settings. 1 Open the Prediction Profiler by clicking the red triangle on the Response Percent Reacted title bar and selecting Factor Profiling > Profiler, as shown in Figure 11.10.
Figure 11.10 Selecting the Profiler
Figure 11.11 shows the profiler’s initial display. Figure 11.11 Viewing the Profiler
2 Click the red triangle on the Prediction Profiler title bar and select Maximize Desirability to see the profiler in Figure 11.12. Figure 11.12 Viewing the Maximum Desirability
The plot of Desirability versus Percent Reacted shows that the goal is to maximize Percent Reacted. The reaction is not economically feasible unless the Percent Reacted is above 90%; therefore, the Desirability for values less than 90% decreases and finally becomes zero. Desirability increases linearly as the Percent Reacted increases. The maximum Desirability is 0.945 when Catalyst and Temperature are at their highest settings and Concentration is at its lowest setting. Percent Reacted increases from 65.5 at the center of the factor ranges to 95.875 at the most desirable setting.
Creating a Factorial Design
To start a full factorial design, select DOE > Full Factorial Design, or click the Full Factorial Design button on the JMP Starter DOE page. Then, follow the steps below.
Entering Responses and Factors
To enter responses, follow the steps in Figure 11.13. Then, enter factors as shown in Figure 11.14.
Figure 11.13 Entering Responses
1 To enter one response at a time, click then select a goal type: Maximize, Match Target, Minimize, or None.
2 Double-click to edit the response name, if desired.
3 Click to change the response goal, if desired.
4 Click to enter lower and upper limits and importance weights.
Tip: To quickly enter multiple responses, click the N Responses button and enter the number of responses you want.
Figure 11.14 Entering Factors in a Full Factorial Design
To enter factors, click either the Continuous button or the Categorical button and select a factor type with 2 to 9 levels.
Double-click to edit the factor name.
Click to enter values or change the level names. To remove a level, click it, press the Delete key on the keyboard, then press the Return or Enter key on the keyboard.
When you finish adding factors, click Continue.
Selecting Output Options
Use the Output Options panel to specify how you want the output data table to appear:
• Run Order—Lets you designate the order you want the runs to appear in the data table when it is created. Choices are:
Keep the Same—the rows (runs) in the output table appear as they do in the Design panel.
Sort Left to Right—the rows (runs) in the output table appear sorted from left to right.
Randomize—the rows (runs) in the output table appear in a random order.
Sort Right to Left—the rows (runs) in the output table appear sorted from right to left.
• Number of Center Points—Specifies additional runs placed at the center of each continuous factor’s range.
• Number of Replicates—Specifies the number of times to replicate the entire design, including center points. Type the number of times you want to replicate the design in the associated text box. One replicate doubles the number of runs.
Making the Table
When you click Make Table, the table shown in Figure 11.15 appears.
Figure 11.15 Factorial Design Table
The name of the table is the design type that generated it.
The table includes a script that allows you to easily fit a model using the values in the design table.
The values in the Pattern column describe the run each row represents: for continuous factors, a plus sign represents a high level and a minus sign represents a low level; level numbers represent the values of categorical factors.
12 Screening Designs
Designing Experiments
Screening designs are arguably the most popular designs for industrial experimentation. They examine many factors to see which have the greatest effect on the results of a process. Compared to other design methods, screening designs require fewer experimental runs, which makes them a cheap and efficient way to begin improving a process. Often screening designs are a prelude to further experiments. It is wise to spend only about a quarter of your resource budget on an initial screening experiment; you can then use the results to guide further study.
The efficiency of screening designs depends on the critical assumption of effect sparsity. Effect sparsity results because real-world processes usually have only a few driving factors; other factors are relatively unimportant. To understand the importance of effect sparsity, you can contrast screening designs to full factorial designs:
• Full factorial designs consist of all combinations of the levels of the factors. The number of runs is the product of the factor levels. For example, a factorial experiment with a two-level factor, a three-level factor, and a four-level factor has 2 x 3 x 4 = 24 runs.
• By contrast, screening designs reduce the number of runs by restricting the factors to two (or three) levels and by performing only a fraction of the full factorial design. Each factor in a screening design is usually set at two levels to economize on the number of runs needed, and response measurements are taken for only a fraction of the possible combinations of levels. In the case described above, you can restrict the factors to two levels, which yields 2 x 2 x 2 = 8 runs. Further, by doing half of these eight combinations you can still assess the separate effects of the three factors. So the screening approach reduces the 24-run experiment to four runs.
Of course, there is a price for this reduction. This chapter discusses the screening approach in detail, showing both pros and cons. It also describes how to use JMP’s screening designer, which supplies a list of popular screening designs for two or more factors. These factors can be continuous or categorical, with two or three levels. The list of screening designs you can use includes designs that group the experimental runs into blocks of equal sizes, where the size is a power of two.
Introduction
Suppose an engineer wants to investigate a process that uses an electron beam welding machine to join two parts. The engineer fits the two parts into a welding fixture that holds them snugly together. A voltage applied to a beam generator creates a stream of electrons that heats the two parts, causing them
to fuse. The ideal depth of the fused region is 0.17 inches. The engineer wants to study the welding process to determine the best settings for the beam generator to produce the desired depth in the fused region.
For this study, the engineer wants to explore the following three inputs, which are the factors for the study:
• Operator, who is the technician operating the welding machine
• Rotation Speed, which is the speed at which the part rotates under the beam
• Beam Current, which is a current that affects the intensity of the beam
After each processing run, the engineer cuts the part in half. This reveals an area where the two parts have fused. The Length of this fused area is the depth of penetration of the weld. This depth of penetration is the response for the study.
The goals of the study are to:
• find which factors affect the depth of the weld
• quantify those effects
• find specific factor settings that predict a weld depth of 0.17 inches
To begin this example, select DOE > Screening Design from the main menu. Note that in the Responses panel, there is a single default response called Y. Change the default response as follows:
1 Double-click the response name and change it to Depth (In.).
2 The default goal for the single default response is Maximize, but the goal of this process is to get a target value of 0.17 inches with a lower bound of 0.12 and an upper bound of 0.22. Click the Goal text edit area and choose Match Target, as shown in Figure 12.1.
Figure 12.1 Screening Design Response With Match Target Goal
3 Click the Lower Limit text edit area and enter 0.12 as the lower limit (minimum acceptable value). Then click the Upper Limit text edit area and enter 0.22 as the upper limit (maximum acceptable value).
This example has one categorical factor (Operator) and two continuous factors (Speed and Current).
4 Add the categorical factor by clicking the Add button beside 2-Level Categorical.
5 Add two continuous factors by typing 2 in the Continuous box and clicking the associated Add button.
6 Double-click the factor names and rename them Operator, Speed, and Current.
7 Set high and low values for Speed to 3 and 5 rpm. Set high and low values for Current to 150 and 165 amps, and assign Mary and John as values for the categorical factor called Operator, as shown in Figure 12.2.
8 Click Continue.
9 Select Full Factorial in the list of designs, as shown in Figure 12.3, and then click Continue.
Figure 12.3 List of Screening Designs for Two Continuous and One Categorical Factors
When the design details are complete, click Make Table to create a JMP table that contains the specified design. The table in Figure 12.4 appears. The table uses the names for responses, factors, and levels you specified. The Pattern variable shows the coded design runs.
10 View the table produced in this example by selecting Help (View on the Macintosh) > Sample Data Directory > Design Experiment > DOE Example 1.jmp.
Figure 12.4 The Design Data Table
Creating a Screening Design
To start a screening design, select DOE > Screening Design, or click the Screening Design button on the JMP Starter DOE page. Then, follow the steps below.
Figure 12.2 Screening Design with Two Continuous and One Categorical Factor
Entering Responses
To enter responses, follow the steps in Figure 12.5.
Figure 12.5 Entering Responses
1 To enter one response at a time, click then select a goal type: Maximize, Match Target, Minimize, or None.
2 Double-click to edit the response name, if desired.
3 Click to change the response goal, if desired.
4 Click to enter lower and upper limits and importance weights.
Tip: To quickly enter multiple responses, click the N Responses button and enter the number of responses you want.
Specifying Goal Types and Lower and Upper Limits
When entering responses, you can tell JMP that your goal is to obtain the maximum or minimum value possible, to match a specific value, or that there is no goal. The following description explains the relationship between the goal type (step 3 in Figure 12.5) and the lower and upper limits (step 4 in Figure 12.5):
• For responses such as strength or yield, the best value is usually the largest possible. A goal of Maximize supports this objective.
• The Minimize goal supports an objective where the best value is the smallest possible, such as when the response is impurity or defects.
• The Match Target goal supports the objective where the best value for a response is a specific target value, such as with part dimensions. The default target value is assumed to be midway between the lower and upper limits.
Note: If your target range is not symmetric around the target value, you can alter the default target after you make a table from the design. In the data table, open the response’s Column Info dialog by double-clicking the column name, and enter an asymmetric target value.
Understanding Importance Weights
When computing overall desirability, JMP uses the value you enter as the importance weight (step 4 in Figure 12.5) as the weight of each response. If there is only one response, then specifying importance is unnecessary. With two responses you can give greater weight to one response by assigning it a higher importance value.
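One common way to combine per-response desirabilities with importance weights is a weighted geometric mean, in the spirit of Derringer and Suich; JMP's exact computation may differ in detail, but the sketch below shows the idea. A response with importance 2 counts twice as heavily as one with importance 1.

    def overall_desirability(desirabilities, importances):
        """Weighted geometric mean of per-response desirabilities in [0, 1]."""
        total_weight = sum(importances)
        product = 1.0
        for d, w in zip(desirabilities, importances):
            product *= d ** (w / total_weight)
        return product

    # Example: two responses, the first twice as important as the second.
    print(overall_desirability([0.9, 0.6], [2.0, 1.0]))   # about 0.79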
Entering Factors
After entering responses, enter factors. The Factors panel’s appearance depends on the design you select. Entering factors is the same in Screening Design, Space Filling Design, Mixture Design, and Response Surface Design. This process is described below, in Figure 12.6.
Figure 12.6 Entering Factors
To enter factors, type the number of factors and click Add. To remove a factor from the list, highlight it and click the Remove Selected button.
Double-click to edit the factor name.
Click to enter factor values. To remove a level, click it, press the Delete key on your keyboard, then press the Return or Enter key on your keyboard.
Types of Factors
In general, when designing experiments, you can enter different types of factors in the model. Below is a description of the factor types from which you can choose when creating screening designs:
• Continuous Continuous factors have numeric data types only. In theory, you can set a continuous factor to any value between the lower and upper limits you supply.
• Categorical Categorical factors (either numeric or character data types) have no implied order. If the values are numbers, the order is the numeric magnitude. If the values are character, the order is the sorting sequence. The settings of a categorical factor are discrete and have no intrinsic order. Examples of categorical factors are machine, operator, and gender.
After your responses and factors are entered, click Continue.
Choosing a Design
The list of screening designs you can use includes designs that group the experimental runs into blocks of equal sizes where the size is a power of two. Highlight the type of screening design you would like to use and click Continue.
Figure 12.7 Choosing a Type of Screening Design
The screening designer provides the following types of designs:
Two-Level Full Factorial
A full factorial design contains all combinations of the levels of the factors. The sample size is the product of the levels of the factors. For two-level designs, this is 2^k, where k is the number of factors. This can be expensive if the number of factors is greater than 3 or 4.
These designs are orthogonal. This means that the estimates of the effects are uncorrelated. If you remove an effect in the analysis, the values of the other estimates remain the same. Their p-values change slightly, because the estimate of the error variance and the degrees of freedom are different.
Full factorial designs allow the estimation of interactions of all orders up to the number of factors. Most empirical modeling involves first- or second-order approximations to the true functional relationship between the factors and the responses. The figure to the left in Figure 12.8 is a geometric representation of a two-level factorial.
Two-Level Fractional Factorial
A fractional factorial design also has a sample size that is a power of two. If k is the number of factors, the number of runs is 2^(k – p), where p < k. The fraction of the full factorial is 2^(–p). Like the full factorial, fractional factorial designs are orthogonal.
The trade-off in screening designs is between the number of runs and the resolution of the design. If price is no object, you can run several replicates of all possible combinations of m factor levels. This provides a good estimate of everything, including interaction effects to the mth degree. But because running experiments costs time and money, you typically only run a fraction of all possible levels. This causes some of the higher-order effects in a model to become nonestimable. An effect is nonestimable when it is confounded with another effect. In fact, fractional factorials are designed by deciding in advance which interaction effects are confounded with the other interaction effects.
Resolution Number: The Degree of Confounding
In practice, few experimenters worry about interactions higher than two-way interactions. These higher-order interactions are assumed to be zero. Experiments can therefore be classified by resolution number into three groups:
• Resolution = 3 means that main effects are confounded with one or more two-way interactions, which must be assumed to be zero for the main effects to be meaningful.
• Resolution = 4 means that main effects are not confounded with other main effects or two-factor interactions. However, two-factor interactions are confounded with other two-factor interactions.
• Resolution ≥ 5 means there is no confounding between main effects, between two-factor interactions, or between main effects and two-factor interactions.
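As a quick arithmetic check of the run counts discussed above (an illustration, not JMP output): a two-level full factorial in k factors needs 2^k runs, and a 2^(k – p) fractional factorial keeps only a 1/2^p fraction of them.

    k = 5                # number of two-level factors
    p = 2                # fraction exponent: keep a 1/2**p fraction of the runs
    print(2 ** k)        # 32 runs in the full factorial
    print(2 ** (k - p))  # 8 runs in the 2^(5-2) fractional factorial
    print(2.0 ** -p)     # 0.25: the fraction of the full factorial that is run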
All the fractional factorial designs are minimum aberration designs. For DOE experts, the minimum aberration design of a given resolution minimizes the number of words in the defining relation that are of minimum length. The figure on the right in Figure 12.8 is a geometric representation of a two-level fractional factorial design.
Figure 12.8 Representation of Full Factorial (Left) and Two-Level Fractional Factorial (Right) Designs
Plackett-Burman Designs
Plackett-Burman designs are an alternative to fractional factorials for screening. One useful characteristic is that the sample size is a multiple of four rather than a power of two. There are no two-level fractional factorial designs with sample sizes between 16 and 32 runs. However, there are 20-run, 24-run, and 28-run Plackett-Burman designs. The main effects are orthogonal and two-factor interactions are only partially confounded with main effects. This is different from resolution-three fractional factorials, where two-factor interactions are indistinguishable from main effects. In cases of effect sparsity, a stepwise regression approach can allow for removing some insignificant main effects while adding highly significant and only somewhat correlated two-factor interactions.
Mixed-Level Designs
If you have qualitative factors with three values, then none of the classical designs discussed previously are appropriate. For pure three-level factorials, JMP offers fractional factorials. For mixed two-level and three-level designs, JMP offers complete factorials and specialized orthogonal-array designs, listed in Table 12.1.
Table 12.1
Design            Two-Level Factors   Three-Level Factors
L18 John          1                   7
L18 Chakravarty   3                   6
L18 Hunter        8                   4
L36               11                  12
If you have no more factors than the number listed for a design in the table, you can use that design by selecting an appropriate subset of columns from the original design. Some of these designs are not balanced, even though they are all orthogonal.
Cotter Designs
Cotter designs are used when you have very few resources and many factors, and you believe there may be interactions. Suppose you believe in effect sparsity—that very few effects are truly nonzero. You believe in this so strongly that you are willing to bet that if you add up a number of effects, the sum will show an effect if it contains an active effect. The danger is that several active effects with mixed signs will cancel and still sum to near zero, giving a false negative.
Cotter designs are easy to set up. For k factors, there are 2k + 2 runs. The design is similar to the “vary one factor at a time” approach many books call inefficient and naive. A Cotter design begins with a run having all factors at their high level. Then follow k runs, each with one factor in turn at its low level and the others high. The next run sets all factors at their low level, and the design sequences through k more runs with one factor high and the rest low. This completes the Cotter design, subject to randomizing the runs.
When you use JMP to generate a Cotter design, JMP also includes a set of extra columns to use as regressors. These are of the form factorOdd and factorEven, where factor is a factor name. They are constructed by adding up all the odd and even interaction terms for each factor. For example, if you have three factors, A, B, and C:
Figure 12.9
AOdd = A + ABC    AEven = AB + AC
BOdd = B + ABC    BEven = AB + BC
COdd = C + ABC    CEven = BC + AC
Because these columns in a Cotter design make an orthogonal transformation, testing the parameters on these combinations is equivalent to testing the combinations on the original effects. In the example of factors listed above, AOdd estimates the sum of odd terms involving A, AEven estimates the sum of the even terms involving A, and so forth. Because Cotter designs have a false-negative risk, many statisticians discourage their use.
How to Run a Cotter Design
By default, JMP does not include a Cotter design in the list of available screening designs (Figure 12.7). However, if you would like to make a Cotter design:
1 Immediately after entering responses and factors (and before clicking Continue), click the red triangle icon in the Screening Design title bar.
2 Select Suppress Cotter Designs.
Changing the setting via the red triangle menu applies only to the current design. To alter the setting for all screening designs:
1 Select File > Preferences.
2 Click the Platform icon.
3 Click DOE to highlight it.
4 Uncheck the box beside Suppress Cotter Designs.
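The run pattern described above (all factors high, then each factor low in turn, then all factors low, then each factor high in turn) is simple enough to sketch directly. This Python snippet is an illustration only, not the JMP generator, and it omits the run randomization and the factorOdd/factorEven regressor columns:

    def cotter_design(k):
        """Return the 2k + 2 coded runs of a Cotter design for k two-level factors
        (+1 = high, -1 = low), before randomization."""
        runs = [[+1] * k]                                   # all factors high
        for i in range(k):                                  # one factor low at a time
            runs.append([-1 if j == i else +1 for j in range(k)])
        runs.append([-1] * k)                               # all factors low
        for i in range(k):                                  # one factor high at a time
            runs.append([+1 if j == i else -1 for j in range(k)])
        return runs

    for run in cotter_design(3):   # 2*3 + 2 = 8 runs for three factors
        print(run)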
Displaying and Modifying the Design
After you select a design type, click the disclosure buttons to display the design and show modification options. Use the Display and Modify Design panel to tailor the design (Figure 12.10).
Figure 12.10 Display and Modification Options
• Change Generating Rules—Controls the choice of different fractional factorial designs for a given number of factors.
• Aliasing of Effects—Shows the confounding pattern for fractional factorial designs.
• Coded Design—Shows the pattern of high and low values for the factors in each run.
Aliasing of Effects
To see which effects are confounded with which other effects, click the disclosure button to reveal the Aliasing of Effects panel. It shows effects and confounding up to two-factor interactions (Figure 12.11).
Figure 12.11 Generating Rules and Aliasing of Effects Panel
For example, a full factorial with five factors requires 2^5 = 32 runs. Eight runs can only accommodate a full factorial with three two-level factors. It is necessary to construct the two additional factors in terms of the first three factors. The price of reducing the number of runs from 32 to eight is effect aliasing (confounding). Confounding is the direct result of the assignment of new factor values to products of the coded design columns.
For example, the values for Temperature are the product of the values for Feed Rate and Concentration. This means that you cannot tell the difference between the effect of Temperature and the synergistic (interactive) effect of Feed Rate and Concentration. In the example shown in Figure 12.11, all the main effects are confounded with two-factor interactions. This is characteristic of resolution-three designs.
Viewing the Confounding Pattern
JMP can create a data table that shows the aliasing pattern for a specified level. To create this table:
1 Click the red triangle at the bottom of the Aliasing of Effects area.
2 Select Show Confounding Pattern (Figure 12.12).
Figure 12.12 Show Confounding Patterns
3 Enter the order of confounding you want to see (Figure 12.13). Figure 12.13 Enter Order
4 Click OK. Figure 12.14 shows the third level alias for the five-factor reactor example. The effect names begin with C (Constant) and are shown by their order number in the design. Thus, Temperature appears as “4”, with second order aliasing as “1 5” (Feed Rate and Concentration), and third order confounding as “1 2 3” (Feed Rate, Catalyst, and Stir Rate).
Understanding the Coded Design
In the coded design panel, each row represents a run. Plus signs designate high levels and minus signs represent low levels. As shown in Figure 12.15, rows for the first three columns of the coded design, which represent Feed Rate, Catalyst, and Stir Rate, are all combinations of high and low values (a full factorial design). The fourth column (Temperature) of the coded design is the element-by-element product of the first three columns. Similarly, the last column (Concentration) is the product of the second and third columns.
Figure 12.15 Default Coded Designs
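That construction is easy to verify outside JMP. The sketch below (an illustration that assumes the default generating rules just described) builds the eight coded runs as a full factorial in the first three factors, with the last two columns formed as element-by-element products:

    from itertools import product

    runs = []
    for feed_rate, catalyst, stir_rate in product([-1, +1], repeat=3):
        temperature = feed_rate * catalyst * stir_rate   # column 4 = product of columns 1-3
        concentration = catalyst * stir_rate             # column 5 = product of columns 2-3
        runs.append((feed_rate, catalyst, stir_rate, temperature, concentration))

    for run in runs:                                     # eight runs of the 2^(5-2) fraction
        print(run)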
Changing the Coded Design
In the Change Generating Rules panel, changing the check marks and clicking Apply changes the coded design; it changes the choice of different fractional factorial designs for a given number of factors. The Change Generating Rules table in Figure 12.16 shows how the last two columns are constructed in terms of the first three columns. The check marks for Temperature show it is a function of Feed Rate, Catalyst, and Stir Rate. The check marks for Concentration show it is a function of Catalyst and Stir Rate.
If you check the options as shown in Figure 12.16 and click Apply, the Coded Design panel changes. The first three columns of the coded design remain a full factorial for the first three factors (Feed Rate, Catalyst, and Stir Rate). Temperature is now the product of Feed Rate and Catalyst, so the fourth column of the coded design is the element-by-element product of the first two columns. Concentration is a function of Feed Rate and Stir Rate.
Figure 12.14 The Third Level Alias for the Five-Factor Reactor Example
Figure 12.16 Modified Coded Designs and Generating Rules
Specifying Output Options
Use the Output Options panel to specify how you want the output data table to appear. When the options are correctly set up, click Make Table.
Figure 12.17 Select the Output Options
• Run Order—Lets you designate the order you want the runs to appear in the data table when it is created. Choices are:
Keep the Same—the rows (runs) in the output table appear as they do in the Design panel.
Sort Left to Right—the rows (runs) in the output table appear sorted from left to right.
Randomize—the rows (runs) in the output table appear in a random order.
Sort Right to Left—the rows (runs) in the output table appear sorted from right to left.
Randomize within Blocks—the rows (runs) in the output table appear in random order within the blocks you set up.
• Number of Center Points—Specifies additional runs placed at the center points.
• Number of Replicates—Specifies the number of times to replicate the entire design, including center points. Type the number of times you want to replicate the design in the associated text box. One replicate doubles the number of runs.
Viewing the Table
After clicking Make Table, you have a data table that outlines your experiment. In the table, the high and low values you specified are displayed for each run.
Figure 12.18 The Design Data Table
The name of the table is the design type that generated it.
The column called Pattern shows the pattern of low values denoted “–” and high values denoted “+”. Pattern is especially useful as a label variable in plots.
This script allows you to easily fit a model using the values in the design table.
Continuing the Analysis
After creating and viewing the data table, you can now run analyses on the data. The data table contains a script labeled Model. Right-click it and select Run Script to run a fit model analysis (Figure 12.19).
Figure 12.19 Running the Model Script
The next sections describe some of the parts of the analysis report that appears when you click Run Model.
Viewing an Actual-by-Predicted Plot
When the model contains no interactions, an actual-by-predicted plot, shown on the left in Figure 12.20, appears at the top of the Fit Model report.
Figure 12.20 An Actual-by-Predicted Plot
To show labels in the graph (on the right in Figure 12.20), select all points, right-click the graph, and select Row Label. The Pattern variable displayed in the data table serves as the label for each point.
In Figure 12.20, the mean line falls inside the bounds of the 95% confidence curves, which tells you that the model is not significant. The model p-value, R2, and RMSE appear below the plot. The RMSE is an estimate of the standard deviation of the process noise, assuming that the unestimated effects are negligible. In this case, the RMSE is 14.199, which is much larger than expected. This suggests that effects other than the main effects of each factor are important. Because of the confounding between two-factor interactions and main effects in this design, it is impossible to determine which two-factor interactions are important without performing more experimental runs.
Viewing a Scaled Estimates Report
When you fit the model, JMP displays a Scaled Estimates report (Figure 12.21) as a part of the Fit Model report. The Scaled Estimates report displays a bar chart of the individual effects embedded in a table of parameter estimates. The last column of the table has the p-values for each effect. None of the factor effects are significant, but the Catalyst effect is large enough to be interesting if it is real. At this stage the results are not clear, but this does not mean that the experiment has failed. It means that some follow-up runs are necessary.
Figure 12.21 Example of a Scaled Estimates Report
If this scaled estimates report were not merely an example, you would then want to augment the design. For comparison, you might also want to have complete 32-run factorial experimental data and analysis.
13 Response Surface Designs
Response surface designs are useful for modeling a curved quadratic surface as a function of continuous factors. If a minimum or maximum response exists inside the factor region, a response surface model can pinpoint it. Three distinct values for each factor are necessary to fit a quadratic function, so the standard two-level designs cannot fit curved surfaces.
The most popular response surface design is the central composite design, illustrated in the figure to the left below. It combines a two-level fractional factorial and two other kinds of points:
• Center points, for which all the factor values are at the zero (or midrange) value.
• Axial (or star) points, for which all but one factor are set at zero (midrange) and that one factor is set at outer (axial) values.
The Box-Behnken design, illustrated in the figure on the right below, is an alternative to central composite designs. One distinguishing feature of the Box-Behnken design is that there are only three levels per factor. Another important difference between the two design types is that the Box-Behnken design has no points at the vertices of the cube defined by the ranges of the factors. This is sometimes useful when it is desirable to avoid these points due to engineering considerations. The price of this characteristic is the higher uncertainty of prediction near the vertices compared to the central composite design.
Central Composite Design (left) and Box-Behnken Design (right), showing fractional factorial points, axial points, and center points.
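For a concrete picture of the point types just described, the following sketch lists the runs of a two-factor central composite design. It is an illustration only: the axial value uses the common rotatable choice, alpha = (2^k)^(1/4), and the number of center points is chosen arbitrarily here, whereas JMP chooses these settings for you.

    from itertools import product

    k = 2                                  # two continuous factors
    alpha = (2 ** k) ** 0.25               # rotatable axial value, about 1.414 for k = 2

    factorial_points = list(product([-1, +1], repeat=k))
    axial_points = []
    for i in range(k):
        for sign in (-1, +1):
            point = [0.0] * k
            point[i] = sign * alpha
            axial_points.append(tuple(point))
    center_points = [(0.0, 0.0)] * 3       # three center runs, an arbitrary choice here

    for point in factorial_points + axial_points + center_points:
        print(point)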
Introduction
The Bounce Data.jmp sample data file has response surface data inspired by the tire tread data described in Derringer and Suich (1980). To see this example data table, open the Sample Data folder that was installed when you installed JMP, and select Design Experiment > Bounce Data.jmp.
The objective of this experiment is to match a standardized target value (450) of tennis ball bounciness. The bounciness varies with amounts of Silica, Silane, and Sulfur used to manufacture the tennis balls. The experimenter wants to collect data over a wide range of values for these variables to see if a response surface can find a combination of factors that matches a specified bounce target. To follow this example:
1 Select DOE > Response Surface Design.
2 Load factors by clicking the red triangle icon on the Response Surface Design title bar and selecting Load Factors. Navigate to the Sample Data folder that was installed when you installed JMP, and select Design Experiment > Bounce Factors.jmp.
3 Load the responses by clicking the red triangle icon on the Response Surface Design title bar and selecting Load Responses. Navigate to the Sample Data folder that was installed when you installed JMP, and select Design Experiment > Bounce Response.jmp.
Figure 13.1 shows the completed Response panel and Factors panel.
Figure 13.1 Response and Factors For Bounce Data
After the response data and factors data are loaded, the Response Surface Design Choice dialog lists the designs in Figure 13.2. Figure 13.2 Response Surface Design Selection
The Box-Behnken design selected for three effects generates the design table of 15 runs shown in Figure 13.3. In real life, you would conduct the experiment and then enter the responses into the data table. Let’s pretend this happened and use a finalized data table called Bounce Data.jmp.
After obtaining the Bounce Data.jmp data table, run a fit model analysis on the data. The data table contains a script labeled Model.
1 Open the Sample Data folder that was installed when you installed JMP, and select Design Experiment > Bounce Data.jmp (Figure 13.3).
2 Right-click Model and select Run Script to start a fit model analysis.
3 Click Run Model.
The standard Fit Model analysis results appear in the tables shown in Figure 13.4, with parameter estimates for all response surface and crossed effects in the model. The prediction model is highly significant with no evidence of lack of fit. All main effect terms are significant, as well as the two interaction effects involving Sulfur.
Figure 13.4 JMP Statistical Reports for a Response Surface Analysis of Bounce Data
The Response Surface report also has the tables shown in Figure 13.5.
Figure 13.3 JMP Table for a Three-Factor Box-Behnken Design
Figure 13.5 Statistical Reports for a Response Surface Analysis
• Provides a summary of the parameter estimates.
• Lists the critical values of the surface factors and tells the kind of solution (maximum, minimum, or saddlepoint). The solution for this example is a saddlepoint. The table also warns that the critical values given by the solution are outside the range of data values.
• Shows eigenvalues and eigenvectors of the effects. The eigenvector values show that the dominant negative curvature (yielding a maximum) is mostly in the Sulfur direction. The dominant positive curvature (yielding a minimum) is mostly in the Silica direction.
Creating a Response Surface Design
Response Surface Methodology (RSM) is an experimental technique invented to find the optimal response within specified ranges of the factors. These designs are capable of fitting a second-order prediction equation for the response. The quadratic terms in these equations model the curvature in the true response function. If a maximum or minimum exists inside the factor region, RSM can find it. In industrial applications, RSM designs involve a small number of factors, because the required number of runs increases dramatically with the number of factors. Using the response surface designer, you choose among well-known RSM designs for two to eight continuous factors. Some of these designs also allow blocking. Response surface designs are useful for modeling and analyzing curved surfaces.
To start a response surface design, select DOE > Response Surface Design, or click the Response Surface Design button on the JMP Starter DOE page. Then, follow the steps below:
• “Entering Responses and Factors,” p. 219
• “Choosing a Design,” p. 219
• “Specifying Axial Value (Central Composite Designs Only),” p. 220
• “Specifying Output Options,” p. 221
• “Viewing the Design Table,” p. 221
• “Continuing the Analysis, If Needed,” p. 222
Entering Responses and Factors
The steps for entering factors in a response surface design are unique to this design. To add factors, follow the steps in Figure 13.6.
Figure 13.6 Entering Factors into a Response Surface Design
Click Continue to proceed to the next step.
Choosing a Design
Highlight the type of response surface design you would like to use and click Continue.
Figure 13.7 Choose a Design Type
The Response Surface designer provides the following types of designs:
Box-Behnken Designs
The Box-Behnken design has only three levels per factor and has no points at the vertices of the cube defined by the ranges of the factors. This is sometimes useful when it is desirable to avoid extreme
points due to engineering considerations. The price of this characteristic is the higher uncertainty of prediction near the vertices compared to the central composite design.
Central Composite Designs
The response surface design list contains two types of central composite designs: uniform precision and orthogonal. These properties of central composite designs relate to the number of center points in the design and to the axial values:
• Uniform precision means that the number of center points is chosen so that the prediction variance at the center is approximately the same as at the design vertices.
• For orthogonal designs, the number of center points is chosen so that the second-order parameter estimates are minimally correlated with the other parameter estimates.
Specifying Axial Value (Central Composite Designs Only)
When you select a central composite (CCD-Uniform Precision) design and then click Continue, you see the panel in Figure 13.8. It supplies default axial scaling information. Entering 1.0 in the text box instructs JMP to place the axial value on the face of the cube defined by the factors, which controls how far out the axial points are. You have the flexibility to enter the values you want to use.
Figure 13.8 Display and Modify the Central Composite Design
• Rotatable makes the variance of prediction depend only on the scaled distance from the center of the design. This causes the axial points to be more extreme than the range of the factor. If this factor range cannot be practically achieved, it is recommended that you choose On Face or specify your own value.
• Orthogonal makes the effects orthogonal in the analysis. This causes the axial points to be more extreme than the –1 or 1 representing the range of the factor. If this factor range cannot be practically achieved, it is recommended that you choose On Face or specify your own value.
• On Face leaves the axial points at the ends of the –1 and 1 ranges.
• User Specified uses the value entered by the user, which can be any value greater than zero. Enter that value into the Axial Value text box.
If you would like to inscribe the design, click the box beside Inscribe. When checked, JMP re-scales the whole design so that the axial points are at the low and high ends of the range (the axials are –1 and 1 and the factorials are shrunken based on that scaling).
Specifying Output Options
Use the Output Options panel to specify how you want the output data table to appear. When the options are correctly set up, click Make Table.
Figure 13.9 Select the Output Options
• Run Order—Lets you designate the order you want the runs to appear in the data table when it is created. Choices are:
Keep the Same—the rows (runs) in the output table will appear as they do in the Design panel.
Sort Left to Right—the rows (runs) in the output table will appear sorted from left to right.
Randomize—the rows (runs) in the output table will appear in a random order.
Sort Right to Left—the rows (runs) in the output table will appear sorted from right to left.
Randomize within Blocks—the rows (runs) in the output table will appear in random order within the blocks you set up.
• Number of Center Points—Specifies additional runs placed at the center points.
• Number of Replicates—Specifies the number of times to replicate the entire design, including center points. Type the number of times you want to replicate the design in the associated text box. One replicate doubles the number of runs.
Viewing the Design Table
Now you have a data table that outlines your experiment, as described in Figure 13.10.
Figure 13.10 The Design Data Table
The name of the table is the design type that generated it.
The column called Pattern identifies the coding of the factors. It shows all the codings with “+” for high and “–” for low factor levels, “a” and “A” for low and high axial values, and “0” for midrange. Pattern is suitable to use as a label variable in plots because when you hover over a point in a plot of the factors, the Pattern value shows the factor coding of the point. Runs are in a random order.
This script allows you to easily fit a model using the values in the design table.
The Y column is for recording experimental results.
There are two center points per replicate.
Continuing the Analysis, If Needed
After creating and viewing the design table, running the experiment, and recording your results in the design table’s Y column, run a fit model analysis on the data. The data table contains a script labeled Model. Right-click it and select Run Script (Figure 13.11) to fit the model.
Figure 13.11 Running the Script
After clicking Run Model in the dialog (Figure 13.12), review the analysis.
Figure 13.12 Fitting the Model
14 Prospective Power and Sample Size
Use the DOE > Sample Size and Power command to answer the question “How many runs do I need?” The important quantities are sample size, power, and the magnitude of the effect. These depend on the significance level (alpha) of the hypothesis test for the effect and the standard deviation of the noise in the response. You can supply either one or two of the three values. If you supply only one of these values, the result is a plot of the other two. If you supply two values, JMP computes the third. This capability is available for the single-sample, two-sample, and k-sample situations.
Using the Sample Size and Power command when doing a prospective analysis helps answer the question, “Will I detect the group differences I am looking for, given my proposed sample size, estimate of within-group variance, and alpha level?” In this type of analysis, you must give JMP an estimate of the group means and sample sizes in a data table as well as an estimate of the within-group standard deviation (σ).
The Sample Size and Power command determines how large a sample is needed for an experiment or sample to be reasonably likely to yield a significant result, given that the true effect size is at least a certain size. It requires that you enter any two of three quantities (difference to detect, sample size, and power) and computes the third for the following cases:
• difference between one sample’s mean and a hypothesized value
• difference between two sample means
• differences in the means among k samples
• difference between a variance and a hypothesized value
• difference between one sample proportion and a hypothesized value
• difference between two sample proportions
• difference between counts per unit in a Poisson-distributed sample and a hypothesized value
The calculations assume that there are equal numbers of units in each group. You can apply this platform to more general experimental designs, if they are balanced and a number-of-parameters adjustment is specified.
Prospective Power Analysis
The following five values have an important relationship in a statistical test on means:
• Alpha Alpha is the significance level that prevents declaring a zero effect significant more than alpha portion of the time.
• Error Standard Deviation Error Standard Deviation is the unexplained random variation around the means.
• Sample Size Sample size is how many experimental units (runs, or samples) are involved in the experiment.
• Power Power is the probability of declaring a significant result.
• Effect Size Effect size is how different the means are from each other or from the hypothesized value.
The Sample Size and Power command in JMP helps estimate in advance either the sample size needed, the power expected, or the effect size expected in the experimental situation where there is a single mean comparison, a two-sample comparison, or a comparison of k sample means. When you select DOE > Sample Size and Power, the panel shown in Figure 14.1 appears with button selections for experimental situations. The following sections describe each of these selections and explain how to enter estimated parameter values and the desired computation.
Figure 14.1 Sample Size and Power Choices
One-Sample and Two-Sample Means
After you click either One Sample Mean or Two Sample Means in the initial Sample Size selection list (Figure 14.1), the Power and Sample Size dialog in Figure 14.2 appears and asks for the anticipated experimental values. The values you enter depend on your initial choice. As an example, consider the two-sample situation.
The Two Sample Means choice in the initial Power and Sample Size dialog always requires values for Alpha and the error standard deviation (Error Std Dev), as shown here, and one or two of the other three values: Difference to Detect, Sample Size, and Power. The power and sample size platform then calculates the missing item. If there are two unspecified fields, the power and sample size platform constructs a plot that shows the relationship between those two values:
• power as a function of sample size, given a specific effect size
• power as a function of effect size, given a sample size
• effect size as a function of sample size, for a given power
The Power and Sample Size dialog asks for the following values, depending on the first choice of design:
• Alpha Alpha is the significance level, usually 0.05. This implies willingness to accept (if the true difference between groups is zero) that 5% (alpha) of the time a significant difference will be incorrectly declared.
• Error Std Deviation Error Std (Standard) Deviation is the true residual error. Even though the true error is not known, the power calculations are an exercise in probability that calculates what might happen if the true values were as specified.
• Extra Params Extra Params (Parameters) is only for multi-factor designs. Leave this field zero in simple cases. In a multi-factor balanced design, in addition to fitting the means described in the situation, there are other factors with extra parameters that can be specified here. For example, in a three-factor two-level design with all three two-factor interactions, the number of extra parameters is five: two parameters for the extra main effects, and three parameters for the interactions. In practice, it isn’t very important what values you enter here unless the experiment is in a range where there are very few degrees of freedom for error.
• Difference to Detect Difference to detect is the smallest detectable difference (how small a difference you want to be able to declare statistically significant). For single-sample problems this is the difference between the hypothesized value and the true value.
• Sample Size Sample size is the total number of observations (runs, experimental units, or samples). Sample size is not the number per group, but the total over all groups. Computed sample size numbers can have fractional values, which you need to adjust to real units. This is usually done by
Figure 14.2 Initial Power and Sample Size Dialogs for Single Mean (left) and Two Means (right)
increasing the estimated sample size to the smallest number evenly divisible by the number of groups.
• Power Power is the probability of getting a statistic that will be declared statistically significant. Bigger power is better, but the cost is a larger sample size. Power is equal to alpha when the specified effect size is zero. You should go for powers of at least 0.90 or 0.95 if you can afford it. If an experiment requires considerable effort, plan so that the experimental design has the power to detect a sizable effect, when there is one.
• Continue Evaluates at the entered values.
• Back Goes back to the previous dialog.
• Animation Script The Animation Script button runs a JSL script that displays an interactive plot showing power or sample size. See the section, “Power and Sample Size Animation for a Single Sample,” p. 229, for an illustration of this animation script.
Single-Sample Mean
Suppose there is a single sample and the goal is to detect a difference of 2 where the error standard deviation is 0.9, as shown in the left-hand dialog in Figure 14.3. To calculate the power when the sample size is 10, leave Power missing in the dialog and click Continue. The dialog on the right in Figure 14.3 shows the power is calculated to be 0.99998, rounding to 1.
Figure 14.3 A One-Sample Example
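If you want to cross-check this number outside JMP, a one-sample t-test power calculation gives essentially the same answer. The following sketch uses the statsmodels package; it is an independent calculation rather than JMP's code, so tiny rounding differences are possible.

    from statsmodels.stats.power import TTestPower

    power = TTestPower().power(effect_size=2 / 0.9,  # difference / error std dev
                               nobs=10,
                               alpha=0.05)
    print(power)   # approximately 0.99998, matching the dialog in Figure 14.3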
To see a plot of the relationship of power and sample size, leave both Sample Size and Power empty and click Continue. Double-click on the horizontal axis to get any desired scale. The left-hand graph in Figure 14.4 shows a range of sample sizes for which the power varies from about 0.2 to 0.95. Change the range of the curve by changing the range of the horizontal axis. For example, the plot on the right in Figure 14.4 has the horizontal axis scaled from 1 to 8, which gives a more typical-looking power curve.
When only Sample Size is specified (Figure 14.5) and Difference to Detect and Power are empty, a plot of power by difference appears. Figure 14.5 Plot of Power by Difference to Detect for a Given Sample Size
Power and Sample Size Animation for a Single Sample
The Animation Script button on the Power and Sample Size dialog for the single mean displays an interactive plot that illustrates the effect that changing the sample size has on power. In the example shown in Figure 14.6, Sample Size is 10, Alpha is 0.05, and the Difference to Detect is set to 0.4. The animation begins showing a normal curve positioned with mean at zero (representing the estimated mean and the true mean), and another with mean at 0.4 (the difference to be detected).
Figure 14.4 A One-Sample Example Plot
The probability of committing a Type II error (not detecting a difference when there is a difference), often represented as β in the literature, is shaded in blue on this plot. You can drag the handles on the curves to see how their positions affect power. Also, you can click the values for sample size and alpha shown beneath the plot to change them.
Figure 14.6 Example of Animation Script to Illustrate Power
Two-Sample Means
The dialogs work similarly for two samples; the Difference to Detect is the difference between two means. Suppose the error standard deviation is 0.9 (as before), the desired detectable difference is 1, and the sample size is 16. Leave Power blank and click Continue to see the power calculation, 0.5433, as shown in the dialog on the left in Figure 14.7. This is considerably lower than in the single-sample case because each mean has only half the sample size, and the comparison is between two random samples instead of one.
To increase the power requires a larger sample. To find out how large, leave both Sample Size and Power blank and examine the resulting plot, shown on the right in Figure 14.7. The crosshair tool estimates that a sample size of about 35 is needed to obtain a power of 0.9.
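The same kind of cross-check works for two samples. Here statsmodels reproduces the 0.5433 power and then solves for the per-group sample size at 90% power; it is an independent calculation, and it lands close to the roughly 35 total read from the crosshair.

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    # Power with 8 per group (16 total), effect size = difference / error std dev.
    print(analysis.power(effect_size=1 / 0.9, nobs1=8, alpha=0.05))   # about 0.54
    # Per-group sample size needed for 90% power.
    n_per_group = analysis.solve_power(effect_size=1 / 0.9, power=0.9, alpha=0.05)
    print(n_per_group)   # about 18 per group, roughly 35 to 36 in total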
k-Sample Means
The k-Sample Means situation can examine up to 10 means. The next example considers a situation where 4 group means are expected to be about 10 to 13, and the Error Std Dev is 0.9. When a sample size of 16 is entered, the power calculation is 0.95, as shown in the dialog on the left in Figure 14.8. If both Sample Size and Power are left blank, the power and sample size calculations produce the power curve shown on the right in Figure 14.8. This confirms that a sample size of 16 looks acceptable.
Notice that the difference in means is 2.236, calculated as the square root of the sum of squared deviations from the grand mean. In this case it is the square root of (–1.5)^2 + (–0.5)^2 + (0.5)^2 + (1.5)^2, which is the square root of 5.
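That 2.236 is quick to verify; the means 10, 11, 12, and 13 are taken from the example above.

    means = [10, 11, 12, 13]              # the four hypothesized group means
    grand_mean = sum(means) / len(means)  # 11.5
    difference = sum((m - grand_mean) ** 2 for m in means) ** 0.5
    print(difference)                     # 2.236..., the square root of 5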
Figure 14.7 Plot of Power by Sample Size to Detect for a Given Difference
Figure 14.8 Prospective Power for k-Means and Plot of Power by Sample Size
One-Sample Variance
The One-Sample Variance choice on the Power and Sample Size dialog (Figure 14.1) determines the sample size for detection of a change in variance. The usual purpose of this option is to compute a large enough sample size to guarantee that the risk of accepting a false hypothesis (β) is small.
In the dialog, specify a baseline variance, alpha level, and direction of change you want to detect. To indicate the direction of change, select either Larger or Smaller from the Guarding a change menu. The computations then show whether the true variance is larger or smaller than its hypothesized value, entered as the Baseline Variance. An example is when the variance for resistivity measurements on a lot of silicon wafers is claimed to be 100 ohm-cm and a buyer is unwilling to accept a shipment if the variance exceeds that claim by 55 ohm-cm for a particular lot.
The examples throughout the rest of this chapter use engineering examples from the online manual of the National Institute of Standards and Technology (NIST). You can access the NIST manual examples at http://www.itl.nist.gov/div898/handbook.
As with previous dialogs, you enter two of the items, and the Power and Sample Size calculation determines the third. Suppose you want to detect an increase of 55 for a baseline variance of 100, with an alpha of 0.05 and power of 0.99. Enter these items as shown in Figure 14.9. When you click Continue, the computed result shows that you need a sample size of 170.
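A rough cross-check of that sample size is possible with the standard chi-square test for a variance. This is a textbook calculation rather than necessarily JMP's exact formula, so a small difference would not be surprising.

    from scipy.stats import chi2

    alpha, var0, var1, n = 0.05, 100.0, 155.0, 170   # baseline 100, detect an increase of 55
    df = n - 1
    crit = chi2.ppf(1 - alpha, df)          # reject H0 when df * s**2 / var0 exceeds crit
    power = chi2.sf(crit * var0 / var1, df) # probability of rejecting when the true variance is var1
    print(power)                            # about 0.99, consistent with a sample size of 170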
Note: Remember to enter the variance in the Baseline Variance box, not the standard deviation.
If you want to detect a change to a smaller variance, enter a negative amount in the Difference to Detect box.
Figure 14.9 Sample Size and Power Dialog To Compare Single-Direction One-Sample Variance
One-Sample and Two-Sample Proportions
The dialogs and computations to test power and sample sizes for proportions are similar to those for testing sample means. The dialogs are the same except you enter Baseline Proportion and also specify either a one-sided or two-sided test. For a one-sample situation, the Baseline Proportion is the average of a known baseline proportion and the single sample proportion. When there are two samples, the Baseline Proportion you enter is the average of the two sample proportions. The sampling distribution for proportions is actually binomial, but the computations to determine sample size and test proportions use a normal approximation, as indicated on the dialogs (Figure 14.10).
Figure 14.10 Default Power and Sample Dialog for One-Sample and Two-Sample Proportions
Enter average of baseline and one-sample proportions, or average of two-sample proportions. Enter 1 or 2 to indicate the type of test (one- or two-sided)
Testing proportions is useful in production lines, where the proportion of defects is part of process control monitoring. For example, suppose a line manager wants to detect a change in defective units that is 6% above a known baseline of approximately 10% defective. The manager does not want to stop the process unless it has degenerated to greater than 16% defects (6% above the 10% known baseline defective). The Baseline Proportion in this example is 0.08, which is the average of the baseline (10%) and the proportion above the baseline (6%). The example process is monitored with a one-sided test at 5% alpha and a 10% risk (90% power) of failing to detect a change of that magnitude.
Figure 14.11 shows the entries in the Sample Size and Power dialog to detect a given difference between an observed proportion and a baseline proportion, and the computed sample size of approximately 77. To see the plot on the right in Figure 14.11, leave both Difference to Detect and Sample Size blank. Use the grabber tool (hand) to move the x-axis and show a specific range of differences and sample sizes.
Figure 14.11 Dialog To Compare One Proportion to a Baseline and Sample Size Plot
Counts per Unit
The Counts per Unit selection calculates sample size for the Poisson-distributed counts typical when you can measure more than one defect per unit. A unit can be an area, and the counts can be fractions or large numbers. Although the number of defects observed in an area of a given size is often assumed to have a Poisson distribution, the area and count are assumed to be large enough to support a normal approximation. Questions of interest are:
• Is the defect density within prescribed limits?
• Is the defect density greater than or less than a prescribed limit?
Enter alpha and the baseline count per unit. Then enter two of the remaining fields to see the calculation of the third. The test is for a one-sided (one-tailed) change. Enter the Difference to Detect in terms of the baseline count per unit (defects per unit). The computed sample size is expressed in those units.
As an example, consider a wafer manufacturing process with a target of 4 defects per wafer, where you want to verify that a new process meets that target.
Select DOE > Sample Size and Power. Click the Counts per Unit button. Enter an alpha of 0.1 to be the chance of failing the test if the new process is as good as the target. Enter a power of 0.9, which is the chance of detecting a change larger than 2 (6 defects per wafer). In this kind of situation, alpha is sometimes called the producer’s risk and beta is called the consumer’s risk. 5 Click Continue to see the computed sample size of 8.128 (Figure 14.12). The process meets the target if there are less than 48 defects (6 defects per wafer in a sample of 8 wafers). Figure 14.12 Dialog For Counts Per Unit Example
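The numbers in this example can be checked with a short calculation. The Python sketch below applies the usual normal approximation to Poisson counts for a one-sided test; with the example's values it returns about 8.13 units, agreeing with the 8.128 wafers reported in Figure 14.12 up to rounding. It is offered as an illustration under that assumption, not as JMP Student Edition's documented algorithm, and the function name is hypothetical.

# A minimal sketch, assuming the usual normal approximation to Poisson counts.
from statistics import NormalDist

def approx_n_counts_per_unit(baseline, difference, alpha=0.10, power=0.90):
    """Approximate number of units needed to detect an increase in defects per unit."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)          # one-sided test at level alpha
    z_power = z.inv_cdf(power)              # quantile for the desired power
    lam0, lam1 = baseline, baseline + difference
    numerator = z_alpha * lam0 ** 0.5 + z_power * lam1 ** 0.5
    return (numerator / difference) ** 2

# Baseline of 4 defects per wafer, detect an increase of 2, alpha 0.1, power 0.9.
print(round(approx_n_counts_per_unit(4, 2), 3))   # about 8.129 wafers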
Sigma Quality Level

Use the Sigma Quality Level feature, accessed by selecting DOE > Sample Size and Power, by entering any two of the following three quantities:
• number of defects
• number of opportunities
• Sigma quality level
When you click Continue, the sigma quality calculator computes the missing quantity using the formula Sigma Quality Level = NormalQuantile(1 – defects/opportunities) + 1.5. As an example, use the Sample Size and Power feature to compute the Sigma quality level for 50 defects in 1,000,000 opportunities:
1 Select DOE > Sample Size and Power.
2 Click the Sigma Quality Level button.
3 Enter 50 for the number of defects and 1,000,000 as the number of opportunities, as shown in the window to the left in Figure 14.13.
4 Click Continue. The result, as shown in the window on the right in Figure 14.13, is a Sigma quality level of approximately 5.39.

Figure 14.13 Sigma Quality Level Example 1
If you want to know how many defects reduce the Sigma Quality Level to “six-sigma” for 1,000,000 opportunities, enter 6 as the Sigma Quality Level and leave the Number of Defects blank (window to the left in Figure 14.14). The computation (window to the right in Figure 14.14) shows that the Number of Defects cannot be more than approximately 3.4.
Figure 14.14 Sigma Quality Level Example 2
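Both directions of the Sigma Quality Level calculation follow directly from the formula quoted above. The Python sketch below is a minimal illustration of that formula and its inverse (the number of defects permitted at a given level); the function names are hypothetical, and the calculation mirrors the stated formula rather than any additional behavior of the dialog.

# A minimal sketch of the Sigma Quality Level formula and its inverse.
from statistics import NormalDist

def sigma_quality_level(defects, opportunities):
    """Sigma Quality Level = NormalQuantile(1 - defects/opportunities) + 1.5"""
    return NormalDist().inv_cdf(1 - defects / opportunities) + 1.5

def defects_for_level(level, opportunities):
    """Number of defects that corresponds to a given Sigma Quality Level."""
    return opportunities * (1 - NormalDist().cdf(level - 1.5))

print(round(sigma_quality_level(50, 1_000_000), 2))   # about 5.39 (Example 1)
print(round(defects_for_level(6, 1_000_000), 1))      # about 3.4  (Example 2)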
Index

Symbols
“F Ratio” 104
“Prob>F” 104
Numerics –2LogLikelihood 151
95% bivariate normal density ellipse 164
A aberration designs 205 acceptable values See lower limits and upper limits activating toolbars 14 Actual-by-Predicted plots 212 Add button 86 Add Column button 95 Add Graphics Script command 184 additional runs 196, 210, 221 AIC 151 AIC 104 Akaike’s Information Criterion 104, 151 aliasing effects 207 All Graphs command 65 All Pairs, Tukey Kramer command 62 Alpha 225, 227 Alpha Amalyze 21 analysis of variance report 47, 92 table 53, 60, 89 Analyze menu 22, 45, 72, 79 Analyze Toolbar 13–14 animation scripts 228 Annotate tool 28 annotating 28 resizing and repositioning 28 ANOVA 60 Display Options command 62 JMP INTRO terms 60
one way 46 popup menu 62 report 47 table 53, 89 ARIMA 143, 154–156 Arrange Plots 186 assigning importances (of responses) 195, 202 Autocorrelation 149 autocorrelation 148–149 Autocorrelation Lags 147 Autoregressive Order 155 axial points 215 scaling, central composite designs 220
B Background Color command 184 Backward 103
bar chart 172, 181 producing 171 Bar Chart command 183 bar chart of correlations 167 Bartlett’s test 64 BIC 151 bivariate normal density ellipse 164 blue diamond disclosure icon 53 Border 184 Bounce Data.jmp 216 Bounce Factors.jmp 216–217 Bounded 157 Box Plots command 62, 65 Box-Behnken designs 215–216, 219 See also Response Surface designs Box-Jenkins model see ARIMA Braces.jmp 131 Brown smoothing 157 Brown-Forsythe test 64 By role 38
C C Total 89 Capability Analysis command 39 with Control Charts 115 Categorical factors 203 categorical probabilities testing 36 categorical variables 29 graphs and reports 34 Caustic Soda 21 CCD See central composite designs c-Chart 132 CDF plot command 33 center points central composite designs 215 response surface designs 215 central composite designs 215, 219–220 See also response surface designs Chakravarty 205 changing individual levels 172 Chart launch dialog 181–182 Chart platform 171, 181 Frame Options 183 Level Options 184 Platform options 185 Single-Chart Options 182 chart platform 172 changing all levels 173 chart types 185 Chi Square statistic 35 clipboard 180 Close command 19 Coating.jmp 107, 125–126 Color or Mark by Column command 49 colors and markers 49 Colors command 184 Column Info 96 Columns command in reports 34 Compare Means command 50, 62 comparison circles 50, 63 interpretation 63 Comparison Circles command 66 Confid Curves Fit command 59 Confid Curves Fit option 52 Confidence Interval command 24, 37 Confidence Intervals 155–156 confidence intervals in ANOVA 50 in linear regression 52
mean 23, 30, 36 score 37 selecting level 37 confidence limits in linear regression 59 confounding 207, 212 resolution numbers 204 confounding pattern 208 Connect Color command 187 Connect Means command 66 Connect Points option 174, 183, 186 Connect Through Missing 186 Connecting Lines 149 constant estimate 152 Constrain fit 155 contingency table 54, 66 analysis 45, 66 reports 66 Contingency Table command 67 continuous factors 203 continuous variables 29, 46 graphs and reports 30 popup menu 30 Contrast dialog 84 contrasts 84 Control Charts c 132 Individual Measurement 127 Moving Range 127 np 130 p 130 R 125 S 125 Shewhart 124–133 u 131 XBar 125 Copy command 18, 180 corrected total 89 correlation 161–170 correlation coefficient 53 correlation matrix 163 Correlation of Estimates command 95 Correlations Multivariate 163 Cotter designs 206 count 34 Count Axis command 31 counts per unit (power and sample size) 235 covariance 161–170 Covariance Matrix 166 Cp 41, 104 Cp 104
D damped-trend linear exponential smoothing 158 data table opening 15 Data Table Window 39, 66 defects 235 Denim.jmp 21, 29, 45, 71, 79, 171 details 21 Density Axis command 31 Density Ellipse 164–165, 167 Density Ellipse command 53, 61 density functions 33 descriptive statistics 22 design resolutions 204 designs aberration 205 Box-Behnken 215–216, 219 central composite 215 fractional factorials 204 full factorial 189, 195, 199 full factorials 204 minimum aberration 205 mixed-level 205 orthogonal screening designs 204 surface designs 220 orthogonal arrays 205 Plackett-Burman 205 response surface 215 screening 199 uniform precision 220 desirability values 202 DF 151 DFE 104 dialog boxes
dragging in 46 Difference to Detect option 227, 230 Differencing Order 155 Direction 102
disclosure icon 53 DisplayBox command 184 Display Options command 62, 65
Distribution platform 16–17, 21 graphs 29 launch dialog 17, 22, 24 launching 22 report 25 DOE simple examples 199 double exponential smoothing 157 drag 147, 166 dummy variables 93 Dunnett’s comparisons 62 Durbin-Watson Test 96
E Each Pair, Student’s t command 62 Edit Formula 96
effect aliasing 207 eigenvalue 218 eigenvector 218 size 226 sparsity 199, 205–206 effect details 93 Effect Leverage Pairs 97 Effect Leverage personality 88 Effect Screening personality 88 Effect Test table 91 effects nonestimable 204 orthogonal 220 eigenvalue of effect 218 eigenvector of effect 218 Ellipse Alpha 166 Ellipse Color 166 Enter All 103 Entered 103 equal variances in t test 47 Error Bars command 62 error SS 90 error standard deviation 226–227 Estimate 103, 152, 155, 157 evolution 156 Expanded Estimates command 93
Cpk 41 Cpm 41 Cross button 82, 86 crossed effect 86 crosstabs table 66 cumulative distribution function 33 cumulative logistic probability plot 67 cumulative probabilities 34 Current Estimates table 102 Custom 156 Custom Test command 94 cut and paste 18, 28, 171, 180
exponential smoothing see Smoothing Models extra parameters 227
F “F Ratio” 104
F Ratio 47 F test 90 Factor 152 Factor Profiling option 193 factorial designs fractionals 204 full 189, 195, 199, 204 three level 205 Factorial Sorted macro 87 Factorial to Degree macro 86 factors categorical 203 continuous 203 key factors 199 false negatives 206 Fat Plus (selection) tool 18, 180 File tab 13 File/Edit toolbar 13–14 Fit Distribution 26, 44 Fit Line command 52, 54, 58 Fit Mean command 51, 58 Fit Model dialog 79–80, 85 platform 79 Fit Model platform 100 examining results 81 launching 79 Save 96–97 Fit Polynomial command 59 Fit Special command 61 Fit Y By X platform 45 launching 45 Fitness.jmp 99 fitting lines 51 fitting personality 85, 87 Fixed 157 Forecast Periods 147, 153 Forecast plot 153 Formula command 56 Formula Editor 56 Forward 102 fractional factorial designs 204 Freq button 86 frequencies table 34 frequency 34
full factorial designs 189, 195, 199, 204 examples 189 Full Factorial macro 83, 86
G general linear model 79 Go 101, 103 goal types 195, 202 goals matching targets 202 minimizing and maximizing 202 Goodness of Fit 33 Grand Mean command 65 Graph 149 Graph menu 171, 173, 175 Graph toolbar 13–14 Graphs tab 13 Group By command 54, 61 group variances homogeneity 64 grouping variable 54, 61, 181
H hand tool in Distribution platform 22 with Distribution platform 31 help system 11 Help tool 12 histogram 16, 18, 22, 24 bar position 23 bar widths 22 red bracket (box plot) 32 using 22 Histogram command 30 Hoeffding’s D 168, 170 Holt smoothing 158 homogeneity of variances 64 honestly significant difference 62 Horizontal command 185 Horizontal Layout command 17, 30 hypothesized means specifying 25
I identifying key factors 199 importance of responses 195, 202 independent variables 55 Individual Confidence Interval 96
J JMP Starter 12–13 JSL (JMP Scripting Language) animation scripts 228
K Keep the Same command 196, 210, 221 Kendall’s Tau 168
Kendall’s tau-b 169 k-Sample Means (power and sample size) 231
L L18 Chakravarty 205 L18 Hunter 205 L18 John 205 L36 205 Label column 211, 222 lack of fit 59, 61, 90 error 90 table 60 Lag 152 Lasso tool 48 leaf values 32 least squares means 82, 93 least squares regression 51, 58, 79 legends with colors and markers 49 Level Midpoints command 38 Level Numbers command 38 Level Options command 185 level smoothing weight 156 Levene’s test 64 leverage plot 91–92 whole model 92
likelihood ratio tests 36 Line Chart command 183 Line of Fit command 65 Line Style 187
linear contrasts 84 linear exponential smoothing 158 linear regression 51, 58 confidence limits 59 Lock 103 Log function 57 Log10 function 57 Logistic platform 55 Logistic Plot command 68 logistic regression 67 Lognormal 33 Long-term sigma 39 Lot Number column 21 Lower Spec Limit 39 LSMeans 82, 93 LSMeans Contrast command 84, 93 LSMeans Plot 82 LSMeans Plot command 93 LSMeans Student’s t command 93 LSMeans Table command 93
M macros 83 Macros drop-down list 86 Make Model 103–104 Mallow’s Cp criterion 104 Marker Drawing Mode 184 Marker Size command 184 Markers command 184 marking points 48 matched pairs 47 plot interpretation 75 scatterplot 73 Matched Pairs platform 47, 71 interpreting the scatterplot 73 launching 72 preparing the data 71 matching target goals 195, 202 maximize responses 195, 202 maximizing goals 202 mean 16, 18, 30 confidence interval 30, 36 specifying hypothesized 25 test 35 testing 24
Individual Measurement Chart 127 inertia of Scroller tool 82 Inscribe option 220 interaction effect, adding 82 interactions 206 high-order 204 Intercept 155 intercept 152 interquartile range 31 Introduction sections, about 12 Inverse Corr table 163 inverse correlation 163, 170 Invertible 151
Mean CI Lines command 65 Mean Confidence Interval 96 Mean Error Bars command 65 Mean Error Bars option 48 Mean Line 149 Mean Lines command 65
means one and two sample 226 Means and Std Dev command 48 means diamonds 31, 50 Means Diamonds command 62, 65 Means Dots command 62 Means/Anova/t test command 50, 62 Means/Std Dev/Std Err command 62 median 16, 18, 31–32 Median rank scores 64 Method column 21 Minimal Report personality 88 minimize responses 195, 202 minimizing goals 202 minimum aberration designs 205 missing value 163 missing values 167 Mixed 103 mixed-level designs 205 Mixture Response Surface macro 87 Model Comparison table 151 model effects 86 Model script Model Specification dialog 211, 217, 222 model sum of squares 90 Model Summary table 151, 155 Modeling tab 13 modeling type 29, 45 Moments command 30 Moments table 36 More Moments command 30 mosaic plot 54, 66 Mosaic Plot command 67 Moving Average Order 155 Moving Range Chart 127 MSE 104 multiple comparison tests 62–63, 93 multiple regression example 99–105 Multivariate 161, 163 Multivariate platform 161–170
N N Responses button 195, 202 nDF 104
Needle Chart command 183 Needle option 186 Needle Plot command 172
nested effect 86 New Column
command 57 nominal logistic regression see Logistic platform nominal variables 29 nominal/ordinal by continuous fit see Logistic platform nonconforming unit 43 nonestimable effects 204 Nonparametric Correlations 168 Nonparametric Measures of Association table 168 nonparametric tests 25, 35 Normal 33 Normal 26 normal density ellipse 164 normal quantile plot 25, 32 Normal Quantile Plot command 25, 31, 64 Normal Quantiles command 38 normality 25 np-Chart 130 Number of Forecast Periods 150 number of runs screening designs 204
O O’Brien’s test 64 OC Curves 118 Oil1 Cusum.jmp 138 On Face option 220
one way analyses 47 one way ANOVA 46 one-sample and two-sample means 226 one-sample proportion (power and sample size) 233 one-sample variance (power and sample size) 232 opening data tables 15 order of runs 196, 210, 221 ordinal variables 29 orthogonal array designs 205 orthogonal designs screening designs 204 surface designs 220 Orthogonal option 220 Other 166 outlier box plot 30–31
P p value 35, 47, 81, 85 p, d, q parameters 155 Pairwise Correlations 163
Pairwise Correlations table 167 Parameter 103
parameter estimates 60 with fitted lines 52 Parameter Estimates Table 60, 91 Parameter Estimates table 152 parameters, extra 227 Partial Autocorrelation 149 partial autocorrelation 148 Partial Corr table 164 partial correlation 164 Paste command 19, 180 Pattern column 197, 201, 211, 222 patterns confounding 208 p-Chart 130 Pearson Chi Square test 36 Pearson correlation 167, 169 Pen Style command 183 Periods Per Season 156 personality 99 Pickles.jmp 127 Pie command 185 Plackett-Burman designs 205 Plot Actual by Predicted 96 Plot Actual by Quantile command 65 Plot Effect Leverage 96 Plot Quantile by Actual command 65 Plot Residual By Predicted 96 Plot Residual By Row 96 Plot Residuals command 53
plots Actual-by-Predicted 212 Point Chart command 183 points axial 215 colors and markers 49 Points command 65 Points Jittered command 66 Points Spread command 66 Poisson-distributed counts 235 polynomial effect 87 Polynomial to Degree macro 87 power analyses 226 in statistical tests on means 226 one-sample and two-sample means 227–228 Power Analysis command 93 power and sample size calculations 225–237 animation 229 counts per unit 235 k-sample means 231 one-sample and two sample proportions 233 one-sample mean 228 one-sample variance 232 sigma quality level 236 two-sample means 230 Ppk Capability Labeling 40 Predicted Values 96 predicted values saving 58–59 prediction variances 220 Prediction Formula 96 prerequisites to using JMP INTRO 11 Press 96 Print command 18 printing reports 18 Prob Axis command 31 Prob Scores command 38 Prob to Enter 102 Prob to Leave 102 Prob>|t| 152 “Prob>F” 104 probabilities testing 26, 35 Probability Labels command 65 process capability ratio 41 product-moment correlation 167, 169 proportions (power and sample size) 233 Pumice Stone 21 pure error 90
outliers 31 outside effect 86 overlap marks 50 Overlay Color command 183 Overlay Marker Color 187 Overlay Marker command 183, 187 Overlay option 185–186 Overlay Pattern command 183 Overlay Plot platform 173, 185 Connect Points option 174 platform options 186 single-plot options 186 Y Options 174 Y Options command 186
examples 216–223 introduction 218 purpose 215 reports 217
p-value 36
Q q-q plot 31 Quantile Box Plot command 32
quantile-quantile plot 31 quantiles 32 Quantiles command 30, 48, 62 quartiles 31
R r 53 R2 59, 89 R2 adjusted 59, 89 Random Effect 99 Randomize within Blocks 210, 221 randomizing runs 196, 210, 221 Range option 186 Range Plot command 186 Ranks Averaged command 38 Ranks command 38 R-Chart 125 Reactor 32 Runs.jmp 189 Reactor Factors.jmp 189–190 Reactor Response.jmp 189 red bracket (box plot) 32 red triangle popup menus 17 Redo Analysis command 66 regressor columns 206 Remove 147 Remove 85 Remove All 103 Remove button 82 Remove Fit command 53 reports setting titles 80 requesting additional runs 196, 210, 221 re-running an analysis 82 rescaling designs 220 Residual Statistics 154 Residuals 96 residuals 53, 153 plotting 53 saving 58 resolution numbers 204 resolutions of designs 204 response surface designs
Response Surface Effect macro 87 Response Surface Methodology (RSM) 218 response surface models 87 responses custom designs 202, 219 desirability values 202 goals 195, 202 lower limits 202 upper limits 202 results annotating 28 revealing columns in reports 34 RMSE 193, 212 root mean square error 59, 89 Rotatable option 220 Row Colors command 49 Row Markers command 49 row states 23 RSM (Response Surface Methodology) 218 RSquare 104, 151 Rsquare 89 Adj 89 RSquare Adj 104 Run Charts 114 Run Model 100 Run Model button 84, 88 runs additional 196, 210, 221 order they appear in table 196, 210, 221 requesting additional 196, 210, 221 screening designs 204
S sample autocorrelation function 149 sample data Denim.jmp 21, 29, 45, 71, 79, 171 details 21 sample means 226 Sample Size and Power command 225 sample sizes example comparing one proportion to baseline and sample size plot 234 example comparing single-direction one-sample variances 232 example with counts per unit 235 one and two sample means 227
shortest half 32 Show Center Line 127 Show Confidence Interval 153–154 Show Correlations 165 Show Histogram 165 Show Points 149, 153–154 Show Points command 58, 183, 186
sigma 39–40 Sigma Quality 41 sigma quality level (power and sample size) 236 signed-rank test 35 significance probability 167 stepwise regression 99 simple exponential smoothing 157 single-sample means (power and sample sizes) 228 Size/Scale commands 184 Smoothing Model dialog 156 smoothing models 143, 155–159 smoothing weight 156 Sort Left to Right 196, 210, 221 Sort Right to Left 196, 210, 221 sparsity, effect 199, 205–206 Spearman’s Rho 169 Spearman’s Rho 168 Spec Limits 40 Specified Sigma 40 specifying hypothesized means 25 Split command 71 Split command selecting rows 71 SS 104 SSE 104 Stable 151 Stable Invertible 156 Stack 38 Standard Deviation 151 standard deviation 16, 18, 25, 30 testing 35 standard deviation, error 226 Standardize command 38 standardized values 38 star points 215 starting JMP INTRO 12 statistical tests 34 Statistics 171 Statistics button 181 Std Dev Lines command 62, 65 Std Dev Lines option 48 Std Err Bars command 31 Std Error 152
prospective power analysis 226 screening designs 189 Sand Blasted? column 21 Save 40 Save Centered command 65 Save Columns 154 Save commands 37 Save Normal Quantiles command 65 Save Predicted Values 96 Save Script 38 Save Script for All Objects command 66 Save Script to Data Table command 66 Save Script to Report command 66 Save Script to Script Window command 66 Save Standardized command 65 SBC 151 scaling axial 220 designs 220 scatterplot 45–46, 58 Scatterplot Matrix 164 scatterplot matrix 163 S-Chart 125 Schwartz’s Bayesian Criterion 151 score confidence intervals 37 screening designs 199 design types 204 Script 38 Script submenu 66, 185–186 scripts animation 228 generating the analysis model Model script See Model table property scripting See JSL Scroller tool 81 seasonal exponential smoothing 158 seasonal smoothing weight 156 Select Columns list 86 select rows in data table 161 selecting and marking points 48 selecting report items 18, 180 selection tool 18, 180 Separate Axes command 185–186 Seriesg.jmp 143 Set Alpha Level command 51, 64 setting titles windows and reports 80 Shewhart Control Charts 124–133 Shirts.jmp 132 Short Term, Grouped by Column 40 Short Term, Grouped by fixed subgroup size 40
Std Error of Individual 97 Std Error of Predicted 96 Std Error of Residual 97
StdErr Prob 34 Stem and Leaf command 32 Step 101, 103 Step command 186
Step History table 102 stepwise regression 100 Control panel 102–103 Stop 103 Studentized Residuals 96 Sum of Squared Errors 151 Summary of Fit table 59, 89 sums of squares 89 surface designs See response surface designs
T t Ratio 152 t statistic 35 t test 25, 35–36, 46–47 report 46 two sample 46–47 Tables menu Split command 71 Tables tab 13 Tables toolbar 14 Tag Line option 28 Target 39 target values 202 Term 152 Test Mean
command 24, 35 dialog 25 Test Probabilities
command 27, 36 table 36 Test Std Dev command 35 testing a mean 24 testing for independence 55 testing probabilities 26 scaling estimated values 27 Tests command 67 Thread Wear column 21 Thread Wear Measured column 21 Time ID role 147 Time Series 143 Time Series Graph 149 Time Series platform 143–159 ARIMA 154–155
commands 148–150 example 143–148 smoothing models 155–159 Time Series Plot 148 Time Series role 147 titles setting in windows and reports 80 toolbars 13 showing and hiding 14 Tools toolbar 13–14 trade-off in screening designs 204 trend 156 Try Fixed Width 179 Tukey-Kramer HSD 62 tutorial examples DOE 199–201 full factorial designs 189 multiple regression 99–105 response surface designs 216–223 time series 143–148 two-level categorical 200 two-level fractional factorials 204 two-level full factorials 204 two-sample and one-sample means 226, 230 two-sample proportion (power and sample size) 233 two-way contingency table 54
U u-Chart 131 Unconstrained 156–157 UnEqual Variances command 64 Ungroup Plots 186
uniform precision designs 220 Uniform Scaling 38 Uniform Y Scale 186 Univariate 163 Upper Spec Limit 39 Use Median 127 User Defined option 220 using histograms 22
V values target 202 Van der Waerden 64 variables categorical 29 continuous, ordinal, and nominal 29
W-Z Washers.jmp 130 Weibull 33 Weight button in Fit Model 85 weight, importance 195, 202 Welch ANOVA 64 Westgard Rules 122 Where 38 whiskers 31 whole model 92 Whole Model Test table 68 Wilcoxon rank scores 64 Wilcoxon signed-rank test 25, 35 Window menu 23, 26, 79–80, 82, 84 windows setting titles 80 Winter’s method 159 With Best, Hsu’s MCB command 62 With Control, Dunnett’s command 62 word processing program with cut and paste 28, 180 X role 147 X-Axis Proportional command 66 XBar Chart 125 Y button in Fit Model 85 Y Options command 185 Y role 147 Y, Columns button 22 Z statistics 43 z test 25, 35 Zero To One 156
modeling type 29 standardized values 65 Variance Estimate 151 variance of prediction 220 variances equality in t test 47 Vertical command 185
Notices
Technology License Notices

The ImageMan DLL is used with permission of Data Techniques, Inc.

Scintilla is Copyright 1998-2003 by Neil Hodgson. NEIL HODGSON DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL NEIL HODGSON BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
XRender is Copyright © 2002 Keith Packard. KEITH PACKARD DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL KEITH PACKARD BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. SAS INSTITUTE INC.’S LICENSORS MAKE NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, REGARDING THE SOFTWARE. SAS INSTITUTE INC.’S LICENSORS DO NOT WARRANT, GUARANTEE OR MAKE ANY REPRESENTATIONS REGARDING THE USE OR THE RESULTS OF THE USE OF THE SOFTWARE IN TERMS OF ITS CORRECTNESS, ACCURACY, RELIABILITY, CURRENTNESS OR OTHERWISE. THE ENTIRE RISK AS TO THE RESULTS AND PERFORMANCE OF THE SOFTWARE IS ASSUMED BY YOU. THE EXCLUSION OF IMPLIED WARRANTIES IS NOT PERMITTED BY SOME STATES. THE ABOVE EXCLUSION MAY NOT APPLY TO YOU. IN NO EVENT WILL SAS INSTITUTE INC.’S LICENSORS AND THEIR DIRECTORS, OFFICERS, EMPLOYEES OR AGENTS (COLLECTIVELY SAS INSTITUTE INC.’S LICENSOR) BE LIABLE TO YOU FOR ANY CONSEQUENTIAL, INCIDENTAL OR INDIRECT DAMAGES (INCLUDING DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION, AND THE LIKE) ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE EVEN IF SAS INSTITUTE INC.’S LICENSOR’S HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. BECAUSE SOME STATES DO NOT ALLOW THE EXCLUSION OR LIMITATION OF LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL DAMAGES, THE ABOVE LIMITATIONS MAY NOT APPLY TO YOU. SAS INSTITUTE INC.’S LICENSOR’S LIABILITY TO YOU FOR ACTUAL DAMAGES FOR ANY CAUSE WHATSOEVER, AND REGARDLESS OF THE FORM OF THE ACTION (WHETHER IN CONTRACT, TORT (INCLUDING NEGLIGENCE), PRODUCT LIABILITY OR OTHERWISE WILL BE LIMITED TO
$50.00.